search_dev · Independent Search Engine Developers
#30 From: "Garth" <garth_grimm@...>
Date: Wed May 17, 2006 1:32 pm
Subject: Re: Enterprise Search Summit
gdgrimm
 
"Web Services Solutions for Metatagging Challenges", day 2 of the
conference -- http://www.enterprisesearchsummit.com/daytwo.shtml

--- In search_dev@yahoogroups.com, ed.dale@... wrote:
>
> Garth:
>
> On which topic?
>
> Ed
>

#29 From: "Mark Bennett" <mbennett@...>
Date: Wed May 17, 2006 9:18 am
Subject: FW: Re. Opinion on HTML Frames and search results?
ttennebkram
 
Forwarded on behalf of Garth Grimm [gdgrimm@...] (and with permission)

-----Original Message-----
From: notify@yahoogroups.com [mailto:notify@yahoogroups.com] On Behalf Of Garth
Sent: Tuesday, May 16, 2006 6:58 AM
To: Mark
Subject: Re: Opinion on HTML Frames and search results?

Our studies have found that users (and the business delivering the
page to the user) want the whole frameset.

Two workarounds that we use....
1) A special META tag that gets attached to the HTML of the page in
the primary frame.  Its value is the URL that will load the entire
frameset, along with that particular frame's page.  Use that META tag
to override the <a> link presented in the results.

2) Include on the page in the frame a short JavaScript routine that
detects whether the page has been loaded as the top-level document in
the browser (i.e. not inside a frameset), and if so, reloads the
browser with a URL that will load the entire frameset and this
particular page.
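A minimal sketch of workaround 1 from the indexer's side, using Python's standard-library HTML parser. It assumes the META tag is named `frameset-url`; the post does not give the tag's real name, so that name is hypothetical. When the tag is present, its value replaces the frame page's own URL as the link shown in the results list; otherwise the frame page's URL is used as before.

```python
from html.parser import HTMLParser

# Hypothetical META name; the original post does not say what name is used.
FRAMESET_META_NAME = "frameset-url"


class _FramesetMetaFinder(HTMLParser):
    """Records the content of the first <meta name="frameset-url"> tag."""

    def __init__(self):
        super().__init__()
        self.frameset_url = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values may be None.
        if tag != "meta" or self.frameset_url is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == FRAMESET_META_NAME:
            self.frameset_url = attr_map.get("content") or None


def result_link(page_url, html):
    """Return the URL to show in the results list for a framed page.

    Prefers the frameset URL from the META tag; falls back to the URL of
    the individual frame page when the tag is missing or empty.
    """
    finder = _FramesetMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.frameset_url or page_url
```

Doing this at index time means the results list never needs to know how the frameset is built; the page author encodes that once in the META tag.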

Of course, the best thing is to stop using frames.  They're only
handy in very niche situations, and in those situations, the benefit
comes primarily through the ability to provide a navigational frame
that makes search less important.  A JavaDocs website is a good
example.

--- In search_dev@yahoogroups.com, "Mark" <mbennett@...> wrote:
>
> Hi All,
>
> When you index content that contains frames, what do your users want
> to see from the results list when they click on a link?
>
> In other words, do they want to see the entire entry as it would
> appear in frames, or is it OK to just show the individual frame that
> had the matching content?  (which would be the default for most
> engines)
>
> Curious as to how you folks have handled this in the past.
>
> Mark
>

#28 From: "miles_b_kehoe" <mbk@...>
Date: Tue May 16, 2006 6:02 pm
Subject: Re: Enterprise Search Summit
miles_b_kehoe
 
Actually, Enterprise Search Summit is May 22-24. Monday the 22nd is
the pre-show conference day, and the show officially starts Tuesday.
The URL is http://www.enterprisesearchsummit.com/default.shtml

We have some exhibit passes we can give out if anyone is in the
area; let me know (mbk@... is best so the whole group
doesn't get bothered).


--- In search_dev@yahoogroups.com, "mbwebman" <mbwebman@...> wrote:
>
> Hi,
>
> I think I'll be there.  You're talking about the 19th, right?
>
> Alan B.
>
> --- In search_dev@yahoogroups.com, Sam Mefford <meffords@> wrote:
> >
> > I am.
> >
> > wjasonjones wrote:
> > > I see that New Idea Engineering is one of the sponsors for Enterprise
> > > Search Summit in NYC next week.
> > >
> > > Is anyone else from this list planning to attend?
> > >
> > >
> >
>

#27 From: "Avi Rappoport" <analyst@...>
Date: Tue May 16, 2006 4:19 pm
Subject: Re: Welcome new members! Which engines do you use?
searchtools1
 
Hi all,

Glad this group is going!  I'm Avi, I'm pretty much all of Search Tools
Consulting, and I'm hoping to meet some of you in New York at the
Enterprise Search Summit next week.

As for what search engines I use -- as many as possible!  I'm lucky in
that my consulting jobs let me try out new engines all the time.

Avi

#26 From: "Avi Rappoport" <analyst@...>
Date: Tue May 16, 2006 4:22 pm
Subject: Re: Search97
searchtools1
 
--- In search_dev@yahoogroups.com, "miles_b_kehoe" <mbk@...> wrote:
>
> Ed raises an interesting point; Search 97 was pretty common out there
> for a while, and was a pretty decent technology. I know some folks are
> still using Search 97 based on a quick search of Google; anyone care to
> admit it just among us friends?
>

I have one client using the OEM Stellent version, and it's driving
everyone nuts.  It can't see their other servers, it ranks Excel
spreadsheets far too highly, and it doesn't show match terms in
context.  We're replacing it with Ultraseek, which happens to be
licensed already; if it weren't for that, we'd look at other low-cost
but modern search engines.

#25 From: ed.dale@...
Date: Tue May 16, 2006 4:03 pm
Subject: Re: Re: Enterprise Search Summit
arentanji
 

Garth:

On which topic?

Ed



"Garth" <garth_grimm@...>
Sent by: search_dev@yahoogroups.com

05/16/2006 10:45 AM

Please respond to
search_dev@yahoogroups.com

To
search_dev@yahoogroups.com
cc
Subject
[search_dev] Re: Enterprise Search Summit





I'll be there.  I'm speaking.

--- In search_dev@yahoogroups.com, "wjasonjones" <jasjones@...> wrote:
>
> I see that New Idea Engineering is one of the sponsors for Enterprise
> Search Summit in NYC next week.
>
> Is anyone else from this list planning to attend?
>










Yahoo! Groups Links

<*> To visit your group on the web, go to:
   http://groups.yahoo.com/group/search_dev/

<*> To unsubscribe from this group, send an email to:
   search_dev-unsubscribe@yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
   http://docs.yahoo.com/info/terms/





Any U.S. tax advice contained in the body of this e-mail was not intended or written to be used, and cannot be used, by the recipient for the purpose of avoiding penalties that may be imposed under the Internal Revenue Code or applicable state or local tax law provisions.
________________________________________________________________________
The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer.

Notice required by law: This e-mail may constitute an advertisement or solicitation under U.S. law, if its primary purpose is to advertise or promote a commercial product or service. You may choose not to receive advertising and promotional messages from Ernst & Young LLP (except for Ernst & Young Online and the ey.com website, which track e-mail preferences through a separate process) at this e-mail address by forwarding this message to no-more-mail@.... If you do so, the sender of this message will be notified promptly. Our principal postal address is 5 Times Square, New York, NY 10036. Thank you. Ernst & Young LLP

#24 From: "Garth" <garth_grimm@...>
Date: Tue May 16, 2006 2:45 pm
Subject: Re: Enterprise Search Summit
gdgrimm
 
I'll be there.  I'm speaking.

--- In search_dev@yahoogroups.com, "wjasonjones" <jasjones@...> wrote:
>
> I see that New Idea Engineering is one of the sponsors for Enterprise
> Search Summit in NYC next week.
>
> Is anyone else from this list planning to attend?
>

#23 From: "mbwebman" <mbwebman@...>
Date: Tue May 16, 2006 1:57 pm
Subject: Re: Enterprise Search Summit
mbwebman
 
Hi,

I think I'll be there.  You're talking about the 19th, right?

Alan B.

--- In search_dev@yahoogroups.com, Sam Mefford <meffords@...> wrote:
>
> I am.
>
> wjasonjones wrote:
> > I see that New Idea Engineering is one of the sponsors for Enterprise
> > Search Summit in NYC next week.
> >
> > Is anyone else from this list planning to attend?
> >
> >
>

#22 From: "Mark" <mbennett@...>
Date: Mon May 15, 2006 5:12 pm
Subject: Re: http 401 error with nutch crawler
ttennebkram
 
Hi Les,

Looking back at this, I was wondering if you made any progress with it?

Rereading it, I think a 401 has more to do with security than with
which user agent you send.

Does the site you're trying to get at normally require a login?

Or perhaps you were thinking that the site requests a login if it
doesn't recognize you as Internet Explorer / Firefox?
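One way to test whether the User-Agent matters at all is to request the same page with two different User-Agent strings and compare the status code and the `WWW-Authenticate` challenge the server sends back. The sketch below uses only Python's standard library; the function names are made up for illustration. If both requests get a 401 naming `NTLM` or `Negotiate`, that would suggest (an assumption, not something confirmed in this thread) Windows integrated authentication, which a browser on the network may complete without prompting, so changing `http.agent.name` alone would not help.

```python
import urllib.error
import urllib.request


def challenge_schemes(header_values):
    """Return the auth scheme that starts each WWW-Authenticate value,
    e.g. 'NTLM', 'Negotiate' or 'Basic'.

    Simplification: assumes one challenge per header line (servers often
    send one line per scheme); a line listing several comma-separated
    challenges only contributes its first scheme.
    """
    schemes = []
    for value in header_values:
        parts = value.split()
        if parts:
            schemes.append(parts[0].rstrip(","))
    return schemes


def probe(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent header.

    Returns (HTTP status code, auth schemes offered if the request failed).
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, []
    except urllib.error.HTTPError as err:
        offered = err.headers.get_all("WWW-Authenticate") or []
        return err.code, challenge_schemes(offered)
```

For example, compare `probe("http://blah/", "NutchCVS/0.7.2")` with the same call using a browser-like string. Identical 401 results point at authentication rather than the agent name.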

Mark

--- In search_dev@yahoogroups.com, "les_claypoo1" <thomasgkrier@...> wrote:
>
> I am getting a 401 error with the default nutch setting when trying
> to crawl the intranet.  I checked the meta tags out and they don't
> prevent it from crawling, and there is no username or password
> necessary if you are on the network.  So I was wondering if anyone
> knows a way around it.
>
> Here is the error:
> fetch of http://blah/ faled with: java.lang.Exception:
> org.apache.nutch.protocol.http.HttpError:  HTTP Error: 401
>
> I think it is the user agent info it is passing.  Is there any way to
> trick it or bypass it with the nutch-default.xml file?

#21 From: "Martin" <martin.pratt@...>
Date: Mon May 15, 2006 12:15 pm
Subject: Re: Cost of Autonomy elements
mbspuk
 
A starting point for a price list for Autonomy elements can be found
here:

http://66.249.93.104/search?q=cache:kExAaIWvJB4J:www.microlinkllc.com/NR/rdonlyres/8C9FA9F7-DEEF-44C0-93A0-F8DCE6E37569/0/AutonomyGSAProductList.xls%20autonomy%20dish%20dashboard&hl=en&ct=clnk&cd=8&client=opera

I found this via a Google search.  I expect if you look for other
Excel documents you'll find plenty more out there.


--- In search_dev@yahoogroups.com, "wjasonjones" <jasjones@...> wrote:
>
> In the past I have had some bad experiences with K2 patches.  More
> than once, applying a patch has either broken something else or
> reintroduced a bug fixed by a previous patch.  I complained pretty
> strongly about this for a while, and the process has gotten better,
> but I still only "patch up" when I have to, e.g. when a patch fixes a
> known bug that is impacting my installation.
>
> I have been trying to educate myself about IDOL but still have a
> long way to go in this process.  I am anxious to get K2 v7.x in a
> lab and start playing with it.  From a technology standpoint, there
> seem to be some advantages to moving to v7.  The eRoom connector
> (among others) might be a big deal to us.  I haven't even started
> considering the move from a business standpoint... Does anyone have
> any information about the license costs for IDOL functions and other
> Autonomy products such as AWE?
>
> Jason
>
> --- In search_dev@yahoogroups.com, "Mark Bennett" <mbennett@> wrote:
> >
> > Great Ed, K2 is a good product.  I was curious, have you folks been
> > keeping up with the updates/patches for K2 5.5?
> >
> > Have you guys looked at the IDOL stuff at all?  (with Autonomy's
> > acquisition)
> >
> > -----Original Message-----
> > From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
> > Behalf Of arentanji
> > Sent: Monday, April 03, 2006 2:24 PM
> > To: search_dev@yahoogroups.com
> > Subject: [search_dev] Re: Welcome new members! Which engines do you use?
> >
> > Mark:
> >
> > I'll start off and say that I use Verity K2 5.5. We built a ton of
> > applications on top of Search 97 and the transition was not easy.
> >
> > Thanks,
> >
> > Ed
>

#20 From: Sam Mefford <meffords@...>
Date: Mon May 15, 2006 3:44 pm
Subject: Re: Enterprise Search Summit
sammefford
 
I am.

wjasonjones wrote:
> I see that New Idea Engineering is one of the sponsors for Enterprise
> Search Summit in NYC next week.
>
> Is anyone else from this list planning to attend?
>
>

#19 From: "Mark Bennett" <mbennett@...>
Date: Mon May 15, 2006 4:24 pm
Subject: RE: Enterprise Search Summit, We Have Passes!
ttennebkram
 
I believe our operations person said we had extra passes.

If anyone's interested, lemme know and I'll ask her if we still do.

mark

-----Original Message-----
From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
Behalf Of wjasonjones
Sent: Monday, May 15, 2006 8:39 AM
To: search_dev@yahoogroups.com
Subject: [search_dev] Enterprise Search Summit

I see that New Idea Engineering is one of the sponsors for Enterprise
Search Summit in NYC next week.

Is anyone else from this list planning to attend?









#18 From: ed.dale@...
Date: Mon May 15, 2006 4:03 pm
Subject: Re: Enterprise Search Summit
arentanji
 

I will be there.

Ed Dale
Ernst & Young LLP.



"wjasonjones" <jasjones@...>
Sent by: search_dev@yahoogroups.com

05/15/2006 11:38 AM

Please respond to
search_dev@yahoogroups.com

To
search_dev@yahoogroups.com
cc
Subject
[search_dev] Enterprise Search Summit





I see that New Idea Engineering is one of the sponsors for Enterprise
Search Summit in NYC next week.

Is anyone else from this list planning to attend?















#17 From: "wjasonjones" <jasjones@...>
Date: Mon May 15, 2006 3:38 pm
Subject: Enterprise Search Summit
wjasonjones
 
I see that New Idea Engineering is one of the sponsors for Enterprise
Search Summit in NYC next week.

Is anyone else from this list planning to attend?

#16 From: "les_claypoo1" <thomasgkrier@...>
Date: Thu Apr 27, 2006 6:13 pm
Subject: http 401 error with nutch crawler
les_claypoo1
 
I am getting a 401 error with the default nutch setting when trying
to crawl the intranet.  I checked the meta tags out and they don't
prevent it from crawling, and there is no username or password
necessary if you are on the network.  So I was wondering if anyone
knows a way around it.

Here is the error:
fetch of http://blah/ faled with: java.lang.Exception:
org.apache.nutch.protocol.http.HttpError:  HTTP Error: 401

I think it is the user agent info it is passing.  Is there any way to
trick it or bypass it with the nutch-default.xml file?

<!-- HTTP properties -->

<property>
   <name>http.agent.name</name>
   <value>NutchCVS</value>
   <description>Our HTTP 'User-Agent' request header.</description>
</property>

<property>
   <name>http.robots.agents</name>
   <value>NutchCVS,Nutch,*</value>
   <description>The agent strings we'll look for in robots.txt files,
   comma-separated, in decreasing order of precedence.</description>
</property>

<property>
   <name>http.robots.403.allow</name>
   <value>true</value>
   <description>Some servers return HTTP status 403 (Forbidden) if
   /robots.txt doesn't exist. This should probably mean that we are
   allowed to crawl the site nonetheless. If this is set to false,
   then such sites will be treated as forbidden.</description>
</property>

<property>
   <name>http.agent.description</name>
   <value>Nutch</value>
   <description>Further description of our bot- this text is used in
   the User-Agent header.  It appears in parenthesis after the agent
   name.
   </description>
</property>

<property>
   <name>http.agent.url</name>
   <value>http://lucene.apache.org/nutch/bot.html</value>
   <description>A URL to advertise in the User-Agent header.  This will
   appear in parenthesis after the agent name.
   </description>
</property>

<property>
   <name>http.agent.email</name>
   <value>nutch-agent@...</value>
   <description>An email address to advertise in the HTTP 'From' request
   header and User-Agent header.</description>
</property>

<property>
   <name>http.agent.version</name>
   <value>0.7.2</value>
   <description>A version string to advertise in the User-Agent
    header.</description>
</property>

<property>
   <name>http.timeout</name>
   <value>10000</value>
   <description>The default network timeout, in milliseconds.</description>
</property>

<property>
   <name>http.max.delays</name>
   <value>3</value>
   <description>The number of times a thread will delay when trying to
   fetch a page.  Each time it finds that a host is busy, it will wait
   fetcher.server.delay.  After http.max.delays attepts, it will give
   up on the page for now.</description>
</property>

<property>
   <name>http.content.limit</name>
   <value>65536</value>
   <description>The length limit for downloaded content, in bytes.
   If this value is nonnegative (>=0), content longer than it will be truncated;
   otherwise, no truncation at all.
   </description>
</property>

<property>
   <name>http.proxy.host</name>
   <value></value>
   <description>The proxy hostname.  If empty, no proxy is used.</description>
</property>

<property>
   <name>http.proxy.port</name>
   <value></value>
   <description>The proxy port.</description>
</property>

<property>
   <name>http.verbose</name>
   <value>false</value>
   <description>If true, HTTP will log more verbosely.</description>
</property>

<property>
   <name>http.redirect.max</name>
   <value>3</value>
   <description>The maximum number of redirects the fetcher will follow when
   trying to fetch a page.</description>
</property>
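To see what the server actually receives, the `http.agent.*` properties above can be combined the way their descriptions suggest: the name, then `/version`, then the description, URL, and email in parentheses. The sketch below is an approximation under that assumption, not Nutch's own code, and it assumes the pasted fragment is wrapped in a single root element (e.g. `<nutch-conf>`) so it parses as XML. Note that local overrides of these values normally go in `nutch-site.xml` rather than in an edited `nutch-default.xml`.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Map each <property>'s <name> to its <value> (empty string if blank)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        # findtext returns "" for an element that exists but has no text.
        props[name] = prop.findtext("value", default="").strip()
    return props


def agent_string(props):
    """Approximate the User-Agent header: name/version (desc; url; email).

    The format is inferred from the property descriptions (an assumption);
    empty parts are skipped.
    """
    agent = props.get("http.agent.name", "")
    version = props.get("http.agent.version", "")
    if version:
        agent += "/" + version
    details = [
        props.get(key, "")
        for key in ("http.agent.description", "http.agent.url", "http.agent.email")
    ]
    details = [part for part in details if part]
    if details:
        agent += " (" + "; ".join(details) + ")"
    return agent
```

With the values shown above, this yields `NutchCVS/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@...)`, which is what a server-side filter would compare against a browser's User-Agent.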

#15 From: "wjasonjones" <jasjones@...>
Date: Thu Apr 27, 2006 2:44 pm
Subject: Re: Use of Taxonomies in your Enterprise Search App?
wjasonjones
 
Our situation is very similar to what Rameez describes. We formerly
used K2 Knowledge Trees to implement a Yahoo-style browse page.  We
now use parametric search tied to our global taxonomy to implement
more of a faceted navigation page.
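A minimal in-memory sketch of the faceted pattern, assuming each document carries a list of taxonomy node labels per facet (the facet names `region` and `topic` and the sample documents are hypothetical): count the values of one facet across the current result set, then narrow the set when the user picks a value. It only illustrates the navigation idea; it is not how Verity's parametric indexing works internally.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many documents in the result set carry each value of one facet."""
    counts = Counter()
    for doc in results:
        # A document may be tagged with several nodes of the same facet.
        for value in doc.get(facet, []):
            counts[value] += 1
    return counts


def drill_down(results, facet, value):
    """Keep only the documents tagged with the chosen facet value."""
    return [doc for doc in results if value in doc.get(facet, [])]


# Hypothetical documents tagged against two facets of a taxonomy.
DOCS = [
    {"id": 1, "region": ["EMEA"], "topic": ["Tax"]},
    {"id": 2, "region": ["EMEA", "Americas"], "topic": ["Audit"]},
    {"id": 3, "region": ["Americas"], "topic": ["Tax"]},
]
```

Picking `topic = Tax` narrows `DOCS` to ids 1 and 3, and the region counts then drop to one each; those are the numbers a faceted page would show next to each remaining link.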

BTW:  our global taxonomy currently has ~1100 nodes. Is this large
or small compared to what others are using?

I ask this because user feedback suggests that users aren't
necessarily happy or comfortable with our browse interface, and it
seems this is mostly because they don't fully understand how the
taxonomy nodes interrelate, and so they have difficulty navigating
to relevant content.  We are currently experimenting with Topic Maps
as a possible means of helping users better understand the
relationships among our taxonomy nodes.

Are others experiencing this?

Jason


#14 From: "Rameez Meerasahib" <rameez.meerasahib@...>
Date: Fri Apr 14, 2006 8:00 pm
Subject: Re: Use of Taxonomies in your Enterprise Search App?
rameezmeeras...
 

We have implemented Parametric Indexing from Verity for taxonomy navigation. We initially had issues with the sorting and relevancy of documents in categories. Verity took almost 6-7 months to fix the issues. SP2 of 5.5 has all the fixes, and it is doing well now. Our implementations are quite large, with a large number of taxonomy nodes and very large PIs. We used Verity's Knowledge Tree before moving to PIs.

 

Taxonomies will have a very important role to play in intranet and Internet scenarios in the coming days, as the number of documents returned for a normal search is growing exponentially. I believe we will see more customers for taxonomies/categorization…


Regards,
Rameez
 
#13 From: search_dev@yahoogroups.com
Date: Fri Apr 14, 2006 6:09 am
Subject: New file uploaded to search_dev
search_dev@yahoogroups.com
 
Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the search_dev
group.

   File        : /ultraseek/Xpa_win.zip
   Uploaded by : miles_b_kehoe <mbk@...>
   Description : Hello World basic Ultraseek XPA Sample

You can access this file at the URL:
http://groups.yahoo.com/group/search_dev/files/ultraseek/Xpa_win.zip

To learn more about file sharing for your group, please visit:
http://help.yahoo.com/help/us/groups/files

Regards,

miles_b_kehoe <mbk@...>

#12 From: "Mark Bennett" <mbennett@...>
Date: Fri Apr 14, 2006 5:00 am
Subject: Use of Taxonomies in your Enterprise Search App?
ttennebkram
 

Though taxonomies got huge press back in the late 90s and early 2000s, I still see quite a bit of interest in them.  The odd thing is, we don’t see them being actively used as often, although some companies do have them implemented.  And the term itself, “taxonomies”, seems to mean different things to different people.

 

I’m kind of curious what you folks have actually seen used or have implemented, and what business objective it was in support of?

 

Examples of how folks use:

* You could organize your content sort of like Yahoo and use it for browsing

* Or you could use it for searching, and let people drill down through results lists; to me this is the most useful.

* Some folks actually mean tagging documents, automatic document classification, etc, when they speak of taxonomies

* While others, who have used Verity, think of taxonomies in terms of Topic trees and Agents

* Lately the “faceted” search trend has spawned “multi-dimensional” taxonomies, where you can navigate by product line, or by department, or by “business cycle”, etc.  Interesting stuff, though I’ve only seen one client really go full tilt with this.

* Some vendors lump taxonomies in with automatic document clustering based on keywords and phrases and call the result “topics” or “taxonomies”; our history has been more with human created, or at least human supervised topics, à la the SageWare stuff, etc.

 

Then there’s the question of where taxonomies come from:

* Back in the 1990s I tended to use vi and notepad :)

* There’s “canned” taxonomies for certain industries, for example pharmaceuticals

* There’s “in house” taxonomies, very specific to the language used at that company or agency

* Or you can try to mix the last 2 - start with a canned taxonomy then glom on your custom vocabulary and products

* And of course there’s a whole bunch of statistically based automatic creation tools - lots of folks have offered those - your mileage may vary :)

 

Regardless of how they are generated, I tend to classify taxonomies into 1 of 3 broad categories:

* Subject Based Taxonomies - some expert or library sciences person has logically organized a particular domain of knowledge

* Content Based Taxonomies - somewhat similar to the above, but driven more by the content that is actually present - the automated tools usually go this route

* Behavior Based Taxonomies - focuses on organizing and optimizing searches based on what users are actually searching for - “tweak your top 1,000 searches first” (and their related areas) - in my mind this is the best “bang for the buck” if personnel resources are limited
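For the behavior-based route, the first step is usually just finding the top searches. Here is a minimal Python sketch, assuming a plain-text query log with one query per line; the log format, the light normalization, and the `top_queries` name are assumptions for illustration, since real engine logs differ.

```python
from collections import Counter

def top_queries(log_lines, n=1000):
    """Return the n most frequent queries as (query, count) pairs.

    Queries are lower-cased and whitespace-collapsed so that
    "Expense  Report" and "expense report" count as the same search.
    Blank lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        query = " ".join(line.split()).lower()
        if query:
            counts[query] += 1
    return counts.most_common(n)

if __name__ == "__main__":
    # Hypothetical sample log lines.
    sample_log = [
        "expense report",
        "Expense  Report",
        "holiday schedule",
        "",
        "expense report",
    ]
    for query, count in top_queries(sample_log, n=2):
        print(count, query)
```

An open file object can be passed as `log_lines` directly, since iterating it yields lines (the trailing newline is removed by `split()`); the resulting list is then reviewed by hand, top entries first.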

Hype aside, what are folks actually IMPLEMENTING and using?

Mark

#11 From: "les_claypoo1" <thomasgkrier@...>
Date: Tue Apr 11, 2006 5:48 pm
Subject: Re: Anybody here use nutch for their intranet??? got it
les_claypoo1
 
Never mind, I got it going with Cygwin.  I will play around with it
now.  I was wondering: are there any other good free/open source search
engines with a crawler and parsers like Nutch's?

#10 From: "les_claypoo1" <thomasgkrier@...>
Date: Mon Apr 10, 2006 10:46 pm
Subject: Re: Anybody here use nutch for their intranet???
les_claypoo1
 
Well, the place I work for is a big open source shop, so they are big
into Java and anything low-cost/open source, which Google doesn't
fall under.  Because of the size of the intranet, I thought open
source could produce results similar to Google's.  I don't intend for
it to replace the Google box, because I'm sure its results are
better and quicker than most, but my job is at least to provide
documentation on what open source has to offer, which is why I want
to try and get Nutch running.  (I will look into Ultraseek also.)  So
it is not a performance issue with the Google appliance; it is
strictly cost-based.

The specific glitch I am running into with the Nutch setup, in
getting it to crawl, is running any of the Unix commands through
Cygwin.  According to the instructions at the link below, I
should be able to type bin/nutch and it will display documentation
on Nutch, but that doesn't happen.  I might have my folders
set up wrong.
http://lucene.apache.org/nutch/tutorial8.html#Getting+Started
Thanks for the help.  Most appreciated!!!
-Tom

#9 From: "Mark" <mbennett@...>
Date: Mon Apr 10, 2006 9:52 pm
Subject: Re: Anybody here use nutch for their intranet???
ttennebkram
 
We know one guy who got Nutch going quickly, and he is not really a
programmer.  I was impressed by what he got done in a short amount of
time.  I'll mention this group to him, so maybe he can comment
further.  Was there a specific glitch with the Windows Nutch setup?

If you already have the Google box, could you talk more about why you
might be looking at Nutch?  Did the Google box not live up to
expectations?  Lucene/Nutch are fine open source choices; if you
are looking at commercial, then that depends on requirements and
budget.  Depending on the number of documents, you might consider Ultraseek.

Mark

#8 From: "Mark" <mbennett@...>
Date: Mon Apr 10, 2006 6:07 pm
Subject: Opinion on HTML Frames and search results?
ttennebkram
 
Hi All,

When you index content that contains frames, what do your users want
to see from the results list when they click on a link?

In other words, do they want to see the entire entry as it would
appear in frames, or is it OK to just show the individual frame that
had the matching content?  (which would be the default for most engines)

Curious as to how you folks have handled this in the past.

Mark

#7 From: "les_claypoo1" <thomasgkrier@...>
Date: Sun Apr 9, 2006 9:24 pm
Subject: Anybody here use nutch for their intranet???
les_claypoo1
 
I am researching possibly replacing the Google appliance with
Nutch/Lucene technology.  Since it is on the intranet scale, I would
like to give it a test run.  The problem is that the documentation on how to
get Nutch working in the Windows XP environment isn't that clear.  I
bought the book on Lucene and basically got the main concepts of
Lucene and Nutch down; I just need help getting them started, and
creating an index would be step one.  Any help would be great.
Thanks!!!

#6 From: ed.dale@...
Date: Fri Apr 7, 2006 1:42 pm
Subject: RE: Re: Welcome new members! Which engines do you use?
arentanji

Mark:

Just started to think about our upgrade plans. This would be a discretionary project for us, so I expect we will not be on the cutting edge. I see 4 possibilities: stay on K2 5.5 until support is cut; move to K2 v7 and stay with K2 through 8 and 9; move to IDOL; or, last, open the doors to any search engine and do some sort of shoot-out with all available vendors.

I suspect that we will take the least-effort course, but who can tell?

Open question to the group:
What are other people using? Any good stories to tell about other vendors? Any vendors to avoid?

Thanks,

Ed

Any U.S. tax advice contained in the body of this e-mail was not intended or written to be used, and cannot be used, by the recipient for the purpose of avoiding penalties that may be imposed under the Internal Revenue Code or applicable state or local tax law provisions.
________________________________________________________________________
The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer.

Notice required by law: This e-mail may constitute an advertisement or solicitation under U.S. law, if its primary purpose is to advertise or promote a commercial product or service. You may choose not to receive advertising and promotional messages from Ernst & Young LLP (except for Ernst & Young Online and the ey.com website, which track e-mail preferences through a separate process) at this e-mail address by forwarding this message to no-more-mail@.... If you do so, the sender of this message will be notified promptly. Our principal postal address is 5 Times Square, New York, NY 10036. Thank you. Ernst & Young LLP

#5 From: "wjasonjones" <jasjones@...>
Date: Fri Apr 7, 2006 1:41 pm
Subject: Re: Welcome new members! Which engines do you use?
wjasonjones
 
In the past I have had some bad experiences with K2 patches.  More
than once, applying a patch has either broken something else or
reintroduced a bug fixed by a previous patch.  I complained pretty
strongly about this for a while, and the process has gotten better,
but I still only "patch up" when I have to, e.g. when a patch fixes a
known bug that is impacting my installation.

I have been trying to educate myself about IDOL but still have a
long way to go in this process.  I am anxious to get K2 v7.x in a
lab and start playing with it.  From a technology standpoint, there
seem to be some advantages to moving to v7.  The eRoom connector
(among others) might be a big deal to us.  I haven't even started
considering the move from a business standpoint... Does anyone have
any information about the license costs for IDOL functions and other
Autonomy products such as AWE?

Jason

#4 From: "Mark Bennett" <mbennett@...>
Date: Fri Apr 7, 2006 6:25 am
Subject: RE: Re: Welcome new members! Which engines do you use?
ttennebkram
 
Great, Ed.  K2 is a good product.  I was curious: have you folks been keeping
up with the updates/patches for K2 5.5?

Have you guys looked at the IDOL stuff at all?  (given Autonomy's
acquisition of Verity)

#3 From: "miles_b_kehoe" <mbk@...>
Date: Fri Apr 7, 2006 5:43 am
Subject: Search97
miles_b_kehoe
 
Ed raises an interesting point; Search 97 was pretty common out there
for a while, and was a pretty decent technology. Based on a quick Google
search, I know some folks are still using Search 97; anyone care to
admit it just among us friends?

#2 From: "arentanji" <ed.dale@...>
Date: Mon Apr 3, 2006 9:23 pm
Subject: Re: Welcome new members! Which engines do you use?
arentanji
 
Mark:

I'll start off and say that I use Verity K2 5.5. We built a ton of
applications on top of Search 97 and the transition was not easy.

Thanks,

Ed

#1 From: "Mark Bennett" <mbennett@...>
Date: Mon Apr 3, 2006 3:25 pm
Subject: Welcome new members! Which engines do you use?
ttennebkram

Hello All,

We’ve picked up some members in just our first full weekend.

To get the ball rolling, would you like to talk about which engine or engines you use, and what’s on your mind?

Happy Monday,

Mark

Copyright © 2009 Yahoo! Inc. All rights reserved.