Forwarded on behalf of Garth Grimm [gdgrimm@...] (and with permission)
-----Original Message-----
From: notify@yahoogroups.com [mailto:notify@yahoogroups.com] On Behalf Of
Garth
Sent: Tuesday, May 16, 2006 6:58 AM
To: Mark
Subject: Re: Opinion on HTML Frames and search results?
Our studies have found that users (and the business delivering the
page to the user) want the whole frameset.
Two workarounds that we use....
1) A special META tag that gets attached to the HTML of the page in
the primary frame. Its value is the URL that will load the entire
frameset, along with that particular frame's page. Use that META tag
to override the <a> link presented in the results.
2) Include on the page in the frame a short JavaScript routine that
identifies if the page is loaded as a parent in the browser (i.e. not
in a frameset), and if so, reloads the browser with a URL that will
include the entire frameset and this particular page.
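A minimal sketch of workaround 1, assuming the search front end can post-process each hit before rendering the results list. The META tag name `frameset-url`, the class, and the function are hypothetical names chosen for illustration; nothing here is tied to a particular engine.

```python
from html.parser import HTMLParser


class FramesetMetaFinder(HTMLParser):
    """Finds <meta name="frameset-url" content="..."> (hypothetical tag name)."""

    def __init__(self, meta_name="frameset-url"):
        super().__init__()
        self.meta_name = meta_name
        self.frameset_url = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching META tag is used.
        if tag != "meta" or self.frameset_url is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == self.meta_name:
            self.frameset_url = attributes.get("content")


def result_link(page_url, page_html):
    """Return the URL to show in the results list for one indexed frame page."""
    finder = FramesetMetaFinder()
    finder.feed(page_html)
    # Fall back to the bare frame page when no override tag is present.
    return finder.frameset_url or page_url
```

Pages without the tag keep their own URL, so the override can be rolled out one page at a time.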
Of course, the best thing is to stop using frames. They're only
handy in very niche situations, and in those situations, the benefit
comes primarily through the ability to provide a navigational frame
that makes search less important. A JavaDocs website is a good
example.
--- In search_dev@yahoogroups.com, "Mark" <mbennett@...> wrote:
>
> Hi All,
>
> When you index content that contains frames, what do your users want
> to see from the results list when they click on a link?
>
> In other words, do they want to see the entire entry as it would
> appear in frames, or is it OK to just show the individual frame that
> had the matching content? (which would be the default for most
> engines)
>
> Curious as to how you folks have handled this in the past.
>
> Mark
>
Actually, Enterprise Search Summit is May 22-24. Monday the 22nd is
the 'pre-show' conference day and the show officially starts Tuesday.
The URL is http://www.enterprisesearchsummit.com/default.shtml
We have some exhibit passes we can give out if anyone is in the
area; let me know (mbk@... is best so the whole group
doesn't get bothered).
--- In search_dev@yahoogroups.com, "mbwebman" <mbwebman@...> wrote:
>
> Hi,
>
> I think I'll be there. You're talking about the 19th, right?
>
> Alan B.
>
> --- In search_dev@yahoogroups.com, Sam Mefford <meffords@> wrote:
> >
> > I am.
> >
> > wjasonjones wrote:
> > > I see that New Idea Engineering is one of the sponsors for
> > > Enterprise Search Summit in NYC next week.
> > >
> > > Is anyone else from this list planning to attend?
> > >
> > >
> >
>
Hi all,
Glad this group is going! I'm Avi, I'm pretty much all of Search Tools
Consulting, and hoping to meet some of you in New York at the Enterprise
Search Summit next week.
As for what search engines I use -- as many as possible! I'm lucky in that
my consulting jobs let me try out new engines all the time.
Avi
--- In search_dev@yahoogroups.com, "miles_b_kehoe" <mbk@...> wrote:
>
> Ed raises an interesting point; Search 97 was pretty common out there
> for a while, and was a pretty decent technology. I know some folks are
> still using Search 97 based on a quick search of Google; anyone care to
> admit it just among us friends?
>
I have one client using the OEM Stellent version, and it's driving everyone
nuts. It can't see their other servers, it ranks Excel spreadsheets far too
highly, and it doesn't show match terms in context. We're replacing it with
Ultraseek, which happens to be licensed already; if it weren't for that,
we'd look at other low-cost but modern search engines.
"Garth" <garth_grimm@...> Sent by: search_dev@yahoogroups.com
05/16/2006 10:45 AM
Please respond to
search_dev@yahoogroups.com
To
search_dev@yahoogroups.com
cc
Subject
[search_dev] Re: Enterprise Search Summit
I'll be there. I'm speaking.
--- In search_dev@yahoogroups.com, "wjasonjones" <jasjones@...>
wrote:
>
> I see that New Idea Engineering is one of the sponsors for Enterprise
> Search Summit in NYC next week.
>
> Is anyone else from this list planning to attend?
>
Hi,
I think I'll be there. You're talking about the 19th, right?
Alan B.
--- In search_dev@yahoogroups.com, Sam Mefford <meffords@...> wrote:
>
> I am.
>
> wjasonjones wrote:
> > I see that New Idea Engineering is one of the sponsors for Enterprise
> > Search Summit in NYC next week.
> >
> > Is anyone else from this list planning to attend?
> >
> >
>
Hi Les,
Looking back at this, I was wondering if you made any progress with it?
Rereading it, I think a 401 has more to do with security than with
which user agent you send.
Does the site you're trying to get at normally require a login?
Or perhaps you were thinking that the site requests a login if it
doesn't recognize you as Internet Explorer / Firefox?
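One way to narrow this down is to look at the `WWW-Authenticate` header that comes back with the 401. A rough sketch, assuming you have captured the response headers some other way and that each header value holds a single challenge; the function names and the wording of the results are made up for illustration. A `Negotiate` or `NTLM` challenge would suggest integrated Windows authentication, which a browser on the network answers silently and a crawler does not.

```python
def auth_schemes(www_authenticate_values):
    """Return the authentication schemes named in WWW-Authenticate header values."""
    schemes = []
    for value in www_authenticate_values:
        # The scheme is the first token of a challenge, e.g. 'Basic realm="x"'.
        parts = value.strip().split(None, 1)
        if parts:
            schemes.append(parts[0].lower())
    return schemes


def likely_cause(status, www_authenticate_values):
    """Give a rough, heuristic reading of an HTTP status seen by a crawler."""
    if status != 401:
        return "not an authentication challenge"
    schemes = auth_schemes(www_authenticate_values)
    if "negotiate" in schemes or "ntlm" in schemes:
        return "integrated Windows authentication: browsers on the network log in silently"
    if "basic" in schemes or "digest" in schemes:
        return "the server wants a username and password"
    return "authentication required, scheme unknown"
```

If the result points at integrated authentication, changing the user agent string alone is unlikely to help.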
Mark
--- In search_dev@yahoogroups.com, "les_claypoo1" <thomasgkrier@...>
wrote:
>
> I am getting a 401 error with the default nutch setting when trying
> to crawl the intranet. I checked the meta tags out and they don't
> prevent it from crawling, and there is no username or password
> necessary if you are on the network. So I was wondering if anyone
> knows a way around it.
>
> Here is the error:
> fetch of http://blah/ failed with: java.lang.Exception:
> org.apache.nutch.protocol.http.HttpError: HTTP Error: 401
>
> I think it is the user agent info it is passing. Is there any way to
> trick it or bypass it with the nutch-default.xml file?
>
> <!-- HTTP properties -->
>
> <property>
> <name>http.agent.name</name>
> <value>NutchCVS</value>
> <description>Our HTTP 'User-Agent' request header.</description>
> </property>
>
> <property>
> <name>http.robots.agents</name>
> <value>NutchCVS,Nutch,*</value>
> <description>The agent strings we'll look for in robots.txt files,
> comma-separated, in decreasing order of precedence.</description>
> </property>
>
> <property>
> <name>http.robots.403.allow</name>
> <value>true</value>
> <description>Some servers return HTTP status 403 (Forbidden) if
> /robots.txt doesn't exist. This should probably mean that we are
> allowed to crawl the site nonetheless. If this is set to false,
> then such sites will be treated as forbidden.</description>
> </property>
>
> <property>
> <name>http.agent.description</name>
> <value>Nutch</value>
> <description>Further description of our bot- this text is used in
> the User-Agent header. It appears in parenthesis after the agent
> name.
> </description>
> </property>
>
> <property>
> <name>http.agent.url</name>
> <value>http://lucene.apache.org/nutch/bot.html</value>
> <description>A URL to advertise in the User-Agent header. This
> will
> appear in parenthesis after the agent name.
> </description>
> </property>
>
> <property>
> <name>http.agent.email</name>
> <value>nutch-agent@...</value>
> <description>An email address to advertise in the HTTP 'From'
> request
> header and User-Agent header.</description>
> </property>
>
> <property>
> <name>http.agent.version</name>
> <value>0.7.2</value>
> <description>A version string to advertise in the User-Agent
> header.</description>
> </property>
>
> <property>
> <name>http.timeout</name>
> <value>10000</value>
> <description>The default network timeout, in
> milliseconds.</description>
> </property>
>
> <property>
> <name>http.max.delays</name>
> <value>3</value>
> <description>The number of times a thread will delay when trying to
> fetch a page. Each time it finds that a host is busy, it will wait
> fetcher.server.delay. After http.max.delays attepts, it will give
> up on the page for now.</description>
> </property>
>
> <property>
> <name>http.content.limit</name>
> <value>65536</value>
> <description>The length limit for downloaded content, in bytes.
> If this value is nonnegative (>=0), content longer than it will be
> truncated;
> otherwise, no truncation at all.
> </description>
> </property>
>
> <property>
> <name>http.proxy.host</name>
> <value></value>
> <description>The proxy hostname. If empty, no proxy is
> used.</description>
> </property>
>
> <property>
> <name>http.proxy.port</name>
> <value></value>
> <description>The proxy port.</description>
> </property>
>
> <property>
> <name>http.verbose</name>
> <value>false</value>
> <description>If true, HTTP will log more verbosely.</description>
> </property>
>
> <property>
> <name>http.redirect.max</name>
> <value>3</value>
> <description>The maximum number of redirects the fetcher will
> follow when
> trying to fetch a page.</description>
> </property>
>
A starting point for a price list for Autonomy elements can be found
here:
http://66.249.93.104/search?
q=cache:kExAaIWvJB4J:www.microlinkllc.com/NR/rdonlyres/8C9FA9F7-DEEF-
44C0-93A0-F8DCE6E37569/0/AutonomyGSAProductList.xls%20autonomy%20dish%
20dashboard&hl=en&ct=clnk&cd=8&client=opera
I found this via a Google search. I expect if you look for other
Excel documents you'll find plenty more out there.
--- In search_dev@yahoogroups.com, "wjasonjones" <jasjones@...> wrote:
>
> In the past I have had some bad experiences with K2 patches. More
> than once, applying a patch has either broken something else or
> reintroduced a bug fixed by a previous patch. I complained pretty
> strongly about this for a while, and the process has gotten better
> but I still only "patch-up" when I have to - e.g. patch fixes a
> known bug that is impacting my installation.
>
> I have been trying to educate myself about IDOL but still have a
> long way to go in this process. I am anxious to get K2 v7.x in a
> lab and start playing with it. From a technology standpoint, there
> seem to be some advantages to moving to v7. The eRoom connector
> (among others) might be a big deal to us. I haven't even started
> considering the move from a business standpoint... Does anyone have
> any information about the license costs for IDOL functions and
> other Autonomy products such as AWE?
>
> Jason
>
> --- In search_dev@yahoogroups.com, "Mark Bennett" <mbennett@>
> wrote:
> >
> > Great Ed, K2 is a good product. I was curious, have you folks been
> > keeping up with the updates/patches for K2 5.5?
> >
> > Have you guys looked at the IDOL stuff at all? (with Autonomy's
> > acquisition)
> >
> > -----Original Message-----
> > From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
> > Behalf Of arentanji
> > Sent: Monday, April 03, 2006 2:24 PM
> > To: search_dev@yahoogroups.com
> > Subject: [search_dev] Re: Welcome new members! Which engines do you use?
> >
> > Mark:
> >
> > I'll start off and say that I use Verity K2 5.5. We built a ton of
> > applications on top of Search 97 and the transition was not easy.
> >
> > Thanks,
> >
> > Ed
> >
>
I am.
wjasonjones wrote:
> I see that New Idea Engineering is one of the sponsors for Enterprise
> Search Summit in NYC next week.
>
> Is anyone else from this list planning to attend?
>
>
I believe your operations person said we had extra passes.
If anyone's interested, lemme know and I'll ask her if we still do.
mark
-----Original Message-----
From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
Behalf Of wjasonjones
Sent: Monday, May 15, 2006 8:39 AM
To: search_dev@yahoogroups.com
Subject: [search_dev] Enterprise Search Summit
I see that New Idea Engineering is one of the sponsors for Enterprise
Search Summit in NYC next week.
Is anyone else from this list planning to attend?
"wjasonjones"
<jasjones@...> Sent by: search_dev@yahoogroups.com
05/15/2006 11:38 AM
Please respond to
search_dev@yahoogroups.com
To
search_dev@yahoogroups.com
cc
Subject
[search_dev] Enterprise Search Summit
I see that New Idea Engineering is one of the sponsors for Enterprise
Search Summit in NYC next week.
Is anyone else from this list planning to attend?
I am getting a 401 error with the default nutch setting when trying
to crawl the intranet. I checked the meta tags out and they don't
prevent it from crawling, and there is no username or password
necessary if you are on the network. So I was wondering if anyone
knows a way around it.
Here is the error:
fetch of http://blah/ failed with: java.lang.Exception:
org.apache.nutch.protocol.http.HttpError: HTTP Error: 401
I think it is the user agent info it is passing. Is there any way to
trick it or bypass it with the nutch-default.xml file?
<!-- HTTP properties -->
<property>
<name>http.agent.name</name>
<value>NutchCVS</value>
<description>Our HTTP 'User-Agent' request header.</description>
</property>
<property>
<name>http.robots.agents</name>
<value>NutchCVS,Nutch,*</value>
<description>The agent strings we'll look for in robots.txt files,
comma-separated, in decreasing order of precedence.</description>
</property>
<property>
<name>http.robots.403.allow</name>
<value>true</value>
<description>Some servers return HTTP status 403 (Forbidden) if
/robots.txt doesn't exist. This should probably mean that we are
allowed to crawl the site nonetheless. If this is set to false,
then such sites will be treated as forbidden.</description>
</property>
<property>
<name>http.agent.description</name>
<value>Nutch</value>
<description>Further description of our bot- this text is used in
the User-Agent header. It appears in parenthesis after the agent
name.
</description>
</property>
<property>
<name>http.agent.url</name>
<value>http://lucene.apache.org/nutch/bot.html</value>
<description>A URL to advertise in the User-Agent header. This
will
appear in parenthesis after the agent name.
</description>
</property>
<property>
<name>http.agent.email</name>
<value>nutch-agent@...</value>
<description>An email address to advertise in the HTTP 'From'
request
header and User-Agent header.</description>
</property>
<property>
<name>http.agent.version</name>
<value>0.7.2</value>
<description>A version string to advertise in the User-Agent
header.</description>
</property>
<property>
<name>http.timeout</name>
<value>10000</value>
<description>The default network timeout, in
milliseconds.</description>
</property>
<property>
<name>http.max.delays</name>
<value>3</value>
<description>The number of times a thread will delay when trying to
fetch a page. Each time it finds that a host is busy, it will wait
fetcher.server.delay. After http.max.delays attepts, it will give
up on the page for now.</description>
</property>
<property>
<name>http.content.limit</name>
<value>65536</value>
<description>The length limit for downloaded content, in bytes.
If this value is nonnegative (>=0), content longer than it will be
truncated;
otherwise, no truncation at all.
</description>
</property>
<property>
<name>http.proxy.host</name>
<value></value>
<description>The proxy hostname. If empty, no proxy is
used.</description>
</property>
<property>
<name>http.proxy.port</name>
<value></value>
<description>The proxy port.</description>
</property>
<property>
<name>http.verbose</name>
<value>false</value>
<description>If true, HTTP will log more verbosely.</description>
</property>
<property>
<name>http.redirect.max</name>
<value>3</value>
<description>The maximum number of redirects the fetcher will
follow when
trying to fetch a page.</description>
</property>
Our situation is very similar to what Rameez describes. We formerly
used K2 Knowledge Trees to implement a Yahoo-style browse page. We
now use parametric search tied to our global taxonomy to implement
more of a faceted navigation page.
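For anyone unfamiliar with the idea, here is a toy sketch of faceted navigation over taxonomy tags. It is not how K2 parametric indexes work internally; the sample documents, the facet names, and both functions are invented for illustration.

```python
from collections import Counter


def facet_counts(docs, facet):
    """Count how many documents carry each value of one facet."""
    return Counter(doc[facet] for doc in docs if facet in doc)


def drill_down(docs, **selected):
    """Keep only the documents matching every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


# Invented sample data: each document is tagged along two facets.
docs = [
    {"title": "Install guide", "product": "Widget", "department": "Support"},
    {"title": "Price list", "product": "Widget", "department": "Sales"},
    {"title": "FAQ", "product": "Gadget", "department": "Support"},
]
```

Calling `facet_counts(docs, "product")` gives the counts shown next to each facet value on the page, and `drill_down(docs, product="Widget")` narrows the result set after a click.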
BTW: our global taxonomy currently has ~1100 nodes. Is this large
or small compared to what others are using?
I ask this because user feedback suggests that users aren't
comfortable with our browse interface, mostly because they don't
fully understand how the taxonomy nodes interrelate and so have
difficulty navigating to relevant content. We are currently playing with
Topic Maps as a possible means of helping users better understand
the relationships of our taxonomy nodes.
Are others experiencing this?
Jason
--- In search_dev@yahoogroups.com, "Rameez Meerasahib"
<rameez.meerasahib@...> wrote:
>
> We have implemented Parametric Indexing from Verity for taxonomy
> Navigation. We had issues in sorting and relevancy of documents in
> categories initially. Verity took almost 6-7 months to fix the issues.
> SP2 of 5.5 has all fixes and it is doing well now. Our implementations
> are quite huge with large number of taxonomy nodes and huge size of
> PI's. We have experienced Knowledge Tree from Verity before using PI's.
>
> Taxonomies have a very important role to play in intranet/Internet
> scenario in coming days as number of documents returned for a normal
> search is growing exponentially. I believe we will see more customers
> for taxonomies/categorization…
>
> Regards,
> Rameez
>
We have implemented Parametric Indexing from Verity for taxonomy Navigation. We had issues in sorting and relevancy of documents in categories initially. Verity took almost 6-7 months to fix the issues. SP2 of 5.5 has all fixes and it is doing well now. Our implementations are quite huge with large number of taxonomy nodes and huge size of PI's. We have experienced Knowledge Tree from Verity before using PI's.
Taxonomies have a very important role to play in intranet/Internet scenario in coming days as number of documents returned for a normal search is growing exponentially. I believe we will see more customers for taxonomies/categorization…
Though taxonomies got huge press back in the late 90s and early 2000s, I still see quite a bit of interest in them. The odd thing is, we don't see them being actively used as often, although some companies do have them implemented. And the term itself, "taxonomies", seems to mean different things to different people.
I'm kind of curious what you folks have actually seen used or have implemented, and what business objective it was in support of?
Examples of how folks use them:
* You could organize your content sort of like Yahoo and use it for browsing
* Or you could use it for searching, and let people drill down through results lists; to me this is the most useful.
* Some folks actually mean tagging documents, automatic document classification, etc, when they speak of taxonomies
* While others, who have used Verity, think of taxonomies in terms of Topic trees and Agents
* Lately the "faceted" search trend has spawned "multi-dimensional" taxonomies, where you can navigate by product line, or by department, or by "business cycle", etc. Interesting stuff, though I've only seen one client really go full tilt with this.
* Some vendors lump taxonomies in with automatic document clustering based on keywords and phrases and call the result "topics" or "taxonomies"; our history has been more with human created, or at least human supervised topics, ala the SageWare stuff, etc.
Then there's the question of where taxonomies come from:
* Back in the 1990s I tended to use vi and notepad :-)
* There's "canned" taxonomies for certain industries, for example pharmaceuticals
* There's "in house" taxonomies, very specific to the language used at that company or agency
* Or you can try to mix the last 2 - start with a canned taxonomy then glom on your custom vocabulary and products
* And of course there's a whole bunch of statistically based automatic creation tools - lots of folks have offered those - your mileage may vary :-)
Regardless of how they are generated, I tend to classify taxonomies into 1 of 3 broad categories:
* Subject Based Taxonomies - some expert or library sciences person has logically organized a particular domain of knowledge
* Content Based Taxonomies - somewhat similar to the above, but driven more by the content that is actually present - the automated tools usually go this route
* Behavior Based Taxonomies - focuses on organizing and optimizing searches based on what users are actually searching for - "tweak your top 1,000 searches first" (and their related areas) - in my mind this is the best "bang for the buck" if personnel resources are limited
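The behavior-based approach starts from a ranked list of what people actually type. A minimal sketch, assuming a query log with one query per line; the normalization rule (lowercase, collapsed whitespace) and the function name are assumptions, not a feature of any particular engine.

```python
from collections import Counter


def top_queries(log_lines, n=1000):
    """Return the n most frequent queries, ignoring case and extra whitespace."""
    counts = Counter()
    for line in log_lines:
        # Normalize so "Expense  Report" and "expense report" count together.
        query = " ".join(line.lower().split())
        if query:
            counts[query] += 1
    return counts.most_common(n)
```

The resulting list is the worklist: review the results for each of the top queries and tune those first.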
Hype aside, what are folks actually IMPLEMENTING and using?
Hello,
This email message is a notification to let you know that
a file has been uploaded to the Files area of the search_dev
group.
File : /ultraseek/Xpa_win.zip
Uploaded by : miles_b_kehoe <mbk@...>
Description : Hello World basic Ultraseek XPA Sample
You can access this file at the URL:
http://groups.yahoo.com/group/search_dev/files/ultraseek/Xpa_win.zip
Regards,
miles_b_kehoe <mbk@...>
Never mind, I got it going with Cygwin. I will play around with it
now. I was wondering, are there any other good free/open source search
engines with a crawler and parsers like Nutch?
--- In search_dev@yahoogroups.com, "les_claypoo1" <thomasgkrier@...>
wrote:
>
> Well the place I work for is a big open source shop. So they are big
> into java and anything low cost/open source. Which Google doesn't
> fall under. Because of the size of the intranet, I thought open
> source could produce similar results as google. I don't intend on
> them replacing the google box, because I'm sure its results are
> better and quicker than most, but my job is to provide at least
> documentation on what open source has to offer, which is why I want
> to try and get nutch running. (I will look into Ultraseek also). So
> it is not a performance issue with the Google appliance; it is
> strictly cost based.
>
> The specific glitch I guess I am running into with the nutch set up
> to get it to crawl is trying to run any of the unix commands through
> cygwin. According to the instructions on the link listed below I
> should be able to type in bin/nutch and it will display documentation
> on Nutch, but I don't get that to happen. I might have my folders
> setup wrong.
> http://lucene.apache.org/nutch/tutorial8.html#Getting+Started
> Thanks for the help. Most appreciated!!!
> -Tom
>
>
>
> --- In search_dev@yahoogroups.com, "Mark" <mbennett@> wrote:
> >
> > We know one guy who got nutch going quickly, and he is not really a
> > programmer. I was impressed by what he got done in a short amount of
> > time. I'll mention this group to him, so maybe he can comment
> > further. Was there a specific glitch with the Windows Nutch setup?
> >
> > I was curious if you could talk more about, if you have the Google
> > box, why you might be looking at Nutch? Did the Google box not live
> > up to expectations? Lucene/Nutch are fine open source choices; if you
> > are looking at commercial, then that depends on requirements and
> > budget. Depending on the # of documents, you might consider Ultraseek.
> >
> > Mark
> >
> > --- In search_dev@yahoogroups.com, "les_claypoo1" <thomasgkrier@>
> > wrote:
> > >
> > > I am researching possibly replacing the Google appliance with
> > > nutch/lucene technology. Since it is on the intranet scale I would
> > > like to give it a test run. The problem is the documentation on how to
> > > get nutch working in the windows XP environment isn't that clear. I
> > > bought the book on lucene, and basically got the main concepts of
> > > lucene and nutch down, I just need help in getting them started and
> > > creating an index would be step one. Any help would be great.
> > > thanks!!!
> > >
> >
>
Well the place I work for is a big open source shop. So they are big
into java and anything low cost/open source. Which Google doesn't
fall under. Because of the size of the intranet, I thought open
source could produce similar results as google. I don't intend on
them replacing the google box, because I'm sure its results are
better and quicker than most, but my job is to provide at least
documentation on what open source has to offer, which is why I want
to try and get nutch running. (I will look into Ultraseek also). So
it is not a performance issue with the Google appliance; it is
strictly cost based.
The specific glitch I guess I am running into with the nutch set up
to get it to crawl is trying to run any of the unix commands through
cygwin. According to the instructions on the link listed below I
should be able to type in bin/nutch and it will display documentation
on Nutch, but I don't get that to happen. I might have my folders
setup wrong.
http://lucene.apache.org/nutch/tutorial8.html#Getting+Started
Thanks for the help. Most appreciated!!!
-Tom
--- In search_dev@yahoogroups.com, "Mark" <mbennett@...> wrote:
>
> We know one guy who got nutch going quickly, and he is not really a
> programmer. I was impressed by what he got done in a short amount of
> time. I'll mention this group to him, so maybe he can comment
> further. Was there a specific glitch with the Windows Nutch setup?
>
> I was curious if you could talk more about, if you have the Google
> box, why you might be looking at Nutch? Did the Google box not live
> up to expectations? Lucene/Nutch are fine open source choices; if you
> are looking at commercial, then that depends on requirements and
> budget. Depending on the # of documents, you might consider Ultraseek.
>
> Mark
>
> --- In search_dev@yahoogroups.com, "les_claypoo1" <thomasgkrier@>
> wrote:
> >
> > I am researching possibly replacing the Google appliance with
> > nutch/lucene technology. Since it is on the intranet scale I
would
> > like to give it a test run. The problem is the documentation on
how to
> > get nutch working in the windows XP environment isn't that
clear. I
> > bought the book on lucene, and basically got the main concepts of
> > lucene and nutch down, I just need help in getting them started
and
> > creating an index would be step one. Any help would be great.
> > thanks!!!
> >
>
We know one guy who got nutch going quickly, and he is not really a
programmer. I was impressed by what he got done in a short amount of
time. I'll mention this group to him, so maybe he can comment
further. Was there a specific glitch with the Windows Nutch setup?
I was curious whether you could say more about why, if you already
have the Google box, you are looking at Nutch. Did the Google box not
live up to expectations? Lucene/Nutch are fine open source choices; if
you are looking at commercial engines, the choice depends on
requirements and budget. Depending on the number of documents, you
might consider Ultraseek.
Mark
--- In search_dev@yahoogroups.com, "les_claypoo1" <thomasgkrier@...>
wrote:
>
> I am researching possibly replacing the Google appliance with
> nutch/lucene technology. Since it is on the intranet scale I would
> like to give it a test run. The problem is the documentation on how to
> get nutch working in the windows XP environment isn't that clear. I
> bought the book on lucene, and basically got the main concepts of
> lucene and nutch down, I just need help in getting them started and
> creating an index would be step one. Any help would be great.
> thanks!!!
>
Hi All,
When you index content that contains frames, what do your users want
to see from the results list when they click on a link?
In other words, do they want to see the entire entry as it would
appear in frames, or is it OK to just show the individual frame that
had the matching content? (which would be the default for most engines)
Curious as to how you folks have handled this in the past.
Mark
I am researching possibly replacing the Google appliance with
Nutch/Lucene technology. Since it is at intranet scale, I would
like to give it a test run. The problem is that the documentation on
how to get Nutch working in the Windows XP environment isn't that
clear. I bought the book on Lucene and basically got the main concepts
of Lucene and Nutch down; I just need help getting them started, and
creating an index would be step one. Any help would be great.
Thanks!!!
Just started to think about our upgrade plans. This would be a
discretionary project for us, so I expect we will not be on the
cutting edge. I see four possibilities:
1) Stay on K2 5.5 until support is cut.
2) Move to K2 v7 and stay with K2 through 8 and 9.
3) Move to IDOL.
4) Open the doors to any search engine and do some sort of shoot-out
with all available vendors.
I suspect that we will take the least-effort course, but who can tell?
Open question to the group:
What are other people using? Any good stories to tell about other
vendors? Any vendors to avoid?
Thanks,
Ed
"Mark Bennett"
<mbennett@...> Sent by: search_dev@yahoogroups.com
Sent: 04/07/2006 02:25 AM
Please respond to: search_dev@yahoogroups.com
To: <search_dev@yahoogroups.com>
cc:
Subject: RE: [search_dev] Re: Welcome new members! Which engines do you use?
Great Ed, K2 is a good product. I was curious, have you folks been keeping
up with the updates/patches for K2 5.5?
Have you guys looked at the IDOL stuff at all? (with Autonomy's
acquisition)
-----Original Message-----
From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
Behalf Of arentanji
Sent: Monday, April 03, 2006 2:24 PM
To: search_dev@yahoogroups.com
Subject: [search_dev] Re: Welcome new members! Which engines do you use?
Mark:
I'll start off and say that I use Verity K2 5.5. We built a ton of
applications on top of Search 97 and the transition was not easy.
Thanks,
Ed
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/search_dev/
<*> To unsubscribe from this group, send an email to:
search_dev-unsubscribe@yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
In the past I have had some bad experiences with K2 patches. More
than once, applying a patch has either broken something else or
reintroduced a bug fixed by a previous patch. I complained pretty
strongly about this for a while, and the process has gotten better,
but I still only "patch up" when I have to, e.g. when a patch fixes a
known bug that is impacting my installation.
I have been trying to educate myself about IDOL but still have a
long way to go in this process. I am anxious to get K2 v7.x in a
lab and start playing with it. From a technology standpoint, there
seem to be some advantages to moving to v7. The eRoom connector
(among others) might be a big deal to us. I haven't even started
considering the move from a business standpoint... Does anyone have
any information about the license costs for IDOL functions and other
Autonomy products such as AWE?
Jason
--- In search_dev@yahoogroups.com, "Mark Bennett" <mbennett@...>
wrote:
>
> Great Ed, K2 is a good product. I was curious, have you folks been keeping
> up with the updates/patches for K2 5.5?
>
> Have you guys looked at the IDOL stuff at all? (with Autonomy's
> acquisition)
>
> -----Original Message-----
> From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
> Behalf Of arentanji
> Sent: Monday, April 03, 2006 2:24 PM
> To: search_dev@yahoogroups.com
> Subject: [search_dev] Re: Welcome new members! Which engines do you use?
>
> Mark:
>
> I'll start off and say that I use Verity K2 5.5. We built a ton of
> applications on top of Search 97 and the transition was not easy.
>
> Thanks,
>
> Ed
>
Great Ed, K2 is a good product. I was curious, have you folks been keeping
up with the updates/patches for K2 5.5?
Have you guys looked at the IDOL stuff at all? (with Autonomy's
acquisition)
-----Original Message-----
From: search_dev@yahoogroups.com [mailto:search_dev@yahoogroups.com] On
Behalf Of arentanji
Sent: Monday, April 03, 2006 2:24 PM
To: search_dev@yahoogroups.com
Subject: [search_dev] Re: Welcome new members! Which engines do you use?
Mark:
I'll start off and say that I use Verity K2 5.5. We built a ton of
applications on top of Search 97 and the transition was not easy.
Thanks,
Ed
Ed raises an interesting point; Search 97 was pretty common out there
for a while, and was a pretty decent technology. I know some folks are
still using Search 97, based on a quick search of Google; anyone care to
admit it just among us friends?