Hi, We are looking at enterprise search tools for our organization. We do not have an enterprise search now. I have a question about searching structured and...
... Jeff: Most search technologies can handle both structured and unstructured content pretty well; some even support what you'd think of as a "JOIN"...
I missed the original email. Drupal & Solr are good friends, actually: http://drupal.org/project/apachesolr Otis -- Sematext -- http://sematext.com/ -- Lucene...
... It's really a matter of two things: 1. how to get the data out of your structured repository, preserving any metadata 2. how to index, including indexing...
SearchDev has been getting lots and lots of traffic lately, thanks to all the great questions and answers that get posted. The volunteer moderators are now...
Has anyone been using the Business Console brought over from K2 to the IDOL product? We'd like to talk with anyone who has, whether you decided to use it or...
Hi, Has anyone had any luck indexing Office 2007 files (docx, pptx, etc.) using Ultraseek v5.8? I asked customer support about this and apparently Office 2007...
Ravi fyi What version of Ultraseek 5.8 are you using? Ultraseek sent me a later version of Keyview than was downloaded in the original 5.8.7 upgrade bundle...
Does anyone have an idea of the API calls the Autonomy Notes connector uses while indexing? We are having an issue with the Notes server dropping the connection during the indexing...
Hi, Can someone please tell me which vendors of non-open-source products specialise in indexing/searching database content (MySQL)? Of course most...
Hi, Sematext has this: http://sematext.com/product-db-indexer.html (MySQL mention in the FAQ at the bottom) There is DataImportHandler for Solr, also free. ...
The history of eBay search (I'm an alumnus 2002-2005) as I remember it, at least since 1999 or so. 1) Thunderstone (1999(?) - 2003), run on the same machines...
Not sure what you mean by specializing. Endeca does a beautiful job of indexing MySQL as well as Oracle, Sybase, etc. What are you looking for? On Feb 22, 2009,...
In preparing documents for indexing, I'd like to get any suggestions for language detection (natural language, not programming language) tools. Ideally, I'd...
... Here's a useful page: http://code.activestate.com/recipes/326576/ (lots of good links at the bottom as well). We haven't used this ourselves, but the...
... Excellent article, Mark. We've used the IFilters approach in our Flax system, which does at least work on Windows systems (most of the time, and especially...
Detecting primary language is fairly straightforward, detecting secondary languages is a bit more challenging. All of the systems I know about use n-grams....
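To make the n-gram idea above concrete, here is a minimal sketch of primary-language detection using character trigram counts compared by cosine similarity. This is an illustration only, not the method of any product mentioned in this thread: the function names, the scoring choice, and the tiny `SAMPLES` training texts are all assumptions, and a real system would build its profiles from a large corpus per language.

```python
from collections import Counter
from math import sqrt


def trigram_counts(text):
    """Count character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "  # pad so word boundaries form trigrams too
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def detect_language(text, profiles):
    """Return the key of the profile most similar to the text."""
    doc = trigram_counts(text)
    return max(profiles, key=lambda lang: cosine(doc, profiles[lang]))


# Hypothetical toy training data; far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
}
PROFILES = {lang: trigram_counts(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    for query in ("the dog and the cat", "der hund und die katze"):
        print(query, "->", detect_language(query, PROFILES))
```

A sketch like this picks only the single best-matching language; detecting secondary languages in a mixed document would need something more, such as scoring the text in windows, which is not shown here.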
Basis Technology licenses a product (called the Rosette Language Identifier) that uses the n-gram approach mentioned earlier. It identifies 55 different...
Hi, The filter stuff is not really about LangID. In Java/Apache/Lucene world we have an equivalent that works well (and similar to how Charlie described...
... Hi Miles, We are currently using the ABC and the ACC. We are using the ABC for promotions - AKA Quicklinks (Ultraseek) and ACC for our categories. Its...
Does anyone have a proven project or program for improving relevance in Japanese language queries in IDOL? Or a set of resources to point me at so I can get my...
Hi Ed, I don't have answers specifically for Japanese, though we have dealt a bit with CJK in the past. We do have an article on Autonomy relevance in general:...
Mark: Thanks for the quick reply. These are some interesting and valuable articles. We are applying some of these techniques now with our English queries. My...
In my experience with Japanese search in Ultraseek, the customer needs to add terms to the user stemming dictionary. Terms like place names and product names...
I can offer a few general Japanese / Chinese comments, though not specific to your engine: 0: You might try to quantify if they are having trouble with...
Hi Ed, Just out of curiosity, is your content a mix of English and Japanese or just Japanese, and what is the charset in the metadata fields? We have a problem with...
Kalyan: We have a mixture of English and other language content. Japanese is one of our top 10 content languages. English accounts for 90% of our content in...