archive-crawler
Heritrix processor for use with rainbow
Message #1905 of 6151
Dear all,

We have made available a Heritrix processor that interfaces with
rainbow, the most widely known, and perhaps most widely used, text
classification system of the last decade.

If you include this processor in your Heritrix crawls, you can either
focus crawls on a particular topic by training rainbow to recognize
that topic, or weed out unwanted pages by training rainbow to
spot those pages.
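
To make the two modes concrete, here is a minimal sketch of the keep/drop
decision only. It does not use Heritrix or rainbow: the keyword scorer is a
toy stand-in for a trained classifier, and the names keyword_classifier,
keep_page, mode, and threshold are hypothetical, chosen just for
illustration.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for a trained text classifier: maps page text to a
# score in [0, 1] for the trained class. This is NOT rainbow's interface.
Classifier = Callable[[str], float]


def keyword_classifier(keywords: Iterable[str]) -> Classifier:
    """Toy scorer: fraction of words in the page that are class keywords."""
    vocab = {k.lower() for k in keywords}

    def score(text: str) -> float:
        words = text.lower().split()
        if not words:
            return 0.0
        return sum(w in vocab for w in words) / len(words)

    return score


def keep_page(text: str, classify: Classifier, mode: str,
              threshold: float = 0.2) -> bool:
    """Decide whether a crawl should keep a page.

    mode="focus":  the classifier was trained on the wanted topic,
                   so pages that match are kept.
    mode="filter": the classifier was trained on unwanted pages,
                   so pages that match are dropped.
    The threshold value is an assumption made for this sketch.
    """
    matches = classify(text) >= threshold
    if mode == "focus":
        return matches
    if mode == "filter":
        return not matches
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    on_topic = keyword_classifier({"crawler", "archive"})
    print(keep_page("web archive crawler notes", on_topic, "focus"))
```

The same classifier output drives both modes; only the interpretation of a
match is flipped, which is why a single processor can serve either purpose.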

For more information, download the software (113K) from
http://www.metacombine.org/software. Grab the file named
"metacombine_focusedCrawl_module1.0.tar.gz". There is a
README with brief install instructions, and a .pdf with more complete
documentation. Feedback is welcome.

Saurabh Pathak, Emory University <spatha2@...>
Donna Bergmark, Cornell University <bergmark@...>

Fri Jun 3, 2005 12:46 am

bergmark_d

... Thank you both for the excellent contribution. The doc. is really great too: i.e. Overview.pdf (I like the suggestion of the classifier being used to...
stack
stackarchiveorg
Jun 3, 2005
6:13 pm