archive-crawler

Group Information

  • Members: 581
  • Category: Cyberculture
  • Founded: Dec 1, 2002
  • Language: English

Activity within 7 days:

2 New Members - 10 New Messages - New Questions

Description

Discussion group for the Heritrix open-source archival web crawler project.

Most Recent Messages

Re: poor performance
Hi, we have found what the problem with speed was: we had set the mirrorwriter processor, which caused a lot of disk operations, which slowed the disk down and used a lot
Posted - Sat Jul 4, 2009 7:23 am
nukleonrus
homepage content extraction
Hi, Is there any way of extracting contents of homepage given a URI under that domain/host at runtime in heritrix? ex: URI is,
Posted - Thu Jul 2, 2009 8:35 am
ramab1988
Re: how should the field "Content-Length" be calculated?
cross-posted from [Archive-access-discuss] On Tue, 30 Jun 2009 13:47:51 -0400 Zhenzhen Xue <zjuzhenzhen@...> ... interesting. is this really necessary?
Posted - Wed Jul 1, 2009 6:01 pm
steve@...
stearcorg
Reloading processor classes in Heritrix 2.0.2 without restarting the
Hi, I'm using Heritrix 2.0.2 with a few custom processor classes which I use to extract relevant data from the HTML (by replacing ARCWriterProcessor with one
Posted - Wed Jul 1, 2009 7:35 am
Enrico Detoma
enrico.detoma@...
Re: Crawl stops after some time
Hi Gordon, An instance pretty much stopped (like 0 active toe threads of 50). But it continued after some time. Regards Abin Varghese ... From: Gordon Mohr
Posted - Wed Jul 1, 2009 2:07 am
Ebin
mail2abin

Message History

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2009 30 51 42 72 51 38 4
2008 72 80 60 72 90 89 39 56 64 63 29 33
2007 132 87 140 213 71 118 86 52 41 70 102 129
2006 126 113 46 54 70 104 140 86 152 119 78 64
2005 138 177 81 62 127 114 46 88 71 76 85 106
2004 56 3 20 62 135 63 168 204 130 72 97 82
2003 14 18 20 15 25 41 14 2 9 30 33
2002 1