archive-crawler

Group Information

  • Members: 763
  • Category: Cyberculture
  • Founded: Dec 1, 2002
  • Language: English

Activity within 7 days:

1 New Member - 4 New Messages - New Questions

Description

Discussion group for the Heritrix open-source archival web crawler project.

Most Recent Messages

HashCrawlMapper & -63 failures Re: [archive-crawler] Re: Slow (?) lo
... Looking up the code in the FetchStatusCodes class, which collects such constants, -63 means a URI failed because its own prerequisite URI (such as a
Posted - Mon May 21, 2012 5:18 pm
Gordon Mohr
gojomo
Re: Slow (?) loading millions of seeds
Thanks Kris, But I have configured crawler-beans.cxml as the link as follows: <bean id="hashCrawlMapper" class="org.archive.crawler.processor.HashCrawlMapper">
Posted - Mon May 21, 2012 12:39 pm
Mahmoud A. Mubarak
mahmoudmubar...
Re: Slow (?) loading millions of seeds
Thanks Kris, But I have configured crawler-beans.cxml as the link as follows: <bean id="hashCrawlMapper" class="org.archive.crawler.processor.HashCrawlMapper">
Posted - Mon May 21, 2012 12:34 pm
Mahmoud A. Mubarak
mahmoudmubar...
Re: Crawl a limited number of documents/host
http://tech.groups.yahoo.com/group/archive-crawler/message/7672 Maybe this will help you? Regards, Tom
Posted - Mon May 21, 2012 9:31 am
Thomas Zeithaml
tomzeithaml
Crawl a limited number of documents/host
Hi, in Heritrix 1.5 it was possible to limit the number of crawled documents/host with the "HostnameQueueAssignmentPolicy" and the setting in
Posted - Sat May 19, 2012 12:47 pm
mirschi74

Message History

Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 113 31 14 37 20
2011 46 50 27 57 27 35 52 61 29 42 32 74
2010 90 66 39 58 49 46 31 77 54 31 46 113
2009 30 51 42 72 51 38 44 54 62 68 42 74
2008 72 80 60 72 90 89 39 56 64 63 29 33
2007 132 87 140 213 71 118 86 52 41 70 102 129
2006 126 113 46 54 70 104 140 86 152 119 78 64
2005 138 177 81 62 127 114 46 88 71 76 85 106
2004 56 3 20 62 135 63 168 204 130 72 97 82
2003 14 18 20 15 25 41 14 2 9 30 33
2002 1

Copyright © 2010 Yahoo! Inc. All rights reserved.