archive-crawler
Messages 3139 - 3169 of 6142
3139
Well, that explains it. Thanks! :) Frank...
Frank McCown
mccownf
Aug 1, 2006
2:42 pm
3140
Hi, I have been running Heritrix for 3 days on a big pack of seeds (420,000). It ended normally but downloaded only 99 GB and only about 4 million...
goblin_cz
Aug 1, 2006
9:37 pm
3142
... Yes -- it means that the predicted false-positive rate inherent to a bloom filter won't go over 1 in 4 million (1 in 2^22) up through 125 million inserts. ...
Gordon Mohr
gojomo
Aug 1, 2006
11:46 pm
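
The figures in message 3142 can be checked against the textbook Bloom filter estimate p ≈ (1 - e^(-kn/m))^k. The sketch below is only that arithmetic, not Heritrix's own code: the class name `BloomEstimate`, the choice of k = 22 hash functions (picked so that 2^-k equals 1 in 2^22), and the bit-array sizing m = kn / ln 2 are assumptions made for illustration.

```java
/** Sketch: textbook Bloom filter estimates (hypothetical helper, not Heritrix code). */
final class BloomEstimate {

    /** Approximate false-positive probability after n inserts into m bits with k hash functions. */
    static double falsePositiveRate(long n, long m, int k) {
        double fillFraction = 1.0 - Math.exp(-(double) k * n / m);
        return Math.pow(fillFraction, k);
    }

    /** Bits needed so that k hash functions are optimal at n inserts: m = k * n / ln 2. */
    static long bitsFor(long n, int k) {
        return (long) Math.ceil(k * n / Math.log(2));
    }

    public static void main(String[] args) {
        long n = 125_000_000L; // insert count quoted in the message
        int k = 22;            // assumed hash-function count, so that 2^-k = 1 in 2^22
        long m = bitsFor(n, k);
        System.out.printf("bits=%d (~%.0f MiB)%n", m, m / 8.0 / (1 << 20));
        System.out.printf("rate at n inserts:  %.3e%n", falsePositiveRate(n, m, k));
        System.out.printf("rate at 2n inserts: %.3e%n", falsePositiveRate(2 * n, m, k));
    }
}
```

Under these assumptions the estimated rate stays at or below 1 in 2^22 up to the design load and climbs quickly once inserts exceed it, which is consistent with the "up through 125 million inserts" wording.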
3144
... Normally, this would mean that the hostname DNS lookup for those URLs failed. With no successful DNS lookup, the URL cannot be fetched. Are these same URLs...
Gordon Mohr
gojomo
Aug 2, 2006
12:24 am
3145
... rate ... Actually that might be good enough. My current idea is to have all 8 crawlers (total 8) download 1B pages in total. Assume ideal page distribution...
joehung302
Aug 2, 2006
12:51 am
3146
... Yes -- a URL is tested against (and inserted in) the already-included set just before it is queued/scheduled, not when it is downloaded. - Gordon @ IA...
Gordon Mohr
gojomo
Aug 2, 2006
1:45 am
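
The ordering described in message 3146 (test and insert at scheduling time, not at download time) can be sketched as below. This is a minimal illustration under stated assumptions: `MiniFrontier`, `schedule`, and `next` are hypothetical names, and a plain `HashSet` stands in for whatever already-included structure the crawler actually uses.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical mini-frontier; names are illustrative and not Heritrix's API. */
final class MiniFrontier {
    private final Set<String> alreadyIncluded = new HashSet<>();
    private final Queue<String> pending = new ArrayDeque<>();

    /** Tests and inserts in one step, just before queuing. Returns true if the URL was queued. */
    boolean schedule(String url) {
        if (!alreadyIncluded.add(url)) {
            return false; // already included, even if it has not been downloaded yet
        }
        pending.add(url);
        return true;
    }

    /** Next URL to fetch, or null when nothing is pending. No membership check happens here. */
    String next() {
        return pending.poll();
    }

    public static void main(String[] args) {
        MiniFrontier frontier = new MiniFrontier();
        System.out.println(frontier.schedule("http://example.com/")); // first sighting is queued
        System.out.println(frontier.schedule("http://example.com/")); // duplicate is rejected before any fetch
        System.out.println(frontier.next());
    }
}
```

One consequence of this design is that a URL counts as "included" from the moment it is scheduled, so a duplicate discovered while the first copy is still waiting in the queue is never queued twice.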
3147
Hi, I have an application that runs multiple instances of Heritrix in a single JVM. The application creates a new Heritrix instance to run each harvest and...
nicwaight
Aug 2, 2006
4:11 am
3148
... The latter I'd say. Do you have a suggested patch? St.Ack...
Michael Stack
stackarchiveorg
Aug 2, 2006
6:36 am
3149
In Heritrix 1.9, after a crawl job has been paused, the user can click "View or Edit Frontier URIs" and be taken to a screen where they can add, view, or...
Frank McCown
mccownf
Aug 2, 2006
5:35 pm
3150
... Seeds often have special treatment, for example by changing the crawl's scope -- so you might want to add URIs that are not treated specially. Note that if...
Gordon Mohr
gojomo
Aug 2, 2006
5:58 pm
3151
Heritrix is running on Solaris, but my browser is running on Windows where my file is located. A file upload (Browse) button would be useful in this...
Frank McCown
mccownf
Aug 2, 2006
6:08 pm
3152
... Yes, they are. ... settings ... I will explore that. ... begin ... would be ... I am using version 1.8. I add a SURT prefix (+http://(cz,) to my seeds.txt...
goblin_cz
Aug 3, 2006
8:21 am
3153
Sorry, I forgot... How many toe threads (max-toe-threads) can a broad crawl use with 4 GB RAM and a 900 MHz Pentium III? And what exactly is the meaning of seeds.ignored -...
goblin_cz
Aug 3, 2006
8:31 am
3154
In Heritrix 1.9, does Heritrix support crawling a website that has been crawled before?...
vretr
Aug 3, 2006
1:21 pm
3155
... Yes (If I understand you correctly). The second time it runs, it has no knowledge of the first run and will happily travel the same path as the previous...
Michael Stack
stackarchiveorg
Aug 3, 2006
4:23 pm
3156
... Hard to say. Start with default and work your way up. See how your throughput changes. Watch net and disk i/o and your CPU consumption. Try and balance...
Michael Stack
stackarchiveorg
Aug 3, 2006
8:51 pm
3158
Thanks for filing the bug Eric. Meantime, can you take a look at the below RFEs when you get a chance: ...
Michael Stack
stackarchiveorg
Aug 4, 2006
6:24 am
3159
Hi. I have run into a problem: when I crawl a designated web site, the crawl stops making progress whenever it reaches 99%. It takes 7 or 8 hours. ...
vretr
Aug 4, 2006
7:15 am
3160
Hi all, I am new to Heritrix. The manual says that Heritrix supports only ISO format. Has anybody worked on making Heritrix handle the UTF-8 charset?...
thiru_sundaram
Aug 4, 2006
11:09 pm
3161
Hi, I ran a broad crawl (with DecidingScope) and everything was OK. But today we had a power problem and had to reboot the server. I made...
goblin_cz
Aug 7, 2006
6:06 pm
3163
Does the parameter "total-bandwidth-usage-KB-sec" count against (1) downloaded data, or (2) saved data (written into ARC files)? The reason why I ask this question...
joehung302
Aug 7, 2006
11:56 pm
3164
Hi there, is there a way to obtain the UID (method getUID() of the CrawlJob instance) from a processor and from the scope? If yes, how can that be done? If not,...
tizo_trico
Aug 8, 2006
12:16 am
3166
I have been trying unsuccessfully to do a CVS checkout of Heritrix the past few weeks based on the instructions here: http://crawler.archive.org/cvs-usage.html...
Frank McCown
mccownf
Aug 8, 2006
2:08 pm
3167
Fixed. Thanks. St.Ack...
Michael Stack
stackarchiveorg
Aug 8, 2006
2:58 pm
3168
Not without acrobatics (If running single Heritrix instance: Heritrix.getSingleInstance().getJobHandler().getCurrentJob().getUID();). I'd be interested to...
Michael Stack
stackarchiveorg
Aug 8, 2006
7:16 pm
3169
Please supply the Heritrix version and describe how you set up the 'recovery'. Were you doing 'fast' checkpointing or letting the crawler manage the bdbje log files for...
Michael Stack
stackarchiveorg
Aug 8, 2006
7:39 pm