archive-crawler
Messages

Messages 2295 - 2324 of 6143
2295
The sourceforge-link is wrong. It should be: http://sourceforge.net/tracker/index.php?func=detail&aid=1106992&group_id=73833&atid=539102 regards ... Søren...
svc@...
svc400
Nov 1, 2005
3:36 pm
2296
... Exception is coming from here: http://crawler.archive.org/xref/org/archive/crawler/framework/CrawlController.html#651 Are you using bdbfrontier? What kind...
stack
stackarchiveorg
Nov 1, 2005
5:56 pm
2297
Sounds like a regression in Heritrix, Jay. Can we have the original page URL? It will help in making the fix (you can send it privately if you like). Thanks, St.Ack...
stack
stackarchiveorg
Nov 1, 2005
5:59 pm
2298
... Thanks for the correction Søren. St.Ack...
stack
stackarchiveorg
Nov 1, 2005
6:48 pm
2299
Will Heritrix do HTTP/1.1 requests? If not, are there any plans to make it do so? Best, -- Bjarne Andersen, IT developer, STATSBIBLIOTEKET, Universitetsparken, 8000 Århus C ...
Bjarne Andersen
bjarne_dk2000
Nov 2, 2005
11:25 am
2300
Hi! I can't get the org.archive.crawler.frontier.IPQueueAssignmentPolicy to work. It seems that host names are still used as queue names? The delay values...
Bjarne Andersen
bjarne_dk2000
Nov 2, 2005
11:58 am
2301
Not at the moment. There has been talk about making this an option, but no concrete plans last time I heard. I think people were unsure how you could best...
Kristinn Sigurdsson
kristsi25
Nov 2, 2005
12:37 pm
2302
I was mostly thinking of the option to zip the content and reduce bandwidth. Best, Bjarne Andersen ... -- Bjarne Andersen, IT developer, STATSBIBLIOTEKET ...
Bjarne Andersen
bjarne_dk2000
Nov 2, 2005
2:58 pm
2303
Hi all, I downloaded and set up the 1.5 version of Heritrix from svn, in the hope that its memory performance was significantly better than the older 1.3...
Karl Wright
daddywri
Nov 2, 2005
3:23 pm
2304
I don't think it is correct to extend the memory usage in a linear fashion like that. I'm currently running a crawl that has completed 5.3 million documents...
Kristinn Sigurdsson
kristsi25
Nov 2, 2005
3:42 pm
2305
Hi, I want to know how Heritrix stops ToeThreads from copying already-seen URIs into the FrontierDB. Is there any chance of duplicate URIs in the database? ...
callforshadab
Nov 3, 2005
7:30 am
2306
The only thing I did was to: * Log in to the web admin. * Create a new job based on the default template. * Settings / change the 'user agent' and 'from' fields. *...
Michael Hansen
flexhansen
Nov 3, 2005
7:33 am
2307
I'm not exactly sure what you are asking about, but I'll try to answer. The ToeThreads do not handle duplicate detection. This is done in the Frontier....
Kristinn Sigurdsson
kristsi25
Nov 3, 2005
8:09 am
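
The reply in message 2307 says that duplicate detection belongs to the Frontier rather than to the worker threads. A minimal sketch of that division of labour is below. It is not Heritrix code: `TinyFrontier`, `schedule`, and `next_uri` are hypothetical names, and the in-memory set stands in for whatever already-seen structure a real frontier uses.

```python
from collections import deque


class TinyFrontier:
    """Hypothetical frontier: the only place where duplicates are filtered."""

    def __init__(self):
        self._seen = set()        # every URI ever scheduled
        self._pending = deque()   # URIs waiting to be fetched

    def schedule(self, uri: str) -> bool:
        """Queue a URI unless it was scheduled before; report whether it was added."""
        if uri in self._seen:
            return False
        self._seen.add(uri)
        self._pending.append(uri)
        return True

    def next_uri(self):
        """Hand the next pending URI to a worker, or None if nothing is pending."""
        return self._pending.popleft() if self._pending else None
```

In this sketch a worker simply reports every link it extracts through `schedule`; it never needs to know which URIs have already been seen.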
2308
Hi, I am running the crawler with 50 threads, but many times I see "Active count thread : 0 of 50" in the console, and I found the crawler making no progress....
callforshadab
Nov 3, 2005
9:33 am
2309
Due to politeness rules, if you are crawling only a few hosts, the crawler will often be idling, waiting before it can fetch the next document. This is...
Kristinn Sigurdsson
kristsi25
Nov 3, 2005
9:48 am
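
The effect described in message 2309 can be estimated with a rough back-of-the-envelope model. The sketch below assumes, purely for illustration, at most one in-flight request per host and a fixed wait after each fetch; the function name and parameters are hypothetical and are not Heritrix settings.

```python
def busy_fraction(num_hosts: int, num_threads: int,
                  fetch_secs: float, delay_secs: float) -> float:
    """Upper bound on the share of threads that can be fetching at any moment.

    Assumes one in-flight request per host and a fixed politeness wait of
    delay_secs after every fetch that takes fetch_secs.
    """
    # Fraction of time a single host can be actively fetched from.
    per_host_duty = fetch_secs / (fetch_secs + delay_secs)
    # Hosts are the scarce resource until there are enough to occupy every thread.
    return min(1.0, num_hosts * per_host_duty / num_threads)
```

Under these assumptions, 2 hosts with 1-second fetches and a 4-second wait can keep well under 1% of 50 threads busy, which is consistent with a console that frequently shows no active threads.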
2310
Hi! I'm going to do further work on the DominnameQueueAssignmentPolicy that Bjarne posted earlier, which splits the host name down to the last two parts (need...
Lars Clausen
lrclause
Nov 3, 2005
9:58 am
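
Message 2310 mentions a queue assignment policy that reduces the host name to its last two parts. The idea can be sketched as follows; `domain_queue_key` is a hypothetical helper, not the posted policy class, and the queue key format is an assumption.

```python
def domain_queue_key(host: str, parts: int = 2) -> str:
    """Return the last `parts` dot-separated labels of a host name."""
    # Normalise case and drop a trailing dot or empty labels before splitting.
    labels = [label for label in host.lower().rstrip(".").split(".") if label]
    return ".".join(labels[-parts:])
```

With this key, `www.example.com` and `images.example.com` land in the same queue. A plain two-label split also maps every host under a suffix such as `co.uk` to one key, and it mangles numeric IP addresses, so a real policy would need extra handling for those cases.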
2311
Hi all, I'm trying to use artifact build heritrix-1.5.1-200511011835.tar.gz on a Solaris 10 (AMD64) machine using Java 1.5.0. On startup it never ...
Tom Emerson
tree02139
Nov 3, 2005
1:34 pm
2312
... OK. You did nothing out of the ordinary. ... That's what I'd look at next, only I have no WinXP box on my end. Would be great if you could figure out what...
stack
stackarchiveorg
Nov 3, 2005
4:56 pm
2313
Hey Tom: Is HERITRIX_HOME set? Otherwise, missing from your stdout/stderr output are the usual: 23:20:43.884 EVENT Starting Jetty/4.2.23 23:20:44.956 EVENT...
stack
stackarchiveorg
Nov 3, 2005
5:36 pm
2314
... Queue names can be arbitrary Strings -- the exact format depends on the QueueAssignmentPolicy in use. Other parts of the code are not looking into the...
Gordon Mohr
gojomo
Nov 3, 2005
7:38 pm
2315
Hello, going through the code I get the feeling that the URLs in pendingUrisDB (present in BdbMultipleWorkQueues) have been organized from the...
Vishwesh Thakur
vishwesh_thakur
Nov 3, 2005
11:48 pm
2316
Hi St.Ack / Michael Hansen, as far as my limited experience with WinXP and Heritrix 1.4.0 goes, it doesn't take default profiles from the jar file. This works fine...
Subramanya C R
subramanyacr
Nov 4, 2005
4:45 am
2317
... Yeah. I'd guess the problem is here: http://crawler.archive.org/xref/org/archive/crawler/admin/CrawlJobHandler.html#335. We're using File.separator when...
stack@...
stackarchiveorg
Nov 4, 2005
8:13 am
2318
Trying to build CVS head with Maven 1.0.2 on JDK 1.5 under Solaris 10 is giving me fits. I'll assume that others can build with Maven 1.0.2 on JDK 1.5 on Linux...
Tom Emerson
tree02139
Nov 6, 2005
6:54 pm
2319
Dear all, I have been running Heritrix (1.4.0) for about 10 days, with about 10,000 seeds, broad scope, Tom Emerson's "HTML only" filters, 150 threads, and ...
Marco Baroni
kumaraja2000
Nov 7, 2005
8:49 am
2320
... In my experience the crawl is pretty much dead at that point. I have yet to succeed in doing anything beyond shutting it down. I'm re-running a crawl that...
Tom Emerson
tree02139
Nov 7, 2005
3:41 pm
2321
Folks, I am having trouble crawling where moderately large numbers of seeds are involved. Some of the seeds are accepted, but most defer for reasons I cannot...
Karl Wright
daddywri
Nov 7, 2005
4:02 pm
2322
FYI: every once in a while in my current crawl I get alerts like: Time: Nov. 7, 2005 15:09:40 GMT Level: SEVERE Message: Failed get of replay char...
Tom Emerson
tree02139
Nov 7, 2005
4:41 pm
2323
... The version complaint usually happens when you mix classfiles made with different versions of the jdk (1.4 vs. 1.5). What happens if you do a 'maven...
stack
stackarchiveorg
Nov 7, 2005
6:12 pm
2324
Thanks Tom. This one has been around for a while (See http://sourceforge.net/tracker/index.php?func=detail&aid=1218961&group_id=73833&atid=539099. Kris also...
stack
stackarchiveorg
Nov 7, 2005
6:22 pm
Copyright © 2009 Yahoo! Inc. All rights reserved.