archive-crawler

Messages

Messages 2219 - 2248 of 6147
2219
Hi! We are using the budget facilities in Heritrix with great success. However, it seems that DNS requests are counted against the same queues as the URIs...
Bjarne Andersen
bjarne_dk2000
Oct 3, 2005
11:09 am
2220
Yes ... It could be reasonable for DNS URIs to be 'free' with regard to the queue-budgeting. The reason they aren't now is that the budgeting process was ...
Gordon Mohr
gojomo
Oct 4, 2005
10:35 pm
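The idea raised in message 2220, that DNS lookups could be "free" with regard to queue budgeting, can be sketched as follows. This is a hypothetical illustration, not Heritrix's frontier code: the class `QueueBudget`, its `tryCharge` method, and the rule that any URI with the `dns:` scheme is exempt are all assumptions made for the example.

```java
// Hypothetical per-queue budget; NOT Heritrix's actual frontier API.
class QueueBudget {
    private final long totalBudget; // max number of charged URIs (assumption)
    private long spent = 0;

    QueueBudget(long totalBudget) {
        this.totalBudget = totalBudget;
    }

    // Assumed rule: DNS lookups ("dns:" scheme, any case) are never charged.
    static boolean isFree(String uri) {
        return uri.regionMatches(true, 0, "dns:", 0, 4);
    }

    // Returns true if the URI may be scheduled. Only non-free URIs
    // consume budget; once the budget is spent, they are refused.
    boolean tryCharge(String uri) {
        if (isFree(uri)) {
            return true;
        }
        if (spent >= totalBudget) {
            return false;
        }
        spent++;
        return true;
    }

    long spent() {
        return spent;
    }
}
```

With a budget of 2, a queue can still resolve its hostname after both page fetches have used up the budget, which is the behavior the thread suggests.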
2221
Using: heritrix-1.5.1-200509291859 Heritrix tried to crawl this URL: http://www.anns-personalized-books.com/jiff/my_zoo_rhymes/my-zoo-rhymes-page.htm And then...
ryanatl
Oct 5, 2005
3:37 pm
2222
This is by design; these values (and options) are often fetchable URIs that browsers visit, either because of Javascript code that triggers on form actions or...
Gordon Mohr
gojomo
Oct 5, 2005
5:58 pm
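Message 2222 explains that form values and options are treated as candidate URIs because browsers often end up visiting them. A minimal sketch of that speculative step follows, assuming a simple "looks like a URL or path" heuristic and `java.net.URI.resolve` for relative values. The class name and the heuristic are illustrative assumptions, not Heritrix's extractor logic.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical speculative extractor for <option> values (not Heritrix code).
class OptionValueExtractor {
    // Assumed heuristic: absolute http(s) URLs, values containing '/',
    // or bare names ending in a common page extension; no whitespace.
    private static final Pattern LIKELY_URI = Pattern.compile(
            "(?i)(https?://\\S+|[^\\s]*/[^\\s]*|[^\\s/]+\\.(html?|php|aspx?|jsp))");

    static List<String> candidates(String baseUri, List<String> optionValues) {
        URI base = URI.create(baseUri);
        List<String> result = new ArrayList<>();
        for (String value : optionValues) {
            String v = value.trim();
            if (!LIKELY_URI.matcher(v).matches()) {
                continue; // values like "blue" or "42" are not treated as links
            }
            try {
                // Resolve relative values against the page's own URI.
                result.add(base.resolve(v).toString());
            } catch (IllegalArgumentException e) {
                // Not a syntactically valid URI reference; skip it.
            }
        }
        return result;
    }
}
```

A looser heuristic finds more real links but also produces more junk fetches; that trade-off is what makes such URIs look odd in crawl logs.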
2223
Hi, We are using a version of Heritrix which was taken from the CVS HEAD back in May. It works reasonably well except for one major problem - which is that it...
Karl Wright
daddywri
Oct 7, 2005
9:27 am
2224
... Many improvements have been made since May (all the main in-memory structures have been made disk-based, using bdbje). Can you update your instance? A while back we...
stack
stackarchiveorg
Oct 7, 2005
5:06 pm
2225
Hi, does anyone know why I get the following: bash: ./bin/arcreader: Permission denied It happens when I try to read the ARC files using the ARC reader. ...
ashwind18
Oct 8, 2005
4:50 pm
2226
Hi, did you 'chmod a+rx arcreader' in the bin directory? I can't remember if the docs mention doing this when you chmod heritrix. Cheers, Mark...
markw
m_j_williamson
Oct 8, 2005
5:30 pm
2227
Hi, I am fairly new to the Heritrix crawler and I am currently using it for a project. What I noticed when crawling my site was that the crawler seems to ...
ashwind18
Oct 8, 2005
6:34 pm
2228
The crawler is set up to be very polite, so compared to something like httrack it will seem slow. You need to make sure your scope is set up properly so that the...
markw
m_j_williamson
Oct 8, 2005
6:42 pm
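Message 2228 attributes the apparent slowness to politeness. Heritrix's frontier has politeness settings of this kind (a delay factor plus minimum and maximum delays): the crawler waits a multiple of the last fetch's duration before hitting the same host again, clamped to a range. The sketch below assumes that model; the class, method name, and example numbers are illustrative, not Heritrix defaults.

```java
// Sketch of a politeness delay: wait delayFactor times the last fetch
// duration, clamped to [minDelayMs, maxDelayMs]. Names are assumptions.
class Politeness {
    static long delayMs(long fetchDurationMs, double delayFactor,
                        long minDelayMs, long maxDelayMs) {
        long delay = (long) (fetchDurationMs * delayFactor);
        if (delay < minDelayMs) {
            delay = minDelayMs; // never hit the host faster than this
        }
        if (delay > maxDelayMs) {
            delay = maxDelayMs; // never stall a slow host forever
        }
        return delay;
    }
}
```

For example, `delayMs(400, 5.0, 3000, 30000)` returns 3000: even a fast 400 ms response is followed by a 3 s pause, so a single-site crawl makes fewer than 20 requests per minute.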
2229
Hi: I had a problem like this with a more recent version of Heritrix, and it turned out that the problem was that at the end of a single site crawl, there was...
Gordon Paynter
Gordon.Paynter@...
Oct 9, 2005
9:04 pm
2230
Hi, can I exclude a certain file type from being crawled? For example, video files like .wmv files....
ashwind18
Oct 14, 2005
3:14 am
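One way to approach message 2230 is a regular-expression test on the URI path. Heritrix crawl configurations typically do this with a regex-based URI filter in the crawl order; the stand-alone class and the extension list below are hypothetical and only illustrate the matching logic, not that configuration.

```java
import java.util.regex.Pattern;

// Hypothetical file-type exclusion check (not Heritrix's own filter class).
class ExtensionExcluder {
    // Case-insensitive match on the path's final extension.
    // The list of video extensions is an assumption for the example.
    private static final Pattern EXCLUDED =
            Pattern.compile("(?i).*\\.(wmv|asf|avi|mpe?g|mov)$");

    static boolean isExcluded(String uri) {
        // Ignore any query string or fragment before testing the extension.
        String path = uri;
        int cut = indexOfAny(path, '?', '#');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        return EXCLUDED.matcher(path).matches();
    }

    private static int indexOfAny(String s, char a, char b) {
        int i = s.indexOf(a);
        int j = s.indexOf(b);
        if (i < 0) return j;
        if (j < 0) return i;
        return Math.min(i, j);
    }
}
```

Matching only the final extension avoids false positives such as a directory named `wmv/`.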
2231
I have been getting this message in the seeds report while the job is running. For example, I am crawling only 3 seeds, and one of the seeds has this message and never...
Jay
bighead007us
Oct 16, 2005
6:49 pm
2232
Which class does the URL encoding in Heritrix? It looks like URLs such as http://www.bs.dk/showfile.aspx?IdGuid={B0A} get encoded to: ...
Bjarne Andersen
bjarne_dk2000
Oct 17, 2005
7:57 am
2233
... org.archive.net.UURI. Study its superclasses LaxURI and commons-httpclient URI. Also see the UURIFactory#fixup code. ... Sounds like something we should...
stack
stackarchiveorg
Oct 17, 2005
4:57 pm
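For the brace question in messages 2232 and 2233: RFC 2396 lists `{` and `}` among the "unwise" characters that should not appear unescaped in a URI, so a normalizer will typically percent-encode them as `%7B` and `%7D`. The sketch below only illustrates that kind of escaping; it is not the UURI/UURIFactory code, and the exact character set it escapes is an assumption. Whether a given server treats the escaped form as equivalent is a separate question.

```java
// Hypothetical escaper (not Heritrix's UURI logic). Percent-encodes space
// plus a subset of RFC 2396 "unwise" characters: { } | \ ^ `
// Square brackets are left alone here because they can appear in IPv6 hosts.
class BraceEscaper {
    private static final String UNWISE = "{}|\\^` ";

    static String escape(String uri) {
        StringBuilder out = new StringBuilder(uri.length());
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (UNWISE.indexOf(c) >= 0) {
                // All characters in the set are ASCII, so one %XX byte suffices.
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                // '%' is not in the set, so existing %XX escapes are kept as-is.
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Applied to the URL from message 2232, this yields `http://www.bs.dk/showfile.aspx?IdGuid=%7BB0A%7D`.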
2234
Can we see the problematic seed? Thanks, St.Ack...
stack
stackarchiveorg
Oct 17, 2005
5:08 pm
2235
Hi, the problematic seed keeps changing. It doesn't have a redirect or anything. For example, in one job I have 3 seeds and the problematic seed can be any of them,...
Jay
bighead007us
Oct 17, 2005
5:26 pm
2236
Hi, Jay. Some ideas: (1) Update your code. CVS HEAD often has problems, even occasionally fatal problems, but they are also regularly fixed. Your version from ...
Gordon Mohr
gojomo
Oct 17, 2005
6:35 pm
2237
Hello Gordon, thanks for your suggestion. I have just tried everything except updating the code, which I will try momentarily. My comments following...
Jay
bighead007us
Oct 17, 2005
8:05 pm
2238
Hi folks, while checking out CVS HEAD and rebuilding Heritrix, I ran into this question. Are you planning to keep compatibility with Java 1.4.2 and...
Jay
bighead007us
Oct 18, 2005
12:06 am
2239
Thanks for the pointers. The URL I gave was not complete; this one is: http://www.bs.dk/content.aspx?itemguid={31637766-92B4-4ACA-9A0D-5CFF042B151E} URLs...
Bjarne Andersen
bjarne_dk2000
Oct 18, 2005
6:52 am
2240
Dear all. The current Heritrix 1.5.1 breaks one of our unit tests, which tests the validity of an ARC file: ARCReader ar = ARCReaderFactory.get(anArc): ...
svc@...
svc400
Oct 18, 2005
2:18 pm
2241
Hey Søren: Apologies for breaking your test. Below is an excerpt from the commit message that removed isValid. revision 1.48 date: 2005/07/16 01:47:59; author:...
stack@...
stackarchiveorg
Oct 18, 2005
4:47 pm
2242
... Ok. The example helps. Looks like straight-forward fix. Meantime I've made an issue: ...
stack@...
stackarchiveorg
Oct 18, 2005
5:46 pm
2243
We intend to keep compatible with 1.4.2 for now -- that commit slipped in inadvertently, and will be fixed. In advance of some future official release, we'll...
Gordon Mohr
gojomo
Oct 18, 2005
5:52 pm
2244
... Mac OS X. -- Tom Emerson Basis Technology Corp. Software Architect...
Tom Emerson
tree02139
Oct 18, 2005
5:57 pm
2245
Thanks for keeping 1.4.2 support. There is one more use of PrintWriter in frontier.jsp, just FYI. - Jay...
Jay
bighead007us
Oct 18, 2005
6:34 pm
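Messages 2243 and 2245 concern staying compatible with Java 1.4.2. One common way newer code breaks on 1.4 is through the convenience constructors `PrintWriter(String fileName)` and `PrintWriter(File)`, which first appeared in Java 5. The snippet does not say which PrintWriter use in frontier.jsp is affected, so the following is only an assumed example of a 1.4-compatible equivalent (no generics, varargs, or `printf`), wrapping a `FileWriter` explicitly. The class and method names are hypothetical.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helper written in Java 1.4-compatible style.
class Java14Writer {
    // Java 5+ only:      new PrintWriter("report.txt")
    // 1.4-compatible:    new PrintWriter(new FileWriter("report.txt"))
    static void writeLines(String fileName, String[] lines) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(fileName));
        try {
            for (int i = 0; i < lines.length; i++) {
                out.println(lines[i]);
            }
            // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
            if (out.checkError()) {
                throw new IOException("write failed: " + fileName);
            }
        } finally {
            out.close();
        }
    }
}
```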
2246
Hi Gordon, I got HEAD from CVS and am doing a test crawl; so far I haven't seen the problem. But the Delete function from "View or Edit Frontier URIs" is...
Jay
bighead007us
Oct 18, 2005
6:39 pm
2247
I am trying to drop Heritrix into Tomcat 5.0.28 to test it out and so I can run a remote debug on it to see exactly what is going on as the JSP interacts with...
bmadaras9
Oct 18, 2005
9:21 pm
2248
Spam detection software, running on the system "ia00524.archive.org", has identified this incoming email as possible spam. The original message has been...
stack@...
stackarchiveorg
Oct 18, 2005
10:52 pm

Copyright © 2009 Yahoo! Inc. All rights reserved.