archive-crawler
Messages

Messages 1136 - 1165 of 6142
1136
I've been able to build and run the crawler. However, now that I've made some changes to the code, how do I build my new version? The 'maven dist' doesn't...
adamb90
Nov 1, 2004
8:28 pm
1137
... Please paste in the error you're seeing Adam. 'maven dist' or 'maven jar' should just work including any code changes you've made to the base src into the...
stack
stackarchiveorg
Nov 1, 2004
8:39 pm
1138
thanks for the quick reply. i made a simple change to the ARCReader class and then did 'maven dist:build' which seems to take forever. here's the output....
adamb90
Nov 2, 2004
6:54 pm
1139
... The 'dist' target does a bunch of packaging (builds two webapps, generates documentation, builds src and bin packages). It's probably not what you want. ...
stack
stackarchiveorg
Nov 3, 2004
2:01 am
1140
I'm trying to run the precompiled heritrix-1.0.4.zip on (gasp!) Windows XP and am running into problems right from the get-go. ... set HERITRIX_HOME=C:\heritrix ...
db65487899
Nov 3, 2004
3:44 pm
1141
pls try this script ... @rem ************************************************************************* @rem This script is used to start Heritrix. @rem @rem...
ansi
mymaillist@...
Nov 3, 2004
4:47 pm
1142
I have a couple of large crawls that I want to start, but will hold off for 1.2 to be labeled before doing them if we're close. How close is HEAD to what will...
Tom Emerson
tree02139
Nov 5, 2004
3:27 pm
1143
... It's looking like Monday or Tuesday. We've run our base test plan and all seems fine and dandy but a shadow 1.2 crawl of a 1.0.5 crawl -- the HEAD of the...
stack
stackarchiveorg
Nov 6, 2004
5:53 am
1144
It seems to me that the filedesc:// URL-record in the generated ARC-files has an error. There are 2 newlines after the content, which causes the length of the ...
Bjarne Andersen
bjarne_dk2000
Nov 8, 2004
12:17 pm
1145
In ARCWriter#generateARCFileMetaData it does this after writing the metadata: // Write out a couple of LINE_SEPARATORs to end this record. metabaos.write(("" +...
stack
stackarchiveorg
Nov 8, 2004
7:33 pm
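A minimal sketch of the kind of length check the two messages above are discussing, written in Python rather than the crawler's Java. It assumes the ARC v1 convention that the last space-separated field of a record's header line is the declared content length. The function name `check_record_length` and the sample record are made up for illustration. Since the snippet is cut off, the sketch does not assume which way the mismatch goes and simply reports both comparisons.

```python
def check_record_length(header_line: bytes, body: bytes) -> dict:
    """Compare a record's declared length with the bytes that follow its header.

    header_line: the record's first line (assumed: length is the last field).
    body: every byte written after the header line, up to the next record.
    """
    declared = int(header_line.split()[-1])
    stripped = body.rstrip(b"\n")
    return {
        "declared": declared,
        "actual": len(body),
        "trailing_newlines": len(body) - len(stripped),
        "matches_with_trailing": declared == len(body),
        "matches_without_trailing": declared == len(stripped),
    }


if __name__ == "__main__":
    # Hypothetical record: 10 content bytes followed by two newlines.
    header = b"filedesc://example.arc 0.0.0.0 20041108121700 text/plain 10"
    print(check_record_length(header, b"0123456789\n\n"))
```

Running a check like this over the first record of a generated file would show whether the declared length counts the trailing separators or not.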
1146
I've read message 841 and the article at http://www.dreamersrealm.net/tree/blog/2004/08/19. A hybrid method is proposed there to limit crawls to HTML. I...
bjhong02
Nov 9, 2004
2:42 am
1147
... The whole saga can be found at http://www.dreamersrealm.net/tree/blog/heritrix/ which has some further notes not included in the 19 August post. ... ...
Tom Emerson
tree02139
Nov 9, 2004
4:41 am
1148
It's looking like the release won't happen till Friday at the earliest. We're going to let some comparison test crawls that we have running here go to completion so...
stack
stackarchiveorg
Nov 10, 2004
12:34 am
1149
I also notice in the FAQ at the homepage of Heritrix, the answer for the common problem 5, "..., or, if you want to instead look at document mimetypes, you can...
bjhong02
Nov 10, 2004
1:24 am
1150
... Here's a note on the midfetch filter from the user manual: "Its also possible to add in filters that are checked after the download of the HTTP response headers...
stack
stackarchiveorg
Nov 10, 2004
1:39 am
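To illustrate the idea in the quoted note (a check that runs once the response headers are in, before the body is downloaded), here is a small Python sketch of the decision step only. It is not Heritrix's filter API: `wants_body` and the `allowed` tuple are hypothetical names, and accepting only `text/html` is an assumption made for the HTML-only case raised in message 1146.

```python
from email.message import Message


def wants_body(raw_content_type, allowed=("text/html",)):
    """Decide from the Content-Type header alone whether to keep downloading.

    raw_content_type: the header value as received, or None if absent.
    Returns True if the media type (parameters such as charset ignored)
    is in `allowed`. A missing or malformed header falls back to
    text/plain, which is how email.message treats it.
    """
    msg = Message()
    msg["Content-Type"] = raw_content_type or ""
    return msg.get_content_type() in allowed


if __name__ == "__main__":
    for value in ("text/html; charset=UTF-8", "image/png", None):
        print(value, "->", wants_body(value))
```

A fetcher built around this would read the status line and headers, call the check, and close the connection without reading the body when it returns False.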
1151
where can i find the *midfetch-filters* filter? i'm using version 1.0.0. should i download a new version? ... possible to ... response ... filters to ... (Aborted ...
bjhong02
Nov 10, 2004
4:59 am
1152
... Pardon me. I should have said this feature is only present in HEAD. See here for the latest bundles: ...
stack
stackarchiveorg
Nov 10, 2004
4:26 pm
1153
hello - new to heritrix trying to crawl a web site that needs cookies. i want to use my cookie file generated by firefox. how do i set that....
ozimmels
Nov 10, 2004
8:21 pm
1154
Hi ozimmels, You can specify a cookie file in the 'settings' tab -- HTTP Fetcher: load-cookies-from-file. The file needs to be in Netscape format. If I am not mistaken...
Igor Ranitovic
iranitovic
Nov 10, 2004
8:38 pm
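Before pointing load-cookies-from-file at a browser's cookie file, it can help to confirm the file really parses as Netscape format. The sketch below uses Python's standard `http.cookiejar.MozillaCookieJar`, which reads that format. It is a sanity check run outside the crawler, not part of Heritrix, and the helper name `load_netscape_cookies` and the `cookies.txt` path are assumptions.

```python
from http.cookiejar import LoadError, MozillaCookieJar


def load_netscape_cookies(path):
    """Parse a Netscape-format cookie file and list what it contains.

    Raises http.cookiejar.LoadError if the file lacks the expected header
    line or has a malformed entry. Session and expired cookies are kept
    so the listing shows everything that is in the file.
    """
    jar = MozillaCookieJar()
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return [(c.domain, c.name, c.value) for c in jar]


if __name__ == "__main__":
    try:
        for domain, name, value in load_netscape_cookies("cookies.txt"):
            print(domain, name, value)
    except (LoadError, OSError) as err:
        print("could not load cookie file:", err)
```

Each entry in the file is one tab-separated line with seven fields: domain, subdomain flag, path, secure flag, expiry time, name, value.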
1155
I'm running a build synched from CVS head this afternoon. All 50 threads are stuck: here's a subset of the toe threads report: Toe threads report -...
Tom Emerson
tree02139
Nov 13, 2004
12:09 am
1156
You're stuck in the HTTP fetcher. We've seen issues fetching https in old versions but haven't seen it happening in 1.2.0 as yet (Below do not seem to be...
stack
stackarchiveorg
Nov 13, 2004
12:19 am
1157
Here's a trace, though I don't think it did anything useful. The -SIGQUIT didn't work. Attaching to process ID 10533, please wait... Debugger attached...
Tom Emerson
tree02139
Nov 14, 2004
12:21 am
1158
... No, it didn't. Looks like it's failing to dump the stack trace. You might try the released version of the 1.5.0 JVM, Tom. St.Ack...
stack
stackarchiveorg
Nov 14, 2004
2:22 am
1159
i downloaded the proxy viewer from http://www.netarchive.dk/website/sources/index-en.htm and i can't get it to work. the code didn't compile - ...
ozimmels
Nov 14, 2004
11:37 am
1160
You have to build an index file (.cdx) first; this is the file you launch with the proxy viewer. For building index files you can use...
bja@...
bjarne_dk2000
Nov 15, 2004
8:35 am
1161
... Remember that you have to sort the CDX file -- ExtractCDX doesn't do that as the file can easily become too big to have in memory, but e.g. Unix sort()...
Lars Clausen
lrclause
Nov 15, 2004
8:51 am
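Where Unix sort isn't available (the Windows XP setups mentioned earlier in the list, say), the same "too big for memory" concern can be handled with a chunked external merge sort. The Python sketch below is one way to do that under stated assumptions: lines are compared as raw bytes, `sort_large_file` and `max_lines` are illustrative names, and whether the proxy viewer expects exactly this ordering is not confirmed by the thread.

```python
import heapq
import itertools
import os
import tempfile


def sort_large_file(src, dst, max_lines=100_000):
    """Sort the lines of src into dst without loading the whole file.

    Reads at most max_lines lines at a time, sorts each chunk in memory,
    writes it to a temporary file, then merges the sorted chunks.
    """
    chunk_paths = []
    try:
        with open(src, "rb") as f:
            while True:
                chunk = list(itertools.islice(f, max_lines))
                if not chunk:
                    break
                # Make sure the file's last line also ends with a newline.
                chunk = [ln if ln.endswith(b"\n") else ln + b"\n" for ln in chunk]
                chunk.sort()
                tmp = tempfile.NamedTemporaryFile(delete=False)
                tmp.writelines(chunk)
                tmp.close()
                chunk_paths.append(tmp.name)
        files = [open(p, "rb") for p in chunk_paths]
        try:
            with open(dst, "wb") as out:
                # heapq.merge streams the already-sorted chunks lazily.
                out.writelines(heapq.merge(*files))
        finally:
            for fh in files:
                fh.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

Usage would look like `sort_large_file("arc.cdx", "sortedArc.cdx")`, with `max_lines` lowered if memory is tight.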
1162
first - thanks for the help. some things are not clear to me yet. this is what i know 1. heritrix creates a : IAH-20041115081418-00001-zen.arc.gz 2. gunzip -d...
ozimmels
Nov 15, 2004
9:25 am
1163
... So you've renamed sortedArc.cdx arc.cdx afterwards? Otherwise, you'll need to point at sortedArc.cdx when starting the proxyviewer below. ... It will try...
Lars Clausen
lrclause
Nov 15, 2004
9:35 am
1164
first - thanks for the help. some things are not clear to me yet. this is what i know 1. heritrix creates a : IAH-20041115081418-00001-zen.arc.gz You might want...
bja@...
bjarne_dk2000
Nov 15, 2004
9:41 am
1165
... Many browsers look for that file to make the little icon next to the URL. Most sites don't have it, though a number of popular ones do. -Lars...
Lars Clausen
lrclause
Nov 15, 2004
10:18 am