archive-crawler

Messages

Messages 3225 - 3254 of 6147
3225
Well, it's simple. The RegEx rule I added [when the job was crawling] did not apply to already queued URIs. The canonicalization rules applied only before...
thiru_sundaram
Sep 2, 2006
5:40 pm
3226
Hi, couldn't find this in the docs anywhere but I may well have missed it. Is there a MIME type for ARC files themselves? (i.e., if I was serving them from a...
mark
m_j_williamson
Sep 2, 2006
6:44 pm
3227
... No. ... Let's agree to use your above suggestion from here on out? (Suggested WARC type is application/warc). St.Ack...
Michael Stack
stackarchiveorg
Sep 2, 2006
7:32 pm
3228
... actually I got it the wrong way round by the looks of it - should be application/x-arc ... I guess it would make sense to get this officially recognised...
mark
m_j_williamson
Sep 2, 2006
8:56 pm
3229
... Ok. That's better. ... Yes. A TODO. St.Ack...
Michael Stack
stackarchiveorg
Sep 2, 2006
9:51 pm
3230
Hi Internet Archive Team, thanks a lot for adding my first two DecideRules to Heritrix. I have now also changed my SuccessfulFetchFilter into a pair of DecideRules ...
pandae667
Sep 4, 2006
9:30 am
3231
Dear Heritrix friends, I am a researcher using Heritrix to build a focused crawler in my experimental work. I have a new crawling strategy which determines...
ahmed ghouzia
ghouzia
Sep 4, 2006
8:57 pm
3232
My understanding is that you want to analyse the fetched pages and extract links based on that. I think you should write a new Extractor which calls your...
sundaram subramanian
thiru_sundaram
Sep 5, 2006
10:44 am
3233
You might also take a look at how Heritrix is integrated into the metacombine project: http://www.metacombine.org/. Check under the software tab. Here is the...
Michael Stack
stackarchiveorg
Sep 5, 2006
3:39 pm
3234
Hello Olaf: Should be no problem adding multiple writers, each with its own rule set. We implemented a code freeze Friday in readiness for the 1.10.0 release (Code ...
Michael Stack
stackarchiveorg
Sep 5, 2006
5:48 pm
3235
Stack, It may be a little late to request this, but I thought it would be really useful to list several use cases in the user manual for the most typical types...
Frank McCown
mccownf
Sep 5, 2006
6:16 pm
3236
Sounds great, Frank. Any chance you'd like to take a first cut at it, even if it was only an outline for the rest of us to fill in? Good stuff, St.Ack...
Michael Stack
stackarchiveorg
Sep 5, 2006
7:48 pm
3237
I could give it a stab in the next few days. Is there a hard deadline for finishing up the documentation? Frank...
Frank McCown
mccownf
Sep 5, 2006
8:18 pm
3238
Doc. can go in up to the second before release. I'd say end-of-this week, start-of-next should see release of 1.10.0 unless we trip over the unexpected. Good...
Michael Stack
stackarchiveorg
Sep 5, 2006
9:18 pm
3239
I took a look and these are the response headers we're getting back: [Date: Tue, 05 Sep 2006 21:26:58 GMT , Server: Apache/2.0.59 (Unix) mod_ssl/2.0.59...
Michael Stack
stackarchiveorg
Sep 5, 2006
9:42 pm
3240
Did you try stripping all but the id parameter from the query string? Would the following java-string regex work for you in the regex canonicalization rule? ...
Michael Stack
stackarchiveorg
Sep 5, 2006
10:49 pm
3241
Thank you very much for your helpful advice. These projects are useful; I will work on that and tell you the results. Michael Stack...
ahmed ghouzia
ghouzia
Sep 6, 2006
10:48 am
3242
I found some syntax errors while reading the Heritrix manuals, so if it is worthwhile, where should I go to correct them ...
ahmed ghouzia
ghouzia
Sep 6, 2006
10:55 am
3243
You can send them to me off list or put them into a bug up on sourceforge: http://sourceforge.net/tracker/?group_id=73833&atid=539099. Thanks Ahmed, St.Ack...
Michael Stack
stackarchiveorg
Sep 6, 2006
4:30 pm
3244
Stack, I created 3 use cases here: http://www.cs.odu.edu/~fmccown/heritrix/use_cases.html The parts in red are where someone more experienced than me should ...
Frank McCown
mccownf
Sep 6, 2006
6:21 pm
3245
... Fantastic. Thanks Frank. Stack's a little busy at the moment so I'm going to see if I can flesh out the sections in red. We'll then add an appendix to...
Paul Jack
poetbeware
Sep 6, 2006
9:05 pm
3246
... Hello again, So I have filled out the red bits in case #1 and case #2, but I'm not sure what you're asking in case #3 -- "How could the rule be applied ...
Paul Jack
poetbeware
Sep 7, 2006
12:50 am
3247
Hi Michael, ... set. I think there is a problem with doing it - the WebUI simply doesn't allow me to do it. It just allows me to add one writer of each type, ...
pandae667
Sep 7, 2006
7:34 am
3248
It is a shortcoming of the WebUI (that was intentional at the time). There is a dirty fix to it by editing the Processors.options file and having several...
Kristinn Sigurðsson
kristsi25
Sep 7, 2006
7:53 am
3249
... Hello Paul- I appreciate you taking the time to edit my use cases. I've only been using Heritrix a few months, so I hope what I have written so far makes ...
Frank McCown
mccownf
Sep 7, 2006
1:40 pm
3250
Hello, I need to somehow feed Heritrix a URI list from a SQL database. Is there any SQL-based frontier? Or, more simply, using an existing frontier, can I...
Laurian Gridinoc
lauriangridinoc
Sep 7, 2006
3:38 pm
3251
... Yes to the latter. See the JMX overview in the manual getting started: http://crawler.archive.org/articles/user_manual/outside.html#mon_com. The API is...
Michael Stack
stackarchiveorg
Sep 7, 2006
5:10 pm
3252
Hello, ... I get an importUris operation not found. I'm using Heritrix 1.8.0: java -jar bin\cmdline-jmxclient-0.10.5.jar foo:bar localhost:9999 ...
Laurian Gridinoc
lauriangridinoc
Sep 7, 2006
5:49 pm
3253
... The importUris operation is available in the CrawlJob MBean, not in the Heritrix instance CrawlService MBean (Your listing below is from the CrawlService...
Michael Stack
stackarchiveorg
Sep 7, 2006
6:05 pm
3254
The arcreader tool has the ability to either output a resource with its HTTP header or without it. There doesn't appear to be an option to just print the HTTP...
Frank McCown
mccownf
Sep 7, 2006
8:02 pm