archive-crawler
Messages

Messages 5253 - 5282 of 6142
5253
You can prevent this by including a ContentTypeMatchesRegExpDecideRule or ContentTypeNotMatchesRegExpDecideRule in the decide-rules. Mario....
ermhes82
Jun 1, 2008
2:20 pm
5254
Thanks Gordon, but I cannot log in yet. Can you give me your sheets so I can check them? Also, I have the same problem with this URL: ...
ermhes82
Jun 1, 2008
2:52 pm
5255
Hi, can anybody tell me how to make custom modules like processors, DecideRules etc. show up in the create drop-down menu of the heritrix2 webui? In heritrix...
Christian Krumm
chuk_ol
Jun 1, 2008
3:05 pm
5256
Thanks a lot Gordon, it does work. It solved a lot of problems for me... Sorry for letting you know so late, however. Thanks, Pratyush ... From: Gordon Mohr...
Pratyush Banerjee
Pratyushbanerjee@...
Jun 2, 2008
7:10 am
5257
Probably the easiest way to do this is to have the mid-fetch filter 'tag' the CrawlURI (using CrawlURI.put methods) and then have a processor run directly...
Kristinn Sigurdsson
kristsi25
Jun 2, 2008
10:02 am
5258
Hello, At http://crawler.archive.org/apidocs/index.html, the javadocs correspond to version 1.15.1. I am using version 2.0 and would like to find Javadocs...
Jean-Noël Rivasseau
elvanor@...
Jun 2, 2008
2:17 pm
5259
*bump* Still not understanding why this happens....
Jean-Noël Rivasseau
elvanor@...
Jun 2, 2008
3:02 pm
5260
Hi Jean-Noël Rivasseau, have you tried implementing the interface org.archive.state.Initializable and putting your initialization code in the initialTasks method? ...
Christian Krumm
chuk_ol
Jun 2, 2008
3:08 pm
5261
Hello, I have a processor that needs to access some HTML documents fetched by Heritrix. If my seed URLs contain a URL corresponding to such a document, I...
Jean-Noël Rivasseau
elvanor@...
Jun 2, 2008
3:13 pm
5262
Hi Christian, thanks a lot, your suggestion works perfectly. Now, for my own curiosity, is it easy to understand why a constructor did not work? Internally,...
Jean-Noël Rivasseau
elvanor@...
Jun 2, 2008
3:23 pm
5263
... Hi Jean-Noel, sorry I don't really know it either. I think this has something to do with the internals of the settings framework, but I don't really know....
Christian Krumm
chuk_ol
Jun 2, 2008
3:51 pm
5264
I don't know either - I currently always instantiate "manually". If you find out the answer one day, I would be interested in knowing too. Jean-Noel...
Jean-Noël Rivasseau
elvanor@...
Jun 2, 2008
3:57 pm
5265
Hi! Thanks for the reply. I did have a ContentTypeMatchesRegExpDecideRule under the writer processor section with the following regex: (?i)application/xml.* But...
lpeterus
Jun 2, 2008
5:40 pm
5266
Hello, I had a problem with encoding today and took a look at Heritrix code. Unfortunately it seems to me (from my understanding of the code) that Heritrix...
Jean-Noël Rivasseau
elvanor@...
Jun 2, 2008
6:27 pm
5267
... Here's an excerpt from the configuration that worked for me on the mangosproject.org website: root:credential-store=primary, ...
Gordon Mohr
gojomo
Jun 2, 2008
6:50 pm
5268
The code which builds the lists shown in the web UI lives at org.archive.crawler.webui.Settings, in the method getSubclasses(). It looks for premade text files...
Gordon Mohr
gojomo
Jun 2, 2008
7:17 pm
5269
Thanks Kris, I will give this a try. -glen ... ...
gkbrown22
Jun 2, 2008
7:42 pm
5270
I don't know what's happening, but parts of your description don't add up. In particular, all the concrete Processor classes standard with Heritrix have...
Gordon Mohr
gojomo
Jun 2, 2008
7:43 pm
5271
Thanks a lot Gordon. I'll give it a try. Christian...
Christian Krumm
chuk_ol
Jun 2, 2008
7:45 pm
5272
... It depends on where your Processor is in the chain. The same URI can enter processing several times, especially if, when it first comes up, the DNS/robots...
Gordon Mohr
gojomo
Jun 2, 2008
9:26 pm
5273
Do you have a REJECT rule first that applies to everything, then the ContentTypeMatchesRegExpDecideRule to ACCEPT the right kind of content? Otherwise, the...
Gordon Mohr
gojomo
Jun 2, 2008
9:30 pm
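
The ordering question raised in message 5273 (a REJECT-all rule first, then a content-type rule that ACCEPTs) can be illustrated with a small model. This is only a sketch, written in Python for brevity rather than Heritrix's own Java. The `Rule` class, the `decide` function, and the "last applicable rule wins" evaluation are assumptions made for illustration and are not Heritrix classes or a statement of its exact behaviour. The regex is the one quoted in message 5265.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

ACCEPT, REJECT = "ACCEPT", "REJECT"


@dataclass
class Rule:
    """Hypothetical stand-in for a decide rule; not a Heritrix class."""
    decision: str
    applies: Callable[[str], bool]


def content_type_matches(pattern: str) -> Callable[[str], bool]:
    """Build a predicate that is true when the whole content type matches."""
    compiled = re.compile(pattern)
    return lambda content_type: compiled.fullmatch(content_type) is not None


def decide(rules: List[Rule], content_type: str) -> Optional[str]:
    """Assumed semantics: every rule is consulted in order and the last
    rule that applies determines the outcome. None means no rule applied."""
    outcome = None
    for rule in rules:
        if rule.applies(content_type):
            outcome = rule.decision
    return outcome


RULES = [
    Rule(REJECT, lambda content_type: True),  # REJECT everything first
    Rule(ACCEPT, content_type_matches(r"(?i)application/xml.*")),
]

if __name__ == "__main__":
    for ct in ("application/xml; charset=UTF-8", "text/html"):
        print(ct, "->", decide(RULES, ct))
```

Under these assumed semantics, dropping the first rule leaves non-matching content with no decision at all, which is why the order of the two rules matters in this model.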
5274
... Heritrix should already support many other encodings -- limited mainly by what support is in your Java VM. As per HTTP/1.1, when no other charset is...
Gordon Mohr
gojomo
Jun 2, 2008
9:36 pm
5275
... We don't yet have the autogenerated Javadocs or Maven2 project site automatically uploaded to the main Heritrix website. Until we have that set up, one...
Gordon Mohr
gojomo
Jun 2, 2008
11:06 pm
5276
I also have experience with a couple of newspaper sites that do not allow Heritrix to log in, although I'm quite sure I give all the necessary credentials...
Bjarne Andersen
bjarne_dk2000
Jun 3, 2008
10:12 am
5277
I had a default REJECT in the scope to start with but not in the mid-fetch or writer processor phase. I also tried adding those in as the first rule for them...
lpeterus
Jun 3, 2008
3:59 pm
5278
So if I may summarize: - You have a set of decide rules set up on the writer processor - these rules begin with a REJECT-all rule, then have a ...
Gordon Mohr
gojomo
Jun 3, 2008
7:57 pm
5279
At Wed, 28 May 2008 15:04:25 -0000, ... Just to confirm that I have seen identical traces in our runtime-errors.log files. We are running up to 10 simultaneous...
Erik Hetzner
e_hetzner
Jun 4, 2008
4:00 am
5280
Thanks for the tip, this worked fine....
Jean-Noël Rivasseau
elvanor@...
Jun 4, 2008
10:51 am
5281
Hello. My problem was that I had a page that was actually encoded in the windows-1252 code page, but advertised itself as an ISO-8859-1 page (although it did this only in...
Jean-Noël Rivasseau
elvanor@...
Jun 4, 2008
11:35 am
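
The mismatch described in message 5281 can be reproduced outside the crawler. The sketch below (Python for brevity; it is not Heritrix code) decodes the same bytes with both charsets. The sample bytes are invented for illustration. The two encodings agree everywhere except the 0x80 to 0x9F range, where windows-1252 defines printable characters such as curly quotes and the euro sign, while ISO-8859-1 maps those bytes to control characters.

```python
# Invented sample: curly quotes (0x93, 0x94) and a euro sign (0x80),
# as a windows-1252 page would encode them.
RAW = b"\x93quoted\x94 costs \x80 5"

# Decoding with the charset the page advertised (ISO-8859-1) yields
# invisible C1 control characters in place of the punctuation.
as_latin1 = RAW.decode("iso-8859-1")

# Decoding with the charset the page actually used recovers the text.
as_cp1252 = RAW.decode("windows-1252")

if __name__ == "__main__":
    print(repr(as_latin1))
    print(repr(as_cp1252))
```

Both decodes succeed without error, which is why this kind of mislabelling tends to go unnoticed until the text is displayed or searched.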
5282
Yes, the setup is like you described. I'm using the standard arc writer and I checked the arc files to see the type. Besides application/xml content, the other...
lpeterus
Jun 4, 2008
4:17 pm
Copyright © 2009 Yahoo! Inc. All rights reserved.