archive-crawler
Messages 5381 - 5410 of 6140
5381
... Sure. Here it is. I hope this is what you have been looking for. To prevent non-HTML documents from being downloaded, I've added three DecideRules to the...
Christian Krumm
chuk_ol
Aug 1, 2008
10:45 am
5382
Thanks so much for the config information. I have merged my configuration as I would like to go to a single site. Here is my complete config with the merge but...
ivar_sr
Aug 1, 2008
3:52 pm
5383
... I forgot to tell you: I'm using my own Heritrix release, called heritrix-2.0.0-OFFIS, based on the 2.0.0 release of the IA. Within the 2.0.0 there...
Christian Krumm
chuk_ol
Aug 1, 2008
5:06 pm
5384
... Hi Ravi, did you get it working using the 2.0.1-SNAPSHOT, or are there still validation problems? I may send you the patched jar files I'm currently using ...
Christian Krumm
chuk_ol
Aug 4, 2008
6:30 pm
5385
Hi, I got the extraction of linktext, linktitle and linkenvironment of an HTML anchor implemented using a custom JerichoExtractorHTML and a custom...
Christian Krumm
chuk_ol
Aug 4, 2008
7:00 pm
5386
... Thanks for the offer! The best approach to make a contribution is to... (1) create an issue in the Heritrix JIRA tracker (2) attach a patch with your...
Gordon Mohr
gojomo
Aug 4, 2008
7:44 pm
5387
Christian, I haven't pulled the 2.0.1 snapshot yet; please send the jar file and I will try to test it. Thanks for checking with me. Also, I want to get...
ivar_sr
Aug 4, 2008
10:36 pm
5388
Hi Gordon, thanks for your advice. I've created an issue in JIRA (HER-1543). I'll attach the code in September or October. Currently I don't have much time...
Christian Krumm
chuk_ol
Aug 5, 2008
11:29 am
5389
Great -- we can track the issue, collect comments or votes from others who are interested, and whenever the code is battle-tested and in good shape it can be...
Gordon Mohr
gojomo
Aug 5, 2008
8:14 pm
5390
The Heritrix installation is fine. Profiles and settings were done according to the user manual. After I create a job, it finishes immediately. The crawl report shows that...
kenny
shonkenny
Aug 6, 2008
6:36 am
5391
... Your configuration has serious problems. In particular, it appears all of the usual and necessary Processors that perform the steps of handling a single...
Gordon Mohr
gojomo
Aug 6, 2008
7:06 am
5392
I did what you recommended: I kept the original modules settings and created the job again. It shows 2 warnings and 1 severe alert, which I list ... Time: ??. 7, 2008...
kenny
shonkenny
Aug 7, 2008
2:17 am
5393
Ok, this looks similar to this old issue: http://webteam.archive.org/jira/browse/HER-510 Are you using the Heritrix WAR version inside another servlet...
Gordon Mohr
gojomo
Aug 7, 2008
5:38 am
5394
I use it in standalone mode, on Windows XP SP2. I haven't tried it in Eclipse or another container, and I think it's the last way to resolve the question if I am...
kenny
shonkenny
Aug 7, 2008
6:21 am
5395
OK, I can reproduce your problem here, and there's actually another old issue describing it: http://webteam.archive.org/jira/browse/HER-540 At the Internet...
Gordon Mohr
gojomo
Aug 7, 2008
7:42 am
5396
Heritrix releases 1.14.1 and 2.0.1 are now available at Sourceforge: http://sourceforge.net/project/showfiles.php?group_id=73833 These are both primarily...
Gordon Mohr
gojomo
Aug 7, 2008
10:54 pm
5397
Hi, I have a question concerning the function getContentSize() in the class ReplayInputStream. The Javadoc indicates that the function will return the total...
Christian Krumm
chuk_ol
Aug 8, 2008
10:56 am
5398
... Yes, it should... and getSize() will get the full recorded data size, including headers. ... You're getting these using the JerichoExtractorHTML, right? ...
Gordon Mohr
gojomo
Aug 8, 2008
5:05 pm
5399
It seems to me Heritrix does not consider the hash nor save it in the arc files. It could be useful to add this support. What do you think? Jean-Noel...
Jean-Noël Rivasseau
elvanor@...
Aug 8, 2008
6:23 pm
5400
Hi Christian, are you using Jericho HTML 2.5 or already Jericho HTML 2.6, which hasn't made it into 1.x yet? I'm asking because there seems to be a serious bug...
aaron667@...
pandae667
Aug 8, 2008
6:35 pm
5401
As of yesterday's 1.14.1 release, Heritrix 1 is using the jericho JAR version 2.6. I was only able to bring Heritrix 2 up to Jericho 2.5 because of an issue...
Gordon Mohr
gojomo
Aug 8, 2008
6:43 pm
5402
Hi Gordon and Olaf. Thanks for your help! I'll give it a try. Olaf, I'm currently using version 2.6 of Jericho, which seems to work fine. I've downloaded...
Christian Krumm
chuk_ol
Aug 9, 2008
9:06 am
5403
Hello, I am attempting to modify the scope rules in a sheet in one of my profiles, and am receiving this exception when clicking on "add": Problem:...
Jean-Noël Rivasseau
elvanor@...
Aug 11, 2008
3:02 pm
5404
No replies to this; can anyone at least confirm that this is the case?...
Jean-Noël Rivasseau
elvanor@...
Aug 11, 2008
3:05 pm
5405
I have the following reproducible behavior, both in 2.0.0 and in 2.0.1: I launch an engine and then access it remotely via JMX. In the web UI, when I go to a...
Jean-Noël Rivasseau
elvanor@...
Aug 11, 2008
7:24 pm
5406
IIRC, yes, anything past the # is ignored. Two URLs that differ only in the component that follows the # are considered the same (I do not recall whether...
stack
stackarchiveorg
Aug 11, 2008
7:41 pm
5407
At Fri, 08 Aug 2008 20:23:32 +0200, ... The URI fragment (aka hash) is interpreted by the client and is media type specific. The client-server interaction to...
Erik Hetzner
e_hetzner
Aug 11, 2008
7:58 pm
5408
(sorry, forgot to reply to list) I agree that it's a hack of course, but some (mainly Ajax-based) sites store information in the hash. In such a site, I could...
Jean-Noël Rivasseau
elvanor@...
Aug 11, 2008
8:46 pm
5409
The portion after the '#' (the 'fragment') is not sent on HTTP requests, and so does not affect what is returned from servers. So from the perspective of...
Gordon Mohr
gojomo
Aug 11, 2008
8:48 pm
5410
... Good explanation of the network-equivalence of two URIs differing only after the '#'. But regarding... ... The situation I've seen is where a JS/AJAXy...
Gordon Mohr
gojomo
Aug 11, 2008
8:51 pm
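
The point made in messages 5406 to 5410, that two URIs differing only after the '#' are network-equivalent because the fragment is never sent to the server, can be illustrated with a small sketch. This is not Heritrix code and does not claim to reproduce its canonicalization rules; it uses Python's standard `urllib.parse.urldefrag`, and the helper names `network_key` and `dedupe_by_network_key` are hypothetical, chosen only for this example.

```python
from urllib.parse import urldefrag


def network_key(uri: str) -> str:
    """Return the URI with its fragment removed.

    This is the part of the URI that takes part in the HTTP request;
    the fragment is interpreted by the client only.
    (Hypothetical helper, not part of Heritrix.)
    """
    base, _fragment = urldefrag(uri)
    return base


def dedupe_by_network_key(uris):
    """Keep one entry per fragment-less URI, preserving first-seen order."""
    seen = set()
    unique = []
    for uri in uris:
        key = network_key(uri)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    candidates = [
        "http://example.com/a#x",
        "http://example.com/a#y",
        "http://example.com/b",
    ]
    for uri in dedupe_by_network_key(candidates):
        print(uri)
```

A crawler that deduplicates this way fetches `http://example.com/a` once. The Ajax case raised in message 5408, where a page's scripts load different content depending on the fragment, is exactly what such a scheme cannot distinguish, since the difference only appears after client-side code runs.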