archive-crawler
Messages

  Messages Help
Advanced
Messages 3377 - 3407 of 6147
3377
Hi, I'm trying to get the links from the RSS feed at the following address... http://www.valoronline.com.br/valoronline/Geral.rss ...
Guilherme Mascarenhas...
gaguigu
Oct 4, 2006
5:13 am
3378
Hi, ... did you add the XML extractor (org.archive.crawler.extractor.ExtractorXML) in the "Modules" tab of your job configuration? Regards, Max...
Maximilian Schoefmann
schoefma@...
Oct 4, 2006
9:42 am
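For context on what an XML extractor has to find in a feed like this: in RSS 2.0 each `<item>` carries its URL in a `<link>` child element. The following is only a minimal Python sketch of that idea, not Heritrix code and not how ExtractorXML is implemented. The function name `rss_item_links` and the sample feed are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for illustration (not the feed from the thread).
SAMPLE_FEED = (
    '<rss version="2.0"><channel>'
    "<title>Sample</title><link>http://example.com/</link>"
    "<item><title>First</title><link>http://example.com/a</link></item>"
    "<item><title>Second</title><link> http://example.com/b </link></item>"
    "<item><title>No link here</title></item>"
    "</channel></rss>"
)


def rss_item_links(rss_text):
    """Return the text of each <item>'s <link> element, in document order.

    Assumes a plain RSS 2.0 feed without XML namespaces on these elements.
    Items with a missing or empty <link> are skipped.
    """
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


if __name__ == "__main__":
    for url in rss_item_links(SAMPLE_FEED):
        print(url)
```

A crawler would treat each returned URL as a candidate link to schedule. The channel-level `<link>` is deliberately ignored here because only items are iterated.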
3379
(Yes, I was using the XML extractor...) I got it now! The problem was where I was looking... And this can be solved by using...
Guilherme Mascarenhas...
gaguigu
Oct 5, 2006
12:45 am
3381
Hi *, ... http://www.ifi.lmu.de/~schoefma/howto/run_heritrix_on_windows/heritrix.cmd ...
Maximilian Schoefmann
schoefma@...
Oct 5, 2006
7:57 am
3382
Hi, I am using Heritrix 1.10.1 and Sun Java 1.5.0_06 under Windows XP. I run Heritrix with c:\Heritrix1.10.1\bin>heritrix, then, using the WUI, I...
jls_nayak1983
Oct 5, 2006
10:08 am
3383
... and ... org.archive.crawler.settings.ModuleType ... Hi, one thing I forgot to mention: I am using the Windows version of Heritrix. Thanks...
jls_nayak1983
Oct 5, 2006
10:09 am
3384
Hi, ... I've just discovered that the order in which the jar files are loaded was wrong in the current Heritrix script for Windows. That's fixed in the one I...
Maximilian Schoefmann
schoefma@...
Oct 5, 2006
11:41 am
3385
Hi, I am working on Windows XP with Sun Java 1.5.0_06. I downloaded the Heritrix source. To build it, I downloaded Maven 1.0.2. The folder structure is like this ...
jls_nayak1983
Oct 5, 2006
1:12 pm
3386
... loaded was ... the one ... your ... Hi, I downloaded the updated Heritrix script for Windows and used it, but I am still getting the same problem. This show...
jls_nayak1983
Oct 5, 2006
1:16 pm
3387
Hi, ... Did you also copy the "profiles" directory from the heritrix-1.10.1.jar to HERITRIX_HOME\conf ? See:...
Maximilian Schoefmann
schoefma@...
Oct 5, 2006
1:30 pm
3388
Just a note of thanks to all who work on Heritrix. I found it easy to get up and crawling. When I did have a problem, I found a post here in the Yahoo group....
Cody
oliverc.rm
Oct 5, 2006
7:39 pm
3389
Just wanted to point everyone to this article about the crawler become.com implemented in Java. Perhaps some interesting comparisons between Heritrix's and...
Eric
mar1ow2003
Oct 5, 2006
7:59 pm
3390
Hi! Is it possible to run two or more jobs on Heritrix 1.10.0 at the same time? If so, how can I do this? Thanks everyone! Guilherme - Brazil...
Guilherme Mascarenhas...
gaguigu
Oct 5, 2006
8:13 pm
3391
I like the idea of a non-monolithic architecture. It'll be great to attempt a refactor of Heritrix for distributed crawls. ... -- It's fun being a realist.... ...
Anmol Bhasin
molzbh
Oct 5, 2006
8:44 pm
3392
... 1.10.1.jar to ... crawler/message/2085 ... of ... Max Hi, Thanks for solving "FetalInitializationException". I extracted the "profile" folder from the...
jls_nayak1983
Oct 6, 2006
4:58 am
3393
Greetings, I just came across Heritrix and am interested in running it. I'm working on the Windows XP platform and downloaded version heritrix-1.10.1. Installed...
fandufunkyman
Oct 6, 2006
6:21 am
3394
Hi, I noticed on Windows Server 2003 that the owner of my files was set to the group "Administrators" instead of my own user. Java will then cough on the JMX...
Maximilian Schoefmann
schoefma@...
Oct 6, 2006
9:43 am
3395
Hi Guilherme, ... This seems to work in the current Heritrix. In the Web UI, click on "Setup" and browse to "Local instances". You can then create a new...
Maximilian Schoefmann
schoefma@...
Oct 6, 2006
10:00 am
3396
Hi again, ... It really seems like the jars are still loaded in the wrong order. Please update the script again, you seem to have downloaded it before I...
Maximilian Schoefmann
schoefma@...
Oct 6, 2006
10:27 am
3397
Hi, I am using Heritrix 1.10.1 and Sun Java 1.5.0_06 under Windows XP. I run Heritrix with c:\Heritrix1.10.1\bin>heritrix, then, using the WUI, I create a...
jls_nayak1983
Oct 6, 2006
10:27 am
3398
... Please ... noticed ... Hi Max, what is this operator.journal file, and why is it needed? The exception "StartNextJob" java.lang.NoSuchMethodError is...
jls_nayak1983
Oct 6, 2006
10:40 am
3399
... Please ... noticed ... Hi Max, what is this operator.journal file, and why is it needed? The exception "StartNextJob" java.lang.NoSuchMethodError is...
jls_nayak1983
Oct 6, 2006
11:05 am
3400
... You can use the journal to take personal notes during the crawl. It's normally not needed (unless you take notes) and is just a plain text file. The...
Maximilian Schoefmann
schoefma@...
Oct 6, 2006
11:08 am
3401
... It's ... text file. ... thrown ... operation). ... the Web UI. ... Hi Max, thanks for your reply. OK, here is the Alerts section of the WUI. I am...
jls_nayak1983
Oct 6, 2006
11:24 am
3402
... I personally don't care too much about Heritrix warnings anymore as long as my crawl crawls :-) I'm no expert here and don't know what can trigger this...
Maximilian Schoefmann
schoefma@...
Oct 6, 2006
11:57 am
3403
I wonder if become.com will ever release any of the source code publicly? I doubt it since that would be helping their competitors. You'd think they'd at...
Frank McCown
mccownf
Oct 6, 2006
12:50 pm
3404
Hi, I am using Heritrix 1.10.1 and Sun Java 1.5.0_06 under Windows XP. When I created a job and started the crawler, I saw job status...
jls_nayak1983
Oct 7, 2006
9:45 am
3405
Hi, I am using Heritrix 1.10.1 with Sun Java 1.5.0_06 under Windows XP. I want to run Heritrix from the command prompt. I run Heritrix using the command prompt....
jls_nayak1983
Oct 8, 2006
11:23 am
3406
... WinZip can open a .gz file and extract the .arc file(s) inside, but it can't open the .arc file. The .arc file is not in a format that WinZip ...
Frank McCown
mccownf
Oct 8, 2006
4:33 pm
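The point above, that the .gz wrapper is ordinary gzip while the .arc inside is a separate format, can be illustrated with a short Python sketch. It only decompresses the file and peeks at the first line. It does not parse ARC records. The helper names `gunzip_arc` and `first_line` are hypothetical, and the remark about `filedesc://` reflects the Internet Archive ARC format as I understand it, so treat it as an assumption to check against a real file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_arc(gz_path, out_path=None):
    """Decompress a .arc.gz file to a plain .arc file and return its path.

    Python's gzip module reads concatenated gzip members as one stream,
    so a file compressed record by record still decompresses in one pass.
    """
    gz_path = Path(gz_path)
    # "crawl.arc.gz" -> "crawl.arc" when no output path is given.
    out_path = Path(out_path) if out_path else gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def first_line(arc_path):
    """Return the first line of the decompressed file as text.

    Assumption: in an ARC file this is the header line of the version
    block, which normally starts with "filedesc://".
    """
    with open(arc_path, "rb") as f:
        return f.readline().decode("latin-1").rstrip("\r\n")
```

Usage would be along the lines of `print(first_line(gunzip_arc("crawl.arc.gz")))`, where `crawl.arc.gz` is a placeholder file name. Reading the individual records after that needs an ARC-aware reader, which is why a zip tool cannot open the .arc itself.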
3407
... I've not seen this one before. Heritrix wants to run a stylesheet (arcMetaheaderBody.xsl) against the xml order file to extract attributes such as...
Michael Stack
stackarchiveorg
Oct 9, 2006
12:58 am