Starting Heritrix from another app...
Message #1276 of 6151
Re: [archive-crawler] Starting Heritrix from another app...

The problem is that when you start a crawl by calling requestCrawlStart() on the CrawlController, the crawler starts OK but the call returns at once, so it looks as if the crawl ends almost before it has begun. Your calling program has to wait for the crawler to finish!
I struggled a little with this myself!

I have attached a simple Java program that works with 1.2.0.
You can launch a crawl with: java -Xmx128m dk.netarkivet.harvestcontroller.SimpleHeritrixLauncher <orderfile>

It waits for the crawler to finish by attaching a CrawlStatusListener!

Remember to include both heritrix.jar and the jars in $HERITRIX_HOME/lib in your CLASSPATH.
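
The waiting pattern described above can be sketched roughly as follows. This is not the attached SimpleHeritrixLauncher and it does not use the real Heritrix classes: EndListener and FakeController are hypothetical stand-ins for CrawlStatusListener and CrawlController, so that the example runs without heritrix.jar. It also uses modern Java (lambdas, java.util.concurrent), which postdates Heritrix 1.2.0. The only point it tries to show is that the caller blocks until the listener reports the end of the crawl.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class WaitForCrawlSketch {

    /** Hypothetical stand-in for a crawl status listener (not the Heritrix interface). */
    interface EndListener {
        void crawlEnded(String exitMessage);
    }

    /** Hypothetical stand-in for a controller whose start call returns at once. */
    static class FakeController {
        private EndListener listener;

        void addListener(EndListener l) {
            this.listener = l;
        }

        /** Starts the "crawl" on a background thread and returns immediately. */
        void requestCrawlStart() {
            Thread worker = new Thread(() -> {
                try {
                    Thread.sleep(200); // pretend to crawl for a while
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (listener != null) {
                    listener.crawlEnded("finished");
                }
            });
            // Daemon thread: the JVM does not wait for it, so a caller that
            // returns early would cut the simulated crawl short.
            worker.setDaemon(true);
            worker.start();
        }
    }

    /** Starts the fake crawl and blocks until the listener signals the end. */
    static String runAndWait() {
        FakeController controller = new FakeController();
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> exit = new AtomicReference<>();

        // Register the listener before starting, so the end event cannot be missed.
        controller.addListener(message -> {
            exit.set(message);
            done.countDown();
        });

        controller.requestCrawlStart(); // returns at once
        try {
            done.await(); // the calling program waits here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for crawl", e);
        }
        return exit.get();
    }

    public static void main(String[] args) {
        System.out.println("crawl ended: " + runAndWait());
    }
}
```

With the real crawler, the same idea would mean registering a CrawlStatusListener on the CrawlController before calling requestCrawlStart(), and releasing the latch from the listener's end-of-crawl callback. The exact method names on the 1.2.0 interface should be checked against heritrix.jar.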

best
Bjarne Andersen
www.netarchive.dk

stack wrote:
spielc wrote:

>
> Hi everybody!
>
> I'm trying to start Heritrix (version 1.2.0) from another
> Java application. As I don't need the Web UI for this app, I started
> it (or rather, tried to start it) using the main method of Heritrix with
> two arguments: Heritrix.main(new
> String[]{"--nowui",orderFile.getAbsolutePath()}); orderFile is a
> java.io.File object pointing to the order file I want to run. Well, it
> starts to crawl but it finishes too early (by looking at the report
> files I could see that just 2 files were crawled...). I ran Heritrix
> from the command line with --nowui and the same order file, and it runs
> a lot longer and, in my eyes, more correctly. The little code fragment
> in the FAQ doesn't work either, as launch is a protected static
> method...
>
> I would be grateful for any assistance I can get!!

Do the logs tell you anything about why the crawl runs for a shorter
time? Paste in the crawl.log if it's only two lines (look in
local-errors and in STDOUT/STDERR for any exceptions). The crawler
should do the same thing in the two contexts.
Yours,
St.Ack
P.S. #launch access is changed in Heritrix HEAD.

Tue Dec 14, 2004 8:21 am

bjarne_dk2000

Attachment: SimpleHeritrixLauncher.java (type: java/*)

Hi everybody! I'm trying to start Heritrix (Heritrix - version 1.2.0) from another Java-Application. As I don't need the Web-UI for this app I started (better...
spielc
Dec 13, 2004
3:00 pm

Like it says in the FAQ, this functionality is only (properly) supported in post 1.2.0. So, you'll need to get the latest HEAD build, either from CVS or from...
Kristinn Sigurdsson
kristsi25
Dec 13, 2004
3:12 pm

... Do the logs tell you anything about why the crawl runs for a shorter time? Paste in the crawl.log if it's only two lines (Look in local-errors and in...
stack
stackarchiveorg
Dec 13, 2004
5:39 pm

The problem is that if one starts a crawl with the method: requestCrawlStart() on the CrawlController the crawler starts OK - but returns at once - so it looks...
Bjarne Andersen
bjarne_dk2000
Dec 14, 2004
8:21 am

... $HERITRIX_HOME/lib ... Umm sorry to be annoying but ummm where is the Java-File you attached??? Further down the page is a box with attachment but it ...
spielc
Dec 14, 2004
11:39 am

This may be a settings issue with your account settings not allowing attachments to be included in your mailing list posts ... Anyway I've forwarded Bjarne's...
Kristinn Sigurdsson
kristsi25
Dec 14, 2004
11:48 am

Hello Everybody, This is my first post to this list. My name is Philippe Moulin, and I am currently developing a GPL'ed search engine:...
pm5400845
Dec 18, 2004
11:52 am

The code comes here ! best Bjarne Andersen /* $RCSfile: SimpleHeritrixLauncher.java,v $ * $Revision: 1.1$ * $Author: bja $ * * Copyright Det Kongelige...
bja@...
bjarne_dk2000
Dec 18, 2004
3:18 pm

... Thank you! I have successfully started Heritrix from my app, but there is still something I don't understand: With the web interface, everything works fine....
pm5400845
Jan 5, 2005
4:45 pm

... The same order file is used in standalone Heritrix and works? Same platform? Is this a windows box (Perhaps the following is related: ...
stack
stackarchiveorg
Jan 5, 2005
5:05 pm

... Thank you for your answer! Yes, I am running Heritrix on a Windows 2000 box. I have moved Heritrix's JAR to the top of my classpath, and now it works. But...
Philippe MOULIN
pm5400845
Jan 6, 2005
4:48 pm

... On the same machine, the standalone Heritrix works without need of specifying seeds as IPs? Otherwise, I'd say there's an issue with DNS on your windows...
stack
stackarchiveorg
Jan 6, 2005
6:27 pm