list1dism writes:
[...]
> Another thing I'd like to do is crawl more than one job at a time.
> But by reading some messages from the list-archive, I understood that
> I need to have different heritrix installations to do this. I cannot
> do it with one. Is that true? Is there an easier way to do this?
No, you don't need separate installations. Rather, you just need a
separate heritrix.properties file for each instance you want to
run. Minimally you will want to specify the jobs directory for each
one (though you could use a single directory). I run multiple
instances with the webui on different ports, different job
directories, and different login credentials. I name the properties
files with the convention 'heritrix-PORT.properties', e.g.,
'heritrix-8080.properties'. I wrote a little script that starts a
Heritrix instance on the given port and redirects heritrix_out.log
to a unique file:
--
#!/bin/sh
# Start a Heritrix instance using conf/heritrix-PORT.properties.
export HERITRIX_HOME=/usr/local/heritrix
export JAVA_HOME=/usr/local/java

if [ "$#" -eq 0 ]
then
    echo "Usage: $0 PORT" >&2
    exit 1
fi

# Per-instance properties file, named after the webui port.
PROPS_FILE="$HERITRIX_HOME/conf/heritrix-${1}.properties"
if [ ! -f "$PROPS_FILE" ]
then
    echo "Error: the properties file for port $1 does not exist." >&2
    exit 1
fi

# Give each instance its own copy of heritrix_out.log.
export HERITRIX_OUT="$HERITRIX_HOME/heritrix_${1}_out.log"
export JAVA_OPTS="-Xmx512m -Dheritrix.properties=$PROPS_FILE"
"$HERITRIX_HOME/bin/heritrix"
--
This requires Heritrix 1.2 (because of the use of HERITRIX_OUT).
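
If you end up with several of these properties files, a small helper
can list the ports that are configured, going only by the
'heritrix-PORT.properties' naming convention. This is just a sketch:
the function name list_heritrix_ports is made up for illustration, and
it assumes the files live in $HERITRIX_HOME/conf as in the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Heritrix): print the PORT part of
# every heritrix-PORT.properties file found in a directory.
# The directory defaults to $HERITRIX_HOME/conf.
list_heritrix_ports() {
    conf_dir=${1:-$HERITRIX_HOME/conf}
    for f in "$conf_dir"/heritrix-*.properties
    do
        [ -f "$f" ] || continue     # nothing matched the glob
        port=${f##*/heritrix-}      # drop the directory and "heritrix-"
        port=${port%.properties}    # drop the ".properties" suffix
        echo "$port"
    done
}
```

You could then loop over the output and call the start script once per
port, for example 'for p in $(list_heritrix_ports); do ./start-heritrix.sh
$p; done', where start-heritrix.sh is whatever name you saved the script
under (that name is my assumption, not something Heritrix provides).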
Sorry if this isn't clear --- I haven't had my morning coffee yet.
-tree
--
Tom Emerson Basis Technology Corp.
Software Architect http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"