archive-crawler
Messages 5821 - 5850 of 6142
5821
At Thu, 30 Apr 2009 12:32:37 -0400, ... Hi Eric - I think that most tools depend to some extent on hijacking the display system and turning that into a raster...
Erik Hetzner
e_hetzner
May 3, 2009
1:39 am
5822
Cool! It's one of those things that seems like everybody wants it, but no one has quite figured out. And the various "services" like thumbshots all feel kinda...
Eric Pugh
dep4b
May 3, 2009
3:19 pm
5823
At Sun, 3 May 2009 11:19:17 -0400, ... Here is the text (thanks to Mark Phillips for this): Khtml2png - http://khtml2png.sourceforge.net/ “Khtml2png is a ...
Erik Hetzner
e_hetzner
May 4, 2009
5:42 pm
5824
Hi all, How can I tweak heritrix to crawl only within a seed? E.g. if my seed is www.espn.com, I would like to retrieve/download only links within espn.com. I...
enigmacodes
May 6, 2009
2:46 am
5825
... If you're just starting with Heritrix, we recommend using 1.14.3. With the default configuration, crawling will generally stay on the sites defined by the...
Gordon Mohr
gojomo
May 6, 2009
3:40 am
5826
Hi Gordon, Thanks for the reply. I have been using heritrix 2.0.0 for a couple of months. Is the process you have mentioned the same for 2.0.0? Thanks...
enigmacodes
May 6, 2009
7:29 am
5827
Hey everyone, I am experimenting with Heritrix to try out some simple search algorithms that I have designed. Unfortunately, my bandwidth sucks hence I will...
disappearedng@...
disappearedn...
May 8, 2009
4:10 pm
5828
... I have the same question. I have a bunch of URLs, resolved with handle.net. How do I configure heritrix2 to follow redirection from hdl and then crawl only the...
raffaele messuti
raffaele@...
May 9, 2009
4:43 pm
5829
Hey everyone, I have been experimenting with heritrix over the weekend but was not able to obtain any fruitful results. I have read the documentation quite...
disappearedng@...
disappearedn...
May 10, 2009
10:15 am
5830
I always recommend starting with the default rules, then making individual changes that are each understood. You've left off the PrerequisiteAcceptDecideRule...
Gordon Mohr
gojomo
May 11, 2009
7:42 pm
5831
The same rules and settings apply in 2.x. - Gordon @ IA...
Gordon Mohr
gojomo
May 11, 2009
7:43 pm
5832
To crawl just the seed page and its inline resources itself -- and not continue crawling other pages -- you can set the 'max-hops' value in the ...
Gordon Mohr
gojomo
May 11, 2009
7:47 pm
5833
Thanks a lot, I got it to work finally. Appreciate it...
enigmacodes
May 12, 2009
5:56 am
5834
I used the default regex expression with Heritrix 2.0.2 to display remaining URIs without a problem. I used the same expression in 1.14.3 and get no results....
vtkingc
May 12, 2009
6:45 pm
5835
Make sure there are no stray spaces in your 1.14.3 regex. Use the frontier report or other status info to make sure there are still URIs queued. Also, I don't...
Gordon Mohr
gojomo
May 12, 2009
7:20 pm
5836
Would you know how the global.sheet should be modified to get this done in Heritrix 2.0.0? Thanks...
enigmacodes
May 14, 2009
7:23 am
5837
Hello, How do I control overrides from the command line? I am able to control Heritrix and the running job from the command line using: java -jar...
João Miranda
miranda_fccn
May 14, 2009
2:02 pm
5838
Hi all , I would like to know if heritrix has some modules or functions to support sitemaps[1] and the sitemap protocol[2]. Especially if heritrix is parsing...
juergen@...
May 14, 2009
4:54 pm
5839
Hello. Could somebody help me use this tool? It won't work on my computer with Windows XP. I tried hard, but it still won't work. I have followed...
developmentalnerd
developmenta...
May 16, 2009
7:10 pm
5840
I'm running a job seeded with 7M urls. It ran to about 25% completion, then died. I'm now trying to restart the job (new job based on recovery-log), but the...
Steven Webb
scumola
May 17, 2009
2:20 am
5841
One simple thing I've done is move the seedlist file from the job directory to something else, and put in a one site seed list. After all, by now, the whole...
John Lekashman
lekash
May 17, 2009
2:33 am
5842
It seems you need to change the attributes of the file named "jmxremote.password" so that only the owner can change it (in Windows you can use the command: cacls). If...
邢玉梅
happyxinglele
May 18, 2009
1:07 am
5843
It seems you need to change the attributes of the file named "jmxremote.password" so that only the owner can change it (in Windows you can use the command: cacls). If...
邢玉梅
happyxinglele
May 18, 2009
1:10 am
5844
The jobs and profiles are under the directory ./jobs; the completed job should be named "completed-randomNumber". Copy this directory and rename it to...
邢玉梅
happyxinglele
May 18, 2009
7:40 am
5845
Can you provide some details? A screenshot of what the page looks like? Here is what the Plone project recommends as an approach for asking for help: ...
Eric Pugh
dep4b
May 18, 2009
2:08 pm
5846
To avoid the message about password file protection, I couldn't simply set the file as read-only. After right-clicking on it and selecting the Properties option, I...
jeanpierrevitulli
jeanpierrevi...
May 18, 2009
4:58 pm
5847
My search engine course just released a Java Sitemap Parser on SourceForge available at http://sourceforge.net/projects/sitemap-parser/ This could be...
Frank McCown
mccownf
May 20, 2009
9:15 pm
5848
thanks for the suggestion. i've created a feature request on our issue tracker (Jira) for this. http://webarchive.jira.com/browse/HER-1638 /steve...
siznax
stearcorg
May 20, 2009
10:18 pm
5849
Hi, I am still wondering if there's any publicly available arc writer for integrating Heritrix with Solr? how did you integrate both systems? Thanks! Tony -- ...
Tony Wang
gwangcs
May 20, 2009
10:32 pm
5850
actually, there is already an issue open for this. http://webarchive.jira.com/browse/HER-1385 /steve...
siznax
stearcorg
May 20, 2009
10:51 pm