archive-crawler
Messages

Messages 5656 - 5685 of 6147
5656
Dear Heritrix Experts, I am a Heritrix newbie. I downloaded Heritrix and Maven, but when I try building, I receive the following error: $ maven dist [ ... ] ...
neil_daswani
Feb 8, 2009
3:47 am
5657
You don't say what version you're trying to build, but it sounds similar to the issue recently discussed in this thread, which has some tips for working around...
Gordon Mohr
gojomo
Feb 8, 2009
10:47 pm
5658
Hey guys! I'm new at the Heritrix game (and to this forum), so I apologize in advance if I've missed something in the documentation or in prior discussions,...
Scott Ganyo
scottganyo
Feb 10, 2009
5:51 pm
5659
Hey guys! I'm new to Heritrix, so I apologize in advance if I've missed something in the documentation, but I have a question about DecideRule in Heritrix. I'm...
wp1740
wp1740@...
Feb 11, 2009
5:55 pm
5660
The best approach is to pick a similar DecideRule and use it as a model. The method for making it appear in the web UI varies between 1.14.x and 2.0.x. In...
Gordon Mohr
gojomo
Feb 12, 2009
8:11 pm
5661
Welcome to the Heritrix community! Yes, you always need at least one seed, a place to start crawling before any concrete URIs are discovered by the...
Gordon Mohr
gojomo
Feb 12, 2009
8:43 pm
5662
Remember, you can vote on bugs and enhancement-requests in the Heritrix JIRA issues tracker: ...
Gordon Mohr
gojomo
Feb 12, 2009
8:47 pm
5663
Thank you, Gordon! That's just what I needed. BTW: What are the possible types for the parameter Object in: public Object decisionFor(Object object) { } ...
Scott Ganyo
scottganyo
Feb 12, 2009
10:18 pm
5664
In Heritrix 1.x, it's essentially always a CandidateURI (and sometimes also a CrawlURI). In Heritrix 2.0, it's defined as a ProcessorURI (instead of Object)...
Gordon Mohr
gojomo
Feb 13, 2009
1:13 am
5665
A test run of: * Heritrix 1.14.2 on an AWS/EC2, small instance, with 100 worker threads, 1.3M seeds, 900MB heap Has the following resource utilization stats: *...
pbaclace
Feb 13, 2009
2:16 am
5666
I also just saw a StackOverflowError in the web UI of Heritrix 1.14.2 right after I clicked Submit Job after composing a job from a previous one. ... An error...
pbaclace
Feb 13, 2009
5:00 am
5667
Does it happen every time you follow the same steps? Does it happen if you turn FINE logging off? It almost looks like there's an Attribute with a reference...
Gordon Mohr
gojomo
Feb 13, 2009
7:41 am
5668
Hi, I just started using Heritrix and have a few questions: How do I analyse the ARC files? How do I filter out ads and other media... please any amount of help...
nakulmudgal@...
nakulmudgal...
Feb 13, 2009
6:53 pm
5669
... Yes. The Default profile did not have the problem, but my customized profile did. ... I commented-out: org.archive.crawler.level = FINE and restarted...
pbaclace
Feb 13, 2009
8:40 pm
5670
Hello, I think the NotMatches* deciderules in the 2.0.2 branch are missing a static KeyManager.addKeys() call, which is preventing them from working properly....
Roger Caplan
rogercaplan
Feb 15, 2009
6:49 pm
5671
A good way to contribute patches is to attach them to issues in the project JIRA issue tracker: http://webteam.archive.org/jira There's already an issue about...
Gordon Mohr
gojomo
Feb 16, 2009
2:11 am
5672
Some "special" no-operation decide rules that I think would be handy: * Comment. Sometimes I would like to put a comment in a rule sequence and this would be a...
pbaclace
Feb 16, 2009
3:23 am
5673
The "un-knotting" performance change worked. I see a 2X speedup in Heritrix v1.14.2: * 460KB/sec (from 230KB/sec) network usage * 100% CPU with load between...
pbaclace
Feb 16, 2009
4:05 am
5674
Hi Paul, these sound like rather interesting modifications. Would you mind sharing your changes as a diff patch? Best regards Olaf Freyer...
pandae667
Feb 16, 2009
3:41 pm
5675
Hi, I'm a beginner with Heritrix and I want to discard files with the same digest. I'm using Heritrix 2.0.0; this is my configuration: root=map,...
Miguel Olivares
miguelolivar...
Feb 16, 2009
8:03 pm
5676
Hi, I created a DecideRule that checks, for every CrawlURI, which methods (POST, PUT, DELETE, etc.) are allowed. To do this, I am using the HTTP OPTIONS method...
peter.goras
Feb 16, 2009
9:40 pm
5677
These sound useful and we'd be happy to integrate them into the core if contributed. Also, other simple custom rules can be implemented with Beanshell scripts,...
Gordon Mohr
gojomo
Feb 16, 2009
9:50 pm
5678
I recommend beginners use Heritrix 1.14.x versions; there is better documentation and fewer gotchas when customizing the configuration. I can't tell from the...
Gordon Mohr
gojomo
Feb 16, 2009
10:01 pm
5679
Because the exception is triggered in toString() code called from... ... ...that code, rather than your checkOptionsMethod() code, would offer the best hints...
Gordon Mohr
gojomo
Feb 16, 2009
10:06 pm
5680
ServerCache has been noted as a bottleneck before, so this is a very welcome result. Can you post a patch either here or to a JIRA issue for others to review...
Gordon Mohr
gojomo
Feb 16, 2009
10:26 pm
5681
Hi, is it possible to connect two decide rules with a logical AND? Let's say I want the URI to match a RegExp on the one hand, and on the other hand the URI...
peter.goras
Feb 17, 2009
3:04 pm
5682
Hi, I hope I'm not missing something obvious, but I believe I remember being able to modify "bdb-cache-percent", which is particularly useful to avoid OOM when...
Holger Lausen
hlausen
Feb 17, 2009
4:42 pm
5683
Hi Holger ... in the profile folder is a file called config.txt: # The percentage of JVM RAM to use for the BDB database during a live crawl. # Defaults to...
Juergen Umbrich
juergen@...
Feb 17, 2009
5:20 pm
5684
In 2.0.x, the BDB environment is already open during job configuration, to store SURT-to-override-sheet associations. Thus, its cache-size is determined...
Gordon Mohr
gojomo
Feb 17, 2009
6:01 pm
5685
I will use the following bug report for the proposed patch: http://webteam.archive.org/jira/browse/HER-1609 The patch is not yet ready; I will post a message...
pbaclace
Feb 18, 2009
2:30 am