pavuk · Pavuk Webgrabber Mailing List
Re: [pavuk] two matched substrings and fnrules

On Tue, 10 Jan 2006, frederic_meunier_combet wrote:

> Some URLs contain the HTTP session id:
> http://myhost/mypage?cat=12&sessionId=jkkjnfjnfjfiNJJHJNHdnjddjhdnj&doc=12
>
> For the local document name I would like to keep only the start of
> the URL and its end, by using
> -fnrules R '(.*)sessionId.*(doc=.*)' '$1$2'
>
> This causes an exception:
> 5 [main] pavuk 2612 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
> 872 [main] pavuk 2612 open_stackdumpfile: Dumping stack trace to pavuk.exe.stackdump
>
> I also tried with an extended rule, using the sc command, but it is
> not possible to invoke the macro "$x" in such rules.

Well, I think this is a bug. Could you modify your report so that I can
reproduce it? E.g. find a web page that I can also access and adapt the
command line accordingly.
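
For reference, the intended effect of the rule quoted above can be checked outside pavuk. The following is only a rough sketch in Python, assuming that Python's `re` module treats this particular pattern the same way pavuk's regex engine does; the function name `strip_session_id` is made up for illustration and is not part of pavuk.

```python
import re

# Pattern and replacement taken from the -fnrules line quoted above.
# pavuk writes back-references as $1 and $2; Python writes them as \1 and \2.
PATTERN = re.compile(r'(.*)sessionId.*(doc=.*)')


def strip_session_id(url: str) -> str:
    """Keep the part before 'sessionId' and the trailing 'doc=...' part.

    Hypothetical helper, only meant to show what the rule should produce.
    A URL that does not match the pattern is returned unchanged.
    """
    return PATTERN.sub(r'\1\2', url, count=1)


if __name__ == '__main__':
    url = ('http://myhost/mypage?'
           'cat=12&sessionId=jkkjnfjnfjfiNJJHJNHdnjddjhdnj&doc=12')
    print(strip_session_id(url))
    # prints: http://myhost/mypage?cat=12&doc=12
```

So the rule itself looks reasonable, and a crash while applying it points at pavuk rather than at the pattern.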

> I use pavuk 0.9.33 under Windows, compiled with Cygwin.

Ah, bad timing. I released 0.9.34 yesterday.

Ciao
--
http://www.dstoecker.de/ (PGP key available)





Tue Jan 10, 2006 8:05 am

stoeckerd

Hello, Some URLs contain the HTTP session id: http://myhost/mypage?cat=12&sessionId=jkkjnfjnfjfiNJJHJNHdnjddjhdnj&doc=12 I would like to keep for local...
frederic_meunier_combet
Jan 10, 2006
7:58 am

... Well, I would think this is a bug. Could you modify your report so, that I can reproduce it. E.G. search an internet page I can access also and adapt the...
Dirk Stoecker
stoeckerd
Jan 10, 2006
8:08 am

... so, that I ... and adapt ... Please try the following command line: pavuk -fnrules R '(.*)session=.*prod=([0-9]*).*' '$1/$2' -url_rpattern 'call_cat=3'...
frederic_meunier_combet
Jan 15, 2006
10:57 pm

... I fixed the bug in CVS:
diff -r1.10 lfname.c
326,327c326,327
< strncpy(pd, p1, sizeof(pom));
< pd[sizeof(pom) - 1] = '\0';
... Ciao -- ...
Dirk Stoecker
stoeckerd
Jan 16, 2006
11:57 am

It's working fine with v0.9.33. I haven't tried v0.9.34 yet, but it should work as well. Thanks Best Regards...
frederic_meunier_combet
Jan 18, 2006
8:30 am