xenu-usergroup · Xenu Linkchecker Usergroup
Message #102 of 1253
Re: checking links in javascript

> Regexp reXenu =
> "javascript:(.*)\\(['\"](.*(s?html?|gif|jpe?g|png|jsp|cfm|zip|exe|aspx?|pl|pdf|xml|ra|asx|ram|swf|php)(\\?.*)?)['\"](.*)";

> so I get a match for
>
> javascript:openPopup("http://www.backbonemag.com/php_site/home.php?m_column_id=php_news/wmview.php?ArtID=702",850,550);

> and this regexp works for all links on
> http://webfeat.com/html/news/articles.asp

> but no match on your site for
> javascript:openSite('http://www.airnyx.de/')
> javascript:openSite('http://www.meteor-wifi.com/en/')
> javascript:openSite('http://www.swisscom-eurospot.com/index.php/internet/de')
> javascript:openSite('http://www.mycloud.net')
> javascript:openSite('http://www.tiscali.de')

I suggest this regexp instead:

javascript:\w+\s*\(\s*['\"]((?:ftp|https?)://[^'\"]+?)['\"](?:\s*,[^,]+?\s*)*\s*\);

Tested in PowerGREP, where it works fine, so this regexp is Perl-compatible.
To use it in C++ source you only need to escape the characters that are
special inside a C++ string literal (backslashes and double quotes) with a
backslash.

I find this regexp better than the one suggested above because it catches not
only URLs ending in one of the listed extensions but any sequence of
characters that are not single or double quotes, thanks to the negated
character class [^'\"] repeated with a plus (written [^'\"]+? in the regexp).
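To illustrate the difference, here is a minimal sketch in Python (an assumption: its `re` module is not the library Xenu or PowerGREP uses, but it handles both patterns the same way for this input). `BY_EXTENSION` is the quoted pattern with the C++ string escapes removed; `BY_NEGATED_CLASS` is only the URL-capturing part of the suggested pattern. Both names are made up for the example.

```python
import re

# Extension-based pattern from the quoted message (C++ escapes removed).
BY_EXTENSION = re.compile(
    r"""javascript:(.*)\(['"](.*(s?html?|gif|jpe?g|png|jsp|cfm|zip|exe|aspx?|pl|pdf|xml|ra|asx|ram|swf|php)(\?.*)?)['"](.*)"""
)

# URL-capturing part of the suggested pattern: anything up to the next quote.
BY_NEGATED_CLASS = re.compile(r"""['"]((?:ftp|https?)://[^'"]+)['"]""")

link = "javascript:openSite('http://www.meteor-wifi.com/en/')"

# The URL does not end in a listed extension, so the first pattern finds nothing.
print(BY_EXTENSION.search(link))                # None
# The negated class accepts any characters that are not quotes.
print(BY_NEGATED_CLASS.search(link).group(1))   # http://www.meteor-wifi.com/en/
```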

Here is my regexp output:
http://www.airnyx.de/
http://www.meteor-wifi.com/en/
http://www.swisscom-eurospot.com/index.php/internet/de
http://www.mycloud.net
http://www.tiscali.de
http://www.backbonemag.com/php_site/home.php?m_column_id=php_news/wmview.php?ArtID=702
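
The same check can be repeated outside PowerGREP. Below is a minimal sketch in Python, assuming its `re` module behaves like a Perl-compatible engine for this pattern; `extract_url` is a hypothetical helper, not part of Xenu. One caveat: the pattern ends in `\);`, while the `openSite(...)` links are quoted in this thread without a semicolon, so the sketch appends one on the assumption that the page source carries it.

```python
import re

# The suggested pattern, joined onto one line (raw string, so no extra escaping).
PATTERN = re.compile(
    r"""javascript:\w+\s*\(\s*['\"]((?:ftp|https?)://[^'\"]+?)['\"](?:\s*,[^,]+?\s*)*\s*\);"""
)

def extract_url(href):
    """Hypothetical helper: return the captured URL, or None if nothing matches."""
    match = PATTERN.search(href)
    return match.group(1) if match else None

# Links from the thread; the trailing ";" on the openSite calls is an assumption.
LINKS = [
    "javascript:openSite('http://www.airnyx.de/');",
    "javascript:openSite('http://www.meteor-wifi.com/en/');",
    "javascript:openSite('http://www.swisscom-eurospot.com/index.php/internet/de');",
    "javascript:openSite('http://www.mycloud.net');",
    "javascript:openSite('http://www.tiscali.de');",
    'javascript:openPopup("http://www.backbonemag.com/php_site/home.php?'
    'm_column_id=php_news/wmview.php?ArtID=702",850,550);',
]

for link in LINKS:
    print(extract_url(link))
```

Without the trailing semicolon, `extract_url` returns `None` for the same links, because the pattern requires `);` at the end of the call.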

--
Eugeny



Sun Oct 3, 2004 11:19 am

accmailer
... Yes. ... So I added (\\?.*)? (each \ must be \ed an extra time because of the C++ compiler) into my string, thus getting Regexp reXenu = ...
Tilman Hausherr
geo4497
Oct 2, 2004
6:04 pm

... "javascript:(.*)\\(['\"](.*(s?html?|gif|jpe?g|png|jsp|cfm|zip|exe|aspx?|pl|pdf|xml|ra|asx|ram|swf|php)(\\?.*)?)['\"](.*)"; ... ...
Eugeny.Sattler@...
accmailer
Oct 3, 2004
12:45 pm

... I must admit I haven't tested it, because I don't understand it. My understanding of regular expressions is very basic. I don't know what \w or \s is. It...
Tilman Hausherr
geo4497
Oct 4, 2004
9:10 pm

hi tilman, this is absolutely wonderful. i tested version 1.2g(beta), and have the following comments: 1. it crashed when i copied your regex with the \ and...
frank visser
reusvisser
Oct 4, 2004
10:47 pm

... I suspect that your newsreader broke the long line. I did notice that some errors do crash the regexp :-( That happens when you use "external" software......
Tilman Hausherr
geo4497
Oct 5, 2004
7:16 am

hi tilman, got good results with: Javascript=javascript:.*\(['"](.*[^'](\?.*)?)['"](.*) it found all URLs and top domains. it did not catch: javascript:openWin...
frank visser
reusvisser
Oct 5, 2004
3:49 pm

... ('http://udp.intercea.co.uk/deutsch/bus_popup.htm','432','287'); ... By using my regexp which I have already posted to this group. If you have lost that my...
Eugeny.Sattler@...
accmailer
Oct 6, 2004
4:37 am

hi eugeny, using your regex from an earlier mail: javascript:\w+\s*\(\s*['\"]((?:ftp|https?)://[^'\"]+?)['\"](?:\s*,[^,]+?\s*)*\s*\); xenu crashes after a...
frank visser
reusvisser
Oct 6, 2004
2:56 pm

... remove it. ... "?" means the entity can happen once or not at all. https? matches http and https....
Tilman Hausherr
geo4497
Oct 6, 2004
5:15 pm

... ['\"] is equal to ['"] ... That is correct. But question mark followed by colon in a regex construct (?:something) has another meaning. (option1|option2)...
Eugeny.Sattler@...
accmailer
Oct 7, 2004
7:59 am

Hi eugeny, Tnx, no i am reading the powergrep stuff. Let me cut this discussion short: did you test your regex with xenu 1.2g beta? If I do, it keeps crashing,...
Frank Visser
reusvisser
Oct 7, 2004
9:49 am

Hi, Is there anyone on this list who has used Xenu's Link Sleuth to check huge websites (300000 links) ? I'm asking because one user claims it always crashes...
Tilman Hausherr
geo4497
Oct 10, 2004
2:08 pm

... javascript:\w+\s*\(\s*['"]((?:ftp|https?)://[^'"]+?)['"](?:\s*,[^,]+?\s*)*\s*\); ... There are different regex libraries, each with its own peculiarities...
Eugeny.Sattler@...
accmailer
Oct 7, 2004
10:15 am

Hi eugeny, Thanks a million, i will try it out today. I want to understand how the regex fits into Xenu code, doesn't Tilmans regex require three parts: 1. the...
Frank Visser
reusvisser
Oct 7, 2004
11:03 am

... No, my regular expression is to be set in the INI file. Which you did when testing. I might update Xenu for other reasons, like more flexibility, and a...
Tilman Hausherr
geo4497
Oct 7, 2004
11:59 am

guys, i think we are getting near the end of this javascript link saga. eugeny, i get perfect results with your simplified regex: javascript:[_a-z0-9]+ *\(...
frank visser
reusvisser
Oct 8, 2004
2:02 pm

this page http://www.quickbrowse.com/whatsnew.cgi contains this piece of HTML code <a href="javascript:popup('/freelimits.html',400,300)">Some features</a> So,...
Eugeny.Sattler@...
accmailer
Oct 27, 2004
6:50 am

Hi eugeny, I noticed this code isn't matched by your regex: javascript: function('URL') When there's a spacer between "javascript" and "function", so I have...
Frank Visser
reusvisser
Oct 31, 2004
9:14 pm

... javascript: is missing. TH...
Tilman Hausherr
geo4497
Oct 31, 2004
10:32 pm

Eugeny, quick question: can your regex handle root relative URLs too? javascript:openJump('/folder/folder/file.htm'); frank ... ...
frank visser
reusvisser
Oct 4, 2004
9:16 pm

I realize I hate these approval delays. Is there a way that postings from people who have made on-topic posts get through immediately? Or people who have been...
Tilman Hausherr
geo4497
Oct 4, 2004
9:19 pm

tilman, i agree, i have selected the "new members" option. frank ... postings ... Or ... for ... member :)...
frank visser
reusvisser
Oct 4, 2004
9:25 pm

... "New" are those who joined after you have made your group restricted to post to. They do not become "old" automatically after some time. You have to make...
Eugeny.Sattler@...
accmailer
Oct 5, 2004
4:40 am

... First, my regexp has this part : (?:ftp|https?):// It means it will catch any string that starts either from ftp:// http:// https:// So as long as this...
Eugeny.Sattler@...
accmailer
Oct 5, 2004
4:33 am

... That "concatenation" is there, this has never been a problem/question because it is the same procedure I use for every other link....
Tilman Hausherr
geo4497
Oct 5, 2004
6:55 am

Frank , I strongly recommend you to download a trial version of PowerGREP from www.powergrep.com It contains an excellent manual in PDF format on building...
Eugeny.Sattler@...
accmailer
Oct 5, 2004
5:49 am

... "\w" stands for wORD symbol, i.e. any letter or any digit "\w+" will catch any alphanumeric *word*, like "dog", "cat" or "a123bc" So in our example \w+...
Eugeny.Sattler@...
accmailer
Oct 5, 2004
8:14 am