straight_talking_java · Former JDJ Straight Talking list
Message #60114 of 60584
RE: [ST-J] Help with Scrapers

Calvin,

Yes, they are scraping the page and displaying it in a child frame. They are
then using JavaScript to prefill fields. So yes, I have considered:
1. obfuscating element names
2. using JavaScript to dynamically build elements
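For reference, idea 1 could look something like the following minimal sketch, assuming a Node.js server (the list is Java-centric, but the technique does not depend on the language). Each session gets its own field names derived with an HMAC, so a script hard-coded against a field name breaks on the next login. fieldAlias, buildFieldMap, decodeParams, the secret, and the "f_" prefix are all hypothetical names for illustration.

```javascript
// Hypothetical sketch: per-session aliases for form field names.
const crypto = require("crypto");

// Derive a stable alias for one field within one session.
// The secret and the "f_" prefix are illustrative assumptions.
function fieldAlias(secret, sessionId, fieldName) {
  const digest = crypto
    .createHmac("sha256", secret)
    .update(sessionId + ":" + fieldName)
    .digest("hex");
  return "f_" + digest.slice(0, 12);
}

// Build both directions of the mapping: toAlias is used when rendering
// the page, fromAlias when reading the submitted form.
function buildFieldMap(secret, sessionId, fieldNames) {
  const toAlias = {};
  const fromAlias = {};
  for (const name of fieldNames) {
    const alias = fieldAlias(secret, sessionId, name);
    toAlias[name] = alias;
    fromAlias[alias] = name;
  }
  return { toAlias, fromAlias };
}

// Translate submitted parameters back to the real names, dropping
// anything that is not a known alias for this session.
function decodeParams(fromAlias, params) {
  const decoded = {};
  for (const [key, value] of Object.entries(params)) {
    if (Object.prototype.hasOwnProperty.call(fromAlias, key)) {
      decoded[fromAlias[key]] = value;
    }
  }
  return decoded;
}
```

Because the real names only appear in server code, most of the maintenance cost stays in the rendering and decoding steps. A scraper that locates fields by label text or position would still get through.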

Both of these have maintenance drawbacks, as you pointed out. I was hoping
that someone on the list could point me toward something I had not
thought of. As I said before, great minds think alike ;)

Mica

-----Original Message-----
From: straight_talking_java@yahoogroups.com
[mailto:straight_talking_java@yahoogroups.com]On Behalf Of Calvin Yu
Sent: Wednesday, July 15, 2009 8:45 AM
To: straight_talking_java@yahoogroups.com
Subject: Re: [ST-J] Help with Scrapers





It's possible that the software is not sending the right referrer addresses,
so you may be able to detect that.
Do you know why they are using the scraper on your site? Is it to scrape
data from it, or to automate some interaction repeatedly? I think any
solution you come up with will have to hook into what their original
intention is.
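A referrer check along those lines might look like this sketch, again assuming a Node.js-style server. isAllowedReferrer and the allowed-origin list are hypothetical. The Referer header is supplied by the client, so this only catches software that sends a missing or foreign value; it proves nothing on its own.

```javascript
// Hypothetical sketch: accept a request only if its Referer header
// points at one of our own origins.
function isAllowedReferrer(refererHeader, allowedOrigins) {
  if (!refererHeader) {
    // Browsers and proxies sometimes strip Referer, so a missing value
    // means "not verified" rather than "definitely a scraper".
    return false;
  }
  let origin;
  try {
    origin = new URL(refererHeader).origin; // e.g. "https://app.example.com"
  } catch (e) {
    return false; // malformed header
  }
  return allowedOrigins.includes(origin);
}
```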

I was on a project a couple of years ago where I did some scraper analytics
and researched some potential prevention techniques. In almost all cases,
it really becomes an arms race. Almost any solution you come up with, they
will be able to defeat. And since most techniques involve obfuscation, any
solution will have to be weighed against how much more difficult it would
make working on that functionality in the future.

Calvin

On Tue, Jul 14, 2009 at 6:05 PM, David Rosenstrauch
<darose@...> wrote:

>
> Scot Mcphee wrote:
> > On the login page, put a check box (with a CAPTCHA) that they must fill
> > out every time they log in, stating "I agree with the terms and
> > conditions", one of which is no screen scrapers. Use a variety of
> > detection methods. When detection occurs, disable the account for
> > breach of terms and conditions.
> >
> > Also, another way: generate the entire interface programmatically with
> > something like Dojo. Make every Ajax request (which, say, gets the JSON
> > code to generate the interface, so it's entirely necessary) send
> > information on whether a scraper has been detected. If so, as well as
> > logging it for the above breach of terms and conditions, send them a
> > tight JavaScript loop that eats 100% CPU for, let's say, a minute.
> >
> > Make all this code obfuscated (take Dojo into production mode and it's
> > already pretty obfuscated), and generate and rotate key method names in
> > the JavaScript.
>
> Maybe there's also some way to encrypt the traffic and/or parts of the
> page, making it inaccessible to the ActiveX control.
>
> DR
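Scot's "disable the account when detection occurs" step can be sketched as a small per-account counter. The DetectionTracker class and the threshold are hypothetical; a real system would persist the counts and log each report for the terms-and-conditions audit trail.

```javascript
// Hypothetical sketch: disable an account after repeated detection reports.
class DetectionTracker {
  constructor(threshold) {
    this.threshold = threshold; // reports tolerated before disabling
    this.counts = new Map(); // accountId -> number of detection reports
    this.disabled = new Set();
  }

  // Record one detection report; returns true if the account is now disabled.
  report(accountId) {
    const n = (this.counts.get(accountId) || 0) + 1;
    this.counts.set(accountId, n);
    if (n >= this.threshold) {
      this.disabled.add(accountId);
    }
    return this.disabled.has(accountId);
  }

  // Intended to be checked at login and on each request.
  isDisabled(accountId) {
    return this.disabled.has(accountId);
  }
}
```

A threshold above 1 leaves room for a single false positive before an account is locked.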

Wed Jul 15, 2009 3:22 pm

mica.cooper

Thread: Help with Scrapers

Guys, Got some folks who are illegally accessing our pages using user login information. They open our site in an IFrame in Internet Explorer, then script the...
Mica Cooper (mica.cooper), Jul 6, 2009 8:02 pm

You can look at "iframe busting" code. This was discussed a lot recently when Digg introduced the DiggBar. Rob Diana...
Robert Diana (rob_diana), Jul 6, 2009 8:18 pm

OK, here is what I found. top and self are Window objects in JavaScript. Doing a check like: if (top != self) { top.location.href =...
Mica Cooper (mica.cooper), Jul 7, 2009 1:10 pm
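The classic frame-busting check Mica describes can be sketched as below. The window object is passed in as a parameter (an assumption made so the logic can run outside a browser); in a page you would call bustFrame(window). As noted later in the thread, scrapers can counter this with "frame-breaker breaker" code, so it is a first line of defense at best.

```javascript
// Frame-busting check: if this page is not the top-level window,
// navigate the top window to this page's URL.
function bustFrame(win) {
  if (win.top !== win.self) {
    try {
      win.top.location.href = win.self.location.href;
    } catch (e) {
      // Some browsers block or throw on cross-origin top navigation.
      return false;
    }
    return true;
  }
  return false; // not framed, nothing to do
}
```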

... What if they have disabled JavaScript in the browser they are using? I guess that's not a problem if your site requires JavaScript to actually function,...
Eric Rizzo (asmalltalker), Jul 7, 2009 2:19 pm

Eric, Simple... it's not a consumer site; JavaScript is a requirement. Mica...
Mica Cooper (mica.cooper), Jul 7, 2009 2:24 pm
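One common answer to Eric's question is to hide the page by default and let script reveal it only when it is not framed, so disabling JavaScript leaves nothing usable. This sketch assumes the page starts with a style element whose id is antiFrame (a hypothetical name) that hides the body; revealIfNotFramed is likewise hypothetical.

```javascript
// Assumes the page contains, before any content:
//   <style id="antiFrame">body { display: none !important; }</style>
// Without JavaScript, or when framed, the body stays hidden.
function revealIfNotFramed(win, doc) {
  if (win.top !== win.self) {
    return false; // framed: leave the content hidden
  }
  const guard = doc.getElementById("antiFrame");
  if (guard && guard.parentNode) {
    guard.parentNode.removeChild(guard); // drop the hiding rule
  }
  return true;
}

// In the page: revealIfNotFramed(window, document);
```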

Guys, I put in some JavaScript Ajax to detect the scrapers: if (top != self) { sendAjaxMsg("some message"); } I got the first one... then nothing for almost a...
Mica Cooper (mica.cooper), Jul 14, 2009 4:35 pm
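Mica's detection snippet relies on his own sendAjaxMsg helper, which isn't shown. The sketch below takes the sending function as a parameter instead; reportIfFramed, the payload fields, and the /framing-report endpoint are hypothetical. Because the report is sent by code running on the scraper's side, the scraper can filter it out, as later replies point out.

```javascript
// Hypothetical sketch: report framing to the server via an injected sender.
function reportIfFramed(win, doc, send) {
  if (win.top === win.self) {
    return false; // not framed
  }
  let topUrl = null;
  try {
    // Reading the top window's URL throws across origins, so guard it.
    topUrl = String(win.top.location.href);
  } catch (e) {
    topUrl = null;
  }
  send({
    event: "framed",
    page: win.self.location.href,
    referrer: doc.referrer || null, // often the framing page's URL
    topUrl: topUrl,
    at: Date.now(),
  });
  return true;
}

// In the page (the endpoint is hypothetical):
// reportIfFramed(window, document, function (payload) {
//   var xhr = new XMLHttpRequest();
//   xhr.open("POST", "/framing-report", true);
//   xhr.setRequestHeader("Content-Type", "application/json");
//   xhr.send(JSON.stringify(payload));
// });
```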

... Wow, those guys are pure evil Mica. I guess they are using the frame-breaker breaker code. http://alicious.com/2009/remove-digg-bar-from-website/ I'm not...
Simon MacDonald (macdonst), Jul 14, 2009 4:47 pm

Well, I am assuming that you can track any server communication, like trying to view or save data in your application. What you could do is add a request ...
Robert Diana (rob_diana), Jul 14, 2009 4:56 pm

That most likely depends on how they filter out the message. If they detect the Ajax request, you might not be able to do much. If they just detect the top != self...
Gunter Sammet (guntersammet), Jul 14, 2009 5:59 pm

Gunter... Been there and done that! Great minds think alike ;) Mica ... From: straight_talking_java@yahoogroups.com ...
Mica Cooper (mica.cooper), Jul 14, 2009 7:57 pm

How is it working? ... -- You too can be a member of the AHWSA...
Jon Strayer (jstrayer2), Jul 16, 2009 11:32 am

Jon, Haven't had time to implement it yet. Mica ... From: straight_talking_java@yahoogroups.com [mailto:straight_talking_java@yahoogroups.com] On Behalf Of Jon...
Mica Cooper (mica.cooper), Jul 16, 2009 9:17 pm

... Will the requests always come from a fixed set of IP addresses? DR...
David Rosenstrauch (darose2), Jul 14, 2009 8:05 pm

David, My understanding is that the client runs an ActiveX plugin in Internet Explorer that creates a parent frame for their software and a child frame for...
Mica Cooper (mica.cooper), Jul 14, 2009 8:13 pm

... Well, I guess what I'm thinking is whether you can just cut them off completely based on IP address. DR...
David Rosenstrauch (darose2), Jul 14, 2009 8:17 pm
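If the requests do come from a known set of addresses, David's IP cutoff is straightforward. A minimal IPv4-only sketch follows; the function names are hypothetical, and ranges such as 203.0.113.0/24 (a documentation range) are placeholders. This only helps if the scraper's traffic comes from a small, stable set of addresses, and behind a proxy or load balancer the client address would have to come from whatever header that proxy sets.

```javascript
// Parse a dotted-quad IPv4 address into an unsigned integer, or null.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const v = Number(p);
    if (v > 255) return null;
    n = n * 256 + v; // plain arithmetic keeps the result unsigned
  }
  return n;
}

// True if ip lies inside the CIDR block, e.g. "203.0.113.0/24".
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const ipN = ipv4ToInt(ip);
  const baseN = ipv4ToInt(base);
  if (ipN === null || baseN === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits);
  // Same network prefix <=> same quotient when divided by the block size.
  return Math.floor(ipN / blockSize) === Math.floor(baseN / blockSize);
}

function isBlocked(ip, blockedCidrs) {
  return blockedCidrs.some((cidr) => inCidr(ip, cidr));
}
```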

On the login page, put a check box (with a CAPTCHA) that they must fill out every time they log in, stating "I agree with the terms and conditions", one of which is no...
Scot Mcphee (scotartt), Jul 14, 2009 10:02 pm

... Maybe there's also some way to encrypt the traffic and/or parts of the page, making it inaccessible to the ActiveX control. DR...
David Rosenstrauch (darose2), Jul 14, 2009 10:05 pm

It's possible that the software is not using the right referrer addresses, so you can maybe detect that. Do you know why they are using the scraper on your...
Calvin Yu (cyu77), Jul 15, 2009 1:47 pm

Calvin, Yes, they are scraping the page and displaying it in a child frame. They are then using JavaScript to prefill fields. So yes, I have considered: 1....
Mica Cooper (mica.cooper), Jul 15, 2009 3:22 pm

You might want to look at implementing a Negative Captcha: http://nedbatchelder.com/text/stopbots.html Calvin ...
Calvin Yu (cyu77), Jul 15, 2009 5:54 pm
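A negative CAPTCHA in the spirit of the linked article might check a honeypot field and the time taken to submit. This is a sketch, not the article's exact scheme: looksAutomated, the field names website and renderedAt, and the threshold are hypothetical, and a real version should sign the timestamp (for example with an HMAC) so it cannot simply be edited. Software that fills only the fields it knows about, as Mica describes, may leave the honeypot empty, so the timing check carries more weight here.

```javascript
// Hypothetical negative-CAPTCHA check. The form is assumed to carry:
//  - "website": a field hidden from humans with CSS (should stay empty)
//  - "renderedAt": the server time (ms) when the form was rendered
function looksAutomated(form, nowMs, minFillMs) {
  const honeypot = form.website;
  if (typeof honeypot === "string" && honeypot.trim() !== "") {
    return true; // hidden field was filled in
  }
  const renderedAt = Number.parseInt(form.renderedAt, 10);
  if (!Number.isFinite(renderedAt)) {
    return true; // timestamp missing or malformed
  }
  return nowMs - renderedAt < minFillMs; // submitted implausibly fast
}
```

For example, looksAutomated(form, Date.now(), 3000) flags a submission made less than three seconds after the form was rendered.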