One of the more interesting characteristics of spider traffic is that spiders
normally don't request images or CSS files. We have been using this to
exclude visits in which no hits to images or CSS were registered, without
depending on IP address at all. Obviously this only works with server-side
logs. I am not sure what software package you are using, but I have done this
with WebTrends and Visual Sciences, so I know it is definitely possible there.
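For anyone without one of those packages, here is a rough Python sketch of the
same idea. It is only an illustration, not how WebTrends or Visual Sciences do
it. It assumes the log hits have already been grouped into visits by whatever
sessionization you use, so each hit arrives as a (visit_id, url) pair. The
names ASSET_EXTENSIONS, is_asset and visits_without_assets are made up for
this example, and the extension list is an assumption you would tune per site.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed list of extensions that count as images or CSS; adjust per site.
ASSET_EXTENSIONS = (".css", ".gif", ".jpg", ".jpeg", ".png", ".ico")


def is_asset(url):
    """Return True if the URL path (query string ignored) ends in an asset extension."""
    return urlsplit(url).path.lower().endswith(ASSET_EXTENSIONS)


def visits_without_assets(hits):
    """Return the set of visit ids that never requested an image or CSS file.

    hits: iterable of (visit_id, url) pairs, already sessionized (assumption).
    """
    saw_asset = defaultdict(bool)
    for visit_id, url in hits:
        # Once a visit has requested any asset, it stays marked as True.
        saw_asset[visit_id] = saw_asset[visit_id] or is_asset(url)
    return {visit_id for visit_id, seen in saw_asset.items() if not seen}
```

Visits returned by visits_without_assets are the candidates to exclude as
probable spiders; everything else is kept.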
Thanks,
Nikolay Gradinarov
Senior Web Analyst
Monster.com
----- Original Message ----
From: Nick Arnett <narnett@...>
To: webanalytics@yahoogroups.com
Sent: Thursday, May 8, 2008 4:26:52 PM
Subject: [webanalytics] Badly behaved robot numbers?
Can anybody here share their experiences with badly behaved robots?
I'm referring to log file analysis that reveals probable robots that
masquerade as browsers in the user-agent header, show up usually for a few
hours and retrieve large numbers of pages. One of our sites is getting a
double-digit percentage of its page views from these, which I suspect are
spambots trying to
harvest email addresses. I'm looking at daily unique combinations of
user-agent and IP address (since almost none of them support cookies) whose
page views are more than two standard deviations from the average.
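A rough Python sketch of that check, in case it helps anyone compare numbers.
It is an illustration under stated assumptions, not the exact script in use:
the input is assumed to be one (day, ip, user_agent) tuple per page view,
"more than two standard deviations from the average" is read as above the
mean only, and the population standard deviation is used. The function name
outlier_sources and the num_sd parameter are made up for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def outlier_sources(page_views, num_sd=2.0):
    """Flag daily (ip, user-agent) sources with unusually many page views.

    page_views: iterable of (day, ip, user_agent) tuples, one per page view.
    Returns {(day, ip, user_agent): count} for every source whose daily count
    exceeds that day's mean by more than num_sd standard deviations.
    """
    counts = Counter(page_views)

    # Collect the per-source counts for each day.
    counts_by_day = {}
    for (day, _ip, _ua), n in counts.items():
        counts_by_day.setdefault(day, []).append(n)

    # One threshold per day: mean + num_sd * population standard deviation.
    thresholds = {
        day: mean(day_counts) + num_sd * pstdev(day_counts)
        for day, day_counts in counts_by_day.items()
    }

    return {
        source: n
        for source, n in counts.items()
        if n > thresholds[source[0]]
    }
```

Page views are heavily skewed, so a mean-plus-two-SD cutoff is a blunt
instrument; it is shown here only because it matches the rule described above.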
For most of the sites I've checked, sources that fit those criteria, after
eliminating known robots (those that properly identify themselves), account
for somewhere around 5 or 6 percent of page views. When these others show up,
that number shoots up into the double digits.
Anybody else have numbers to compare?
The most annoying thing about these is that they aren't practical to block.
Generally, by the time we can identify them, they're gone, presumably using
a new IP address... most of the addresses belong to cable and telecom
companies, so I suspect these are coming from hijacked computers.
Anybody else have ideas about spotting them rapidly enough and accurately
enough to block them? We don't want to block proxy servers, which might
appear the same. I looked at Project Honey Pot, but they don't identify
them fast enough to help.
How much have I scared those of you who rely entirely on page tags? ;-)
Nick
--
Nick Arnett
narnett@mccmedia.com
Messages: 408-904-7198
[Non-text portions of this message have been removed]