Posted by Gordon Burditt on 03/11/07 04:20
>I coded up a hit counter, then extended it to see who was reading my
>blog, by matching IP. The problem is that I am swamped by crawlers.
Nice crawlers for search engines identify themselves in the user
agent string. Also, nice crawlers obey robots.txt, so you can
exclude portions of your site if you want. Of course, the excluded
portions won't be indexed. Evil bots, on the other hand, fake the user
agent strings of ordinary users' browsers.
>How can I detect a human, or a crawler? If I can handle one, I can
>negate it for the other.
If you had in mind such things as CAPTCHAs (decoding warped text in
images), unfortunately the operators of evil bots can hire humans to
solve them.
>Should I somehow use $_SERVER['USER_AGENT']? or something else?
The user agent string is one thing you can use, mostly to detect nice
crawlers. Note that in PHP the key is $_SERVER['HTTP_USER_AGENT'], not
$_SERVER['USER_AGENT'].
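
As a rough sketch of that user agent check, something like the following
might do for a hit counter. The function name and the token list are my
own assumptions for illustration; the list is far from complete, and an
evil bot that fakes a browser's user agent will still be counted as a human.

```php
<?php
// Hypothetical helper: returns true when the user agent string contains a
// token that well-behaved crawlers commonly announce.
function looks_like_nice_crawler(string $userAgent): bool
{
    // Illustrative, incomplete token list (an assumption, not a standard).
    $tokens = ['googlebot', 'bingbot', 'slurp', 'bot', 'crawler', 'spider'];

    // Compare case-insensitively by lowercasing the header once.
    $ua = strtolower($userAgent);
    foreach ($tokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The header is optional, so fall back to an empty string when it is absent.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (!looks_like_nice_crawler($userAgent)) {
    // Count the hit here (your existing counter code).
}
```

Matching on generic substrings such as "bot" keeps the list short, at the
cost of an occasional false positive. A request with no user agent at all
is treated as a non-crawler here; you may prefer to skip those too.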