Posted by Philip Ronan on 08/26/05 14:33
"Basta" wrote:
> I'm trying to retrieve information from a website using PHP and Curl.
> This is the code I use:
>
(snip)
>
> This results in a 403 Forbidden page. However, if I type the URL
> http://teletekst.nos.nl/ in my browser then it works fine (also with
> cookies disabled).
That's probably because the owners of teletekst.nos.nl are fed up with
having idiot robots crawling all over their site and stealing its content.
If you had bothered to visit <http://teletekst.nos.nl/robots.txt> you might
have noticed that robots are not permitted to access this website. You're
getting a 403 response because the server has identified your script as a
robot.
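
For anyone who wants their script to check robots.txt before it requests anything, here is a minimal sketch. It is in Python rather than PHP because the standard library ships a robots.txt parser. The robots.txt body, the `my-script` agent name and the `is_allowed` helper are all assumptions made for illustration; the real file on teletekst.nos.nl may be worded differently.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: an illustrative robots.txt body that shuts out every robot.
# The actual file at http://teletekst.nos.nl/robots.txt may differ.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it parses text you already have and does no
    network access itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "my-script" is a made-up agent name for this example.
    if is_allowed(ASSUMED_ROBOTS_TXT, "my-script", "http://teletekst.nos.nl/"):
        print("robots.txt permits this request")
    else:
        print("robots.txt forbids this request; do not fetch")
```

Note that robots.txt is only a published request. The 403 comes from the server itself, so a script that skips this check still gets refused.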
There are probably some things you could do to bypass the blocks on this
website, but I'm not going to tell you what they are. Create your own
content. Don't steal it from other websites.
--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/