Posted by Samuel on 10/05/12 11:30
>I am working on a spider script but I only want to parse English pages.
>Is there a way I can check to see what language the content is in? I
>suppose I could restrict my spider to just .com, .org, etc. so foreign
>countries would not get parsed.
There are literally thousands of Spanish pages using dot com or dot org
domains (I own a couple of them). AFAIK, anyone in any country, speaking
any language, can register a dot com, so the domain tells you nothing
about the language of the content.
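Since the domain is no help, one thing you could look at instead is the
language the page itself declares. What follows is only a rough sketch in
Python, assuming the page sets a lang attribute on its <html> tag; plenty of
pages omit it or get it wrong, so treat it as a hint and not as real language
detection. The names LangSniffer, declared_language and looks_english are
made up for this example.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value.strip().lower()


def declared_language(html_text):
    """Return the declared language code (e.g. 'en-us'), or None if absent."""
    sniffer = LangSniffer()
    sniffer.feed(html_text)
    return sniffer.lang


def looks_english(html_text):
    """True only when the page explicitly declares an English variant."""
    lang = declared_language(html_text)
    return lang is not None and (lang == "en" or lang.startswith("en-"))
```

A spider could call looks_english() on each fetched page and skip the ones
that return False, keeping in mind that pages with no declaration at all will
also be skipped.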
Greetings.