Re: Spidering only english webpages

Posted by R. Rajesh Jeba Anbiah on 10/29/05 20:17

el_roachmeister@yahoo.com wrote:
> I am working on a spider script, but I only want to parse English pages.
> Is there a way I can check what language the content is in? I
> suppose I could restrict my spider to just .com, .org, etc. so that pages
> from foreign countries would not get parsed.

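If the website is well developed, the language code will be in the lang
attribute <http://www.w3.org/TR/REC-html40/struct/dirlang.html> and/or
in a META element (http-equiv="Content-Language"). But, again, neither is
dependable: many pages omit them or declare the wrong language.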
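
To make the above concrete, here is a minimal sketch of how a spider might read those two declarations. It assumes PHP 7.1+ with the DOM extension enabled; the function names declaredLanguage() and isDeclaredEnglish() are made up for illustration and are not part of any library. It only reports what the page claims about itself, so treat the result as a hint, not a guarantee.

```php
<?php
// Hypothetical helper: returns the language code a page declares
// (lowercased), or null when it declares none.
function declaredLanguage(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect libxml errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 1. lang attribute on the root <html> element.
    $root = $doc->documentElement;
    if ($root !== null && $root->hasAttribute('lang')) {
        $lang = trim($root->getAttribute('lang'));
        if ($lang !== '') {
            return strtolower($lang);
        }
    }

    // 2. <meta http-equiv="Content-Language" content="...">
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'Content-Language') === 0) {
            // The content may list several languages; take the first one.
            $lang = trim(explode(',', $meta->getAttribute('content'))[0]);
            if ($lang !== '') {
                return strtolower($lang);
            }
        }
    }

    return null; // the page declares no language
}

// Hypothetical helper: true only when the declared language is English
// ("en" or a regional variant such as "en-us").
function isDeclaredEnglish(string $html): bool
{
    $lang = declaredLanguage($html);
    return $lang !== null && ($lang === 'en' || strpos($lang, 'en-') === 0);
}
```

In a spider you would call isDeclaredEnglish($pageSource) before parsing a page. Note that it returns false for pages that declare nothing at all, so you may prefer to treat null from declaredLanguage() as "unknown" and fall back to some other check.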

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

 
