Re: Efficient way to rip html — HTML — IT news, forums, messages

Posted by mbstevens on 10/03/06 23:47

On Tue, 03 Oct 2006 11:22:24 -0600, Arthur Rhodes wrote:

> The problem is that the descriptions I need to copy are embedded in
> complex pages, with nested tables, etc. Simply copying the page source
> doesn't seem to be that useful. I end up having to cut out lots of table
> code, etc., and usually make mistakes that are time-consuming to figure
> out and fix.

Perl's HTML::Parser module will split an HTML document into its various
parts (start tags, end tags, comments, and text) with just a few lines of
code. In the more structured Python world, sgmllib, htmllib, or HTMLParser
are the modules to look into.
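
For what it's worth, here is a rough sketch of the Python route. It assumes Python 3, where the HTMLParser module lives at html.parser (sgmllib and htmllib are gone there). The names TextExtractor and extract_text are made up for this example, and skipping script/style content is my own choice, not something the module does for you. It just collects the text pieces and ignores all the table markup:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text chunks, ignoring all markup."""

    # Assumption: script and style bodies are not wanted in the output.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # text pieces in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace left over from page layout.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def extract_text(html):
    """Return the list of text chunks found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.chunks


if __name__ == "__main__":
    page = "<table><tr><td><b>Widget</b></td><td>A small blue widget.</td></tr></table>"
    print("\n".join(extract_text(page)))
```

Nested tables don't matter to this approach, since the parser only reports tags and text as it meets them. If you need only the descriptions, you would still have to add a check in handle_starttag for whatever tag or attribute marks them on your pages.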
--
mbstevens
http://www.mbstevens.com/
