Re: Efficient way to rip html

Posted by Ben C on 10/04/06 20:56

On 2006-10-04, Arthur Rhodes <rhodesr@no.spam.com> wrote:
> On Tue, 03 Oct 2006 13:25:02 -0500, Ben C wrote:
>
>>> Is there an easier way?
>>
>> Python, and Beautiful Soup.
>>
>> http://www.crummy.com/software/BeautifulSoup/
>
> Looks good. You don't know of any ready made gui for it,
> do you? I'm thinking it would be nice to have a tree
> pane representing the structure of the document, and when
> you click on a node a text pane shows the corresponding part
> of the document.

I don't know of one, but it wouldn't be hard to write. Someone may
already have done so.
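To give an idea of how little is involved, here is a rough sketch of the "tree pane" half as plain text rather than a GUI. It assumes the current bs4 package and its built-in "html.parser" backend; the outline() helper, the SAMPLE_HTML string and the ids in it are made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def outline(node, depth=0):
    """Return one indented line per element, mirroring the document tree."""
    lines = []
    for child in node.children:
        # Skip text nodes and comments; only elements appear in the outline.
        if isinstance(child, Tag):
            label = child.name
            if child.get("id"):
                label += "#" + child["id"]
            lines.append("  " * depth + label)
            lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical sample document, standing in for the page being ripped.
SAMPLE_HTML = """<html><body>
<div id="menu"><a href="/home">Home</a></div>
<div id="content"><h1>Title</h1><p>First paragraph.</p></div>
</body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tree_lines = outline(soup)
print("\n".join(tree_lines))
```

A real GUI would feed the same recursion into a tree widget and show str(child) in the text pane when a node is selected.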

But Firefox can do exactly what you're describing, if you install the
"DOM Inspector" extension. You can click on something in the tree
representation in the DOM Inspector window and it flashes red on the
page, or you can point to part of the page, click, and the corresponding
part of the tree representation gets highlighted.

Having found your way around the document with the DOM Inspector, you
can then write the Python/Beautiful Soup script to pull out the bits
you're interested in.
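Such a script might look roughly like the following. This is only a sketch: it assumes the current bs4 package with the "html.parser" backend, and the SAMPLE_HTML document, the "content" id and the variable names are invented for the example. In practice you would substitute whatever element names and ids the DOM Inspector shows for your page.

```python
from bs4 import BeautifulSoup

# Hypothetical page; in practice this would be read from a file or a URL.
SAMPLE_HTML = """<html><body>
<div id="menu"><a href="/home">Home</a></div>
<div id="content">
  <h1>Title</h1>
  <p class="intro">First paragraph.</p>
  <p>See <a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful Soup</a>.</p>
</div>
</body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Narrow the search to the part of the tree located with the inspector.
content = soup.find("div", id="content")

# Pull out the heading text, the paragraph texts and the link targets.
heading = content.find("h1").get_text(strip=True)
paragraphs = [p.get_text().strip() for p in content.find_all("p")]
links = [a["href"] for a in content.find_all("a", href=True)]

print(heading)
print(paragraphs)
print(links)
```

Searching inside the "content" div first keeps the menu link out of the results, which is the main reason to work out the document structure before writing the script.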

 
