Re: Convert HTML to Text

Posted by Toby Inkster on 03/09/06 21:40

Dylan Parry wrote:

> You probably won't find a list that tells you the exact information you
> are after, but the HTML DTDs available from W3C[1] will show you which
> elements are block level and which are inline. From that you could
> assume that the block elements result in a space, and the inline should
> not.

In fact, you could assume that the block elements should begin and end
with a line break. You could also add a tab between <td> and <th> elements
in a table, add asterisks for unordered lists, add numbers for ordered
lists and so on.
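
As a rough illustration of those rules, here is a minimal sketch. It uses Python's standard html.parser module rather than the Perl HTML::Parser discussed in this thread, purely because it is easy to run. The BLOCK set, the TextExtractor class and the html_to_text function are names made up for this example, and the list of block elements is only a partial guess at what the DTD defines.

```python
import re
from html.parser import HTMLParser

# Assumption: a partial set of block-level elements, not the full DTD list.
BLOCK = {"body", "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
         "ul", "ol", "li", "table", "tr", "blockquote", "pre"}


class TextExtractor(HTMLParser):
    """Hypothetical helper: turns parser events into plain text."""

    def __init__(self):
        super().__init__()
        self.text = ""          # output built so far
        self.lists = []         # stack: None for <ul>, a counter for <ol>
        self.cell_seen = False  # True once the current row has a cell

    def _break(self):
        # End the current line unless we are already at the start of one.
        self.text = self.text.rstrip(" ")
        if self.text and not self.text.endswith("\n"):
            self.text += "\n"

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            self._break()                    # block elements begin on a new line
        if tag == "ul":
            self.lists.append(None)
        elif tag == "ol":
            self.lists.append(0)
        elif tag == "li" and self.lists:
            if self.lists[-1] is None:
                self.text += "* "            # asterisk for unordered lists
            else:
                self.lists[-1] += 1
                self.text += f"{self.lists[-1]}. "  # number for ordered lists
        elif tag == "tr":
            self.cell_seen = False
        elif tag in ("td", "th"):
            if self.cell_seen:
                self.text = self.text.rstrip(" ") + "\t"  # tab between cells
            self.cell_seen = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.lists:
            self.lists.pop()
        if tag in BLOCK:
            self._break()                    # block elements end with a line break

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)     # collapse runs of whitespace
        if self.text == "" or self.text[-1] in "\n\t ":
            text = text.lstrip(" ")
        self.text += text


def html_to_text(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text


print(html_to_text("<h1>Title</h1><ul><li>one<li>two</ul>"
                   "<table><tr><th>k</th><td>v</td></tr></table>"))
```

Because every block element forces a break on both its start and end tag, the unclosed <li> items above still land on separate lines. A real converter would need the full element list plus handling for <br>, <pre> and nested indentation.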

I'll echo Mr Stevens' recommendation to use HTML::Parser for parsing
though, as it will give far better results than a regular expression. For
example, a regular expression that looks for closing tags won't tell you to
add a line break after the word "bar" here, because the closing tag for a
paragraph is optional:

<body>
<p>Foo bar.
</body>
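
To make the difference concrete, here is a small sketch comparing the two approaches on that fragment. Again it uses Python's standard html.parser instead of the Perl HTML::Parser, and regex_text, ParagraphTracker, parser_text and the CLOSES_P set are hypothetical names invented for this example. The parser only reports the tags it sees, so the handler itself decides that a paragraph ends when another block starts or an enclosing element closes.

```python
import re
from html.parser import HTMLParser

SAMPLE = "<body>\n<p>Foo bar.\n</body>"


def regex_text(html):
    """Naive approach: only a written-out </p> produces a line break."""
    html = re.sub(r"\s+", " ", html)                    # collapse whitespace
    html = re.sub(r"</p\s*>", "\n", html, flags=re.I)   # break at explicit </p>
    return re.sub(r"<[^>]+>", "", html).strip(" ")      # drop remaining tags


class ParagraphTracker(HTMLParser):
    """Ends an open <p> when another block starts or an ancestor closes."""

    # Assumption: a partial list of tags that implicitly end a paragraph.
    CLOSES_P = {"p", "div", "ul", "ol", "table", "body", "html"}

    def __init__(self):
        super().__init__()
        self.text = ""
        self.in_p = False

    def _end_paragraph(self):
        if self.in_p:
            self.text = self.text.rstrip(" ") + "\n"
            self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in self.CLOSES_P:
            self._end_paragraph()
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag in self.CLOSES_P:
            self._end_paragraph()

    def handle_data(self, data):
        if self.in_p:                       # this sketch keeps paragraph text only
            self.text += re.sub(r"\s+", " ", data)


def parser_text(html):
    tracker = ParagraphTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.text


print(repr(regex_text(SAMPLE)))   # 'Foo bar.'   (no line break)
print(repr(parser_text(SAMPLE)))  # 'Foo bar.\n' (break added at </body>)
```

With two unclosed paragraphs, such as "<p>One<p>Two</body>", the regular expression version runs them together as "OneTwo", while the event-driven version puts each on its own line.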

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
