Posted by Toby Inkster on 03/09/06 21:40
Dylan Parry wrote:
> You probably won't find a list that tells you the exact information you
> are after, but the HTML DTDs available from W3C[1] will show you which
> elements are block level and which are inline. From that you could
> assume that the block elements result in a space, and the inline should
> not.
In fact, you could assume that the block elements should begin and end
with a line break. You could also add a tab between <td> and <th> elements
in a table, add asterisks for unordered lists, add numbers for ordered
lists and so on.
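
The rules above can be sketched roughly as follows. This is only a minimal illustration, written in Python with the standard-library html.parser module rather than Perl's HTML::Parser. The names TextRenderer, html_to_text, BLOCK_TAGS and CELL_TAGS are made up for the example, and the block-element set is a small subset of what the DTD lists.

```python
import re
from html.parser import HTMLParser

# Illustrative subset only; the HTML DTDs define the full set of block elements.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol",
              "li", "table", "tr", "blockquote", "pre"}
CELL_TAGS = {"td", "th"}


class TextRenderer(HTMLParser):
    """Hypothetical helper: turns parser events into plain-text fragments."""

    def __init__(self):
        super().__init__()
        self.out = []            # output fragments, joined at the end
        self.lists = []          # stack of [tag, item counter] for open lists
        self.cells_in_row = 0    # cells seen so far in the current table row

    def _newline(self):
        # Start a new line unless the output is already at the start of one.
        if self.out and not self.out[-1].endswith("\n"):
            self.out.append("\n")

    def handle_starttag(self, tag, attrs):
        if tag in CELL_TAGS:
            if self.cells_in_row:
                self.out.append("\t")       # tab between adjacent cells
            self.cells_in_row += 1
        elif tag in BLOCK_TAGS:
            self._newline()                  # block elements begin on a new line
            if tag == "tr":
                self.cells_in_row = 0
            elif tag in ("ul", "ol"):
                self.lists.append([tag, 0])
            elif tag == "li" and self.lists:
                self.lists[-1][1] += 1
                kind, number = self.lists[-1]
                # asterisk for unordered lists, running number for ordered ones
                self.out.append("* " if kind == "ul" else f"{number}. ")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.lists:
            self.lists.pop()
        if tag in BLOCK_TAGS:
            self._newline()                  # block elements end with a line break

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)     # source whitespace is not significant
        if not self.out or self.out[-1].endswith(("\n", "\t", " ")):
            text = text.lstrip()
        if text:
            self.out.append(text)


def html_to_text(html):
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    lines = "".join(renderer.out).split("\n")
    return "\n".join(line.rstrip(" ") for line in lines).strip("\n")
```

Calling html_to_text() on a fragment with a paragraph, a list and a table row should give one line per block, list items prefixed with "* " or a number, and cells separated by tabs. Nested tables, <pre> whitespace and entities in odd places are not handled carefully here.
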
I'll echo Mr Stevens' recommendation to use HTML::Parser for parsing
though -- it will give far better results than a regular expression. For
example, a regular expression that looks for closing tags won't tell you to
add a line break after the word "bar" here, because the closing tag for a
paragraph is optional:
<body>
<p>Foo bar.
</body>
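
To make the difference concrete, here is a rough sketch, again in Python with the standard-library html.parser rather than Perl's HTML::Parser. It compares a pattern that only looks for a literal </p> with an event-based tracker that treats a paragraph as finished when a new block starts, when an enclosing element closes, or when the input ends. ParagraphTracker, count_explicit_p_ends, find_paragraphs and the two tag sets are invented for the example and deliberately incomplete.

```python
import re
from html.parser import HTMLParser

SAMPLE = "<body>\n<p>Foo bar.\n</body>"

# Illustrative subsets: start tags that end an open <p>, and end tags that do.
STARTS_ENDING_P = {"p", "div", "ul", "ol", "table", "blockquote", "pre",
                   "h1", "h2", "h3", "h4", "h5", "h6"}
ENDS_ENDING_P = {"p", "body", "html", "div", "blockquote", "li", "td", "th"}


def count_explicit_p_ends(html):
    """Pattern-only approach: it can see a paragraph end only if </p> is written out."""
    return len(re.findall(r"</p\s*>", html, flags=re.IGNORECASE))


class ParagraphTracker(HTMLParser):
    """Hypothetical helper: collects paragraph text, closing <p> implicitly."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.buf = []
        self.paragraphs = []

    def _end_paragraph(self):
        if self.in_p:
            # Collapse runs of source whitespace to single spaces.
            self.paragraphs.append(" ".join("".join(self.buf).split()))
            self.buf = []
            self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in STARTS_ENDING_P:
            self._end_paragraph()    # a new block implies the old <p> is over
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag in ENDS_ENDING_P:
            self._end_paragraph()    # explicit </p>, or an enclosing element closing

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)

    def close(self):
        super().close()              # flush any buffered text first
        self._end_paragraph()        # end of input also ends an open paragraph


def find_paragraphs(html):
    tracker = ParagraphTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.paragraphs
```

On SAMPLE, the pattern finds no paragraph end at all, while the tracker reports one complete paragraph, which is where the line break after "bar" belongs.
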
--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact