Posted by cawoodm on 03/09/06 17:51
I have written a simple RegEx which strips all tags from an HTML file
and replaces them with spaces.
This was fine until I noticed that some tags should not be replaced
with spaces. For example, given the HTML:
<b>H</b>ello World
My program will generate "H ello World", effectively breaking a word
apart.
Where could I get an "authoritative" list of which tags should result
in a space and which shouldn't? I presume the ones that need a space
are mostly block elements like div, br, hr, table, etc.
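
To make the idea concrete, here is a minimal sketch in Python of what I have in mind. The tag set BREAKING_TAGS and the function name strip_tags are my own placeholders, and the list is a guess, not an authoritative one. A regex like this also ignores comments, scripts and attribute values containing ">", so treat it as an illustration only.

```python
import re

# Assumed, non-exhaustive guess at tags that imply a break in the text.
# This is NOT an authoritative list.
BREAKING_TAGS = {
    "address", "blockquote", "br", "dd", "div", "dl", "dt",
    "h1", "h2", "h3", "h4", "h5", "h6", "hr", "li", "ol",
    "p", "pre", "table", "td", "th", "tr", "ul",
}

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def strip_tags(html):
    """Remove tags, inserting a space only for tags in BREAKING_TAGS."""
    def replace(match):
        name = match.group(1).lower()
        return " " if name in BREAKING_TAGS else ""

    text = TAG_RE.sub(replace, html)
    # Collapse the runs of whitespace left behind by adjacent tags.
    return re.sub(r"\s+", " ", text).strip()


print(strip_tags("<b>H</b>ello World"))
print(strip_tags("<div>one</div><div>two</div>"))
```

With this approach the inline b tag is dropped without a space, while the div tags still separate their contents.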