Posted by Steve on 10/18/06 10:17
Hi,
I'm a complete PHP n00b, slowly finding my way around.
I'm using the following function, which I found on php.net, to strip
out HTML and return only the text. It works well except when the
tags have attributes (such as styles or ids) embedded in them,
e.g. <h3 id="pageName">Have a great day!! </h3>
This throws an error, whereas
<h3 >Thank you for your purchase! </h3> works like a charm.
It also falls over when crappy code has <h3> </h3> between
the tags.
What do I need to add to the function below to get it to work in
cases like the ones above?
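The post doesn't show how html2txt() is called. Assuming the test markup is typed into a double-quoted PHP string, the double quotes in id="pageName" would end that string early, and PHP stops with a parse error before html2txt() ever runs. A minimal sketch of three ways to write the literal so the markup stays intact (the variable names are made up for illustration):

```php
<?php
// Assumption: the original call looked something like
//   $text = html2txt("<h3 id="pageName">Have a great day!! </h3>");
// The second " closes the string right after id=, so PHP reports a
// parse error. Each literal below holds the markup intact instead.

// 1. Single quotes: double quotes inside need no escaping.
$single = '<h3 id="pageName">Have a great day!! </h3>';

// 2. Double quotes: escape the inner quotes with a backslash.
$escaped = "<h3 id=\"pageName\">Have a great day!! </h3>";

// 3. Heredoc: handy for longer chunks of HTML (no escaping needed
//    unless the text contains $ or backslashes).
$heredoc = <<<HTML
<h3 id="pageName">Have a great day!! </h3>
HTML;

// All three hold exactly the same markup (prints bool(true) twice).
var_dump($single === $escaped, $escaped === $heredoc);

// strip_tags() is PHP's built-in tag stripper, used here only to show
// that the attribute causes no trouble once the string is well-formed.
echo strip_tags($single), "\n";
```

Single quotes are usually the least fuss for HTML snippets, since HTML attributes are normally double-quoted.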
Regards,
Steve
The function is:
function html2txt($txt) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // Strip out JavaScript
        '@<style[^>]*?>.*?</style>@si',    // Strip style blocks; must run before the
                                           // generic tag rule, and without the U flag,
                                           // which would make .*? greedy
        '@<[\/\!]*?[^<>]*?>@si',           // Strip out HTML tags (attributes included)
        '@<![\s\S]*?--[ \t\n\r]*>@',       // Strip multi-line comments including CDATA
        '@</?[^>]*>*@'                     // Strip any leftover tag fragments
    );
    $text = preg_replace($search, '', $txt);
    return $text;
}
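If the regex approach keeps tripping over messy markup, another option is to let PHP's DOM extension parse the HTML and read back only the text. This is a swapped-in technique, not the php.net function; the name html2txt_dom() is hypothetical, and the sketch assumes the input is an HTML fragment (not a full page) and that the DOM/libxml extension is enabled:

```php
<?php
// Hypothetical alternative to html2txt(): let libxml parse the HTML
// and return only the text content of the page body.
function html2txt_dom($html)
{
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a minimal page; the meta tag tells libxml the
    // input is UTF-8 (it otherwise assumes ISO-8859-1).
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove <script> and <style> elements, whose contents are not page text.
    // Collect the nodes first, because getElementsByTagName() returns a live list.
    foreach (array('script', 'style') as $tag) {
        $nodes = array();
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $nodes[] = $node;
        }
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body ? $body->textContent : '';
}

echo html2txt_dom('<h3 id="pageName">Have a great day!! </h3>'), "\n";
```

libxml's HTML parser is lenient with sloppy markup, so attributes and stray pairs like <h3> </h3> are parsed as ordinary elements instead of having to be matched by a pattern. The trade-off is some speed and a dependency on the DOM extension.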