Posted by Arjen on 02/10/07 22:24
Simon Harris wrote:
> In this case, I want to remove them. I tried your suggestion, but it still
> left & in the string in my test (using the function below).
>
> function html2txt($document){
>     $document = str_replace("<li>", " <li>", $document);
>     $search = array('@<script[^>]*?>.*?</script>@si',   // Strip out javascript
>                     '@<style[^>]*?>.*?</style>@si',     // Strip style tags properly (no U flag, which would make .*? greedy)
>                     '@<![\\s\\S]*?--[ \\t\\n\\r]*>@',   // Strip multi-line comments including CDATA
>                     '@<[\\/\\!]*?[^<>]*?>@si',          // Strip out remaining HTML tags
>                     '#&\w+;#iU'                         // Strip out named HTML entities
>     );
>     $text = preg_replace($search, '', $document);
>     return $text;
> }
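A possible variation on the quoted function, sketched under a few assumptions: script, style and comment blocks are removed before the generic tag pattern runs (otherwise the CSS inside `<style>` survives as text), the comment pattern is simplified to plain `<!-- ... -->` comments, and entities are decoded with `html_entity_decode()` instead of being deleted. The decoding step is a swapped-in technique, not part of the original function.

```php
<?php
// Sketch of html2txt() with adjusted pattern order and entity decoding.
// Assumption: input is UTF-8 HTML.
function html2txt(string $document): string
{
    // Keep list items separated by a space once the tags are gone.
    $document = str_replace('<li>', ' <li>', $document);

    $search = [
        '@<script[^>]*?>.*?</script>@si', // script blocks, including their content
        '@<style[^>]*?>.*?</style>@si',   // style blocks, including their content
        '@<!--.*?-->@s',                  // HTML comments (simplified pattern)
        '@<[\/\!]*?[^<>]*?>@si',          // any tag that is still left
    ];
    $text = preg_replace($search, '', $document);

    // Turn &amp;, &nbsp;, &#160; and similar into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; decodes to U+00A0; normalise it to an ordinary space.
    return str_replace("\u{00A0}", ' ', $text);
}
```

Decoding rather than stripping keeps characters such as `&` in the output, which is usually what a plain-text version should contain.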
For a quick and dirty (but nicely formatted, IMHO) result, use:
$text = shell_exec('lynx -dump ' . escapeshellarg($url));
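Because the URL ends up on a shell command line, it should be quoted with `escapeshellarg()` so characters such as `&` or `;` in a query string are not interpreted by the shell. A small sketch, assuming `lynx` is installed and on the `PATH`; both function names below are hypothetical helpers, not part of PHP or lynx.

```php
<?php
// Hypothetical helper: builds the lynx command with the URL safely quoted.
function lynx_dump_command(string $url): string
{
    return 'lynx -dump ' . escapeshellarg($url);
}

// Hypothetical wrapper: returns lynx's formatted text, or null when the
// command produced no output or could not be run (shell_exec() returns
// null or false in those cases, depending on the PHP version).
function lynx_dump(string $url): ?string
{
    $output = shell_exec(lynx_dump_command($url));
    return is_string($output) ? $output : null;
}
```

Usage: `$text = lynx_dump('http://www.example.com/');`. Keeping the command construction in its own function makes the quoting easy to check without actually running lynx.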
--
Arjen
http://www.hondenpage.com