Re: RegEx - replacing html entities

Posted by shimmyshack on 02/12/07 11:35

On 10 Feb, 22:24, Arjen <d...@mail.me> wrote:
> Simon Harris wrote:
>
>
>
> > In this case, I want to remove them - I tried your suggestion, but it still
> > left &amp; in the string in my test (Using the function below).
>
> > function html2txt($document){
> > $document = str_replace("<li>"," <li>",$document);
> > $search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript
> > '@<[\\/\\!]*?[^<>]*?>@si', // Strip out HTML tags
> > '#&\w+;#iU', // Strip out HTML entities such as &nbsp;
> > '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly
> > '@<![\\s\\S]*?--[ \\t\\n\\r]*>@' // Strip multi-line comments including CDATA
> > );
> > $text = preg_replace($search, '', $document);
> > return $text;
> > }
>
> For a quick and dirty (but nicely formatted imho) use
> $text = shell_exec("lynx --dump $url");
>
> --
> Arjen
> http://www.hondenpage.com
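
If the lynx route is used with a URL that comes from user input, the URL should be escaped before it reaches the shell. A minimal sketch, assuming lynx is installed and using its documented -dump option; the helper name lynx_dump_command is made up for illustration:

```php
<?php
// Hypothetical helper: builds the lynx command with the URL shell-escaped,
// so characters such as ; & or spaces cannot start a second command.
function lynx_dump_command(string $url): string
{
    return 'lynx -dump ' . escapeshellarg($url);
}

// Usage (requires lynx on the server):
// $text = shell_exec(lynx_dump_command($url));
```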

From the PHP docs for html_entity_decode(), this is what one can do in PHP 4:

$string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
$string = preg_replace('~&#([0-9]+);~e', 'chr(\\1)', $string);
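
The e modifier in the two calls above was deprecated in PHP 5.5 and removed in PHP 7, so on newer versions the same idea has to go through preg_replace_callback(). A rough equivalent, assuming PHP 7 or later; the function name decode_numeric_entities is only for illustration, and like the original it relies on chr(), so it is only meaningful for code points up to 255:

```php
<?php
// Hypothetical helper: decodes numeric character references without /e.
function decode_numeric_entities(string $string): string
{
    // Hexadecimal references such as &#x41;
    $string = preg_replace_callback(
        '~&#x([0-9a-f]+);~i',
        function (array $m): string {
            return chr((int) hexdec($m[1]));
        },
        $string
    );

    // Decimal references such as &#65;
    return preg_replace_callback(
        '~&#([0-9]+);~',
        function (array $m): string {
            return chr((int) $m[1]);
        },
        $string
    );
}
```

Named entities such as &amp; are left alone by this, just as in the two-line version.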

So, amending that slightly,

# could use [a-fnmps] instead of [a-z]
$string = preg_replace('~&[#x]?([a-z]+)?([0-9]+)?;~i', '', $string);

would probably do the job, untested.
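
Since that is untested, one gap worth knowing about: [#x]? only consumes a single character and the letter and digit groups each match once, so a hex reference that mixes digits and letters, like &#x1F;, is not matched. Everything after the & is also optional, so a bare "&;" is removed too. A stricter sketch that covers named, decimal and hex entities; the function name strip_entities is made up for this example:

```php
<?php
// Hypothetical helper: removes named (&nbsp;), decimal (&#160;) and
// hexadecimal (&#x1F;) entities from a string.
function strip_entities(string $string): string
{
    return preg_replace(
        '~&(?:#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);~i',
        '',
        $string
    );
}

// Example: each entity is removed, the surrounding text is kept.
echo strip_entities('a&nbsp;b &amp; c&#160;d&#x1F;e&frac12;f'), "\n";
```

This removes the entities rather than decoding them, so the two spaces around the stripped &amp; stay in the output.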

(sorry if this post appears twice, network problems)

 
