Re: RegEx - replacing html entities — PHP Language


Posted by Kimmo Laine on 02/10/07 14:48

Simon Harris wrote:
> In this case, I want to remove them - I tried your suggestion, but it still
> left & in the string in my test (Using the function below).
>
> function html2txt($document){
>     $document = str_replace("<li>", " <li>", $document);
>     $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
>                     '@<[\\/\\!]*?[^<>]*?>@si',         // Strip out HTML tags
>                     '#&\w+;#iU',                       // Strip out HTML entities such as &nbsp;
>                     '@<style[^>]*?>.*?</style>@siU',   // Strip style tags properly
>                     '@<![\\s\\S]*?--[ \\t\\n\\r]*>@'   // Strip multi-line comments including CDATA
>     );
>     $text = preg_replace($search, '', $document);
>     return $text;
> }
>
> Just out of curiosity, how would you decode them? Thinking about it, this
> might actually work better for me.
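On the removal side, note that `'#&\w+;#iU'` only matches named entities: `\w` does not include `#`, so numeric references such as `&#38;` or `&#x26;` slip through. A bare `&` that was never written as an entity is not touched by any entity pattern at all. A minimal sketch of a broader pattern follows; `strip_entities()` is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: removes named (&nbsp;), decimal (&#38;) and
// hexadecimal (&#x26;) character references from a string.
// A literal "&" that is not part of an entity is left alone.
function strip_entities(string $text): string
{
    // [a-z][a-z0-9]*  -> named entities, e.g. &amp; &frac12;
    // #[0-9]+         -> decimal references, e.g. &#38;
    // #x[0-9a-f]+     -> hex references, e.g. &#x26; (the /i flag also allows &#X26;)
    return preg_replace('/&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);/i', '', $text);
}

echo strip_entities('Fish &amp; chips &#38; more&nbsp;&#x26;'), "\n";
// prints "Fish  chips  more"
```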

There's also html_entity_decode(), which you can use to translate entities
back to plain text.
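A minimal sketch of the decoding route, assuming a current PHP version (`ENT_HTML5` needs PHP 5.4 or later) and UTF-8 input. `html_to_plain()` is a hypothetical wrapper name; only `html_entity_decode()` and `str_replace()` are PHP built-ins.

```php
<?php
// Hypothetical wrapper; html_entity_decode() itself is PHP's built-in.
function html_to_plain(string $html): string
{
    // ENT_QUOTES also decodes &quot; and &#039;; ENT_HTML5 enables the HTML5 named set.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0 (bytes C2 A0 in UTF-8), not an ordinary space,
    // so swap it for a plain space when the output is meant to be simple text.
    return str_replace("\u{00A0}", ' ', $text);
}

echo html_to_plain('Fish &amp; chips &#38; more&nbsp;&lt;3'), "\n";
// prints "Fish & chips & more <3"
```

If you combine this with html2txt(), strip the tags first and decode afterwards: decoding first would turn `&lt;b&gt;` into a real `<b>` tag, which the tag pattern would then delete.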

--
"I'm not a bad person, but apples are my trade." -Perttu Sirviö
spam@outolempi.net | Gedoon-S @ IRCnet | rot13(xvzzb@bhgbyrzcv.arg)

