Re: [PHP] regex help — PHP — IT news, forums, messages

Posted by Jochem Maas on 01/14/05 01:51

Jason Morehouse wrote:
> Hello,
>
> I normally can take a bit of regex fun, but not this time.
>
> Simple enough, in theory... I need to match (count) all of the bold tags
> in a string, including ones with embedded styles (or whatever else can
> go in there), i.e. both <b> and <b style="color:red">. My attempts keep
> matching <br> as well.

Okay, you didn't show the regexp you currently have, but no worries: I
happened to run into the same problem about 9 months ago when I had to
screen-scrape product info from a static site for import into a DB.
Here's the list of regexps I used, which will hopefully give you enough
info to do what you want (the fifth regexp is the one you should look at
most closely):

// strip out top and bottom
$str = preg_replace('/<[\/]?html>/is','',$str);
// strip out body tags
$str = preg_replace('/<[\/]?body[^>]*>/is','',$str);
// strip out head
$str = preg_replace('/<head>.*<[\/]head>/Uis','',$str);
// strip out non product images
$str = preg_replace('/<img[^>]*(nieuw|new|euro)\.gif[^>]*\/?>/Uis','',$str);
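// strip out font, div, span, p and b tags; the \b after the tag name
// stops b from also matching <br>, <body> or <big> (and p from <pre>)
$str = preg_replace('/<[\/]?(font|div|span|p|b)\b[^>]*>/Uis','',$str);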
// table, td, tr attributes
$str = preg_replace('/<(table|td|tr)[^>]*>/Uis','<$1>',$str);
// strip out everything from the first <table> up to and including the first <hr> (limit of 1)
$str = preg_replace('/<table>.*<hr>/Uis','',$str, 1);
// strip table, td, tr and h5 tags (opening and closing)
$str = preg_replace('/<[\/]?(table|td|tr|h5)>/Ui','',$str);
// strip out all new lines
$str = str_replace("\n", '', $str);
// replace runs of tabs (\011 is octal for tab) with a single space
$str = preg_replace('/[\011]+/', ' ', $str);
// strip out extra white space
$str = preg_replace('/[ ]+/', ' ', $str);
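Since the original question was about counting bold tags rather than stripping them, here is a minimal sketch of both, assuming PHP 7 or later (for the scalar type hints and `??`). The function names count_bold_tags() and strip_format_tags() are hypothetical, chosen only for illustration. The trick for skipping <br> is a word boundary (\b) right after the tag name: "b" followed by ">" or whitespace passes, while "b" followed by "r" (or by "o" in <body>) does not. preg_match_all() returns the number of matches, so it can do the counting directly.

```php
<?php
// Minimal sketch (hypothetical helper names, assumes PHP 7+).

// Count opening bold tags such as <b> and <b style="color:red">.
// The \b after "b" requires a non-word character next, so <br>, <br/>,
// <body> and <big> are not counted. Use '/<\/?b\b[^>]*>/i' to count
// closing </b> tags as well.
function count_bold_tags(string $html): int
{
    // preg_match_all() returns the number of full matches, or false on error.
    $n = preg_match_all('/<b\b[^>]*>/i', $html);
    return $n === false ? 0 : $n;
}

// Strip opening and closing font/div/span/p/b tags while leaving <br>
// and the text between the tags untouched.
function strip_format_tags(string $html): string
{
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace('/<\/?(font|div|span|p|b)\b[^>]*>/i', '', $html) ?? $html;
}

$sample = '<p>Price: <b style="color:red">10 euro</b><br>in stock</p>';
echo count_bold_tags($sample), "\n";   // 1
echo strip_format_tags($sample), "\n"; // Price: 10 euro<br>in stock
```

Regexps like these assume fairly tame markup: an attribute value that itself contains ">" will cut a match short.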

>
> Thanks!
>
