Reply to Re: Regex to get the <html></html> — PHP Programming Language

Posted by Rik on 08/02/07 16:33

On Thu, 02 Aug 2007 17:48:24 +0200, FFMG <FFMG.2up4zm@no-mx.httppoint.com> wrote:
> I want to get the <head> code and a 'simple?' solution seems to
> be...
>
> preg_match_all("/<[html]+[^>]*>\s*(.*\s*)<\/html>\s*/i", $html,
> $matches, PREG_SET_ORDER);

Euhm, nope. You start on an undefined tag (lose the square brackets
around '[html]', they turn it into a character class), and you're
matching the html tag, not the head tag.
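
For a well-behaved document, the correction could look something like the sketch below. The sample `$html` string is made up for illustration, and this version does nothing about head tags hiding inside attribute values.

```php
<?php
// Hypothetical sample input; not taken from the original post.
$html = "<html><head><title>Test</title></head><body></body></html>";

// <head\b matches the literal tag name. [html]+ would instead be a
// character class matching any run of the letters h, t, m and l.
// The s modifier lets . match newlines; .*? stops at the first </head>.
if (preg_match('/<head\b[^>]*>(.*?)<\/head>/is', $html, $matches)) {
    echo $matches[1], "\n"; // prints <title>Test</title>
}
```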

> but I want to make sure that there isn't a better solution to the
> problem especially if the head contains invalid code like...
>
> //--
> <head>
> <meta name="description" content="<head></head>" />
> </head>
> //--

DOM functions? <http://nl3.php.net/dom>
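
A minimal sketch of what that might look like with the DOM extension. The sample markup is hypothetical, modelled on the example in the question, and passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical sample input modelled on the example in the question.
$html = <<<'HTML'
<html>
<head>
<meta name="description" content="<head></head>" />
<title>Test</title>
</head>
<body><p>Hello</p></body>
</html>
HTML;

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$heads = $doc->getElementsByTagName('head');
$head  = $heads->item(0);

$description = null;
if ($head !== null) {
    // Serialise only the <head> element (node argument: PHP >= 5.3.6).
    echo $doc->saveHTML($head), "\n";

    // The quoted attribute value is plain text to the parser, not markup.
    $meta = $head->getElementsByTagName('meta')->item(0);
    if ($meta !== null) {
        $description = $meta->getAttribute('content');
    }
}
```

The parser decides where attribute values start and end, so the "<head></head>" inside the content attribute is never mistaken for an element.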

> How can I change my regex to ignore head tags inside double or single
> quotes?

Could be done by setting a greedy match starting on a quote until the
end quote. Then again, if you're concerned with invalid attributes,
you'd have to allow for the possibility the quotes are erroneous too,
i.e. someone forgot to open or close them.
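
One way the quote-skipping idea might be written, as a sketch only. The pattern and the sample input are my own illustration and assume every quote inside a tag is properly closed.

```php
<?php
// Hypothetical sample input modelled on the example in the question.
$html = <<<'HTML'
<html>
<head>
<meta name="description" content="<head></head>" />
<title>Test</title>
</head>
<body></body>
</html>
HTML;

// Between <head> and </head>, consume either:
//   [^<]++                a run of text (possessive, to limit backtracking)
//   <(?!/head\s*>) ... >  any tag other than </head>, whose body consists of
//                         "..." or '...' strings or other non-> characters
$pattern = '~<head\b[^>]*>'
         . '((?:[^<]++|<(?!/head\s*>)(?:"[^"]*"|\'[^\']*\'|[^"\'>])*>)*)'
         . '</head\s*>~i';

$found = preg_match($pattern, $html, $m);
if ($found === 1) {
    echo $m[1], "\n"; // everything between <head> and the real </head>
}
```

Quoted strings are swallowed whole, so a </head> inside an attribute value is skipped. With an unclosed quote, or a stray quote inside a comment, the pattern can fail to match or capture the wrong span.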

I've taken a stab at it with regexes in the past, which works quite well
as long as you can be sure it's strictly valid HTML. If it isn't, or
you're using outside sources where this isn't known, don't use regular
expressions for something a parser ought to be doing.
-- 
Rik Wasmus
