|
Posted by Rik on 08/02/07 16:33
On Thu, 02 Aug 2007 17:48:24 +0200, FFMG <FFMG.2up4zm@no-mx.httppoint.com> wrote:
> I want to get the <head> code and a 'simple?' solution seems to
> be...
>
> preg_match_all("/<[html]+[^>]*>\s*(.*\s*)<\/html>\s*/i", $html,
> $matches, PREG_SET_ORDER);
Euhm, nope. You start on an undefined tag (lose the square brackets around
'[html]'; as written it is a character class that matches any run of the
letters h, t, m and l), and you're matching the html tag, not the head tag.
> but I want to make sure that there isn't a better solution to the
> problem especially if the head contains invalid code like...
>
> //--
> <head>
> <meta name="description" content="<head></head>" />
> </head>
> //--
DOM functions? <http://nl3.php.net/dom>
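A minimal sketch of the DOM route, assuming a reasonably recent PHP (7.1 or later for the nullable return type, and 5.3.6 or later for passing a node to saveHTML()). The function name extractHead() and the sample document are made up for illustration; only the meta line comes from the question.

```php
<?php
// Sample input: the meta line is the one from the question, the rest is filler.
$html = <<<'HTML'
<html>
<head>
<meta name="description" content="<head></head>" />
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>
HTML;

// Hypothetical helper: returns the serialized <head> element, or null if none is found.
function extractHead(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally so sloppy markup does not spam the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The "<head></head>" inside the attribute value is plain text to the parser,
    // so the first element named "head" is the real one.
    $head = $doc->getElementsByTagName('head')->item(0);

    return $head === null ? null : $doc->saveHTML($head);
}

$head = extractHead($html);
echo $head, "\n";
```

The serialized output may escape the angle brackets inside the content attribute, depending on the libxml version, so compare attribute values through the DOM rather than by string matching on the result.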
> How can I change my regex to ignore head tags inside double or single
> quotes?
Could be done by setting a greedy match starting on a quote until the
end quote. Then again, if you're concerned with invalid attributes, you'd
have to allow for the possibility that the quotes are erroneous too, i.e.
someone forgot to open or close them.
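A rough sketch of that idea, not a robust solution: inside every tag, a complete quoted string is consumed as one unit, so a "</head>" sitting in an attribute value is never seen as a closing tag. The variable names and the sample document are my own, and the pattern assumes balanced quotes and no comments or scripts containing stray '<' characters.

```php
<?php
// Sample input: the meta line is the one from the question, the rest is filler.
$html = <<<'HTML'
<html>
<head>
<meta name="description" content="<head></head>" />
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>
HTML;

// A complete double- or single-quoted attribute value.
$quoted = '"[^"]*"|\'[^\']*\'';

// The remainder of a tag after '<': unquoted characters or whole quoted strings.
// Possessive quantifiers (++ and *+) keep the engine from backtracking into them.
$tagRest = '(?:[^>"\']++|' . $quoted . ')*+';

// Head body: plain text, or any tag that is not the closing </head>.
$pattern = '~<head\b' . $tagRest . '>'
         . '((?:[^<]++|<(?!/head\b)' . $tagRest . '>)*+)'
         . '</head\s*>~i';

$found = preg_match($pattern, $html, $m);
$inner = $found === 1 ? $m[1] : null;

// For contrast: a naive lazy match stops at the "</head>" inside the attribute.
preg_match('~<head[^>]*>(.*?)</head>~is', $html, $naive);

echo $inner, "\n";
```

An unclosed quote makes the quoted-string branch run on to the next quote character anywhere in the document, which is exactly the failure mode described above.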
I've taken a stab at it with regexes in the past, which works quite well
as long as you can be sure it's strictly valid HTML. If it isn't, or you're
using outside sources where this isn't known, don't use regular
expressions for something a parser ought to be doing.
-- 
Rik Wasmus
|