Re: Regex to get the <html></html>

Posted by Toby A Inkster on 08/03/07 10:20

FFMG wrote:

> I want to get the <head> code and a 'simple?' solution seems to
> be...

There is no simple solution. In HTML, the start and end tags for the
<head> element are *optional* -- for example, the following valid
document is considered to have a head element containing one TITLE and
one META element:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<title>Foobar</title>
<meta name=Keywords content="Foo,Bar,Baz,Foobar">
<h1>Foobar</h1>
<p>Foo bar baz.</p>

Your regular expression will not find the <head> element, which *is*
there, even if you can't explicitly see the beginning and end!
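To make the failure concrete, here is a minimal sketch that runs a typical
"grab everything between the head tags" pattern against the document above.
The pattern is an assumption for illustration only; the original poster's
regex is not shown in full here.

```php
<?php
// The example document from above: valid HTML 4.01 with no literal <head> tags.
$html = <<<'HTML'
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<title>Foobar</title>
<meta name=Keywords content="Foo,Bar,Baz,Foobar">
<h1>Foobar</h1>
<p>Foo bar baz.</p>
HTML;

// Assumed pattern (hypothetical, not taken from the original post).
$pattern = '/<head\b[^>]*>(.*?)<\/head>/is';

// preg_match() returns 1 on a match and 0 when nothing matches.
$found = preg_match($pattern, $html, $matches);

var_dump($found); // int(0): the text contains no literal <head> tag to match
```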

Best to use PHP's DOM stuff, as Rik mentioned.
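A rough sketch of that approach, assuming the DOM extension (libxml-based
DOMDocument) is available. The helper name find_head_element() is made up
for this example. An HTML parser supplies the implied <head> element, so it
can be looked up even though the source text never spells out the tags.

```php
<?php
// Hypothetical helper (not from the original post): parse HTML with the DOM
// extension and return the <head> element, or null if none was found.
function find_head_element(string $html): ?DOMElement
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $head = $doc->getElementsByTagName('head')->item(0);
    return $head instanceof DOMElement ? $head : null;
}

// The same tag-less document as in the example above.
$html = <<<'HTML'
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<title>Foobar</title>
<meta name=Keywords content="Foo,Bar,Baz,Foobar">
<h1>Foobar</h1>
<p>Foo bar baz.</p>
HTML;

$head = find_head_element($html);
if ($head !== null) {
    // List the elements the parser placed inside the implied <head>.
    foreach ($head->getElementsByTagName('title') as $title) {
        echo 'title: ', $title->textContent, "\n";
    }
    foreach ($head->getElementsByTagName('meta') as $meta) {
        echo 'meta: ', $meta->getAttribute('name'), ' = ',
            $meta->getAttribute('content'), "\n";
    }
}
```

The exact tree can vary a little between libxml versions, so treat this as
a starting point rather than a guarantee about every malformed page.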

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 43 days, 13:49.]

Command Line Interfaces, Again
http://tobyinkster.co.uk/blog/2007/08/02/command-line-again/

 
