Re: Parsing content for links

Posted by Curtis on 02/22/07 09:30

Tony wrote:
> I have a content management system that has links within the content
> field in the database and I need to verify if those links are correct.
> What I need to have happen is have a php script query the database and
> then parse through the content field to find all the <a href> tags to
> get the href attribute value and the link text.
>
> Does anyone have a way of doing this or a regex to do this?
>
> Thanks,
> Tony
>

Yeah, regex would be easiest, and there should be plenty out there,
but I might do something like this:

$re = '%
<a\s[^<>]*? # href may or may not come first
href=([\'"]) # capture single/double quote

# match a valid URI
(
[\w.-]+:(?://)? # scheme
[^?#\'"<>\s]+ # authority and path

# possible query string and fragment
(?: \\? [^#\'"<>]* )?
(?: \\# [^\'"<>]* )?
)

\1 # captured quote from above
[^<>]* # possible remaining attributes
>( .*? ) # allow for nested tags
</a> # closing <a> tag
%xis';

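For each match, the URI is in $match[2] and the text of the <a> tag
is in $match[3] ($match[1] only holds the opening quote).

Just pass this $re var to the preg_* functions, e.g. preg_match_all()
with PREG_SET_ORDER if you want every link in the content.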
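
For instance, here is a minimal sketch of how that might look with preg_match_all(). The $content string and the $links array are made up for illustration (in your case the content would come from the database query). The pattern is a slightly tightened variant of the one above, so treat the stricter character classes and the added "s" modifier (which lets link text span several lines) as assumptions rather than the only way to do it:

```php
<?php
// Hypothetical sample content standing in for one row of the CMS content field.
$content = <<<'HTML'
<p>See <a href="http://example.com/page?id=1#top" title="Example">the <b>example</b> page</a>
or <a class="ext" href='https://example.org/'>Example Org</a>.</p>
HTML;

$re = '%
    <a\s[^<>]*?              # opening tag; href may or may not come first
    href=([\'"])             # group 1: single/double quote
    (                        # group 2: the URI
      [\w.-]+:(?://)?        # scheme
      [^?#\'"<>\s]+          # authority and path
      (?: \\? [^#\'"<>]* )?  # optional query string
      (?: \\# [^\'"<>]* )?   # optional fragment
    )
    \1                       # same quote as group 1
    [^<>]*                   # possible remaining attributes
    >( .*? )                 # group 3: link text, nested tags allowed
    </a>                     # closing tag
%xis';

// PREG_SET_ORDER gives one array per link: [full match, quote, URI, text].
preg_match_all($re, $content, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $match) {
    $links[] = [
        'href' => $match[2],
        'text' => strip_tags($match[3]), // drop nested tags from the link text
    ];
}

print_r($links);
```

From there you could loop over $links and check each href however you like. Note that a pattern like this only picks up URIs with a scheme, so relative links would need a looser URI part.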

Hope this helps,
Curtis

 
