 Posted by Curtis on 02/22/07 09:30 
Tony wrote: 
> I have a content management system that has links within the content 
> field in the database and I need to verify if those links are correct. 
> What I need to have happen is have a php script query the database and 
> then parse through the content field to find all the <a href> tags to 
> get the href attribute value and the link text. 
>  
> Does anyone have a way of doing this or a regex to do this? 
>  
> Thanks, 
> Tony 
>  
 
Yeah, regex would be easiest, and there should be plenty out there,  
but I might do something like this: 
 
$re = '%
<a\b[^<>]*?\s		# href may or may not come first
href\s*=\s*([\'"])	# capture single/double quote

# match an absolute URI (a scheme is required)
(
	[\w.+-]+:(?://)?	# scheme
	[^?#\'"<>]+		# authority and path

	# possible query string and fragment
	(?: \\? [^#\'"<>]* )?
	(?: \\# [^\'"<>]* )?
)

\1			# captured quote from above
[^<>]*			# possible remaining attributes
>( .*? )		# allow for nested tags
</a>			# closing <a> tag
%xis';
 
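For each match, $match[1] is the quote character, $match[2] is the URI, 
and $match[3] is the text of the <a> tag (including any nested tags). 
Only hrefs that start with a scheme (http:, mailto:, etc.) are matched, 
so relative links are skipped. 
 
Pass $re to preg_match_all() to collect every link in a content field, 
or to preg_match() if you only need the first one. 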
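 
Here is a rough sketch of how that could look with preg_match_all(). 
The $content string is a made-up sample standing in for one row of your 
content field, and the pattern is a compact one-line version of the same 
idea (not a drop-in copy of the commented one above), so treat it as a 
starting point rather than a finished link checker: 

```php
<?php
// Hypothetical sample standing in for one row of the CMS content field.
$content = <<<'HTML'
<p>See <a href="http://example.com/a?x=1#top">the <b>first</b> page</a> and
<a class="ext" href='https://example.org/'>the second</a>.</p>
HTML;

// Compact form of the same idea: quote, absolute URI, link text.
// "i" ignores case; "s" lets "." match newlines inside the link text.
$re = '%<a\b[^<>]*?\shref\s*=\s*([\'"])([\w.+-]+:[^\'"<>]+)\1[^<>]*>(.*?)</a>%is';

$links = [];
if (preg_match_all($re, $content, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[1] is the quote, $match[2] the URI, $match[3] the link text.
        $links[] = [
            'href' => $match[2],
            'text' => strip_tags($match[3]), // drop nested tags such as <b>
        ];
    }
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
```

PREG_SET_ORDER groups the captures per link, so each $match lines up 
with the indexes described above. In your script you would run this 
inside the loop over the rows returned by your database query. 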
 
Hope this helps, 
Curtis
 
  