Posted by Al on 12/23/05 17:06
Jochem Maas wrote:
> Al wrote:
>
>> I didn't fully test this; but it should get you started.
>
>
> fully? more like not at all.
>
> point 1:
>
> "%<a\040href\040*=['"]$types://((www.)*[\w/\.]+)['"]>.+</a>%i";
> ^-- double quotes are not escaped == parse error
>
> point 2:
>
> "%<a\040href\040*=['"]$types://((www.)*[\w/\.]+)['"]>.+</a>%i";
> ^-- this will inject the string 'Array' into the regexp
> string
>
>
> point 3:
>
> the regexp does not take into account that HTML tag attributes can
> occur in any order e.g:
>
> <a class="mine" id="abc123" target="_top" href="www.bla.com" >
> testing
> </a>
>
>
> point 4:
>
> what happens when the url does not have a protocol specified?
> granted the OP did not actually specify if strings like:
>
> "www.google.com"
>
> should also be considered as a url, so this is not really a valid point.
>
>>
>> $types= array('http', 'ftp', 'https', 'mms', 'irc');
>>
>> $pattern=
>> "%<a\040href\040*=['"]$types://((www.)*[\w/\.]+)['"]>.+</a>%i"; //
>> the "i" makes it non case sensitive
>>
>> if(preg_match($pattern, $URL_str, $match)){
>>
>> $URL= match[1];
>> }
>>
>> else{
>>
>> User did not enter a complete link; do the simple thing
>> }
>>
>>
>>
>> Anders Norrbring wrote:
>>
>>>
>>> I'm writing a filter/parsing function for texts entered by users, and
>>> I've run into a problem...
>>> What I'm trying to do is to parse URLs of different sorts, ftp, http,
>>> mms, irc etc and format them as links, that part was real easy..
>>>
>>> The hard part is when a user has already entered a complete link..
>>> In short:
>>>
>>> http://www.server.tld/page.html
>>> should be converted to:
>>> <a
>>> href='http://www.server.tld/page.html'>http://www.server.tld/page.html</a>
>>>
>>>
>>> That part works fine, but if the user enters:
>>>
>>> <a href='http://www.server.tld/page.html'>click here</a>
>>>
>>> it all becomes a mess... Can somebody please make a suggestion on this?
>>
>>
>>
Jochem's correct. I was in too much of a hurry to help, since it was obvious that Anders was not getting much useful
help. Jochem's points 3 and 4 are valid; I did not address them because they require more work than I have time to devote.
Here is the corrected code. The pattern works in the "Regex Coach"; I have not tried it in a PHP script.
$types= '(http|ftp|https|mms|irc)';
$pattern= "%<a\040href\040*=['\"]$types://((www\.)*[\w/\.]+)['\"]>.+</a>%i"; // the "i" makes it case-insensitive
if(preg_match($pattern, $URL_str, $match)){
    $URL= $match[2]; // $match[1] is the protocol, $match[2] the host and path
}
else{
    // User did not enter a complete link; do the simple thing
}
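
For point 3 (attributes in any order), something along these lines might be a starting point. It is only a sketch: extract_href() is a made-up helper name, the scheme list is copied from the code above, and the type declarations assume PHP 7.1 or later. It does not address point 4 (URLs without a protocol), and a regex will never cope with every piece of HTML a user can type.

```php
<?php
// Hypothetical helper (not from the original post): returns the href of the
// first <a> tag in $html, wherever href sits among the attributes, or null.
function extract_href(string $html): ?string
{
    // Same schemes as above; "https?" covers both http and https.
    $schemes = 'https?|ftp|mms|irc';

    // <a\b        the tag name "a" (not "abbr", "area", ...)
    // [^>]*?      any attributes that come before href
    // \shref      href preceded by whitespace (so "data-href" is skipped)
    // (['"])      opening quote, captured so \1 can require the same closing quote
    // group 2     the URL itself: scheme, "://", then anything up to a quote, space or ">"
    $pattern = '%<a\b[^>]*?\shref\s*=\s*([\'"])((?:' . $schemes . ')://[^\'"\s>]+)\1[^>]*>%i';

    if (preg_match($pattern, $html, $match)) {
        return $match[2];
    }
    return null; // no complete link found; do the simple thing
}
```

Unlike $match[2] in the code above, the value returned here includes the protocol.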