Posted by Colin Fine on 10/07/06 18:42
Seb wrote:
>
> usenet+2004@john.dunlop.name wrote:
>> Seb:
>>
>>> I am trying to find the right regular expression which would only
>>> validate a URL with a given number of folders.
>> URLs don't have, or refer to, folders. The parts in between the
>> slashes in URL paths are called path segments, and they might or might
>> not correspond to a part of a filesystem.
>>
>>> http://www.abc.com/folder/page.htm --> Valid (4 slashes)
>>>
>>> http://www.abc.com/folder/subfolder/ --> not valid (5 slashes)
>>>
>>> Basically, any URL not made of 4 slashes would be invalid.
>> Count the number of slashes in the string.
>>
>>> http://www.abc.com/folder/subfolder --> would also be invalid
>> How would you distinguish that URL from your first example?
>>
>> Now you see the problems arising from the confusion of URL paths and
>> filesystem paths.
>>
>> --
>> Jock
>
> Thanks.
>
> I guess all my actual files would have file extensions (.htm etc.),
> whereas a path segment wouldn't.
>
> The question was which regular expression I can use to match
> something with 4 slashes, and which does not finish with ".***" or
> ".****".
>
> Thanks,
> Seb
If that is indeed what you need (and assuming you mean 'does', not 'does
not'),
preg_match ('|^http://[^/]+/[^/]+/[^/.]+\.[^/.]{3,4}$|', $string)
will do it.
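Here is a minimal PHP sketch that runs the pattern against the three examples from this thread. The function name matches_four_slash_file is hypothetical. The dot before the extension has to be escaped as \. because an unescaped . matches any character. Without the escape, http://www.abc.com/folder/subfolder would also pass, read as "subfo" + "l" + "der".

```php
<?php
// Hypothetical helper: true when $url looks like http://host/segment/name.ext
// with a 3- or 4-character extension (exactly four slashes in total).
function matches_four_slash_file(string $url): bool
{
    // The dot is escaped; a bare "." would match any character.
    return preg_match('|^http://[^/]+/[^/]+/[^/.]+\.[^/.]{3,4}$|', $url) === 1;
}

$examples = [
    'http://www.abc.com/folder/page.htm',
    'http://www.abc.com/folder/subfolder/',
    'http://www.abc.com/folder/subfolder',
];

foreach ($examples as $url) {
    printf("%s --> %s\n", $url, matches_four_slash_file($url) ? 'valid' : 'not valid');
}
```

Only the first URL is reported as valid. A URL such as http://www.abc.com/my.bit/next.bit also passes, which is the ambiguity with dotted path segments.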
But you should be aware that there's nothing in the URL RFC that says
you can't have a path like:
www.abc.com/my.bit/next.bit
Unless you have control over the format of valid URLs, you are not
entitled to assume that xxx.yyy is the final part of a URL path.
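Jock's suggestion of counting slashes can also be applied to the path alone. PHP's parse_url() and explode() do the work here instead of one large pattern. This is a minimal sketch, assuming absolute http URLs. The names path_segments and looks_like_file_two_deep are hypothetical. It applies the same "two segments, last one is name.ext" rule, so it shares the same weakness: /my.bit/next.bit is accepted because the rule is only a naming convention.

```php
<?php
// Hypothetical helper: return the path segments of an absolute URL such as
// http://host/a/b, using parse_url() instead of one large regular expression.
function path_segments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return []; // no path at all, e.g. "http://www.abc.com"
    }
    // Drop the single leading slash; a trailing slash leaves an empty last
    // segment, so "/folder/subfolder/" gives ['folder', 'subfolder', ''].
    if ($path[0] === '/') {
        $path = substr($path, 1);
    }
    return explode('/', $path);
}

// Assumed rule from this thread: exactly two non-empty segments, the last of
// which looks like name.ext with a 3- or 4-character extension. This is only
// a naming convention: "next.bit" could just as well be a directory.
function looks_like_file_two_deep(string $url): bool
{
    $segments = path_segments($url);
    if (count($segments) !== 2 || in_array('', $segments, true)) {
        return false;
    }
    return preg_match('/^[^.]+\.[^.]{3,4}$/', $segments[1]) === 1;
}
```

Unlike the single regex, this version does not check that the scheme is http. A parse_url($url, PHP_URL_SCHEME) check can be added if that matters.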
Incidentally, don't top-post if you don't want to bring down the Wrath
of Jerry Stuckle. I've fixed yours.
Colin