Re: Best way to parse a url for validity?

Posted by shimmyshack on 04/27/07 00:00

On Apr 26, 11:52 pm, Rick Stem <ricks...@yahoo.com> wrote:
> I have checkURL(http://globalwarmingawareness2007.org.uk,
> globalwarmingawareness2007.org.uk)
>
> I see almost everyone using regular expressions. But I don't completely
> trust them. Don't know if this code is the best way to find if a user
> entered a valid URL and to avoid SQL injection from the URL.
>
> function checkURL($url, $name)
> {
> global $incorrect_input;
>
> $data=parse_url("http://".$url);
> if(!$data)
> die($incorrect_input[1].$name);
> $host=$data['host'];
> $path=$data['path'];
> $query=$data['query'];
> $fragment=$data['fragment'];
>
> //url does not start with a letter, number
> if (!preg_match('/^[A-Za-z0-9]/i',$host))
> die($incorrect_input[1].$name);
>
> //url does not contain a .
> if (!preg_match('/([A-Za-z0-9]+\.)+/i',$host))
> die($incorrect_input[1].$name);
>
> //url ends with .
> if (preg_match('/\.$/i',$host))
> die($incorrect_input[1].$name);
>
> $array=split('\.',$host);
> $arraysize=count($array);
>
> for ($i = 0; $i < $arraysize; $i++)
> {
> if (preg_match('/[^A-Za-z0-9\-\_]+/i',$array[$i]))
> die($incorrect_input[1].$name);
> }
>
> //Only allow alphanumeric letters, _,-,/
> if($path)
> {
> $len=strlen($path);
> for ($i = 0; $i < $len; $i++)
> {
> $ascii = ord($path[$i]);
> if (($ascii < 65 || $ascii > 90) &&
> ($ascii < 48 || $ascii > 57) &&
> ($ascii < 97 || $ascii > 122))
> if ($ascii != 45 && $ascii != 46 && $ascii != 95 && $ascii != 47)
> die($incorrect_input[1].$name);
> }
> }
>
> //Do not allow more than one consecutive slash for the path
> if (preg_match('/[\/]{2,}/i', $path))
> die($incorrect_input[1].$name);
>
> if($query)
> {
> if (preg_match('/[^A-Za-z0-9\/\-\_\=\&]+/i',$query))
> die($incorrect_input[1].$name);
> if (preg_match('/[\=\&]{2,}/i',$query))
> die($incorrect_input[1].$name);
> }
>
> if($fragment)
> {
> if (preg_match('/[^A-Za-z0-9\-\_\.]+/i',$fragment))
> die($incorrect_input[1].$name);
> }
>
> return($url);
>
> }

No, it isn't the best way. The code above restricts URLs to a small
subset of the valid ones, and it doesn't prevent SQL injection, which
can arrive in a POST payload as well as in GET parameters.
Architecturally it isn't the right way to think about the problem
either, IMHO. It's the easy answer (restrict, restrict, restrict), but
it's no substitute for allowing all valid URLs, even ones carrying
injection attempts, and then filtering the input/output of your scripts.
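
To make "filtering the input/output" concrete, here is a minimal sketch
of one common way to do it: bind the value as a query parameter on the
way in, and escape it on the way out. The in-memory SQLite database and
the `links` table are assumptions made up for this illustration, not
anything from the original code.

```php
<?php
// Assumption: an in-memory SQLite database with a hypothetical "links" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (url TEXT NOT NULL)');

// A URL-like string that carries an injection attempt.
$url = "http://example.com/?q=1'; DROP TABLE links; --";

// Input side: bind the value instead of concatenating it into the SQL text.
$insert = $pdo->prepare('INSERT INTO links (url) VALUES (:url)');
$insert->execute([':url' => $url]);

// Output side: escape the value at the point where it is written into HTML.
$stored = $pdo->query('SELECT url FROM links')->fetchColumn();
$html = '<a href="' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '">link</a>';
echo $html, PHP_EOL;
```

The string is stored and read back unchanged; it is never interpreted
as SQL, and the quote is escaped only when it is placed into markup.
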
This kind of approach can have some validity, though; have you tried
using mod_security?
Doing it within PHP means you will be restricting yourself when it
comes to application adjustments, rewrites and non-ASCII language
support. Besides all this, the approach above doesn't lend itself to
easy adjustment, whereas a simple block of more readable regexps would
do, once you've made the leap of faith (shown by others to be a
worthwhile leap) into the world of regexps, which you can indeed trust
despite their complexity.
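
As a rough illustration of what a "more readable" regexp could look
like, the sketch below uses PCRE's extended (`x`) mode so that each
part of the URL gets its own commented line. The function name
`looksLikeHttpUrl` and the pattern itself are assumptions for this
example; it is deliberately loose and is not a complete RFC 3986
validator (no IP literals, no userinfo, no internationalised hosts).

```php
<?php
// Hypothetical helper: a loose shape check for http(s) URLs.
function looksLikeHttpUrl(string $url): bool
{
    // x = ignore whitespace and allow comments in the pattern, i = case-insensitive
    $pattern = '~\A
        https?://                                # scheme
        (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+   # one or more host labels, each followed by a dot
        [a-z]{2,}                                # top-level domain
        (?::\d{1,5})?                            # optional port
        (?:/[^\s?\#]*)?                          # optional path
        (?:\?[^\s\#]*)?                          # optional query string
        (?:\#\S*)?                               # optional fragment
    \z~ix';

    return preg_match($pattern, $url) === 1;
}

var_dump(looksLikeHttpUrl('http://globalwarmingawareness2007.org.uk'));
var_dump(looksLikeHttpUrl('not a url'));
```

Adjusting the rules then means editing one commented line of the
pattern rather than rewriting a chain of separate checks.
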

 
