Posted by shimmyshack on 04/27/07 00:00
On Apr 26, 11:52 pm, Rick Stem <ricks...@yahoo.com> wrote:
> I have checkURL(http://globalwarmingawareness2007.org.uk,
> globalwarmingawareness2007.org.uk)
>
> I see almost everyone using regular expressions. But I don't completely
> trust them. Don't know if this code is the best way to find if a user
> entered a valid URL and to avoid SQL injection from the URL.
>
> function checkURL($url, $name)
> {
>     global $incorrect_input;
>
>     $data=parse_url("http://".$url);
>     if(!$data)
>         die($incorrect_input[1].$name);
>     // parse_url() omits keys for parts that are absent, so default them
>     $host=isset($data['host'])?$data['host']:'';
>     $path=isset($data['path'])?$data['path']:'';
>     $query=isset($data['query'])?$data['query']:'';
>     $fragment=isset($data['fragment'])?$data['fragment']:'';
>
>     //url does not start with a letter, number
>     if (!preg_match('/^[A-Za-z0-9]/i',$host))
>         die($incorrect_input[1].$name);
>
>     //url does not contain a .
>     if (!preg_match('/([A-Za-z0-9]+\.)+/i',$host))
>         die($incorrect_input[1].$name);
>
>     //url ends with .
>     if (preg_match('/\.$/i',$host))
>         die($incorrect_input[1].$name);
>
>     // explode() replaces split(), which was removed in PHP 7
>     $array=explode('.',$host);
>     $arraysize=count($array);
>
>     for ($i = 0; $i < $arraysize; $i++)
>     {
>         if (preg_match('/[^A-Za-z0-9\-\_]+/i',$array[$i]))
>             die($incorrect_input[1].$name);
>     }
>
>     //Only allow alphanumeric letters, _,-,.,/
>     if($path)
>     {
>         $len=strlen($path);
>         for ($i = 0; $i < $len; $i++)
>         {
>             $ascii = ord($path[$i]);
>             if (($ascii < 65 || $ascii > 90) &&
>                 ($ascii < 48 || $ascii > 57) &&
>                 ($ascii < 97 || $ascii > 122))
>                 if ($ascii != 45 && $ascii != 46 && $ascii != 95 && $ascii != 47)
>                     die($incorrect_input[1].$name);
>         }
>     }
>
>     //Do not allow more than one consecutive slash for the path
>     if (preg_match('/[\/]{2,}/i', $path))
>         die($incorrect_input[1].$name);
>
>     if($query)
>     {
>         if (preg_match('/[^A-Za-z0-9\/\-\_\=\&]+/i',$query))
>             die($incorrect_input[1].$name);
>         if (preg_match('/[\=\&]{2,}/i',$query))
>             die($incorrect_input[1].$name);
>     }
>
>     if($fragment)
>     {
>         if (preg_match('/[^A-Za-z0-9\-\_\.]+/i',$fragment))
>             die($incorrect_input[1].$name);
>     }
>
>     return($url);
>
> }
No, it isn't the best way. The above code restricts the URL to a small
subset of valid URLs, and it doesn't prevent SQL injection, which can
occur inside a POST payload as well as in a GET request.
Architecturally it isn't the right way to think about the problem
either, IMHO. It's the easy answer (restrict, restrict, restrict), but
it's no substitute for allowing all the valid URLs, even ones that carry
injection attempts, and then filtering the input and output of your scripts.
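To make "filtering the input/output" concrete, here is a minimal sketch of what I mean, not a definitive implementation. It assumes PDO with the SQLite driver is available, and the table name `links` and the helpers `saveUrl()` and `renderLink()` are made up for illustration. The URL is stored exactly as submitted through a bound parameter, and escaped only when it is written into HTML.

```php
<?php
// Assumption: an in-memory SQLite database stands in for your real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL)');

// Hypothetical helper: the value is bound as a parameter, never
// concatenated into the SQL string, so quotes in it are just data.
function saveUrl(PDO $pdo, string $url): void
{
    $stmt = $pdo->prepare('INSERT INTO links (url) VALUES (:url)');
    $stmt->execute([':url' => $url]);
}

// Hypothetical helper: escape for the HTML context at output time.
// This does not check the URL scheme; that would be a separate step.
function renderLink(string $url): string
{
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

$hostile = "http://example.com/?q=' OR '1'='1";
saveUrl($pdo, $hostile);

$stored = $pdo->query('SELECT url FROM links')->fetchColumn();
echo renderLink($stored), "\n";
```

The point of the design is that the same code path protects GET and POST values alike, because the protection sits where the data meets SQL or HTML rather than in a URL whitelist.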
This kind of restrictive approach can have some validity, though. Have
you tried using mod_security?
Doing it within PHP means you will be restricting yourself from
application adjustments, rewrites, and non-ASCII language
implementation. Besides all this, the approach above doesn't lend itself
to easy adjustment, whereas a simple block of more readable regular
expressions would do, once you've made the leap of faith (shown by
others to be a worthwhile leap) into the world of regular expressions,
which you can indeed trust despite their complexity.
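As a rough idea of what a "more readable" regular expression could look like, here is a sketch using the PCRE `x` modifier so each part of the pattern carries its own comment. The function name `looksLikeHttpUrl()` and the pattern itself are my own assumptions, not a full RFC 3986 validator: it only accepts http/https, and it rejects IP-address and non-ASCII hosts, so adjust it to taste.

```php
<?php
// Hypothetical helper; the pattern is deliberately simple and is an
// illustration of a commented regex, not a complete URL grammar.
function looksLikeHttpUrl(string $url): bool
{
    $pattern = '~^
        https?://                               # scheme
        (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+  # one or more labels, each followed by a dot
        [a-z]{2,63}                             # top-level domain
        (?::\d{1,5})?                           # optional port
        (?:/[^\s?\#]*)?                         # optional path
        (?:\?[^\s\#]*)?                         # optional query string
        (?:\#\S*)?                              # optional fragment
    \z~ix';

    return preg_match($pattern, $url) === 1;
}

var_dump(looksLikeHttpUrl('http://globalwarmingawareness2007.org.uk'));
var_dump(looksLikeHttpUrl('http://example..com'));
```

Changing what is allowed then means editing one commented line of the pattern, rather than hunting through a chain of separate checks.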