Posted by John Dunlop on 12/11/05 19:16
hendry wrote:
> I am looking for a function that checks that there are no invalid URL
> characters in some input:
That's harder than you make out, even if your only concern is HTTP URLs,
because not all parts allow the same set of characters. For example, a
blanket ban on <?> because it can't occur in paths doesn't take into
account that it can occur in queries and fragments; likewise, a free pass
to digits because they can occur almost anywhere doesn't take into account
that they can't occur first in a scheme.
You'd have to, for starters, slice the URL into its components -
scheme, authority, path, etc. - before performing any checks, unless of
course your check is simply for characters that aren't allowed at all
(spaces, double quotes, and the like). Then, and I'm afraid there's no
getting around this, you'd have to examine the relevant sections of RFCs
3986 and 2616 to find out exactly what can go where.
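To make the slice-then-check idea concrete, here is a rough sketch in Python (my choice of language, not the original poster's). It splits with the generic regular expression from RFC 3986, Appendix B, then tests each piece against its own character set. The names split_uri and bad_components are made up for illustration, the authority is deliberately left unchecked, and the finer path rules are ignored, so treat it as a starting point rather than a validator.

```python
import re

# Generic splitter taken from RFC 3986, Appendix B.
URI_SPLIT = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)

# Character sets from RFC 3986 (sections 2.2, 2.3, 3.3).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT})"

# A scheme must start with a letter; digits are only allowed afterwards.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")
# "?" is not allowed in a path ...
PATH_RE = re.compile(rf"(?:{PCHAR}|/)*")
# ... but it is allowed in queries and fragments.
QUERY_OR_FRAGMENT_RE = re.compile(rf"(?:{PCHAR}|[/?])*")


def split_uri(uri):
    """Slice a URI into its five generic components (None if absent)."""
    m = URI_SPLIT.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def bad_components(uri):
    """Return the names of components containing disallowed characters.

    Hypothetical helper: the authority is not checked here.
    """
    parts = split_uri(uri)
    checks = {
        "scheme": SCHEME_RE,
        "path": PATH_RE,
        "query": QUERY_OR_FRAGMENT_RE,
        "fragment": QUERY_OR_FRAGMENT_RE,
    }
    bad = []
    for name, pattern in checks.items():
        value = parts[name]
        if value is not None and not pattern.fullmatch(value):
            bad.append(name)
    return bad
```

With this, bad_components("http://example.com/a?b=c?d#e?f") comes back empty, because the second and third <?> land in the query and fragment, whereas a leading digit in the scheme or a space in the path gets the offending component reported.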
> http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
(That page is long past its best-by. Burn that bookmark!)
--
Jock