 Posted by Jukka K. Korpela on 06/18/06 06:38 
Jonathan N. Little <lws4art@centralva.net> scripsit: 
 
> Before it is posted, you can use JavaScript to check the input, but 
> that is no guarantee because the user may have JavaScript disabled. 
> You should *always* check user input upon the receiving end at the 
> server-side script. 
 
In this particular case, the check should probably be made _only_ on the
server.
 
As a rule, it is a good idea to consider setting up client-side checking as  
well, after you have designed and implemented the server-side check.  
Immediate checking is good for usability and accessibility: the user gets an  
error message at an early phase where he remembers what he just did and has  
the context and position in front of his eyes, literally or figuratively.
 
However, double checking tends to be expensive in terms of implementation  
and maintenance work. You normally use two quite different programming  
languages, JavaScript for client-side checking and something else for  
server-side checking. This means duplicate coding; only the overall logic is  
the same. Moreover, any changes need to be implemented twice, and this means  
that some day you (or your successor as the maintainer) will forget this.
Testing needs to be duplicated, too - with scripting enabled and scripting  
disabled. Testing the client-side checking is problematic, since there are  
differences between browsers in JavaScript implementations. 
 
So although it is a good idea to _consider_ client-side checking, due
consideration does not very often lead to _implementation_ of client-side
checking. In this particular case, the problem is particularly hard. It is
much harder than people naively think, since they typically assume the ASCII
character repertoire.
 
A well-designed form handler is prepared for anything, including any Unicode
character in input data, since there is no reliable way to prevent users
from inputting any character, intentionally or accidentally. And when your
data is Unicode data, the question "what is a punctuation character?" is far  
from trivial. Presumably e.g. the Arabic triple dot punctuation mark, the
Greek ano teleia and the Tibetan mark tsheg shad are punctuation marks,
right? Would you even include all characters with a General Category value
starting with "P"? And no others? That would be a technically simple
definition, and you could write the check using a suitable function in a
suitable subroutine library - if you use an advanced programming language.
But would that really match what you want it to match?
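
To make the "technically simple definition" concrete, here is a minimal
sketch in Python, assuming its standard unicodedata module plays the role of
the "suitable subroutine library". The function name is_punctuation is made
up for illustration, and the sample characters after the three marks are
arbitrary picks.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """Return True if the character's General Category starts with "P"."""
    return unicodedata.category(ch).startswith("P")


# The three marks mentioned above, looked up by their Unicode character names.
marks = [
    unicodedata.lookup("ARABIC TRIPLE DOT PUNCTUATION MARK"),
    unicodedata.lookup("GREEK ANO TELEIA"),
    unicodedata.lookup("TIBETAN MARK TSHEG SHAD"),
]

# A few ASCII characters for comparison (an arbitrary selection).
samples = marks + list("!-_$+^")

for ch in samples:
    # Print code point, General Category, and the verdict of the simple rule.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {is_punctuation(ch)}")
```

Under this rule the three marks do count as punctuation, but "$" (category
Sc) and "+" (category Sm) do not, since Unicode classifies them as symbols.
That may or may not be what a "no punctuation" requirement was meant to say.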
 
My point here is that you probably want to define a _positive_ rule (which  
characters are allowed) rather than a negative rule (which characters are  
not allowed). The rule should correspond to the repertoire of characters  
that your form handler and other software and data formats involved are  
prepared to handle. There's no point in accepting a character on input if it  
will be lost or distorted in the actual processing, e.g. when saving to a  
database. 
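
A positive rule of this kind might be sketched as follows, again in Python.
Everything specific here is an assumption made for illustration: the storage
encoding (ISO-8859-1), the allowed General Categories, the extra allowed
characters, and the function names. The real repertoire has to come from
what your own handler, database and data formats can actually process.

```python
import unicodedata

# Assumptions for illustration only; replace with your real constraints.
STORAGE_ENCODING = "iso-8859-1"          # what the (hypothetical) database stores
ALLOWED_CATEGORIES = {"Lu", "Ll", "Nd"}  # upper/lowercase letters, decimal digits
ALLOWED_EXTRA = set(" -'.,")             # explicitly listed additional characters


def is_allowed(ch: str) -> bool:
    """Positive rule: accept a character only if it is explicitly permitted."""
    if ch in ALLOWED_EXTRA:
        return True
    if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
        return False
    try:
        # Reject anything the storage encoding cannot represent.
        ch.encode(STORAGE_ENCODING)
    except UnicodeEncodeError:
        return False
    return True


def rejected_characters(text: str) -> list:
    """Return the characters of text that the positive rule does not allow."""
    return [ch for ch in text if not is_allowed(ch)]


for value in ["Jukka K. Korpela", "\u00c4iti", "a<b", "\u03a9mega"]:
    bad = rejected_characters(value)
    print(repr(value), "accepted" if not bad else f"rejected: {bad}")
```

Note that the capital omega is a perfectly good letter, yet it is rejected
here because the assumed storage encoding cannot represent it. That is the
point of tying the rule to what the processing chain can handle.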
 
--  
Jukka K. Korpela ("Yucca") 
http://www.cs.tut.fi/~jkorpela/
 
  