 Posted by Gleep on 06/14/85 11:57 
On Mon, 04 Sep 2006 11:24:04 +0200, Wim Cossement <wcosseme@nospam.bcol.be> wrote: 
 
>Hello, 
> 
>I was wondering if there are a few good pages and/or examples on how to  
>process form data correctly for putting it in a MySQL DB. 
> 
>Since I'm not used to using PHP a lot, I already found out that  
>addslashes() can be used to escape some characters, but I'm having some  
>more problems with, for instance, ä, å and µ (since the text is scientific).
>Now some people also throw in htmlspecialchars() to convert those to  
>HTML entities, but some nest htmlspecialchars() in addslashes() and  
>others do the opposite. 
> 
>Is there a good and error proof way of ensuring that what one puts in a  
>textarea gets stored and can be retrieved safe and sound? 
> 
>Thanks in advance, 
> 
>Wimmy 
 
 
 
I found some user comments in the PHP manual under htmlspecialchars() that I
think might help.

Also, if you need to save special characters, I suggest turning off magic quotes, which suppresses
the backslashes it normally adds, with set_magic_quotes_runtime(0);
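
To make the difference between the two functions from the question concrete, here is a minimal sketch. The sample string and variable names are made up for illustration, and it assumes the input is UTF-8. As far as I can tell, addslashes() only backslash-escapes quotes, backslashes and NUL bytes, and htmlspecialchars() only rewrites &, <, > and quotes, so neither one touches ä, å or µ.

```php
<?php
// Hypothetical textarea input; assumed to be UTF-8.
$input = "O'Reilly & ä <b>";

// addslashes() backslash-escapes ', ", \ and NUL; nothing else changes.
$slashed = addslashes($input);

// htmlspecialchars() rewrites &, <, > and (with ENT_QUOTES) both quote types.
// It is meant for output into an HTML page, not for storage.
$html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $slashed, "\n";
echo $html, "\n";
```

Since the two functions solve different problems, nesting them in either order before storage tends to leave stray backslashes or entities in the database; storing the raw text and calling htmlspecialchars() only when printing is probably the simpler route.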
 
After inspecting the non-native encoding problem, I noticed that if, for example, the encoding is
Cyrillic and I write Latin characters that are not part of the encoding (æ, the ae-ligature, for
example), the browser will send the named entity, &aelig; in this case.
Therefore, the only way I see to display multilingual text that is encoded with entities is:
<?php 
   echo str_replace('&amp;', '&', htmlspecialchars($txt)); 
?> 
A regex that restores only numeric entities would skip the Latin-1 named entities.
 
 
 
 
 
 
 
A sample function, if anybody wants to turn HTML entities (and special characters) back into plain
characters (e.g. "&egrave;", "&lt;", etc.):
function html2specialchars($str){ 
    // Flip the table so entities become keys and plain characters become values.
    $trans_table = array_flip(get_html_translation_table(HTML_ENTITIES)); 
    return strtr($str, $trans_table); 
} 
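
A quick usage sketch for the function above, assuming UTF-8 strings. The explicit ENT_QUOTES and 'UTF-8' arguments are my addition, so that the result does not depend on the defaults of a particular PHP version, and the sample string is made up.

```php
<?php
function html2specialchars($str) {
    // Flip the entity table so it maps "&egrave;" => "è", "&lt;" => "<", and so on.
    $trans_table = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8'));
    return strtr($str, $trans_table);
}

// Hypothetical sample text containing named entities.
$decoded = html2specialchars('caf&eacute; &lt;b&gt; &amp; &egrave;');
echo $decoded, "\n";
```

PHP's built-in html_entity_decode() does much the same job, so the hand-rolled function is mostly useful when you want to inspect or trim the translation table first.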
 
 
 
 
 
 
Quite often, on HTML pages that are not encoded as UTF-8, when people write in a non-native encoding,
some browsers (Internet Explorer for sure) will send the characters from the other charset as HTML
entities, such as &#1073; for the small Russian 'b'.
htmlspecialchars() will mangle this entity into &amp;#1073;, since it changes every & to &amp;.
What I usually do is either turn &amp; back into & so the correct characters appear in the
output, or use a regex to turn the escaped numeric entities back into their original form:
<?php 
   // treat this as pseudo-code, it hasn't been tested... 
   // $source is assumed to be the output of htmlspecialchars().
   $result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source); 
?> 
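
The two steps described above (escape everything, then restore only the numeric entities) can be sketched end to end like this. The sample input is made up, and UTF-8 is assumed.

```php
<?php
// Hypothetical form input in which the browser already sent a numeric entity.
$source = 'Letter &#1073; & <b>bold</b>';

// Step 1: escape everything, which also turns "&#1073;" into "&amp;#1073;".
$escaped = htmlspecialchars($source, ENT_QUOTES, 'UTF-8');

// Step 2: restore numeric entities only (decimal or hex); a bare "&amp;" stays escaped.
$result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $escaped);

echo $result, "\n";
```

If all existing entities should survive, htmlspecialchars() also takes a fourth argument, double_encode; passing false leaves entities that are already in the string alone, which may be simpler than the regex.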
 
 
 
 
 
Why &#039;? The HTML and XML DTDs proposed &apos; for this. 
See http://www.w3.org/TR/html/dtds.html#a_dtd_Special_characters 
So better use this: 
$text = htmlspecialchars($text, ENT_QUOTES); 
$text = preg_replace('/&#0*39;/', '&apos;', $text);
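
A small worked example of that replacement, with a made-up input string. It assumes ENT_QUOTES is passed without a document-type flag, in which case htmlspecialchars() emits the numeric form for a single quote.

```php
<?php
$text = "it's";

// With ENT_QUOTES alone, the single quote becomes the numeric entity "&#039;".
$text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Rewrite "&#039;" (or "&#39;") to the named entity.
$text = preg_replace('/&#0*39;/', '&apos;', $text);

echo $text, "\n";
```

One caveat: &apos; is defined for XML and XHTML but is not among the HTML 4.01 named entities, so the numeric form may be the safer choice for pages served as plain HTML.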
 
  