Re: How to upload form data containing special characters correctly?

Posted by Gleep on 12/16/85 11:57

On Mon, 04 Sep 2006 11:24:04 +0200, Wim Cossement <wcosseme@nospam.bcol.be> wrote:

>Hello,
>
>I was wondering if there are a few good pages and/or examples on how to
>process form data correctly for putting it in a MySQL DB.
>
>Since I'm not used to using PHP a lot: I already found out that
>addslashes() can be used to escape some characters, but I'm having some
>more problems with, for instance, ä, å and µ (since the text is scientific).
>Now some people also throw in htmlspecialchars() to convert those to
>HTML entities, but some nest htmlspecialchars() in addslashes() and
>others do the opposite.
>
>Is there a good and error proof way of ensuring that what one puts in a
>textarea gets stored and can be retrieved safe and sound?
>
>Thanks in advance,
>
>Wimmy



I found these user comments in the PHP manual under htmlspecialchars();
I think they might help.

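Also, if you need to save special characters, I suggest turning off magic quotes, which suppresses
the backslashes PHP normally adds. Note that set_magic_quotes_runtime(0) only covers data read at
runtime from files and databases; the backslashes added to form (GET/POST/cookie) data come from
magic_quotes_gpc, which has to be switched off in php.ini (or a per-directory setting such as .htaccess).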
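If you cannot change those settings (on shared hosting, say), a common workaround is to strip the added backslashes yourself before using the input. A minimal sketch; undo_magic_quotes() is a hypothetical helper name, and the function_exists() check is there because magic quotes were removed in PHP 5.4 and get_magic_quotes_gpc() itself was removed in PHP 8.0:

```php
<?php
// Hypothetical helper: strip the backslashes magic_quotes_gpc added to
// form input. Recurses into nested arrays such as name="tags[]" fields.
function undo_magic_quotes(array $data): array
{
    foreach ($data as $key => $value) {
        $data[$key] = is_array($value) ? undo_magic_quotes($value) : stripslashes($value);
    }
    return $data;
}

// Only undo the escaping if magic quotes are actually active; the function
// no longer exists in PHP 8, so check for it before calling it.
if (function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc()) {
    $_POST = undo_magic_quotes($_POST);
}
```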

After looking into the non-native encoding problem, I noticed that if, for example, the page encoding is
Cyrillic and I type Latin characters that are not part of that encoding (æ, the ae ligature, for example),
the browser sends the named entity instead, &aelig; in this case.
Therefore, the only way I see to display multilingual text that arrives encoded as entities is:
<?php
// $txt holds the submitted text; turning &amp; back into & lets the browser render its entities
echo str_replace('&amp;', '&', htmlspecialchars($txt));
?>
A regex that only restores numeric entities (like the one in the comment below) will miss Latin-1 named entities such as &aelig;.
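Another option, assuming PHP 5.2.3 or later: htmlspecialchars() takes a fourth parameter, double_encode. Setting it to false leaves existing entities untouched, named ones like &aelig; as well as numeric ones like &#1073;, while bare & characters and markup are still escaped. Which entity names count as valid depends on the document-type flag (HTML 4.01 by default). A minimal sketch:

```php
<?php
// Requires PHP 5.2.3+ for the fourth (double_encode) parameter.
$txt = 'Tom & Jerry: &aelig; &#1073; <i>';

// With double_encode = false, valid entities already in $txt are kept,
// while bare &, <, > and quotes are still escaped.
$escaped = htmlspecialchars($txt, ENT_QUOTES, 'UTF-8', false);

echo $escaped, "\n"; // Tom &amp; Jerry: &aelig; &#1073; &lt;i&gt;
```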







A sample function, for anybody who wants to turn HTML entities (and special characters) back into plain characters
(e.g. "&egrave;" into "è", "&lt;" into "<"):
<?php
// Flip the character => entity table so each entity maps back to its character.
function html2specialchars($str){
$trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
return strtr($str, $trans_table);
}
?>
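PHP also has a built-in for this, html_entity_decode() (PHP 4.3.0 and later). Unlike the strtr() table, which only contains named entities, it also decodes numeric entities such as &#1073;. A short sketch, assuming you want UTF-8 output:

```php
<?php
// Decode named (&egrave;, &lt;) and numeric (&#1073;) entities in one call.
$input = '&egrave; &lt;b&gt; &#1073;';
$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // è <b> б
```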






Quite often, on HTML pages that are not encoded as UTF-8, when people type characters outside the page's encoding,
some browsers (Internet Explorer, for sure) send those characters as numeric HTML entities,
such as &#1073; for the small Russian letter 'б'.
htmlspecialchars() then breaks this entity, turning it into &amp;#1073;, since it changes every & to &amp;.
What I usually do is either turn &amp; back into & so the correct characters appear in the
output, or use a regex to turn the escaped numeric entities back into their original form:
<?php
// $source is the output of htmlspecialchars(); restore the numeric entities
// (decimal &#1073; or hex &#x431;) that it escaped to &amp;#...;
$result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source);
?>
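The escape-then-restore approach can be wrapped into a single function. A sketch, assuming UTF-8 input; escape_keep_numeric_entities() is a hypothetical name:

```php
<?php
// Hypothetical helper: escape user text for HTML output, but keep any
// numeric entities (&#1073; or &#x431;) the browser sent intact.
function escape_keep_numeric_entities(string $text): string
{
    // Step 1: escape everything, which also turns &#1073; into &amp;#1073;.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Step 2: undo the escaping for numeric entities only.
    return preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $escaped);
}

echo escape_keep_numeric_entities('<b>&#1073;</b> & &#x431;'), "\n";
// prints: &lt;b&gt;&#1073;&lt;/b&gt; &amp; &#x431;
```

A bare & that is not part of a numeric entity stays escaped as &amp;, so only the entities themselves are restored.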





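Why &#39;? XML 1.0 and the XHTML 1.0 DTDs define &apos; for this (HTML 4 itself does not, which is why
the XHTML 1.0 HTML-compatibility guidelines still recommend &#39; for pages served as HTML).
See http://www.w3.org/TR/html/dtds.html#a_dtd_Special_characters
So for XML or XHTML output, better use this (htmlspecialchars() emits the quote as &#039;, hence the 0* in the regex):
<?php
$text = htmlspecialchars($text, ENT_QUOTES);
$text = preg_replace('/&#0*39;/', '&apos;', $text);
?>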
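In PHP 5.4 and later the regex is no longer needed: adding a document-type flag such as ENT_XML1 (or ENT_XHTML) makes htmlspecialchars() emit &apos; directly, while the default ENT_HTML401 keeps &#039;. A short comparison:

```php
<?php
// Requires PHP 5.4+ for the ENT_HTML401 / ENT_XML1 document-type flags.
$text = "It's <ok>";

// HTML 4.01 has no &apos;, so PHP falls back to a numeric reference.
$forHtml = htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// XML defines &apos;, so PHP uses it directly.
$forXml = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $forHtml, "\n"; // It&#039;s &lt;ok&gt;
echo $forXml, "\n";  // It&apos;s &lt;ok&gt;
```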
