Posted by Mel on 07/11/06 14:19
On 2006-07-11 21:52:53 +1000, Jerry Stuckle <jstucklex@attglobal.net> said:
> Taras_96 wrote:
>> Hi all,
>>
>> I was hoping to get some clarification on a couple of questions I have:
>>
>> 1) When should htmlspecialchars be used? As a general rule, should
>> it be used for text that may contain special characters and is going
>> to be rendered in the browser (i.e. text that isn't in tags)? I've got a
>> JavaScript onclick handler whose code includes an ampersand, and the
>> HTML validator complains. I don't know if I should escape the
>> ampersand, or even if it's possible (seeing that the text is inside an
>> HTML attribute).
>>
>
> Well, I haven't looked at the code, but I suspect htmlspecialchars()
> would be faster, since it converts fewer characters and has fewer
> options.
>
> The HTML validator on w3.org is decent, but it doesn't handle
> javascript very well. I just ignore the errors in javascript; for
> instance, something like:
>
> j=4&i;
>
> The "&i" is not a valid html entity - but it's valid javascript code.
> And this javascript wouldn't work:
>
> j = 4&amp;i;
No, it wouldn't, but valid XHTML _requires_ you to wrap the
embedded JavaScript in the appropriate CDATA marker. The character
'&' is reserved by the markup just like '>' and '<'. Not adhering to
the outlined standards simply encourages bad markup and makes
cross-browser compatibility more difficult. It's a big stretch to
equate cross-browser issues with unencoded ampersands, but they're not
that difficult to deal with. JavaScript strings can also be
entity-encoded with a few replace() calls.
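
As a rough sketch of the two cases (attribute versus script element), and assuming the page is generated from PHP, something like the following keeps the validator quiet. The variable names and the link markup are made up for illustration; only htmlspecialchars() itself is a real PHP function.

```php
<?php
// Hypothetical handler code, borrowed from the j=4&i example above.
$js = 'j = 4 & i;';

// Inside an attribute value the markup has to carry &amp;.
// The browser decodes it back to & before the script engine sees it.
$attr = htmlspecialchars($js, ENT_QUOTES, 'UTF-8');
echo '<a href="#" onclick="' . $attr . '">run</a>', "\n";

// Inside an XHTML <script> element, a CDATA section lets the raw & stand.
// The // prefixes hide the CDATA markers from the script engine.
$script = "<script type=\"text/javascript\">\n//<![CDATA[\n"
        . $js
        . "\n//]]>\n</script>\n";
echo $script;
```

The attribute case needs escaping because attribute values are parsed as markup; the CDATA wrapper only applies to the contents of a script element.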
>
>
>> Why would you ever use htmlentities as opposed to htmlspecialchars? The
>> only reason I can think of is if your page's charset doesn't support
>> the special character you're trying to render (for example, the euro
>> using Latin1), but then why wouldn't you just change the page's charset
>> to UTF-8 (unless your editor can't save in UTF-8, which might
>> indicate it's time to get another editor). The comment on the PHP manual
>> entry for htmlentities, 'Please, don't use htmlentities to avoid XSS!
>> Htmlspecialchars is enough!' seems to suggest that the uses for
>> htmlentities are limited, since it needn't be used to avoid XSS.
>>
>
> Just changing the page charset doesn't change what PHP uses. You can
> pass a charset to either function, but if you need more than the five
> chars handled by htmlspecialchars() you need to use htmlentities().
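
To make the difference concrete, here is a small sketch run on the same input through both functions. The sample string is invented for illustration, and it is written with explicit UTF-8 bytes so the result does not depend on how the source file is saved.

```php
<?php
// Sample input (hypothetical): "café <b> & co" with é as UTF-8 bytes.
$text = "caf\xC3\xA9 <b> & co";

// htmlspecialchars() touches only the markup-significant characters.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also replaces every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";   // the é is left as raw UTF-8 bytes
echo $entities, "\n";  // the é becomes &eacute;
```

Both results are equally safe against injected markup; the second form only matters when the page's charset cannot represent the character directly.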
>
> And the notes are comments - from users, not the PHP developers. I
> give it some credence, but not as much as the "official" word from the
> PHP developers. And if you look through them enough, you'll find
> errors and other people who get in and correct the errors. Not that
> much different than what you find here on usenet.
>
>> 2) A comment in the PHP manual entry for htmlentities states that their
>> function can be used to 'replace any characters in a string that could
>> be 'dangerous' to put in an HTML/XML file with their numeric entities
>> (e.g. &#233; for [e acute])'. Why would it be dangerous!?
>>
>
> Don't know here, but I suspect browsers may act differently in
> different languages. But I have enough trouble with my native
> language, so I really haven't worried about it. But again that's a
> user comment.
>
>> 3) What are some typical uses of specifying HTTP input/output character
>> encoding? If it is used to convert output, why wouldn't you just change
>> the output page's char encoding? If it's used to convert input from say
>> UTF-8 to Latin1, couldn't you just use a function to do this?
>>
>
> I use it anytime I'm displaying data input by the user, read from a
> database, etc. You never know when the data might contain a '<', a
> '"', etc.
>
> Changing the char encoding for the page doesn't convert any characters.
> All it does is tell the browser how to handle the characters. It's
> up to you, the programmer, to ensure the character encoding you use
> matches that of the page.
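
A minimal sketch of that point, assuming the data arrives as Latin1 and the page is served as UTF-8. The input bytes are made up for illustration, and iconv() is used here only as one way to do the conversion (it needs PHP's iconv extension).

```php
<?php
// Hypothetical input: "café" stored as ISO-8859-1 (é is the single byte 0xE9).
$latin1 = "caf\xE9";

// The bytes have to be converted explicitly; nothing does this for you.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// The header only declares the encoding to the browser.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Escape with the same charset the page declares.
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), "\n";
```

Sending the original Latin1 bytes under the UTF-8 header would leave the browser with an invalid byte sequence where the é should be.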
>
>
>> That's about it!
>>
>> Thanks in advance
>>
>> Taras