Posted by Yohan N. Leder on 05/19/06 13:55
In article <eMcbg.1906$Ph2.1659@reader1.news.jippii.net>,
jkorpela@cs.tut.fi says...
> Yohan N. Leder <ynleder@nspark.org> scripsit:
>
> > Hoping it will match the alt.html group, because already tried in
> > comp.lang.perl.misc but it seems to be more related to browser and
> > multipart/form-data posting.
>
> Why didn't you summarize which answers you got there?
Better than a summary, which would be inaccurate by nature, here is the
URL: <http://minilien.fr/a0juc6>
> y do u use silly abbrs? It saves a few seconds of your time and wastes other
> people's time when they try to decipher your private codes. pb = problem
> ain't no std abbr.
Sorry, but sometimes you have no time and have to take some
shortcuts... However, "pb" is a well-known abbreviation in French, and
sorry again for not having translated what is natural for me; I couldn't
know that it isn't for a native English speaker.
> > in this test script called form2dump.pl :
>
> Your script name is irrelevant. What would matter is an absolute URL that
> would let us see the problem in action.
form2dump means "a form submission for which I'm observing what is
received by the server". Also, in a first version of this test script I
dumped the multipart/form-data content to a file on the server...
Later, I rewrote this part to show it on screen (i.e. in the client area
of the browser), for convenience and because this multipart/form-data
doesn't contain any (binary) file upload.
> > # Script written to solve the bug explained below :
>
> Huh? How is the script supposed to solve "the bug"? And why the singular,
> when you clearly have two problems?
No, I have only one problem: "a euro sign in any form field corrupts the
beginning of the sent multipart/form-data (in detail: the first lines,
containing the boundary and the declaration of the first field, are
truncated)"
> > # PB : ? sign in any form field corrupt beginning of
> > multipart/form-data
>
> Which "? sign". Your Usenet message does not declare its character encoding,
> thereby implying ASCII, so you cannot insert the euro sign there, as you
> probably tried (guessing from the Subject line).
>
Sorry about the character encoding, but I'm using the newsreader called
"MicroPlanet Gravity 2.5" and I can't find any option about character
encoding in this release. Following your message, I searched a little on
the web, and it seems that the only Gravity-like program which provides
something about character encoding is an unofficial release called
"Super Gravity":
<http://www.usenet-fr.net/fur/minis-faqs/accents.html>. I'll take a look
at it.
However, the sign I was talking about was the euro sign, which appeared
as a question mark in your newsreader.
> The real problem is that there is no specification of what happens when the
> user types in a character that cannot be represented in the character
> encoding used for the form, which is the same as the encoding of the page
> (note that browsers ignore accept-charset attributes).
Nevertheless, when I submit a form with "accept-charset='utf-8'" in an
HTML page whose content-type indicates "charset=iso-8859-1", the field
data are correctly transmitted in UTF-8.
> When the encoding is
> iso-8859-1 and the user types in the euro sign, the browser might (for
> example) ignore it or - strangely, but perhaps usefully in some cases -
> represent it as an entity reference € or some other way. Anyway, it is
> an error condition with no prescribed error processing.
On the machine where I ran my own test, that's not what I saw. I don't
know the reason why, but here is my experience: if the HTML page
containing the form has a content-type indicating "iso-8859-1", and if
there is no checkbox in the form, then when I type the euro sign on an
AZERTY keyboard using the 'AltGr' key in combination with the 'e' key,
it appears correctly in the form field and is correctly transmitted to
the server (the euro sign is present on arrival, in STDIN, using my test
script).
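
One possible explanation, given here as an assumption and not tested on this setup: strict ISO-8859-1 has no euro sign, but browsers commonly treat a page labelled iso-8859-1 as windows-1252, where the euro sign is the single byte 0x80. The short Python sketch below (Python is used only for illustration; the helper name is made up) shows which bytes the euro sign becomes in each encoding.

```python
# Illustration only: how the euro sign (U+20AC) maps to bytes in three encodings.
EURO = "\u20ac"

def encode_or_none(text, encoding):
    """Return the encoded bytes, or None when the encoding lacks the character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

latin1_bytes = encode_or_none(EURO, "iso-8859-1")  # strict Latin-1: no euro sign
cp1252_bytes = encode_or_none(EURO, "cp1252")      # windows-1252: one byte
utf8_bytes = encode_or_none(EURO, "utf-8")         # UTF-8: three bytes

print("iso-8859-1  :", latin1_bytes)
print("windows-1252:", cp1252_bytes)
print("utf-8       :", utf8_bytes)
```

If the byte seen in STDIN is 0x80, the browser most likely sent windows-1252; if it is the sequence E2 82 AC, it sent UTF-8.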
However, you said you would prefer something online for testing. So,
I've done it and here it is: <>.
Also, I'm restating the problem for which I'm searching for a solution:
"a euro sign in any form field corrupts the beginning of the sent
multipart/form-data (in detail: the first lines, containing the boundary
and the declaration of the first field, are truncated)".
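
To make "the first lines containing the boundary and the declaration of the first field" concrete, here is a minimal Python sketch of what an intact body looks like and how a receiving script could check whether its beginning is missing. The boundary value, the field names, the helper names and the choice of windows-1252 are all assumptions for illustration; a real browser picks its own boundary.

```python
# Hypothetical boundary; real browsers generate their own value.
BOUNDARY = "----example-boundary-1234"

def build_multipart(fields, boundary, charset):
    """Build a multipart/form-data body (text fields only) as bytes."""
    chunks = []
    for name, value in fields:
        # Each part starts with the delimiter line, then the field declaration.
        chunks.append(("--" + boundary + "\r\n").encode("ascii"))
        header = 'Content-Disposition: form-data; name="%s"\r\n\r\n' % name
        chunks.append(header.encode("ascii"))
        chunks.append(value.encode(charset) + b"\r\n")
    # The closing delimiter carries two extra dashes.
    chunks.append(("--" + boundary + "--\r\n").encode("ascii"))
    return b"".join(chunks)

def starts_with_boundary(body, boundary):
    """True when the body begins with the opening delimiter line."""
    return body.startswith(b"--" + boundary.encode("ascii") + b"\r\n")

fields = [("field1", "abc"), ("field2", "10 \u20ac")]
intact = build_multipart(fields, BOUNDARY, "cp1252")
truncated = intact[40:]  # simulate a body whose first lines were lost

print(starts_with_boundary(intact, BOUNDARY))
print(starts_with_boundary(truncated, BOUNDARY))
```

A check like this only detects the truncation; it does not explain where the missing bytes went.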
And to finish: of course, I could use UTF-8, but there are several
reasons which hold me back (some being about Perl, because I found the
problem I'm talking about while writing a Perl script):
- Some target servers are running Perl 5.00503 under FreeBSD, and
there's nothing about UTF-8 encoding/decoding in the stock modules of
this release.
- On those old servers, only stock Perl modules are allowed, even in a
personal /cgi-bin directory. I'm aware it's a big constraint, but I have
no way to change that decision: we have to live with it!
- HTML forms generated by the Perl scripts must be able to handle
everything that is usually typed in English and French, including the
euro sign.
- These Perl scripts contain a configurable part where different people
(some of them not developers) will be able to change some strings
(stored as constants using the Perl syntax: "use constant NAMEOFCONSTANT
=> "The string people can write, rewrite and manage by themselves as if
it was a configuration feature";"), and we can't ask them to type
character entities rather than special or accented characters when there
are some (e.g. à, etc). So, if we chose to use UTF-8, we would, at the
same time, have to find a way (without any external module) to encode
these "configurable strings" before displaying them in any browser (i.e.
write our own function).
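
As a rough idea of what such a hand-written function involves, here is a minimal sketch that converts windows-1252/Latin-1 bytes to UTF-8 using only bit arithmetic and no encoding library. It is written in Python purely for illustration; the function names and the partial 0x80-0x9F table are assumptions, and the same shifts and masks would have to be ported to Perl 5.00503 by hand.

```python
# Partial table (assumption): only the windows-1252 bytes in 0x80-0x9F that
# are likely in English/French text. All other bytes are treated as Latin-1.
CP1252_EXTRA = {
    0x80: 0x20AC,  # euro sign
    0x8C: 0x0152,  # capital OE ligature
    0x9C: 0x0153,  # small oe ligature
    0x9F: 0x0178,  # capital Y with diaeresis
}

def code_point_to_utf8(cp):
    """Encode one code point below U+10000 as UTF-8 bytes, by hand."""
    if cp < 0x80:
        return bytes([cp])  # 1 byte: plain ASCII
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])  # 2 bytes
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])  # 3 bytes

def latin1_to_utf8(raw):
    """Convert windows-1252/Latin-1 bytes to UTF-8 bytes."""
    out = bytearray()
    for byte in raw:
        # Latin-1 bytes equal their code points; the table patches the rest.
        out += code_point_to_utf8(CP1252_EXTRA.get(byte, byte))
    return bytes(out)

sample = b"\xe0 10 \x80"  # a-grave, space, "10", space, euro in windows-1252
converted = latin1_to_utf8(sample)
print(converted)
```

The configurable strings could then stay in the editors' usual 8-bit encoding and be converted just before output.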
Hoping I've been clearer this time ;-)