Re: Tidy using unicode does not validate

Posted by groups2 on 03/16/07 19:53

On Mar 16, 8:04 am, "Andy Dingley" <ding...@codesmiths.com> wrote:
> On 15 Mar, 18:45, grou...@reenie.org wrote:
>
> > as I said before, http://reenie.org/test/unicode.htm
> > is http://reenie.org/test/ascii.htm cleaned by tidy with utf
> > encoding.
>
> Ok, I think I understand what you've done now.
>
> http://reenie.org/test/unicode.htm is broken. It appears to have an
> ISO-8859-1 character in the file being served as a UTF-8 document.
>
> Tidy didn't make this file. AFAIK, the Tidy you're using takes its
> input from Firefox and doesn't have any "output to file" feature. You
> must have taken its output from the clipboard, pasted it into your
> choice of editor and saved it from there. At this point, I can only
> assume that the file was a correctly-encoded ISO-8859 file.
>
> The web server then gets to it and serves it up, with UTF-8 encoding
> headers or embedded metas in it. Things go wrong _at_this_point_. File
> is good (but not UTF-8), web document is bad (mis-labelled and thus
> unreadable).
>
> I suggest you try the "Tidy cleanup" process again, but this time make
> sure that your editor's save setting is utf-8. jEdit is a well-behaved
> editor here, some others (e.g. Eclipse) aren't. Watch out for Windows
> editors, as they often say "Unicode" and mean UTF-16, which isn't
> what's wanted at all. Look for a specific UTF-8 option.

Right Right Right.
I just figured out the same thing before I saw your message.
My editor is UltraEdit.
http://reenie.org/test/unicode.htm is saved as DOS. It shows question
marks for the characters in Firefox, and doesn't validate with W3.

http://reenie.org/test/unicode3.htm is exactly the same except it is
saved as DOS-UTF8. It shows the characters correctly and validates.
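
To convince myself why the two saves behave differently, here is a minimal Python sketch. It assumes the plain "DOS" save wrote single-byte ISO-8859-1 (as Andy guessed) and uses "café" as a stand-in for the real page content, which may differ. The helper name decode_as_utf8 is just made up for the illustration.

```python
# Stand-in text; the actual characters on the page are an assumption here.
text = "café"

# Assumed result of the plain "DOS" save: one byte per character (ISO-8859-1).
latin1_bytes = text.encode("iso-8859-1")
# Result of a UTF-8 save: "é" becomes a two-byte sequence.
utf8_bytes = text.encode("utf-8")


def decode_as_utf8(data: bytes) -> str:
    """Decode the way a client told 'charset=utf-8' would,
    substituting U+FFFD for byte sequences that are not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


mislabelled = decode_as_utf8(latin1_bytes)  # "é" comes out as U+FFFD
correct = decode_as_utf8(utf8_bytes)        # round-trips cleanly

print(repr(mislabelled))
print(repr(correct))
```

The lone 0xE9 byte is not a valid UTF-8 sequence, which would explain both the question marks in Firefox and the validator complaint.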

Now when I validate it, W3 gives me a warning:

Byte-Order Mark found in UTF-8 File.

The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known
to cause problems for some text editors and older browsers. You may
want to consider avoiding its use until it is better supported.

Should I be worried about this?
It seems that the only way to avoid this problem is to leave the file in
DOS and tidy the file in ASCII. Is this correct?
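
For reference, the UTF-8 BOM the validator mentions is the three bytes EF BB BF at the very start of the file. Below is a rough Python sketch for checking for it and dropping it while leaving the rest of the file as UTF-8. The function names are my own, and it assumes the file is small enough to read into memory in one go.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other bytes are untouched."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(path: str) -> bool:
    """Rewrite the file at `path` without its BOM. Returns True if a BOM was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

Something like strip_bom_from_file("unicode3.htm") on a local copy would be the usage; I would keep a backup of the original first.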

I am about to edit quite a few pages, so I want to do whatever will
be most common and most recommended in the future. Am I safe in
assuming that will be UTF-8?

 
