Re: shopping for an html editor

Posted by Gérard Talbot on 04/11/07 18:49

Jukka K. Korpela wrote :
> Scripsit Gérard Talbot:
>
>> - KompoZer 0.77 markup cleaner will fix nested lists, remove trailing
>> <br> that WYSIWYG HTML editors often leave, remove align attributes in
>> empty table cells, remove empty blocks (like <p></p>). HTML Tidy will
>> do all this, except maybe fix nested lists
>
> Thanks for the heads-up.
>
> Such tools should _not_ be used without great discretion.
>
> Apart from fixing nested lists, which is a vague expression and could
> mean just about anything,

Every previous version of Mozilla Composer created nested lists in an
invalid manner: the inner <ul> was placed directly inside the outer <ul>
instead of inside an <li>, like this:

<ul>
<li>first item at first level</li>
<ul>
<li>first item of second level</li>
<li>second item of second level</li>
</ul>
<li>second item at first level</li>
</ul>
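
To make "improperly nested" concrete, here is a minimal Python sketch that flags any <ul> or <ol> opened directly inside another <ul> or <ol>. It is not how KompoZer's markup cleaner works internally; the names NestedListChecker and find_misnested_lists are hypothetical, and it only tracks list elements.

```python
from html.parser import HTMLParser

LIST_TAGS = {"ul", "ol"}


class NestedListChecker(HTMLParser):
    """Flags a <ul>/<ol> whose direct parent is another <ul>/<ol>."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open ul/ol/li elements
        self.problems = []  # (line, column) of each misplaced list

    def handle_starttag(self, tag, attrs):
        # A list opened while another list (not an <li>) is on top is invalid.
        if tag in LIST_TAGS and self.stack and self.stack[-1] in LIST_TAGS:
            self.problems.append(self.getpos())
        if tag in LIST_TAGS or tag == "li":
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_misnested_lists(html):
    checker = NestedListChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


composer_output = """<ul>
<li>first item at first level</li>
<ul>
<li>first item of second level</li>
</ul>
<li>second item at first level</li>
</ul>"""

print(find_misnested_lists(composer_output))
```

A valid version would keep the inner <ul> inside the first <li>, before its closing </li>; the checker reports nothing for that form.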

> all of these operations change the document
> and cause largely unpredictable effects on its visual appearance.
>
> For example, authors and editors often insert consecutive <br> tags to
> produce some vertical spacing. That's a wrong approach, but so is the
> operation of blindly removing them. The author wanted to create some
> spacing, so the author should decide what to do. Maybe the spacing
> _could_ be removed. Maybe some simple CSS code should be added while
> removing the tags.
>

Excellent suggestion. I know you and I have discussed this before in
this newsgroup: an arbitrary number of consecutive <br> is better
replaced with a sensible CSS margin-bottom declaration.
Composer 2 could have a feature like this: convert consecutive <br> into
a corresponding margin-bottom on the previous block-level element.
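
A rough Python sketch of what such a conversion could look like, under loud assumptions: it only handles a plain <p>...</p> (no attributes) followed by two or more <br>, and the ratio of 1em per removed <br> is an arbitrary guess, not something Composer or Tidy does. The function name brs_to_margin is hypothetical. As said above, the author should still review the result.

```python
import re

# A closed <p>...</p> followed by two or more <br> tags (<br>, <br/> or <br />).
# The body is not allowed to run past its own </p>.
BR_RUN = re.compile(
    r"<p>(?P<body>(?:(?!</p>).)*)</p>(?P<brs>(?:\s*<br\s*/?>){2,})",
    re.IGNORECASE | re.DOTALL,
)


def brs_to_margin(html, em_per_br=1.0):
    """Replace a run of <br> after a paragraph with a margin-bottom on it."""

    def replace(match):
        count = len(re.findall(r"<br", match.group("brs"), re.IGNORECASE))
        margin = count * em_per_br  # assumed ratio, adjust to taste
        body = match.group("body")
        return f'<p style="margin-bottom: {margin:g}em">{body}</p>'

    return BR_RUN.sub(replace, html)


print(brs_to_margin("<p>a</p><p>b</p><br><br><br><p>c</p>"))
```

A single <br> is left alone, since only runs of two or more are treated as spacing.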

Same thing with "drop-empty-paras: specifies if Tidy should discard
empty paragraphs (<p></p>)."
http://tidy.sourceforge.net/docs/quickref.html#drop-empty-paras

> Even "cleaning" <td align="right"></td> to <td></td> is wrong if you
> don't know what will happen,

Maybe better HTML Tidy documentation, such as a FAQ or "how to use"
document, should be developed so that users can see what each setting
does.

> and a simple program surely cannot know
> that.

People shouldn't blindly trust an application at first: they should
back up their work and then experiment.


> Maybe the attribute is there for no good reason, but it's possible
> that it's there intentionally, e.g. because some client-side script will
> change the element's content to nonempty and the author wanted that
> content to be right-aligned.

That would be rather rare, I'd say. Most of the time, the
left/center/right alignment attributes were probably added
semi-automatically by some other or older WYSIWYG HTML editor.

>> - HTML Tidy (April 2007 version) has to be your first tool because it
>> is mighty powerful and amazing at fixing severely poorly coded
>> webpages.
>
> I didn't know there's a new version of Tidy; I thought the software was
> effectively frozen. Now I'm afraid I need to take a look, and I'm afraid
> I will be disappointed. When I last tested Tidy, it did _far too much_
> "fixing", making wild assumptions and even changing simple
> presentational HTML to awfully ugly

It's possible, though it should be rare; otherwise you're invited to
file a bug on it.
The difficult part with Tidy is finding the blend of parameters that
best fits your needs and minimizes "ugly fixes".

> and poorly structured tag soup in a
> CSS flavor

What are your settings/parameters? Here are mine:

--char-encoding latin1 --clean yes --doctype strict --drop-font-tags yes
--drop-proprietary-attributes yes --enclose-block-text yes
--enclose-text yes --indent auto --logical-emphasis yes --replace-color
yes --show-warnings no --wrap 80

These are the ones I needed to change for my task because I did not
want their default values. The other parameters (some 70-80 of them)
are fine with me at their defaults.
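
For batch use, the settings above could be wrapped in a small script. This is only a sketch in Python: it assumes a tidy executable is on the PATH, that the pages end in .html, and that a .bak copy next to each file is an acceptable backup. The helper names tidy_command and tidy_batch are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Options copied from the post; everything else stays at Tidy's defaults.
TIDY_OPTIONS = [
    "--char-encoding", "latin1",
    "--clean", "yes",
    "--doctype", "strict",
    "--drop-font-tags", "yes",
    "--drop-proprietary-attributes", "yes",
    "--enclose-block-text", "yes",
    "--enclose-text", "yes",
    "--indent", "auto",
    "--logical-emphasis", "yes",
    "--replace-color", "yes",
    "--show-warnings", "no",
    "--wrap", "80",
]


def tidy_command(page, tidy_exe="tidy"):
    # -m asks Tidy to modify the input file in place.
    return [tidy_exe, *TIDY_OPTIONS, "-m", str(page)]


def tidy_batch(folder, tidy_exe="tidy"):
    for page in sorted(Path(folder).glob("*.html")):
        # Back up first, then experiment.
        shutil.copy2(page, page.with_name(page.name + ".bak"))
        # Tidy exits with 1 for warnings and 2 for errors, so no check=True.
        result = subprocess.run(tidy_command(page, tidy_exe))
        print(page, "exit status", result.returncode)
```

Calling tidy_batch("site") would then process every .html file in a folder named site; compare each result with its .bak copy before keeping it.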

> as well as changing my perfectly good ISO-8859-1 characters
> into messy "escapes".

You need to check the char-encoding parameter
http://tidy.sourceforge.net/docs/quickref.html#char-encoding
and possibly change it from ascii to latin1, since the default is ascii.

"Good ISO-8859-1 converted into messy 'escapes'" most probably means
that input-encoding and output-encoding are not synchronized, as they
should be.
"Tidy will accept Latin-1 (ISO-8859-1) character values, but will use
entities for all characters whose value > 127."
http://tidy.sourceforge.net/docs/quickref.html#char-encoding

My solution/proposal for you: use
--char-encoding latin1

The default values for the two parameters (input-encoding and
output-encoding) are not synchronized, which is nonsense. If the
default value for input-encoding is latin1, then the default value of
output-encoding should be latin1 too.
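
The effect of an ASCII output encoding can be imitated in a few lines of Python. This only illustrates why characters above 127 turn into escapes; it does not call Tidy, and Tidy may well write a named entity rather than the numeric one shown here.

```python
text = "G\u00e9rard"  # "Gérard", with e-acute (code point 233)

# Output restricted to ASCII: every character above 127 has to be escaped.
as_ascii = text.encode("ascii", errors="xmlcharrefreplace")
print(as_ascii)   # b'G&#233;rard'

# Output as Latin-1: the same character fits in one byte, so no escape.
as_latin1 = text.encode("latin-1")
print(as_latin1)  # b'G\xe9rard'
```

With both input and output set to latin1, the character goes through unchanged, which is what --char-encoding latin1 is meant to achieve.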

>
>> The nice thing about HTML Tidy is that you can use it on a batch of
>> many webpages. It's highly configurable (with about 100 parameters
>> possible: see http://tidy.sourceforge.net/docs/quickref.html
>> )
>> and very powerful.
>
> That might be nice, but if the defaults for the parameters are poor, I
> cannot really recommend it to most people. Few people will be capable of
> setting, say, 50 parameters to reasonable values when the programmer was
> not able to do that.

You shouldn't have to set 50 parameters; if you did, it would mean the
default parameter values are often not the best. I personally set only
12 parameters, and I think I could even drop one or two when upgrading
webpages.
I also think that HTML Tidy is not a tool to recommend to complete
newcomers to HTML editing. A less powerful, less configurable
version of HTML Tidy might be suitable for newbies though.


>> HTML Tidy will also fix validation markup errors but not all of them.
>> You'll still need to validate your webpages with a true SGML parser
>> software.
>
> That sounds odd. If it is mighty powerful etc. etc., how come it can't
> do the fairly simple job of SGML validation - at least with the DTD
> fixed to one of HTML DTDs?

That is certainly a reasonable suggestion.
The latest HTML Tidy (.exe) version is 102 KB; a true SGML validator
fixed to, say, the HTML 4.01 Strict DTD would probably be more than
600 KB, would be more complex and take longer to develop (not as simple
as you say), would require lengthy documentation, etc. Still, with so
many invalid webpages out there, it is very much worth the trouble.
W3C people should have done this many years ago and made such a product
free, open-source, easily available, and easily embeddable in
applications.

Many of the available WYSIWYG HTML editors (commercial or freeware)
have neither HTML Tidy nor an SGML parsing feature built in, and that
is a shame.

Gérard
--
Using Web Standards in your Web Pages (Updated Dec. 2006)
http://developer.mozilla.org/en/docs/Using_Web_Standards_in_your_Web_Pages
