Posted by Jukka K. Korpela on 02/15/06 19:43
Toby Inkster <usenet200602@tobyinkster.co.uk> wrote:
> Jukka K. Korpela wrote:
>
>> The U+FEFF character is a space character. Though it has nominally
>> no width, it may be expanded in justification, and it also
>> constitutes an allowed line break point - a direct line break
>> point, not a hyphenation point. Would you like to have the word
>> split as "kinder" (without a hyphen) at the end of a line?
>
> U+FEFF is the zero-width NON-BREAKING space.
You're right; I should try to remember that I don't remember all
Unicode characters yet. (And I really _should_ remember correctly what
U+FEFF is. [Slaps himself.])
U+FEFF has two defined meanings: a) a byte order mark (BOM) and
b) an invisible control character for preventing a line break. In
the latter role, U+2060 WORD JOINER is preferred. This means, in
effect, that by Unicode recommendations, U+FEFF should only be used at
the start of a text file as BOM.
This is somewhat theoretical, of course, since U+2060 is poorly supported.
Besides, HTML specifications do not require that Unicode semantics be
obeyed; on the other hand, this means that the effect of U+FEFF in an
HTML document is _undefined_.
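The two roles of U+FEFF can be inspected with a few lines of Python. This is only a sketch using the standard library; the helper names describe() and strip_leading_bom() are made up for illustration. It prints the Unicode names of U+FEFF and U+2060 and shows that a BOM-aware decoder removes U+FEFF at the start of the data only, so one placed inside a word stays in the text.

```python
import codecs
import unicodedata

BOM = "\ufeff"          # U+FEFF, byte order mark / zero-width no-break space
WORD_JOINER = "\u2060"  # U+2060, preferred for preventing a line break


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (category)' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def strip_leading_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping one leading BOM if present.

    The 'utf-8-sig' codec removes U+FEFF only at the very start of the
    data; any later occurrence is kept as an ordinary character.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    print(describe(BOM))
    print(describe(WORD_JOINER))
    # The UTF-8 encoded form of U+FEFF is the familiar three-byte BOM.
    print(BOM.encode("utf-8") == codecs.BOM_UTF8)
    raw = codecs.BOM_UTF8 + "kinder\ufeffgarten".encode("utf-8")
    print(repr(strip_leading_bom(raw)))
```

Whether a browser or indexing robot does anything comparable with U+FEFF in the middle of an HTML document is a separate question, and the sketch says nothing about that.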
What you are really saying by putting U+FEFF inside "kindergarten" is
that the word "kindergarten" should not be divided into its components
in word division. This has little effect at present, since browsers
don't do word division.
So in that sense, it might be a harmless trick in an attempt to make
indexing robots treat the construct as two words. However, we have no
guarantee that this actually happens (after all, search engines _could_
be Unicode-aware and treat a word with prevented line break inside as
very much a single word).
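To illustrate why there is no guarantee, here is a sketch of three hypothetical tokenizers in Python. None of them is claimed to be what any actual search engine does; the function names are invented for this example. They only show that equally plausible designs disagree on whether "kinder" + U+FEFF + "garten" is one word or two.

```python
import re

# "kindergarten" with U+FEFF placed between the two components.
TEXT = "kinder\ufeffgarten"

# Invisible characters that a Unicode-aware indexer might discard.
JOINERS = {"\ufeff", "\u2060"}


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only. U+FEFF is not whitespace in Python,
    so the word stays in one piece, with the invisible character inside."""
    return text.split()


def word_char_tokens(text: str) -> list[str]:
    """Split on anything that is not a word character. U+FEFF is not
    matched by \\w, so it acts as a separator here."""
    return re.findall(r"\w+", text)


def joiner_aware_tokens(text: str) -> list[str]:
    """Remove the joining characters first, then take runs of word
    characters, which treats the construct as a single word."""
    cleaned = "".join(ch for ch in text if ch not in JOINERS)
    return re.findall(r"\w+", cleaned)


if __name__ == "__main__":
    print(whitespace_tokens(TEXT))
    print(word_char_tokens(TEXT))
    print(joiner_aware_tokens(TEXT))
```

Only the second variant yields "kinder" and "garten" as separate words; the third indexes plain "kindergarten", and the first keeps a single token that still contains the invisible character.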
Some user agents will choke on U+FEFF. Such user agents are rare
these days, but before taking a risk, I would like to see that
something can possibly be gained. If the split into components is
natural (and "kinder" and "garten" is not, for English text), then it
would be better to _use_ the component words in natural sentences as
healthy, natural food for search engines. If it isn't, the whole trick
is probably quite pointless; nobody is going to search for "kinder" and
"garten" if he wants to find info on kindergartens.
--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html