Posted by Markus on 07/03/07 12:36
Hello
I am trying to write RTF files from text in UTF-8 encoding. Converting
the text with utf8_decode() already fails on characters such as a
typographic apostrophe or an en dash, and of course any other
non-Latin-1 characters get lost.
While trying to understand the RTF spec, I found that ISO-8859-1 is not
available in RTF, only the Windows-1252 codepage, which differs from
Latin-1 in some characters.
So I set the codepage to 1252 and learned that characters not contained
in this codepage should be written as Unicode escapes:
<quote>
For example, the text Lab[Gamma]Value (Unicode characters 0x004c,
0x0061, 0x0062, 0x0393, 0x0056, 0x0061, 0x006c, 0x0075, 0x0065) should
be represented as follows (assuming a previous \uc1):
Lab\u915Gvalue
</quote>
This is where I get lost. What does the G after the decimal value mean?
And how should this \uc1 be applied?
So these are my actual questions:
- Is there a good way to convert a UTF-8 string into CP1252 without
losing the information for non-CP1252 characters? (mbstring is not
available on that server.)
- Can somebody point me to an easy-to-understand RTF tutorial?
Thanks for any hint!
Markus
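
For the first question, one possible approach is sketched below. It
skips CP1252 entirely: ASCII is passed through, and every other
character is written as an RTF \uN escape, where N is the code point as
a signed 16-bit decimal number. The single character after the number
(the G in the quoted example) is the fallback that readers without
Unicode support display instead, and \uc1 tells Unicode-aware readers
to skip exactly one such fallback byte after each \uN. The function
name utf8_to_rtf and its $fallback parameter are made up for this
illustration and are not part of PHP or of any RTF library. The UTF-8
decoding is done by hand because mbstring is not available, and the
fallback is assumed to be one plain character that is not a digit,
backslash, or brace.

```php
<?php
// Hypothetical helper: escape a UTF-8 string for an RTF body with \uc1 in effect.
function utf8_to_rtf($utf8, $fallback = '?')
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        if ($byte < 0x80) {
            // ASCII: only backslash and braces are special in RTF.
            $ch = $utf8[$i];
            $out .= ($ch === '\\' || $ch === '{' || $ch === '}') ? '\\' . $ch : $ch;
            continue;
        }
        // Work out the sequence length from the lead byte.
        if (($byte & 0xE0) === 0xC0) {
            $cp = $byte & 0x1F; $extra = 1;
        } elseif (($byte & 0xF0) === 0xE0) {
            $cp = $byte & 0x0F; $extra = 2;
        } elseif (($byte & 0xF8) === 0xF0) {
            $cp = $byte & 0x07; $extra = 3;
        } else {
            $out .= $fallback; // stray or invalid lead byte
            continue;
        }
        // Fold in the continuation bytes (each carries 6 bits).
        $valid = true;
        for ($j = 0; $j < $extra; $j++) {
            if ($i + 1 >= $len || (ord($utf8[$i + 1]) & 0xC0) !== 0x80) {
                $valid = false;
                break;
            }
            $i++;
            $cp = ($cp << 6) | (ord($utf8[$i]) & 0x3F);
        }
        if (!$valid) {
            $out .= $fallback; // truncated sequence
            continue;
        }
        // Code points above U+FFFF are written as a UTF-16 surrogate pair.
        if ($cp > 0xFFFF) {
            $cp -= 0x10000;
            $units = array(0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
        } else {
            $units = array($cp);
        }
        foreach ($units as $unit) {
            // \uN is a signed 16-bit value, so anything above 32767 goes negative.
            if ($unit > 32767) {
                $unit -= 65536;
            }
            $out .= '\\u' . $unit . $fallback;
        }
    }
    return $out;
}

// The example from the spec: Gamma is U+0393, i.e. 915 decimal.
echo utf8_to_rtf("Lab\xCE\x93Value", 'G'), "\n"; // Lab\u915GValue

// Assumed minimal document skeleton; a real file normally also has a font table.
echo '{\rtf1\ansi\ansicpg1252\uc1 ' . utf8_to_rtf("caf\xC3\xA9 \xE2\x80\x93 ok") . '}', "\n";
```

Characters that do exist in CP1252 could instead be written as \'xx hex
escapes through a lookup table, but the \uN form alone should already
keep every character intact in Unicode-aware readers.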