Posted by Adam Hubscher on 01/22/06 23:08
tedd wrote:
>> I've been having a tough time with parsing XML files and special
>> characters.
>>
>> -snip-
>>
>> Any suggestions as to how I could get around this seemingly impossible
>> roadblock that's been placed by what seem to be the XML engines :O..
>
>
> Adam:
>
> I believe that these "special" characters will be with us for a long
> while. I suggest that you review the Unicode database for these
> characters and use their code points (hex values). For example, 0061
> is a small "a", 2022 is a "bullet", 2713 is a "check mark", and so on.
> Most language glyphs of the world are represented in the Unicode
> database.
>
> HTH's
>
> tedd
>
Oh, I understand that they'll be here for a while.
The problem is that the XML file is not my own; it's generated by
another service that I am creating a stemmed service for. I feel I have
already asked a lot of that service's owner in getting him to produce a
well-formed XML file. (He was using pseudo-XML that, although nice and
organized, could not be parsed at all and took forever to process with
pregs. Now that it runs through an XML generator, the script takes less
time on his end too, and he's thankful for that.)
There are usernames listed in the file that use these special characters.
I'd rather not have him go through and edit the 30,000-odd users that
are indexed. If there is a way for the XML writer to emit hex codes
instead of the raw Unicode characters automatically (and, on that note,
is there any way to read them back automatically with a parser?), then
the idea is feasible.
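To make the writer/parser question concrete, here is a minimal sketch in Python's standard xml.etree.ElementTree. It is only an illustration, not the generator the other service actually uses, and the <user> element and the username are made up. The idea is that a writer told to produce ASCII output has to turn every non-ASCII character into a numeric character reference, and any conforming XML parser turns those references back into characters on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical username containing a bullet (U+2022) and a check mark (U+2713),
# the two code points mentioned earlier in the thread.
name = "adam\u2022\u2713"

user = ET.Element("user")  # "user" is an assumed element name
user.text = name

# Serializing as US-ASCII forces every non-ASCII character to be written
# as a numeric character reference instead of raw bytes.
ascii_xml = ET.tostring(user, encoding="us-ascii")

# Parsing the ASCII-only document resolves the references automatically.
round_tripped = ET.fromstring(ascii_xml).text

# Hex references (&#x2022;) and decimal references (&#8226;) are
# equivalent to the parser, so hand-written hex codes read back the same way.
from_hex = ET.fromstring("<user>adam&#x2022;&#x2713;</user>").text

print(ascii_xml)
print(round_tripped == name, from_hex == name)
```

ElementTree happens to write the references in decimal rather than hex, but that makes no difference to whatever reads the file.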
Other than that, I'm trying to find a way to parse the existing file,
whose Unicode data causes a fatal error in the parser.
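The exact cause of the fatal error isn't stated here, so the following is only a guess at one common case: the file carries no encoding declaration, the parser therefore assumes UTF-8, and the usernames are really stored in a single-byte encoding such as Windows-1252. Under that assumption, a rough Python sketch of a fallback is to decode the bytes explicitly and hand the parser text instead. The function name parse_lenient and the cp1252 default are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_lenient(raw: bytes, fallback_encoding: str = "cp1252") -> ET.Element:
    """Parse XML bytes, retrying with an assumed legacy encoding on failure."""
    try:
        # First attempt: let the parser detect the encoding itself
        # (UTF-8 unless the XML declaration says otherwise).
        return ET.fromstring(raw)
    except ET.ParseError:
        # Assumption: the failure was an encoding mismatch, not broken markup.
        # Decode explicitly; bytes the fallback encoding cannot map become
        # U+FFFD rather than aborting the whole parse.
        text = raw.decode(fallback_encoding, errors="replace")
        return ET.fromstring(text)


# Made-up sample: a bullet stored as the single Windows-1252 byte 0x95,
# which is not valid UTF-8 on its own.
sample = b"<users><user>Adam\x95</user></users>"
root = parse_lenient(sample)
print(root.find("user").text)
```

If the markup itself is malformed, the second parse fails as well, which at least separates an encoding problem from a structural one.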