Posted by Adam Hubscher on 01/22/06 19:51
I've been having a tough time parsing XML files that contain special characters.
I have tried every applicable engine, most recently SAX, to parse a
rather large (17.8 MB) XML file.
The problem occurs whenever the parser hits a UTF-8 encoded character. I've
tried decoding the file before it reaches the parser, and I've even tried
ENCODING it (god knows why that would work; it didn't, lol). I've tried
html_entities, etc. Nothing has worked.
I've also tried simply removing the character, and lo and behold, it
worked! Darned thing...
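One common cause of exactly this symptom (an assumption, not confirmed for this file) is a file that declares UTF-8 but actually contains bytes in another encoding, such as Latin-1/Windows-1252. The Python sketch below is only an illustration: the sample document and the helper names `TextCollector` and `try_parse` are hypothetical. It reproduces that situation. The raw Latin-1 byte makes the SAX parse fail, deleting the byte lets it succeed, and transcoding the text to real UTF-8 also succeeds without losing the character.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects character data so we can see what the parser decoded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def try_parse(data: bytes) -> str:
    """Parse raw bytes with SAX; return the decoded text or an error string."""
    handler = TextCollector()
    try:
        xml.sax.parseString(data, handler)
        return "".join(handler.chunks)
    except xml.sax.SAXParseException as exc:
        return "error: " + exc.getMessage()


# Hypothetical sample: declared UTF-8, but the e-acute is a single Latin-1 byte (0xE9).
bad = b'<?xml version="1.0" encoding="UTF-8"?><name>Caf\xe9</name>'
# Same document with the offending byte removed.
stripped = bad.replace(b"\xe9", b"")
# Same document transcoded so the e-acute becomes real UTF-8 (0xC3 0xA9).
fixed = bad.decode("latin-1").encode("utf-8")

print(try_parse(bad))       # fails with a "not well-formed" style error
print(try_parse(stripped))  # Caf
print(try_parse(fixed))     # Café
```

If the file really is mislabeled like this, transcoding it once before parsing (or correcting the encoding in the XML declaration) addresses the cause instead of deleting characters.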
Those are the characters that have caused me problems so far. I'd give
the UTF-8 encoded equivalents, but I don't know them off the top of my head.
My code varies so much that I'm not sure it would be useful to type it out.
The issue doesn't seem to be with my code, since when I parse the file
manually with a whole bunch of inefficient regex statements, everything
works out peachy. The problem with that approach, again, is that it eats system
resources for a very long time (remember, it's a 17 MB file, and it's all plain text).
Any suggestions as to how I could get around this seemingly impossible
roadblock that the XML engines seem to have put in my way? :O
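On the resource side, the usual reason to prefer SAX over whole-file regex passes is that SAX can consume the file incrementally, so memory stays roughly constant regardless of file size. Here is a minimal Python sketch of that idea. `ElementCounter`, `count_elements`, the sample data and the 64 KB chunk size are all hypothetical choices for illustration. It feeds raw bytes to an incremental SAX parser chunk by chunk; the parser itself reassembles UTF-8 sequences that happen to be split across chunk boundaries.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts elements by tag name without keeping the document in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(stream, chunk_size=64 * 1024):
    """Feed a binary stream to an incremental SAX parser in fixed-size chunks."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()  # the default expat-based parser supports feed()/close()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # raw bytes; split multi-byte UTF-8 characters are handled
    parser.close()
    return handler.counts


# Hypothetical usage; with a real file you would pass open("big.xml", "rb").
sample = (b'<?xml version="1.0" encoding="UTF-8"?><items>'
          + b"<item>Caf\xc3\xa9</item>" * 3
          + b"</items>")
print(count_elements(io.BytesIO(sample)))  # {'items': 1, 'item': 3}
```

Opening the file in binary mode and letting the parser decode it, rather than decoding or re-encoding it by hand first, avoids a second, mismatched decoding step.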