Posted by MW on 11/20/07 23:47
Followup:
Yes, the initial XML reading is the problem. The following code fixes
the problem:
$xml_data = file_get_contents($ef_url) or die("could not open XML input");
// Convert only when the input contains non-ASCII bytes
if (!mb_check_encoding($xml_data, "US-ASCII"))
    $xml_data = mb_convert_encoding($xml_data, "US-ASCII");
The obvious problem being, of course, that all non-ASCII characters now
appear as "??" in the output. Using ISO-8859-1 as the target instead doesn't
help with the duplicate ef_characterData handler call.
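
For anyone landing here with the same symptom, here is a minimal sketch of
the concatenation workaround quoted further down. It assumes PHP's xml
extension (xml_parser_create and friends), a UTF-8 feed, and a title held in
a <title> element. The sample XML string, the $in_title flag and the
ef_startElement/ef_endElement handlers are made up for illustration; only
ef_characterData and $ef['title'] come from this thread.

```php
<?php
// Hypothetical sample input; the real feed behind $ef_url is not shown in the thread.
// "\xC3\xA9" is the UTF-8 byte sequence for "é".
$xml_data = '<?xml version="1.0" encoding="UTF-8"?>'
          . "<item><title>Attach\xC3\xA9 to the White House</title></item>";

$ef = array('title' => '');
$in_title = false; // hypothetical flag: true while inside <title>

function ef_startElement($parser, $name, $attrs) {
    global $ef, $in_title;
    // Case folding is on by default, so element names arrive upper-cased.
    if ($name === 'TITLE') {
        $in_title = true;
        $ef['title'] = ''; // reset once per element, not once per text chunk
    }
}

function ef_endElement($parser, $name) {
    global $in_title;
    if ($name === 'TITLE') {
        $in_title = false;
    }
}

function ef_characterData($parser, $data) {
    global $ef, $in_title;
    // The handler may fire several times for one text node, so append.
    if ($in_title) {
        $ef['title'] .= $data;
    }
}

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_set_element_handler($parser, 'ef_startElement', 'ef_endElement');
xml_set_character_data_handler($parser, 'ef_characterData');

xml_parse($parser, $xml_data, true) or die("XML parse error");

echo $ef['title'], "\n";
```

The parser is allowed to hand over one text node in several pieces, so
appending in the handler and resetting on the start tag collects the whole
string no matter where the split falls, and the input can stay UTF-8.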
MW
MW wrote:
> Revisiting this problem, I have discovered that the ef_characterData
> function is called twice by the parser, once with the part of the string
> before the "é" and then again with the rest of the string (including the é).
>
> While I investigate, I think the problem is because my XML file is
> external. Before I feed it to the parser, I am reading it into a
> variable using file_get_contents() - I think the assignment here is
> creating the problem.
>
> Will keep you guys posted, but if anybody has a similar problem can you
> let me know?
>
> MW
>
> MW wrote:
>> Problem solved with a strange workaround - I changed the assignment line
>> to $ef['title'].=$data. For some reason the assignment was happening in
>> two steps - the first step would transfer the part before the 'é' and
>> the second would transfer the rest. By adding the concatenate operator I
>> bypassed the issue.
>>
>> MW
>>
>> Dikkie Dik wrote:
>>>>> What limitations? Strings can be "absurd" long.
>>>> By limitations I mean that the charset is 8-bit, only 256 unique chars.
>>>> If my string has a character like the French accented "e", it can lead to
>>>> problems.
>>> Well, yes, but string handling should be binary-safe in recent versions
>>> of PHP. I use UTF-8 a lot, and I never ran into that kind of problem.
>>> The only thing I have to take care of is the fact that some characters
>>> are represented by more than one byte (one "character" to PHP).
>>>
>>>>> If you think the assignment is the problem, have you checked what
>>>>> $ef['title'] is directly before and after the assignment?
>>>> $ef['title'] is empty before the assignment, and "é to the White House"
>>>> after assignment. The funny part is that if I echo both variables right
>>>> below the line where the assignment occurs, $data is "Attaché to the
>>>> White House" and $ef['title'] is "é to the White House"
>>> That is really strange. I never encountered anything like it. Does it
>>> help (as an ugly workaround) to make it a reference assignment?
>>> Like: $ef['title'] =& $data;
>>>
>>> If so, it might help to "clone"-assign it to a non-array (local)
>>> variable first and then "reference"-assign that local variable to
>>> the $ef['title'].
>>>
>>> Just curious...