Re: Reading XML file - chars being dropped

Posted by MW on 11/20/07 23:47

Followup:

Yes, the initial XML reading is the problem. The following code fixes
the problem:

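$xml_data = file_get_contents($ef_url) or die("could not open XML input");
// Start from the raw data so $data is defined even when it is already pure ASCII.
$data = $xml_data;
if (!mb_check_encoding($xml_data, "US-ASCII"))
    $data = mb_convert_encoding($xml_data, "US-ASCII");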

The obvious problem, of course, is that all non-ASCII (Unicode) chars now
appear as "??" in the output. Converting to ISO-8859-1 instead doesn't stop
the character data handler from being called twice.
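
My best guess at where the "??" comes from, as a small sketch. It assumes the feed is UTF-8 and that mbstring's substitute character is still the default "?"; the variable names are made up for illustration. In UTF-8 the "é" is two bytes, so if the conversion treats the input as a single-byte charset, each byte gets replaced separately:

```php
<?php
// "Attaché" spelled with explicit UTF-8 bytes (0xC3 0xA9 = é), so the
// example does not depend on the encoding of this source file.
$utf8 = "Attach\xC3\xA9";

// Source encoding declared correctly: é is one character, one substitute.
$as_utf8 = mb_convert_encoding($utf8, "US-ASCII", "UTF-8");

// Same bytes treated as ISO-8859-1: each of the two bytes is its own
// non-ASCII character, so two substitutes appear.
$as_latin1 = mb_convert_encoding($utf8, "US-ASCII", "ISO-8859-1");

// Converting UTF-8 to ISO-8859-1 keeps the é, as the single byte 0xE9.
$latin1 = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");

echo $as_utf8, "\n";         // Attach?
echo $as_latin1, "\n";       // Attach??
echo bin2hex($latin1), "\n"; // 417474616368e9
```

So naming the source encoding in the third argument changes how many "?" show up, but any conversion to US-ASCII still throws the accented character away.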

MW


MW wrote:
> Revisiting this problem, I have discovered that the ef_characterData
> function is called twice by the parser, once with the part of the string
> before the "é" and then again with the rest of the string (including the é).
>
> While I investigate, I think the problem is because my XML file is
> external. Before I feed it to the parser, I am reading it into a
> variable using file_get_contents() - I think the assignment here is
> creating the problem.
>
> Will keep you guys posted, but if anybody has a similar problem can you
> let me know?
>
> MW
>
> MW wrote:
>> Problem solved with a strange workaround - I changed the assignment line
>> to $ef['title'].=$data. For some reason the assignment was happening in
>> two steps - the first step would transfer the part before the 'é' and
>> the second would transfer the rest. By adding the concatenate operator I
>> bypassed the issue.
>>
>> MW
>>
>> Dikkie Dik wrote:
>>>>> What limitations? Strings can be "absurd" long.
>>>> By limitations I mean that the charset is 8-bit, only 256 unique chars.
>>>> If my string has a character like the French accented "e", it can lead to
>>>> problems.
>>> Well, yes, but string handling should be binary-safe in recent versions
>>> of PHP. I use utf-8 a lot, and I never ran into that kind of problem.
>>> The only thing I have to take care of is the fact that some characters
>>> are represented by more than one "character".
>>>
>>>>> If you think the assignment is the problem, have you checked what
>>>>> $ef['title'] is directly before and after the assignment?
>>>> $ef['title'] is empty before the assignment, and "é to the White House"
>>>> after assignment. The funny part is that if I echo both variables right
>>>> below the line where the assignment occurs, $data is "Attaché to the
>>>> White House" and $ef['title'] is "é to the White House"
>>> That is really strange. I never encountered anything like it. Does it
>>> help (as an ugly workaround) to make it a reference assignment?
>>> Like: $ef['title'] =& $data;
>>>
>>> If so, it might help to "clone"-assign it to a non-array (local)
>>> variable first and then "reference"-assign that local variable to
>>> the $ef['title']
>>>
>>> Just curious...
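
For anyone finding this thread later: the ".=" workaround quoted above is probably not so strange. As far as I can tell, the parser is allowed to deliver one run of text to the character data handler in several calls, so the handler has to append rather than assign. A minimal sketch, assuming PHP 5.3 or later for the closures; the sample XML, $in_title and the start/end handlers are made up for illustration, only ef_characterData and $ef['title'] come from my code:

```php
<?php
// Hypothetical sample input standing in for the contents of $ef_url.
// \xC3\xA9 is "é" in UTF-8.
$xml_data = '<?xml version="1.0" encoding="UTF-8"?>'
          . "<item><title>Attach\xC3\xA9 to the White House</title></item>";

$ef = array('title' => '');
$in_title = false; // true while the parser is inside <title>

$ef_startElement = function ($parser, $name, $attrs) use (&$in_title) {
    // Element names arrive upper-cased because case folding is on by default.
    if ($name === 'TITLE') {
        $in_title = true;
    }
};

$ef_endElement = function ($parser, $name) use (&$in_title) {
    if ($name === 'TITLE') {
        $in_title = false;
    }
};

$ef_characterData = function ($parser, $data) use (&$ef, &$in_title) {
    if ($in_title) {
        // Append, never overwrite: one text node may arrive in several chunks.
        $ef['title'] .= $data;
    }
};

$parser = xml_parser_create('UTF-8');
xml_set_element_handler($parser, $ef_startElement, $ef_endElement);
xml_set_character_data_handler($parser, $ef_characterData);

if (!xml_parse($parser, $xml_data, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

echo $ef['title'], "\n";
```

With the append in place the title comes out whole no matter how many times the handler fires, and there is no need to convert the input to US-ASCII first. The one thing to remember is to reset $ef['title'] to an empty string at the start of each new item.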

 
