Posted by turnitup on 02/25/06 14:09
Julien CROUZET wrote:
> turnitup wrote:
>> Dear all,
>>
>> I have a problem with a form, and I have tried various permutations of
>> htmlentities() and html_entity_decode() to resolve it, but without success.
>>
>> Here is the workflow.
>>
>> 1: User pastes MS Word formatted text into form field.
>> 2: Server uses mail() to send input text to mail client.
>> 3: Recipient pastes text into html file.
>>
>> The problem is that MS Word contains peculiar characters for things
>> like bullets, which come out as tabs, which then come out as
>> different, but spurious, html characters in the html translation.
>>
>> Does anyone know of a function(s) that can clean up MS Word input into
>> something that can be simply pasted as plain text without spurious
>> characters?
>>
>> Turner
>
> From a comment on the PHP documentation for the utf8_decode() function
> http://us2.php.net/manual/en/function.utf8-decode.php
>
>
> peter dot mescalchin at geemail dot com
> 27-Dec-2005 06:43
>
> Adding to below I have a few more MS word characters that need
> replacing. Found this was required when "fixing" some phpmyadmin export
> scripts from a live server where MS word characters were all through the
> content - before importing them back into my local mySQL database.
>
> The code I wrote for this process also does a strpos for any extra
> "\xe2\x80" strings - which are the tell-tale sign of any funny
> characters I want removed.
>
> Here are my updated arrays()
>
> <?php
> $badchr = array(
> "\xe2\x80\xa6", // ellipsis
> "\xe2\x80\x93", // en dash
> "\xe2\x80\x94", // em dash
> "\xe2\x80\x98", // single quote opening
> "\xe2\x80\x99", // single quote closing
> "\xe2\x80\x9c", // double quote opening
> "\xe2\x80\x9d", // double quote closing
> "\xe2\x80\xa2" // dot used for bullet points
> );
>
> $goodchr = array(
> '...',
> '-',
> '-',
> '\'',
> '\'',
> '"',
> '"',
> '*'
> );
> ?>
>
>
Thanks!!
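
For anyone finding this thread later, here is a rough sketch of how the two arrays from the quoted comment might be applied before the text is passed to mail(). The function names clean_word_text() and has_leftover_word_chars() are made up for illustration and are not part of the quoted comment. The sketch also assumes the form is submitted as UTF-8; if the page is served as Windows-1252, the bytes that arrive will be different and this table will not match them.

```php
<?php
// Hypothetical helper (name invented for this sketch): replaces the UTF-8
// byte sequences listed in the quoted comment with plain ASCII stand-ins.
function clean_word_text($text)
{
    $badchr = array(
        "\xe2\x80\xa6", // ellipsis
        "\xe2\x80\x93", // en dash
        "\xe2\x80\x94", // em dash
        "\xe2\x80\x98", // single quote opening
        "\xe2\x80\x99", // single quote closing
        "\xe2\x80\x9c", // double quote opening
        "\xe2\x80\x9d", // double quote closing
        "\xe2\x80\xa2"  // dot used for bullet points
    );
    $goodchr = array('...', '-', '-', '\'', '\'', '"', '"', '*');

    // str_replace() accepts parallel arrays and pairs them by position.
    return str_replace($badchr, $goodchr, $text);
}

// Hypothetical check, following the strpos idea in the quoted comment:
// returns true if any "\xe2\x80" byte pair is still present after cleaning.
function has_leftover_word_chars($text)
{
    return strpos($text, "\xe2\x80") !== false;
}

// Sample input: curly double quotes, an em dash, a curly apostrophe, an ellipsis.
$input = "\xe2\x80\x9cHello\xe2\x80\x9d \xe2\x80\x94 it\xe2\x80\x99s done\xe2\x80\xa6";
$clean = clean_word_text($input);

echo $clean, "\n"; // prints: "Hello" - it's done...
```

If has_leftover_word_chars($clean) still returns true, the text contains some other character from the same range that the table does not cover yet, and it can be added to both arrays in the same position.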