Re: Cleaning MS Word input - last resort!!

Posted by turnitup on 02/25/06 14:09

Julien CROUZET wrote:
> turnitup wrote:
>> Dear all,
>>
>> I have a problem with a form, and I have tried various permutations of
>> htmlentities() and html_entity_decode() to resolve, but without success.
>>
>> Here is the workflow.
>>
>> 1: User pastes MS Word formatted text into form field.
>> 2: Server uses mail() to send input text to mail client.
>> 3: Recipient pastes text into html file.
>>
>> The problem is that MS Word contains peculiar characters for things
>> like bullets, which come out as tabs, which then come out as
>> different, but spurious, html characters in the html translation.
>>
>> Does anyone know of a function(s) that can clean up MS Word input into
>> something that can be simply pasted as plain text without spurious
>> characters?
>>
>> Turner
>
> From a comment on the PHP documentation for the utf8_decode() function
> http://us2.php.net/manual/en/function.utf8-decode.php
>
>
> peter dot mescalchin at geemail dot com
> 27-Dec-2005 06:43
>
> Adding to the comment below, I have a few more MS Word characters that
> need replacing. I found this was required when "fixing" some phpMyAdmin
> export scripts from a live server where MS Word characters were all
> through the content - before importing them back into my local MySQL
> database.
>
> The code I wrote for this process also does a strpos for any extra
> "\xe2\x80" strings - which are the tell-tale sign of any funny
> characters I want removed.
>
> Here are my updated arrays()
>
> <?php
> // UTF-8 byte sequences for common MS Word punctuation.
> // Double-quoted strings are required so PHP interprets the \x escapes.
> $badchr = array(
>     "\xe2\x80\xa6", // ellipsis
>     "\xe2\x80\x93", // en dash
>     "\xe2\x80\x94", // em dash
>     "\xe2\x80\x98", // single quote opening
>     "\xe2\x80\x99", // single quote closing
>     "\xe2\x80\x9c", // double quote opening
>     "\xe2\x80\x9d", // double quote closing
>     "\xe2\x80\xa2"  // dot used for bullet points
> );
>
> // Plain ASCII replacements, in the same order as $badchr.
> $goodchr = array(
>     '...',
>     '-',
>     '-',
>     '\'',
>     '\'',
>     '"',
>     '"',
>     '*'
> );
> ?>
>
>

Thank you!!
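
For anyone who finds this thread later, here is a minimal sketch of how the quoted arrays might be applied with str_replace() before the text is handed to mail(). The function names clean_word_text() and has_leftover_word_chars() are hypothetical (they are not from the quoted comment), and the sketch assumes the form input arrives as UTF-8. If the page is served as Windows-1252, the bytes will differ and these sequences will not match.

```php
<?php
// Hypothetical helper: swaps common MS Word punctuation (as UTF-8 bytes)
// for plain ASCII equivalents. Assumes $text is UTF-8 encoded.
function clean_word_text($text)
{
    $badchr = array(
        "\xe2\x80\xa6", // ellipsis
        "\xe2\x80\x93", // en dash
        "\xe2\x80\x94", // em dash
        "\xe2\x80\x98", // single quote opening
        "\xe2\x80\x99", // single quote closing
        "\xe2\x80\x9c", // double quote opening
        "\xe2\x80\x9d", // double quote closing
        "\xe2\x80\xa2"  // dot used for bullet points
    );
    $goodchr = array('...', '-', '-', "'", "'", '"', '"', '*');

    // str_replace() pairs each search string with the replacement
    // at the same index.
    return str_replace($badchr, $goodchr, $text);
}

// Hypothetical helper: reports whether any "\xe2\x80" prefix remains,
// i.e. a character from that range which the arrays do not cover.
function has_leftover_word_chars($text)
{
    return strpos($text, "\xe2\x80") !== false;
}

$input = "\xe2\x80\x9cHello\xe2\x80\x9d \xe2\x80\x93 it\xe2\x80\x99s done\xe2\x80\xa6";
echo clean_word_text($input), "\n"; // prints: "Hello" - it's done...
```

The leftover check follows the strpos idea mentioned in the quoted comment: if it returns true after cleaning, the input contains a character that still needs an entry in the arrays.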

 
