Posted by moosus on 10/11/05 01:29
G'day all
Hopefully somebody out there can point me in the right direction.
I have built a website that lets editors quickly put new press releases,
sent out by the government and other bodies, online. These documents are
usually in MS Word format. Since we are a publishing house, we are entirely
on Mac OS (10.3 & 10.4) running Office X.
The site has a form with a body field. All the editors do is copy the
body of the Word doc and paste it into the form ... This is where the
problems arise.
The charset of MS Word for Mac is Western European (Macintosh). When there
are characters such as " , ' and the like in the articles, they get stored
in the database as strings of special characters. I have been able to use
str_replace to clean some of these characters, but some just break my code:
// crud from Word
$msword = array('&', '"', "'", '', '', ' "', '', '', 'âÆ __');
$msnew = array('&', '"', "'", '-', "'", '·', '"', "'", '· &nbsp;');
$cleanbody = str_replace($msword, $msnew, $body);
The page declares its content type as:
<meta http-equiv="content-type" content="text/html;charset=utf-8">
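Because the form page is served as UTF-8, the pasted text most likely reaches PHP as UTF-8 bytes rather than Mac bytes, though that is an assumption. Under it, a minimal sketch of the cleanup could key the replacements on explicit byte sequences, so the encoding of the PHP source file itself cannot mangle the search strings. The helper name clean_word_punctuation and the ASCII replacements chosen here are hypothetical:

```php
<?php
// Hypothetical helper: map common Word punctuation, given as UTF-8 byte
// sequences, to plain ASCII. Assumes $body really is UTF-8 when it arrives.
function clean_word_punctuation($body)
{
    $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '-',   // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 ellipsis
        "\xE2\x80\xA2" => '*',   // U+2022 bullet
        "\xC2\xA0"     => ' ',   // U+00A0 non-breaking space
    );
    // strtr() with an array works in a single pass, so text produced by
    // one replacement is never re-scanned by another rule.
    return strtr($body, $map);
}

// Sample input: curly quotes, an en dash, an apostrophe and an ellipsis.
$body = "\xE2\x80\x9CHello\xE2\x80\x9D \xE2\x80\x93 it\xE2\x80\x99s done\xE2\x80\xA6";
$cleanbody = clean_word_punctuation($body);
echo $cleanbody, "\n";
```

Unlike a pair of parallel str_replace arrays, a single key => value map keeps each search string next to its replacement, so the two lists cannot drift out of step.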
To get to the question:
Can anybody tell me how to convert Western European (Macintosh) to
Western European (Windows), or suggest some other way to clean the
contents of this field?
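To make the question concrete, here is a rough sketch of the kind of conversion being asked about, assuming the field really does arrive as single-byte Western European (Macintosh), i.e. Mac Roman, text. It only translates the punctuation bytes Word typically produces into UTF-8, to match the page charset. The function name macroman_punctuation_to_utf8 is hypothetical, and accented letters and other high bytes are deliberately left untouched, so this is not a full charset conversion. Whether PHP's iconv() can do the full job depends on the charsets the underlying library supports and is not assumed here.

```php
<?php
// Hypothetical helper: translate Mac Roman punctuation bytes to UTF-8.
// Only valid if $text is single-byte Mac Roman; running it on text that
// is already UTF-8 would corrupt it.
function macroman_punctuation_to_utf8($text)
{
    $map = array(
        "\xD2" => "\xE2\x80\x9C", // left double quote  -> U+201C
        "\xD3" => "\xE2\x80\x9D", // right double quote -> U+201D
        "\xD4" => "\xE2\x80\x98", // left single quote  -> U+2018
        "\xD5" => "\xE2\x80\x99", // right single quote -> U+2019
        "\xD0" => "\xE2\x80\x93", // en dash            -> U+2013
        "\xD1" => "\xE2\x80\x94", // em dash            -> U+2014
        "\xC9" => "\xE2\x80\xA6", // ellipsis           -> U+2026
        "\xA5" => "\xE2\x80\xA2", // bullet             -> U+2022
        "\xCA" => "\xC2\xA0",     // non-breaking space -> U+00A0
    );
    // Single pass: bytes emitted by one rule are not matched again.
    return strtr($text, $map);
}

// Sample Mac Roman input: curly quotes, en dash, apostrophe, ellipsis.
$macText = "\xD2Budget\xD3 \xD0 it\xD5s here\xC9";
$utf8Text = macroman_punctuation_to_utf8($macText);
```

A quick way to tell which case applies is to dump the raw bytes of a pasted curly quote with bin2hex(): a single byte d2 would point to Mac Roman, while e2809c would mean the text is already UTF-8.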
As an extra, I should mention that when the copy is delivered to MySQL it is
different to the original (but I'm sure you guys probably already knew that).
Cheers
Moosus