i18n maybe?

Posted by "Richard Lynch" on 12/15/05 02:26

I have a table like this:
artist_id | artistname | artistname_alpha
1 | The Doors |
2 | The The |
3 | 100 Monkeys |
4 | 3�16 |

That last artistname is not in ASCII/English... Dunno what your email
client is showing you, but it's:

the digit 3
capital A with umlauts
US cents sign
capital A with caret
question mark
capital A with caret
US cents sign
the digit 1
the digit 6

THAT ought to get through any email client/mta okay. :-)

Now, my goal is to fill in artistname_alpha with things such as:
Doors, The
The, The
one hundred monkeys
3�16 (???)

I've written a nifty function for this:

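function alpha($string) {
    // utf8_decode() had no visible effect on the odd names, so it stays disabled.
    //$string = utf8_decode($string);

    // Spell out dollar amounts first, so the plain-number pass below
    // does not split them at the decimal point.
    $string = preg_replace_callback('/(\\$[0-9\\.]+)/',
        create_function('$s', 'return Numbers_Words::toCurrency(str_replace("$", "", $s[1]));'),
        $string);
    // Spell out any remaining runs of digits.
    $string = preg_replace_callback('/([0-9]+)/',
        create_function('$s', 'return Numbers_Words::toWords($s[1]);'),
        $string);

    // Move a leading article to the end: "The Doors" becomes "Doors, The".
    // The article is cut without its trailing space so none is left dangling.
    if (stristr(substr($string, 0, 4), 'The ')) return substr($string, 4) . ', ' . substr($string, 0, 3);
    elseif (stristr(substr($string, 0, 3), 'An ')) return substr($string, 3) . ', ' . substr($string, 0, 2);
    elseif (stristr(substr($string, 0, 2), 'A ')) return substr($string, 2) . ', ' . substr($string, 0, 1);
    else return $string;
}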
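
The article-moving step of the function above can also be sketched on its own with a single regular expression. This is only an illustration: move_leading_article() is a hypothetical name, not part of the original script, and it needs PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: move a leading "The", "An" or "A" to the end of a name.
function move_leading_article(string $name): string
{
    // Case-insensitive match of the article, at least one space, then the rest.
    if (preg_match('/^(The|An|A) +(.+)$/is', $name, $m)) {
        return $m[2] . ', ' . $m[1];
    }
    // No leading article: return the name unchanged.
    return $name;
}

echo move_leading_article('The Doors'), "\n"; // Doors, The
echo move_leading_article('The The'), "\n";   // The, The
echo move_leading_article('Anthrax'), "\n";   // Anthrax
```

Requiring a space after the article is what keeps names such as "Anthrax" from being treated as "An" plus "thrax".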

Now, the tricky part is that I don't really know what
'3�16' is.

It looks like it might be UTF-8, but utf8_decode() had no effect on
it, which is why I've commented that out in the function.
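
One possible reading of the character list above is that the name is double-encoded UTF-8: the three bytes of U+2022 BULLET (E2 80 A2) were treated as Latin-1 and encoded to UTF-8 a second time, giving C3 A2 C2 80 C2 A2. A Latin-1 mail client would show those bytes as A-tilde, cent sign, A-circumflex, an unprintable byte, A-circumflex, cent sign, which is close to the description. The sketch below checks that reading. The byte string is an assumption reconstructed from the description, not copied from the real table, and mb_convert_encoding() needs the mbstring extension.

```php
<?php
// Assumption: the stored name is "3", the bytes C3 A2 C2 80 C2 A2, then "16".
$stored = "3\xC3\xA2\xC2\x80\xC2\xA2" . "16";

// One UTF-8 to Latin-1 pass strips the outer layer of encoding.
$once = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

echo bin2hex($stored), "\n"; // 33c3a2c280c2a23136
echo bin2hex($once), "\n";   // 33e280a23136

// The result is valid UTF-8 again and equals "3", U+2022 BULLET, "16".
var_dump(mb_check_encoding($once, 'UTF-8')); // bool(true)
var_dump($once === "3\u{2022}16");           // bool(true)
```

If this reading is right, a single decoding pass yields a string that looks just as odd in a Latin-1 viewer, because the remaining bytes are still UTF-8.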

So my function currently converts it to:
'three�sixteen'

That ain't right.

So, does anybody who understands this i18n stuff want to clue me in
the right direction?...

Things you should know:

I'm not trying to provide support for anything but English here,
unless it's trivial to do so.

The table has 150,000 rows.

I have no real control over fancy MySQL settings, as it's a $20 shared
host deal.

Every day, at 6 am, I get a new file of this data, and run through
with a script that does an UPDATE or INSERT. REPLACE is not suitable
due to primary key field size of source data. Anyway, I haven't even
checked if the function as-is will be too slow, but whatever I do to
fix the i18n issue can't have too much overhead, as it will be called
150,000 times every morning at 6 am.
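
Since overhead matters at 150,000 calls per run, any repair step can be guarded by a cheap test for non-ASCII bytes, so that plain English names skip the expensive work. The sketch below assumes the problem is double-encoded UTF-8, which the post does not confirm. repair_if_needed() is a hypothetical name, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: undo one layer of double-encoded UTF-8, skipping ASCII.
function repair_if_needed(string $name): string
{
    // Fast path: no byte >= 0x80 means plain ASCII, nothing to repair.
    if (!preg_match('/[\x80-\xFF]/', $name)) {
        return $name;
    }
    // Input that is not valid UTF-8 cannot be double-encoded UTF-8.
    if (!mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    $decoded = mb_convert_encoding($name, 'ISO-8859-1', 'UTF-8');
    // Keep the decoded form only if it still has high bytes and is itself
    // valid UTF-8; otherwise the input was ordinary, singly encoded text.
    if (preg_match('/[\x80-\xFF]/', $decoded) && mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }
    return $name;
}

echo bin2hex(repair_if_needed("3\xC3\xA2\xC2\x80\xC2\xA2" . "16")), "\n"; // 33e280a23136
echo repair_if_needed('The Doors'), "\n";                                 // The Doors
```

The validity check is a heuristic: a correctly encoded name made only of Latin-1 characters could, in rare cases, also decode to valid UTF-8 and be altered by mistake.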

If it helps, here is what my data-source dumps out when he encounters
this band name:
http://cdbaby.com/cd/316live

Here is the band's web-site:
http://316live.com/

And, here, possibly, is HTML source for what somebody copied/pasted
into the FORM to fill in the band name:

3&middot;16

So, possibly, this is not i18n at all, and just somebody really really
really silly copying and pasting an HTML entity 'middot' from their
website into a form input and expecting it to render...

Would '&middot;' output by a browser turn into 'âÂ�¢' ???

If so, what can I do about it?
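
One way to test the entity theory is to decode the candidate entities and look at the UTF-8 bytes each one produces. The sketch below uses html_entity_decode() with the HTML 4.01 entity table. Comparing against &bull; is my own addition, not something the post mentions.

```php
<?php
// Decode each entity to UTF-8 and dump the resulting bytes as hex.
$middot = html_entity_decode('&middot;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
$bull   = html_entity_decode('&bull;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo bin2hex($middot), "\n"; // c2b7   (U+00B7 MIDDLE DOT)
echo bin2hex($bull), "\n";   // e280a2 (U+2022 BULLET)
```

A middot is two bytes in UTF-8 and a bullet is three. If the stored name really does contain a mangled three-byte sequence (an assumption), the pasted character was more likely a bullet than a middot.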

--
Like Music?
http://l-i-e.com/artists.htm
