Re: MySQL 5.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

Posted by Jerry Stuckle on 06/14/06 02:22

jrs_14618@yahoo.com wrote:
> Hello All,
>
> This post is essentially a reply to a previous post/thread
> here on this mailing.database.myodbc group titled:
>
> MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
>
> [This version has a couple of subtle edits from the original I posted
> on mailing.database.myodbc - I'm cross-posting here on this
> topic/subject related newsgroup]
>
> I was wondering if anybody has run into the same issues and
> challenges that I describe below. Once they are
> resolved, some fascinating and powerful multilingual
> apps incorporating non-English/non-Latin character sets can be
> realized by many developers.
>
> I have a Unicode utf8 English - Arabic - Hebrew - Greek (and
> several other languages) database in Microsoft Excel. I KNOW
> that it is Unicode utf8 data because MySQL tells me it
> recognizes the encoding as such but not in the context I want.
>
> Allow me to explain ...
>
> I can search the Unicode utf8 encoding with no problem in
> Excel. While in Excel, I highlight a complete word or a
> partial string of an Arabic word and copy it to the clipboard
> (i.e. memory). I then do a find, and the search succeeds
> just as it would for an English string.
>
> MySQL 5.0 is supposed to handle Unicode utf8.
>
> I created a MySQL database I named: languages
>
> CREATE DATABASE languages ;
>
> and I implemented the following command on a MySQL
> command prompt:
>
> ALTER DATABASE languages DEFAULT CHARACTER SET utf8;
>
> No problem so far: MySQL seemingly recognized utf8 and
> accepted it. My understanding is that, with the ALTER command,
> the tables I create in languages will be utf8.
>
> I now created a table I named mainlang which denotes it
> will be the main table for my languages.
>
> mysql>CREATE TABLE mainlang
> ->(
> ->langNumID varchar(30),
> ->colB varchar(30),
> ->colC varchar(30),
> ->primary key (langNumID, colB)
> ->);
>
> Again so far no problem: Table successfully created.
> My third column 'colC' is where the Unicode data
> will be stored.
>
> I now attempt to import the database from my
> Excel file into my MySQL database as follows:
>
> mysql>load data infile 'c:\\arabicdictionary.csv'
> ->into table mainlang
> ->fields terminated by ','
> ->lines terminated by '\n'
> ->(langNumID, colB, colC);
> ERROR 1406 (22001): Data too long for 'colC' at row 1
>
> So what to do? I did a search and found other
> people seemingly had the same problem and someone
> suggested:
>
> ALTER DATABASE languages DEFAULT CHARACTER SET cp1250;
>
> I dropped mainlang, recreated it, redid the load and
> Lo and behold ... it seemed to work. No Data too long
> error occurred and when I did the following query:
>
> mysql>select langNumID, colB, colC
> ->from mainlang
> ->where colB = '4994';
>
> I see langNumID has a correct numeric value, colB a
> correct numeric value (4994), and for colC a string of
> unintelligible characters with diacritical marks,
> umlauts etc., which I know is the cp1250
> interpretation of the Unicode utf8 data and is
> similarly unintelligible in its own right.
>
> Now what I try is: do a copy of the obscure colC
> cp1250 character string into the clipboard/memory
> and then do the following tweak on the original
> select statement to see if I can search on the
> (now) cp1250 character string:
>
> mysql>select langNumID, colB, colC
> ->from mainlang
> ->where colc = 'paste of the cp1250 character string';
>
> The computer would not allow a paste unless I pressed
> the escape key. On running this select command
> I got an empty set (no match).
>
> My questions are:
>
> Has anyone been successful creating a Unicode utf8
> MySQL database that accepts Arabic?
>
> If yes, how did you get around or not encounter the
> Data too long issue?
>
> Have you tried the cp1250 (or cp1251 - same mechanics,
> same results) workaround as I have? Are you
> able to search the cp1250 character string (my colC)?
> If yes, how did you successfully manage to do it?
>
> Lastly, if I take the cp1250 encoded string and paste
> it into Excel ... I can string search the cp1250
> encoding with no problem.
>
> Also, here's how I know my Unicode utf-8 data is
> correct apart from my own manual cross-referencing
> and being recognized by MySQL in some respect:
>
> When I copy the Unicode utf8 encoding and try to
> paste it into the select command to see what would
> happen I get the following error:
>
> ERROR 1257 (HY000): Illegal mix of collations
> (cp1250_general_ci, IMPLICIT) and
> (utf8_general_ci, COERCIBLE) for operation '='
>
> So what I have here is a situation where MySQL
> recognizes Unicode utf8 encoding in a query but not
> when loading a table!
>
> Go Figure ...
>
> Any insight or solution anyone wishes to share would
> be GREATLY appreciated. I promise if I find a solution
> I will share it.
>
> Thank you Very Much, Shukran Jiddan, Todah Rabah,
> Muchas Gracias ...
>
> Joel S
> (585) 255-0997
> jrs_14618 at yahoo.com
>

No idea, Joel. Why don't you try asking in a MySQL newsgroup, such as
comp.databases.mysql? This newsgroup is for PHP programming.
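
That said, the symptoms quoted above look like an encoding mismatch between the CSV file and the character set MySQL was told to expect, and that part can be checked outside the database. The following is only a minimal Python sketch, not a confirmed diagnosis. The sample word, the helper names as_seen_through and is_valid_utf8, and the guess that Excel may have written the CSV in a Windows code page such as cp1256 are all assumptions for illustration; nothing in the original post confirms how the file was saved.

```python
def as_seen_through(text: str, stored_as: str, read_as: str) -> str:
    """Encode text in one charset, then decode the same bytes in another.

    Hypothetical helper: it mimics what happens when bytes written in
    one encoding are interpreted by a reader that assumes another.
    """
    return text.encode(stored_as).decode(read_as)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Sample Arabic word ("salam"), written with escapes so the script does
# not depend on the editor's own encoding. This is illustrative data,
# not taken from the poster's file.
word = "\u0633\u0644\u0627\u0645"

# UTF-8 bytes read as cp1250: every Arabic letter becomes two Latin
# characters with diacritics or punctuation, like the colC output described.
mojibake = as_seen_through(word, "utf-8", "cp1250")
print(len(word), len(word.encode("utf-8")), len(mojibake))

# The damage is reversible as long as the bytes themselves are untouched.
restored = as_seen_through(mojibake, "cp1250", "utf-8")
print(restored == word)

# Assumption: if the CSV was really saved in a Windows Arabic code page
# (cp1256) rather than UTF-8, its bytes are not valid UTF-8 at all.
print(is_valid_utf8(word.encode("cp1256")))
```

If a check like is_valid_utf8 fails on the raw bytes of the CSV file, the file is not UTF-8, and re-saving or converting it to UTF-8 before the load would be the first thing to try. If it passes, the file is fine and the mismatch is more likely on the client or connection side, which would also fit the "Illegal mix of collations" error quoted above.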

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

 
