|
Posted by Erland Sommarskog on 06/30/05 00:20
dBlue (zkvneml@hotmail.com) writes:
> BYTW, I found an very instereting stuff. Please the script below,
>
> The first 1 and 2 select statement should return same result, but they
> do not; and only difference between them is 1 statement does not have
> the M character inside the like statement.
>
>
> CREATE TABLE tbl
> (lastname nvarchar(30) NOT NULL)
> go
> INSERT tbl (lastname) VALUES ('MópeX')
> INSERT tbl (lastname) VALUES ('MópeY')
> INSERT tbl (lastname) VALUES ('MopeZ')
> go
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_CP1_CI_AI LIKE N'%[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_CP1_CI_AI LIKE N'%M[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_CP1_CI_AI LIKE N'%M%[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_CP1_CI_AI LIKE N'%[o]pe%'
> GO
>
> drop table tbl
But "MópeX" is not the same as "MópeX". The first string has six
characters, the second has five. Both strings match N'%[óo]pe%' -
that is first any number of characters and then exactly one of
"Ã", "³" or "o" and then "pe" followed by any number of chars.
On the other hand, the string N'%M[óo]pe%', you say that there
should be an M and then exactly one of "Ã", "³" or "o" and then "pe",
which there is in the six-char string.
OK, so that "MòpeZ" is really a representation of "MópeX", to whit
the bit pattern that you have in UTF-8, but stored in a string which
is supposed to hold UTF_16 values, and then it goes downhill from
there.
So I guess you really problem is why you have UTF-8 encodings in the
wrong place. When I saw your first posting, I assumed that the mangled
UTF-8 came from the news posting itself, but it appears now that it
comes from the database.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
[Back to original message]
|