|
Posted by bbcworldtour on 04/18/07 08:11
On 16 Apr., 13:46, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net>
wrote:
> Have you applied any service packs? If not, try installing Express SP2
> (http://www.microsoft.com/downloads/details.aspx?familyid=31711d5d-725....).
> I get the correct results on my SP2 Developer Edition instance:
I upgraded to SP2, but the problem persisted. It turns out that it is
hidden deep inside the finer points of the database collation. I
thought this was kind of interesting in a low-intense way, so here's
the story:
Our databases are running a collation of Danish_Norwegian_CS_AS (we
are a Danish company).
In Danish we have three special phonemes that are represented in
writing as the letters Æ, Ø and Å. These three letters are
alphabetically placed as the last three letters of the alphabet.
The last one turns out to the culprit (if it doesn't show up proper
imagine an upper-cased A with a small circle superimposed on it).
Using the letter Å for the phoneme [Å] is a fairly recent addition to
Danish (around 1950's). Traditionally it was written as "AA". For
instance, my surname can be written as either "Brunsgård" or
"Brunsgaard", but is still considered the same name.
So in Danish, "AA" can be either the traditional writing of the
phoneme [Å] OR just two "A"s which happen to be consecutive.
Danish_Norwegian_CS_AS collation recognizes "AA" as "Å". This is
usually real neat for sorting. Consider the lastnames "Ågård" and
"Aagaard" - these should be sorted together at the end of a list, and
using any Danish_Norwegian collation will ensure just that.
Consider:
create table taDanishDemo
(
nameInDanish varchar(30)
collate Danish_Norwegian_CS_AS
, nameInEnglish varchar(30)
collate Latin1_General_CS_AS
)
;
Let us insert a couple of rows which contain a case of consecutive
"A"s:
insert
into taDanishDemo (nameInDanish,nameInEnglish)
select 'TBAAA','TBAAA'
union all
select 'TBABA','TBABA'
;
Retrieving the rows ordered will now yield different results depending
on whether we order on the Danish or the Latin1 collated column:
select nameInEnglish
from taDanishDemo
order by nameInEnglish;
nameInEnglish
------------------------------
TBAAA
TBABA
Under Latin1 collation the "AA" is considered just two concecutive
"A"'s and ordered at the beginning of the list.
But, under Danish collation, the "AA" is considered the traditional
writing of [Å], and placed at the end of the list:
select nameInDanish
from taDanishDemo
order by nameInDanish;
nameInDanish
------------------------------
TBABA
TBAAA
So far, so good.
What threw me completely is that this also affect how the string "AA"
is interpreted by the LIKE operator.
select nameInDanish
from taDanishDemo
where nameInDanish like 'TBA%'
nameInDanish
------------------------------
TBABA
The row containing "TBAAA" isn't returned Trying to match "AA" with an
"A" plus a wildcard will yield no match under Danish collation, since
SQL Serve interprets this as trying to match "Å" with "A"!
But under Latin1 collation "AA" does match "A" and a wildcard, as "AA"
is just two "A"'s
select nameInEnglish
from taDanishDemo
where nameInEnglish like 'TBA%'
nameInEnglish
------------------------------
TBAAA
TBABA
I'm still not really sure whether this is a useful feature, an
unintended side effect or a bug :-)
Bo Brunsgaard
Navigation:
[Reply to this message]
|