| 
	
 | 
 Posted by bbcworldtour on 04/18/07 08:11 
On 16 Apr., 13:46, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net> 
wrote: 
> Have you applied any service packs?  If not, try installing Express SP2 
> (http://www.microsoft.com/downloads/details.aspx?familyid=31711d5d-725....). 
> I get the correct results on my SP2 Developer Edition instance: 
 
I upgraded to SP2, but the problem persisted. It turns out that it is 
hidden deep inside the finer points of the database collation. I 
thought this was kind of interesting in a low-intense way, so here's 
the story: 
 
Our databases are running a collation of Danish_Norwegian_CS_AS (we 
are a Danish company). 
 
In Danish we have three special phonemes that are represented in 
writing as the letters  Æ, Ø and Å. These three letters are 
alphabetically placed as the last three letters of the alphabet. 
 
The last one turns out to the culprit (if it doesn't show up proper 
imagine an upper-cased A with a small circle superimposed on it). 
 
Using the letter Å for the phoneme [Å] is a fairly recent addition to 
Danish (around 1950's). Traditionally it was written as "AA". For 
instance, my surname can be written as either "Brunsgård" or 
"Brunsgaard", but is still considered the same name. 
 
So in Danish, "AA" can be either the traditional writing of the 
phoneme [Å] OR just two "A"s which happen to be consecutive. 
 
Danish_Norwegian_CS_AS collation recognizes "AA" as "Å". This is 
usually real neat for sorting. Consider the lastnames "Ågård" and 
"Aagaard" - these should be sorted together at the end of a list, and 
using any Danish_Norwegian collation will ensure just that. 
 
Consider: 
 
create table taDanishDemo 
  ( 
      nameInDanish   varchar(30) 
                     collate Danish_Norwegian_CS_AS 
 
   ,  nameInEnglish  varchar(30) 
                     collate Latin1_General_CS_AS 
  ) 
; 
 
Let us insert a couple of rows which contain a case of consecutive 
"A"s: 
 
insert 
  into  taDanishDemo (nameInDanish,nameInEnglish) 
  select 'TBAAA','TBAAA' 
union all 
  select 'TBABA','TBABA' 
; 
 
Retrieving the rows ordered will now yield different results depending 
on whether we order on the Danish or the  Latin1 collated column: 
 
select nameInEnglish 
  from taDanishDemo 
order by nameInEnglish; 
 
nameInEnglish 
------------------------------ 
TBAAA 
TBABA 
 
Under Latin1 collation the "AA" is considered just two concecutive 
"A"'s and ordered at the beginning of the list. 
But, under Danish collation, the "AA" is considered the traditional 
writing of [Å], and placed at the end of the list: 
 
select nameInDanish 
  from taDanishDemo 
order by nameInDanish; 
 
nameInDanish 
------------------------------ 
TBABA 
TBAAA 
 
So far, so good. 
 
What threw me completely is that this also affect how the string "AA" 
is interpreted by the LIKE operator. 
 
select nameInDanish 
  from taDanishDemo 
where  nameInDanish like 'TBA%' 
 
nameInDanish 
------------------------------ 
TBABA 
 
The row containing "TBAAA" isn't returned Trying to match "AA" with an 
"A" plus a wildcard will yield no match under Danish collation, since 
SQL Serve interprets this as trying to match "Å" with "A"! 
 
But under Latin1 collation "AA" does match "A" and a wildcard, as "AA" 
is just two "A"'s 
 
 
select nameInEnglish 
  from taDanishDemo 
where  nameInEnglish like 'TBA%' 
 
nameInEnglish 
------------------------------ 
TBAAA 
TBABA 
 
I'm still not really sure whether this is a useful feature, an 
unintended side effect or a bug :-) 
 
Bo Brunsgaard
 
  
Navigation:
[Reply to this message] 
 |