Posted by Erland Sommarskog on 08/30/07 21:30
(raymond_b_jimenez@yahoo.com) writes:
 > I've seen a dump of the TDS traffic going from my webserver to the SQL
 > Server database and it seems encoded in Unicode (it has two bytes per
 > char). Seems it would have a huge impact on performance if it
 > travelled in one byte. Why might this be?
 
 I have never eavesdropped on TDS, but Unicode is indeed the character
 set of SQL Server. You are perfectly able to name your tables in
 Cyrillic or Hindi characters if you feel like it. And of course character
 strings may include all sorts of characters. So a batch of SQL statements
 that is sent over the wire must be Unicode. That is beyond dispute.
 
 However, you don't encode something in Unicode. Unicode is the character
 set, and there are several encodings available, of which the most popular
 are UTF-16 and UTF-8. In UTF-16 each character in the base plane takes up
 2 bytes, and characters beyond that take up 4 bytes. (The base plane
 covers the vast majority of living languages.) In UTF-8, ASCII characters
 take up one byte, other characters in the Latin, Greek and Cyrillic
 scripts take two bytes, and Chinese and Japanese characters take up three
 bytes.
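
 If you want to check these sizes yourself, here is a small sketch in
 Python. It only uses Python's own codecs and has nothing to do with TDS
 or SQL Server as such; the function name and the sample characters are
 just my picks for illustration.

```python
# Compare how many bytes single characters occupy in UTF-8 and UTF-16.
# "utf-16-le" is used so that no byte-order mark is added to the count.

def encoded_sizes(ch):
    """Return (bytes in UTF-8, bytes in UTF-16) for the given text."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# ASCII, accented Latin, Cyrillic, Chinese, and one character outside the base plane.
samples = ["A", "é", "Ж", "中", "😀"]
sizes = {ch: encoded_sizes(ch) for ch in samples}

for ch, (utf8_len, utf16_len) in sizes.items():
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_len}  UTF-16: {utf16_len}")
```

 The last sample lies beyond the base plane, so it needs four bytes in
 either encoding.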
 
 SQL Server uses UTF-16 exclusively. It is true that for network traffic
 in the western world, it would be more efficient if TDS used UTF-8, but
 as you can see, that is not necessarily the case in the Far East. And had
 TDS used UTF-8, both ends of the wire would have had to convert to
 UTF-16, so any reduced network traffic could be eaten up by extra CPU
 time.
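
 To put rough numbers on the size side of that trade-off, here is a
 sketch in Python. The two sample strings are made up for illustration,
 and this only counts bytes of the encoded text; it says nothing about
 TDS packet overhead or about the CPU cost of converting.

```python
# Compare the encoded size of a Western and a Far Eastern string.
# Both strings are hypothetical samples, not taken from any real traffic.

def wire_sizes(text):
    """Return (bytes in UTF-8, bytes in UTF-16) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

western = "SELECT name FROM customers"   # pure ASCII
eastern = "顧客名簿"                      # four base-plane CJK characters

for label, text in (("western", western), ("eastern", eastern)):
    utf8_len, utf16_len = wire_sizes(text)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

 For the ASCII string UTF-8 is half the size of UTF-16, while for the
 CJK string UTF-16 is the smaller of the two.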
 
 
 --
 Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
 
 Books Online for SQL Server 2005 at
 http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
 Books Online for SQL Server 2000 at
 http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx