Posted by Erland Sommarskog on 08/30/07 21:30
(raymond_b_jimenez@yahoo.com) writes:
> I've seen a dump of the TDS traffic going from my webserver to the SQL
> Server database and it seems encoded in Unicode (it has two bytes per
> char). Seems it would have a huge impact on performance if it
> travelled in one byte. Why might this be?
I have never eavesdropped on TDS, but Unicode is indeed the character
set of SQL Server. You are perfectly able to name your tables in
Cyrillic or Hindi characters if you feel like it. And of course character
strings may include all sorts of characters. So a batch of SQL statements
that is sent over the wire must be Unicode. That is beyond dispute.
However, you don't encode something in Unicode. Unicode is the character
set, and there are several encodings available, of which the most popular
are UTF-16 and UTF-8. In UTF-16 each character in the base plane takes up
2 bytes, and characters beyond that take up 4 bytes. (The base plane
covers the vast majority of living languages.) In UTF-8, ASCII characters
take up one byte, other characters in the Latin, Greek and Cyrillic
scripts take two bytes, and Chinese and Japanese characters take up three
bytes.
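If you want to check these byte counts yourself, a small Python sketch
along the following lines should do. The sample characters and the helper
name encoded_sizes are my own picks for illustration; nothing here is
specific to SQL Server or TDS.

```python
# Byte counts per character in UTF-16 and UTF-8.
# The sample characters below are arbitrary illustrations.
samples = {
    "ASCII letter": "A",
    "Latin with diacritic": "\u00e9",      # e with acute accent
    "Cyrillic": "\u0416",                  # Cyrillic capital ZHE
    "Chinese": "\u4e2d",                   # CJK ideograph "middle"
    "Beyond the base plane": "\U0001D11E", # musical symbol G clef
}


def encoded_sizes(ch):
    """Return (utf16_bytes, utf8_bytes) for the given text."""
    # "utf-16-le" is used because plain "utf-16" prepends a 2-byte
    # byte-order mark, which would distort the count.
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-8"))


for label, ch in samples.items():
    utf16, utf8 = encoded_sizes(ch)
    print(f"{label}: UTF-16 = {utf16} bytes, UTF-8 = {utf8} bytes")
```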
SQL Server uses UTF-16 exclusively. It is true that for network traffic
in the western world, it would be more efficient if TDS used UTF-8, but
as you can see, that is not necessarily the case in the Far East. And had
TDS used UTF-8, both ends of the wire would have had to convert to and
from UTF-16, so any reduction in network traffic could be eaten up by
extra CPU time.
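To put rough numbers on that trade-off, here is a sketch that compares
the size of the text alone in the two encodings. The two strings and the
helper name wire_sizes are made up for the example, and this only counts
the bytes of the characters; it says nothing about real TDS packet
headers or framing, which I have not looked at.

```python
def wire_sizes(text):
    """Return the encoded size of text in bytes for both encodings."""
    return {
        # Little-endian variant, so that no byte-order mark is counted.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-8": len(text.encode("utf-8")),
    }


# Hypothetical payloads: an all-ASCII batch and a Japanese string value.
western = "SELECT name FROM customers"
far_east = "\u6771\u4eac\u90fd\u5343\u4ee3\u7530\u533a"  # a Tokyo address

for label, text in (("western", western), ("far east", far_east)):
    sizes = wire_sizes(text)
    print(f"{label}: UTF-16 = {sizes['utf-16']} bytes, "
          f"UTF-8 = {sizes['utf-8']} bytes")
```

For the ASCII batch UTF-8 is half the size of UTF-16, while for the
Japanese string UTF-8 is half again as large as UTF-16.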
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx