|  | Posted by Erland Sommarskog on 04/01/07 10:00 
pinney.colton@gmail.com (pinney.colton@gmail.com) writes:> I have a SQL 2005 database that's created by a survey data collection
 > system.  Users of this system are fairly non-technical and have little
 > to no conscious control over the datatypes.  As a result, the data is
 > mostly stored quite inefficiently as varchars.  For example, there is
 > data that could be stored in a column of bits and it's stored as a
 > varchar value of 0 or 1.  (Yuck, I know.)
 
 Well, obvious varchar is not right, but it is not that much of a disaster,
 although there is a two-byte overhead for each value. What is more important
 is that there is a constraint that disallows any other values. But I
 can guess from the background, that there isn't. :-)
 
 More generally, there is not really an agreement in the SQL community how
 to store boolean values. Bit is probably the most commonly used data type,
 and I am that camp myself. But others advocate things like:
 
 iscompleted char(1) NOT NULL
 CONSTRAINT ckc_iscompleted CHECK (iscompleted IN ('Y', 'N'))
 
 Others use T or F for true or false. Or J or N as we did in an older
 incarnation of our system. (J from "ja", the Swedish word for "yes".) So
 why not 0 or 1?
 
 What I am getting at here is that while the varchar thing is not optimal,
 it may not be broken enough to warrant a change. It may be sufficient to
 add a constraint to ascertain that there are no other values.
 
 > I am building a reporting system using this raw data and have a new
 > table structure designed that is much more efficient (and better for
 > reporting).  Does anyone have any suggestions for getting this data
 > into my new structure?  Specifically, how would you recommend checking
 > that varchar field and determining it could be stored as a bit?
 
 SELECT varcharcol, COUNT(*) FROM tbl GROUP BY varcharcol
 
 should give you an indication of what really is in that column. Say that you
 find something like:
 
 0    1234
 1    9802
 Y      23
 N      12
 X       1
 
 Then you could convert to bit in this way:
 
 SELECT CASE WHEN varcharcol IN ('0', 'N') THEN convert(bit, 0)
 WHEN varcharcol IN ('1', 'Y') THEN convert(bit, 1)
 END
 
 Question remains what you should do with that X column. Make the bit
 column nullable and store it as NULL? Just drop that row on the floor?
 
 
 --
 Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
 
 Books Online for SQL Server 2005 at
 http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
 Books Online for SQL Server 2000 at
 http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
  Navigation: [Reply to this message] |