|  | Posted by --CELKO-- on 06/11/59 11:27 
>> Rows ARE records, fields ARE columns . . . etc.  I am not working on a thesis here, I am implementing solutions to problems.  Give it a break. <<
 Do you want a break (read: Kludge) or to learn something?  What you are
 working with is a mental model that is keeping you from understanding
 SQL and RDBMS.  Like most new ideas, the hard part of understanding
 what the relational model is comes in un-learning what you know about
 file systems.  As Artemus Ward (William Graham Sumner, 1840-1910) put
 it, "It ain't so much the things we don't know that get us into
 trouble. It's the things we know that just ain't so."
 
 If you already have a background in data processing with traditional
 file systems, the first things to un-learn are:
 
 (0) Databases are not file sets.
 (1) Tables are not files.
 (2) Rows are not records.
 (3) Columns are not fields.
 
 Modern data processing began with punch cards, or Hollerith cards used
 by the Bureau of the Census.  Their original size was that of a United
 States Dollar bill.  This was set by their inventor, Herman Hollerith,
 because he could get furniture to store the cards from the United
 States Treasury Department, just across the street.  Likewise, physical
 constraints limited each card to 80 columns of holes in which to record
 a symbol.
 
 The influence of the punch card lingered on long after the invention of
 magnetic tapes and disk for data storage.  This is why early video
 display  terminals were 80 columns across.  Even today, files which
 were migrated from cards to magnetic tape files or disk storage still
 use 80 column records.
 
 But the influence was not just on the physical side of data processing.
 The methods for handling data from the prior media were imitated in
 the new  media.
 
 Data processing first consisted of sorting and merging decks of punch
 cards (later, sequential magnetic tape files) in a series of distinct
 steps.  The result of each step feed into the next step in the process.
 
 
 Relational databases do not work that way.  Each user connects to the
 entire database all at once, not to one file at time in a sequence of
 steps.  The users might not all have the same database access rights
 once they are connected, however.  Magnetic tapes could not be shared
 among users at the same time, but shared data is the point of a
 database.
 
 Tables versus Files
 
 A file is closely related to its physical storage media.  A table may
 or may  not be a physical file.  DB2 from IBM uses one file per table,
 while Sybase puts several entire databases inside one file.  A table is
 a <i>set<i> of rows of the same kind of thing.  A set has no ordering
 and it makes no sense to ask for the first or last row.
 
 A deck of punch cards is sequential, and so are magnetic tape files.
 Therefore, a <i>physical<i> file of ordered sequential records also
 became the <i>mental<i> model for data processing and it is still hard
 to shake.  Anytime you look at data, it is in some physical ordering.
 
 The various access methods for disk storage system came later, but even
 these access methods could not shake the mental model.
 
 Another conceptual difference is that a file is usually data that deals
 with  a whole business process.  A file has to have enough data in
 itself to support applications for that business process.  Files tend
 to be "mixed"  data which can be described by the name of the business
 process, such as "The Payroll file" or something like that.
 
 Tables can be either entities or relationships within a business
 process.  This means that the data which was held in one file is often
 put into several tables.  Tables tend to be "pure" data which can be
 described by single words.  The payroll would now have separate tables
 for timecards, employees, projects and so forth.
 
 Tables as Entities
 
 An entity is physical or conceptual "thing" which has meaning be
 itself.  A  person, a sale or a product would be an example.  In a
 relational database, an entity is defined by its attributes, which are
 shown as values in columns in rows in a table.
 
 To remind users that tables are sets of entities, I like to use plural
 or collective nouns that describe the function of the entities within
 the system for the names of tables.  Thus "Employee" is a bad name
 because it is singular; "Employees" is a better name because it is
 plural; "Personnel" is best because it is collective and does not
 summon up a mental picture of individual persons.
 
 If you have tables with exactly the same structure, then they are sets
 of the same kind of elements.  But you should have only one set for
 each kind of data element!  Files, on the other hand, were PHYSICALLY
 separate units of storage which coudl be alike -- each tape or disk
 file represents a step in the PROCEDURE , such as moving from raw data,
 to edited data, and finally to archived data.  In SQL, this should be a
 status flag in a table.
 
 Tables as Relationships
 
 A relationship is shown in a table by columns which reference one or
 more entity tables.  Without the entities, the relationship has no
 meaning, but the relationship can have attributes of its own.  For
 example, a show  business contract might have an agent, an employer and
 a talent.  The method of payment is an attribute of the contract
 itself, and not of any of the three parties.
 
 Rows versus Records
 
 Rows are not records.  A record is defined in the application program
 which  reads it; a row is defined in the database schema and not by a
 program at all.  The name of the field in the READ or INPUT statements
 of the application; a row is named in the database schema.
 
 All empty files look alike; they are a directory entry in the operating
 system with a name and a length of zero bytes of storage.  Empty tables
 still have columns, constraints, security privileges and other
 structures, even tho they have no rows.
 
 This is in keeping with the set theoretical model, in which the empty
 set is a perfectly good set.  The difference between SQL's set model
 and standard mathematical set theory is that set theory has only one
 empty set, but in SQL  each table has a different structure, so they
 cannot be used in places where non-empty versions of themselves could
 not be used.
 
 Another characteristic of rows in a table is that they are all alike in
 structure and they are all the "same kind of thing" in the model.  In a
 file system, records can vary in size, datatypes and structure by
 having flags in the data stream that tell the program reading the data
 how to interpret it.  The most common examples are Pascal's variant
 record, C's struct syntax and Cobol's OCCURS clause.
 
 The OCCURS keyword in Cobol and the Variant records in Pascal have a
 number which tells the program how many time a record structure is to
 be repeated in the current record.
 
 Unions in 'C' are not variant records, but variant mappings for the
 same physical memory. For example:
 
 union x {int ival; char j[4];} myStuff;
 
 defines myStuff to be either an integer (which are 4 bytes on most
 modern C  compilers, but this code is non-portable) or an array of 4
 bytes, depending on whether you say myStuff.ival or myStuff.j[0];
 
 But even more than that, files often contained records which were
 summaries of subsets of the other records -- so called control break
 reports.  There is no requirement that the records in a file be related
 in any way -- they are literally a stream of binary data whose meaning
 is assigned by the program reading them.
 
 Columns versus Fields
 
 A field within a record is defined by the application program that
 reads it.  A column in a row in a table is defined by the database
 schema.  The datatypes in a column are always scalar.
 
 The order of the application program variables in the READ or INPUT
 statements is important because the values are read into the program
 variables in that order.  In SQL, columns are referenced only by their
 names.  Yes, there are shorthands like the SELECT * clause and INSERT
 INTO <table name> statements which expand into a list of column names
 in the physical order in which the column names appear within their
 table declaration, but these are shorthands which resolve to named
 lists.
 
 The use of NULLs in SQL is also unique to the language.  Fields do not
 support a missing data marker as part of the field, record or file
 itself.  Nor do fields have constraints which can be added to them in
 the record, like the DEFAULT and CHECK() clauses in SQL.
 
 Relationships among tables within a database
 
 Files are pretty passive creatures and will take whatever an
 application program throws at them without much objection.  Files are
 also independent of each other simply because they are connected to one
 application program at a time and therefore have no idea what other
 files looks like.
 
 A database actively seeks to maintain the correctness of all its data.
 The methods used are triggers, constraints and declarative referential
 integrity.
 
 Declarative referential integrity (DRI) says, in effect, that data in
 one table has a particular relationship with data in a second (possibly
 the same)  table.  It is also possible to have the database change
 itself via referential actions associated with the DRI.
 
 For example, a business rule might be that we do not sell products
 which are not in inventory.  This rule would be enforce by a REFERENCES
 clause on the Orders table which references the Inventory table and a
 referential action of ON DELETE CASCADE
 
 Triggers are a more general way of doing much the same thing as DRI.  A
 trigger is a block of procedural code which is executed before, after
 or instead of an INSERT INTO or UPDATE statement.  You can do anything
 with a trigger that you can do with DRI and more.
 
 However, there are problems with TRIGGERs.  While there is a standard
 syntax  for them in the SQL-92 standard, most vendors have not
 implemented it.  What they have is very proprietary syntax instead.
 Secondly, a trigger cannot pass information to the optimizer like DRI.
 In the example in this section, I know that for every product number in
 the Orders table, I have that same product number in the Inventory
 table.  The optimizer can use that information in setting up EXISTS()
 predicates and JOINs in the queries.  There is no reasonable way to
 parse procedural trigger code to determine this relationship.
 
 The CREATE ASSERTION statement in SQL-92 will allow the database to
 enforce conditions on the entire database as a whole.  An ASSERTION is
 not like a CHECK() clause, but the difference is subtle.  A CHECK()
 clause is executed when there are rows in the table to which it is
 attached.  If the table is empty then all CHECK() clauses are
 effectively TRUE.  Thus, if we wanted to be sure that the Inventory
 table is never empty, and we wrote:
 
 CREATE TABLE Inventory
 ( ...
 CONSTRAINT inventory_not_empty
 CHECK ((SELECT COUNT(*) FROM Inventory) > 0), ... );
 
 it would not work.  However, we could write:
 
 CREATE ASSERTION Inventory_not_empty
 CHECK ((SELECT COUNT(*) FROM Inventory) > 0);
 
 and we would get the desired results.  The assertion is checked at the
 schema level and not at the table level.
 
 >> I was able to learn quite a bit from a single newsgroup post. <<
 
 Or you can get a quick newsgroup kludge that will get you over a hump
 and have built more bad habits without any real gain.
 
 >> Yours, however, was the only posting I learned nothing from. <<
 
 "Against stupidity the gods themselves struggle in vain." - Die
 Jungfrau von Orleans; Friedrich von Schiller (1759-1805).  Get a book
 on Basic RDBMS.  It takes about a year of full-time SQL programming to
 have a revelation and to kick old habits. What people often get is one
 week of SQL porgraming experience repeated 52 times ina row and they
 never learn anything.
  Navigation: [Reply to this message] |