Reply to Re: Updating the SQL key value — PHP Programming Language

Posted by Toby A Inkster on 03/31/07 20:56

Jerry Stuckle wrote:
> Toby A Inkster wrote:
>> Erwin Moller wrote:
>>
>> Example data in table licences:
>>
>> GPL GNU General Public Licence http://www.gnu...
>> FDL GNU Free Documentation Licence http://www.gnu...
>> CC Creative Commons Licence http://www.cre...
>>
>> Usage is fairly similar, apart from the fact that now, when you look at
>> the table 'articles' without doing any joins, you can still infer a bit of
>> information about which licence each article is under, without having to
>> inner join onto the licences table.
>
> True, but you also have to look at performance issues. When searching
> an index, comparing an int is always faster than comparing a varchar.
> And comparing a single column is always faster than comparing multiple
> columns. And the index file itself is smaller.

Firstly, I said char, not varchar. char columns are fixed width storage
and consequently a lot faster to search than varchar columns. On a 32-bit
processor, char(4) is particularly fast as it corresponds to the
processor's word length.

> Additionally, the PK should not be dependent on data which may change -
> i.e. if part of your key was the license (code), what would happen if
> they changed the licensing terms?

Firstly, in this example, the licence code is an arbitrary string chosen by
the creator of the data, so it needs to change no more often than an
integer key would.

Secondly, it is perfectly acceptable for a primary key to change. A decent
database can even cascade changes to dependent tables for you.

>> And before you say that this is a waste of space as char(8) takes up
>> eight bytes rather than 4 bytes for an integer, you're second-guessing
>> the database engine there. Database engines really are dead clever.
>
> Not generally, they don't, because of potential problems. For instance,
> if the data is stored as a pointer to the license table, the system
> has to do an additional file lookup to fetch the license data. And what
> happens if the entry from the license table is altered - or worse yet,
> deleted? There is no referential integrity built in here. Deleting an
> item from the license table would require all other tables which point
> to that entry be updated - that is, the varchar data would have to be
> reinserted into each row in every table which pointed to the license
> table.

Of course there's referential integrity built-in, via foreign key
constraints. If foreign keys are correctly set up, then depending on
how you configured it, one of the following actions would happen when
you tried to delete a licence from the licences table which was being
used in the articles table:

1. the RDBMS would not allow you to delete it;
2. the articles which used that licence would also be deleted; or
3. the articles which used that licence would have their licence
set to the default licence.

It's all about cascading.

> And in your case, the data would be stored in 4 bytes ("GPL" + 1 byte
> length). So it would take the same 4 bytes - but comparisons would
> still be slower.

Greater than / less than comparisons would be slower. But (case sensitive)
equal to / not equal to comparisons should be of equivalent speed, as it's
a simple matter of choosing if one 32-bit value is the same as another
32-bit value -- exactly the same process used if they were integers.
Probably the same CPU instructions used.

> That's true. But you're also taking up more storage space and slowing
> down searches.

More storage space on, say, the articles table, for storing the user's
login in a column instead of an integer; but less space in the user table,
as there's no need to store an extra integer column.

Slower to perform a join between the article and user tables, but fewer
occasions when you'll need to perform a join, as you have some useful user
information (the login) in the article table already.

>> On one of my current projects, I've got twelve tables, only one of
>> which has a surrogate integer primary key (technically it doesn't
>> autonumber, but I use MAX() to simulate autonumbering when creating a
>> new record). Guess which table is causing me the most problems?
>
> I can imagine. Using MAX() like this can cause concurrency problems.
> That's why RDB designers came out with the auto-numbering columns.

Not heard of transactions?

>> Better would have been to design my table so that it had a primary key
>> like (url, revision).
>
> But then your comment table wouldn't point to it anyway if you use (url,
> revision). Or, if you just use url, your comment table would be
> pointing at two different entries - which is not good foreign key
> design.

It would be pointing to a partial key. Which I think is against 4NF or
5NF, but I mostly ignore those two as I've found them of very little
practical use.

>> Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
>> cheek when describing surrogate keys as "the root of all evil", but I
>> can't stand to see a good candidate key go to waste.
>
> Here, I agree. But I tend to lean more towards the performance side.
> Design the database with efficiency in mind. That's why all databases
> aren't 5NF form.

I tend to err on the side of my own sanity. As far as performance is
concerned, I'm of the opinion that it's premature optimisation, and that
the RDBMS programmers will have already done a much better job than I
would have done.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!

[Back to original message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация