|
Posted by Kenneth Downs on 05/17/06 04:58
Bent Stigsen wrote:
> Kenneth Downs wrote:
>
>> Bent Stigsen wrote:
>>
>>>
>>> Hmmm, I don't really see it as such. Arguably a step in that direction,
>>> but still good old relational tables to me, and very much a reasonable
>>> method to anticipate changes in schema/data.
>>>
>>>
>>>> But I would repeat that any 'abstraction' made in an attempt to avoid
>>>> table
>>>> structure changes is going to fail. It fails because you give up what
>>>> the server can do for you and end up spending your time reinventing an
>>>> RDMBS server.
>>>
>>> Perhaps I don't quite understand you here, but I don't think I am
>>> reinventing anything. Using an intermediate table to represent a
>>> many-to-many relationship is hopefully quite a common practice, and by
>>> no means an abuse of any relational database, but rather something they
>>> are extremely good at.
>>>
>>>
>>
>> As I said, in this case it may be valid.
>
> Perhaps we think the same, but just to be clear. No doubt all suggested
> solutions will get the job done. In that respect, I consider them of equal
> validity. But I do consider the solution with the intermediate link table
> a better option, especially since it saves a hell of a lot of work, should
> the requirements change.
Solutions which appear to be of equal validity often turn out not to be.
With a little more digging we could find out. In this case we can't know
the answer, but we can define the parameters.
With a bunch of user-defined flags whose existence doesn't require
special-case programming, use a cross-reference. Of course it may turn out
nobody uses them because they don't do anything. And there is the fact
that you now need at least one JOIN every time you reference them, so I'm
not exactly sure what the "hell of a lot of work" you've saved is, looks
like you made work for yourself actually.
OTOH, the default position is that a property of an thing is built as a
column in a table, unless that property is itself another thing that has
more properties. This is the design of maximum efficiency, which is the
real reason why Codd is so famous for explaining it all to us. This is
also why my earlier posts stress that it is vital for your tools to make it
easy to change table designs. The table design is the foundation of your
system, every inefficiency in the design will cost you ten times over, so
you are always seeking to get the tables to maximum efficiency. In Physics
we say you want the "lowest energy state".
As for requirements change, that's a "when" not an "if". Embrace it, love
it, live it. A good programmer codes against errors and changing
requirements, a naive programmer codes for today.
But of course don't take my word for it. Make it a point to try them both
in your travels, and see which one turns out easier on a system that grows
over time.
>
>> It would be a move toward using E-A-V if and only if it were done
>> specifically to avoid structure changes. That would flag a mindset that
>> would tend towards making the data more and more difficult to work with,
>> trading the effort of regular development and use for the effort of
>> modifying the table structures.
>
> My only objection is "more and more difficult". It is not like abstraction
> is something that demand more and more by itself, but yes, not unlikely to
> be a trade, but which I wouldn't automatically think of as difficult, not
> in this case anyway.
Abstraction does require a lot more work. That's why so few programmers
actually do it. It only pays when you are taking on a job (or jobs) large
enough to tip the scales.
>
> Sure, it can be seen as increased complexity and abstraction, allthough I
> don't consider it a big deal, but rather just part of the natural design
> process. I'll get back to that.
>
>> I can only argue here from experience. A flag is a property of the
>> entity
>> being recorded in the table. That means by default it is a column in a
>> table, along with other flags. This is the simplest possible arrangement
>> and anything else carries a higher ongoing cost. The default position is
>> to have it a column in the table. In most cases, if the customer wants
>> another flag, that's another column. If a developer is trying to avoid
>> structure changes because of cost, then that developer needs to seriously
>> look at their development tools (or coding habits), anything which drives
>> you away from the natural use of tables is not your friend.
>
> I would contend that it is not impossible, actually well within classical
> relational design, to make a database less likely to require altering in
> the future, without resorting to complex constructs with higher developing
> costs as result. It is not just a hopeful idea. Since Jerry's word
> apparently is not good enough, I'll try argument.
>
> I think the original question is a fine example. OP's initial approach is,
> as you also imply, a set of properties of the entity, and you conclude
> therefor should be represented in the table of the entity. We could stop
> here, not making any closer analysis of the data or playing what-if, and
> from this point of view you are quite right. (except claiming that
> "anything else carries a higher ongoing cost.")
Stop. If you don't accept that point, you aren't going to get the rest of
it.
A normalized database is the foundation for the least expensive project to
code and maintain. There are two diversions from this ideal. One is
under-normalization, where there are two many columns stacked into tables
that ought to be broken out into their own child tables. This makes some
queries easier at the expense of producing wrong data. The other direction
is to "over-normalize", creating false abstractions that take their toll in
query complexity and the complexity of a framework that must reproduce what
has been thrown away.
This basic truth, that the most efficient system overall is based on a
normalized design, does not seem to come through to people until they've
tried it.
But once you understand this, you realize that you don't want to avoid table
design changes, you simply want to be able to do them efficiently.
> But the OP does go a little further as he just generally calls them
> characteristics. Even without making any assumption about the future, it
> wouldn't be strange to elevate a characteristic as an entity by itself,
> from thereon ordinary normalization *dictates* a separate entity-table and
> an intermediate link table because of the many-to-many relation.
That's the E-A-V error stated in its most precise form. You throw away the
ability to use the most basic SQL because you are re-implementing a
relational system.
> That is
> not cheating or making unnecessary abstractions, it is just plain database
> design.
Actually it is database server design, like what the authors of mySQL worry
about. Database design is all about information stored on behalf of the
user, database server design is all about figuring out how to store that
information.
When you confuse a meta-design for a design, you throw away what the authors
of your db server have done, and you end up having to reproduce it. Hows
that for saving a "hell of a lot of work!"
> If or when that extra requirement comes along, the solution with one field
> to each property would require changes through the whole system, from
> schema to interface,
....which is why I have said several times in this thread that you need tools
and methodologies that allow you to change your tables, because in any
living and growing system that is a natural requirement.
Trying to avoid table structure changes is liking trying to find a perpetual
motion machine. The search for an unattainable goal prevents you from
reaching the attainable goals.
--
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)
Navigation:
[Reply to this message]
|