Reply to Re: what is better - one field or eight - mysql bit testing — PHP Programming Language

Posted by Bent Stigsen on 05/17/06 18:01

Kenneth Downs wrote:

> Bent Stigsen wrote:
>
>> Kenneth Downs wrote:
>>
>>> Bent Stigsen wrote:
>>>
>>>>
>>>> Hmmm, I don't really see it as such. Arguably a step in that direction,
>>>> but still good old relational tables to me, and very much a reasonable
>>>> method to anticipate changes in schema/data.
>>>>
>>>>
>>>>> But I would repeat that any 'abstraction' made in an attempt to avoid
>>>>> table
>>>>> structure changes is going to fail. It fails because you give up what
>>>>> the server can do for you and end up spending your time reinventing an
>>>>> RDMBS server.
>>>>
>>>> Perhaps I don't quite understand you here, but I don't think I am
>>>> reinventing anything. Using an intermediate table to represent a
>>>> many-to-many relationship is hopefully quite a common practice, and by
>>>> no means an abuse of any relational database, but rather something they
>>>> are extremely good at.
>>>>
>>>>
>>>
>>> As I said, in this case it may be valid.
>>
>> Perhaps we think the same, but just to be clear. No doubt all suggested
>> solutions will get the job done. In that respect, I consider them of
>> equal validity. But I do consider the solution with the intermediate link
>> table a better option, especially since it saves a hell of a lot of work,
>> should the requirements change.
>
> Solutions which appear to be of equal validity often turn out not to be.
> With a little more digging we could find out. In this case we can't know
> the answer, but we can define the parameters.

Oh come on, the complexity of the problem is not exactly mindboggling.

> With a bunch of user-defined flags whose existence doesn't require
> special-case programming, use a cross-reference. Of course it may turn
> out
> nobody uses them because they don't do anything. And there is the fact
> that you now need at least one JOIN every time you reference them, so I'm
> not exactly sure what the "hell of a lot of work" you've saved is, looks
> like you made work for yourself actually.

Sure I have to make a join, if I would want the names as well in the same
select, which I probably would. You got the choice to hard code the names
in the userinterface(s) or have a separate table to select from. If you
hardcode the names, then you probably got the edge in the initial
development of the whole system. Shall we say a couple of minutes perhaps.

As for "hell of a lot of work", that was followed by "should the
requirements change".

> OTOH, the default position is that a property of an thing is built as a
> column in a table, unless that property is itself another thing that has
> more properties. This is the design of maximum efficiency, which is the
> real reason why Codd is so famous for explaining it all to us. This is
> also why my earlier posts stress that it is vital for your tools to make
> it
> easy to change table designs. The table design is the foundation of your
> system, every inefficiency in the design will cost you ten times over, so
> you are always seeking to get the tables to maximum efficiency. In
> Physics we say you want the "lowest energy state".

I'm sure Codd set a good ideal to aim for, but there is this tiny thing
called the real world. Ideals aren't allways practical when they have to
fit into a whole system, consisting of different parts each with their own
ideal conditions.

> As for requirements change, that's a "when" not an "if". Embrace it, love
> it, live it. A good programmer codes against errors and changing
> requirements, a naive programmer codes for today.

Sure, I am betting on the change, that is why I prefer the one solution
before the other, because it saves a hell of a lot of work.

> But of course don't take my word for it. Make it a point to try them both
> in your travels, and see which one turns out easier on a system that grows
> over time.

Just answer this question: Given the original scenario of this thread. When
it is requested for another characteristic to be added, which design
requires less (possibly zero) work to adapt to this new requirement.

>>> It would be a move toward using E-A-V if and only if it were done
>>> specifically to avoid structure changes. That would flag a mindset that
>>> would tend towards making the data more and more difficult to work with,
>>> trading the effort of regular development and use for the effort of
>>> modifying the table structures.
>>
>> My only objection is "more and more difficult". It is not like
>> abstraction is something that demand more and more by itself, but yes,
>> not unlikely to be a trade, but which I wouldn't automatically think of
>> as difficult, not in this case anyway.
>
> Abstraction does require a lot more work. That's why so few programmers
> actually do it. It only pays when you are taking on a job (or jobs) large
> enough to tip the scales.

Some abstractions might take extra work, but there is no physical law that
require it to be so. I can think of situations where it would be more
convenient. Take the present case for instance, where you have a lot of
on/off switches, which most likely would be grouped together in a list to
select from, or as a list of checkboxes. I C# for example, there are
classes that makes it easy to go from; "records in table -> internal
dataset -> listcontrol", because it is a common thing to do. The design
with the abstraction of characteristics fits nicely into this, whereas the
column design doesn't, which means more work for you, less for me.

>> Sure, it can be seen as increased complexity and abstraction, allthough I
>> don't consider it a big deal, but rather just part of the natural design
>> process. I'll get back to that.
>>
>>> I can only argue here from experience. A flag is a property of the
>>> entity
>>> being recorded in the table. That means by default it is a column in a
>>> table, along with other flags. This is the simplest possible
>>> arrangement
>>> and anything else carries a higher ongoing cost. The default position
>>> is
>>> to have it a column in the table. In most cases, if the customer wants
>>> another flag, that's another column. If a developer is trying to avoid
>>> structure changes because of cost, then that developer needs to
>>> seriously look at their development tools (or coding habits), anything
>>> which drives you away from the natural use of tables is not your friend.
>>
>> I would contend that it is not impossible, actually well within classical
>> relational design, to make a database less likely to require altering in
>> the future, without resorting to complex constructs with higher
>> developing costs as result. It is not just a hopeful idea. Since Jerry's
>> word apparently is not good enough, I'll try argument.
>>
>> I think the original question is a fine example. OP's initial approach
>> is, as you also imply, a set of properties of the entity, and you
>> conclude therefor should be represented in the table of the entity. We
>> could stop here, not making any closer analysis of the data or playing
>> what-if, and from this point of view you are quite right. (except
>> claiming that "anything else carries a higher ongoing cost.")
>
> Stop. If you don't accept that point, you aren't going to get the rest of
> it.

Perhaps, but it doesn't make me wrong. You would need sticks and stones for
that, and not just throw generalities and hypothetical ideals at me.

> A normalized database is the foundation for the least expensive project to
> code and maintain. There are two diversions from this ideal. One is
> under-normalization, where there are two many columns stacked into tables
> that ought to be broken out into their own child tables. This makes some
> queries easier at the expense of producing wrong data. The other
> direction is to "over-normalize", creating false abstractions that take
> their toll in query complexity and the complexity of a framework that must
> reproduce what has been thrown away.
>
> This basic truth, that the most efficient system overall is based on a
> normalized design, does not seem to come through to people until they've
> tried it.
>
> But once you understand this, you realize that you don't want to avoid
> table design changes, you simply want to be able to do them efficiently.

So what prevents me for having both? I am a lazy bastard when it comes to
coding, and I'll use any dirty trick I can think of to make my life easier.
I can't really refute that this case isn't a "false abstraction", allthough
I doesn't readily see it as such. Regardless, I'll use it if it suits me,
for the convenience, not for the sake of the abstraction.

>> But the OP does go a little further as he just generally calls them
>> characteristics. Even without making any assumption about the future, it
>> wouldn't be strange to elevate a characteristic as an entity by itself,
>> from thereon ordinary normalization *dictates* a separate entity-table
>> and an intermediate link table because of the many-to-many relation.
>
> That's the E-A-V error stated in its most precise form. You throw away
> the ability to use the most basic SQL because you are re-implementing a
> relational system.

Bollocks. I have not implied a large scale abstraction of all attributes of
all entities. I just point out that present case could be seen as a shift
in point of view, making the table design quite ordinary and standard
normalization.

>> That is
>> not cheating or making unnecessary abstractions, it is just plain
>> database design.
>
> Actually it is database server design, like what the authors of mySQL
> worry
> about. Database design is all about information stored on behalf of the
> user, database server design is all about figuring out how to store that
> information.
>
> When you confuse a meta-design for a design, you throw away what the
> authors
> of your db server have done, and you end up having to reproduce it. Hows
> that for saving a "hell of a lot of work!"

I have allready given you examples of how it would save work, which you
haven't even tried to refute, just snipped it. Use of link tables is hardly
something that gives the authors of MySQL sleepless nights.

>> If or when that extra requirement comes along, the solution with one
>> field to each property would require changes through the whole system,
>> from schema to interface,
>
> ...which is why I have said several times in this thread that you need
> tools and methodologies that allow you to change your tables, because in
> any living and growing system that is a natural requirement.

That is true regardless of the design, and does not prevent one from doing
some tricks in anticipation of future changes, with the purpose of saving
oneself for a hell of a lot of work. Making abstractions without any
calculated reasons would be pointless of course.

> Trying to avoid table structure changes is liking trying to find a
> perpetual
> motion machine. The search for an unattainable goal prevents you from
> reaching the attainable goals.

Obviously, one cannot with certainty, avoid changes in the table structure
in a changing environment, but assessing the probability of certain changes
is quite attainable. As Jerry also said, changing just one column in a
table is not necessarily a small thing. You valuate the consequences of
certain events of high probability, and weighed it against the efforts
needed to make a flexible system that can accommodate it. If it is a
reasonable trade, then common sense is to do the trade, regardless if it is
a deviation from a hypothetical ideal. It is the "lowest energy state" if
you like.

/Bent

[Back to original message]

Удаленная работа для программистов • Как заработать на Google AdSense • England, UK • статьи на английском • PHP MySQL CMS Apache Oscommerce • Online Business Knowledge Base • DVD MP3 AVI MP4 players codecs conversion help

Home • Search • Site Map • Set as Homepage • Add to Favourites

Сайт изготовлен в Студии Валентина Петручека —
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация