|
Posted by Norman Peelman on 01/29/07 05:03
Paul Lautman wrote:
> Norman Peelman wrote:
>> Paul Lautman wrote:
>>> Norman Peelman wrote:
>>>> Paul Lautman wrote:
>>>>> Norman Peelman wrote:
>>>>>> Szymon Bobek wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> I have databese with a few tables named like that: films_triller,
>>>>>>> films_horror, films_action etc.
>>>>>>> I decided to create many tables instead of creting just one big
>>>>>>> table containing all sort of films. Is it a good idea? (I think,
>>>>>>> that it boost search time much - am I right?)
>>>>>>> Now - to find a film (when only title is known) I have to search
>>>>>>> in all that tables. But I have a problem with that.. How to do
>>>>>>> that? SELECT * FROM REGEXP 'film_*' WHERE title='the film' does
>>>>>>> not work. The same is when I want to list all tables (to see what
>>>>>>> groups of films I have) - if I try do join SHOW TABLES with
>>>>>>> REGEXP I get an syntax error too. Where is the mistake and how to
>>>>>>> correct it? Thanks in advance
>>>>>>> Szymon Bobek
>>>>>>>
>>>>>>>
>>>>>> Your structure is fine... the problem is your method of searching.
>>>>>> I'm not really sure you can use REGEX in that context:
>>>>> You REALLY think that structure is fine???
>>>>>
>>>>>
>>>> What do you THINK is wrong with it? If you want to go for one big
>>>> table then your structure could become:
>>>>
>>>> film_type set('horror','thriller','comedy','romance','action',...)
>>>>
>>>> and then a row could be:
>>>>
>>>> film_id film_title film_type
>>>> 1 Romancing the Stone action, comedy, romance
>>>> 2 GODZILLA action, sci-fi
>>>> 3 Jurasic Park action, sci-fi
>>>>
>>>> and your search(s) becomes:
>>>>
>>>> SELECT * FROM film_table WHERE film_title LIKE '%Jurasic%'
>>>> returns data on Jurasic Park
>>>>
>>>> or
>>>>
>>>> SELECT * FROM film_table WHERE FIND_IN_SET('sci-fi')
>>>> returns data on GODZILLA and Jurasic Park
>>>>
>>>> ...does that look better. (an ENUM can contain only one item from a
>>>> list, a SET can contain multiple item from a list)
>>>>
>>>> You would still want an index on the film_title. Not sure if you can
>>>> index a SET column but you could try.
>>>>
>>>> Norm
>>> If you read my post you would know that that was not what I was
>>> suggesting. Have you heard of normalisation?
>>> What you are suggesting is the sort of thing I expect to see from a
>>> first year student!
>>>
>>>
>> First of all, my apologies to you, I thought I was replying to the
>> original poster. Secondly, I seem to be the only one offering
>> suggestions. Yes, read up on normalization, get your head spun around
>> with a lot of things you may or may not understand (and I don't claim
>> to). I would imagine the original poster would like to get his
>> database up to speed. Looking at your original post you seem to
>> describe a schema that is the opposite of normalization.
>>
>> For your example:
>>
>>> ID | Film_Type
>>> 225 | Comedy
>>> 225 | Thriller.
>>> This is called "normalisation".
>> ID is not unique and Film_type can (and will) be repeated. And
>> actually both will be repeated quite often as alot of films fit
>> multiple categories.
>>
>> 1NF suggests that ID should be unique (or at least there should be a
>> unique identifier, and there is none) and Film_Type (and ID) should
>> not repeat.
> In this case the primay key would be (ID, Film_Type). 1NF does not require
> that the unique identifier should consist of a single column of a table.
>
>> This schema in reality is no different than using a SET()
>> for Film_Type
> this schema is VERY DIFFERENT from using a SET().
>
>> and I would imagine the later is more efficient since
>> the SET()
> Unless you want to look things up in an efficient way!
>
>> data is stored in the schema definition and only pointers
>> are stored in the tables. 1NF suggests that for normalization you
>> should have one table for each Film_Type thus ensuring unique ID's and
>> non-repeating data.
>>
>> p.s. Is normalization a requirement?
>>
>> Norm
>
>
>
It also appears that having an index on film_title using queries like:
1) SELECT * FROM table WHERE film_title = '????'; = instant retrieval
(.01 secs)
2) SELECT * FROM table WHERE film_title LIKE '????'; = instant retrieval
(.03 secs)
3) SELECT * FROM table WHERE film_title LIKE '%???'; = full table scan
(my system ~28 seconds on 10 million rows)
4) SELECT * FROM table WHERE film_title = '????%'; = instant retrieval
(.01 secs)
5) SELECT * FROM table WHERE film_title = '%??%'; = full table scan (my
system ~28 seconds on 10 million rows)
What can be done to speed #3 and #5 up? I would almost expect MySQL
to look into the INDEX first but it doesn't seem to want to. It tried to
USE/FORCE INDEX but no go if '%' was the first character in the LIKE
clause. This is really what the original poster needs to know (in
addition to normalization).
Norm
Navigation:
[Reply to this message]
|