| 
	
 | 
 Posted by Norman Peelman on 01/29/07 05:03 
Paul Lautman wrote: 
> Norman Peelman wrote: 
>> Paul Lautman wrote: 
>>> Norman Peelman wrote: 
>>>> Paul Lautman wrote: 
>>>>> Norman Peelman wrote: 
>>>>>> Szymon Bobek wrote: 
>>>>>>> Hi! 
>>>>>>> 
>>>>>>> I have databese with a few tables named like that: films_triller,  
>>>>>>> films_horror, films_action etc. 
>>>>>>> I decided to create many tables instead of creting just one big 
>>>>>>> table containing all sort of films. Is it a good idea? (I think, 
>>>>>>> that it boost search time much - am I right?) 
>>>>>>> Now -  to find a film (when only title is known) I have to search 
>>>>>>> in all that tables. But I have a problem with that.. How to do 
>>>>>>> that? SELECT * FROM REGEXP 'film_*' WHERE title='the film' does 
>>>>>>> not work. The same is when I want to list all tables (to see what 
>>>>>>> groups of films I have)  - if I try do join SHOW TABLES with 
>>>>>>> REGEXP I get an syntax error too. Where is the mistake and how to 
>>>>>>> correct it? Thanks in advance 
>>>>>>> Szymon Bobek 
>>>>>>> 
>>>>>>> 
>>>>>> Your structure is fine... the problem is your method of searching. 
>>>>>> I'm not really sure you can use REGEX in that context: 
>>>>> You REALLY think that structure is fine??? 
>>>>> 
>>>>> 
>>>>   What do you THINK is wrong with it? If you want to go for one big 
>>>> table then your structure could become: 
>>>> 
>>>> film_type set('horror','thriller','comedy','romance','action',...) 
>>>> 
>>>> and then a row could be: 
>>>> 
>>>> film_id film_title film_type 
>>>> 1 Romancing the Stone action, comedy, romance 
>>>> 2 GODZILLA action, sci-fi 
>>>> 3 Jurasic Park action, sci-fi 
>>>> 
>>>> and your search(s) becomes: 
>>>> 
>>>> SELECT * FROM film_table WHERE film_title LIKE '%Jurasic%' 
>>>> returns data on Jurasic Park 
>>>> 
>>>> or 
>>>> 
>>>> SELECT * FROM film_table WHERE FIND_IN_SET('sci-fi') 
>>>> returns data on GODZILLA and Jurasic Park 
>>>> 
>>>> ...does that look better. (an ENUM can contain only one item from a 
>>>> list, a SET can contain multiple item from a list) 
>>>> 
>>>> You would still want an index on the film_title. Not sure if you can 
>>>> index a SET column but you could try. 
>>>> 
>>>> Norm 
>>> If you read my post you would know that that was not what I was 
>>> suggesting. Have you heard of normalisation? 
>>> What you are suggesting is the sort of thing I expect to see from a 
>>> first year student! 
>>> 
>>> 
>>   First of all, my apologies to you, I thought I was replying to the 
>> original poster. Secondly, I seem to be the only one offering 
>> suggestions. Yes, read up on normalization, get your head spun around 
>> with a lot of things you may or may not understand (and I don't claim 
>> to). I would imagine the original poster would like to get his 
>> database up to speed.  Looking at your original post you seem to 
>> describe a schema that is the opposite of normalization. 
>> 
>> For your example: 
>> 
>>> ID     |    Film_Type 
>>> 225    |    Comedy 
>>> 225    |    Thriller. 
>>> This is called "normalisation". 
>>   ID is not unique and Film_type can (and will) be repeated. And 
>> actually both will be repeated quite often as alot of films fit 
>> multiple categories. 
>> 
>>   1NF suggests that ID should be unique (or at least there should be a 
>> unique identifier, and there is none) and Film_Type (and ID) should 
>> not repeat. 
> In this case the primay key would be (ID, Film_Type). 1NF does not require  
> that the unique identifier should consist of a single column of a table. 
>  
>> This schema in reality is no different than using a SET() 
>> for Film_Type 
> this schema is VERY DIFFERENT from using a SET(). 
>  
>> and I would imagine the later is more efficient since 
>> the SET() 
> Unless you want to look things up in an efficient way! 
>  
>> data is stored in the schema definition and only pointers 
>> are stored in the tables. 1NF suggests that for normalization you 
>> should have one table for each Film_Type thus ensuring unique ID's and 
>> non-repeating data. 
>> 
>> p.s. Is normalization a requirement? 
>> 
>> Norm 
>  
>  
>  
 
   It also appears that having an index on film_title using queries like: 
 
1) SELECT * FROM table WHERE film_title = '????'; = instant retrieval  
(.01 secs) 
2) SELECT * FROM table WHERE film_title LIKE '????'; = instant retrieval  
            (.03 secs) 
3) SELECT * FROM table WHERE film_title LIKE '%???'; = full table scan  
(my system ~28 seconds on 10 million rows) 
4) SELECT * FROM table WHERE film_title = '????%'; = instant retrieval  
(.01 secs) 
5) SELECT * FROM table WHERE film_title = '%??%'; = full table scan (my  
system ~28 seconds on 10 million rows) 
 
   What can be done to speed #3 and #5 up?  I would almost expect MySQL  
to look into the INDEX first but it doesn't seem to want to. It tried to  
USE/FORCE INDEX but no go if '%' was the first character in the LIKE  
clause. This is really what the original poster needs to know (in  
addition to normalization). 
 
Norm
 
  
Navigation:
[Reply to this message] 
 |