You are here: Re: LEFT JOIN on very large tables « PHP Programming Language « IT news, forums, messages
Re: LEFT JOIN on very large tables

Posted by Andy Hassall on 05/27/05 01:18

On Thu, 26 May 2005 04:13:41 +0100, "StealthBananaT" <noone@nowhere.com> wrote:

>"Michael Vilain" <vilain@spamcop.net> wrote in message news:vilain-505DEC.09413125052005@comcast.dca.giganews.com...
>> In article <d71ov0$sc9$1@newsg3.svr.pol.co.uk>,
>> "StealthBananaT" <noone@nowhere.com> wrote:
>>> "Michael Vilain" <vilain@spamcop.net> wrote in message
>>> "news:vilain-2E9C3E.20060724052005@comcast.dca.giganews.com...
>>>> In article <d70ip2$ouj$1@news8.svr.pol.co.uk>,
>>>> "StealthBananaT" <noone@nowhere.com> wrote:
>>>>
>>>>> My database has two tables - films has 10,000 records and reviews has
>>>>> 20,000
>>>>> records. Whenever I try to list all the films and the
>>>>> count of its reviews, MySQL locks and I have to restart the service.
>>>>>
>>>>> Here's the SQL...
>>>>>
>>>>> SELECT films.*, COUNT(reviews.id)

Change that to films.id, count(reviews.id) and it make more sense given the
GROUP BY, although that's already been pointed out.

>>>>> FROM films LEFT JOIN reviews ON films.id = reviews.filmID
>>>>> GROUP BY films.id
>>>>> LIMIT 0, 100;
>>>>>
>>>>> JOIN works but doesn't return records for films with no reviews. Please
>>>>> help!
>>>>
>>>> Are films.id and reviews.filmID indexes?
>>>>
>>>
>>> Both tables have id as their index. filmID is a foreign key so that I know
>>> which film the review is of. Here's the SQL that
>>> created both tables...
>>>
>>> CREATE TABLE `films` (
>>> `id` int(10) unsigned NOT NULL auto_increment,
>>> `title` varchar(100) NOT NULL default '',
>>> PRIMARY KEY (`id`)
>>> )
>>>
>>> CREATE TABLE `reviews` (
>>> `id` int(10) unsigned NOT NULL auto_increment,
>>> `filmID` int(10) unsigned NOT NULL default '0',
>>> `author` varchar(20) NOT NULL default '',
>>> `date` int(10) unsigned NOT NULL default '0',
>>> `region` tinyint(1) unsigned NOT NULL default '0',
>>> `review` longtext NOT NULL,
>>> `rating` tinyint(1) unsigned NOT NULL default '0',
>>> PRIMARY KEY (`id`)
>>> )
>>
>> In order to not do a linear table scan, you must create indices on all
>> fields that require random access or are part of a join. Create an
>> index for the reviews.filmID field.
>>
>
>I'd like to thank everybody for their help, but the problem remains. I've tried making reviews.filmID an index but that seemed to
>have no effect.
>
>I have a third table called users, and when I try to count the number of reviews submitted by users I have exactly the same
>problem...
>
>SELECT users.name, COUNT(reviews.id)
> FROM users LEFT JOIN reviews ON users.name = reviews.author
> GROUP BY users.name;
>
>I'm guessing that the problem is linked to the sheer volume of the data in the tables.

Tables with 10k and 20k rows are not large in terms of rows - but the thing
that catches my attention is the "longtext" field in review.

This could make the table physically quite large; if MySQL is scanning the
table even once, then that could take some time.

You might consider moving the `review` field out to a separate table, with
primary key the same as reviews. That way you can run your queries on the
review metadata without it having to work through the full text of the review.

mysql> select count(*) from films;
+----------+
| count(*) |
+----------+
| 10000 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from reviews;
+----------+
| count(*) |
+----------+
| 20000 |
+----------+
1 row in set (0.00 sec)

mysql> select films.id, count(*)
-> from films
-> left outer join reviews
-> on (films.id = reviews.filmID)
-> group by films.id
-> limit 0, 100;

+-----+----------+
| id | count(*) |
+-----+----------+
| 1 | 2 |
[ snipped ]
| 99 | 1 |
| 100 | 3 |
+-----+----------+
100 rows in set (0.02 sec)

That's on an ancient Pentium 200 whilst it's compiling something in the
background, with the tables filled with random-ish data, an index on filmID,
but the review field containing only the string 'a review'.

But load up the reviews table with 250 words in each 'review' field and things
take a turn for the worse; left it running for 15 minutes with no result.

--
Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool

 

Navigation:

[Reply to this message]


Удаленная работа для программистов  •  Как заработать на Google AdSense  •  England, UK  •  статьи на английском  •  PHP MySQL CMS Apache Oscommerce  •  Online Business Knowledge Base  •  DVD MP3 AVI MP4 players codecs conversion help
Home  •  Search  •  Site Map  •  Set as Homepage  •  Add to Favourites

Copyright © 2005-2006 Powered by Custom PHP Programming

Сайт изготовлен в Студии Валентина Петручека
изготовление и поддержка веб-сайтов, разработка программного обеспечения, поисковая оптимизация