Posted by Sandman on 11/24/06 09:40
In article <4566b547$0$321$e4fe514c@news.xs4all.nl>,
Erwin Moller
<since_humans_read_this_I_am_spammed_too_much@spamyourself.com>
wrote:
> >> First: Prefetching all members into memory for each page where you need
> >> one or two members is absolutely a no-go. You must change that of course.
> >> Just fetch the information you need WHEN you need it.
> >
> > Yes, but if I need to fetch information for 100 members on one page,
> > making 100 individual SQL queries is also a no-go.
>
> Hi Sandman,
>
> As Jerry and Micha indicated: Do not fetch all members into PHP memory each
> page, just fetch the ones you need.
Right, but how?
> I made a few community sites that had exactly the same problems as you are
> facing now, and good design solves everything.
As usual :)
> (Luckily I designed it right from the start, in your case you must change
> existing code which is always a lot more trouble.)
>
> A few pointers that might speed things up:
> 1) At the start of your page, decide which articles or users you need on
> that page.
That's impossible to know beforehand. Articles listed may or may not
contain references to members. Any given page might contain references
to member ids from 30 different sources in 30 different tables. I
can't know, at the beginning of any page, exactly what member
information will be displayed on that page.
> 2) Make a query that only fetches that information.
Which is fine if you're fetching information for 1-50 members, but if
you've got member information for 500 members displayed on one page, that
select would be dog slow.
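For the 1-50 case I imagine a single "in" query would do, something along
these lines (just a sketch; the column names, $member_ids and $db are made
up for illustration):

// Fetch only the members actually referenced on the page, in one query.
// $member_ids is an array of ids collected while building the page and
// $db an open mysql connection - both hypothetical. (An empty id list
// would need a guard before building the query.)
$ids = implode(",", array_map("intval", $member_ids));
$res = mysql_query("select id, name, email from member.member
                    where id in ($ids)", $db);
$members = array();
while ($row = mysql_fetch_assoc($res)) {
    $members[$row["id"]] = $row; // key by id for quick lookup later
}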
> 3) Do NOT store the array (or resultset, or whatever approach you have to
> represent data from the database) in a session.
As in $_SESSION? No, no way. :) I don't even use sessions. :)
> 4) Design your database smart. I mean by this: Design your database in such
> a way that the queries you typically need run fast.
They do, they run very fast. The query runs in 0.16 seconds, the
action takes 0.8 seconds (and I'm assuming the extra time is moving
the resultset to the array).
> Did you normalize your database? (In case you don't know what that is: One
> important thing about normalization is the fact that you represent each
> piece of data only in one place (= table/column) and make the relations by
> means of primary and foreign keys.)
Yes.
> The more joins, the slower the queries.
No joins in this select. It's a very simple select and looks pretty
much like this:
select field, field, field from member.member where status = 'ok' and
origin = 'LVL';
"LVL" being the code for the particular site. "origin" is indexed. The
above query, when fetching 9000 posts, takes 0.16 seconds. That I can
live with it. It's moving that information to a PHP array that seems
to take too long.
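For anyone wanting to reproduce that split, something like this quick
sketch separates query time from array-building time ($db here is just an
open mysql connection):

// Rough timing of the two steps.
$t0  = microtime(true);
$res = mysql_query("select field, field, field from member.member
                    where status = 'ok' and origin = 'LVL'", $db);
$t1  = microtime(true);              // query finished
$members = array();
while ($row = mysql_fetch_assoc($res)) {
    $members[] = $row;
}
$t2  = microtime(true);              // array built
printf("query: %.2fs  array: %.2fs\n", $t1 - $t0, $t2 - $t1);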
> So it might help to duplicate some data (denormalization) in some tables.
> This can help a lot, but use it cautiously, since a well-designed database
> stays as close to 100% normalized as possible.
Yes, the data I want is all in separate "field, field, field" columns.
> 5) Indexing a few columns might also give a huge speed advantage.
In the query above, the only columns that need to be indexed are status
and origin, and both are.
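Though writing that out, I suppose a combined index on the pair might
behave differently from two separate single-column indexes, since MySQL
typically uses only one index per table for a query. Something like this,
if it isn't there already (index name made up):

create index origin_status on member.member (origin, status);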
> If you are familiar with profilers, use one. MySQL, Postgres, all databases
> have some tools that tell you WHAT they are doing and how much time it
> takes.
Yeah, I know exactly how much time they use for these queries. And
it's too long :-D
> If you find out that a certain column/table scan takes up 50% or so of the
> total query time, then you know that is a first-class candidate for
> indexing.
Exactly. This is all good advice, but I'm afraid that my main
problem is the time it takes to create the PHP array.
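If I were to go the fetch-on-demand route instead, I imagine it would look
roughly like this memoized lookup (just a sketch; member_info() and $db
are made up):

// Fetch a member on demand and cache it for the rest of the request,
// instead of prefetching all 9000 rows up front.
function member_info($id, $db) {
    static $cache = array();
    if (!isset($cache[$id])) {
        $res = mysql_query("select field, field, field from member.member
                            where id = " . intval($id), $db);
        $cache[$id] = mysql_fetch_assoc($res);
    }
    return $cache[$id];
}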
> 6) Prepared statements can also give some speed increase.
Yeah, I've never actually used them, and wouldn't you agree that they
wouldn't help me here?
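For reference, I gather a prepared version with mysqli would look roughly
like this (untested sketch, "field1"/"field2" standing in for the real
columns), and as far as I understand it the gain mostly shows up when the
same statement runs many times, not for one big select:

// Untested sketch of a prepared statement via mysqli.
// $mysqli is assumed to be an open mysqli connection.
$stmt = $mysqli->prepare("select field1, field2 from member.member
                          where status = ? and origin = ?");
$status = "ok";
$origin = "LVL";
$stmt->bind_param("ss", $status, $origin);
$stmt->execute();
$stmt->bind_result($field1, $field2);
while ($stmt->fetch()) {
    // use $field1 / $field2 here
}
$stmt->close();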
> 7) Prepared data in the database (Last resort)
> Do not use this method unless the others do not help enough.
> I don't like this method, but was forced to use it myself on a few
> occasions.
> I'll describe a case I had and how I solved it.
> It is a huge discussion platform, and all participants had the possibility
> to vote on articles of others. The weight their vote carried varied with a
> lot of factors and the logic was also totally recursive, meaning that if one
> visitor did one thing (e.g. read an article, or made a vote), all others had
> to be updated.
> So real-time calculation of the whole system was not an option for me.
> I decided to create a table that contained the calculated data, use that
> data whenever I need it, and run an update every hour or so.
Yes, I do a lot of aggregation at five-minute intervals (labels, scores
and such). But that's not part of fetching this information.
> This method is not considered 'good programming' (at least by me), but gets
> the job done, and in some situations you have no choice.
Indeed. I have a "What's new?" function that lists new content on the
site, from several tables (articles, calendar, project rooms, forum,
and so on). That data has to be aggregated into a separate table for
speed.
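A simplified sketch of that kind of aggregation (table and column names
made up, not my real schema):

-- Rebuild the "What's new?" table; the real job adds a union per content
-- table and runs from cron.
delete from whats_new;
insert into whats_new (source, source_id, title, created)
select 'article', id, title, created from articles
union all
select 'forum', id, subject, created from forum_posts;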
Thanks for the post, very good advice, but I'm afraid that it won't
help me much.
--
Sandman[.net]