Re: Efficient Uniqueness Check

Posted by Anonymous on 10/12/06 13:12

Alan Little wrote:
>
> Carved in mystic runes upon the very living rock, the last words of
> <petersprc@gmail.com> of comp.lang.php make plain:
>
> > Alan Little wrote:
> >> I have affiliates submitting batches of anywhere from 10 to several
> >> hundred orders. Each order in the batch must include an order ID,
> >> originated by the affiliate, which must be unique across all orders
> >> in all batches ever submitted by that affiliate. I'm trying to figure
> >> out the most efficient way to check the uniqueness of the order ID.
> >>
> >> The two ways I'm considering are: 1) a single query to load all the
> >> affiliate's order IDs into an array for checking as I process the
> >> batch; 2) creating a temporary table with just the affiliate's IDs,
> >> querying it for each order in the batch.
> >>
> >> Any ideas which would be more efficient? Would there be any
> >> significant difference either way?
> >
> > Perhaps this could work: If you keep the submissions in a set of
> > staging tables, then your unique key would work.
>
> Thanks, but that isn't really practical; all orders are maintained in the
> orders table, so inserting them into an intermediate table would just be
> another step. Using a single table, I would have to flag the records as
> temporary, as I added them, then remove the flag upon successful
> completion, or delete the flagged records upon failure. With an
> intermediate table, I wouldn't have the deletion problem, but I would
> still then have to copy them to the permanent table upon success, which
> (one would hope) would be the majority of cases.
>
> > Of the things you're considering, a single query would be more
> > scalable, but it's likely you won't notice much difference at all
> > until the volume goes up.
>
> That's kind of what I figured: even with on the order of tens of
> thousands of past records, it wouldn't make much difference either way.
> I just wondered about the overhead of having to loop over all those
> records from the single query and load them into an array, but I think
> that would still be the most efficient approach.
>
> --
> Alan Little
> Phorm PHP Form Processor
> http://www.phorm.com/


Nope, the most efficient way is to let the database do the checking.

I would solve your problem like this: Add a new column to the orders
table, of a character type, as long as you need (which is the max. length
of the affiliate ID + max. length of the order ID + 1), which will become
your new primary key.

You create your new primary keys by concatenating the affiliate ID with
the order ID, separated by a dash for better readability.
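
Rough sketch of the schema change (I'm assuming your table is called
"orders" and that 40 chars is wide enough; substitute your real name
and length):

    // Add the combined-key column. On an empty table you could declare
    // it PRIMARY KEY right away; with existing rows, populate it first
    // (see the end of this post).
    mysql_query("ALTER TABLE orders ADD combined_id VARCHAR(40) NOT NULL")
        or die(mysql_error());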

Example: You have three affiliates, "Jack", "Johnny" and "Jim", each
having three orders in the table. Let's assume that while Jack and Jim
use simple numbering for their IDs, Johnny uses some obscure letter code
for order IDs. Then you would have nine entries in the order table and
their primary keys would look similar to this:

"Jack-0001"
"Jack-0002"
"Jack-0003"
"Johnny-qwertz"
"Johnny-asdfgh"
"Johnny-yxcvbn"
"Jim-0001"
"Jim-0002"
"Jim-0003"

As you can see, every affiliate has his own 'namespace' within your new
primary key, because every affiliate has his own unique ID, so the
primary keys generated this way will never overlap between different
affiliates. However, since the part up to the dash is always constant
for any affiliate, it would lead to a primary key collision if any
affiliate screws up his order ID scheme and submits an ID he has
submitted before.

So you don't need to do any checking beforehand at all. Just insert all
the orders you receive into the orders table, creating each new key like
$ID = $AffiliateID . "-" . $OrderID; and storing it along with the rest
of the data. Only when an insert fails do you have to call mysql_errno()
(or mysql_error()) to find out whether the INSERT query failed because of
a duplicate key. But you are doing some proper error checking anyway,
right? At least I hope so.
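
In code, something along these lines (a rough sketch using the mysql_*
functions; "combined_id" and "amount" are made-up column names, and
1062 is MySQL's error number for a duplicate key):

    $key = mysql_real_escape_string($AffiliateID . "-" . $OrderID);
    $ok  = mysql_query("INSERT INTO orders (combined_id, amount)
                        VALUES ('$key', " . (float)$Amount . ")");
    if (!$ok) {
        if (mysql_errno() == 1062) {
            // duplicate key: this affiliate reused an order ID
            echo "Duplicate order ID from $AffiliateID: $OrderID\n";
        } else {
            die("INSERT failed: " . mysql_error());
        }
    }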

You don't need any arrays, temporary tables or other extras, and no
duplicate-checking code. The database does it all for you. And it needs
to check for duplicate keys anyway, so we are not even wasting one
single CPU cycle here. :-) About 2-3 lines added to your error checking,
to test whether an insert failed because of a duplicate key, would be
sufficient, plus the one new database column.


Further optimizing the above solution:

*) You don't even need to create a new variable for $ID; you can just
concatenate the strings when you put your query together. But you
probably thought of that yourself already.
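
Like this, for example (same made-up column names as above):

    // Build the key right inside the query string:
    $ok = mysql_query("INSERT INTO orders (combined_id, amount) VALUES ('"
        . mysql_real_escape_string($AffiliateID . "-" . $OrderID)
        . "', " . (float)$Amount . ")");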

*) After creating your new column, populating it with
"AffiliateID-OrderID" for all database entries that already exist, and
declaring it the primary key of the table, you can even delete the two
original columns. They are redundant now, because their combined
information is already contained within your new primary key, and it can
easily be extracted again if needed, or filtered on. If you want to see,
for example, only orders from Jim, you just filter for all IDs which
start with "Jim-". I don't see a reason to keep the old columns around,
except perhaps for comfort, because then we wouldn't have to extract the
information from the combined string when we need it; but that's not
really difficult at all.
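
A sketch of that one-time migration, again with made-up names (here the
old columns are "affiliate_id" and "order_id"):

    // Fill the new column from the two old ones.
    mysql_query("UPDATE orders
                 SET combined_id = CONCAT(affiliate_id, '-', order_id)")
        or die(mysql_error());
    // Now it can become the primary key.
    mysql_query("ALTER TABLE orders ADD PRIMARY KEY (combined_id)")
        or die(mysql_error());
    // Optional: drop the now-redundant columns.
    mysql_query("ALTER TABLE orders DROP COLUMN affiliate_id,
                                    DROP COLUMN order_id")
        or die(mysql_error());
    // Later, filter on the key prefix, e.g. only Jim's orders:
    $result = mysql_query("SELECT * FROM orders
                           WHERE combined_id LIKE 'Jim-%'");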


If I understood your problem correctly, this should be the optimal way
to do it. Or did I get your problem wrong?

Bye!

 
