|
Posted by "Michael Sims" on 08/11/05 17:42
Jay Blanchard wrote:
> [snip]
> Assuming unix, I'd do the following from the root of the application
> to get a list
> of files that contain queries:
>
> $ egrep "=[:space:]*\".*\b(SELECT|INSERT|UPDATE)\b" * -ril
> ...
>
> Anyway, that's how I'd do it. Hope you got something out of this...
> :) [/snip]
>
> That is a good start, now all I need to do is get the whole query(s)
I guess I misunderstood your goal. You said before that you needed to LOCATE the
queries, which the above will do (if certain assumptions are true). What exactly
are you wanting to accomplish? Are you trying to write a script that will extract
the entire query from a set of PHP files?
If so, and given this:
$variableName = "INSERT INTO bar (foo) ";
$variableName .= "VALUES ('".$foo."') ";
Would you expect:
INSERT INTO bar (foo) VALUES ('')
or something else?
At any rate, if you are trying to automatically extract entire queries, I'd say
that's pretty tricky, unless your code uses very strict idioms for defining queries
that it doesn't stray from. For example, if the code always do this:
$query = "select .....";
$query .= "from .....";
$query .= "where ....";
That's one thing, but if it sometimes does this:
$query = "select ....".
"from .....".
"where .....";
Then that's something entirely different.
Assuming the first, you could perhaps use a two-pass approach where you first go
through a file and find the beginning of a query assignment (using a regex like the
one above), then extract the variable name and line number. Then on a second pass
start from each line number you saved and process line number + N lines (where N is
the largest number of lines the code normally takes to define a query, arbitrarily
chosen) and look for lines where the variable name is followed by an assignment
operator (= or .=). Then use a regex on those lines to extract everything between
quotes (hopefully just double quotes, unless the code tends to switch between double
and single). But then you run into problems when your N is too large and the end of
one chunk overlaps into the beginning of the next. I would imagine something like
this would take a lot of time to get right, and even then only be an 80% solution,
if that.
It seems to get this 100% right you'd have to have a full-fledged PHP parser, which
means hacking the parser that PHP itself uses. Even hacking the PHP parser probably
wouldn't get you a 100% solution because the compile stage won't be enough....there
is going to be code that may or may not be evaluated at runtime and I think it would
be impossible to know without actually running it.
So...if you absolutely have to have this done programmatically (as opposed to just
locating the beginning of the queries and manually extracting them) then I would say
good luck to you. :) Hopefully someone else has some ideas...
[Back to original message]
|