Posted by John Nichel on 12/28/66 11:36

Jay Paulson (CE CEN) wrote:
> Hello everyone! I've been given the responsibility of coding an Apache access_log parser. My task is to return the number of hits for certain file extensions on certain dates from specific IP addresses.
> As of now I'm only going back 7 days in the log looking for this information, and I'm only looking for 5 file types (.doc, .pdf, .html, .php, and .flv). I'm using the fgets() function so I can read the file line by line, do the matches I need, and increment the counters as needed. Right now I have 3 loops looking for everything, which doesn't seem to be the best way of doing this. I've also found that a line may contain the file extension I want, but only as the source (referer) of a request for another file. (See below for an example.)
> Log file example:
> I want the first line but not the second. The second line is a request for a .css file that was used by the .html file, so I don't want it. I do want the first line, which contains only .html and no other files.
> - - [01/Jan/2006:07:33:18 -0600] "GET /home.html HTTP/1.1" 200 8220 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
> - - [01/Jan/2006:07:33:18 -0600] "GET /styles/redesign.css HTTP/1.1" 200 2381 "http://wfmu.wfm.pvt/home.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
> At any rate, here's some of my pseudo code/code for what I'm trying to accomplish. I know there has to be a better way to do this and I'm looking for suggestions!

Save yourself a ton of work. Dump the raw logs into a db, and you can
do all the queries on the db. Something like this...

CREATE TABLE `rawLogs` (
  -- IPv4 address as an integer (assumed to come from INET_ATON());
  -- unsigned so addresses at or above 128.0.0.0 fit
  `ipAddress` int(15) unsigned NOT NULL default '0',
  `rfcIdentity` varchar(32) NOT NULL default '',
  `apacheUser` varchar(32) NOT NULL default '',
  -- request time, assumed to be a Unix timestamp
  `date` int(15) NOT NULL default '0',
  `request` longtext NOT NULL,
  `statusCode` varchar(32) NOT NULL default '',
  `sizeBytes` int(11) NOT NULL default '0',
  `referer` longtext NOT NULL,
  `userAgent` longtext NOT NULL,
  KEY `ipAddress` (`ipAddress`),
  FULLTEXT KEY `search` (`request`,`referer`,`userAgent`)
);
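
To make the idea concrete, here is a rough sketch of the load-then-query approach. It is written in Python with an in-memory SQLite table rather than PHP and MySQL, purely so it runs on its own; it is not the schema above. The `hits` table, the `parse_line`, `load`, and `count_hits` names, and the 192.0.2.10 address are all made up for illustration (the example lines in the question have the IP stripped). It assumes the standard Apache "combined" log format and English month names. The extension is taken from the requested path only, so a .css request whose referer is home.html is not counted as an .html hit.

```python
import re
import sqlite3
from datetime import datetime

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
WANTED = ("doc", "pdf", "html", "php", "flv")


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    row = m.groupdict()
    # The request looks like: GET /home.html HTTP/1.1
    parts = row["request"].split()
    path = parts[1].split("?", 1)[0] if len(parts) >= 2 else ""
    name = path.rsplit("/", 1)[-1]
    # Extension of the requested file only; the referer is ignored here.
    row["ext"] = name.rsplit(".", 1)[1].lower() if "." in name else ""
    when = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    row["day"] = when.date().isoformat()
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def load(lines, conn):
    """Parse each line and insert the ones that match into the hits table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (host TEXT, day TEXT, ext TEXT, "
        "request TEXT, status TEXT, size INTEGER, referer TEXT, agent TEXT)"
    )
    rows = [r for r in map(parse_line, lines) if r is not None]
    conn.executemany(
        "INSERT INTO hits VALUES "
        "(:host, :day, :ext, :request, :status, :size, :referer, :agent)",
        rows,
    )
    conn.commit()


def count_hits(conn, exts=WANTED):
    """Hits per (host, day, extension) for the wanted extensions."""
    marks = ",".join("?" * len(exts))
    sql = (
        "SELECT host, day, ext, COUNT(*) FROM hits "
        f"WHERE ext IN ({marks}) "
        "GROUP BY host, day, ext ORDER BY host, day, ext"
    )
    return conn.execute(sql, tuple(exts)).fetchall()


# The two lines from the question, with a placeholder IP added in front.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2006:07:33:18 -0600] "GET /home.html HTTP/1.1" '
    '200 8220 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"',
    '192.0.2.10 - - [01/Jan/2006:07:33:18 -0600] "GET /styles/redesign.css '
    'HTTP/1.1" 200 2381 "http://wfmu.wfm.pvt/home.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"',
]

conn = sqlite3.connect(":memory:")
load(SAMPLE, conn)
print(count_hits(conn))
```

Once the rows are in a table, the three loops collapse into one GROUP BY, and limiting the report to the last 7 days or to specific IPs is just another condition in the WHERE clause.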

John C. Nichel IV
Programmer/System Admin (ÜberGeek)
Dot Com Holdings of Buffalo



