Posted by John Nichel on 12/04/70 11:36
Jay Paulson (CE CEN) wrote:
> Jay Paulson (CE CEN) wrote:
>>Hello everyone! I've been given the responsibility of coding an Apache access_log parser. My task is to return the number of hits for certain file extensions on certain dates from specific IP addresses.
>>As of now I'm only going back 7 days in the log looking for this information, and I'm only looking for 5 file types (.doc, .pdf, .html, .php, and .flv). I'm using the fgets() function so I can read the file line by line, do the matches I need, and increment the counters as needed. Right now I have 3 loops looking for everything, which doesn't seem to me to be the best way of doing this. I've also found that a line may contain the file extension I want when that file is actually the source of another file (see the example below).
>>Log file example:
>>I want the first line but not the second. The second line is a request for a .css file that was used by the .html file, so the .html only shows up there as the referring page, and I don't want that line. I do want the first line, where the .html file is the only file mentioned.
>>10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /home.html HTTP/1.1" 200 8220 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
>>10.25.40.64 - - [01/Jan/2006:07:33:18 -0600] "GET /styles/redesign.css HTTP/1.1" 200 2381 "http://wfmu.wfm.pvt/home.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
>>At any rate, here's some of my pseudo code/code for what I'm trying to accomplish. I know there has to be a better way to do this and I'm looking for suggestions!
> Save yourself a ton of work. Dump the raw logs into a db, and you can
> do all the queries on the db. Something like this...
> I took your idea and did a search on Google and found that this has already been done for me! Check it out!
> Very cool :)
This is the script I wrote when we first started this project a few
months ago to parse the 2+ years of log files, and initially get them
into the db. If you want to use parts of it, feel free.
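What follows is not that script, only a minimal sketch of the line-parsing step discussed in this thread, under a few assumptions: the log is in Apache's "combined" format like the two sample lines quoted above, PHP 7.1 or later is available, and the function names parse_access_line() and count_hits() are made up for illustration. It looks at the requested path only, never the referer, which is what keeps the .css line from being counted as an .html hit.

```php
<?php
// Hypothetical helper (not from the original script): parse one line of an
// Apache "combined" format log. Returns ip, date (Y-m-d), path and ext,
// or null if the line does not look like a request line.
function parse_access_line(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) /';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    // Only the requested URL is inspected, so a referer such as
    // "http://.../home.html" on a .css request is ignored.
    $path = (string) parse_url($m[3], PHP_URL_PATH);
    $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    // Timestamp looks like 01/Jan/2006:07:33:18 -0600
    $when = DateTime::createFromFormat('d/M/Y:H:i:s O', $m[2]);
    if ($when === false) {
        return null;
    }
    return [
        'ip'   => $m[1],
        'date' => $when->format('Y-m-d'),
        'path' => $path,
        'ext'  => $ext,
    ];
}

// Hypothetical helper: count hits per "date ip ext" in a single pass.
// $wanted defaults to the five extensions named in the original question.
function count_hits(iterable $lines, array $wanted = ['doc', 'pdf', 'html', 'php', 'flv']): array
{
    $counts = [];
    foreach ($lines as $line) {
        $hit = parse_access_line($line);
        if ($hit === null || !in_array($hit['ext'], $wanted, true)) {
            continue;
        }
        $key = $hit['date'] . ' ' . $hit['ip'] . ' ' . $hit['ext'];
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    return $counts;
}
```

Fed the two sample lines from the original question, count_hits() should return one entry for the home.html request and skip the redesign.css request. The array returned by parse_access_line() could just as well be inserted into a db table as one row per line, and the 7-day window could be applied by comparing the 'date' field before counting. Any iterable of lines will do as input, for example an SplFileObject opened on the log file.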
John C. Nichel IV
Programmer/System Admin (ÜberGeek)
Dot Com Holdings of Buffalo