I needed some simple statistics (selected page visits per day) from web server logs. I looked at web log analyzer packages like AWStats, but they seemed like overkill for my case: I'd probably spend more time trying to make one work than putting together a small script. So here it is, a simple bash script that walks through all available access logs (by default, nginx on Debian uses logrotate to rotate logs daily and keeps 52 daily logs; older logs are gzipped) and counts page visits for a given request pattern:
#!/bin/bash
BASE_FILE=/var/log/nginx/access.log
OUTPUT=/tmp/pdf-checker-tmp/stats.txt
echo -e "DATE\tVOLUME" > "$OUTPUT"
COUNT=52
for ((i=1; i<=COUNT; ++i))
do
    FILE=$BASE_FILE.$i
    LISTER=""   # reset, otherwise a missing log file would re-count the previous iteration's log
    if [ -f "$FILE" ] ; then
        #echo "Uncompressed file $FILE"
        LISTER="cat $FILE"
    elif [ -f "$FILE.gz" ] ; then
        #echo "Compressed file $FILE.gz"
        LISTER="gunzip -c $FILE.gz"
    fi
    if [ -n "$LISTER" ] ; then
        # date of the first request in this log
        DATE=$($LISTER | head -1 | grep -oP "\d{1,2}/\w{3}/\d{4}")
        #echo "From date $DATE"
        # the request line is the second "-quoted field of the combined log format
        VOL=$($LISTER | awk -F\" '($2 ~ "POST /upload "){print $2}' | wc -l)
        echo -e "$DATE\t$VOL" >> "$OUTPUT"
    fi
done
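To see why the awk invocation works, note that splitting on double quotes (-F\") makes the request line the second field of a combined-format log entry. A quick check with a made-up log line (the line below is only for illustration, not from a real log):

echo '1.2.3.4 - - [12/Nov/2014:10:15:01 +0000] "POST /upload HTTP/1.1" 200 512 "-" "curl/7.38"' \
    | awk -F\" '($2 ~ "POST /upload "){print $2}'
# prints: POST /upload HTTP/1.1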
This script illustrates the power of shell programming. It is scheduled in cron to run daily after the log rotation job, and the output file is stored in the web server's static files directory so the statistics can be fetched directly.
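For completeness, a possible crontab entry; the run time and the script path are my assumptions, adjust both to your setup (on Debian the daily logrotate run is triggered from /etc/cron.daily, by default at 6:25):

# /etc/crontab fragment: run the stats script shortly after the daily logrotate
# (time and script path are assumptions; adjust to your installation)
30 7    * * *   root    /usr/local/bin/nginx-page-stats.sh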
A sample of the script's output:
DATE            VOLUME
12/Nov/2014     62
11/Nov/2014     55
10/Nov/2014     62
09/Nov/2014     0
08/Nov/2014     0
07/Nov/2014     60
06/Nov/2014     70