Seeing a real-time breakdown of web traffic by vhost
histo.awk
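# Creates a histogram of the values in the first column of piped-in data.

# Return the largest value in arr. big and i are extra parameters used as locals.
function max(arr,    big, i) {
    big = 0
    for (i in arr) {
        if (arr[i] > big) { big = arr[i] }
    }
    return big
}

# Count each non-empty line by its first field, and remember the first and
# last values of field 6 (the timestamp in this log format).
NF > 0 {
    cat[$1]++
    if (!start) { start = $6 }
    end = $6
}

END {
    printf "from %s to %s\n", start, end
    maxm = max(cat)
    for (i in cat) {
        # Scale the bars so that the largest count is 60 characters wide.
        scaled = 60 * cat[i] / maxm
        printf "%-25.25s [%8d]:", i, cat[i]
        # Use j here: reusing i would clobber the for-in loop variable.
        for (j = 0; j < scaled; j++) {
            printf "#"
        }
        printf "\n"
    }
}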
Which can be used like this:
watch 'tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3'
This gives a histogram of how often each vhost occurs in the last 100 lines of the Apache log, updating every 2 seconds (the watch default) and sorted with the most frequent vhosts at the top. (Note that this assumes you are using an Apache log format that has the vhost in the first column and the request timestamp in the sixth.) It looks something like this:
Every 2.0s: tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3    Thu Oct 1 09:51:41 2009

www.dogwoodinitiative.org [      49]:############################################################
www.wildliferecreation.or [      24]:##############################
www.earthministry.org     [      14]:##################
blogs.onenw.org           [       3]:####
www.tilth.org             [       2]:###
www.oeconline.org         [       2]:###
www.audubonportland.org   [       1]:##
oraction.org              [       1]:##
oeconline.org             [       1]:##
dogwoodinitiative.org     [       1]:##
bandon.onenw.org          [       1]:##
209.40.194.148            [       1]:##
from [01/Oct/2009:09:51:21 to [01/Oct/2009:09:48:40
(Another useful variant of this is to produce a histogram of requests by IP address, which can help you decide what to block during a DoS attack.)
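A rough sketch of that by-IP variant, written as a small shell function around an inline awk program rather than as a change to histo.awk. The function name field_histo, the col variable, and the sample lines are made up for illustration, and the field number that holds the client IP depends entirely on your LogFormat (2 is only a guess for a "vhost ip ..." layout).

```shell
#!/bin/sh
# Hypothetical helper: histogram of one whitespace-separated field from stdin.
# $1 is the field number to count by. Which field holds the client IP is an
# assumption you must check against your own log format.
field_histo() {
    awk -v col="$1" '
        # Count each line that actually has the requested field.
        NF >= col { count[$col]++ }
        END {
            # Find the largest count so the bars can be scaled to 60 columns.
            for (k in count) if (count[k] > top) top = count[k]
            for (k in count) {
                printf "%-25.25s [%8d]:", k, count[k]
                for (j = 0; j < 60 * count[k] / top; j++) printf "#"
                printf "\n"
            }
        }' | sort -nrk3
}

# Example run on three invented lines in an assumed "vhost ip ..." layout.
sample_output=$(printf '%s\n' \
    'a.example.org 10.0.0.1 - -' \
    'b.example.org 10.0.0.1 - -' \
    'a.example.org 10.0.0.2 - -' | field_histo 2)
printf '%s\n' "$sample_output"
```

On a real server the same function could be fed from the log, for example with tail -n 100 /var/log/apache2/access_log | field_histo 2 inside watch, once you have confirmed which column carries the address.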