It is pretty common, when working with a log file, to zero in on a single column and do some sort of aggregation. This happened when I was checking my HTTP access logs (I had not installed awstats yet). To get the frequency of hits per host, I did the following:
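A minimal sketch of that kind of pipeline, assuming an Apache-style access log in which the client IP is the first space-separated field. The function name `host_hits` and the file name `access.log` are placeholders of mine, not taken from the original setup.

```shell
# Count hits per client IP, most frequent first.
# Assumes the IP is the first space-separated field of each log line.
host_hits() {
    cut -d' ' -f1 | sort | uniq -c | sort -rn
}

# Usage (access.log is a placeholder path):
#   host_hits < access.log
```

Wrapping the pipeline in a function is only for convenience; the body is the one-liner itself.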
Assuming your log format puts the request IP first, we cut out the first column, sort it, and have the uniq command count the duplicates; the final sort then orders the counts numerically, in descending order of hits. The uneventful output is one line per IP: the hit count followed by the address.
With this output, I got curious about the list of IPs and started firing ‘nslookup’ at them to find out the actual domains behind them. Instead of typing individual ‘nslookup’ commands, it would be great if the output listed each domain name next to its hit count.
This started the quest to develop a one-liner whose output lists domain names, instead of IPs, with their frequencies.
Using the host hits and IPs as the starting point, I came up with this one-liner on my Mac.
This command saves the host hits in a “count” variable and then searches for the “name” field in the nslookup output. It worked fine on Mac but not on Ubuntu, because there the order of the host hits and the nslookup output was reversed. To fix this, I added an explicit action block for the host hits and also gave it a column name. The improved one-liner is as follows:
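A sketch of this kind of one-liner (not necessarily the exact command), assuming input lines of the form `count ip` and an nslookup that prints reverse lookups as `... name = host.example.`. Here the lookup is read back inside awk with `getline`, so the count and the name are printed together and their order does not depend on output buffering. The `hits:` label and the function name `resolve_hits` are illustrative.

```shell
# Read "count ip" lines on stdin and print "hits: count name".
# Assumes nslookup prints reverse lookups as "... name = host.example."
# and that the second field is a plain IP address.
resolve_hits() {
    awk '{
        count = $1
        name = $2                          # keep the IP if nothing resolves
        cmd = "nslookup " $2
        while ((cmd | getline line) > 0) {
            if (line ~ /name = /) {
                n = split(line, parts, " ")
                name = parts[n]            # last field is the domain name
                sub(/\.$/, "", name)       # drop the trailing dot
            }
        }
        close(cmd)
        print "hits: " count, name
    }'
}

# Usage (access.log is a placeholder path):
#   cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | resolve_hits
```

It is spread over several lines for readability; the awk program can be collapsed onto one line by separating the statements with semicolons.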
Now let’s merge the above one-liner with the one that produces the IPs, and this gets us a final version.
But wait, there’s more. As I read more about nslookup, I found that it has been deprecated in favor of ‘host’ and ‘dig’. So finally, the one-liner using ‘host’ looks like this:
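A sketch of what a ‘host’-based version might look like, end to end. It assumes that `host <ip>` prints reverse lookups as `... domain name pointer host.example.` and that the log has the client IP in the first field; `access.log` and the function name `domain_hits` are placeholders.

```shell
# Read an access log on stdin and print "count domain" per client,
# most frequent first. Assumes `host <ip>` prints a line of the form
# "<reversed-ip>.in-addr.arpa domain name pointer host.example."
domain_hits() {
    cut -d' ' -f1 | sort | uniq -c | sort -rn |
    awk '{
        name = $2                          # keep the IP if the lookup fails
        cmd = "host " $2
        while ((cmd | getline line) > 0) {
            if (line ~ /domain name pointer/) {
                n = split(line, parts, " ")
                name = parts[n]            # last field is the domain name
                sub(/\.$/, "", name)       # drop the trailing dot
            }
        }
        close(cmd)
        print $1, name
    }'
}

# Usage (access.log is a placeholder path):
#   domain_hits < access.log
```

IPs without a reverse record are printed unchanged, so no hits are dropped from the list.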
The new output looks like this:
If you have any more thoughts on how this one-liner can be improved, drop in a comment.