How to Read Your Log Files to Study Your Web Site Traffic
Not every visitor to your Web site is a human, and it’s the humans you want data on, not the robots. When you study your site’s log files, look for the following things to make sure your numbers are correct:
Search engine spiders. Search engines use programs, commonly called spiders or robots, that visit your site and read its pages so the search engine can analyze them. To find out whether your site was spidered, check whether the robots.txt file was requested.
When you recognize a spider, note its IP address and tell your analytics software to ignore hits from that address. Most good log analyzers run a reverse lookup on each IP address to identify spiders and ignore them for you.
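The spider check described above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes your server writes the Common Log Format, the function names find_spider_ips and human_hits are made up for this example, and the sample entries use reserved documentation IP addresses rather than real traffic.

```python
import re

# Matches the start of a Common Log Format line and captures
# the client IP, the request method, and the requested path.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')

def find_spider_ips(lines):
    """Return the set of client IPs that requested /robots.txt."""
    spiders = set()
    for line in lines:
        match = LOG_LINE.match(line)
        # Drop any query string before comparing the path.
        if match and match.group(3).split("?")[0] == "/robots.txt":
            spiders.add(match.group(1))
    return spiders

def human_hits(lines, spider_ips):
    """Keep only parseable log lines whose client IP is not a known spider."""
    kept = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1) not in spider_ips:
            kept.append(line)
    return kept

# Made-up sample entries (assumption: Common Log Format).
sample_log = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68',
    '192.0.2.10 - - [10/Oct/2023:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

spider_ips = find_spider_ips(sample_log)
filtered = human_hits(sample_log, spider_ips)
```

Here spider_ips holds the one address that asked for robots.txt, and filtered keeps only the hits from the other address. A request for robots.txt is a hint rather than proof, so a reverse lookup (for example with Python's socket.gethostbyaddr) may be worth running before you exclude an address for good.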
Masked IP addresses. Not every IP address represents an individual user. Corporations, universities, and even AOL users can appear to your server as a single IP address when in fact many people have visited your site. Watch for unusually high traffic from a single IP address; it may mean you have more visitors than your log file suggests.
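One rough way to watch for heavy traffic from a single address is to count hits per IP and flag the busiest ones. The sketch below assumes each log line begins with the client IP, as in the Common Log Format. The helper names and the threshold of 3 are arbitrary choices for illustration, and the sample entries are invented.

```python
from collections import Counter

def hits_per_ip(lines):
    """Count log entries per client IP (the first field of each line)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts

def heavy_ips(counts, threshold):
    """Return IPs with at least `threshold` hits, busiest first."""
    return [ip for ip, n in counts.most_common() if n >= threshold]

# Made-up sample: five hits from one address, one hit from another.
sample_log = [
    f'203.0.113.5 - - [10/Oct/2023:09:00:00 +0000] "GET /page{i}.html HTTP/1.1" 200 1024'
    for i in range(5)
] + [
    '198.51.100.7 - - [10/Oct/2023:09:01:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

counts = hits_per_ip(sample_log)
flagged = heavy_ips(counts, threshold=3)
```

A flagged address is not necessarily many people. It could also be one very active visitor or a spider, so treat the list as a starting point for a closer look.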
Cookies. Don’t expect accurate visitor counts from cookies. Many people set their browsers not to accept cookies. Cookies also can’t distinguish between multiple users of the same computer (such as a library or school computer). Standard log files, however, do not contain cookie information.
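Caching. When a browser or proxy server shows a visitor a saved copy of a page, the request never reaches your server, so it never appears in your log file. One way to solve this problem is to create a dynamic page, which is a page built on the fly from the database using scripts. You can also set your server to prevent caching if you have enough bandwidth.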
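As a rough sketch of both ideas, the Python example below builds a page on every request and sends response headers that ask browsers and proxies not to keep a copy. It is an assumption-laden illustration, not a recommended setup: a timestamp stands in for the database content, and the names app, serve, and NO_CACHE_HEADERS are invented for this example.

```python
from datetime import datetime, timezone
from wsgiref.simple_server import make_server

# Response headers that ask browsers and proxies not to reuse a stored copy.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-store, no-cache, must-revalidate"),
    ("Pragma", "no-cache"),
    ("Expires", "0"),
]

def app(environ, start_response):
    """WSGI application that builds the page fresh on every request."""
    # Stand-in for content pulled from a database (assumption for this sketch).
    now = datetime.now(timezone.utc).isoformat()
    body = f"<html><body><p>Generated at {now}</p></body></html>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ] + NO_CACHE_HEADERS
    start_response("200 OK", headers)
    return [body]

def serve(port=8000):
    """Run the app on localhost for local experimentation."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling serve() starts a local test server. Because every visit then reaches your server, each one shows up in the log, at the cost of the extra bandwidth the original advice warns about.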
Know your audience. Some sites track only users who log on from home or from work; those sites filter out users coming in from public terminals in libraries and schools. In general, this means they require a login or a persistent cookie, which public terminals are not likely to allow.
Analytics is not just about gathering data. It’s about knowing what you want from your Web site and then reading the pile of data you’ve acquired to see whether those goals are being reached and what you need to do differently to get a higher conversion rate.