Inconsistent Web Analytics Numbers: Google vs. The World

Over the past 11 years, iMarc has used a number of web analytics tools. Whether FunnelWeb, Webalizer, Urchin, Mint, or Google Analytics, the goal is always to understand how people use the web and make optimizations based on that usage.

Recently, we've been recommending Google Analytics. Of course, Google Analytics has its limitations and problems; most notably, it requires the end user to have JavaScript enabled and to accept cookies. That said, Google's ease of use, especially when compared to other reporting software, made it our choice for most clients.

Once we started moving clients' sites from Webalizer and Urchin to Google Analytics, we were amazed at the discrepancies in traffic numbers. Google's numbers were much lower: sometimes half, 1/5th, even 1/10th the traffic that Urchin was reporting. Luckily, though the numbers were inconsistent between software packages, traffic trends were almost identical. Moral of that story: don't change reporting tools.

However, once committed to switching from a log-based analyzer like Urchin to Google Analytics, we were determined to learn more about what was causing this discrepancy.

We looked at one of our websites, picked a random day, and compared the results from Google Analytics to Urchin/Webalizer (Urchin and Webalizer are different programs, but their reporting numbers are almost identical). Since we have access to the raw Apache logfile, we looked at that as well.

As you might expect from a log-based analyzer, Urchin's results track the server's logfile closely, but Google Analytics is by far the best gauge of true, meaningful traffic. Google Analytics' numbers may be dramatically lower, but they are much more meaningful.

Analyzing 1 Day's Traffic

Here we see how each tool reports the same day's web traffic.

Website Visitors (or Sessions)

  Raw Logfile          814
  Urchin             1,036
  Google Analytics     379

With the raw logfile, I pulled all unique IP addresses. A number of these IPs came back multiple times throughout the day, presumably causing Urchin's number to be higher.

Pageviews

  Raw Logfile        9,579
  Urchin            10,718
  Google Analytics   1,672

For the logfile number, I added all requests for ".php" pages. Urchin includes .xml, .pdf, .swf, and .txt files in its pageview reports, causing the number to be higher.

Single Page Requests (search.php)

  Raw Logfile        2,441
  Urchin             2,440
  Google Analytics      30

Here, I filtered out all pageviews except the site's search page, /search.php. Both these single-page results and the previous results for all pageviews show huge discrepancies. The details follow.
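The three logfile counts above can be reproduced with a few lines of scripting. What follows is a minimal sketch rather than the script used for this post: it assumes the logfile is in Apache's "combined" format and already covers a single day, and the sample lines, IP addresses, and the `summarize` helper are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed layout: Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical sample entries (documentation-range IPs, not real traffic).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Dec/2008:10:00:00 -0500] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U)"',
    '203.0.113.5 - - [10/Dec/2008:10:01:30 -0500] "GET /search.php?q=analytics HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows; U)"',
    '198.51.100.7 - - [10/Dec/2008:10:02:00 -0500] "GET /search.php?q=%3Cscript HTTP/1.1" 200 2048 "-" "-"',
    '192.0.2.44 - - [10/Dec/2008:10:03:00 -0500] "GET /feed.xml HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.44 - - [10/Dec/2008:10:04:00 -0500] "GET /search.php?q=web HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]


def summarize(lines):
    """Count unique IPs, .php requests, and /search.php requests."""
    ips = set()
    php_pageviews = 0
    search_requests = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        ips.add(match.group("ip"))
        path = urlsplit(match.group("url")).path  # drop the query string
        if path.endswith(".php"):
            php_pageviews += 1
        if path == "/search.php":
            search_requests += 1
    return {"visitors": len(ips), "pageviews": php_pageviews, "search": search_requests}


print(summarize(SAMPLE_LOG))  # {'visitors': 3, 'pageviews': 4, 'search': 3}
```

Counting unique IPs treats everyone behind one address as a single visitor and counts every robot as a visitor too, which is part of why none of the three tools agree.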

This last comparison is the most telling. Both the raw logfile and Urchin report about 2,440 requests for "search.php". Why is Google Analytics only reporting 30 requests for the same page on the same day? Google seems to be under-reporting 2,411 requests.

Looking at the server log, we find exactly 2,411 requests from browsers (or User Agents) that we probably don't care about. Google Analytics leaves all of these out of its reports, most likely because none of these clients run the JavaScript tracking code that it depends on:

  • 2,317 of the requests were from the user agent "Mozilla/5.0 (compatible; Googlebot/2.1)". This is Google, spidering our page. (On a side note, this seems like an insane number of requests for one page on one day... I guess that's another issue I could look into.)
  • 47 requests came from a user agent that doesn't identify itself. All these requests came from 3 IP addresses, all resolving to the same domain, clients.your-server.de. This person (or script) probably has JavaScript turned off. I'm actually glad that Google Analytics is filtering these requests out, as they're obviously not a user we care about. All this user's requests are searches for "<a" or "<script", most likely a script looking for vulnerabilities.
  • 44 requests came from "Twiceler-0.9 http://www.cuil.com/twiceler/robot.html". Google Analytics is filtering out requests from the new search engine, Cuil.com.
  • 1 request from Yahoo/Slurp's robot
  • 1 request from user agent, "Java/1.6.0_04"
  • 1 request from "FeedHub MetaDataFetcher/1.0 (http://www.feedhub.com)"
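A breakdown like the one above comes from grouping one page's requests by user agent. Here is a rough sketch of that grouping, again assuming Apache's "combined" log format, where the request, referer, and user agent are the three quoted fields on each line. The sample lines and the `agents_for_path` helper are illustrative, not taken from our server.

```python
import re
from collections import Counter

# In the assumed "combined" format the quoted fields are, in order:
# the request line, the referer, and the user agent.
QUOTED = re.compile(r'"([^"]*)"')

# Hypothetical sample entries.
SAMPLE_LOG = [
    '192.0.2.44 - - [10/Dec/2008:09:00:00 -0500] "GET /search.php?q=a HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.44 - - [10/Dec/2008:09:00:05 -0500] "GET /search.php?q=b HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.7 - - [10/Dec/2008:09:01:00 -0500] "GET /search.php?q=%3Ca HTTP/1.1" 200 2048 "-" "-"',
    '203.0.113.5 - - [10/Dec/2008:09:02:00 -0500] "GET /search.php?q=x HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows; U)"',
    '192.0.2.44 - - [10/Dec/2008:09:03:00 -0500] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]


def agents_for_path(lines, path):
    """Tally user agents for requests to a single path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        fields = QUOTED.findall(line)
        if len(fields) != 3:
            continue  # not a combined-format entry
        request, _referer, agent = fields
        parts = request.split()
        if len(parts) < 2:
            continue  # malformed request line
        if parts[1].split("?", 1)[0] == path:
            counts[agent or "-"] += 1
    return counts


for agent, hits in agents_for_path(SAMPLE_LOG, "/search.php").most_common():
    print(hits, agent)
```

Sorting by count puts the heaviest user agent at the top of the list, which is how a crawler like Googlebot stands out immediately.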

So Google's report of 30 requests ends up being much more meaningful than the other log analyzers' report of 2,440 requests. Everything that Google Analytics filters out is either:

  • Google's own search engine spider
  • Other search engine spiders
  • Scripts / feed fetchers
  • People up to no good.
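As a sanity check, the per-agent counts from the server log can be rolled up into these four groups and compared against the gap between the raw logfile (2,441) and Google Analytics (30). The counts below are the ones reported in this post; the substring markers in `categorize` are only an assumption that happens to fit this one day's agents, not a general bot detector.

```python
# Counts reported above for /search.php on the sampled day.
FILTERED_REQUESTS = {
    "Mozilla/5.0 (compatible; Googlebot/2.1)": 2317,
    "(no user agent)": 47,
    "Twiceler-0.9 http://www.cuil.com/twiceler/robot.html": 44,
    "Yahoo/Slurp": 1,
    "Java/1.6.0_04": 1,
    "FeedHub MetaDataFetcher/1.0 (http://www.feedhub.com)": 1,
}
RAW_LOGFILE = 2441
GOOGLE_ANALYTICS = 30


def categorize(agent):
    """Map a user agent to one of the four groups (markers are assumptions)."""
    lowered = agent.lower()
    if "googlebot" in lowered:
        return "Google's own spider"
    if "twiceler" in lowered or "slurp" in lowered:
        return "Other search engine spiders"
    if "java/" in lowered or "feedhub" in lowered:
        return "Scripts and feed fetchers"
    return "Unidentified clients"  # the probing requests with no user agent


def totals_by_category(requests):
    """Sum request counts per category."""
    totals = {}
    for agent, count in requests.items():
        category = categorize(agent)
        totals[category] = totals.get(category, 0) + count
    return totals


totals = totals_by_category(FILTERED_REQUESTS)
for category, count in totals.items():
    print(f"{count:>5}  {category}")

# The filtered requests account for the whole gap: 2,441 - 30 = 2,411.
print(sum(totals.values()) == RAW_LOGFILE - GOOGLE_ANALYTICS)  # True
```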

Google Analytics' focus seems like a natural progression toward reporting more meaningful data, even if the numbers are lower. In the 1990s it was all about hits (how much more useless can you get?), then it was pageviews, then visitors.

Now Google seems focused on reporting real people—not scripts, robots, spiders, or search engines.

While researching this discrepancy, I did notice a few instances where Google seemed to filter out real people. By following a user's path through the actual logfile, it looked like a few legitimate requests just weren't showing up in Google Analytics. In these cases, the browser's User Agent didn't identify itself. Though extremely rare, I'm guessing these requests came from someone behind a corporate firewall or someone who doesn't accept cookies and keeps their browser in its most secure state. Again, these requests were so rare, they wouldn't have affected the report much anyway.
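Following a user's path through the logfile amounts to grouping requests by visitor and putting them in time order. The sketch below is one way to do that, not the exact process used here: it assumes Apache's default timestamp format, treats one IP address as one visitor (an approximation), and uses made-up sample lines and a hypothetical `paths_by_visitor` helper.

```python
import re
from collections import defaultdict
from datetime import datetime

# Captures the client IP, the timestamp, and the requested URL.
ENTRY = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)')

# Hypothetical sample entries, deliberately out of order.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Dec/2008:10:02:10 -0500] "GET /contact.php HTTP/1.1" 200 1024 "-" "-"',
    '203.0.113.5 - - [10/Dec/2008:10:00:00 -0500] "GET /index.php HTTP/1.1" 200 5120 "-" "-"',
    '198.51.100.7 - - [10/Dec/2008:10:01:00 -0500] "GET /search.php?q=%3Ca HTTP/1.1" 200 2048 "-" "-"',
    '203.0.113.5 - - [10/Dec/2008:10:01:30 -0500] "GET /search.php?q=analytics HTTP/1.1" 200 2048 "-" "-"',
]


def paths_by_visitor(lines):
    """Return each IP's requested URLs in chronological order."""
    visits = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        ip, stamp, url = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        visits[ip].append((when, url))
    return {ip: [url for _, url in sorted(hits)] for ip, hits in visits.items()}


for ip, urls in paths_by_visitor(SAMPLE_LOG).items():
    print(ip, " -> ".join(urls))
```

A path that moves from page to page like a person browsing, yet never shows up in Google Analytics, is the kind of request described above.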

I'll be happy switching to Google Analytics and believing their numbers represent real, meaningful traffic.

Comments

Monday, Dec 22, 2008 / 8:58pm Christian Madden said…

Great article, I've always wondered about the discrepancy, but hadn't yet dug into the details of why. We're moving most of our stuff to GA and it's good to know it most closely reflects what "real people" are doing on our sites.

Tuesday, Dec 23, 2008 / 10:37am Nick said…

Lesson of the day, when reporting traffic for potential ad placement..use Urchin numbers.

Tuesday, Dec 23, 2008 / 7:58pm Will Bond said…

Nice write-up Dave! I was just going through a similar process on Flourish for my download counter. The counter seemed a bit higher than I expected, so I looked through the web server logs. It turned out I have quite a number of requests from Googlebot, Slurp, Java and a whole host of other search engines.

It would be really nice if there was a standard phrase search engines included in the user agent to allow logging to ignore non-human users. Perhaps something like "non-interactive".

Tuesday, Dec 30, 2008 / 2:58pm Ryan Capers said…

Dave - fantastic article - I forwarded it to several folks. I've always thought the web-stats packages were very squishy and this really helps put things in perspective for me. Thanks a lot!

Tuesday, Jun 23, 2009 / 1:19pm Jim Samuel said…

Great article. Thanks for posting it. I've been trying to find an explanation for the discrepancy between Webalizer and Google Analytics as we switch to GA. Now I have the answers we need. Thanks.

Friday, Dec 18, 2009 / 8:08am Prasad SN said…

Great Article. I've been trying to find an explanation for the discrepancy between Urchin and Google Analytics. Now I have the answers we need. Thanks.

Meet The Author

Dave Tufts

Vice President, Director of Technology
