
Google Analytics web data vastly different from the Core Reporting API data

@Debbie626

Posted in: #Api #GoogleAnalytics #Python

I am trying to pull exactly the same data I see in my dashboard via the Core Reporting API for Google Analytics. However, I don't understand why the data can be so different for the same time period and metrics!

This is my table structure in the web UI dashboard:

**Display the following columns:**
Dimension: Month of Year
Metric: Pageviews
**Filter this data:**
Only show **Page** containing "/blog/"


And this is what I see in my web UI for the period 09/26/2013 to 12/26/2013:

Month of Year Pageviews
201312 151,502
201311 136,856
201310 183,555
201309 22,689


In my script, I use the same dimension, metric, and filter (apart from naming differences between the web UI and the API):

dimensions = ga:yearMonth
start-date = 2013-09-26
start-index = 1
metrics = [u'ga:pageviews']
filters = ga:pagepath=@/blog/
end-date = 2013-12-26
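
For reference, the parameters above can be turned into the request URL shown in the report's Self Link. This is only a minimal sketch: the `https://` scheme and the `build_query_url` helper are assumptions added for illustration, `ga:xxxxxx` is the redacted profile ID from the post, and it is written for Python 3 rather than the Python 2 the original script appears to use.

```python
from urllib.parse import urlencode

# Parameters copied from the post; "ga:xxxxxx" is the redacted profile ID.
params = {
    "ids": "ga:xxxxxx",
    "dimensions": "ga:yearMonth",
    "metrics": "ga:pageviews",
    "sort": "-ga:yearMonth",
    "filters": "ga:pagepath=@/blog/",
    "start-date": "2013-09-26",
    "end-date": "2013-12-26",
}

# Assumed base URL: the post shows the host and path without a scheme.
BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def build_query_url(query, base=BASE_URL):
    """Hypothetical helper: encode the query the way the Self Link shows it.

    ':', '/' and '@' are left unescaped, so only the '=' inside the
    filter expression is percent-encoded (as %3D).
    """
    return base + "?" + urlencode(query, safe=":/@")


url = build_query_url(params)
print(url)
```

Printing the URL makes it easy to confirm that the date range and the filter expression really are the ones intended before comparing numbers with the web UI.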


And this is what I see:

Rows:
201312 148626
201311 160769
201310 154770
201309 16099


Report Infos:

Contains Sampled Data = False
Kind = analytics#gaData
ID = www.googleapis.com/analytics/v3/data/ga?ids=ga:xxxxxx&dimensions=ga:yearMonth&metrics=ga:pageviews&sort=-ga:yearMonth&filters=ga:pagepath%3D@/blog/&start-date=2013-09-26&end-date=2013-12-26
Self Link = www.googleapis.com/analytics/v3/data/ga?ids=ga:xxxxxx&dimensions=ga:yearMonth&metrics=ga:pageviews&sort=-ga:yearMonth&filters=ga:pagepath%3D@/blog/&start-date=2013-09-26&end-date=2013-12-26
Pagination Infos:
Items per page = 1000
Total Results = 4


So as we can see, the data format is correct but the numbers are wrong. What's worse, the trend across months is different too.
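
To make the gap concrete, here is a small sketch that lines up the two sets of numbers quoted above month by month. The `compare` helper is a hypothetical name; the figures are copied from the web UI table and the API rows in this post.

```python
# Pageviews per month, copied from the web UI table and the API output.
web_ui = {"201312": 151502, "201311": 136856, "201310": 183555, "201309": 22689}
api = {"201312": 148626, "201311": 160769, "201310": 154770, "201309": 16099}


def compare(web, api_rows):
    """Return (month, web, api, difference, percent difference) per month."""
    rows = []
    for month in sorted(web):
        diff = api_rows[month] - web[month]
        pct = 100.0 * diff / web[month]
        rows.append((month, web[month], api_rows[month], diff, pct))
    return rows


rows = compare(web_ui, api)
for month, w, a, diff, pct in rows:
    print(f"{month}  web={w:>7}  api={a:>7}  diff={diff:>+7}  ({pct:+.1f}%)")

total_web = sum(web_ui.values())
total_api = sum(api.values())
print(f"total   web={total_web}  api={total_api}  diff={total_api - total_web:+}")
```

The totals differ by 14,338 pageviews (about 2.9%), but the per-month differences range from -28,785 to +23,913, so the mismatch is not a uniform offset.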


2 Comments


@Heady270

I had (what I think was) the same question when comparing Python-generated reports with the web tool Google provides. I found the difference was because the web tool uses sampling:

"This report is based on 96,693 sessions (92.19% of sessions)"

You have one data point that is actually higher in the web tool though... can't explain that :)
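
If you want to check the API side for sampling as well, something like the sketch below might help. It assumes the v3 response is a dict that carries `sampleSize` and `sampleSpace` next to `containsSampledData` when sampling occurs; `sampling_summary` is a hypothetical helper, and the example values are derived from the quote above (96,693 sessions at 92.19% implies roughly 104,885 sessions in total), not taken from a real response.

```python
def sampling_summary(response):
    """Report whether a response dict is sampled and, if so, at what rate.

    Assumes 'sampleSize' and 'sampleSpace' are present (possibly as
    strings) only when 'containsSampledData' is true.
    """
    sampled = bool(response.get("containsSampledData", False))
    size = response.get("sampleSize")
    space = response.get("sampleSpace")
    rate = None
    if sampled and size is not None and space:
        rate = int(size) / int(space)
    return {"sampled": sampled, "rate": rate}


# Unsampled, as in the question's report ("Contains Sampled Data = False").
print(sampling_summary({"containsSampledData": False}))

# Illustrative sampled case built from the web tool's message quoted above.
example = {
    "containsSampledData": True,
    "sampleSize": "96693",
    "sampleSpace": "104885",  # approximate total implied by 92.19%
}
print(sampling_summary(example))
```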


@Bryan171

Actually, this is pretty good. Your numbers are pretty close. On my end, the stats from my own systems show about 4x more hits than Google Analytics.

Now... why the discrepancy? There are many factors. These are the ones I can think of at this point:


- You may have a cache between you and your clients. Google Analytics counts every such hit, but your system does not, because the request never reaches it.
- Your system may return a 304 and not count those responses as hits.
- Your system counts all hits, including hits from spiders (e.g. Googlebot). Google Analytics recognizes many spiders and does not count their hits.
- Your system counts hacker accesses, since they hit your server. Google Analytics does not, because the hackers (web spammers, etc.) do not execute its JavaScript code.
- Google Analytics counts hits on HTML pages only; your server may serve other data (PDF files, images, etc.) that gets counted too.
- Google Analytics also treats visitors who browse your website differently from "returning visitors", a distinction most CMSs do not handle in the same way.
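
As a rough illustration of the factors above, here is a sketch that filters server-side hits down to something closer to what a JavaScript tracker would count. Everything in it is an assumption for illustration: the `Hit` record, the `BOT_MARKERS` list, and the `looks_like_pageview` rule are hypothetical, and it cannot detect clients that simply never run JavaScript.

```python
from dataclasses import dataclass

# Illustrative substrings only; real bot detection needs a maintained list.
BOT_MARKERS = ("googlebot", "bingbot", "spider", "crawler")


@dataclass
class Hit:
    """One parsed server-log entry (hypothetical structure)."""
    path: str
    status: int
    content_type: str
    user_agent: str


def looks_like_pageview(hit):
    """Keep HTML responses from non-bot clients, including 304s."""
    user_agent = hit.user_agent.lower()
    if any(marker in user_agent for marker in BOT_MARKERS):
        return False  # spiders are not counted by the tracker
    if not hit.content_type.startswith("text/html"):
        return False  # PDFs, images, etc. carry no tracking code
    # A 304 still results in a rendered page, so it is kept.
    return hit.status in (200, 304)


hits = [
    Hit("/blog/a", 200, "text/html", "Mozilla/5.0"),
    Hit("/blog/a", 200, "text/html", "Googlebot/2.1"),
    Hit("/files/x.pdf", 200, "application/pdf", "Mozilla/5.0"),
    Hit("/blog/b", 304, "text/html; charset=utf-8", "Mozilla/5.0"),
    Hit("/blog/missing", 404, "text/html", "Mozilla/5.0"),
]
pageviews = [h for h in hits if looks_like_pageview(h)]
print(len(hits), "raw hits,", len(pageviews), "likely pageviews")
```

Hits served from an intermediate cache never appear in the server log at all, so no filter on the server side can recover them.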
