Google Analytics (GA) is a tool for tracking and reporting website traffic, and is the most widely used web analytics service on the Internet. Whether you are marketing for another company or you’re a business owner looking to gain deeper insights into your own site, GA can be a powerful asset for understanding what drives your traffic (from gender and location to device type, bounce rate, and more). As digital marketing becomes increasingly about our ability to synthesize and analyze large datasets, this post is about using R to possibly gain a deeper level of insight into your GA data.
R is a freely available statistical computing and graphics programming language (and software environment). Following an awesome piece from R-bloggers, I’m going to show you how to extract GA data in R and then talk briefly about where you can go from there. What’s the point of combining R with GA? Once our GA data is in R we can then do awesome statistics (e.g., linear regression, ANOVA), run predictive analytics, have greater flexibility in structuring data, and also visualize our data in really cool ways. OK let’s get started!
Downloading R & RStudio
First off, go ahead and download R and RStudio (the IDE that makes running R code and seeing the results a breeze). If you have any trouble downloading R, check this out, and keep in mind that CRAN is just a network of FTP and web servers around the world that store identical, up-to-date versions of the code and documentation for R (select the CRAN mirror closest to your geographic location to minimize network load).
Configuring Google Analytics API
Sign in to your Google account associated with the site you’ll be analyzing and point your browser to console.developers.google.com. Create a new project (give it a name you can easily identify), and then search for Analytics API and make sure this is enabled. Next, click on the credentials tab and select “OAuth client ID”. For application type you can select “Other” and name it “Installed Application”. After you go through these steps you’ll receive a client ID and client secret (make sure to save these to a notepad so you don’t forget).
Now, open up RStudio. What’s really cool about R is the slew of packages made available by a community of developers and statisticians. Go ahead and install the RGoogleAnalytics package with:
> install.packages("RGoogleAnalytics")
Once the package is installed, you can load it in future sessions with require(RGoogleAnalytics) or library(RGoogleAnalytics).
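If you run the same script on more than one machine, a small wrapper can skip the install step when the package is already there. This is only a sketch: ensure_package is a made-up helper name, not something that ships with R or RGoogleAnalytics.

```r
# Hypothetical helper (not part of any package): install a package only
# if it is missing, then attach it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers a download.
ensure_package("stats")

# For this post you would call (needs a CRAN mirror and network access):
# ensure_package("RGoogleAnalytics")
```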
Next we’re going to authorize the Google Analytics account (and we’ll also save this information to a token file so we don’t need to repeat this process every time). Enter your client id and secret like this (replacing the x’s with your own info):
> client.id <- "xxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com"
> client.secret <- "xxxxxxxxxxxxxxxd_TknUI"
> token <- Auth(client.id, client.secret)
RStudio will ask if you want to use a local file to cache OAuth credentials. Say yes, and then point your browser to the link that’s provided to get the authorization code to enter back into RStudio. You’re going to want to save this token file somewhere you’ll remember. I recommend creating a directory in your standard working environment (Desktop for example) and saving the file there with:
> save(token, file = "~/path/directory_name")
The token will expire after an hour, but you don’t have to repeat these steps in the future. All you have to do is load the file and validate the token, like this:
> load("~/path/directory_name")
> ValidateToken(token)
And remember, if you don’t feel like writing the path each time, you can always set your working directory with setwd() (for example, setwd("~/path")).
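The save-then-load routine above can be folded into one function, so the same script works on the first run and on every run after it. Treat this as a sketch: load_or_create_token is a hypothetical helper, and it assumes the token was saved under the object name token, as in the save() call above.

```r
# Hypothetical helper: reuse a cached token if the file exists, otherwise
# create one with the supplied function and save it for next time.
load_or_create_token <- function(path, create_fn) {
  if (file.exists(path)) {
    env <- new.env()
    load(path, envir = env)  # restores the object that was saved as "token"
    return(env$token)
  }
  token <- create_fn()
  save(token, file = path)
  token
}

# Usage with RGoogleAnalytics (not run here, since it needs real credentials):
# token <- load_or_create_token("~/path/directory_name",
#                               function() Auth(client.id, client.secret))
# ValidateToken(token)
```

Passing the creation step in as a function means the browser authorization only happens when there is no cached file.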
There’s one last step we can’t forget. Remember to get the View ID for your GA data by going into Analytics, then to Admin, and select View Settings under All Website Data. Copy down your View ID as you will need this in the code we are running.
Running GA Queries with R
This is the fun part; it’s time to query the GA API with R. We’re going to be running our queries with dimensions and metrics. Dimensions are attributes of your data while metrics are quantitative measurements. Here’s the list of dimensions and metrics you’ll be working with. In this example I’m going to run a query to give me some insights into the geographic trends in my data (I’ll be looking at sessions and organic searches, along with the country, region (state in the U.S.) and city where this data originates). Open up RStudio and enter this code:
> query.list <- Init(start.date = "2016-01-01",
                     end.date = "2016-12-31",
                     dimensions = "ga:country,ga:region,ga:city",
                     metrics = "ga:sessions,ga:organicSearches",
                     max.results = 10000,
                     sort = "-ga:sessions",
                     table.id = "ga:xxxxxxxxx")
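The dimensions and metrics arguments are plain comma-separated strings, so one option is to keep the names in vectors and paste them together. That makes it easier to add or drop a field later. The variable names here are my own, just for illustration.

```r
# Keep dimension and metric names in vectors (names are illustrative).
dimensions <- c("ga:country", "ga:region", "ga:city")
metrics <- c("ga:sessions", "ga:organicSearches")

# Collapse each vector into the comma-separated form used in the query.
dim.string <- paste(dimensions, collapse = ",")
met.string <- paste(metrics, collapse = ",")

# A leading "-" on a field name asks for descending order.
sort.string <- paste0("-", metrics[1])

print(dim.string)
print(met.string)
print(sort.string)
```

These strings can then be passed as the dimensions, metrics, and sort arguments of the query.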
You’ll notice that I only ran this query for a year (this is a new site so we don’t have much data outside of this range, but you can check this query over a larger range by simply changing the dates). Now run this code:
> ga.query <- QueryBuilder(query.list)
> ga.data <- GetReportData(ga.query,token)
And you should see ga.data show up in RStudio! Open it up and take a look at your data. One thing you’ll notice that’s pretty cool is how easy it is to sort and filter your data. For example, by sorting on sessions or organicSearches I can take a look at the primary locations for either metric. Cool!
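You can do the same slicing in code instead of clicking around the viewer. Here is a rough sketch on a small made-up data frame. It assumes the columns are named country, region, city, sessions and organicSearches, so check names(ga.data) on your own results first.

```r
# Made-up stand-in for ga.data (values are invented for illustration).
mock.data <- data.frame(
  country = c("United States", "United States", "Canada", "India"),
  region = c("California", "New York", "Ontario", "Maharashtra"),
  city = c("San Francisco", "New York", "Toronto", "Mumbai"),
  sessions = c(120, 95, 40, 60),
  organicSearches = c(30, 50, 10, 25),
  stringsAsFactors = FALSE
)

# Rows ordered by organic searches, largest first.
by.search <- mock.data[order(-mock.data$organicSearches), ]

# Total sessions per country, largest first.
by.country <- aggregate(sessions ~ country, data = mock.data, FUN = sum)
by.country <- by.country[order(-by.country$sessions), ]

print(by.search)
print(by.country)
```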
RStudio makes it really easy to export your data too. To save this table as a CSV file just run:
> write.csv(ga.data, file = "~/path/nameoffile.csv", row.names = FALSE)
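If you want to confirm that an export worked, one quick check is to write the data frame with write.csv() and read it straight back in. This sketch uses a tiny made-up data frame and a temporary file rather than real GA data.

```r
# Made-up data standing in for ga.data.
mock.data <- data.frame(
  city = c("Toronto", "Mumbai"),
  sessions = c(40, 60),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing in your working directory is touched.
csv.path <- tempfile(fileext = ".csv")
write.csv(mock.data, file = csv.path, row.names = FALSE)

# Read it back and compare the shape with the original.
check <- read.csv(csv.path, stringsAsFactors = FALSE)
print(check)
```

Setting row.names = FALSE keeps R from adding an extra unnamed column of row numbers to the file.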
Falling Deeper Down the Rabbit Hole
I mentioned earlier that we can use R to run interesting statistics and visualizations on our GA dataset. I’ll get deeper into this with future posts, but here’s a little teaser. Let’s start with a linear regression model (predicting sessions from our organic search data). Run this code:
> ga.linear <- lm(sessions~organicSearches, data=ga.data)
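To pull the fit statistics out of a model like this one, summary() is the usual route. The sketch below fits the same kind of model on simulated data (the numbers are invented and the seed is fixed), so you can see where each value lives before trying it on your own ga.data.

```r
# Simulated data: sessions is roughly 3 x organicSearches plus noise.
set.seed(1)
mock.data <- data.frame(organicSearches = 1:20)
mock.data$sessions <- 3 * mock.data$organicSearches + rnorm(20)

mock.linear <- lm(sessions ~ organicSearches, data = mock.data)
fit <- summary(mock.linear)

# Adjusted R-squared: share of variance explained, penalized for model size.
adj.r2 <- fit$adj.r.squared

# p-value for the slope, taken from the coefficient table.
slope.p <- coef(fit)["organicSearches", "Pr(>|t|)"]

print(adj.r2)
print(slope.p)
```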
Run summary(ga.linear) and take a look at your p-value and Adjusted R-squared to get an idea of how significant the relationship is and how much of the variation in sessions the model explains. Pretty cool! If you’re new to stats, here’s some background on how to actually interpret those values. Let’s say I want to create a scatterplot of this data. Try this out:
> plot(ga.data$sessions ~ ga.data$organicSearches, log = "xy", main = "Sessions by Organic Searches", xlab = "Organic Searches", ylab = "GA Sessions")
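One thing to watch with log = "xy": zero has no logarithm, so R warns about rows with zero sessions or zero organic searches and leaves them off the plot. A simple way around that is to drop those rows yourself before plotting. Here is a sketch on made-up numbers.

```r
# Made-up data; the last row has zero organic searches.
mock.data <- data.frame(
  sessions = c(120, 95, 40, 60, 5),
  organicSearches = c(30, 50, 10, 25, 0)
)

# Keep only rows where both values are positive, so every point can be drawn.
positive <- mock.data[mock.data$sessions > 0 & mock.data$organicSearches > 0, ]

plot(sessions ~ organicSearches, data = positive, log = "xy",
     main = "Sessions by Organic Searches",
     xlab = "Organic Searches", ylab = "GA Sessions")
```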
Not bad! We’re really just scratching the surface here, but I hope this post gives you some awesome ideas on how you can use R to gain deeper insights into your Google Analytics data. Stay tuned for more!