Monday, April 20, 2020

Google Analytics serverless ETL function

How to build your own Google Analytics ETL?

Sometimes the Google Analytics web interface is just not enough (or maybe you want to combine its data with other sources?). Here we will build a working Lambda function that extracts data from a Google Analytics account. The function will create a CSV file with transaction_id and other marketing columns for one day and one account view.

The demo is done on macOS Catalina.

Requirements:
1) Google Account
2) Google Analytics account
3) AWS account
4) Docker installed

Here are the main steps:
1) Create a Google API service account and download the credentials file
2) Enable the Google Analytics and Google Analytics Reporting APIs
3) Grant Google Analytics access to your service account email
4) Create a Python function that writes the API query result to S3

Step 1
Go to your cloud credentials page: https://console.cloud.google.com/apis/credentials, click "Create Credentials", and select "Service account".


Enter the desired account name:


Skip the optional part 2:


Click "CREATE KEY" and generate a JSON key file. It should download to your computer.


Step 2

Go to https://console.cloud.google.com/apis/library and enable both Analytics services: the Google Analytics API and the Google Analytics Reporting API.


Step 3

Go to your Google Analytics account. In Admin -> Account User Management, add your service account email (e.g. mytesting@local-storm-255611.iam.gserviceaccount.com) and give it "Read & Analyse" permissions.

Step 4

Create an AWS account (or log in if you already have one) and create an S3 bucket:


Go to https://eu-north-1.console.aws.amazon.com/lambda/home?region=eu-north-1#/functions and press "Create function".

Then, name your function and choose the Python 3.8 runtime.
Create a file named cred.json in your Lambda function and paste in the contents of the credentials JSON file you downloaded earlier.
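For reference, the downloaded key file has roughly this shape (values elided; the client_email is the same service account address you grant access to in Step 3):

{
  "type": "service_account",
  "project_id": "...",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "mytesting@local-storm-255611.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}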



Copy the following function into lambda_function.py:


import json
import csv
import boto3
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Read-only access is enough for reporting queries.
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
# The service account key file created in Step 1.
KEY_FILE_LOCATION = 'cred.json'

def initialize_analyticsreporting():
  """Initializes an Analytics Reporting API V4 service object.
  Returns:
    An authorized Analytics Reporting API V4 service object.
  """
  credentials = ServiceAccountCredentials.from_json_keyfile_name(
      KEY_FILE_LOCATION, SCOPES)
  # Build the service object.
  analytics = build('analyticsreporting', 'v4', credentials=credentials, cache_discovery=False)
  return analytics

def get_report(analytics, VIEW_ID, dt):
  """Queries the Analytics Reporting API V4.
  Args:
    analytics: An authorized Analytics Reporting API V4 service object.
    VIEW_ID: The Google Analytics view ID to query.
    dt: The report date (YYYY-MM-DD), used as both start and end of the range.
  Returns:
    The Analytics Reporting API V4 response.
  """
  return analytics.reports().batchGet(
      body={
        'reportRequests': [
        {
          'viewId': VIEW_ID,
          'dateRanges': [{'startDate': dt, 'endDate': dt}],
          'hideTotals': True,
          'metrics': [{'expression': 'ga:itemQuantity'}],
          'dimensions': [{'name': 'ga:transactionId'},
                         {'name': 'ga:dateHour'},
                         {'name': 'ga:adContent'},
                         {'name': 'ga:sourceMedium'},
                         {'name': 'ga:campaign'},
                         {'name': 'ga:keyword'}
                         ]
        }]
      }
  ).execute()

def lambda_handler(event, context):
    VIEW_ID = str(event['view_id'])
    dt = str(event['dt'])
    analytics = initialize_analyticsreporting()
    response = get_report(analytics, VIEW_ID, dt)
    if response['reports'][0]['data'].get('rows'):
        # Collect the dimension values of every returned row.
        rows = []
        for row in response['reports'][0]['data']['rows']:
            rows.append(row['dimensions'])

        # Lambda functions may only write to /tmp.
        with open('/tmp/temp.csv', 'w', newline='') as file:
            writer = csv.writer(file)
            writer.writerows(rows)

        # Upload the file to S3 as google/GA/YYYY/MM/DD/<view_id>.csv.
        with open('/tmp/temp.csv', 'rb') as f:
            s3 = boto3.client('s3')
            fn = 'google/GA/' + dt[0:4] + '/' + dt[5:7] + '/' + dt[8:10] + '/' + VIEW_ID + '.csv'
            s3.upload_fileobj(f, str(event['bucket']), fn)
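To sanity-check the reporting part before wiring everything up, you can append a small local test harness to the same file (a sketch; it assumes cred.json sits next to the script, the Google client libraries are installed locally, and 1111111 is a placeholder view ID):

# Local sanity check: query the API and pretty-print the raw response.
if __name__ == '__main__':
    analytics = initialize_analyticsreporting()
    response = get_report(analytics, '1111111', '2020-04-20')
    print(json.dumps(response, indent=2))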


If you try to run the function now, it will fail: the Google API client library is not part of the standard Lambda runtime, so we need to provide it by adding a layer.

First, we will create the layer using Docker Desktop. Open a terminal and paste the following commands:


cd Documents/
mkdir google_layer
cd google_layer/
mkdir lambda_function lambda_layers
mkdir -p lambda_layers/python/lib/python3.8/site-packages
cd lambda_layers
docker run -v "$PWD":/var/task "lambci/lambda:build-python3.8" /bin/sh -c "pip install google-api-python-client -t python/lib/python3.8/site-packages/; pip install oauth2client -t python/lib/python3.8/site-packages/; exit"
zip -r ga.zip python > /dev/null
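Lambda puts a layer's python/lib/python3.8/site-packages directory (as well as a plain python/ directory) on the import path for Python 3.8 functions, which is why the commands build exactly this layout inside ga.zip:

ga.zip
└── python/
    └── lib/
        └── python3.8/
            └── site-packages/
                ├── googleapiclient/
                ├── oauth2client/
                └── ...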



After this completes, you will find ga.zip in the lambda_layers subfolder of the google_layer folder in your Documents.
Upload this file to your Lambda layers.

In the Lambda menu, go to Layers and press "Create layer".
Upload the .zip file that we created in the previous step.


Name it and choose Python 3.8 as the compatible runtime.

Go back to your function, click the Layers area of the Designer, and click "Add a layer".


Choose your newly created layer and click "Add". Do not forget to save the function afterwards.



Let's test our function.

Click on "Test" and configure your test event (paste this and change the values where needed):

{
  "view_id": 1111111,
  "dt": "2020-04-20",
  "bucket": "mytestbucket"
}

You can find your own view_id on this page: https://ga-dev-tools.appspot.com/account-explorer/
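Besides the console test button, you can fire the same event from Python with boto3 (a sketch; my-ga-etl is a placeholder for your function's name):

import json
import boto3

client = boto3.client('lambda')
event = {'view_id': 1111111, 'dt': '2020-04-20', 'bucket': 'mytestbucket'}
response = client.invoke(
    FunctionName='my-ga-etl',          # replace with your function name
    InvocationType='RequestResponse',  # run synchronously and wait
    Payload=json.dumps(event),
)
print(response['StatusCode'])  # 200 means the invocation succeeded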
Finally, you need to grant your Lambda execution role access to your S3 bucket.

Go to https://console.aws.amazon.com/iam/home?region=eu-north-1#/home
From the left menu, select Roles and click on your Lambda role:


Click on "Attach policies":


Start typing "S3" and check the box next to "AmazonS3FullAccess":
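AmazonS3FullAccess is the quickest way to get going, but it grants far more than the function needs. If you prefer least privilege, an inline policy like this sketch (scoped to the example bucket and prefix used here) is enough:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mytestbucket/google/GA/*"
    }
  ]
}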




Now you can run your Lambda function. If everything is OK, you will see a file created in your S3 bucket.
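You can also confirm the upload from Python (a sketch, using the bucket and date from the test event above):

import boto3

s3 = boto3.client('s3')
result = s3.list_objects_v2(Bucket='mytestbucket', Prefix='google/GA/2020/04/20/')
for obj in result.get('Contents', []):
    print(obj['Key'], obj['Size'])  # e.g. google/GA/2020/04/20/1111111.csv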

Further notes:
  • You can extract any kind of data this way. You can build a valid query using the dimensions & metrics explorer: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
  • You can write an additional function to process the created file and insert the rows into a database.
  • You can write an additional Lambda function to extract all the historical data.
  • Step Functions could be used if you have a complex ETL process.
  • Additional logic for pagination can easily be added; see the sketch after this list.
  • Stitch Data offers 6 million rows per month for free, so if you are not planning on a lot of rows, you might want to check them out.
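On that last pagination point: the Reporting API v4 returns at most one page of rows per call and includes a nextPageToken in each report when more rows remain. A minimal sketch of how the query above could be extended (get_report_page and get_all_rows are hypothetical helper names):

def get_report_page(analytics, VIEW_ID, dt, page_token=None):
    # One page of the same report as get_report, with an optional pageToken.
    request = {
        'viewId': VIEW_ID,
        'dateRanges': [{'startDate': dt, 'endDate': dt}],
        'hideTotals': True,
        'metrics': [{'expression': 'ga:itemQuantity'}],
        'dimensions': [{'name': 'ga:transactionId'},
                       {'name': 'ga:dateHour'},
                       {'name': 'ga:adContent'},
                       {'name': 'ga:sourceMedium'},
                       {'name': 'ga:campaign'},
                       {'name': 'ga:keyword'}],
        'pageSize': 10000,  # ask for bigger pages; the API enforces its own cap
    }
    if page_token:
        request['pageToken'] = page_token
    return analytics.reports().batchGet(
        body={'reportRequests': [request]}).execute()

def get_all_rows(analytics, VIEW_ID, dt):
    # Follow nextPageToken until the report is exhausted.
    rows = []
    page_token = None
    while True:
        report = get_report_page(analytics, VIEW_ID, dt, page_token)['reports'][0]
        rows.extend(row['dimensions'] for row in report['data'].get('rows', []))
        page_token = report.get('nextPageToken')
        if not page_token:
            return rows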


5 comments:

  1. How to get to the Seminole Hard Rock Hotel & Casino by Bus - Mapy
    This page shows 경주 출장안마 the following transit lines: (North) 거제 출장마사지 + 2 888 796, (South) + 7 888 795, (North) + 7 888 792. (North) + 창원 출장안마 7 여주 출장샵 888 793, (North) + 이천 출장안마

    ReplyDelete
  2. Many people like to purchase garments online to try them out in the consolation of their houses. Aside from the rising demand, the power to source these items for little or no makes clothes a prime merchandise for cheap wholesale reselling. There are a couple of ways that this enterprise mannequin may be utilized.

    ReplyDelete
  3. We wish to maintain playing enjoyable - and these two elements are crucial to verify issues keep how they're alleged to. If at any level you are 1xbet feeling your playing habits are getting out of control, seek assist. Full Keymapping support for precise control of keyboard and mouse or gamepad. Bigger screen with better graphics; Long length, without limitation of battery or mobile data. Spokespeople for Google and Apple, which run the preferred app stores, declined to remark. Brett stated playing apps shouldn't be so pervasive, particularly end result of|as a end result of} they make it more durable for him and other playing addicts to stay clean.

    ReplyDelete
  4. What’s more, you always have the choice to play the demo model of your sport of curiosity first earlier than you put in 메리트카지노 actual cash. With the amount of variations that exist today, it can’t harm to give quantity of} a attempt free of charge find a way to} find the sport you like most. Fortunately for on-line on line casino gamblers, the sport easily transitions to laptop and cellular screens, making it an especially well-liked choice on-line nicely as|in addition to} stay.

    ReplyDelete
  5. One of the best issues about playing in} video poker is that the machine lets you know the anticipated return earlier than you even begin to play. Keep in mind that you just also needs to|must also} totally practice on your pc earlier 우리카지노 than risking actual money on an actual video poker machine. In case you already know all of this and can’t wait to play a number of the} best video poker video games - that is the positioning you need to} go to.

    ReplyDelete
