Introduction to Google Analytics 4 Data in BigQuery 

BigQuery-GA4-event-schema 

BigQuery GA4 Event Schema

This article provides the details of Google Analytics 4 data that is exported into BigQuery. So let's dive into BigQuery. In BigQuery, look at the Schema of Events table so we can see all the fields, the type, and the mode. This is an in-depth article with more details, and then we can look at the actual data of GA4 in BigQuery. 

Before looking into BigQuery, it is imperative that we must know how Google Analytics 4 works. The more we understand Google Analytics 4, the better it will be for you to understand all the data in BigQuery.

 

GA4-event-based-tracking-system

Google Analytics 4 Event Based Tracking System

Google Analytics 4 is an event-based tracking system. It means every action that the user takes on the site or app and every interaction the user has with the site or app either is tracked as an event or can be tracked as an event, like when a user clicks on something that's an event when a user downloads something that's an event. When a user scrolls through the page - that's an event. Even page views are events, and these events have parameters and values. 

 

GA4-user-properties

Google Analytics 4 User Properties

Each event can have multiple parameters and values associated with them. Along with those events, we can also pass information about users, and those are called user properties. A user looks at a particular page that gets passed as an event with several parameters and values. The user interacting with those also has certain properties that are also passed along with that event.

We can see that the data in Google Analytics 4 is made of rows and columns. Each row corresponds to one event, and columns are either the event parameters or the user properties. And these events take several forms, such as eCommerce, items, views, etc. 

 

BigQuery-GA4-data-rows-columns

First is an event_date. When did this event happen? What date? Then we have event_timestamp, event_name, and event_params (parameters), and this is of type record means that it contains multiple values and is repeated for every event parameter we send.

Each event parameter has a key and value pair. The various values of event parameters are stored in different columns depending on the type of the value. Alphanumeric values are numerical values, string values, integers, or numerical values stored in into value. Then there are float and double values, which store decimal values.

Decimal values are stored in the double value column. The float value column is not currently used by Google Analytics 4; then, we have several other values for events. Next, we have user properties just like event parameters, which are also a type of record and have string, int, and float values. 

BigQuery-GA4-mobile-user-information

BigQuery GA4 Mobile User Information

We also have the user's lifetime value (user_ltv), a device that the user was using when this event was recorded. We have all the device information, such as mobile brand name, model name, operating system, etc. We also have (geo) geographical location of the user. Where was this user located when this event happened? If a user is using the app, then information about the app (app_info). 

BigQuery-GA4-app-information

Traffic_source information contains information about the user's acquisition, like how the user first came to the site or app. Stream_Id is the id we see in Google Analytics when we set up the stream. The platform that the app was built on and then some more event information. 

BigQuery-GA4-ecommerce-information

 BigQuery GA4 eCommerce Information

Then we can see eCommerce information, and then we have information about the items that the user looked at when this event was recorded. So those are eCommerce items or the products for which the event was locked. Keep in mind that some of the values that are recorded in the daily event table are not recorded in the intraday table.

So when we look at the intraday table, we might find some differences. Check this link to know more about this https://support.google.com/firebase/answer/7029846?hl=en. On that page, we can find information about what values come into these different columns, as well as values that are available in the intraday table and which ones aren't. 

It is advisable to become familiar with that list, so we know which columns to use for the analysis and creating the SQL queries. Since we have seen the table structure and the various columns containing the data from Google Analytics 4, next, we will see the actual data within BigQuery. 

actual-GA4-data-in BigQuery

The above image shows the data that's stored in the daily events table. So as data comes in, it's stored in rows, and then we have columns that store the individual data for that row. In rows, we can see multiple rows right here and then the values that are stored in the columns. Remember again, Google Analytics 4 is event-based. Everything is an event. Every interaction with the site or app or any action user takes either is tracked as an event or can be tracked as an event.

Everything is centered around the event. So we have event_date, event_timestamp, the event_name and then you have event_param(event Parameter), event parameters key, and then values. It is a type record, and the event parameter is of a type record, meaning it has multiple values. We have page_location, page_title, session_number, session_ID, etc. Then these values spread over different columns, string value, Int value, float value, and double value. 

If a page location has a string value, then it will appear in the string value column. We see nulls in other columns and session ID, which is numeric, and we see that in event_params.value.int_value column and other values are null. Scrolling further on the right side, we can see other values that are coming across. We have the user ID and user properties. These will be populated once we start to populate the user values. User properties are also a type record, and we have a string value, Int value, float value, etc. 

User_first_touch_timestamp tells us when the user first came to the site or app. LTV values, device information, and browser information geolocation of the user which is not set in this case. But if they were set and we'll see those values. Traffic sources (traffic_source.source) come from a direct source which means directly typing in, and the platform is the website, so they are using the website, not an app and eCommerce. These values will become clear once we write the SQL statements to pull the data. 

different-rows-BigQuery-table

Let's take a look at the different rows in the above table. Notice there is an event parameter key which is session_ID. So each session will have a unique ID, and that number can be found in the event parameter's value. This is how Google combines the activity within a session. 

There is a second row that also has the same session ID. That means this activity also belongs to the same user in the same session, and then we have another row belonging to the same session ID. So what's happening is the timestamp (event_timestamp) when the session started. The event name is session_start, and the id of the session is 1609157277. Then the user viewed a page.  

So session_start was marked as an event. Then the page_view event was triggered, then another page view event_triggered, which tells us whether the user migrated to another page or refreshed that page. In this case, the page location is the same, page title is the same, so that means the user refreshed the page, then the same session ID, and the event is a scroll. That means the user scrolled on the page. Every time a user Scrolls to 90% of the width of the page, this event will get fired. We can also fire this event for other scroll percentages such as 10%, 25%, 50%, et cetera.

BigQuery-GA4-user-first-visit

BigQuery GA4 User Data

And it provides information about which page the user was on and the page URL. We also have other event parameters for this event that are listed in the event_params.key. Below we can find a different session ID, 1609164522. That means a different user came to the site and that user's session started. User engagement gets fired after the user spends a certain time on the site. Then we have the first visit, which tells us a brand new user came to the site, and this is their first visit because the first visit event got fired.

We can scroll through this list on our own site and see which events are getting fired, what other values are coming across, etc. Next, we can start to construct SQL queries to pull different data so that we can conduct the analysis. 

Google Analytics 4 help and support

We are here to help you with all your Google Analytics needs. Our team can get you up and running in no time and also train you and your team to save you a lot of time so that you can focus on driving the business outcomes rather than configuring Google Analytics 4.

Online Courses:

Google Analytics 4 Masterclass

BigQuery for Marketing Analytics with GA4 Data

Workshop - We conduct the onsite customized workshop. Reach out to us for more details.

Consulting Services - We can help you up and running with GA4 and BigQuery. Email us at support@optizent.com

Leave a Reply

Your email address will not be published.