How to Calculate GA4 Session Counts in BigQuery

In this post, I will share how you can recreate the “Session” metrics in BigQuery.

There are three ways to get the “Session” metrics in GA4 and each one tells a different story.

In addition to the “Session” metrics there are also “Engaged Session” metrics, that I will cover in a future post.

For now, let us focus on “Session” metrics. The three ways of calculating are:

  1. Using session_start event

  2. Using ga_session_id event parameter

  3. Using ga_session_id event parameter and user_pseudo_id column

Using session_start event

Every time a new session starts, GA4 fires a session start event. You can count the number of session start events for the time frame you need to calculate by using the COUNT function. Here is the sample SQL

SELECT COUNT(*)
FROM your_project_id.your_dataset_id.your_table_id 
WHERE event_name = "session_start" 

So when I ran this SQL on my dataset, I got 896 session_start events for a particular day.

However, GA4 shows 893 sessions on that particular day.

So why is it different? We will cover this later but now let’s take a look at the second method of calculating sessions.

BigQuery for Marketers and Marketing Analysts Course

BigQuery for Marketers and Marketing Analysts. Step by step course that teaches you everything from scratch.

academy.optizent.com/courses/bigquery-for-marketers-and-marketing-analysts

Using ga_session_id event parameter

As you might recall from one of my previous emails, GA4 stores a session_id ,based on timestamp, in a parameter called ga_session_id. To count sessions based on this field simply do a distinct count of the ga_session_id value, which is stored in the int_value column of event parameters. Distinct makes sure that you count one session_id only once.

Before you can use the values stored in event parameters, make sure to unnest the event_params column. Here is the sample SQL:

SELECT COUNT(DISTINCT value.int_value)
FROM your_project_id.your_dataset_id.your_table_id, unnest(event_params)
WHERE key = 'ga_session_id'

Running this SQL on my dataset, I got 882 as the session count.

Note: Even though the name of this field suggests that it is unique but it is not. More than one user can have the same ga_session_id so using this field is wrong to begin with and hence ignore the metrics based on this field. Instead, use the next method that builds on this method.

Using a combination of ga_session_id and user_pseudo_id

In this method, you will convert the session_id into a unique identifier by attaching user_pseudo_id value to it. user_pseudo_id column contains a unique value for each user and hence attaching a non-unique session_id to it will create a unique value.

In the following example, I am using a function called CONTACT to string together the values of user_pseudo_id and session_id value (stored in value.int_value column).

Note: CONCAT function can take multiple strings and create one string from them. For example: CONCAT (‘A’,’B’,’C’) will return ‘ABC’ as the output.

Here is SQL for calculating the sessions.

SELECT COUNT(DISTINCT CONCAT(user_pseudo_id,value.int_value))
FROM your_project_id.your_dataset_id.your_table_id, unnest(event_params)
WHERE key = 'ga_session_id'

Using this SQL on my database, I got 889 users.

To recap:

  • GA4 shows, 893

  • The count of “session_start” shows 896

  • The count of “ga_session_id” shows 882 (as mentioned above this is wrong so ignore it)

  • The count of “ga_session_id” with “user_pseudo_id” shows 889

This is the correct method to calculate this metrics.

 

Learn GA4 BigQuery

Leave a Reply

Your email address will not be published. Required fields are marked *