GA4 Data Organization in BigQuery

In this post, we will go over the BigQuery organization of GA4 data in BigQuery.

This opens up the BigQuery interface also called the BiQuery console.

On the top of the console is a drop-down that shows you the projects you have access to. In your case, you may have only one project as you are starting with BigQuery.

Various resources in BigQuery are organized in a hierarchy.  This hierarchy allows you to control access, quotas, billing, etc.

This project hierarchy looks like the following:

On the top you have your organization and then underneath that you have a billing account that contains one or more projects. If you work for a large organization then the “Organization” and Billing account will likely be managed by your IT team. However, if you do work for a small organization then you will likely end up managing all the aspects of BigQuery.

Most“digital analysts” will spend most of their time in the “Project” and what’s inside a project.

Once you select the right project in the top drop-down, you will see it in the right-hand panel as well.

Expanding the projects will show you the list of datasets that are in that project. Data sets are top-level containers that organize and control access to your tables and views.

Under each dataset are the various tables that contain the data from sources such as GA4, file imports, etc..

If you are just starting out then you can potentially see up to four tables in the dataset created by GA4 integration. The tables that get generated depend on what you selected in GA4, during the integration and your GA4 setup.

These four tables are:

  1. Events tables - All the GA4 events with various parameters and other information are stored in the events table. This table gets created when you select the “Daily” option in GA4 BigQuery integration in the GA4 admin panel. In this table, each day of data is contained in its own table. Tables are named in event_YYYYMMDD format, YYYYMMDD is the year, date, and month of the GA4 data. All these tables are then shown as events_##, under the data set. ## stands for a number of tables i.e. the number of days of GA4 data, since each day creates a new table. So if you have two days worth of data then you will see events_(2) under the data set.
  2. Events intraday table - This table gets created when you select the “Streaming” option during GA4 and BigQuery integration.  You will see these tables as event_intraday_(##). Intraday tables are staging table that includes records of session activity that took place during the day.  These tables might not contain all the data and certain values might be missing as well (more on this in the future).
  3. Pseudo Users table - This table contains information about anonymous users. All users on your site or app are anonymous by default. If you do enable User ID tracking with GA4 then any users with a known User ID won’t be in this table. Instead, they will be in the user table as described below. These tables are listed as pseudonymous_users_(##) and just like other tables the name of this table also contains the date.
  4. User Table - These tables contain information about known users. This table will show up if you enable User ID integration with GA4. The format of this table is users_YYYYMMDD and is listed under users_(##).

In the next post, we will look inside the events and events_intraday tables.

This newsletter is supported by people like you so please make sure to share this with your colleagues and social media connections.  This will provide me with the motivation to keep writing.

If you want to fast-track your BigQuery knowledge then check out, BigQuery for the Marketing Analytics Course

Leave a Reply

Your email address will not be published. Required fields are marked *