GA4 BigQuery Schema Demystified

GA4 has a direct integration with BigQuery, which means you can send all your GA4 data to BigQuery with just a click of a few buttons. I have covered GA4 and BigQuery integration in another post, check it out at GA4 BigQuery Export - How to set it up, step by step.

In this post, I will dive into the GA4 data in BigQuery. so that you can effectively use this data for your analysis.

If you are not familiar with database or BigQuery fundamentals then join my course - BigQuery for Marketing Analytics 

GA4 BigQuery Structure

The GA4 BigQuery data creates a dataset in the project you link to in BigQuery

Under this data set, you will see up to 4 tables in BigQuery that contain BigQuery Data. These tables are

  • events_(#)
  • events_intraday_(#)
  • pseudonymous_users_(#)
  • user_(#)

# indicates a number. This number shows you the number of days of data.  Each day of data is stored in a separate date-partitioned table.

event_(#) table is a daily events table containing all the events you chose to send to BigQuery and their details.

events_intraday_(#) table is a live-streaming table that contains the real-time data. You can use this table to create real-time reports and analysis. We use this table extensively for such purposes as well as for GA4 setup audits.

pseudonymous_users_(#) table contains the details about users who do not have a user_id attached to them, that ID in GA4 is called user_pseudo_id . GA4 assigns an ID to each user (a device ID, or client ID) that's used to identify a user and this table shows the details about those users., Check out this post for more details - What is user_pseudo_id in GA4 BigQuery Export?

user_(#) table contains the data for those users who have a user_id attached to them. You have to set up user_id tracking in GA4 for this table to show you the data. (see the post, Google Analytics 4 User ID Tracking for details).

Each table has a date appended in the table's name so the actual table name will be like events_YYYYMMDD e.g. events_20231001 is a table for October 1st, 2023. All the above tables follow the same format.

events_(#) table is the only table that will show up when you use the default setup.

To enable the events_intraday table you will have to have Google Cloud with billing enabled.

pseudonymous_users and user tables require you to explicitly enable them in the GA4 interface.  user table also requires to setup User ID tracking as mentioned above.

GA4 Events Table Schema in BigQuery

The events table is at the core of all the tables. This table contains each and every event that you choose to send to BigQuery.  There is a 1 million event limit so keep that in mind.

The structure below is for both daily events tables and events_intraday tables.

The data fields in this table can be grouped into a few major buckets

  1. events
  2. user
  3. device
  4. geo
  5. app_info
  6. collected_traffic_source
  7. traffic_source
  8. stream and platform ifnro
  9. ecommerce
  10. items

 

This table has the following fields

Event Related Data

Field name Data type Description
event_date STRING The date when the event was logged.  The format of this data is YYYYMMDD and is in the time zone of your GA4 property.
event_timestamp INTEGER The time when the event was logged on the client. (see GA4 BigQuery event_timestamp field Demystified). The format is UTC
event_previous_timestamp INTEGER The time when the event was previously logged on the client. The format is UTC.
event_name STRING The name of the event.
event_value_in_usd FLOAT The value of the event converted in USD.
event_bundle_sequence_id INTEGER The sequential ID of the bundle in which these events were uploaded.
event_server_timestamp_offset INTEGER Timestamp offset between collection time and upload time in micros.
event_params RECORD Record of all the event parameters associated with that event.

 

User Related Data

Note there are separate User Tables that contain other details of the users)

Field name Data type Description
is_active_user BOOLEAN Whether the user was active (True) or inactive (False) at any point in the calendar day. Included in only the daily tables
user_id STRING This is set using GA4 user id integration.  If it is not set then it will be null.
user_pseudo_id STRING The pseudonymous id (e.g., app instance ID) for the user.
user_first_touch_timestamp INTEGER The time at which the user first opened the app or visited the site. This is in Microseconds
privacy_info.ads_storage STRING Indicates if ad targeting is enabled for a user or not. You can have Yes, No, Unset
privacy_info.analytics_storage STRING Indicates if analytics storage is enabled for the user or not. You can have Yes, No, Unset as values.
privacy_info.uses_transient_token STRING Indicates if the user has denied Analytics storage and the developer has enabled measurement without cookies based on transient tokens in server data. You can have Yes, No, Unset as values.
user_properties RECORD Contains key-value pairs of all user properties
user_ltv RECORD Contains the Users LTV and the currency used for that LTV. Not available in events_intra day table.

Stream and Platform info

Field name Data Type Description
stream_id STRING The ID of the data stream from which the event was logged
platform STRING The data stream platform, Web, IOS or Android, from which the event was logged

Device, Geo, App Info, Collected Traffic Source, Traffic Source, eCommerce and Items

All of this data is captured in the device, geo, app_info, collected_traffic_source, e-commerce, and items fields in the events tables.
They are all of record type and contain various types of data associated with these records.  See below the general information about record type data.

Record Type Data in BigQuery Import of GA4

Many of the fields, as mentioned above, are Record type fields. This field contains the entire record in one column. Think of it as a table (record) within a table. 

Each record has a key value containing the name of the value (e.g. event_params.key) and the value containing the actual value.

Actual values are contained in a column that matches the data type of the value.  There are four possible data types, String, Integer, Float, or Double, and hence you will see 4 columns but only one of them contains the value and the other columns are null.

For example, for the event parameters, there is a column called, event_params.key. Let’s say this key value is session_number. This means the value column corresponding to this event parameter will contain the session number of the user, which is an integer value.  So event_params.value.int_value column is populated with the value while other columns are empty.

Don’t worry if this concept is a little difficult to understand. It will become clear in future posts.  However, if you want to speed up your learning then join my BigQuery for Marketing Analytics course and master these concepts.

 

GA4 BigQuery Schema for Events Video

This is an old video as things have changed recently but should still provide you with a good overview

 

GA4 Users Table Schema in BigQuery

The following sections describe the user data that Analytics exports to the Pseudo ID and User ID tables (subject to the differences enumerated in the section above).

  • occurrence_date -  Date when the record change was triggered
  • last_updated_date - Date when the record was updated in the table
  • user_id  - ID for the User-ID namespace in reporting identity (User ID table only)
  • pseudo_user_id -  ID for the Pseudonymous namespace (Pseudo ID table only)
  • stream_id -  Data-stream ID (Pseudo ID table only)
  • user_info.last_active_timestamp_micros - Date of the user's last activity (timestamp in microseconds)
  • user_info.last_active_timestamp_micros - Date of the user's first_open or first_visit event, whichever is earlier (timestamp in microseconds)
  • user_info.user_first_touch_timestamp_micros
  • user_info.first_purchase_date
  • audiences (record)
  • audiences.id
  • audiences.name
  • audiences.membership_start_timestamp_micros - When the user was first included in the audience (timestamp in microseconds)
  • audiences.membership_expiry_timestamp_micros - When the user's audience membership will expire (timestamp in microseconds).  Membership duration is reset when new activity requalifies the user for the audience
  • audience.npa - true or false based on your NPA settings for events and user-scoped custom dimensions included in your audience definition
  • user_properties - record that key and string values
  • device record that has operating system, category, brand_name, model_name and unified_scree_name
  • Gero record that has city, country, continent and region
  • lifetime value record
  • predictions record

Check out Google Analytics 4 and BigQuery User Data Export for complete details.

 

GA4 + BigQuery Integration

If you haven't yet integrated GA4 with BigQuery then you should do that right away. Check out, these posts

Need Help with GA4 and BigQuery?

We provide Google Analytics 4 & BigQuery consulting and training services. We can set up Google Analytics 4 and BigQuery as well as train your team to be self-sufficient. Contact us for a no-obligation consultation.

Leave a Reply

Your email address will not be published. Required fields are marked *