GA4 Pseudonymous Users and User Tables in BigQuery

When you export GA4 data to BigQuery, you will see one or two user tables, depending on your GA4 implementation. These tables are

  1. Pseudonymous Users table – This table contains information about anonymous users. All users on your site or app are anonymous by default. The naming format of this table is pseudonymous_users_YYYYMMDD. Each day of data creates a new table. These tables are listed as pseudonymous_users_(##) and just like other tables the name of this table also contains the date.
    • Key things about this table:
    • It contains a row for every pseudonymous identifier.
    • This table is updated when there is a change to data in any one of the fields.
    • Users who have not given consent, are not exported to this table.
    • Known User IDs are not exported to this table (see the User ID table below for this information)
    • The last active timestamp is exported to this table.
  2. Users – This table contains information about known users. This table will show up if you enable User ID integration with GA4. The naming format of this table is users_YYYYMMDD and is listed under users_(##). Key things about this table:
    • It contains a row for every user ID. D
    • This table is updated when there is a change to data in any one of the fields.
    • Users who have not given consent can be exported to this table if they include a user ID.
    • Pseudouser IDs are not exported to this table, they are contained in the Pseudonymous Users table
    • The last active timestamp is exported to this table.

Data Exported in Pseudonymous Users and User Tables

The daily export in the above tables includes any new users and all the users whose data has changed that day. If you need some examples to see which users will be included in the daily export then check the “Advanced Notes” section below.

Tables Schema

The schema for both tables is very similar, with just a minor difference.

The data in these tables can be grouped into a few major buckets, as listed below:

  1. User ID
  2. User Info
  3. Privacy Info
  4. User Properties
  5. Device
  6. Geo
  7. Audiences
  8. User Lifetime
  9. Predictions
  10. Audit

Let’s take a look at what type of data is contained in the major buckets.

User ID

User ID information is stored in three different columns.

  • user_id, STRING (date type) – ID for the User-ID namespace in reporting identity (only available in the User table)
  • pseudo_user_id, STRING – ID for the Pseudonymous namespace (only available in the Pseudonymous Users table only)
  • stream_id, INTEGERData-stream- ID (only available in the Pseudonymous Users table)

User info

This data contains some important timelines of the user and is stored in the “user_info” column as a record data type. This type of column is called a record type column (or record data type) since it contains a record of information instead of just a single value. I will cover dealing with “records” in future posts. This column contains a record with the following information:

  • user_info.last_active_timestamp_micros, INTEGER – Containts the date of the user’s last activity, stored as timestamp in microseconds.
  • user_info.user_first_touch_timestamp_micros, INTEGER – Contains the date of the user’s first_open or first_visit event, whichever is earlier (timestamp in microseconds)
  • user_info.first_purchase_date, STRING – Contains the date of the user’s first purchase stored in YYYYMMDD format.

 Privacy info

The privacy setting of the user is stored in the “privcacy_info” column. This is also of type record and contains the following information:

  • privacy_info.is_ads_personalization_allowed, STRING – If a user is eligible for ads personalization, this value is set to ‘true’. If a user is not eligible for ads personalization, this value is set to ‘false’. isAdsPersonalizationAllowed returns ‘(not set)’ if Google Analytics is not currently able to return whether this user is eligible for ads personalization.
  • privacy_info.is_limited_ad_tracking, STRING – The device’s Limit Ad Tracking setting. Possible values include: ‘true’, ‘false’, and ‘(not set)’. isLimitedAdTracking returns ‘(not set)’ if Google Analytics is not currently able to return this device’s Limit Ad Tracking setting.

Audiences

This contains information about the audiences that the user belongs to. The data is stored in the “audiences” column. Since the user can belong to multiple GA4 audiences, each of them is listed in this column. This is also of type record and contains the following information:

  • audiences.id, INTEGER – This is an internal ID that GA assigns to each “Audience” that you create in the interface.
  • audiences.name, STRING -The name of the audience that you assign in GA4
  • audiences.
  • membership_start_timestamp_microsI, INTEGER – This field contains the timestamp when the user was first included in the audience.
  • audiences.membership_expiry_timestamp_micros, INTEGER The timestamp when the user’s audience membership expires. Membership duration is reset when a new activity requalifies the user for the audience
  • audience.npa, BOOLEAN This field stores true or false based on your NPA settings for events and user-scoped custom dimensions included in your audience definition (This is a GA4 topic that will covered as required in this series).

User Properties

The data is stored in the “user_properties” column of the type record. It contains the following information.

  • user_properties.key, STRING – This contains the key of the “user property” (dimension) that you send in GA4. The value of this key is stored in the next column.
  • user_properties.value.string_value, STRING – This contains the value of the user-property dimension
  • user_properties.value.set_timestamp_micros, INTEGER – This field contains the timestamp when the value was last set.

 Device

The information about the device the user used to visit your site or app is stored in the “devices” column. This data is also of type record and contains the following information:

  • device.operating_system, STRING – The operating system of the device used by the user.
  • device.category, STRING – The category of the device the user. The possible values are mobile, tablet, and desktop.
  • device.mobile_brand_name, STRING – The brand name of the user’s device.
  • device.mobile_model_name, STRING – The model name of the user’s device
  • device.unified_screen_name, STRING – The screen name of the user’s device

Geo

This field, “geo”, contains all the information about the user’s geolocation. This field is also of type record and stores the following information

  • geo.city, STRING – The city from which events were reported
  • geo.country, STRING – Contains the country from which events were reported
  • geo.continent, STRING – The continent from which events were reported
  • geo.region, STRINGthe region from which events were reported

 Lifetime

The column called “user_ltv” contains the user’s lifetime value data. This field is also of type record and store the following information:

  • user_ltv.revenue_in_usd, DOUBLEThe lifetime total revenue generated from this user (in USD)
  • user_ltv.sessions, INTEGERLifetime total number of sessions
  • user_ltv.engagement_time_millis, INTEGER -Lifetime total engagement time (in milliseconds)
  • user_ltv.purchases, INTEGER – Total number of purchases by this user
  • user_ltv.engaged_sessions, INTEGER- Lifetime total number of engaged sessions
  • user_ltv.session_duration_microsINTEGER – Lifetime total session duration (in milliseconds)

 Predictions

The column “predictions” contains the values based on machine learning (ML) predictions about Google’s model. It is of type record and contains the following values.

  • predictions.in_app_purchase_score_7d, DOUBLE – The probability that a user who was active in the last 28 days will purchase (log an in_app_purchase event) within the next 7 days
  • predictions.purchase_score_7d, DOUBLE- The probability that a user who was active in the last 28 days will purchase (log a purchase event) within the next 7 days
  • predictions.churn_score_7d, DOUBLE – Contains the probability that a user who was active on your app or site within the last 7 days will not be active within the next 7 days.
  • predictions.revenue_28d_in_usd, FLOAT- Contains expected revenue, in USD, from all purchase events within the next 28 days from a user who was active in the last 28 days

Audit

At the very end of the table, there are two separate columns containing the dates of the changes in the record. These columns are used for auditing purposes.

  • occurrence_date, STRING – Date when the record change was triggered
  • last_updated_date, STRING – Date when the record was updated in the table

Record Type Data in BigQuery Import of GA4

Many of the fields, as mentioned above, are Record type fields. This field contains the entire record in one column. Think of it as a table (record) within a table. 

Each record has a key value containing the name of the value (e.g. event_params.key) and the value containing the actual value.

Actual values are contained in a column that matches the data type of the value.  There are four possible data types, String, Integer, Float, or Double, and hence you will see four columns but only one of the columns contains the value while the other columns are null.

For example, for the event parameters, there is a column called, event_params.key. Let’s say this key value is session_number. This means the value column corresponding to this event parameter will contain the session number of the user, which is an integer value.  So event_params.value.int_value column is populated with the value while other columns are empty.

Interested in learning BigQuery for GA4? Subscribe to BigQuery Newsletter.

Leave a Reply

Your email address will not be published. Required fields are marked *