Why is BigQuery data different from the data you see in Google Analytics 4?
If you are considering using BigQuery to analyze your Google Analytics 4 (GA4) data, or if you are already doing so, you probably want to understand how the data available in the two tools differs.
You might expect the data exported to BigQuery to be exactly the same as the data used in GA4 reports. Well… it is not, and for several reasons. In this article, we explore why.
Note: in this article, we assume that you are using the built-in data export from Google Analytics 4 to BigQuery. If you are using a connector, a pipeline or the GA4 API, the points listed here might not apply.
A first factor to take into account is that the data GA4 collects through Google Signals is not exported to BigQuery. This means you won’t be able to use the gender, age and interest dimensions.
Upon inspecting BigQuery’s GA4 export schema (its data structure), you have probably noticed that some key data is missing.
For instance, there is no landing page field, nor even a page field. Accessing such data requires SQL to extract the information, mostly from the events collected by GA4. For pages, for instance, you would use the automatically collected page_view event and its page_location parameter (the full page URL), and count how many times the event was triggered for each URL.
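To make the page_view logic concrete, here is a minimal Python sketch that works on a few in-memory rows shaped loosely like the export schema (an event_name plus a repeated event_params field of key/value records). The sample URLs and the helper names string_param and page_views_per_url are hypothetical, and the BigQuery SQL in the comment assumes a hypothetical project and dataset name.

```python
from collections import Counter

# Roughly equivalent BigQuery SQL (project and dataset names are hypothetical):
#   SELECT
#     (SELECT value.string_value FROM UNNEST(event_params)
#      WHERE key = 'page_location') AS page_location,
#     COUNT(*) AS page_views
#   FROM `my-project.analytics_123456789.events_*`
#   WHERE event_name = 'page_view'
#   GROUP BY page_location
#   ORDER BY page_views DESC

# Invented sample rows, shaped loosely like the GA4 export schema.
events = [
    {"event_name": "page_view",
     "event_params": [{"key": "page_location", "value": {"string_value": "https://example.com/"}}]},
    {"event_name": "page_view",
     "event_params": [{"key": "page_location", "value": {"string_value": "https://example.com/blog"}}]},
    {"event_name": "scroll",
     "event_params": [{"key": "page_location", "value": {"string_value": "https://example.com/blog"}}]},
    {"event_name": "page_view",
     "event_params": [{"key": "page_location", "value": {"string_value": "https://example.com/"}}]},
]


def string_param(event, key):
    """Return the string_value of event parameter `key`, or None if it is absent."""
    for param in event["event_params"]:
        if param["key"] == key:
            return param["value"].get("string_value")
    return None


def page_views_per_url(events):
    """Count page_view events per page_location, ignoring every other event."""
    return Counter(
        string_param(e, "page_location")
        for e in events
        if e["event_name"] == "page_view"
    )


print(page_views_per_url(events).most_common())
# [('https://example.com/', 2), ('https://example.com/blog', 1)]
```

Note that the scroll event is ignored even though it carries a page_location: only page_view events count as page views here.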
One big advantage of exporting Google Analytics 4 data to BigQuery is that, once stored in this data warehouse, it is not subject to the custom dimension, custom metric and user property quotas that apply at the GA4 property level.
This is because BigQuery receives raw data that you can freely query with SQL, which gives you the flexibility to retrieve information that may not be accessible in GA4 reports.
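As an illustration, the Python sketch below reads any event parameter by key, whether or not it was ever registered as a custom dimension in GA4. It assumes the export’s convention of storing each value in one typed field (string_value, int_value, float_value or double_value); the get_param helper and the article_author parameter are hypothetical.

```python
# Each parameter value in the export lives in one of several typed fields.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def get_param(event, key):
    """Return the first non-null typed value of parameter `key`, or None."""
    for param in event.get("event_params", []):
        if param["key"] == key:
            value = param["value"]
            for field in VALUE_FIELDS:
                if value.get(field) is not None:
                    return value[field]
    return None


# Hypothetical event with a custom parameter ("article_author") that was
# never registered as a custom dimension in the GA4 property.
event = {
    "event_name": "page_view",
    "event_params": [
        {"key": "page_location", "value": {"string_value": "https://example.com/blog"}},
        {"key": "ga_session_id", "value": {"int_value": 1700000000}},
        {"key": "article_author", "value": {"string_value": "Jane Doe"}},
    ],
}

print(get_param(event, "article_author"))  # Jane Doe
print(get_param(event, "ga_session_id"))   # 1700000000
```

Because the raw parameters are all there, nothing has to be declared upfront: the only limit is what your site or app actually sends.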
Unlike GA4’s UI, data stored in BigQuery is not affected by sampling, thresholding or cardinality limits.
Sampling happens when the data a report needs to load is too voluminous, so GA4 uses a subset of it to answer your request.
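To see why sampled figures are estimates rather than exact counts, here is a toy Python model: keep each event with a fixed probability, count the matches in that subset, then scale back up. This is a simplified illustration of sampling in general, not GA4’s actual sampling method; estimate_count and the event mix are hypothetical.

```python
import random


def estimate_count(events, predicate, sample_rate, seed=0):
    """Estimate how many events match `predicate` from a random subset.

    Each event is kept with probability `sample_rate`; the number of
    matching events in the subset is then scaled up by 1 / sample_rate.
    """
    rng = random.Random(seed)
    sampled_matches = sum(
        1 for e in events if rng.random() < sample_rate and predicate(e)
    )
    return round(sampled_matches / sample_rate)


# Hypothetical traffic: 30,000 purchases among 100,000 events.
events = ["purchase"] * 30_000 + ["page_view"] * 70_000
exact = sum(1 for e in events if e == "purchase")
estimate = estimate_count(events, lambda e: e == "purchase", sample_rate=0.1)
print(f"exact: {exact}, estimated from a 10% sample: {estimate}")
```

The estimate lands close to 30,000 but rarely on it exactly, which is the kind of discrepancy you may notice when comparing a sampled report with a full count in BigQuery.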
Thresholding is related to Google Signals: GA4 may withhold some rows from a report in order to protect users’ personal data.
Finally, cardinality refers to how many unique values a dimension has in a single day. Above 500, GA4 might group some of your data into an “(other)” row.
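The effect of an “(other)” row can be pictured with a toy Python model that keeps the most frequent values and lumps everything else into a single bucket. The row limit and the collapse_to_other helper are hypothetical simplifications; GA4’s real rules are more involved, and in BigQuery you would simply group by every distinct value.

```python
from collections import Counter


def collapse_to_other(values, max_rows):
    """Keep the (max_rows - 1) most frequent values and lump the rest into '(other)'.

    If there are no more than max_rows distinct values, nothing is collapsed.
    """
    counts = Counter(values)
    if len(counts) <= max_rows:
        return dict(counts)
    kept = counts.most_common(max_rows - 1)
    result = dict(kept)
    result["(other)"] = sum(counts.values()) - sum(n for _, n in kept)
    return result


# Hypothetical page paths: five distinct values squeezed into three rows.
pages = ["/a"] * 5 + ["/b"] * 3 + ["/c"] * 2 + ["/d"] + ["/e"]
print(collapse_to_other(pages, max_rows=3))
# {'/a': 5, '/b': 3, '(other)': 4}
```

The total is preserved, but the detail for /c, /d and /e is lost, which is exactly the information BigQuery still lets you query.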
BigQuery gives you access to the purest available data from your GA4 property, with none of these limitations.
Once you set up the GA4 export to BigQuery, your data is exported daily into a dedicated table for each day, starting from the day the export was set up. The process has a one-day lag, since the data needs to be “final”.
However, if you choose to export streaming data, your BigQuery project stores near real-time data in dedicated tables (events_intraday_*).
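The table naming convention matters as soon as you query a date range. The Python sketch below builds the daily and intraday table names for a given day and a wildcard query filtered on BigQuery’s _TABLE_SUFFIX pseudo-column. The dataset name is hypothetical, and the helper functions are illustrative rather than part of any library.

```python
from datetime import date


def daily_table(day):
    """Name of the daily export table for `day`, e.g. events_20240105."""
    return f"events_{day:%Y%m%d}"


def intraday_table(day):
    """Name of the streaming (intraday) table for `day`."""
    return f"events_intraday_{day:%Y%m%d}"


def date_range_query(dataset, start, end):
    """Build a wildcard query over the daily tables from start to end (inclusive).

    events_* also matches the intraday tables, but their suffix starts with
    'intraday_', which sorts after any date string, so the BETWEEN filter
    below keeps only the daily tables.
    """
    return (
        "SELECT event_date, COUNT(*) AS events\n"
        f"FROM `{dataset}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date"
    )


print(daily_table(date(2024, 1, 5)))     # events_20240105
print(intraday_table(date(2024, 1, 5)))  # events_intraday_20240105
print(date_range_query("my-project.analytics_123456789",
                       date(2024, 1, 1), date(2024, 1, 7)))
```

If you also want the current day’s streaming data, you would query the events_intraday_* tables separately, keeping in mind that this data is not final yet.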
Because of data processing, GA4 reports can take up to 48 hours to display final data.
The key takeaway is that the GA4 data stored in a BigQuery project is raw. As discussed in the previous sections of this article, this is not quite the case for the reports available directly in Google Analytics 4.
But did you know that there are also differences between GA4 report types? Google’s documentation covers this, but it doesn’t tell you everything you need to know.
Standard reports (in the Reports section) use aggregated data, a stripped-down view of individual user data. Explorations offer access to raw event-level and user-level data, but they are subject to the limitations discussed in this article, on top of the data retention period configured for the property.