7. The need for Great Expectations (GX) Validation
The Great Expectations (GX) app is a good way to see which columns, as defined in your GX .yml files, passed the validation stage. Ideally, GX validation should run after the data ingestion phase; otherwise your tables won’t be visible in the GX app. Here is an example for a Social Impact community form.
The validations show which columns have their expectations met, e.g. whether a column is allowed to contain nulls, which values are allowed in that column, and so on.
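To make the two kinds of checks mentioned above concrete, here is a minimal plain-Python sketch of what a "not null" check and an "allowed values" check do to a single column. It does not use the GX library, and the column name and values are made up for illustration. GX itself ships expectations with this intent under the names expect_column_values_to_not_be_null and expect_column_values_to_be_in_set.

```python
def rows_with_nulls(values):
    """Return the row indexes where the column value is missing (None)."""
    return [i for i, v in enumerate(values) if v is None]


def rows_outside_set(values, allowed):
    """Return the row indexes whose non-null value is not in the allowed set."""
    return [i for i, v in enumerate(values) if v is not None and v not in allowed]


# Hypothetical column from a community form (illustrative data only).
gender = ["female", "male", None, "unknown"]

null_rows = rows_with_nulls(gender)
bad_rows = rows_outside_set(gender, {"female", "male"})

print("rows failing not-null:", null_rows)
print("rows failing allowed-values:", bad_rows)
```

A column passes an expectation when the corresponding list is empty; any index listed is a row you would need to investigate.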
Here is an example of a failed validation.
The GX Validation Process
The GX validation app is started from the GitHub Actions site. The process is very similar to that of dab: Deploy in the Data Ingestion phase.
Step 1: GitHub Actions
Go to the GitHub Actions page of the NIP-Lakehouse-Data repository. Click on the GX Deploy Static Azure Web Apps button under Actions on the left.
Step 2: Run workflow
Click on the Run workflow button on the top right. Ensure that each of the dropdowns is set as follows:
- Branch: dev
- The branch to build: dev
- The environment to build to: dev
Click on Run workflow.
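If you would rather trigger the run from a script than from the button, GitHub's REST API exposes a workflow dispatch endpoint. The sketch below only builds the request and does not send it. The organisation name, the workflow file name gx-deploy.yml, and the input names branch and environment are assumptions for illustration; check the workflow file in the repository for the real values.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, token, ref, inputs):
    """Build (but do not send) a workflow_dispatch request for GitHub's REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_dispatch_request(
    owner="example-org",                 # hypothetical organisation name
    repo="NIP-Lakehouse-Data",
    workflow_file="gx-deploy.yml",       # hypothetical workflow file name
    token="<personal-access-token>",     # placeholder, never commit a real token
    ref="dev",                           # the "Branch" dropdown
    inputs={"branch": "dev", "environment": "dev"},  # hypothetical input names
)

print(request.get_method(), request.full_url)
# To actually trigger the run: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes it easy to check the URL and payload before anything runs against the repository.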
The workflow should start to run, as indicated by the spinning wheel.
Wait until the workflow has completed successfully. This can take quite some time; refresh the GitHub Actions page if you suspect the workflow is taking unusually long.
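As an alternative to refreshing the page, the state of a run can be read from GitHub's REST API, where each workflow run object carries a status field and, once finished, a conclusion field. The helper below is a small sketch that interprets such an object; the sample dictionaries are hand-written for illustration and are not real API responses.

```python
def summarize_run(run):
    """Turn a GitHub workflow run object into a short human-readable state."""
    if run.get("status") != "completed":
        # Any status other than "completed" (e.g. queued, in_progress) means "wait".
        return "still running"
    if run.get("conclusion") == "success":
        return "succeeded"
    return "failed"


# Hand-written sample objects (only the two fields the helper reads).
in_progress = {"status": "in_progress", "conclusion": None}
finished_ok = {"status": "completed", "conclusion": "success"}
finished_bad = {"status": "completed", "conclusion": "failure"}

for run in (in_progress, finished_ok, finished_bad):
    print(summarize_run(run))
```

Only move on to opening the GX validation app once the run reports success.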
Step 3: Open the GX Validation app
To open the GX validation app, go to this link.
The Validation results tab shows the results of your GX expectations per subtable, as set out in their respective .yml files. The Expectation suites tab shows the .yml files that have been read during the data ingestion process.
Step 4: Filter results
Use the filter boxes to filter to the subtable you want to assess. For example, if I only want to view the SI Community subtables, I would do as follows.
Go to the most recent run, as indicated by the version number appended to the end of the asset name. Each successful run increments the version number. In this case, the most recent run is version 2.
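When there are many runs, picking the highest version by eye gets error-prone, especially once versions reach two digits. The sketch below picks the latest asset by the number at the end of its name. The asset names and the _v suffix are assumptions for illustration; adapt the pattern to the names you actually see in the app.

```python
import re


def version_of(asset_name):
    """Return the integer at the end of the asset name, or None if there is none."""
    match = re.search(r"(\d+)$", asset_name)
    return int(match.group(1)) if match else None


def latest_run(asset_names):
    """Return the asset name with the highest trailing version number."""
    versioned = [(version_of(name), name) for name in asset_names]
    versioned = [pair for pair in versioned if pair[0] is not None]
    return max(versioned)[1] if versioned else None


# Hypothetical asset names as they might appear in the filtered list.
assets = ["si_community_form_v1", "si_community_form_v2", "si_community_form_v10"]
latest = latest_run(assets)
print(latest)
```

Comparing the versions as integers matters: a plain alphabetical sort would place v10 before v2.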
Click on the subtable to view its GX validation results.
Step 5: View validation results
Inside the gx_interface, the Show All validation filter shows all validation results, irrespective of whether they succeeded or failed. The Failed Only filter shows only those columns whose validations failed.
The Table of Contents displays the columns of that table. You can click on any column to zoom into its validations. For now, we are interested in the failed validations, which we want to debug. Click on the Failed Only button. The Table of Contents updates to match the selection.
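The Failed Only view can be reproduced offline if you have a validation result saved as JSON. The sketch below assumes the commonly seen GX result layout: a results list whose items carry a success flag and an expectation_config with kwargs.column. The key holding the expectation name differs between GX versions, so the code accepts either expectation_type or type. The sample result is hand-written, not output from a real run.

```python
def failed_by_column(validation_result):
    """Group the names of failed expectations by the column they apply to."""
    failed = {}
    for item in validation_result.get("results", []):
        if item.get("success"):
            continue  # keep only failures, like the Failed Only button
        config = item.get("expectation_config", {})
        column = config.get("kwargs", {}).get("column", "<table-level>")
        name = config.get("expectation_type") or config.get("type", "<unknown>")
        failed.setdefault(column, []).append(name)
    return failed


# Hand-written sample in the assumed layout (column names are hypothetical).
sample_result = {
    "success": False,
    "results": [
        {"success": True,
         "expectation_config": {"expectation_type": "expect_column_values_to_not_be_null",
                                "kwargs": {"column": "village"}}},
        {"success": False,
         "expectation_config": {"expectation_type": "expect_column_values_to_be_in_set",
                                "kwargs": {"column": "gender"}}},
    ],
}

failures = failed_by_column(sample_result)
print(failures)
```

The resulting mapping mirrors the filtered Table of Contents: one entry per column that still has something to debug.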
From here, the task is to work out what is causing the validation to fail. This will involve making modifications to the GX .yml file and rerunning the data ingestion process. It is an iterative process.