7. The need for Great Expectations (GX) Validation

The Great Expectations (GX) app shows which columns, as defined in your GX .yml files, passed the validation stage. GX validation should run after the data ingestion phase; otherwise your tables won’t be visible in the GX app. Here is an example for a Social Impact community form.

GX Validation


The validations show which columns meet their expectations, e.g. whether a column must not be null, which set of values is allowed in that column, and so on.
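As a rough illustration of what these two example checks evaluate, the sketch below reimplements them in plain Python. It is not the GX API: the functions are hypothetical stand-ins for the idea behind GX's `expect_column_values_to_not_be_null` and `expect_column_values_to_be_in_set` expectations, and the `status` column is made-up sample data.

```python
from typing import Any, Iterable, Optional


def check_not_null(values: Iterable[Optional[Any]]) -> dict:
    """Pass only if no value in the column is None (a not-null style check)."""
    values = list(values)
    unexpected = [i for i, v in enumerate(values) if v is None]
    return {"success": not unexpected, "unexpected_index_list": unexpected}


def check_in_set(values: Iterable[Optional[Any]], value_set: set) -> dict:
    """Pass only if every non-null value belongs to value_set.

    This sketch skips nulls and leaves them to the not-null check.
    """
    unexpected = [v for v in values if v is not None and v not in value_set]
    return {"success": not unexpected, "unexpected_values": unexpected}


# Hypothetical column from a community form subtable.
status = ["active", "inactive", None, "pending"]

print(check_not_null(status))                       # → {'success': False, 'unexpected_index_list': [2]}
print(check_in_set(status, {"active", "inactive"}))  # → {'success': False, 'unexpected_values': ['pending']}
```

Each check reports which values broke the rule, which is the kind of detail you look for when a validation fails.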

Here is an example of a failed validation.

Failed GX validation

The GX Validation Process

Deploying the GX Validation app starts from the GitHub Actions page. The process is very similar to that of the dab: Deploy workflow in the Data Ingestion phase.

Step 1: GitHub Actions

Go to the GitHub Actions page of the NIP-Lakehouse-Data repository. In the list of workflows under Actions on the left, click GX Deploy Static Azure Web Apps.

Github Actions

Step 2: Run workflow

Click the Run workflow button at the top right. Make sure each dropdown is set as follows:

  • Branch: dev

  • The branch to build: dev

  • The environment to build to: dev

GX run workflow

Click on Run workflow.
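The same run can also be triggered programmatically through GitHub's REST endpoint for creating a workflow dispatch event. The sketch below only builds the request. Assumptions: the owner `YOUR_ORG`, the workflow file name `gx-deploy.yml` and the input keys `build_branch` and `environment` are hypothetical placeholders that must be replaced with the values declared in the repository's workflow file, and the token must be allowed to run workflows in the repository.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the real repository owner and workflow file.
OWNER = "YOUR_ORG"
REPO = "NIP-Lakehouse-Data"
WORKFLOW_FILE = "gx-deploy.yml"  # hypothetical file name of the GX deploy workflow


def build_dispatch_request(token: str, ref: str = "dev", build_branch: str = "dev",
                           environment: str = "dev") -> urllib.request.Request:
    """Build, but do not send, a POST that starts a workflow_dispatch run."""
    url = (f"https://api.github.com/repos/{OWNER}/{REPO}"
           f"/actions/workflows/{WORKFLOW_FILE}/dispatches")
    body = {
        "ref": ref,  # branch the workflow definition is taken from ("Branch")
        # Hypothetical input keys; they must match the workflow's declared inputs.
        "inputs": {"build_branch": build_branch, "environment": environment},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


req = build_dispatch_request(token="<your-token>")
print(req.get_method(), req.full_url)

# Sending it (uncomment with a real token); GitHub replies 204 No Content on success:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The `ref` field plays the role of the Branch dropdown; the other two dropdowns are workflow inputs, so their keys depend on how the workflow declares them.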

The workflow should start running, as indicated by the spinning wheel.

GX workflow in progress

Wait until the workflow completes successfully. This can take some time; if you suspect it is taking unusually long, refresh the GitHub Actions page to check its current status.
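If you prefer polling to refreshing the page, a run's progress can be read from the GitHub REST API, which reports a `status` (such as `queued`, `in_progress` or `completed`) and, once completed, a `conclusion` (such as `success` or `failure`). The sketch below is a hypothetical helper that polls any function returning that pair; the fake fetcher in the usage line stands in for a real API call, which is not shown.

```python
import time
from typing import Callable, Optional, Tuple

# (status, conclusion) for one workflow run, e.g. ("in_progress", None)
# or ("completed", "success"), as the GitHub REST API reports them.
RunState = Tuple[str, Optional[str]]


def wait_for_run(fetch_state: Callable[[], RunState],
                 poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the run's status is "completed", then return its conclusion.

    Raises TimeoutError once the accumulated polling interval reaches
    timeout_seconds without the run completing.
    """
    waited = 0.0
    while True:
        status, conclusion = fetch_state()
        if status == "completed":
            return conclusion or "unknown"
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still '{status}' after ~{waited:.0f}s of polling")
        sleep(poll_seconds)
        waited += poll_seconds


# Fake fetcher standing in for a real call to the workflow-run endpoint.
states = iter([("queued", None), ("in_progress", None), ("completed", "success")])
print(wait_for_run(lambda: next(states), sleep=lambda s: None))  # → success
```

Passing `sleep` in as a parameter keeps the helper testable without real waiting.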

Successful workflow run

Step 3: Open the GX Validation app

To open the GX validation app, go to this link.

GX Validation app

The Validation results tab shows the results of your GX expectations for each subtable, as defined in their respective .yml files. The Expectation suites tab lists the .yml files that were read during the data ingestion process.

Step 4: Filter results

Use the filter boxes to narrow the list to the subtable you want to assess. For example, to view only the SI Community subtables, filter as follows.

Filtering subtables

Select the most recent run, identified by the version number appended to the end of the asset name. Each successful run increments the version number, so in this example the most recent run is version 2.
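Picking the most recent run amounts to filtering asset names and keeping the highest trailing version for each subtable. The sketch below assumes names shaped like `si_community_members_v2`; the naming pattern and the sample names are hypothetical, so adjust `VERSION_RE` to the real asset names.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumption: asset names end in an integer version, e.g. "si_community_members_v2".
# The exact naming scheme is hypothetical; adjust VERSION_RE to match yours.
VERSION_RE = re.compile(r"^(?P<base>.*?)[_\-.]?v?(?P<version>\d+)$")


def latest_versions(asset_names: Iterable[str], query: str = "") -> Dict[str, str]:
    """Filter names by a case-insensitive substring, then keep the newest version per base name."""
    best: Dict[str, Tuple[int, str]] = {}
    for name in asset_names:
        if query.lower() not in name.lower():
            continue  # same idea as typing into the filter box
        match = VERSION_RE.match(name)
        if not match:
            continue  # no trailing version number; skip it
        base, version = match["base"], int(match["version"])
        if base not in best or version > best[base][0]:
            best[base] = (version, name)
    return {base: name for base, (version, name) in best.items()}


assets = [
    "si_community_members_v1",
    "si_community_members_v2",
    "si_community_events_v1",
    "health_survey_v3",
]
print(latest_versions(assets, query="si_community"))
# → {'si_community_members': 'si_community_members_v2', 'si_community_events': 'si_community_events_v1'}
```

Versions are compared as integers, so `v10` ranks above `v9`, which a plain string comparison would get wrong.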

Click on the subtable to view its GX validation results.

Step 5: View validation results

Inside the gx_interface, the Show All filter displays every validation result, whether it succeeded or failed, while the Failed Only filter displays only the columns whose validations failed.

The Table of Contents lists the columns of that table, and you can click any column to jump to its validations. Here we are interested in the failed validations, since those are the ones to debug, so click the Failed Only button. The Table of Contents updates to match the selection.
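The Show All and Failed Only filters can be mimicked on a validation result loaded as a dictionary. The structure below is an assumption loosely modelled on the JSON GX stores for validation results; field names differ between GX versions, and the columns are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed result shape, loosely modelled on a GX validation result (hypothetical data).
validation_result = {
    "success": False,
    "results": [
        {"success": True,
         "expectation_config": {"expectation_type": "expect_column_values_to_not_be_null",
                                "kwargs": {"column": "member_id"}}},
        {"success": False,
         "expectation_config": {"expectation_type": "expect_column_values_to_be_in_set",
                                "kwargs": {"column": "status", "value_set": ["active", "inactive"]}}},
        {"success": False,
         "expectation_config": {"expectation_type": "expect_column_values_to_not_be_null",
                                "kwargs": {"column": "region"}}},
    ],
}


def expectations_by_column(result: dict, failed_only: bool = False) -> Dict[str, List[str]]:
    """Group expectation types per column; failed_only=True mirrors the Failed Only filter."""
    by_column: Dict[str, List[str]] = defaultdict(list)
    for item in result["results"]:
        if failed_only and item["success"]:
            continue  # skip passing expectations
        config = item["expectation_config"]
        column = config["kwargs"].get("column", "<table-level>")
        by_column[column].append(config["expectation_type"])
    return dict(by_column)


print(expectations_by_column(validation_result))                    # Show All
print(expectations_by_column(validation_result, failed_only=True))  # Failed Only
```

The keys of the Failed Only output play the same role as the filtered Table of Contents: they are the columns to investigate.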

Failed validations

From there, the task is to work out what is causing each validation to fail. This involves modifying the GX .yml file and rerunning the data ingestion process, repeating the cycle until the validations pass.