Lakehouse

This document is a step-by-step guide to creating and developing Great Expectations files and, by extension, performing data ingestion for Survey123 files.

Overview

  1. Initial Setup
  2. GitHub
  3. Great Expectations
  4. Data Ingestion
  5. New Great Expectations Format

1. Initial Setup

This section details how to set up the environment for performing the Great Expectations tasks on your local machine. It shows how to install the Azure CLI, Visual Studio Code (VS Code) and the Databricks extension for VS Code.

2. GitHub

This section shows how to clone a GitHub repository. The steps apply not just to the Great Expectations task but to any other task that uses Natural State’s GitHub repositories.

3. Great Expectations

This section shows how to configure the .yml file that Great Expectations references. It also describes in detail how to develop the expectations for a particular feature layer, along with the scripts used to map the values for particular fields.

4. Data Ingestion

This section outlines how to sync your work from VS Code to Databricks. It explains how to create checks for a feature layer using the existing Python notebooks, and describes the process of ingesting data from Survey123 to landing and finally to bronze.

5. New Great Expectations Format

This section outlines how to configure a .yml file for the new data ingestion process, including the format to follow when creating new .yml files.


Table of contents