Lakehouse
The purpose of this document is to provide a step-by-step guide on how to create and develop Great Expectations files and, by extension, how to perform data ingestion for Survey123 files.
Overview
- Initial Setup
- GitHub
- Great Expectations
- Data Ingestion
- New Great Expectations Format
1. Initial Setup
This section details how to set up the environment for performing the Great Expectations tasks on your local machine. It shows how to install the Azure CLI, Visual Studio Code (VS Code), and the Databricks extension for VS Code.
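Once the tools are installed, a short script can confirm that they are visible from your shell. This is an illustrative sketch, not part of the original guide. It assumes the Azure CLI is on your PATH as `az`, VS Code as `code`, and that the Databricks extension's Marketplace identifier is `databricks.databricks`. Adjust these names if your installation differs.

```python
import shutil
import subprocess

# Assumption: Marketplace ID of the Databricks extension for VS Code.
# Confirm it in the VS Code Extensions view if the check reports it missing.
DATABRICKS_EXTENSION_ID = "databricks.databricks"


def tool_on_path(name: str) -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


def vscode_extensions() -> list:
    """List installed VS Code extension IDs (lowercased), or [] if `code` is unavailable."""
    code = shutil.which("code")
    if code is None:
        return []
    try:
        result = subprocess.run(
            [code, "--list-extensions"], capture_output=True, text=True, check=False
        )
    except OSError:
        return []
    # `code --list-extensions` prints one extension ID per line.
    return [line.strip().lower() for line in result.stdout.splitlines() if line.strip()]


def check_environment() -> dict:
    """Report which parts of the local setup appear to be installed."""
    return {
        "azure_cli": tool_on_path("az"),
        "vscode": tool_on_path("code"),
        "databricks_extension": DATABRICKS_EXTENSION_ID in vscode_extensions(),
    }


if __name__ == "__main__":
    for component, ok in check_environment().items():
        print(f"{component}: {'found' if ok else 'MISSING'}")
```

Running it prints one line per component; anything reported as MISSING should be revisited before moving on to the GitHub step.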
2. GitHub
This section shows how to clone a repository from Natural State's GitHub, not only for the Great Expectations tasks but for any other task.
3. Great Expectations
This section shows how to configure the .yml file that Great Expectations reads its configuration from. It also describes in detail how to develop the expectations for a particular feature layer, as well as the scripts used to map the values of particular fields.
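As a rough illustration of what a field-mapping script and an expectation check do together, the sketch below maps raw coded values from a survey field to readable labels, then reports any values that fall outside an allowed set. This is the same idea as the Great Expectations expectation `expect_column_values_to_be_in_set`, but written in plain Python rather than with the Great Expectations API. The field name `habitat_type`, its codes, and the helper names are hypothetical and are not taken from the actual scripts.

```python
from typing import Iterable, List, Optional

# Hypothetical lookup for a coded Survey123 choice field (raw code -> label).
HABITAT_TYPE_MAP = {
    "1": "Forest",
    "2": "Grassland",
    "3": "Wetland",
}


def map_field_values(values: Iterable[str], mapping: dict) -> List[Optional[str]]:
    """Replace each raw code with its label; codes missing from `mapping` become None."""
    return [mapping.get(value) for value in values]


def values_in_set(values: Iterable[Optional[str]], allowed: set) -> dict:
    """Report whether every value is in `allowed`, listing any that are not.

    This sketch treats None (an unmapped code) as unexpected, so unknown
    codes surface as failures instead of being silently skipped.
    """
    unexpected = [value for value in values if value not in allowed]
    return {"success": not unexpected, "unexpected_values": unexpected}


if __name__ == "__main__":
    raw = ["1", "3", "9"]  # "9" is not a known code
    mapped = map_field_values(raw, HABITAT_TYPE_MAP)
    print(mapped)  # ['Forest', 'Wetland', None]
    report = values_in_set(mapped, set(HABITAT_TYPE_MAP.values()))
    print(report)  # {'success': False, 'unexpected_values': [None]}
```

Keeping the mapping table separate from the check means the same allowed set can drive both the value mapping and the expectation, so the two cannot drift apart.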
4. Data Ingestion
This section outlines how to sync your work from VS Code to Databricks. It explains how to create checks for a feature layer using the existing Python notebooks. It also describes the data ingestion process, which moves data from Survey123 to the landing layer and finally to the bronze layer.
5. New Great Expectations Format
This section outlines how to configure a .yml file for the new data ingestion process, including the format to follow when creating new .yml files.