2. The Workspace

2.1 The workspace browser

Once you have been granted access to the Azure Databricks platform, you should see a folder named after your email address under the Workspace > Users dropdown. The sub-folders inside it are the folders that you create in, or upload to, Azure Databricks in the course of your work.

Azure workspace account

If you click your workspace browser, a new interface opens on the right, as shown below:

Account workspace interface

You can use this interface to create new folders, Git folders, notebooks and more. You can also share your workspace with other people and set their permission level. All of this is done through the Create and Share buttons.

2.2 The .ide folder

The .ide directory, found at Users/<username>/.ide, is a special directory. It is created when you sync folders from VS Code on your local PC to your personal Azure Databricks workspace.

Most of the files that we use for ingestion are found in the dab folder. When you sync from VS Code to the Azure Databricks workspace, the dab folder therefore appears under Users/<username>/.ide/dab-<some-random-number>. We use the Databricks Extension for Visual Studio Code to sync our data ingestion files and folders to Azure Databricks. Currently, the preferred version is v1.3.1.

Below is an example of the Databricks extension for VS Code after it has synced our dab-<some-number> folder with our online Azure Databricks workspace.

Databricks extension

Here is how our dab folder appears in our Azure Databricks workspace.

Our dab folder in Azure Databricks

You will mostly be working with the dab folder when using notebooks within your Azure Databricks workspace.
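As a rough illustration of the layout described in this section, the location of the synced folder can be composed from your email and the suffix that the extension assigns. The helper name dab_workspace_path and the example values are hypothetical; only the /Workspace/Users/<user-email>/.ide/dab-<suffix> pattern comes from this guide.

```python
import posixpath


def dab_workspace_path(user_email: str, dab_suffix: str, *parts: str) -> str:
    """Build the workspace path of the synced dab folder (hypothetical helper).

    Workspace paths always use forward slashes, so posixpath is used
    instead of os.path to keep the result identical on Windows.
    """
    if not user_email or not dab_suffix:
        raise ValueError("user_email and dab_suffix must be non-empty")
    root = posixpath.join(
        "/Workspace/Users", user_email, ".ide", f"dab-{dab_suffix}"
    )
    # Any extra parts (sub-folders, notebook names) are appended in order.
    return posixpath.join(root, *parts)


# Placeholder values, not a real account or folder.
print(dab_workspace_path("jane.doe@example.com", "0a1b2c3d", "development"))
```

The suffix differs per user and per sync target, so it is passed in rather than hard-coded.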

2.3 The Repos folder

The Git Repos folder lets you perform version control with your Git account directly from the Databricks UI.

Common operations that you can perform include: clone, pull, push, commit, checkout and branch management.

If you click on the Repos folder, you should see your email username.

GitHub Repos

To connect to one of your Git repositories, click on the Create button at the top right of the UI.

Git folder

Fill in the required fields and click Create Git Folder.

Once you do so, the connected Git repository should appear as one of the linked repos under your workspace name. Here is an example of some of the repositories and branches linked to the author’s workspace.

Linked Git folders
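The same link can also be created without the UI through the Databricks Repos REST API (POST /api/2.0/repos). The sketch below only builds the request body and does not send it; the function name, the example repository URL and the folder name are hypothetical, and the /Repos/<user-email>/<folder> destination is an assumption based on the layout described above.

```python
import json

# Provider identifiers accepted by the Repos REST API.
KNOWN_PROVIDERS = {
    "gitHub",
    "gitHubEnterprise",
    "gitLab",
    "gitLabEnterpriseEdition",
    "bitbucketCloud",
    "bitbucketServer",
    "azureDevOpsServices",
    "awsCodeCommit",
}


def git_folder_request(repo_url: str, provider: str,
                       user_email: str, folder_name: str) -> dict:
    """Return the JSON body for POST /api/2.0/repos (hypothetical helper)."""
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown Git provider: {provider!r}")
    return {
        "url": repo_url,
        "provider": provider,
        # Assumed destination: a folder under your own Repos directory.
        "path": f"/Repos/{user_email}/{folder_name}",
    }


# Placeholder repository and account, for illustration only.
body = git_folder_request(
    "https://github.com/example-org/example-repo.git",
    "gitHub",
    "jane.doe@example.com",
    "example-repo",
)
print(json.dumps(body, indent=2))
```

Sending the request additionally needs your workspace URL and an access token, which are deliberately left out here.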

2.4 Practical

For a practical example of using your workspace, follow this tutorial on the data ingestion process. It shows how data is downloaded from ArcGIS Online (AGOL) to Azure Data Lake Storage Gen2, which we refer to as the landing stage. For the full context, see this.

Log in to your Azure portal.

In your Workspace tab, go to the following path: /Workspace/Users/<user-email>/.ide/dab-47fc1c58/development/gx_deploy_yml.

This is the file that loads the Great Expectations YAML files that you created. The last cell contains the paths to your Great Expectations .yaml files.

Just a few things to consider:

i. Ensure that the user_name variable corresponds to your email.

ii. Ensure that the is_dev variable is set to False.

iii. Make sure that survey_abbr matches the abbreviation of the form you want to ingest into bronze. For example, if you have created the yml files for a form abbreviated as xprize_sens_reg, the survey_abbr value will be xprize_sens_reg.
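The three checks above can be expressed as a small pre-flight function, sketched here under assumptions. The variable names user_name, is_dev and survey_abbr come from the notebook as described in this guide; the function check_notebook_settings and its expected_* parameters are hypothetical and are not part of gx_deploy_yml.

```python
from typing import List


def check_notebook_settings(user_name: str, is_dev: bool, survey_abbr: str,
                            expected_email: str,
                            expected_abbr: str) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # i. user_name should correspond to your email (compared case-insensitively).
    if user_name.strip().lower() != expected_email.strip().lower():
        problems.append("user_name does not match your email")
    # ii. is_dev must be exactly the boolean False, not 0 or an empty string.
    if is_dev is not False:
        problems.append("is_dev must be set to False")
    # iii. survey_abbr must match the abbreviation of the form being ingested.
    if survey_abbr != expected_abbr:
        problems.append("survey_abbr does not match the form abbreviation")
    return problems


# Placeholder email; xprize_sens_reg is the abbreviation used in this guide.
issues = check_notebook_settings(
    user_name="jane.doe@example.com",
    is_dev=False,
    survey_abbr="xprize_sens_reg",
    expected_email="jane.doe@example.com",
    expected_abbr="xprize_sens_reg",
)
print(issues or "all settings look correct")
```

Returning a list rather than raising on the first mismatch lets you see every setting that needs attention in one pass.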

Once you are satisfied that every value is correct, click Run all at the top. This runs all the cells in the notebook.

If there are no issues with your yml files, the last cell should display a list of bars, all of which should show 100%. This means that your data ingestion into bronze succeeded.