End-to-end Azure Databricks project

The purpose of this end-to-end workflow is to show how you might use the Databricks components discussed so far: compute, the workspace, jobs, the catalog, the SQL editor and alerts. The current data ingestion process is one such workflow, and it exercises most of these components.

1. Compute

As discussed, the Compute browser hosts the compute resources for your project. Before you can run any notebook, go to the Compute tab and start your Personal Compute resource.

Starting personal compute
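If you prefer to script this step, the cluster can also be started with the Databricks SDK for Python. The sketch below is illustrative only, and the cluster ID is a hypothetical placeholder; find your real one on the cluster's configuration page or in its URL.

```python
# A minimal sketch of starting a compute resource with the Databricks SDK
# for Python, as an alternative to the Compute tab.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads credentials from your Databricks config/env

# "0123-456789-abcdefgh" is a hypothetical placeholder cluster ID.
# start() returns a waiter; .result() blocks until the cluster is running.
w.clusters.start(cluster_id="0123-456789-abcdefgh").result()
print("Cluster is running")
```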

2. Workspace

To run a notebook, create a new one or open an existing one from your workspace.

For our data ingestion process, we already have a notebook provided by the Neoxia team. This is the gx_deploy_yml notebook found at this path: /Workspace/Users/<user-email>/.ide/dab-47fc1c58/development/gx_deploy_yml.

Click on it and it will open up in the same window.

Run notebook

If your compute has started successfully, a green icon appears next to your compute resource name. Click Run all and ensure that the status bars shown after the last cell runs are all green. If so, your data has been successfully loaded from ArcGIS Online (AGOL) to Azure Data Lake Storage Gen2.
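As an optional sanity check, you can confirm from a notebook cell that the ingested data landed in the lake. The path below is a hypothetical placeholder; substitute your storage account, container and output folder, and adjust the format if the notebook writes something other than Delta.

```python
# Hypothetical sanity check that the AGOL data landed in ADLS Gen2.
# `spark` is predefined in Databricks notebooks; the abfss:// path is a
# placeholder, and "delta" is an assumed output format.
df = spark.read.format("delta").load(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/<output-folder>"
)
print(f"Ingested {df.count()} rows")
df.show(5)
```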

3. Running a job

A job in Databricks is a set of one or more tasks that run either manually or on a schedule.

To start a job, go to Workflows > Jobs and search for a Survey123 form abbreviation using the search bar.

Once you’ve selected your Survey123 form, say “survey123_gem_grass_litter”, click the Play button (Run) to the right of the form name. That will start the job for that particular S123 form.

Job run

If a job succeeds, all its dependent tasks should be displayed in green, with a status message of “Succeeded”.

Succeeded job
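The same run can be triggered programmatically. The sketch below uses the Databricks SDK for Python and is illustrative; it assumes the job name matches the Survey123 form name used above.

```python
# A minimal sketch of finding and triggering the job with the Databricks
# SDK for Python, instead of the Workflows UI.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Look up the job by name (the same string you would type in the search bar).
job = next(w.jobs.list(name="survey123_gem_grass_litter"))

# Trigger a run; .result() blocks until all tasks finish.
run = w.jobs.run_now(job_id=job.job_id).result()
print(run.state.result_state)  # SUCCESS when all dependent tasks succeed
```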

4. Catalog

The catalog holds all of your data assets, such as tables, databases and AI models. For the purposes of NS, any Survey123 form that has been ingested should appear in the Catalog browser. For example, if the job run for the gem_grass_npp survey succeeded, its constituent tables should appear there.

Catalog

Clicking on one of the subtables opens an interface that shows the data alongside its metadata.
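You can also inspect the same assets from a notebook. In the sketch below, the catalog and schema names ("main" and "gem_grass_npp") are hypothetical; use the names shown in your own Catalog browser.

```python
# A small sketch of browsing catalog assets from a notebook instead of
# the Catalog browser. Catalog/schema/table names are hypothetical.
spark.sql("SHOW TABLES IN main.gem_grass_npp").show(truncate=False)

# Column names, types and comments for one of the constituent subtables.
spark.sql("DESCRIBE TABLE main.gem_grass_npp.survey").show(truncate=False)
```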

5. SQL Editor

The SQL Editor is where you can save and run your queries. Picking up from the gem_grass_npp form, we shall perform a simple query that checks the data in the survey sublayer (a sketch follows below). Note that the SQL Editor can handle complex queries too.

Query
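A minimal sketch of such a check is shown here, runnable either in the SQL Editor or from a notebook via spark.sql. The fully qualified table name is a hypothetical placeholder; copy the real one with the >> button described next.

```python
# A simple data check on the survey sublayer. The table name
# main.gem_grass_npp.survey is a hypothetical placeholder.
spark.sql("""
    SELECT COUNT(*)                 AS row_count,
           COUNT(DISTINCT objectid) AS distinct_objectids
    FROM main.gem_grass_npp.survey
""").show()
```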

You can use the >> button next to a subtable’s name to quickly copy that name into the SQL Editor.

If you are satisfied with your queries, you can save them to your workspace. In this case, the test tab, which hosts our query for gem_grass_npp, is saved to the ns_queries folder under the workspace user folder.

Save query

You can also create a visualization by clicking the + next to Raw results and selecting Visualization. A new interface appears, offering various chart types such as bar, pie and line charts, along with formatting menus for building a bespoke visualization. Note, however, that producing a proper chart usually requires further wrangling of the raw data, as sketched below.

Visualization interface
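As an illustration of that wrangling, the hypothetical sketch below aggregates the raw rows so the chart has one value per category. The table name and the "surveyor" column are placeholders; substitute real ones from your survey.

```python
# Hypothetical pre-chart wrangling: aggregate raw rows per category.
from pyspark.sql import functions as F

df = spark.table("main.gem_grass_npp.survey")  # placeholder table name
summary = (
    df.groupBy("surveyor")                      # placeholder category column
      .agg(F.count("objectid").alias("records"))
      .orderBy(F.desc("records"))
)
display(summary)  # in a notebook, display() offers the same chart builder
```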

6. Alerts

Alerts are notifications triggered when a certain condition is met.

For example, we might want to receive a notification when the count of objectids exceeds 1000. Thereafter we may want to create a new form altogether or deprecate the current one.

To do so, in the Alerts browser, click on the Create alert button.

A new interface should show up. Provide a name for your alert; in this case we shall call it test1. In the Query dialog box, a dropdown appears with a list of all saved queries. Select our recently saved query, test, and the interface expands with more options for the Trigger condition, Notifications and Template.

Create alert

Since we want to be notified when the count of our objectid values exceeds 1000, we will set our trigger condition as below. Yours may differ depending on the condition you want to monitor; this one is just for demonstration purposes.

Trigger condition
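For reference, a query like the sketch below is the kind of thing the alert monitors: the alert evaluates the saved query on its schedule and fires when the returned value crosses the configured threshold. The table name is a hypothetical placeholder.

```python
# A sketch of the saved query an alert like this could monitor. The alert
# compares objectid_count against the 1000 threshold set in the trigger
# condition. main.gem_grass_npp.survey is a placeholder table name.
spark.sql("""
    SELECT COUNT(objectid) AS objectid_count
    FROM main.gem_grass_npp.survey
""").show()
```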

You can also adjust how you receive your notifications, as well as the message template. Here we’ve used a custom template.

Once you are satisfied with your alert configurations, click on the Create alert button.

Once you create the alert, you can further configure it by adding a schedule on which the underlying SQL query will run.

Add schedule

The Settings and Destinations menus also let you define the schedule on which the SQL query runs and the specific individuals or groups who receive the notification.

Alert schedule

The kebab menu at the top right allows one to delete or clone an alert.

Alert options

Selecting Go to Folder on the kebab menu takes you directly to where your alert is stored: your workspace folder!

Where alerts are stored

If you go back to the Alerts browser, you will see your new alert.
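Finally, alerts can also be listed programmatically. The hedged sketch below uses the Databricks SDK for Python; note that attribute names have varied across SDK versions, with recent versions exposing display_name.

```python
# A minimal sketch of listing alerts with the Databricks SDK for Python,
# mirroring what the Alerts browser shows. display_name assumes a recent
# SDK version; older versions used different field names.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
for alert in w.alerts.list():
    print(alert.display_name)
```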