Introduction

A model is just a sql file that tells dbt how to transform your data. At this stage, particularly when working with bronze files, the sql files will majorly be creating views of their respective subtables in our Azure cloud storage. For example, when running a model bronze_pitfall_trap_sublayer_survey for our pitfall form, a view by the name of pitfall_trap_sublayer_survey will be created.

Creating the sql models

A sql files in dbt is a model. The sql file contains a single sql statement. The model name is inherited from the file name.

Depending on the sql statement, the output can be a table or a view.

Here is the sql for pitfall trap sublayer survey called bronze_pitfall_trap_sublayer_survey.sql. The bronze prefix refers to the source of the table, in this case the bronze folder. The sql models are found in the same folder as the yaml files.

In a dbt sql model, the config() sets the configurations of your model. Think of configurations as the settings that can be set in your model and how you would like them to appear in your data warehouse.

Let’s go line by line.

The config() encapsulates the settings that define how your data will appear in the data warehouse.

materialized - defines how you would like your data to persist in the data warehouse. In our case, the data will persist as a view.

Pitfall trap survey view

alias="pitfall_trap_sublayer_survey" - this overrides the name of the model, which is essentially the filename. One can use the alias to provide a different label of how the file will appear in the data warehouse.

schema="bronze" - this is the target environment in which your models will be built. Definitely in this case it will be the bronze folder in the ns-ii-tech-dev-catalog in Azure.

tags=['pitfall_trap'] - a tag in dbt is used to categorize a model, test and/or other dbt artifacts. Tags have the following purposes:

  • Organization: Tags can help in organizing and grouping similar models or tests together.

  • Execution Control: You can run specific sets of models or tests based on their tags. This can be especially useful when you have a large number of models and you want to run a subset of them. For example, to only run models that have the tag pitfall you can do so through this: dbt run --select tag:pitfall. dbt will run the models having the tag pitfall even if they are in their hundreds.

Let’s go to this line:

with table_v1 as (
	select * from source('bronze', get_full_table_name('pitfall_trap_sublayer_survey'))
)

The with table_v1 as is a Common Table Expression (CTE) that creates the expression name table_v1. This table_v1 expression picks tables from the bronze schema and the table name defined in get_full_table_name() block of the select statement.

After the above, the select statement follows.

select
  objectid,
  globalid,
  date_time_details,
  project,
  area,
  survey,
  deployment,
  transport,
  field_team_list,
  activity,
  date_time_deploy,
  recorder_deploy,
  date_time_retrieve,
  recorder_retrieve,
  pitfall_trap_survey_comment,
  end_time,
  creator,
  editor,
  x,
  y,
  source_file_path,
  source_file_modification_time,
  creationdate,
  editdate
from
  table_v1

The above select statement picks the columns defined in the get_full_table_name() block, which are enveloped by the table_v1 statement. The selected columns are the ones that get displayed in the pitfall_trap_sublayer_survey view in the Azure storage.

Now that we have demonstrated how to create dbt files using just one subtable of the pitfall trap survey form, the same process will be replicated for other subtables such as repeat_deploy and repeat_retrieve.

Your endgame of the dbt file creation should look something like this:

DBT files

Automating the creation of sql models

A function by the name of dbt_sql_func has already been created for you. The function takes the following parameters: path to the Survey123 csv file (s123_path), the path to the Arcgis excel file (xls_form_path), the subtable name (subtable_name) and finally the tags (tags).

Here is the function in full swing.

dbt_sql_func(
    s123_path="../GEM_Leaf_Litter_09092024/leaf_litter_details_1.csv",
    xls_form_path="../GEM_Leaf_Litter_09092024/GEM Leaf Litter.xlsx",
    subtable_name="gem_leaf_litter_subtable_leaf_litter_details",
    tags=["gem", "leaf_litter"]
)

Here is the generated output.

The above function is available from here.