Introduction

Good documentation enables others to understand your project and the datasets it uses. dbt offers a way to not only create documentation, but also render it as a website, which is more portable than PDF documents.

Generating documentation

Before generating documentation, make sure your dbt run -t dev or dbt run -t stg command completes. You will know the dbt run command has finished when you see a summary line like the one below (a fully successful run reports ERROR=0):

13:35:41  Done. PASS=39 WARN=0 ERROR=142 SKIP=9 TOTAL=190

After you’ve executed your dbt run command and it has performed the necessary transformations in your data warehouse, such as creating views, you can now generate your documentation.
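Because the summary line reports the counts directly, you can also check it from the shell. A minimal sketch, using the example summary text from above (a run is only clean when ERROR=0):

```shell
# Parse the ERROR count out of a dbt run summary line (text from the example above).
summary='13:35:41  Done. PASS=39 WARN=0 ERROR=142 SKIP=9 TOTAL=190'

errors=$(printf '%s\n' "$summary" | grep -o 'ERROR=[0-9]*' | cut -d= -f2)

if [ "$errors" -eq 0 ]; then
    echo "run is clean"
else
    echo "run finished with $errors errors"
fi
```

For the example summary this prints `run finished with 142 errors`, signalling that the run should be fixed before generating documentation.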

Step 1: Ensure the virtual environment is active

This is done by activating the environment via: source nip-dbt-venv/bin/activate.

Step 2: Move into the src folder

Assuming you are still in ~/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data/dbt, which is where all our models are, move into the src/ folder using cd src/.

Step 3: Call the environment variables

We will use the environment variables that enable us to push our work to the dev environment. Load them like so: source /home/sammigachuhi/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data/vars/.env.example.
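The three setup steps above can be combined into one copy-pasteable sequence. This is a sketch only: the repository path and virtual-environment location follow this guide and will differ on your machine, so each step is guarded.

```shell
# Consolidated setup for steps 1-3 (paths are assumptions taken from this guide).
REPO="$HOME/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data"

# Step 1: activate the virtual environment (assumed to live under dbt/).
if [ -f "$REPO/dbt/nip-dbt-venv/bin/activate" ]; then
    . "$REPO/dbt/nip-dbt-venv/bin/activate"
fi

# Step 2: move into the src/ folder where the models live.
cd "$REPO/dbt/src" 2>/dev/null || echo "repo not found at $REPO"

# Step 3: load the dev environment variables.
if [ -f "$REPO/vars/.env.example" ]; then
    . "$REPO/vars/.env.example"
fi
```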

Step 4: Run the generate documentation command

Run the documentation command:

dbt docs generate

dbt will print output like the following at the very end.

-- snip --
Found 198 models, 1 seed, 352 tests, 216 sources, 0 exposures, 0 metrics, 684 macros, 0 groups, 0 semantic models
06:05:09  
06:07:29  Concurrency: 1 threads (target='dev')
06:07:29  
06:07:43  Building catalog
06:08:01  Catalog written to /home/sammigachuhi/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data/dbt/src/target/catalog.json

If you see that the catalog has been written, then dbt has generated the documentation successfully. Some warnings may appear, such as those below. Although not recommended, these can be ignored since dbt can still generate documentation even with warnings. Errors, however, should be avoided at all costs.

06:04:59  Running with dbt=1.7.14
06:05:01  Registered adapter: databricks=1.7.14
06:05:01  Unable to do partial parsing because config vars, config profile, or config target have changed
06:05:05  [WARNING]: Did not find matching node for patch with name 'xprize_community_sublayer_survey' in the 'models' section of file 'models/bronze/xprize_community/schema_xprize_community_sublayer_survey.yml'
06:05:05  [WARNING]: Did not find matching node for patch with name 'si_community_sublayer_survey' in the 'models' section of file 'models/bronze/si/community/schema_si_community_sublayer_survey.yml'
06:05:06  [WARNING]: Did not find matching node for patch with name 'bronze_acstc_subtable_repeat_deploy' in the 'models' section of file 'models/bronze/acstc/deploy/schema_acstc_subtable_deploy.yml'

-- snip -- 
06:05:08  [WARNING]: Test 'test.nip_dbt.relationships_bronze_acstc_subtable_repeat_deploy_recorder__member_name__ref_bronze_dev_ns_team_list_.bff6851103' (models/bronze/acstc/deploy/schema_acstc_subtable_deploy.yml) depends on a node named 'bronze_acstc_subtable_repeat_deploy' in package '' which was not found
06:05:08  [WARNING]: Test 'test.nip_dbt.relationships_bronze_acstc_sublayer_repeat_grid_location_check_recorder__member_name__ref_bronze_dev_ns_team_list_.711b031d64' (models/bronze/acstc/check/schema_acstc_sublayer_check.yml) depends on a node named 'bronze_acstc_sublayer_repeat_grid_location_check' in package '' which was not found
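Before serving the site, you can confirm the generated artifacts are in place. A hypothetical helper: catalog.json comes from the log above, and manifest.json and index.html are the other files dbt docs writes into target/.

```shell
# Hypothetical helper: check that `dbt docs generate` wrote the files the
# docs website needs into the target/ directory.
check_docs_artifacts() {
    dir="${1:-target}"   # default: target/ relative to src/
    for f in catalog.json manifest.json index.html; do
        if [ -f "$dir/$f" ]; then
            echo "found $f"
        else
            echo "missing $f"
        fi
    done
}

check_docs_artifacts   # run from src/ after `dbt docs generate`
```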

Thereafter, you can run the following command to serve the documentation website on port 8080.

dbt docs serve --port 8080

The following will be the generated output:

06:21:12  Running with dbt=1.7.14
Serving docs at 8080
To access from your browser, navigate to: http://localhost:8080
--snip--

You can now open a private browser window and enter localhost:8080/.
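If you prefer a terminal check, a small sketch (the docs_up helper is hypothetical, and it assumes curl is installed and dbt docs serve is still running in another terminal):

```shell
# Hypothetical helper: return success if the docs site answers on the given port.
docs_up() { curl -sf "http://localhost:${1:-8080}/" > /dev/null; }

if docs_up 8080; then
    echo "docs site is up"
else
    echo "docs site not reachable on port 8080"
fi
```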

The dbt website will appear with all the documentation you have set up for your subtables, fields and package names.

Dbt documentation

At the bottom right, you can view the lineage graph of the models. Click the blue View Lineage Graph button, and a lineage of all the models we’ve been working on will pop up.

Lineage graph

Alternatively, if you select a model from the bronze subfolder, say pitfall_trap_sublayer_survey, dbt will show the lineage graph of that specific model.

Lineage graph

You are highly encouraged to play around with the dropdown buttons below the lineage graph, such as resources, packages, tags, --select and --exclude, to further understand what they do.

Generating documentation from GitHub Actions

In your organization's GitHub, go to the Natural-State>NIP-Lakehouse-Data repository. This is where all the lakehouse code is stored.

Go to the Actions tab. Under the list of workflows, select Dbt: Deploy.

Github actions

Click on the Run workflow button on the top right of the Dbt: Deploy workflow interface.

Ensure the parameters are set as below:

  • Use workflow from: dev
  • The branch to build: dev
  • The environment to deploy to: dev

Once everything is set, click the Run workflow button. The workflow is successful if there is a green tick next to the workflow name.

Run workflow
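The same workflow can also be triggered from a terminal with the GitHub CLI (gh). A sketch: the repository and workflow name come from this guide, but the input key environment is an assumption, so check the workflow file for its real input ids before relying on it.

```shell
# Trigger the Dbt: Deploy workflow on the dev branch (input key assumed).
gh workflow run "Dbt: Deploy" \
  --repo Natural-State/NIP-Lakehouse-Data \
  --ref dev \
  -f environment=dev

# Then follow the run until it completes:
gh run watch
```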

Wait for the workflow to run. Thereafter go to this link. You may need to sign in.

If your workflow was successful, the newly added models and their documentation should also be visible.

Dbt website