Introduction
Good documentation enables others to understand your project and the datasets it uses. dbt offers a way to not only create documentation, but also render it as a website, which is more portable than PDF documents.
Generating documentation
Prior to generating documentation, it is important to confirm that your dbt run -t dev (or dbt run -t stg) command completes successfully. You can tell how the dbt run command fared from the summary line at the end of its output, like the one below; a fully successful run reports ERROR=0:
13:35:41 Done. PASS=39 WARN=0 ERROR=142 SKIP=9 TOTAL=190
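If you are scripting these steps, the summary line can be parsed so a script gates on the error count before moving on to docs generation. A minimal sketch, using the sample line above as inline input (the sed pattern is an assumption about the log format, not a dbt feature):

```shell
# Extract the ERROR count from a dbt run summary line.
# The sample line is copied from the output shown above.
summary='13:35:41 Done. PASS=39 WARN=0 ERROR=142 SKIP=9 TOTAL=190'
errors=$(printf '%s\n' "$summary" | sed -n 's/.*ERROR=\([0-9]*\).*/\1/p')
echo "errors: $errors"   # a clean run should report errors: 0
```

In a real pipeline you would capture dbt's output to a file (or simply rely on dbt's non-zero exit code) rather than hard-coding the line.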
After you’ve executed your dbt run command and it has performed the necessary transformations in your data warehouse, such as creating views, you can generate your documentation.
Step 1: Ensure the virtual environment is active
This is done by activating the environment via source nip-dbt-venv/bin/activate.
Step 2: Move into the src folder
Assuming you are still in ~/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data/dbt, which is where all our models are, move into the src/ folder using cd src/.
Load the environment variables
We will use the environment variables that enable us to push our work to the dev environment. Load them like so: source /home/sammigachuhi/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data/vars/.env.example.
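The setup steps above can be combined into one defensive script. The paths are taken from the steps; the existence checks are an addition of this sketch so the script reports anything missing instead of failing quietly:

```shell
# Activate the venv, load env vars, and move into the src folder,
# reporting any path that does not exist on this machine.
REPO="$HOME/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data"
for f in nip-dbt-venv/bin/activate "$REPO/vars/.env.example"; do
  if [ -f "$f" ]; then
    . "$f"
    echo "loaded: $f"
  else
    echo "missing: $f"
  fi
done
cd "$REPO/dbt/src" 2>/dev/null || echo "missing: $REPO/dbt/src"
```

Note that sourcing only affects the current shell, so run this with `source` (or `.`) rather than as a subprocess if you want the venv and variables to persist.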
Running the generate documentation command
Now run the documentation command:
dbt docs generate
dbt will generate output like the one below at the very end.
-- snip --
Found 198 models, 1 seed, 352 tests, 216 sources, 0 exposures, 0 metrics, 684 macros, 0 groups, 0 semantic models
06:05:09
06:07:29 Concurrency: 1 threads (target='dev')
06:07:29
06:07:43 Building catalog
06:08:01 Catalog written to /home/sammigachuhi/github4/NIP-Lakehouse-Data/NIP-Lakehouse-Data/dbt/src/target/catalog.json
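You can verify the catalog programmatically as well. This sketch assumes you are still in the src/ folder; target/catalog.json is where dbt docs generate writes its catalog by default, and python3 is used here only to parse the JSON:

```shell
# Confirm the catalog exists and count the documented nodes.
catalog="target/catalog.json"
if [ -f "$catalog" ]; then
  status="ok"
  nodes=$(python3 -c "import json; print(len(json.load(open('$catalog'))['nodes']))")
  echo "catalog ok: $nodes nodes"
else
  status="missing"
  echo "no catalog at $catalog - run dbt docs generate first"
fi
```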
If you see that the catalog has been written somewhere, then dbt has generated the documentation successfully. Some warnings will still be generated, such as the ones below. Although not recommended, these can be ignored since dbt can still generate documentation even with warnings. Errors, however, are what should be avoided at all costs.
06:04:59 Running with dbt=1.7.14
06:05:01 Registered adapter: databricks=1.7.14
06:05:01 Unable to do partial parsing because config vars, config profile, or config target have changed
06:05:05 [WARNING]: Did not find matching node for patch with name 'xprize_community_sublayer_survey' in the 'models' section of file 'models/bronze/xprize_community/schema_xprize_community_sublayer_survey.yml'
06:05:05 [WARNING]: Did not find matching node for patch with name 'si_community_sublayer_survey' in the 'models' section of file 'models/bronze/si/community/schema_si_community_sublayer_survey.yml'
06:05:06 [WARNING]: Did not find matching node for patch with name 'bronze_acstc_subtable_repeat_deploy' in the 'models' section of file 'models/bronze/acstc/deploy/schema_acstc_subtable_deploy.yml'
-- snip --
06:05:08 [WARNING]: Test 'test.nip_dbt.relationships_bronze_acstc_subtable_repeat_deploy_recorder__member_name__ref_bronze_dev_ns_team_list_.bff6851103' (models/bronze/acstc/deploy/schema_acstc_subtable_deploy.yml) depends on a node named 'bronze_acstc_subtable_repeat_deploy' in package '' which was not found
06:05:08 [WARNING]: Test 'test.nip_dbt.relationships_bronze_acstc_sublayer_repeat_grid_location_check_recorder__member_name__ref_bronze_dev_ns_team_list_.711b031d64' (models/bronze/acstc/check/schema_acstc_sublayer_check.yml) depends on a node named 'bronze_acstc_sublayer_repeat_grid_location_check' in package '' which was not found
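The "Did not find matching node" warnings above typically mean that a model name declared in a schema .yml file does not match any .sql model file. One way to investigate is to list the model names declared in the yml and compare them against the files on disk. A sketch using an inline yml sample (the two-space indentation mirrors dbt schema files; in practice you would pipe in the real file):

```shell
# Extract top-level model names from a dbt schema yml snippet.
# Column entries are indented deeper, so they are not matched.
yml='models:
  - name: bronze_acstc_subtable_repeat_deploy
    columns:
      - name: recorder'
names=$(printf '%s\n' "$yml" | sed -n 's/^  - name:[[:space:]]*//p')
echo "$names"
```

If a name printed here has no corresponding models/**/<name>.sql file, dbt will emit exactly this kind of warning.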
Thereafter, you can run the following command to serve the documentation website on port 8080.
dbt docs serve --port 8080
The following will be the generated output:
06:21:12 Running with dbt=1.7.14
Serving docs at 8080
To access from your browser, navigate to: http://localhost:8080
--snip--
You can now proceed to a private browser window and key in localhost:8080/.
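If the page does not load, a quick health check from another terminal tells you whether the server is actually listening. This sketch assumes dbt docs serve --port 8080 is running and that curl is available:

```shell
# Probe the docs server; curl exits non-zero if nothing answers.
if curl -fsS -o /dev/null http://localhost:8080 2>/dev/null; then
  up="yes"
  echo "docs site is up on port 8080"
else
  up="no"
  echo "docs site is not responding on port 8080"
fi
```

If the port is already taken by another process, rerun the serve command with a different port, e.g. dbt docs serve --port 8081.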
The dbt website will appear with all the documentation you have set up for your subtables, fields and package names.
At the bottom right, you can view the lineage graph of the models. Click on the blue View Lineage Graph button. A lineage of all the models we’ve been working on will pop up.
Alternatively, if you select a model from the bronze subfolder, say pitfall_trap_sublayer_survey, dbt will show the lineage graph of that specific model.
You are highly encouraged to play around with the dropdown buttons below the lineage graph, such as resources, packages, tags, --select and --exclude, to further understand what they do.
Generating documentation from Github actions
In your organization's GitHub, go to the Natural-State>NIP-Lakehouse-Data repository. This is where all the lakehouse code is stored.
Go to the Actions tab. Under the list of workflows, select Dbt: Deploy.
Click on the Run workflow button on the top right of the Dbt: Deploy workflow interface.
Ensure the parameters are set as below:
Use workflow from: dev
The branch to build: dev
The environment to deploy to: dev
Once everything is set, click the Run workflow button. The workflow is successful if there is a green tick next to the workflow name.
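The same workflow can also be triggered from the command line with the GitHub CLI. The workflow name and branch match the steps above, but the input key ("environment") is an assumption here; check the workflow file for the real input names:

```shell
# Trigger the Dbt: Deploy workflow on the dev branch via the GitHub CLI.
if command -v gh >/dev/null 2>&1; then
  have_gh="yes"
  gh workflow run "Dbt: Deploy" --ref dev -f environment=dev \
    || echo "gh workflow run failed - are you signed in to GitHub?"
else
  have_gh="no"
  echo "GitHub CLI (gh) is not installed"
fi
```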
Wait for the workflow to run. Thereafter, go to this link. You may need to sign in.
If your workflow was successful, the newly added models and their documentation should also be visible.