BirdNet Pipeline Architecture
Overview
The BirdNet pipeline processes large datasets of audio recordings, detects bird species, and stores results in designated Azure Blob Storage containers.
Key Components
- Datastores:
- Deployed via Terraform/Terragrunt as part of the infrastructure-as-code setup.
- Refer to the NIP-Lakehouse-Infra repo for the Terraform code used to deploy these datastores
landing_kutuma_hashed
: Stores the raw audio recordings.ml_public_models
: Holds the BirdNet model.bronze_audio_data
: Stores intermediate processed results and final detection results.
- Pipeline Phases:
- Phase 1: Analysis step with BirdNet.
- Script executed:
analyze.py
. - Results stored in
bronze_audio_data
.
- Script executed:
- Phase 2: Post-processing of audio data.
- Script executed:
segments.py
. - Processed results saved in
bronze_audio_data
.
- Script executed:
- Phase 3: Archiving of the audio data.
- Script executed:
archive_original_data_assets.py
. - Archived data saved in
landing_kutuma_hashed
and other designated storage.
- Script executed:
- Phase 1: Analysis step with BirdNet.
Pipeline Details
- Pipeline Name: BirdNet-Natural-State-RBP-Pipeline.
- Inference Script:
analyze.py
,segments.py
,archive_original_data_assets.py
. - Model Used: kenya classifier.
- Execution Environment: birdnet-process-env (version 9).
- Compute Clusters:
- CPU Cluster: Used for analysis, post-processing, and archiving steps.