BirdNet Pipeline Architecture

Overview

The BirdNet pipeline processes large datasets of audio recordings, detects bird species, and stores results in designated Azure Blob Storage containers.

Key Components

Datastores:
- Deployed via Terraform/Terragrunt as part of the infrastructure-as-code setup.
- Refer to the NIP-Lakehouse-Infra repo for the Terraform code used to deploy these datastores
- landing_kutuma_hashed: Stores the raw audio recordings.
- ml_public_models: Holds the BirdNet model.
- bronze_audio_data: Stores intermediate processed results and final detection results.
Pipeline Phases:
- Phase 1: Analysis step with BirdNet.
  - Script executed: analyze.py.
  - Results stored in bronze_audio_data.
- Phase 2: Post-processing of audio data.
  - Script executed: segments.py.
  - Processed results saved in bronze_audio_data.
- Phase 3: Archiving of the audio data.
  - Script executed: archive_original_data_assets.py.
  - Archived data saved in landing_kutuma_hashed and other designated storage.

Pipeline Details

Pipeline Name: BirdNet-Natural-State-RBP-Pipeline.
Inference Script: analyze.py, segments.py, archive_original_data_assets.py.
Model Used: kenya classifier.
Execution Environment: birdnet-process-env (version 9).
Compute Clusters:
- CPU Cluster: Used for analysis, post-processing, and archiving steps.