BirdNet Pipeline Architecture

Overview

The BirdNet pipeline processes large datasets of audio recordings, detects bird species, and stores results in designated Azure Blob Storage containers.

Key Components

  • Datastores:
    • Deployed via Terraform/Terragrunt as part of the infrastructure-as-code setup.
    • Refer to the NIP-Lakehouse-Infra repo for the Terraform code used to deploy these datastores
    • landing_kutuma_hashed: Stores the raw audio recordings.
    • ml_public_models: Holds the BirdNet model.
    • bronze_audio_data: Stores intermediate processed results and final detection results.
  • Pipeline Phases:
    • Phase 1: Analysis step with BirdNet.
      • Script executed: analyze.py.
      • Results stored in bronze_audio_data.
    • Phase 2: Post-processing of audio data.
      • Script executed: segments.py.
      • Processed results saved in bronze_audio_data.
    • Phase 3: Archiving of the audio data.
      • Script executed: archive_original_data_assets.py.
      • Archived data saved in landing_kutuma_hashed and other designated storage.

Pipeline Details

  • Pipeline Name: BirdNet-Natural-State-RBP-Pipeline.
  • Inference Script: analyze.py, segments.py, archive_original_data_assets.py.
  • Model Used: kenya classifier.
  • Execution Environment: birdnet-process-env (version 9).
  • Compute Clusters:
    • CPU Cluster: Used for analysis, post-processing, and archiving steps.