BirdNet Pipeline Setup Instructions

Prerequisites

- Access to the shared Azure Machine Learning (AML) workspace in your Azure subscription.
- The following datastores configured in that workspace:
  - landing_kutuma_hashed: contains the raw audio recordings.
  - ml_public_models: stores the BirdNet model.
  - bronze_audio_data: stores processed and intermediate results.

Setting Up the Pipeline in Azure Machine Learning Studio

1. Log in to Azure Machine Learning Studio

Go to Azure Machine Learning studio (https://ml.azure.com) and select the shared workspace in your subscription.

2. Verify Datastores

Ensure that the following datastores are set up (in the studio, under Data > Datastores):

- landing_kutuma_hashed
- ml_public_models
- bronze_audio_data
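The same check can be scripted. The sketch below is not part of the existing pipeline; it compares the datastore names configured in the workspace against the three required names. It assumes the Azure ML Python SDK v2 (`azure-ai-ml`), where the configured names would come from `ml_client.datastores.list()`. To keep the example runnable without workspace credentials, the names are passed in as a plain list, and the helper `missing_datastores` is hypothetical.

```python
# Hypothetical helper, not part of the BirdNet pipeline code.
# Datastore names the pipeline depends on (from the prerequisites above).
REQUIRED_DATASTORES = (
    "landing_kutuma_hashed",  # raw audio recordings
    "ml_public_models",       # BirdNet model
    "bronze_audio_data",      # processed and intermediate results
)


def missing_datastores(configured_names):
    """Return the required datastore names that are absent from configured_names."""
    configured = set(configured_names)
    return [name for name in REQUIRED_DATASTORES if name not in configured]


if __name__ == "__main__":
    # Assumed usage against a live workspace (requires azure-ai-ml and azure-identity):
    #   from azure.ai.ml import MLClient
    #   from azure.identity import DefaultAzureCredential
    #   ml_client = MLClient(DefaultAzureCredential(), "<subscription-id>",
    #                        "<resource-group>", "<workspace-name>")
    #   names = [ds.name for ds in ml_client.datastores.list()]
    # Offline demo with an example list of configured datastore names.
    names = ["workspaceblobstore", "landing_kutuma_hashed", "ml_public_models"]
    missing = missing_datastores(names)
    if missing:
        print("Missing datastores:", ", ".join(missing))
    else:
        print("All required datastores are configured.")
```

Returning the list of missing names, rather than a single boolean, makes it clear which datastore still needs to be registered.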

3. Verify Pipeline Configuration

The current pipeline, BirdNet-Natural-State-RBP-Pipeline, processes audio recordings in three phases. Together, phases 1-3 cover ingestion, analysis, post-processing, and archiving of the audio data.

4. Running the Pipeline

- Open the pipeline in AML Studio.
- Confirm that the input data directory points to the intended folder in landing_kutuma_hashed.
- Set the output locations to folders in bronze_audio_data.
- Click Submit to start the pipeline.
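When the pipeline is submitted from code rather than from the studio, input and output locations are commonly given as datastore URIs of the form `azureml://datastores/<datastore_name>/paths/<path>`. The sketch below builds such URIs for the input and output datastores. The folder names `recordings/2024-06` and `birdnet/results` are placeholders, not paths used by the actual pipeline, and `datastore_uri` is a hypothetical helper. With the Azure ML SDK v2, a URI like this can be wrapped in `Input(type="uri_folder", path=...)` or `Output(type="uri_folder", path=...)` when building a pipeline job.

```python
# Hypothetical helper for building Azure ML (SDK v2) datastore URIs.


def datastore_uri(datastore: str, path: str) -> str:
    """Build an azureml:// URI for a folder or file inside `datastore`."""
    if not datastore:
        raise ValueError("datastore name must not be empty")
    # Strip leading/trailing slashes so parts are joined by exactly one '/'.
    relative = path.strip("/")
    return f"azureml://datastores/{datastore}/paths/{relative}"


# Placeholder locations; substitute the real folders used by the pipeline.
input_uri = datastore_uri("landing_kutuma_hashed", "recordings/2024-06/")
output_uri = datastore_uri("bronze_audio_data", "/birdnet/results")
print(input_uri)   # azureml://datastores/landing_kutuma_hashed/paths/recordings/2024-06
print(output_uri)  # azureml://datastores/bronze_audio_data/paths/birdnet/results
```

Normalising the slashes avoids accidental `//` sequences when paths are assembled from several configuration values.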

5. Monitor Execution

After submission, monitor the pipeline’s progress under Jobs (named Experiments in older versions of the studio). Detailed logs are available for each step of the pipeline.
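Progress can also be followed from a script. The sketch below polls a job's status until it reaches a terminal state. With the Azure ML SDK v2, the status of a submitted job is available as `ml_client.jobs.get(job_name).status`, and `ml_client.jobs.stream(job_name)` streams its logs. To keep the example runnable without a workspace, the status lookup is passed in as a function; `wait_for_job`, the polling interval, and the timeout are hypothetical choices, and the set of terminal states is an assumption based on Azure ML's status strings.

```python
import time
from typing import Callable

# Assumed terminal job states, matching Azure ML status strings.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 30.0,
                 timeout_seconds: float = 6 * 3600) -> str:
    """Poll get_status() until a terminal state is reached and return it.

    The defaults (30 s interval, 6 h timeout) are arbitrary illustrative values.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_seconds} s")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Assumed real usage (requires azure-ai-ml and an authenticated MLClient):
    #   final = wait_for_job(lambda: ml_client.jobs.get("<job-name>").status)
    # Offline demo: a fake status sequence standing in for a running pipeline job.
    fake_statuses = iter(["Queued", "Running", "Running", "Completed"])
    final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.01)
    print("Pipeline finished with status:", final)
```

Injecting the status function keeps the polling logic independent of the SDK, so it can be tested offline and reused with any client.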

Pipeline Deployment

The BirdNet pipeline is deployed and managed primarily via a scheduled endpoint. Documentation on how to deploy this endpoint using Terraform is available in the NIP-Lakehouse-Infra repository.

For further details on setting up the infrastructure required for the BirdNet pipeline, consult the Terraform configuration in that repository.