MegaDetector Pipeline Setup Instructions

Prerequisites

You must have access to the shared Azure Machine Learning (AML) workspace in the same subscription (currently ns-ii-tech-stg-ml-workspace).
The following datastores must be configured in that workspace:
- landing_kutuma_hashed: Contains the raw images.
- ml_public_models: Stores the MegaDetector model (currently md_v5b.0.0.pt).
- bronze_camera_trap: Stores processed results.
- bronze_megadetector: Stores detection results.
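Within Azure ML, datastores are typically referenced by URIs of the form azureml://datastores/<name>/paths/<path>, which the Azure ML SDK v2 and CLI accept as job inputs and outputs. The Python sketch below is an illustration only, not part of the pipeline: it builds such URIs for the datastores listed above. The role labels in DATASTORES are hypothetical, and so is the assumption in the example that md_v5b.0.0.pt sits at the root of ml_public_models.

```python
# Datastore names come from the list above; the role labels are hypothetical.
DATASTORES = {
    "raw_images": "landing_kutuma_hashed",
    "models": "ml_public_models",
    "processed_results": "bronze_camera_trap",
    "detection_results": "bronze_megadetector",
}


def datastore_uri(datastore: str, path: str = "") -> str:
    """Build an Azure ML datastore URI: azureml://datastores/<name>/paths/<path>."""
    if not datastore:
        raise ValueError("datastore name must be non-empty")
    # Strip a leading slash so the path is always relative to the datastore root.
    return f"azureml://datastores/{datastore}/paths/{path.lstrip('/')}"


if __name__ == "__main__":
    # Assumes (hypothetically) that the model file sits at the datastore root.
    print(datastore_uri(DATASTORES["models"], "md_v5b.0.0.pt"))
```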

Setting Up the Pipeline in Azure Machine Learning Studio

1. Log in to Azure Machine Learning Studio

Go to Azure Machine Learning Studio and select the shared workspace in your subscription (currently ns-ii-tech-stg-ml-workspace).

2. Verify Datastores

Ensure that the following datastores are set up:
- landing_kutuma_hashed
- ml_public_models
- bronze_camera_trap
- bronze_megadetector
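The check above can also be scripted. The minimal Python sketch below compares a list of available datastore names against the four required ones and reports any that are missing. The sample names in the example (including workspaceblobstore, the default datastore Azure ML creates) are illustrative input only.

```python
from typing import Iterable

# The four datastores this pipeline expects, as listed in the prerequisites.
REQUIRED_DATASTORES = frozenset({
    "landing_kutuma_hashed",
    "ml_public_models",
    "bronze_camera_trap",
    "bronze_megadetector",
})


def missing_datastores(available: Iterable[str]) -> list[str]:
    """Return the required datastore names absent from `available`, sorted."""
    return sorted(REQUIRED_DATASTORES - set(available))


if __name__ == "__main__":
    # Illustrative input; in practice these names come from the workspace.
    found = ["landing_kutuma_hashed", "ml_public_models", "workspaceblobstore"]
    missing = missing_datastores(found)
    if missing:
        print("Missing datastores:", ", ".join(missing))
    else:
        print("All required datastores are configured.")
```

With the Azure ML Python SDK v2 (azure-ai-ml), the available names could be collected as [d.name for d in ml_client.datastores.list()], where ml_client is an authenticated MLClient for the workspace.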

3. Verify Pipeline Configuration

The current pipeline, MegaDetector-NaturalState-RBP-Pipeline, processes camera trap images in three phases (1-3): ingestion, detection, and post-processing. Confirm that this pipeline is present in the workspace before running it.
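The internals of each phase are not documented here. As an illustration of the kind of work a post-processing phase can do, the Python sketch below drops low-confidence detections and counts the rest by category. It assumes results follow MegaDetector's standard batch-output JSON layout (images, detections, conf, category, detection_categories). Whether phase 3 of this pipeline does exactly this, and which confidence threshold it uses, are assumptions.

```python
from collections import Counter


def filter_detections(results: dict, threshold: float) -> dict:
    """Return a copy of MegaDetector-style results keeping detections with conf >= threshold."""
    filtered_images = []
    for image in results.get("images", []):
        # Images that failed to process may carry detections=None; treat as empty.
        kept = [d for d in image.get("detections") or [] if d["conf"] >= threshold]
        filtered_images.append({**image, "detections": kept})
    return {**results, "images": filtered_images}


def count_by_category(results: dict) -> Counter:
    """Count detections per category name (e.g. animal, person, vehicle)."""
    names = results.get("detection_categories", {})
    return Counter(
        names.get(d["category"], d["category"])
        for image in results.get("images", [])
        for d in image.get("detections") or []
    )


if __name__ == "__main__":
    # Toy input in the MegaDetector batch-output layout; values are made up.
    sample = {
        "detection_categories": {"1": "animal", "2": "person", "3": "vehicle"},
        "images": [
            {"file": "a.jpg", "detections": [
                {"category": "1", "conf": 0.92, "bbox": [0.1, 0.2, 0.3, 0.4]},
                {"category": "2", "conf": 0.05, "bbox": [0.0, 0.0, 0.1, 0.1]},
            ]},
        ],
    }
    print(dict(count_by_category(filter_detections(sample, threshold=0.2))))
```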

4. Running the Pipeline

- Open the pipeline in AML Studio.
- Verify the input data directory under landing_kutuma_hashed.
- Set the output locations in bronze_megadetector and bronze_camera_trap.
- Click Submit to start the pipeline.
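Before clicking Submit, it can help to double-check that each location points at the right datastore: input under landing_kutuma_hashed, detection results in bronze_megadetector, and processed results in bronze_camera_trap. The Python sketch below is a hypothetical helper for that check. It assumes locations are written as azureml://datastores/<name>/paths/<path> URIs, and the batch_001 folder in the example is made up.

```python
from dataclasses import dataclass

PREFIX = "azureml://datastores/"


def datastore_of(uri: str) -> str:
    """Extract the datastore name from an azureml://datastores/<name>/paths/... URI."""
    if not uri.startswith(PREFIX) or "/paths/" not in uri:
        raise ValueError(f"not an Azure ML datastore URI: {uri}")
    return uri[len(PREFIX):].split("/paths/", 1)[0]


@dataclass(frozen=True)
class RunLocations:
    """Hypothetical container for one run's input and output locations."""
    input_uri: str
    detections_uri: str
    processed_uri: str

    def validate(self) -> None:
        """Raise ValueError if any location points at the wrong datastore."""
        checks = [
            (self.input_uri, "landing_kutuma_hashed"),
            (self.detections_uri, "bronze_megadetector"),
            (self.processed_uri, "bronze_camera_trap"),
        ]
        for uri, expected in checks:
            actual = datastore_of(uri)
            if actual != expected:
                raise ValueError(f"{uri} is in {actual!r}, expected {expected!r}")


if __name__ == "__main__":
    locations = RunLocations(
        input_uri="azureml://datastores/landing_kutuma_hashed/paths/batch_001/",
        detections_uri="azureml://datastores/bronze_megadetector/paths/batch_001/",
        processed_uri="azureml://datastores/bronze_camera_trap/paths/batch_001/",
    )
    locations.validate()  # Raises ValueError if an output is misplaced.
    print("Locations look consistent.")
```

Keeping the expected datastore for each role in one place makes a swapped output folder fail loudly before the run starts, rather than after results land in the wrong bronze datastore.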

5. Monitor Execution

Once the pipeline is submitted, monitor its progress under Experiments. Detailed logs are available for each step of the pipeline.
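Monitoring can also be done from a script. The Python sketch below polls a status function until the job reaches a finished state or a timeout elapses. The status source is injected so the logic can be exercised without Azure. Treating Completed, Failed and Canceled as the only terminal statuses is an assumption based on the status names Azure ML reports, and the poll interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Assumed set of statuses Azure ML reports once a job has stopped.
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated status sequence standing in for a real job.
    statuses = iter(["Queued", "Running", "Completed"])
    print(wait_for_job(lambda: next(statuses), poll_seconds=0.01))
```

With the Azure ML Python SDK v2, get_status could be lambda: ml_client.jobs.get(job_name).status, where job_name is the submitted pipeline job's name. Alternatively, ml_client.jobs.stream(job_name) streams the job's logs until it finishes.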

Pipeline Deployment

The MegaDetector pipeline is deployed and managed primarily via a scheduled endpoint. Documentation on how to deploy this endpoint using Terraform is available in the NIP-Lakehouse-Infra repository.

For further details on the infrastructure required for the MegaDetector pipeline, consult the Terraform configuration in that repository.