MegaDetector Pipeline Architecture
Overview
The MegaDetector pipeline processes large datasets of camera trap images, detects objects (animals, humans, vehicles), and stores results in designated Azure Blob Storage containers.
Key Components
- Datastores:
  - Deployed via Terraform/Terragrunt as part of the infrastructure-as-code setup.
  - Refer to the NIP-Lakehouse-Infra repo for the Terraform code used to deploy these datastores.
  - landing_kutuma_hashed: Stores the raw camera trap images.
  - ml_public_models: Holds the MegaDetector model (md_v5b.0.0.pt).
  - bronze_camera_trap: Stores intermediate processed results.
  - bronze_megadetector: Stores the final detection results.
- Pipeline Phases:
  - Phase 1: Image ingestion and initial object detection.
    - The script cameratraps_pipeline.py is executed, and results are stored in bronze_camera_trap.
  - Phases 2 & 3: Post-processing and archiving of the images. The final results are saved in bronze_megadetector.
- Pipeline Details:
  - Pipeline Name: MegaDetector-NaturalState-RBP-Pipeline
  - Inference Script: cameratraps_pipeline.py
  - Model Used: md_v5b.0.0.pt
  - Execution Environment: megadetector-env (version 4)
  - Compute Clusters:
    - GPU Cluster: Used for Phase 1 (object detection).
    - CPU Cluster: Used for Phases 2 & 3 (post-processing and archiving).
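
To make the data flow concrete, here is a minimal Python sketch that records the datastores, phases, and cluster assignments listed above as plain data and checks them for consistency. The datastore, model, and script names come from this page. The Phase class, its field names, the phase identifiers, and the assumption that Phase 1 reads from landing_kutuma_hashed while Phases 2 & 3 read from bronze_camera_trap are illustrative guesses, not taken from the actual pipeline definition.

```python
from dataclasses import dataclass

# Datastore names and their roles, as listed under "Datastores" above.
DATASTORES = {
    "landing_kutuma_hashed": "raw camera trap images",
    "ml_public_models": "MegaDetector model (md_v5b.0.0.pt)",
    "bronze_camera_trap": "intermediate processed results",
    "bronze_megadetector": "final detection results",
}

ALLOWED_COMPUTE = ("gpu", "cpu")


@dataclass(frozen=True)
class Phase:
    """Hypothetical description of one pipeline phase (illustration only)."""
    name: str
    compute: str       # which cluster type runs the phase: "gpu" or "cpu"
    input_store: str   # ASSUMPTION: the input datastore is inferred, not documented
    output_store: str  # output datastore as stated in the phase list above


# Phases 2 & 3 are kept as one entry because the page describes them together.
PHASES = (
    Phase("phase_1_detection", "gpu",
          "landing_kutuma_hashed", "bronze_camera_trap"),
    Phase("phases_2_3_postprocess_archive", "cpu",
          "bronze_camera_trap", "bronze_megadetector"),
)


def validate(phases, datastores):
    """Raise ValueError if a phase uses an unknown datastore or cluster type."""
    for phase in phases:
        for store in (phase.input_store, phase.output_store):
            if store not in datastores:
                raise ValueError(f"{phase.name}: unknown datastore {store!r}")
        if phase.compute not in ALLOWED_COMPUTE:
            raise ValueError(f"{phase.name}: unknown compute {phase.compute!r}")


def describe(phases):
    """Return one human-readable line per phase."""
    return [
        f"{p.name}: {p.input_store} -> {p.output_store} on {p.compute} cluster"
        for p in phases
    ]


if __name__ == "__main__":
    validate(PHASES, DATASTORES)
    for line in describe(PHASES):
        print(line)
```

Running the module validates the mapping and prints one line per phase. Keeping the layout as data like this makes it easy to spot a phase that points at a datastore not deployed by the Terraform code.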