Species Classification Model Training

COCO Annotation

The first step in the training process is to annotate the data, that is, to label each image with the species of the animal it shows. We use the COCO annotation format, a standardised format for image annotations that many machine learning frameworks support. A COCO annotation file is a JSON file containing the following sections:

Info: This contains information about the dataset, such as its description, the date it was created, and the dataset version.
Licenses: This contains information about the license under which the dataset is released.
Images: This contains information about each image in the dataset, such as the image ID, the image URL, and the image dimensions.
Annotations: This contains information about each annotation in the dataset, such as the image ID, the category ID, and the bounding box coordinates.
Categories: This contains information about each category in the dataset, such as the category ID, and the category name.
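
As a rough illustration of how these five sections fit together, the sketch below builds a tiny COCO-style dictionary in Python and checks that every annotation points at an existing image and category. The file name, species, licence and dimensions are made-up placeholders, not values from our dataset, and check_references is a hypothetical helper written for this example. The bounding box follows the usual COCO convention of [x, y, width, height] in pixels, measured from the top-left corner of the image.

```python
import json

# Minimal COCO-style structure. All values are illustrative placeholders.
coco = {
    "info": {
        "description": "Example camera-trap dataset",
        "version": "1.0",
        "year": 2023,
        "date_created": "2023-01-01",
    },
    "licenses": [{"id": 1, "name": "Example licence", "url": ""}],
    "images": [
        {"id": 1, "file_name": "example_0001.jpg", "width": 1920, "height": 1080, "license": 1}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,          # refers to images[].id
            "category_id": 1,       # refers to categories[].id
            "bbox": [480.0, 270.0, 960.0, 540.0],  # [x, y, width, height] in pixels
            "area": 960.0 * 540.0,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "elephant", "supercategory": "animal"}],
}


def check_references(dataset):
    """Return the IDs of annotations whose image or category does not exist."""
    image_ids = {image["id"] for image in dataset["images"]}
    category_ids = {category["id"] for category in dataset["categories"]}
    return [
        annotation["id"]
        for annotation in dataset["annotations"]
        if annotation["image_id"] not in image_ids
        or annotation["category_id"] not in category_ids
    ]


# Serialise to the JSON text that would be written to the annotation file.
coco_json = json.dumps(coco, indent=2)
```

A check like this can be handy before training, since a dangling image_id or category_id usually only surfaces later as a confusing loader error.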

The data we aggregated and uploaded to cloud storage is already annotated with species labels, but the images have no bounding boxes associated with them. The following script adds bounding boxes to the data:

Link to COCO script

To use this script, first spin up a virtual machine on Azure, clone the NIP-AI-Models repository, and navigate to the coco_annotations directory. Then run the following Python script:

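# CocoDataset is imported from the coco module (presumably coco.py in the coco_annotations directory).
from coco import CocoDataset

def main():
    # The first path is the data set to annotate; the second is where the results are written.
    data_dir = "./Africa_Training_Data_2000_v2/Africa_Training_Data_2000_v2/Set_1"
    results_dir = "./results/Set_1/"
    CocoDataset(data_dir, results_dir).construct_all()

if __name__ == "__main__":
    main()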

Link to script

The output is a COCO JSON file, which must then be converted to JSONL format before it can be ingested into Azure Machine Learning.
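
The conversion could look roughly like the sketch below. It is not the project's own conversion script. It assumes the JSONL layout that Azure Machine Learning documents for object detection data, with one JSON object per image holding image_url, image_details and a label list whose box corners (topX, topY, bottomX, bottomY) are normalised to the range 0 to 1. Check that schema against the current Azure documentation before relying on it. The function name coco_to_jsonl_lines and the image_url_prefix parameter are hypothetical and introduced only for this example.

```python
import json


def coco_to_jsonl_lines(coco, image_url_prefix):
    """Convert a COCO dictionary into a list of JSONL lines, one per image.

    Assumes COCO boxes are [x, y, width, height] in pixels and that the
    target format expects corner coordinates normalised by image size.
    """
    category_names = {c["id"]: c["name"] for c in coco["categories"]}

    # Group annotations by the image they belong to.
    annotations_by_image = {}
    for annotation in coco["annotations"]:
        annotations_by_image.setdefault(annotation["image_id"], []).append(annotation)

    lines = []
    for image in coco["images"]:
        width, height = image["width"], image["height"]
        labels = []
        for annotation in annotations_by_image.get(image["id"], []):
            x, y, box_width, box_height = annotation["bbox"]
            labels.append(
                {
                    "label": category_names[annotation["category_id"]],
                    "topX": x / width,
                    "topY": y / height,
                    "bottomX": (x + box_width) / width,
                    "bottomY": (y + box_height) / height,
                    "isCrowd": annotation.get("iscrowd", 0),
                }
            )
        record = {
            # The prefix is an assumption: it should point at wherever the images are stored.
            "image_url": image_url_prefix + image["file_name"],
            "image_details": {
                "format": image["file_name"].rsplit(".", 1)[-1].lower(),
                "width": width,
                "height": height,
            },
            "label": labels,
        }
        lines.append(json.dumps(record))
    return lines
```

Writing the result is then a matter of joining the lines with newlines and saving them to a .jsonl file. Images without annotations are emitted with an empty label list here; whether to keep or drop them is a choice to make for the actual dataset.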