Species Classification Model Training

Data Aggregation

These datasets were originally spread across multiple cloud storage locations, which made it difficult to retrieve and access the data in a consistent way. To overcome this, we aggregated the data into a single cloud storage location using the following script:

Link to script

The script merges the data into a single dataset and then randomly samples 2000 images per species. Here is an example of such a dataset:

Sampling a fixed number of images per species keeps the dataset balanced, so that the model is trained on a representative sample of the data. The script then exports the sampled records to a CSV file, which is used to download the images from cloud storage.
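
The sampling and export steps can be illustrated with a minimal sketch. It is not the linked script. It assumes each image is described by a record with a `species` label and a `uri` pointing at its storage location. Those field names, the function names `sample_per_species` and `write_manifest`, and the fixed seed are all hypothetical choices made for illustration.

```python
import csv
import random
from collections import defaultdict


def sample_per_species(records, per_species=2000, seed=0):
    """Return up to `per_species` randomly chosen records for each species."""
    # Group the aggregated records by their species label.
    by_species = defaultdict(list)
    for record in records:
        by_species[record["species"]].append(record)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = []
    for species in sorted(by_species):
        group = by_species[species]
        # Assumption: a species with fewer images than the cap keeps all of them.
        k = min(per_species, len(group))
        sampled.extend(rng.sample(group, k))
    return sampled


def write_manifest(records, path):
    """Write the sampled records to a CSV file, one row per image."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["species", "uri"])
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_manifest(sample_per_species(records), "manifest.csv")` would produce a CSV that a separate download step can read. How to treat species with fewer than 2000 images is a design choice the original text does not specify. This sketch keeps every available image for those species, which leaves them under-represented relative to the rest.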