Species Classification Model Training

Data Download

Now that we have a CSV file containing the source and destination URL paths for every image in the dataset, we can download the data from cloud storage using the following script:

Link to script

The script loops through the CSV file and copies the data from each source URL to its destination URL using the Azure Blob Storage Python SDK. It prints a message when a copy operation starts and another when it completes, so you can confirm that the script is running correctly.
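For readers who want a feel for the loop described above, here is a minimal sketch. It is not the linked script. The column names `source_url` and `destination_url`, and the helper names `read_copy_pairs`, `azure_copy`, and `copy_all`, are assumptions made for illustration. It also assumes each destination URL carries a SAS token with write access. Note that `start_copy_from_url` asks the storage service to perform the copy and may return before a large blob has finished copying, so the second message here only confirms that the request was accepted.

```python
import csv
from typing import Callable, Iterator, Tuple


def read_copy_pairs(csv_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source_url, destination_url) pairs from the CSV file."""
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed column names; adjust them to match the real CSV header.
            yield row["source_url"], row["destination_url"]


def azure_copy(source_url: str, destination_url: str) -> None:
    """Ask Azure Blob Storage to copy source_url into the destination blob."""
    # Imported here so the CSV logic can be tried without the SDK installed.
    from azure.storage.blob import BlobClient

    # Assumes destination_url includes a SAS token that allows writes.
    destination = BlobClient.from_blob_url(destination_url)
    destination.start_copy_from_url(source_url)


def copy_all(csv_path: str,
             copy_fn: Callable[[str, str], None] = azure_copy) -> int:
    """Run copy_fn for every row in the CSV and return the number of rows."""
    count = 0
    for source_url, destination_url in read_copy_pairs(csv_path):
        print(f"Starting copy: {source_url}")
        copy_fn(source_url, destination_url)
        print(f"Copy requested: {destination_url}")
        count += 1
    return count
```

Passing the copy function as a parameter makes a dry run possible, for example `copy_all("image_urls.csv", copy_fn=lambda src, dst: None)`, where `image_urls.csv` is a placeholder file name. This checks the CSV contents before any data is transferred.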

We strongly recommend spinning up a small virtual machine on Azure to carry out this data download process. Run the script using a no-hangup approach (either nohup or tmux) so that it continues to run even if you disconnect from the virtual machine. This is important, as the data download process can take a long time to complete.