Extract data from Google Earth Engine

The code to extract GEE data is run using the Python API in Google Colab notebooks (Jupyter notebooks hosted by Colab). Using Colab avoids local software and library installations, and it also takes advantage of Google's cloud computing resources.

The latest version of the Earth Engine Python client library comes pre-installed on Google Colab, but some of the other packages used in the notebooks must be installed via pip, and this installation must be repeated for each new runtime.
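As a minimal sketch, installing an extra package in a Colab cell looks like the following (geemap is only an illustrative example, not necessarily one of the packages a given notebook requires; install whatever the notebook you are running actually imports):

# Install extra packages into the current Colab runtime
# (re-run this after every runtime restart).
# geemap is an illustrative package name, not a required dependency.
!pip install geemap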

Account and activation

The first thing you need to do is create a GEE account; see here for details. Once you've done that, it is advisable to test the Python API and check that everything is working. See here for a short introduction to using the Python API in Google Colab.

To run code in the Colab notebooks you will need to import, authenticate, and initialize the ee library each time you start a new runtime. This corresponds to the first chunk of code in each notebook. Follow the prompts that the authentication step creates and paste your authorization code back into the notebook.
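As a minimal sketch, that first chunk typically looks like the following (the project ID passed to ee.Initialize() is a placeholder assumption; newer versions of the client library expect you to supply your own Cloud project):

import ee

# Trigger the authentication flow; follow the prompt and
# paste the authorization code back into the notebook.
ee.Authenticate()

# Initialize the library (replace the project ID with your own Cloud project).
ee.Initialize(project='ns-agol-rs-data')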

Notebooks

There are currently 19 notebooks in the GitHub repo, each dealing with a specific data source/platform. They are split up because each data set requires slightly different processing and export steps, although some parts of the code are largely redundant across notebooks. To open a notebook stored in the repo, use the Open in Colab button at the top of the notebook. If you are making edits, you will need to save your own copy of the notebook in your Google Drive folder.

Once you've completed authentication after starting the runtime, the next step is to define the arguments that relate to: the area of interest (AOI); the name of the layer (a unique identifier linked to the metadata spreadsheet - note that the "RS" prefix stands for remote sensing); the image reducer metric; start and end dates for annual layers; and date ranges for seasonal layers. Not all arguments are used in every notebook, because some layers are single GEE Images while others are ImageCollections from which composites need to be created. Below is an example of how the arguments are specified.

# Area of interest
aoi = ee.FeatureCollection("projects/ns-agol-rs-data/assets/MKR_PACE")
aoi_name = "MKR_PACE"

# GEE layer ID
layer_name = "RS_001"

# Image reducer (options: mean, median, min, max, stdDev, sum, product)
img_col_reducer = "mean"

# Date parameters (Sentinel does not have 10 years of data available)
start_year = 2016
end_year = 2022

# range() excludes the stop value, so add 1 to include end_year
year_list = ee.List(list(range(start_year, end_year + 1)))

# Season parameters (months)
rain_start = 3
rain_end = 5 
dry_start = 7
dry_end = 10

AOI

An essential part of the workflow is a GEE FeatureCollection representing the spatial footprint of the AOI, which defines the extent of the output layers. This FeatureCollection must be stored as a GEE asset and is imported into the notebook as follows:

aoi = ee.FeatureCollection("projects/ns-agol-rs-data/assets/MKR_PACE")

In this case the AOI is called MKR_PACE and is stored within the ns-agol-rs-data Google Cloud Project. See this guide for how to get set up with Cloud Assets and Cloud Projects. The path to the AOI asset will be unique to you and to how you've specified your cloud project structure.

Make a note of where you stored the AOI shapefile that you uploaded as an asset; you will need this path in the subsequent steps.

Running and exporting

The next section of code in each notebook deals with importing the various Images or ImageCollections and processing them (e.g., running spatial and temporal reducers, removing clouds). The code is annotated and should be easy enough to follow.
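As an illustrative sketch of that processing pattern (not the exact code in the notebooks), the following filters a Sentinel-2 collection to the AOI and date range, masks clouds using the QA60 band, and reduces the collection to a single composite; the dataset ID, band, and bit choices are standard for Sentinel-2 but are assumptions here:

# Cloud mask using the Sentinel-2 QA60 band
# (bits 10 and 11 flag opaque clouds and cirrus respectively)
def mask_s2_clouds(img):
    qa = img.select('QA60')
    cloud_free = (qa.bitwiseAnd(1 << 10).eq(0)
                  .And(qa.bitwiseAnd(1 << 11).eq(0)))
    return img.updateMask(cloud_free)

# Filter the collection to the AOI and date range, mask clouds,
# then reduce the ImageCollection to a single composite Image
img_col = (ee.ImageCollection('COPERNICUS/S2_SR')
           .filterBounds(aoi)
           .filterDate(f'{start_year}-01-01', f'{end_year}-12-31')
           .map(mask_s2_clouds))
composite = img_col.reduce(ee.Reducer.mean()).clip(aoi.geometry())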

The final step is exporting the data. The code is currently set up to export the data to your Google Drive using either ee.batch.Export.image.toDrive (raster data) or ee.batch.Export.table.toDrive (vector data). If you want to use the code as is, you will need to create a folder called GEE_exports in the root directory of your Google Drive. Otherwise, you can change the folder argument in the export function to point to your own folder. It's recommended that you keep the original nomenclature, because the downstream parts of the workflow reference the GEE_exports folder explicitly. Within the GEE_exports folder you will also need to create a folder called sent_to_gdb (which will be used as a storage location for the raw data after it has been imported into the GDBs).
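As a minimal sketch of the raster export call (the description, scale, and image variable are illustrative assumptions; the notebooks set these per layer):

# Export the composite to the GEE_exports folder in Google Drive
task = ee.batch.Export.image.toDrive(
    image=composite,
    description='RS_001_mean',   # illustrative task/file name
    folder='GEE_exports',
    region=aoi.geometry(),
    scale=30,                    # illustrative resolution in metres
    maxPixels=1e13)
task.start()

You can check on the progress of a running export with task.status(), or in the Tasks tab of the GEE Code Editor.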