Extract data from Google Earth Engine
The code to extract GEE data runs through the Python API in Google Colab notebooks (Jupyter notebooks hosted by Colab). Using Colab means no local software or library installations are needed, and the processing leverages Google's cloud computing resources.
The latest version of the Earth Engine Python client library comes pre-installed on Google Colab, but some other packages must be installed via pip, and these need to be reinstalled each time you start a new runtime.
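For example, a package can be installed from a notebook cell as shown below (geemap is only an illustration, not a requirement of these notebooks; install whichever packages the specific notebook imports):

```python
# Install extra packages at the top of each new Colab runtime.
# geemap is just an example package here; the packages actually
# needed depend on the imports in the notebook you are running.
!pip install geemap
```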
Account and activation
The first thing you need to do is create a GEE account, see here for details. Once you've done that, it is advisable to test the Python API and check that everything is working. See here for a short introduction to using the Python API in Google Colab.
To run code in a Colab notebook you will need to import, authenticate, and initialize the ee library each time you start a new runtime. This corresponds to the first chunk of code in each notebook. Follow the prompts that the authentication step creates and paste your authorization code back into the notebook.
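A minimal version of that first chunk looks like the sketch below (whether you must pass a Cloud Project to the initializer depends on your account setup; the project ID shown in the comment is a placeholder):

```python
import ee

# Trigger the authentication flow; follow the prompts and paste
# the authorization code back into the notebook when asked.
ee.Authenticate()

# Initialize the client library for this runtime. Depending on your
# account setup you may need to pass your Cloud Project explicitly,
# e.g. ee.Initialize(project="your-project-id").
ee.Initialize()
```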
Notebooks
There are currently 19 notebooks in the GitHub repo, and each notebook deals with a specific data source/platform. They are split up because each data set requires slightly different processing and export steps, although some parts of the code are largely redundant across notebooks. To open a notebook stored in the repo, use the Open in Colab button at the top of the notebook. You will need to save your own copy of the notebook in your Google Drive folder if you are making edits.
Once you've completed authentication after starting the runtime, the next step is to define the arguments that relate to: the area of interest (AOI); the name of the layer (a unique identifier linked to the metadata spreadsheet - note that the "RS" prefix stands for remote sensing); the image reducer metric; start and end dates for annual layers; and date ranges for seasonal layers. Not all arguments are used in every notebook, because some layers are single GEE Images while others are ImageCollections from which composites need to be created. Below is an example of how the arguments are specified.
```python
# Area of interest
aoi = ee.FeatureCollection("projects/ns-agol-rs-data/assets/MKR_PACE")
aoi_name = "MKR_PACE"

# GEE layer ID
layer_name = "RS_001"

# Image reducer (options: mean, median, min, max, stdDev, sum, product)
img_col_reducer = "mean"

# Date parameters (for Sentinel there aren't 10 years of data available)
start_year = 2016
end_year = 2022

# range() doesn't include the stop value, hence end_year + 1
year_list = ee.List(list(range(start_year, end_year + 1)))

# Season parameters (months)
rain_start = 3
rain_end = 5
dry_start = 7
dry_end = 10
```
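To illustrate how these arguments come into play, here is a minimal sketch of building a seasonal composite from an ImageCollection. The collection ID is an assumed example (Sentinel-2 surface reflectance); each notebook imports its own data source and applies the reducer named in img_col_reducer:

```python
# Sketch only: the collection ID is an assumed example, and the
# reducer is hard-coded to mean to match img_col_reducer above.
collection = (
    ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
    .filterBounds(aoi)
    .filterDate(f"{start_year}-01-01", f"{end_year}-12-31")
    # Keep only images whose month falls in the rainy season.
    .filter(ee.Filter.calendarRange(rain_start, rain_end, "month"))
)

# Reduce the filtered collection to a single composite image.
rain_composite = collection.mean().clip(aoi.geometry())
```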
AOI
An essential part of the workflow is to have a GEE FeatureCollection that represents the spatial footprint of the AOI, which defines the extent of the output layers. This FeatureCollection must be stored as a GEE asset and is imported into the notebook as follows:
```python
aoi = ee.FeatureCollection("projects/ns-agol-rs-data/assets/MKR_PACE")
```
In this case the AOI is called MKR_PACE and is stored within the ns-agol-rs-data Google Cloud Project. See this guide for how to get set up with Cloud Assets and Cloud Projects. The path to the AOI asset will be unique to you and how you've specified your cloud project structure.
Make a note of where you stored the AOI shapefile that you've uploaded as an asset, as you will need it in subsequent steps.
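As a quick sanity check that the asset path is correct, you can print a couple of properties of the FeatureCollection (this snippet is just a suggested check, not part of the original notebooks):

```python
# Confirm the AOI asset loads: print the number of features
# and the bounding box of the collection.
print(aoi.size().getInfo())
print(aoi.geometry().bounds().getInfo())
```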
Running and exporting
The next section of code in the notebooks deals with importing the various Images or ImageCollections and processing them (e.g., running spatial and temporal reducers, removing clouds, etc.). The code is annotated and should be easy enough to follow.
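As an example of what the cloud-removal step can look like, a common pattern for Sentinel-2 masks pixels flagged in the QA60 bitmask band (this sketch assumes Sentinel-2; other data sources in the notebooks use their own quality bands):

```python
def mask_s2_clouds(image):
    """Mask clouds and cirrus using the Sentinel-2 QA60 band."""
    qa = image.select("QA60")
    # Bits 10 and 11 flag opaque clouds and cirrus, respectively.
    cloud_bit = 1 << 10
    cirrus_bit = 1 << 11
    mask = (
        qa.bitwiseAnd(cloud_bit).eq(0)
        .And(qa.bitwiseAnd(cirrus_bit).eq(0))
    )
    return image.updateMask(mask)

# Apply the mask to every image in the collection.
collection = collection.map(mask_s2_clouds)
```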
The final step is exporting the data. The code is currently set up to export the data to your Google Drive using either ee.batch.Export.image.toDrive (raster data) or ee.batch.Export.table.toDrive (vector data). If you want to use the code as is, you will need to create a folder called GEE_exports in the root directory of your Google Drive. Otherwise, you can change the folder argument in the export function to point to your own folder. It's recommended that you keep the nomenclature matching the original code, because the downstream parts of the workflow reference the GEE_exports folder explicitly. Within the GEE_exports folder you will also need to create a folder called sent_to_gdb (which will be used as a storage location for the raw data after it's been imported into the GDBs). A sketch of an export call is shown below.
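Here is a minimal sketch of a raster export to the GEE_exports folder (the image variable, description, scale, and maxPixels values are placeholder assumptions; the notebooks set these per data set):

```python
# Export a composite to the GEE_exports folder in Google Drive.
# The image, scale, and maxPixels here are placeholder assumptions.
task = ee.batch.Export.image.toDrive(
    image=rain_composite,
    description=f"{layer_name}_{aoi_name}",
    folder="GEE_exports",
    region=aoi.geometry(),
    scale=10,
    maxPixels=1e13,
)
task.start()

# Optionally poll the task from the notebook to check progress.
print(task.status())
```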