Extracting and Saving MODIS NDVI Data to Azure Blob Storage using Google Earth Engine and Python
Stage 1: Import Earth Engine Library
import ee
This line imports the Earth Engine Python API library (ee), allowing us to interact with Google Earth Engine services from within a Python environment.
Stage 2: Authenticate using User-based Authentication
ee.Authenticate()
This line initiates the authentication process with Google Earth Engine. When we run it, it typically opens a new browser window or tab, prompting us to log in to our Google account.
Follow the authentication steps to grant the necessary permissions to the Earth Engine Python API.
Once authentication succeeds, a verification code is provided. We might need to copy and paste this code back into the notebook or supply it in the authentication prompt. This step links our Python environment to our Google Earth Engine account, allowing us to access Earth Engine data and services.
Stage 3: Initialize Earth Engine
ee.Initialize()
In the Earth Engine Python API, the ee.Initialize() function initializes the client library. It must be called after authenticating with ee.Authenticate(), and it sets up the configurations needed to interact with Google Earth Engine services and establishes a connection with the Earth Engine servers.
Initializing the Earth Engine API also enables various features and functionalities provided by the Earth Engine services, such as querying and processing geospatial data, running analyses, and accessing Earth Engine’s extensive image and dataset collection.
Stage 4: Install earthengine-api azure-storage-blob
pip install earthengine-api azure-storage-blob
earthengine-api Package:
Allows us to interact with Google Earth Engine services using Python, and provides functionality for querying, analyzing, and visualizing geospatial data.
azure-storage-blob Package:
Enables interaction with Azure Blob Storage from Python. Provides tools for managing blobs (objects) in Azure Blob Storage, such as uploading, downloading, and deleting data. By installing these packages, we can leverage the capabilities of Google Earth Engine and Azure Blob Storage in our Python projects.
Stage 5: Define the Region of Interest (ROI)
roi = ee.Geometry.Rectangle([-0.510375, 51.286760, 0.334015, 51.691874])
This line of code defines a Region of Interest (ROI) using Google Earth Engine (EE).
ee.Geometry: It is a class in the Earth Engine API that represents geometric shapes, such as points, lines, and rectangles.
Rectangle([-0.510375, 51.286760, 0.334015, 51.691874]): It creates a rectangular geometry specified by its bounding box coordinates. The format of the coordinates is [west, south, east, north].
west: The western longitude coordinate of the bounding box.
south: The southern latitude coordinate of the bounding box.
east: The eastern longitude coordinate of the bounding box.
north: The northern latitude coordinate of the bounding box.
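As a quick sanity check on the [west, south, east, north] ordering, a small helper like the one below can be useful. This is plain Python, not part of the Earth Engine API, and the function name is our own invention:

```python
# Hypothetical helper to sanity-check a [west, south, east, north] bounding box.
def bbox_extent(bbox):
    west, south, east, north = bbox
    # Longitudes must lie in [-180, 180], latitudes in [-90, 90],
    # and west/south must be smaller than east/north.
    assert -180 <= west < east <= 180, "invalid longitudes"
    assert -90 <= south < north <= 90, "invalid latitudes"
    return east - west, north - south  # (width, height) in degrees

width, height = bbox_extent([-0.510375, 51.286760, 0.334015, 51.691874])
print(f"{width:.6f} x {height:.6f} degrees")
```

For the ROI above this reports an extent of roughly 0.84 by 0.41 degrees, which matches a box covering the Greater London area.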
Stage 6: Retrieve SAS Token from Databricks Secrets
sasToken = dbutils.secrets.get(scope="geenew1-secret", key="geeincoming")
This line of code retrieves a Shared Access Signature (SAS) token from Databricks secrets. Let’s break down the components:
dbutils.secrets: Databricks provides a utility called dbutils for managing secrets. Secrets are a secure way to store and manage sensitive information, such as API keys or tokens.
.get(scope="geenew1-secret", key="geeincoming"): This part of the code uses the get method to retrieve a secret value. It takes two parameters:
scope: The scope is like a namespace or a category for organizing secrets. In this case, the scope is set to "geenew1-secret".
key: The key is the specific identifier or name for the secret within the given scope. In this case, the key is set to "geeincoming".
So, dbutils.secrets.get(scope="geenew1-secret", key="geeincoming") fetches the secret value associated with the key "geeincoming" in the secret scope "geenew1-secret". The secret value, in this context, is a Shared Access Signature (SAS) token used for authentication with Azure Blob Storage.
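A SAS token is, structurally, just a URL query string. Purely for illustration, and using a fabricated token rather than a real credential, its fields can be inspected with the standard library:

```python
from urllib.parse import parse_qs

# A fabricated SAS token for illustration only -- not a real credential.
sas_token = "sv=2022-11-02&ss=b&srt=co&sp=rwc&se=2024-01-01T00:00:00Z&sig=FAKESIGNATURE"

fields = parse_qs(sas_token)
print(fields["sp"][0])   # sp: permissions granted (r = read, w = write, c = create)
print(fields["se"][0])   # se: expiry time of the token
```

Checking the sp (permissions) and se (expiry) fields is a quick way to diagnose "authorization failed" errors when an upload to Blob Storage is rejected.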
Stage 7: Specify Azure Storage Details
account_name = 'fromgee'
container_name = 'ndvi'
blob_name = 'ndvi_data10.tif'
These variables are defining the details needed to interact with Azure Blob Storage. Let’s break down each component:
account_name: This is the name of the Azure Storage account. In this case, it is set to 'fromgee'. The Azure Storage account is the top-level namespace for our storage resources.
container_name: This is the name of the Azure Blob Storage container. Containers are used to organize and manage sets of blobs. Here, it is set to 'ndvi'.
blob_name: This is the name of the specific blob within the container. A blob is a file of any type and size. In this case, it is set to 'ndvi_data10.tif', suggesting it might be a TIFF file containing NDVI (Normalized Difference Vegetation Index) data.
Together, these variables define the location and name of the blob where data will be uploaded or downloaded within the specified Azure Blob Storage account and container.
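These three names combine into the blob's full address. The endpoint pattern below is the standard one for Azure Blob Storage, shown here as plain string formatting:

```python
account_name = 'fromgee'
container_name = 'ndvi'
blob_name = 'ndvi_data10.tif'

# Blobs are addressed as https://<account>.blob.core.windows.net/<container>/<blob>
blob_url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"
print(blob_url)
```

This is the URL at which the uploaded blob will be reachable (given a valid credential or SAS token appended as a query string).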
Stage 8: Define MODIS NDVI Collection
modis_ndvi = ee.ImageCollection('MODIS/006/MOD13Q1') \
.filterBounds(roi) \
.filterDate('2022-01-01', '2022-12-31') \
.select('NDVI')
median_ndvi = modis_ndvi.median()
This code is using the Earth Engine Python API to interact with the MODIS (Moderate Resolution Imaging Spectroradiometer) NDVI (Normalized Difference Vegetation Index) dataset.
ee.ImageCollection('MODIS/006/MOD13Q1'): This line initializes an Earth Engine ImageCollection for the Collection 6 MODIS vegetation-index product (MOD13Q1). An ImageCollection is a stack or time series of images. (Note that Collection 6 has since been superseded by Collection 6.1, available in the catalog as 'MODIS/061/MOD13Q1'.)
.filterBounds(roi): It filters the ImageCollection to include only images that intersect the specified region of interest (roi). The roi was defined earlier using ee.Geometry.Rectangle.
.filterDate('2022-01-01', '2022-12-31'): It further filters the ImageCollection to include only images acquired between January 1, 2022, and December 31, 2022.
.select('NDVI'): It selects the 'NDVI' band from each image in the ImageCollection. NDVI is a common vegetation index used in remote sensing to assess the presence and health of vegetation.
median_ndvi = modis_ndvi.median(): It computes the median of the ImageCollection. This step reduces the stack of images into a single image, where each pixel value represents the median NDVI value over the specified time range and region.
After executing these steps, median_ndvi contains an Earth Engine Image representing the median NDVI values for the specified region and time period. This information can be further processed or visualized using Earth Engine capabilities.
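Conceptually, median() is a per-pixel reduction across the time series: for each pixel location, it takes the median of that pixel's values over all images in the stack. A toy illustration in plain Python, using three made-up 2x2 "images", makes this concrete:

```python
from statistics import median

# Three toy 2x2 "images" from a time series (values are made up).
stack = [
    [[0.2, 0.5], [0.4, 0.1]],
    [[0.3, 0.6], [0.5, 0.2]],
    [[0.9, 0.4], [0.6, 0.3]],
]

# Per-pixel median across the stack, mirroring ImageCollection.median().
rows, cols = 2, 2
median_image = [
    [median(img[r][c] for img in stack) for c in range(cols)]
    for r in range(rows)
]
print(median_image)  # [[0.3, 0.5], [0.5, 0.2]]
```

Because the median ignores extremes, this reduction suppresses outlier values (for example, the 0.9 at the top-left pixel, which could correspond to a cloud-contaminated observation in real NDVI data).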
Stage 9: Convert Earth Engine Image to NumPy Array
ndvi_array = median_ndvi.select('NDVI').sampleRectangle(region=roi).get('NDVI').getInfo()
median_ndvi.select('NDVI'): It selects the 'NDVI' band from the median_ndvi image. The median_ndvi image represents the median NDVI values over a specific region and time period.
.sampleRectangle(region=roi): It samples the pixel values of the selected band within the specified region of interest (roi). The result is an ee.Feature whose properties hold the sampled pixels, one 2-D array per band. Note that sampleRectangle is only suitable for relatively small regions; for large areas, an export task should be used instead.
.get('NDVI'): It extracts the 2-D array of sampled values stored in the 'NDVI' property of the feature obtained in the previous step.
.getInfo(): It retrieves the result from the Earth Engine server and converts it into a Python object, in this case a nested Python list with one inner list per row of pixels.
After executing these steps, ndvi_array contains the NDVI values sampled from the 'NDVI' band of the median_ndvi image within the specified region (roi). These values can be further used for analysis or visualization with Python libraries such as NumPy, or for integration with other workflows.
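One practical note: MOD13Q1 stores NDVI as scaled integers, with a documented scale factor of 0.0001, so the sampled values must be multiplied by 0.0001 to recover NDVI in the familiar -1 to 1 range. A small sketch of rescaling the nested list, with made-up raw values:

```python
# Raw MOD13Q1 NDVI values are scaled integers; multiply by the dataset's
# documented scale factor of 0.0001 to recover NDVI in the range -1..1.
# The raw values below are made up for illustration.
ndvi_array = [[7523, 8101], [6890, 7344]]

SCALE = 0.0001
ndvi_scaled = [[v * SCALE for v in row] for row in ndvi_array]
print(ndvi_scaled)
```

Whether to rescale before or after uploading is a design choice; storing the raw integers keeps the blob smaller, at the cost of requiring every consumer to know the scale factor.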
Stage 10: Connect to Azure Blob Storage
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient(account_url=f'https://{account_name}.blob.core.windows.net', credential=sasToken)
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(blob_name)
blob_service_client: It is an instance of the BlobServiceClient class, which is part of the Azure Storage Blob Python library. This client provides access to the Blob service in Azure Storage.
BlobServiceClient(account_url=f'https://{account_name}.blob.core.windows.net', credential=sasToken): It initializes the BlobServiceClient with the Azure Storage account URL and the Shared Access Signature (SAS) token retrieved from Databricks secrets (sasToken). The SAS token provides limited-time access to the Azure Storage resources without exposing the account key.
container_client: It is an instance of the ContainerClient class, obtained by calling get_container_client(container_name) on the blob_service_client. This client represents a specific Azure Storage container within the account.
blob_client: It is an instance of the BlobClient class, obtained by calling get_blob_client(blob_name) on the container_client. This client represents a specific blob (binary large object) within the container.
After executing these steps, blob_client is ready to perform various operations on the specified blob, such as uploading, downloading, or deleting the blob in Azure Storage. It acts as a client for interacting with the specific blob identified by blob_name within the specified container (container_name) and Azure Storage account (account_name).
Stage 11: Save NDVI Array as Binary NumPy File to Blob
import io
import numpy as np
from azure.storage.blob import ContentSettings

with io.BytesIO() as bio:
    np.save(bio, ndvi_array)
    bio.seek(0)
    blob_client.upload_blob(bio.read(), blob_type="BlockBlob", content_settings=ContentSettings(content_type='application/octet-stream'))
with io.BytesIO() as bio: It creates a BytesIO object (bio) that behaves like a file in memory. The with statement ensures the BytesIO object is properly closed after the block.
np.save(bio, ndvi_array): It uses NumPy's save function to write ndvi_array into the BytesIO object (bio) in NumPy's binary .npy format.
bio.seek(0): It resets the position of the BytesIO object to the beginning (offset 0). This is necessary because, after writing, the stream position sits at the end, and bio.read() would otherwise return no data.
blob_client.upload_blob(bio.read(), blob_type="BlockBlob", content_settings=ContentSettings(content_type='application/octet-stream')): It reads the binary data from the BytesIO object using bio.read() and uploads it to Azure Blob Storage via the upload_blob method of the blob_client. The blob_type parameter specifies a block blob, and content_settings sets the content type to 'application/octet-stream'. Note that, despite the '.tif' extension in blob_name, the uploaded bytes are in NumPy's .npy format rather than GeoTIFF; a name such as 'ndvi_data10.npy' would describe the contents more accurately.
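The seek(0) step is easy to forget. The round trip below uses only the standard library (with a placeholder payload instead of real NDVI data) to show why it matters: after writing, the stream position sits at the end of the buffer, so reading without rewinding returns nothing.

```python
import io

with io.BytesIO() as bio:
    bio.write(b"ndvi payload")
    # The position is now at the end of the buffer, so a read returns b"".
    print(bio.read())      # b''
    bio.seek(0)            # rewind to the beginning
    print(bio.read())      # b'ndvi payload'
```

The same behavior applies to the np.save / upload_blob sequence above: without the seek(0), an empty blob would be uploaded.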
Stage 12: Print Confirmation Message
print(f"NDVI data saved to Azure Blob Storage: {blob_name}")
The line print(f"NDVI data saved to Azure Blob Storage: {blob_name}") outputs a message indicating that the NDVI data has been successfully saved to Azure Blob Storage, including the name of the blob where the data was stored (blob_name).