AGOL Data Exploration and Storage to Azure Blob

Stage 1: Connect to ArcGIS Online (AGOL)

1.1 Retrieve ArcGIS Online Credentials from Databricks Secrets

agolUsername = dbutils.secrets.get(scope="agolnew-secret", key="username")
agolPassword = dbutils.secrets.get(scope="agolnew-secret", key="password")

The dbutils.secrets.get method is used to retrieve sensitive information (in this case, the AGOL username and password) stored in Databricks secrets.

The scope parameter specifies the secret scope where the secrets are stored.

The key parameter specifies the name of the secret to retrieve.
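When several secrets live in the same scope, a small helper keeps the scope name in one place. The sketch below uses a hypothetical function, get_secrets, which is not part of Databricks. It relies only on dbutils.secrets.get(scope=..., key=...). dbutils is passed in as a parameter so the function can also be exercised outside a notebook with a stand-in object.

```python
def get_secrets(dbutils, scope, keys):
    """Return {key: secret_value} for each key in one Databricks secret scope.

    `dbutils` is the object Databricks provides in notebooks. It is a parameter
    (rather than a global) so the helper can be tested with a stand-in.
    """
    return {key: dbutils.secrets.get(scope=scope, key=key) for key in keys}


# Usage inside a Databricks notebook:
# creds = get_secrets(dbutils, "agolnew-secret", ["username", "password"])
# agolUsername, agolPassword = creds["username"], creds["password"]
```

Databricks redacts secret values if they are printed in notebook output, so the returned values can be passed to other calls but not usefully inspected.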

1.2 Authenticate with ArcGIS Online

from arcgis.gis import GIS
gis = GIS("https://www.arcgis.com", username=agolUsername, password=agolPassword)

The GIS class from the ArcGIS API for Python is used to create a connection to the ArcGIS Online service.

The constructor of the GIS class takes parameters such as the URL of the ArcGIS Online portal (“https://www.arcgis.com”), the username (agolUsername), and the password (agolPassword).

The gis object represents the authenticated connection to the ArcGIS Online service.
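A quick sanity check right after connecting confirms that the session is really signed in rather than anonymous, for example if the secrets came back empty. This is a minimal sketch. current_username is a hypothetical helper. It assumes only that gis.users.me is None for an anonymous connection and otherwise exposes a username.

```python
def current_username(gis):
    """Return the signed-in user's name, or raise if the session is anonymous.

    For an anonymous connection gis.users.me is None, so this guards against
    silently continuing without credentials.
    """
    me = gis.users.me
    if me is None:
        raise RuntimeError("Not signed in to ArcGIS Online (anonymous session)")
    return me.username


# Usage, after creating the GIS object:
# print("Signed in as", current_username(gis))
```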

Stage 2: Fetch Spatial Data from AGOL

2.1 Specify Desired Layer Details

layer_ids = "a6e7755a5df2476e987c0f5552bf85d6"
layer_name = "demo_site_boundary_layer"

The layer_ids variable holds the item ID of a feature layer item in ArcGIS Online (AGOL). Despite its plural name, it is a single ID. The layer_name variable is a human-readable name for the layer. It is only a label and is not used to look the layer up.
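ArcGIS Online item IDs are, in practice, 32-character hexadecimal strings (a GUID without dashes). That makes it easy to catch a common mistake such as pasting the layer name or a full URL instead of the ID. The check below is a hypothetical helper (is_item_id) based on that observed format, not an official validation rule.

```python
import re

# Observed format of ArcGIS Online item IDs: 32 hexadecimal characters.
_ITEM_ID = re.compile(r"[0-9a-f]{32}")


def is_item_id(value):
    """Return True if `value` looks like an ArcGIS Online item ID."""
    return bool(_ITEM_ID.fullmatch(value.strip().lower()))


# Usage:
# assert is_item_id(layer_ids), f"{layer_ids!r} does not look like an item ID"
```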

2.2 Fetch Spatial Data

feature_layer = gis.content.get(layer_ids)
flayer = feature_layer.layers[0]

The gis.content.get() method retrieves the ArcGIS Online item whose ID is stored in layer_ids.

The resulting feature_layer object is an Item. It carries the item's metadata (title, owner, type, and so on) and gives access to the layers the item contains. feature_layer.layers[0] selects the first layer. This assumes either that the item contains only one layer or that the first layer is the one you need.
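Indexing layers[0] fails with an unhelpful error if the ID is wrong or the item has no layers. The following is a defensive sketch. first_layer is a hypothetical helper. It assumes that gis.content.get may return None when an item is missing or not shared with you, and that the item exposes a layers list.

```python
def first_layer(item, item_id):
    """Return the first layer of an ArcGIS Online item, with clear error messages.

    `item` is whatever gis.content.get(item_id) returned. `item_id` is only
    used to make the error messages specific.
    """
    if item is None:
        raise LookupError(f"No item found (or no access) for id {item_id!r}")
    layers = getattr(item, "layers", None) or []
    if not layers:
        item_type = getattr(item, "type", "unknown")
        raise ValueError(f"Item {item_id!r} has no layers (item type: {item_type})")
    if len(layers) > 1:
        print(f"Note: item {item_id!r} has {len(layers)} layers; using layers[0]")
    return layers[0]


# Usage:
# flayer = first_layer(gis.content.get(layer_ids), layer_ids)
```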

2.3 Query and Retrieve Spatial Data as a Spatial DataFrame (SDF)

sdf = flayer.query(out_sr=4326).sdf

Calling the query method on the feature layer (flayer) without a where clause returns all of its features, because the default filter is where='1=1'.

The out_sr=4326 parameter specifies the spatial reference in which the data should be returned. In this case, it is set to WGS 84 (EPSG:4326), a common spatial reference for geographic data in decimal degrees.

The query returns a FeatureSet. Its sdf property converts the result into a Spatially Enabled DataFrame (SDF), which is a pandas DataFrame that stores geometries in a SHAPE column and exposes spatial operations through the .spatial accessor. The sdf property is specific to the ArcGIS API for Python.
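The filter, field list and output spatial reference are all arguments to the same query call, so they can be wrapped together. The sketch below is illustrative. layer_to_sdf is a hypothetical name. It passes only the standard where, out_fields and out_sr arguments of FeatureLayer.query and then reads the FeatureSet's sdf property.

```python
def layer_to_sdf(layer, where="1=1", out_fields="*", wkid=4326):
    """Query a FeatureLayer and return the result as a Spatially Enabled DataFrame.

    where      - SQL filter; '1=1' matches every feature.
    out_fields - comma-separated field names, or '*' for all fields.
    wkid       - spatial reference for the returned geometries (4326 = WGS 84).
    """
    feature_set = layer.query(where=where, out_fields=out_fields, out_sr=wkid)
    return feature_set.sdf


# Usage, equivalent to the line above:
# sdf = layer_to_sdf(flayer)
```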

Stage 3: Display Spatial Data on Databricks Notebook

display(sdf)

This line passes sdf to Databricks' built-in display function, which renders the DataFrame as an interactive table in the notebook output. It shows the data in tabular form and does not draw the geometries on a map.

Stage 4: Explore and Print Attributes of Features in AGOL Layers

4.1 Print Column Names

print(sdf.columns)

This line prints the column names of the Spatial DataFrame (sdf). The columns attribute of a Pandas DataFrame contains the names of all the columns.

4.2 Print the Shape of the DataFrame

print(f"Shape of DataFrame: {sdf.shape}")

This line prints the shape of the Spatial DataFrame (sdf). The shape attribute of a Pandas DataFrame returns a tuple representing the dimensions of the DataFrame (number of rows, number of columns).

4.3 Print the First Few Records

print("First few records:")
print(sdf.head())

The first print statement is a label indicating that the following output is the first few records of the DataFrame.

The second print statement calls the head() method, which returns the first five rows by default. This is useful for quickly inspecting the Spatial DataFrame (sdf) without printing all of it.
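The three checks in this stage can be bundled into one helper for reuse on other layers. describe_frame is a hypothetical name. It relies only on the columns, shape and head() members that any pandas DataFrame provides, including a Spatially Enabled DataFrame.

```python
def describe_frame(df, n=5):
    """Print the column names, shape and first `n` rows of a DataFrame.

    Returns (rows, columns) so the caller can also use the counts.
    """
    rows, cols = df.shape
    print("Columns:", list(df.columns))
    print(f"Shape of DataFrame: {df.shape} ({rows} rows, {cols} columns)")
    print(f"First {n} records:")
    print(df.head(n))
    return rows, cols


# Usage:
# describe_frame(sdf)
```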

Stage 5: Retrieve and Explore Feature Services Owned by a User on ArcGIS Online

5.1 Search for Feature Services Owned by a Specific User

search_results = gis.content.search('owner:rpitman_naturalstate')

The gis.content.search() method searches ArcGIS Online for content that matches a query. Here, the query returns content owned by the user 'rpitman_naturalstate'. By default, search returns at most 10 items, so pass a larger max_items value if the user owns more content than that.

5.2 Iterate Through Search Results

for one_search in search_results:

A for loop iterates through the search results obtained in step 5.1. Each one_search is one item from those results.

5.3 Check if the Item is a Feature Service

if one_search.type == 'Feature Service':

Inside the loop, an if statement checks whether the current item's type (one_search.type) is 'Feature Service'. This ensures that only feature services are processed further.

5.4 Access Feature Service Item and Layers

feature_service_item = one_search
feature_layers = feature_service_item.layers

If the current item is indeed a Feature Service, its reference is stored in feature_service_item, and the layers within the feature service are obtained and stored in feature_layers.

5.5 Iterate Through Layers and Query Data

for layer in feature_layers:
    results = layer.query(where='1=1')  # '1=1' matches every feature
    for data in results.features:
        print(data.attributes)  # dict of field name -> value for this feature

Another for loop iterates through each layer in the feature service (feature_layers). For each layer, layer.query(where='1=1') retrieves all features, because the condition 1=1 is true for every row.

The returned features are then iterated through (for data in results.features), and the attributes of each feature are printed using print(data.attributes).
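Steps 5.2 to 5.5 are shown above as separate fragments of one nested loop. Assembled into a single function, they might look like the sketch below. collect_feature_service_attributes is a hypothetical name. It collects the attributes instead of printing them, keyed by item title and layer name, and assumes the items expose type, title and layers, and each layer exposes properties.name.

```python
def collect_feature_service_attributes(items):
    """Return {(item title, layer name): [attribute dicts]} for Feature Service items.

    Non-Feature-Service items (web maps, files, ...) are skipped. Each layer is
    queried with where='1=1' so that every feature is returned.
    """
    collected = {}
    for item in items:
        if item.type != "Feature Service":
            continue
        for layer in item.layers:
            result = layer.query(where="1=1")
            key = (item.title, layer.properties.name)
            collected[key] = [feature.attributes for feature in result.features]
    return collected


# Usage (max_items raised above the default of 10):
# search_results = gis.content.search('owner:rpitman_naturalstate', max_items=100)
# for (title, layer_name), rows in collect_feature_service_attributes(search_results).items():
#     print(title, layer_name, len(rows), "features")
```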

5.6 Get Current Timestamp

import datetime
timestamp = datetime.datetime.now().strftime("%Y_%m_%d__%H_%M_%S")

The code uses the datetime.datetime.now() function to get the current date and time. The result is formatted into a string using strftime with the specified format (“%Y_%m_%d__%H_%M_%S”). This timestamp can be used for various purposes, such as creating unique filenames or tracking when the script was executed.
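One common use of such a timestamp is a unique output filename, so repeated runs do not overwrite each other. The sketch below uses a hypothetical helper, timestamped_filename, with the same strftime pattern as above. The now parameter lets a fixed time be injected for testing.

```python
import datetime


def timestamped_filename(prefix, extension="csv", now=None):
    """Build a filename such as 'prefix_2024_01_31__14_05_09.csv'."""
    if now is None:
        now = datetime.datetime.now()
    return f"{prefix}_{now.strftime('%Y_%m_%d__%H_%M_%S')}.{extension}"


# Usage, for example in place of a fixed CSV name:
# csv_filename = timestamped_filename(layer_name)
```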

Stage 6: Query AGOL Items Owned by the Authenticated User

6.1 Define the Query String

query_string = f"owner:{gis.users.me.username}"

This line constructs a query string for searching content in ArcGIS Online. The query filters on the owner field and uses the username of the currently authenticated user (gis.users.me.username). The f"owner:{gis.users.me.username}" syntax is an f-string that embeds that username in the query.

6.2 Search ArcGIS Online for the User's Items

user_items = gis.content.search(query=query_string, max_items=-1)

The gis.content.search() method is used to search for content in ArcGIS Online based on the specified query string (query_string). The result is stored in the user_items variable.

The max_items=-1 parameter is used to indicate that there is no limit on the number of items returned. This ensures that all items matching the query are retrieved.

6.3 Print the Number of Items Returned

print(len(user_items), 'items returned belonging to the current user')

This line prints the number of items returned by the content search, indicating how many items in ArcGIS Online belong to the currently authenticated user.
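A single count mixes feature services, web maps, files and other content. A per-type breakdown is often more informative. The following is a minimal sketch using collections.Counter. count_items_by_type is a hypothetical helper that assumes each item exposes a type attribute, as ArcGIS Online items do.

```python
from collections import Counter


def count_items_by_type(items):
    """Tally search results by item type, e.g. {'Feature Service': 3, 'Web Map': 1}."""
    return Counter(item.type for item in items)


# Usage:
# for item_type, n in count_items_by_type(user_items).most_common():
#     print(f"{n:4d}  {item_type}")
```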

Stage 7: Specify Feature Layer IDs and Fetch Feature Layer Information

7.1 Specify the Layer IDs and Layer Name

layer_ids = "a6e7755a5df2476e987c0f5552bf85d6"
layer_name = "demo_site_boundary_layer"

In this step, the code sets layer_ids to the item ID of the feature layer item in ArcGIS Online (AGOL), which is the same ID used in step 2.1. It also sets layer_name to a human-readable name for the layer.

7.2 Get the Feature Layer from ArcGIS Online (AGOL)

feature_layer = gis.content.get(layer_ids)

The gis.content.get() method retrieves the item whose ID is stored in layer_ids. The gis object represents the connection to ArcGIS Online, and its content property (a ContentManager) is used to search for and retrieve items such as hosted feature layers.

Stage 8: Fetch Spatial Data from Feature Layer and Convert to DataFrame

8.1 Fetch the First Feature Layer

flayer = feature_layer.layers[0]

This line fetches the first layer from the item stored in feature_layer. For a feature service item, each entry in layers is a FeatureLayer, which the ArcGIS API for Python uses to represent a layer of features in a web GIS.

8.2 Convert Feature Layer to Pandas DataFrame

sdf = pd.DataFrame.spatial.from_layer(flayer)

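This line converts the feature layer (flayer) into a Spatially Enabled DataFrame (sdf) with pd.DataFrame.spatial.from_layer. It assumes pandas has been imported as pd and that the ArcGIS API for Python's GeoAccessor has been imported (for example, from arcgis.features import GeoAccessor). That import registers the spatial namespace on pandas DataFrames.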

8.3 Specify Desired Coordinate Reference System (CRS)

desired_crs = {'wkid': 4326}

This line specifies the desired Coordinate Reference System (CRS) for the data. In this case, the CRS is specified using the wkid (Well-Known ID), and the value 4326 corresponds to the WGS 84 coordinate system, which is commonly used for geographic data in decimal degrees.

8.4 Project the Data to the Desired CRS

sdf = flayer.query(out_sr=desired_crs['wkid']).sdf

This line does not transform the DataFrame from step 8.2 in place. Instead, it queries the feature layer again and asks the server to return the geometries in the desired CRS (out_sr=desired_crs['wkid'], which is 4326). The sdf property then converts the returned FeatureSet into a new Spatially Enabled DataFrame, which replaces the previous sdf.
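What the server does for out_sr=4326 depends on how the layer is stored. Hosted layers are often, but not always, stored in Web Mercator (WKID 3857/102100), where coordinates are in meters. For spherical Web Mercator with radius R = 6378137 m, the conversion is lon = x / R and lat = 2·atan(exp(y / R)) − π/2 (in radians). The forward direction is x = R·lon and y = R·ln(tan(π/4 + lat/2)). The code below only illustrates these formulas under the Web Mercator assumption. Server-side projection via out_sr remains the right tool, since it handles any source CRS and datum.

```python
import math

# Sphere radius used by Web Mercator (WKID 3857 / 102100), in meters.
EARTH_RADIUS_M = 6378137.0


def web_mercator_to_wgs84(x, y):
    """Convert Web Mercator (x, y) in meters to WGS 84 (lon, lat) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def wgs84_to_web_mercator(lon, lat):
    """Forward Web Mercator projection: WGS 84 degrees to meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y
```

Longitude 180° maps to x = π·R ≈ 20,037,508 m, which is the familiar edge of the Web Mercator extent.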

8.5 Check the Shape of the Resulting DataFrame

sdf.shape

This expression returns the shape of the resulting DataFrame (sdf) as a tuple in the form (number_of_rows, number_of_columns). Because it is the last expression in the notebook cell, the notebook displays its value without an explicit print call.

Stage 9: Save DataFrame to Azure Blob Storage

9.1 Retrieve Azure Blob Storage Account Credentials

sasTokenIncoming = dbutils.secrets.get(scope="geenew1-secret", key="geeincoming")

This line retrieves the Shared Access Signature (SAS) token for Azure Blob Storage from Databricks secrets. A SAS token grants limited, time-bound access to Azure Storage resources without exposing the storage account key.
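Because a SAS token expires, an upload that worked last month can start failing with an authorization error. A SAS token is a URL query string whose se parameter holds the expiry time, so the expiry can be read with the standard library before any upload. sas_expiry is a hypothetical helper. It assumes se is present and is in ISO 8601 form such as 2025-06-30T23:59:59Z.

```python
import datetime
from urllib.parse import parse_qs


def sas_expiry(sas_token):
    """Return the expiry ('se' parameter) of an Azure SAS token as a datetime.

    Accepts the token with or without a leading '?'. Percent-encoded values
    are decoded by parse_qs. Returns None if the token has no 'se' parameter.
    """
    params = parse_qs(sas_token.lstrip("?"))
    values = params.get("se")
    if not values:
        return None
    # A trailing 'Z' means UTC. Rewrite it as +00:00 for older Python versions.
    return datetime.datetime.fromisoformat(values[0].replace("Z", "+00:00"))


# Usage:
# expiry = sas_expiry(sasTokenIncoming)
# if expiry and expiry < datetime.datetime.now(datetime.timezone.utc):
#     raise RuntimeError(f"SAS token expired at {expiry}")
```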

9.2 Define Azure Blob Storage Account and Container

account_name = "fromgee"
container_name = "ndvi"

These lines set the Azure Storage account name (account_name) and the name of the container (container_name) where we want to upload our data.

9.3 Connect to Azure Blob Storage

from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient(account_url=f'https://{account_name}.blob.core.windows.net', credential=sasTokenIncoming)
container_client = blob_service_client.get_container_client(container_name)

Here, a client for Azure Blob Storage is created with BlobServiceClient from the azure-storage-blob library. The account_url is built from the storage account name, and the SAS token is passed separately as the credential. Then, get_container_client returns a ContainerClient for the specified container.
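The URL construction and client creation can be kept in one function so the account and container names are configured in a single place. The following is a sketch. get_container_client is a hypothetical helper, and the client_cls parameter is an illustrative addition that defaults to azure.storage.blob.BlobServiceClient and lets a stand-in be used when Azure is not available.

```python
def get_container_client(account_name, container_name, sas_token, client_cls=None):
    """Create a ContainerClient for https://<account_name>.blob.core.windows.net.

    client_cls defaults to azure.storage.blob.BlobServiceClient. It is imported
    lazily so the module can be loaded without the Azure SDK installed.
    """
    if client_cls is None:
        from azure.storage.blob import BlobServiceClient
        client_cls = BlobServiceClient
    account_url = f"https://{account_name}.blob.core.windows.net"
    service = client_cls(account_url=account_url, credential=sas_token)
    return service.get_container_client(container_name)


# Usage:
# container_client = get_container_client("fromgee", "ndvi", sasTokenIncoming)
```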

9.4 Convert DataFrame to CSV String

csv_string = sdf.to_csv(index=False)

This line converts the DataFrame (sdf) to a CSV string using the to_csv method. Because no file path is given, to_csv returns the CSV text instead of writing a file. The index=False parameter ensures that the DataFrame index is not included in the CSV output.

9.5 Specify CSV Filename

csv_filename = "check3.csv"

This line sets the desired filename for the CSV file that will be uploaded to Azure Blob Storage.

9.6 Upload CSV String to Azure Blob Storage

blob_client = container_client.get_blob_client(blob=csv_filename)
blob_client.upload_blob(csv_string, overwrite=True)

This code gets the BlobClient for the specified CSV filename within the container. It then uploads the CSV string to Azure Blob Storage using the upload_blob method. The overwrite=True parameter ensures that if a file with the same name already exists, it will be overwritten.
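The upload step can also confirm what was stored. The sketch below is one way to do that. upload_csv is a hypothetical helper that encodes the text as UTF-8 explicitly, uploads it with upload_blob, and compares the byte count with the size reported by get_blob_properties().

```python
def upload_csv(container_client, blob_name, csv_text, overwrite=True):
    """Upload CSV text to a blob and return the stored size in bytes.

    Encoding to UTF-8 up front makes the local byte count directly comparable
    with the size Azure reports for the blob.
    """
    data = csv_text.encode("utf-8")
    blob_client = container_client.get_blob_client(blob=blob_name)
    blob_client.upload_blob(data, overwrite=overwrite)
    stored_size = blob_client.get_blob_properties().size
    if stored_size != len(data):
        raise RuntimeError(f"Uploaded {len(data)} bytes but the blob reports {stored_size}")
    return stored_size


# Usage:
# size = upload_csv(container_client, csv_filename, csv_string)
# print(f"CSV file '{csv_filename}' uploaded ({size} bytes).")
```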

9.7 Print Upload Confirmation

print(f"CSV file '{csv_filename}' uploaded to Azure Blob Storage.")

This line prints a confirmation message. Because upload_blob raises an exception if the upload fails, the message is only printed after a successful upload.