8. Significance of ArcGIS Online in the Data Ingestion process

Sometimes, errors in a data ingestion job run are caused by settings in ArcGIS Online (AGOL). To illustrate, consider the job below: in the first run both tasks failed, while in the subsequent run the first task succeeded.

Failed job run

In a data ingestion run (for NS), there are two tasks:

  • ingestion run: this loads data from AGOL to the NS Azure Data Lake Storage Gen2.

  • bronze run: this extracts the data from Azure Data Lake Storage Gen2 to the bronze folders. The bronze folders represent the bronze stage, where the data is still raw, exactly as it came from AGOL without any modifications.

In the first job run, the first task failed, so the second task did not run at all.
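The dependency between the two tasks can be sketched in Python as follows. This is a minimal illustration, not the actual job definition: the names run_job, run_ingestion and run_bronze are hypothetical, and the orchestrator used for NS may express the dependency differently.

```python
from typing import Callable, Dict


def run_job(tasks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run tasks in order; once one fails, mark the remaining ones as skipped."""
    statuses: Dict[str, str] = {}
    upstream_failed = False
    for name, task in tasks.items():
        if upstream_failed:
            # A downstream task never starts when an upstream task has failed.
            statuses[name] = "skipped"
            continue
        try:
            task()
            statuses[name] = "succeeded"
        except Exception:
            statuses[name] = "failed"
            upstream_failed = True
    return statuses


def run_ingestion() -> None:
    # Hypothetical stand-in for loading data from AGOL into the data lake.
    raise RuntimeError("Error code: 400")


def run_bronze() -> None:
    # Hypothetical stand-in for extracting landed data into the bronze folders.
    pass


result = run_job({"ingestion": run_ingestion, "bronze": run_bronze})
print(result)
```

In this sketch the bronze task is reported as skipped, which mirrors what the failed job run shows: the second task never starts.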

Failed tasks

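If you click on the first task, labelled survey123_gem_soil_respiration_ingestion_landing run, you will see Error code: 400. Read as an HTTP status code, 400 means Bad Request: the server rejected the request as invalid, which points to a problem with the request or the configuration of the requested resource rather than a server outage.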

Error

If you go back one step and open the second task of the pipeline, labelled survey123_gem_soil_respiration_landing_bronze run, you will see that it never started because the upstream task failed. Although not always the case, Error code: 400 is associated with settings that have not been enabled in ArcGIS Online.
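When debugging, it helps to know that the ArcGIS REST API typically reports failures in the JSON body of the response, under an error key with code, message and details fields. A minimal sketch of a check for such a payload follows. The helper name extract_agol_error and the sample payload are illustrative assumptions, not taken from the NS pipeline.

```python
from typing import Any, Dict, Optional


def extract_agol_error(payload: Dict[str, Any]) -> Optional[str]:
    """Return a one-line summary if the payload carries an error, else None."""
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    code = error.get("code")
    message = error.get("message", "")
    # "details" may be missing, None or an empty list.
    details = "; ".join(str(d) for d in error.get("details") or [])
    summary = f"Error code: {code} - {message}"
    return f"{summary} ({details})" if details else summary


# Made-up payload for illustration; a real message will differ.
sample = {"error": {"code": 400, "message": "Invalid request", "details": []}}
summary = extract_agol_error(sample)
print(summary)
```

Logging the message and details alongside the code usually narrows down which setting or parameter the service objected to.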

Error

8.1. Optimizing AGOL settings for data ingestion

To avoid such frustrating errors, we will use the example of the Gem Soil Respiration form, where at least the first task (ingestion run) was successful. Its AGOL settings serve as a template for the AGOL settings of all other forms.

Step 1: Signing in to AGOL

Sign in to your AGOL account using your provided NS credentials.

Step 2: Locating the feature layer

Go to the Content > My Groups tab in AGOL. You can only access the forms that belong to the groups you have been placed in. Search for ‘GEM Soil Respiration’ and the Feature layer, Web map and Form links should appear.

Gem Soil Respiration

Step 3: Settings

Click on the Gem Soil Respiration Feature layer link and go to the Settings tab.

Under the General tab, ensure your settings match those in the image below.

General settings

Under the Feature layer (hosted) tab, ensure the settings are as below.

Feature settings

For the Manage indexes subsection, the settings should match those shown below.

Manage indexes

Leave the Field indexes subsection untouched.
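Because the Gem Soil Respiration settings act as a template, the comparison against another form can be written down as a simple dictionary diff. The sketch below assumes the settings have been transcribed by hand from the Settings tab into dictionaries. The setting names and values are placeholders, not real AGOL setting names.

```python
from typing import Any, Dict, List, Tuple


def diff_settings(
    template: Dict[str, Any], candidate: Dict[str, Any]
) -> List[Tuple[str, Any, Any]]:
    """Return (name, expected, actual) for each template setting that differs."""
    mismatches = []
    for name, expected in template.items():
        actual = candidate.get(name)  # None when the setting was not recorded
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches


# Placeholder names and values, for illustration only.
template_form = {"setting_a": True, "setting_b": True}
other_form = {"setting_a": True, "setting_b": False}

mismatches = diff_settings(template_form, other_form)
print(mismatches)
```

An empty list means the other form matches the template on every setting that was recorded.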

After these settings were checked and the Gem Soil Respiration pipeline was rerun, the first task (ingestion run) succeeded.

Task successful

The error that caused the second task (bronze landing run) to fail is related to validation checks. More often than not, the ValueError: Encounterd error … indicates that some Great Expectations (GX) expectations were not met.

Task unsuccessful

The Great Expectations webpage suggests that multiple values per record in some of the subtables are the likely cause of the pipeline breaking.

Cause of error

Experience has shown that values should not be listed for columns that accept multiple choices. For example, in the gx yml files, the column with the schema name reasons_sp1 and its corresponding values have been commented out under the columns_mapped_values key. Had it been a Survey123 (S123) field that accepts only one answer rather than several, the pipeline would have worked.

Comment out fields with multiple values
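One plausible explanation, stated here as an assumption rather than something confirmed in the NS pipeline: Survey123 stores the answers to a multiple-choice question as a single comma-separated string, so a check that compares the whole cell against a list of allowed values fails whenever more than one option is selected. The sketch below reproduces that logic in plain Python without calling Great Expectations. The allowed values and records are made up.

```python
from typing import List, Set


def whole_value_check(values: List[str], allowed: Set[str]) -> List[str]:
    """Return the values that are not, as a whole string, in the allowed set."""
    return [v for v in values if v not in allowed]


def per_choice_check(values: List[str], allowed: Set[str]) -> List[str]:
    """Split each value on commas and return the individual choices not allowed."""
    bad = []
    for v in values:
        for choice in v.split(","):
            if choice.strip() not in allowed:
                bad.append(choice.strip())
    return bad


allowed = {"dry", "wet", "flooded"}              # hypothetical mapped values
reasons_sp1 = ["dry", "wet,flooded", "dry,wet"]  # hypothetical records

failed_whole = whole_value_check(reasons_sp1, allowed)
failed_split = per_choice_check(reasons_sp1, allowed)
print(failed_whole)
print(failed_split)
```

The whole-value check rejects every record with more than one choice even though each individual choice is valid, which is consistent with the pipeline passing once the mapped values for reasons_sp1 are commented out.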