Import data to file geodatabases

ArcPy

This next part of the workflow is run in Python v3.7 using functions in the arcpy v3.1 package. In order to use arcpy you will need to have a valid ArcGIS Pro or ArcGIS Desktop license. See here and here for instructions on how to get arcpy working in your Python IDE (in this case, PyCharm). This part of the workflow is used to import and process layers that are in the GEE_exports folder mentioned in the preceding step. The code for this step is found here.

Log files: we use the Python logging module to write information, warning, and error messages to log files. Each script generates its own log file, with a suffix matching the script file prefix (e.g., logfile_01.log is created when you run 01_create_gdbs.py). The log files are written to the directory containing the Python script. They also record the processing time for each layer.
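As a minimal sketch of how a per-script log file like this could be set up with the standard logging module: the helper name make_logger and the message format are assumptions for illustration, not the repository's actual code. The sketch derives the log file name from the script prefix and writes it beside the script.

```python
import logging
from pathlib import Path


def make_logger(script_path):
    """Return a logger that writes to logfile_<prefix>.log beside the script.

    Hypothetical helper: the name and message format are illustrative only.
    """
    script = Path(script_path).resolve()
    prefix = script.stem.split("_")[0]  # "01_create_gdbs" -> "01"
    log_file = script.parent / f"logfile_{prefix}.log"

    logger = logging.getLogger(script.stem)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on interactive re-runs
        handler = logging.FileHandler(log_file, mode="a", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Inside a script you would call something like make_logger(__file__) once at the top and then use logger.info(...), logger.warning(...), and logger.error(...) as usual.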

ArcPro project

Before running the scripts, you will need to create an ArcGIS Pro project called agol-data-workflows (which corresponds to the GitHub repo name) within the directory of the forked GitHub repo. The project will house the various geodatabases and output files. After creating the project, run this code to create the file geodatabases (GDBs) and output folders needed for the workflow. The GDB names are taken from the agol_layers_metadata.xlsx file, which needs to be in the project root directory (the default location from the forked repo). If you add new layers, specify them in the metadata file and then re-run the code to create the new GDBs.

Directories

The next step is to modify the agol_dirs.py module. Unfortunately, many of the arcpy functions simply don’t work with relative file and folder paths (especially when running the code interactively). It’s infinitely frustrating, so the only safe option is to define your own absolute file/folder paths in agol_dirs.py. This module is imported in the subsequent processing steps, so you only have to do this once and can easily make changes if you move files/folders around on your local machine. See below for a short description of the paths:

  • proj_dir: directory of the ArcGIS Pro project.
  • metadata_dir: metadata Excel file path.
  • gee_dir: directory that contains the remote sensing layer downloads generated from the GEE scripts.
  • api_input_dir: directory where files are stored prior to upload into AGOL in the next processing step (see below).
  • move_dir: directory within gee_dir for files that have been imported into GDBs (described above).
  • clip_boundary_name: name of the AOI used in the GEE code when importing the AOI FeatureCollection.
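An illustrative sketch of what agol_dirs.py might contain is shown here. The variable names follow the list above, but every value is a placeholder: the folder names api_input and imported, the AOI name my_aoi, and the location of GEE_exports under the project folder are assumptions, so replace them with your own locations. The paths are built with pathlib and stored as plain strings, on the assumption that the arcpy tools are given string paths.

```python
import os
from pathlib import Path

# Placeholder root: replace with the absolute path to your own project folder.
_root = Path.home() / "agol-data-workflows"

proj_dir = str(_root)                                    # ArcGIS Pro project
metadata_dir = str(_root / "agol_layers_metadata.xlsx")  # metadata Excel file
gee_dir = str(_root / "GEE_exports")                     # assumed location
api_input_dir = str(_root / "api_input")                 # hypothetical name
move_dir = str(Path(gee_dir) / "imported")               # hypothetical name
clip_boundary_name = "my_aoi"                            # hypothetical AOI name

# Fail early on relative paths, which this module exists to avoid.
for _p in (proj_dir, metadata_dir, gee_dir, api_input_dir, move_dir):
    if not os.path.isabs(_p):
        raise ValueError(f"Expected an absolute path, got: {_p}")
```

The check at the end is optional; it only surfaces a mistyped relative path at import time rather than partway through a long processing run.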

Create GDBs

The next step is to create the local GDBs that house the corresponding layers. These are defined in the metadata sheet. Run the 01_create_gdbs.py script to do this.

There is a line of code toward the end of the script that imports the AOI asset into the Boundaries GDB. You will need to specify the path of where this shapefile is stored.
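The GDB creation described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the contents of 01_create_gdbs.py: arcpy.management.CreateFileGDB is the standard geoprocessing tool for creating a file geodatabase, but the helper names missing_gdbs and create_gdbs are hypothetical, and the step that reads GDB names from the metadata sheet is left out. arcpy is imported inside the function so the path logic can be checked on a machine without an ArcGIS license.

```python
import os


def missing_gdbs(proj_dir, gdb_names):
    """Return (out_name, path) pairs for GDBs not yet present in proj_dir."""
    pairs = []
    for name in gdb_names:
        out_name = name if name.lower().endswith(".gdb") else name + ".gdb"
        path = os.path.join(proj_dir, out_name)
        if not os.path.isdir(path):  # a file GDB is a folder on disk
            pairs.append((out_name, path))
    return pairs


def create_gdbs(proj_dir, gdb_names):
    """Create each missing file GDB; requires a licensed arcpy install."""
    import arcpy  # imported lazily so missing_gdbs() works without ArcGIS

    for out_name, _path in missing_gdbs(proj_dir, gdb_names):
        arcpy.management.CreateFileGDB(proj_dir, out_name)
```

Skipping GDBs that already exist makes the step safe to re-run after new layers are added to the metadata file.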

Import data into GDBs

Once you have created the GDBs and set the file paths and directories you can proceed to running the 02_ee_to_gdb.py script. This is the standard script that is used for importing the majority of the layers. Look at the Script name (import to GDB) field in the metadata table to see which code you need to run to import a particular layer. The scripts for the custom layers are located here and are identified by their unique RS layer ID.

You can specify which layers you want to import in two ways. First, you can modify the following code at the beginning of 02_ee_to_gdb.py to process a sequential set of layers:

start, end = 13, 16
rs_layer_list = ["RS_{id:03d}".format(id=i) for i in range(start, end + 1)]

The start and end values are the numbers of the first and last layers you want to process. Because the range is built with end + 1, the end layer is included: setting start and end to 13 and 16 respectively will process layers RS_013, RS_014, RS_015 and RS_016.

Second, you can specify a combination of sequential and non-sequential layers. If you want to process RS_001 to RS_003, RS_005, and RS_023 to RS_031, then the layer_seq variable would look like this:

layer_seq = [*range(1, 3+1), 5, *range(23, 31+1)]
rs_layer_list = ["RS_{id:03d}".format(id=i) for i in layer_seq]
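Before starting a long run, it can help to check which layer IDs a selection actually expands to. The sketch below wraps the formatting expression from the snippets above in a small helper; the name rs_layer_ids is hypothetical, and the de-duplication and sorting are an added convenience rather than something the original script is known to do.

```python
def rs_layer_ids(layer_seq):
    """Format layer numbers as RS_### IDs, dropping duplicates and sorting."""
    return ["RS_{id:03d}".format(id=i) for i in sorted(set(layer_seq))]


layer_seq = [*range(1, 3 + 1), 5, *range(23, 31 + 1)]
rs_layer_list = rs_layer_ids(layer_seq)

print(len(rs_layer_list))  # 13
print(rs_layer_list[:5])   # ['RS_001', 'RS_002', 'RS_003', 'RS_005', 'RS_023']
```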

Timings

You can then proceed to run the script. Note that, depending on how many layers you’ve specified, it may take a long time for this code to execute.

It is advisable to test this import step on a few layers. You can then look at the logfile_02.log to inspect timings for each layer and all layers combined. The logfile is written to the directory where the 02_ee_to_gdb.py code file is stored.
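For reference, per-layer and total timings of this kind can be produced with time.perf_counter and the logging module. The sketch below is an assumption about how such timing could be done, not the code in 02_ee_to_gdb.py: import_layer is a hypothetical callable standing in for the real import step, and the logger name and message wording are illustrative.

```python
import logging
import time

logger = logging.getLogger("02_ee_to_gdb")  # assumed logger name


def run_with_timings(layer_ids, import_layer):
    """Call import_layer(layer_id) for each ID and log elapsed seconds.

    import_layer is a hypothetical callable standing in for the real import.
    Returns a dict of per-layer durations (seconds) for successful layers.
    """
    timings = {}
    total_start = time.perf_counter()
    for layer_id in layer_ids:
        start = time.perf_counter()
        try:
            import_layer(layer_id)
        except Exception:
            # Log the traceback and carry on with the remaining layers.
            logger.exception("%s failed", layer_id)
            continue
        timings[layer_id] = time.perf_counter() - start
        logger.info("%s imported in %.1f s", layer_id, timings[layer_id])
    logger.info("All layers: %.1f s", time.perf_counter() - total_start)
    return timings
```

Running this on two or three layers first gives a per-layer estimate that you can scale up before committing to the full list.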