Import data to file geodatabases
ArcPy
This next part of the workflow runs in Python v3.7 using functions in the arcpy v3.1 package. To use arcpy you will need a valid ArcGIS Pro or ArcGIS Desktop license. See here and here for instructions on how to get arcpy working in your Python IDE (in this case, PyCharm). This part of the workflow imports and processes the layers in the GEE_exports folder mentioned in the preceding step. The code for this step is found here.
Log files: we’ve made use of the Python logging module to write information, warning, and error messages to log files. Each script generates its own log file, with a suffix matching the script file prefix (e.g., logfile_01.log is created when you run 01_create_gdbs.py). The log files are written to the directory containing the Python script. They also record processing times for each layer.
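As a rough illustration of the naming convention described above, a per-script log file could be set up along these lines. This is only a sketch: the helper name make_logger, the logger name, the message format, and the rule that the prefix is the text before the first underscore are assumptions, not the repository's actual code.

```python
import logging
from pathlib import Path


def make_logger(script_path):
    """Return (logger, log_path) for a log file stored beside the script.

    Assumption: the script prefix is the text before the first underscore,
    e.g. "01" for 01_create_gdbs.py, giving logfile_01.log.
    """
    script_path = Path(script_path).resolve()
    prefix = script_path.stem.split("_")[0]
    log_path = script_path.parent / f"logfile_{prefix}.log"

    logger = logging.getLogger(f"agol_{prefix}")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on interactive re-runs
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger, log_path
```

Inside a script this would typically be called as make_logger(__file__), after which logger.info, logger.warning, and logger.error write to the file next to the script.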
ArcGIS Pro project
Before running the scripts, you will need to create an ArcGIS Pro project called agol-data-workflows (which corresponds to the GitHub repo name) within the directory of the forked GitHub repo. The project will house the various geodatabases and output files. After creating the project, run this code to create the file geodatabases (GDBs) and output folders needed for the workflow. The GDB names are taken from the agol_layers_metadata.xlsx file, which needs to be in the project root directory (the default location from the forked repo). If new layers are added to the workflow, you will need to specify them in the metadata file and then re-run the code to create the new GDBs.
Directories
The next step is to modify the agol_dirs.py module. Unfortunately, many of the arcpy functions simply don’t work when using relative file and folder paths (especially when running the code interactively). It’s frustrating, so the only safe option is to define your own absolute file/folder paths in agol_dirs.py. This module is imported in the subsequent processing steps, so you only have to do this once, and you can easily make changes if you move files/folders around on your local machine. See below for a short description of the paths:
proj_dir: directory of the ArcGIS Pro project.
metadata_dir: metadata Excel file path.
gee_dir: directory that contains the remote sensing layer downloads generated from the GEE scripts.
api_input_dir: directory where files are stored prior to upload into AGOL in the next processing step (see below).
move_dir: directory within gee_dir for files that have been imported in GDBs (described above).
clip_boundary_name: name of the AOI used in the GEE code when importing the AOI FeatureCollection.
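For orientation, an agol_dirs.py along these lines would define the six names as absolute paths. This is a sketch only: the variable names come from the list above, but every value (the project location under the home directory, the api_input and imported folder names, and the AOI name) is a placeholder assumption to be replaced with your own.

```python
from pathlib import Path

# Hypothetical location of the forked repo / ArcGIS Pro project; edit to suit.
proj_dir = Path.home() / "agol-data-workflows"

# Metadata workbook, expected in the project root.
metadata_dir = proj_dir / "agol_layers_metadata.xlsx"

# Folder holding the GEE downloads (assumed here to sit inside the project).
gee_dir = proj_dir / "GEE_exports"

# Staging folder for files awaiting upload to AGOL (hypothetical folder name).
api_input_dir = proj_dir / "api_input"

# Sub-folder of gee_dir for files already imported into GDBs (hypothetical name).
move_dir = gee_dir / "imported"

# AOI name used in the GEE code (placeholder value).
clip_boundary_name = "my_aoi"

# Relative paths are reported to cause problems in arcpy, so fail early.
for _name, _path in [
    ("proj_dir", proj_dir),
    ("metadata_dir", metadata_dir),
    ("gee_dir", gee_dir),
    ("api_input_dir", api_input_dir),
    ("move_dir", move_dir),
]:
    if not _path.is_absolute():
        raise ValueError(f"{_name} must be an absolute path, got {_path}")
```

Building every path from a single proj_dir means that moving the project only requires changing one line.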
Create GDBs
The next step is to create the local GDBs that house the corresponding layers. These are defined in the metadata sheet. Run the 01_create_gdbs.py script to do this.
There is a line of code toward the end of the script that imports the AOI asset into the Boundaries GDB. You will need to specify the path where this shapefile is stored.
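The GDB creation step can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of 01_create_gdbs.py: the helper names missing_gdbs and create_gdbs are hypothetical, and the list of GDB names is assumed to have already been read from the metadata sheet. The only arcpy call used is arcpy.management.CreateFileGDB, which takes an output folder and a geodatabase name.

```python
from pathlib import Path


def missing_gdbs(gdb_names, proj_dir):
    """Return absolute .gdb paths under proj_dir that do not exist yet."""
    proj_dir = Path(proj_dir).resolve()
    paths = []
    for name in dict.fromkeys(gdb_names):  # de-duplicate, keep order
        gdb_path = proj_dir / f"{name}.gdb"
        if not gdb_path.exists():
            paths.append(gdb_path)
    return paths


def create_gdbs(gdb_names, proj_dir):
    """Create each missing file geodatabase with arcpy."""
    import arcpy  # imported lazily; needs a licensed ArcGIS installation

    for gdb_path in missing_gdbs(gdb_names, proj_dir):
        # Absolute folder path plus GDB name, e.g. "Boundaries.gdb".
        arcpy.management.CreateFileGDB(str(gdb_path.parent), gdb_path.name)
```

Skipping GDBs that already exist makes it safe to re-run the step after new layers are added to the metadata file.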
Import data into GDBs
Once you have created the GDBs and set the file paths and directories, you can run the 02_ee_to_gdb.py script. This is the standard script used for importing the majority of the layers. Look at the Script name (import to GDB) field in the metadata table to see which code you need to run to import a particular layer. The scripts for the custom layers are located here and are identified by their unique RS layer ID.
You can specify which layers you want to import in two ways. First, you can modify the following code at the beginning of 02_ee_to_gdb.py to process a sequential set of layers:
start, end = 13, 16
rs_layer_list = ["RS_{id:03d}".format(id=i) for i in range(start, end + 1)]
The start and end values refer to the first and last layers you want to process. So setting start and end to 13 and 16 respectively will result in processing layers RS_013, RS_014, RS_015 and RS_016.
Second, you can specify a combination of sequential and non-sequential layers. So if you want to process RS_001 to RS_003, RS_005, and RS_023 to RS_031, then layer_seq would look like this:
layer_seq = [*range(1, 3+1), 5, *range(23, 31+1)]
rs_layer_list = ["RS_{id:03d}".format(id=i) for i in layer_seq]
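Both ways of selecting layers reduce to formatting a sequence of integers as zero-padded IDs. A small sketch that wraps this in a function and checks the result is shown below; the function name rs_layer_ids and the 1 to 999 range check are assumptions added for illustration (three-digit IDs are inferred from the RS_013 style names).

```python
def rs_layer_ids(layer_seq):
    """Format layer numbers as zero-padded RS layer IDs, e.g. 13 -> "RS_013"."""
    ids = []
    for i in layer_seq:
        # Assumed limit: IDs use three digits, so only 1..999 fit the pattern.
        if not 1 <= i <= 999:
            raise ValueError(f"layer number out of range: {i}")
        ids.append(f"RS_{i:03d}")
    return ids


# Sequential selection, equivalent to start, end = 13, 16.
sequential = rs_layer_ids(range(13, 16 + 1))

# Mixed selection: RS_001 to RS_003, RS_005, and RS_023 to RS_031.
mixed = rs_layer_ids([*range(1, 3 + 1), 5, *range(23, 31 + 1)])

print(sequential)
print(len(mixed), mixed[0], mixed[-1])
```

Printing the list before running the import is a cheap way to confirm that the end + 1 in each range picks up the last layer you intended.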
Timings
You can then proceed to run the script. Note that, depending on how many layers you’ve specified, it may take a long time for this code to execute.
It is advisable to test this import step on a few layers first. You can then look at logfile_02.log to inspect the timings for each layer and for all layers combined. The log file is written to the directory where the 02_ee_to_gdb.py script is stored.
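If you want a quick feel for per-layer cost before committing to a full run, a timing loop such as the one below can be wrapped around a small test set. It is a generic sketch, not the timing code in 02_ee_to_gdb.py: time_layers is a hypothetical helper, and import_layer stands in for whatever function actually imports one layer.

```python
import time


def time_layers(rs_layer_list, import_layer):
    """Run import_layer on each layer ID; return (per-layer seconds, total seconds)."""
    timings = {}
    start_all = time.perf_counter()
    for layer_id in rs_layer_list:
        start = time.perf_counter()
        import_layer(layer_id)  # placeholder for the real import call
        timings[layer_id] = time.perf_counter() - start
    total = time.perf_counter() - start_all
    return timings, total
```

Multiplying the average per-layer time from a few test layers by the number of layers you plan to process gives a rough estimate of the full run, although individual layers may differ considerably in size.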