4 Satellite Data Acquisition Software and Settings

Currently, all satellite data in AquaMatch are obtained using the Python API for Google Earth Engine (GEE; Gorelick 2023). While the {targets} workflow orchestrates data acquisition, all code directly related to GEE data acquisition is written in Python. If you are running the ‘default’ configuration for lakeSR or siteSR, the following directions are not applicable.

4.1 R Environment and Package Information

We recommend using the most recent release of R, RStudio, {targets}, and all R packages used within the workflow. The most recent run of lakeSR and siteSR was completed using R version 4.5.0 (R Core Team 2025) in RStudio 2025.05.0 (Posit team 2023). The table below lists the packages used across the lakeSR and siteSR workflows, the versions used, and the associated citations.

R package             Version   Citation
arrow                 20.0.0.2  Richardson et al. (2025)
bookdown              0.43      Xie (2016); Xie (2025)
config                0.3.2     Allaire (2023)
cowplot               1.1.3     Wilke (2024)
crew                  1.1.2     Landau (2025)
data.table            1.17.4    Barrett et al. (2025)
deming                1.4-1     Therneau (2024)
ggrepel               0.9.6     Slowikowski (2024)
ggthemes              5.1.0     Arnold (2024)
googledrive           2.1.1     McGowan and Bryan (2023)
kableExtra            1.4.0     Zhu (2024)
nhdplusTools          0.6.2     Blodgett and Johnson (2023)
polylabelr            0.3.0     Larsson (2024)
reticulate            1.42.0    Ushey, Allaire, and Tang (2025)
rmapshaper            0.5.0     Teucher and Russell (2023)
sf                    1.0-21    Pebesma (2018); Pebesma and Bivand (2023)
tarchetypes           0.13.1    Landau (2021a)
targets               1.11.3    Landau (2021b)
tidyverse             2.0.0     Wickham et al. (2019)
tigris                2.2.1     Walker (2025)
USA.state.boundaries  1.0.1     Embry (2022)
viridis               0.6.5     Garnier et al. (2024)
xml2                  1.3.8     Wickham, Hester, and Ooms (2025)
yaml                  2.3.10    Garbett et al. (2024)

4.2 {reticulate} Conda Environment

RStudio (Posit team 2023) is an integrated development environment that, alongside the {reticulate} package (Ushey, Allaire, and Tang 2025), facilitates integration of R and Python code within the same environment. In AquaMatch, a single R script sets up a {reticulate} Conda environment, which is invoked at the beginning of a {targets} run to ensure that our Python code runs consistently.

Python and Python modules
Software/Python module  Version  Citation
Python                  3.10.13  Python Software Foundation, www.python.org
earthengine-api         1.4.0    Gorelick (2023)
pandas                  2.0.3    The pandas development team (2023)
pyreadr                 0.5.2    Fajardo (2023)
PyYAML                  6.0.2    The PyYAML Project, https://github.com/yaml/pyyaml
numpy                   1.24.4   Harris et al. (2020)

The script run_targets.Rmd includes the steps to create this environment and authenticate your GEE account. Run these steps before running the pipeline to ensure that the workflow runs smoothly.
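Once the environment exists, it can be useful to confirm that the installed Python modules match the versions listed in the table above. The following is a minimal sketch of such a check, not part of the AquaMatch code base; the function name check_versions and the EXPECTED_VERSIONS dictionary are illustrative assumptions, and the version numbers are copied from the table. It relies only on the standard-library importlib.metadata module.

```python
from importlib import metadata

# Versions copied from the table of Python modules in this section.
EXPECTED_VERSIONS = {
    "earthengine-api": "1.4.0",
    "pandas": "2.0.3",
    "pyreadr": "0.5.2",
    "PyYAML": "6.0.2",
    "numpy": "1.24.4",
}


def check_versions(expected):
    """Return {module: (expected, installed_or_None)} for every mismatch.

    Hypothetical helper for illustration; an empty result means every
    listed module is installed at exactly the expected version.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # module is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED_VERSIONS).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Run inside the activated Conda environment, the script prints one line per module that is missing or installed at a different version, and prints nothing when the environment matches the table.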

4.3 Google Earth Engine Setup

If you are running the ‘admin_update’ configuration for either lakeSR or siteSR, you will need to have a GEE account, install and configure the gcloud Command Line Interface (CLI), create an Earth Engine project, authenticate your account, and alter the configuration file. All of these tasks are described in the subsections below.

4.3.1 Create a GEE account

Creation of a GEE account is free. Click ‘Get Started’ at the far right side of the earthengine.google.com webpage to create an account:

Screenshot of the landing page for earthengine.google.com, with the tab “Get Started” highlighted in a red box.

4.3.2 gcloud CLI

This workflow requires the installation and initialization of the gcloud CLI, a command-line tool set for accessing Google Cloud resources. All settings for AquaSat v2 are default gcloud configurations using a single GEE project. Google's gcloud CLI documentation describes how to set up gcloud.
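Before moving on, you may want to confirm that gcloud is installed and visible on your PATH. The sketch below is one way to do that from Python using only the standard library; the function name gcloud_version is an illustrative assumption and is not part of the AquaMatch code. It calls `gcloud --version`, which reports the installed Google Cloud SDK components.

```python
import shutil
import subprocess


def gcloud_version():
    """Return the output of `gcloud --version`, or None if gcloud is unavailable.

    Hypothetical helper for illustration only.
    """
    path = shutil.which("gcloud")  # searches PATH for the gcloud executable
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=False,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = gcloud_version()
    if version is None:
        print("gcloud was not found on PATH; install and initialize it first.")
    else:
        print(version)
```

A result of None only indicates that the executable could not be found or run; it does not verify that gcloud has been initialized with your account.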

4.3.3 GEE project setting

AquaMatch is run in a specific GEE project associated with our authenticated Google account. If you re-run this code as written, you will not have proper access because the code refers to our specific GEE project. To run the pipeline locally, you will need to update the configuration YAML file (in lakeSR: b_pull_Landsat_SRST_poi/config_files/config_poi.yml, in siteSR: gee_config.yml) with your Google credentials and GEE project. If you are new to GEE, go to code.earthengine.google.com and enter the project name listed in the top right-hand corner of your screen in the configuration file:

Screenshot of the header at code.earthengine.google.com with current Earth Engine project highlighted in the red box on the top right.

Alternatively, you can create a GEE project for this task in the dropdown menu accessed by clicking on the icon to the right of the highlighted box in the figure above. This workflow will not run unless you specify an Earth Engine project that is managed by the Google account you use to authenticate the run.
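After editing the configuration file, a quick check that it actually contains a project name can catch a missing or empty entry before the pipeline starts. The sketch below is illustrative only: the key name `ee_proj` and the flat key/value layout are assumptions made for the example, so consult the configuration file in your copy of the repository for the real field names. The function names extract_project and load_project are likewise hypothetical. Reading the file uses `yaml.safe_load` from PyYAML, which is listed in the table of Python modules.

```python
from pathlib import Path


def extract_project(config, key="ee_proj"):
    """Return the Earth Engine project name stored under `key`.

    `key` is a hypothetical field name used for illustration; replace it
    with the field your configuration file actually uses.
    """
    if not isinstance(config, dict):
        raise ValueError("Expected the configuration to be a key/value mapping.")
    value = config.get(key)
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"No Earth Engine project found under '{key}'.")
    return value.strip()


def load_project(config_path, key="ee_proj"):
    """Read a YAML configuration file and return its project name."""
    import yaml  # PyYAML; imported here so extract_project works without it

    with Path(config_path).open(encoding="utf-8") as handle:
        config = yaml.safe_load(handle)
    return extract_project(config, key)
```

For example, `extract_project({"ee_proj": "my-ee-project"})` returns the project name, while a missing or blank entry raises a ValueError with a message naming the key that was checked.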

4.3.4 GEE Authentication

Once gcloud is installed and initialized, the configuration file is properly set up, and the Python Conda environment is created, you can authenticate your GEE account. For this workflow, authentication is completed in the run_targets.Rmd script in the root directory. This script provides explicit directions to complete this task before running the pipeline.
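For orientation, authentication with the Earth Engine Python API generally comes down to two calls, `ee.Authenticate()` and `ee.Initialize(project=...)`. The sketch below shows that pattern under stated assumptions and is not a copy of what run_targets.Rmd does; follow the script's directions for the actual workflow. The function name initialize_earth_engine and its `ee_module` parameter (which allows a stand-in module to be passed for testing) are illustrative, and the project name in the usage example is a placeholder.

```python
def initialize_earth_engine(project, ee_module=None):
    """Initialize Earth Engine for `project`, authenticating only if needed.

    Hypothetical helper for illustration. `ee_module` defaults to the
    earthengine-api package (`import ee`).
    """
    if not isinstance(project, str) or not project.strip():
        raise ValueError("An Earth Engine project name is required.")
    if ee_module is None:
        import ee as ee_module  # earthengine-api

    try:
        # Succeeds when valid credentials are already stored locally.
        ee_module.Initialize(project=project)
    except Exception:
        # Broad catch for brevity: any failure triggers the interactive
        # authentication flow, after which initialization is retried once.
        ee_module.Authenticate()
        ee_module.Initialize(project=project)


if __name__ == "__main__":
    # Placeholder project name; use the project from your configuration file.
    initialize_earth_engine("my-ee-project")
```

Trying initialization first avoids reopening the browser-based sign-in on every run. If the retry also fails, the error is raised to the caller, which usually points to a project that the authenticated account cannot access.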