3 Download process
3.0.1 Catalogue existing data
The download process begins by using the {dataRetrieval} package to
catalogue the data available in the Water Quality Portal (WQP). To do
this, we select parameters of interest and the corresponding
characteristicName values used to query the WQP.
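As a rough illustration of what a cataloguing call can look like, the sketch below wraps `dataRetrieval::whatWQPdata()`, which queries the WQP for available data without downloading the results themselves. The helper name `catalogue_wqp`, its arguments, and the example bounding box are hypothetical and are not taken from the pipeline; the pipeline's own inventory code may differ.

```r
# Hypothetical helper (not from the pipeline): ask the WQP what data exist
# for a set of characteristic names inside one bounding box.
catalogue_wqp <- function(characteristic_names, bbox) {
  stopifnot(is.character(characteristic_names), length(bbox) == 4)
  if (!requireNamespace("dataRetrieval", quietly = TRUE)) {
    stop("The dataRetrieval package is required for this sketch.")
  }
  # Queries the WQP for matching data availability; no results are downloaded
  dataRetrieval::whatWQPdata(
    characteristicName = characteristic_names,
    bBox = bbox  # west, south, east, north in decimal degrees
  )
}

# Example call (requires network access), left commented out:
# inventory <- catalogue_wqp(
#   c("Absorbance at 440 nm", "Fluorescence index"),
#   bbox = c(-92.8, 44.2, -88.9, 46.0)
# )
```

Keeping the query inside a small function makes it straightforward to repeat the same request for each cell of a spatial grid.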
Our general process for selecting characteristic names has three steps. First, we review the Characteristics field on the Water Quality Portal for keywords relevant to our parameters of interest. Second, we download a subset of that data. Third, we search iteratively for keywords related to the parameter within the text of available metadata fields (e.g., methods-related fields, units, USGS parameter codes). The final selected names are then reviewed for relevance to the target parameter and for whether their records are complete enough to provide meaningful information to end users.
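The keyword search step can be sketched in base R as follows. This is a minimal illustration only: the toy data frame, its column names, and the keyword list are assumptions made for the example and do not reproduce the actual review.

```r
# Toy stand-in for a downloaded WQP subset; columns and values are
# illustrative assumptions, not real records.
wqp_subset <- data.frame(
  CharacteristicName = c("Absorbance at 440 nm", "Fluorescence index",
                         "Turbidity", "UV 254"),
  MethodName = c("CDOM absorbance by spectrophotometer",
                 "Excitation-emission matrix fluorescence",
                 "Nephelometric turbidity",
                 "UV absorbance at 254 nm"),
  stringsAsFactors = FALSE
)

# Keywords related to the target parameter, combined into one regex
keywords <- c("cdom", "absorb", "fluoresc")
pattern <- paste(keywords, collapse = "|")

# Flag rows where any keyword appears in any of the searched text fields
search_fields <- c("CharacteristicName", "MethodName")
hits <- Reduce(
  `|`,
  lapply(wqp_subset[search_fields], grepl,
         pattern = pattern, ignore.case = TRUE)
)

# Candidate characteristic names to review by hand
candidates <- unique(wqp_subset$CharacteristicName[hits])
print(candidates)
```

Here "UV 254" is picked up only through its method text, which is the kind of match that searching metadata fields, rather than names alone, is meant to surface.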
Below are the current parameters and their characteristicNames as
defined in the configuration YAML, 1_inventory/cfg/wqp_codes.yml. Data
for these parameters are requested from the WQP within a spatial grid
that is mapped in the next section. Parameters included here: cdom
(colored dissolved organic matter).
| parameter | characteristicName |
|---|---|
| cdom | Absorbance at 280 nanometers |
| cdom | Absorbance at 280 nm |
| cdom | Absorbance at 370 nanometers |
| cdom | Absorbance at 412 nm |
| cdom | Absorbance at 440 nm |
| cdom | Absorption coefficient at 440 nm |
| cdom | Absorption spectral slope (Sag) |
| cdom | Colored dissolved organic matter (CDOM) |
| cdom | Emission intensity ratio |
| cdom | Fluorescence index |
| cdom | Fluorescence, excitation 260 emission 450 |
| cdom | Fluorescence, excitation 275 emission 304 |
| cdom | Fluorescence, excitation 275 emission 340 |
| cdom | Fluorescence, excitation 280 emission 370 |
| cdom | Fluorescence, excitation 300 emission 390 |
| cdom | Fluorescence, excitation 340 emission 440 |
| cdom | Fluorescence, excitation 370 emission 460 |
| cdom | Fluorescence, excitation 390 emission 510 |
| cdom | Fluorescence, excitation 420 emission 460 |
| cdom | Specific UV Absorbance at 254 nm |
| cdom | Specific UV Absorbance at 254 nm, corrected for Fe |
| cdom | UV 254 |
| cdom | UV Absorption, relative conc. of organic constituents |
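A parameter-to-name table like the one above can be built from a YAML configuration along the following lines. The layout of 1_inventory/cfg/wqp_codes.yml is not shown in this section, so the inline YAML text below (one top-level key per parameter holding a list of names) is an assumed structure used purely for illustration, and only three of the names are included.

```r
library(yaml)

# Assumed config layout (hypothetical): one key per parameter, each holding
# a list of characteristic names. The real file may be organized differently.
cfg_text <- "
cdom:
  - Absorbance at 280 nm
  - Colored dissolved organic matter (CDOM)
  - UV 254
"

# Parse the YAML text into a named list of character vectors
wqp_codes <- yaml.load(cfg_text)

# Flatten to one row per parameter / characteristicName pair
code_table <- data.frame(
  parameter = rep(names(wqp_codes), lengths(wqp_codes)),
  characteristicName = unlist(wqp_codes, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(code_table)
```

With a file on disk, `yaml::read_yaml()` could be used in place of `yaml.load()` on a string.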
3.0.2 Maps of data spread
The pipeline uses a spatial grid to download data from the WQP in batches. Maps are presented below with counts of records across this spatial grid. Note: The counts here are for raw data that are not yet filtered or harmonized.
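To make the idea of per-cell record counts concrete, here is a minimal sketch using {sf}. The point locations, the example extent, and the 2-degree cell size are all assumptions chosen for the illustration; the pipeline's actual grid and its `grid_counts` object are built elsewhere and may be constructed differently.

```r
library(sf)

# Illustrative sampling locations (longitude, latitude); not real WQP sites
sites <- data.frame(
  lon = c(-105.1, -105.3, -104.2, -101.5),
  lat = c(40.6, 40.4, 39.1, 38.2)
)
sites_sf <- st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)

# Regular grid over an example extent; the 2-degree cells are an assumption
extent <- st_bbox(c(xmin = -106, ymin = 38, xmax = -100, ymax = 42),
                  crs = st_crs(4326))
grid <- st_make_grid(st_as_sfc(extent), cellsize = c(2, 2))
grid_sf <- st_sf(grid_id = seq_along(grid), geometry = grid)

# Count the points that fall inside each grid cell
grid_sf$n <- lengths(st_intersects(grid_sf, sites_sf))

# Mark empty cells as NA so a log-scaled fill can treat them as missing
grid_sf$n[grid_sf$n == 0] <- NA

print(st_drop_geometry(grid_sf))
```

Setting empty cells to NA pairs naturally with a log10 fill scale, where a count of zero has no finite position and an `na.value` color can be used instead.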
## Retrieving data for the year 2024
# Conterminous US map:
conterminous_us <- state_selection %>%
  filter(!(NAME %in% c("Alaska", "Hawaii", "American Samoa",
                       "Guam", "Puerto Rico",
                       "United States Virgin Islands",
                       "Commonwealth of the Northern Mariana Islands")))

ggplot() +
  geom_sf(data = grid_counts,
          aes(fill = n)) +
  geom_sf(data = conterminous_us,
          fill = NA, color = "black") +
  xlab(NULL) +
  ylab(NULL) +
  coord_sf(xlim = c(min(st_coordinates(conterminous_us)[,"X"]),
                    max(st_coordinates(conterminous_us)[,"X"])),
           ylim = c(min(st_coordinates(conterminous_us)[,"Y"]),
                    max(st_coordinates(conterminous_us)[,"Y"]))) +
  scale_fill_viridis_c("Record count",
                       trans = "log10",
                       labels = scales::label_number(),
                       na.value = "white",
                       breaks = c(1, 10, 100, 1000, 10000)) +
  ggtitle("Conterminous US") +
  theme_bw()
# Alaska map:
AK <- state_selection %>%
  filter(NAME == "Alaska")

ggplot() +
  geom_sf(data = grid_counts,
          aes(fill = n)) +
  geom_sf(data = AK,
          fill = NA, color = "black") +
  xlab(NULL) +
  ylab(NULL) +
  coord_sf(xlim = c(min(st_coordinates(AK)[,"X"]),
                    max(st_coordinates(AK)[,"X"])),
           ylim = c(min(st_coordinates(AK)[,"Y"]),
                    max(st_coordinates(AK)[,"Y"]))) +
  scale_fill_viridis_c("Record count",
                       trans = "log10",
                       labels = scales::label_number(),
                       na.value = "white",
                       breaks = c(1, 10, 100, 1000, 10000)) +
  ggtitle("Alaska") +
  theme_bw()
# Hawaii map:
HI <- state_selection %>%
  filter(NAME == "Hawaii")

ggplot() +
  geom_sf(data = grid_counts,
          aes(fill = n)) +
  geom_sf(data = HI,
          fill = NA, color = "black") +
  xlab(NULL) +
  ylab(NULL) +
  coord_sf(xlim = c(min(st_coordinates(HI)[,"X"]),
                    0.9 * max(st_coordinates(HI)[,"X"])),
           ylim = c(1.1 * min(st_coordinates(HI)[,"Y"]),
                    max(st_coordinates(HI)[,"Y"]))) +
  scale_fill_viridis_c("Record count",
                       trans = "log10",
                       labels = scales::label_number(),
                       na.value = "white",
                       breaks = c(1, 10, 100, 1000, 10000)) +
  ggtitle("Hawaii") +
  theme_bw()
# American Samoa map:
AS <- state_selection %>%
  filter(NAME %in% c("American Samoa"))

ggplot() +
  geom_sf(data = grid_counts,
          aes(fill = n)) +
  geom_sf(data = AS,
          fill = NA, color = "black") +
  xlab(NULL) +
  ylab(NULL) +
  coord_sf(xlim = c(1.025 * min(st_coordinates(AS)[,"X"]),
                    0.975 * max(st_coordinates(AS)[,"X"])),
           ylim = c(1.05 * min(st_coordinates(AS)[,"Y"]),
                    max(st_coordinates(AS)[,"Y"]))) +
  scale_fill_viridis_c("Record count",
                       trans = "log10",
                       labels = scales::label_number(),
                       na.value = "white",
                       breaks = c(1, 10, 100, 1000, 10000)) +
  ggtitle("American Samoa") +
  theme_bw()
# Guam & Commonwealth of the Northern Mariana Islands map:
GU_CNMI <- state_selection %>%
  filter(NAME %in% c("Guam", "Commonwealth of the Northern Mariana Islands"))

ggplot() +
  geom_sf(data = grid_counts,
          aes(fill = n)) +
  geom_sf(data = GU_CNMI,
          fill = NA, color = "black") +
  xlab(NULL) +
  ylab(NULL) +
  coord_sf(xlim = c(1.025 * min(st_coordinates(GU_CNMI)[,"X"]),
                    0.975 * max(st_coordinates(GU_CNMI)[,"X"])),
           ylim = c(0.95 * min(st_coordinates(GU_CNMI)[,"Y"]),
                    1.05 * max(st_coordinates(GU_CNMI)[,"Y"]))) +
  scale_fill_viridis_c("Record count",
                       trans = "log10",
                       labels = scales::label_number(),
                       na.value = "white",
                       breaks = c(1, 10, 100, 1000, 10000)) +
  ggtitle("Guam & Commonwealth of the Northern Mariana Islands") +
  theme_bw()
# Puerto Rico & United States Virgin Islands map:
PR_VI <- state_selection %>%
  filter(NAME %in% c("Puerto Rico", "United States Virgin Islands"))

ggplot() +
  geom_sf(data = grid_counts,
          aes(fill = n)) +
  geom_sf(data = PR_VI,
          fill = NA, color = "black") +
  xlab(NULL) +
  ylab(NULL) +
  coord_sf(xlim = c(0.95 * min(st_coordinates(PR_VI)[,"X"]),
                    1.05 * max(st_coordinates(PR_VI)[,"X"])),
           ylim = c(1.075 * min(st_coordinates(PR_VI)[,"Y"]),
                    0.925 * max(st_coordinates(PR_VI)[,"Y"]))) +
  scale_fill_viridis_c("Record count",
                       trans = "log10",
                       labels = scales::label_number(),
                       na.value = "white",
                       breaks = c(1, 10, 100, 1000, 10000)) +
  ggtitle("Puerto Rico & United States Virgin Islands") +
  theme_bw()