9 CDOM harmonization process
Following the completion of the {dataRetrieval} download process described previously, the pipeline contains raw WQP data for each parameter of interest. Before harmonizing each parameter, we run through a series of universal “pre-harmonization” steps, which ensure that each dataset is appropriately formatted when it enters its harmonization routine.
The text below first walks through the pre-harmonization steps for the colored dissolved organic matter (CDOM) dataset and then delves into the specifics of the harmonization process.
9.1 Pre-harmonization of the raw CDOM WQP dataset
At the start of the pre-harmonization process the raw CDOM WQP dataset contains 186.4 thousand rows.
9.1.1 Missing results
Next, records that have missing data are dropped from the dataset. Several criteria are used when checking for missing data. If any of the below criteria are met the row is flagged as missing:
- Both the result column and detection limit column have NA data
- Result, result unit, activity comment, laboratory comment, and result comment columns are all NA
- The result comment column contains any user-provided text indicating a missing value. This currently includes: analysis lost, not analyzed, not recorded, not collected, or no measurement taken
473 rows are dropped, resulting in a final count of 185.9 thousand.
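The three criteria above can be sketched in base R as follows. This is a minimal illustration rather than the pipeline's actual code: the toy data frame, the `flag_missing` column, and the mapping of "result", "detection limit", and comment columns onto these WQP column names are assumptions made for the example.

```r
# Toy stand-in for the raw WQP table (hypothetical rows).
wqp <- data.frame(
  ResultMeasureValue = c("1.2", NA, NA, "3.4"),
  DetectionQuantitationLimitMeasure.MeasureValue = c(NA, NA, 0.005, NA),
  ResultMeasure.MeasureUnitCode = c("units/cm", NA, "units/cm", "units/cm"),
  ActivityCommentText = NA_character_,
  ResultLaboratoryCommentText = NA_character_,
  ResultCommentText = c(NA, NA, NA, "Sample not collected"),
  stringsAsFactors = FALSE
)

# Phrases listed in the text, joined into one pattern.
missing_phrases <- c("analysis lost", "not analyzed", "not recorded",
                     "not collected", "no measurement taken")
missing_pattern <- paste(missing_phrases, collapse = "|")

# Criterion 1: result and detection limit are both NA.
no_value <- is.na(wqp$ResultMeasureValue) &
  is.na(wqp$DetectionQuantitationLimitMeasure.MeasureValue)

# Criterion 2: result, unit, and all three comment columns are NA.
all_na <- is.na(wqp$ResultMeasureValue) &
  is.na(wqp$ResultMeasure.MeasureUnitCode) &
  is.na(wqp$ActivityCommentText) &
  is.na(wqp$ResultLaboratoryCommentText) &
  is.na(wqp$ResultCommentText)

# Criterion 3: the result comment contains a "missing" phrase.
# grepl() returns FALSE for NA comments, so they are not flagged here.
says_missing <- grepl(missing_pattern, wqp$ResultCommentText,
                      ignore.case = TRUE)

wqp$flag_missing <- no_value | all_na | says_missing
wqp_kept <- wqp[!wqp$flag_missing, ]
```

In this toy example the second row (no value and no detection limit) and the fourth row (comment says the sample was not collected) are flagged, while the third row is kept because a detection limit is available for later MDL cleaning.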
9.1.2 Filter status
The final step in pre-harmonization is to filter the
ResultStatusIdentifier column to include only the following statuses:
"Accepted", "Final", "Historical", "Validated", "Preliminary", or NA.
These statuses generally indicate that a reliable result was obtained;
however, we also include NA in an effort to be conservative. More
specifically, when making decisions for this and other columns we
occasionally retain NA values if removing the records would otherwise
drop 10% or more of the available data.
This step removes 0 rows of data, leaving 185.9 thousand rows.
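A minimal sketch of the status filter, assuming a toy data frame with hypothetical rows. It relies on the fact that `%in%` in R matches NA against NA, which is what lets records with a missing status pass the filter.

```r
# Hypothetical records with a mix of statuses.
wqp <- data.frame(
  ResultStatusIdentifier = c("Accepted", "Rejected", NA,
                             "Preliminary", "Provisional"),
  stringsAsFactors = FALSE
)

keep_statuses <- c("Accepted", "Final", "Historical",
                   "Validated", "Preliminary", NA)

# %in% treats NA as a match for NA, so rows with a missing status are kept.
wqp_kept <- wqp[wqp$ResultStatusIdentifier %in% keep_statuses, , drop = FALSE]
```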
9.2 Harmonization-ready CDOM dataset
Once ready for harmonization, the CDOM WQP dataset contains the
following user-defined CharacteristicNames:
- Absorbance at 280 nanometers
- Absorbance at 280 nm
- Absorbance at 370 nanometers
- Absorbance at 412 nm
- Absorbance at 440 nm
- Absorption coefficient at 440 nm
- Absorption spectral slope (Sag)
- Colored dissolved organic matter (CDOM)
- Emission intensity ratio
- Fluorescence index
- Fluorescence, excitation 260 emission 450
- Fluorescence, excitation 275 emission 304
- Fluorescence, excitation 275 emission 340
- Fluorescence, excitation 280 emission 370
- Fluorescence, excitation 300 emission 390
- Fluorescence, excitation 340 emission 440
- Fluorescence, excitation 370 emission 460
- Fluorescence, excitation 390 emission 510
- Fluorescence, excitation 420 emission 460
- Specific UV Absorbance at 254 nm
- Specific UV Absorbance at 254 nm, corrected for Fe
- UV 254
- UV Absorption, relative conc. of organic constituents
These CharacteristicNames are chosen to select only those
measurements that pertain to CDOM. We selected them by reviewing the options
in the Characteristics field of the Water Quality Portal Filter
Results page.
9.2.1 Filter media and fractions, categorize parameters
We next ensure that the media type for all CDOM data is
"Surface Water", "Water", "Estuary", or NA. Any records not
meeting these criteria are dropped.
We then assign each record to one of the parameter values in the table
below, based on its values for CharacteristicName,
ResultAnalyticalMethod.MethodName, and
ResultMeasure.MeasureUnitCode. This helps us to group comparable
measurements throughout the rest of the process and generally simplify
the way the dataset is organized:
| Parameter | Definition |
|---|---|
| Absorbance at 254 nm | CharacteristicName is either “Specific UV Absorbance at 254 nm” or “UV 254” and ResultMeasure.MeasureUnitCode is not “L/mg-cm” or “L/mgDOC*m”. Alternatively, CharacteristicName is “UV Absorption, relative conc. of organic constituents” and ResultAnalyticalMethod.MethodIdentifier is “5910-B” |
| Absorbance at 280 nm | CharacteristicName is “Absorbance at 280 nanometers” |
| Absorbance at 370 nm | CharacteristicName is “Absorbance at 370 nanometers” |
| Absorbance at 412 nm | CharacteristicName is “Absorbance at 412 nm” |
| Absorbance at 440 nm | CharacteristicName is “Absorption coefficient at 440 nm” OR “Absorbance at 440 nm” OR CharacteristicName is “Colored dissolved organic matter (CDOM)” and ResultAnalyticalMethod.MethodName is “CDOM absorption (440nm)” |
| Absorption spectral slope, 275 to 295 nm | CharacteristicName is “Absorption spectral slope (Sag)” and USGSPCode is 32300 |
| Absorption spectral slope, 290 to 350 nm | CharacteristicName is “Absorption spectral slope (Sag)” and USGSPCode is 32301 |
| Absorption spectral slope, 350 to 400 nm | CharacteristicName is “Absorption spectral slope (Sag)” and USGSPCode is 32302 |
| Absorption spectral slope, 400 to 500 nm | ResultAnalyticalMethod.MethodName is “Slope Of CDOM Absorption Coefficient Spectrum (400 To 500 Nm)” and StatisticalBaseCode is “Slope” |
| Absorption spectral slope, 412 to 600 nm | CharacteristicName is “Absorption spectral slope (Sag)” and USGSPCode is 32331 |
| Absorption spectral slope, 412 to 676 nm | CharacteristicName is “Absorption spectral slope (Sag)” and USGSPCode is 32303 |
| FDOM | CharacteristicName is “Colored dissolved organic matter (CDOM)” AND ResultAnalyticalMethod.MethodName is NA or mentions a fluorometer AND ResultMeasure.MeasureUnitCode is one of “RFU”, “ug/l QSE”, “mg/l”, or “ug/L”. Alternatively, CharacteristicName is formatted like “Fluorescence, excitation 275 emission 304” but exact numbers will vary. |
| Fluorescence Index | CharacteristicName is “Fluorescence index” |
| SUVA | ResultMeasure.MeasureUnitCode is “L/mg-cm” or “L/mgDOC*m” OR CharacteristicName is “UV Absorption, relative conc. of organic constituents” and ResultAnalyticalMethod.MethodIdentifier is “415.3” |
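
As an illustration of how rules like those in the table can be applied, the sketch below assigns a `parameter` value for a small subset of the entries (Absorbance at 254 nm by unit, SUVA by unit, Absorbance at 280 nm, Fluorescence Index, and the pattern-based FDOM rule). The toy records are hypothetical, the method-based and USGSPCode-based rules are omitted, and the sketch assumes the rules do not overlap; the order in which the actual pipeline resolves overlapping rules is not stated here.

```r
# Hypothetical records; real data also carry method and USGSPCode columns.
wqp <- data.frame(
  CharacteristicName = c(
    "UV 254", "UV 254", "Absorbance at 280 nanometers", "Fluorescence index",
    "Fluorescence, excitation 275 emission 304", "Emission intensity ratio"
  ),
  ResultMeasure.MeasureUnitCode = c(
    "units/cm", "L/mgDOC*m", "units/cm", "None", "RU", "None"
  ),
  stringsAsFactors = FALSE
)

name <- wqp$CharacteristicName
is_suva_unit <- wqp$ResultMeasure.MeasureUnitCode %in%
  c("L/mg-cm", "L/mgDOC*m")

# Start with NA and fill in each rule; unmatched records stay NA.
wqp$parameter <- NA_character_
wqp$parameter[name %in% c("Specific UV Absorbance at 254 nm", "UV 254") &
                !is_suva_unit] <- "Absorbance at 254 nm"
wqp$parameter[is_suva_unit] <- "SUVA"
wqp$parameter[name == "Absorbance at 280 nanometers"] <- "Absorbance at 280 nm"
wqp$parameter[name == "Fluorescence index"] <- "Fluorescence Index"
# "Fluorescence, excitation <number> emission <number>" with varying numbers.
fdom_name <- grepl("^Fluorescence, excitation [0-9]+ emission [0-9]+$", name)
wqp$parameter[fdom_name] <- "FDOM"

# Records that match no rule are dropped.
dropped <- wqp[is.na(wqp$parameter), ]
wqp <- wqp[!is.na(wqp$parameter), ]
```

Here the "Emission intensity ratio" record matches none of the sketched rules and ends up in `dropped`, mirroring the summary of dropped record types that follows.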
Records that do not meet the criteria of one of the parameter entries
above are dropped (n = 1,113). A summary of the types of dropped records
and their row counts is below:
| CharacteristicName | ResultAnalyticalMethod.MethodName | USGSPCode | ResultAnalyticalMethod.MethodIdentifier | ResultAnalyticalMethod.MethodIdentifierContext | ResultMeasure.MeasureUnitCode | ResultSampleFractionText | n |
|---|---|---|---|---|---|---|---|
| Colored dissolved organic matter (CDOM) | 9222 B ~ Standard Total Coliform Membrane Filter Procedure | NA | 9222B | APHA | ug/L | NA | 7 |
| Colored dissolved organic matter (CDOM) | CHROMOPHORIC DISSOLVED ORGANIC MATTER | NA | NASA-CDOM | 21VASWCB | None | Dissolved | 839 |
| Colored dissolved organic matter (CDOM) | NASA/TM-2003-211621/Rev4-Vol.IV | NA | NASA/TM-2003-211621 | RTI | m | NA | 96 |
| Colored dissolved organic matter (CDOM) | NA | 32295 | NA | NA | NA | Dissolved | 2 |
| Colored dissolved organic matter (CDOM) | NA | 32322 | NA | NA | NA | Total | 2 |
| Colored dissolved organic matter (CDOM) | NA | NA | NA | NA | m | NA | 130 |
| Emission intensity ratio | Fluorescence scan, 250-620 nm | 32347 | FL020 | USGS | None | Dissolved | 5 |
| UV Absorption, relative conc. of organic constituents | UNKNOWN | NA | UNKNOWN | AZDEQ_SW | units/cm | Total | 32 |
Below is a stacked barplot of the dataset organized by the new
parameter column:
1.4 thousand rows are removed. The final row count after this is 184.6 thousand.
9.2.2 Document and remove fails
In this step we filter out records based on indications that they have failed data quality assurance or quality control for some reason given by the data provider (these instances are referred to here as “failures”).
After reviewing the contents of the ActivityCommentText,
ResultLaboratoryCommentText, ResultCommentText,
ResultDetectionConditionText, and ResultMeasureValue_original
columns, we developed a list of terms that captured the majority of
instances where records had failures or unacceptable measurements.
(Note: ResultMeasureValue_original is a duplicate, character version
of the ResultMeasureValue column that we use as a reference for the
column’s contents before it was converted to a numeric type.) We found
the phrasing to be consistent across columns, so we searched for the
same case-insensitive text in each of them. The terms are: “beyond
accept”, “cancelled”, “contaminat”, “error”, “fail”, “improper”,
“interference”, “invalid”, “no result”, “no test”, “not accept”,
“outside of accept”, “problem”, “questionable”, “suspect”, “unable”,
“violation”, “reject”, “no data”, “time exceed”, “value extrapolated”,
“exceeds”, “biased”, “parameter not required”, “not visited”, “warm”,
“broken”.
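The search described above can be sketched as a single case-insensitive pattern applied to each column. The toy records, the `has_fail` column, and the `hits` matrix are assumptions for illustration; only the term list and column names come from the text.

```r
fail_terms <- c(
  "beyond accept", "cancelled", "contaminat", "error", "fail", "improper",
  "interference", "invalid", "no result", "no test", "not accept",
  "outside of accept", "problem", "questionable", "suspect", "unable",
  "violation", "reject", "no data", "time exceed", "value extrapolated",
  "exceeds", "biased", "parameter not required", "not visited", "warm",
  "broken"
)
fail_pattern <- paste(fail_terms, collapse = "|")

fail_cols <- c("ActivityCommentText", "ResultLaboratoryCommentText",
               "ResultCommentText", "ResultDetectionConditionText",
               "ResultMeasureValue_original")

# Hypothetical records.
wqp <- data.frame(
  ActivityCommentText = c(NA, "Sample bottle BROKEN in transit", NA),
  ResultLaboratoryCommentText = c(NA, NA, "Holding time exceeded"),
  ResultCommentText = NA_character_,
  ResultDetectionConditionText = NA_character_,
  ResultMeasureValue_original = c("4.1", "2.0", "3.3"),
  stringsAsFactors = FALSE
)

# One logical column per searched field: TRUE where a failure term appears.
hits <- vapply(
  wqp[fail_cols],
  function(x) grepl(fail_pattern, x, ignore.case = TRUE),
  logical(nrow(wqp))
)

wqp$has_fail <- rowSums(hits) > 0   # any column matched for this record
hits_by_column <- colSums(hits)     # counts of the kind a per-column chart needs
wqp_kept <- wqp[!wqp$has_fail, ]
```

Because terms such as “contaminat” and “time exceed” are stems, plain substring matching picks up variants like "contaminated" or "time exceeded" without extra patterns.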
Below are pie charts that break down the number of failure detections by column. Note that the plotting below is automated so if one or more of the columns listed above are not plotted, this indicates that the column(s) did not return any matches for the failure phrases. Also note that a single record can contain multiple failure phrases; therefore, failure phrases are not mutually exclusive.
9.2.3 Clean MDLs
In this step method detection limits (MDLs) are used to clean up the
reported values. When a numeric value is missing for the data record
(i.e., NA or text that became NA during an as.numeric call) we
check for non-detect language in the ResultLaboratoryCommentText,
ResultCommentText, ResultDetectionConditionText, and
ResultMeasureValue columns. This language can be "non-detect",
"not detect", "non detect", "undetect", or "below".
If non-detect language exists then we use the
DetectionQuantitationLimitMeasure.MeasureValue column for the MDL.
Otherwise if there is a < and a number in the ResultMeasureValue
column, we use that number instead.
We then use a random number between 0 and 0.5 * MDL as the record’s
value moving forward.
We produce a new column, mdl_flag, from the MDL cleaning process.
Records where no MDL-based adjustment was made and which are at or above
the MDL are assigned a 0. Records with corrected values based on the MDL
method are assigned a 1. Finally, records where no MDL-based adjustment
was made and which contain a numeric value below the provided MDL are
assigned a 2.
This should not result in a change in rows, but we still check: 0 rows are removed. The final row count after this is 175.7 thousand.
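The MDL logic above can be sketched as follows. This is a simplified illustration under stated assumptions: the toy records are hypothetical, only one comment column is searched, `harmonized_value` and `mdl_flag` are built as standalone vectors, and records with no adjustment and no reported limit are assumed to receive a flag of 0.

```r
set.seed(1)  # only so that this toy example is reproducible

# Hypothetical records; ResultMeasureValue is still character here.
wqp <- data.frame(
  ResultMeasureValue = c("2.5", "<0.01", NA, "0.002"),
  ResultDetectionConditionText = c(NA, NA, "Not Detected", NA),
  DetectionQuantitationLimitMeasure.MeasureValue = c(0.005, NA, 0.005, 0.005),
  stringsAsFactors = FALSE
)

text <- wqp$ResultMeasureValue
value_num <- suppressWarnings(as.numeric(text))  # "<0.01" becomes NA
limit <- wqp$DetectionQuantitationLimitMeasure.MeasureValue

# Non-detect language in a comment column or in the value text itself.
nd_pattern <- "non-detect|not detect|non detect|undetect|below"
has_nd <- grepl(nd_pattern, wqp$ResultDetectionConditionText,
                ignore.case = TRUE) |
  grepl(nd_pattern, text, ignore.case = TRUE)

# Otherwise, fall back to the number that follows a "<" in the value text.
has_lt <- grepl("<", text, fixed = TRUE) & grepl("[0-9]", text)
lt_num <- rep(NA_real_, nrow(wqp))
lt_num[has_lt] <- as.numeric(gsub("[^0-9.]", "", text[has_lt]))

mdl <- ifelse(has_nd, limit, lt_num)

# Replace missing values with a random draw between 0 and 0.5 * MDL.
adjust <- is.na(value_num) & !is.na(mdl)
harmonized_value <- value_num
harmonized_value[adjust] <- runif(sum(adjust), min = 0,
                                  max = 0.5 * mdl[adjust])

# 1 = adjusted; 2 = unadjusted but below the reported limit; 0 = otherwise.
below_limit <- !is.na(value_num) & !is.na(limit) & value_num < limit
mdl_flag <- ifelse(adjust, 1L, ifelse(below_limit, 2L, 0L))
```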
9.2.4 Clean approximate values
Cleaning approximate values follows a process similar to MDL
cleaning. We flag “approximated” values in the dataset. The
ResultMeasureValue column gets checked for all three of the
following conditions:
- Numeric-only version of the column is still NA after MDL cleaning
- The original column text contained a number
- Any of ResultLaboratoryCommentText, ResultCommentText, or ResultDetectionConditionText match this regular expression, ignoring case: "result approx|RESULT IS APPROX|value approx"
We then use the approximate value as the record’s value moving forward.
Records with corrected values based on the above method are noted with a
1 in the approx_flag column. Note, however, that a record can
occasionally contain approximate language without being changed or
flagged. This occurs when the language appears in a comment-related
column rather than in the result column itself, meaning that a usable
numeric value was provided and no correction is needed.
This should not result in a change in rows, but we still check: 0 rows are removed. The final row count after this is 175.7 thousand.
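A minimal sketch of the three approximate-value conditions, using hypothetical records and searching only ResultCommentText for brevity. Stripping every non-numeric character to recover the number is an assumption of this example, not necessarily how the pipeline extracts it.

```r
# Hypothetical records after MDL cleaning.
wqp <- data.frame(
  ResultMeasureValue_original = c("~12.4", "7.1", "approx"),
  ResultCommentText = c("Result approx.", "value approximate",
                        "RESULT IS APPROX"),
  stringsAsFactors = FALSE
)
original <- wqp$ResultMeasureValue_original
harmonized_value <- suppressWarnings(as.numeric(original))

approx_pattern <- "result approx|RESULT IS APPROX|value approx"

still_na <- is.na(harmonized_value)     # condition 1
has_number <- grepl("[0-9]", original)  # condition 2
has_approx <- grepl(approx_pattern, wqp$ResultCommentText,
                    ignore.case = TRUE) # condition 3

fix <- still_na & has_number & has_approx
harmonized_value[fix] <- as.numeric(gsub("[^0-9.]", "", original[fix]))
approx_flag <- as.integer(fix)
```

The second record shows the case noted above: its comment uses approximate language, but the value was already numeric, so it is neither changed nor flagged.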
9.2.5 Clean values with “greater than” data
The next step is similar to the MDL and approximate value cleaning
processes, and follows the approximate cleaning process most closely.
The goal is to clean up values that were entered as “greater than” some
value. The ResultMeasureValue column gets checked for all three of
the following conditions:
- Numeric-only version of the column is still NA after MDL & approximate cleaning
- The original column text contained a number
- The original column text contained a >
We then use the “greater than” value (without >) as the record’s value
moving forward.
Records with corrected values based on the above method are noted with a
1 in the greater_flag column.
This should not result in a change in rows, but we still check: 0 rows are removed. The final row count after this is 175.7 thousand.
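The same pattern applies to “greater than” values. The sketch below uses a hypothetical character vector in place of the original result column, and again assumes that stripping non-numeric characters is an acceptable way to recover the number.

```r
# Hypothetical original result text.
original <- c(">200", "15", "> 4.5", ">")
harmonized_value <- suppressWarnings(as.numeric(original))

fix <- is.na(harmonized_value) &          # still NA after earlier cleaning
  grepl("[0-9]", original) &              # original text contains a number
  grepl(">", original, fixed = TRUE)      # original text contains ">"

# Keep the number and discard the ">" symbol.
harmonized_value[fix] <- as.numeric(gsub("[^0-9.]", "", original[fix]))
greater_flag <- as.integer(fix)
```

The last element contains a ">" but no number, so it stays NA and is left for the next step to drop.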
9.2.6 Drop unresolved NA measurements
The goal of the preceding three steps was to prevent records with
seemingly missing measurement data from being dropped if there was still
a chance of recovering a usable value. At this point we’ve finished with
that process and we proceed to check for remaining records with NA
values in their harmonized_value column. If they exist, they are
dropped. We also filter out any negative values in the dataset at this
point because CDOM cannot be negative. The exception to this is for the
“Absorption spectral slope” parameters, which we take the absolute value
of instead of removing.
6 rows are removed. The final row count after this is 175.7 thousand.
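A short sketch of this step on hypothetical records: slope parameters are converted to absolute values first, and then any remaining NA or negative values are dropped.

```r
# Hypothetical harmonized records.
cdom <- data.frame(
  parameter = c("Absorbance at 254 nm", "Absorbance at 254 nm",
                "Absorption spectral slope, 275 to 295 nm", "FDOM"),
  harmonized_value = c(0.12, -0.03, -0.015, NA),
  stringsAsFactors = FALSE
)

# Slopes keep their magnitude instead of being removed.
is_slope <- grepl("^Absorption spectral slope", cdom$parameter)
cdom$harmonized_value[is_slope] <- abs(cdom$harmonized_value[is_slope])

# Drop unresolved NA values and remaining negatives.
keep <- !is.na(cdom$harmonized_value) & cdom$harmonized_value >= 0
cdom_kept <- cdom[keep, ]
```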
9.2.7 Harmonize record units
The next step in CDOM harmonization is converting the units of WQP
records. Records that don’t make sense for CDOM or can’t be converted to
a standardized unit are dropped. We standardize units to “AU/m” unless the
parameter is “SUVA” (“L/mgDOC*m”), “FDOM” (“RFU”, “RU”, “ug/l QSE”),
“Fluorescence index” (“None”), or one of the “Absorption spectral slope”
variants (“nm-1”). We use the unit conversion table below for this.
| ResultMeasure.MeasureUnitCode | conversion |
|---|---|
| AU/cm | 100 |
| units/cm | 100 |
| #/cm | 100 |
| cm | 100 |
| None | 1 |
| nm | 100 |
| NA | 1 |
| m | 1 |
| L/mg-cm | 100 |
| L/mgDOC*m | 1 |
| mg/l | 1000 |
| ug/L | 1 |
| ug/l QSE | 1 |
| RFU | 1 |
| RU | 1 |
| per m | 1 |
| nm-1 | 1 |
Note that in the table above “nm” units are given a conversion
factor of 100. This is because we assessed that these records are likely
intended to use cm-1 units: the
DetectionQuantitationLimitMeasure.MeasureUnitCode for these records is
“units/cm”, their measured values fall within the range expected for
absorbance at 254 nm in natural waters when reported in cm-1 units, and
the listed detection limit of 0.005 is also consistent with cm-1 measurements.
Additionally the table lists “None” with a conversion factor of 1,
which allows it to be retained as-is for records with the “Fluorescence
index” parameter. Records with parameter variants of “Absorption
spectral slope” are often provided in the WQP with units of “None” as
well, but we convert these to “nm-1” in this step.
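The conversion table can be applied as a lookup, as in the sketch below. The toy records (including the made-up "furlongs" unit) are hypothetical, the “NA” row of the table is assumed to mean a missing unit code, and assigning the standardized unit label per parameter is left out.

```r
# Conversion factors copied from the table above.
unit_table <- data.frame(
  ResultMeasure.MeasureUnitCode = c(
    "AU/cm", "units/cm", "#/cm", "cm", "None", "nm", NA, "m", "L/mg-cm",
    "L/mgDOC*m", "mg/l", "ug/L", "ug/l QSE", "RFU", "RU", "per m", "nm-1"
  ),
  conversion = c(100, 100, 100, 100, 1, 100, 1, 1, 100,
                 1, 1000, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical records; "furlongs" stands in for an unconvertible unit.
cdom <- data.frame(
  parameter = c("Absorbance at 254 nm", "SUVA", "FDOM",
                "Absorbance at 440 nm"),
  harmonized_value = c(0.25, 0.031, 12, 1.4),
  ResultMeasure.MeasureUnitCode = c("units/cm", "L/mg-cm", "RFU", "furlongs"),
  stringsAsFactors = FALSE
)

# match() pairs NA with NA, so records with a missing unit use the NA row.
idx <- match(cdom$ResultMeasure.MeasureUnitCode,
             unit_table$ResultMeasure.MeasureUnitCode)
cdom$conversion <- unit_table$conversion[idx]

# Units absent from the table cannot be converted and are dropped.
cdom <- cdom[!is.na(cdom$conversion), ]
cdom$harmonized_value <- cdom$harmonized_value * cdom$conversion
```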
Below is a pie chart that breaks down the different unit codes that were dropped in the unit harmonization process, and how many records were lost with each code. (Note that if there is no pie chart below that is because no unit codes were dropped).
0 rows are removed. The final row count after this is 175.7 thousand.
9.2.8 Clean depth data
The next harmonization step cleans the four depth-related columns obtained from the WQP. The details behind this step are covered in the Depth flags section of the Tiering, flagging, and quality control philosophy chapter.
This should not result in a change in rows, but we still check: 0 rows are removed. The final row count after this is 175.7 thousand.
9.2.9 Filter and tier methods
We next review information related to the analytical methods, sample fraction, and result units for each record to construct hierarchical tiers as described in the Method tiering section of the Tiering, flagging, and quality control philosophy chapter.
In preparation for tiering, we check the
ResultAnalyticalMethod.MethodName column for names that indicate
non-CDOM measurements.
Finally, we group the data into three tiers as described in Tiering, flagging, and quality control philosophy. Broadly, these tiers are:
| Tier | Name | Description | CDOM Details |
|---|---|---|---|
| 0 | Restrictive | Data that are verifiably self-similar across organizations and time-periods and can be considered highly reliable and interoperable | Has appropriate (and consistent) information in CharacteristicName, methods columns, and ResultMeasure.MeasureUnitCode. ResultSampleFractionText is filtered |
| 1 | Narrowed | Data that we have good reason to believe are self-similar, but for which we can’t verify full interoperability across data providers | There is a mismatch in information, e.g., between the methods columns |
| 2 | Inclusive | Data that are assumed to be reliable and are harmonized to our best ability given the information available from the data provider. This tier includes NA or non-resolvable descriptions for the analytical method, which often make up the majority of methods descriptions for any given parameter | ResultSampleFractionText is reported as total or important information is missing from one or more relevant columns |
At this point we create a file (in/cdom/cdom_tiering_record.csv) that
contains a record of how specific method text was tiered and how many
rows corresponded to each method. This file is also included as a
table below to illustrate how the tiering decisions in the table
above are carried out.
| parameter | CharacteristicName | ResultAnalyticalMethod.MethodName | USGSPCode | ResultAnalyticalMethod.MethodIdentifier | ResultAnalyticalMethod.MethodIdentifierContext | ResultMeasure.MeasureUnitCode | ResultSampleFractionText | tier | min_value | max_value | n |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SUVA | UV 254 | Computation by NWIS algorithm | 63162 | ALGOR | USGS | L/mgDOC*m | Dissolved | 0 | 0.0000000 | 1.980000e+02 | 37767 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm, Supor (NWQL) | 50624 | UV006 | USGS | units/cm | Dissolved | 0 | 0.7000000 | 1.260000e+02 | 12301 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm, GFF (NWQL) | 50624 | UV005 | USGS | units/cm | Dissolved | 0 | 1.0000000 | 6.320000e+02 | 12301 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | UV Absorb, 280 nm, Supor (annon) | 61726 | UV002 | USGS | units/cm | Dissolved | 0 | 0.2000000 | 9.810000e+01 | 12272 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | UV Absorb, 280 nm, GFF (NWQL) | 61726 | UV007 | USGS | units/cm | Dissolved | 0 | 0.7000000 | 4.950000e+02 | 12221 |
| Absorbance at 254 nm | UV 254 | UV absorbance, wf, 254 nm | 50624 | UV008 | USGS | units/cm | Dissolved | 0 | 0.0000000 | 1.905000e+02 | 6856 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm,GFF (USGS-NYL) | 50624 | UV019 | USGS | units/cm | Dissolved | 0 | 0.3000000 | 1.680000e+02 | 5991 |
| Absorbance at 254 nm | UV 254 | NA | 50624 | NA | NA | units/cm | Dissolved | 0 | 0.0000000 | 2.140000e+02 | 4980 |
| SUVA | UV 254 | SUVA, wf, calculation | 63162 | UV003 | USGS | L/mgDOC*m | Dissolved | 0 | 0.1000000 | 1.000000e+01 | 4759 |
| Absorption spectral slope, 275 to 295 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32300 | ABS02 | USGS | None | Dissolved | 0 | 0.0077000 | 2.840000e-02 | 2680 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | Absorbance, 200 to 800 nm | 32296 | ABS01 | USGS | AU/cm | Dissolved | 0 | 0.0700000 | 1.135000e+02 | 2678 |
| Absorbance at 370 nm | Absorbance at 370 nanometers | Absorbance, 200 to 800 nm | 32297 | ABS01 | USGS | AU/cm | Dissolved | 0 | 0.0000000 | 2.369000e+01 | 2676 |
| Absorption spectral slope, 290 to 350 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32301 | ABS02 | USGS | None | Dissolved | 0 | 0.0000000 | 2.770000e-02 | 2670 |
| Absorption spectral slope, 350 to 400 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32302 | ABS02 | USGS | None | Dissolved | 0 | 0.0071000 | 4.930000e-02 | 2662 |
| Absorbance at 440 nm | Absorbance at 440 nm | Absorbance, 200 to 800 nm | 32299 | ABS01 | USGS | AU/cm | Dissolved | 0 | 0.0500000 | 8.470000e+00 | 2639 |
| Absorbance at 412 nm | Absorbance at 412 nm | Absorbance, 200 to 800 nm | 32298 | ABS01 | USGS | AU/cm | Dissolved | 0 | 0.0500000 | 1.094000e+01 | 2629 |
| Fluorescence index | Fluorescence index | Fluorescence scan, 250-620 nm | 32312 | FL020 | USGS | None | Dissolved | 0 | 1.1340000 | 2.171000e+00 | 2476 |
| FDOM | Fluorescence, excitation 300 emission 390 | Fluorescence scan, 250-620 nm | 32309 | FL020 | USGS | RU | Dissolved | 0 | 0.0030000 | 5.971000e+00 | 2476 |
| FDOM | Fluorescence, excitation 340 emission 440 | Fluorescence scan, 250-620 nm | 52901 | FL020 | USGS | RU | Dissolved | 0 | 0.0010000 | 7.140000e+00 | 2474 |
| FDOM | Fluorescence, excitation 370 emission 460 | Fluorescence scan, 250-620 nm | 52902 | FL020 | USGS | RU | Dissolved | 0 | 0.0010000 | 5.260000e+00 | 2419 |
| FDOM | Fluorescence, excitation 260 emission 450 | Fluorescence scan, 250-620 nm | 32304 | FL020 | USGS | RU | Dissolved | 0 | 0.0096000 | 1.502000e+01 | 2379 |
| FDOM | Fluorescence, excitation 390 emission 510 | Fluorescence scan, 250-620 nm | 32307 | FL020 | USGS | RU | Dissolved | 0 | 0.0015000 | 3.236000e+00 | 2338 |
| FDOM | Fluorescence, excitation 280 emission 370 | Fluorescence scan, 250-620 nm | 32310 | FL020 | USGS | RU | Dissolved | 0 | 0.0117000 | 5.096000e+00 | 2319 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm, silver (NWQL) | 50624 | UV004 | USGS | units/cm | Dissolved | 0 | 0.1000000 | 1.430000e+02 | 1990 |
| SUVA | UV 254 | NA | 63162 | NA | NA | L/mgDOC*m | Dissolved | 0 | 0.3000000 | 6.300000e+00 | 1809 |
| FDOM | Fluorescence, excitation 275 emission 340 | Fluorescence scan, 250-620 nm | 32311 | FL020 | USGS | RU | Dissolved | 0 | 0.0026000 | 3.517000e+00 | 1713 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | UV Absorb, 280 nm,silver (annon) | 61726 | UV001 | USGS | units/cm | Dissolved | 0 | 0.1000000 | 9.930000e+06 | 1697 |
| FDOM | Colored dissolved organic matter (CDOM) | Colorized Dissolved Organic Matter | NA | FLUORO | COEOMAHA_WQX | ug/L | NA | 2 | 10.0000000 | 2.680000e+02 | 1683 |
| Absorbance at 254 nm | Specific UV Absorbance at 254 nm | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | #/cm | Dissolved | 0 | 0.0008584 | 6.520000e+01 | 1577 |
| FDOM | Colored dissolved organic matter (CDOM) | Turner Designs Trilogy Fluorometer with CDOM specific excitation/emission filters for monitoring CDOM in natural waters | NA | TurnerDesigns-CDOM | 31DELRBC_WQX | RFU | Dissolved | 2 | 472.5600000 | 9.261500e+03 | 1557 |
| Absorption spectral slope, 412 to 676 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32303 | ABS02 | USGS | None | Dissolved | 0 | 0.0029000 | 2.920000e-02 | 1509 |
| FDOM | Colored dissolved organic matter (CDOM) | NA | 32295 | NA | NA | ug/l QSE | Dissolved | 0 | 0.0000000 | 3.230000e+04 | 1449 |
| Absorbance at 440 nm | Absorbance at 440 nm | Dissolved Organic Matter Absorption Coefficient (Gelbstoff) | NA | L01~CDOM_440 | CBP_WQX | per m | NA | 2 | 0.1228000 | 2.246920e+01 | 1414 |
| Absorption spectral slope, 400 to 500 nm | Absorbance at 440 nm | Slope Of CDOM Absorption Coefficient Spectrum (400 To 500 Nm) | NA | L01~CDOM_SLOPE | CBP_WQX | nm-1 | NA | 2 | 0.0075000 | 2.580000e-02 | 1414 |
| FDOM | Colored dissolved organic matter (CDOM) | NA | 32322 | NA | NA | RFU | Total | 2 | 0.0000000 | 4.060000e+03 | 1345 |
| Absorption spectral slope, 412 to 600 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32331 | ABS02 | USGS | None | Dissolved | 0 | 0.0030000 | 1.700000e-02 | 973 |
| Absorbance at 254 nm | UV 254 | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | units/cm | Total | 2 | 1.6000000 | 1.460000e+02 | 821 |
| SUVA | Specific UV Absorbance at 254 nm | UV absorbance, 254 nm | NA | UV absorbance, 254 nm | YRITWC | L/mg-cm | Dissolved | 0 | 0.0000000 | 1.296472e+05 | 816 |
| Absorbance at 254 nm | UV 254 | UV absorbance, 254 nm | NA | UV absorbance, 254 nm | YRITWC | nm | Dissolved | 2 | 0.0000000 | 7.151227e+02 | 770 |
| Absorbance at 440 nm | Absorbance at 440 nm | Gross Alpha and Beta Activity in Water | NA | 00-01 | USEPA | units/cm | Total | 2 | 0.1000000 | 2.130000e+01 | 719 |
| FDOM | Fluorescence, excitation 275 emission 304 | Fluorescence scan, 250-620 nm | 32305 | FL020 | USGS | RU | Dissolved | 0 | 0.0041000 | 1.075000e+00 | 699 |
| Absorbance at 254 nm | UV 254 | Absorbance, 254 nm, wf (SM5910) | 50624 | UV011 | USGS | units/cm | Dissolved | 0 | 0.0000000 | 3.510000e+01 | 634 |
| FDOM | Colored dissolved organic matter (CDOM) | YSI EXO fluorometer, 365/480 nm | 32295 | FL017 | USGS | ug/l QSE | Dissolved | 0 | 0.1000000 | 2.117000e+02 | 631 |
| Absorbance at 254 nm | UV 254 | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | units/cm | Dissolved | 0 | 1.5000000 | 3.880000e+01 | 400 |
| FDOM | Colored dissolved organic matter (CDOM) | NA | 32289 | NA | NA | mg/l | Dissolved | 2 | 670.0000000 | 2.510000e+03 | 385 |
| Absorbance at 440 nm | Absorbance at 440 nm | Determination of UV Absorbance at 440 nm in Water, by MDH modified method | NA | MDH055 | MNPCA | units/cm | Total | 2 | 0.1000000 | 8.150000e+00 | 324 |
| Absorbance at 254 nm | UV 254 | UV Absorbance at 254 nm | NA | UVA_UV_254 | 11NPSWRD_WQX | #/cm | Dissolved | 0 | 1.3000000 | 1.840000e+01 | 319 |
| FDOM | Colored dissolved organic matter (CDOM) | Turner Designs fluoro.,365/470nm | 32295 | FL013 | USGS | ug/l QSE | Dissolved | 0 | 37.9000000 | 1.087000e+03 | 207 |
| Absorbance at 254 nm | UV 254 | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | units/cm | NA | 2 | 2.8000000 | 1.370000e+02 | 194 |
| Fluorescence index | Fluorescence index | Fluorescence Index | 32312 | ABS06 | USGS | None | Dissolved | 2 | 1.0000000 | 2.000000e+00 | 187 |
| FDOM | Fluorescence, excitation 260 emission 450 | Spectrometry - 5 x 2 matrix scan | 32304 | ABS04 | USGS | RU | Dissolved | 0 | 0.0480000 | 2.452000e+00 | 187 |
| FDOM | Fluorescence, excitation 275 emission 340 | Spectrometry - 5 x 2 matrix scan | 32311 | ABS04 | USGS | RU | Dissolved | 0 | 0.0120000 | 6.921000e-01 | 187 |
| FDOM | Fluorescence, excitation 280 emission 370 | Spectrometry - 5 x 2 matrix scan | 32310 | ABS04 | USGS | RU | Dissolved | 0 | 0.0190000 | 1.048000e+00 | 187 |
| FDOM | Fluorescence, excitation 300 emission 390 | Spectrometry - 5 x 2 matrix scan | 32309 | ABS04 | USGS | RU | Dissolved | 0 | 0.0220000 | 1.218000e+00 | 187 |
| FDOM | Fluorescence, excitation 340 emission 440 | Spectrometry - 5 x 2 matrix scan | 32332 | ABS04 | USGS | RU | Dissolved | 0 | 0.0290000 | 1.312000e+00 | 187 |
| FDOM | Fluorescence, excitation 370 emission 460 | Spectrometry - 5 x 2 matrix scan | 32333 | ABS04 | USGS | RU | Dissolved | 0 | 0.0200000 | 1.105000e+00 | 187 |
| FDOM | Fluorescence, excitation 390 emission 510 | Spectrometry - 5 x 2 matrix scan | 32307 | ABS04 | USGS | RU | Dissolved | 0 | 0.0110000 | 6.395000e-01 | 187 |
| Absorbance at 412 nm | Absorbance at 412 nm | UV absorbance, wf, 412 nm | 66700 | UV010 | USGS | AU/cm | Dissolved | 0 | 0.4000000 | 1.570000e+01 | 165 |
| Absorbance at 254 nm | UV 254 | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | units/cm | Dissolved | 0 | 18.4000000 | 2.930000e+02 | 165 |
| FDOM | Colored dissolved organic matter (CDOM) | NA | 32330 | NA | NA | mg/l | Total | 2 | 120.0000000 | 2.964000e+04 | 154 |
| SUVA | Specific UV Absorbance at 254 nm, corrected for Fe | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | L/mg-cm | Dissolved | 0 | 246.0000000 | 4.510000e+02 | 153 |
| FDOM | Colored dissolved organic matter (CDOM) | NA | NA | NA | NA | ug/L | NA | 2 | 19.0714216 | 1.065636e+02 | 130 |
| Absorbance at 440 nm | Absorption coefficient at 440 nm | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | #/cm | NA | 2 | 0.2053158 | 1.745181e+01 | 123 |
| SUVA | Specific UV Absorbance at 254 nm | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | L/mg-cm | NA | 2 | 1.2248924 | 4.844491e+00 | 123 |
| FDOM | Fluorescence, excitation 420 emission 460 | Fluorescence scan, 250-620 nm | 32355 | FL020 | USGS | RU | Dissolved | 0 | 0.0215000 | 5.180000e-01 | 119 |
| Absorbance at 254 nm | UV Absorption, relative conc. of organic constituents | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | units/cm | Total | 2 | 9.0800000 | 1.460000e+02 | 97 |
| Absorbance at 254 nm | Specific UV Absorbance at 254 nm | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | units/cm | Total | 2 | 0.0300000 | 2.514000e+01 | 95 |
| FDOM | Fluorescence, excitation 275 emission 304 | Spectrometry - 5 x 2 matrix scan | 32305 | ABS04 | USGS | RU | Dissolved | 0 | 0.0186000 | 2.625000e-01 | 93 |
| Absorption spectral slope, 412 to 676 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32303 | ABS02 | USGS | NA | Dissolved | 0 | 0.0001329 | 1.718640e-02 | 85 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | NA | 61726 | NA | NA | units/cm | Dissolved | 0 | 0.3000000 | 6.780000e+01 | 78 |
| Absorbance at 254 nm | UV 254 | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | None | NA | 2 | 0.0120000 | 2.270000e-01 | 73 |
| SUVA | UV Absorption, relative conc. of organic constituents | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | None | NA | 2 | 1.6000000 | 4.400000e+00 | 73 |
| FDOM | Fluorescence, excitation 275 emission 304 | Spectrometry - 5 x 2 matrix scan | 32305 | ABS04 | USGS | NA | Dissolved | 2 | 0.0017291 | 2.483674e-01 | 73 |
| Absorbance at 440 nm | Colored dissolved organic matter (CDOM) | CDOM absorption (440nm) | NA | CDOM absorption | RTI | m | NA | 2 | 0.0793651 | 2.813000e+01 | 58 |
| FDOM | Colored dissolved organic matter (CDOM) | YSI EXO, 365/480/80 nm | 32295 | FL029 | USGS | ug/l QSE | Dissolved | 0 | 5.2500000 | 9.320000e+01 | 54 |
| FDOM | Colored dissolved organic matter (CDOM) | YSI EXO, 365/480/80 nm | 32322 | FL029 | USGS | RFU | Total | 2 | 1.7500000 | 2.448000e+01 | 48 |
| Absorbance at 440 nm | Absorbance at 440 nm | Gross Alpha and Beta Activity in Water | NA | 00-01 | USEPA | NA | Total | 2 | 0.0000073 | 4.830000e-04 | 41 |
| Absorbance at 440 nm | Absorbance at 440 nm | Determination of UV Absorbance at 440 nm in Water, by MDH modified method | NA | MDH055 | MNPCA | NA | Total | 2 | 0.0000239 | 3.582000e-03 | 38 |
| Absorbance at 412 nm | Absorbance at 412 nm | UV absorbance, wf, 412 nm | 66700 | UV010 | USGS | NA | Dissolved | 2 | 0.0000468 | 5.267600e-03 | 34 |
| Absorbance at 254 nm | UV 254 | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | units/cm | Total | 2 | 4.5500000 | 4.280000e+01 | 34 |
| Absorption spectral slope, 350 to 400 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32302 | ABS02 | USGS | NA | Dissolved | 0 | 0.0000533 | 8.895900e-03 | 18 |
| FDOM | Fluorescence, excitation 275 emission 304 | Fluorescence scan, 250-620 nm | 32305 | FL020 | USGS | NA | Dissolved | 2 | 0.0009441 | 2.497560e-02 | 18 |
| Absorption spectral slope, 290 to 350 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32301 | ABS02 | USGS | NA | Dissolved | 0 | 0.0000715 | 7.584400e-03 | 14 |
| Absorbance at 254 nm | UV 254 | Determination of UV Absorbance at 440 nm in Water, by MDH modified method | NA | MDH055 | MNPCA | None | Dissolved | 2 | 1.0000000 | 1.000000e+00 | 11 |
| Absorption spectral slope, 412 to 600 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32331 | ABS02 | USGS | NA | Dissolved | 0 | 0.0002522 | 2.449000e-03 | 10 |
| Absorbance at 254 nm | UV 254 | Computation by NWIS algorithm | 63162 | ALGOR | USGS | NA | Dissolved | 2 | 0.0091553 | 7.818130e-02 | 8 |
| Absorbance at 254 nm | UV 254 | 5910 B ~ UV - Absorbing Organic Compounds | NA | 5910-B | APHA | units/cm | None | 2 | 0.0000000 | 3.540000e+00 | 6 |
| FDOM | Colored dissolved organic matter (CDOM) | YSI EXO fluorometer, 365/480 nm | 32322 | FL017 | USGS | RFU | Total | 2 | 0.0600000 | 1.266000e+01 | 5 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm, Supor (NWQL) | 50624 | UV006 | USGS | NA | Dissolved | 2 | 0.0000515 | 1.520700e-03 | 5 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | UV Absorb, 280 nm, Supor (annon) | 61726 | UV002 | USGS | NA | Dissolved | 2 | 0.0001215 | 1.596300e-03 | 5 |
| Absorption spectral slope, 275 to 295 nm | Absorption spectral slope (Sag) | Sag, specified wavelength range | 32300 | ABS02 | USGS | NA | Dissolved | 0 | 0.0002761 | 1.240770e-02 | 4 |
| FDOM | Colored dissolved organic matter (CDOM) | NA | 32314 | NA | NA | ug/l QSE | Dissolved | 2 | 49.8000000 | 1.210000e+02 | 3 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm, silver (NWQL) | 50624 | UV004 | USGS | NA | Dissolved | 2 | 0.0014522 | 1.645300e-03 | 2 |
| Fluorescence index | Fluorescence index | Fluorescence scan, 250-620 nm | 32312 | FL020 | USGS | NA | Dissolved | 0 | 0.0524613 | 2.471460e-01 | 2 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | UV Absorb, 280 nm, GFF (NWQL) | 61726 | UV007 | USGS | NA | Dissolved | 2 | 0.0004394 | 1.803810e-02 | 2 |
| Absorbance at 254 nm | UV 254 | UV Absorb, 254 nm, GFF (NWQL) | 50624 | UV005 | USGS | NA | Dissolved | 2 | 0.0010575 | 1.838950e-02 | 2 |
| Absorbance at 254 nm | UV 254 | NA | 50624 | NA | NA | NA | Dissolved | 2 | 0.0002493 | 2.774000e-04 | 2 |
| Absorbance at 440 nm | Absorbance at 440 nm | Absorbance, 200 to 800 nm | 32299 | ABS01 | USGS | NA | Dissolved | 2 | 0.0006541 | 1.257400e-03 | 2 |
| Absorbance at 254 nm | UV 254 | UV - Absorbing Organic Compounds | NA | 5910-B | KAWNATON_WQX | cm | NA | 2 | 12.4000000 | 1.240000e+01 | 1 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | UV Absorb, 280 nm,silver (annon) | 61726 | UV001 | USGS | NA | Dissolved | 2 | 0.0002786 | 2.786000e-04 | 1 |
| Absorbance at 440 nm | Absorbance at 440 nm | Determination of UV Absorbance at 440 nm in Water | NA | USEPA 415.3_440 | MNPCA | #/cm | Total | 2 | 4.3300000 | 4.330000e+00 | 1 |
| SUVA | Specific UV Absorbance at 254 nm | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | L/mg-cm | Total | 2 | 258.2000000 | 2.582000e+02 | 1 |
| SUVA | Specific UV Absorbance at 254 nm, corrected for Fe | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | L/mg-cm | Total | 2 | 238.5000000 | 2.385000e+02 | 1 |
| Absorbance at 254 nm | UV 254 | Determination of Total Organic Carbon and Specific UV Absorbance at 254 nm in Source Water and Drinking Water | NA | 415.3 | USEPA | #/cm | Total | 2 | 22.3300000 | 2.233000e+01 | 1 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | NA | 61726 | NA | NA | NA | Dissolved | 0 | 0.0000386 | 3.860000e-05 | 1 |
| Absorbance at 254 nm | UV 254 | UV absorbance, wf, 254 nm | 50624 | UV008 | USGS | NA | Dissolved | 2 | 0.0001090 | 1.090000e-04 | 1 |
| Absorbance at 280 nm | Absorbance at 280 nanometers | Absorbance, 200 to 800 nm | 32296 | ABS01 | USGS | NA | Dissolved | 2 | 0.0025234 | 2.523400e-03 | 1 |
| Absorbance at 370 nm | Absorbance at 370 nanometers | Absorbance, 200 to 800 nm | 32297 | ABS01 | USGS | NA | Dissolved | 2 | 0.0014879 | 1.487900e-03 | 1 |
The above process drops 0 rows, leaving 175.7 thousand remaining.
9.2.10 Flag based on field methods
Next, we use the field_flag column to flag records based on the type
of sampling equipment that was used:
- field_flag = 0 when discrete sampling methods were used. Indicated by detection of “grab”, “bucket”, “point”, “kemmerer”, “van dorn”, “bailer”, or “bottle” in SampleCollectionEquipmentName
- field_flag = 1 when integrated sampling methods were used. Indicated by detection of “integrated”, “multiple”, or “pump” in SampleCollectionEquipmentName
- field_flag = 2 for all other records
No records are removed by this step, so 0 rows are dropped, leaving 175.7 thousand remaining in the harmonized CDOM dataset.
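The field-method rules above can be sketched as a small lookup. This is an illustrative Python sketch, not the pipeline's own code: the function name `assign_field_flag` is hypothetical, and two behaviors are assumptions that the text does not specify, namely that matching is case-insensitive and that discrete terms are checked before integrated terms when a name contains both.

```python
# Search terms taken from the field_flag rules described above.
DISCRETE_TERMS = ("grab", "bucket", "point", "kemmerer", "van dorn", "bailer", "bottle")
INTEGRATED_TERMS = ("integrated", "multiple", "pump")


def assign_field_flag(equipment_name):
    """Return 0 (discrete), 1 (integrated), or 2 (all other records)."""
    if equipment_name is None:
        # Assumption: a missing equipment name falls under "all other records".
        return 2
    name = equipment_name.lower()  # assumption: case-insensitive matching
    # Assumption: discrete terms take precedence if both kinds of term appear.
    if any(term in name for term in DISCRETE_TERMS):
        return 0
    if any(term in name for term in INTEGRATED_TERMS):
        return 1
    return 2


# Illustrative equipment names, not drawn from the dataset.
for example in ("Van Dorn Bottle", "Peristaltic pump", "Probe/Sensor", None):
    print(example, assign_field_flag(example))
```

Because every record receives one of the three values, a step written this way cannot change the row count, which matches the 0 rows dropped reported above.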
9.2.11 Miscellaneous flag
Next, we populate the miscellaneous flag (misc_flag) column. The
miscellaneous flag is used in this dataset to mark measurements that
fall outside published minimum or maximum values, based on a review of
the relevant literature. We flag these records rather than filtering
them out, but a depth-related filter still occurs in the next
harmonization step.
For this flag column, values are as follows: 0 = does not exceed published range, 1 = below published minimum, and 2 = above published maximum. Absorption spectral slope parameters, SUVA, and Fluorescence index are checked for values below minimum or above maximum published records, while the rest of the parameter options are just checked for values above published maximums. We use the absolute value of absorption spectral slope when testing against thresholds.
The thresholds and their sources are listed below, followed by a table of record counts by parameter, unit, and flag value:
| Parameter | Units | Min value | Max value | Source |
|---|---|---|---|---|
| Absorbance at 254 nm | AU/m | - | 450 | Yates et al. 2023 |
| Absorbance at 280 nm | AU/m | - | 36.1 | Korak and McKay 2024 |
| Absorbance at 370 nm | AU/m | - | 11 | Korak and McKay 2024 |
| Absorbance at 412 nm | AU/m | - | 14 | We use the threshold for 440 nm (below) as a conservative estimate |
| Absorbance at 440 nm | AU/m | - | 14 | Brezonik et al. 2019 |
| Absorption spectral slope (multiple) | nm-1 | 0.0026 | 0.0473 | Korak and McKay 2024 |
| FDOM | RU | - | 4.38 | Korak and McKay 2024 |
| FDOM | ug/L, ug/l QSE | - | 84.4 | Avouris et al. 2025 |
| Fluorescence index | None | 1.31 | 2.18 | Korak and McKay 2024 |
| SUVA | L/mgDOC*m | 0.651 | 6.83 | Korak and McKay 2024 |
| parameter | harmonized_units | misc_flag | n |
|---|---|---|---|
| Absorbance at 254 nm | AU/m | 0 | 49634 |
| Absorbance at 254 nm | AU/m | 2 | 3 |
| Absorbance at 280 nm | AU/m | 0 | 27431 |
| Absorbance at 280 nm | AU/m | 2 | 1525 |
| Absorbance at 370 nm | AU/m | 0 | 2653 |
| Absorbance at 370 nm | AU/m | 2 | 24 |
| Absorbance at 412 nm | AU/m | 0 | 2826 |
| Absorbance at 412 nm | AU/m | 2 | 2 |
| Absorbance at 440 nm | AU/m | 0 | 5347 |
| Absorbance at 440 nm | AU/m | 2 | 12 |
| Absorption spectral slope, 275 to 295 nm | nm-1 | 0 | 2681 |
| Absorption spectral slope, 275 to 295 nm | nm-1 | 1 | 3 |
| Absorption spectral slope, 290 to 350 nm | nm-1 | 0 | 2679 |
| Absorption spectral slope, 290 to 350 nm | nm-1 | 1 | 5 |
| Absorption spectral slope, 350 to 400 nm | nm-1 | 0 | 2671 |
| Absorption spectral slope, 350 to 400 nm | nm-1 | 1 | 8 |
| Absorption spectral slope, 350 to 400 nm | nm-1 | 2 | 1 |
| Absorption spectral slope, 400 to 500 nm | nm-1 | 0 | 1414 |
| Absorption spectral slope, 412 to 600 nm | nm-1 | 0 | 973 |
| Absorption spectral slope, 412 to 600 nm | nm-1 | 1 | 10 |
| Absorption spectral slope, 412 to 676 nm | nm-1 | 0 | 1543 |
| Absorption spectral slope, 412 to 676 nm | nm-1 | 1 | 51 |
| FDOM | RFU | 0 | 2955 |
| FDOM | RU | 0 | 18391 |
| FDOM | RU | 2 | 38 |
| FDOM | ug/L | 0 | 1719 |
| FDOM | ug/L | 2 | 633 |
| FDOM | ug/l QSE | 0 | 2053 |
| FDOM | ug/l QSE | 2 | 291 |
| Fluorescence index | None | 0 | 2628 |
| Fluorescence index | None | 1 | 37 |
| SUVA | L/mgDOC*m | 0 | 43998 |
| SUVA | L/mgDOC*m | 1 | 130 |
| SUVA | L/mgDOC*m | 2 | 1374 |
No records are removed by this step, so 0 rows are dropped, leaving 175.7 thousand remaining in the harmonized CDOM dataset.
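The threshold logic described in this section can be sketched as follows. This is an illustrative Python sketch rather than the pipeline's code: the names `THRESHOLDS`, `SLOPE_LIMITS`, and `assign_misc_flag` are hypothetical, and the use of strict inequalities at the boundaries is an assumption. Parameter and unit combinations without a listed threshold (for example FDOM in RFU) are assumed to receive a flag of 0, which is consistent with the counts table above.

```python
# (min, max) limits keyed by (parameter, harmonized_units), copied from the
# threshold table above. None means that side of the range is not checked.
THRESHOLDS = {
    ("Absorbance at 254 nm", "AU/m"): (None, 450.0),
    ("Absorbance at 280 nm", "AU/m"): (None, 36.1),
    ("Absorbance at 370 nm", "AU/m"): (None, 11.0),
    ("Absorbance at 412 nm", "AU/m"): (None, 14.0),
    ("Absorbance at 440 nm", "AU/m"): (None, 14.0),
    ("FDOM", "RU"): (None, 4.38),
    ("FDOM", "ug/L"): (None, 84.4),
    ("FDOM", "ug/l QSE"): (None, 84.4),
    ("Fluorescence index", "None"): (1.31, 2.18),
    ("SUVA", "L/mgDOC*m"): (0.651, 6.83),
}
# One shared range for every absorption spectral slope wavelength window.
SLOPE_PREFIX = "Absorption spectral slope"
SLOPE_LIMITS = (0.0026, 0.0473)


def assign_misc_flag(parameter, units, value):
    """Return 0 (within range), 1 (below minimum), or 2 (above maximum)."""
    if parameter.startswith(SLOPE_PREFIX):
        low, high = SLOPE_LIMITS
        value = abs(value)  # slopes are tested on their absolute value
    else:
        # Assumption: combinations with no listed threshold are never flagged.
        low, high = THRESHOLDS.get((parameter, units), (None, None))
    # Assumption: comparisons are strict, so values equal to a limit pass.
    if low is not None and value < low:
        return 1
    if high is not None and value > high:
        return 2
    return 0


print(assign_misc_flag("SUVA", "L/mgDOC*m", 7.0))
print(assign_misc_flag("Absorption spectral slope, 275 to 295 nm", "nm-1", -0.015))
```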
9.2.12 Remove unrealistic values
Before finalizing the dataset we check for and remove any records with depths greater than 592 m, the depth of the deepest point in a U.S. lake.
0 rows are removed. The final row count after this is 175.7 thousand.
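A minimal Python sketch of this depth check is shown below. It is illustrative only: the text does not say which depth columns are tested, so applying the limit to the three harmonized depth value columns named later in this chapter is an assumption, as are the hypothetical function name `has_realistic_depths`, the treatment of missing depths as passing, and depths being expressed in meters.

```python
MAX_LAKE_DEPTH_M = 592.0  # limit stated in the text above

# Assumption: the check covers these three harmonized depth columns.
DEPTH_COLUMNS = (
    "harmonized_top_depth_value",
    "harmonized_bottom_depth_value",
    "harmonized_discrete_depth_value",
)


def has_realistic_depths(record):
    """True when every non-missing depth is at or below the limit."""
    depths = (record.get(column) for column in DEPTH_COLUMNS)
    # Assumption: records with missing (None) depths are retained.
    return all(depth is None or depth <= MAX_LAKE_DEPTH_M for depth in depths)


# Made-up records for illustration.
records = [
    {"harmonized_discrete_depth_value": 1.5},
    {"harmonized_top_depth_value": 0.0, "harmonized_bottom_depth_value": 10.0},
    {"harmonized_discrete_depth_value": 1200.0},
]
kept = [record for record in records if has_realistic_depths(record)]
print(len(records) - len(kept), "rows removed")
```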
9.2.13 Aggregate simultaneous records
The final step of CDOM harmonization is to aggregate simultaneous observations.
The aggregation process: any group of samples determined to be
simultaneous is collapsed into a single record containing the mean and
coefficient of variation (CV) of the group. Such groups can be either
true duplicate entries in the WQP or records with non-identical values
recorded at the same time and place by the same organization (field
and/or lab replicates/duplicates). The CV can be used to filter the
dataset based on the amount of variability that is tolerable for a
specific use case. Note, however, that many entries will have a CV that
is NA because there are no simultaneous records, or 0 because the
records are duplicates and all entries have the same harmonized_value.
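The two CV outcomes noted above (NA for a lone record, 0 for exact duplicates) follow directly from how a CV is computed. The Python sketch below assumes the CV is the sample standard deviation divided by the mean; the text does not state the exact formula used in the pipeline, and the function name `summarize_group` is hypothetical. `None` stands in for NA.

```python
from statistics import mean, stdev


def summarize_group(values):
    """Collapse one group of simultaneous values into mean, CV, and count."""
    group_mean = mean(values)
    if len(values) < 2 or group_mean == 0:
        # A single value has no sample standard deviation, so the CV is
        # missing. Assumption: a zero mean is also treated as missing.
        cv = None
    else:
        cv = stdev(values) / group_mean
    return {
        "harmonized_value": group_mean,
        "harmonized_value_cv": cv,
        "harmonized_row_count": len(values),
    }


print(summarize_group([4.0]))        # lone record: CV is missing
print(summarize_group([2.0, 2.0]))   # exact duplicates: CV is zero
print(summarize_group([1.0, 3.0]))   # replicates that disagree: CV is positive
```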
We identify simultaneous records to aggregate by creating identical
subgroups (subgroup_id) from the following columns: parameter,
OrganizationIdentifier, MonitoringLocationIdentifier,
MonitoringLocationTypeName, ResolvedMonitoringLocationTypeName,
ActivityStartDate, ActivityStartTime.Time,
ActivityStartTime.TimeZoneCode, harmonized_tz,
harmonized_local_time, harmonized_utc, ActivityStartDateTime,
harmonized_top_depth_value, harmonized_top_depth_unit,
harmonized_bottom_depth_value, harmonized_bottom_depth_unit,
harmonized_discrete_depth_value, harmonized_discrete_depth_unit,
depth_flag, mdl_flag, approx_flag, greater_flag, tier,
field_flag, misc_flag, harmonized_units. This selection limits the
columns included in the final dataset, but we also provide a copy of the
AquaMatch dataset prior to its aggregation (pipeline target
p3_cdom_preagg_grouped; available as an RDS file in the data release:
AquaMatch_harmonize_WQP/_targets/objects/p3_cdom_preagg_grouped). This
copy includes the subgroup_id column, so that users can work with the
disaggregated data as well and make joins between dataset
versions.
Note: We hold out any records where misc_flag is 1 or 2 from the
aggregation step above, and as a result those records are not included
in the p3_cdom_preagg_grouped object, nor its RDS export. The records
are added back into the final dataset once the aggregation is complete,
but they have NA for subgroup_id, harmonized_value_cv, and
harmonized_row_count because they are never grouped or aggregated. We
do this to allow end users to decide how they want to handle values that
fall outside published ranges.
The final values are presented in the harmonized_value and
harmonized_value_cv columns. The number of rows used per group is
recorded in the harmonized_row_count column.
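The grouping, hold-out, and re-attachment steps described above can be sketched end to end. This Python sketch is illustrative only: it groups on a shortened, hypothetical subset of the columns (the full list is given in the text), the function name `aggregate_simultaneous` and the integer form of subgroup_id are assumptions, and the CV is assumed to be the sample standard deviation divided by the mean. `None` stands in for NA.

```python
from collections import defaultdict
from statistics import mean, stdev

# Shortened grouping key for illustration; see the text for the full list.
GROUP_COLUMNS = ("parameter", "OrganizationIdentifier",
                 "MonitoringLocationIdentifier", "harmonized_utc",
                 "harmonized_units")


def aggregate_simultaneous(records):
    """Aggregate unflagged records by group; re-attach held-out records."""
    held_out = [r for r in records if r["misc_flag"] in (1, 2)]
    groups = defaultdict(list)
    for r in records:
        if r["misc_flag"] == 0:
            key = tuple(r[c] for c in GROUP_COLUMNS)
            groups[key].append(r["harmonized_value"])

    output = []
    for subgroup_id, (key, values) in enumerate(groups.items(), start=1):
        group_mean = mean(values)
        has_cv = len(values) > 1 and group_mean != 0
        row = dict(zip(GROUP_COLUMNS, key))
        row.update(
            subgroup_id=subgroup_id,  # assumption: a simple integer id
            harmonized_value=group_mean,
            harmonized_value_cv=stdev(values) / group_mean if has_cv else None,
            harmonized_row_count=len(values),
        )
        output.append(row)

    # Held-out records keep their own value but are never grouped.
    for r in held_out:
        row = {c: r[c] for c in GROUP_COLUMNS}
        row.update(subgroup_id=None, harmonized_value=r["harmonized_value"],
                   harmonized_value_cv=None, harmonized_row_count=None)
        output.append(row)
    return output


def make_record(site, value, misc_flag):
    """Build a made-up record for the example below."""
    return {"parameter": "Absorbance at 254 nm", "OrganizationIdentifier": "ORG",
            "MonitoringLocationIdentifier": site,
            "harmonized_utc": "2020-06-01 15:00:00", "harmonized_units": "AU/m",
            "harmonized_value": value, "misc_flag": misc_flag}


result = aggregate_simultaneous([
    make_record("SITE-A", 1.0, 0), make_record("SITE-A", 3.0, 0),
    make_record("SITE-B", 5.0, 0), make_record("SITE-A", 900.0, 2),
])
for row in result:
    print(row["MonitoringLocationIdentifier"], row["subgroup_id"],
          row["harmonized_value"], row["harmonized_row_count"])
```

In this example four input records become three output rows: the two simultaneous SITE-A records collapse into one, while the flagged record passes through unchanged with missing group fields.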
16,263 rows are dropped, leaving 159.5 thousand remaining in the final harmonized and aggregated CDOM dataset.
9.2.14 Harmonized CDOM
At this point the harmonization of the CDOM data from the WQP is complete and we export the final dataset for use later in the workflow.
Below are several sets of figures illustrating different qualities of the dataset:
- Histograms showing the distribution of harmonized measurements (top)
and CVs (bottom) broken down by tier after aggregating simultaneous records. There are very few simultaneous records in the dataset, so only a few CVs are produced.
- Maps showing the geographic distribution of records by tier within the US:
- Bar charts showing the distribution of values by tier across years, months, and days of the week:
- Lastly, bar charts showing the distribution of depth values by parameter, location type, and tier:
