The goal of achieving net zero emissions by 2050 is a significant driver of research focused on understanding greenhouse gas (GHG) flux dynamics and the autotrophic production of ecosystems. This ambitious target highlights the urgent need to explore how different ecosystems contribute to GHG emissions and how they can be managed to reduce these emissions effectively. To support this research, this R package offers a variety of useful tools designed to facilitate the analysis of data related to GHG flux and production in aquatic environments.
This vignette is organized into four parts: GHG flux calculation, covering static-chamber flux measurements from raw analyser output through to a final flux estimate; NEM calculation, covering dissolved oxygen (DO) logger data and the resulting gross primary production (GPP), ecosystem respiration (ER), and net ecosystem metabolism (NEM); Statistical analysis, a set of helpers for descriptive statistics, outlier detection, normality testing, and group comparisons; and Visualization, plotting wrappers and colour palettes built on a shared theme.
Various techniques are used to measure greenhouse gas (GHG) flux, including eddy covariance (EC), the boundary layer method (BLM), and the static chamber method. This study provides R functions specifically designed to analyze data from GHG flux measurements using the static chamber method with a portable gas analyzer.
tidy_ghg_analyzer()First, we need to obtain necessary parameters from the raw data
downloaded from the portable gas analyzer. We will remove unnecessary
rows and columns and extract the GHG concentrations along with the
corresponding date and time recorded by the analyzer using
tidy_ghg_analyzer(). Additionally, this function will also
eliminate any NaN values in the GHG data. Currently, this function
supports raw data downloaded from:
(i) LI-COR Trace Gas Analyzer (LI-7810 and LI-7820). (ii) ABB LGR-ICOS
Gas Analyzer (M-GGA-918).
# The provided file is a raw data file downloaded from
# the LI-COR Trace Gas Analyzer
ghg_data_path <- system.file("extdata", "ch4.xlsx", package = "aelab", mustWork = T)
ch4 <- tidy_ghg_analyzer(ghg_data_path, "ch4")
ch4[c(1:3), ]## DATE TIME CO2 CH4 date_time
## 1 2023/03/11 07:31:59 799.9406 2999.952 2023-03-11 07:31:59
## 2 2023/03/11 07:32:00 770.1415 2995.596 2023-03-11 07:32:00
## 3 2023/03/11 07:32:01 771.6826 2993.581 2023-03-11 07:32:01
Sometimes, the date and time recorded by the analyzer do not match
the actual date and time we recorded in situ. Therefore, we need to
convert the date_time column from the previous step to
align with the real-life time, if there are any discrepancies:
# The analyzer's time was assumed to be
# 15 minutes and 30 seconds faster than real time
ch4 <- convert_time(ch4, min = -15, sec = 30)
ch4[c(1:3), c(5:6)]## date_time real_datetime
## 1 2023-03-11 07:31:59 2023-03-11 07:17:29
## 2 2023-03-11 07:32:00 2023-03-11 07:17:30
## 3 2023-03-11 07:32:01 2023-03-11 07:17:31
calculate_regression()We first need to obtain the slope from the linear regression of GHG
concentration changes over time within the chamber to calculate the
flux. This can be done by defining the measurement time interval from a
reference file (see Method 1 below), directly in R (see Method 2 below),
or from a compact inline string via make_reference() (see
Method 3 below).
In Excel, enter the date and time when the GHG flux measurement
started, then load the file into R. Make sure the column containing the
date and time information is formatted in ISO 8601 as
YYYY-MM-DD hh:mm:ss.
ref_data_path <- system.file("extdata", "reference.xlsx", package = "aelab", mustWork = T)
ref <- read_excel(ref_data_path)
ref## # A tibble: 3 × 5
## real_date real_time date_time site analyzer
## <dttm> <dttm> <dttm> <chr> <chr>
## 1 2023-03-11 00:00:00 1900-01-01 07:32:00 2023-03-11 07:32:00 S1 1337
## 2 2023-03-11 00:00:00 1900-01-01 08:32:00 2023-03-11 08:32:00 S2 1337
## 3 2023-03-11 00:00:00 1900-01-01 09:32:00 2023-03-11 09:32:00 S3 1337
Now, we can perform regressions on the GHG concentrations using
calculate_regression(). The data frame passed to its
reference_df argument — here, ref — must
contain a date_time column (measurement start times), a
site column, and an analyzer column
(e.g. "1337").
Two window modes are available via window_type:
"sliding" (default): moves a
fixed-length sub-window (default num_rows = 300 rows ≈ 5
min) across the full measurement period and keeps the position with the
highest R². Useful when the exact measurement start is uncertain."fixed": fits a single window of
duration_minutes starting at start_offset_s
seconds after the reference time. More reproducible — all gases are
evaluated over the same interval. The first 30 s of chamber deployment —
the “dead-bar period” — are mechanically disturbed; skipping them with
start_offset_s = 30 is recommended.# Sliding window (default): best 5-min sub-window within 7-min period
calculate_regression(ch4, ref, ghg = "CH4", reference_time = "date_time",
site = "site", analyzer_code = "1337",
duration_minutes = 7, num_rows = 300)## # A tibble: 3 × 12
## start_time end_time slope r_square correlation b_param fit_used g_factor
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 2023/03/11 07… 2023/03… -0.0406 0.671 -0.819 NA linear NA
## 2 2023/03/11 08… 2023/03… -0.207 0.671 -0.819 NA linear NA
## 3 2023/03/11 09… 2023/03… 0.104 0.444 0.666 NA linear NA
## # ℹ 4 more variables: flag <chr>, reference_time <dttm>, site <chr>,
## # analyzer <chr>
# Fixed window: regress from t+30 s to t+7 min (skips dead-bar period)
calculate_regression(ch4, ref, ghg = "CH4", reference_time = "date_time",
site = "site", analyzer_code = "1337",
window_type = "fixed", start_offset_s = 30)## # A tibble: 3 × 12
## start_time end_time slope r_square correlation b_param fit_used g_factor
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 2023/03/11 07… 2023/03… -0.101 0.489 -0.700 NA linear NA
## 2 2023/03/11 08… 2023/03… 0.0408 0.0869 0.295 NA linear NA
## 3 2023/03/11 09… 2023/03… 0.0856 0.548 0.740 NA linear NA
## # ℹ 4 more variables: flag <chr>, reference_time <dttm>, site <chr>,
## # analyzer <chr>
When reference_df contains a
duration_minutes and/or start_offset_s column,
a non-NA value in that column overrides the global
parameter for that specific measurement. Rows without the column, or
with NA, fall back to the global value. This is useful when
individual closures have different deployment lengths or when one
measurement needs a longer dead-bar skip.
The fit_type parameter controls the regression
model:
"linear" (default): ordinary
least-squares slope. Valid when concentration change is small relative
to ambient."quadratic": second-order polynomial;
the reported slope is the initial rate at t = 0 (the coefficient of
t)."exp_tz" /
"exp_zhao18": Hutchinson–Mosier nonlinear
exponential models; slope evaluated at t_zero."auto": tries linear → quadratic →
exp_tz in sequence, accepting a more complex model only when R² improves
by > 0.01 and the exponential curvature parameter b does not
exceed κmax (Hüppi et al. 2018). The model chosen for each
measurement is recorded in the fit_used output column.When fit_type = "auto", analyser precision is looked up
automatically from the built-in analyzer_precision dataset
(LI-7810, LI-7820, and LGR UGGA) if instrument_precision
(see Quality flags below) is not supplied. The lookup matches the
analyzer_code argument against model names
case-insensitively, so "LGR" correctly resolves to LGR UGGA
precision rather than falling back to LI-7810.
# Auto model selection with fixed window and 30-s dead-bar skip
calculate_regression(ch4, ref, ghg = "CH4", reference_time = "date_time",
site = "site", analyzer_code = "1337",
fit_type = "auto",
window_type = "fixed", start_offset_s = 30)## # A tibble: 3 × 12
## start_time end_time slope r_square correlation b_param fit_used g_factor
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 2023/03/11 07… 2023/03… 0.155 0.700 -0.700 NA quadrat… -1.53
## 2 2023/03/11 08… 2023/03… -0.392 0.741 0.295 NA quadrat… -9.60
## 3 2023/03/11 09… 2023/03… -0.0855 0.694 0.740 NA quadrat… -0.999
## # ℹ 4 more variables: flag <chr>, reference_time <dttm>, site <chr>,
## # analyzer <chr>
Each measurement receives a flag in the output:
| Flag | Meaning |
|---|---|
ok |
R² ≥ threshold and correlation ≥ threshold; use this slope |
discard |
Detectable trend but poor fit (R² below threshold) |
zero |
No detectable trend (correlation below threshold, or slope within instrument noise) |
no_data |
Insufficient data in the window |
Two optional parameters tighten the quality control:
instrument_precision: analyser
precision in ppm. Slopes whose absolute value falls below
precision / window_duration_s are flagged
"zero" even if R² passes. Also used by
fit_type = "auto" to compute κmax per
measurement.g_factor_threshold: when
fit_type = "auto" selects a nonlinear model, the g-factor
(chosen slope / linear baseline slope) is computed and recorded in the
g_factor output column. If
abs(g_factor) > g_factor_threshold, the flag is
overridden to "discard" (Gaudard et al. 2025). Default
NULL (disabled).The reference information can also be constructed directly in R as a data frame.
ref_direct <- data.frame(
date_time = as.POSIXct("2023-03-11 07:32:00", tz = "Asia/Taipei"),
site = "S1",
analyzer = "1337"
)
calculate_regression(ch4, ref_direct, ghg = "CH4", reference_time = "date_time",
site = "site", analyzer_code = "1337")## # A tibble: 1 × 12
## start_time end_time slope r_square correlation b_param fit_used g_factor
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 2023/03/11 07… 2023/03… -0.0406 0.671 -0.819 NA linear NA
## # ℹ 4 more variables: flag <chr>, reference_time <dttm>, site <chr>,
## # analyzer <chr>
make_reference()make_reference() builds a reference_df from
a compact multiline string — one closure per line, whitespace-separated:
time site analyzer [dur=N] [off=N]. This avoids
maintaining a separate Excel file for small campaigns or ad-hoc
measurements.
## # A tibble: 2 × 3
## site date_time analyzer
## <chr> <dttm> <chr>
## 1 S1 2023-03-11 07:32:00 1337
## 2 S2 2023-03-11 07:45:00 1337
Per-row window overrides use dur= and off=
tokens. Rows without an override simply omit the token.
make_reference(
date = "2023-03-11",
measurements = "
07:32:00 S1 1337 dur=3.5
07:45:00 S2 1337 dur=2.0 off=45
08:00:00 S3 1337
"
)## # A tibble: 3 × 5
## site date_time analyzer duration_minutes start_offset_s
## <chr> <dttm> <chr> <dbl> <dbl>
## 1 S1 2023-03-11 07:32:00 1337 3.5 NA
## 2 S2 2023-03-11 07:45:00 1337 2 45
## 3 S3 2023-03-11 08:00:00 1337 NA NA
When output is specified, the tibble is also written to
.xlsx:
calculate_ghg_flux()The equation we used to calculate the GHG flux was derived from Yong et al. (2024):
\[ F = \frac{(S \times V \times c)}{R \times T \times A} \]
where S is the slope obtained from the linear regression of GHG concentration changes over time (ppm s\(^{-1}\)), V is the chamber volume (liters), c is the conversion factor from seconds to hours, R is the ideal gas constant (0.082 L atm K\(^{−1}\) mol\(^{−1}\)), T is the temperature inside the chamber (kelvin), and A is the surface area of the chamber (m\(^2\)).
Therefore, the function calculate_ghg_flux() requires a
dataframe with columns for the slope, area, volume, and temperature.
Note that the units of the parameters must match those described above,
resulting in a flux with units of mmol m\(^{-2}\) d\(^{-1}\).
results_ch4 <- calculate_regression(ch4, ref_direct, ghg = "CH4", reference_time = "date_time",
site = "site", analyzer_code = "1337")
# In practice, filter to flag == "ok" before calculating flux
flux_ch4 <- data.frame(
slope = results_ch4$slope,
area = 1, # in square meter
volume = 1, # in litre
temp = 1) # in celsius
calculate_ghg_flux(flux_ch4)## slope area volume temp flux unit
## 1 -0.04058922 1 1 1 -0.006495451 mmol m-2 h-1
convert_ghg_unit()convert_ghg_unit() converts a GHG flux value between any
supported flux units. The default output is µg m⁻² h⁻¹, but the target
unit is fully configurable via to_mass,
to_area, and to_time. The function is
vectorized over numeric columns and works directly inside
dplyr::mutate().
dplyr::mutate() for character columns.NA input: returns NA
without error.# Numeric input: 97 mg CH4 m-2 h-1 → µg m-2 h-1 (default target)
convert_ghg_unit(97, ghg = "ch4", mass = "mg", area = "m2", time = "hr")## [1] 97000
# Numeric vector: works directly in dplyr::mutate()
convert_ghg_unit(c(1.2, 0.8, NA), ghg = "ch4", mass = "µmol", area = "m2", time = "hr")## [1] 19.25 12.83 NA
# Custom target unit: µmol m-2 h-1 → mmol m-2 d-1
convert_ghg_unit(97, ghg = "ch4", mass = "µmol", area = "m2", time = "hr",
to_mass = "mmol", to_area = "m2", to_time = "day")## [1] 2.33
# Character string input: each number is converted individually, string is preserved
convert_ghg_unit("0.002 ± 0.003", ghg = "ch4", mass = "mmol", area = "m2", time = "hr")## [1] "32.08 ± 48.12"
calculate_MDF()calculate_MDF() calculates the Minimum Detectable Flux
(MDF) for a static chamber system. This is useful for evaluating whether
a gas analyser is precise enough to detect fluxes at the expected
magnitude of the study site.
calculate_MDF(
precision_ppm = 1,
closure_time_s = 300,
data_point_n = 300,
chamber_volume_m3 = 0.0064,
temperature_C = 25,
chamber_area_m2 = 0.07
)## $MDF
## [1] 113.953
##
## $unit
## [1] "µg m⁻² h⁻¹"
The change in oxygen concentration in the water reflects the balance between photosynthetic production, respiratory consumption, and physical exchange at the water-atmosphere interface. By measuring dissolved oxygen (DO) concentrations at high frequency with in situ data loggers, we can calculate gross primary production (GPP), ecosystem respiration (ER), and net ecosystem metabolism (NEM), as detailed by Staehr et al. (2010).
process_hobo()First, we need to obtain the necessary parameters from the raw data
downloaded from the data loggers. process_hobo() removes
unnecessary rows and columns, extracts the dissolved oxygen
concentration and water temperature along with their date and time,
drops rows with NA values, and labels the output with the
supplied no_hobo identifier for use in later steps.
hobo_data_path <- system.file("extdata", "ex_hobo.csv", package = "aelab")
do <- process_hobo(hobo_data_path, no_hobo = "code_for_logger", type = "do")
do[c(1:3), ]## date_time do temp no_hobo
## 1 2024-04-08 12:00:00 7.69 25.86 code_for_logger
## 2 2024-04-08 12:30:00 7.67 26.02 code_for_logger
## 3 2024-04-08 13:00:00 7.65 26.20 code_for_logger
process_hobo() accepts both Chinese (上午/下午) and
English (AM/PM or 24-hour) HOBOware locale exports. For
type = "do" (the default, used above for a HOBO U26
Dissolved Oxygen Data Logger), it also drops readings with a negative
(sensor-error) DO or temperature value. Set type = "temp"
to process a HOBO Pro v2 Temperature/RH Logger instead, which returns
air_temp and rh columns in place of
do and temp, with one row per reading.
The air pressure and wind speed at the sampling site should be
obtained either manually or from a nearby weather station.
process_weather() tidies the CSV file downloaded from
Taiwan’s weather station. In addition to extracting the columns with air
pressure and wind speed values, it also duplicates the hourly data to
match the half-hourly recorded frequency of the DO concentrations.
weather_data_path <- system.file("extdata", "ex_weather.csv", package = "aelab")
weather <- process_weather(weather_data_path, date = "2024-04-10", zone = "zone_A")
weather[c(1:5), ]## pressure_hpa wind_ms date_time zone
## 1 1015.2 1.4 2024-04-10 00:30:00 zone_A
## 2 1015.2 1.4 2024-04-10 01:00:00 zone_A
## 3 1015.0 1.6 2024-04-10 01:30:00 zone_A
## 4 1015.0 1.6 2024-04-10 02:00:00 zone_A
## 5 1014.9 1.4 2024-04-10 02:30:00 zone_A
In addition to the previously mentioned parameters, we also need the
water depth, salinity, start and end date and time of the data logger
deployment, and the sunrise and sunset times during that period. These
parameters can be entered in Excel and then imported into R using
process_info() to ensure that all necessary parameters are
included. The column zone corresponds to
process_weather(), and column no_hobo
corresponds to process_hobo().
info_data_path <- system.file("extdata", "info.xlsx", package = "aelab")
info <- process_info(info_data_path)
info## # A tibble: 1 × 9
## zone site no_hobo depth_m salinity start_date_time end_date_time
## <chr> <chr> <chr> <dbl> <dbl> <dttm> <dttm>
## 1 zone_A site_A code_f… 1 10 2024-04-10 00:00:00 2024-04-10 23:59:00
## # ℹ 2 more variables: sunrise <dttm>, sunset <dttm>
We can then merge the air pressure and wind speed data from
process_weather() with the processed logger data, combine
the result with the remaining parameters using the zone and
no_hobo columns, and use plot_hobo() to
inspect the resulting DO concentrations.
data <- merge(weather, do, by = "date_time")
merged_df <- data %>%
inner_join(info, by = c("zone", "no_hobo"))
merged_df[c(1:3), ]## date_time pressure_hpa wind_ms zone do temp no_hobo
## 1 2024-04-10 00:30:00 1015.2 1.4 zone_A 8.28 25.10 code_for_logger
## 2 2024-04-10 01:00:00 1015.2 1.4 zone_A 8.07 25.04 code_for_logger
## 3 2024-04-10 01:30:00 1015.0 1.6 zone_A 7.80 24.98 code_for_logger
## site depth_m salinity start_date_time end_date_time
## 1 site_A 1 10 2024-04-10 2024-04-10 23:59:00
## 2 site_A 1 10 2024-04-10 2024-04-10 23:59:00
## 3 site_A 1 10 2024-04-10 2024-04-10 23:59:00
## sunrise sunset
## 1 1899-12-31 05:45:00 1899-12-31 18:15:00
## 2 1899-12-31 05:45:00 1899-12-31 18:15:00
## 3 1899-12-31 05:45:00 1899-12-31 18:15:00
For multi-site or multi-day deployments, the steps above can be
replaced with a batch pipeline built from five functions:
combine_hobo() reads every logger file in a folder,
filter_complete_days() keeps only days with (near-)complete
diel coverage, add_sun_times() computes sunrise and sunset
from site coordinates instead of looking them up, and
process_weather_mh() / combine_weather_mh()
import weather in bulk from Taiwan’s Central Weather Administration
(CWA) CODiS portal instead of one file per day.
combine_hobo()combine_hobo() applies process_hobo() to
every file in a folder matching file_prefix and row-binds
the result.
hobo_data_dir <- system.file("extdata", package = "aelab")
hobo_all <- combine_hobo(hobo_data_dir, file_prefix = "ex_ho")
hobo_all[c(1:3), ]## date_time do temp no_hobo
## 1 2024-04-08 12:00:00 7.69 25.86 bo
## 2 2024-04-08 12:30:00 7.67 26.02 bo
## 3 2024-04-08 13:00:00 7.65 26.20 bo
filter_complete_days()filter_complete_days() drops calendar days where a
logger has fewer than min_slots distinct 30-minute readings
(default 44 of 48), so partial days at the start or end of a deployment
don’t bias the daily metabolism estimate. Set
summary_only = TRUE to inspect per-day coverage before
filtering.
## no_hobo date n_slots complete
## 1 bo 2024-04-08 24 FALSE
## 2 bo <NA> 1 FALSE
## 3 bo 2024-04-09 47 TRUE
## 4 bo 2024-04-10 47 TRUE
## 5 bo 2024-04-11 47 TRUE
## 6 bo 2024-04-12 47 TRUE
## 7 bo 2024-04-13 47 TRUE
## 8 bo 2024-04-14 47 TRUE
## 9 bo 2024-04-15 19 FALSE
## [1] 282
add_sun_times()add_sun_times() attaches per-date sunrise
and sunset columns computed from the site’s latitude and
longitude using the suncalc package, removing the need to look these up
manually for process_info().
hobo_complete <- add_sun_times(hobo_complete, lat = 22.3900, lon = 120.5777)
hobo_complete[c(1:3), ]## date_time do temp no_hobo date sunrise
## 1 2024-04-09 00:30:00 7.82 25.30 bo 2024-04-09 2024-04-09 05:44:08
## 2 2024-04-09 01:00:00 7.76 25.50 bo 2024-04-09 2024-04-09 05:44:08
## 3 2024-04-09 01:30:00 7.74 25.64 bo 2024-04-09 2024-04-09 05:44:08
## sunset
## 1 2024-04-09 18:16:35
## 2 2024-04-09 18:16:35
## 3 2024-04-09 18:16:35
The CWA CODiS portal caps hourly downloads at one month, so a
deployment typically spans several LotsDataReports.txt
files in the “MH” (Multifield Hourly) format.
process_weather_mh() parses one such file;
combine_weather_mh() reads every .txt file in
a directory (recursively), row-binds, and de-duplicates by station and
time. Both expand hourly values to the 30-minute marks expected by
calculate_do().
weather_mh_path <- system.file("extdata", "ex_weather_mh.txt", package = "aelab")
process_weather_mh(weather_mh_path, station_id = "C0R660")## stno date_time pressure_hpa wind_ms air_temp humidity wind_dir
## 1 C0R660 2024-03-01 00:30:00 1012.0 0.3 25.4 97 94
## 2 C0R660 2024-03-01 01:00:00 1012.0 0.3 25.4 97 94
## 3 C0R660 2024-03-01 01:30:00 1011.4 0.3 25.0 98 109
## 4 C0R660 2024-03-01 02:00:00 1011.4 0.3 25.0 98 109
## 5 C0R660 2024-03-01 23:30:00 1015.7 NA 22.8 95 0
## 6 C0R660 2024-03-02 00:00:00 1015.7 NA 22.8 95 0
## rain_mm
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
weather_dir <- system.file("extdata", package = "aelab")
combine_weather_mh(weather_dir, station_id = c("C0R660", "C0R850"))## stno date_time pressure_hpa wind_ms air_temp humidity wind_dir
## 1 C0R660 2024-03-01 00:30:00 1012.0 0.3 25.4 97 94
## 2 C0R660 2024-03-01 01:00:00 1012.0 0.3 25.4 97 94
## 3 C0R660 2024-03-01 01:30:00 1011.4 0.3 25.0 98 109
## 4 C0R660 2024-03-01 02:00:00 1011.4 0.3 25.0 98 109
## 5 C0R660 2024-03-01 23:30:00 1015.7 NA 22.8 95 0
## 6 C0R660 2024-03-02 00:00:00 1015.7 NA 22.8 95 0
## 7 C0R850 2024-03-01 00:30:00 1009.1 1.2 24.7 90 124
## 8 C0R850 2024-03-01 01:00:00 1009.1 1.2 24.7 90 124
## 9 C0R850 2024-03-01 23:30:00 1010.2 0.9 23.1 92 34
## 10 C0R850 2024-03-02 00:00:00 1010.2 0.9 23.1 92 34
## rain_mm
## 1 0.0
## 2 0.0
## 3 0.0
## 4 0.0
## 5 0.0
## 6 0.0
## 7 0.5
## 8 0.5
## 9 0.0
## 10 0.0
For each site, merge its hobo data with the matching station’s
weather by date_time, attach site metadata, and run
calculate_do():
weather <- combine_weather_mh("path/to/weather_mh", station_id = "C0R660")
hobo <- combine_hobo("path/to/logger_csvs/site_A")
hobo <- filter_complete_days(hobo)
hobo <- add_sun_times(hobo, lat = 22.3900, lon = 120.5777)
hobo$site <- "site_A"
hobo$depth_m <- 2.0
hobo$salinity <- 30
merged <- merge(hobo, weather[c("date_time", "pressure_hpa", "wind_ms")], by = "date_time")
calculate_do(merged)calculate_do()calculate_do() returns one row per logger-day with the
columns r_day (ecosystem respiration, ER), gpp
(gross primary production, GPP), and nep (net ecosystem
metabolism, NEM). NEM is calculated as GPP + ER.
## site no_hobo date r_day gpp nep
## 1 site_A code_for_logger 2024-04-10 -17.77689 17.56762 -0.2092709
aelab provides a small suite of statistical helpers that cover a typical aquatic ecology workflow: describe your data, check for outliers, test normality, transform if needed, and choose between a parametric or non-parametric group comparison.
The examples below use a simple self-contained data frame with three groups.
set.seed(42)
stat_df <- data.frame(
group = rep(c("A", "B", "C"), each = 8),
value = c(rnorm(8, 2, 0.4), rnorm(8, 3.5, 0.5), rnorm(8, 5, 0.6))
)descriptive_statistic()descriptive_statistic() returns mean ± SD and min–max
for each group.
## # A tibble: 3 × 3
## group value_mean_sd value_min_max
## <chr> <chr> <chr>
## 1 A 2.18 ± 0.29 1.77 to 2.6
## 2 B 3.77 ± 0.63 2.81 to 4.64
## 3 C 4.62 ± 0.92 3.41 to 5.79
find_outlier()find_outlier() uses the IQR method (values outside 1.5 ×
IQR from Q1/Q3) to flag outliers.
## # A tibble: 0 × 2
## # ℹ 2 variables: row_index <int>, outlier_value <dbl>
Use normality_test_t() when comparing two groups (t-test
context), and normality_test_aov() when comparing three or
more groups (ANOVA context). Both apply Shapiro-Wilk to raw,
square-root, and log10 transforms so you can identify whether a
transformation improves normality.
## # A tibble: 6 × 4
## variable group p_value transformation
## <chr> <chr> <dbl> <chr>
## 1 value A 0.613 None
## 2 value B 0.733 None
## 3 value A 0.581 Square root
## 4 value B 0.691 Square root
## 5 value A 0.547 Logarithm
## 6 value B 0.619 Logarithm
## # A tibble: 3 × 3
## variable p_value transformation
## <chr> <dbl> <chr>
## 1 value 0.848 None
## 2 value 0.521 Square root
## 3 value 0.0745 Logarithm
df_trans()If residuals are not normal, df_trans() appends a
transformed column to the data frame for re-testing.
## group value value_sqrt
## 1 A 2.548383 2.060021
## 2 A 1.774121 2.240078
## 3 A 2.145251 2.155648
## 4 A 2.253145 2.130475
## 5 A 2.161707 2.151827
## 6 A 1.957550 2.198754
aov_test()When normality holds, aov_test() runs a one-way ANOVA
followed by Tukey HSD post-hoc comparisons and returns compact letter
display (CLD) groupings.
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 24.60 12.30 27.98 1.19e-06 ***
## Residuals 21 9.23 0.44
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## $group
## C B A
## "a" "b" "c"
## $anova_summary
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 24.60 12.30 27.98 1.19e-06 ***
## Residuals 21 9.23 0.44
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $tukey_results
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: stats::aov(formula = formula, data = df)
##
## $group
## diff lwr upr p adj
## B-A 1.5980249 0.76252027 2.433529 0.0002598
## C-A 2.4411682 1.60566363 3.276673 0.0000009
## C-B 0.8431434 0.00763877 1.678648 0.0476844
##
##
## $compact_letters
## $compact_letters$group
## C B A
## "a" "b" "c"
ks_test()When normality cannot be assumed, ks_test() runs
Kruskal-Wallis followed by Dunn’s post-hoc test with Bonferroni
correction.
## $ks_results
##
## Kruskal-Wallis rank sum test
##
## data: value by group
## Kruskal-Wallis chi-squared = 17.165, df = 2, p-value = 0.0001874
##
##
## $dunn_results
## Comparison Z P.unadj P.adj
## 1 A - B -2.722361 6.481727e-03 0.0194451820
## 2 A - C -4.065864 4.785484e-05 0.0001435645
## 3 B - C -1.343503 1.791092e-01 0.5373275782
##
## $mean_summary
## # A tibble: 3 × 2
## group mean
## <chr> <dbl>
## 1 A 2.18
## 2 B 3.77
## 3 C 4.62
##
## $compact_letters
## Group Letter MonoLetter
## 1 A a a
## 2 B b b
## 3 C b b
sig_labels()sig_labels() wraps ks_test() to produce
compact letter display (CLD) annotations for faceted ggplot2 boxplots.
It iterates over each level of a faceting variable (by,
default "year"), runs ks_test() on each
subset, and returns a tidy data frame ready to pass directly to
geom_text(). Facets where the Kruskal-Wallis p-value is ≥
alpha are silently dropped.
set.seed(1)
sig_df <- data.frame(
year = rep(c("2023", "2024"), each = 20),
site = rep(c("A", "B", "C", "D"), 10),
flux = c(
rnorm(20, mean = rep(c(1, 3, 2, 6), 5)),
rnorm(20)
)
)
labels <- sig_labels(sig_df, "flux", "site", by = "year")## site Letter MonoLetter y_pos year
## 1 A a a 1.575781 2023
## 2 B ab ab 3.943836 2023
## 3 C ab ab 3.511781 2023
## 4 D b b 7.595281 2023
Only 2023 appears in labels because 2024 has a
non-significant Kruskal-Wallis result. Pass the output to
geom_text() via the data argument to annotate
a faceted boxplot:
ggplot(sig_df, aes(x = site, y = flux)) +
geom_boxplot() +
geom_text(
data = labels,
aes(x = site, y = y_pos, label = Letter),
vjust = -0.5
) +
facet_wrap(~year)When the raw observations used for testing differ from the plotted
data (e.g. the plot shows aggregated values), pass the plotted data as
plot_data so y-positions are computed on the correct
scale.
aelab provides plot wrappers — plot_point(),
plot_line(), plot_box(), and
plot_bar() — with a consistent theme built on Century
Gothic. Custom colour palettes are available through
aelab_palettes() and the discrete and continuous scale
functions scale_colour_aelab_d(),
scale_fill_aelab_d(), scale_colour_aelab_c(),
and scale_fill_aelab_c().
The example below previews the "ghg" palette and applies
it to a box plot comparing CH₄ fluxes across three habitat types.
Gaudard, J., Telford, R. J., & Halbritter, A. H. (2025). fluxible: An R package to process ecosystem gas fluxes from closed-loop chambers in an automated and reproducible way. Methods in Ecology and Evolution, 00, 1–9. https://doi.org/10.1111/2041-210X.70161
Hüppi, R., Felber, R., Krauss, M., Six, J., Leifeld, J., & Fuß, R. (2018). Restricting the nonlinearity parameter in soil greenhouse gas flux calculations for more reliable flux estimates. PLOS ONE, 13(7), e0200876. https://doi.org/10.1371/journal.pone.0200876
Staehr, P. A., Bade, D., Van de Bogert, M. C., Koch, G. R., Williamson, C., Hanson, P., Cole, J. J., & Kratz, T. (2010). Lake metabolism and the diel oxygen technique: State of the science. Limnology and Oceanography: Methods, 8(11), 628–644. https://doi.org/10.4319/lom.2010.8.0628
Yong, Z.-J., Lin, W.-J., Lin, C.-W., & Lin, H.-J. (2024). Tidal influence on carbon dioxide and methane fluxes from tree stems and soils in mangrove forests. Biogeosciences, 21(22), 5247–5260. https://doi.org/10.5194/bg-21-5247-2024