| Title: | Data Processing for Aquatic Ecology |
|---|---|
| Description: | Facilitate the analysis of data related to aquatic ecology, specifically the establishment of carbon budget. Currently, the package allows the below analysis. (i) the calculation of greenhouse gas flux based on data obtained from trace gas analyzer using the method described in Lin et al. (2024). (ii) the calculation of Dissolved Oxygen (DO) metabolism based on data obtained from dissolved oxygen data logger using the method described in Staehr et al. (2010). Yong et al. (2024) <doi:10.5194/bg-21-5247-2024>. Staehr et al. (2010) <doi:10.4319/lom.2010.8.0628>. |
| Authors: | Zhao-Jun Yong [cre, aut] |
| Maintainer: | Zhao-Jun Yong <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.5 |
| Built: | 2026-06-05 07:20:44 UTC |
| Source: | https://github.com/zhao-jun-yong/aelab |
Attach per-day sunrise and sunset times to a data frame of
readings, computed from a site's coordinates with the suncalc
package. Replaces the manual lookup of sunrise/sunset for each deployment
date; the resulting sunrise/sunset columns are consumed by
calculate_do.
add_sun_times(df, lat, lon, time_col = "date_time", tz = "Asia/Taipei")add_sun_times(df, lat, lon, time_col = "date_time", tz = "Asia/Taipei")
df |
A data frame containing a POSIXct time column. |
lat, lon
|
Site latitude and longitude in decimal degrees (length 1). |
time_col |
Name of the POSIXct time column. Default |
tz |
Time zone for the returned sun times. Default |
df with two added POSIXct columns, sunrise and
sunset, matched to each row's calendar date.
## Not run: df <- add_sun_times(df, lat = 22.3900, lon = 120.5777) ## End(Not run)## Not run: df <- add_sun_times(df, lat = 22.3900, lon = 120.5777) ## End(Not run)
Retrieve a named aelab colour palette as a character vector.
aelab_palettes(name, n, type = c("discrete", "continuous"))aelab_palettes(name, n, type = c("discrete", "continuous"))
name |
Name of the palette (string). |
n |
Number of colours to return. Defaults to the full palette length. |
type |
|
Available palette names: "rainbow", "two",
"control", "control2", "control3",
"period", "ghg".
A character vector of hex colour codes with class "palette".
aelab_palettes("rainbow", 5) aelab_palettes("ghg", type = "continuous", n = 20)aelab_palettes("rainbow", 5) aelab_palettes("ghg", type = "continuous", n = 20)
A lookup table of 1-second (1σ) precision values used to compute
kappamax in calculate_regression when
fit_type = "auto". Add rows for additional analyzer models as needed.
A data frame with 5 rows and 3 variables:
Analyzer model name (character).
Gas measured: "CH4", "CO2", or "N2O" (character).
Instrument precision in ppm (1σ, 1 s) (numeric).
LI-COR Biosciences product specifications; LGR (Los Gatos Research) UGGA product specifications.
analyzer_precision analyzer_precision[analyzer_precision$gas == "N2O", "precision_ppm"]analyzer_precision analyzer_precision[analyzer_precision$gas == "N2O", "precision_ppm"]
Perform one-way ANOVA followed by Tukey HSD post-hoc test with compact letter display.
aov_test(df, variable_name, group)aov_test(df, variable_name, group)
df |
A data frame. |
variable_name |
Name of the response variable column (string). |
group |
Name of the grouping column (string). |
A named list with elements anova_summary, tukey_results,
and compact_letters.
df <- data.frame( grp = rep(c("A","B","C"), each = 5), val = c(1,2,1,2,1, 3,4,3,4,3, 5,6,5,6,5) ) aov_test(df, "val", "grp")df <- data.frame( grp = rep(c("A","B","C"), each = 5), val = c(1,2,1,2,1, 3,4,3,4,3, 5,6,5,6,5) ) aov_test(df, "val", "grp")
Calculate chlorophyll-a concentration from trichromatic spectrophotometric absorbance readings using the Jeffrey & Humphrey (1975) equations.
calc_chla_trichromatic(wl_630, wl_647, wl_664, wl_750)calc_chla_trichromatic(wl_630, wl_647, wl_664, wl_750)
wl_630 |
Absorbance at 630 nm. |
wl_647 |
Absorbance at 647 nm. |
wl_664 |
Absorbance at 664 nm. |
wl_750 |
Absorbance at 750 nm (turbidity blank). |
Absorbance values should be measured in a 1 cm path-length cuvette.
The 750 nm reading is used as a turbidity blank correction.
Formula:
where .
Chlorophyll-a concentration in µg L (assuming a 1 cm path
length and standard extraction volume).
calc_chla_trichromatic(wl_630 = 0.05, wl_647 = 0.08, wl_664 = 0.20, wl_750 = 0.01)calc_chla_trichromatic(wl_630 = 0.05, wl_647 = 0.08, wl_664 = 0.20, wl_750 = 0.01)
Calculate the Net Ecosystem Production, Gross Primary Production and Ecosystem respiration based on the change in dissolved oxygen concentration.
calculate_do(df)calculate_do(df)
df |
Merged dataframe produced by process_hobo(), process_weather() and process_info() functions. |
A dataframe.
data(hobo) calculate_do(hobo)data(hobo) calculate_do(hobo)
Calculate the greenhouse gas (GHG) flux based on input parameters from a data frame.
calculate_ghg_flux( data, slope = "slope", area = "area", volume = "volume", temp = "temp" )calculate_ghg_flux( data, slope = "slope", area = "area", volume = "volume", temp = "temp" )
data |
A data frame containing relevant data with columns for slope, area, volume, and temperature. |
slope |
Name of the column in 'data' that contains the slope values of the GHG concentration change (in ppm/s). |
area |
Name of the column in 'data' that contains the values of the area of the chamber (in square meter). |
volume |
Name of the column in 'data' that contains values of the volume of the chamber (in litre). |
temp |
Name of the column in 'data' that contains values of the temperature of the gas (in Celsius). |
A list containing the calculated flux and its unit.
data <- data.frame( slope = c(1.2, 1.5, 1.1), area = c(100, 150, 120), volume = c(10, 15, 12), temp = c(25, 30, 22) ) results <- calculate_ghg_flux(data) print(results)data <- data.frame( slope = c(1.2, 1.5, 1.1), area = c(100, 150, 120), volume = c(10, 15, 12), temp = c(25, 30, 22) ) results <- calculate_ghg_flux(data) print(results)
Calculate the Minimum Detectable Flux (MDF) for a static chamber GHG measurement system.
calculate_MDF( precision_ppm, closure_time_s, data_point_n, chamber_volume_m3, temperature_C, chamber_area_m2, pressure_pa = 101325, ideal_constant = 8.314, ghg = "co2" )calculate_MDF( precision_ppm, closure_time_s, data_point_n, chamber_volume_m3, temperature_C, chamber_area_m2, pressure_pa = 101325, ideal_constant = 8.314, ghg = "co2" )
precision_ppm |
Precision of the gas analyser (ppm). |
closure_time_s |
Closure time of the measurement (seconds). |
data_point_n |
Number of data points recorded during the closure period. |
chamber_volume_m3 |
Internal volume of the chamber (m |
temperature_C |
Air temperature at the measurement location ( |
chamber_area_m2 |
Base area of the chamber (m |
pressure_pa |
Atmospheric pressure (Pa). Default 101325. |
ideal_constant |
Ideal gas constant (J mol |
ghg |
Greenhouse gas type: |
A named list with MDF (numeric,
g m h) and unit (string).
calculate_MDF( precision_ppm = 1, closure_time_s = 300, data_point_n = 300, chamber_volume_m3 = 0.0064, temperature_C = 25, chamber_area_m2 = 0.07 )calculate_MDF( precision_ppm = 1, closure_time_s = 300, data_point_n = 300, chamber_volume_m3 = 0.0064, temperature_C = 25, chamber_area_m2 = 0.07 )
Calculate the slope of GHG concentration change over time using linear or exponential regression.
calculate_regression( data, reference_df, ghg, reference_time, site, analyzer_code, duration_minutes = 7, num_rows = 300, fit_type = "linear", t_zero = 0, window_type = "sliding", start_offset_s = 0, r2_threshold = 0.8, cor_threshold = 0.5, instrument_precision = NULL, kappamax = NULL, g_factor_threshold = NULL )calculate_regression( data, reference_df, ghg, reference_time, site, analyzer_code, duration_minutes = 7, num_rows = 300, fit_type = "linear", t_zero = 0, window_type = "sliding", start_offset_s = 0, r2_threshold = 0.8, cor_threshold = 0.5, instrument_precision = NULL, kappamax = NULL, g_factor_threshold = NULL )
data |
Data from the GHG analyzer that has been processed and time-converted. |
reference_df |
A data frame containing measurement reference times, site names, and an |
ghg |
Column name in |
reference_time |
Column name in |
site |
Column name in |
analyzer_code |
String pattern used to filter results by the |
duration_minutes |
The duration of the measurement window in minutes. Default 7.
When |
num_rows |
The number of rows used per sliding regression window. Only used when
|
fit_type |
Model used to fit the concentration curve. One of |
t_zero |
Reference time in seconds from window start for the |
window_type |
|
start_offset_s |
Seconds to skip from the reference time before the regression window starts.
Only used when |
r2_threshold |
Minimum R-squared for a measurement to pass quality check. Default 0.8. |
cor_threshold |
Minimum absolute Pearson correlation between concentration and elapsed time.
Measurements below this have no detectable trend and are flagged |
instrument_precision |
Optional. Analyser precision in ppm. Used in two ways:
(1) minimum-detectable-slope check — measurements whose absolute slope falls below
|
kappamax |
Optional. Maximum acceptable exponential curvature parameter |
g_factor_threshold |
Optional. Maximum acceptable ratio of the chosen model slope to the
linear baseline slope (g-factor; Gaudard et al. 2025). Only computed when
|
A tibble with columns: start_time, end_time, slope (ppm s),
r_square, correlation, b_param (exponential curvature; NA for
linear/quadratic), fit_used (model actually used: "linear", "quadratic",
or "exp_tz"; always matches fit_type unless fit_type = "auto"),
g_factor (ratio of chosen slope to linear baseline slope; NA when
fit_used = "linear" or for explicit non-auto fits),
flag ("ok", "discard", "zero", or "no_data"),
reference_time, site, analyzer.
data(n2o) n2o_converted <- convert_time(n2o) ref <- data.frame( date_time = n2o_converted$real_datetime[1], site = "S1", analyzer = "1074" ) calculate_regression(n2o_converted, ref, "N2O", reference_time = "date_time", site = "site", analyzer_code = "1074")data(n2o) n2o_converted <- convert_time(n2o) ref <- data.frame( date_time = n2o_converted$real_datetime[1], site = "S1", analyzer = "1074" ) calculate_regression(n2o_converted, ref, "N2O", reference_time = "date_time", site = "site", analyzer_code = "1074")
Convert individual GHG fluxes (mg m h)
to a total CO-equivalent flux (g m d) using
IPCC AR6 100-year GWPs (CO = 1, CH = 27,
NO = 273).
calculate_total_co2e(co2 = 0, ch4 = 0, n2o = 0)calculate_total_co2e(co2 = 0, ch4 = 0, n2o = 0)
co2 |
CO |
ch4 |
CH |
n2o |
N |
Total COe flux as a numeric scalar
(g m d), printed with a diagnostic message.
calculate_total_co2e(co2 = 4.02, ch4 = 0.001, n2o = 0.003)calculate_total_co2e(co2 = 4.02, ch4 = 0.001, n2o = 0.003)
Tidy multiple data retrieved from HOBO U26 Dissolved Oxygen Data Logger.
combine_hobo(file_path, file_prefix = "no.")combine_hobo(file_path, file_prefix = "no.")
file_path |
Directory of the folder containing the files. |
file_prefix |
The prefix before the code for the data logger, defaults to "no." |
A dataframe.
hobo_data_path <- system.file("extdata", package = "aelab") df <- combine_hobo(hobo_data_path, file_prefix = "ex_ho")hobo_data_path <- system.file("extdata", package = "aelab") df <- combine_hobo(hobo_data_path, file_prefix = "ex_ho")
Tidy multiple daily weather data downloaded from weather station in Taiwan.
combine_weather(file_path, start_date, end_date, zone)combine_weather(file_path, start_date, end_date, zone)
file_path |
Directory of folder containing the files (including the character in the file name that precedes the date). |
start_date |
Date of the daily weather data in yyyy-mm-dd format. |
end_date |
Date of the daily weather data in yyyy-mm-dd format. |
zone |
Code for the region of the weather station. |
A dataframe.
weather_data_path <- system.file("extdata", package = "aelab") modified_data_path <- paste0(weather_data_path, "/ex_") df <- combine_weather(modified_data_path, start_date = "2024-01-01", end_date = "2024-01-02", "site_A" )weather_data_path <- system.file("extdata", package = "aelab") modified_data_path <- paste0(weather_data_path, "/ex_") df <- combine_weather(modified_data_path, start_date = "2024-01-01", end_date = "2024-01-02", "site_A" )
Batch-import and combine multiple CWA "MH" (Multifield Hourly)
weather reports. Hourly CODiS downloads are capped at a one-month range, so
a date span typically yields several LotsDataReports.txt files; this
reads them all and row-binds the result.
combine_weather_mh( file_paths, station_id = NULL, expand_30min = TRUE, tz = "Asia/Taipei" )combine_weather_mh( file_paths, station_id = NULL, expand_30min = TRUE, tz = "Asia/Taipei" )
file_paths |
Either a character vector of MH file paths, or a single
directory path (all |
station_id |
Optional character vector of station codes to keep. Passed
to |
expand_30min |
Passed to |
tz |
Time zone. Default |
A combined data frame, de-duplicated and ordered by station and time.
## Not run: df <- combine_weather_mh("data/raw/sw_mono_do/weather_mh", station_id = c("C0R660", "C0R850") ) ## End(Not run)## Not run: df <- combine_weather_mh("data/raw/sw_mono_do/weather_mh", station_id = c("C0R660", "C0R850") ) ## End(Not run)
Batch-import monthly weather CSV files from a Taiwan Central Weather Administration station for a consecutive range of months.
combine_weather_month(file_path, start_month, end_month, year = 2024, zone)combine_weather_month(file_path, start_month, end_month, year = 2024, zone)
file_path |
Path prefix (directory + filename prefix before the date
portion, e.g. |
start_month |
First month to import (1–9; two-digit months not yet supported). |
end_month |
Last month to import. |
year |
Four-digit year. Default 2024. |
zone |
Character label for the weather station / region. |
File names are expected to follow the pattern
<file_path><year>-0<month>.csv (e.g. 2024-01.csv).
A combined data frame produced by process_weather_month.
## Not run: df <- combine_weather_month("data/weather/", start_month = 1, end_month = 6, year = 2024, zone = "site_A" ) ## End(Not run)## Not run: df <- combine_weather_month("data/weather/", start_month = 1, end_month = 6, year = 2024, zone = "site_A" ) ## End(Not run)
Convert a greenhouse gas (GHG) flux value (or a character string
containing one or more numeric values, e.g. "0.002 +/- 0.003")
to micrograms per square meter per hour.
convert_ghg_unit( input, ghg, mass = "µg", area = "m2", time = "hr", digits = 2, ratio = FALSE )convert_ghg_unit( input, ghg, mass = "µg", area = "m2", time = "hr", digits = 2, ratio = FALSE )
input |
A single numeric value or a character string containing one or more numbers. |
ghg |
The molecular formula of the greenhouse gas: |
mass |
Mass unit of the input flux. One of |
area |
Area unit of the input flux. One of |
time |
Time unit of the input flux. One of |
digits |
Number of decimal places to round to. Default 2. |
ratio |
Logical. If |
Numeric values embedded in a string (e.g. mean +/- SD notation) are each converted individually and the surrounding text is preserved. Commas are treated as decimal separators.
A named list with value (converted string) and unit,
or "EMPTY" for missing/non-numeric input.
convert_ghg_unit(97, ghg = "ch4", mass = "mg", area = "m2", time = "hr")convert_ghg_unit(97, ghg = "ch4", mass = "mg", area = "m2", time = "hr")
Convert the time of the LI-COR Trace Gas Analyzer to match the time in real life.
convert_time(data, day = 0, hr = 0, min = 0, sec = 0)convert_time(data, day = 0, hr = 0, min = 0, sec = 0)
data |
Data from the LI-COR Trace Gas Analyzer that had been processed by tidy_licor(). |
day |
Day(s) to add or subtract. |
hr |
Hour(s) to add or subtract. |
min |
Minute(s) to add or subtract. |
sec |
Second(s) to add or subtract. |
The input data with a new column in POSIXct format converted based on the input value.
data(n2o) converted_n2o <- convert_time(n2o, min = -10, sec = 5)data(n2o) converted_n2o <- convert_time(n2o, min = -10, sec = 5)
Compute grouped mean ± SD and min–max summary statistics for one or more numeric variables.
descriptive_statistic(data, vars, groups, digits = 2)descriptive_statistic(data, vars, groups, digits = 2)
data |
A data frame. |
vars |
<['tidy-select'][dplyr::dplyr_tidy_select]> Columns to summarise. |
groups |
<['tidy-select'][dplyr::dplyr_tidy_select]> Grouping columns. |
digits |
Number of decimal places to round to. Default is 2. |
A tibble with one row per group and two summary columns per variable ('<var>_mean_sd' and '<var>_min_max').
df <- data.frame(group = c("A","A","B","B"), value = c(1.1, 2.3, 3.5, 4.7)) descriptive_statistic(df, vars = value, groups = group)df <- data.frame(group = c("A","A","B","B"), value = c(1.1, 2.3, 3.5, 4.7)) descriptive_statistic(df, vars = value, groups = group)
Apply a reverse square-root or reverse log transformation to a numeric column and append the result as a new column.
df_trans(df, variable_name, transformation)df_trans(df, variable_name, transformation)
df |
A data frame. |
variable_name |
Name of the column to transform (string). |
transformation |
Transformation type: |
The input data frame with an additional column named
<variable_name>_sqrt or <variable_name>_log.
df <- data.frame(val = c(1, 4, 9, 16)) df_trans(df, "val", "sqrt")df <- data.frame(val = c(1, 4, 9, 16)) df_trans(df, "val", "sqrt")
Keep only the calendar days that have (near-)complete diel
coverage, per logger. A day qualifies when the number of distinct
time-of-day slots present reaches min_slots. This replaces the manual
inspection of which deployment days run a full 00:00-24:00, so daily DO
metabolism is computed only from complete days.
filter_complete_days( df, group_cols = "no_hobo", time_col = "date_time", min_slots = 44L, tz = "Asia/Taipei", summary_only = FALSE )filter_complete_days( df, group_cols = "no_hobo", time_col = "date_time", min_slots = 44L, tz = "Asia/Taipei", summary_only = FALSE )
df |
A data frame of HOBO readings, e.g. from |
group_cols |
Character vector of grouping columns identifying one logger
deployment, default |
time_col |
Name of the POSIXct time column. Default |
min_slots |
Minimum number of distinct 30-minute slots a day must have to
count as complete. Default |
tz |
Time zone used to derive the calendar date. Default
|
summary_only |
If |
Either the input rows restricted to complete days (default), or a
coverage summary when summary_only = TRUE.
## Not run: hobo <- combine_hobo("data/raw/sw_mono_do/csv/FL_T1") filter_complete_days(hobo, summary_only = TRUE) # inspect coverage clean <- filter_complete_days(hobo) # keep complete days ## End(Not run)## Not run: hobo <- combine_hobo("data/raw/sw_mono_do/csv/FL_T1") filter_complete_days(hobo, summary_only = TRUE) # inspect coverage clean <- filter_complete_days(hobo) # keep complete days ## End(Not run)
Identify outliers in a numeric column using the IQR method
(values outside 1.5 IQR from Q1/Q3).
find_outlier(df, var, other_var = NULL)find_outlier(df, var, other_var = NULL)
df |
A data frame. |
var |
Name of the column to check for outliers (string). |
other_var |
Character vector of additional column names to return
alongside the outlier values, or |
A tibble with columns row_index, outlier_value, and any
requested other_var columns.
df <- data.frame(val = c(1, 2, 2, 3, 100), id = 1:5) find_outlier(df, "val", "id")df <- data.frame(val = c(1, 2, 2, 3, 100), id = 1:5) find_outlier(df, "val", "id")
Processed data from Onset HOBO Dissolved Oxygen Data Logger. A dataset containing 336 dissolved oxygen concentrations changed over time.
A data.frame with 336 rows and 13 variables:
date_time: Date and time in POSIXct format.
pressure_hpa: Atmospheric pressure (hpa).
wind_ms: Wind speed (m/s).
do: Dissolved oxygen concentrations (mg/L)
temp: Water temperature (Celsius)
depth: Water depth (m).
salinity: Salinity (ppt).
start_date_time: Start date and time of the deployment.
end_date_time: End date and time of the deployment.
sunrise: Sunrise time during that day.
sunset: Sunset time during that day.
no_hobo: Name for the data logger .
site: Name for the site.
own data.
Perform Kruskal-Wallis test followed by Dunn post-hoc test (Bonferroni correction) with compact letter display.
ks_test(df, variable_name, group)ks_test(df, variable_name, group)
df |
A data frame. |
variable_name |
Name of the response variable column (string). |
group |
Name of the grouping column (string). |
A named list with elements ks_results, dunn_results,
mean_summary, and compact_letters.
df <- data.frame( grp = rep(c("A","B","C"), each = 5), val = c(1,2,1,2,1, 3,4,3,4,3, 5,6,5,6,5) ) ks_test(df, "val", "grp")df <- data.frame( grp = rep(c("A","B","C"), each = 5), val = c(1,2,1,2,1, 3,4,3,4,3, 5,6,5,6,5) ) ks_test(df, "val", "grp")
Build a reference_df for calculate_regression from a
compact inline string of field notes. Each line is one closure measurement with
whitespace-separated columns (any number of spaces or tabs):
time site analyzer [dur=N] [off=N].
The first three columns are positional (required). Optional per-row window overrides
use key=value syntax in any order: dur= sets duration_minutes,
off= sets start_offset_s. Rows without an override simply omit the token —
no NA placeholder needed.
make_reference(date, measurements, output = NULL, tz = "Asia/Taipei")make_reference(date, measurements, output = NULL, tz = "Asia/Taipei")
date |
Sampling date applied to all rows. A single |
measurements |
A multiline character string, one row per closure:
|
output |
Optional path for the output |
tz |
Timezone for |
A tibble with columns site, date_time, analyzer, and
optionally duration_minutes and/or start_offset_s (only present when at
least one row specifies the override). When output is given, the tibble is written
to Excel and returned invisibly.
make_reference( date = "2026-05-22", measurements = " 10:30:15 JN1-1 1337_1074 10:42:00 JN1-2 1337_1074 10:54:30 JN2-1 1337_1074 " ) # per-row window overrides — no NA placeholders needed make_reference( date = "2026-05-22", measurements = " 10:30:15 TT3-1 1450 dur=3.5 10:42:00 TT3-2 1450 dur=2.0 10:54:30 TT3-3 1450 off=45 11:10:00 TT1-1 1450 " )make_reference( date = "2026-05-22", measurements = " 10:30:15 JN1-1 1337_1074 10:42:00 JN1-2 1337_1074 10:54:30 JN2-1 1337_1074 " ) # per-row window overrides — no NA placeholders needed make_reference( date = "2026-05-22", measurements = " 10:30:15 TT3-1 1450 dur=3.5 10:42:00 TT3-2 1450 dur=2.0 10:54:30 TT3-3 1450 off=45 11:10:00 TT1-1 1450 " )
Processed data from N2O LI-COR Trace Gas Analyzer. A dataset containing 567 N2O concentrations changed over time.
A data.frame with 567 rows and 4 variables:
DATE: Date in character format.
TIME: Time in character format.
N2O: Concentrations of nitrous oxide (N2O), in ppb.
date_time: Date and time in POSIXct format.
own data.
Test normality of ANOVA model residuals using Shapiro-Wilk on raw, square-root, and log10 transforms (one-way or two-way).
normality_test_aov(df, variable_name, group_1, group_2 = NULL)normality_test_aov(df, variable_name, group_1, group_2 = NULL)
df |
A data frame. |
variable_name |
Name of the response variable column (string). |
group_1 |
Name of the first grouping column (string). |
group_2 |
Name of the second grouping column (string), or |
A tibble with Shapiro-Wilk p-values for each transformation.
df <- data.frame( grp = c("A","A","B","B"), val = c(1.1, 1.4, 3.2, 3.8) ) normality_test_aov(df, "val", "grp")df <- data.frame( grp = c("A","A","B","B"), val = c(1.1, 1.4, 3.2, 3.8) ) normality_test_aov(df, "val", "grp")
Test normality of a variable within two groups using Shapiro-Wilk on raw, square-root, and log10 transforms (for t-test context).
normality_test_t(df, variable_name, group, group_1, group_2)normality_test_t(df, variable_name, group, group_1, group_2)
df |
A data frame. |
variable_name |
Name of the numeric variable column (string). |
group |
<['data-masking'][dplyr::dplyr_data_masking]> The grouping column. |
group_1 |
Value identifying the first group. |
group_2 |
Value identifying the second group. |
A tibble with Shapiro-Wilk p-values for each group × transformation combination.
df <- data.frame( grp = c("A","A","A","B","B","B"), val = c(1.1, 2.0, 1.5, 4.2, 3.8, 4.5) ) normality_test_t(df, "val", grp, "A", "B")df <- data.frame( grp = c("A","A","A","B","B","B"), val = c(1.1, 2.0, 1.5, 4.2, 3.8, 4.5) ) normality_test_t(df, "val", grp, "A", "B")
Create a bar plot using the aelab theme.
plot_bar( df, x, y, z = NULL, base_size = 25, line_width = 1, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw", position = "dodge", stat = "identity" )plot_bar( df, x, y, z = NULL, base_size = 25, line_width = 1, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw", position = "dodge", stat = "identity" )
df |
A data frame. |
x |
<['data-masking'][ggplot2::aes]> Column mapped to the x-axis. |
y |
<['data-masking'][ggplot2::aes]> Column mapped to the y-axis. |
z |
<['data-masking'][ggplot2::aes]> Optional column mapped to fill colour. |
base_size |
Base font size. Default 25. |
line_width |
Bar outline width. Default 1. |
text_color |
Text colour. Default |
facet |
Logical; add facet grid? Default |
facet_x |
Column name (string) for the horizontal facet dimension. |
facet_y |
Column name (string) for the vertical facet dimension. |
style |
Theme style. Default |
position |
Bar position: |
stat |
Stat type: |
A ggplot object.
## Not run: df <- data.frame(x = c("A","B","A","B"), g = c("X","X","Y","Y"), y = c(1,2,3,4)) plot_bar(df, x, y, g) ## End(Not run)## Not run: df <- data.frame(x = c("A","B","A","B"), g = c("X","X","Y","Y"), y = c(1,2,3,4)) plot_bar(df, x, y, g) ## End(Not run)
Create a box plot with mean overlay using the aelab theme.
plot_box( df, x, y, z = NULL, base_size = 25, line_width = 0.5, outlier_size = 1.5, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw" )plot_box( df, x, y, z = NULL, base_size = 25, line_width = 0.5, outlier_size = 1.5, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw" )
df |
A data frame. |
x |
<['data-masking'][ggplot2::aes]> Column mapped to the x-axis. |
y |
<['data-masking'][ggplot2::aes]> Column mapped to the y-axis. |
z |
<['data-masking'][ggplot2::aes]> Optional column mapped to fill colour. |
base_size |
Base font size. Default 25. |
line_width |
Box outline width. Default 0.5. |
outlier_size |
Outlier point size. Default 1.5. |
text_color |
Text colour. Default |
facet |
Logical; add facet grid? Default |
facet_x |
Column name (string) for the horizontal facet dimension. |
facet_y |
Column name (string) for the vertical facet dimension. |
style |
Theme style. Default |
A ggplot object.
## Not run: df <- data.frame(x = rep(c("A","B"), each = 5), y = c(1:5, 3:7)) plot_box(df, x, y) ## End(Not run)## Not run: df <- data.frame(x = rep(c("A","B"), each = 5), y = c(1:5, 3:7)) plot_box(df, x, y) ## End(Not run)
Plot GHG concentration over time for each measurement window, overlaid with the
fitted regression line and colored by quality flag. Use this for visual QC of
calculate_regression output.
plot_ghg_flux( data, regression_result, ghg, fit_type = "linear", t_zero = 0, flag_colors = c(ok = "#2ca25f", discard = "#fc8d59", zero = "#74c7ef", no_data = "#aaaaaa") )plot_ghg_flux( data, regression_result, ghg, fit_type = "linear", t_zero = 0, flag_colors = c(ok = "#2ca25f", discard = "#fc8d59", zero = "#74c7ef", no_data = "#aaaaaa") )
data |
Analyzer data processed by |
regression_result |
Output tibble from |
ghg |
Column name in |
fit_type |
Model used to draw the fitted line. Should match the |
t_zero |
Reference time in seconds for |
flag_colors |
Named character vector mapping flag values to colors. |
A ggplot object with one facet per site. Each panel shows raw concentration
points and the fitted line, with strip text showing slope, R², and flag.
data(n2o) n2o_converted <- convert_time(n2o) ref <- data.frame( date_time = n2o_converted$real_datetime[1], site = "S1", analyzer = "1074" ) result <- calculate_regression(n2o_converted, ref, "N2O", reference_time = "date_time", site = "site", analyzer_code = "1074") plot_ghg_flux(n2o_converted, result, "N2O")data(n2o) n2o_converted <- convert_time(n2o) ref <- data.frame( date_time = n2o_converted$real_datetime[1], site = "S1", analyzer = "1074" ) result <- calculate_regression(n2o_converted, ref, "N2O", reference_time = "date_time", site = "site", analyzer_code = "1074") plot_ghg_flux(n2o_converted, result, "N2O")
Plot the dissolved oxygen concentration over time series grouped by different data loggers to observe the variations.
plot_hobo(df)plot_hobo(df)
df |
Dataframe produced by process_hobo() function. |
A plot generated by ggplot2.
data(hobo) plot_hobo(hobo)data(hobo) plot_hobo(hobo)
Create a line plot using the aelab theme.
plot_line( df, x, y, z = NULL, base_size = 25, line_width = 3, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw" )plot_line( df, x, y, z = NULL, base_size = 25, line_width = 3, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw" )
df |
A data frame. |
x |
<['data-masking'][ggplot2::aes]> Column mapped to the x-axis. |
y |
<['data-masking'][ggplot2::aes]> Column mapped to the y-axis. |
z |
<['data-masking'][ggplot2::aes]> Optional column mapped to colour and group. |
base_size |
Base font size. Default 25. |
line_width |
Line width. Default 3. |
text_color |
Text colour. Default |
facet |
Logical; add facet grid? Default |
facet_x |
Column name (string) for the horizontal facet dimension. |
facet_y |
Column name (string) for the vertical facet dimension. |
style |
Theme style. Default |
A ggplot object.
## Not run: df <- data.frame(x = 1:6, y = c(1,3,2,5,4,6), g = rep(c("A","B"), 3)) plot_line(df, x, y, g) ## End(Not run)## Not run: df <- data.frame(x = 1:6, y = c(1,3,2,5,4,6), g = rep(c("A","B"), 3)) plot_line(df, x, y, g) ## End(Not run)
Plot sampling sites on a map of Taiwan with a north arrow and scale bar.
plot_map_taiwan( long, lat, names, color = "darkgrey", textsize = 5, basesize = 16, shape_type = 22 )plot_map_taiwan( long, lat, names, color = "darkgrey", textsize = 5, basesize = 16, shape_type = 22 )
long |
Numeric vector of longitudes. |
lat |
Numeric vector of latitudes. |
names |
Character vector of site labels (same length as |
color |
Fill colour for site markers. Default |
textsize |
Size for annotation and point labels. Default 5. |
basesize |
Base font size for the map theme. Default 16. |
shape_type |
ggplot2 point shape number. Default 22 (filled square). |
A ggplot object (also printed to the active device).
## Not run: plot_map_taiwan( long = c(120.2, 121.5), lat = c(22.9, 24.1), names = c("Site A", "Site B") ) ## End(Not run)## Not run: plot_map_taiwan( long = c(120.2, 121.5), lat = c(22.9, 24.1), names = c("Site A", "Site B") ) ## End(Not run)
Create a scatter plot using the aelab theme.
plot_point( df, x, y, z = NULL, base_size = 25, point_size = 3, stroke_size = 1, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw" )plot_point( df, x, y, z = NULL, base_size = 25, point_size = 3, stroke_size = 1, text_color = "black", facet = FALSE, facet_x = NULL, facet_y = NULL, style = "bw" )
df |
A data frame. |
x |
<['data-masking'][ggplot2::aes]> Column mapped to the x-axis. |
y |
<['data-masking'][ggplot2::aes]> Column mapped to the y-axis. |
z |
<['data-masking'][ggplot2::aes]> Optional column mapped to fill colour. |
base_size |
Base font size passed to the ggplot2 theme. Default 25. |
point_size |
Point size. Default 3. |
stroke_size |
Point stroke width. Default 1. |
text_color |
Text colour. Default |
facet |
Logical; add facet grid? Default |
facet_x |
Column name (string) for the horizontal facet dimension. |
facet_y |
Column name (string) for the vertical facet dimension. |
style |
Theme style. One of |
A ggplot object.
## Not run: df <- data.frame(x = 1:5, y = c(2,4,1,5,3), g = c("A","A","B","B","A")) plot_point(df, x, y, g) ## End(Not run)## Not run: df <- data.frame(x = 1:5, y = c(2,4,1,5,3), g = c("A","A","B","B","A")) plot_point(df, x, y, g) ## End(Not run)
Tidy data exported from a HOBO data logger. Supports the HOBO
U26 Dissolved Oxygen Data Logger (type = "do") and the HOBO Pro v2
Temperature/RH Logger (type = "temp"). Handles both Chinese
(上午/下午) and English (AM/PM or 24-hour) HOBOware locale exports, with
2- or 4-digit years. For type = "do", readings with a negative
(sensor-error) dissolved-oxygen or temperature value are dropped before
aggregation.
process_hobo(file_path, no_hobo, type = c("do", "temp"))process_hobo(file_path, no_hobo, type = c("do", "temp"))
file_path |
Path to the CSV file exported from HOBOware. |
no_hobo |
The code for the data logger. |
type |
Logger type: |
A dataframe. For type = "do": columns date_time,
do, temp, no_hobo, aggregated to 30-minute intervals.
For type = "temp": columns date_time, air_temp,
rh, no_hobo, one row per logger reading.
hobo_data_path <- system.file("extdata", "ex_hobo.csv", package = "aelab") df <- process_hobo(hobo_data_path, "code_for_logger")hobo_data_path <- system.file("extdata", "ex_hobo.csv", package = "aelab") df <- process_hobo(hobo_data_path, "code_for_logger")
Import and process the necessary information, including the sunrise and sunset times of the day, the date and time range of the deployment, and the code for the data logger.
process_info(file_path)process_info(file_path)
file_path |
Directory of file. |
A dataframe.
info_data_path <- system.file("extdata", "info.xlsx", package = "aelab") df <- process_info(info_data_path)info_data_path <- system.file("extdata", "info.xlsx", package = "aelab") df <- process_info(info_data_path)
Tidy the daily weather data downloaded from weather station in Taiwan.
process_weather(file_path, date, zone)process_weather(file_path, date, zone)
file_path |
Directory of file. |
date |
Date of the daily weather data in yyyy-mm-dd format. |
zone |
Code for the region of the weather station. |
A dataframe.
weather_data_path <- system.file("extdata", "ex_weather.csv", package = "aelab") df <- process_weather(weather_data_path, "2024-01-01", "site_A")weather_data_path <- system.file("extdata", "ex_weather.csv", package = "aelab") df <- process_weather(weather_data_path, "2024-01-01", "site_A")
Import and tidy an hourly weather report in the CWA "MH"
(Multifield Hourly) text format, as bulk-downloaded from the CODiS portal
(LotsDataReports.txt). The file is fixed-/space-delimited with a
'#' column-title row listing element codes (e.g. PS01,
WD01) and '*' comment rows. One file may contain multiple
stations and a full month of hourly data.
process_weather_mh( file_path, station_id = NULL, expand_30min = TRUE, tz = "Asia/Taipei" )process_weather_mh( file_path, station_id = NULL, expand_30min = TRUE, tz = "Asia/Taipei" )
file_path |
Path to the MH-format |
station_id |
Optional character vector of station codes (e.g.
|
expand_30min |
If |
tz |
Time zone of the observations. Default |
Element codes are renamed as: PS01 -> pressure_hpa,
WD01 -> wind_ms, TX01 -> air_temp,
RH01 -> humidity, WD02 -> wind_dir,
PP01 -> rain_mm. Only the codes present in the file are
returned. CWA missing-value codes (large negatives, e.g. -9991) are set to
NA. The timestamp yyyymmddhh uses hours 01-24, where hour 24
denotes 00:00 of the following day.
A data frame with columns stno, date_time, and the
renamed weather elements present in the file.
## Not run: df <- process_weather_mh("LotsDataReports.txt", station_id = "C0R660") ## End(Not run)## Not run: df <- process_weather_mh("LotsDataReports.txt", station_id = "C0R660") ## End(Not run)
Import and tidy a monthly weather CSV file downloaded from a Taiwan Central Weather Administration station. Column selection is done via regex so minor header changes are handled gracefully.
process_weather_month(file_path, month, year = 2024, zone)process_weather_month(file_path, month, year = 2024, zone)
file_path |
Path to the monthly CSV file. |
month |
Month number (1–12) covered by the file. |
year |
Four-digit year. Default 2024. |
zone |
Character label for the weather station / region. |
A data frame with columns day, pressure_hpa,
temp, humidity_percent, wind_ms, rain_mm,
daylight_hr, radiation, date, and zone.
## Not run: df <- process_weather_month("path/to/2024-01.csv", month = 1, year = 2024, zone = "site_A" ) ## End(Not run)## Not run: df <- process_weather_month("path/to/2024-01.csv", month = 1, year = 2024, zone = "site_A" ) ## End(Not run)
Continuous ggplot2 colour scale using an aelab palette.
scale_colour_aelab_c(name, direction = 1) scale_color_aelab_c(name, direction = 1)scale_colour_aelab_c(name, direction = 1) scale_color_aelab_c(name, direction = 1)
name |
Palette name passed to |
direction |
|
A ggplot2 scale.
## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg, colour = mpg)) + ggplot2::geom_point() + scale_colour_aelab_c("rainbow") ## End(Not run)## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg, colour = mpg)) + ggplot2::geom_point() + scale_colour_aelab_c("rainbow") ## End(Not run)
Discrete ggplot2 colour scale using an aelab palette.
scale_colour_aelab_d(name, direction = 1) scale_color_aelab_d(name, direction = 1)scale_colour_aelab_d(name, direction = 1) scale_color_aelab_d(name, direction = 1)
name |
Palette name passed to |
direction |
|
A ggplot2 scale.
## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg, colour = factor(cyl))) + ggplot2::geom_point() + scale_colour_aelab_d("rainbow") ## End(Not run)## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg, colour = factor(cyl))) + ggplot2::geom_point() + scale_colour_aelab_d("rainbow") ## End(Not run)
Continuous ggplot2 fill scale using an aelab palette.
scale_fill_aelab_c(name, direction = 1)scale_fill_aelab_c(name, direction = 1)
name |
Palette name passed to |
direction |
|
A ggplot2 scale.
## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(factor(cyl), fill = mpg)) + ggplot2::geom_col() + scale_fill_aelab_c("ghg") ## End(Not run)## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(factor(cyl), fill = mpg)) + ggplot2::geom_col() + scale_fill_aelab_c("ghg") ## End(Not run)
Discrete ggplot2 fill scale using an aelab palette.
scale_fill_aelab_d(name, direction = 1)scale_fill_aelab_d(name, direction = 1)
name |
Palette name passed to |
direction |
|
A ggplot2 scale.
## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(factor(cyl), fill = factor(cyl))) + ggplot2::geom_bar() + scale_fill_aelab_d("control") ## End(Not run)## Not run: ggplot2::ggplot(mtcars, ggplot2::aes(factor(cyl), fill = factor(cyl))) + ggplot2::geom_bar() + scale_fill_aelab_d("control") ## End(Not run)
Run ks_test separately for each level of a
faceting variable and return compact letter display (CLD) annotations
ready for geom_text. Only facets where the Kruskal-Wallis p-value
is below alpha are included in the output.
sig_labels( stat_data, variable, group, by = "year", plot_data = NULL, alpha = 0.05 )sig_labels( stat_data, variable, group, by = "year", plot_data = NULL, alpha = 0.05 )
stat_data |
Data frame of raw observations used for the statistical test. Should already be filtered to the relevant subset (e.g. a single water type). |
variable |
Name of the response variable column (string). |
group |
Name of the grouping column (string). |
by |
Name of the faceting column whose levels are iterated over
(string, default |
plot_data |
Data frame used to compute label y-positions via
|
alpha |
Significance threshold; facets with KW p-value >=
|
A data frame with columns <group>, Letter,
MonoLetter, y_pos, and <by>. Returns an empty data
frame when no facet reaches significance.
set.seed(1) df <- data.frame( year = rep(c("2023", "2024"), each = 20), grp = rep(c("A","B","C","D"), 10), val = c(rnorm(20, mean = rep(c(1,3,2,6), 5)), rnorm(20)) ) sig_labels(df, "val", "grp", by = "year")set.seed(1) df <- data.frame( year = rep(c("2023", "2024"), each = 20), grp = rep(c("A","B","C","D"), 10), val = c(rnorm(20, mean = rep(c(1,3,2,6), 5)), rnorm(20)) ) sig_labels(df, "val", "grp", by = "year")
Tidy the data downloaded from GHG Analyzer, with optional timestamp correction.
tidy_ghg_analyzer( file_path, gas, analyzer = "licor", day = 0, hr = 0, min = 0, sec = 0 )tidy_ghg_analyzer( file_path, gas, analyzer = "licor", day = 0, hr = 0, min = 0, sec = 0 )
file_path |
Directory of file. |
gas |
Choose between CO2/CH4 or N2O LI-COR Trace Gas Analyzer, which is "ch4" and "n2o", respectively. |
analyzer |
The brand of the analyzer which the data was downloaded from. |
day |
Day offset to correct the analyzer timestamp. |
hr |
Hour offset to correct the analyzer timestamp. |
min |
Minute offset to correct the analyzer timestamp. |
sec |
Second offset to correct the analyzer timestamp. |
Return the loaded XLSX file after tidying. If any time offset is non-zero, a
real_datetime column is added via convert_time.
ghg_data_path <- system.file("extdata", "ch4.xlsx", package = "aelab") tidy_ghg_analyzer(ghg_data_path, "ch4") tidy_ghg_analyzer(ghg_data_path, "ch4", sec = -30)ghg_data_path <- system.file("extdata", "ch4.xlsx", package = "aelab") tidy_ghg_analyzer(ghg_data_path, "ch4") tidy_ghg_analyzer(ghg_data_path, "ch4", sec = -30)