Assessing If Functional Features are Predictive

Part 1

Author

Santiago Rodriguez

Introduction

This analysis is the first step in a longer journey. As a first step, the purpose of the analysis is to explore how useful features created using functional data analysis are a) in general and b) compared to other feature extraction/creation tools. The features will be tested using a supervised machine learning task where the alternative hypothesis is that the functional features are useful. Useful is defined as at least one of the feature coefficients is not equal to zero (i.e, \(\beta_j \ne 0\)) and the features are useful if the fitted model performs well on unseen data.

The aspiration of this work is to create something new and to expand the tools available to analysts. Functional data analysis is a powerful tool in the analytic toolbox, but functional data analysis is an esoteric branch of statistics. Thus, not many people know of what functional data are nor how to use functional methods. The ideas of this research are related to the Catch22 libraries in R, Python, and Julia for time series (Henderson 2022). The works are similar in that data, likely time series data enters the model and the functions output features. These features can be used in a variety of tasks, such as classification or regression tasks.

This exploratory paper follows a usual pathway in data analysis. First, the data is presented and described.A methods section proceeds that describes the framework of the analysis. Results are then presented. Followed by a discussion section.

Data

The data for this project pertains to my household’s energy consumption. The data is obtained from a state-run website where residents can download their meter readings. For this project the data consists of 15-minute intervals. The fda library in R (J. Ramsay 2024) requires a matrix as an input so the data must be transformed from a dataframe to a matrix and converted from long to wide where the rows represent the 96 15-minute intervals in a day and the 1,343 columns represent the number of days in the data.

Show Code

# using the 15 min aggregated data
# the below creates a matrix
# rows are 15 min time intervals
# cols are dates
# matrix is used b/c the fda requires a matrix

mat_15min <- matrix(
  data = df_energy_raw_clean[["kwh"]],
  ncol = length(unique(df_energy_raw_clean[["date"]])),
  byrow = FALSE,
  dimnames = list(
    time = unique(df_energy_raw_clean[["time"]]),
    date = as.character(unique(df_energy_raw_clean[["date"]]))
  )
)

Methods

This section will define the features created using functional data analysis methods and provide details about their creation. The functional features are measurements on data derived using functional data analysis methods. In particular, smooth, differentiable curves are fitted to the data. The derivative of the curves are then used to extract or create features,such as the mean velocity or the maximum acceleration. The curves are fitted using a Fourier series expansion with 96 basis functions. Tentative readers will notice that there are as many basis functions as there are 15-minute intervals in a day. The result of this is a fully parameterized model. Regularization (J. O. Ramsay and Silverman 2005) was used to fit the curves to minimizing the impact of the full parameterization. The Fourier series was used because it is well suited for periodic data and because it is easy to obtain derivatives of the series. The data is considered periodic because energy consumption is continuous and the start of one day begins where the previous day ends.

The functional features will be tested against a well-known feature-creation library called Catch22. The purpose of the Catch22 features are to assess how the functional features compare to existing bodies of work in this space.

Both the functional features and the Catch22 features will be applied to a multi-class classification task where the response variable or supervisor is the day of the week (Sunday - Saturday). The model used for this task is the LASSO. It is expected that at least one of the feature coefficients in both sets of features will be non-zero. The number of non-zero coefficients will be compared as well as the test error performance metrics, such as Kappa.

Catch22 Features

In this step the Catch22 features will be calculated for each day. I don’t think calculating the features before splitting the data into train/test sets matters because each day is treated independently.

Show Code

# for each month, create a data frame (n months long by p cols) using the RCatch22 lib
# MARGIN specifies how to apply a function 1 = rows, 2 = cols
# FUN specifies what function to apply
# FUN calculates the catch22 features
# the output is a list of data.frames with features as rows (col name names)
# and the values for these features in a col named values
# simplify tells the apply not to alter the output
df_rcatch22_features <- apply(
  X = mat_15min,
  MARGIN = 2,
  FUN = \(x) catch22_all(data = x, catch24 = FALSE),
  simplify = FALSE
)

# add date as id col for downstream pivot
df_rcatch22_features <- mapply(
  \(x, y) {
    x[["date"]] <- as.Date(y)
    return(x)
  },
  x = df_rcatch22_features,
  y = colnames(mat_15min),
  SIMPLIFY = FALSE
)

# row combine the results
df_rcatch22_features <- do.call(rbind, df_rcatch22_features)

# reset row names
row.names(df_rcatch22_features) <- NULL

# transpose the long dataframe to wide format
# where features are cols
# the names `names` `values` are hard-coded in pivot_wider
# if there were a package update and those names changed,
# the below code would fail
df_rcatch22_features <- pivot_wider(df_rcatch22_features, names_from = names, values_from = values)

# rm unused cols
# df_rcatch22_features <- subset(df_rcatch22_features, select = -date)

# drop missing/NAN vlues
if (nrow(subset(df_rcatch22_features, subset = is.nan(DN_HistogramMode_5))) > 0) stop("NaN present")

Functional Data Analysis

In this section the necessary components to git smooth curves to the data are created. As mentioned, we’ll use a Fourier series expansion to fit the curves. To expedite the process this code distributes computation across worker nodes (if they’re online).

Show Code

# since using Fourier series, with a harmonic accelerator need to specify T, the number of time intervals in the series
# in this case T = 96 15-min periods
fda_T <- nrow(mat_15min)
# num_ofBasis_functions_15min = num_ofBasis_functions_15min

# the fda library allows to label the data/curves
fdaNames <- list("Time", "Date", "kWh")

# defines the start and end point of the range in fractional hours
# start = 0 b/c the beginning period is 00:00:00
# end = 23.75 b/c end = 24 minus 15 minutes (in fractional hours)
range_eval <- c(0, 23.75)

# defines the entire sequence, from start to end
eval_arg <- seq(
  from = range_eval[[1]],
  to = range_eval[[2]],
  by = 15 / 60
)

# for the Fourier harmonic accelerator
# see ... for an explanation
fourier_omega <- (2 * pi) / fda_T
harmonic_accerlation <- vec2Lfd(
  c(0, fourier_omega**2, 0),
  range_eval
)

# optimization: choose best lambda: smoothing via ldf
# - define search grid
# - exclude 0 otherwise error
# - option 1 is finer grid, takes longer to run
# - option 2 is coarse grid, good enough
# lambda_grid <- c(seq(1e-5, 1, 1e-3), seq(1, 1001, 1))
lambda_grid <- c(seq(1e-5, 1, 1e-2), seq(1, 1001, 10))

Show Code

# create fourier basis functions using fda library
# use fourier basis because fitted curves are continuous and easily differentiable
# also, there is a certain cyclicality to the data, as in the end of one day is the beginning of the next

fourier_basis <- create.fourier.basis(
  rangeval = range_eval,
  nbasis = fda_T
)

Show Code

# check if any server is online
if (!is.null(available_workers)) {
  # then use it/them
  plan(
    strategy = "cluster",
    workers = c(rep("localhost", num_cores), available_workers)
  )
} else {
  # else use this machine only
  plan(strategy = "multisession", workers = num_cores)
}

# this part is fast
# the purpose of this part is to ...
step01_fdPar <- lapply(
  X = lambda_grid,
  FUN = \(x) fdPar(
    fdobj = fourier_basis,
    Lfdobj = harmonic_accerlation,
    lambda = x
  )
)

# WARNING: this part can be slow
# leverage additional computing resources if available/desired
step02_smooth <- future_map(
  .x = step01_fdPar,
  .f = function(x) smooth.basis(eval_arg, mat_15min, x),
  .progress = FALSE
)

# this part is fast
# the purpose of this part is to ...
fda_gcv <- vapply(
  X = step02_smooth,
  FUN = \(x) sum(x[["gcv"]], na.rm = TRUE),
  FUN.VALUE = numeric(1)
)

# from the gcv values above, compute the lambda argmin (min gcv value)
# the lambda argmin is used to find the optimal lambda value
min_lambda <- lambda_grid[which.min(fda_gcv)]

# apply optimal lambda
# the purpose of this step is ...
min_lambda_fda_fdPar <- fdPar(
  fdobj = fourier_basis,
  Lfdobj = harmonic_accerlation,
  lambda = min_lambda
)

# the purpose of this step is ...
fda_fd_smooth <- smooth.basis(
  eval_arg,
  mat_15min,
  min_lambda_fda_fdPar
)

# apply previously defined names to the data
fda_fd_smooth[["fd"]][["fdnames"]] <- fdaNames

# house cleaning: parallelization: end
plan(strategy = "sequential")

In this section the derivatives of the fitted curves are calculated - velocity of consumption and acceleration of consumption.

Show Code

# in this section the derivatives of the fitted curves are estimated
# for avoid repeated code (DRY), have created a list structure to contain
# the derivative fda objects, we'll see if this was a good idea

# define named vector of desired derivatives
fda_derivatives <- c(
  velocity = 1,
  acceleration = 2
)

# list with n elements
# elements defined above in named vector
# length of named vector == number of elements
fda_list_of_derivatives <- lapply(
  X = fda_derivatives,
  FUN = \(d) deriv.fd(fda_fd_smooth[["fd"]], d)
)

Show Code

tmp <- 6
plot(
  x = eval_arg,
  y = mat_15min[, tmp],
  type = "l",
  main = "Position"
)
lines(fda_fd_smooth$fd[tmp])

Show Code

plot(fda_list_of_derivatives[[1]][tmp], main = "Velocity")

[1] "done"

Show Code

plot(fda_list_of_derivatives[[2]][tmp], main = "Acceleration")

[1] "done"

Show Code

rm(tmp)

In this section we begin the process of creating the functional features.

Show Code

# define prediction grid
fda_derivative_prediction_grid <- seq(
  from = range_eval[[1]],
  to = range_eval[[2]],
  by = 1 / 60
)

# first, create list of predicted D values
# where D = derivative of order d
fda_list_of_predicted_derivatives <- lapply(
  X = fda_list_of_derivatives,
  FUN = \(x) predict(x, fda_derivative_prediction_grid)
)

# for area under the curve
auc_sequence <- fda_list_of_derivatives[["velocity"]]
auc_sequence <- auc_sequence[["coefs"]]
auc_sequence <- ncol(auc_sequence)
auc_sequence <- seq_len(auc_sequence)

Next we calculate the functional features for velocity and acceleration. Some of the features calculated include: min/max velocity/acceleration, the number of critical points, min/max number of positive/negative runs.

Show Code

# for e/ d in Derivatives defined above, calculate:
# - min, max
# - mean, median, sd, mad velocity
# - number of inflection of points

# calculate main features
fda_features_velocity <- apply(
  X = fda_list_of_predicted_derivatives[["velocity"]],
  MARGIN = 2,
  FUN = function(x) {
    list(
      argmin = which.min(x),
      argmax = which.max(x),
      min = min(x, na.rm = TRUE),
      max = max(x, na.rm = TRUE),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      mad = mad(x, na.rm = TRUE),
      skewness = skewness(x, na.rm = TRUE),
      kurtosis = kurtosis(x, na.rm = TRUE),
      critical_points = count_inflection_points(x),
      max_positive_run = max_run_lengths(as.numeric(x), TRUE),
      max_negative_run = max_run_lengths(as.numeric(x), FALSE)
    )
  }
)

Show Code

# calculate main features
fda_features_acceleration <- apply(
  X = fda_list_of_predicted_derivatives[["acceleration"]],
  MARGIN = 2,
  FUN = function(x) {
    list(
      argmin = which.min(x),
      argmax = which.max(x),
      min = min(x, na.rm = TRUE),
      max = max(x, na.rm = TRUE),
      mean = mean(x, na.rm = TRUE),
      sd = sd(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      mad = mad(x, na.rm = TRUE),
      skewness = skewness(x, na.rm = TRUE),
      kurtosis = kurtosis(x, na.rm = TRUE),
      inflection_points = count_inflection_points(x),
      max_positive_run = max_run_lengths(as.numeric(x), TRUE),
      max_negative_run = max_run_lengths(as.numeric(x), FALSE)
    )
  }
)

Another functional feature calculated is the area under the velocity and acceleration curves.

Show Code

# calculate area under the curve
auc_velocity <- vapply(
  X = auc_sequence,
  FUN = \(x) area_under_the_curve(
    fda_list_of_derivatives[["velocity"]][x],
    range_eval,
    harmonic_accerlation
  ),
  FUN.VALUE = numeric(1)
)

# calculate area under the curve
auc_acceleration <- vapply(
  X = auc_sequence,
  FUN = \(x) area_under_the_curve(
    fda_list_of_derivatives[["acceleration"]][x],
    range_eval,
    harmonic_accerlation
  ),
  FUN.VALUE = numeric(1)
)

In this section the velocity and acceleration functional features are prepped and column-combined to create a single data frame.

Show Code

# convert to dataframe
# rownames are colnames of input matrix for fda
fda_features_velocity <- map_df(
  .x = fda_features_velocity,
  .f = as.data.frame
)
fda_features_acceleration <- map_df(
  .x = fda_features_acceleration,
  .f = as.data.frame
)

# add new columns
fda_features_velocity[["auc"]] <- auc_velocity
fda_features_acceleration[["auc"]] <- auc_acceleration

# add prefix to col names to identify d
names(fda_features_velocity) <- paste0(
  names(fda_features_velocity),
  sep = "_velocity"
)

names(fda_features_acceleration) <- paste0(
  names(fda_features_acceleration),
  sep = "_acceleration"
)

# combine features
df_functional_features <- cbind(fda_features_velocity, fda_features_acceleration)

Testing Feature Importance

ML classification
LASSO feature selection
Modeling
- model with only Catch22 vs model with only functional features
- model with all features, see which are selected

Train/Test Split

Show Code

# create a vector for Y
Y <- subset(df_energy_raw_clean, select = c(date, day_name))
Y <- unique(Y)
Y <- subset(Y, select = -date)

# update data w/ Y
df_rcatch22_features <- cbind(Y, df_rcatch22_features)
df_functional_features <- cbind(Y, df_functional_features)

# initiate split the data
set.seed(123)
catch22_split <- initial_split(df_rcatch22_features)
functional_features_split <- initial_split(df_functional_features)

# split the data
df_catch22_train <- training(catch22_split)
df_catch22_test <- testing(catch22_split)
df_functional_features_train <- training(functional_features_split)
df_functional_features_test <- testing(functional_features_split)

Modeling

The model chosen for the task is the LASSO with auto cross-validation.

Show Code

# fit the model
model_catch22 <- cv.glmnet(
  x = as.matrix(subset(df_catch22_train, select = -day_name)),
  y = df_catch22_train[["day_name"]],
  family = "multinomial",
  alpha = 1,
  nfolds = 5,
  standardize = TRUE
)

model_functional_features <- cv.glmnet(
  x = as.matrix(subset(df_functional_features_train, select = -day_name)),
  y = df_functional_features_train[["day_name"]],
  family = "multinomial",
  alpha = 1,
  nfolds = 5,
  standardize = TRUE
)

Prediction

Show Code

# generate predictions using test data
# output is a matrix
predictions_catch22 <- predict(
  model_catch22,
  newx = as.matrix(subset(df_catch22_test, select = -day_name)),
  s = model_catch22[["lambda.1se"]],
  type = "class"
)
predictions_functional_features <- predict(
  model_functional_features,
  newx = as.matrix(subset(df_functional_features_test, select = -day_name)),
  s = model_functional_features[["lambda.1se"]],
  type = "class"
)

Evaluation

Catch22

Show Code

dow_levels <- c("Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")

data.frame(
  obs = factor(df_functional_features_test[["day_name"]], levels = dow_levels),
  pred = factor(predictions_catch22, levels = dow_levels)
) |>
  conf_mat(data = _, obs, pred) |>
  summary()

# A tibble: 13 × 3
   .metric              .estimator .estimate
   <chr>                <chr>          <dbl>
 1 accuracy             multiclass     0.133
 2 kap                  multiclass     0    
 3 sens                 macro          0.143
 4 spec                 macro          0.857
 5 ppv                  macro         NA    
 6 npv                  macro         NA    
 7 mcc                  multiclass   NaN    
 8 j_index              macro          0    
 9 bal_accuracy         macro          0.5  
10 detection_prevalence macro          0.143
11 precision            macro          0.133
12 recall               macro          0.143
13 f_meas               macro          0.235

Functional Features

Show Code

data.frame(
  obs = factor(df_functional_features_test[["day_name"]], levels = dow_levels),
  pred = factor(predictions_functional_features, dow_levels)
) |>
  conf_mat(data = _, obs, pred) |>
  summary()

# A tibble: 13 × 3
   .metric              .estimator .estimate
   <chr>                <chr>          <dbl>
 1 accuracy             multiclass     0.124
 2 kap                  multiclass     0    
 3 sens                 macro          0.143
 4 spec                 macro          0.857
 5 ppv                  macro         NA    
 6 npv                  macro         NA    
 7 mcc                  multiclass   NaN    
 8 j_index              macro          0    
 9 bal_accuracy         macro          0.5  
10 detection_prevalence macro          0.143
11 precision            macro          0.124
12 recall               macro          0.143
13 f_meas               macro          0.221

Results

Overall, neither Catch22 nor the functional features were predictive.

References

Henderson, Trent. 2022. Rcatch22: Calculation of 22 CAnonical Time-Series CHaracteristics.

Ramsay, J. O., and B. W. Silverman. 2005. Functional Data Analysis. 2nd ed. 233 Spring St., New York, NY 10013, USA: Springer Science+Business Media, Inc.

Ramsay, James. 2024. Fda: Functional Data Analysis. http://www.functionaldata.org.