Parallelized Spot-GLM Model Fitting (macOS / Linux) — run_model_parallel

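Fits a Spot-GLM model for multiple responses (e.g., genes) in parallel, using memory-safe chunking and pbmcapply::pbmclapply, a progress-bar wrapper around parallel::mclapply. Because mclapply relies on process forking, this function is only available on Unix-like systems (macOS, Linux).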

Usage

run_model_parallel_mac(
  Y,
  X,
  lambda,
  family = "spot gaussian",
  beta_0 = NULL,
  fix_coef = NULL,
  initialization = TRUE,
  G = 0.1,
  num_cores = 1,
  offset = NULL,
  CT = NULL,
  weights = NULL,
  ct_cov_weights = NULL,
  n_epochs = 100,
  batch_size = 500,
  learning_rate = 1,
  max_diff = 1 - 1e-06,
  improvement_threshold = 1e-06,
  max_conv = 10
)

Arguments

Y: Response matrix (spots × responses).
X: Covariate matrix (spots × covariates).
lambda: Deconvolution matrix (spots × cell types).
family: The GLM family to use. One of: "spot gaussian", "spot poisson", "spot negative binomial", or "spot binomial".
beta_0: Optional initial coefficient matrix (covariates × cell types).
fix_coef: Optional logical matrix indicating which coefficients to fix (same dimensions as beta_0).
initialization: Logical; whether to initialize the coefficients via single-cell approximation. Default TRUE.
G: Maximum chunk size (in GB) to control memory usage during parallelization.
num_cores: Number of CPU cores to use in parallel.
offset: Optional numeric vector (length equal to number of spots).
CT: Optional vector of dominant cell types per spot.
weights: Optional observation-level weight matrix (spots × responses).
ct_cov_weights: Optional cell-type-specific weight matrix (cell types × responses).
n_epochs: Number of training epochs.
batch_size: Size of each mini-batch.
learning_rate: Initial learning rate.
max_diff: Convergence threshold based on likelihood improvement ratio.
improvement_threshold: Minimum improvement ratio between epochs.
max_conv: Number of low-improvement epochs before stopping.
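
The dimension conventions above can be checked on toy data before fitting. The sketch below is illustrative only: the simulated values are arbitrary, the assumption that each row of lambda sums to 1 (cell-type proportions) is ours and is not stated on this page, and the call is guarded so that it runs only if run_model_parallel_mac is already loaded on a Unix-like system.

```r
set.seed(1)
n_spots <- 200  # rows shared by Y, X and lambda
n_ct    <- 3    # cell types (columns of lambda)
n_resp  <- 5    # responses, e.g. genes (columns of Y)

# Covariate matrix (spots x covariates): intercept plus one numeric covariate
X <- cbind(intercept = 1, x1 = rnorm(n_spots))

# Deconvolution matrix (spots x cell types).
# Assumption: rows are proportions, so each row is normalized to sum to 1.
lambda <- matrix(rexp(n_spots * n_ct), nrow = n_spots, ncol = n_ct)
lambda <- lambda / rowSums(lambda)

# Response matrix (spots x responses); arbitrary Gaussian toy values
Y <- matrix(rnorm(n_spots * n_resp), nrow = n_spots, ncol = n_resp)
colnames(Y) <- paste0("gene", seq_len(n_resp))

# All three inputs must agree on the number of spots
stopifnot(nrow(Y) == nrow(X), nrow(X) == nrow(lambda))

# Only attempt the fit when the function is available and forking is supported
if (exists("run_model_parallel_mac") && .Platform$OS.type == "unix") {
  fit <- run_model_parallel_mac(
    Y = Y, X = X, lambda = lambda,
    family = "spot gaussian",
    num_cores = 2
  )
}
```

All other arguments are left at the defaults shown under Usage.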

Details

This version uses pbmcapply::pbmclapply for parallelism. On Windows systems, please use run_spot_glm_windows.
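
As a rough illustration of how memory-bounded chunking can combine with forked parallelism, the sketch below splits the response columns into chunks that fit a size limit and processes each chunk with parallel::mclapply. It is not the package's implementation: chunk_columns is a hypothetical helper, the size estimate (8 bytes per double, 1 GB taken as 1024^3 bytes) is an assumption, and a column mean stands in for the per-response model fit. pbmcapply::pbmclapply accepts the same core arguments as mclapply and adds a progress bar.

```r
library(parallel)

# Hypothetical helper: group column indices so that each chunk of a dense
# double matrix stays at or below max_gb gigabytes (at least one column each).
chunk_columns <- function(n_rows, n_cols, max_gb = 0.1) {
  bytes_per_col  <- n_rows * 8                       # 8 bytes per double
  cols_per_chunk <- max(1, floor(max_gb * 1024^3 / bytes_per_col))
  split(seq_len(n_cols), ceiling(seq_len(n_cols) / cols_per_chunk))
}

set.seed(1)
Y <- matrix(rnorm(1000 * 12), nrow = 1000, ncol = 12)

# A deliberately tiny limit so that the toy matrix is split into several chunks
chunks <- chunk_columns(nrow(Y), ncol(Y), max_gb = 3e-05)

# Forking is only supported on Unix-like systems; use a single core elsewhere
cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Process one chunk at a time; within a chunk, responses run in parallel.
# mean() is a placeholder for fitting the model to one response column.
col_means <- unname(unlist(lapply(chunks, function(idx) {
  mclapply(idx, function(j) mean(Y[, j]), mc.cores = cores)
})))
```

Processing chunks sequentially bounds how much of the response matrix is handled at once, while the inner parallel loop keeps all requested cores busy within each chunk.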