Fit a Spatial GLM with Initialization and Optimization

Fits a generalized linear model (GLM) with spatial deconvolution for a single response variable (e.g., gene expression), supporting Poisson, Gaussian, Binomial, and Negative Binomial families. This function handles coefficient initialization, model fitting via mini-batch gradient descent, and automatic coefficient filtering for weak covariates or poorly represented cell types.
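To make "spatial deconvolution" concrete, the following is a minimal sketch of how a spot-level mean could be assembled from cell-type-specific linear predictors. It assumes a log link and a mean of the form mu_i = sum_k lambda[i, k] * exp(x_i' beta[, k] + offset_i); the exact parameterization used by run_model is not stated here, and spot_mean_sketch is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of the package): expected spot-level mean under
# an ASSUMED log-link mixture over cell types.
spot_mean_sketch <- function(X, lambda, beta, offset = rep(0, nrow(X))) {
  eta <- X %*% beta + offset   # spots x cell types; offset is added to every column
  rowSums(lambda * exp(eta))   # weight each cell-type mean by its proportion
}

X <- cbind(1, c(0, 1, 0, 1))                    # 4 spots: intercept + one covariate
lambda <- matrix(c(1, 0.5, 0, 0.25,
                   0, 0.5, 1, 0.75), ncol = 2)  # 4 spots x 2 cell types
beta <- matrix(0, nrow = 2, ncol = 2)           # covariates x cell types, all zero
mu <- spot_mean_sketch(X, lambda, beta)
```

With all coefficients at zero, every cell-type mean is exp(0) = 1, so each spot mean reduces to the row sum of lambda.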

Usage

run_model(
  y,
  X,
  lambda,
  family = "spot gaussian",
  beta_0 = NULL,
  fix_coef = NULL,
  offset = rep(0, length(y)),
  initialization = TRUE,
  CT = NULL,
  weights = rep(1, length(y)),
  ct_cov_weights = rep(1, ncol(lambda)),
  n_epochs = 100,
  batch_size = 500,
  learning_rate = 1,
  max_diff = 1 - 1e-06,
  improvement_threshold = 1e-06,
  max_conv = 10
)

Arguments

y: Numeric response vector (e.g., gene expression for one gene across spots).
X: Covariate matrix (spots × covariates).
lambda: Deconvolution matrix (spots × cell types).
family: GLM family: "spot gaussian", "spot poisson", "spot negative binomial", or "spot binomial".
beta_0: Optional initial coefficient matrix (covariates × cell types).
fix_coef: Logical matrix (covariates × cell types) indicating coefficients to fix during optimization.
offset: Optional numeric vector (same length as y), used for normalization in Poisson or negative binomial models. Defaults to a vector of zeros.
initialization: Logical; whether to initialize coefficients via the single-cell approximation. Default TRUE.
CT: Optional vector of dominant cell type labels per spot.
weights: Observation weights (same length as y).
ct_cov_weights: Optional vector of cell-type–specific weights (length = number of cell types).
n_epochs: Number of training epochs for gradient descent.
batch_size: Size of mini-batches used during gradient descent.
learning_rate: Initial learning rate for optimization.
max_diff: Convergence threshold based on likelihood ratio.
improvement_threshold: Minimum required improvement in likelihood ratio between epochs.
max_conv: Number of consecutive low-improvement epochs before convergence is assumed.
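The interplay of improvement_threshold and max_conv can be read as a patience-style stopping rule. The sketch below is one plausible reading, not the package's implementation: it assumes the tracked quantity increases as the fit improves, that the low-improvement counter resets after any epoch with sufficient improvement, and it leaves max_diff out because its exact role is not specified above. first_converged_epoch and ratios are hypothetical names.

```r
# Hypothetical helper: returns the epoch at which a patience-style rule would
# stop, or NA if it never triggers. `ratios` stands in for a per-epoch
# likelihood ratio (ASSUMED to grow as the fit improves).
first_converged_epoch <- function(ratios, improvement_threshold = 1e-06,
                                  max_conv = 10) {
  low_count <- 0
  for (epoch in seq_along(ratios)[-1]) {
    improvement <- ratios[epoch] - ratios[epoch - 1]
    # Count consecutive low-improvement epochs; reset after a real improvement.
    low_count <- if (improvement < improvement_threshold) low_count + 1 else 0
    if (low_count >= max_conv) return(epoch)
  }
  NA_integer_
}

ratios <- c(0.5, 0.8, 0.9, rep(0.9, 12))   # improves for 3 epochs, then plateaus
stop_epoch <- first_converged_epoch(ratios, max_conv = 10)
```

Under this reading, a larger max_conv makes the fit more tolerant of temporary plateaus at the cost of extra epochs.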

Value

A list containing:

beta_estimate: Estimated coefficient matrix (covariates × cell types).
standard_error_matrix: Standard error matrix for each coefficient.
time: Elapsed fitting time (in seconds).
disp: Estimated dispersion (for negative binomial models).
converged: Logical indicating if convergence was reached.
likelihood: Final negative log-likelihood.
vcov: Variance-covariance matrix.
niter: Number of optimization epochs completed.
fixed_coef: Final matrix indicating fixed coefficients.
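As an illustration of the expected input shapes and of how the returned list might be used, the sketch below simulates inputs, calls run_model only if it is available, and otherwise falls back to a stand-in list with made-up numbers under the documented element names. The simulated data, the assumption that rows of lambda sum to 1, and the Wald-type summary are illustrative choices, not outputs or recommendations from the package.

```r
set.seed(1)
n_spots <- 200

# Simulated inputs with the documented shapes (illustration only).
X <- cbind(intercept = 1, covariate = rnorm(n_spots))     # spots x covariates
raw <- matrix(runif(n_spots * 3), ncol = 3,
              dimnames = list(NULL, c("ct1", "ct2", "ct3")))
lambda <- raw / rowSums(raw)   # spots x cell types; rows ASSUMED to sum to 1
y <- rpois(n_spots, lambda = 5)

# Guarded so the sketch still runs when the package is not attached.
if (exists("run_model", mode = "function")) {
  fit <- run_model(y = y, X = X, lambda = lambda, family = "spot poisson")
} else {
  # Stand-in with the documented element names and MADE-UP numbers.
  fit <- list(
    beta_estimate = matrix(c(1.0, 0.2, 0.8, -0.1, 1.2, 0.0), nrow = 2),
    standard_error_matrix = matrix(0.1, nrow = 2, ncol = 3)
  )
}

# Wald z-statistics and two-sided p-values, assuming approximate normality
# of the estimates.
z <- fit$beta_estimate / fit$standard_error_matrix
p <- 2 * pnorm(-abs(z))
```

Both z and p have the same covariates × cell types layout as beta_estimate, so entries can be matched to coefficients by position.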