| Title: | Unsupervised Clustering of Individualized Survival Curves |
|---|---|
| Description: | Tools for clustering individualized survival curves using the Partitioning Around Medoids (PAM) algorithm, with monotonic enforcement, optional smoothing, weighted distances (L1/L2), automatic K selection via silhouette width, prediction for new curves, basic stability checks, and plotting helpers. The clustering strategy follows Kaufman and Rousseeuw (1990, ISBN:0471878766). |
| Authors: | Imad EL BADISY [aut, cre] |
| Maintainer: | Imad EL BADISY <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.0 |
| Built: | 2026-05-17 06:32:23 UTC |
| Source: | https://github.com/ielbadisy/unsurv |
The unsurv package provides tools for unsupervised clustering of individualized survival curves using medoid-based clustering (PAM).
It is designed for settings where each individual is represented by a survival probability curve evaluated on a common time grid, such as predictions from:
Kaplan–Meier estimates,
Cox models,
parametric survival models,
deep learning survival models,
or other individualized survival predictors.
Core features include:
PAM clustering using weighted L1 or L2 distances,
automatic cluster selection via silhouette width,
optional monotonicity enforcement,
optional median smoothing,
prediction of cluster membership for new curves,
stability assessment via resampling and Adjusted Rand Index,
base R and ggplot2 visualization methods.
Main functions:
unsurv — fit clustering model
predict.unsurv — predict cluster membership
plot.unsurv — plot medoid curves
summary.unsurv — summarize clustering
unsurv_stability — assess stability
Imad EL BADISY
Kaufman, L., & Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
Useful links:
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) n <- 10 times <- seq(0, 5, length.out = 40) rates <- sample(c(0.2, 0.6), n, TRUE) S <- sapply(times, function(t) exp(-rates * t)) fit <- unsurv(S, times, K = 2) plot(fit) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) n <- 10 times <- seq(0, 5, length.out = 40) rates <- sample(c(0.2, 0.6), n, TRUE) S <- sapply(times, function(t) exp(-rates * t)) fit <- unsurv(S, times, K = 2) plot(fit) }
ggplot2 autoplot for unsurv objects
## S3 method for class 'unsurv' autoplot(object, ...)## S3 method for class 'unsurv' autoplot(object, ...)
object |
An object of class |
... |
Unused. |
A ggplot object.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 25) grp <- rep(1:2, each = 10) rates <- c(0.2, 0.55) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) ggplot2::autoplot(fit) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 25) grp <- rep(1:2, each = 10) rates <- c(0.2, 0.55) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) ggplot2::autoplot(fit) }
Convenience helper to visualize the distribution of Adjusted Rand Index
values returned by unsurv_stability(...) when return_distribution = TRUE.
plot_stability(stab)plot_stability(stab)
stab |
Either the numeric vector of ARI values or the list returned by
|
A ggplot histogram with a dashed line at the mean ARI.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 25) grp <- rep(1:2, each = 10) rates <- c(0.2, 0.55) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) stab <- unsurv_stability(S, times, fit, B = 6, frac = 0.7) plot_stability(stab) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 25) grp <- rep(1:2, each = 10) rates <- c(0.2, 0.55) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) stab <- unsurv_stability(S, times, fit, B = 6, frac = 0.7) plot_stability(stab) }
Produces a base R plot of the cluster medoid survival curves stored in the fitted object.
## S3 method for class 'unsurv' plot(x, ...)## S3 method for class 'unsurv' plot(x, ...)
x |
An object of class |
... |
Additional arguments passed to |
Invisibly returns x.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 30) grp <- rep(1:2, each = 8) rates <- c(0.2, 0.55) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) plot(fit) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 30) grp <- rep(1:2, each = 8) rates <- c(0.2, 0.55) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) plot(fit) }
Assigns new survival-probability curves to clusters using the medoids from a
fitted unsurv object. New curves are preprocessed using the same
weighting, optional monotonic enforcement, smoothing, and standardization
parameters as the fitted model.
## S3 method for class 'unsurv' predict(object, newdata, clamp = TRUE, ...)## S3 method for class 'unsurv' predict(object, newdata, clamp = TRUE, ...)
object |
An object of class |
newdata |
Numeric matrix of survival probabilities with shape
|
clamp |
Logical; if |
... |
Unused. Included for compatibility with the generic. |
Cluster assignment is performed by computing distances between the new curves
and the stored medoid curves in the weighted feature space defined during
fitting. The distance metric ("L1" or "L2") and any
standardization parameters are reused from the fitted model.
An integer vector of cluster labels of length nrow(newdata),
taking values in 1, ..., object$K.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) n <- 60; Q <- 40 times <- seq(0, 5, length.out = Q) grp <- sample(1:2, n, TRUE) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) S <- S + matrix(stats::rnorm(n * Q, 0, 0.02), nrow = n) fit <- unsurv(S, times, K = 2) # predict cluster membership for first 5 curves predict(fit, S[1:5, ]) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) n <- 60; Q <- 40 times <- seq(0, 5, length.out = Q) grp <- sample(1:2, n, TRUE) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) S <- S + matrix(stats::rnorm(n * Q, 0, 0.02), nrow = n) fit <- unsurv(S, times, K = 2) # predict cluster membership for first 5 curves predict(fit, S[1:5, ]) }
Print a summary of an unsurv model
## S3 method for class 'summary.unsurv' print(x, ...)## S3 method for class 'summary.unsurv' print(x, ...)
x |
An object of class |
... |
Unused. |
Invisibly returns x.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 20) grp <- rep(1:2, each = 6) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) s <- summary(fit) print(s) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 20) grp <- rep(1:2, each = 6) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) s <- summary(fit) print(s) }
Print an unsurv model
## S3 method for class 'unsurv' print(x, ...)## S3 method for class 'unsurv' print(x, ...)
x |
An object of class |
... |
Unused. |
Invisibly returns x.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 20) grp <- rep(1:2, each = 6) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) print(fit) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 20) grp <- rep(1:2, each = 6) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) print(fit) }
Summarize an unsurv model
## S3 method for class 'unsurv' summary(object, ...)## S3 method for class 'unsurv' summary(object, ...)
object |
An object of class |
... |
Unused. |
An object of class "summary.unsurv" with elements:
K: number of clusters
silhouette_mean: mean silhouette width
size: cluster sizes
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 20) grp <- rep(1:2, each = 6) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) summary(fit) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) times <- seq(0, 4, length.out = 20) grp <- rep(1:2, each = 6) rates <- c(0.2, 0.6) S <- sapply(times, function(t) exp(-rates[grp] * t)) fit <- unsurv(S, times, K = 2) summary(fit) }
Clusters individuals using their survival-probability curves evaluated on a
common time grid. The method computes a weighted feature representation of
the curves and applies PAM (Partitioning Around Medoids) on the resulting
dissimilarity matrix. If K is not provided, it is selected by maximizing
the mean silhouette width over K = 2, ..., K_max.
Fits an unsupervised clustering model on survival-probability curves evaluated on a common time grid. Clustering is performed using PAM (Partitioning Around Medoids) on a weighted feature representation of the curves.
unsurv( S, times, K = NULL, K_max = 10, distance = c("L2", "L1"), weights = NULL, enforce_monotone = TRUE, smooth_median_width = 0, standardize_cols = FALSE, eps_jitter = 0.001, seed = NULL ) unsurv( S, times, K = NULL, K_max = 10, distance = c("L2", "L1"), weights = NULL, enforce_monotone = TRUE, smooth_median_width = 0, standardize_cols = FALSE, eps_jitter = 0.001, seed = NULL )unsurv( S, times, K = NULL, K_max = 10, distance = c("L2", "L1"), weights = NULL, enforce_monotone = TRUE, smooth_median_width = 0, standardize_cols = FALSE, eps_jitter = 0.001, seed = NULL ) unsurv( S, times, K = NULL, K_max = 10, distance = c("L2", "L1"), weights = NULL, enforce_monotone = TRUE, smooth_median_width = 0, standardize_cols = FALSE, eps_jitter = 0.001, seed = NULL )
S |
Numeric matrix of survival probabilities with shape |
times |
Numeric vector of length |
K |
Optional integer number of clusters. If |
K_max |
Maximum |
distance |
Distance type: |
weights |
Optional nonnegative vector of length |
enforce_monotone |
Logical; enforce non-increasing survival curves over time. |
smooth_median_width |
Integer; if |
standardize_cols |
Logical; standardize feature columns before clustering. |
eps_jitter |
Nonnegative numeric; feature-space Gaussian jitter sd to break ties. |
seed |
Optional integer seed. |
This function requires the cluster package for PAM clustering and silhouette widths.
The returned object stores medoid curves and metadata required for prediction
on new curves via predict (method predict.unsurv).
If K is NULL, the number of clusters is selected by maximizing
the mean silhouette width over K = 2, ..., K_max.
Requires the cluster package (recommended in Suggests).
An object of class "unsurv" with components including:
clusters: integer vector of cluster assignments
K: number of clusters
times: time grid
medoids: medoid survival curves (one per cluster)
silhouette_mean: mean silhouette width
plus preprocessing/settings fields used for prediction
An object of class "unsurv".
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(2025) n <- 40; Q <- 30 times <- seq(0, 5, length.out = Q) rates <- c(0.12, 0.38, 0.8) grp <- sample(1:3, n, TRUE, c(0.4, 0.4, 0.2)) S <- t(vapply(1:n, function(i) pmin(pmax(exp(-rates[grp[i]] * times) + rnorm(Q, 0, 0.01), 0), 1), numeric(Q) )) fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2", enforce_monotone = TRUE, standardize_cols = FALSE, eps_jitter = 0, seed = NULL) print(fit) summary(fit) plot(fit) pred <- predict(fit, S[1:5, ]) pred } if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) n <- 40 times <- seq(0, 5, length.out = 30) grp <- sample(1:2, n, TRUE) rates <- ifelse(grp == 1, 0.2, 0.6) S <- sapply(times, function(t) exp(-rates * t)) S <- S + matrix(stats::rnorm(n * length(times), 0, 0.02), nrow = n) fit <- unsurv(S, times, K = NULL, K_max = 6, seed = 123) table(fit$clusters, grp) }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(2025) n <- 40; Q <- 30 times <- seq(0, 5, length.out = Q) rates <- c(0.12, 0.38, 0.8) grp <- sample(1:3, n, TRUE, c(0.4, 0.4, 0.2)) S <- t(vapply(1:n, function(i) pmin(pmax(exp(-rates[grp[i]] * times) + rnorm(Q, 0, 0.01), 0), 1), numeric(Q) )) fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2", enforce_monotone = TRUE, standardize_cols = FALSE, eps_jitter = 0, seed = NULL) print(fit) summary(fit) plot(fit) pred <- predict(fit, S[1:5, ]) pred } if (requireNamespace("cluster", quietly = TRUE)) { set.seed(1) n <- 40 times <- seq(0, 5, length.out = 30) grp <- sample(1:2, n, TRUE) rates <- ifelse(grp == 1, 0.2, 0.6) S <- sapply(times, function(t) exp(-rates * t)) S <- S + matrix(stats::rnorm(n * length(times), 0, 0.02), nrow = n) fit <- unsurv(S, times, K = NULL, K_max = 6, seed = 123) table(fit$clusters, grp) }
Computes a resampling-based stability score for a fitted unsurv model
using the Adjusted Rand Index (ARI) computed on overlap sets across resamples.
unsurv_stability( S, times, fit, B = 30, frac = 0.5, mode = c("bootstrap", "subsample"), jitter_sd = 0.01, weight_perturb = 0.3, eps_jitter = 0.02, return_distribution = TRUE )unsurv_stability( S, times, fit, B = 30, frac = 0.5, mode = c("bootstrap", "subsample"), jitter_sd = 0.01, weight_perturb = 0.3, eps_jitter = 0.02, return_distribution = TRUE )
S |
Numeric matrix of survival probabilities used for stability assessment
( |
times |
Numeric vector of time grid points (length |
fit |
An object of class |
B |
Integer; number of resamples. |
frac |
Numeric in (0, 1]; fraction of rows sampled per resample. |
mode |
Resampling mode: |
jitter_sd |
Nonnegative numeric; curve-space noise level applied before clamping/monotone enforcement. |
weight_perturb |
Numeric in |
eps_jitter |
Nonnegative numeric; feature-space jitter used inside the clustering during resamples. |
return_distribution |
Logical; if |
If return_distribution = TRUE, a list with:
mean: mean ARI across resample-pair overlaps
aris: numeric vector of ARIs
Otherwise, returns a single numeric mean ARI.
if (requireNamespace("cluster", quietly = TRUE)) { set.seed(2025) n <- 60; Q <- 40 times <- seq(0, 5, length.out = Q) rates <- c(0.12, 0.38, 0.8) grp <- sample(1:3, n, TRUE, c(0.4, 0.4, 0.2)) S <- t(vapply(1:n, function(i) pmin(pmax(exp(-rates[grp[i]] * times) + rnorm(Q, 0, 0.01), 0), 1), numeric(Q) )) fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2", enforce_monotone = TRUE, standardize_cols = FALSE, eps_jitter = 0, seed = NULL) stab <- unsurv_stability(S, times, fit, B = 8, frac = 0.55, mode = "bootstrap", jitter_sd = 0.3, weight_perturb = 0.0, eps_jitter = 0.3, return_distribution = TRUE) stab$mean }if (requireNamespace("cluster", quietly = TRUE)) { set.seed(2025) n <- 60; Q <- 40 times <- seq(0, 5, length.out = Q) rates <- c(0.12, 0.38, 0.8) grp <- sample(1:3, n, TRUE, c(0.4, 0.4, 0.2)) S <- t(vapply(1:n, function(i) pmin(pmax(exp(-rates[grp[i]] * times) + rnorm(Q, 0, 0.01), 0), 1), numeric(Q) )) fit <- unsurv(S, times, K = NULL, K_max = 6, distance = "L2", enforce_monotone = TRUE, standardize_cols = FALSE, eps_jitter = 0, seed = NULL) stab <- unsurv_stability(S, times, fit, B = 8, frac = 0.55, mode = "bootstrap", jitter_sd = 0.3, weight_perturb = 0.0, eps_jitter = 0.3, return_distribution = TRUE) stab$mean }