TabICL2 is a work-in-progress R implementation of TabICLv2: A better, faster, scalable, and open tabular foundation model (Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan).
TabICL2 is a transformer-based foundation model for tabular data using in-context learning. Unlike traditional supervised learning that requires model fine-tuning, TabICL learns patterns from labeled examples in-context—similar to how GPT processes text, but for tabular data—to make predictions without additional training.
You can install the development version of TabICL2 from GitHub with:
# install.packages("pak")
pak::pak("cregouby/TabICL2")
Because TabICLv2 takes the training and test data together as a single input, tab_icl2() accepts a formula or a recipe, with a {rsample} rsplit object as the data parameter:
library(TabICL2)
suppressPackageStartupMessages(library(recipes))
library(rsample)
data("attrition", package = "modeldata")
attrition_split <- initial_split(attrition)
rec <- recipe(Attrition ~ ., data = training(attrition_split)) %>%
  step_normalize(all_numeric(), -all_outcomes())
fit <- tab_icl2(rec, data = attrition_split)
# Get predicted classes
predicted_classes <- predict(fit, testing(attrition_split))
predicted_classes
The current implementation relies on nanotabiclv2; advanced features will be supported incrementally.
When making multiple predictions on the same training context, use caching to avoid redundant computation:
# Store cache on first forward pass
model$forward_with_cache(
  X = X_train,        # Training data only
  y_train = y_train,
  store_cache = TRUE,
  cache_mode = "kv"   # Cache key-value projections
)
# Reuse cache for subsequent test batches
for (test_batch in test_batches) {
  preds <- model$forward_with_cache(
    X = test_batch,
    use_cache = TRUE,
    num_classes = 5
  )
}
# Clear cache when done
model$clear_cache()
Cache Modes:
- "kv": Cache key-value projections (fastest, more memory)
- "repr": Cache row representations only (~24x less memory for the ICL part)
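The idea behind the "kv" mode can be sketched in base R with a toy single-head attention layer. This is an illustration under stated assumptions, not the package internals: the projection matrices `W_k`, `W_v`, `W_q` and the helper `attend()` are hypothetical names introduced here. The training context is projected once, and every test batch reuses the stored projections.

```r
# Toy single-head attention in base R; illustrative only, not the package code.
set.seed(1)
d <- 4                                     # embedding width (assumed)
X_train <- matrix(rnorm(6 * d), nrow = 6)  # 6 context rows
W_k <- matrix(rnorm(d * d), nrow = d)      # hypothetical key projection
W_v <- matrix(rnorm(d * d), nrow = d)      # hypothetical value projection
W_q <- matrix(rnorm(d * d), nrow = d)      # hypothetical query projection

softmax_rows <- function(z) {
  e <- exp(z - apply(z, 1, max))           # subtract the row max for stability
  e / rowSums(e)
}

# Test rows (queries) attend to the context keys and values.
attend <- function(X_test, K, V) {
  Q <- X_test %*% W_q
  softmax_rows(Q %*% t(K) / sqrt(ncol(K))) %*% V
}

# "Store cache": project the training context once.
cache <- list(K = X_train %*% W_k, V = X_train %*% W_v)

# "Use cache": each test batch reuses the stored projections.
test_batches <- list(
  matrix(rnorm(2 * d), nrow = 2),
  matrix(rnorm(3 * d), nrow = 3)
)
cached <- lapply(test_batches, attend, K = cache$K, V = cache$V)

# Reference: recompute both projections for every batch.
recomputed <- lapply(test_batches, function(b) {
  attend(b, X_train %*% W_k, X_train %*% W_v)
})
```

Both paths give the same outputs; the cached path simply skips the repeated projection of the training rows, which is where the savings come from when the context is large.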
For large-scale inference, configure automatic offloading:
# Configure inference settings
inf_config <- InferenceConfig(
  auto_batch = TRUE,       # Automatic batching
  batch_size = 32,
  offload = OFFLOAD_AUTO,  # Automatic memory management
  min_memory_gb = 4.0      # Keep 4 GB of GPU memory available
)
# Use configuration during inference
predictions <- model$inference_forward(
  X, y_train,
  inference_config = inf_config
)
Offload Options:
- OFFLOAD_GPU: Keep all on GPU (fastest)
- OFFLOAD_CPU: Offload to CPU pinned memory
- OFFLOAD_DISK: Offload to memory-mapped files
- OFFLOAD_AUTO: Automatically choose based on available memory
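To make the automatic choice concrete, here is one possible decision rule written in base R. It is a sketch only: the function `choose_offload()`, its arguments, and the thresholds are hypothetical and are not taken from the package, whose actual OFFLOAD_AUTO policy may differ.

```r
# Hypothetical policy, for illustration only; not the package's OFFLOAD_AUTO logic.
choose_offload <- function(needed_gb, free_gpu_gb, free_cpu_gb, min_memory_gb = 4) {
  if (free_gpu_gb - needed_gb >= min_memory_gb) {
    "gpu"   # fits on the GPU while keeping the reserve free
  } else if (needed_gb <= free_cpu_gb) {
    "cpu"   # spill to host memory
  } else {
    "disk"  # fall back to memory-mapped files
  }
}

# Three workloads of increasing size on a machine with 16 GB GPU and 32 GB host memory.
choices <- c(
  small  = choose_offload(2,  free_gpu_gb = 16, free_cpu_gb = 32),
  medium = choose_offload(14, free_gpu_gb = 16, free_cpu_gb = 32),
  large  = choose_offload(64, free_gpu_gb = 16, free_cpu_gb = 32)
)
```

The reserve plays the role of `min_memory_gb` in the configuration above: a workload that would leave less than the reserve free on the GPU is moved to the next tier.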
TabICL automatically handles datasets with more classes than max_classes using hierarchical classification:
# Model with max_classes = 10
model <- TabICL(max_classes = 10, num_quantiles = 10, ...)
# Dataset with 100 classes - automatically uses hierarchy
y_train_large <- torch_randint(0L, 100L, c(batch_size, train_size))
model$eval()
predictions <- model(X, y_train_large, return_logits = FALSE)
# predictions: (batch, test_size, 100) probabilities via hierarchical ensembling
TabICL processes tabular data through three sequential stages:
1. Column embedding (ColEmbedding)
Creates distribution-aware embeddings for each column using set transformers with induced self-attention.
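Induced self-attention can be sketched in base R as two attention steps routed through a small set of inducing points. This is a simplified illustration, assuming a single head and omitting the learned projections, residual connections, layer normalization, and feed-forward layers of a real set transformer; the names `inducing` and `attention()` are introduced here for the example.

```r
# Simplified induced self-attention; a sketch, not the ColEmbedding implementation.
set.seed(0)
n <- 8   # number of cells in one column
m <- 3   # number of inducing points (m < n)
d <- 4   # embedding width

X <- matrix(rnorm(n * d), nrow = n)         # embedded cells of one column
inducing <- matrix(rnorm(m * d), nrow = m)  # learned parameters in a real model

# Plain scaled dot-product attention with a row-wise softmax.
attention <- function(Q, K, V) {
  s <- Q %*% t(K) / sqrt(ncol(K))
  w <- exp(s - apply(s, 1, max))            # subtract the row max for stability
  (w / rowSums(w)) %*% V
}

H <- attention(inducing, X, X)  # m x d: inducing points summarize all n cells
out <- attention(X, H, H)       # n x d: each cell reads from the m summaries
```

Each step scores n × m pairs rather than n × n, so the cost grows linearly with the number of rows in a column when m is fixed.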
2. Row interaction (RowInteraction)
Captures interactions between features within each row using standard transformer blocks.
3. In-context learning (ICLearning)
Learns from labeled training examples in-context to predict on test examples:
[train_row_1, ..., train_row_n, test_row_1, ..., test_row_m]
Key Insight: The model is trained on many different tabular datasets. At inference time, it uses the labeled examples you provide (the “context”) to understand the current task and make predictions on unlabeled test rows—no fine-tuning required.
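A toy stand-in for in-context prediction, written in base R, may help make this concrete. It is not how TabICL computes its predictions: the learned transformer layers are replaced here by a fixed similarity (negative squared distance), and the built-in `iris` data is used purely for illustration. What it shares with the real model is the shape of the computation: test rows attend to labeled context rows, and no parameters are fitted.

```r
# Toy in-context predictor: test rows attend to labeled context rows.
# Illustration only; TabICL uses learned transformer layers, not this fixed similarity.
set.seed(42)
idx <- sample(nrow(iris), 100)        # rows used as the labeled context
X <- scale(as.matrix(iris[, 1:4]))    # standardize the four numeric features
X_ctx <- X[idx, ]
y_ctx <- iris$Species[idx]
X_test <- X[-idx, ]

# Attention scores: negative squared distance between test and context rows.
sq_dist <- function(A, B) {
  outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
}
scores <- -sq_dist(X_test, X_ctx)
weights <- exp(scores - apply(scores, 1, max))
weights <- weights / rowSums(weights)  # softmax over the context rows

# Class probabilities: attention-weighted average of one-hot context labels.
onehot <- outer(as.integer(y_ctx), seq_along(levels(y_ctx)), "==") * 1
colnames(onehot) <- levels(y_ctx)
probs <- weights %*% onehot
pred <- factor(
  colnames(probs)[max.col(probs, ties.method = "first")],
  levels = levels(y_ctx)
)
```

Swapping in a different context (other rows, other labels) changes the predictions immediately, with no training step in between.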
- Recomputation (recompute = TRUE) during training to save memory
- Softmax temperature (softmax_temperature) to calibrate prediction confidence
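The effect of a softmax temperature can be sketched in base R. This shows standard temperature scaling; how the package applies softmax_temperature internally is an assumption here, and the helper `softmax_t()` and the example logits are made up for the illustration.

```r
# Standard temperature scaling; a sketch, not the package's implementation.
softmax_t <- function(logits, temperature = 1) {
  z <- logits / temperature
  e <- exp(z - max(z))   # subtract the max for numerical stability
  e / sum(e)
}

logits <- c(2.0, 1.0, 0.1)                       # made-up scores for three classes
p_sharp <- softmax_t(logits, temperature = 0.5)  # more confident
p_base  <- softmax_t(logits, temperature = 1)
p_flat  <- softmax_t(logits, temperature = 2)    # less confident
```

Temperatures below 1 concentrate probability on the top class, and temperatures above 1 spread it out; the predicted class itself does not change.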