TabICL2 is a work-in-progress R implementation of TabICLv2: A better, faster, scalable, and open tabular foundation model (Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan).

TabICL2 is a transformer-based foundation model for tabular data using in-context learning. Unlike traditional supervised learning, which requires training or fine-tuning a model on each dataset, TabICL learns patterns from labeled examples in context, much as GPT processes text, and makes predictions without additional training.

Key Features

  • In-Context Learning: Make predictions on new data by learning from labeled examples in the prompt, without fine-tuning
  • Universal Architecture: Single model handles both classification and regression tasks
  • Scalable Classification: Switch to hierarchical classification to automatically handle datasets with thousands of classes (TO-BE)
  • Efficient Inference: KV-caching and representation caching for fast batch predictions (TO-BE)
  • Memory Management: Automatic offloading to CPU or disk for large-scale inference (TO-BE)
  • Distribution-Aware Embeddings: Column embeddings capture statistical properties of features using set transformers (TO-BE)
  • Flexible Attention: Scalable softmax (SSMax) variants prevent attention saturation on long sequences
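
To illustrate the last point, here is a minimal base-R sketch of the basic scalable softmax idea: logits are multiplied by s * log(n), where n is the sequence length, before a standard softmax. The functions `softmax()` and `ssmax()` and the scaling parameter `s` are illustrative names, not part of the TabICL2 API, and the SSMax variants used by TabICLv2 may differ from this basic form.

```r
# Numerically stable softmax over a vector of attention logits.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Basic scalable softmax: rescale logits by s * log(n), n = sequence length.
# `s` is a learnable scalar in practice; it is fixed here for illustration.
ssmax <- function(z, s = 1) {
  softmax(s * log(length(z)) * z)
}

# One relevant key among 999 irrelevant ones.
logits <- c(3, rep(0, 999))
p_softmax <- softmax(logits)[1]  # weight on the relevant key, plain softmax
p_ssmax <- ssmax(logits)[1]      # weight on the relevant key, scalable softmax
```

With a plain softmax the weight on the relevant key shrinks as the sequence grows; the log(n) factor counteracts that dilution.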

Installation

You can install the development version of TabICL2 from GitHub with:

# install.packages("pak")
pak::pak("cregouby/TabICL2")

Quick Start

Classification Example

Because TabICLv2 takes the training and test inputs together in a Γ shape, tab_icl2() accepts a formula or a recipe, with a {rsample} rsplit object as the data parameter:

library(TabICL2)
suppressPackageStartupMessages(library(recipes))
library(rsample)

data("attrition", package = "modeldata")
attrition_split <- initial_split(attrition)
  
rec <- recipe(Attrition ~ ., data = training(attrition_split)) %>% 
  step_normalize(all_numeric(), -all_outcomes())

fit <- tab_icl2(rec, data = attrition_split)

# Get predicted classes
predicted_classes <- predict(fit, testing(attrition_split))
predicted_classes
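
The Γ shape can be pictured with base R alone: the predictor block spans every row (training and test), while the outcome block covers only the training rows. This sketch uses the built-in `iris` data and hypothetical object names. It shows the layout only, under the assumption that this is what the shape refers to, and is not what `tab_icl2()` does internally.

```r
set.seed(42)
train_idx <- sample(nrow(iris), 100)

# Predictors for all rows: training rows first, test rows after.
X <- rbind(iris[train_idx, 1:4], iris[-train_idx, 1:4])

# Outcome for the training rows only: the short arm of the Γ.
y_train <- iris$Species[train_idx]

dim(X)           # 150 rows, 4 predictors
length(y_train)  # 100 labels
```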

Regression Example

library(TabICL2)
library(rsample)

data("ames", package = "modeldata")

ames_split <- initial_split(ames)

fit <- tab_icl2(Sale_Price ~ ., data = ames_split)

# Get predicted sale prices
predicted_sale_price <- predict(fit, testing(ames_split))
predicted_sale_price

Advanced Features (Future)

The current implementation relies on nanotabiclv2. The advanced features described in this section are planned and will be added incrementally.

KV Caching for Fast Inference

When making multiple predictions on the same training context, use caching to avoid redundant computation:

# Store cache on first forward pass
model$forward_with_cache(
  X = X_train,              # Training data only
  y_train = y_train,
  store_cache = TRUE,
  cache_mode = "kv"         # Cache key-value projections
)

# Reuse cache for subsequent test batches
for (test_batch in test_batches) {
  preds <- model$forward_with_cache(
    X = test_batch,
    use_cache = TRUE,
    num_classes = 5
  )
}

# Clear cache when done
model$clear_cache()

Cache Modes:

  • "kv": Cache key-value projections (fastest, more memory)
  • "repr": Cache row representations only (~24x less memory for ICL part)
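
The idea behind the "kv" mode can be sketched in base R with a toy single-head attention layer: the key and value projections of the training rows are computed once and reused for every test batch. All names here (`W_q`, `attend()`, `kv_cache`) are hypothetical and the layer is far simpler than the real model. It only shows why the cached result matches a full recomputation.

```r
set.seed(1)
d <- 4
W_q <- matrix(rnorm(d * d), d, d)
W_k <- matrix(rnorm(d * d), d, d)
W_v <- matrix(rnorm(d * d), d, d)

# Scaled dot-product attention with a row-wise softmax.
attend <- function(Q, K, V) {
  scores <- Q %*% t(K) / sqrt(ncol(K))
  w <- exp(scores - apply(scores, 1, max))
  (w / rowSums(w)) %*% V
}

H_train <- matrix(rnorm(20 * d), 20, d)  # training row representations
test_batches <- list(matrix(rnorm(3 * d), 3, d), matrix(rnorm(5 * d), 5, d))

# Project the training context once and store the result.
kv_cache <- list(K = H_train %*% W_k, V = H_train %*% W_v)

# Each test batch only needs its own query projection.
cached <- lapply(test_batches, function(H) {
  attend(H %*% W_q, kv_cache$K, kv_cache$V)
})

# Reference: recompute the training projections for every batch.
recomputed <- lapply(test_batches, function(H) {
  attend(H %*% W_q, H_train %*% W_k, H_train %*% W_v)
})
```

Both lists hold the same attention outputs. The cached path skips the two training-side matrix products on every batch, at the cost of keeping `K` and `V` in memory.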

Memory-Efficient Inference

For large-scale inference, configure automatic offloading:

# Configure inference settings
inf_config <- InferenceConfig(
  auto_batch = TRUE,          # Automatic batching
  batch_size = 32,
  offload = OFFLOAD_AUTO,     # Automatic memory management
  min_memory_gb = 4.0         # Keep 4GB GPU memory available
)

# Use configuration during inference
predictions <- model$inference_forward(
  X, y_train,
  inference_config = inf_config
)

Offload Options:

  • OFFLOAD_GPU: Keep all on GPU (fastest)
  • OFFLOAD_CPU: Offload to CPU pinned memory
  • OFFLOAD_DISK: Offload to memory-mapped files
  • OFFLOAD_AUTO: Automatically choose based on available memory
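
One plausible way to think about the automatic option is "pick the fastest location that still fits". The sketch below is purely illustrative: `choose_offload()` and its arguments are hypothetical, and the actual selection logic of OFFLOAD_AUTO is not specified here.

```r
# Hypothetical policy: prefer GPU, then CPU, then disk.
# `min_memory_gb` is the GPU headroom that must stay free.
choose_offload <- function(required_gb, free_gpu_gb, free_cpu_gb,
                           min_memory_gb = 4.0) {
  if (required_gb <= free_gpu_gb - min_memory_gb) {
    "gpu"
  } else if (required_gb <= free_cpu_gb) {
    "cpu"
  } else {
    "disk"
  }
}

small <- choose_offload(required_gb = 2, free_gpu_gb = 16, free_cpu_gb = 64)
medium <- choose_offload(required_gb = 20, free_gpu_gb = 16, free_cpu_gb = 64)
large <- choose_offload(required_gb = 200, free_gpu_gb = 16, free_cpu_gb = 64)
```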

Hierarchical Classification

TabICL automatically handles datasets with more classes than max_classes using hierarchical classification:

# Model with max_classes = 10
model <- TabICL(max_classes = 10, num_quantiles = 10, ...)

# Dataset with 100 classes - automatically uses hierarchy
y_train_large <- torch_randint(0L, 100L, c(batch_size, train_size))

model$eval()
predictions <- model(X, y_train_large, return_logits = FALSE)
# predictions: (batch, test_size, 100) probabilities via hierarchical ensembling
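
As a rough picture of how a hierarchy can cover more classes than max_classes, the base-R sketch below splits 100 classes into 10 groups of 10 and combines a group-level prediction with a within-group prediction. Random probability vectors stand in for model outputs, all names are hypothetical, and the grouping and ensembling used by the real model may differ.

```r
set.seed(123)
n_classes <- 100
max_classes <- 10

# Assign each class to one of ceiling(100 / 10) = 10 groups.
group_of <- ceiling(seq_len(n_classes) / max_classes)

# Stand-in for a model output: a random probability vector of length k.
random_probs <- function(k) {
  p <- runif(k)
  p / sum(p)
}

# Level 1: probability of each group.
p_group <- random_probs(max(group_of))

# Level 2: probability of each class within its group.
p_within <- lapply(split(seq_len(n_classes), group_of),
                   function(cls) random_probs(length(cls)))

# Combine: P(class) = P(group) * P(class | group).
p_class <- unlist(Map(function(pg, pw) pg * pw, p_group, p_within),
                  use.names = FALSE)
```

Each individual prediction involves at most 10 outcomes, yet `p_class` is a valid distribution over all 100 classes.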

Architecture

TabICL processes tabular data through three sequential stages:

1. Column-wise Embedding (ColEmbedding)

Creates distribution-aware embeddings for each column using set transformers with induced self-attention:

  • Maps scalar cells to high-dimensional embeddings
  • Captures statistical regularities within columns
  • Uses shared parameters across all features
  • Supports feature grouping and target-aware embeddings
  • Optional affine transformations for final embeddings
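
Induced self-attention can be sketched in base R as two attention steps through a small set of inducing points, which keeps the cost linear in the number of cells. This is a simplified illustration (no learned projections, residuals, or feed-forward layers), and the names `attend()` and `inducing` are hypothetical rather than taken from the package.

```r
set.seed(7)

# Scaled dot-product attention with a row-wise softmax.
attend <- function(Q, K, V) {
  scores <- Q %*% t(K) / sqrt(ncol(K))
  w <- exp(scores - apply(scores, 1, max))
  (w / rowSums(w)) %*% V
}

n <- 500  # cells in one column
m <- 8    # inducing points
d <- 16   # embedding width

X <- matrix(rnorm(n * d), n, d)         # per-cell embeddings
inducing <- matrix(rnorm(m * d), m, d)  # learned parameters in a real model

# Step 1: inducing points summarize the column (m x n attention weights).
H <- attend(inducing, X, X)

# Step 2: every cell reads from the summary (n x m attention weights).
out <- attend(X, H, H)

dim(out)  # 500 x 16
```

Full self-attention over the column would need an n x n weight matrix. Here the two steps need m x n and n x m weights instead.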

2. Row-wise Interaction (RowInteraction)

Captures interactions between features within each row using standard transformer blocks:

  • Applies rotary position embeddings (RoPE) to encode feature positions
  • Uses learnable CLS tokens to aggregate feature information
  • Outputs: concatenated CLS token embeddings as row representations
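
The rotary position embedding mentioned above can be illustrated in a few lines of base R: each pair of dimensions is rotated by an angle proportional to the position, so dot products between rotated vectors depend only on the relative offset. The function `rope()` and the frequency base of 10000 (a common default) are assumptions for illustration, not the package's implementation.

```r
# Rotate consecutive pairs of dimensions of `x` by position-dependent angles.
rope <- function(x, pos, base = 10000) {
  d <- length(x)
  stopifnot(d %% 2 == 0)
  theta <- base^(-2 * (seq_len(d / 2) - 1) / d)  # one frequency per pair
  ang <- pos * theta
  x1 <- x[c(TRUE, FALSE)]  # first element of each pair
  x2 <- x[c(FALSE, TRUE)]  # second element of each pair
  out <- numeric(d)
  out[c(TRUE, FALSE)] <- x1 * cos(ang) - x2 * sin(ang)
  out[c(FALSE, TRUE)] <- x1 * sin(ang) + x2 * cos(ang)
  out
}

set.seed(3)
q <- rnorm(8)
k <- rnorm(8)

# Same relative offset (2), different absolute positions.
score_a <- sum(rope(q, 3) * rope(k, 5))
score_b <- sum(rope(q, 10) * rope(k, 12))
```

`score_a` and `score_b` agree up to floating-point error, which is the property that makes the encoding relative.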

3. Dataset-wise In-Context Learning (ICLearning)

Learns from labeled training examples in-context to predict on test examples:

  • Input sequence: [train_row_1, ..., train_row_n, test_row_1, ..., test_row_m]
  • Training rows are augmented with label embeddings
  • Uses transformer architecture with scalable softmax
  • For classification: outputs logits/probabilities for each class
  • For regression: outputs quantile predictions
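
For regression, a vector of predicted quantiles can be turned into a point estimate and an interval. The base-R sketch below uses made-up numbers for a single test row; the quantile levels, the values, and the choice of summary are all assumptions for illustration and do not describe how the package post-processes its output.

```r
# Hypothetical quantile levels and predicted quantiles for one test row.
probs <- seq(0.1, 0.9, by = 0.1)
q_pred <- c(120, 135, 148, 160, 172, 185, 201, 224, 260)

# Point estimates: the median (0.5 quantile) and a crude mean approximation.
median_pred <- q_pred[which.min(abs(probs - 0.5))]
mean_pred <- mean(q_pred)

# Central 80% prediction interval from the 0.1 and 0.9 quantiles.
interval_80 <- c(lower = q_pred[1], upper = q_pred[length(q_pred)])
```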

Key Insight: The model is trained on many different tabular datasets. At inference time, it uses the labeled examples you provide (the “context”) to understand the current task and make predictions on unlabeled test rows—no fine-tuning required.

Performance Tips

  1. Use caching when making multiple predictions with the same training context
  2. Enable auto_batch for variable-length sequences in the same batch
  3. Configure offloading when GPU memory is limited
  4. Use gradient checkpointing (recompute = TRUE) during training to save memory
  5. Adjust temperature (softmax_temperature) to calibrate prediction confidence
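
The effect of tip 5 can be seen with a standard temperature-scaled softmax in base R. The function name below is hypothetical, and whether `softmax_temperature` is applied in exactly this way inside the model is an assumption.

```r
# Divide logits by the temperature before a numerically stable softmax.
softmax_with_temperature <- function(logits, temperature = 1) {
  z <- logits / temperature
  e <- exp(z - max(z))
  e / sum(e)
}

logits <- c(2.0, 1.0, 0.1)
p_sharp <- softmax_with_temperature(logits, temperature = 0.5)
p_default <- softmax_with_temperature(logits, temperature = 1)
p_flat <- softmax_with_temperature(logits, temperature = 2)
```

Temperatures below 1 sharpen the distribution (more confident predictions), while temperatures above 1 flatten it.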

References

Citation

If you use TabICL2 in your research, please cite:

@software{tabicl2,
  title = {TabICL2: Transformer-based In-Context Learning for Tabular Data},
  author = {Christophe Regouby},
  year = {2026},
  url = {https://github.com/cregouby/TabICL2}
}

License

See LICENSE file for details.