TabICL2 (WiP): a tabular foundation model

TabICL2 is a work-in-progress R implementation of TabICLv2: A better, faster, scalable, and open tabular foundation model (Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan).

TabICL2 is a transformer-based foundation model for tabular data that relies on in-context learning. Unlike traditional supervised learning, which requires training or fine-tuning a model on each dataset, TabICL learns patterns from the labeled examples supplied in its context (much as GPT processes text, but for tables) and makes predictions without any additional training.
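The snippet below is a deliberately tiny, hand-built analogy of prediction "from context", not TabICL2's architecture: each test row attends to the labeled training rows with softmax weights over negative squared distances, and its class probabilities are the attention-weighted average of the training labels. Nothing is fitted. The helper icl_toy_predict() and its fixed similarity function are hypothetical illustrations; the real model learns how to use its context through pretrained transformer layers.

```r
# Toy "in-context" classifier (illustration only): no parameters are fitted,
# the labeled context rows are used directly at prediction time.
softmax_rows <- function(s) {
  e <- exp(s - apply(s, 1, max))   # subtract each row's max for stability
  e / rowSums(e)
}

icl_toy_predict <- function(x_train, y_train, x_test, scale = 1) {
  # squared Euclidean distance between every test row and every train row
  d2 <- outer(rowSums(x_test^2), rowSums(x_train^2), "+") -
    2 * x_test %*% t(x_train)
  attn <- softmax_rows(-scale * d2)          # attention over context rows
  classes <- sort(unique(y_train))
  onehot <- outer(y_train, classes, "==") * 1
  probs <- attn %*% onehot                   # attention-weighted label average
  colnames(probs) <- classes
  probs
}

set.seed(1)
x_train <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                 matrix(rnorm(20, mean = 3), ncol = 2))
y_train <- rep(c("no", "yes"), each = 10)
x_test <- rbind(c(0, 0), c(3, 3))
p <- icl_toy_predict(x_train, y_train, x_test)
round(p, 3)
```

Adding more labeled rows changes the predictions immediately, without any training step, which is the behaviour in-context learning generalises.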

Key Features

In-Context Learning: Make predictions on new data by learning from labeled examples in the prompt, without fine-tuning
Universal Architecture: Single model handles both classification and regression tasks
Scalable Classification: Automatically switches to hierarchical classification to handle datasets with thousands of classes (TO-BE)
Efficient Inference: KV-caching and representation caching for fast batch predictions (TO-BE)
Memory Management: Automatic offloading to CPU or disk for large-scale inference (TO-BE)
Distribution-Aware Embeddings: Column embeddings capture statistical properties of features using set transformers (TO-BE)
Flexible Attention: Scalable softmax (SSMax) variants prevent attention saturation on long sequences
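Scalable softmax (SSMax), in its basic form, multiplies the attention logits by s·log(n) before the softmax, where n is the number of keys and s is a scale that is usually learned. The sketch below fixes s = 1 as an arbitrary assumption and shows the effect: with a standard softmax the weight on a single relevant key fades as n grows, while SSMax keeps it high. The SSMax variants actually used by TabICLv2 may differ from this basic form.

```r
softmax <- function(z) {
  e <- exp(z - max(z))   # shift for numerical stability
  e / sum(e)
}

# SSMax (basic form): rescale logits by s * log(n), n = number of keys
ssmax <- function(z, s = 1) {
  softmax(s * log(length(z)) * z)
}

# Attention weight on one "relevant" key (logit 2) among n - 1 keys with logit 0
peak_weight <- function(n, attn_fun) {
  z <- c(2, rep(0, n - 1))
  attn_fun(z)[1]
}

n_keys <- c(10, 100, 1000, 10000)
std <- sapply(n_keys, peak_weight, attn_fun = softmax)
ssm <- sapply(n_keys, peak_weight, attn_fun = ssmax)
round(data.frame(n_keys, softmax = std, ssmax = ssm), 4)
```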

Installation

You can install the development version of TabICL2 from GitHub with:

# install.packages("pak")
pak::pak("cregouby/TabICL2")

Quick Start

Classification example

Because TabICLv2 consumes the training and test inputs together in a $\Gamma$-shaped layout, we pass a formula or a recipe to tab_icl2(), with an {rsample} rsplit object as the data argument:

library(TabICL2)
suppressPackageStartupMessages(library(recipes))
library(rsample)

data("attrition", package = "modeldata")
set.seed(123)  # make the random train/test split reproducible
attrition_split <- initial_split(attrition)

rec <- recipe(Attrition ~ ., data = training(attrition_split)) %>%
  step_normalize(all_numeric_predictors())

fit <- tab_icl2(rec, data = attrition_split)

# Get predicted classes
predicted_classes <- predict(fit, testing(attrition_split))
predicted_classes

Regression example

library(TabICL2)
library(rsample)

data("ames", package = "modeldata")
set.seed(123)  # make the random train/test split reproducible
ames_split <- initial_split(ames)

fit <- tab_icl2(Sale_Price ~ ., data = ames_split)

# Get predicted sale prices
predicted_sale_price <- predict(fit, testing(ames_split))
predicted_sale_price

Advanced Features (Future)

The current implementation relies on nanotabiclv2. The advanced features below are not available yet and will be added incrementally; their examples describe planned functionality.

KV Caching for Fast Inference

When making multiple predictions on the same training context, use caching to avoid redundant computation:

# Store cache on first forward pass
model$forward_with_cache(
  X = X_train,              # Training data only
  y_train = y_train,
  store_cache = TRUE,
  cache_mode = "kv"         # Cache key-value projections
)

# Reuse cache for subsequent test batches, keeping one result per batch
preds <- lapply(test_batches, function(test_batch) {
  model$forward_with_cache(
    X = test_batch,
    use_cache = TRUE,
    num_classes = 5
  )
})

# Clear cache when done
model$clear_cache()

Cache Modes:
- "kv": cache key-value projections (fastest, uses more memory)
- "repr": cache row representations only (~24x less memory for the ICL part)
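Why caching pays off can be sketched with a single attention head in base R: the keys and values of the training context depend only on the context, so they can be projected once and reused for every test batch. The weight matrices, attend() and the environment-based cache below are hypothetical stand-ins, not TabICL2 internals; the final comparison checks that cached and uncached outputs agree.

```r
set.seed(42)
d <- 4
W_q <- matrix(rnorm(d * d), d)   # hypothetical projection weights
W_k <- matrix(rnorm(d * d), d)
W_v <- matrix(rnorm(d * d), d)

softmax_rows <- function(s) {
  e <- exp(s - apply(s, 1, max))
  e / rowSums(e)
}

# Scaled dot-product attention: test queries attend over context keys/values
attend <- function(q, k, v) softmax_rows(q %*% t(k) / sqrt(ncol(k))) %*% v

ctx <- matrix(rnorm(50 * d), ncol = d)   # training-context representations

# Without cache: K and V are recomputed for every test batch
predict_no_cache <- function(x_test) {
  attend(x_test %*% W_q, ctx %*% W_k, ctx %*% W_v)
}

# With cache: K and V are computed once and reused
kv_cache <- new.env()
kv_cache$k <- ctx %*% W_k
kv_cache$v <- ctx %*% W_v
predict_cached <- function(x_test) {
  attend(x_test %*% W_q, kv_cache$k, kv_cache$v)
}

test_batches <- lapply(1:3, function(i) matrix(rnorm(5 * d), ncol = d))
out_cached <- lapply(test_batches, predict_cached)
out_plain  <- lapply(test_batches, predict_no_cache)
```

In this sketch, a "repr"-style cache would keep only ctx and recompute the key and value projections for each batch, trading some compute for memory.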

Memory-Efficient Inference

For large-scale inference, configure automatic offloading:

# Configure inference settings
inf_config <- InferenceConfig(
  auto_batch = TRUE,          # Automatic batching
  batch_size = 32,
  offload = OFFLOAD_AUTO,     # Automatic memory management
  min_memory_gb = 4.0         # Keep 4GB GPU memory available
)

# Use configuration during inference
predictions <- model$inference_forward(
  X, y_train,
  inference_config = inf_config
)

Offload Options:
- OFFLOAD_GPU: keep everything on the GPU (fastest)
- OFFLOAD_CPU: offload to CPU pinned memory
- OFFLOAD_DISK: offload to memory-mapped files
- OFFLOAD_AUTO: choose automatically based on available memory
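The kind of decision OFFLOAD_AUTO has to make can be sketched as a simple rule: use the fastest tier that still leaves min_memory_gb of headroom. The function choose_offload() and its thresholds are hypothetical and only illustrate the trade-off between the options above; they are not the package's actual policy.

```r
# Hypothetical heuristic (not TabICL2's policy): pick the fastest tier that
# still leaves `min_memory_gb` of free GPU memory after allocation.
choose_offload <- function(required_gb, free_gpu_gb, free_cpu_gb,
                           min_memory_gb = 4) {
  if (free_gpu_gb - required_gb >= min_memory_gb) {
    "OFFLOAD_GPU"
  } else if (free_cpu_gb >= required_gb) {
    "OFFLOAD_CPU"
  } else {
    "OFFLOAD_DISK"
  }
}

choose_offload(required_gb = 2,  free_gpu_gb = 10, free_cpu_gb = 32)  # "OFFLOAD_GPU"
choose_offload(required_gb = 8,  free_gpu_gb = 10, free_cpu_gb = 32)  # "OFFLOAD_CPU"
choose_offload(required_gb = 64, free_gpu_gb = 10, free_cpu_gb = 32)  # "OFFLOAD_DISK"
```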

Hierarchical Classification

TabICL automatically handles datasets with more classes than max_classes using hierarchical classification:

# Model with max_classes = 10
model <- TabICL(max_classes = 10, num_quantiles = 10, ...)

# Dataset with 100 classes - automatically uses hierarchy
y_train_large <- torch_randint(0L, 100L, c(batch_size, train_size))

model$eval()
predictions <- model(X, y_train_large, return_logits = FALSE)
# predictions: (batch, test_size, 100) probabilities via hierarchical ensembling
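Hierarchical classification can be sketched as a tree: when there are more classes than max_classes, the classes are split into at most max_classes groups, a classifier predicts the group, the procedure recurses inside each group, and a class probability is the product of the probabilities along its path. The helpers below are hypothetical, and mock_classifier() returns random probabilities in place of real model outputs; the point is that the combined probabilities still form a valid distribution over all 100 classes.

```r
# Split class ids into at most `max_classes` contiguous groups
split_classes <- function(classes, max_classes) {
  n_groups <- min(max_classes, length(classes))
  split(classes, cut(seq_along(classes), n_groups, labels = FALSE))
}

# Hypothetical stand-in for a model call: random probabilities over k outcomes
mock_classifier <- function(k) {
  g <- rgamma(k, shape = 1)
  g / sum(g)
}

# Recursively multiply group probabilities by within-group probabilities
hierarchical_proba <- function(classes, max_classes, classifier = mock_classifier) {
  stopifnot(max_classes >= 2)   # otherwise the recursion would not shrink
  if (length(classes) <= max_classes) {
    return(setNames(classifier(length(classes)), classes))
  }
  groups <- split_classes(classes, max_classes)
  p_group <- classifier(length(groups))
  unlist(lapply(seq_along(groups), function(i) {
    p_group[i] * hierarchical_proba(groups[[i]], max_classes, classifier)
  }))
}

set.seed(7)
probs <- hierarchical_proba(1:100, max_classes = 10)
length(probs)
sum(probs)
```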

Architecture

TabICL processes tabular data through three sequential stages:

1. Column-wise Embedding (`ColEmbedding`)

Creates distribution-aware embeddings for each column using set transformers with induced self-attention:

Maps scalar cells to high-dimensional embeddings
Captures statistical regularities within columns
Uses shared parameters across all features
Supports feature grouping and target-aware embeddings
Optional affine transformations for final embeddings
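A stripped-down, single-head version of an induced set attention block (ISAB) shows how a column of scalar cells becomes distribution-aware embeddings: cells are lifted to d dimensions with weights shared across columns, a few inducing points attend to all cells to summarise the column, and each cell then attends back to that summary. This sketch omits layer norms, feed-forward sublayers, multi-head attention and target-aware embeddings, and all weights are random placeholders rather than trained parameters.

```r
set.seed(3)
softmax_rows <- function(s) {
  e <- exp(s - apply(s, 1, max))
  e / rowSums(e)
}

# Simplified attention block: queries Q attend over X, plus a residual link
mab <- function(Q, X, W_q, W_k, W_v) {
  A <- softmax_rows((Q %*% W_q) %*% t(X %*% W_k) / sqrt(ncol(W_k)))
  Q + A %*% (X %*% W_v)
}

# Induced set attention: n cells -> m inducing points -> n cells,
# costing O(n * m) instead of O(n^2)
isab <- function(X, I, W) {
  H <- mab(I, X, W$q1, W$k1, W$v1)   # inducing points summarise the column
  mab(X, H, W$q2, W$k2, W$v2)        # each cell reads the summary back
}

d <- 8
m <- 4
W <- lapply(setNames(nm = c("q1", "k1", "v1", "q2", "k2", "v2")),
            function(name) matrix(rnorm(d * d, sd = 0.3), d))
I <- matrix(rnorm(m * d), m)   # inducing points (learned in a real model)
w_in <- matrix(rnorm(d), 1)    # scalar -> d-dim lift, shared by all columns

column <- c(1.2, -0.3, 0.8, 2.5, 0.1)   # one numeric feature, 5 rows
X <- matrix(column) %*% w_in            # 5 x d cell embeddings
emb <- isab(X, I, W)
dim(emb)

# Permuting the rows only permutes the output (permutation equivariance)
perm <- c(5, 1, 4, 2, 3)
emb_perm <- isab(X[perm, , drop = FALSE], I, W)
```

Because the inducing points attend over the whole column, reordering rows only reorders the output; the embeddings therefore reflect the column's distribution rather than its row order.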

2. Row-wise Interaction (`RowInteraction`)

Captures interactions between features within each row using standard transformer blocks:

Applies rotary position embeddings (RoPE) to encode feature positions
Uses learnable CLS tokens to aggregate feature information
Outputs the concatenated CLS token embeddings as row representations
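Rotary position embeddings rotate each pair of coordinates of a query or key vector by an angle proportional to its position, so attention scores depend only on the relative offset between positions. The sketch below uses interleaved pairing and base 10000, as in the original RoPE formulation; whether TabICL2 uses the same pairing and base is an assumption. The helper rope() is hypothetical.

```r
# Rotary position embedding for one vector x (even length d) at position pos
rope <- function(x, pos, base = 10000) {
  d <- length(x)
  stopifnot(d %% 2 == 0)
  i <- seq_len(d / 2) - 1
  theta <- pos * base^(-2 * i / d)   # one rotation angle per coordinate pair
  x1 <- x[c(TRUE, FALSE)]            # first member of each pair (1, 3, 5, ...)
  x2 <- x[c(FALSE, TRUE)]            # second member of each pair (2, 4, 6, ...)
  out <- numeric(d)
  out[c(TRUE, FALSE)] <- x1 * cos(theta) - x2 * sin(theta)
  out[c(FALSE, TRUE)] <- x1 * sin(theta) + x2 * cos(theta)
  out
}

set.seed(11)
q <- rnorm(8)
k <- rnorm(8)
s1 <- sum(rope(q, 3) * rope(k, 1))   # positions 3 and 1 (offset 2)
s2 <- sum(rope(q, 7) * rope(k, 5))   # positions 7 and 5 (offset 2)
c(s1, s2)                            # equal: only the offset matters
```

Rotations preserve vector norms, so RoPE injects feature positions without changing the magnitude of queries and keys.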

3. Dataset-wise In-Context Learning (`ICLearning`)

Learns from labeled training examples in-context to predict on test examples:

Input sequence: [train_row_1, ..., train_row_n, test_row_1, ..., test_row_m]
Training rows are augmented with label embeddings
Uses transformer architecture with scalable softmax
For classification: outputs logits/probabilities for each class
For regression: outputs quantile predictions
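The input to the ICL stage can be sketched as follows: training rows get a label embedding added to their row representation, test rows do not, and an attention mask controls which rows each row may attend to. The mask below assumes the TabICL-style convention in which every row attends only to training rows, so test rows never influence one another; the sizes and random embeddings are placeholders.

```r
set.seed(5)
n_train <- 4
n_test <- 2
d <- 6
n_classes <- 3

row_repr <- matrix(rnorm((n_train + n_test) * d), ncol = d)  # row representations
y_train <- c(1, 3, 2, 1)                                     # class ids in 1..n_classes
label_emb <- matrix(rnorm(n_classes * d), ncol = d)          # one embedding per class

# Training rows carry their label; test rows are left unchanged
tokens <- row_repr
tokens[seq_len(n_train), ] <- tokens[seq_len(n_train), ] + label_emb[y_train, ]

# mask[i, j] is TRUE when row i (query) may attend to row j (key).
# Assumed convention: all rows attend to training rows only.
n <- n_train + n_test
mask <- matrix(FALSE, n, n)
mask[, seq_len(n_train)] <- TRUE
mask
```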

Key Insight: The model is trained on many different tabular datasets. At inference time, it uses the labeled examples you provide (the “context”) to understand the current task and make predictions on unlabeled test rows, with no fine-tuning required.

Performance Tips

Use caching when making multiple predictions with the same training context
Enable auto_batch for variable-length sequences in the same batch
Configure offloading when GPU memory is limited
Use gradient checkpointing (recompute = TRUE) during training to save memory
Adjust temperature (softmax_temperature) to calibrate prediction confidence
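Temperature scaling divides the logits by T before the softmax: T < 1 sharpens the predicted distribution and T > 1 flattens it. This sketch assumes softmax_temperature acts in this standard way; the logits are made-up values.

```r
softmax <- function(z) {
  e <- exp(z - max(z))   # shift for numerical stability
  e / sum(e)
}

logits <- c(2.0, 1.0, 0.2)
temps <- c(0.5, 1, 2)
# Highest class probability at each temperature
p_by_t <- sapply(temps, function(temp) max(softmax(logits / temp)))
round(setNames(p_by_t, paste0("T=", temps)), 3)
```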

References

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data.
TabICLv2: A better, faster, scalable, and open tabular foundation model.
Set Transformers: Lee et al., 2019
In-Context Learning: Learning from examples in the prompt without parameter updates

Citation

If you use TabICL2 in your research, please cite:

@software{tabicl2,
  title = {TabICL2: Transformer-based In-Context Learning for Tabular Data},
  author = {Christophe Regouby},
  year = {2026},
  url = {https://github.com/cregouby/TabICL2}
}

License

See LICENSE file for details.
