LFW Datasets

Labelled Faces in the Wild (LFW) Datasets

lfw_people_dataset(
  root = tempdir(),
  transform = NULL,
  split = "original",
  target_transform = NULL,
  download = FALSE
)

lfw_pairs_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  split = "original",
  target_transform = NULL,
  download = FALSE
)

Arguments

root: Root directory for dataset storage. The dataset will be stored under root/lfw_people or root/lfw_pairs.
transform: Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping).
split: Which version of the dataset to use. One of "original" or "funneled". Defaults to "original".
target_transform: Optional. A function that transforms the label.
download: Logical. If TRUE, downloads the dataset to root/. If the dataset is already present, download is skipped.
train: For lfw_pairs_dataset, whether to load the training (pairsDevTrain.txt) or test (pairsDevTest.txt) split.
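
As a rough illustration of what transform and target_transform functions might look like, the sketch below defines two hypothetical helpers, center_crop() and label_to_logical(). Neither is part of the package. They are applied to a synthetic 250 x 250 x 3 array instead of a downloaded image, and the crop size of 224 is an arbitrary choice for the example.

```r
# Hypothetical transform: crop the central size x size region of an
# H x W x 3 array. Not part of the package; shown for illustration only.
center_crop <- function(img, size = 224) {
  h <- dim(img)[1]
  w <- dim(img)[2]
  stopifnot(size <= h, size <= w)
  top <- floor((h - size) / 2) + 1
  left <- floor((w - size) / 2) + 1
  img[top:(top + size - 1), left:(left + size - 1), , drop = FALSE]
}

# Hypothetical target_transform for pair labels:
# 1 (same person) becomes TRUE, 2 (different people) becomes FALSE.
label_to_logical <- function(y) {
  y == 1
}

# Synthetic stand-in for a single image (random values, not real data).
fake_img <- array(runif(250 * 250 * 3), dim = c(250, 250, 3))

cropped <- center_crop(fake_img)
dim(cropped)            # height, width, channels after cropping
label_to_logical(2L)    # a "different people" label
```

A function like center_crop() would then be supplied as, for example, lfw_people_dataset(transform = center_crop), and label_to_logical() as the target_transform of lfw_pairs_dataset().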

Value

A torch dataset object of class lfw_people_dataset or lfw_pairs_dataset. Each element is a named list with:

x:
- For lfw_people_dataset: an H x W x 3 numeric array representing a single RGB image.
- For lfw_pairs_dataset: a list of two H x W x 3 numeric arrays representing a pair of RGB images.
y:
- For lfw_people_dataset: an integer index from 1 to the number of identities in the dataset.
- For lfw_pairs_dataset: 1 if the pair shows the same person, 2 if different people.
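
To make the shape of a returned element concrete, the following sketch builds a synthetic stand-in for one lfw_pairs_dataset item and decodes its label. The arrays hold random values, not real images, and the classes vector is an assumption based on the "Same" and "Different" names mentioned in the examples on this page.

```r
# Synthetic stand-in for one element of lfw_pairs_dataset (not real data).
pair_item <- list(
  x = list(
    array(runif(250 * 250 * 3), dim = c(250, 250, 3)),
    array(runif(250 * 250 * 3), dim = c(250, 250, 3))
  ),
  y = 2L
)

# Assumed ordering of class names, so that index 1 corresponds to "Same".
classes <- c("Same", "Different")

length(pair_item$x)      # number of images in the pair
dim(pair_item$x[[1]])    # height, width, channels of the first image
label <- classes[pair_item$y]
label                    # class name for this pair
```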

Details

The LFW dataset collection provides facial images for evaluating face recognition systems. It includes two variants:

lfw_people_dataset: A multi-class classification dataset where each image is labelled by person identity.
lfw_pairs_dataset: A face verification dataset containing image pairs with binary labels (same or different person).

This R implementation of the LFW dataset is based on the fetch_lfw_people() and fetch_lfw_pairs() functions from the scikit-learn library, but deviates in a few key aspects due to dataset availability and R API conventions:

The color and resize arguments from Python are not directly exposed. Instead, all images are RGB with a fixed size of 250x250.
The subset argument of fetch_lfw_pairs() in Python (one of "train", "test", or "10_folds") is simplified to a train boolean flag in R. The 10-fold subset is not supported, as the original protocol files are unavailable or incompatible with clean separation of image-label pairs.
The split parameter in R controls which version of the dataset to use: "original" (unaligned) or "funneled" (aligned using funneling). The funneled version contains geometrically normalized face images, offering better alignment and typically improved performance for face recognition models.
The dataset is downloaded from Figshare, which hosts the same files referenced in scikit-learn's dataset utilities.

Dataset sizes:

lfw_people_dataset: 13,233 images across multiple identities (for both the "original" and "funneled" versions)
lfw_pairs_dataset:
- Training split (train = TRUE): 2,200 image pairs
- Test split (train = FALSE): 1,000 image pairs
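
The pair splits are driven by the pairsDevTrain.txt and pairsDevTest.txt files. As a minimal sketch of how one line of such a file could be turned into image paths and a label, the hypothetical helper parse_pair_line() below assumes the standard LFW layout: a matched pair is written as "name idx1 idx2", a mismatched pair as "name1 idx1 name2 idx2", and images are stored as name/name_NNNN.jpg. It is not the package's actual parser, and the person names in the sample lines are made up.

```r
# Hypothetical helper: convert one line of a pairs file into two relative
# image paths and a label (1 = same person, 2 = different people).
# Assumes whitespace-separated fields and the name/name_NNNN.jpg layout.
parse_pair_line <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  img_path <- function(name, idx) {
    file.path(name, sprintf("%s_%04d.jpg", name, as.integer(idx)))
  }
  if (length(fields) == 3) {
    # Matched pair: one name, two image indices.
    list(
      paths = c(img_path(fields[1], fields[2]), img_path(fields[1], fields[3])),
      y = 1L
    )
  } else if (length(fields) == 4) {
    # Mismatched pair: two names, one image index each.
    list(
      paths = c(img_path(fields[1], fields[2]), img_path(fields[3], fields[4])),
      y = 2L
    )
  } else {
    stop("Unexpected number of fields: ", length(fields))
  }
}

# Made-up sample lines, not taken from the real protocol files.
same_pair <- parse_pair_line("Person_A\t1\t2")
different_pair <- parse_pair_line("Person_A\t1\tPerson_B\t3")

same_pair$paths
different_pair$paths
```

The first line of a real protocol file holds a pair count and not a pair, so it would need to be skipped before applying a helper like this.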

Examples

if (FALSE) { # \dontrun{
# Load data for LFW People Dataset
lfw <- lfw_people_dataset(download = TRUE)
first_item <- lfw[1]
first_item$x  # RGB image
first_item$y  # Label index
lfw$classes[first_item$y]  # person's name (e.g., "Aaron_Eckhart")

# Load training data for LFW Pairs Dataset
lfw <- lfw_pairs_dataset(download = TRUE, train = TRUE)
first_item <- lfw[1]
first_item$x  # List of 2 RGB Images
first_item$x[[1]]  # RGB Image
first_item$x[[2]]  # RGB Image
first_item$y  # Label index
lfw$classes[first_item$y]  # Class Name (e.g., "Same" or "Different")

# Load test data for LFW Pairs Dataset
lfw <- lfw_pairs_dataset(download = TRUE, train = FALSE)
first_item <- lfw[1]
first_item$x  # List of 2 RGB Images
first_item$x[[1]]  # RGB Image
first_item$x[[2]]  # RGB Image
first_item$y  # Label index
lfw$classes[first_item$y]  # Class Name (e.g., "Same" or "Different")
} # }
