Pascal VOC Segmentation Dataset

pascal_segmentation_dataset(
  root = tempdir(),
  year = "2012",
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

pascal_detection_dataset(
  root = tempdir(),
  year = "2012",
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

Arguments

root

Character. Root directory where the dataset will be stored under root/pascal_voc_<year>.

year

Character. VOC dataset version to use. One of "2007", "2008", "2009", "2010", "2011", or "2012". Default is "2012".

split

Character. One of "train", "val", "trainval", or "test". Determines the dataset split. The "test" split is only available for year = "2007". Default is "train".

transform

Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping).

target_transform

Optional. A function that takes the target (the y element of an item) and returns a transformed version.

download

Logical. If TRUE, downloads the dataset to root/. If the dataset is already present, download is skipped.

Value

pascal_segmentation_dataset() returns a torch dataset of class pascal_segmentation_dataset.

Each item is a list that inherits class image_with_segmentation_mask, which allows generic visualization utilities to be applied.

Each element is a named list with the following structure:

  • x: an H x W x 3 array representing the RGB image.

  • y: A named list containing:

    • masks: A torch_tensor of dtype bool and shape (21, H, W), representing a multi-channel segmentation mask. Each of the 21 channels corresponds to one of the Pascal VOC classes.

    • labels: An integer vector indicating the indices of the classes present in the mask.
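As a minimal sketch of how masks and labels relate, the base R snippet below uses a toy logical array in place of the torch_tensor and recovers the indices of the channels that contain at least one pixel. The array size and variable names are illustrative, and the 1-based channel order is an assumption that follows the class list in Details.

```r
# Toy stand-in for y$masks: a logical array of shape (21, H, W).
# A plain R array is used here instead of a torch_tensor.
H <- 4
W <- 5
masks <- array(FALSE, dim = c(21, H, W))

# Assumption: channel 1 is "background" and channel 9 is "cat",
# following the order of the class list in Details.
masks[1, , ] <- TRUE
masks[9, 2:3, 2:4] <- TRUE
masks[1, 2:3, 2:4] <- FALSE  # each pixel belongs to one channel only

# Channels with at least one TRUE pixel (analogous to y$labels).
present <- which(apply(masks, 1, any))
print(present)
```

With the real dataset, the same idea applies to first_item$y$masks, and first_item$y$labels already holds the result.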

pascal_detection_dataset() returns a torch dataset of class pascal_detection_dataset.

Each item is a list that inherits class image_with_bounding_box, which allows generic visualization utilities to be applied.

Each element is a named list:

  • x: an H x W x 3 array representing the RGB image.

  • y: a list with:

    • labels: a character vector with object class names.

    • boxes: a tensor of shape (N, 4) with bounding box coordinates in (xmin, ymin, xmax, ymax) format.
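To illustrate the (xmin, ymin, xmax, ymax) layout, the following base R sketch computes box widths, heights, and areas from a toy (N, 4) matrix. The coordinate values are made up, and a plain matrix stands in for the tensor returned by the dataset.

```r
# Toy stand-in for y$boxes: an (N, 4) matrix, one row per object,
# with columns in (xmin, ymin, xmax, ymax) order.
boxes <- matrix(
  c(10, 20, 110, 220,
    50, 60,  90, 100),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("xmin", "ymin", "xmax", "ymax"))
)

# Width and height follow directly from the corner coordinates.
widths  <- boxes[, "xmax"] - boxes[, "xmin"]
heights <- boxes[, "ymax"] - boxes[, "ymin"]
areas   <- widths * heights
print(areas)
```

A real boxes tensor would first need converting to an R matrix (for example with as.array()) before being indexed this way.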

Details

The Pascal Visual Object Classes (VOC) dataset is a widely used benchmark for object detection and semantic segmentation tasks in computer vision.

This dataset provides RGB images along with per-pixel class segmentation masks for 20 object categories, plus a background class. Each pixel in the mask is labeled with a class index corresponding to one of the predefined semantic categories.

The VOC dataset was released in yearly editions (2007 to 2012), with slight variations in data splits and annotation formats. Notably, only the 2007 edition includes a separate test split; all other years (2008–2012) provide only the train, val, and trainval splits.
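The split availability described above can be expressed as a small check. This is only a sketch: check_voc_split() is a hypothetical helper and not part of the package, which may validate its arguments differently.

```r
# Hypothetical helper (not part of the package): reports whether a
# year/split combination exists according to the description above.
check_voc_split <- function(year, split) {
  years  <- c("2007", "2008", "2009", "2010", "2011", "2012")
  splits <- c("train", "val", "trainval", "test")
  stopifnot(year %in% years, split %in% splits)
  # Only the 2007 edition includes a separate test split.
  !(split == "test" && year != "2007")
}

ok_2007 <- check_voc_split("2007", "test")
ok_2012 <- check_voc_split("2012", "test")
print(c(ok_2007, ok_2012))
```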

The dataset defines 21 semantic classes: "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "dining table", "dog", "horse", "motorbike", "person", "potted plant", "sheep", "sofa", "train", and "tv/monitor". They are available through the classes variable of the dataset object.
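As a rough illustration of mapping label indices to class names, the snippet below rebuilds the class list as a plain character vector and indexes into it. The name voc_classes and the example labels are illustrative, and the sketch assumes that labels are 1-based positions in this list, as the use of classes[labels] in the Examples suggests.

```r
# Class names in the order given above; voc_classes is an illustrative
# stand-in for the classes field of a dataset object.
voc_classes <- c(
  "background", "aeroplane", "bicycle", "bird", "boat", "bottle",
  "bus", "car", "cat", "chair", "cow", "dining table", "dog",
  "horse", "motorbike", "person", "potted plant", "sheep", "sofa",
  "train", "tv/monitor"
)

# Assumption: labels are 1-based indices into the class list.
labels <- c(1L, 9L, 16L)
class_names <- voc_classes[labels]
print(class_names)
```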

This dataset is frequently used for training and evaluating semantic segmentation models, and supports tasks requiring dense, per-pixel annotations.

Examples

if (FALSE) { # \dontrun{
# Load Pascal VOC segmentation dataset (2007 train split)
pascal_seg <- pascal_segmentation_dataset(
  transform = transform_to_tensor,
  download = TRUE,
  year = "2007"
)

# Access the first image and its mask
first_item <- pascal_seg[1]
first_item$x  # Image
first_item$y$masks  # Segmentation mask
first_item$y$labels  # Unique class labels in the mask
pascal_seg$classes[first_item$y$labels]  # Class names

# Visualise the first image and its mask
masked_img <- draw_segmentation_masks(first_item)
tensor_image_browse(masked_img)

# Load Pascal VOC detection dataset (2007 train split)
pascal_det <- pascal_detection_dataset(
  transform = transform_to_tensor,
  download = TRUE,
  year = "2007"
)

# Access the first image and its bounding boxes
first_item <- pascal_det[1]
first_item$x  # Image
first_item$y$labels  # Object labels
first_item$y$boxes  # Bounding box tensor

# Visualise the first image with bounding boxes
boxed_img <- draw_bounding_boxes(first_item)
tensor_image_browse(boxed_img)
} # }