Loads the MS COCO dataset for object detection and segmentation.

Loads the MS COCO dataset for image captioning.

coco_detection_dataset(
  root,
  train = TRUE,
  year = c("2017", "2016", "2014"),
  download = FALSE,
  transforms = NULL,
  target_transform = NULL
)

coco_caption_dataset(root, train = TRUE, year = c("2014"), download = FALSE)

Arguments

root

Root directory where the dataset is stored or will be downloaded to.

train

Logical. If TRUE, loads the training split; otherwise, loads the validation split.

year

Character. Dataset version year. One of "2014", "2016", or "2017".

download

Logical. If TRUE, downloads the dataset if it's not already present in the root directory.

transforms

Optional transform function applied to the image.

target_transform

Optional transform function applied to the target (labels, boxes, etc.).

Value

A torch dataset. Each example is a list with two elements:

x

A 3D torch_tensor of shape (C, H, W) representing the image.

y

A list containing:

boxes

A 2D torch_tensor of shape (N, 4) containing bounding boxes in the format (x_min, y_min, x_max, y_max).

labels

A 1D torch_tensor of type integer, representing the class label for each object.

area

A 1D torch_tensor of type float, indicating the area of each object.

iscrowd

A 1D torch_tensor of type boolean, where TRUE indicates the object is part of a crowd.

segmentation

A list of segmentation polygons for each object.
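The structure above can be inspected directly; a minimal sketch, assuming the 2017 validation split is already present under root (field names as documented in the Value section):

```r
ds <- coco_detection_dataset(root = "~/data", train = FALSE, year = "2017")
item <- ds[1]

dim(item$x)       # image tensor dimensions, c(3, H, W)
item$y$boxes      # (N, 4) tensor of box coordinates
item$y$labels     # integer class ID per object
item$y$iscrowd    # TRUE where an object is part of a crowd
```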

Details

The returned image is in CHW format (channels, height, width), matching the torch convention. For detection, the target y provides the object annotations from the official COCO annotation files: bounding boxes, class labels, object areas, crowd indicators, and segmentation polygons.
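Because base R plotting expects an H x W x C array rather than a CHW tensor, the image must be permuted before display; a minimal sketch, assuming item holds one example from the detection dataset and pixel values are floats in [0, 1]:

```r
# Permute CHW -> HWC and convert to a plain array for as.raster().
img_hwc <- aperm(torch::as_array(item$x), c(2, 3, 1))
plot(as.raster(img_hwc))
```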

Examples

if (FALSE) { # \dontrun{
ds <- coco_detection_dataset(
  root = "~/data",
  train = FALSE,
  year = "2017",
  download = TRUE
)

example <- ds[1]

# Map the example's label IDs to COCO category names
label_ids <- as.integer(torch::as_array(example$y$labels))
label_names <- ds$category_names[as.character(label_ids)]

output <- draw_bounding_boxes(
  image = example$x,
  boxes = example$y$boxes,
  labels = label_names
)

tensor_image_browse(output)
} # }

if (FALSE) { # \dontrun{
ds <- coco_caption_dataset(
  root = "~/data",
  train = FALSE,
  download = TRUE
)
example <- ds[1]

# Access image and caption
x <- example$x
y <- example$y

# Prepare image for plotting: convert the tensor to a plain numeric
# array with the same dimensions so as.raster() can consume it
image_array <- as.numeric(x)
dim(image_array) <- dim(x)

plot(as.raster(image_array))
title(main = y, col.main = "black")
} # }