Loads the MS COCO dataset for object detection and segmentation.

Loads the MS COCO dataset for image captioning.

coco_detection_dataset(
  root,
  train = TRUE,
  year = c("2017", "2016", "2014"),
  download = FALSE,
  transforms = NULL,
  target_transform = NULL
)

coco_caption_dataset(root, train = TRUE, year = c("2014"), download = FALSE)

Arguments

root

Root directory where the dataset is stored or will be downloaded to.

train

Logical. If TRUE, loads the training split; otherwise, loads the validation split.

year

Character. Dataset version year. One of "2014", "2016", or "2017".

download

Logical. If TRUE, downloads the dataset if it's not already present in the root directory.

transforms

Optional transform function applied to the image.

target_transform

Optional transform function applied to the target (labels, boxes, etc.).

Value

A torch dataset. Each example is a list with two elements:

x

A 3D torch_tensor of shape (C, H, W) representing the image.

y

A list containing:

boxes

A 2D torch_tensor of shape (N, 4) containing bounding boxes in the format (x_min, y_min, x_max, y_max).

labels

A 1D torch_tensor of type integer, representing the class label for each object.

area

A 1D torch_tensor of type float, indicating the area of each object.

iscrowd

A 1D torch_tensor of type boolean, where TRUE indicates the object is part of a crowd.

segmentation

A list of segmentation polygons for each object.
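The structure above can be inspected directly; a minimal sketch, assuming the 2017 validation split is already present under root (field names as documented in the Value section):

```r
ds <- coco_detection_dataset(root = "~/data", train = FALSE, year = "2017")
item <- ds[1]

dim(item$x)       # image tensor dimensions, c(3, H, W)
item$y$boxes      # (N, 4) tensor of box coordinates
item$y$labels     # integer class ID per object
item$y$iscrowd    # TRUE where an object is part of a crowd
```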

Details

The returned image is in CHW format (channels, height, width), matching the torch convention. For detection, the target y provides the object annotations from the official COCO annotation files: bounding boxes, class labels, object areas, crowd indicators, and segmentation polygons.
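Because base R plotting expects an H x W x C array rather than a CHW tensor, the image must be permuted before display; a minimal sketch, assuming item holds one example from the detection dataset and pixel values are floats in [0, 1]:

```r
# Permute CHW -> HWC and convert to a plain array for as.raster().
img_hwc <- aperm(torch::as_array(item$x), c(2, 3, 1))
plot(as.raster(img_hwc))
```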

Examples

if (FALSE) { # \dontrun{
ds <- coco_detection_dataset(
  root = "~/data",
  train = FALSE,
  year = "2017",
  download = TRUE
)

example <- ds[1]

# Map the example's label IDs to COCO category names
label_ids <- as.integer(torch::as_array(example$y$labels))
label_names <- ds$category_names[as.character(label_ids)]

output <- draw_bounding_boxes(
  image = example$x,
  boxes = example$y$boxes,
  labels = label_names
)

tensor_image_browse(output)
} # }

if (FALSE) { # \dontrun{
ds <- coco_caption_dataset(
  root = "~/data",
  train = FALSE,
  download = TRUE
)
example <- ds[1]

# Access image and caption
x <- example$x
y <- example$y

# Prepare image for plotting: convert the tensor to a plain numeric
# array with the same dimensions so as.raster() can consume it
image_array <- as.numeric(x)
dim(image_array) <- dim(x)

plot(as.raster(image_array))
title(main = y, col.main = "black")
} # }