MobileNetV3 is a state-of-the-art lightweight convolutional neural network architecture designed for mobile and embedded vision applications. This implementation follows the design and optimizations presented in the original paper, Searching for MobileNetV3 (Howard et al., 2019).

model_mobilenet_v3_large(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 1000,
  width_mult = 1
)

model_mobilenet_v3_small(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 1000,
  width_mult = 1
)

Arguments

pretrained

(bool): If TRUE, returns a model pre-trained on ImageNet.

progress

(bool): If TRUE, displays a progress bar of the download to stderr.

num_classes

number of output classes (default: 1000).

width_mult

width multiplier for model scaling (default: 1.0).
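As a quick illustration of these arguments (the values below are hypothetical and the resulting weights are untrained, since pretrained weights correspond to the defaults), a slimmer variant with a custom classification head can be constructed like this:

library(torchvision)

# Illustrative values: half-width network with a 10-class head,
# randomly initialized (pretrained = FALSE).
model <- model_mobilenet_v3_small(
  pretrained  = FALSE,
  num_classes = 10,
  width_mult  = 0.5
)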

Details

Two variants are provided:

  • model_mobilenet_v3_large()

  • model_mobilenet_v3_small()

Both variants use efficient building blocks such as inverted residuals, squeeze-and-excitation (SE) modules, and hard-swish activations to improve accuracy and efficiency.
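As an illustration only (this is not the package's internal implementation), the hard-swish activation can be written with torch's functional API and compared against the built-in version:

library(torch)

# hard-swish(x) = x * relu6(x + 3) / 6
x <- torch_linspace(-4, 4, steps = 9)
manual  <- x * nnf_relu6(x + 3) / 6
builtin <- nnf_hardswish(x)       # torch's built-in hard-swish
torch_allclose(manual, builtin)   # TRUE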

Model Summary and Performance for Pretrained Weights

| Model                  | Top-1 Acc | Top-5 Acc | Params  | GFLOPS | File Size | Notes                               |
|------------------------|-----------|-----------|---------|--------|-----------|-------------------------------------|
| MobileNetV3 Large      | 74.04%    | 91.34%    | 5.48M   | 0.22   | 21.1 MB   | Trained from scratch, simple recipe |
| MobileNetV3 Small      | 67.67%    | 87.40%    | 2.54M   | 0.06   | 9.8 MB    | Improved recipe over original paper |

Functions

  • model_mobilenet_v3_large(): MobileNetV3 Large model with about 5.5 million parameters.

  • model_mobilenet_v3_small(): MobileNetV3 Small model with about 2.5 million parameters.
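The approximate parameter counts above can be checked directly. The sketch below uses a hypothetical count_params() helper and instantiates the models without pretrained weights, which is sufficient for counting:

library(torch)
library(torchvision)

# Hypothetical helper: total number of parameter elements in a model
count_params <- function(model) {
  sum(vapply(model$parameters, function(p) p$numel(), numeric(1)))
}

count_params(model_mobilenet_v3_large())  # roughly 5.5 million
count_params(model_mobilenet_v3_small())  # roughly 2.5 million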

Examples

if (FALSE) { # \dontrun{
# 1. Load sample image (cat) from URL
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants, see
# https://pytorch.org/vision/stable/models.html
norm_std  <- c(0.229, 0.224, 0.225)
img_url <- "https://en.wikipedia.org/wiki/Special:FilePath/Felis_catus-cat_on_snow.jpg"
img <- base_loader(img_url)

# 2. Convert to tensor (RGB only), resize and normalize
input <- img %>%
  transform_to_tensor() %>%
  transform_resize(c(224, 224)) %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1)

# 3. Load pretrained models
model_small <- model_mobilenet_v3_small(pretrained = TRUE)
model_small$eval()

# 4. Forward pass
output_s <- model_small(batch)

# 5. Convert logits to probabilities and extract Top-5
probs_s <- torch::nnf_softmax(output_s, dim = 2)
topk <- probs_s$topk(k = 5, dim = 2)
indices <- as.integer(topk[[2]][1, ])
scores <- as.numeric(topk[[1]][1, ]) * 100

# 6. Show Top-5 predictions
glue::glue("{seq_along(indices)}. {imagenet_label(indices)} ({round(scores, 2)}%)")

# 7. Same with the large model
model_large <- model_mobilenet_v3_large(pretrained = TRUE)
model_large$eval()
output_l <- model_large(batch)
probs_l <- torch::nnf_softmax(output_l, dim = 2)
topk <- probs_l$topk(k = 5, dim = 2)
indices <- as.integer(topk[[2]][1, ])
scores <- as.numeric(topk[[1]][1, ]) * 100
glue::glue("{seq_along(indices)}. {imagenet_label(indices)} ({round(scores, 2)}%)")
} # }