Published May 15, 2022 | Version 1.0
Dataset | Open access

Caltech Football Numbers (CaltechFN)

  • 1. California Institute of Technology
  • 2. Vanderbilt University

Description

Digit datasets are widely used as compact, generalizable benchmarks for novel computer vision models. However, modern deep learning architectures have already surpassed human-level performance on the current state-of-the-art digit datasets. These datasets largely contain images of smooth, fully visible digits, which limits the variability among images of the same digit.

By contrast, the digits on American football jerseys are highly variable because jerseys are often wrinkled, stretched, twisted, or otherwise distorted during live action. Because American football is a fast-paced sport, the same digit on a jersey is likely to be distorted differently from moment to moment, making it harder for artificial vision systems to distinguish between digits. Furthermore, the digits on American football jerseys are often partially occluded in live-action captures by other players and props. While the human brain can infer the identity of a partially occluded digit by "filling in" the visual gaps, artificial vision systems struggle to do so.

To catalyze the improvement of computer vision models in these areas, we introduce CaltechFN, an image dataset of American football numbers intended to serve as a new state-of-the-art benchmark for classification, detection, and localization tasks.

Files

Total size: 5.0 GB

MD5 checksum                        Size
1d54f5b931f11323070c9c84392c2794    891.8 MB
9d3c2f23d83c7fd774543f175c4a0572    106.5 MB
3dfe7c839f770e9b4f77b6bea6d4600b    3.6 GB
1190cbb743a4dba556010dc70a149141    429.5 MB
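
The page lists an MD5 checksum for each archive but not the archive's file name, so a downloaded file can only be checked against all four published values. The following Python sketch does this with the standard-library hashlib module, reading the file in chunks so multi-gigabyte archives do not need to fit in memory. The function names are illustrative, not part of any official tooling for the dataset.

```python
import hashlib
import sys

# MD5 checksums published on this page (the matching file names are not listed).
PUBLISHED_MD5 = {
    "1d54f5b931f11323070c9c84392c2794",
    "9d3c2f23d83c7fd774543f175c4a0572",
    "3dfe7c839f770e9b4f77b6bea6d4600b",
    "1190cbb743a4dba556010dc70a149141",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 equals any one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5


if __name__ == "__main__":
    # Hypothetical usage: python verify_md5.py train.tar.gz test.tar.gz ...
    for name in sys.argv[1:]:
        status = "OK" if matches_published_checksum(name) else "MISMATCH"
        print(f"{name}: {status}")
```

A match confirms that the file is one of the four intact archives, though not which one; a mismatch usually indicates an incomplete or corrupted download.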

Series information

The dataset is distributed as four tar.gz files:

  • train.tar.gz: contains a folder, 'images', and a .json file, 'train.json'. 'images' holds 49,383 full-sized images of varying dimensions; 'train.json' contains their digit bounding-box annotations in COCO format.
  • test.tar.gz: contains a folder, 'images', and a .json file, 'test.json'. 'images' holds 12,345 full-sized images of varying dimensions; 'test.json' contains their digit bounding-box annotations in COCO format.
  • train_cropped.tar.gz: contains a folder, 'images', and a .mat file, 'train.mat'. 'images' holds 211,661 images of varying dimensions, all much smaller than the full-sized images in train.tar.gz and test.tar.gz. 'train.mat' contains two arrays: 'names', a list of the image names, and 'y', the ground-truth label of each image, in the same order as 'names'.
  • test_cropped.tar.gz: contains a folder, 'images', and a .mat file, 'test.mat'. 'images' holds 52,911 images of varying dimensions, all much smaller than the full-sized images in train.tar.gz and test.tar.gz. 'test.mat' contains two arrays: 'names', a list of the image names, and 'y', the ground-truth label of each image, in the same order as 'names'.
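
The full-sized splits use COCO-format annotations. Assuming 'train.json' and 'test.json' follow the standard COCO layout (top-level "images", "annotations", and "categories" lists, with each "bbox" given as [x, y, width, height] in pixels), a minimal Python sketch for grouping the digit boxes by image might look like the following. The function name and the corner-coordinate output are illustrative choices, and the label strings depend on how the annotation files name their categories.

```python
import json
from collections import defaultdict


def load_coco_boxes(path):
    """Group COCO-style bounding boxes by image file name.

    Assumes the standard COCO layout: "images" entries with "id" and "file_name",
    "annotations" entries with "image_id", "category_id" and "bbox" as
    [x, y, width, height], and an optional "categories" list with "id" and "name".
    Returns {file_name: [(label, (x1, y1, x2, y2)), ...]}; images without any
    annotation are omitted.
    """
    with open(path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    # Map image ids to file names and category ids to their names.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    category_names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}

    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        # Fall back to the raw category id if no category name is defined.
        label = category_names.get(ann["category_id"], ann["category_id"])
        # Convert COCO's [x, y, width, height] into corner coordinates.
        boxes[file_names[ann["image_id"]]].append((label, (x, y, x + w, y + h)))
    return dict(boxes)
```

For example, `boxes = load_coco_boxes("train/train.json")` would return the annotated digits for each training image; the path is hypothetical and should point to wherever 'train.json' was extracted. Corner coordinates are returned because many detection and cropping utilities expect that form; drop the conversion to keep the original COCO boxes.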

Other

We have benchmarked our dataset with the methods provided in this repository: https://github.com/snigdhasaha7/caltechfn_eval

Additional details

Created: September 9, 2022
Modified: November 28, 2022