Caltech Football Numbers (CaltechFN)
Description
Digit datasets are widely used as compact, generalizable benchmarks for novel computer vision models. However, modern deep learning architectures have surpassed the human performance benchmark on the current state-of-the-art digit datasets. These datasets largely contain images of digits that are smooth and fully visible, which limits the variability between the digits. On the other hand, the digits on American football jerseys are highly variable due to the propensity of jerseys to be wrinkled, stretched, twisted, and otherwise distorted in live action. Given that American football is a fast-paced sport, the digits on a jersey will likely be distorted in a different way from moment to moment, making it harder for artificial vision systems to differentiate between distinct digits. Furthermore, the digits on American football jerseys will often be partially occluded in a live-action capture due to the presence of other players and props. While the human brain is able to infer the identity of partially occluded digits by "filling in" visual gaps, artificial vision systems struggle to do this. To catalyze the improvement of computer vision models in these areas, we introduce CaltechFN, an image dataset of American football numbers that will serve as a new state-of-the-art benchmark for classification, detection, and localization tasks.
Files
Series information
The dataset is distributed as four tar.gz files:
- train.tar.gz: contains a folder, 'images', and a .json file, 'train.json'. 'images' contains 49,383 full-sized images of various dimensions. 'train.json' contains their bounding box digit annotations in COCO format.
- test.tar.gz: contains a folder, 'images', and a .json file, 'test.json'. 'images' contains 12,345 full-sized images of various dimensions. 'test.json' contains their bounding box digit annotations in COCO format.
- train_cropped.tar.gz: contains a folder, 'images', and a .mat file, 'train.mat'. 'images' contains 211,661 images of various dimensions, all much smaller than the full-sized images in train.tar.gz and test.tar.gz. 'train.mat' contains two arrays: 'names', a list of the image names, and 'y', the ground-truth labels of the images, in the same order as 'names'.
- test_cropped.tar.gz: contains a folder, 'images', and a .mat file, 'test.mat'. 'images' contains 52,911 images of various dimensions, all much smaller than the full-sized images in train.tar.gz and test.tar.gz. 'test.mat' contains two arrays: 'names', a list of the image names, and 'y', the ground-truth labels of the images, in the same order as 'names'.
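As an illustration of how the COCO-format annotation files ('train.json', 'test.json') might be read, the following is a minimal sketch using only the Python standard library. It assumes the files follow the standard COCO detection layout (top-level 'images' and 'annotations' lists, with 'bbox' given as [x, y, width, height]); the function name load_coco_boxes and the example path are hypothetical and not part of the dataset.

```python
import json
from collections import defaultdict


def load_coco_boxes(json_path):
    """Return {image file name: [(category_id, [x, y, w, h]), ...]}.

    Assumes the standard COCO detection keys: 'images' entries with
    'id' and 'file_name', and 'annotations' entries with 'image_id',
    'category_id', and 'bbox'.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        coco = json.load(f)

    # COCO links each annotation to its image through a numeric id.
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}

    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        name = id_to_name[ann["image_id"]]
        boxes[name].append((ann["category_id"], ann["bbox"]))

    # Make sure images without any annotation still appear, with no boxes.
    for name in id_to_name.values():
        boxes.setdefault(name, [])
    return dict(boxes)


# Hypothetical usage, assuming train.tar.gz was extracted into ./train:
#   boxes = load_coco_boxes("train/train.json")
#   for category_id, (x, y, w, h) in boxes[some_file_name]:
#       ...
```

How 'category_id' values map to digits is defined by the 'categories' list inside each .json file, so it is worth inspecting that list before treating the ids as digit labels.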
Other
We have benchmarked our dataset with the methods provided in this repository: https://github.com/snigdhasaha7/caltechfn_eval
Additional details
- CALTECHDATA_ID: 20174
- Caltech (Housner Fund)