Awesome Datasets: A curated list of datasets for Machine learning enthusiasts

For a while, I have been meaning to collect the datasets available on the Internet in one place. Now it's time to start this list, given the community's ever-growing interest in machine learning. Each entry in the list is hyperlinked and takes your browser to the official website for downloading.

Here is “THE LIST”

  1. MNIST: Collection of handwritten digits.
  2. notMNIST: Collection of letters in various fonts (not handwritten).
  3. Text8: A large corpus of text for NLP beginners.
  4. GTSRB: German traffic sign recognition dataset.
  5. GTSDB: German traffic sign detection dataset.
  6. ImageNet: Collection of images in 1000 classes, with more being added.
  7. Pascal VOC: Very large compilation of images of everyday life.
  8. INRIA: Dataset of people in different positions and environments.
  9. Piano MIDI files: Compilation of piano MIDI files useful for sequence-to-sequence modeling of music.
  10. MIDI world: Compilation of MIDI files for various instruments, useful for sequence-to-sequence modeling of music.
  11. Movie Posters: Collection of movie posters from around the globe, annotated with reviews, ratings, genres and cast.
  12. KITTI benchmark: Wide-ranging collection of object tracking, odometry, road navigation, map and stereo datasets. The website hosts a dataset of datasets. 🙂
  13. RoMa: Collection of road lane markings for learning lane detection.
  14. LISA collection of on-road datasets: Wide-ranging collection covering traffic signs, traffic lights, on-road vehicles, etc.

Links that point to broader collections of available datasets:

  1. Awesome public datasets: This GitHub repo is a full compilation of datasets from various sectors. It is actively updated and, to date, keeps track of almost all the datasets available across those sectors.
  2. Amazon public datasets: Compilation of datasets available on Amazon AWS.

