Dataset

Flickr-3.5M

Flickr-3.5M is a collection of 3.5 million socially tagged images randomly collected from Flickr. The dataset was used in our work on tag relevance learning.

Data

  • Metadata of Flickr-3.5M, including user tags, titles, descriptions, geo information, etc. You can use the id.farm.server.secret.txt file to obtain image source URLs and download the original images (see get-images/get_images.py).
  • Ground-truth
    • Social20: A ground-truth set for tag-based social image retrieval.
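
The URL lookup mentioned for id.farm.server.secret.txt can be sketched as follows. This is a minimal illustration, not the logic of get-images/get_images.py: it assumes each line holds four whitespace-separated fields in the order the filename suggests (id, farm, server, secret), and it uses Flickr's farm-based static URL pattern. The names PhotoRecord, parse_record, and build_image_url, as well as the sample values, are hypothetical.

```python
from typing import NamedTuple, Optional


class PhotoRecord(NamedTuple):
    """One line of id.farm.server.secret.txt (field order is an assumption)."""
    photo_id: str
    farm: str
    server: str
    secret: str


def parse_record(line: str) -> PhotoRecord:
    """Split a line into its four fields; whitespace delimiter is assumed."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return PhotoRecord(*fields)


def build_image_url(rec: PhotoRecord, size: Optional[str] = None) -> str:
    """Build a Flickr static image URL.

    `size` is an optional one-letter Flickr size suffix such as "m" or "b";
    when omitted, the default-size image URL is returned.
    """
    suffix = f"_{size}" if size else ""
    return (
        f"https://farm{rec.farm}.staticflickr.com/"
        f"{rec.server}/{rec.photo_id}_{rec.secret}{suffix}.jpg"
    )


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the dataset.
    sample = "1234567890 4 3456 0a1b2c3d4e"
    print(build_image_url(parse_record(sample)))
```

If the file uses a different delimiter or field order, only parse_record needs to change. Images that have since been deleted or made private on Flickr will not resolve, so a downloader should expect and skip failed requests.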

Statistics

No. of images: ~3,500,000
No. of unique tags: ~570,000
No. of unique user-ids: ~270,000
Proportion of images with faces detected by OpenCV: ~18%

Further reading

  • Xirong Li, Cees G.M. Snoek, and Marcel Worring, Learning Social Tag Relevance by Neighbor Voting, IEEE Transactions on Multimedia, volume 11, issue 7, pages 1310-1322, 2009.
  • Xirong Li, Cees G.M. Snoek, and Marcel Worring, Unsupervised Multi-Feature Tag Relevance Learning for Social Image Retrieval, in Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR), Xi'an, China, July 2010 (Best Paper Award).