To study how the precision and recall of word detection influence fine-grained classification, we annotated the text regions in the first 10 classes of the dataset (in alphabetical order). All visible and recognizable text in the Latin alphabet was annotated. This annotated subset consists of 9131 images, of which 5220 contain at least one word box; in total, 27601 word boxes were annotated. Annotation label files were created only for images that contain at least one word box.
Box format: x_init, y_init, x_end, y_end
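A minimal Python sketch for reading these label files is shown below. It assumes that each line of a label file holds one box as four numbers in the order x_init, y_init, x_end, y_end. The one-box-per-line layout and the tolerance for whitespace instead of commas are assumptions, not something stated above. The names `WordBox`, `parse_word_boxes` and `load_label_file` are hypothetical.

```python
from pathlib import Path
from typing import List, NamedTuple


class WordBox(NamedTuple):
    """One annotated word region (hypothetical container, not part of the dataset)."""
    x_init: float
    y_init: float
    x_end: float
    y_end: float

    @property
    def width(self) -> float:
        return self.x_end - self.x_init

    @property
    def height(self) -> float:
        return self.y_end - self.y_init


def parse_word_boxes(text: str) -> List[WordBox]:
    """Parse label-file text.

    Assumption: one box per line as 'x_init, y_init, x_end, y_end';
    commas and/or whitespace are accepted as separators.
    """
    boxes: List[WordBox] = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 values per box, got {len(fields)}: {line!r}")
        x0, y0, x1, y1 = (float(f) for f in fields)
        boxes.append(WordBox(x0, y0, x1, y1))
    return boxes


def load_label_file(path: str) -> List[WordBox]:
    """Read one label file from disk (hypothetical helper)."""
    return parse_word_boxes(Path(path).read_text())
```

Because label files exist only for images with at least one word box, a missing file can be treated as "no text in this image" rather than as an error.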