Multimodal dataset
The dataset contains 4-turn dialogs, each featuring a sequence of question-answer pairs designed to explore the content, context, and relationships within a given image. It is well-suited for training
What’s in this dataset
The dataset contains 4-turn dialogs, each featuring a sequence of question-answer pairs designed to explore the content, context, and relationships within a given image. It is well-suited for training AI models in image recognition, reasoning, and contextual inference, with questions targeting various aspects of the visual scene.
Dataset design
Skills distribution
The dataset focuses on image captioning, details analysis and contextual inference.
Subject areas covered: landscape 12%, nature & wildlife 8.8%, still life 19%, food 9%, architecture 7.8%, street photography 4.5%, etc.
Size
3K+ samples
Sample - image with 4-turns dialogue
Examples
Fill out the form to purchase the dataset
Products