Multimodal dataset

The dataset contains 4-turn dialogs, each featuring a sequence of question-answer pairs designed to explore the content, context, and relationships within a given image. It is well-suited for training

What’s in this dataset

The dataset contains 4-turn dialogs, each featuring a sequence of question-answer pairs designed to explore the content, context, and relationships within a given image. It is well-suited for training AI models in image recognition, reasoning, and contextual inference, with questions targeting various aspects of the visual scene.

Dataset design

Skills distribution

The dataset focuses on image captioning, details analysis and contextual inference.

Subject areas covered: landscape 12%, nature & wildlife 8.8%, still life 19%, food 9%, architecture 7.8%, street photography 4.5%, etc.

Size

3K+ samples

Sample - image with 4-turns dialogue

Examples

Fill out the form to purchase the dataset

Products

Resources

Impact on AI

Company