Multimodal Conversations Dataset
This dataset is designed to enhance image understanding, reasoning, and visual analysis in vision-language models (VLMs).
Size
3,500+ dialogues
Format
Each sample consists of an image paired with a 4-turn user-assistant conversation.
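To make the format concrete, here is a minimal sketch of what one sample might look like. The field names ("image", "conversation", "role", "content") are assumptions for illustration, not the dataset's published schema, and a "turn" is taken here to mean one user message plus one assistant reply.

```python
# Hypothetical record layout for one sample; all field names are assumed.
sample = {
    "image": "images/0001.jpg",  # path or URL to the paired image
    "conversation": [  # assumed: 4 turns = 4 alternating user/assistant pairs
        {"role": "user", "content": "What is happening in this scene?"},
        {"role": "assistant", "content": "A street market at dusk, with vendors closing their stalls."},
        {"role": "user", "content": "What are the stall canopies made of?"},
        {"role": "assistant", "content": "They appear to be canvas stretched over wooden frames."},
        {"role": "user", "content": "Does anything suggest the time of year?"},
        {"role": "assistant", "content": "String lights and warm clothing hint at late autumn."},
        {"role": "user", "content": "What might happen next?"},
        {"role": "assistant", "content": "The remaining vendors will likely pack up as it gets dark."},
    ],
}

def is_valid(record, turns=4):
    """Check that the conversation has `turns` alternating user/assistant pairs."""
    conv = record["conversation"]
    if len(conv) != 2 * turns:
        return False
    return all(
        msg["role"] == ("user" if i % 2 == 0 else "assistant")
        for i, msg in enumerate(conv)
    )

print(is_valid(sample))  # → True
```

A validator like `is_valid` is a common first step when loading conversation datasets, since downstream training code usually assumes strict role alternation.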
Quality
All dialogues were created and validated by trusted writers and editors, ensuring high-quality, natural interactions.
Skill group: Contextual understanding & inference
Percentages indicate the share of dialogues that exercise each skill; a dialogue can exercise several skills, so values within a group may sum to more than 100%.
Scene + Context Understanding (90.0%)
Grasping the overall environment, interactions, and relationships within a visual scene.
Cultural + Historical Understanding (27.0%)
Recognizing culturally significant symbols, practices, and temporal cues to place the image in a broader context.
Temporal and Causal Reasoning (12.0%)
Interpreting sequences, predicting outcomes, and inferring cause-and-effect relationships.
Skill group: Visual detail analysis
Object and Attribute Identification (72.0%)
Detecting and distinguishing objects and their attributes, including specific features like color and texture.
Material and Surface Recognition (31.0%)
Differentiating between various materials and understanding their visual and tactile qualities.
Comparative Visual Evaluation (28.0%)
Analyzing and comparing visual characteristics, such as size, proximity, and relationships.
Subject areas covered