Multimodal Conversations Dataset

This dataset is designed to improve image understanding, reasoning, and visual analysis in vision-language models (VLMs).

Size

3,500+ dialogues

Format

Each sample consists of an image paired with a 4-turn user-assistant conversation.
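As an illustration only, a sample of this shape could be represented and checked as sketched below. The actual file format and field names are not published here, so `Sample`, `Message`, `image_path`, `is_valid`, and `make_example` are hypothetical names. The sketch also assumes that one "turn" means one user message followed by one assistant reply; if a turn means a single message, set `MESSAGES_PER_TURN` to 1.

```python
from dataclasses import dataclass
from typing import List

TURNS_PER_SAMPLE = 4   # stated in the dataset description
MESSAGES_PER_TURN = 2  # ASSUMPTION: one turn = user message + assistant reply


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class Sample:
    image_path: str          # hypothetical field: path or URL of the paired image
    messages: List[Message]  # the conversation, in order


def is_valid(sample: Sample) -> bool:
    """Check message count and strict user/assistant alternation."""
    if len(sample.messages) != TURNS_PER_SAMPLE * MESSAGES_PER_TURN:
        return False
    for index, message in enumerate(sample.messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message.role != expected_role:
            return False
    return True


def make_example() -> Sample:
    """Build a placeholder sample; the texts are invented for illustration."""
    messages = []
    for turn in range(1, TURNS_PER_SAMPLE + 1):
        messages.append(Message("user", f"placeholder question {turn}"))
        messages.append(Message("assistant", f"placeholder answer {turn}"))
    return Sample(image_path="images/example.jpg", messages=messages)


if __name__ == "__main__":
    print(is_valid(make_example()))
```

A check like this is useful when loading purchased data into a training pipeline, since a sample with a missing reply or two consecutive user messages would otherwise only surface as a downstream formatting error.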

Quality

All dialogues were created and validated by trusted writers and editors, ensuring high-quality, natural interactions.

Subject areas covered

  • landscape: 12%
  • nature & wildlife: 8.8%
  • still life: 19%
  • food: 9%
  • architecture: 7.8%
  • street photography: 4.5%

Learn more about how we collected the multimodal data

Contact us to purchase the dataset
