Multimodal Conversations Dataset
This dataset is designed to improve image understanding, reasoning, and visual analysis in vision-language models (VLMs).
Size
3,500+ dialogues
Format
Each sample consists of an image paired with a 4-turn user-assistant conversation.
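As a rough illustration of this format, one sample could be modeled as below. This is only a sketch: the names `Turn`, `Sample`, `image_path`, and `is_well_formed` are hypothetical and not part of the dataset's published schema, and it assumes that "4-turn" means four messages alternating between user and assistant, starting with the user.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "4-turn" is read as four alternating messages (user first).
EXPECTED_TURNS = 4


@dataclass
class Turn:
    role: str  # "user" or "assistant" (hypothetical labels)
    text: str


@dataclass
class Sample:
    image_path: str  # hypothetical field: location of the paired image
    turns: List[Turn]


def is_well_formed(sample: Sample) -> bool:
    """Check that a sample has exactly four turns alternating user/assistant."""
    if len(sample.turns) != EXPECTED_TURNS:
        return False
    expected_roles = ["user", "assistant"] * (EXPECTED_TURNS // 2)
    return [turn.role for turn in sample.turns] == expected_roles


# Illustrative sample with made-up content.
example = Sample(
    image_path="images/example.jpg",
    turns=[
        Turn("user", "What is in the foreground of this photo?"),
        Turn("assistant", "A bowl of fruit on a wooden table."),
        Turn("user", "Which fruits can you identify?"),
        Turn("assistant", "Apples, pears, and a bunch of grapes."),
    ],
)
```

A check like `is_well_formed(example)` can be run over every record after loading to catch truncated or misordered conversations before training.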
Quality
All dialogues were created and validated by trusted writers and editors, ensuring high-quality, natural interactions.

Subject areas covered (partial list; the shares shown sum to 61.1%)
landscape 12%
nature & wildlife 8.8%
still life 19%
food 9%
architecture 7.8%
street photography 4.5%