University-level Math Reasoning Dataset
This dataset is designed to help LLMs develop complex reasoning and problem-solving skills in STEM.
Size
13,500+ text-only and 600+ multimodal real-world math problems, each with step-by-step solutions and final answers.
Format
LaTeX and natural language explanations. Multimodal samples include images (graphs, diagrams, etc.).
Quality
Created and validated by domain experts (university math professors, teachers, and vetted professionals), ensuring non-synthetic, high-quality content.
Complexity
University-level problems aligned with US university curricula.
Diversity
Covers 6 core subjects (share of problems in parentheses):
(12%) Sequences and Series
(7%) Algebra
(4%) Precalculus Review
(13%) Multivariable Calculus
(37%) Differential Calculus
(27%) Integral Calculus
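As a quick sanity check on the breakdown above, the following minimal Python sketch confirms that the listed shares add up to 100% and turns them into rough per-subject counts. It rests on two assumptions that the card does not state: that 13,500 can stand in for the text-only problem count (the card only gives it as a lower bound), and that the percentages apply to that subset. The names `SUBJECT_SHARES`, `TEXT_ONLY_LOWER_BOUND`, and `estimate_counts` are illustrative and are not part of the dataset.

```python
# Subject shares as listed above (percent of problems).
SUBJECT_SHARES = {
    "Sequences and Series": 12,
    "Algebra": 7,
    "Precalculus Review": 4,
    "Multivariable Calculus": 13,
    "Differential Calculus": 37,
    "Integral Calculus": 27,
}

# Assumption: 13,500 is only a lower bound on the text-only problem count,
# and the shares are assumed to apply to that subset.
TEXT_ONLY_LOWER_BOUND = 13_500


def estimate_counts(shares, total):
    """Return an approximate per-subject problem count for a given total."""
    return {subject: round(total * pct / 100) for subject, pct in shares.items()}


total_share = sum(SUBJECT_SHARES.values())
estimated = estimate_counts(SUBJECT_SHARES, TEXT_ONLY_LOWER_BOUND)

print(f"Shares sum to {total_share}%")
# List subjects from largest to smallest estimated count.
for subject, count in sorted(estimated.items(), key=lambda kv: -kv[1]):
    print(f"{subject}: about {count} problems")
```

The resulting counts are estimates only; the actual per-subject numbers depend on the exact dataset size and on how the published percentages were rounded.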
Fine-tuning experiments
Our fine-tuning experiments show that training on this dataset significantly improves LLM performance on complex mathematical reasoning tasks at every skill level, from high-school to university- and olympiad-level problems.