Webinar · Feb 26, 2025 at 17:00 CET · Online

Best practices for benchmarking and fine-tuning: discover how mathematical reasoning unlocks new LLM applications. Join experts from Google DeepMind, Toloka, Gradarius, and Stevens Institute of Technology as they share their findings and take an in-depth look at approaches to evaluating and enhancing LLM proficiency in university-level mathematical reasoning. We’ll also share new benchmarking results showing how DeepSeek and o1 stack up against Gemini, GPT-4, Qwen2-Math, and more.

Agenda and key topics

What you'll learn:

  • Mathematical reasoning for LLMs: Key use cases and areas where math reasoning fundamentally enhances language model capabilities.

  • Auto-formalization in math: AlphaProof, Lean, and the applicability of automated verifiers for LLM training and evaluation.

  • Expert-curated data collection: Best practices for sourcing high-quality datasets to assess and enhance model performance, including for university-level mathematical reasoning.

  • Designing effective benchmarks: Developing robust evaluation metrics, assessing model performance across different domains, and extracting actionable insights.

  • Boosting LLM performance with new data: Optimally leveraging a custom university-level training dataset to enhance the mathematical proficiency of top-performing LLMs.

  • Practical applications: Applying the results in academic, educational, and product contexts.