Evaluating Mathematical Reasoning in LLMs

Best practices for benchmarking and fine-tuning: discover how mathematical reasoning unlocks new LLM applications. Join experts from Google DeepMind, Toloka, Gradarius, and Stevens Institute of Technology as they share their findings and explore in-depth approaches to evaluating and enhancing LLM proficiency in mathematical reasoning at the university level. We’ll also share new benchmarking results and show how DeepSeek and o1 stack up against Gemini, GPT-4, Qwen2-Math, and more.

Where: Online

Date: Feb 26, 2025, 17:00 CET

What you'll learn about:

  • Mathematical reasoning for LLMs: Key use cases and areas where math reasoning fundamentally enhances language model capabilities.

  • Auto-formalization in math: AlphaProof, Lean, and the applicability of automated verifiers for LLM training and evaluation (see the short Lean sketch after this list).

  • Expert-curated data collection: Best practices for sourcing high-quality datasets to assess and enhance model performance, including for university-level mathematical reasoning.

  • Designing effective benchmarks: Developing robust evaluation metrics, assessing model performance across different domains, and extracting actionable insights. 

  • Boosting LLM performance with new data: Optimally leveraging a custom university-level training dataset to enhance the mathematical proficiency of top-performing LLMs.

  • Practical applications: Applying the results in academic, educational, and product contexts.
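To make the auto-formalization topic concrete, here is a minimal, hand-written sketch (illustrative only, not material from the webinar) of what formalizing a natural-language claim in Lean 4 with Mathlib looks like. Once a statement and proof are in this form, the Lean kernel can check them mechanically, which is what makes automated verifiers useful for training and evaluating LLMs:

    -- Illustrative sketch: formalizing the natural-language claim
    -- "the sum of two even integers is even" in Lean 4 / Mathlib.
    import Mathlib

    theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) :
        Even (a + b) := by
      obtain ⟨x, hx⟩ := ha  -- hx : a = x + x
      obtain ⟨y, hy⟩ := hb  -- hy : b = y + y
      -- Witness: a + b = (x + y) + (x + y)
      exact ⟨x + y, by rw [hx, hy]; ring⟩

In practice Mathlib already provides this fact as a lemma; an auto-formalization pipeline would instead generate statements like this automatically from natural-language problems and rely on the verifier, rather than a human grader, to judge correctness.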

Participants

Iuliya Beloshapka

Senior Research Engineer, Google DeepMind

Alexei Miasnikov

Research Professor of Mathematics, Stevens Institute of Technology

Vlad Stepanov

CEO, Gradarius

Konstantin Chernyshev

Machine Learning Researcher

Vitaliy Polshkov

Senior Research Engineer

Registration

Don't miss a thing!

Get all the latest on our webinars, meetups, and other events.

Subscribe
