A new era for AI maths whizzes

A new era for AI maths whizzes


Alibaba Cloud’s Qwen team has unveiled Qwen2-Math, a series of large language models specifically designed to tackle complex mathematical problems.

These new models – built upon the existing Qwen2 foundation – demonstrate remarkable proficiency in solving arithmetic and mathematical challenges, and outperform former industry leaders.

The Qwen team crafted Qwen2-Math using a vast and diverse Mathematics-specific Corpus. This corpus comprises a rich tapestry of high-quality resources, including web texts, books, code, exam questions, and synthetic data generated by Qwen2 itself.

Rigorous evaluation on both English and Chinese mathematical benchmarks – including GSM8K, Math, MMLU-STEM, CMATH, and GaoKao Math – revealed the exceptional capabilities of Qwen2-Math. Notably, the flagship model, Qwen2-Math-72B-Instruct, surpassed the performance of proprietary models such as GPT-4o and Claude 3.5 in various mathematical tasks.

Binance

“Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly in the 1.5B and 7B models,” the Qwen team noted.

This superior performance is attributed to the effective implementation of a math-specific reward model during the development process.

Further showcasing its prowess, Qwen2-Math demonstrated impressive results in challenging mathematical competitions like the American Invitational Mathematics Examination (AIME) 2024 and the American Mathematics Contest (AMC) 2023.

To ensure the model’s integrity and prevent contamination, the Qwen team implemented robust decontamination methods during both the pre-training and post-training phases. This rigorous approach involved removing duplicate samples and identifying overlaps with test sets to maintain the model’s accuracy and reliability.

Looking ahead, the Qwen team plans to expand Qwen2-Math’s capabilities beyond English, with bilingual and multilingual models in the pipeline.  This commitment to inclusivity aims to make advanced mathematical problem-solving accessible to a global audience.

“We will continue to enhance our models’ ability to solve complex and challenging mathematical problems,” affirmed the Qwen team.

You can find the Qwen2 models on Hugging Face here.

See also: Paige and Microsoft unveil next-gen AI models for cancer diagnosis

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Tags: ai, alibaba cloud, artificial intelligence, maths, models, qwen, qwen2, qwen2-math



Source link

[wp-stealth-ads rows="2" mobile-rows="3"]

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest

#GlobalNewsIt
Coinmama
#GlobalNewsIt
A new era for AI maths whizzes
Binance
Changelly
ChatGPT hits record usage after viral Ghibli feature—Here are four risks to know first
RARE (Retrieval-Augmented Reasoning Modeling): A Scalable AI Framework for Domain-Specific Reasoning in Lightweight Language Models
Return Entertainment launches Rivals Arena smart TV trivia game on Amazon Fire TV in UK
AWISEE.com Analyzes Gmail's AI-Powered Search Update and Its Impact on Influencer Marketing
Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and the Limitations of Verbal AI Transparency in Reasoning Models
Meta's answer to DeepSeek is here: Llama 4 launches with long context Scout and Maverick models, and 2T parameter Behemoth on the way!
bitcoin
ethereum
bnb
xrp
cardano
solana
dogecoin
polkadot
shiba-inu
dai
Trump’s Tariffs Stir Emergency Rate Cut Bets as Recession Fears Mount
ChatGPT hits record usage after viral Ghibli feature—Here are four risks to know first
Stablecoin loan repayments flag early signs of Ethereum volatility, report finds
Bitcoin hashrate tops 1 Zetahash in historic first, trackers show
Whales Increase Holdings by 12% Despite Market Downturn
Trump’s Tariffs Stir Emergency Rate Cut Bets as Recession Fears Mount
ChatGPT hits record usage after viral Ghibli feature—Here are four risks to know first
Stablecoin loan repayments flag early signs of Ethereum volatility, report finds
Bitcoin hashrate tops 1 Zetahash in historic first, trackers show
bitcoin
ethereum
tether
xrp
bnb
usd-coin
solana
dogecoin
tron
cardano
bitcoin
ethereum
tether
xrp
bnb
usd-coin
solana
dogecoin
tron
cardano