Rack scale is on the rise, but it's not for everyone... yet

1 month 3 weeks ago
Still buying B200s and MI300Xs? Don't feel bad: Nvidia's NVL72 and AMD's Helios rack systems aren't really for the enterprise anyway

Analysis  With all the hype around Nvidia's NVL72, AMD's newly announced Helios, and Intel's upcoming Jaguar Shores rack systems, you'd be forgiven for thinking the days of eight-way HGX servers are numbered.…

Tobias Mann

How Do Olympiad Medalists Judge LLMs in Competitive Programming?

1 month 3 weeks ago
A new benchmark assembled by a team of International Olympiad medalists suggests the hype about large language models beating elite human coders is premature. LiveCodeBench Pro, unveiled in a 584-problem study [PDF] drawn from Codeforces, ICPC and IOI contests, shows the best frontier model clears just 53% of medium-difficulty tasks on its first attempt and none of the hard ones, while grandmaster-level humans routinely solve at least some of those highest-tier problems. The researchers measured models and humans on the same Elo scale used by Codeforces and found that OpenAI's o4-mini-high, when stripped of terminal tools and limited to one try per task, lands at an Elo rating of 2,116 -- hundreds of points below the grandmaster cutoff and roughly in the top 1.5 percent of human contestants. A granular tag-by-tag autopsy identified implementation-friendly, knowledge-heavy problems -- segment trees, graph templates, classic dynamic programming -- as the models' comfort zone; observation-driven puzzles such as game-theory endgames and tricky greedy constructions remain stubborn roadblocks. Because the dataset is harvested in real time as contests conclude, the authors argue it minimizes training-data leakage and offers a moving target for future systems. The broader takeaway is that impressive leaderboard jumps often reflect tool use, multiple retries or easier benchmarks rather than genuine algorithmic reasoning, leaving a conspicuous gap between today's models and top human problem-solvers.
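To put those Elo figures in perspective, here is a minimal back-of-the-envelope sketch (not part of the study) that plugs the reported 2,116 rating and Codeforces's 2,400 grandmaster threshold into the standard Elo expected-score formula; the function name and the resulting probability are illustrative assumptions, not numbers from the paper.

# Standard Elo expected-score formula: E_A = 1 / (1 + 10^((R_B - R_A) / 400)).
# Values: 2,116 is the o4-mini-high rating reported above (one attempt, no
# terminal tools); 2,400 is Codeforces's minimum grandmaster rating.

def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (draws count as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

model_rating = 2116
grandmaster_cutoff = 2400

p = elo_expected_score(model_rating, grandmaster_cutoff)
print(f"Expected score vs. a 2,400-rated grandmaster: {p:.2f}")  # roughly 0.16

In other words, a 284-point Elo gap corresponds to the model scoring only about one point in six against a borderline grandmaster, which is the sense in which the summary calls the gap "conspicuous."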

Read more of this story at Slashdot.

msmash