new_benchmark_50
A new benchmark of olympiad-level problems we gathered to test our own engine. Problems were selected to be compatible with the interpretation capabilities of the original AlphaGeometry. Problems were obtained from the following sources:
- IMO exams prior to 2000 and from 2024;
- IMO shortlists from 2009 to 2022;
- USA Math Olympiad from 1988 to 2023.
All problems are named “translated_imo_YEAR_PROBLEM-NUMBER” for problems from the IMO exams, “translated_imo_YEAR_sl_PROBLEM-IDENTIFIER” for problems from the IMO shortlist, or “translated_usamo_YEAR_PROBLEM-NUMBER” for problems from the USA Math Olympiad.
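The naming scheme above can be checked mechanically. The following is a minimal sketch, not part of Newclid itself: the `ProblemName` type, the `parse_problem_name` helper, and the `imo_sl` source label are hypothetical names introduced for illustration, and the regular expression assumes identifiers look like `p2`, `g3`, or `g4a` (one letter, digits, optional letter suffix), as they do in the table of this section.

```python
import re
from typing import NamedTuple, Optional


class ProblemName(NamedTuple):
    """Parsed pieces of a benchmark problem name (hypothetical helper type)."""
    source: str      # "imo", "imo_sl" (shortlist), or "usamo"
    year: int
    identifier: str  # e.g. "p2", "g3", "g4a"


# Assumed shape: translated_<contest>_<year>_[sl_]<identifier>
_PATTERN = re.compile(
    r"^translated_(?P<contest>imo|usamo)_(?P<year>\d{4})_"
    r"(?P<sl>sl_)?(?P<ident>[a-z]\d+[a-z]?)$"
)


def parse_problem_name(name: str) -> Optional[ProblemName]:
    """Return the parsed name, or None if it does not follow the scheme."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    contest = match.group("contest")
    is_shortlist = match.group("sl") is not None
    if is_shortlist and contest != "imo":
        # Only the IMO has shortlist problems in this benchmark.
        return None
    source = "imo_sl" if is_shortlist else contest
    return ProblemName(source, int(match.group("year")), match.group("ident"))


if __name__ == "__main__":
    for example in (
        "translated_imo_1983_p2",
        "translated_imo_2018_sl_g4a",
        "translated_usamo_2023_p1",
    ):
        print(example, "->", parse_problem_name(example))
```

A helper like this could be used to group problems by source or year when summarizing results; names that do not match the scheme return `None` rather than raising.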
We aimed for 50 problems, but the only criterion for choosing a problem was whether it could be translated into the original formal language of AlphaGeometry. Under that criterion, the lists are ideally exhaustive within each time range for each olympiad, as long as there is no overlap with the imo_ag_30 benchmark. We sourced 48 problems; IMO shortlist problems G4 from 2018 and G7 from 2020 are each split into two problems to account for multiple goals, as required by the limitations of the original AlphaGeometry, bringing the total to 50.
Newclid solved 17 of the 50 problems by itself. The results are recorded in the table below.
| Problem Name | Solved w/ original DDAR? | Solved w/ Newclid? |
|---|---|---|
| translated_imo_1983_p2 | | |
| translated_imo_1995_p1 | Yes | |
| translated_imo_2024_p4 | Yes | |
| translated_imo_2009_sl_g3 | | |
| translated_imo_2009_sl_g6 | | |
| translated_imo_2010_sl_g1 | Yes | |
| translated_imo_2010_sl_g1 | | |
| translated_imo_2010_sl_g2 | | |
| translated_imo_2011_sl_g6 | Yes | |
| translated_imo_2012_sl_g2 | Yes | |
| translated_imo_2012_sl_g3 | | |
| translated_imo_2012_sl_g4 | | |
| translated_imo_2013_sl_g2 | | |
| translated_imo_2013_sl_g4 | Yes | |
| translated_imo_2014_sl_g3 | | |
| translated_imo_2015_sl_g1 | Yes | |
| translated_imo_2015_sl_g3 | | |
| translated_imo_2015_sl_g5 | | |
| translated_imo_2016_sl_g2 | | |
| translated_imo_2016_sl_g4 | | |
| translated_imo_2016_sl_g5 | | |
| translated_imo_2016_sl_g6 | | |
| translated_imo_2017_sl_g3 | | |
| translated_imo_2017_sl_g4 | | |
| translated_imo_2017_sl_g7 | Yes | |
| translated_imo_2018_sl_g2 | | |
| translated_imo_2018_sl_g4a | Yes | |
| translated_imo_2018_sl_g4b | Yes | |
| translated_imo_2018_sl_g5 | | |
| translated_imo_2018_sl_g7 | Yes | |
| translated_imo_2019_sl_g1 | Yes | |
| translated_imo_2019_sl_g2 | | |
| translated_imo_2019_sl_g7 | | |
| translated_imo_2020_sl_g7a | | |
| translated_imo_2020_sl_g7b | | |
| translated_imo_2020_sl_g8 | | |
| translated_imo_2021_sl_g1 | Yes | |
| translated_imo_2021_sl_g4 | | |
| translated_imo_2022_sl_g2 | | |
| translated_imo_2022_sl_g3 | | |
| translated_usamo_1988_p4 | Yes | |
| translated_usamo_1990_p5 | Yes | |
| translated_usamo_1997_p2 | | |
| translated_usamo_1999_p6 | | |
| translated_usamo_2001_p2 | | |
| translated_usamo_2005_p3 | | |
| translated_usamo_2008_p2 | | |
| translated_usamo_2012_p5 | | |
| translated_usamo_2013_p1 | | |
| translated_usamo_2014_p5 | Yes | |
| translated_usamo_2023_p1 | Yes | |