new_benchmark_50

A new benchmark of olympiad-level problems we gathered to test our own engine. Problems were selected to be compatible with the interpretation capabilities of the original AlphaGeometry. Problems were obtained from the following sources:

  • IMO exams prior to 2000 and from 2024;

  • IMO shortlists from 2009 to 2022;

  • USA Math Olympiad from 1988 to 2023.

All problems are named “translated_imo_YEAR_PROBLEM-NUMBER” for problems from the IMO exams, “translated_imo_YEAR_sl_PROBLEM-IDENTIFIER” for problems from the IMO shortlist, or “translated_usamo_YEAR_PROBLEM-NUMBER” for problems from the USA Math Olympiad.
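The naming scheme above can be checked mechanically. The following is a minimal sketch, not part of Newclid: `ProblemName` and `parse_problem_name` are hypothetical helpers, and the pattern assumes identifiers look like the ones in this benchmark (a letter, digits, and an optional letter suffix such as `p2` or `g4a`).

```python
import re
from typing import NamedTuple, Optional


class ProblemName(NamedTuple):
    """Hypothetical container for the parts of a benchmark problem name."""
    source: str      # "imo", "imo_shortlist", or "usamo"
    year: int
    identifier: str  # e.g. "p2" for an exam problem, "g4a" for a shortlist problem


# Assumed shape: translated_<olympiad>_<year>_[sl_]<identifier>
_PATTERN = re.compile(
    r"^translated_(?P<olympiad>imo|usamo)_(?P<year>\d{4})_"
    r"(?P<sl>sl_)?(?P<ident>[a-z]\d+[a-z]?)$"
)


def parse_problem_name(name: str) -> Optional[ProblemName]:
    """Split a problem name into source, year, and identifier.

    Returns None when the name does not follow the convention.
    """
    match = _PATTERN.match(name)
    if match is None:
        return None
    olympiad = match.group("olympiad")
    is_shortlist = match.group("sl") is not None
    if olympiad == "usamo" and is_shortlist:
        # The convention only defines a shortlist form for the IMO.
        return None
    source = "imo_shortlist" if is_shortlist else olympiad
    return ProblemName(source, int(match.group("year")), match.group("ident"))
```

For example, `parse_problem_name("translated_imo_2018_sl_g4a")` yields a shortlist entry for 2018 with identifier `g4a`, while a name without the `translated_` prefix yields `None`.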

We aimed for 50 problems, but the only criterion for choosing a problem was whether it could be translated into the original formal language of AlphaGeometry. Under that criterion, the lists are ideally exhaustive for each olympiad within each time range, excluding any problem that overlaps with the imo_ag_30 benchmark. We sourced 48 problems; IMO shortlist problems G4 from 2018 and G7 from 2020 were each split into two problems to account for multiple goals, as required by the limitations of the original AlphaGeometry, for a total of 50.

Newclid solved 17 of the 50 problems by itself. The results are listed in the table below.

| Problem Name | Solved w/ original DDAR? | Solved w/ Newclid? |
|---|---|---|
| translated_imo_1983_p2 | | |
| translated_imo_1995_p1 | | Yes |
| translated_imo_2024_p4 | | Yes |
| translated_imo_2009_sl_g3 | | |
| translated_imo_2009_sl_g6 | | |
| translated_imo_2010_sl_g1 | | Yes |
| translated_imo_2010_sl_g1 | | |
| translated_imo_2010_sl_g2 | | |
| translated_imo_2011_sl_g6 | | Yes |
| translated_imo_2012_sl_g2 | | Yes |
| translated_imo_2012_sl_g3 | | |
| translated_imo_2012_sl_g4 | | |
| translated_imo_2013_sl_g2 | | |
| translated_imo_2013_sl_g4 | | Yes |
| translated_imo_2014_sl_g3 | | |
| translated_imo_2015_sl_g1 | | Yes |
| translated_imo_2015_sl_g3 | | |
| translated_imo_2015_sl_g5 | | |
| translated_imo_2016_sl_g2 | | |
| translated_imo_2016_sl_g4 | | |
| translated_imo_2016_sl_g5 | | |
| translated_imo_2016_sl_g6 | | |
| translated_imo_2017_sl_g3 | | |
| translated_imo_2017_sl_g4 | | |
| translated_imo_2017_sl_g7 | | Yes |
| translated_imo_2018_sl_g2 | | |
| translated_imo_2018_sl_g4a | | Yes |
| translated_imo_2018_sl_g4b | | Yes |
| translated_imo_2018_sl_g5 | | |
| translated_imo_2018_sl_g7 | | Yes |
| translated_imo_2019_sl_g1 | | Yes |
| translated_imo_2019_sl_g2 | | |
| translated_imo_2019_sl_g7 | | |
| translated_imo_2020_sl_g7a | | |
| translated_imo_2020_sl_g7b | | |
| translated_imo_2020_sl_g8 | | |
| translated_imo_2021_sl_g1 | | Yes |
| translated_imo_2021_sl_g4 | | |
| translated_imo_2022_sl_g2 | | |
| translated_imo_2022_sl_g3 | | |
| translated_usamo_1988_p4 | | Yes |
| translated_usamo_1990_p5 | | Yes |
| translated_usamo_1997_p2 | | |
| translated_usamo_1999_p6 | | |
| translated_usamo_2001_p2 | | |
| translated_usamo_2005_p3 | | |
| translated_usamo_2008_p2 | | |
| translated_usamo_2012_p5 | | |
| translated_usamo_2013_p1 | | |
| translated_usamo_2014_p5 | | Yes |
| translated_usamo_2023_p1 | | Yes |
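As a quick cross-check of the table above, the solved problems can be tallied by source. This is only a sketch: the `SOLVED` list is copied by hand from the rows marked "Yes", it assumes every "Yes" refers to the Newclid column (consistent with the 17/50 figure), and `source_of` is a hypothetical helper, not a Newclid function.

```python
from collections import Counter

# Problems marked "Yes" in the table above (assumed to be the Newclid column).
SOLVED = [
    "translated_imo_1995_p1",
    "translated_imo_2024_p4",
    "translated_imo_2010_sl_g1",
    "translated_imo_2011_sl_g6",
    "translated_imo_2012_sl_g2",
    "translated_imo_2013_sl_g4",
    "translated_imo_2015_sl_g1",
    "translated_imo_2017_sl_g7",
    "translated_imo_2018_sl_g4a",
    "translated_imo_2018_sl_g4b",
    "translated_imo_2018_sl_g7",
    "translated_imo_2019_sl_g1",
    "translated_imo_2021_sl_g1",
    "translated_usamo_1988_p4",
    "translated_usamo_1990_p5",
    "translated_usamo_2014_p5",
    "translated_usamo_2023_p1",
]


def source_of(name: str) -> str:
    """Classify a problem name by the olympiad it comes from."""
    if name.startswith("translated_usamo_"):
        return "USAMO"
    if "_sl_" in name:
        return "IMO shortlist"
    return "IMO"


# Count solved problems per source.
tally = Counter(source_of(name) for name in SOLVED)

for source, count in sorted(tally.items()):
    print(f"{source}: {count}")
print(f"total: {len(SOLVED)}")
```

Under these assumptions, the per-source counts sum to the 17 solved problems reported above.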