@aesopjah - lemmy.net.au

aesopjah@sh.itjust.works

0 Posts
1 Comment

Joined 9 months ago

Cake day: June 27th, 2025



aesopjah@sh.itjust.works to Technology@lemmy.world • Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.
English · 9 upvotes, 1 downvote · 19 hours ago
It’s also an odd metric, since only 20-60% of the humans completed it. Very “60% of the time, they complete it every time” energy.

Ideally they’d run the bots through multiple times (with no context from, or training on, previous runs), but I guess that’s cost-prohibitive?
