Absolutely needed: to get high efficiency for this beast … as it gets better, we’ll become too dependent.
“all of this growth is for a new technology that’s still finding its footing, and in many applications—education, medical advice, legal analysis—might be the wrong tool for the job,”
Lul yes but no, but they are clearly better at many types of tasks.
For example? Citations?
Pretty sure these “tasks” are meaningless metrics made up by pseudo-scientific grifters.
AlphaFold 3 which can help in the prediction of some proteins. Although it has some limitations, it cannot be used in all cases, only in what it can perform without any problem.
Small bits of code, language related tasks, basic context understanding, not metrics I have literally measured simply noticed has improved compared to non reasoning models in my homelab testing. 🤷♂️