Absolutely needed: to get high efficiency for this beast … as it gets better, we’ll become too dependent.

“all of this growth is for a new technology that’s still finding its footing, and in many applications—education, medical advice, legal analysis—might be the wrong tool for the job,”

  • msage@programming.dev
    14 hours ago

    Which chatbots are getting smarter?

    I know AI has potential, but specifically LLMs (which most people mean when talking about AI) seem to have hit their technological limits.

        • Terrasque@infosec.pub
          1 hour ago

          Yes, which has improved some tasks measurably: roughly a 20% improvement on programming tasks, as a practical example. It has also improved tool use and agentic tasks, allowing the LLM to plan ahead and adjust its initial approach based on later parts.

          Having the LLM talk through the task lets it revise or fix bad early decisions based on realizations at later stages. Sort of like when a human thinks through how to do something.

          • technocrit@lemmy.dbzer0.com
            8 hours ago

            For example? Citations?

            Pretty sure these “tasks” are meaningless metrics made up by pseudo-scientific grifters.

            • IsaamoonKHGDT_6143@lemmy.zip
              7 hours ago

              AlphaFold 3, which can help predict the structures of some proteins. It has limitations, though: it cannot be used in all cases, only in those it handles reliably.

            • Jakeroxs@sh.itjust.works
              7 hours ago

              Small bits of code, language-related tasks, basic context understanding. These aren't metrics I have literally measured; I've simply noticed they have improved compared to non-reasoning models in my homelab testing. 🤷‍♂️