• brucethemoose@lemmy.world
    link
    fedilink
    English
    arrow-up
    4
    ·
    edit-2
    10 hours ago

    Gemini actually has a really interesting architecture, hence it has fast responses, and it’s easily the best long context model out there.

    And outside of benchmaxxing or pure coding, Gemma is very good for its size. 12B is an incredible multimodal LLM, the only one natively trained for image/text ingestion without a mmproj hacked on at the end.

    …But it sure feels like executive meddling kills it.

    The pattern I see is:

    • Gemini preview is released.

    • It’s genuinely good! It’s smart, it’s straight.

    • Then they “refine” it, and it gets more and more sycophantic, more deep fried. Long context performance degrades… benchmark scores go up, but anyone who actually uses it can immediately tell it’s gotten worse.

    • Only then, is it released for mass use.

    It’s obvious they took a good model, then enshittified it to make their bosses happy and tech bros on Twitter excited.

    Gemma has the same pattern. Researchers tease the local community, delay it, and then when a new Gemma finally comes out, it turns out to be using some old SWA architecture. And the biggest model is cut. And only a smaller one uses the multimodal training.

    It’s obvious it was neutered to not “threaten” the Gemini API or be too “unsafe.”


    Another thing I’ve noticed is that both Gemini and Gemma are awful with their default 1.0 temperature/top-p 0.95. Sampling completely screws them up. But they like low temperature + min-p, and Gemma loves constrained sampling.

    But 99% of users don’t know anything about sampling, so that’s going to leave a bad impression.
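
    For anyone who hasn’t met these knobs, here is a minimal sketch of what “low temperature + min-p” means. It is toy code over a plain list of logits, not any particular backend’s implementation; the function names, the example values, and the order of operations (temperature first, then min-p) are my own assumptions, and real backends differ on that order.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample(logits, temperature=0.5, min_p=0.1, rng=random):
    """Pick a token index using temperature scaling followed by min-p."""
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
example_logits = [5.0, 4.0, 1.0, -2.0]
filtered = min_p_filter(softmax(example_logits, temperature=0.5), min_p=0.1)
print(filtered)
```

    The point of min-p is that the cutoff scales with how confident the model is: when one token dominates, the unlikely tail gets cut entirely, and when the distribution is flat, more candidates survive.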

      • brucethemoose@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        ·
        edit-2
        8 hours ago

        I use sigma N sampling at 1.0, a slop phrase banlist, and maybe a little rep penalty.

        Beyond that it depends on the usage.

        For scripts or “questioning a document,” it’s as low as can be until it loops. I start with zero temperature. But I don’t really use Gemma for coding, TBH, and it’s not good for longer documents.

        If it’s for a specific language or a very specific script, I sometimes constrain grammar for the language.

        For more “general” writing, like brainstorming or RP or whatever, I start at around 0.7 with minimal DRY sampling and look at the logit percentages in the Mikupad UI. Especially “important” tokens like names or information recall. If the probability of getting correct answers is too low, I turn the temperature down.
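
        To make the “turn the temperature down” step concrete, here is a toy sketch of how the probability of the token you want changes with temperature. The candidate tokens and logit values are made up for illustration, and `prob_of` is a hypothetical helper, not something from Mikupad or any backend.

```python
import math


def prob_of(token, logits, temperature):
    """Probability of `token` under a temperature-scaled softmax."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    return exps[token] / sum(exps.values())


# Hypothetical logits for a name the model is supposed to recall.
candidate_logits = {"Alice": 3.0, "Alicia": 2.0, "Bob": 1.0}

for temperature in (1.0, 0.7, 0.3):
    p = prob_of("Alice", candidate_logits, temperature)
    print(f"T={temperature}: P(Alice) = {p:.3f}")
```

        Lower temperature sharpens the distribution toward whatever the model already ranks first, so it only helps when the correct token is already the top candidate.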

        …But honestly, I tend to use big MoEs instead of Gemma for that, too.


        And if none of this makes any sense…

        Yeah. That’s the problem.

        Sampling was supposed to be a temporary stopgap until looping and such was figured out, but the big LLM devs just never addressed it in production. There are all sorts of interesting papers, including one from Google about sampling logits per-layer, but they don’t implement any of them in the API models.

  • Smith6612@lemmy.world
    link
    fedilink
    English
    arrow-up
    12
    ·
    20 hours ago

    My favorite meme about Google AI is the one where it tries to justify that the pool of the Titanic is not full of water.

  • canadaduane@lemmy.ca
    link
    fedilink
    English
    arrow-up
    13
    ·
    22 hours ago

    404media: “This post is for paid members only” But we’ll sure as hell put ads on it anyway.

  • kreskin@lemmy.world
    link
    fedilink
    English
    arrow-up
    5
    ·
    edit-2
    1 day ago

    None of my management cares if AI agents work well, they just want to get them deployed asap. I dread the day they go into use. They will claim I have no engineering talent or something like that. I’m not sure malicious compliance will work this time but it’s worth a shot.

    On the bright side it’s never too late to be a meth head salvaging copper from around town, and I know where a bunch of metal is at.

  • [object Object]@lemmy.ca
    link
    fedilink
    English
    arrow-up
    178
    ·
    edit-2
    2 days ago

    This is too real.

    Now I get PRs entirely written by Claude from my VP that include things like full plaintext secret keys, or reimplement an API that exists, just shittier.

    “Claude wrote this in an hour, why is review taking so long”

    Uhh because I can’t figure out the diplomatic way to say this is shit and you need to stop without creating an incident, and I don’t want to spend half my day reviewing crap.

      • whotookkarl@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        25
        ·
        2 days ago

        Or spend hours explaining in excruciating detail all the reasons why it’s shit and what they should have done instead. Make sure to throw all the heavy-handed certification standards, strict audit requirements, and mind-numbing bike-shedding naming standards back at them.

        • pinball_wizard@lemmy.zip
          link
          fedilink
          English
          arrow-up
          2
          ·
          9 hours ago

          Yes. This is the way.

          I’m the VP’s ally. Practically their best friend. It’s all these pesky regulations, lawyers, audits and extreme personal liability that are slowing both of us down from doing things the sociopath way… at least until I find a gig with a less sociopathic boss.

    • Pechente@feddit.org
      link
      fedilink
      English
      arrow-up
      69
      ·
      2 days ago

      Yeah also noticing similar bullshit. People send me exact steps on what to do written by ChatGPT that understands exactly nothing about the context and is therefore often wrong or a half truth at best.

      Another client has pushed a single commit to a messy project that added 70k lines and a load of new features. The project is now unmaintainable.

      • Taleya@aussie.zone
        link
        fedilink
        English
        arrow-up
        5
        ·
        edit-2
        10 hours ago

        Our dev tried to send me a generated summary and code. My reply was “Yes, I’ve read your LLM summary. You’re still missing the fact that the script has hardcoded the same IP into every single client and consequently doesn’t work.”

      • webhead@lemmy.world
        link
        fedilink
        English
        arrow-up
        5
        ·
        23 hours ago

        I am not even a developer but I’ve noticed tickets having a response clearly written by AI that misses several things I already talked to the person about over Teams. Like dude, read your own fucking comment before you post. The conclusion is wrong and you know that because we talked about it before you had the AI “figure out the problem” in the first place. Fuck. I know reading logs is really time consuming and annoying but the AI isn’t always very good, or won’t just say “hey, that log isn’t showing what I’m looking for” and instead just hallucinates something.

        I don’t even hate AI, but could we at least use our fucking brains while using the AI? When it spits out code to me for my home projects, I, someone who is not a developer, still look at the code to make sure it’s not say running a loop that will hammer disk looking for 1200 files one at a time instead of pulling a directory listing and searching it or something very similar in the database I’m using. People have gotten so lazy. Maybe they’re tired of their bosses trying to force them and are providing garbage? I don’t know but can we just not? Lol.
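
        The disk-hammering thing looks roughly like this. It is a minimal sketch with hypothetical function names, assuming the job is just “which of these file names are missing from one directory”:

```python
import os


def find_missing_slow(directory, wanted_names):
    """One filesystem call per wanted file: 1200 names means 1200 lookups."""
    return [
        name
        for name in wanted_names
        if not os.path.exists(os.path.join(directory, name))
    ]


def find_missing_fast(directory, wanted_names):
    """One directory listing, then cheap in-memory set lookups."""
    with os.scandir(directory) as entries:
        present = {entry.name for entry in entries}
    return [name for name in wanted_names if name not in present]
```

        Both return the same answer in the ordinary case; the second just asks the disk once instead of once per file. On a local SSD you may never notice the difference, but on a network share or a slow disk it adds up fast.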

        • Taleya@aussie.zone
          link
          fedilink
          English
          arrow-up
          5
          ·
          10 hours ago

          I’ve had multiple people try and use LLMs to troubleshoot. It gives me a great feeling of job security. Those fuckers cannot think, and in fact drove a boss to a screen-punching strokeout by running him in a two-hour circle over something I fixed with two clicks and cognitive function.

          • webhead@lemmy.world
            link
            fedilink
            English
            arrow-up
            2
            ·
            8 hours ago

            Sometimes it can help troubleshoot but you have to already know what you’re doing so you can filter out really stupid suggestions and get to the “oh yeah I didn’t think about that” kind of stuff. If you’re relying on it completely, you’re gonna have a bad time lol.

      • MangoCats@feddit.it
        link
        fedilink
        English
        arrow-up
        7
        arrow-down
        12
        ·
        2 days ago

        Instructions are sloppy, code can be sloppy, but what I find is: when they review code changes they find real stuff. Not all the real stuff, but more real stuff than human reviewers typically find. A code review doesn’t need to be perfect, not even 100% correct, it just needs to show you stuff that you look at and think “damn, good to catch this now instead of in a field problem report a year from now…”

          • MangoCats@feddit.it
            link
            fedilink
            English
            arrow-up
            2
            arrow-down
            2
            ·
            12 hours ago

            We only do about 3-10 reviews a week, depending… it’s not there to replace you, it’s there to help.

            Before AI assistance we would do fewer rounds of review. Now, because the AI is finding things (real things worth fixing), some pull requests get recycled 2-3 times before they’re adequately cleaned up, mostly those from colleagues who haven’t figured out how to use AI to review their pull requests effectively before submitting them.

            Documentation and requirements are better aligned with code, unit test coverage is better, and the developers who use AI to review their code before putting in a pull request generally are getting through on the first pass. You still have to read the documentation and requirements, review the code, but now it’s actually approaching accurate and complete much more closely than it used to.

            Our team is small and diverse, some do embedded C, some do GUI oriented .NET, some do backend processing in Rust / Linux - we all know our domains and there is lots of value in the collective wisdom, but it doesn’t translate super easily or efficiently - AI is helping with that.

            If you’ve got 100 pull requests to review every day - quit. Maybe stick around for the paycheck until you find something better, but that’s not a job, that’s a clusterbomb waiting to go off.

            • BradleyUffner@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              12 hours ago

              I was referring to the people running open source projects that are receiving 100s of reviews per day from people just blasting out PRs.

              • MangoCats@feddit.it
                link
                fedilink
                English
                arrow-up
                1
                arrow-down
                1
                ·
                10 hours ago

                For this, we need to start using (much more) secure ID tech, so you really know who is submitting, and prioritize those who have made good quality submissions in the past. Sadly, this may neglect “unknown” authors, but such is life.

                Also, we may need to recruit more code authors / wanna be code authors to act as code reviewers more of the time, perhaps following the model we use in our commercial operation where all authors also act as reviewers.

  • Vanth@reddthat.com
    link
    fedilink
    English
    arrow-up
    174
    arrow-down
    1
    ·
    2 days ago

    Best part of the article, hat tip to author Emanuel for how he included the correction request:

    After this story was published Google’s spokesperson reached out and asked us to publish a slightly different version of that statement. The new statement no longer stated that “it’s critical that we maintain humans in the loop.”

    • KeenFlame@feddit.nu
      link
      fedilink
      English
      arrow-up
      2
      ·
      19 hours ago

      “No no that is wrong! We said fuck all kids too, we really meant everyone! not just the adults??”

    • Th4tGuyII@fedia.io
      link
      fedilink
      arrow-up
      65
      ·
      2 days ago

      It’s a very damning line to retract, but I don’t think anybody is surprised at this point.

    • A_norny_mousse@piefed.zip
      link
      fedilink
      English
      arrow-up
      49
      ·
      edit-2
      2 days ago

      Google: 🙋 “Erm, sorry, your portrayal of our complete lack of ethics is incomplete. Thank you.”

  • uuj8za@piefed.social
    link
    fedilink
    English
    arrow-up
    85
    arrow-down
    1
    ·
    edit-2
    2 days ago

    Google’s CEO says 75% of the company’s code is AI-generated.

    Everyone should take this with a huge grain of salt. Like all other internal company stat reports, it’s bullshit and manufactured.

    Example: my company has recently introduced a gate on CI. All commits must have “Co-Authored-By: X”. Technically, you can set X=None, but most people aren’t doing that because we’re not stupid and we know the commit history can easily be data mined and used to generate stats on who is or isn’t using AI. And we don’t want to get fired.

    Result: 99% of all new commits use “Co-Authored-By: Claude”. Every commit I make now has “Co-Authored-By: Claude”. Am I using AI? FUCK NO. But, now I have to add that stupid line to any work I turn in.
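
    For the curious, this is roughly how little it takes to turn those trailers into a headline number. It is a sketch of the kind of data mining I mean, not our actual tooling; `co_author_share` and the sample messages are made up.

```python
import re

# Matches a "Co-Authored-By: <value>" trailer line anywhere in a commit message.
TRAILER = re.compile(r"^Co-Authored-By:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE)


def co_author_share(commit_messages, name="Claude"):
    """Fraction of commits with a Co-Authored-By trailer that mentions `name`."""
    if not commit_messages:
        return 0.0
    tagged = sum(
        1
        for message in commit_messages
        if any(name.lower() in value.lower() for value in TRAILER.findall(message))
    )
    return tagged / len(commit_messages)


# Hypothetical commit messages; three of the four carry the Claude trailer.
sample_messages = [
    "Fix cache invalidation\n\nCo-Authored-By: Claude",
    "Refactor auth module\n\nCo-Authored-By: Claude",
    "Bump dependencies\n\nCo-Authored-By: None",
    "Hand-written hotfix\n\nCo-Authored-By: Claude",
]
print(co_author_share(sample_messages))
```

    The script has no way to tell whether the trailer reflects how the code was actually written, which is exactly why the resulting percentage is meaningless.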

    • mcv@lemmy.zip
      link
      fedilink
      English
      arrow-up
      13
      ·
      2 days ago

      This is insane to me. Having a way to easily distinguish AI generated commits from human created ones makes a lot of sense, but lying that your honest, high quality handcrafted commit is AI slop makes it pointless.

      That people feel they need to do this in order to protect their jobs is fucking insane and self destructive.

    • criss_cross@lemmy.world
      link
      fedilink
      English
      arrow-up
      33
      ·
      2 days ago

      We have a commit skill we’re supposed to use. So for non-trivial work that I don’t want the AI to screw up, I do it by hand, then use the skill so it can vomit out a commit message and PR.

      I get the shiny “Co-Authored-By: Claude” and burn a ton of tokens to make myself look “AI Fluent”

    • Steve@startrek.website
      link
      fedilink
      English
      arrow-up
      15
      ·
      2 days ago

      Remember that part in The Big Short where the stripper is talking about all the houses she owns? Similar vibes.

    • 0x0@infosec.pub
      link
      fedilink
      English
      arrow-up
      4
      ·
      2 days ago

      Microslop really went to shit after statements just like that. Can’t wait for Google to implode too.

    • masterspace@lemmy.ca
      link
      fedilink
      English
      arrow-up
      3
      ·
      edit-2
      2 days ago

      We’re a small company so I do the opposite and am avoiding any co-authored tag being applied to the code I publish.

      I review and test my code before it’s published to make sure that it works and that it’s the right solution to the problem, and I’m the one responsible for fixing it if it goes wrong late at night in prod.

      That was the case when I was using Intellisense and codegen tools and that’s still the case now.

      That makes me the author.

      Anything else is a lie, a violation of engineering ethics, and is flat out not SOC2, nor regulatorily compliant for anything that matters.

  • Th4tGuyII@fedia.io
    link
    fedilink
    arrow-up
    73
    ·
    2 days ago

    “We encourage our engineers to vigorously test and critique our internal tools; that candid feedback loop, even via our internal meme generator, is vital to how we build technology”

    Google listening to employee feedback:

    ...

    • uuj8za@piefed.social
      link
      fedilink
      English
      arrow-up
      13
      ·
      2 days ago

      Honestly, that would be great if they just tossed it out the window.

      What they’re probably doing is building a list of who they should lay off next based on the feedback.

    • MrKoyun@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      8 hours ago

      It says in the article that 404 Media recreated similar images to the memes they saw to protect their sources, so there is a chance that the originals were pure gold.

  • Deebster@infosec.pub
    link
    fedilink
    English
    arrow-up
    7
    ·
    2 days ago

    Kinda weird experience to be reading textual descriptions of memes and having to reconstruct them in my head. They had enough to say to not need to pad out their word count that way.

    • Vanth@reddthat.com
      link
      fedilink
      English
      arrow-up
      9
      ·
      2 days ago

      They’re probably doing that to protect the identity of any Google workers providing them with information. If they posted the actual meme, Google could possibly trace it back to an employee and fire them.

      For some of the memes they do have in the article, they note that they are reconstructions and not the actual memes from Google’s internal channels.

      I agree it’s long though, they could have just recreated them and skipped the written description.

  • reksas@sopuli.xyz
    link
    fedilink
    English
    arrow-up
    7
    arrow-down
    1
    ·
    2 days ago

    Paid tons of money to fool around while some who would be willing to work don’t get hired no matter what.

  • ddplf@szmer.info
    link
    fedilink
    English
    arrow-up
    6
    arrow-down
    1
    ·
    2 days ago

    Also a big, chunky and oily FUCK YOU to all of you who work for or aspire to work for FAANG, MAAMA or whatever fucking letters you call it these days