• panda_abyss@lemmy.ca
    19 hours ago

    Batch processing that turns unstructured, free-form text into structured output.

    As a crappy example, imagine you wanted to download metadata about your albums, but they're all labelled "Various Artists". You can use an LLM call to read the album description and fix the artist for each track. Now you can properly organize your collection.
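A rough sketch of what the album example could look like, not the actual pipeline. `call_llm` is a hypothetical stand-in for whatever client you use (any function that takes a prompt string and returns the model's text), and the prompt wording and the "Various Artists" fallback are assumptions for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; tune the wording for your own model and data.
PROMPT_TEMPLATE = (
    "Read the album description and return only JSON mapping each track "
    "title to its artist.\n"
    "Tracks: {tracks}\n"
    "Description: {description}\n"
)


def fix_track_artists(
    description: str,
    tracks: List[str],
    call_llm: Callable[[str], str],  # assumption: prompt in, raw text out
) -> Dict[str, str]:
    """Ask the model for per-track artists and validate what comes back."""
    prompt = PROMPT_TEMPLATE.format(
        tracks=json.dumps(tracks), description=description
    )
    raw = call_llm(prompt)

    # Model output is untrusted: it may not be JSON, or not a JSON object.
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}

    # Keep only the tracks we asked about; fall back when an answer is
    # missing or is not a non-empty string.
    result = {}
    for title in tracks:
        artist = parsed.get(title)
        if isinstance(artist, str) and artist.strip():
            result[title] = artist.strip()
        else:
            result[title] = "Various Artists"
    return result
```

Passing the model call in as a function keeps the batch loop testable with a stub, and the validation step matters because nothing guarantees the model returns well-formed JSON or only the keys you asked for.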

    I'm using the same idea in a different domain, with a more complex set of inputs.

    It can be much more cost-effective than spending days manually tagging data and writing custom importers.

    You can definitely go lighter than LLMs. You can use gensim for category matching, or sentence transformers with nearest-neighbour search (this is basically what Semantle does), but the LLM performed best on more complex document input.
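To show the shape of the nearest-neighbour approach, here is a minimal, dependency-free sketch. It swaps in plain bag-of-words vectors where you would really use gensim or sentence-transformer embeddings, so it only matches on shared words. The function names and the category descriptions are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_category(text: str, categories: Dict[str, str]) -> str:
    """Return the category whose description is most similar to the text."""
    query = embed(text)
    return max(
        categories, key=lambda name: cosine(query, embed(categories[name]))
    )


# Hypothetical categories, each described by a few representative words.
categories = {
    "jazz": "saxophone trumpet swing improvisation quartet",
    "metal": "distorted guitar double bass drums growl",
}
best = nearest_category("A quartet led by saxophone", categories)
```

With real embeddings only `embed` changes and the nearest-neighbour step stays the same, which is why this approach is so cheap to try before reaching for an LLM.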