I’m running Qwen on my own hardware.
I haven’t yet found anything outside its training data that I’d want it to evaluate as a control, but you’re right that it would be a useful exercise.
Here are some examples of the feedback it has given me:
This plot point hasn’t been “earned” and needs more setup to pay off properly.
This dialog is an exposition dump. Find a better way to show, not tell.
This character feels like a vehicle for jokes, and isn’t developed enough.
Most of the advice I’ve gotten so far relates straight back to what I’ve read in writing books and is pretty cut and dried. Some things are a matter of opinion, and I push back when I disagree or when I am deliberately breaking a rule.
Edit:
To your other point, you’re correct that an LLM saying something is good doesn’t mean humans will think so, or vice versa. An LLM is just one tool in the process and doesn’t replace real human feedback. For example, with a comedy, do human readers laugh out loud when reading it? An LLM can judge statistically whether something is intended as a joke and whether the joke is overused, but it can’t tell you whether the joke is actually funny.