# LLMs are still not consistent judges of qualitative work, and small changes to how that work is pres…

- Source: Ethan Mollick (@emollick)
- Published: 2026-04-22 03:12
- AIHOT link: https://aihot.virxact.com/items/cmo90zjni0092sl2g7rm61y8m
- Original link: https://x.com/emollick/status/2046668472449405152

## AI Summary

LLMs are still inconsistent when judging qualitative work, and small changes in how the work is presented affect the results.

Better harnessing and methods (such as multiple judging runs with randomized ordering) would certainly help, but the jagged frontier is still real.

## Body

LLMs are still not consistent judges of qualitative work, and small changes to how that work is presented affect outcomes.

Better harnessing and methods (multiple judging runs with randomized orders, etc.) would certainly help, but the jagged frontier is very much still real.
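
One way to read "multiple judging runs with randomized orders" is sketched below. This is an illustration under assumptions, not the method used in the post or in the quoted benchmark: the `Judge` signature, `aggregate_judgments`, and both toy judges are hypothetical names, and a real setup would replace the toy judge with an actual LLM call. The idea is to shuffle the presentation order on every run, map each verdict back to the original candidate, and tally wins.

```python
import random
from collections import Counter
from typing import Callable, Sequence

# Hypothetical judge signature: receives the candidates in presentation order
# and returns the position (in that order) of the one it prefers.
Judge = Callable[[Sequence[str]], int]


def aggregate_judgments(
    candidates: Sequence[str], judge: Judge, runs: int = 11, seed: int = 0
) -> Counter:
    """Run the judge several times with shuffled order and count wins
    per original candidate index."""
    rng = random.Random(seed)
    wins: Counter = Counter()
    for _ in range(runs):
        order = list(range(len(candidates)))
        rng.shuffle(order)  # randomize presentation order for this run
        shown = [candidates[i] for i in order]
        picked = judge(shown)
        wins[order[picked]] += 1  # map the shown position back to the original index
    return wins


def position_biased_judge(shown: Sequence[str]) -> int:
    """Toy stand-in for an LLM judge that always prefers whatever comes first."""
    return 0


def length_judge(shown: Sequence[str]) -> int:
    """Toy stand-in for a judge that looks only at content (here: longest text)."""
    return max(range(len(shown)), key=lambda i: len(shown[i]))


if __name__ == "__main__":
    essays = ["short draft", "a somewhat longer and more detailed draft"]
    print("biased judge: ", dict(aggregate_judgments(essays, position_biased_judge, runs=200)))
    print("content judge:", dict(aggregate_judgments(essays, length_judge, runs=200)))
```

With a purely position-driven judge, the wins spread across candidates once the order is shuffled, which signals that the verdict tracks position rather than content. A content-driven judge keeps picking the same candidate regardless of order. Averaging over shuffled runs reduces the effect of position on the final tally, but it does not make an individual judgment any more reliable.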

### Quoted tweet

> Lech Mazur: Does an LLM keep the same judgment when you swap the answer order? New LLM Position Bias Benchmark! Judge models compare two lightly edited versions of the same...