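A user notes that Claude comes across as fairly lazy in ordinary chat (especially for technical search), yet through the Claude Code coding agent it can pull exactly the figures needed from papers and complete the task. By contrast, GPT 5.5 and OpenAI's recent models come across as extremely thorough and persistent, while the Codex harness (a coding tool framework) changes the model comparatively lightly. The core comparison is how different models, paired with different harnesses, perform on search and research tasks.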
Given how lazy Claude seems in chat (especially on technical search topics), it's pretty telling how much a harness can make a model more independent and thorough.
GPT 5.5, and many of OpenAI's recent models, seem incredibly thorough -- like they won't give up -- and the Codex harness is a much lighter change to the model.
Of course, I have a lot of uncertainty here, but it's surprising to me how weak Claude's search is when I try the Claude app again. I only use ChatGPT for research, but Claude Code can do wonderful things like getting exactly the right figures from papers I know and inserting them into a slide deck.
Interesting times ahead!