Real-world failures
All of these tests performed far better than I expected given my prior poor experiences with agents. Did I gaslight myself by being an agent skeptic? How did an LLM sent to die finally solve my agent problems? Despite the holiday, X and Hacker News were abuzz with similar stories about the massive difference between Sonnet 4.5 and Opus 4.5, so something did change.