Crosby releases Redline Bench to evaluate AI models for legal work
Summary
Crosby, a startup that operates as a law firm, has launched Redline Bench, a benchmark for evaluating AI models on legal work. Unlike coding, where success is binary, legal tasks are subjective, which makes it hard to define 'good' work. To address this, Crosby's team of engineers and lawyers, including experts from Stripe and Sullivan & Cromwell, built the benchmark on weighted criteria derived from simulated software deal negotiations. It scores AI-generated contract redlines against these lawyer-defined criteria. Initial results show ChatGPT 5.5 leading with a score of 50.5%, followed by Gemini 3.5 Flash at 45.1% and Claude Opus at 44.4%. Crosby aims to provide a transparent, public yardstick that helps lawyers trust AI tools, which matters as billions of dollars are invested on the promise that AI will lower legal costs.
(Source: Business Insider)
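
As a rough illustration of how scoring against weighted criteria can work, the Python sketch below computes the share of total weight that a redline earns. It is not Crosby's implementation: the `Criterion` class, the `weighted_score` function, the example criteria, and their weights are all hypothetical, and the source does not say how Redline Bench decides whether a criterion is met or how it aggregates scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    description: str  # hypothetical lawyer-written expectation for the redline
    weight: float     # hypothetical relative importance of that expectation


def weighted_score(criteria, met):
    """Return the percentage of total criterion weight that was earned.

    `met` holds one boolean per criterion, saying whether the redline
    satisfied it. How that judgment is made is outside this sketch.
    """
    if len(criteria) != len(met):
        raise ValueError("need exactly one judgment per criterion")
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("total weight must be positive")
    earned = sum(c.weight for c, ok in zip(criteria, met) if ok)
    return 100.0 * earned / total


# Invented example: three criteria for a software deal redline.
criteria = [
    Criterion("Caps liability at fees paid", 3.0),
    Criterion("Keeps the mutual confidentiality clause", 2.0),
    Criterion("Adds a 30-day termination notice", 1.0),
]

# The redline meets the first and third criteria: 4 of 6 weight points.
print(round(weighted_score(criteria, [True, False, True]), 1))  # 66.7
```

Under this assumed scheme, missing one heavily weighted criterion costs more than missing several minor ones, which is one way a benchmark can reflect what lawyers consider most important in a negotiation.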