AI coding benchmarks miss long-term code quality degradation from repeated iterative changes.