AI-Generated Code Quality: What the Research Says About Bugs, Security, and Technical Debt
March 18, 2026 · 12 min read · Code Quality, Security, Research
45% of AI-generated code fails security tests according to Veracode's 2025 report testing 100+ LLMs. Code duplication has increased 8x since 2022 per GitClear's analysis of 211 million changed lines. And METR's randomized controlled trial found experienced open-source developers are actually 19% slower when using AI tools. Here is what the research says about AI code quality and what developers should do about it.
Security Vulnerabilities: The 45% Failure Rate
Veracode tested 100+ LLMs across 80 code completion tasks in Java, Python, C#, and JavaScript. 45% of completions introduced an OWASP Top 10 vulnerability. Java fared worst, with a 72% failure rate. Cross-site scripting defenses failed 86% of the time, and log injection was handled insecurely 88% of the time. Only SQL injection showed reasonable results, with an 80% pass rate.
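Log injection, one of the weakest categories in Veracode's tests, is easy to illustrate. Here is a minimal Python sketch of the vulnerable pattern and a fix; the function names and scenario are hypothetical, not drawn from Veracode's test suite:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("auth")

def login_attempt_unsafe(username: str) -> None:
    # Vulnerable: raw user input flows into the log. A username like
    # "bob\nINFO:auth:login succeeded for admin" forges a second log line.
    log.info("login failed for %s", username)

def sanitize(value: str) -> str:
    # Neutralize CR/LF so one input cannot span multiple log entries.
    return value.replace("\r", "\\r").replace("\n", "\\n")

def login_attempt_safe(username: str) -> None:
    log.info("login failed for %s", sanitize(username))
```

The fix is one line, which is exactly why its 88% failure rate is striking: models omit the sanitization step, not because it is hard, but because nothing in the prompt demands it.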
Code Quality: 1.75x More Logic Errors
Qodo's 2025 State of AI Code Quality report surveyed developers and analyzed PR data. AI-generated pull requests contain an average of 10.83 issues each, versus 6.45 for human-written PRs. AI code shows 1.75x more logic errors, 1.64x more maintainability issues, 1.57x more security findings, and 1.42x more performance problems. 65% of developers say AI misses critical context during refactoring and code review.
The Technical Debt Multiplier
GitClear analyzed 211 million changed lines of code (2020-2024) and found code duplication increased 8x with AI adoption. Copy-pasted lines rose from 8.3% to 12.3% of all changes, while refactoring dropped from 25% to under 10%. Code churn — code added and then quickly modified or deleted — was projected to hit 7% in 2025, a sign that developers are shipping AI code and immediately fixing it.
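You can approximate a duplication metric like GitClear's in a few lines. The Python sketch below is a crude proxy of my own, not GitClear's actual methodology: it scores what fraction of a diff's added lines already exist verbatim elsewhere in the codebase, after normalizing whitespace and discarding trivial lines:

```python
def duplication_rate(changed_lines, existing_lines):
    """Fraction of changed lines that duplicate a line already in the codebase.

    Rough proxy for copy/paste detection: lines are compared after stripping
    whitespace, and trivial lines (blanks, lone braces) are ignored.
    """
    def normalize(line):
        return line.strip()

    trivial = {"", "{", "}", ");", "};"}
    existing = {normalize(l) for l in existing_lines} - trivial
    candidates = [normalize(l) for l in changed_lines
                  if normalize(l) not in trivial]
    if not candidates:
        return 0.0
    dupes = sum(1 for l in candidates if l in existing)
    return dupes / len(candidates)
```

Running this over each PR's diff against the pre-merge tree gives a per-PR duplication percentage you can trend over time, the same shape of signal GitClear reports at repository scale.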
The METR Productivity Paradox
METR's randomized controlled trial with 16 experienced open-source developers (averaging 5 years and 1,500 commits on their repos) found that AI tools made them 19% slower. Developers expected a 24% speedup and still believed AI had helped even after the slowdown was measured. Fewer than 44% of AI suggestions were accepted — the review-test-reject cycle consumed more time than manual coding would have.
Practical Defenses
Never ship AI-generated code without security scanning. Run SAST tools (CodeQL, Semgrep, Snyk) on every PR. Treat AI output like junior developer code — review every line. Focus AI on boilerplate and tests where quality risks are lower. Track defect density from AI-assisted vs manual PRs to measure real impact on your codebase. Monitor your AI tool usage patterns with BurnRate to understand which workflows produce the best results: brew install burnrate-dev/tap/burnrate
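The first defense, a SAST gate on every PR, can be a small script. The sketch below assumes Semgrep's documented `--json` output shape (`results[].check_id` and `results[].extra.severity`); the gating policy itself (block on ERROR only) is an illustrative choice, not a Semgrep feature:

```python
import json
import sys

# Severities that should fail the build; WARNING/INFO findings pass through.
BLOCKING = {"ERROR"}

def gate(report):
    """Pass/fail decision from a Semgrep JSON report (semgrep scan --json).

    Assumes Semgrep's documented output shape: results[].check_id and
    results[].extra.severity with values "ERROR", "WARNING", or "INFO".
    """
    blocking = [
        r["check_id"]
        for r in report.get("results", [])
        if r.get("extra", {}).get("severity") in BLOCKING
    ]
    return (len(blocking) == 0, blocking)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: semgrep scan --json . > report.json && python gate.py report.json
    with open(sys.argv[1]) as f:
        ok, findings = gate(json.load(f))
    for check in findings:
        print(f"blocked by {check}")
    sys.exit(0 if ok else 1)
```

Wire the exit code into your CI so AI-assisted PRs cannot merge with an unreviewed high-severity finding, then tune BLOCKING to your risk tolerance.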
Sources: Veracode 2025 GenAI Code Security Report, Qodo 2025 State of AI Code Quality, GitClear AI Code Quality 2025, METR Developer Productivity Study, GitHub Copilot Code Quality Research.