2026-01-05

Code Review in the AI Era: What Actually Matters When AI Writes the Code

How code review changes when AI generates most of the code. What to focus on, what stays the same, and why professional judgment matters more than ever.

~8 min read · code-review, ai-assisted-development, team-dynamics, professional-craft, quality

I'm sharing my personal experience from recent months working with code produced by both humans and AI agents, reviewing it manually and with agents like Copilot or Cursor. I don't have firm conclusions yet, but the subject is becoming super interesting. I hope some of this is useful as you figure out your own approach.

What Changed (And What Didn't)

Last week I approved a PR that looked perfect: the code was clean, the tests passed, and the changes touched only code related to the problem. But a few days later I found myself fighting the same problems the PR was meant to fix; the changes weren't really useful. Maybe the AI code was too human and never actually entered the uncanny valley. The reviewer (me) was trapped by a compelling solution and nice-looking code, but missed the whole point.

For me, code review was always about communication: I want to know that the changes are aligned with the needs that motivated the pull request, merge request, or whatever change request we use. On top of that, I want to be sure the changes don't affect parts of the system that weren't intended to change, and that they meet our other maintainability criteria. There used to be two scenarios: reviewing my own code and reviewing code produced by another person. Now a third variable doubles the number of cases: reviewing AI-produced code written to my own instructions, and reviewing AI-produced code written to a peer's instructions.

I'm changing the way I review code produced by AI: now I feel I need to check the approach first, then the logic, and finally the formatting and style (which used to be handled by linting tools, but now needs another look). Reviewing my own work was always flawed by definition, since I'm not fully impartial. Now I'm also reviewing AI-produced code written to my own instructions. This is the most fun part of the job for me: when I find a silly solution in the code, it's usually the result of a bad prompt.

On the other hand, reviewing changes to a colleague's solution is trickier. AI agents tend to generate more verbose code and more test code (AI is not lazy like me), and I tend to feel overconfident about AI-produced code. It's unusual to find obvious bugs, which makes the subtle ones harder to spot.

The New Review Checklist: What to Look For

Something totally new is the need to check for changes completely unrelated to the goal. It's not hard to catch a coding agent modifying other parts of the system for no apparent reason, as the sketch below illustrates.
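To make this concrete, here's a minimal sketch of the kind of check I have in mind: list the files a branch touches and flag anything outside the paths the task is expected to change. The base branch name and the allowlist of prefixes are assumptions for the example, not a real tool.

```python
# Minimal sketch: flag files in a change that fall outside the paths the task
# is expected to touch. The allowlist below is hypothetical; adapt it per task.
import subprocess

EXPECTED_PREFIXES = ("src/billing/", "tests/billing/")  # assumption: the PR is about billing

def changed_files(base: str = "main") -> list[str]:
    """Return the list of files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def unrelated_changes(base: str = "main") -> list[str]:
    """Files touched by the change that are not under any expected prefix."""
    return [f for f in changed_files(base) if not f.startswith(EXPECTED_PREFIXES)]

if __name__ == "__main__":
    for path in unrelated_changes():
        print(f"review carefully, possibly unrelated: {path}")
```

Even a crude version of this surfaces the "why did it touch that file?" question before you start reading the diff line by line.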

A working solution is not the same as a correct solution. We want a solution that can become part of our codebase, ideally the same kind of code I would write myself. That requires the changes to follow our intended style and to respect the responsibilities assigned to modules, classes, and functions. Code organization and naming should be consistent with the rest of the codebase; some frameworks and libraries are more opinionated and strict about this than others.

Checking the architectural approach is a must: I want to know whether the changes are aligned with the system design. Depending on the coding agent, your documentation, and your codebase, you might need to check more or fewer things. Lately I favour keeping simple markdown documents inside the repository describing the system design, the business rules, and the details of the tooling and procedures used to operate the system.

I prefer to do this kind of review right after the planning phase: it's a review we can perform over the documentation and the agent's plan, before the first line of executable code is written. Changing the plan or fixing the documentation is way cheaper than dealing with code changes spread across many files. That's why it's important to me to ensure the changes stay aligned with those documents; any change to them, or any decision that goes against those principles, must be justified and not be the result of a hallucination.

Checking the business logic seems to be the easy part: good code organization, good naming, and the documentation make everything simpler to review. One of the topics I'm still trying to figure out is the tests. They are super important to avoid regressions and provide hard guardrails for coding agents, but reviewing them is super boring: a lot of repetitive code, a lot of trivial test cases. It's definitely a job for a bot, not for a human. I'm still searching for good ways to use agents to review, summarize, and critique the test cases.
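As a starting point, a small script can at least turn a test suite into an outline that an agent can critique. This is a rough sketch under the assumption of pytest-style files named test_*.py under a tests/ directory; the idea is to feed the resulting outline, rather than the full test code, to a review agent.

```python
# Rough sketch: extract test names from pytest-style files so an agent can
# summarize and critique coverage without a human reading every line of test code.
import ast
from pathlib import Path

def test_names(test_dir: str = "tests") -> dict[str, list[str]]:
    """Map each test file to the names of the test functions it defines."""
    summary: dict[str, list[str]] = {}
    for path in sorted(Path(test_dir).rglob("test_*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        names = [
            node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name.startswith("test_")
        ]
        summary[str(path)] = names
    return summary

if __name__ == "__main__":
    # The outline can then be handed to a review agent with a prompt along the
    # lines of "which scenarios look missing or trivial here?".
    for file, names in test_names().items():
        print(file)
        for name in names:
            print(f"  - {name}")
```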

Security, performance, and edge cases still require a careful review. It's not that the agent lacks proficiency; it's that it optimizes for what it's told. When requirements are unclear or too narrow, the produced code is often great for the known cases, but we also want code that anticipates the most likely evolution.

The Knowledge Problem: When Less Coding Means Less Understanding

AI-written code is sometimes more compact or optimized (harder to read), sometimes longer (more time to read). Either way, we read less code when it's written by AI. And reading less code means we learn less about the codebase, the system, and the problem.

I'm a firm believer that working code is (was?) the best documentation. This new era is forcing us to strengthen the liaison between humans and AI, and textual documentation seems to be a good way to do that. Yes, text is always ambiguous and prone to misinterpretation. But we can build a tight feedback loop from documentation to generated code: when the code is wrong, we refine the documentation and try again. The docs become the source of truth, not something to write after the code is working.

2026 is just starting and LLMs are evolving faster than we are. I've found many cases where implementations from a few months ago could be done much better now. Having the documentation as the source of truth lets us use a newer model and agent to re-implement a part at minimal cost. Depending on the size and scope of the changes, this can be a pure refactor: the tests stay unchanged and ensure no regressions arise. These changes require a different kind of review process, focused on the intent and the business logic.

Maintaining Quality Standards: The Professional's Dilemma

A few months ago we could confidently say "AI code cannot be considered production ready". Nowadays we're a lot closer to the opposite: a well-crafted AGENTS.md or CLAUDE.md file gives you code that matches your standards most of the time. But the devil is in the details; hitting a good standard on average doesn't mean something dangerous won't slip into the codebase.

Reviewing AI-written code is our last line of defense for quality. At the same time, because rewriting code with AI is faster and cheaper, we should reconsider some long-standing best practices. For example, DRY (Don't Repeat Yourself) may not be as important: AI handles repetition much better than we do, because it never gets bored of rewriting the same 500 lines again and again. One thing we do need to consider is cost: repeated code means more tokens to process and less context available for everything else.

So can we add quality later, like adding a topping to a gelato? Probably not, but we can take on some debt and work it off later. As always, it depends on the context and the phase of the project. What's certain is that the dynamics of product development are changing: we can experiment more, and faster, and iterate on the product and the codebase much more frequently. If a new feature isn't working as expected, we can remove it without the usual sunk-cost guilt. If something is working well, we can rework it to fit our standards.

Practical Strategies: Making Review Work in the AI Era

In my experience, instructing agents to follow certain rules, standards, and best practices only works up to a point. The post-coding review is still the best way to ensure the code matches expectations.

My takes on this are:

  • Use AI to review the code after you create a PR or MR, before submitting it for peer review.
  • Review AI-generated code the same way you review your own.
  • Use AI to review code sent to you for peer review.
  • Manually review the code after the AI review, to ensure the AI didn't miss anything.

How to use AI to review the code?

  • I'm using Cursor and Copilot for code review. I'm sure better tools will come along, but for now these are the ones I rely on.
  • Using a different model and/or tool to review than the one that wrote the code is a good idea: it looks at the changes from a different perspective.

Conclusion

Code review is more important, not less, in the AI era. The focus shifts from syntax to architecture, intent, and knowledge transfer. The devil is in the details: we need to be more careful with the code we write and the code we review.

This is still evolving; there's no perfect solution yet. We're still figuring out where AI helps most and where humans remain essential.

Each team is different, each project is different, each codebase is different. We need to experiment and find what works for us.