Introduction
Merge Requests (MRs) on platforms like GitHub and GitLab have become the backbone of modern software collaboration. We already rely on tools like ESLint, Prettier, SonarQube, JaCoCo, Snyk, and Dependabot to maintain code quality, security, and consistency within CI/CD pipelines.
But with the rise of AI-assisted development—exemplified by tools like GitHub Copilot and GitLab Duo—we’re on the brink of a major transformation. Imagine AI code review agents that go beyond code suggestions to provide feedback tailored to team conventions, engineer preferences, and company strategies. Below is my vision for this future, and I’d love to hear your thoughts!
Moving Beyond Static Analysis
Today’s static analysis tools handle basic checks well:
- ESLint/Prettier for style and formatting.
- SonarQube for detecting code smells, bugs, and security flaws.
- JaCoCo for enforcing test coverage.
- Snyk for scanning dependency vulnerabilities.
- Dependabot for automating dependency updates.
While these tools are effective, they can be rigid. They lack the flexibility to fully adapt to a team’s unique style or a project’s specific quirks. AI code review agents could fill that gap by learning from real-world commits, coding patterns, and project histories.
Running in CI/CD Pipelines
An AI code review agent could be a standard step in your CI/CD pipeline—analyzing code and flagging potential issues right after a developer pushes changes. By catching problems before an MR is even opened, we can cut down on repetitive review feedback and focus on meaningful discussions during peer reviews.
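As a concrete (and entirely hypothetical) sketch, such a step could be a small script that runs after each push. The `review-agent.internal` endpoint and the response shape below are placeholders I made up, not a real product API:

```typescript
// Hypothetical CI step: send the pushed diff to an AI review agent
// and fail the job if the agent reports blocking issues.
import { execSync } from "node:child_process";

interface ReviewFinding {
  file: string;
  line: number;
  severity: "info" | "warning" | "blocker";
  message: string;
}

async function main(): Promise<void> {
  // Diff the pushed branch against the target branch.
  const diff = execSync("git diff origin/main...HEAD", { encoding: "utf8" });

  // POST the diff to a (made-up) review-agent service.
  const response = await fetch("https://review-agent.internal/analyze", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ diff }),
  });
  const findings: ReviewFinding[] = await response.json();

  for (const f of findings) {
    console.log(`[${f.severity}] ${f.file}:${f.line} ${f.message}`);
  }

  // A non-zero exit code fails the pipeline before the MR is even opened.
  if (findings.some((f) => f.severity === "blocker")) {
    process.exit(1);
  }
}

main();
```

Failing the job this way keeps the noisiest feedback out of the peer review itself.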
Context Is Key: AI That Understands Your Team and Purpose
To be truly effective, AI code review agents must understand both the code and the environment it’s written in. Beyond syntax and style, these agents must grasp the intent behind each pull request, align with team norms, and adhere to company standards. Here’s how:
Understanding the Purpose of the Pull Request
Pull requests aren’t just lines of code—they’re implementations of specific business outcomes. Most companies connect code changes to tasks or stories in tools like Jira, where each ticket represents a feature request, bug fix, or performance improvement. An effective AI code review agent should bridge this gap, confirming that the pull request’s changes align with the intent specified in the Jira ticket. For instance, if the ticket aims to optimize a query, the AI should check whether the PR includes evidence—like benchmarks—showing actual performance gains.
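To make that concrete, here is a minimal sketch of such a cross-check. The Jira issue endpoint is the platform’s standard REST path, but `askModel`, the prompt, and the `llm.internal` completion service are assumptions of mine, not a real agent’s API:

```typescript
// Sketch: check whether an MR's diff matches the intent of its Jira ticket.

interface JiraIssue {
  fields: { summary: string; description: string };
}

// Hypothetical LLM wrapper; swap in your provider's SDK here.
async function askModel(prompt: string): Promise<string> {
  const res = await fetch("https://llm.internal/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  return (await res.json()).text;
}

async function checkIntent(issueKey: string, diff: string): Promise<string> {
  // Fetch the linked ticket via Jira's standard issue endpoint.
  const res = await fetch(
    `https://yourcompany.atlassian.net/rest/api/2/issue/${issueKey}`,
    { headers: { Authorization: `Bearer ${process.env.JIRA_TOKEN}` } }
  );
  const issue: JiraIssue = await res.json();

  // Ask the model whether the change plausibly implements the ticket,
  // and whether required evidence (benchmarks, tests) is present.
  return askModel(
    `Ticket: ${issue.fields.summary}\n${issue.fields.description}\n\n` +
      `Diff:\n${diff}\n\n` +
      `Does this change implement the ticket's intent? If the ticket asks ` +
      `for a performance improvement, is benchmark evidence included?`
  );
}
```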
Learning from Repositories
The AI could analyze your entire codebase to learn common patterns, preferred libraries, and architectural conventions. If your team consistently favors a functional style, the AI might suggest using `map` or `filter` over imperative loops, as in the sketch below. It can also flag areas where past issues recurred, helping developers preempt mistakes.
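As a toy illustration, here is the kind of rewrite such an agent might propose; the `User` shape is invented for the example:

```typescript
interface User {
  name: string;
  active: boolean;
}

const users: User[] = [
  { name: "Ada", active: true },
  { name: "Grace", active: false },
];

// Imperative style the agent might flag:
const activeNames: string[] = [];
for (const user of users) {
  if (user.active) {
    activeNames.push(user.name);
  }
}

// Functional rewrite it could suggest, matching the team's preferred style:
const activeNamesFunctional = users
  .filter((u) => u.active)
  .map((u) => u.name);

console.log(activeNames, activeNamesFunctional); // [ 'Ada' ] [ 'Ada' ]
```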
Absorbing Team Standards
A team’s unique standards often come from retrospectives or kickoff meetings—like adopting a particular logging framework or standardizing error handling. An AI referencing these norms would highlight inconsistencies early, keeping reviews aligned with collective decisions.
Respecting Company Policies
In larger organizations, strict security or compliance guidelines must be followed. An AI agent could automatically flag deviations—say, if a policy mandates encrypting sensitive data and the pull request uses cleartext. This helps prevent compliance issues from slipping through the cracks.
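A real agent would reason about this semantically, but even a deliberately naive sketch shows the shape of the idea; the regexes below are illustrative stand-ins, not an actual compliance ruleset:

```typescript
// Naive policy check: flag added lines that appear to handle sensitive
// fields in cleartext. A real agent would reason semantically, not via regex.
const SENSITIVE = /(password|ssn|creditCard)\s*[:=]/i;
const ENCRYPTED = /(encrypt|hash|cipher)/i;

function flagCleartext(addedLines: string[]): string[] {
  return addedLines.filter(
    (line) => SENSITIVE.test(line) && !ENCRYPTED.test(line)
  );
}

// The logger line is flagged; the encrypted write passes.
console.log(
  flagCleartext([
    "db.save({ password: encrypt(input.password) });",
    "logger.info(`password=${input.password}`);",
  ])
);
```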
The Challenge of Large Context Windows
According to OpenAI’s tokenizer documentation, one token generally corresponds to about four characters of English text (roughly ¾ of a word), so 100 tokens equate to around 75 words. Modern AI models like GPT-4o and GPT-4o mini can handle up to 128k tokens in their context windows, as noted in OpenAI’s model documentation. That translates to roughly 96,000 words, or about 192 pages of text if we assume a page is around 500 words.
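The back-of-the-envelope math, spelled out:

```typescript
// OpenAI's rule of thumb: 1 token ≈ 4 characters ≈ 0.75 English words.
const contextTokens = 128_000;
const words = contextTokens * 0.75; // 96,000 words
const pages = words / 500;          // 192 pages at ~500 words per page
console.log({ words, pages });      // { words: 96000, pages: 192 }
```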
However, loading every relevant file, policy, and Jira ticket into the model’s context at once simply won’t be feasible for many large enterprises. AI code review agents will inevitably need to leverage additional techniques to handle huge codebases and related documents—such as Retrieval-Augmented Generation (RAG), where only the most relevant snippets are fetched from an external source, or Chunking and Summaries, where large files are broken into smaller sections for more targeted analysis.
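As one possible shape for the RAG approach, here is a minimal retrieval sketch: pre-embedded chunks of code and policy docs are ranked against the diff, and only the top few are sent to the model. The `embed` endpoint is a placeholder of my own:

```typescript
// Minimal RAG sketch: fetch only the chunks most relevant to a diff,
// instead of loading the whole codebase into the context window.

interface Chunk {
  text: string;
  vector: number[];
}

// Hypothetical embedding wrapper; swap in your provider's SDK here.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("https://llm.internal/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  return (await res.json()).vector;
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Rank pre-embedded chunks against the diff and keep the top k,
// so the prompt stays within the model's token budget.
async function topK(diff: string, index: Chunk[], k = 5): Promise<string[]> {
  const query = await embed(diff);
  return index
    .map((c) => ({ text: c.text, score: cosine(query, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((c) => c.text);
}
```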
I’m not an expert on which approach works best in all cases, but as AI code review agents evolve, they’ll need to adapt strategies like these to stay within token limits while still providing meaningful, context-aware insights.
Generating MR Summaries: Instant Context
Once an MR is opened, an AI agent could auto-generate a concise summary of the changes:
- Feature Overview: “This MR adds a caching layer to reduce database load.”
- Key Refactors: “`UserService` is refactored to separate data access from business logic.”
- Potential Risks: “Introduces Redis. Validate compatibility with your deployment environment.”
These summaries help reviewers quickly gauge the scope of changes and focus on the most critical areas.
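One plausible way to produce a summary like this is a single structured prompt over the raw diff; the endpoint and response shape below are placeholders, not GitLab’s or GitHub’s actual API:

```typescript
// Sketch: generate a three-part MR summary from the raw diff.
async function summarizeMr(diff: string): Promise<string> {
  const prompt =
    "Summarize this merge request in three short bullets:\n" +
    "1. Feature overview  2. Key refactors  3. Potential risks\n\n" +
    diff;

  // Hypothetical completion endpoint standing in for a real LLM API.
  const res = await fetch("https://llm.internal/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  return (await res.json()).text;
}
```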
Tailored Feedback: Supporting Engineers of All Levels
AI code review agents could offer nuanced feedback based on each engineer’s experience:
Early-Career Engineers
- Scenario: Potential null-pointer exceptions or repeated code.
- AI Feedback: Suggests null checks, explains underlying risks, and proposes refactoring into helper methods.
Experienced Engineers
- Scenario: Introducing a caching layer for performance.
- AI Feedback: Advises on caching strategies for distributed systems and flags potential concurrency issues.
By aligning feedback with each developer’s skill level, AI can foster learning for juniors while prompting deeper architectural discussions for seniors.
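One simple way to implement this would be to vary the instructions sent to the model based on the author’s profile; the levels and prompt fragments here are my own assumptions, not a shipping feature:

```typescript
// Sketch: adapt review depth and tone to the MR author's experience level.
type ExperienceLevel = "early-career" | "experienced";

function feedbackInstructions(level: ExperienceLevel): string {
  switch (level) {
    case "early-career":
      // Teach: explain the risk and show the fix.
      return "Explain each risk in plain terms and include a corrected snippet.";
    case "experienced":
      // Challenge: skip basics, surface trade-offs.
      return "Skip style basics; focus on caching strategy, concurrency, and failure modes.";
  }
}

// Appended to the review prompt before it is sent to the model.
console.log(feedbackInstructions("early-career"));
```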
Early Examples
GitLab Duo is already making its mark on the Merge Request experience. According to GitLab’s documentation, one key capability is generating automated summaries of code changes, giving reviewers a concise overview of what’s new. Another feature is the ability to suggest improvements directly within the Merge Request interface—developers can then accept or reject these AI-driven code suggestions with a single click.
These features, still in Beta, hint at a future where AI-assisted Merge Requests become a standard offering rather than a novelty. GitLab clarifies that it does not store user data from these interactions, underscoring the platform’s focus on privacy and security.
Meanwhile, other companies have begun offering AI code review agents. Tools like Bito AI provide specialized functionality—analyzing pull requests and suggesting context-aware improvements—but my feeling is these third-party solutions may become obsolete if GitLab or GitHub incorporate similar features natively across their platforms. As these large providers continue investing in AI-driven workflows, we may eventually see a future where built-in AI review is simply part of every developer’s daily workflow.
Conclusion
AI code review agents could be the next big leap in the evolution of MRs. Rather than just running static checks, they’d learn your workflows, adapt to each engineer’s needs, and seamlessly enforce both team conventions and company policies.
From pre-MR checks to auto-generated summaries and in-depth feedback, these agents have the potential to reduce repetitive manual tasks and free engineers to focus on higher-level problem-solving. GitLab and GitHub, in particular, are in a unique position to integrate AI seamlessly into the Merge Request process—once they fully roll out these capabilities, it could transform the daily workflow of developers worldwide. As the technology matures, it’s only a matter of time before AI-driven code reviews become a standard feature in every team’s toolkit.