Performance reviews are one of the most important responsibilities of an engineering manager — and one of the hardest. The challenge isn't effort or intent. It's that the information needed to write fair reviews is spread across months of work, conversations, and tools. This page explains why reviews feel so difficult, what managers commonly try, and what's often missing.
Why performance reviews feel harder than they should
Most engineering managers genuinely want to write fair, accurate reviews. They care about their reports and take the responsibility seriously. And yet, when review season arrives, the process still feels broken.
The problem isn't a lack of diligence. It's that the human brain wasn't designed to retain six months of nuanced observations about multiple people working on different projects. Without a system for preserving context, managers are forced to reconstruct history from fragments: a Slack thread here, a vague memory of a standup comment there, maybe a few bullet points from a 1:1 doc that hasn't been updated since September.
What gets remembered tends to be whatever happened recently, whatever was dramatic, or whoever advocated loudest for their own work. The steady contributor who quietly shipped critical infrastructure in Q1? Easy to underweight. The engineer who had a rough week right before reviews? Easy to overweight.
There's also a structural mismatch. Performance reviews ask managers to evaluate growth, impact, and behavior over an extended period — but the information needed to do that is scattered across dozens of tools, conversations, and contexts that weren't designed for retrieval. Pull requests don't capture mentorship. Jira tickets don't reflect how someone handled a production incident at 2am. Slack messages disappear into the void.
The result is that even well-intentioned managers end up writing reviews that are more impressionistic than evidential. They know something is missing, but there's no practical way to recover it.
What managers usually try (and why it doesn't work)
When managers realize their memory isn't enough, they reach for workarounds. These approaches are reasonable — they're what anyone would try given the constraints. But each comes with tradeoffs that make reviews less fair, not more.
Spreadsheets and running docs. The most common approach is some version of "I'll just write things down as they happen." A shared doc, a personal spreadsheet, a note in the 1:1 file. In theory, this works. In practice, it requires consistent effort over months, often during the busiest periods when there's no time for documentation. The spreadsheet gets updated enthusiastically for the first few weeks, then sporadically, then not at all until two days before reviews are due. What you end up with is a record of whatever you happened to notice during the brief windows when you remembered to write things down — which is its own form of bias.
Last-minute PR scraping. When the spreadsheet fails, the backup plan is usually to pull a list of merged PRs and use that as a proxy for contribution. This is fast and feels objective — there's a number attached. But PR counts reward a specific type of work: small, frequent, code-centric changes. They penalize engineers who spend weeks on a single complex refactor, or who primarily contribute through design docs, code review, mentoring, or incident response. The engineer who merged 47 PRs looks more productive than the one who shipped one critical system that took three months of careful work. Neither number tells you much about actual impact.
Relying on memory anyway. Even with docs and data, most reviews still come down to what the manager can recall. And memory is shaped by factors that have nothing to do with performance. You remember the engineer who pushed back in a meeting last week more vividly than the one who quietly unblocked a teammate three months ago. You remember the incident that woke you up at 3am, but not the dozens of potential incidents that someone prevented through careful work you never saw. Memory is a highlight reel, not a documentary — and the highlights are chosen by cognitive biases, not by relevance.
Recency bias. This deserves its own mention because it's so pervasive. The last few weeks before reviews carry disproportionate weight simply because they're easier to recall. An engineer who struggled in Q1 but finished strong looks like they're "on a good trajectory." An engineer who had a great first half but hit a rough patch in November looks like they're "slipping." Six months of work gets compressed into whatever happened most recently, which isn't fair to anyone.
Over-weighting visible work. Some contributions are naturally visible: launching features, presenting in team meetings, responding quickly in Slack. Others are almost invisible: improving test coverage, mentoring a struggling teammate, doing the unglamorous maintenance work that keeps systems running. Without deliberate effort, reviews tend to reward visibility over value. The engineers who self-promote get credit; the ones who assume their work speaks for itself often don't.
None of these approaches are wrong, exactly. They're just incomplete. Managers use them because they're the best options available without better infrastructure for preserving context over time. The problem isn't that managers aren't trying hard enough — it's that the information they need to write fair reviews decays faster than any manual process can capture it.
Why output alone can be misleading
Most AI tools connected to your engineering systems are excellent at analyzing artifacts. They can read pull requests, summarize tickets, map timelines, and count activity at a speed no manager could match.
But performance is not the same as output.
The parts of an engineer's work that matter most often live outside systems of record:
- 1:1 conversations where priorities shifted
- Periods of intentional learning
- Mentoring others without visible deliverables
- Navigating ambiguous problems
- Collaboration that prevented future issues
Looking only at artifacts can make very different situations appear identical:
- An engineer moving slowly because they are stuck
- An engineer moving slowly because they are mastering a new domain
- An engineer with low output while unblocking the rest of the team
- An engineer investing in quality that prevents incidents later
From a distance, the signals look the same. The context tells a different story.
Good reviews depend on understanding trajectory, not just totals. They require knowing how someone responded to feedback, how their decision-making evolved, and how their influence changed over time. Those elements rarely show up in PR counts or ticket histories.
This is why many managers feel uneasy relying solely on dashboards. The data is real, but incomplete. Without the narrative layer that lives in notes, check-ins, and ongoing conversations, even accurate analysis can lead to unfair conclusions.
Some teams use tools like Vereda as a context layer, preserving the why alongside the what, so review season isn't a reconstruction exercise.
How to prepare for review season before it starts
The managers who feel least stressed during review season are rarely the ones who start preparing two weeks out. They're the ones who built lightweight habits earlier in the year. The goal isn't elaborate documentation—it's having enough context that you're not reconstructing from scratch when deadlines hit.
Start with your 1:1s. After each conversation, spend two or three minutes noting the key themes: what the engineer accomplished since your last meeting, what challenges came up, any feedback you gave or received. You don't need transcripts. A few bullet points are enough. The discipline isn't in the format—it's in doing it consistently. Those notes compound over time into a record that covers the entire review period, not just the parts you happen to remember.
Capture context, not just outcomes. When something notable happens—a project ships, an incident gets resolved, a difficult conversation goes well—write a sentence about why it mattered, not just what happened. "Led migration to new auth system" tells you less than "Led migration under tight deadline; coordinated across three teams; caught edge case that would have caused outage." The context is what makes evidence useful later. Without it, you're left with a list of accomplishments that all look roughly the same.
Track behaviors alongside results. Reviews shouldn't just assess what someone shipped—they should also reflect how they worked. Did they mentor others? Did they handle ambiguity well? Did they communicate proactively when things went sideways? These behaviors matter for growth conversations and for calibration, but they're easy to forget if you're only tracking deliverables. When you notice someone demonstrating (or struggling with) a behavior that matters for their level, note it. A few observations per month is enough to establish patterns.
Don't let recent events dominate. Recency bias is the most common distortion in reviews. Whatever happened in the last few weeks is vivid; whatever happened in March is vague. The structural fix is to ensure you have records from the beginning of the review period that are as accessible as recent ones. Before you write, go back to your notes from the first quarter. Look at projects that wrapped months ago. If you can't remember what someone did early in the cycle and have no records, that's a gap in your process—don't fill it with assumptions.
Keep the system sustainable. Any note-taking habit that requires significant effort will collapse under a busy quarter. The best systems are ones you'll actually use. Five minutes after a 1:1 is sustainable. Thirty-minute weekly documentation sessions are not. A running doc with loose structure beats a detailed spreadsheet you abandon in October. The goal is minimum viable context—enough signal to write fair reviews, captured in a way that doesn't become a burden.
Preparation isn't about creating more work. It's about distributing the work across the year so that review season becomes synthesis rather than archaeology. The managers who do this well aren't working harder—they're working at a more sustainable pace, with better information, and less anxiety when deadlines arrive.