One of the biggest organizational conundrums out there
Discussing performance may very well be my favorite organizational can of worms to open. It’s one of the things that make work organizations exponentially more complex than other forms of organizing, and I’ve been noodling on it quite a bit over the years.
Human performance is a fuzzy concept, especially in a knowledge-work context. But the way it’s utilized is a bit more straightforward. The assessment of performance is used to distribute the collective value that the organization has generated to the members of the organization. More concretely, the assessment drives decisions such as an increase in compensation, a promotion (more responsibilities and more compensation), a termination, or simply maintaining the status quo.
The silver lining here is that understanding that performance is a distributional issue gives us criteria for assessing different ways of managing it. If this is essentially a distributional issue, then what we should care about are procedural justice/fairness and distributive justice/fairness.
If you’re reading this hoping for a big “a-ha” moment and a robust solution, I’m going to disappoint you. I haven’t found a good solution. Yet. And it may very well be that the current way we’re managing performance is “the worst form, except from all the others that have been tried from time to time”, to blatantly misquote Churchill.
What follows below is my messy attempt to dissect this issue. My hope is that by doing so, some smaller pieces of the issue will turn out to be non-issues at all, while other pieces may have better point solutions emerge over time. I found it helpful to break performance management down into three sub-issues: What are we evaluating? Who is doing the evaluation? When are they doing so?
What are we evaluating?
Machine performance is fairly straightforward: both outputs and inputs are visible and predictable, and the dimensions of evaluation are easily defined: speed, quality, efficiency.
Human performance, in a professional setting, is more challenging. If we look at performance as a distributive challenge, we can define it as the gap between the value we generate for the organization and the value we take out of it. Some simplistic examples for illustration: if I’m suddenly doing poor work and the value I’m adding to the organization is lower, while I’m still taking the same salary, we’d say that my performance has declined. Similarly, if I just got promoted and am receiving a higher salary, but am still making the same contribution to the organization that I made prior to being promoted, we’d say that I’m not meeting the new performance expectations.
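Loosely, this gap framing can be written as a back-of-the-napkin formula. The symbols are purely illustrative, a sketch of the definition above rather than a measurement proposal:

```latex
% A loose formalization of the framing above: performance as the gap
% between the value generated for the organization and the value
% taken out of it over the same period.
\[
  \text{performance}(t) \;=\; V_{\text{generated}}(t) \;-\; V_{\text{taken}}(t)
\]
% Doing poorer work lowers V_generated at a fixed V_taken (performance
% declines); a promotion raises V_taken, so V_generated must rise with
% it for performance to hold steady.
```

Both terms, as the next paragraphs argue, hide components that are hard to measure and hard to attribute.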
A big part of the challenge in fairly managing performance stems from the gap between output and outcome, which makes both measuring and attributing value difficult.
The value I take out of the organization certainly includes my salary, as well as costs incurred by the organization such as health-care premiums or the fractional cost of renting the office. But it should probably also include harder-to-measure elements such as the time and energy I take from others. What if I get the job done, but I do it in a way that alienates and demotivates others, perhaps even to a point that causes them to leave the organization?
Unless I’m a salesperson, the value I generate for the organization is even harder to measure, since my monetary contribution is indirect. And the same component of interacting with others, now in the positive case of supporting and helping them succeed, is just as difficult to measure as the negative case.
Attributing value is challenging across two dimensions. The first has to do with the collaborative nature of work. If I was a traveling salesman selling shoe polish door-to-door, you could argue that whether I’m successful in selling the shoe polish is mostly up to me (we’ll touch on an important caveat in a moment) and therefore I should get the full credit for every sale I make. But what if I’m selling enterprise software, and the sales process requires multiple conversations, including a lot of heavy lifting by the sales engineer; the product manager, the marketing manager, and the CEO all have to make guest appearances and address specific concerns in their respective domains; the engineering team has to make a small tweak in the product to make the integration work; and the support team has to commit to a non-standard SLA? How much of the credit for making the sale should I get then?
The second attribution challenge has to do with separating skill from luck or any other element that impacted the outcome and was out of our control. I’ve been obsessing recently over this thought experiment from “Thinking in Bets”:
Take a minute to imagine your best decision last year. Now take a minute to imagine your worst decision.
It’s very likely that your best decision preceded a good outcome and your worst decision preceded a bad outcome. That was definitely the case for me, and a classic case of “outcome bias”. We tend to conflate good outcomes and good decision-making (skill). Which one should we get credit for? Is it fair to get credit/be blamed for things that were out of our control?
A holistic solution may be found by integrating some of the imperfect pieces below, or by taking a completely different approach:
- Goals — successful execution against (SMART) goals is one of the most common approaches to measuring performance. While this approach has many flaws, some can be rectified, for example by using relative goals rather than absolute ones.
- Career ladders — thoughtfully designed career ladders define a connection between a certain level of contribution to the business and a certain level of compensation. Good rubrics span “being good at what you do”, “being a good teammate”, and “having an impact on the business”. In addition to avoiding the short-termism often associated with goals-based performance management, they also partially mitigate some of the side effects of exclusively focusing on just one of the three categories. However, this “balanced scorecard” approach amplifies the “who?” challenge in the performance management process, which we’ll cover in the next section.
- Track record — if you beat me in one hand of poker, it’ll be hard to tell if you’re a better player than me, or you just got lucky. But if you beat me in 90 out of 100 hands, it’d be fair to say that skill must’ve had something to do with it. It’s incredibly difficult to measure a track record in a knowledge work setting, but that has not discouraged companies like Bridgewater Associates from trying. Some may even say, quite successfully.
- Learning — in some ways, we can think of learning as the derivative, in the mathematical sense, of the value that we create, and as a leading indicator of the value we will create, similar to the way that velocity is the derivative of the distance we’re traveling and a leading indicator of the distance that we will travel. A focus on learning helps address some of the value attribution issues, but it does not simplify its measurement by much.
- Avoiding definition — perhaps the purest and most chaotic solution. The case for it is best articulated here. But in a nutshell, since this is a multi-party distribution issue, we may not need a universal definition of performance or fairness. As long as all parties agree that the distributive process and outcome are fair, we are good. Even if they’ve reached those conclusions using different evaluation processes. Deloitte’s “I would always want this person on my team” and “I would give this person the highest possible compensation” sit better with me than the more standard expectations-based rubric (meets, exceeds, etc.).
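The poker point in the “track record” bullet above can be made concrete with a small sketch. Assuming each hand is an independent toss for a player with no edge at all (a 50/50 win chance per hand is my simplification for illustration), the probability that a long winning record arises from luck alone collapses quickly:

```python
from math import comb

def prob_at_least(wins: int, hands: int, p_luck: float = 0.5) -> float:
    """Probability of winning at least `wins` of `hands` hands by pure
    luck, modeled as a binomial with per-hand win probability `p_luck`."""
    return sum(
        comb(hands, k) * p_luck**k * (1 - p_luck) ** (hands - k)
        for k in range(wins, hands + 1)
    )

# A single hand tells us almost nothing: a no-edge player wins it
# half the time.
print(prob_at_least(1, 1))     # 0.5

# Winning 90 of 100 hands by luck alone is vanishingly unlikely
# (on the order of 1e-17), so such a record points to skill.
print(prob_at_least(90, 100))
```

The exact numbers don’t matter; the shape does. One observation can’t separate skill from luck, but a hundred nearly can — which is exactly why track records are attractive and why they’re so hard to accumulate in knowledge work.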
Who should be evaluating performance?
The standard approach to performance management tasks the manager with evaluating the performance of their team members. A big part of that makes sense, since it’s the manager’s job to align the effort of their team with what the organization as a whole is trying to accomplish, so they’re in the best position to evaluate how an individual’s contribution supports those efforts.
However, research suggests that:
Managers overrating their team is an enduring, scientifically proven fact in companies. It’s most pronounced where performance ratings are used to determine compensation, where it’s difficult to assess an employee’s true competence, and where the manager and employee have a strong relationship. (references here)
The collection of peer feedback as input to the manager’s evaluation and the use of a cross-managers calibration exercise help mitigate some of this effect but not all of it, and significantly increase the level of effort in the overall process.
Having individuals evaluate themselves is certainly an option, but here as well, research suggests that it may not lead us to a process and outcome that would be considered fair:
Self-perceptions correlated with objective performance roughly .29 — a correlation that is hardly useless, but still far from perfection (source)
Our immediate team is the next usual suspect, as the people with the most visibility into our work and contribution, though with a more limited view on the impact we have on the organization at large. Furthermore, if some skills-based elements are integrated into our definition of performance (career ladders), then it doesn’t make much sense to have a more junior/inexperienced member of a team assess the skill of a more senior/experienced member.
This, among other things, is what led Google down the path of having a standalone committee of more senior members evaluate the performance of more junior members of the organization. However, while they are able to evaluate the skill elements of performance more accurately, being so disconnected from the actual context in which those skills are applied makes it harder to assess the impact components, increasing the risk of “confusing motion for progress”.
For team members that are not in an individual contributor role, another option becomes available: under a servant leadership paradigm, the people the leader is serving should be the ones evaluating their performance. However, the typical way that these roles are designed tends to contain responsibilities beyond the team being served, which the team may not have visibility into. Power dynamics will also add some distortion to the evaluations. And the same competency question exists — using a more extreme analogy: as patients, are we able to evaluate how good our doctor is, or just their bedside manner?
We can also look at a more generalizable case of the servant leadership model and argue that we all have customers/stakeholders that we serve, whether internally or externally. Therefore, they should be the ones evaluating our performance since, at the end of the day, the value we generate goes to them. We can think of our manager as a customer (in their alignment of efforts capacity), we can think of our teammates as customers (utilizing our advice, feedback, and expertise), and of course, the people who are using our work product, be it another engineering team, the organization at large, or external customers. Having multiple customers/stakeholders provide their evaluation on the pieces of value that matter to them, certainly complicates things, but not in an insurmountable way. Yet the patient/doctor challenge still applies, to an extent, in this setting as well.
When should we be evaluating?
Though perhaps less contentious than the first two questions, the timing and synchronization of the evaluation process are not without their challenges either.
Performance evaluations are typically done in a synchronized cycle once a year. The synchronicity allows for calibration and a fair allocation of budget, and the annual cadence supports the observation of long-term patterns of performance (it smooths out short-term fluctuations) and keeps the overall process effort in check. The annual cadence, in particular, has drawn a lot of criticism. Some of it is justified, like the impact recency bias has on the process, and the obligation to address significant changes in performance, in either direction, more quickly than once a year. Some of it is unjustified. You can and should provide and receive developmental feedback more frequently than once a year. And you can and should set, modify, and review goals more frequently than once a year. Neither has anything to do with the cadence by which you evaluate performance. If we stick to a synchronous, cadence-based approach, a review every six months seems to be a better anchor point.
However, there is a growing body of alternatives to the synchronous, cadence-based approach, in which evaluation happens more ad hoc, based on a natural trigger or an emerging need. Deloitte’s project-based work lends itself well to an evaluation triggered at the end of the project (projects usually take 3–9 months). The Bridgewater Associates approach mentioned above essentially conducts a micro-evaluation at the end of every meeting/interaction to construct a track record that’s rich enough for analysis. In some self-managed organizations, the process that can lead to a team member’s termination can be triggered at any time by any member of the team. And in others, team members can decide when they want to initiate the process for reviewing and updating their own salary. A key concern here is that unintended bias will be introduced into the process. For example, someone with stronger self-advocacy skills or a more positive perception of their performance will trigger the process more frequently than someone who is just as worthy of a raise but is more humble or less confident in their skills. Some of those concerns can be mitigated by putting in place cadence-based guardrails (an automatic review X months after the last review) and having a strong coaching and support culture.
So where does this 2,000+ word essay leave us? Definitely not with a comprehensive solution. But that wasn’t the intent, either.
Understanding that performance is a distributive challenge, balancing value generated and value taken, was illuminating to me. It then allowed me to understand the core drivers behind the difficulties in assessing those elements, and how different solutions either mitigate or amplify them. While it’s even clearer to me now that there may not be a perfect solution, it does seem like some solutions are better than others. At least in my mind.
But perhaps most importantly, it made me realize how many of the core beliefs and assumptions that underlie this foundational organizational practice are often left implicit and uncommunicated. The simple act of making them explicit, coupled with painting the whole landscape of alternative solutions and the trade-offs that they entail, can go a long way in driving a stronger collective sense of fairness in any organization, regardless of the assumptions/beliefs they have, the trade-offs they choose, and the solutions they implement.
And enhancing that collective sense of fairness is exactly what we were trying to do to begin with.