Untangling skill and luck [Mauboussin]

In the months following my most recent post about performance, I’ve been noodling on one key aspect of the challenge: if we take a more outcome-based (rather than output-based) approach to evaluating performance, how do we separate outcomes caused by luck from outcomes caused by skill?

I started off by reading Annie Duke’s “Thinking in Bets”, which had been sitting in my queue and had received good feedback from colleagues. I finished the book somewhat disappointed: I gained some good insights on pursuing the truth and building a stronger decision-making process, but not a lot that pertained to assessing performance and untangling luck and skill. However, Duke did mention another book with a rather promising title by Michael Mauboussin:

The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing

Random side note: the cover art for the two books looks disturbingly similar.

Wary of committing another big chunk of time to a book on this topic, I decided to look for more lightweight formats and found this Talks at Google video for the more audio-visually inclined and this 25iq blog post for the more textually inclined. Both provide good summaries of the major themes covered in the book. There’s a lot there, including plenty of tangents that are interesting in their own right, but I’ll try to focus on one arc that’s relevant to my own area of inquiry.

Different domains of performance fall on a spectrum between pure luck and pure skill, but all of them have a combination of some luck and some skill.  

Source: The Success Equation

As a domain evolves, it becomes more dependent on luck than skill. That’s not because skill matters less, but because knowledge dissemination happens more quickly and cheaply, causing skill to be distributed more uniformly. Mauboussin refers to that phenomenon as “The Paradox of Skill”. 

Source: The Success Equation
The difference between Olympic Men’s Marathon 1st and 20th place times shows a similar pattern
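
One way to see why more uniform skill pushes a domain toward luck is a toy model. This is my own illustration, not Mauboussin’s formulation: assume each outcome is simply skill plus luck, drawn independently, so the share of outcome variance that comes from luck is var(luck) / (var(skill) + var(luck)). The function name and the standard deviations below are hypothetical.

```python
def luck_share(skill_sd: float, luck_sd: float) -> float:
    """Fraction of outcome variance due to luck, assuming
    outcome = skill + luck with independent components (a toy assumption)."""
    skill_var = skill_sd ** 2
    luck_var = luck_sd ** 2
    return luck_var / (skill_var + luck_var)


# Hold luck constant and shrink the spread of skill, as in a maturing domain.
for skill_sd in (2.0, 1.0, 0.5):
    print(f"skill_sd={skill_sd}: luck share = {luck_share(skill_sd, 1.0):.2f}")
```

In this sketch nobody gets less skilled and luck does not grow; the luck share rises only because the spread of skill narrows.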

The strategy for “how to get better” also varies depending on where the performance domain falls on the luck-skill spectrum. The closer it is to the skill edge of the spectrum, the more a deliberate practice strategy will yield better outcomes. The closer it is to the luck edge, the more the emphasis needs to be on a strong decision-making process. The latter helps frame Duke’s book more clearly: since poker falls closer to the luck edge of the spectrum, its heavy emphasis on the decision-making process makes a lot of sense.

It’s worth noting, however, that this point does not seem to be backed by much evidence, at least in the resources that I reviewed (it may be treated differently in the book).

Mauboussin offers the following criteria for evaluating the process: 

  • Analytical — finding edge and figuring how much to bet on that edge.
  • Behavioral — understanding the common biases we all tend to fall for and weaving into the process methods to mitigate and manage those.
  • Organizational — avoiding “agency costs” (misalignment of incentives). Is the organization helping or impeding the quality of the decision?

So where does all of this leave us with regards to performance management? 

  • It supports the claim that variance in outcomes may have more to do with luck than skill. 
  • This gets compounded in more mature domains where the “Paradox of Skill” is in full effect. 
  • It supports the shift from focusing on the outcome to focusing on the process when evaluating performance. 

Did it solve our problem? No. Did it get us closer to a solution? Yes. Baby steps…

Remote Hiring [Husney]

source: Parabol.com

Jordan Husney, co-founder and CEO of Parabol, gave a great talk about remote hiring as part of NOBL’s Change@Work conference, which he later transformed into an even better blog post:

6 steps to hiring as a fully-remote team 

Parabol’s team has been fully remote for the last 5 years, so while many organizations had to transition to hiring remotely relatively recently, the Parabol team has a few reps under its belt, and it’s great to learn from their experience.

Jordan is an incredibly sharp thinker, so I’d highly recommend reading his post in its entirety to fully benefit from his deep observations. Below, I’ll only outline Parabol’s hiring process at a high level and offer my perspective on it.

1. Application

The application process is extremely lightweight: contact info, work eligibility, and relevant materials that the candidate thinks attest well to their fit for the role. Note that a resume is not required (but is an option), which I’m a big fan of, as resumes are often bad predictors of fit. The one tweak I’d suggest here would be to offer a fast-track option (quicker application review time) that requires completing a short assignment, demonstrating deeper interest from the candidate.

2. Optional pre-screen 

Throughout the process, there’s an intentional effort to not waste either the candidate’s or the team’s time and this is a good example of that. The outcome of reviewing the application doesn’t have to be a definitive pass/fail. If the outcome of the review is inconclusive, the team simply emails the candidate asking a specific question or requesting additional information, rather than forcing a definitive, suboptimal outcome — passing on a candidate who had a shot or wasting time with a borderline candidate.

3. Phone screen 

A 30-min phone call (sometimes shorter) where the agenda is optimized to reject a candidate who’s not a fit as quickly as possible, by asking the biggest question first. Parabol’s “big question” is very straightforward: 

Compared to your previous roles, what would you like to do more of and less of in your next role? And why does Parabol feel like a good fit for you?

However, it packs a lot of insights, allowing the team to get a rough assessment of the candidate’s self-awareness, motivation, alignment of interests, excitement about the opportunity, and level of verbal communication skills. 

At the end of the screen, the baton for driving the process forward is passed to the candidate. If they’d like to move forward, they’re asked to send the team an email with any questions that they didn’t get answered today and want answered as part of upcoming conversations. 

I LOVE this little tweak! Not only does it give the team a strong signal on the candidate’s level of interest in the role and spare them from candidates who would show up to the interview day just because they were invited, it is also, and perhaps more importantly, a deeply empathetic way to connect with the candidate, acknowledge that this is a two-way evaluation process, and, in a small way, allow them to co-design the remainder of the process to fit their needs.

4. Skills assessment: 2 months, 2 weeks

A 30–60 minute session in which candidates are asked to look critically at Parabol data and ask questions in order to create their own onboarding plan and scope out about 2 months of work.

Towards the end of the session, they are given a take-home assignment (that’s emailed back to the team once complete) in which they are asked to distill the plan down to: 

  1. The 3–5 things they’d like to get done in the first 2 months.
  2. The 3–5 things they’d like to get done in the first 2 weeks.

I’m a big supporter of the overall approach of avoiding brain teasers and various whiteboarding exercises for assessing skills. However, there’s some nuance that’s not fully captured in the description of this step that may or may not cause it to introduce bias of its own. 

Most of us are pretty bad at engaging with out-of-context hypothetical scenarios: thinking about how we’d act in a situation we’ve never been in before, or how we’d solve a problem we’ve never solved before. This gets compounded if we have to do that “thinking on our feet”, without time to fully digest the new situation and pattern-match it to a challenge we have faced before.

The “live” portion of the exercise outlined above runs that risk, though it can be mitigated by teeing up the conversation and sharing the data ahead of time. Recording the interview and making the recording available for the take-home assignment, as well as ensuring that follow-up questions are encouraged, can further mitigate some biases. 

Personally, I’d still couple this exercise with a deep dive on a recent project that the candidate was involved with/led. Hearing the candidate truly in their element, speaking about something that they’re an expert on (their own experience) can be a good counterbalance for some of the challenges with the hypothetical exercise. 

5. Cultural assessment

This is a 60-minute group session (a member from each team is present) aimed at assessing the candidate’s alignment with Parabol’s 3 core values: transparency, empathy, and experimentation.

The format uses “tell me about a time…” questions (“Can you think of a time when you last lost your cool?”) and follow-up questions to explore deeper (“if we were to ask the other person what their version of this story would be, what would it sound like?”).

The laser focus on values alignment, rather than broad and fuzzy “culture fit” is fantastic. 

However, the method, as Jordan points out himself, is imperfect in ways that go beyond needing to be mindful that “absence of evidence is not evidence of absence”. “Tell me about a time” questions suffer from the same retrieval/out-of-context challenges as hypotheticals. I don’t keep a running list in my head of times that I lost my cool, and it may be difficult for me to think of one on the spot. Yet that has little to do with my actual alignment with the company’s values. “Tell me about a time” questions run the risk of assessing preparedness for the particular question more than the essence of the response itself.

An alternative approach would be similar to the one outlined in the previous section: asking broader experience questions and zooming in from there. For example: What did you like/dislike the most in your previous role? What were your greatest strengths/areas of growth in that role? What would your manager say if we asked them? What was your proudest achievement? These questions are not without faults of their own, but they are better than “tell me about a time” questions, in my opinion.

6. Contract-to-hire “batting practice”

Rather than forcing the team towards an expensively reversible “hire/don’t hire” decision, after the cultural assessment the team answers a different question, consistent with their experimentation/safe to try value: 

Do we want to put some of our company’s money and more of our team’s time to try working alongside this candidate?

A 20-hour task is picked, often from the onboarding plan the candidate created in the skills assessment interview, and the candidate is offered a 2–4 week part-time contract to complete it, depending on their availability. At the end of the project, the candidate reviews the deliverable with the team and they conduct a shared retrospective, after which the team needs to make a unanimous decision on whether to extend a full-time offer to the candidate.

Conceptually, I’m a big supporter of this type of contract-to-hire assessment as a way to give both parties a better feel for what it would be like to work together. Practically, it can be a challenging commitment for many candidates with existing full-time jobs and family obligations. 

My only other hope is that the team is embodying the “safe enough to try” value in their final decision as well, looking for consent, rather than consensus on that final decision. 

Taking a step back, the Parabol process is a great blueprint for a highly effective remote hiring process. I’ve outlined the tweaks that I’d make to make it even better. You should consider your own. The only big thing that I would have liked to see more of, is carving out more time for the candidate to assess the company, not just for the company to assess the candidate. While I didn’t see it listed in the post, one way to go about it is still credited in my head to Jordan: have one of the interviews be an interview where the candidate explicitly interviews an employee in the company rather than the other way around. I’ll let this be my parting thought for this post. 

Hiring for Conscientiousness [Osman]

Source: Holloway.com

A pretty neat piece by Ozzie Osman that’s been sitting in my backlog for a few months now: 

Hiring for Conscientiousness

The gist is pretty straightforward. Osman defines conscientiousness:

Conscientious people have a desire to do good work, and are self-motivated to perform well regardless of whether someone is watching over them. They are action-oriented, dutiful, and careful.

He then makes a compelling case for why conscientiousness should be an attribute to look for in our hiring process, and offers a sample set of questions that can help evaluate it.

Osman starts by building on Andy Grove’s framework for “effectiveness”, decomposing it into two main drivers: “skill” and “will”. Skill is decomposed further into a stable and general component, “intelligence”, and a dynamic and specific one, “experience”. The latter can grow over time, with more opportunities to perform the specific task. Similarly, will can be decomposed further into a general component, “conscientiousness”, and a specific component, “engagement”. Conscientiousness affects a person’s base level of motivation and how much they care about work, whereas engagement is more context-specific and can vary with the task at hand, the relationship with their manager, the current level of morale, etc. Osman posits that conscientious people may experience times of lower or higher engagement, but as a general rule of thumb, they always care about their work and perform it to the best of their ability.

Finally, and sadly, as somewhat of a disjointed afterthought, Osman highlights the importance of “values alignment”, which he distinguishes from the superficial/erroneous “culture fit”, as an additional hiring criterion but he doesn’t integrate it fully into the framework.

With the full set of 5 attributes in mind (intelligence, experience, conscientiousness, engagement, and values alignment), Osman observes that most strong recruiting processes do a good job evaluating 4 out of the 5 attributes, but usually do not address conscientiousness. He offers the following questions as jumping-off points for assessing a candidate’s level of conscientiousness:

  1. Ask them to walk you through a past failure — conscientious candidates will often define their failures by their impact on their commitments and will move mountains to avoid (or fix) such failures.
  2. Ask them about a time they weren’t able to meet their commitments — a more specific version of the above question aimed at getting a more nuanced understanding of the way they view their obligations to others. 
  3. What motivates them to work, and what does success mean? — Conscientious candidates will have a more outward-facing view on success (impact on others/the company) and can often balance long-term and short-term success, avoiding short-term optimization. 
  4. Have them tell you about a time they worked on something they didn’t enjoy — Willingness to do unpleasant work if it’s important to their team or company is a positive sign of conscientiousness.
  5. Look for evidence of side-projects or things that go above and beyond 
  6. What triggered them to leave past (or current) jobs, and how did they go about leaving? Thoughtfulness about what they work on and deliberate regard for transition plans are additional positive signs of conscientiousness.

Personally, I’m not a big fan of out-of-context “tell me about a time when…” questions (#1, 2, and 4) since they often test recall abilities and favor candidates who happened to prepare for the specific question asked. But that can be easily addressed by starting with a broader question like “tell me about your most recent project” and moving into more specific questions while already within that familiar/fresh context: What worked well and what didn’t? (#1) Did you have to reset expectations? How? (#2) What parts of the project were unpleasant? (#4)

Since conscientiousness is a Big 5 personality trait, another alternative would be to utilize a scientifically validated method for assessing conscientiousness. 

Knowledge Management nuggets [Brier]

source: variance.com

I recently listened to a webinar by the team behind Variance which I found to be highly informative. The first part was an introduction to Product-led Growth (PLG) and Product Qualified Leads (PQLs) which is too far outside of the scope/focus of this publication to cover here but quite interesting for business nerds like myself. 

This post focuses on the second part, delivered by Noah Brier and detailed in full here: 

6 Rules of good documentation 

There were two highly useful knowledge management nuggets in that section that are worth highlighting: 

Nugget #1: Writing is being used in the service of four different purposes 

  1. Writing to communicate — get ideas across.
  2. Writing to converse — synchronous.
  3. Writing to think — as a way to crystallize and firm up abstract ideas/connections.
  4. Writing to archive/document— to make knowledge explicit and sharable. 

It is often the case that writing that was used to serve one purpose cannot be used effectively to serve a different purpose. So the next time you’re digging through a long Slack exchange (#2, writing to converse), getting frustrated while trying to find that small bit about how to set up the environment variables so the software works correctly (#4, writing to document), you will know why.

Nugget #2: 6 rules of good documentation. 

Digging deeper into the fourth purpose, Brier offers the following list as guidance: 

  1. Fit for context.
  2. Clearly written and to the point.
  3. Visual where possible.
  4. Skimmable (can easily skip irrelevant sections).
  5. Up-to-date.
  6. Discoverable and tracked.

KM nerds can endlessly debate additions, omissions, and refinements to the list, but I think they’d agree that it’s a pretty great starter list. If your documentation checks the box on those 6 things — you’re in good shape. 

I particularly appreciate the inclusion of #5 and #6 on the list, which go beyond the way the text is structured to highlight a couple of additional elements that have tripped up many documentation efforts that I’ve seen.

And as a useful double-click on #1 (fit for context), Brier offers an adaptation of a framework developed by Daniele Procida, captured in the diagram above. It distinguishes between different documentation artifacts depending on whether the documentation is aimed at helping the reader perform an action or understand a concept, and whether consuming the content is self-directed or guided.

OWKRs (not a typo)

Pictured: the Great Auk penguin (inspired by the O’Reilly “Effective AWK Programming” book cover)

Goals are a core practice in many organizations and therefore the topic of several blog posts in this publication, from “Goals: connecting strategy and execution”, through “Why setting ambitious goals backfires” and “Goals gone wild”, to “How we align our goals”.

The challenge with goals is captured beautifully when we look at them through the framework outlined by Donald Sull in the first piece above for the 4 different uses for goals: 

  1. Improve individual performance
  2. Drive strategic alignment
  3. Foster organizational agility
  4. Enable members of a networked organization to self-organize their activities 

#1, in particular, is rife with pitfalls and tends to draw most of the heat when a case against goals is made. Yet if we think about goals less as a target to be hit and more as an intent to align on — it’s clear that they play a critical role in supporting #2. 

More on this here

Abandoning goals altogether is probably a no-go. So how can we shift the way we set and articulate goals to be more supportive of that?

Much has been written about OKRs, the most popular goal structure in use today, and in recent years more nuanced pieces have addressed some of the common pitfalls in how they are phrased and set, for example, avoiding the “OKR cascade”. However, none that I know of have suggested any changes to the OKR structure itself, which is what I intend to do today.

If we intend to use OKRs primarily as an alignment mechanism, the structural gap becomes clear: the “objective” describes the goal that we’re working towards, but it doesn’t connect it to the broader strategy. It doesn’t help answer the most meaningful question that a conversation should be centered around: 

Why is this goal the best thing you could do to advance our strategy?  

It is in answering this question that the biggest assumptions and interpretations are being made and the risk of meaningful misalignment is highest. Yet, we leave the answer to that question implicit, hoping that all parties involved are skilled enough to uncover it on their own. 

No more. Introducing: OWKRs. 

A small, but meaningful tweak to the traditional OKR structure: 

  • Objective
  • Why? (new) — a short (2–3 sentences) explanation of why this goal is the best thing that you could do to advance the strategy. 
  • Key Results 
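
For teams that keep their goals in a tool or a script, the structure above could be captured as a small record type. This is only a sketch under my own assumptions: the class name, the naive sentence counter, and the extra checks (a non-empty objective and at least one key result) are hypothetical, and the example goal is made up.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OWKR:
    objective: str
    why: str  # why this objective is the best way to advance the strategy
    key_results: list[str] = field(default_factory=list)

    def why_sentence_count(self) -> int:
        # Naive split on ., ! or ? followed by whitespace or the end of the text.
        parts = re.split(r"[.!?]+(?:\s+|$)", self.why)
        return len([p for p in parts if p.strip()])

    def is_well_formed(self) -> bool:
        # The 2-3 sentence "Why?" comes from the structure above; the other
        # two checks are assumptions added for illustration.
        return (
            bool(self.objective.strip())
            and 2 <= self.why_sentence_count() <= 3
            and len(self.key_results) >= 1
        )


# Hypothetical example goal.
example = OWKR(
    objective="Cut new-hire ramp-up time in half",
    why="Our strategy depends on scaling the team quickly. "
        "Slow onboarding is the main bottleneck today.",
    key_results=["Median time to first shipped change under 2 weeks"],
)
print(example.is_well_formed())
```

Making the “Why?” a required field means an objective without a stated rationale fails the check, rather than slipping through unnoticed.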

My hypothesis is that making the “Why?” explicit in the structure will shift the focus of the O(W)KR setting conversation to discussing the underlying assumptions in selecting the objective and catching any critical misalignments sooner. 

And as a bonus point, OWKR is an anagram for “work”… 🙂 

Visualizing the voice of the employee [Coolen]

Patrick Coolen is the Global Head People Analytics, Strategic Workforce Planning and HR Survey Management at ABN AMRO, the third-largest bank in the Netherlands. Recently he penned a great piece about one of my favorite topics:

Visualizing the voice of the employee

In this piece, Coolen outlines how they conduct and digest engagement survey data at ABN AMRO. 

Data collection

The engagement survey is SUPER simple and lightweight, containing only 3 questions:

  1. How likely are you to recommend our organization to a friend or relative as an organization to work for? (quantitative, NPS-like question)
  2. What is our organization doing well as an employer? (qualitative, “Top” question)
  3. What could our organization do better as an employer? (qualitative, “Tip” question)

To get a more continuous view of the data while avoiding survey fatigue, they run the survey monthly but ask only 1/12 of the employees to take it each time, which is possible because ABN AMRO is a large enough organization. They use a stratified sampling approach to ensure that each sample is representative.
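
To illustrate the idea, here is a minimal sketch of a rotating stratified sample. It is not ABN AMRO’s actual procedure: I’m assuming each employee belongs to a single stratum (say, a business line) and is surveyed exactly once per 12-month cycle. The function name `monthly_panels` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def monthly_panels(employees, months=12, seed=0):
    """Assign each (employee_id, stratum) pair to exactly one monthly panel.

    Within each stratum, employees are shuffled and dealt round-robin across
    the months, so every panel gets roughly 1/months of every stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for employee_id, stratum in employees:
        by_stratum[stratum].append(employee_id)

    panels = [[] for _ in range(months)]
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        offset = rng.randrange(months)  # avoid always favoring the first month
        for i, employee_id in enumerate(members):
            panels[(i + offset) % months].append(employee_id)
    return panels


# Hypothetical workforce: 40 people in "it" and 80 in "retail".
staff = [(f"e{i}", "it" if i % 3 == 0 else "retail") for i in range(120)]
panels = monthly_panels(staff)
print([len(p) for p in panels])
```

Dealing round-robin within each stratum keeps every monthly panel close to the overall mix, which is the property that makes month-over-month comparisons meaningful.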

I LOVE the lightweight approach and the balance of a single quantitative question and the two “top & tip” open-ended qualitative questions, as well as leveraging the size of the organization to reduce survey fatigue without jeopardizing the quality of insights.

My one nit is that I’m not a huge fan of the NPS-like quantitative question and would probably replace it with a different quantitative metric that has a causal link to performance. 

Data analysis

The extreme simplicity of the survey and the open-endedness of the qualitative questions do create some non-trivial data analysis challenges in classifying the responses, which Coolen’s team did a brilliant job of overcoming.

First, they “normalized” the responses by translating them all into a single language (English), splitting responses with multiple subjects, lower-casing all text, removing punctuation, and lemmatizing key words.
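
A rough sketch of the simpler normalization steps, using only the Python standard library, might look like the following. It is not the team’s actual pipeline: translation and lemmatization are left out because they need external tools, and splitting on sentence enders and semicolons is my naive stand-in for splitting responses with multiple subjects. All function names are hypothetical.

```python
import re
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def split_subjects(response: str) -> list[str]:
    # Naive stand-in for subject splitting: break on sentence enders,
    # semicolons, and line breaks.
    parts = re.split(r"[.;!?\n]+", response)
    return [p.strip() for p in parts if p.strip()]


def normalize(text: str) -> str:
    # Lower-case, drop ASCII punctuation, and collapse runs of whitespace.
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return " ".join(cleaned.split())


def preprocess(response: str) -> list[str]:
    # One normalized string per subject; empty fragments are discarded.
    normalized = (normalize(part) for part in split_subjects(response))
    return [n for n in normalized if n]


print(preprocess("Great colleagues! Flexible hours; good coffee."))
```

Splitting happens before punctuation is stripped, since the punctuation is what marks the boundaries between subjects.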

Then, they evaluated several machine learning classification algorithms, landing on Support Vector Machine as the best candidate, and refined its precision further using a supervision process.

The output of the data analysis phase is the classification of every response into one of 150 topics, which, in turn, roll up to a smaller set of “expert domains” (Recruiting, L&D, IT, etc.).

Data visualization

The data is then presented and made available to the entire organization using the bubble chart below where each bubble represents a topic: 

source: Patrick Coolen
  • The bubble is larger the more responses map to that topic.
  • The bubble is higher the more the topic showed up in “top” responses, rather than “tip” responses. 
  • The bubble is positioned further to the right, the more positive the responses to the quantitative question were when the topic was brought up in the qualitative questions. 

The area of the chart can be segmented into 4 quadrants driving different actions: 

  • Topics (bubbles) in the top-right — Celebrate — things that the organization does well and are positively correlated with the quantitative measure. 
  • Topics (bubbles) in the bottom-left — Focus Areas — things that the organization does not do well, and are negatively correlated with the quantitative measure. Therefore, they are the areas where the opportunity for impactful change is the highest. 
  • Topics (bubbles) at the bottom-right — Suggestions — things that the organization does not do well, but are not negatively correlated with the quantitative measure. 
  • Topics (bubbles) at the top-left — Investigate — things that the organization does well but are still negatively correlated with the quantitative measure. Since this is an anomalous pattern, it is worthy of further investigation. 
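
The bubble placement and quadrant logic can be sketched in a few lines. The post does not spell out the exact formulas, so the axis definitions here are my assumptions: height is the fraction of a topic’s mentions that were “top” responses, horizontal position is the mean quantitative score among those mentions, and the quadrant boundaries are a 0.5 top share and a caller-supplied score midpoint. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bubble:
    topic: str
    size: int          # number of responses mapped to the topic
    top_share: float   # y-axis: fraction of mentions that were "top" responses
    mean_score: float  # x-axis: mean quantitative score when the topic came up


def make_bubble(topic, mentions):
    """mentions: list of (kind, score) pairs, where kind is "top" or "tip"."""
    if not mentions:
        raise ValueError("a topic needs at least one mention")
    tops = sum(1 for kind, _ in mentions if kind == "top")
    mean_score = sum(score for _, score in mentions) / len(mentions)
    return Bubble(topic, len(mentions), tops / len(mentions), mean_score)


def quadrant(bubble, score_midpoint, top_midpoint=0.5):
    high = bubble.top_share >= top_midpoint
    right = bubble.mean_score >= score_midpoint
    if high and right:
        return "Celebrate"
    if not high and not right:
        return "Focus Areas"
    if not high and right:
        return "Suggestions"
    return "Investigate"


# Hypothetical topic: mostly "tip" mentions from employees giving low scores.
tooling = make_bubble("tooling", [("tip", 3), ("tip", 4), ("top", 5)])
print(tooling.size, quadrant(tooling, score_midpoint=7))
```

Where to draw the midpoints is a judgment call; using the organization-wide average score for `score_midpoint` would be one reasonable choice.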

The chart can also be filtered by time, business line, role, etc. to draw more refined insights which are then reviewed and acted upon in quarterly business reviews. 

Net-net I think this comes pretty darn close to the best way for surfacing insights out of a “working on work” exercise. Effective actions will be the next hurdle to overcome. 

The evolution of Cynefin [Snowden, Corrigan]

I first learned about Dave Snowden’s Cynefin model at a Lean-Kanban conference circa 2015–16 and have made references to it in a handful of blog posts in the past [1, 2].

It first received broad recognition in a 2007 HBR piece titled A Leader’s Framework for Decision Making. On March 1 (St. David’s Day) 2019, Snowden took it upon himself to write a series of blog posts (5 in total) covering updates to the model, and on this year’s St. David’s Day, he decided to turn it into an annual ritual.

Cynefin St David’s Day 2020 (1 of 5)

Chris Corrigan took it upon himself to aggregate the model and the key changes here:

A tour around the latest Cynefin iteration

And I am going to attempt to distill it even further. This is going to be a challenging post to write, and I know the end product is not going to be great, both because the subject matter is difficult and because I have yet to master the framework. But that’s exactly the point of writing about it…

First, a quick orientation: the Cynefin model is designed to aid decision-making and inform actions, recognizing that the decision-making process leading to the best action is different based on the context (domain) — the environment/situation — in which the action needs to be taken. 

The model distinguishes between 5 different domains. The two on the right (Clear, Complicated) are “ordered” domains where the environment is mostly knowable and predictable and problems are solvable. The distinction between those two domains is more nuanced and is a function of the number of parts in the system/situation: the higher the number, the deeper we go into the Complicated domain and the more expertise is required to know the right answer.

The two on the left (Complex, Chaotic) are “unordered” domains where the environment is mostly unknowable and unpredictable. In the Complex domain, phenomena such as emergence and self-organization exist, but they are enabled by some constraints. In the Chaotic domain, there are no meaningful constraints, leading to semi-random behavior.

Going counter-clockwise (Clear -> Complicated -> Complex -> Chaotic), there are fewer constraints, and therefore the situation becomes more unordered and unstable. Going clockwise, there are more constraints on the situation and it becomes more ordered and stable.

In the middle is the Confusion domain, broken down into “Aporetic” (“at a loss”), where the confusion is unresolved or paradoxical, and “Confused”, where we just haven’t fully understood the situation yet (a more temporary state).

I’m going to keep the green sections indicating liminality out of the scope of this post for the time being. 

Putting the framework to action

Almost any situation that requires a response has multiple aspects, each mapping to a domain. 

Step 1 is decomposing the situation to its various aspects. 

Step 2 is mapping each aspect to its respective domain: 

  • A clear and obvious aspect where things are tightly connected and there is a best practice → Clear
  • An aspect with a knowable answer or a solution, which has an endpoint, but requires an expert to solve it for you → Complicated.
  • An aspect with many different possible approaches, and uncertainty around which is going to work → Complex.
  • An aspect that is a total crisis, which completely overwhelms you → Chaotic.
  • Aspects whose domain is still unclear should be left in the middle, “Confused” domain. 

Step 3 is applying the appropriate approach to the aspects in each domain: 

  • Clear (Sense → Categorize → Respond): just do them.
  • Complicated (Sense → Analyze → Respond): research using literature and experts, make a plan, and execute.
  • Complex (Probe → Sense → Respond): get a sense of the possibilities, try something, and watch what happens. As you learn things, document practices and principles that guide in making decisions. If rules are too tight, loosen them. If rules are too loose, tighten them. 
  • Chaotic (Act → Sense → Respond): apply constraints quickly and maintain them until the situation stabilizes. 
  • Confusion: monitor those aspects and re-evaluate as new information becomes available and may help classify them into the appropriate domain. 
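
Steps 2 and 3 amount to a lookup from domain to approach sequence, which can be sketched as a small table. This is purely an illustration of the mapping, not part of Snowden’s or Corrigan’s material; the function name `plan` and the example aspects are hypothetical.

```python
# Approach sequence per domain.
APPROACH = {
    "Clear": ("Sense", "Categorize", "Respond"),
    "Complicated": ("Sense", "Analyze", "Respond"),
    "Complex": ("Probe", "Sense", "Respond"),
    "Chaotic": ("Act", "Sense", "Respond"),
}


def plan(aspects):
    """aspects: dict mapping an aspect description to its domain name.

    Returns (actionable, parked): actionable maps each aspect to its approach
    sequence; parked lists aspects still in Confusion, to be re-evaluated
    as new information becomes available.
    """
    actionable, parked = {}, []
    for aspect, domain in aspects.items():
        if domain == "Confusion":
            parked.append(aspect)
        elif domain in APPROACH:
            actionable[aspect] = APPROACH[domain]
        else:
            raise ValueError(f"unknown domain: {domain!r}")
    return actionable, parked


# Hypothetical decomposition of a single situation into aspects.
actionable, parked = plan({
    "renew software licenses": "Clear",
    "migrate the database": "Complicated",
    "improve team morale": "Complex",
    "vendor dispute": "Confusion",
})
for aspect, steps in actionable.items():
    print(f"{aspect}: {' -> '.join(steps)}")
print("re-evaluate later:", parked)
```

The hard part, of course, is the classification itself; once each aspect has a domain, picking the approach is mechanical.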

Key changes in the framework 

  • Renaming the first domain as “Clear” instead of “Simple” (or “Obvious”)
  • Highlighting the roles that constraints play in each of the domains: fixed constraints (Clear), governing constraints (Complicated), enabling constraints (Complex), no constraints (Chaotic). 
  • Renaming the middle domain to “Confusion” (from “Disordered”) and decomposing it to: “Aporetic” and “Confused”. 
  • Adding liminal boundaries around the Complex domain. 
  • Adding approach “labels” in addition to approach sequences: best practice (Clear), good practice (Complicated), exaptive discovery (Complex), novelty under stress (Chaotic). 