10 common challenges with performance appraisals and how to fix them

Performance appraisals (PAs) require a lot of attention, and they get it. The problem is they never seem to get enough. For most organizations, PA comes around once a year. And every year the same questions arise about the process, many of them quite familiar to Talent Management: “Should we use a 4-point scale or a 5-point scale?” It’s déjà vu all over again. It can make even the most skilled talent management professional begin to question their decisions.

Here I’ve listed 10 of the most common challenges that arise when rolling out a PA process, along with a response template based on affirmation, negation, and a potentially better way. They will be familiar to you. But viewed outside of your own sandbox they can be amusing in a “rubbernecking” sort of way. Additionally, they can reassure you that you’re not stuck in some talent management “do loop.”

  1. Should individuals rate themselves?
    1. Absolutely – including self ratings helps to calibrate evaluations between individuals and their boss; can provide information the boss doesn’t have; gives employees a voice in the process
    2. No way – some bosses will agree with the employee for the sake of peace; some bosses will adopt the employee’s rating out of laziness; it’s the boss’ call and they shouldn’t be bothered with predictably lenient self ratings
    3. How about we… – have employees rate themselves but without numbers to minimize trivial arguments over decimals, be clear about how their ratings will be used, hold bosses accountable for final ratings
  2. Should others be included in the rating?
    1. Absolutely – bosses aren’t the only stakeholders; other perspectives matter enough to solicit
    2. No way – bosses over-depend on or hide behind others’ ratings; creates a burden for some raters who have to rate many employees
    3. How about we… – communicate that others may be relevant to review, have bosses informally solicit others’ feedback but hold the boss accountable for final ratings
  3. Should we include objective results?
    1. Absolutely – objective results avoid problems with rater judgment; personal goals should align with organization goals; the specificity helps to motivate; they help distinguish busy versus productive employees
    2. No way – objective measures are difficult to attribute exclusively to the employee; objective targets can bring about counterproductive work behavior whereby the end justifies the means; objective goals can be difficult to identify for some employees; bosses can legitimately disagree with objective results and shouldn’t be forced to agree with a given metric
    3. How about we… – include objective evaluations along with behavioral evaluations for jobs where clear cause and effect exists between employee behavior and valued organization results; do so with full transparency
  4. Our strategic plan changed, should we change what gets evaluated?
    1. Absolutely – change happens, it’s more important to be relevant than consistent
    2. No way – changing targets mid-year is confusing, makes us look unsure and ignores what was once important
    3. How about we… – design the process to accommodate reasonable changes in performance standards at the organization level but not as a “one-off” for isolated individuals; consider mid-term evaluations for material changes in goals that still leave adequate time for review
  5. Should we use a common anniversary date?
    1. Absolutely – it improves calibration to rate everyone at the same time; improves consistency and communication across individuals
    2. No way – results for everyone don’t happen at the same time or on cue, and it’s important to be timely with reviews; it takes time to get results, so everyone should be given the same amount of time under review – including new hires
    3. How about we… – use common anniversary dates but supplement them with mid-term evaluations; recognize other talent processes such as succession planning in the performance appraisal cycle
  6. Should we use PA results for merit reviews?
    1. Absolutely – performance and compensation need to be clearly and formally linked; a separate process for merit review is inefficient and may be discordant
    2. No way – puts too much weight on internal equity without consideration of labor markets; some will use ratings to effect pay changes on their own terms instead of accurately appraising performance
    3. How about we… – use PA results as one source of input to merit reviews but not the sole determinant; ensure feedback providers understand exactly how merit and performance relate to each other
  7. Should we “fix” rater results that are clearly inaccurate?
    1. Absolutely – a fair process requires consistency of reviews between raters and justifies corrections
    2. No way – raters need to have the final say in their evaluations and should be the author of any changes
    3. How about we… – use rater training and procedural checks throughout the process to minimize outlier evaluations; communicate any changes to bosses and the broader review team
  8. Should we use an even number of performance gradations?
    1. Absolutely – it pushes raters “off the fence” of favoring mid-scale evaluations that don’t differentiate between employees
    2. No way – normal distributions do predict “average” ratings; it upsets raters not to have a midpoint evaluation when they are the ones giving the feedback
    3. How about we… – use an odd-numbered scale for performance reviews, where feedback is generally expected and bosses need the organization’s confidence and support; use an even-numbered scale for succession planning to generate differentiation in a process that doesn’t carry the same feedback responsibility
  9. Should we do succession planning and performance appraisals at the same time?
    1. Absolutely – performance and potential are necessarily linked; besides, separate ratings would be redundant, and using one process is more efficient
    2. No way – PA ratings need to be based on past performance whereas succession planning ratings need to reflect projections, asking raters to do both at the same time is confounding
    3. How about we… – maintain distinct processes for PA and succession planning but openly reference each with the other when calibrating ratings between raters (this simplifies both tasks while maintaining independence)
  10. Should we use any and all available data?
    1. Absolutely – the more input available to final evaluations the better
    2. No way – some data comes at the expense of ethical standards and privacy
    3. How about we… – ensure that whatever data is used for evaluation is well known and generally accepted as reasonable by those being rated; do not pry into private lives outside of work or spy on employees by incorporating anything and everything that could be measured. Overreaching is tempting in our “more is better” world, but attitudes differ significantly between employee and employer about what’s fair for review. Performance appraisals don’t like surprises.

The most important thing with performance appraisals is to clarify and communicate exactly what will happen, when, and to whom. The process must be well understood by everyone, and it’s good practice to solicit and include input from organization members. Beyond being fair and professional, you must be perceived as fair and professional for the system to work as expected. And this isn’t a “one-off” or isolated effort. You very well may be repeating the show in the future, so don’t act like you’ll never have to cross the same bridge again. Individuals will remember how they’ve been treated and aren’t typically shy about sharing this with you and others – performance appraisals get a lot of bad press.

As mentioned, no system is perfect, and these “fixes” won’t apply or work in every situation. They are offered as my recommendations based on an abstract, hypothetical model of performance appraisal. Specifics will largely depend on exactly why you have a PA process in the first place.

Performance appraisals are delicate, far-reaching and highly sensitive processes. A “little mistake” can have serious consequences. The referenced concerns have been posed as something of a “mock list.” They are not intended as prescription. Individual results will vary but the basic principles should translate to various situations.

Be safe and true.

9 signs you might be using the wrong personality test

Personality testing is a big part of the way organizations make hiring decisions, and has been for some time now (it wasn’t popular before about 1980). With advances in technology there has been a great proliferation of personality assessments. They’re not all good. These assessments are much easier to generate than they are to validate. The quiz below can help you know if you’re using the wrong personality test. (Have some fun with it.)

Directions: The following list of paired statements (questions) reflects things I occasionally hear when folks are evaluating personality tests. For each pair, one response is more problematic when it comes to evaluating personality tests. Reflecting on your current situation, which of the two statements would I be most likely to hear from you or others if I were a fly on the wall while you were getting the pitch from your vendor?

Response Key: For all odd numbered pairs the problematic statement is in column A, for even numbered items the more problematic one is in column B.

Some of the statements require more assumption than others; don’t get too caught up in the scoring. These are my answers and rationale:

  1. “It sure worked for me” — Frequently personality tests are sold by having the decision maker complete the assessment. This isn’t a bad thing — I encourage users to complete the assessment for themselves. The potential problem is that this is frequently the primary (or sole) evaluation criterion for a decision maker. Vendors know this and some hawk an instrument that produces unrealistically favorable results. “It says I’m good, therefore it must be right.” As for column B, the 300-page manual: good ones are typically lengthy. It takes some pulp to present all the evidence supporting a properly constructed inventory.
  2. “A type’s a type” – The most popular personality assessment of all, the MBTI, presents results for an individual as one of 16 types. Scores, to the extent that they are reported, only reflect the likelihood that the respondent is a given type or style – not that they are more or less extraverted, for example. But research and common sense say that personality traits do vary in degree, someone can be “really neurotic.” Two individuals with the same type can be quite different behaviorally based on how much of a trait they possess. A very extraverted person is different from someone who is only slightly extraverted — same type, different people. (No, I don’t condone mocking or calling out anyone’s score, as it would appear I’m suggesting in column A, but with a good test such a statement is potentially valid.)
  3. “That’s a clever twist” – Few personality tests are fully transparent to the respondent – this helps control the issue of social desirability. But some go too far with “tricky” scoring or scales. This is a problem in two ways: 1) if the trick gets out (Google that) the assessment loses its value, and 2) respondents don’t like being tricked. It’s better to be fairly obvious with an item than to deal with (very) frustrated respondents who may just take you to court.
  4. “It was built using retina imaging” – Here’s another statement that needs a little help to see what’s going on (no pun intended). I’m not against new technology, it’s driving ever better assessment. But sometimes the technology is misused or inadequately supported with research. There’s a reason that some personality assessments have been around for more than 50 years. Validity isn’t always sexy.
  5. “That’s what I heard in a TED talk” — My intent here was to implicate “faddish” assessments. They may say they’re measuring the hot topic of the day, but more often than not, what’s hot in personality assessment, at least as far as traits are concerned, is not new. Research has concluded that many traits are not meaningfully different from ones that have been around a while. Don’t fall for an assessment just because you like the vocabulary, check the manual to see if it’s legitimately derived. There’s a reason that scientists prefer instruments based on the Big 5 traits (not the big 50).
  6. “Now that’s what I call an algorithm” — More complicated isn’t necessarily better. Some very good — typically public domain — assessments can be scored by hand. Tests that use Item Response Theory (IRT) for scoring do have more complicated algorithms than tests scored via Classical Test Theory (i.e., more like your 3rd grade teacher scored your spelling test). Still, a three-parameter IRT scoring model isn’t necessarily better than a one-parameter model, and it isn’t three times as complicated anyway. Proprietary assessments typically protect their copyright with nontransparent scoring, but for the most part what’s obfuscated or obscure is which items go into a calculation, not that the calculation is necessarily complex. Good assessments should employ fairly straightforward scoring to render both raw scores and percentile (normed) scores.
  7. “It really has big correlations” — As with some prior items, a bit more context is needed to get the point I’m trying to make. Here the issue is sufficiency. Yes, a good instrument will show some relatively high correlations, but they need to be the right correlations. (And they need to be truthful. Unfortunately, I know of cases where misleading statistics have been presented.) It helps to know about research design and to have a realistic expectation for that validity correlation. If the vendor tells you that their assessment correlates with performance above .40, make them prove it. (And a .40 correlation equates to a 16% reduction in uncertainty, not a 40% reduction. Sometimes vendors get this confused.)
  8. “It’s too long, let’s cut some items” – It’s tempting to simply eliminate irrelevant scales or items for your specific need. After all, you’re not touching the items that comprise the traits you want to know. The problem is that the assessment is validated “as is.” Both the length of an assessment and its contents can influence scores. Priming biases are one example of how items interact with each other. Anytime you modify an assessment it needs to be validated. This is typically the case for short forms of assessments (i.e., they’ve been specifically validated), so it’s fair to ask about this alternate form.
  9. “That’s amazing” — By now you should see that a common factor in my problem statements has to do with how much goes on “out of view” (less is better) and how thorough the test manual is. “That’s amazing” is for magic shows, not science (I realize I’m parsing semantics here – you get my point).
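The scoring point in item 6 can be made concrete. Here is a minimal Python sketch of classical sum scoring plus percentile norming; the item responses and the norm-group scores are invented for illustration, not taken from any real instrument:

```python
from bisect import bisect_right

# Hypothetical data: Likert responses on one 5-item scale, and raw scores
# from an invented norm group. Classical scoring is just adding the items.
responses = [4, 3, 5, 2, 4]
raw_score = sum(responses)  # classical (sum) score

# Percentile (normed) score: where does the raw score fall in the norm sample?
norm_scores = sorted([12, 14, 15, 16, 17, 18, 18, 19, 20, 22])
percentile = 100 * bisect_right(norm_scores, raw_score) / len(norm_scores)

print(f"raw score = {raw_score}, percentile = {percentile:.0f}")
```

The point of the sketch is that nothing here is exotic: the complexity worth scrutinizing in a proprietary test is usually which items feed the sum, not the calculation itself.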
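The arithmetic behind item 7’s caution is simply squaring the correlation. This two-line check makes the 16%-versus-40% distinction explicit:

```python
# Variance in performance explained by a validity correlation is r squared:
# r = .40 explains .40 ** 2 = 16% of the variance, not 40%.
r = 0.40
variance_explained = r ** 2
print(f"r = {r:.2f} -> variance explained = {variance_explained:.0%}")
```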

Personality inventories can be legitimate assessments for many (most) jobs. (This even applies to machines. Researchers are using a variation of personality inventories to manipulate the perceived personality of robots.) Without exception, it’s critical to ensure that any assessment be validated for specific use, but you want to start with something that has been thoroughly researched. If everything has been done right, you can expect local results to be in line with the manual (assuming your tested population isn’t that different from the test manual sample(s)).

A lot goes into validating a personality assessment, and test manuals are lengthy. Although this is good and necessary for adequately evaluating the test, it can be used in intimidating or misleading ways. It’s easy for claims to be made out of context even if the manual is accurate, especially when decisions are made that affect one’s job. It’s important to review that test manual, not just the marketing brochure. (The good news is these manuals are boringly redundant. For example, the same figure is used for each scale, or trait, when repeating testing for gender bias.) Although I’m sure your vendor is a “stand-up” person, you can’t rely on that if your process gets challenged in court. It pays to review the manual thoroughly.

I hope your personality inventory passed the test.