Small Wars Journal

Game Theory in Peer Evaluations

Wed, 01/23/2013 - 3:30am

Abstract: Game theory is an application of expected value and rational actor theory.  It can answer big questions:  Will the US win a war against China?  Answering them is only a matter of gathering the information required to validate your assumptions.  But game theory can also help answer the mundane: questions about how best to gather information for evaluations.

This past summer, the Department of the Army (DA) adopted changes to the Officer Evaluation Report (OER).  Whenever changes to the OER system are discussed, one option always gathers adherents, particularly, it seems, in SOCOM, where it is held with almost religious devotion: peer evaluations.  The basic idea is to allow officers to self-select their most capable.  The most recent round of revisions produced some positive changes, most notably more descriptive sections on an officer's performance and potential and fewer 'box checks', but the idea of 'peer evaluations' was not adopted.  There is an application of mathematical logic that illustrates why peer evaluations are a bad idea.  Game theory examines the strategic interaction of the players' available moves, taking into account their preferred outcomes.  In particular, game theory can illustrate outcomes the players never intended.  The success of each actor in the game depends on how his or her own actions interact with those of the competitors.  A simple matrix shows the interaction between the players' preferred outcomes, and observers can then use the game to more easily conceptualize the effects of proposed policies.

Game theory actually provides two models.  One is the zero-sum game, in which the outcome is either winning or losing.  The zero-sum model was popular during the early Cold War for modeling US-Soviet nuclear exchanges, so it is easy to see the connection between that type of total war and a total game with only two outcomes, a win or a loss.  Economists developed the other model as an attempt to understand human interaction in terms of maximizing value.  They understood that people possess varying levels of information and will, to some degree, cooperate with each other.  This is the model we will use, and it is similar to the classic 'prisoner's dilemma' which Axelrod made famous in his book, The Evolution of Cooperation (Axelrod, Robert, The Evolution of Cooperation, Basic Books, New York, 1985).  In the prisoner's dilemma, the police promise freedom to the prisoner who cooperates and 'snitches' on the other.

Imagine two officers, both up for their annual evaluation at the same time.  The commander does not have enough information to determine which of these junior officers should receive the coveted Above Center of Mass check, which is restricted by regulation to only 49% of the commander's rated population.  The commander discusses his problem with his XO, who visits the officers and asks each of them to write a recommendation of the other.  If both officers write exceptional ratings of each other, they are 'cooperating' with the commander (who secretly thinks both officers deserve the best rating).  If Lieutenants Anderson and Brown rate each other as substandard, then the commander will rate both as center of mass, a case in which each junior officer receives less than the superior rating.  If the lieutenants cooperate and write glowing reports of each other, one will still lose, because the commander can rate only one as exceptional.  However, if Lieutenant Anderson rates Lieutenant Brown as substandard, then Anderson will receive the better rating; if Lieutenant Brown rates Anderson as substandard, then Brown will receive the better rating.  What do the officers do?  We can use the following model to find out.

The officers can cooperate and write generally good reviews of each other, or they can adopt a negative strategy in which they 'trash' the other officer.  In theory, each strategy can have unlimited variations, and game theory is often explored using computer programs that run nearly unlimited iterations of minutely varied strategies.  To simplify this demonstration, we limit the officers to four types of evaluations:

-Write a superior peer evaluation

-Write an excellent peer evaluation

-Write a generic peer evaluation

-Write a bad peer evaluation

It is important to remember that Lieutenants Anderson and Brown understand that the commander is limited to giving only one of them a ‘top-block’ OER.  In this case, each officer prefers to maximize his own outcome.  In other words, each officer wants to receive the superior rating from the commander for himself.  Now, we can assign relative values to each strategy (1 through 4 respectively).

The strategies are valued this way because if Anderson writes a superior evaluation for Brown, the commander would give Brown the top-block, which is the worst outcome for Anderson.  Each officer is rational, so each officer's expected value for each strategy is identical.  There are a few rules to our game, which reflect real-world constraints on the system.  One, the commander can give only one officer a top block; there are no tricks to get around this.  Two, both lieutenants understand the officer evaluation process.  Three, each officer will seek to maximize his own evaluation.  These three rules force the model to be rational.  Despite criticism that game theory is too math-centric or too abstract to be useful, these rules reflect the real world to a surprisingly large extent.

At this point, we can put the strategies into the matrix and observe the interactions.  In our matrix, the strategies are grouped under 'Go Positive', which reflects generally positive recommendations, and 'Go Negative', which reflects generally negative recommendations.  Anderson and Brown do not know what the other will finally write, and neither one will get the chance to read what the other turns in to the XO.

However, the observer can easily see which strategy the officers will adopt.  Follow the arrows in figure 1: both officers would like to go positive, but neither can be sure the other will honor 'the code', so they move to higher-valued strategies.  Once the movement begins, each officer is confronted with the same dilemma: the other officer will go to his dominant position.  So each officer must go to his own dominant position, the highest-valued strategy, which is to go negative on the other.

Figure 1. Game matrix showing how the lieutenants flow away from 'losing' strategies towards 'winning' strategies in order to protect themselves.
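The flow toward the dominant position can also be checked mechanically. The sketch below is not the article's actual matrix; the payoff numbers are assumptions chosen only to reproduce the structure described above (going negative strictly dominates, and mutual negativity pays less than mutual cooperation). A pure-strategy Nash equilibrium is any cell in which each officer's choice is a best reply to the other's.

```python
# Pure-strategy Nash equilibrium finder for the lieutenants' game.
# Payoff numbers are assumed for illustration (4 = best outcome, 1 = worst).
# Row player = Anderson, column player = Brown; tuples are (Anderson, Brown).
strategies = ["positive", "negative"]
payoffs = {
    ("positive", "positive"): (3, 3),  # both cooperate; commander still picks one
    ("positive", "negative"): (1, 4),  # Anderson praises, Brown trashes
    ("negative", "positive"): (4, 1),  # Anderson trashes, Brown praises
    ("negative", "negative"): (2, 2),  # mutual trashing; both center of mass
}

def best_responses(player):
    """For each opponent strategy, return the given player's best reply."""
    best = {}
    for opp in strategies:
        def value(own, opp=opp):
            key = (own, opp) if player == 0 else (opp, own)
            return payoffs[key][player]
        best[opp] = max(strategies, key=value)
    return best

def nash_equilibria():
    """A profile is a pure Nash equilibrium when each choice is a best reply."""
    br0, br1 = best_responses(0), best_responses(1)
    return [(a, b) for a in strategies for b in strategies
            if a == br0[b] and b == br1[a]]

print(nash_equilibria())  # -> [('negative', 'negative')]
```

With these assumed numbers, 'go negative' is each officer's best reply no matter what the other does, so the only equilibrium is mutual negativity, exactly the stalemate the arrows in figure 1 trace.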


This is reminiscent of the old Cold War zero-sum game, in which the only recourse available to either country was to go all out, all at once.  This became known as MAD, mutually assured destruction: neither country could benefit unilaterally from a change in strategies.  Our game, because its Nash equilibrium is its saddle point, appears to be a zero-sum game.  However, this is not the case, since zero-sum games preclude any cooperation in the form of additional information or any further interaction between the players.

This dominant position, effectively a stalemate, is known as a 'Nash equilibrium'.  It is a point in the model from which neither lieutenant will move (in this case, down) without additional incentives.  These incentives are called 'side payments': values added to lower-ranked strategies in order to make the actor willingly choose that lower-valued course of action.

Figure 2. Graphing the strategies shows the lieutenants' dominant position, or Nash equilibrium, which lies on the Pareto optimal line, and the area into which side payments would move the lieutenants.

In international relations, the United States often uses side payments, in the form of loans through the IMF or the Export-Import Bank or favored status in world organizations, to get one or more countries to move away from a strategy they see as maximum value but which is detrimental to US interests.  In our example, side payments could take the form of reminders from the XO about the other officer's achievements, or promises of more important jobs in the organizational hierarchy.  With the 49% restriction in the current OER system, officers are already familiar with the concept of 'bill-paying.'  These side payments enter the game as information and provide bias, moving the players up or down on their scale.  In figure 2, we can see the dominant position each lieutenant occupies on the Pareto optimal line.  If one lieutenant can gain a position further to the right, further up, or both, at a reciprocal cost to the other lieutenant's position, then that officer would win.  The two lieutenants could even work together and give each other the same 'generic' peer evaluation.  It is implausible that they would do this, but it is still rational within the bounds of the game.
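As a rough sketch, a side payment can be modeled as a bonus added to an officer's payoff whenever he goes positive. The base payoff numbers and the size of the bonus are assumptions for illustration; the point is only that a large enough bonus moves the equilibrium off mutual negativity.

```python
# Sketch of how a side payment can shift the equilibrium.
# All payoff numbers are assumed for illustration only.
strategies = ["positive", "negative"]

def payoffs(side_payment=0.0):
    """Base game plus a bonus (e.g. the XO's promised job) for going positive."""
    base = {
        ("positive", "positive"): (3, 3),
        ("positive", "negative"): (1, 4),
        ("negative", "positive"): (4, 1),
        ("negative", "negative"): (2, 2),
    }
    out = {}
    for (a, b), (pa, pb) in base.items():
        out[(a, b)] = (pa + (side_payment if a == "positive" else 0),
                       pb + (side_payment if b == "positive" else 0))
    return out

def pure_nash(p):
    """All profiles where neither player gains by deviating unilaterally."""
    eq = []
    for a in strategies:
        for b in strategies:
            pa, pb = p[(a, b)]
            if all(p[(a2, b)][0] <= pa for a2 in strategies) and \
               all(p[(a, b2)][1] <= pb for b2 in strategies):
                eq.append((a, b))
    return eq

print(pure_nash(payoffs(0)))    # -> [('negative', 'negative')]
print(pure_nash(payoffs(1.5)))  # -> [('positive', 'positive')]
```

With no bonus, mutual negativity is the only equilibrium; a bonus of 1.5 (in these assumed units) makes going positive a best reply for both officers, which is precisely what the XO's reminders or job promises are meant to accomplish.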



Game theory provides a framework on which to develop options for policies: it gives broad right and left limits for success and, in this case, failure.  The greatest problem with 'peer evaluations' revealed here is the internal, value-maximizing impulse in human nature.  The system needs the officers to report honestly, but officers have no internal motivation to do so if they follow the rules of rationality, and any activity introduced to mitigate this bias only further unbalances the system.  Still, the Army has the problem of determining which officers are actually doing the best work.  If the officers cooperate fully, they will only rate each other with generic recommendations, throwing the issue back on the commander.  Additional rules can be instituted, taking the form of side payments meant to influence the officers to either inflate or deflate their recommendations according to some pre-determined system.  When this happens, the commander is not receiving an unbiased report, and so is still making the decision based on his original observations.  Even more ominously, mathematicians have produced an 'evolutionary' model of the prisoner's dilemma, in which infinite iterations of the game are played in a set population.  Jonathan Bendor and Piotr Swistak found that the most successful strategies replicate faster because they survive.  In our case, officers who are successful based on their peer recommendations will continue to use that strategy.  The problems that would ensue from basing promotions on a system that rewards inflated evaluations are fairly obvious.  Bendor and Swistak also found that players begin switching from lower-valued strategies to the higher-valued strategies that are successful, an interesting application of Bayes' Theorem.  In our model, we would soon have all officers switching to the most destructive peer evaluations, regardless of the truth.  Only a strong central authority, in this case the commander, could re-introduce stability.
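The Bendor and Swistak point, that successful strategies replicate faster because they survive, can be illustrated with standard replicator dynamics. The payoff matrix below is assumed for illustration; because going negative strictly dominates in it, the 'negative' strategy takes over the population regardless of how small its initial share is.

```python
# Minimal replicator-dynamics sketch (assumed payoffs, illustration only).
# Each generation, a strategy's share grows in proportion to how well it
# scores against the current mix of the population.
payoff = [[3.0, 1.0],   # row 0 = 'positive' vs (positive, negative)
          [4.0, 2.0]]   # row 1 = 'negative' vs (positive, negative)

share = [0.9, 0.1]      # start with most officers going positive
for _ in range(200):
    fitness = [sum(payoff[i][j] * share[j] for j in range(2)) for i in range(2)]
    avg = sum(share[i] * fitness[i] for i in range(2))
    share = [share[i] * fitness[i] / avg for i in range(2)]

print([round(s, 3) for s in share])  # -> [0.0, 1.0]
```

Even starting from a population that is 90% cooperative, the negative strategy's slightly higher score compounds each generation until it is universal, which is the drift toward destructive peer evaluations described above.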
If the officers work together and write 'generic' evaluations, then the commander still has to decide who gets the better rating.  In any case, the burden still falls to the commander, even though the 'peer evaluation' system was set up to help the commander with his judgment. (Bendor, Jonathan and Piotr Swistak, “The Evolutionary Stability of Cooperation,” The American Political Science Review, Vol. 91, No. 2 (June 1997), pp. 290-307.  Also by the same authors, “The Controversy about the Evolution of Cooperation and the Evolutionary Roots of Social Institutions,” in Gasparski, Wojciech et al. (eds), Social Agency, New Brunswick, N.J.: Transaction Publishers, 1996.)

Still, the process of applying game theory to evaluations has some merit.  It has shown that 'peer evaluations' are not a viable method, allowing decision makers to concentrate on other, more viable alternatives.  For example, if the commander needs more information, he should ask not peers but other groups: company commanders, generally captains, can provide information on lieutenants, even those not in their own rating chain, and the staff can provide reports on company commanders.  As for the current regulations, removing the 49% restriction would change the values of rated officers' strategies.  Put in place during the last OER revision in 1997, the restriction was meant to stop the inflation of every officer's evaluation to 'superior' status.  This might now be addressed by allowing commanders with small rated populations to break the 49% rule.  Still, considered separately, 'peer evaluations' would not give the commander the kind of information he needs to select 'superior' rated officers.

So why game theory?  It can answer big questions:  Will the US win a war against China?  It's only a matter of gathering the information required to validate your assumptions.  But game theory can also help answer the mundane: questions about how best to gather information for evaluations.


About the Author(s)

Major Phil Reynolds is a U.S. Army Civil Affairs officer.  He wrote this while attending the Naval Postgraduate School where he earned his Master of Science in Defense Analysis. He holds a B.A. from Saint Bonaventure University and an M.A. from the University of Oklahoma. MAJ Reynolds served with 1st Battalion, 319th Airborne Field Artillery Regiment and the 96th Civil Affairs Battalion (Airborne). He has worked in Africa, Iraq, and extensively in Central Asia. MAJ Reynolds is currently assigned to 3 BCT/25th ID.





Fri, 01/25/2013 - 11:43am


Bravo! For applying game theory to what the author calls “the mundane.” Game theoretical models lie on the far left of the modeling spectrum: they are meant to simplify complex problems in order to allow relatively black-and-white policy options to become apparent. Incredibly complex applications of game theory, using high-end computers, do indeed model war between states, but game theory can also be used for pencil-and-paper figuring. The Major’s thesis is that peer evaluations are bad, and he proves it mathematically. This doesn’t mean that ‘peer evaluations’ could never be adopted. This research simply says this narrow application would not work. So the policy makers would go forward with a better idea of how to implement ‘peer evaluations.’ Of course, the major is slyly using a zero-sum game to model human nature. I would recommend a non-zero-sum game. I think the Major understands that it would quickly grow beyond a basic matrix and require heavy data collection to support the strategies; the game would have far more than four per player. At that point, he could use any number of commercial computer programs to run multiple iterations of the game. Excel Solver is the most widely used in colleges. What I found most interesting was near the end. The Bendor and Swistak article about evolutionary cooperation is probably more important than the actual game. I would like to hear more about how this applies to Army officers. Great article!

G Martin

Wed, 01/23/2013 - 10:52pm

I have advocated, and still do, that the 360-degree eval system results be noted in everyone's evals, and that all subordinates, peers, and seniors be required to fill out the surveys for those in their respective commands. This would give the Army a good indication of whether or not an officer is toxic, incompetent, untrustworthy, not a team player, or disloyal. I'd say a certain threshold should be required for promotion and an even tougher one for command. Those who fear a "popularity contest" I submit do not understand or trust soldiers. And for the "game theory" argument, although I disagree it would apply, the peer portion should not be the sole determining factor for disqualifying one for promotion or command. We could even weight the 360 evals accordingly (seniors more important, etc.), or even only use them as another metric, along with the more traditional ones. I'd even be in favor of adding a local board for officers, conducted face-to-face by local seniors, in order to aid in the vetting of promotions, etc. The centralized system is insane IMO.


Wed, 01/23/2013 - 12:22pm

A fascinating look, but a few things to note:

1. Rationality in the real world is a useful construct, but hardly accurate. Behavioral economics has demonstrated the "irrational" nature of human decision making time and again (see Kahneman, etc). Nassim Nicholas Taleb has also written extensively about the fallibility of the "rational" human, especially when faced with incomplete information, or even worse, inaccurate information.

2. A two-person game is certainly one iteration of this particular scenario, but hardly the only one. It seems unlikely that a "rational" commander would present a prisoner's dilemma-type scenario to two, and only two, LTs. Instead, he would give evaluations to all his junior officers, with only the commander knowing the true intent. Thus the individuals on the fence would not know whom they were evaluating and why, would rank all their peers appropriately (excluding themselves, of course), and thus provide an accurate, ground-level perspective for the commander. A wise commander would also solicit recommendations from his subordinates.

3. It seems premature to write off an entire evaluation system using a very simplistic, single-iteration game as a model for the whole system. The real world of human dynamics and even irrational altruism would prove much of this analysis incorrect. Indeed, a culture can shape how games are played. Similar games given to Japanese businessmen reveal an equilibrium of cooperation, maximizing results for all involved, due to a culture of collective cooperation. Such a culture could be fostered by a good command climate, thus shifting the payoffs and overall results. I tend to believe human nature tends toward a self-serving mindset, but the military preaches selfless sacrifice -- why would that ethos change when it comes to evaluations?