AMERICAN INTELLIGENCE JOURNAL
THE MAGAZINE FOR INTELLIGENCE PROFESSIONALS
NMIA
Vol. 30, No. 2, 2012
Information Warfare
American Intelligence Journal
The American Intelligence Journal (AIJ) is published by the National Military Intelligence Association (NMIA),
a non-profit, non-political, professional association supporting American intelligence professionals and the U.S.
Intelligence Community, primarily through educational means. The Board of Directors is headed by Lieutenant General James A.
Williams (USA, Ret), and the president of NMIA is Colonel Joe Keefe (USAF, Ret). NMIA membership includes active duty,
retired, former military, and civil service intelligence personnel and U.S. citizens in industry, academia, or other civil pursuits who
are interested in being informed on aspects of intelligence. For a membership application, see the back page of this Journal.
Authors interested in submitting an article to the Journal are encouraged to send an inquiry – with a short abstract of the text – to
the Editor by e-mail at <aijeditor@nmia.org>. Articles and inquiries may also be submitted in hard copy to Editor, c/o NMIA, 256
Morris Creek Road, Cullen, Virginia 23934. Comments, suggestions, and observations on the editorial content of the Journal are
also welcome. Questions concerning subscriptions, advertising, and distribution should be directed to the Production Manager at
<admin@nmia.org>.
The American Intelligence Journal is published semi-annually. Each issue runs 100-200 pages and is distributed to key government officials, members of Congress and their staffs, and university professors and libraries, as well as to NMIA members, Journal
subscribers, and contributors. Contributors include Intelligence Community leaders and professionals as well as academicians
and others with interesting and informative perspectives.
Copyright NMIA. Reprint and copying by permission only.
THE MAGAZINE FOR INTELLIGENCE PROFESSIONALS
Vol. 30, No. 2 2012 ISSN 0883-072X
NMIA Board of Directors
LTG (USA, Ret) James A. Williams, CEO
Col (USAF, Ret) Joe Keefe, President
Mr. Antonio Delgado, Jr., Vice President
Dr. Forrest R. Frank, Secretary/Director
Mr. Mark Lovingood, Treasurer/Director
Col (USAF, Ret) William Arnold, Awards Director
Brig Gen (USAF, Ret) Scott Bethel, Director
MSgt (USAF, Ret) Thomas B. Brewer, Director
CDR (USNR, Ret) Calland Carnes, Chapters Director
Mr. Joseph Chioda, PMP, Membership Director
RADM (USN, Ret) Tony Cothron, Director
Lt Gen (USAF, Ret) David Deptula, Director
COL (USA, Ret) Jim Edwards, Director
Ms. Jane Flowers, Director
Col (USAFR, Ret) Michael Grebb, Director
COL (USA, Ret) Charles J. Green, Director
COL (USA, Ret) David Hale, Director
COL (USA, Ret) William Halpin, Director
Ms. Tracy Iseler, Director
MG (USA, Ret) David B. Lacquement, Director
Maj (USAF, Ret) Shawn R. O'Brien, Director
Mr. Louis Tucker, Director
Ms. Marcy Steinke, Director
COL (USA, Ret) Gerald York, Director
Editor - COL (USA, Ret) William C. Spracher, Ed.D.
Associate Editor - Mr. Kel B. McClanahan, Esq.
Editor Emeritus - Dr. Anthony D. McIvor
Production Manager - Ms. Debra Hamby-Davis
LTG (USA, Ret) Harry E. Soyster, Director Emeritus
RADM (USN, Ret) Rose LeVitre, Director Emeritus
LTG (USA, Ret) Patrick M. Hughes, Director Emeritus
Lt Gen (USAF, Ret) Lincoln D. Faurer, Director Emeritus
COL (USA, Ret) Michael Ferguson, Director Emeritus
MG (USA, Ret) Barbara Fast, Director Emeritus
Table of Contents
AMERICAN INTELLIGENCE JOURNAL
The opinions expressed in these articles are those of the authors alone. They do not reflect the official position of the U.S.
government, nor that of the National Military Intelligence Association, nor that of the organizations where the authors are
employed.
Editor's Desk ................................................................................................................................................................................. 1
Lest We Forget: Five Decades after the Construction of the Wall
by Dr. David M. Keithly ................................................................................................................................................ 4
Teaching Integrated Information Power: The NDU-NIU Approach
by Dr. Daniel T. Kuehl and LTC (USA, Ret) Russell C. Rochte..................................................................................... 6
Arming the Intelligence Analyst for Information Warfare
by Maj (USAF) Robert D. Folker, Jr. ............................................................................................................................. 13
The Roots of Deception
by Dr. James A. Sheppard ............................................................................................................................................. 17
Geo-Profiling for Geospatial Intelligence: An Application to Maritime Piracy
by Lt Col (Italian AF) Vinicio Pelino and Capt (Italian AF) Filippo Maimone .............................................................. 22
The Other Fifty Percent: Psychological Operations and Women in Eastern Afghanistan
by Kailah M. Karl ......................................................................................................................................................... 28
Soft Power: The Limits of Humanitarian Intervention
by Dr. James E. McGinley, Louis P. Kelley, and Laura M. Thurston Goodroe ............................................................. 34
All That Jazz: CIA, Voice of America, and Jazz Diplomacy in the Early Cold War Years, 1955-1965
by Lt Col (USAF, Ret) James E. Dillard ........................................................................................................................ 39
A Labor Migrant's Handbook: A Looking-Glass into the State of Russia's Social Labor Divide
by LT (USN) Jason Gregoire ......................................................................................................................................... 51
Three Critical Factors in Intelligence Activity: Product, Process, and Personnel (The 3P Project)
by Ionel Nitu ................................................................................................................................................................ 57
Performing the Intelligence Mission Despite Large Budget Reductions
by John G. Schwitz ........................................................................................................................................................ 68
The Militarization of Intelligence: Can the Military Perform Covert Action More Effectively than the Intelligence Community?
by Stuart D. Lyle........................................................................................................................................................... 79
Intercultural Competence for Future Leaders: Addressing Cultural Knowledge and Perception at West Point
by Dr. Richard L. Wolfel and Dr. Frank Galgano .......................................................................................................... 84
Mexico’s Challenges: Why Colombia’s Solution to the Drug War Won’t Work in Mexico
by Meghan Harrison .................................................................................................................................................... 93
Getting It Right: Assessing the Intelligence Community’s Analytic Performance
by Welton Chang ......................................................................................................................................................... 99
A Shift in Leadership Praxis: A Creative and Innovative Leadership Journey of Transformation in a Military Intelligence Context
by Donald Patterson, Jr. ............................................................................................................................................. 109
Formal Modeling of Heterogeneous Social Networks for Human Terrain Analytics
by Dr. J. Wesley Regian.............................................................................................................................................. 114
In My View
To Win in Afghanistan, Destroy Pakistan’s ISI
by Howard Kleinberg ................................................................................................................................................. 120
Profiles in Intelligence series...
Joseph Goebbels: Propagandist
by Dr. Kenneth J. Campbell ........................................................................................................................................ 125
NMIA Bookshelf...
Robert Leiken’s Europe’s Angry Muslims: The Revolt of the Second Generation
reviewed by MAJ (USA) John M. Rose .................................................................................................................... 135
Peter Hart’s Gallipoli
Otto Liman von Sanders’ Five Years in Turkey
reviewed by Christopher N. Bailey ............................................................................................................................ 136
Amy Zegart’s Eyes on Spies: Congress and the United States Intelligence Community
reviewed by Charles Carey ........................................................................................................................................ 137
Eric Felton’s Loyalty: The Vexing Virtue
reviewed by Stephanie Wesley .................................................................................................................................. 138
Mark Owen’s No Easy Day: The Firsthand Account of the Mission that Killed Osama Bin Laden
reviewed by LTC (USA) Anthony Shaffer ................................................................................................................. 140
Getting It Right:
Assessing the Intelligence Community's Analytic Performance
by Welton Chang
EXECUTIVE SUMMARY
How often does the Intelligence Community “get it
right”? This article describes the need to evaluate
when, why, and how often analytic products make
accurate or inaccurate assessments. It then proposes a
specific methodology; the establishment of two
organizations—one at the Office of the Director of National
Intelligence and another at the agency level; and several
changes to institutional culture, which jointly serve to
address the lack of systematic evaluation of accuracy within
the IC.
INTRODUCTION
Again, how often does the United States Intelligence
Community (IC) “get it right”? We simply do not
know. Why is an enterprise with a roughly $75 billion budget unable to answer this question? Despite reform and
oversight efforts since 9/11 and myriad commissions
examining intelligence failures, the IC has not developed a
way to determine when, how often, and why it makes the
right or wrong assessments. As former Assistant Director of Central Intelligence for Analysis and Production Mark Lowenthal points
out: “[I]n the aftermath of 9/11 and Iraqi WMD, and after the
promulgation of analytic standards, there still has not been
closure on the key question: How good is intelligence
supposed to be…?”1
The IC needs to develop a methodology, process, and
organization to evaluate the accuracy of its analytic
products. This methodology should be scientifically sound,
metrics-based, and both quantitative and qualitative. The
accuracy of predictive intelligence products should be
evaluated in the near, medium, and long term. Without
assessing the Community’s performance, we are essentially
stuck in “fire-and-forget mode,” unable to capitalize on
lessons learned from each mistaken or correct judgment.
While the IC occasionally and unsystematically evaluates analytic tradecraft using product evaluation boards, it should institutionalize the evaluation of analytic performance. I recognize that some finished intelligence
products do not make predictive assessments and some
long-term predictions may take years to evaluate, but I argue
we should evaluate all products that make predictions about
the future.
In the private sector, customer feedback is crucial for
improving performance and customer opinions are often
reflected in the growth or decline of profit. However,
intelligence customers are too busy to provide frequent or
in-depth feedback, and the feedback we do receive usually
only describes whether a product provided new information
or useful insight.2 We cannot expect policymakers and warfighters to tell us how often we get it right.3 I argue that
the IC will improve its utility to policymaker and warfighter
customers by creating and implementing a process to
measure the accuracy of its predictive statements.
WHAT IS “GETTING IT RIGHT”?
"Getting it right" requires both predictive accuracy
and sound analytic tradecraft. Did we make a
predictive assessment that assigned realistic
probabilities to outcomes and did we utilize consistent logic
to arrive at that assessment? At first glance, evaluating
whether or not someone got it right simply entails looking at
how closely predictions matched events.4 Professor Philip
Tetlock, in Expert Political Judgment, defines getting it
right as “assigning realistic probabilities to possible
outcomes.” A combination of this definition and “how close
internal understanding of a situation matches the reality of
that situation” forms the basis for defining what is “right.”5
A PHILOSOPHICAL DEBATE
UNRESOLVED, POSSIBLY
UNANSWERABLE
At its core, getting it right is an epistemological issue:
how do we know what we know and how do we
know when it is right?6 In order to answer these
questions, one must consult the philosophical tenets of
positivism, exploit modern technology, and consider
principles of perception and brain function.7 Post-modernist
questions of “whose right is right?” and “how can one truly
know?” are difficult to answer, and any response is usually
not capable of satisfying the most ardent critics of
positivism.8 Although post-modernists argue that what is
“right” is constructed and a matter of interpretation, science
has discovered a measurable universal reality.9 Moreover,
intelligence failures yield real consequences: planes crashed into the World Trade Center towers, the U.S. military did not find WMD in Iraq, parts of the IC failed in the warning responsibilities that could have kept Umar Farouk Abdulmutallab from boarding a Detroit-bound flight, and so on. Post-modernist philosophy cannot stop the next
terrorist attack, but sound inferential logic coupled with solid
evidence can.
WHY ASSESSING THE IC’S
PERFORMANCE IS NECESSARY
Intelligence analysis, when successful, is usually
predictive. Intelligence analysis can “assess the
significance of new developments,” “provide warning of
dangerous situations to policymakers,” and “develop
longer-term assessments of major political, military,
economic, and technical trends.”10 Former Deputy Director
of the CIA Richard J. Kerr goes on to emphasize that
“warning remains the principal rationale for having an
intelligence community.”11 Any organization that conducts
warning should consistently evaluate the accuracy of its
predictive statements.
Although the creation of a process and organization to
assess IC performance is not specifically prescribed in the
2004 Intelligence Reform and Terrorism Prevention Act
(IRTPA), the Office of the Director of National Intelligence
(ODNI) has recognized that “mak[ing] accurate judgments
and assessments”12 is an implied part of “exhibit[ing] proper
standards of analytic tradecraft.”13 Intelligence Community
Directive (ICD) Number 203 also does not lay out a way to
conduct an assessment of whether or not accurate
judgments were rendered.14
The development of a methodology and organization for
assessing the accuracy of intelligence analysis also
strengthens intelligence analysis as a discipline and the
intelligence field as a profession. CIA officers Rebecca
Fisher and Rob Johnston identified several shortfalls that
hold intelligence analysis back from full maturity as a
discipline, including falling short of the definition of a
“learning organization.”15 Accuracy ultimately ties into the
professionalization of the intelligence career field and its
associated production processes, which are making strides
toward a soundly scientific methodology.
As tradecraft evolves to mirror the rigorous process of the
scientific method, the IC should examine the effectiveness of
the techniques it uses to improve analysis. We have come a
long way even since 2005, when Johnston’s ethnographic
study of the IC noted that “a formal system for measuring
and tracking the validity or reliability of analytic methods”
would be elusive so long as analysts relied on idiosyncratic,
non-methodological approaches to conduct analysis.16 With
more robust analytic standards in place, it is necessary to
begin generating concrete data on whether or not analysis is
improving as the IC implements new tradecraft methods.
Without data, we cannot determine whether the IC’s focus
on analytic tradecraft standards has improved our predictive
accuracy.17
Data on predictive accuracy will enable us to evaluate our
performance not only as a community, but as individual
analysts. Analysts are not often notified when their
assessments are right or wrong. Analysts should remember
when and why they got it wrong so they can avoid past
errors. Analysts naturally spend their time looking forward,
continuing to write assessments on a constantly
approaching future, leaving them with little time to look
backwards to discover whether or not they were ultimately
right on a judgment. IC managers should use the accuracy
of an analyst’s judgments as a tool for professional
development.18
There are also cognitive benefits to assessing our accuracy
and recognizing errors which led to inaccuracy. Research
shows that the main difference between individuals with
higher or lower measures of intelligence relates to their
ability to understand and learn from mistakes.19 Although
scientists once believed people were born with a specific
level of intelligence, we now believe it is possible to actually
get smarter by learning from mistakes.20 Extrapolating from
this understanding, every evaluation of a discrete predictive
statement, whether the statement turns out to be right or
wrong, is a learning opportunity. If the statement turns out
to be right, it should reinforce the importance of sound
analytic tradecraft. If the statement turns out to be wrong,
identification of the reasons why it was wrong—unreliability
of underlying evidence, incorrect or unidentified key
assumptions, and so forth—should lead to lessons learned. 21
WHAT ARE WE EVALUATING?22
Evaluating the accuracy of finished intelligence
judgment statements should shed light on the overall
accuracy of the IC’s predictions. We should evaluate
every predictive product created at the strategic level, which
by some estimates numbers nearly 50,000 products per
year.23 At a minimum, we should consistently evaluate
assessments in flagship IC analytic products such as
National Intelligence Estimates (NIEs), President's Daily Brief (PDB) articles, Defense Intelligence Digest (DID) articles,
State/INR Assessments, and World Intelligence Review
(WIRe) articles.24 Although the IC also provides
assessments in numerous other product lines, the flagship
products contain the assessments we have determined to be
most important for our customers—hence, those are the
products we should evaluate.
To fully evaluate the accuracy of flagship products, we
should extract all assessment statements from the product.
There are often multiple “key judgments” and
“assessments”—some buried inside the article, some stated
up front, some highlighted at the end—but all are
predictions necessitating evaluation.
ESTABLISHING A BASELINE TO GIVE A
DOSE OF REALISM
With a baseline understanding of the IC's predictive capability, a policymaker will be better able to judge the reliability of the intelligence analysis put in front of him or her. As a result, some intelligence products may be ignored—and rightly so, if the track record shows that the products and the analysts/organization that produced them are more often wrong than right. Conversely, a demonstrated track record will engender greater trust in production lines that generally do get it right.25
To create a baseline understanding of IC accuracy, we will
need to evaluate the past accuracy of IC production. We
will be unable to determine how well we are performing now
if we do not have an accurate understanding of how we
performed in the past. Determining our accuracy before and after reform efforts, such as those beginning around 2004 when national-level investigations mandated tradecraft improvements, will also allow us to measure our rate of improvement. By keeping score of when a prediction is
right or wrong, we can ensure that even a strike-out will
result in adjustments that help us get a hit the next time up at
bat.
PROPOSED SOLUTIONS: HOW DO WE GET
THERE FROM HERE?
The proposals in this article largely follow theories of
organizational learning and knowledge management
that have been studied in academia and practiced in
the business sector. The proposal should be implemented at
the ODNI level to maintain focus, consistency, and
objectivity. Initially, the recommendations I propose could
be executed as a pilot program at a single agency with
minimal expenditure of funds and commitment of personnel.
A full implementation of the recommendations contained
herein could be accomplished with current IC personnel or
by contracting out the work to an organization such as
RAND or MITRE.
The following recommendations entail evaluation
methodology, organizational improvement,
institutionalization of lessons learned, improved analytic
training, technological solutions, and growing a culture that
rewards well-calibrated judgment.26 Johnston recommended
a performance improvement model that includes “measuring
actual analytic performance to create baseline data,
determining ideal analytic performance and standards,
comparing actual performance with ideal performance,
identifying performance gaps, creating interventions to
improve analytic performance, and measuring actual analytic
performance to evaluate the effectiveness of
interventions.”27 I argue we should operationalize
Johnston’s improvement infrastructure combined with
Tetlock’s previous work on judging judgment.
EVALUATION METHODOLOGY AND
ORGANIZATIONAL SOLUTIONS: TOP
ACES AND HONEST ABE
Some elements of the IC are already examining whether
or not analysts employed solid analytic tradecraft in
their finished intelligence. Elements such as DIA’s
Product Evaluation Boards and CIA’s Product Evaluation
Staff perform reviews of analytic integrity and sound
tradecraft.28 However, this is not enough. Organizations
tasked with prediction and warning should not exist without
an internal mechanism to assess the accuracy of their
primary outputs, and the mechanism I propose will remedy
this problem.
With the amount of finished intelligence produced by the IC,
evaluation for predictive accuracy will be a difficult
undertaking. However, more data will yield a more accurate
assessment of our predictive ability, especially in making
low-probability calls. The approach can be centralized,
decentralized, or a combination of the two. I believe that
centralization of the process will lead to the most effective
solution.29
Table 1. Form for evaluating judgments30
Column headings: Author(s); years of total IC experience / years of target experience; office; agency; collaborators; review chain; topic; structured analytic techniques or alternative analyses used; time horizon for each key judgment; key judgment and confidence score; and, to be filled in by the ACE, whether the key judgment was right (calibration score, discrimination score).
In order to conduct centralized analytic evaluations,
Analytic Cells for Evaluation (ACEs) should be established
at each of the three all-source agencies (DIA, CIA, and
State/INR), with each ACE responsible for evaluating its agency's finished intelligence products. Analysts would fill out the
above table and personnel in the ACE would compile the
data. If the table was not filled out prior to submission due
to time constraints, ACE personnel could go back and fill in
the information. Just filling out the form would force the
analyst(s) to review whether or not they assigned a realistic
probability to the key judgments.
For evaluations to work, predictions need to be falsifiable.
Three general components need to be present: (1) a clearly
predicted event, (2) a probability of this event occurring, and
(3) a time horizon for the predicted event to occur. Most
assessments, as they are currently written, do have these
components. A statement such as “we judge, with high
confidence, that overall violence levels in Iraq will decrease
over the next three months” can be proven right or wrong
and can be scored.31
To further facilitate the measurement of accuracy, ODNI also
should develop a common judgment vocabulary to be used
by the all-source agencies. Expressions of certainty by CIA,
DIA, and State/INR are not currently easy to compare, although each organization does have an internal sliding scale placing estimative words along a continuum from uncertainty to certainty. The IC's estimative language is already nearly
ready to be quantified on a numerical scale. For example,
DIA’s “What we mean when we say” scale provides seven
values for likelihood words between “certainty” and
“impossibility.”32
Assessments should be scored for accuracy in calibration.33
Calibration is defined as the extent to which predicted
probabilities for an event matched the actual rate of
occurrence in the real world (i.e., timing). For example, an
analyst writes a product on whether or not country X will
test a nuclear weapon. The analyst could predict “no” each
month until the month of the actual test and then predict
“yes” for that month. The analyst’s predictions would be
considered perfectly calibrated.34 Calibration can be scored
based on the time horizon delineated in the predictive
statement.35
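As a rough illustration of how an ACE might compute calibration from a set of scored judgments, the sketch below bins predictions by stated probability and compares each bin's observed outcome rate with its stated probability. The function name and 0.1-wide binning are assumptions for the sketch.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Bin scored judgments by stated probability (0.1-wide bins) and report,
    for each bin, the observed rate at which the predicted events occurred and
    the sample size. Perfect calibration: observed rate matches the bin's
    stated probability. `predictions` is an iterable of
    (stated_probability, occurred) pairs, with occurred True/False."""
    buckets = defaultdict(list)
    for stated, occurred in predictions:
        buckets[round(stated, 1)].append(occurred)
    return {
        stated: (sum(outcomes) / len(outcomes), len(outcomes))  # (observed rate, n)
        for stated, outcomes in sorted(buckets.items())
    }

# Judgments tagged "likely" (0.75, binned at 0.8) should come true roughly
# 75-80 percent of the time if the analyst is well calibrated.
print(calibration_table([(0.75, True), (0.75, True), (0.75, False), (0.1, False)]))
```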
Assessments also should be scored for accuracy in
discrimination. Discrimination is assigning realistic
probabilities to the occurrence or non-occurrence of an
event.36 A highly discriminatory assessment, for example,
would be a high-confidence assessment that country X
likely possesses a nuclear weapon. When we later learn that
country X does have a nuclear weapon, the analyst would
be rewarded for making a call and assigning an accurate
probability to the occurrence or non-occurrence of an event.
If an analyst consistently hedges, then that analyst would
have a low discrimination score.
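One standard way to operationalize discrimination, sketched below, is the resolution term of the Brier-score decomposition: it rewards forecasts whose outcome rates depart from the overall base rate and gives chronic hedgers a score near zero. This is offered as one possible scoring rule under assumed 0.1-wide forecast bins, not as the IC's method.

```python
from collections import defaultdict

def discrimination_score(predictions):
    """Resolution term of the Brier-score decomposition: the weighted squared
    distance between each forecast bin's observed outcome rate and the overall
    base rate. An analyst who always hedges near the base rate scores ~0;
    confident calls that separate occurrences from non-occurrences score
    higher. `predictions` is an iterable of (stated_probability, occurred)
    pairs."""
    preds = list(predictions)
    n = len(preds)
    base_rate = sum(occurred for _, occurred in preds) / n
    bins = defaultdict(list)
    for stated, occurred in preds:
        bins[round(stated, 1)].append(occurred)
    return sum(
        len(outcomes) * ((sum(outcomes) / len(outcomes)) - base_rate) ** 2
        for outcomes in bins.values()
    ) / n
```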
ACEs would report directly to each agency’s director of
analysis. For example, DIA’s ACE would evaluate DID
articles and CIA’s ACE would evaluate WIRe articles. If
ACEs were created out of hide, they could be staffed on a
rotational basis similar to joint duty rotations. ACEs should
collect metrics on the analytic performance of their agencies
down to the regional and functional office level.
I also propose the establishment of an ODNI Analytic
Branch for Evaluations (ABE) to oversee the analytic
evaluation process conducted by ACEs. The ODNI ABE
should establish standards and a common set of criteria for
evaluating products, and should keep metrics on the IC as a
whole. Reports should be systematically published on
analytic accuracy, trends, and how improvements can be
made.
Table 2. Workflow for ACEs and ABE
THE ACE WORKFLOW
The cell should extract judgments that are made in
products. These judgments would be databased and
categorized by confidence level, expected duration of
prediction, whether or not alternative analyses were
attempted, whether structured analytic techniques were
employed to arrive at judgments, and also data regarding the
principal authors and those in the review chain. The
database would also include (when possible) whether or not
the intelligence was acted upon, the outcome if it was, and
any feedback received from customers.
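A minimal sketch of what such a prediction database might look like follows; the table and column names are illustrative assumptions rather than a description of any existing system.

```python
import sqlite3

# Minimal sketch of the ACE prediction database described above. The schema
# is illustrative; field names are assumptions for this sketch.
conn = sqlite3.connect("ace_predictions.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS judgments (
    id                INTEGER PRIMARY KEY,
    product_id        TEXT,     -- e.g., DID or WIRe article identifier
    key_judgment      TEXT,     -- the extracted predictive statement
    confidence        REAL,     -- stated probability, 0.0-1.0
    horizon_end       TEXT,     -- expected duration of the prediction (ISO date)
    sat_used          TEXT,     -- structured analytic techniques employed, if any
    alternative_anal  INTEGER,  -- 1 if alternative analyses were attempted
    authors           TEXT,     -- principal author(s)
    review_chain      TEXT,     -- those in the review chain
    acted_upon        INTEGER,  -- 1 if the intelligence was acted upon (when known)
    outcome           INTEGER,  -- 1 right / 0 wrong, filled in after evaluation
    customer_feedback TEXT
)
""")
conn.commit()
```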
How would this work? DIA’s 10-person ACE receives the
day’s DID articles. The ACE analysts are assigned accounts
both regionally-based (geography) and functionally-based
(cross-cutting issues like proliferation). The Middle East
regional analyst receives two articles to file for evaluation
that day—one on events in Iraq that will take place in the
next two weeks, and another on events in Syria that are
expected to come to fruition within a year. Each is filed by
the estimation of when it should be examined for accuracy
(time horizon). All of the judgments in the article are
extracted if they were not already included in the form in
Table 1.37 The analytic tradecraft utilized (such as
alternative analyses) can be examined and annotated that
day.
A month later, the ACE analyst revisits the Iraq article (this
could be an automated notification from the prediction
database) and finds that the article’s predictions were
accurate (both well-calibrated and discriminatory). The
evaluation is processed, databased, and passed to the home
office. The author’s and the office’s accuracy ratings
increase. The agency ACE compiles all of the assessments
as they become available and forwards them to the ODNI
ABE for review and further databasing.
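The automated notification mentioned above could be as simple as a query for judgments whose time horizon has elapsed but which have not yet been scored, as in this sketch built on the illustrative schema above.

```python
import sqlite3
from datetime import date

# Pull every judgment whose time horizon has passed but which has no recorded
# outcome, so the ACE analyst knows which articles to revisit today.
conn = sqlite3.connect("ace_predictions.db")
due = conn.execute(
    "SELECT id, key_judgment, confidence FROM judgments "
    "WHERE horizon_end <= ? AND outcome IS NULL",
    (date.today().isoformat(),),
).fetchall()
for judgment_id, text, confidence in due:
    print("Revisit judgment", judgment_id, "stated probability:", confidence, "-", text)
```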
With 50,000 pieces of finished intelligence produced each
year (of which some is assuredly duplicative),38 there exists
plenty of data to study. We must be careful to avoid the
problems of endogeneity and selection bias in the prediction
datasets in order to ensure that our inferences about
judgment accuracy are scientifically sound.39 ACEs must also take into account, by grading on a curve, that some outcomes are easier to predict than others. The curve could be based upon
comparisons of an analyst’s predictive accuracy against the
accuracy of other analysts looking at the same account or
with similar training/biographic backgrounds.40
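Grading on a curve could be operationalized, for example, by expressing an analyst's accuracy score relative to the average of peers covering the same account. The sketch below assumes Brier-style scores where lower is better; the function name and values are illustrative.

```python
from statistics import mean

def curved_accuracy(analyst_brier, peer_briers):
    """Grade on a curve: express an analyst's Brier score relative to the mean
    score of peers working the same account, so that inherently hard targets
    do not unfairly penalize the analysts assigned to them. Brier scores are
    lower-is-better, so a positive result means better-than-peer accuracy."""
    return mean(peer_briers) - analyst_brier

# e.g., an analyst scoring 0.18 on an account where peers average 0.25
print(curved_accuracy(0.18, [0.22, 0.25, 0.28]))  # ~0.07: better than the peer average
```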
Additionally, evaluating accuracy is difficult if an analyst does not make a prediction in the first place. For instance, if a country were to test a nuclear weapon but the analyst never predicted one way or the other whether it would, was the analyst wrong? Or what if the analyst simply stated that the country could test at any time? One possible solution would be to have analysts make regular predictions (once a week or once a month) on issues relevant to and of high interest for their accounts. These predictions would be aggregated, traceable, and public. This would discourage analysts and organizations from protecting their accuracy rates by simply not making difficult predictions.
Judgments should be evaluated based upon whether they
were right or wrong, and also why they were right or wrong.
That is where the immediate tradecraft evaluation comes in.
Reasons for inaccurate predictions are numerous, including
analytic bias, politicization of judgment, failure of
imagination, lack of adequate time and resources to analyze a
complex problem, lack of data, misreading data, and failure to
consider all pertinent data. ACEs should examine whether or
not structured analytic techniques and alternative analyses
were used and whether or not they helped the accuracy of
the bottom-line assessment.
USING WHAT WE LEARN FOR TRAINING
REGIMENS AND INCENTIVES
It is imperative that the ABE develop lessons learned based on the outcomes of ACEs' accuracy evaluations, and that these lessons be fed back into the analytic
courses that all analysts must attend at their respective
agencies.41 Why is this so important? Cognitive biases that
affect analysts include reporting bias,42 confirmation/
expectation bias, anchoring, resistance to change, attribution
error bias,43 loss aversion bias,44 availability bias, mirror-
imaging,45 clientism, layering, overconfidence/unjustified
certainty, seeking out consistency, being unable to change
perceptions despite previously discredited information—and
the list goes on.46 Despite the fact that “review after review
of adjudged intelligence failures has found cognitive bias to
have been a—if not the—principal factor,”47 we do not have
a way to measure how well our current training counteracts
the negative effects of cognitive biases. Not only will
evaluating products let us know how well we are performing,
it will also tell us how well our efforts to improve are
progressing. New areas of training curricula and additional
areas of emphasis should come out of the ACEs and ABE.
Lastly, an incentive structure should be devised to reward
good judgment as it is discovered by the ACEs and the
ABE. Whether through monetary or recognition awards,
junior analysts and senior leadership alike should be
motivated to seek good judgment. We also must be careful
not to create disincentives for making strong judgments, and
IC leadership will need to make a serious effort to change the
typical IC responses to making analytic mistakes. IC
leadership should acknowledge that making mistakes is at times inevitable given our line of work, and that mistakes are acceptable as long as we learn from them. At the end of
the day, we should not just reward working hard; we should
also reward being right, and use the knowledge we gain
about when and why we are right to become right more
often.
CHANGING THE CULTURE OF THE
COMMUNITY
The adage “what gets measured gets done” is
particularly appropriate in the IC. The only routine IC
self-measurement is output, and normally only for
year-end performance appraisals. If we do not measure
getting it right, then we are implicitly saying either that it is not important enough to be measured or that we do not care whether we get it right. However, if we start
measuring whether or not we get it right, analysts will
gravitate toward practices and techniques that get them
closer to the right answer because their evaluations depend
on it.48 This dovetails nicely with the idea of setting up an
incentive structure that rewards accuracy. Right now our
incentive structure is skewed toward output—those who
write the most and the most quickly generally get bigger
bonuses, more recognition, and promotions. This current
incentive scheme creates a structure that supports the
creation of thousands of products a year that are of
questionable impact on the safety of America. Since analytic
quantity does not usually correspond with analytic quality,
more, in this case, is not better.
In order for ACEs and the ABE to be successful, IC culture
with regard to self-critical examinations of rightness has to
be changed. Cultural shifts, like maintenance of the status quo, are often driven from the top down, which means that leadership plays a large role in defining culture, especially in a hierarchical organization like the IC (which still maintains an industrial-era model for production, with papers pushed from one editor to the next like a Ford assembly line). When leadership emphasized innovation, stating that it would be valued and rewarded and that those who sought to hold it back would be penalized, there was more innovation in the IC. Likewise, if
leadership wanted to emphasize “getting it right,” it could do
so by putting policies in writing that would make “getting it
right” important, rewarding those who “got it right” the
most, and promoting the best forecasters.
COUNTERARGUMENTS AND REBUTTALS
There are numerous counterarguments to the above
proposed solutions. One counterargument is that
assessments of rightness are too hard to do. There is
too much data to be analyzed, the judgments as currently
written are too ambiguous to be analyzed in such a concrete
way, events do not always lend themselves to a binary “yes
or no” assessment, and the accuracy of some assessments
could be ambiguous for years to come. Constructing
falsifiable predictions with a common vocabulary can help
address these concerns. Moreover, if an issue was
important enough to write about in the first place, then it is
worth expending the energy to assess the predictions made
about it. While the reasons behind why a call was right or
wrong sometimes may be difficult to interpret, the prediction
and outcome should not be as ambiguous. Finally, we need
to be patient with regard to evaluating the effectiveness of
evaluations. It may be that we are able to predict a lot of
minor tactical events but always miss the long-term trends,
or vice versa. Evaluations are not a fire-and-forget weapon;
they need to be monitored and groomed to provide visibility
on those long-term and high-impact predictions that are
made or not made.
Another counterargument claims that holding analysts and
supervisors accountable for analytic judgments will
disincentivize making strong calls. I believe that leadership
emphasis on making “strong calls” is misguided and
leadership should instead focus its efforts on ensuring that
analysts make “clear calls.” Strong calls generally lead to
greater discrimination in the occurrence or non-occurrence
of an event. However, if there simply is not enough data to
support a call, a strong call will become a wrong call.
Forcing unwarranted strong calls may incentivize analysts to
make only patently obvious predictions. An emphasis on calibration accuracy, by contrast, will incentivize analysts to make difficult, non-obvious, low-probability, high-impact estimates.49
How do we ensure that we do not succumb to the temptation
to make predictions on only the most obvious of calls, or
make only ambiguous predictions? I believe that this can
happen, but I contend that we can take steps to avoid it.
This disincentive will ultimately have to be addressed by
leadership and internalized by developing a culture that
embraces continual self-improvement. IC leaders need to
encourage and promote analysts and supervisors who make
predictive calls that go beyond the patently obvious. IC
leaders also need to protect analysts when policymakers
demand a strong assessment even when the supporting data
are too ambiguous to do so.
Lastly, we might find out that it is not possible to be
predictive on the issues that really matter. We may find out
that we are better off tossing a coin than making a
prediction. If this is the outcome, then we need to be honest
with ourselves and the policymakers we support on the
efficacy of the entire prediction enterprise.50
CONCLUSION
It is possible that, after closely examining and evaluating
the accuracy of the IC’s warnings, we will not arrive at
any significant conclusions about how well we are doing.
It is possible that information availability bias will influence
the types of predictions and assessments we are able to
evaluate in the near and medium terms. It is possible that
only years after a number of predictions are made will we
have enough data to evaluate IC performance with a
significant degree of confidence.
But that does not mean we should not try. While the IC has
been very self-critical, especially following major warning
failures, it has not institutionalized a process for consistent, comprehensive, and ongoing self-examination of analytic accuracy. To be sure, these extensive reactive
responses to IC-wide problems have led to innovations and
changes that have improved the intelligence enterprise. We
have identified and are addressing issues of stovepiping,
lack of coordination and collaboration, politicization of
analysis, and reliance on questionable evidence; however,
the feedback loop is not complete. We owe it to the
policymakers whom we support to be self-critical and
innovative. We owe it to ourselves to be honest about
actual performance. We owe it to the men and women of the
U.S. Armed Forces who are sent into harm’s way based on
the intelligence assessments we provide. Finally, we owe it
to the rest of the American public not to hold anything
back—because their safety is the entire reason for our
existence, and their tax dollars pay our salaries and fund our
enterprise. Let us not forget these factors when the
naysayers tell us that it cannot be done.
Notes
1 Mark M. Lowenthal, Intelligence: From Secrets to Policy
(Washington, DC: CQ Press, 2009), p. 149. This problem was also
described by Lowenthal in 2005. “At the time of this writing, the
intelligence community is still learning its ‘lessons.’ It would appear
that many of them involve managerial and review safeguards and
tradecraft. But the intelligence community, unlike its military
colleagues, has no institutionalized capability to learn from its
mistakes, no system through which it can assess both its successes
and its failures. This is not to suggest that intelligence analysis and
military operations are the same, but the ability to determine how
close one is to the ideal is extremely useful and worth examining as one
way to transform intelligence analysis.” Mark M. Lowenthal,
“Intelligence Analysis: Management and Transformation Issues,” in
Jennifer E. Sims and Burton Gerber, eds., Transforming U.S.
Intelligence (Washington, DC: Georgetown University Press, 2005),
p. 234.
2 At the strategic level, “the community receives feedback less often
than it desires, and it certainly does not receive feedback in any
systematic manner, for several reasons. First, few people in the
policy community have the time to think about or to convey their
reactions. They work from issue to issue, with little time to reflect on
what went right or wrong before pushing on to the next issue. Also,
few policy makers think feedback is necessary. Even when the
intelligence they are receiving is not exactly what they need, they
usually do not bother to inform their intelligence producers.” Mark
M. Lowenthal, Intelligence: From Secrets to Policy (Washington, DC:
CQ Press, 2009), p. 64. At the tactical level, the feedback can be
nearly instantaneous and authoritative, with battle damage
assessments showing whether intelligence enabled air support to
destroy the right building or not, or an operator on the ground calling
over the radio to tell someone whether or not his/her intelligence
assessment led them to capture the right person. Even at the tactical
level, however, there is no systematic process for evaluating either the
accuracy of judgments or a database where these judgments are stored.
Therefore, while one may have captured the right (targeted) person,
this does little good in the long run if the right person was not actually
a terrorist/criminal.
3 Feedback from policymakers, both positive and negative, is
important (this product was what I wanted and helped me make a
decision vice this product is not what I requested and is confusing),
but this feedback does not track whether the intelligence assessment
reflected reality. Just because a policymaker felt that a product was
helpful in making a decision does not mean it was helpful in making
the right choice. It could have pushed a policymaker to make or
consider the wrong choices.
4 Alternatively, Robert Jervis defines intelligence failure as “a
mismatch between the estimates and what later information reveals.” I
believe that this gap is not necessarily an intelligence failure as much as
it is just getting it wrong, as an intelligence failure implies a major
policy debacle vice the hundreds of judgments made each day by
analysts across the IC. Robert Jervis, Why Intelligence Fails: Lessons
From the Iranian Revolution and the Iraq War (Ithaca, NY: Cornell
University Press, 2010), p. 2. Lowenthal differentiates between
tactical surprise, which is “not of sufficient magnitude and importance
to threaten national existence,” and strategic surprise, although he
notes that “repetitive tactical surprise”… “suggests some significant
intelligence problems.” Mark M. Lowenthal, Intelligence: From
Secrets to Policy (Washington, DC: CQ Press, 2009), p. 3.
5 Philip E. Tetlock, Expert Political Judgment: How Good Is It? How
Can We Know? (Princeton, NJ: Princeton University Press, 2005),
pp. 7, 16. It is important to note that Tetlock’s research showed that
many purported experts were no better at predicting outcomes than
dart-throwing monkeys. Expertise did not necessarily translate to
predictive ability and fame actually decreased predictive ability.
These conclusions should trouble any aspiring career intelligence
analyst looking to become a better analyst simply by working an
account for a long time.
6 James Bruce outlined five ways of knowing, through authority, habit
of thought, rationalism, empiricism, and science. Science is based
upon the premise that we can use the facts we know to learn about the
facts we do not know. This proposal also holds it necessary to take into account what Dan Ariely has termed "predictable irrationality," the ways in which human beings behave irrationally in predictable ways, and it advocates better IC education on understanding the underlying motivations and behaviors of adversaries to prevent what Robert Jervis calls the Rashomon effect. Richards
Heuer, in his seminal work Psychology of Intelligence Analysis,
recommends in the final pages of the publication that analysts
essentially utilize the scientific method to improve the quality of their
analysis. James B. Bruce, “Making Analysis More Reliable: Why
Epistemology Matters to Intelligence,” Roger Z. George and James B.
Bruce, Analyzing Intelligence: Origins, Obstacles, and Innovations
(Washington, DC: Georgetown University Press, 2008), pp. 172-
178. Richards J. Heuer, Jr., Psychology of Intelligence Analysis
(Washington, DC: Center for the Study of Intelligence), pp. 173-178.
Dan Ariely, Predictably Irrational: The Hidden Forces That Shape
Our Decisions (New York, NY: HarperCollins, 2009), p. xxx. Robert
Jervis, Why Intelligence Fails: Lessons From the Iranian Revolution
and the Iraq War (Ithaca, NY: Cornell University Press, 2010), p.
177. Gary King, Robert O. Keohane, and Sidney Verba, Designing
Social Inquiry: Scientific Inference in Qualitative Research (Princeton,
NJ: Princeton University Press, 1994), p. 46.
7 Webster’s defines positivism as a theory that theology and
metaphysics are earlier imperfect modes of knowledge and that
positive knowledge is based on natural phenomena and their
properties and relations as verified by the empirical sciences.
8 A logical endpoint of the post-modern idea that everything is
constructed and that there is no agreed upon reality can be nihilism.
9 For an argument on how to quantify the unquantifiable, see Philip E.
Tetlock, Expert Political Judgment: How Good Is It? How Can We
Know? (Princeton, NJ: Princeton University Press, 2005), pp. 10-24.
While most social scientists agree that constructivist theories
contribute important perspectives to international relations,
constructivists themselves have not even established a common
definition for constructivism. I believe self-critical examinations of
one’s assumptions and experiences, along with examinations of one’s
evidence informed by constructivist thought, are valuable. For a
summary treatment of post-modernist and constructivist critiques in
international relations theory, see Maja Zehfuss, Constructivism in
International Relations: The Politics of Reality (New York, NY:
Cambridge University Press, 2002), pp. 7-22, 259-263.
10 Richard J. Kerr, “The Track Record: CIA Analysis from 1950-
2000,” in Roger Z. George and James B. Bruce, eds., Analyzing
Intelligence: Origins, Obstacles, and Innovations (Washington, DC:
Georgetown University Press, 2008), p. 36.
11 Kerr, “The Track Record: CIA Analysis from 1950-2000,” p. 51.
12 “Intelligence Community Directive Number 203, Analytic
Standards,” Office of the Director of National Intelligence, 21 June
2007, p. 4.
13 U.S. Congress, Intelligence Reform and Terrorism Prevention Act,
Section 1019(a), 2004.
14 While it is important to use structured analytic techniques and
alternative analyses when appropriate, the feedback loop is not
complete until an evaluation is made as to whether or not any of these
techniques actually led to the correct judgment.
15 Fisher and Johnston utilized Peter Senge’s definition of a learning
organization. Senge defined learning organizations as “environments in
which people continually expand their capacity to create the results
they truly desire, where new and expansive patterns of thinking are
nurtured, where collective aspiration is set free, and where people are
continually learning to see the whole together.” Rebecca Fisher and
Rob Johnston, “Is Intelligence Analysis a Discipline?” in Roger Z.
George and James B. Bruce, Analyzing Intelligence: Origins,
Obstacles, and Innovations (Washington, DC: Georgetown
University Press, 2008), p. 65.
16 Rob Johnston, Analytic Culture in the US Intelligence Community:
An Ethnographic Study (Washington, DC: CIA Center for the Study
of Intelligence, 2005), p. 18.
17 It is possible that we may find that some structured analytic
techniques we teach as part of the tradecraft toolkit do not yield
accurate analysis. We may not actually have a good grasp on what
“solid tradecraft” is due to a lack of self-evaluation. If an analyst’s
“unorthodox” tradecraft produces consistently better accuracy than
traditional methods, then it suggests traditional methods are flawed.
18 Johnston, Analytic Culture in the US Intelligence Community: An
Ethnographic Study, p. 108.
19 Jonah Lehrer, How We Decide (New York, NY: Houghton Mifflin
Harcourt, 2009), p. 45.
20 See David Shenk, The Genius in All of Us: Why Everything You’ve
Been Told About Genetics, Talent, and IQ Is Wrong (New York, NY:
Doubleday, 2010). See also Kruger, J.M., & Dunning, D., “Unskilled
and unaware of it: How difficulties in recognizing one’s own
incompetence lead to inflated self-assessments,” Journal of
Personality and Social Psychology 77, 1999, pp. 1121-1134.
21 We now have a better understanding of how a person learns by
studying brain chemistry (specifically dopamine receptivity). One of
the most important ways learning occurs is by making mistakes and
learning from those mistakes. Journalist Jonah Lehrer’s discussion of
this revolved around the differences between Deep Blue, the highly
calculating IBM computer that ultimately defeated reigning champion
Garry Kasparov in chess via brute force tactics, and TD-Gammon, a
computer program designed to learn from its mistakes when playing
backgammon. TD-Gammon ultimately rose to the skill level of Bill
Robertie, one of the world’s champion backgammon players, doing so
without having to resort to developing supercomputer-like processing
capability. TD-Gammon’s superior performance came from its
ability to learn from mistakes like a human being, vice brute force
tactics that are impossible for a human being to replicate. Evaluation
of analytic products will be able to reinforce good practices and
discourage poor ones, with analytic improvements becoming driven
by the self-sustaining and non-bureaucratic granular process of
dopamine response. Jonah Lehrer, How We Decide (New York, NY:
Houghton Mifflin Harcourt, 2009), p. 41. See also Andrew Smith,
Ming Li, Sue Becker, and Shitij Kapur, “Dopamine, prediction error
and associative learning: A model-based account,” Network:
Computation in Neural Systems (Taylor and Francis Group, March
2006, 17), pp. 61-84.
22 While I attempt to specifically address shortcomings in evaluating
finished intelligence, I also recognize there are shortcomings in the way
the IC approaches evaluating raw single-source intelligence reporting.
Some single-source agencies such as NSA and NGA also make
assessments based on single-source reporting. I argue that, while these
assessments give all-source analysts some context with which to look
at the reporting, many of these assessments turn out to be wrong
because single-source agencies simply do not have all of the available
information on an event. Currently, reporting evaluation does not
address the ultimate rightness or wrongness of a source’s information.
Grading of reporting is assigned based upon factors including
corroboration through reporting from other sources within or outside
that discipline, if the report met the threshold of being high value
because of the number of times it was viewed by analysts, if the
report was cited in a finished intelligence product, or if the report
provided new insight to policymakers. While some of this
information may be important second-order indicators of the value of
a specific report, none of them ultimately addresses the rightness or
wrongness of the information contained therein. While I recognize
that there are issues with the objectivity and “rightness” of the
reporting building blocks that we use to make assessments in analytic
finished intelligence, it is not within the scope of this article to address
these issues.
23 Dana Priest and William Arkin, “A Hidden World, Growing
Beyond Control,” The Washington Post, July 19, 2010.
24 This article also will not address whether the “what” of the
intelligence products in question is appropriate. As outlined in
“Fixing Intel,” the “getting it right” concept proposed by Michael
Flynn, Paul Batchelor, and Matthew Pottinger refers more to ensuring
that the right information is being collected, analyzed, and produced,
but not whether that information itself is true. What the proposed
evaluation solutions will do is bolster and reinforce the fixes proposed
by “Fixing Intel” by ensuring that analysts are making accurate key
judgments which should ultimately answer the right questions posed
by policymakers. Michael Flynn, Paul Batchelor, and Matthew
Pottinger, “Fixing Intel: A Blueprint For Making Intelligence Relevant
in Afghanistan,” (Washington, DC: Center for a New American
Security, 2010). Another issue that must be pointed out is the idea
of corroborated intelligence, in most cases a report from one
discipline that is correlated with another report from another
collection discipline. While most analysts would argue that
corroboration is a strong indicator of rightness, this author wonders
how often corroborated intelligence turned out to be correct.
25 The field of weather forecasting provides a useful example of what
the IC calls competing product lines. Many weather organizations
possess their own predictive models, which generate different
scenarios for the weather. Knowing which organizations get it right
most often helps consumers decide which organizations to trust. A
company called Forecast Watch "calculates the accuracy, skill, and
quality of weather forecasts. [It] collect[s] over 40,000 forecasts each
day from Accuweather, the National Weather Service, MyForecast,
Weather.com, and others for over 800 U.S. cities and 20 Canadian
cities and compare[s] them with what actually happened."
ForecastAdvisor, "Frequently Asked Questions," http://
www.forecastadvisor.com/docs/about/, accessed July 24, 2010. By
doing so, Forecast Watch can produce accuracy rates by organization
and geographic area. This is not to say that baselining the IC's
predictive abilities will be easy. Data inputs for weather forecasting
are highly accurate, obtained via a network of automated sensors, and
processed by powerful computers utilizing complex weather
algorithms. Indeed, weather forecasting differs from intelligence
production in many ways; the weather is not actively attempting to
deny and deceive forecasters, for instance. However, the IC can
improve its ability to integrate lessons learned by taking a page from
the weather playbook: weather algorithms are tweaked based upon the
performance of their models against what actually happened.
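At its core, this kind of verification simply pairs each organization's forecasts with observed outcomes and tallies the results. A minimal sketch follows; the organization names and numbers are illustrative, not Forecast Watch data.

```python
# Sketch: compare each organization's rain forecasts with what actually happened.
# Organization names and numbers are illustrative, not Forecast Watch data.

forecasts = {
    "OrgA": [(0.9, True), (0.2, False), (0.7, True)],  # (forecast prob. of rain, did it rain?)
    "OrgB": [(0.6, True), (0.4, True), (0.1, False)],
}

def hit_rate(pairs, threshold=0.5):
    """Fraction of forecasts that called the outcome correctly at a simple 50% cutoff."""
    hits = sum((prob >= threshold) == occurred for prob, occurred in pairs)
    return hits / len(pairs)

for org, pairs in forecasts.items():
    print(f"{org}: {hit_rate(pairs):.0%} of forecasts verified")
```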
26 Much of what follows builds on the work of Philip Tetlock and
Rob Johnston. The author is greatly indebted to Professor Tetlock
and Dr. Johnston as their work provided much of the background and
insight necessary to develop a possible solution to a difficult problem.
27 Johnston, Analytic Culture in the US Intelligence Community: An
Ethnographic Study, p. 108.
28 ODNI’s National Intelligence Analysis and Production Board also
performs some oversight and standardization functions related to
analytic integrity and tradecraft.
29 Alternatively, decentralizing the process could also work. For this
to work, all products should be posted to the Library of National
Intelligence (LNI). From a software technology standpoint, each
product page should include a comments and rating section where any
analyst, especially those who work the target set, can comment on
the soundness of the tradecraft utilized in creating the product and
whether or not the product was ultimately borne out by events. An
option should be included, even after an analytic product is evaluated,
for others to dispute the assessment. A scraping tool could then be
built to automatically aggregate the judgments that users evaluated or
highlighted, along with whether users rated them as correct. The
product metadata should also contain detailed and relevant author
data, to include years of analytic experience, as well as the review
chain for the product. Through these data, specific feedback can be
given not only to the analyst(s) who wrote the product but also to the
supervisory chain (usually a number of individuals). Leadership
should closely monitor these evaluations to ensure rigor and to
prevent "trolling," in which analysts feel empowered to post
unfounded, controversial comments for self-entertainment because
they are behind a computer screen rather than addressing the issue in
person. However, it is the view of the author that, given time
constraints and other primary responsibilities, it is unrealistic to
expect all products to be evaluated in this manner.
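To make the aggregation step concrete, a rough sketch of how scraped product ratings might be rolled up by author follows; the field names and records are illustrative assumptions, not an actual LNI schema.

```python
# Sketch: aggregate user evaluations of analytic products by author.
# Field names and records are illustrative; they do not reflect an actual LNI schema.
from collections import defaultdict

ratings = [
    {"product_id": "P1", "author": "Analyst A", "judged_correct": True},
    {"product_id": "P1", "author": "Analyst A", "judged_correct": False},
    {"product_id": "P2", "author": "Analyst B", "judged_correct": True},
]

by_author = defaultdict(lambda: {"correct": 0, "total": 0})
for r in ratings:
    tally = by_author[r["author"]]
    tally["total"] += 1
    tally["correct"] += int(r["judged_correct"])

for author, tally in by_author.items():
    print(author, f'{tally["correct"]}/{tally["total"]} judgments rated correct')
```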
30 Tetlock utilized a confidence scale of 0-1.0 in 0.1 increments and
quantified his research subjects' predictions using this scale. This was
difficult for Tetlock but will be easier for the IC, which already has a
rough scale that could be refined by developing and implementing a
common judgment vocabulary. Once a common judgment vocabulary
is established, key judgments with qualifiers such as "impossible" or
"with certainty" can be quantified on such a scale. Tetlock, Expert
Political Judgment: How Good Is It? How Can We Know? p. 45.
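One notional mapping from a standardized judgment vocabulary to Tetlock-style numeric values might look like the sketch below; the specific terms and values are illustrative and would have to come from an agreed IC standard.

```python
# Notional mapping of standardized judgment language to numeric probabilities,
# in the spirit of Tetlock's 0-1.0 scale; the exact terms and values are illustrative.
JUDGMENT_SCALE = {
    "impossible": 0.0,
    "highly unlikely": 0.1,
    "unlikely": 0.3,
    "even chance": 0.5,
    "likely": 0.7,
    "highly likely": 0.9,
    "with certainty": 1.0,
}

def quantify(judgment: str) -> float:
    """Convert a standardized qualifier into a numeric probability."""
    return JUDGMENT_SCALE[judgment.lower()]

print(quantify("highly likely"))  # 0.9
```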
31 Designing a well-thought-out scoring method is difficult but not
impossible. For an example of how weather forecasters are scored, see
Glenn W. Brier, “Verification of Forecasts Expressed in Terms of
Probability,” in James Caskey, ed., Monthly Weather Review, January
1950, Vol. 78, Issue 1.
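For readers unfamiliar with Brier's measure, it is the mean squared difference between the forecast probability and what actually happened (1 if the event occurred, 0 if it did not), with lower scores indicating better forecasts. A minimal sketch, using illustrative data:

```python
# Minimal sketch of the Brier score: the mean squared difference between forecast
# probabilities and observed outcomes (1 = event occurred, 0 = it did not).
# Lower scores are better; the data below are illustrative.

def brier_score(forecast_probs, outcomes):
    pairs = list(zip(forecast_probs, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

print(brier_score([0.9, 0.7, 0.2], [1, 0, 0]))  # 0.18
```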
32 Tetlock, Expert Political Judgment: How Good Is It? How Can We
Know? p. 45.
33 Tetlock, Expert Political Judgment: How Good Is It? How Can We
Know? pp. 47-49.
34 Calibration is the extent to which one's predicted probabilities for
each event match the actual rate of occurrence in the real world.
Therefore, if one says something has a 75% chance of happening,
those predictions should "come true" 75% of the time; that would be
perfect calibration. Any higher or lower actual rate of occurrence
among one's 75% predictions would be an error. Consequently, if one
makes 100 predictions at 75% confidence and 60 of the events occur,
one would be off by 15 percentage points. Calibration could also be
scored in the aggregate for offices, agencies, and the entire IC.
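A minimal sketch of that calculation, using this note's illustrative numbers:

```python
# Sketch: calibration error at a single confidence level (illustrative numbers).
predictions_made = 100   # judgments issued at 75% confidence
events_occurred = 60     # how many of those predicted events actually happened

stated_confidence = 0.75
observed_rate = events_occurred / predictions_made          # 0.60
calibration_error = abs(stated_confidence - observed_rate)  # 0.15, i.e., 15 percentage points

print(f"Observed rate: {observed_rate:.0%}; calibration error: {calibration_error:.0%}")
```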
35 This type of evaluation penalizes the IC's alarmist "Chicken
Littles," who get away with constantly sounding alarms only because
no one has kept score of the times they were wrong.
36 Tetlock, Expert Political Judgment: How Good Is It? How Can We
Know? pp. 51-54.
37 IC leadership has pushed for all analytic products to make "strong
calls" (judgments) even when data are ambiguous and incomplete.
Keeping track of calls would discourage unfounded ones based on
such data. Analysts should strive to make "clear calls," especially
when justified by solid analysis of existing data, employing good
tradecraft and contextual knowledge.
38 Dana Priest and William Arkin, “A Hidden World, Growing
Beyond Control,” The Washington Post, July 19, 2010.
39 King, Keohane, and Verba, Designing Social Inquiry: Scientific
Inference in Qualitative Research, pp. 139, 191.
40 Tetlock identifies five challenges to developing a correspondence
theory of truth for identifying good judgment: challenging whether the
playing fields are level; challenging whether forecasters' "hits" have
been purchased at a steep price in "false alarms"; challenging the equal
weighting of hits and false alarms; challenging how subjective
probability forecasts are scored; and challenging reality itself. Tetlock,
Expert Political Judgment: How Good Is It? How Can We Know? pp.
10-11. Furthermore, Tetlock differentiates between easy predictions,
such as elections in a stable democracy, and difficult ones.
41 A lesson is not truly “learned” until an organization institutionalizes
its teachings.
42 Intelligence collectors can more easily collect against known gaps;
the information available to analysts thus skews toward known
questions, making it difficult to discover the gaps categorized as
"unknown unknowns."
43 Central Intelligence Agency, Kent Center for Analytic Tradecraft,
“A Tradecraft Primer: Structured Analytic Techniques for Improving
Intelligence Analysis,” Vol. 2, No. 2, June 2005, p. 2.
44 Jonah Lehrer, How We Decide (New York, NY: Houghton Mifflin
Harcourt, 2009), p. 77.
45 Mark M. Lowenthal, Intelligence: From Secrets to Policy
(Washington, DC: CQ Press, 2009), p. 121.
46 Central Intelligence Agency, Kent Center for Analytic Tradecraft,
“A Tradecraft Primer: Structured Analytic Techniques for Improving
Intelligence Analysis,” Vol. 2, No. 2, June 2005, p. 2.
47 Douglas MacEachin, “Analysis and Estimates,” in Transforming
U.S. Intelligence, Sims and Gerber, eds. (Washington, DC:
Georgetown University Press, 2005), p. 129.
48 Psychological research indicates that satisfaction of intrinsic
(internalized) motivations is a powerful incentive, sometimes as
powerful as, or more powerful than, extrinsic motivations such as
monetary gain. I believe that the self-selection and vetting process for
IC analysts has created a highly motivated population. Work
conditions (such as last-generation IT infrastructure) and the current
set-up of extrinsic motivations, however, do not easily allow IC
analysts to work to satisfy intrinsic motivations. See Clay Shirky,
Cognitive Surplus: Creativity and Generosity in a Connected Age
(New York, NY: The Penguin Press, 2010).
49 If analysts make only obvious calls (e.g., that Canada will not
invade the U.S. next week), then all of their predictions should carry
high assigned probabilities. If they make difficult calls, they will
presumably assign lower probabilities and will not be penalized when
those events, as expected, rarely happen.
50 Leonard Mlodinow, The Drunkard's Walk: How Randomness
Rules Our Lives (New York: Random House, Inc., 2008), pp. 192-
195. See also Nassim Nicholas Taleb, The Black Swan: The Impact of
the Highly Improbable (New York: Random House, Inc., 2007).
Intelligence analysis could be improved by developing in analysts a
better understanding of how randomness affects outcomes. Errors in
judging the probability of an outcome arise when analysts assume a
situation is more random than it actually is, or believe that the
situation or the players involved have more control over the outcome
than they do in reality. Mlodinow, The Drunkard's Walk: How
Randomness Rules Our Lives, pp. 11, 195.
Welton Chang is a Defense Department analyst responsible
for assessing the performance and effectiveness of
intelligence programs. He served as an Army officer prior
to joining the Defense Department as a civilian. He
currently sits on the board of the John Sloan Dickey Center
for International Understanding and is also a Truman
National Security Fellow. Mr. Chang earned a BA in
Government from Dartmouth College and is currently a
part-time MA candidate in Georgetown University’s
Security Studies Program.