All content in this area was uploaded by Andreas Graefe on Jan 07, 2016
Table of Contents
0 Introduction
1 Executive Summary
  1.1 Status Quo
  1.2 Key Questions and Implications
2 Introduction
3 Status Quo of Automated Journalism
  3.1 How Automated Journalism Works
  3.2 Providers of Automated Journalism Solutions
  3.3 The State of Automated Journalism in Newsrooms
  3.4 Potentials
  3.5 Limitations
  3.6 Relevance
4 Key Questions and Implications
  4.1 For Journalists
  4.2 For News Consumers
  4.3 For News Organizations
  4.4 For Society
5 Summary and Outlook
6 Further Reading
7 About the Author
8 Citations
Guide to Automated Journalism
This work was funded by the Tow Foundation and the John S. and James L. Knight Foundation.
Thanks to interviewees Saim Alkan, Reginald Chua, Lou Ferrara, Tom Kent, and James Kotecki.
Thanks also to Peter Brown, Arjen van Dalen, Nick Diakopoulos, Konstantin Dörr, Mario Haim,
Noam Lemelshtrich Latar, and Claire Wardle for providing comments and suggestions.
January 2016
Executive Summary
In recent years, the use of algorithms to automatically generate news from structured data
has shaken up the journalism industry, most especially since the Associated Press, one of
the world’s largest and most well-established news organizations, has started to automate
the production of its quarterly corporate earnings reports. Once developed, not only can
algorithms create thousands of news stories for a particular topic, they also do it more
quickly, cheaply, and potentially with fewer errors than any human journalist. Unsurprisingly,
then, this development has fueled journalists’ fears that automated content production will
eventually eliminate newsroom jobs, while at the same time scholars and practitioners see
the technology’s potential to improve news quality. This guide summarizes recent research
on the topic and thereby provides an overview of the current state of automated journalism,
discusses key questions and potential implications of its adoption, and suggests avenues for
future research. Some of the key points can be summarized as follows.
Status Quo
Market phase
Companies worldwide are developing software solutions for generating automated news.
Leading media companies such as the Associated Press, Forbes, The New York Times,
Los Angeles Times, and ProPublica have started to automate news content.
Although the technology is still in an early market phase, automated journalism has
arrived in newsrooms and is likely here to stay.
Conditions and drivers
Automated journalism is most useful in generating routine news stories for repetitive
topics for which clean, accurate, and structured data are available.
Automated journalism cannot be used to cover topics for which no structured data are
available and is challenging when data quality is poor.
The key drivers of automated journalism are an ever-increasing availability of structured
data, as well as news organizations’ aim to both cut costs and increase the quantity of
news.
Potential
Algorithms are able to generate news faster, at a larger scale, and potentially with fewer
errors than human journalists.
Algorithms can use the same data to tell stories in multiple languages and from different
angles, thus personalizing them to an individual reader’s preferences.
Algorithms have the potential to generate news on demand by creating stories in
response to users’ questions about the data.
Limitations
Algorithms rely on data and assumptions, both of which are subject to biases and
errors. As a result, algorithms could produce outcomes that are unexpected or
unintended and that contain errors.
Algorithms cannot ask questions, explain new phenomena, or establish causality and
are thus limited in their ability to observe society and to fulfill journalistic tasks, such as
orientation and public opinion formation.
The writing quality of automated news is inferior to human writing but likely to improve,
especially as natural language generation technology advances.
Key Questions and Implications
For journalists
Human and automated journalism will likely become closely integrated and form a
“man-machine marriage.”
Journalists are best advised to develop skills that algorithms cannot perform, such as
in-depth analysis, interviewing, and investigative reporting.
Automated journalism will likely replace journalists who merely cover routine topics, but
will also generate new jobs within the development of news-generating algorithms.
For news consumers
People rate automated news as more credible than human-written news but do not
particularly enjoy reading automated content.
Automated news is currently most suited for topics where providing facts in a quick and
efficient way is more important than sophisticated narration, or where news did not exist
previously and consumers thus have low expectations regarding the quality of the
writing.
Little is known about news consumers’ demand for algorithmic transparency, such as
whether they need (or want) to understand how algorithms work.
For news organizations
Since algorithms cannot be held accountable for errors, liability for automated content
will rest with a natural person (e.g., the journalist or the publisher).
Algorithmic transparency and accountability will become critical when errors occur, in
particular when covering controversial topics and/or personalizing news.
Apart from basic guidelines that news organizations should follow when automatically
generating news, little is known about which information should be made transparent
regarding how the algorithms work.
For society
Automated journalism will substantially increase the amount of available news, which
will further increase people’s burden to find content that is most relevant to them.
An increase in automated (and, in particular, personalized) news is likely to
reemphasize concerns about potential fragmentation of public opinion.
Little is known about potential implications for democracy if algorithms are to take over
part of journalism’s role as a watchdog for government.
Introduction
Consider the following earnings report, which the Associated Press published shortly after
Apple released its quarterly figures in January 2015.
Apple tops Street 1Q forecasts
Apple posts 1Q profit, results beat Wall Street forecasts
AP. January 27, 2015 4:39 PM
CUPERTINO, Calif. (AP) _ Apple Inc. (AAPL) on Tuesday reported fiscal first-quarter
net income of $18.02 billion. The Cupertino, California-based company said it had profit
of $3.06 per share. The results surpassed Wall Street expectations. The average
estimate of analysts surveyed by Zacks Investment Research was for earnings of $2.60
per share. The maker of iPhones, iPads and other products posted revenue of $74.6
billion in the period, also exceeding Street forecasts. Analysts expected $67.38 billion,
according to Zacks. For the current quarter ending in March, Apple said it expects
revenue in the range of $52 billion to $55 billion. Analysts surveyed by Zacks had
expected revenue of $53.65 billion. Apple shares have declined 1 percent since the
beginning of the year, while the Standard & Poor’s 500 index has declined slightly more
than 1 percent. In the final minutes of trading on Tuesday, shares hit $109.14, an
increase of 39 percent in the last 12 months.
_____
This story was generated by Automated Insights (http://automatedinsights.com/ap)
using data from Zacks Investment Research. Access a Zacks stock report on AAPL at
http://www.zacks.com/ap/AAPL.
At first glance, the article may appear to be a routine financial news report, except for the
footnote, which states that the story was generated by software or, more precisely, an
algorithm. Granted, the piece may sound a bit technical and boring, but it provides all the
facts a journalist is likely to cover and in which an investor is likely to be interested.
This technological innovation, known as automated journalism, is a relatively new
phenomenon in the area of computational journalism. Automated journalism refers to the
process of using software or algorithms to automatically generate news stories without
human intervention (after the initial programming of the algorithm, of course). Thus, once the
algorithm is developed, it allows for automating each step of the news production process,
from the collection and analysis of data, to the actual creation and publication of news.
Automated journalism (also referred to as algorithmic[1] or, somewhat misleadingly, robot
journalism[2]) works for fact-based stories for which clean, structured, and reliable data are
available. In such situations, algorithms can create content on a large scale, personalizing it
to the needs of an individual reader, quicker, cheaper, and potentially with fewer errors than
any human journalist.
While computation has long assisted journalists in different phases of the news production
process (as in the collection, organization, and analysis of data, as well as the
communication and dissemination of news), journalists have remained the authority for
actually creating the news. This division of labor is changing, which, not surprisingly, has
shaken up journalism in recent years. The World Editors Forum listed automated journalism
as a top 2015 newsroom trend,[3] and both researchers and practitioners are debating the
implications of this development.[4] For example, while some observers see potential for
automating routine tasks to increase news quality, journalists’ fears that the technology will
eventually eliminate newsroom jobs often dominate the public debate.[5]
In any case, opinions run strong on the use of automated journalism, which is why the
technology has attracted so much attention. Popular media coverage includes NPR’s Planet
Money podcast, which had one of its most experienced reporters compete with an algorithm
to write a news story,[6] and The New York Times’s quiz that allows readers to guess whether
a human or an algorithm wrote a particular story.[7] Even The Daily Show’s humorous
coverage of the topic sheds light on potentials and concerns of increased usage.[8]
This guide is structured as follows. Chapter 2 describes the status quo of automated
journalism; Chapter 3 then discusses key questions and implications for stakeholders, such
as journalists, news consumers, news organizations, and society at large; and Chapter 4
summarizes the findings and provides recommendations for future research.
Status Quo of Automated Journalism
The following section describes how automated journalism works; names the leading
software providers; and addresses how the technology is being used in newsrooms, what its
potentials and limitations are, and why it will likely become a major player in the process of
news creation.
How Automated Journalism Works
Current solutions range from simple code that extracts numbers from a database, which are
then used to fill in the blanks in pre-written template stories, to more sophisticated
approaches that analyze data to gain additional insight and create more compelling
narratives. The latter rely on big data analytics and natural language generation technology,
and emerged from the data-heavy domain of sports reporting. Both major providers of
natural language generation technology in the United States, Automated Insights and
Narrative Science, began by developing algorithms to automatically write recaps of sporting
events. For example, Narrative Science’s first prototype, StatsMonkey, which emerged from
an academic project at Northwestern University, automatically wrote recaps of baseball
games.[9] Baseball served as an ideal starting point due to the wealth of available data,
statistics, and predictive models that are able to, for example, continuously recalculate a
team’s chance of winning as a game progresses.
Figure 1 shows the basic functionality of state-of-the-art natural language generation
platforms.[10] First, the software collects available data, such as, in the case of baseball,
box scores, minute-by-minute plays, batting averages, historical records, or player
demographics. Second, algorithms employ statistical methods to identify important and
interesting events in the data. Those may include unusual events, a player’s extraordinary
performance, or the decisive moment for the outcome of a game. Third, the software
classifies and prioritizes the identified insights by importance and, fourth, arranges the
newsworthy elements by following predefined rules to generate a narrative. Finally, the story
can be uploaded to the publisher’s content management system, which could publish it
automatically.
Figure 1: How algorithms generate news
During this process, the software relies on a set of predefined rules that are specific to the
problem at hand and which are usually derived from collaboration between engineers,
journalists, and computer linguists. For example, within the domain of baseball, the software
has to know that the team with the most runs (but not necessarily the most hits) wins the
game. Furthermore, domain experts are necessary to define criteria of newsworthiness,
according to which the algorithm looks for interesting events and ranks them by importance.
Finally, computer linguists use sample texts to identify the underlying semantic logic and
translate it into a rule-based system that is capable of constructing sentences. If no such
sample texts are available, trained journalists pre-write text modules and sample stories with
the appropriate frames and language and adjust them to the official style guide of the
publishing outlet.
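The steps described above (collect data, identify events, prioritize them, and arrange them
into a narrative) can be illustrated with a minimal sketch. Everything in it is an assumption
made for illustration: the team and player names, the newsworthiness scores, the two-home-run
threshold, and the sentence templates are hypothetical and do not reproduce any provider’s
actual system. The only rule taken from the text is that the team with the most runs wins,
regardless of hits.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    kind: str
    score: float  # assumed newsworthiness score; higher means more important
    text: str     # pre-written sentence template, already filled with data


def find_insights(game):
    """Step 2: identify interesting events using simple, hand-written rules."""
    home, away = game["home"], game["away"]
    # Domain rule from the text: most runs wins, not most hits.
    winner, loser = (home, away) if home["runs"] > away["runs"] else (away, home)
    insights = [Insight(
        "result", 1.0,
        f"The {winner['name']} beat the {loser['name']} "
        f"{winner['runs']}-{loser['runs']}.")]
    if winner["hits"] < loser["hits"]:
        insights.append(Insight(
            "outhit", 0.6,
            f"The {winner['name']} won despite being outhit "
            f"{loser['hits']}-{winner['hits']}."))
    for player in game["players"]:
        if player["home_runs"] >= 2:  # assumed threshold for "extraordinary"
            insights.append(Insight(
                "player", 0.8,
                f"{player['name']} hit {player['home_runs']} home runs."))
    return insights


def prioritize(insights):
    """Step 3: rank the insights by importance."""
    return sorted(insights, key=lambda i: i.score, reverse=True)


def narrate(insights, limit=3):
    """Step 4: arrange the top-ranked elements into a short narrative."""
    return " ".join(i.text for i in insights[:limit])


def generate_story(game):
    """Steps 1-4; step 1 (collection) is simply the dict passed in here."""
    return narrate(prioritize(find_insights(game)))


# Hypothetical box score used only to exercise the sketch.
game = {
    "home": {"name": "Hawks", "runs": 5, "hits": 7},
    "away": {"name": "Owls", "runs": 3, "hits": 10},
    "players": [{"name": "A. Example", "home_runs": 2}],
}
print(generate_story(game))
```

The final step, uploading the story to a content management system, is left out because it
depends entirely on the publisher’s own infrastructure.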
Providers of Automated Journalism Solutions
A review of the market identified eleven companies that provide automated content creation
for journalistic products in different countries.[11] Of these, five are based in Germany (AX
Semantics; Text-On; 2txt NLG; Retresco; Textomatic), two each in the United States
(Narrative Science; Automated Insights) and France (Syllabs; Labsense), and one each in the
United Kingdom (Arria) and China (Tencent). The field is growing quickly: the review is not
even published yet, and we can already add another provider from Russia (Yandex) to the list.
While eight companies focus on providing content in one language, the remaining four offer
their services in multiple languages. The German company AX Semantics, for instance,
offers automated content creation in as many as twelve languages. It should be noted that
these companies do not consider themselves journalistic organizations; neither do their
names indicate a relationship to journalism, nor are their products specifically geared toward
providing journalistic content. Rather, their technology can be applied to any data from any
industry, and some of their major business fields include writing for product descriptions,
portfolio analyses, or patient summaries in hospitals.
The State of Automated Journalism in Newsrooms
Automated news emerged almost half a century ago from the domain of weather
forecasting. One early study describes software that works similarly to the process detailed
above. The software takes the outputs of weather forecasting models (e.g., wind speed,
precipitation, temperature), prioritizes them by importance (e.g., whether the value is above
or below a certain threshold level), and uses about eighty pre-written phrases to generate
“worded weather forecasts.” Interestingly, the author’s discussion of the software’s benefits
resembles much of today’s conversation about how automated journalism could potentially
free up journalists and leave time for more important work (see Chapter 3): “The more
routine tasks can be handled by a computer, thereby freeing the meteorologist for the more
challenging roles of meteorological consultant and specialist on high-impact weather
situations.”[12]
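A threshold-and-phrase approach of this kind can be sketched in a few lines. The thresholds,
units, and phrases below are invented for illustration and are not taken from the early study;
the sketch only shows the general idea of mapping model outputs to pre-written wording.

```python
# Each rule list maps a lower threshold to a pre-written phrase, checked from
# the highest threshold down. All values and phrases are assumptions.
WIND_RULES = [(60, "gale-force winds"), (30, "breezy conditions"), (0, "light winds")]
RAIN_RULES = [(10, "heavy rain"), (1, "showers"), (0, "dry")]
TEMP_RULES = [(25, "warm"), (10, "mild"), (float("-inf"), "cold")]


def pick(value, rules):
    """Return the phrase of the first rule whose threshold the value reaches."""
    for threshold, phrase in rules:
        if value >= threshold:
            return phrase
    return rules[-1][1]  # fall back to the lowest category


def worded_forecast(wind_kmh, rain_mm, temp_c):
    """Assemble a one-sentence forecast from the selected phrases."""
    temp = pick(temp_c, TEMP_RULES).capitalize()
    rain = pick(rain_mm, RAIN_RULES)
    wind = pick(wind_kmh, WIND_RULES)
    return f"{temp} and {rain}, with {wind}."


print(worded_forecast(wind_kmh=35, rain_mm=0.2, temp_c=12))
```

A fuller system would also rank the elements, so that the most unusual value leads the
sentence, which is the prioritization step the study describes.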
Another domain in which organizations have long used automation is financial news, where
the speed with which information can be provided is the key value proposition. For example,
companies such as Thomson Reuters and Bloomberg extract key figures from press
releases and insert them into pre-written templates to automatically create news alerts for
their clients. In this business, automation is not about freeing up time. It is a necessity.
Reginald Chua, executive editor for editorial operations, data, and innovation at Thomson
Reuters, told me: “You can’t compete if you don’t automate.”
In more recent years, automated journalism also found its way into newsrooms to address
other types of problems, often in the form of custom-made, in-house solutions. A prominent
example is the work at the Los Angeles Times on automating homicide and earthquake
reporting described in case studies 1 and 2. When asked to describe the algorithms, Ken
Schwencke, who developed them (and now works for The New York Times), noted that the
underlying code is “embarrassingly simple,” as it merely extracts numbers from a database
and composes basic news stories from pre-written text modules.[13] Despite (or perhaps
because of) its simplicity, Schwencke’s work marks an important step in the era of
automated journalism, demonstrating how simple in-house solutions can help to increase
both the speed and breadth of news coverage.
Many newsrooms, however, lack the necessary resources and skills to develop automated
journalism solutions in-house. Media organizations have thus started to collaborate with
companies that specialize in developing natural language generation technology to
automatically generate stories from data for a variety of domains. In 2012, for example,
Forbes.com announced its use of Narrative Science’s Quill platform to automatically create
company earnings previews.[14] A year later, ProPublica used the same technology to
automatically generate descriptions for each of the more than 52,000 schools for its
Opportunity Gap news application.[15] In 2014, automated journalism made its way into the
public’s focus when the Associated Press, one of the world’s major news organizations,
began automating its quarterly company earnings reports using Automated Insights’
Wordsmith platform. As described in Case Study 3, the project was a success and, as a
result, the AP recently announced the expansion of its automated coverage to sports.[16]
Case Study 1: Crime Reporting
Mary Lynn Young and Alfred Hermida describe the evolution of the Los Angeles Times’s
Homicide Report as an early example of automated journalism.[17] Before the project’s
launch in January of 2007, the Times’s print edition covered only about ten percent of the
nearly 1,000 annual homicides in L.A. County. The coverage typically focused on
the most newsworthy cases, which were often the most sensational ones and therefore did
not provide a representative picture of what was really happening. The goal of the Homicide
Report was to address this bias in the media coverage by providing comprehensive
coverage of all annual homicides. The project originally started as a blog that posted basic
information about each homicide, such as the victim’s race and gender or where the body
was found. A few months later, an interactive map was added to visualize the information.
Soon, however, it became clear that the project was too ambitious. Due to limited newsroom
resources, as well as technical and data issues, it was impossible to report every homicide.
The project was put on hold in November 2008. When the Homicide Report was relaunched
in January 2010, it relied on structured data from the L.A. County coroner’s office, which
includes information such as the date, location, time, race or ethnicity, age, jurisdiction, and
neighborhood of all homicides in the area. The revised Homicide Report used these data to
automatically produce short news snippets and publish them on the blog. While these news
reports were simple, providing only the most rudimentary information, they accomplished the
project’s original goal to cover every single homicide and were able to do so in a quick and
efficient manner. As noted by Ken Schwencke, who wrote the code for automatically
generating the homicide-related news, this technological innovation reduced “the load on
reporters and producers and pretty much everybody in getting the information out there as
fast as possible.”[18] Journalists at the Los Angeles Times were open-minded toward the
automation process. A study of the Homicide Report found that journalists “understood the
algorithm as enhancing the role of crime reporters rather than replacing them.”[19] That is,
crime reporters used the automatically generated stories as initial leads for exploring a
particular case in more detail, for example by adding information about the victim’s life and
family.
A related Los Angeles Times project that also uses algorithms to create automated news,
Mapping L.A., provides maps and information that allow readers to compare two hundred
seventy-two neighborhoods in Los Angeles County with regard to demographics, crime, and
schools. The platform uses data provided by the L.A. Police and County Sheriff’s
Departments to automatically generate warnings if crime reports surpass certain predefined
thresholds. For example, the system triggers a crime alert for a certain neighborhood if a
minimum of three crimes is reported in a single week, and if the number of reported crimes
in that week is significantly higher than the weekly average of the previous quarter.
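The alert rule can be expressed as a short check. The text does not say how “significantly
higher” is measured, so the sketch below assumes one plausible reading: the weekly count must
exceed the previous quarter’s weekly average by more than two standard deviations. The
function name, the parameter names, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def crime_alert(this_week, previous_weeks, min_crimes=3, z=2.0):
    """Return True if a neighborhood's weekly crime count should trigger an alert.

    this_week      -- number of crimes reported in the current week
    previous_weeks -- weekly counts for the previous quarter (about 13 values)
    min_crimes     -- minimum count from the text (three crimes in one week)
    z              -- assumed definition of "significantly higher":
                      more than z standard deviations above the average
    """
    if this_week < min_crimes:
        return False
    average = mean(previous_weeks)
    spread = stdev(previous_weeks) if len(previous_weeks) > 1 else 0.0
    return this_week > average + z * spread


# Hypothetical weekly counts for one neighborhood over the previous quarter.
last_quarter = [2, 3, 1, 2, 3, 2, 1, 2, 3, 2, 2, 1, 2]
print(crime_alert(6, last_quarter))  # well above the usual range
print(crime_alert(3, last_quarter))  # meets the minimum but is not unusual
```

Requiring both conditions keeps quiet neighborhoods from triggering alerts on a jump from
zero to one crime, while busy neighborhoods only trigger alerts on a genuine departure from
their own baseline.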
Potentials
In automating traditional journalistic tasks, such as data collection and analysis, as well as
the actual writing and publication of news stories, there are two obvious economic benefits:
increasing the speed and scale of news coverage. Advocates further argue that automated
journalism could potentially improve the accuracy and objectivity of news coverage. Finally,
the future of automated journalism will potentially allow for producing news on demand and
writing stories geared toward the needs of the individual reader.
Speed
Automation allows for producing news in nearly real time, or at the earliest point that the
underlying data are available. For example, the AP’s quarterly earnings report on Apple (see
Chapter 1) was published only minutes after the company released its figures. Another
example is the Los Angeles Times’s Quakebot, which first broke the news about an earthquake
in the Los Angeles area in 2014 (see Case Study 2).
Scale
Automation allows for expanding the quantity of news by producing stories that were
previously not covered due to limited resources. For example, both the Los Angeles Times
(for homicide reports; Case Study 1) and the Associated Press (for company earnings
reports; Case Study 3) reported that automation increased the amount of published stories
by more than ten times. Similarly, while human journalists have traditionally only covered
earthquakes that exceeded a certain magnitude or left significant damage, Quakebot
provides comprehensive coverage of all earthquakes detected by seismographic sensors in
Southern California (Case Study 2). While any one of these articles may attract only a few
hits in targeting a small audience, total traffic increases with positive effects on advertising
revenues.
Accuracy
Algorithms do not get tired or distracted, and, assuming that they are programmed correctly
and the underlying data are accurate, they do not make simple mistakes like misspellings,
calculation errors, or overlooking facts. Advocates thus argue that algorithms are less
error-prone than human journalists. For example, Lou Ferrara, former vice president and
managing editor for entertainment, sports, and interactive media at the Associated Press,
reports that automation has decreased the rate of errors in AP’s company earnings reports
from about seven percent to only about one percent, mostly by eliminating typos or
transposed digits. “The automated reports almost never have grammatical or misspelling
errors,” he told me, “and the errors that do remain are due to mistakes in the source data.”
Yet, Googling “generated by automated insights correction” lists thousands of examples
where automatically generated articles had to be corrected after their publication.[20] In the
vast majority of cases, the errors are rather noncritical, such as wrong information about
where the company is based or when its quarter ends. Sometimes the errors are crucial,
however. A prominent example is a July 2015 report about Netflix’s second-quarter
earnings.[21] This article, which was later corrected, wrongly reported that the company
missed expectations and that the share price had fallen by seventy-one percent since the
beginning of the year when, in fact, it had more than doubled during that period. The reason
for the error was that the algorithm failed to realize that the Netflix stock underwent a
seven-to-one split. This example thus demonstrates the importance of, first, foreseeing unusual
events in the initial development of the algorithms and, second, being able to detect outliers
and request editorial monitoring if necessary.[22]
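The arithmetic behind this kind of error is easy to reproduce. The prices in the sketch below
are round, illustrative numbers chosen to match the proportions described in the text; they
are not actual Netflix quotes. The function names and the fifty percent review limit are
likewise assumptions.

```python
def pct_change(start, end):
    """Percentage change from a starting price to an ending price."""
    return (end - start) / start * 100


def split_adjusted_change(start, end, split_ratio):
    """Percentage change after restating the pre-split price in post-split terms."""
    return pct_change(start / split_ratio, end)


def needs_review(change_pct, limit=50.0):
    """Assumed outlier rule: route unusually large moves to an editor."""
    return abs(change_pct) > limit


# Illustrative prices: 350 before a seven-for-one split, 100 after it.
naive = pct_change(350.0, 100.0)                     # compares unlike quantities
adjusted = split_adjusted_change(350.0, 100.0, 7.0)  # 350 / 7 = 50 per new share

print(f"naive: {naive:.0f}%, split-adjusted: {adjusted:.0f}%")
print("flag for editor:", needs_review(naive))
```

With these numbers the unadjusted comparison shows a drop of roughly seventy-one percent,
while the split-adjusted comparison shows the price doubling. A simple outlier rule would not
have fixed the calculation, but it would have sent the story to a human before publication.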
Case Study 2: Earthquake Alerts
In automatically producing short news stories about earthquakes in California, the Los
Angeles Times’s Quakebot demonstrates the use of sensor data for automated journalism.
When the U.S. Geological Survey’s Earthquake Notification Service releases an earthquake
alert, Quakebot creates a story that provides all the basic information a journalist would
initially cover (including time, location, and magnitude of the earthquake) and saves it as a
draft in the Los Angeles Times content management system. After a staff member has
reviewed the story for potential errors, it only takes a single click to publish the story.
Although the system has been in use since 2011, Quakebot first attracted national media
attention in March 2014 when it was the first news outlet to break the story that a 4.4
magnitude earthquake had hit Southern California. When Ken Schwencke, who developed
Quakebot, felt the earth shaking at 6:27 a.m., he went to his computer to review the
automatically generated story already waiting for him in the system and published it. Three
minutes later, at 6:30 a.m., the story was online at the Los Angeles Times’s “L.A. Now”
blog.[23]
Quakebot is all about speed. Its goal is to get the information out as quickly as possible.
However, while speed is important, so is the accuracy of the news, and achieving both goals
can be difficult. For automated news, a crucial aspect of accuracy is the quality of the
underlying data. This became evident in May 2015 when seismologic sensors in Northern
California picked up signals from major earthquakes that happened in Japan and Alaska,
which the U.S. Geological Survey (USGS) mistakenly reported as three separate
earthquakes in California with magnitudes ranging from 4.8 to 5.5. Earthquakes of that
magnitude would leave significant local damage. Luckily, the alarms were false. The
earthquakes had never happened and nobody could feel them. Nonetheless, Quakebot
published stories for each of the three false alarms. In other words, the human review
process failed. The editor trusted the algorithm and published the story without making sure
that the information was correct.[24]
A simple way to verify the correctness of earthquake alerts might be to look at the number of
related tweets. As soon as the earth starts shaking, Twitter users who feel the earthquake
publish the information on the network. When a 6.0 earthquake hit the Napa Valley in August
2014, the first tweets appeared almost immediately and beat the official USGS alerts by
minutes. Thus, the number of tweets provides an independent source of data for verifying
whether a reported earthquake has actually occurred. In fact, research at the USGS showed
that Twitter data can be used to locate an earthquake within twenty seconds to two minutes
after its origin time. This is considerably faster than the traditional method of using
seismometers to measure ground motion, particularly in poorly instrumented regions of the
world.[25] Along with earthquake alerts, the USGS now publishes the number of tweets per
minute that contain the word “earthquake” in several languages on its official Twitter account
@USGSted. For the false alarms discussed above, @USGSted reported zero tweets per
minute, which is not surprising since no earthquake had happened. In comparison, for the
actual earthquake that did occur off Japan, @USGSted reported fifty-six tweets per minute
at the time it published the earthquake alert. The Los Angeles Times editor could have
looked at this information when deciding whether or not to publish the news. Or, even better,
Quakebot could be updated so that its algorithm accounts for this information and
automatically publishes a story if the number of tweets in a respective area is above a
certain threshold.
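Such a cross-check could look like the following sketch. It is not Quakebot’s actual code. The
tweet rate is passed in as a plain number rather than fetched from any service, the threshold
of ten tweets per minute is an arbitrary assumption, and the story template is invented. The
two example rates (zero and fifty-six) are the figures reported above.

```python
def draft_story(magnitude, place, time_str):
    """Fill a pre-written template with the basic facts from an alert."""
    return (f"A magnitude {magnitude} earthquake was reported at {time_str} "
            f"near {place}, according to the U.S. Geological Survey.")


def publication_decision(tweets_per_minute, min_tweets=10):
    """Decide what to do with a drafted story.

    tweets_per_minute -- independent signal: tweets mentioning "earthquake"
    min_tweets        -- assumed threshold; a real system would need to tune
                         this by region and time of day
    """
    if tweets_per_minute >= min_tweets:
        return "publish"
    return "hold_for_review"


story = draft_story(4.8, "a hypothetical town in Northern California", "5:31 a.m.")
print(story)
print(publication_decision(0))   # false alarm: nobody is tweeting
print(publication_decision(56))  # real event: many people are tweeting
```

Holding a story for review rather than discarding it keeps a human in the loop for quakes in
sparsely populated areas, where few tweets would be expected even for a real event.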
Objectivity
Algorithms strictly follow predefined rules for analyzing data and converting the results into
written stories. Advocates argue that automated news provides an unbiased account of
facts. This argument of course assumes that the underlying data are correct and the
algorithms are programmed without bias, a view that, as discussed in the next chapter, is
false or too optimistic at best.[26] That said, experimental evidence available to date suggests
that readers perceive automated news as more credible than human-written news (see
Textbox I).
Personalization
Automation allows for providing relevant information for very small audiences and in multiple
languages. In the most extreme case, automation can even create news for an audience of
one. For instance, Automated Insights generates personalized match day reports (a total of
more than three hundred million in 2014) for each player of Yahoo Fantasy Football, a
popular online game in which people can create teams of football players and compete
against each other in virtual leagues. Similarly, one of Narrative Science’s core businesses
is to automatically generate financial market reports for individual customers. It is easy to
imagine similar applications for other areas. For example, algorithms could create recaps of
a sports event that focus on the performance of a particular player that interests the reader
most (e.g., grandparents interested in the performance of their grandchild). Furthermore, as
shown with Automated Insights’ Fantasy Football match day reports, the algorithms could
even tell the same story in a different tone depending on the reader’s needs. For example,
the recap of a sporting event could be written in an enthusiastic tone for supporters of the
winning team and in a sympathetic tone for supporters of the losing one.
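Telling the same story in a different tone can be as simple as choosing among pre-written
templates based on what is known about the reader. The sketch below is a toy illustration;
the templates, team names, and the reader_team parameter are assumptions and do not describe
how any provider implements personalization.

```python
# One template per tone; all wording here is invented for illustration.
TEMPLATES = {
    "winner": "What a game! The {winner} beat the {loser} {w}-{l}.",
    "loser": "A tough night for the {loser}, who fell {w}-{l} to the {winner}.",
    "neutral": "The {winner} defeated the {loser} {w}-{l}.",
}


def recap(winner, loser, w, l, reader_team=None):
    """Return a one-line recap whose tone depends on the reader's team."""
    if reader_team == winner:
        tone = "winner"
    elif reader_team == loser:
        tone = "loser"
    else:
        tone = "neutral"
    return TEMPLATES[tone].format(winner=winner, loser=loser, w=w, l=l)


print(recap("Hawks", "Owls", 5, 3, reader_team="Hawks"))
print(recap("Hawks", "Owls", 5, 3, reader_team="Owls"))
print(recap("Hawks", "Owls", 5, 3))
```

The underlying facts are identical in all three versions; only the framing changes, which is
what distinguishes this kind of personalization from reporting different facts to different
readers.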
News on demand
The ability to personalize stories and analyze data from different angles also provides
opportunities for generating news on demand. For example, algorithms could generate
stories that answer specific questions by comparing the historical performance of different
baseball players. Algorithms could also answer what-if scenarios, such as how well a
portfolio would have performed if a trader had bought stock X as compared to stock Y. While
algorithms for generating news on demand are currently not yet available, they will likely be
the future of automated journalism. In October 2015, Automated Insights announced a new
beta version of its Wordsmith platform, which enables users to upload their own data,
pre-write article templates, and automatically create narratives from the data.[27] The German
company AX Semantics provides a similar functionality with its ATML3 programming
language.
Limitations
Algorithms for generating automated news follow a set of predefined rules and thus cannot
innovate. Therefore, their application is limited to providing answers to clearly defined
problems for which data are available. Furthermore, at least at the current stage, the quality
of writing is limited.
Data availability and quality
Automated journalism requires high-quality data in structured and machine-readable
formats. In other words, you need to be able to save your data in a spreadsheet. For this
reason, automation works particularly well in domains such as finance, sports, or weather,
where data providers make sure that the underlying data are accurate and reliable.
Needless to say, automation cannot be applied to domains where no data are available.
Automation is challenging in situations where data quality is poor. For example, in March
2015, the Associated Press announced that it would commence automatically producing
stories on college sports events for lower divisions using game statistics data from the
NCAA. The goal of this endeavor is to expand the existing sports coverage by providing
stories on sports events that were previously not covered. According to Lou Ferrara, this
project was more complicated than expected due to issues with the underlying data. Since
the data are often entered by coaches and do not undergo strict verification procedures, they
can be messy and contain errors.
Validation
Algorithms can add value by generating insights from data analysis. In applying statistical
methods to identify outliers or correlations between multiple variables, algorithms could find
interesting events and relationships, which in turn could lead to new stories. However,
algorithms that analyze correlations cannot establish causality or add meaning. That is,
while algorithms can provide accounts of what is happening, they cannot explain why things
are happening.[28] As a result, findings derived from statistical analysis, regardless of their
statistical significance, can be completely meaningless (see www.tylervigan.com for
examples of statistically significant but completely spurious correlations). Humans still need
to validate the findings by applying logic and reasoning.[29]
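A small calculation shows how easily a strong correlation arises between unrelated
quantities. The two series below are made-up numbers that merely both rise over time; they
are not real data, and the variable names are hypothetical. The Pearson coefficient is
computed by hand so that the sketch has no dependencies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Two invented yearly series with no causal link; both simply trend upward.
ice_cream_sales = [10, 12, 15, 18, 22, 27]
software_bugs = [101, 105, 109, 116, 121, 130]

r = pearson(ice_cream_sales, software_bugs)
print(f"correlation: {r:.3f}")
```

An algorithm scanning for strong correlations would rank this pair highly, yet the only thing
the two series share is a common upward trend. Deciding that the finding is meaningless
requires the kind of reasoning about the world that the text assigns to humans.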
Ingenuity
Once the findings have been validated, algorithms can contribute knowledge. Yet, this
contribution is limited to providing answers to prewritten questions by analyzing given data.
Algorithms cannot use the knowledge to ask new questions, detect needs, recognize
threats, solve problems, or provide opinions and interpretation on, for example, matters
regarding social and policy change. In other words, algorithms lack ingenuity and cannot
innovate. As a result, automated journalism is limited in its ability to observe society and
fulfill journalistic tasks, such as orientation and public opinion formation.[30]
Writing quality
Another often mentioned limitation of automated news is the quality of the writing. Current
algorithms are limited in understanding and producing nuances of human language, like
humor, sarcasm, and metaphors. Automated news can sound technical and boring, and
experimental evidence shows that people prefer reading human-written to automated news
(see Textbox I). That said, according to Gartner’s “Hype Cycle for Business Intelligence and
Analytics, 2015,” natural language generation is only at the very beginning of its
development.[31] Therefore, the technology, and thus the quality of writing, is likely to further
improve over time. It remains an open question, however, whether algorithms will ever be
able to produce sophisticated narration comparable to human writing.[32]
Case Study 3: Company Earnings Reports
In July 2014, the Associated Press began to automate the process of generating corporate
earnings stories using the Wordsmith platform for natural language generation, developed
by Automated Insights with data provided by Zacks Investment Research (for an example,
see the Apple quarterly earnings report shown in Chapter 1). The project turned into a
massive success. In January 2015, AP announced that the automation allowed for the
generation of more than 3,000 stories per quarter, compared to about three hundred stories
that AP reporters and editors previously created manually. By the end of 2015, the AP
expects to generate 4,700 stories, and soon it will also generate earnings reports for
companies in Canada and the European Union. According to AP assistant business editor
Philana Patterson, the reaction from both AP members and readers has been “incredibly
positive.”[33] First, readers are happy because they have access to more stories, which also
contain fewer errors than the manually written ones. Second, staff members are pleased
because “everybody hated doing earnings reports” and, more importantly, “automation has
freed up valuable reporting time for more interesting tasks,” said Lou Ferrara. Patterson also
revealed that, in addition to increasing the number of corporate earnings reports by more than
ten times, automation has freed up about twenty percent of the time previously spent
producing earnings reports. According to AP, the freed resources have not led to any job
losses but have been used to improve activities in other areas, like AP’s breaking news
operations or investigative and explanatory journalism.[34]
The AP was not the first major news organization to use natural language generation for
writing company earnings stories. Since 2012, Forbes (http://www.forbes.com/) has been
cooperating with Narrative Science to automatically create company earnings previews. The
goal of this project was to provide cost-effective, broad, and deep market coverage for its
readers. Similar to the experience at the AP, automation at Forbes has allowed for generating
more stories while freeing up resources. As a result of the additional coverage, Forbes’s
audience has broadened, and site traffic and advertising revenues have increased.35
Relevance
The number of media organizations that automated journalism providers currently report as
customers is small. Few providers offer actual journalistic products, and most products
available to date are limited to routine topics, such as sports and finance, for which reliable
and structured data are available. Automated journalism is thus still in an experimental or, at
best, early-market expansion phase.36
This may change quickly, however. Apart from ongoing advances in computing power, big
data analytics, and natural language generation technology, the most important driver of
automated journalism is the ever-increasing availability of structured and machine-readable
data provided by organizations, sensors, or the general public. First, in an attempt to make
government more transparent and accountable, many countries are launching open data
initiatives to make data publicly available. Second, our world is increasingly equipped with
sensors that automatically generate and collect data. Currently, sensors constantly track
changes in an environment’s temperature, seismological activity, or air pollution. Sensors
are also increasingly used to provide fine-grained data on real-world events. The NFL now
uses sensors to track each player’s field position, speed, distance traveled, acceleration,
and even the direction he is facing, which provides many new opportunities for data-driven
reporting. Third, users themselves are generating an increasing amount of data, whether on
social networks or, for instance, as parents recording scores at local youth sporting events.
Furthermore, automated journalism fits into the broader trend within news organizations to
commercialize journalism and follow business logic. In light of declining profits and readers’
increasing demand for content, news organizations are constantly looking for new revenue
and production models that help cut costs by automating routine tasks and, at the same
time, increase the quantity of news. Due to its ability to produce low-cost content in large
quantities in virtually no time, automated journalism appears to some researchers as yet
another strategy for news organizations to lower production costs and increase profit
margins.37
Given these drivers, it is not surprising that advocates of automated journalism expect the
field to expand quickly. Saim Alkan, CEO of the German software provider AX Semantics,
estimates that algorithms are already capable of producing about half of the
content of a regular daily newspaper. Alexander Siebert, founder of Retresco, another
German company, thinks that within five years automated news will be indistinguishable
from human-written news.38 And Kristian Hammond, co-founder of Narrative Science,
predicts that within the next ten years more than ninety percent of news will be
automated.39
These claims are certainly debatable, in particular as they come from people with a vested
interest in the success of automated journalism. However, with renowned news
organizations such as the Associated Press spearheading the movement toward automated
news production, it is likely that others will follow suit. Lou Ferrara predicts that “every media
outlet will be under pressure to automate” and, eventually, “everything that can be
automated will be automated.” Similarly, Tom Kent, AP’s standards editor, expects an
“explosion of automated journalism.”
In fact, there are indications that more and more media companies are already heading in
this direction. Most providers of automated journalism solutions are in constant negotiations
with media organizations interested in their products. Narrative Science and AX Semantics
declined to provide information about journalistic clients, as non-disclosure agreements
prevent them from revealing existing collaborations.40 Still, automated journalism might
already be more common than is publicly known.
Key Questions and Implications
Automated journalism is likely to affect the evolution of news writing in the years to come. As
shown in Figure 2, the increasing availability of automated news will impact journalism and
the general public at both the individual (micro) and organizational (macro) level. This
section discusses potential benefits and risks that arise from the increasing spread of
automated journalism.
For Journalists
Since automated journalism is often perceived as a threat to the livelihood of classic
journalism, it is not surprising that it has attracted a lot of attention from journalists. In
particular, journalists have focused on the question of how the technology will alter their own
roles and required skill sets. Two studies analyzing the content of news articles and blog
posts about automated journalism provide insight into journalists’ expectations. The first
study analyzed sixty-eight articles published in 2010, which covered Statsheet (the
predecessor of Automated Insights), a service that automatically created match reports and
previews for all three hundred forty-five NCAA Division 1 college basketball teams.41 The
second study analyzed sixty-three articles that reported on Narrative Science’s technology
and discussed its impact on journalism.42 The articles were published from 2010 to early
2014 and thus cover a longer and more recent period of journalists’ exposure to automated
news.
Figure 2: Effects of automated journalism
Both studies found that journalists expected automation to change the way they work,
although the extent to which automation technology will replace or complement human
journalists will depend on the task and the skills of the journalist. For routine and repetitive
tasks, such as sports recaps or company earnings reports (merely a conversion of raw data
into standard writing), there was a consensus among journalists that they would not be able
to compete with the speed and scale of automated content. Their reaction to this development
usually fit either an optimistic or pessimistic frame.
According to the optimistic “machine liberates man” frame, the ability to automate routine
tasks may offer opportunities to improve journalistic quality. The argument is that automation
frees up journalists from routine tasks and thus allows them to spend more time on providing
in-depth analysis, commentary, and investigative work, which are in turn skills that will
become more important. This appears to be the case at the Associated Press, which reports
that the resources freed up as a result of automation have been used to improve reporting in
other areas (see Case Study 3).
According to the pessimistic “machine versus man” frame, automated journalism competes
with human journalists. That is, automated journalism is portrayed as yet another way to cut
costs and replace those journalists who merely cover routine tasks with software. Indeed, if
an increasing share of news will eventually be automated, the logical consequence is that
journalists who used to cover such content will need to either produce a better product or
focus on tasks and skills for which humans outperform algorithms. As Reginald Chua told
me, “journalists have to ask themselves what they bring to the table.”
In their coverage of automated journalism, journalists commonly judged the writing quality of
automated content as poor or, at best, “good enough.” They further emphasized humans’
ability to write sophisticated narratives as their own competitive advantage. Yet, although
human writing is certainly superior to automated content, at least to date, this debate is
somewhat misleading. For one, storytelling is not among the most important skills that
journalists commonly mention when defining their profession; those mentioned instead are
factors where algorithms excel, such as objectivity, simplification, and speed.43 More
importantly, the argument overlooks the fact that automated news is most useful in
repetitive, routine, and fact-based stories for which the quality of the writing might not be that
essential. For example, when seeking financial news, readers are most interested in quickly
obtaining information. In such situations, complex and sophisticated writing may even be
counterproductive, making the information harder to understand. This is, of course, the
reason why much of the existing financial news writing is rather routine in its following of
predefined templates and is thus difficult to distinguish from automated news (see Textbox I).
Rather, journalists are best advised to focus on tasks that algorithms cannot perform. In the
future, human and automated journalism will likely become closely integrated and form a
relationship that Reginald Chua refers to as a “man-machine marriage.” According to this
view, algorithms will analyze data, find interesting stories, and provide a first draft, which
journalists will then enrich with more in-depth analyses, interviews with key people, and
behind-the-scenes reporting. An early example can be found in crime reporting by the Los
Angeles Times’s Homicide Report (Case Study 1), in which an algorithm provides basic
facts, such as the date, location, time, age, gender, race, and jurisdiction of a homicide.
Then, in the second step, journalists can pick the most interesting stories and add a human
touch by providing details about the victim’s life and family.44
Journalists will also take on new roles within the process of automating news production.
For example, the Associated Press recently hired a so-called automation editor, whose job is
to identify internal processes that can be automated. When it comes to developing
news-generating algorithms, a major challenge is defining the rules and criteria that an
algorithm is to follow when creating a story from data. While a sports journalist may know from
experience which moments in a particular baseball game are game-changing, it can be
difficult to translate this knowledge into a rule-based system that can apply to all baseball
games. This task requires analytic thinking, creativity, and a certain understanding of
statistics. Similarly, so-called meta-writers are required to train the algorithms by defining
which words to use for describing a particular event (e.g., when a lead is large or small) or
determining the story’s general structure (e.g., the headline informs who won the game, the
first paragraph summarizes the score and key events, the rest of the article provides details,
etc.).
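The kind of rules a meta-writer defines can be pictured with a minimal sketch. The Python example below is illustrative only and does not reproduce any provider's system: the GameResult fields, the margin thresholds, and the verbs are assumptions chosen for the example. It shows one rule that maps the size of a lead to a word choice, and a fixed structure that puts the winner in the headline and the score in the first paragraph.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Structured input for one game (hypothetical schema)."""
    home_team: str
    away_team: str
    home_score: int
    away_score: int


def describe_margin(margin: int) -> str:
    """Map a winning margin to a verb; the thresholds are illustrative assumptions."""
    if margin >= 10:
        return "routed"
    if margin >= 4:
        return "beat"
    return "edged"


def write_recap(game: GameResult) -> str:
    """Fill a fixed structure: the headline names the winner, the first paragraph gives the score."""
    if game.home_score == game.away_score:
        raise ValueError("this sketch does not handle ties")
    if game.home_score > game.away_score:
        winner, loser = game.home_team, game.away_team
    else:
        winner, loser = game.away_team, game.home_team
    high = max(game.home_score, game.away_score)
    low = min(game.home_score, game.away_score)
    verb = describe_margin(high - low)
    headline = f"{winner} {verb} {loser}"
    first_paragraph = f"{winner} {verb} {loser} {high}-{low}."
    return headline + "\n" + first_paragraph


if __name__ == "__main__":
    # Invented teams and scores, used only to exercise the rules above.
    print(write_recap(GameResult("Tigers", "Owls", 3, 2)))
```

Even in this toy form, the difficulty described above is visible: someone has to decide where a "small" lead ends and a "large" one begins, and that judgment must hold for every game the rule is applied to.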
For News Consumers
Advocates of automated journalism argue that the technology benefits news consumers by
providing new content that was previously unavailable and by personalizing that content to
meet the needs of the individual consumer. This raises two important questions. First, how do
news consumers perceive the quality of automated news? Second, what are news
consumers’ requirements regarding algorithmic transparency?
Quality of automated news
As noted in the previous section, journalists commonly judge the quality of automated
content as poor or just “good enough” to meet minimum expectations around clarity and
accuracy of the provided information. A key criticism of automated content is that it often
lacks sophisticated narration and sounds rather boring and technical. Experimental
research from three countries, namely Germany, Sweden, and the Netherlands, suggests
that consumer perceptions of the quality of automated news are similar to journalists’
judgments. In these studies, participants were asked to read articles written by either a
human or an algorithm and rate them according to various aspects of quality.45 Despite
using varied experimental designs and measures, the studies’ main findings were similar (for
details see Textbox I). First, human-written news tended to earn better ratings than
automated news in terms of readability. Second, automated news rated better than
human-written news in terms of credibility. Third, and perhaps most important, differences in the
perceived quality of human-written and automated news were rather small.
Textbox I: Evidence on the Perceived Quality of Automated News
In the first study of its kind, Christer Clerwall from Karlstad University in Sweden analyzed
how people perceive the quality of news articles when they do not know the article’s source.46
The experimental design reflected a situation in which publishers did not byline news stories,
a practice that is not uncommon for wire stories and automated news.47 Clerwall presented
forty-six Swedish undergraduates in media and communication studies with an article that
provided a recap of an American football game. One group saw an article generated by an
algorithm, and the remaining participants saw one a human journalist had written. None of
the participants knew whether a human or algorithm had written the article he or she was
seeing. The articles were written in English (and thus not in the participants’ first language),
contained no pictures, and were of approximately the same length. Participants rated the
article along various criteria that measured credibility and readability. Then, they had to
guess whether the article was written by a journalist or generated by a computer.
Interestingly, participants were unable to correctly identify the article’s source. Furthermore,
the automated news article rated higher than the human-written one in terms of credibility
but lower in terms of readability. In general, however, differences in quality ratings were
small.
The results might seem surprising. Communication students, who would be expected to
have a higher level of media literacy than average news consumers, were unable to
distinguish between human-written and automated articles, and even perceived the latter as
somewhat more credible. But what if readers are fully aware that they are reading
automated news? How does this information affect their perception of the content’s quality?
Two studies provide answers to that question.
The first study, which was presented at the 2014 Computation + Journalism Symposium at
Columbia University’s Brown Institute, asked one hundred sixty-eight news consumers to
rate one of four automated news articles in terms of journalistic expertise and
trustworthiness.48 The articles were either correctly bylined as “written by a computer” or
wrongly as “written by a journalist.” They were written in the participants’ native language
(Dutch), contained no pictures, and covered the domains of sports or finance (two each).
The results showed that the manipulation of the byline had no effect on people’s perceptions of
quality. That is, news consumers’ ratings of expertise and trustworthiness did not differ
depending on whether they were told that the article was written by a human or a computer.
The second study, which was conducted in Germany and presented at the 11th Dubrovnik
Media Days in October 2015, provides further evidence.49 This study used a larger
sample of nine hundred eighty-six participants and varied both the actual article source and
its declared source. That is, instead of only using automated articles, the researchers also
obtained ratings for human-written counterparts on the same topic. Participants were
randomly assigned to one of four experimental groups, in which they were presented with a
human-written or automated article (either correctly or wrongly declared). The articles were
written in the participants’ native language (German), contained no pictures, were of similar
length, and came from the domains of sports and finance (one each). Each participant saw two
articles and rated their credibility, journalistic expertise, and readability. The results were
similar to those obtained in previous studies. That is, participants’ quality ratings did not
differ depending on whether an article was declared as written by a human or computer.
Furthermore, automated articles were rated as more credible, and higher in terms of
expertise, than the human-written articles. For readability, however, the results showed the
opposite effect. Participants rated human-written news substantially higher than automated
news.
When discussing potential reasons for the small differences, researchers suggested that
consumers’ initial and perhaps subconscious expectations could have influenced the results
in favor of automated news.50 According to this rationale, participants may not have
expected much from automated news and were thus positively surprised when their
expectations were exceeded, which potentially led them to assign higher-quality ratings. In
contrast, subjects may have had high expectations for human-written articles, but when the
articles failed to measure up to those expectations, they assigned lower ratings. If this
rationale is true, then human-written articles should have scored higher when they were
wrongly declared as automated news, and vice versa. However, evidence from the German
study does not support this rationale.51 In fact, the results show the opposite effect.
Human-written news was perceived less favorably when readers were told the news was generated
by an algorithm. Similarly, automated news was rated more favorably when readers thought
a human wrote it. The results thus support the experiences of James Kotecki, head of
communications at Automated Insights, who reported that news consumers have high
standards for automated content. In particular, Kotecki conjectures that “knowing the news is
automated can prime readers to look for signs that a robot wrote it and therefore scrutinize it
more carefully.”
A more likely reason why news consumers perceive automated and human-written news
to be of similar quality relates to the actual content of the articles. Again, the German study
provides insights in this regard.52 Although human-written articles were perceived as
somewhat more readable than automated ones, people did not particularly enjoy reading
either of them. These results might indicate a general dissatisfaction with news writing, at
least for the topics of finance and sports, which were the focus of the study. Covering such
topics is a routine and repetitive task, often performed by novice journalists who need to write
a large number of stories as quickly as possible. As a result, routine news writing often comes
down to a simple recitation of facts and lacks sophisticated storytelling and narration. Since the
algorithms that generate automated content are programmed to strictly follow such standard
conventions of news writing, the logical consequence is that the resulting articles reflect
these conventions and therefore do not differ much from their human-written counterparts.
Furthermore, if automated news succeeds in delivering information that is relevant to the
reader, it is not surprising that people rate the content as credible and trustworthy.
In conclusion, the available evidence suggests that the quality of automated news is
competitive with that of human journalists for routine, repetitive tasks. However, it is
important to note that these results cannot be generalized to topics that are not solely
fact-based and for which journalists contribute value by providing interpretation, reasoning, and
opinion. Automated stories for such complex problems are not yet available. That
said, as noted earlier, the quality of automated news will likely continue to improve, both in
terms of readability and the ability to generate insights that go beyond the simple recitation
of facts. Future studies might find even smaller differences in the readability
of automated and human-written content. However, such effects may not necessarily persist,
as readers’ initial excitement with the new technology may fade if automated news that
builds on a static set of rules feels redundant, especially if dispersed at a large scale. In this
case, readers may again be drawn toward fresh and creative human writing styles,
generating new opportunities for journalists.
It is up to future research to track how the quality of both automated and human-written
news will evolve over time. In particular, it’s worth looking at how people’s expectations
toward and perceptions of such content may change, especially for controversial and critical
topics that are not merely fact-based. Future studies that analyze people’s relative
perception of human-written and automated news should go beyond the previous work by
focusing on the why: Why is it that automated news tends to be perceived as more credible
but less readable than human-written news? This, of course, requires focusing on the
articles’ actual content at the sentence level and might require collaboration with linguists.
Another interesting approach would be to use web analytics data to analyze actual user
engagement with automated content, such as the number and duration of visits.
Transparency
For critical and controversial topics, as in automated stories that use polling data to write
about a candidate’s chance of winning an election, it is easy to imagine that readers or
certain interest groups may question underlying facts or criticize the angle from which the
story is being told. Similarly, when algorithms are used to create personalized stories at the
individual reader level, people may want to know what the algorithm knows about them or
how their story differs from what other users see. In such cases, readers may request
detailed information about the functionality of the underlying algorithms.
Researchers and practitioners from the field discussed such questions in March 2015 at an
expert workshop, Algorithmic Transparency in the Media, held at the Tow Center and
organized by Tow Fellow Nicholas Diakopoulos. In a first step, the experts identified five
categories of information that consumers of automated content may potentially find of interest:
human involvement, the underlying data, the model, the inferences made, and the
algorithmic presence.53 For example, readers might want to know who is behind the
automated content: what is the purpose and intent of the algorithm, including editorial
goals; who created and controls the algorithm; and who is held accountable for the content?
The latter may also include information about which parts of an article were written by a
person or algorithm, whether the final product was reviewed by a human editor before
publication, and, if so, by whom. Regarding the source data, news organizations could
publish the complete raw data or, if this is not possible (e.g., due to legal reasons), provide
information about the quality of the data, such as its accuracy (or underlying uncertainty),
completeness, and timeliness. Furthermore, readers may want to know whether, and if so
how, the data were collected, transformed, verified, and edited; whether the data are public
or private; which parts of the data were used (or ignored) when generating a story; and
which information about the reader was used if the story was personalized. Regarding the
actual algorithms, readers may be interested in the underlying models and statistical
methods that are used to identify interesting events and insights from the data, as well as
the underlying news values that determine which of those make it into the final story.
These questions provide a starting point for the kind of information news organizations might
potentially reveal about their algorithms and the underlying data. However, experts identified
these questions, so they may not reflect what audiences actually think. In fact, there may not
even be a demand for algorithmic transparency on the user side, as probably only a few
people are even aware of the major role that algorithms play in journalism. This, of course,
may change quickly once automated news becomes more widespread, and especially when
errors occur. For example, imagine a situation in which an algorithm generates a large
number of erroneous stories, either due to a programming error or because it was hacked.
Such an event would immediately lead to calls for algorithmic transparency.
In his summary of the workshop results, Nicholas Diakopoulos points to two areas that
would be most fruitful for future research on algorithmic transparency.54 First, we need to
better understand users’ demands around algorithmic transparency, as well as how the
disclosed information could be used in the public interest. Second, we need to find ways
to best disclose information without disturbing the user experience, in particular for
those who are not interested in such information. The New York Times offers an example of
how to achieve the latter in its “Best and Worst Places to Grow Up,” which provides
automated stories about how children’s economic future is affected by where they are
raised.55 When users click on a different county, the parts of the story that change are
highlighted for a short period of time.
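How such change highlighting might work can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of how The New York Times built its feature: it compares two versions of a generated text word by word with Python's standard difflib module and wraps the words that differ in [[...]] markers, which a front end could then render as a temporary highlight. The function name, the marker syntax, and the sample sentences are all invented for the example.

```python
import difflib


def mark_changes(old_story: str, new_story: str) -> str:
    """Wrap the words of new_story that differ from old_story in [[...]] markers."""
    old_words, new_words = old_story.split(), new_story.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        segment = " ".join(new_words[j1:j2])
        if tag == "equal":
            parts.append(segment)
        elif segment:
            # "replace" or "insert": new wording to highlight.
            # A pure "delete" contributes no words to the new version.
            parts.append(f"[[{segment}]]")
    return " ".join(parts)


if __name__ == "__main__":
    # Invented sentences; they do not quote any published story.
    before = "Children in Cook County earn less"
    after = "Children in Lake County earn more"
    print(mark_changes(before, after))
```

A disclosure mechanism of this kind stays out of the way of readers who do not care about it, while still showing those who do which parts of a story depend on their selection.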
For News Organizations
The coverage of routine topics like sports and finance only provides a starting point. Given
the obvious economic benefits in providing opportunities to cut costs and, at the same time,
increase the breadth of news content, more media organizations are likely to adopt
automation technology. Most likely, automation will soon be applied to more challenging
subjects, such as public interest journalism, by covering political and social issues. In fact,
the precursors of this development can already be observed in the form of algorithms that
automatically create content on Twitter.56
When automating content for critical problems, issues of accuracy, quality of the content,
and transparency of the underlying data and procedures become more important. In a first
attempt to address these questions, Tom Kent proposed “an ethical checklist for robot
journalism,” which he derived from AP’s experience automating corporate earnings reports,
a project that took close to one year. The checklist poses ten questions that news
organizations and editors need to think about when automating content.57 These questions
consider quality issues relating to the source data, the data processing, and the final output.
Source data
News organizations need to ensure that, first, they have the legal right to modify and publish
the source data and, second, the data are accurate. Data provided by governments and
companies are probably more reliable and less error-prone than user-generated data like
scores from local youth sporting events entered into a database by coaches or the players’
parents. That said, as demonstrated in the case of earthquake reporting (see Case Study 2),
even government data may contain errors or false information. Data problems may also arise if
the structure of the source data changes, a common problem for data scraped from
websites. Thus, news organizations need to implement data management and verification
procedures, which could be performed either automatically or by a human editor.
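An automated verification step of this kind might look like the following sketch. The field names and the plausibility rule are hypothetical, since no specific data schema is described here; the point is only that a record whose structure has changed (for example, after a website redesign) or whose values are implausible is flagged before any story is generated from it.

```python
from typing import List

# Hypothetical schema for one earnings record; a real feed would define its own.
REQUIRED_FIELDS = {
    "company": str,
    "quarter": str,
    "revenue_musd": float,
    "eps": float,
}


def validate_record(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passed these checks."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    unexpected = set(record) - set(REQUIRED_FIELDS)
    if unexpected:
        # Extra fields often signal that the source layout has changed.
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    revenue = record.get("revenue_musd")
    if isinstance(revenue, float) and revenue < 0:
        problems.append("revenue is negative")
    return problems


if __name__ == "__main__":
    # Invented values, used only to show a failing record.
    print(validate_record({"company": "ACN", "quarter": "Q1", "revenue_musd": -5.0}))
```

Records that return a non-empty list could be held back for a human editor, which combines the two options named above rather than choosing between them.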
Data processing
If the underlying data or the algorithms that process them contain errors, automation may
quickly generate large numbers of erroneous stories, which could have disastrous
consequences for a publisher’s reputation. News organizations therefore need to engage in
thorough testing before initial publication of automated news. When publication starts, Kent
recommends having human editors check each story before it goes live, although, as
demonstrated by the Quakebot (Case Study 2), this so-called “hand brake” solution is not
error-free either. Once the error rate is down to an acceptable level, the publication process
can be fully automated, with occasional spot checks. The latter is the approach the AP
currently uses for its company earnings reports.
Output
Regarding the final output, Kent recommends that the writing match the official style guide of
the publishing organization and be capable of using varied phrasing for different stories.
Furthermore, news organizations should be aware of legal and ethical issues that may arise
when the text is automatically enhanced with videos or images without proper checking. For
such content, publishing rights may not be available or the content may violate standards of
taste. News organizations must also provide a minimum level of transparency by disclosing
that the story was generated automatically, for example, by adding information about the
source of the data and how the content was generated. The AP adds the following
information at the end of its fully automated company earnings reports:
This story was generated by Automated Insights (http://automatedinsights.com/ap) using
data from Zacks Investment Research. Access a Zacks stock report on ACN at
http://www.zacks.com/ap/ACN.
Of course, news consumers may be unfamiliar with these companies and their technologies,
and therefore unaware that the content is provided by an algorithm. It remains unclear
whether readers actually understand the meaning of such bylines. Further research on how
they are perceived would be useful. Also, since more and more stories are the result of
collaboration between algorithms and humans, the question arises of how to properly
disclose when certain parts of a story were automated. The AP currently deals with such
cases by modifying the first sentence in the above statement to “Elements of this story were
generated by Automated Insights.”58 That said, Kent noted that the discussion about how to
properly byline automated news may be a temporary one. Once automated news becomes
standard practice, some publishers may choose not to reveal which parts of a story were
automatically generated.
Accountability
Automation advocates argue that algorithms allow for an unbiased account of facts. This
view, however, assumes that the underlying data are complete and correct and, more
importantly, the algorithms are programmed correctly and without bias. Like any other
model, algorithms for generating automated news rely on data and assumptions, both of
which are subject to biases and errors.59 First, the underlying data may be wrong, biased,
or incomplete. Second, the assumptions built into the algorithms may be wrong or reflect the
(conscious or unconscious) biases of those who developed or commissioned them. As a
result, algorithms could produce outcomes that were unexpected and unintended, and the
resulting stories could contain information that is inaccurate or simply false.60
In such situations, it is not enough to state that an article was generated by software, in
particular when covering critical or controversial topics for which readers’ requirements of
transparency and accountability may be higher. When errors occur, news organizations may
come under pressure to publish the source code behind the automation. At the very least,
they should be able to explain how a story was generated, rather than simply stating that
“the computer did it.”61 From a legal standpoint, algorithms cannot be held accountable for
errors. The liability is with a natural person, which could be the publisher or the person who
made a mistake when feeding the algorithm with data.62
While providers of automated news could (and in some cases probably should) be
transparent about many details of their algorithms, there was consensus among experts at
the Tow workshop on algorithmic transparency that most organizations are unlikely to
voluntarily provide full transparency, especially without a clear value proposition. However, if
news organizations and software developers do not fully disclose their algorithms, it remains
unclear how to evaluate the quality of the algorithms and the content produced, in particular,
its sensitivity to changes in the underlying data. A promising yet complex approach might be
reverse engineering, which aims at decoding an algorithm’s set of rules by varying certain
input parameters and assessing the effects on the outcome.63 Another important question
for future research is whether, and if so to what extent, users of automated content ultimately
care about transparency, in which case the provision of such information could be a
competitive advantage by increasing a publisher’s credibility and legitimacy.64
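The idea behind reverse engineering can be illustrated with a deliberately simple sketch. In the Python example below, black_box_verb is a stand-in for an opaque story generator whose rules an outside auditor cannot see; it and its threshold are assumptions made up for the illustration. The auditor varies one input parameter across a range and records where the output changes, which reveals the hidden rule without access to the source code.

```python
def black_box_verb(margin: int) -> str:
    """Stand-in for an opaque generator; a real auditor would not see this rule."""
    return "routed" if margin >= 10 else "beat"


def find_change_points(generate, inputs):
    """Vary one input parameter and record each value at which the output changes."""
    change_points = []
    previous = None
    for value in inputs:
        output = generate(value)
        if previous is not None and output != previous:
            change_points.append((value, previous, output))
        previous = output
    return change_points


if __name__ == "__main__":
    # Probe winning margins 1 through 20 and report where the wording switches.
    print(find_change_points(black_box_verb, range(1, 21)))
```

Real systems depend on many interacting inputs rather than one, which is why the approach is described above as promising yet complex: the number of input combinations to probe grows quickly.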
For Society
Due to its ability to create content quickly, cheaply, at large scale, and potentially
personalized to the needs of individual readers, automated journalism is expected to
substantially increase the amount of available news. While this development might be
helpful in meeting people’s demand for information, it could also further increase people’s
burden to find content that is most relevant to them. To cope with the resulting information
overload, the importance of search engines and personalized news aggregators, such as
Google News, is likely to increase further.
Search engine providers claim to analyze individual user data (e.g., location and historical
search behavior) to provide news consumers with the content that most interests them. As
a result, different news consumers might receive different results for the same keyword
searches, which would bear the risk of partial information blindness, the so-called “filter
bubble” hypothesis.65 According to this idea, personalization will lead individuals to
consume more and more of the same information, as algorithms provide only content that
users like to read or agree with. Consequently, people would be less likely to encounter
information that challenges their views or contradicts their interests, which could carry risks
for the formation of public opinion in a democratic society.
The filter bubble hypothesis has become widely popular among academics, as well as the
general public. Eli Pariser’s 2011 book, The Filter Bubble: How the New Personalized Web
Is Changing What We Read and How We Think,66 has not only become a New York Times
bestseller but has attracted more than 1,000 citations on Google Scholar through October
2015. However, despite the theory’s popularity and appeal, empirical evidence available to
date does not support the existence of the filter bubble: Most studies find either no, or only
very small, effects of personalization on search results.67 Of course, this may change as the
amount of available content, and thus the need for personalization, increases and
algorithms for personalizing content continue to improve. The study of potential effects from
personalization, whether positive or negative, remains an important area of research.
More generally, a further increase in, and more sophisticated use of, automated journalism
would eventually raise broader questions that future research must address. If algorithms
are employed for public interest journalism, questions will arise as to whether we can and
should trust algorithms as a mechanism for providing checks and balances, identifying
important issues, and establishing a common agenda for the democratic process of public
opinion formation. Furthermore, future research will need to study the implications for
democracy if algorithms are to take over journalism’s role as a watchdog for government.
Summary and Outlook
Automated journalism currently works well in producing routine news stories for repetitive
topics, for which clean, accurate, and structured data are available. In such situations,
algorithms are able to generate news faster, at a larger scale, and with fewer errors than
human journalists. Furthermore, algorithms can use the same data to tell stories from
different angles, in multiple languages, and personalized to the needs and preferences of
the individual reader. Also, software providers have started to release tools that allow users
to automatically create stories from their own data.
Automated journalism cannot be used for domains where no data are available and is
challenging where data quality is poor. Furthermore, algorithms derive insights from data by
applying predefined rules and statistical methods (e.g., identifying outliers and correlations)
but cannot explain new phenomena or establish causality. That is, while algorithms can
describe what is happening, they cannot provide interpretations of why things are
happening. Algorithms are thus limited in their ability to observe society and fulfill journalistic
tasks such as orientation and public opinion formation.
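
To make the distinction between describing and explaining concrete, the following minimal Python sketch applies one predefined rule (a z-score threshold for outliers) to structured data and fills a sentence template. It is an illustration only, not a description of any provider's system: the company names, the earnings figures, the threshold, and the function name describe_latest are all hypothetical assumptions.

```python
from statistics import mean, stdev

# Hypothetical quarterly earnings-per-share figures; not real company data.
EPS_HISTORY = {
    "Acme Corp": [1.02, 0.98, 1.05, 1.01, 1.00, 1.71],
    "Globex Inc": [0.50, 0.52, 0.49, 0.51, 0.50, 0.51],
}

# Assumed predefined rule: a value more than 2 standard deviations
# away from the mean of the earlier values counts as an outlier.
Z_THRESHOLD = 2.0


def describe_latest(company, history, threshold=Z_THRESHOLD):
    """Return a templated sentence about the most recent value in history."""
    *past, latest = history
    baseline = mean(past)
    spread = stdev(past)
    # Guard against division by zero when all earlier values are identical.
    z = (latest - baseline) / spread if spread > 0 else 0.0
    if abs(z) > threshold:
        direction = "above" if z > 0 else "below"
        return (f"{company} reported earnings of ${latest:.2f} per share, "
                f"well {direction} its recent average of ${baseline:.2f}.")
    return (f"{company} reported earnings of ${latest:.2f} per share, "
            f"in line with its recent average of ${baseline:.2f}.")


stories = [describe_latest(name, values) for name, values in EPS_HISTORY.items()]
for story in stories:
    print(story)
```

The generated sentences state what happened (an unusually high or an ordinary result) but contain nothing about why it happened; any such interpretation would have to come from a journalist or from additional data and rules.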
Automation will likely change the way journalists work, although the extent to which
technology will replace or complement journalists will depend on the task and skills of the
journalist. In the future, human and automated journalism will likely become closely
integrated and form a “man-machine marriage.” Journalists are best advised to focus on
tasks that algorithms cannot perform, such as in-depth analyses, interviews with key people,
and investigative reporting. While automation will probably replace journalists who merely
cover routine topics, the technology is also generating new jobs within the process of
developing news-generating algorithms.
Widespread adoption of the technology will ultimately depend on whether news consumers
like reading the content. Evidence available to date, which is limited to topics where
automation technology is already being used on a large scale (e.g., sports and finance),
shows that while people rate automated news as slightly more credible than human-written
news, they do not particularly enjoy reading it since the writing is perceived as rather boring
and dry (see Textbox 1). Therefore, the technology is currently most suited for topics where
(a) providing facts in a quick and efficient way is more important than sophisticated narration
(e.g., financial news) or (b) news did not previously exist so consumers have low
expectations regarding writing quality. That said, the writing quality of automated news is
likely to improve as natural language generation technology advances further.
Other important questions for the use of automated journalism in newsrooms relate to issues
of algorithmic transparency and accountability. In particular, little is known about whether
news consumers (need or want to) understand how algorithms work, or about which
information they use to generate content. Furthermore, apart from some basic guidelines
and principles that should be followed when using automation technology, there is little data
about which information news organizations should make transparent regarding how their
algorithms work (e.g., decision rules or underlying data). Such information may become
particularly relevant in situations where (a) errors occur and (b) content is personalized to
the needs and preferences of the individual news consumer. Finally, a potential increase in
personalized news is likely to reemphasize prior concerns regarding filter bubbles or
fragmentation of public opinion.
Automated journalism has arrived and is likely here to stay. The key drivers are an
ever-increasing availability of structured data, as well as news organizations’ aim to cut costs
while at the same time increasing the quantity of news. This guide summarized the status
quo of automated journalism, discussed key questions and potential implications of its
adoption, and pointed out avenues for future research. In particular, future research would
be valuable on how automation will change journalists’ roles and required skills, how news
organizations and consumers should and will deal with issues relating to algorithmic
transparency and accountability, and how a widespread use of automated and personalized
content will affect public opinion formation in a democratic society. Furthermore, that
research should track how the writing quality of automated news evolves over time. It might
also consider how people’s expectations toward and perceptions of such content change,
especially for controversial and critical topics, such as election campaign coverage, which
are not merely fact-based and involve uncertainty.
Further Reading
For another introduction to the topic:
Celeste Lecompte, “Automation in the Newsroom: How Algorithms Are Helping Reporters
Expand Coverage, Engage Audiences, and Respond to Breaking News,” Nieman Reports,
69(3) (2015): 32–45, http://niemanreports.org/articles/automation-in-the-newsroom/.
If you are interested in how journalists write about automated news:
Matt Carlson, “The Robotic Reporter,” Digital Journalism, 3(3) (2015): 416–431.
For evidence on how news consumers perceive the quality of automated news:
Andreas Graefe et al., “Perception of Automated Computer-Generated News: Credibility,
Expertise, and Readability,” 2015. Paper presented at the 11th Dubrovnik Media Conference
Days: Artificial Intelligence, Robots, and Media, October 30–31.
For guidelines that news organizations should follow when implementing automated
journalism:
Tom Kent, “An Ethical Checklist for Robot Journalism,” 2015. Available at
https://medium.com/@tjrkent/an-ethical-checklist-for-robot-journalism-1f41dcbd7be2.
For a discussion of issues regarding algorithmic transparency and accountability:
Nicholas Diakopoulos, “Accountability in Algorithmic Decision-Making: A View from
Computational Journalism,” Communications of the ACM (forthcoming in 2016).
For a discussion of legal implications of automated journalism:
Lin Weeks, “Media Law and Copyright Implications of Automated Journalism,” Journal of
Intellectual Property and Entertainment Law, 4(1) (2014): 67–94.
For a discussion of the technology’s potentials and limitations, and an overview of
software providers:
Konstantin Nicholas Dörr, “Mapping the Field of Algorithmic Journalism,” Digital Journalism,
3 November 2015, available at
http://www.tandfonline.com/doi/abs/10.1080/21670811.2015.1096748?journalCode=rdij20.
About the Author
Dr. Andreas Graefe is currently a research fellow at the Tow Center for Digital Journalism at
Columbia University and at the Department of Communication Science and Media Research
at LMU Munich, Germany. He also holds a professorship in customer relationship
management at Munich’s Macromedia University. Andreas studied economics and
information science at the Universities of Regensburg and Zurich and received his Ph.D. in
economics from the University of Karlsruhe. He has held research positions at the Institute
for Technology Assessment and Systems Analysis at Karlsruhe Institute of Technology, as
well as visiting scholar positions at the University of Pennsylvania’s Wharton School and
Columbia University’s European Institute. After finishing his Ph.D., Andreas worked in the
private sector as a senior manager for the German pay-TV company Sky Deutschland, where
he led the CRM Resource Management Department. Andreas has studied diverse topics,
such as forecasting, decision-making, automated journalism, technology assessment, public
opinion, advertising, leadership, and healthcare. His work has been published in leading
journals in various fields, such as Public Opinion Quarterly, International Journal of
Forecasting, Journal of Behavioral Decision Making, Journal of Business Research,
Electoral Studies, and BMC Medical Informatics and Decision Making. Andreas can be
reached at graefe.andreas@gmail.com.
Citations
1. Konstantin Nicholas Dörr, “Mapping the Field of Algorithmic Journalism,” Digital
Journalism.
2. Will Oremus, “Why Robot?” Slate, 2015,
http://www.slate.com/articles/technology/future_tense/2015/02/automated_insights_ap_earnings_reports_robot_journalists_a_misnomer.html.
3. “AP, NCAA to Grow College Sports Coverage With Automated Game
Stories,” Associated Press, 4 March 2015,
http://www.ap.org/Content/Press-Release/2015/AP-NCAA-to-grow-college-sports-coverage-with-automated-game-stories.
4. Philip M. Napoli, “Automated Media: An Institutional Theory Perspective on Algorithmic
Media Production and Consumption,” Communication Theory, 3 (2014): 340–360;
Nicholas Diakopoulos, “Towards a Standard for Algorithmic Transparency in the
Media,” Tow Center for Digital Journalism, 27 April 2015,
http://towcenter.org/towards-a-standard-for-algorithmic-transparency-in-the-media/;
Christopher W. Anderson, “Towards a Sociology of Computational and Algorithmic
Journalism,” New Media & Society, 7 (2013): 1005–1021.
5. Arjen van Dalen, “The Algorithms Behind the Headlines,” Journalism Practice, 5–6
(2012): 648–658; Matt Carlson, “The Robotic Reporter,” Digital Journalism, 3 (2015):
416–431.
6. Stacey Vanek Smith, “An NPR Reporter Raced a Machine to Write a News Story. Who
Won?” NPR, 29 May 2015,
http://www.npr.org/sections/money/2015/05/20/406484294/an-npr-reporter-raced-a-machine-to-write-a-news-story-who-won.
7. The New York Times, “Did a Human or a Computer Write This?” 7 March 2015,
http://www.nytimes.com/interactive/2015/03/08/opinion/sunday/algorithm-human-quiz.html?smid=pl-share&_r=0.
8. The Daily Show, “Robot Journalists,” 2015,
http://www.cc.com/video-clips/fh76l0/the-daily-show-with-trevor-noah-robot-journalists.
9. Steven Levy, “Can an Algorithm Write a Better News Story Than a Human Reporter?”
Wired, 24 April 2012,
http://www.wired.com/2012/04/can-an-algorithm-write-a-better-news-story-than-a-human-reporter/.
10. Ehud Reiter and Robert Dale, Building Natural Language Generation Systems
(Cambridge: Cambridge University Press, 2000); Dörr, “Mapping the Field of
Algorithmic Journalism.”
11. Dörr, “Mapping the Field of Algorithmic Journalism.”
12. Harry R. Glahn, “Computer-Produced Worded Forecasts,” Bulletin of the American
Meteorological Society, 12 (1970): 1126–1131.
13. Alfred Hermida and Mary Lynn Young, “From Mr. and Mrs. Outlier to Central
Tendencies,” Digital Journalism, 3 (2015): 381–397.
14. Levy, “Can an Algorithm Write a Better News Story Than a Human Reporter?”
15. Scott Klein, “How To Edit 52,000 Stories at Once,” ProPublica, 2013,
https://www.propublica.org/nerds/item/how-to-edit-52000-stories-at-once.
16. “AP, NCAA to Grow College Sports Coverage With Automated Game Stories.”
17. Hermida and Young, “From Mr. and Mrs. Outlier to Central Tendencies.”
18. Ibid.
19. Ibid.
20. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media.”
21. “Netflix Misses Street 2Q Forecasts,” Associated Press, 15 July 2015,
http://finance.yahoo.com/news/netflix-misses-street-2q-forecasts-202216117.html.
22. Celeste Lecompte, “Automation in the Newsroom,” Nieman Foundation, 1 September
2015, http://niemanreports.org/articles/automation-in-the-newsroom/.
23. Joanna Plucinska, “How an Algorithm Helped the LAT Scoop Monday’s Quake,”
Columbia Journalism Review, 18 March 2014,
http://www.cjr.org/united_states_project/how_an_algorithm_helped_the_lat_scoop_mondays_quake.php.
24. Brandon Mercer, “Two Powerful Earthquakes Did Not Hit Northern California,
Automated Quake Alerts Fail USGS, LA Times After Deep Japan Quake,” CBS SF Bay
Area, 30 May 2015,
http://sanfrancisco.cbslocal.com/2015/05/30/4-8-and-5-5-magnitude-earthquakes-did-not-hit-northern-california-automated-quake-alerts-fail-usgs-la-times-a-2nd-and-3rd-time/.
25. Daniel C. Bowden, Paul S. Earle, and Michelle Guy, “Twitter Earthquake Detection:
Earthquake Monitoring in a Social World,” Annals of Geophysics, 6 (2011): 708–715.
26. David Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis,” Science,
6176 (2014): 1203–1205.
27. James Kotecki, “New Data-driven Writing Platform Enables Professionals to Create
Personalized Content at Unprecedented Scale,” Automated Insights, 20 October 2015,
http://www.prweb.com/releases/2015/10/prweb13029986.htm.
28. Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis.”
29. Noam Lemelshtrich Latar, “The Robot Journalist in the Age of Social Physics: The End
of Human Journalism?” in The New World of Transitioned Media, ed. Gail Einav (New
York: Springer, 2015), 65–80,
http://link.springer.com/chapter/10.1007/978-3-319-09009-2_6.
30. Ibid.
31. Kurt Schlegel, “Hype Cycle for Business Intelligence and Analytics, 2015,” 4 August
2015, https://www.gartner.com/doc/3106118/hype-cycle-business-intelligence-analytics.
32. Latar, “The Robot Journalist in the Age of Social Physics: The End of Human
Journalism?”
33. Erin Madigan White, “Automated Earnings Stories Multiply,” Associated Press, 29
January 2015, https://blog.ap.org/announcements/automated-earnings-stories-multiply.
34. Ibid.
35. “Case Study: Forbes,” Narrative Science, 19 May 2015,
http://resources.narrativescience.com/h/i/83535927-case-study-forbes.
36. Dörr, “Mapping the Field of Algorithmic Journalism.”
37. Nicole S. Cohen, “From Pink Lips to Pink Slime: Transforming Media Labor in a Digital
Age,” The Communication Review, 2 (2015): 98–122.
38. Alexander Siebert, “Roboterjournalismus im Jahre 2020 – Acht Thesen,” The Huffington
Post, 8 August 2014,
http://www.huffingtonpost.de/alexander-siebert/roboterjournalismus-im-jahre-2020---acht-thesen_b_5655061.html.
39. Levy, “Can an Algorithm Write a Better News Story Than a Human Reporter?”
40. Dörr, “Mapping the Field of Algorithmic Journalism.”
41. Dalen, “The Algorithms Behind the Headlines.”
42. Carlson, “The Robotic Reporter.”
43. Siegfried Weischenberg, Maja Malik, and Armin Scholl, “Journalism in Germany in the
21st Century,” in The Global Journalist in the 21st Century, ed. David Weaver and Lars
Willnat (New York: Routledge, 2012), 205–219.
44. Hermida and Young, “From Mr. and Mrs. Outlier to Central Tendencies.”
45. Christer Clerwall, “Enter the Robot Journalist,” Journalism Practice, 5 (2014): 519–531;
Hille van der Kaa and Emiel Krahmer, “Journalist Versus News Consumer: The
Perceived Credibility of Machine Written News,” Computation Journalism Conference,
Columbia University, New York, 2014; Andreas Graefe et al., “Readers’ Perception of
Computer-Written News: Credibility, Expertise, and Readability,” Dubrovnik Media Days
Conference, University of Dubrovnik, 2015.
46. Clerwall, “Enter the Robot Journalist.”
47. Lance Ulanoff, “Need to Write 5 Million Stories a Week? Robot Reporters to the
Rescue,” Mashable, 1 July 2014,
http://mashable.com/2014/07/01/robot-reporters-add-data-to-the-five-ws/#jlMMJqbFtSq4.
48. Kaa and Krahmer, “Journalist Versus News Consumer: The Perceived Credibility of
Machine Written News.”
49. Graefe et al., “Readers’ Perception of Computer-Written News: Credibility, Expertise,
and Readability.”
50. Kaa and Krahmer, “Journalist Versus News Consumer: The Perceived Credibility of
Machine Written News.”
51. Graefe et al., “Readers’ Perception of Computer-Written News: Credibility, Expertise,
and Readability.”
52. Ibid.
53. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media.”
54. Ibid.
55. Gregor Aisch et al., “The Best and Worst Places to Grow Up: How Your Area
Compares,” The New York Times, 3 May 2015,
http://www.nytimes.com/interactive/2015/05/03/upshot/the-best-and-worst-places-to-grow-up-how-your-area-compares.html?_r=0.
56. Tetyana Lokot and Nicholas Diakopoulos, “News Bots: Automating News
and Information Dissemination on Twitter,” Digital Journalism, 15 September 2013,
http://dx.doi.org/10.1080/21670811.2015.1081822.
57. Tom Kent, “An Ethical Checklist for Robot Journalism,” Medium, 24 February 2015,
https://medium.com/@tjrkent/an-ethical-checklist-for-robot-journalism-1f41dcbd7be2.
58. David Koenig, “Exxon 3Q Profit Falls by Nearly Half Amid Low Oil Prices,” Associated
Press, 30 October 2015,
http://www.salon.com/2015/10/30/exxon_3q_profit_falls_by_nearly_half_amid_low_oil_prices/.
59. Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis.”
60. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media”;
Nicholas Diakopoulos, “Algorithmic Accountability: Journalistic Investigation of
Computational Power Structures,” Digital Journalism, 3 (2015): 398–415.
61. Kent, “An Ethical Checklist for Robot Journalism.”
62. Lin Weeks, “Media Law and Copyright Implications of Automated Journalism,” Journal
of Intellectual Property and Entertainment Law, 1 (2014): 67–94; Pieter-Jan Ombelet,
Aleksandra Kuczerawy, and Peggy Valcke, “Supervising Automated Journalists in the
Newsroom: Liability for Algorithmically Produced News Stories,” Dubrovnik Media Days:
Artificial Intelligence, Robots and the Media Conference, University of Dubrovnik, 2015.
63. Diakopoulos, “Algorithmic Accountability: Journalistic Investigation of Computational
Power Structures.”
64. Diakopoulos, “Towards a Standard for Algorithmic Transparency in the Media.”
65. Eli Pariser, The Filter Bubble: What the Internet Is Hiding From You (New York: Penguin
Press, 2011).
66. Ibid.
67. Lada Adamic, Eytan Bakshy, and Solomon Messing, “Exposure to Ideologically Diverse
News and Opinion on Facebook,” Science, 6239 (2015): 1130–1132; Seth Flaxman,
Sharad Goel, and Justin Rao, “Filter Bubbles, Echo Chambers, and Online News
Consumption,” 2015, https://5harad.com/papers/bubbles.pdf; Florian Arendt, Mario
Haim, and Sebastian Scherr, “Abyss or Shelter? On the Relevance of Web Search
Engines’ Search Results When People Google for Suicide,” Health Communication,
2015; Martin Feuz, Matthew Fuller, and Felix Stalder, “Personal Web Searching in the
Age of Semantic Capitalism: Diagnosing the Mechanisms of Personalisation,” First
Monday, 2 (2011), http://firstmonday.org/ojs/index.php/fm/article/view/3344/2766.