Analyzing the Amazon Mechanical Turk Marketplace
Panagiotis G. Ipeirotis¹
New York University
Introduction
Amazon Mechanical Turk (AMT) is a popular crowdsourcing marketplace, introduced by Amazon in 2005. The marketplace is named after the "Mechanical Turk," an 18th-century "automatic" chess-playing machine that handily beat humans at chess. Of course, the machine did not use any artificial intelligence algorithms. The secret of the "Mechanical Turk" was a human operator, hidden inside the machine, who was the real intelligence behind its seemingly intelligent behavior.
Amazon Mechanical Turk is likewise a marketplace for small tasks that cannot be easily automated today. For example, humans can easily tell whether two different descriptions correspond to the same product, tag an image with descriptions of its content, or transcribe an audio snippet with high quality. However, such tasks, simple for humans, are often very hard for computers. Using AMT, computers can post tasks on the marketplace through a programmable API, and the tasks are then fulfilled by human users. This API-based interaction gives the impression that the task is fulfilled automatically, hence the name "Mechanical Turk."
In the marketplace, employers, known as "requesters," post tasks called "HITs," an acronym for "Human Intelligence Tasks." The HITs are then picked up by online users, referred to as "workers," who complete them in exchange for a small payment, typically a few cents per HIT.
Since the concept of crowdsourcing is relatively new, many potential participants have questions about the AMT marketplace. For example, a common set of questions that come up in an "introduction to crowdsourcing and AMT" session are the following:
Who are the workers that complete these tasks?
What types of tasks can be completed in the marketplace?
How much does it cost?
How fast can I get results back?
How big is the AMT marketplace?
For the first question, about the demographics of the workers, past research (Ipeirotis, 2010; Ross et al., 2010) indicated that the workers who participate in the marketplace come mainly from the United States, with an increasing proportion coming from India. In general, the workers are representative of the general Internet user population but are generally younger and, correspondingly, have lower incomes and smaller families.
¹ Panagiotis G. Ipeirotis is an Associate Professor in the Department of Information, Operations, and Management Sciences at the Leonard N. Stern School of Business of New York University. His recent research interests focus on crowdsourcing. He received his Ph.D. degree in Computer Science from Columbia University in 2004, with distinction. He has received two Microsoft Live Labs Awards, two "Best Paper" awards (IEEE ICDE 2005, ACM SIGMOD 2006), and two "Best Paper Runner Up" awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation. This work was supported by the National Science Foundation under Grant No. IIS-0643846.
At the same time, the answers to the other questions remain largely anecdotal, based on personal observations and experiences. To better understand what types of tasks are being completed today using crowdsourcing techniques, we started collecting data about the marketplace. Here, we present a preliminary analysis of the findings and provide directions for future research.
The rest of the paper is structured as follows. First, we briefly describe the data collection process and the characteristics of the collected dataset. Then we describe the characteristics of the requesters in terms of activity and posted tasks, and we also provide a short analysis of the most common tasks being completed on Mechanical Turk today. Next, we analyze the price distributions of the posted HITs and the HIT posting and completion dynamics of the marketplace. We conclude by presenting an analysis of the completion time distribution of HITs on Mechanical Turk, along with some directions for future research and some design changes that could improve the efficiency and effectiveness of the marketplace.
Data Collection
We started gathering data about the AMT marketplace in January 2009, and we continue to collect data to this day. The data collection process is the following: every hour, we crawled the list of "HITs Available" on AMT and kept the status of each available HIT group (group id, requester, title, description, keywords, reward, number of HITs available within the HIT group, qualifications required, expiration time). We also stored the HTML content of each HIT. Following this approach, we could find the new HITs posted over time, the completion rate of each HIT, and the time at which they disappeared from the market, either because they had been completed, because they expired, or because the requester canceled and removed the remaining HITs from the market.² A shortcoming of this approach is that it cannot measure the redundancy of the posted HITs: if a single HIT needs to be completed by multiple workers, we can only observe it as a single HIT.
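A minimal sketch of how two consecutive hourly crawls can be compared, assuming each crawl has already been parsed into a mapping from HIT group id to the number of HITs still available. The function name `diff_snapshots` and this input format are hypothetical, not the code behind the tracker:

```python
from typing import Dict, List, Tuple

def diff_snapshots(prev: Dict[str, int], curr: Dict[str, int]
                   ) -> Tuple[List[str], Dict[str, int], List[str]]:
    """Compare two hourly crawls (hypothetical format: group id -> #HITs available)."""
    # Groups listed now but not in the previous crawl were posted in between.
    new_groups = sorted(g for g in curr if g not in prev)
    # For groups seen in both crawls, a drop in available HITs counts as completions.
    # Increases (new HITs added to an existing group) are ignored in this sketch.
    consumed = {g: prev[g] - curr[g] for g in curr if g in prev and prev[g] > curr[g]}
    # Groups that vanished: completed, expired, or cancelled by the requester.
    disappeared = sorted(g for g in prev if g not in curr)
    return new_groups, consumed, disappeared

prev = {"G1": 10, "G2": 3, "G3": 1}
curr = {"G1": 7, "G2": 3, "G4": 50}
print(diff_snapshots(prev, curr))  # (['G4'], {'G1': 3}, ['G3'])
```

Groups that vanish still have to be split into completed, expired, and cancelled ones, and a HIT that requires several assignments still shows up as a single HIT, which is exactly the redundancy limitation noted above.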
The data are also publicly available through the website http://www.mturk-tracker.com.
From January 2009 until April 2010, we collected 165,368 HIT groups, with a total of 6,701,406 HITs, from 9,436 requesters. The total value of the posted HITs was $529,259. These numbers, of course, do not account for the redundancy of the posted HITs or for HITs that were posted and disappeared between our crawls. Nevertheless, they should be good approximations (within an order of magnitude) of the activity of the marketplace.
² Identifying expired HITs is easy, as we know the expiration time of each HIT. Identifying "cancelled" HITs is a little trickier: we need to monitor the usual completion rate of a HIT group over time and check whether, at the time of disappearance, the remaining HITs could plausibly have been completed within the time since the last crawl.
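The heuristic of footnote 2 can be sketched as follows, under stated assumptions: the group's usual completion pace (HITs per hour) is taken as already estimated from earlier crawls, and the `slack` tolerance and the name `classify_disappearance` are hypothetical choices for illustration, not parameters of our study.

```python
from datetime import datetime, timedelta

def classify_disappearance(last_seen: datetime, gone_by: datetime, expires: datetime,
                           remaining_hits: int, hits_per_hour: float,
                           slack: float = 1.5) -> str:
    """Label why a HIT group vanished between two crawls (hypothetical heuristic).

    last_seen: time of the last crawl that still listed the group.
    gone_by: time of the first crawl that no longer listed it.
    """
    # Expired: the known expiration time passed before the group vanished.
    if expires <= gone_by:
        return "expired"
    # Hours available to finish the HITs that were still open at the last crawl.
    window_hours = (gone_by - last_seen).total_seconds() / 3600
    # Plausibly completed if the group's usual pace (with some slack) could clear them.
    if hits_per_hour > 0 and remaining_hits <= hits_per_hour * window_hours * slack:
        return "completed"
    # Otherwise the requester most likely removed the remaining HITs.
    return "cancelled"

# Example: last seen at 10:00, gone at 11:00, 4 HITs left, usual pace ~5 HITs/hour.
t0 = datetime(2009, 6, 1, 10)
print(classify_disappearance(t0, t0 + timedelta(hours=1), t0 + timedelta(days=3), 4, 5.0))
# completed
```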
Top Requesters and Frequently Posted Tasks
One way to understand what types of tasks are being completed in the marketplace is to find the "top" requesters and analyze the HITs that they post. Table 1 shows the top requesters, based on the total rewards of the HITs posted, filtering out requesters that were active only for a short period of time.
We can see that a very small number of active requesters post a significant number of tasks in the marketplace and account for a large fraction of the posted rewards. According to our measurements, the top requesters listed in Table 1 (about 0.1% of the requesters in our dataset) account for more than 30% of the overall activity of the market.
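The concentration figure is a top-share computation: rank requesters by total posted rewards and measure the fraction of all rewards that comes from the top slice. A minimal sketch with made-up totals (not our dataset); `top_share` is a hypothetical helper:

```python
import math
from typing import Sequence

def top_share(rewards: Sequence[float], top_fraction: float) -> float:
    """Share of all posted rewards that comes from the top `top_fraction` of requesters."""
    ranked = sorted(rewards, reverse=True)
    # Size of the top slice, rounded up so it always contains at least one requester.
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical totals (USD): two large requesters and a long tail of 200 small ones.
toy = [500.0, 300.0] + [1.0] * 200
print(top_share(toy, 0.01))  # top 3 of 202 requesters -> 0.801
```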
Requester ID    Requester Name          #HIT groups  Total HITs  Rewards   Type of tasks
A3MI6MIUNWCR7F  CastingWords            48,934       73,621      $59,099   Transcription
A2IR7ETVOIULZU  Dolores Labs            1,676        320,543     $26,919   Mediator for other requesters
A2XL3J4NH6JI12  ContentGalore           1,150        23,728      $19,375   Content generation
A1197OGL0WOQ3G  Smartsheet.com Clients  1,407        181,620     $17,086   Mediator for other requesters
AGW2H4I480ZX1   Paul Pullen             6,842        161,535     $11,186   Content rewriting
A1CTI3ZAWTR5AZ  ClassifyThis            228          484,369     $9,685    Object classification
A1AQ7EJ5P7ME65  Dave                    2,249        7,059       $6,448    Transcription
AD7C0BZNKYGYV   QuestionSwami           798          10,980      $2,867    Content generation and evaluation
AD14NALRDOSN9   retaildata              113          158,206     $2,118    Object classification
A2RFHBFTZHX7UN  ContentSpooling.net     555          622         $987      Content generation and evaluation
A1DEBE1WPE6JFO  Joel Harvey             707          707         $899      Transcription
A29XDCTJMAE5RU  Raphael Mudge           748          2,358       $548      Website feedback
Table 1: Top requesters based on the total posted rewards available to a single worker (January 2009 to April 2010).
Given the high concentration of the market, the tasks posted by the top requesters show the types of tasks that are being completed in the marketplace. CastingWords is the largest requester, frequently posting transcription tasks; two other semi-anonymous requesters post transcription tasks as well. Among the top requesters we also see two mediator services, Dolores Labs (aka CrowdFlower) and Smartsheet.com, which post tasks on Mechanical Turk on behalf of their clients. Such services are essentially aggregators of tasks and provide quality assurance services on top of Mechanical Turk. The fact that they account for approximately 10% of the market indicates that many users interested in crowdsourcing prefer to use an intermediary that addresses concerns about worker quality and also allows posting complex tasks without the need for programming.
We also see that four of the top requesters use Mechanical Turk to create a variety of original content, such as product reviews, feature stories, and blog posts.³ Finally, we see that two requesters use Mechanical Turk to classify a variety of objects into categories. This was the original task for which Amazon used Mechanical Turk.
³ One requester, "Paul Pullen," uses Mechanical Turk to paraphrase existing content, instead of asking the workers to create content from scratch.
The high concentration of the market is not unusual for an online community: there is typically a long tail of participants with significantly lower activity than the top contributors. Figure 1 shows how this activity is distributed, according to the value of the HITs posted by each requester. The x-axis shows the log2 of the value of the posted HITs, and the y-axis shows the percentage of requesters with this level of activity. As we can see, the distribution is approximately log-normal. Interestingly, workers exhibit approximately the same activity distribution (Ipeirotis, 2010).
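The binning behind Figure 1 can be sketched as follows, assuming per-requester reward totals in USD are already available. Using the floor of log2 as the bin index is our assumption for illustration, and `log2_histogram` is a hypothetical name. Under a log-normal distribution, these bins trace an approximately bell-shaped curve.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def log2_histogram(totals: Iterable[float]) -> Dict[int, float]:
    """Percentage of requesters in each floor(log2(total reward)) bin."""
    # Only positive totals have a logarithm; zero-value requesters are skipped.
    bins = Counter(math.floor(math.log2(t)) for t in totals if t > 0)
    n = sum(bins.values())
    return {b: 100.0 * c / n for b, c in sorted(bins.items())}

print(log2_histogram([0.5, 1.0, 3.0, 3.5, 10.0]))
# {-1: 20.0, 0: 20.0, 1: 40.0, 3: 20.0}
```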
Figure 1: Number of requesters vs. total rewards posted.
For our analysis, we also wanted to examine the marketplace as a whole, to see whether the HITs submitted by other requesters were significantly different from the ones posted by the top requesters. For this, we measured the popularity of each keyword across HIT groups: the number of HIT groups with a given keyword, the number of HITs, and the total amount of rewards associated with the keyword. Table 2 shows the results.
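A minimal sketch of the three keyword statistics, assuming each HIT group is represented by a hypothetical `HitGroup` record and that a keyword's total reward is the sum of (reward per HIT × number of HITs) over the groups that carry it. The toy data mimic a pattern where transcription is spread over single-HIT groups and categorization is concentrated in one large batch:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HitGroup:
    keywords: List[str]
    num_hits: int
    reward: float  # reward per HIT, in USD

@dataclass
class KeywordStats:
    groups: int = 0       # number of HIT groups tagged with the keyword
    hits: int = 0         # number of HITs in those groups
    rewards: float = 0.0  # total USD value of those HITs

def keyword_stats(groups: List[HitGroup]) -> Dict[str, KeywordStats]:
    stats: Dict[str, KeywordStats] = defaultdict(KeywordStats)
    for g in groups:
        # Count each keyword at most once per group, case-insensitively.
        for kw in {k.lower() for k in g.keywords}:
            s = stats[kw]
            s.groups += 1
            s.hits += g.num_hits
            s.rewards += g.num_hits * g.reward
    return dict(stats)

groups = [
    HitGroup(["transcribe", "podcast"], num_hits=1, reward=5.00),
    HitGroup(["Transcribe"], num_hits=1, reward=3.00),
    HitGroup(["categorization", "product"], num_hits=10_000, reward=0.01),
]
stats = keyword_stats(groups)
# One ranking per metric, mirroring the three column pairs of Table 2.
by_rewards = sorted(stats, key=lambda k: stats[k].rewards, reverse=True)
by_groups = sorted(stats, key=lambda k: stats[k].groups, reverse=True)
by_hits = sorted(stats, key=lambda k: stats[k].hits, reverse=True)
print(by_groups[0])  # transcribe
```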
Our keyword analysis of all HITs in our dataset indicates that transcription is indeed a very common task on the AMT marketplace. Notice that it is one of the most "rewarding" keywords and appears in many HIT groups, but not in many HITs. This means that most transcription HITs are posted as single HITs and not as groups of many similar HITs. By comparing the prices of transcription HITs, we also noticed that transcription is a task for which the payment per HIT is comparatively high. It is unclear at this point whether this is due to high expectations for quality or whether the higher price simply reflects the greater effort required to complete these HITs.
Beyond transcription, Table 2 indicates that classification and categorization are indeed tasks that appear in many (inexpensive) HITs. Table 2 also indicates that many tasks involve data collection and image tagging and classification, and that many ask workers for feedback and advice on a variety of tasks (e.g., usability testing of websites).
[Figure 1 panels: "Q-Q Plot: % of requesters vs % of rewards" (x-axis: percentage of rewards, log scale from 0.01% to 100%; y-axis: percent of requesters) and "# Requesters vs Total Rewards Posted" (x-axis: log2 of total rewards posted; y-axis: number of requesters, as a percentage).]
Keyword         Rewards   Keyword               #HIT groups  Keyword         #HITs
data            $192,513  castingwords          48,982       product         4,665,449
collection      $154,680  cw                    48,981       data            3,559,495
easy            $93,293   podcast               47,251       categorization  3,203,470
writing         $91,930   transcribe            40,697       shopping        3,086,966
transcribe      $81,416   english               34,532       merchandise     2,825,926
english         $78,344   mp                    33,649       collection      2,599,915
quick           $75,755   writing               29,229       easy            2,255,757
product         $66,726   question              21,274       categorize      2,047,071
cw              $66,486   answer                20,315       quick           1,852,027
castingwords    $66,111   opinion               15,407       website         1,762,722
podcast         $64,418   short                 15,283       category        1,683,644
mp              $64,162   advice                14,198       image           1,588,586
website         $60,527   easy                  11,420       search          1,456,029
search          $57,578   article               10,909       fast            1,372,469
image           $55,013   edit                  9,451        shopzilla       1,281,459
builder         $53,443   research              9,225        tagging         1,028,802
mobmerge        $53,431   quick                 8,282        cloudsort       1,018,455
write           $52,188   survey                8,265        classify        1,007,173
listings        $48,853   editing               7,854        listings        962,009
article         $48,377   data                  7,548        tag             956,622
research        $48,301   rewriting             7,200        photo           872,983
shopping        $48,086   write                 7,145        pageview        862,567
categorization  $44,439   paul                  6,845        this            845,485
simple          $43,460   pullen                6,843        simple          800,573
fast            $40,330   snippet               6,831        builder         796,305
categorize      $38,705   confirm               6,543        mobmerge        796,262
email           $32,989   grade                 6,515        picture         743,214
merchandise     $32,237   sentence              6,275        url             739,049
url             $31,819   fast                  5,620        am              613,744
tagging         $30,110   collection            5,136        retail          601,714
web             $29,309   review                4,883        web             584,152
photo           $28,771   nanonano              4,358        writing         548,111
review          $28,707   dinkle                4,358        research        511,194
content         $28,319   multiconfirmsnippet   4,218        email           487,560
articles        $27,841   website               4,140        v               427,138
category        $26,656   money                 4,085        different       425,333
flower          $26,131   transcription         3,852        entry           410,703
labs            $26,117   articles              3,540        relevance       400,347
crowd           $26,117   search                3,488        flower          339,216
doloreslabs     $26,117   blog                  3,406        labs            339,185
crowdflower     $26,117   and                   3,360        crowd           339,184
delores         $26,117   simple                3,164        crowdflower     339,184
dolores         $26,117   answers               2,637        doloreslabs     339,184
deloreslabs     $26,117   improve               2,632        delores         339,184
entry           $25,644   retranscribe          2,620        dolores         339,184
tag             $25,228   writer                2,355        deloreslabs     339,184
video           $25,100   image                 2,322        find            338,728
editing         $24,791   confirmsnippet        2,291        contact         324,510
classify        $24,054   confirmtranscription  2,288        address         323,918
answer          $23,856   voicemail             2,202        editing         321,059
Table 2: The top-50 most frequent HIT keywords in the dataset, ranked by total reward amount, # of HIT groups, and # of HITs.
Price Distributions
To better understand the typical prices paid for crowdsourcing tasks on AMT, we examined the distribution of HIT prices and the size of the posted HIT groups. Figure 2 illustrates the results. When examining HIT groups, we can see that only 10% of the HIT groups have a price tag of 2 cents or less, 50% of the HIT groups have a price above 10 cents, and 15% of the HIT groups come with a price tag of $1 or more.
However, this analysis can be misleading. In general, HIT groups with a high price contain only a single HIT, while HIT groups with a large number of HITs have a low price. Therefore, if we compute the distribution of HITs (not HIT groups) according to price, we can see that 25% of the HITs created on Mechanical Turk have a price tag of just 1 cent, 70% of the HITs have a reward of 5 cents or less, and 90% of the HITs come with a reward of less than 10 cents. This analysis confirms the common perception that most tasks on Mechanical Turk have tiny rewards.
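The difference between the two views is purely one of weighting: the group-level distribution counts each HIT group once, while the HIT-level distribution weights each group by its number of HITs. A minimal sketch with made-up groups; `fraction_at_most` is a hypothetical helper:

```python
from typing import List, Tuple

def fraction_at_most(groups: List[Tuple[float, int]], price: float,
                     weight_by_hits: bool) -> float:
    """Fraction of HIT groups, or of HITs, whose reward per HIT is <= `price`.

    Each group is a (reward per HIT in USD, number of HITs) pair.
    """
    total = cheap = 0
    for reward, num_hits in groups:
        weight = num_hits if weight_by_hits else 1  # HIT-level vs group-level view
        total += weight
        if reward <= price:
            cheap += weight
    return cheap / total

# Hypothetical market: three pricey single-HIT groups and one large 1-cent batch.
groups = [(1.50, 1), (2.00, 1), (0.50, 1), (0.01, 997)]
print(fraction_at_most(groups, 0.05, weight_by_hits=False))  # 0.25 of HIT groups
print(fraction_at_most(groups, 0.05, weight_by_hits=True))   # 0.997 of HITs
```

The same market looks expensive when counted by groups and cheap when counted by HITs, which is the effect described above.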
Of course, this analysis only scratches the surface of a bigger problem: how can we automatically price tasks, taking into consideration the nature of the task, the existing competition, the expected activity level of the workers, the desired completion time, the tenure and prior activity of the requester, and many other factors? For example, how much should we pay for an image tagging task over 100,000 images in order to get it done within 24 hours? Building such models would make the execution of crowdsourcing tasks easier for people who simply want to "get things done" and do not want to tune and micro-optimize their crowdsourcing process.
Figure 2: Distribution of HIT groups and HITs according to HIT price.
[Figure 2 panels: "% of HIT groups vs HIT price" and "% of HITs vs HIT price" (x-axis: HIT price, log scale from $0.01 to $10.00; y-axis: percentage, 0% to 100%).]
Activity Dynamics on the AMT Marketplace: Posting and Serving Processes
What is the typical activity in the AMT marketplace? What is the volume of transactions? These are very common questions from people interested in understanding the size of the market and its demonstrated capacity⁴ for handling big tasks.
One way to approach such questions is to examine the task posting and task completion activity on AMT. By studying the posting activity we can understand the demand for crowdsourcing, and the completion rate shows how fast the market can handle the demand. To study these processes, we computed, for each day, the value of the tasks posted by AMT requesters and the value of the tasks completed.
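A minimal sketch of the daily aggregation, assuming posting and completion events have already been extracted from the crawls as (timestamp, USD value) pairs; the names `daily_totals` and `median_daily_value` are hypothetical. The medians of the resulting daily series are the kind of statistic reported below.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import median
from typing import Dict, Iterable, Tuple

Event = Tuple[datetime, float]  # (timestamp, USD value of HITs posted or completed)

def daily_totals(events: Iterable[Event]) -> Dict[date, float]:
    """Sum the USD value of events per calendar day."""
    totals: Dict[date, float] = defaultdict(float)
    for ts, value in events:
        totals[ts.date()] += value
    return dict(totals)

def median_daily_value(events: Iterable[Event]) -> float:
    # Days with no events are absent here; a fuller version would count them as $0.
    return median(daily_totals(events).values())

posted = [
    (datetime(2010, 3, 1, 9), 400.0),
    (datetime(2010, 3, 1, 17), 600.0),
    (datetime(2010, 3, 2, 11), 1_200.0),
    (datetime(2010, 3, 3, 8), 900.0),
]
print(median_daily_value(posted))  # daily totals 1000, 1200, 900 -> 1000.0
```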
We first present an analysis of the two processes (posting and completion), ignoring any dependencies on task-specific and time-specific factors. Figure 3 illustrates the distributions of the posting and completion processes. The two distributions are similar, but in general the rate of completion is slightly higher than the rate of arrival. This is not surprising, as it is a required stability condition: if the completion rate were lower than the arrival rate, the number of incomplete tasks in the marketplace would grow without bound. We observed that the median arrival rate is $1,040 per day and the median completion rate is $1,155 per day. If we assume that the AMT marketplace behaves like an M/M/1 queuing system, basic queuing theory shows that a task worth $1 has an average completion time of 12.5 minutes, resulting in an effective hourly wage of $4.80.
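The 12.5-minute figure follows from the standard M/M/1 result that the mean time a job spends in the system is W = 1/(μ - λ), treating each dollar of HIT value as one job, with λ = $1,040/day and μ = $1,155/day. A short check of that arithmetic (the helper name is ours):

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a job spends in an M/M/1 system, W = 1 / (mu - lambda).

    The result is in the time unit of the rates (days, for rates per day).
    """
    if service_rate <= arrival_rate:
        raise ValueError("unstable queue: service rate must exceed arrival rate")
    return 1.0 / (service_rate - arrival_rate)

# Median daily rates from the AMT data, treating each $1 of HIT value as one job.
lam, mu = 1_040.0, 1_155.0
minutes_per_dollar = mm1_time_in_system(lam, mu) * 24 * 60
hourly_wage = 60 / minutes_per_dollar  # USD of work completed per hour
print(f"{minutes_per_dollar:.1f} minutes per $1, about ${hourly_wage:.2f}/hour")
# 12.5 minutes per $1, about $4.79/hour
```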
Figure 3: The distribution of the arrival and completion rates on the AMT marketplace, as a function of the USD ($) value of the posted/completed HITs.
[Figure 3 panel: "Posting and Completion Activity CDF" (x-axis: value of completed HITs in USD, $0 to $4,000; y-axis: % of days with completion activity < X).]
⁴ Detecting the true capacity of the market is a more involved task than simply measuring its current serving rate. Many workers may show up only when there is a significant amount of work for them, and remain dormant under normal loads. Fully examining this question is beyond the scope of this paper.
Of course, this analysis is an oversimplification of the actual process. Tasks are not completed in a first-in-first-out manner, and the completion rate is not independent of the arrival rate. In reality, workers pick tasks following personal preferences or are constrained by the web user interface of AMT. For example, Chilton et al. (2010) indicate that most workers use two of the main task sorting mechanisms provided by AMT to find and complete tasks (the "recently posted" and "largest number of HITs" orders). Furthermore, the dependence on the arrival rate is visible in practice: when many tasks are available, more workers come to complete tasks, as there are more opportunities to find and work on bigger tasks, as opposed to one-time HITs. As a simple example, consider the dependency of posting and completion rates on the day of the week (Figure 4). The posting activity of requesters is significantly lower over the weekends and is typically highest on Tuesdays. This can be explained rather easily: since most requesters are corporations and organizations, most tasks are posted during normal working days. However, the same does not hold for workers. The completion activity is largely unaffected by weekends. The only day on which the completion rate drops is Monday, and this is most probably a side effect of the lower posting rate over the weekend: there are fewer tasks available for completion on Monday.
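A minimal sketch of the day-of-week breakdown, assuming a mapping from calendar day to daily USD value (posted or completed). Figure 4 does not state whether it shows sums or averages per weekday; averaging is our assumption here, and `weekday_profile` is a hypothetical name:

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # order of date.weekday()

def weekday_profile(daily: Dict[date, float]) -> Dict[str, float]:
    """Average daily value (posted or completed USD) for each day of the week."""
    by_day: Dict[str, List[float]] = defaultdict(list)
    for day, value in daily.items():
        by_day[WEEKDAYS[day.weekday()]].append(value)
    # Keep Monday-to-Sunday order and skip weekdays with no data.
    return {name: mean(by_day[name]) for name in WEEKDAYS if by_day[name]}

daily_posted = {
    date(2010, 3, 1): 800.0,    # Monday
    date(2010, 3, 2): 1_500.0,  # Tuesday
    date(2010, 3, 6): 300.0,    # Saturday
    date(2010, 3, 8): 1_000.0,  # Monday
}
print(weekday_profile(daily_posted))  # {'Mon': 900.0, 'Tue': 1500.0, 'Sat': 300.0}
```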
Figure 4: The posting and completion rate on AMT as a function of the day of the week.
An interesting open question is how to better model the marketplace. Work on queuing theory for modeling call centers is related, and can help us better understand the dynamics of the market and the way that workers handle the posted tasks. Next, we present some evidence that modeling can help us better understand the shortcomings of the market and point to potential design improvements.
[Figure 4 panels: "Total Value of Posted HITs" and "Total Value of Completed HITs" by day of the week, Sun through Sat.]
Activity Dynamics on the AMT Marketplace: Completion Time Distribution
Given that the system does not satisfy the usual M/M/1 queuing assumptions for the analysis of completion times, we analyzed the completion times of the posted tasks empirically. The goal of this analysis was to understand which approaches may be appropriate for modeling the behavior of the AMT marketplace.
Our analysis indicated that the completion time follows (approximately) a power law, as illustrated in Figure 5. We observe some irregularities, with outliers at approximately 12 hours and at 7 days. These are common "expiration times" set for many HITs, hence the sudden disappearance of many HITs at those points. We also see a different behavior for HITs that remain available for longer than one week: these HITs are typically "renewed" by their requesters through the continuous posting of new HITs within the same HIT group.⁵ Although it is still unclear what dynamics cause this behavior, the analysis by Barabási (2005) indicates that priority-based completion of tasks can lead to such power-law distributions.
To better characterize this power-law distribution of completion times, we used the maximum likelihood estimator (MLE) for power laws. To avoid biases, we also marked as "censored" the HITs that we detected as "aborted before completion" and the HITs that were still running at the last crawl date of our dataset. (For brevity, we omit the details.) The MLE indicated that the most likely exponent for the power-law distribution of completion times on Mechanical Turk is α = -1.48. This exponent is very close to the value of -3/2 predicted theoretically for the queuing model of Cobham (1954), in which each task, upon arrival, is assigned to a queue with a different priority. Barabási (2005) indicates that the Cobham model can be a good explanation of the power-law distribution of completion times only when the arrival rate is equal to the completion rate of tasks. Our earlier results indicate that for the AMT marketplace this is not far from reality. Hence, the Cobham model of priority-based execution of tasks can explain the power-law distribution of completion times.
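We omit the details of our estimator, but a standard way to fit a continuous power law p(x) ∝ x^(-α) for x ≥ x_min with right-censored observations can be sketched. Under that model, log(x/x_min) is exponentially distributed with rate α - 1, so the censored MLE has a closed form: α̂ = 1 + (number of uncensored observations) / Σ log(x_i/x_min), where the sum runs over all observations and censored ones contribute their last observed time. This is an illustrative sketch under those assumptions (including the choice of x_min), not necessarily the estimator we used; it returns the positive α, so the exponent reported above corresponds to α = 1.48 here.

```python
import math
import random
from typing import Sequence

def powerlaw_alpha_censored(times: Sequence[float], censored: Sequence[bool],
                            x_min: float) -> float:
    """MLE of alpha for p(x) proportional to x**(-alpha), x >= x_min, with right-censoring.

    times[i] is a completion time, or, if censored[i] is True, the last time the
    HIT group was still observed (aborted, or running at the last crawl).
    """
    events = 0
    exposure = 0.0
    for t, is_censored in zip(times, censored):
        if t < x_min:
            continue  # the power law is only fitted to the tail above x_min
        exposure += math.log(t / x_min)  # every observation contributes "exposure"
        if not is_censored:
            events += 1  # only completed HIT groups count as observed events
    if events == 0 or exposure == 0.0:
        raise ValueError("not enough uncensored data above x_min")
    # log(x / x_min) ~ Exponential(alpha - 1); censored exponential MLE = events / exposure.
    return 1.0 + events / exposure

# Self-check on synthetic data: random.paretovariate(a) has density a / x**(a + 1)
# on x >= 1, so a = 0.48 corresponds to alpha = 1.48.
rng = random.Random(42)
true_times = [rng.paretovariate(0.48) for _ in range(50_000)]
cutoff = 1_000.0  # observation window ends here: longer waits are censored
observed = [min(t, cutoff) for t in true_times]
flags = [t > cutoff for t in true_times]
print(round(powerlaw_alpha_censored(observed, flags, x_min=1.0), 2))
```

Dropping the censored observations instead of keeping their exposure would bias the estimated exponent upward, which is why the censoring step matters.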
Figure 5: The distribution of completion times for HIT groups posted on AMT. The distribution does not change significantly if we use the completion time per HIT (and not per HIT group), as 80% of the HIT groups contain just one HIT.
⁵ A common reason for this behavior is to keep the HIT group on the first page of the "Most recently posted" list of HIT groups, as many workers pick the tasks to work on from this list (Chilton et al., 2010).
[Figure 5 panels: "Distribution of completion time for HIT groups" (x-axis: completion time for HIT group in hours, log scale from 1 to 16,384; y-axis: number of HIT groups, log scale) and "CDF of completion times for HIT groups" (x-axis: completion time in hours; y-axis: % of HIT groups with completion time < x).]
Unfortunately, a system with a power-law distribution of completion times is rather undesirable. Given the infinite variance of a power law with this exponent, it is inherently difficult to predict the time required to complete a task. Although we can predict that the completion time will be short for many tasks, there is a non-negligible probability that a posted task will need a significant amount of time to finish. This can happen when a small task is not executed quickly and therefore drops out of the two preferred orderings from which workers pick tasks to work on. The probability of a "forgotten" task increases if the task is also not discoverable through any of the other sorting methods.
This result indicates that the AMT marketplace needs to be equipped with better ways for workers to pick tasks. If workers could pick tasks in a slightly more "randomized" fashion, it would be possible to change the behavior of the system and eliminate the "heavy-tailed" distribution of completion times. This could lead to higher predictability of completion times, which is a desirable characteristic for requesters. New requesters in particular, who lack the experience needed to make their tasks visible, would benefit, as it would lower the barrier to successfully completing tasks on the AMT market.
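The effect of randomized task selection can be illustrated with a toy version of the priority model discussed by Barabási (2005): a worker faces a short list of tasks with random priorities, executes the highest-priority one with probability p and a uniformly random one otherwise, and each executed task is replaced by a newly posted one. This is a hypothetical simulation of that model, not of AMT itself; `simulate_waits`, `tail_fraction`, and the parameter values are ours. Here p close to 1 plays the role of every worker picking from the same sorted list, and p = 0 the role of fully randomized picking.

```python
import random
from typing import List

def simulate_waits(p: float, steps: int, list_size: int = 2, seed: int = 0) -> List[int]:
    """Waiting times of executed tasks in a toy priority-list model (hypothetical).

    At each step one task is executed: with probability p the highest-priority
    task, otherwise a uniformly random one. It is replaced by a new task.
    """
    rng = random.Random(seed)
    # Each slot holds (priority, step at which the task arrived).
    tasks = [(rng.random(), 0) for _ in range(list_size)]
    waits = []
    for step in range(1, steps + 1):
        if rng.random() < p:
            i = max(range(list_size), key=lambda k: tasks[k][0])  # priority-driven pick
        else:
            i = rng.randrange(list_size)  # randomized pick
        waits.append(step - tasks[i][1])
        tasks[i] = (rng.random(), step)  # a newly posted task takes the free slot
    return waits

def tail_fraction(waits: List[int], threshold: int) -> float:
    """Fraction of executed tasks that waited more than `threshold` steps."""
    return sum(w > threshold for w in waits) / len(waits)

for p in (0.0, 0.99):
    waits = simulate_waits(p, steps=100_000)
    print(f"p={p}: max wait {max(waits)}, share waiting > 50 steps {tail_fraction(waits, 50):.4f}")
```

Because the list always holds two tasks and exactly one is executed per step, the average wait is about two steps in both settings; what changes is the tail, which is the predictability issue discussed above.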
We should note, of course, that these results do not take into consideration the effect of various factors. For example, an established requester is expected to have its tasks completed faster than a new requester that has not established connections with the worker community. A task with a higher price will be picked up faster than an identical task with a lower price. An image recognition task is typically easier than a content generation task; hence, more workers will be available to work on it and finish it faster. These are interesting directions for future research, as they can show the effect of various factors when designing and posting tasks. This can lead to a better understanding of the crowdsourcing process and better prediction of completion times when crowdsourcing various tasks.
Higher predictability means lower risk for new participants. Lower risk means higher participation and higher satisfaction, both for requesters and for workers.
Conclusions
Our analysis indicates that AMT is a heavy-tailed market in terms of requester activity, with the activity of requesters following a log-normal distribution: the top 0.1% of requesters account for 30% of the dollar activity, and 1% of the requesters post more than 50% of the dollar-weighted tasks. A similar activity pattern also appears on the side of workers (Ipeirotis, 2010). This can be interpreted both positively and negatively. The negative aspect is that the adoption of crowdsourcing solutions is still minimal, as only a small number of participants actively use crowdsourcing for large-scale tasks. On the other hand, the long tail of requesters indicates significant interest in such solutions. By observing the practices of successful requesters, we can learn more about what makes crowdsourcing successful and increase the demand from smaller requesters.
We also observe that the activity is still concentrated around small tasks, with 90% of the posted HITs giving a reward of 10 cents or less. A next step in this analysis is to separate the price distributions by type of task and identify the "usual" pricing points for different types of tasks. This can provide guidance to new requesters who do not know whether they are pricing their tasks correctly.
Finally, we presented a first analysis of the dynamics of the AMT marketplace. By analyzing the speed of posting and completion of the posted HITs, we can see that Mechanical Turk is a price-effective task completion marketplace, as the estimated hourly wage is approximately $5. Further analysis will allow us to gain better insight into "how things get done" on the AMT market, identifying elements that can be improved and leading to a better design for the marketplace. For example, by analyzing the waiting time of the posted tasks, we find significant evidence that workers are limited by the current user interface and complete tasks by picking the HITs available through one of the existing sorting criteria. This limitation leads to a high degree of unpredictability in completion times, a significant shortcoming for requesters who want a high degree of reliability. A better search and discovery interface (or perhaps a better task recommendation service, a specialty of Amazon.com) can lead to improvements in the efficiency and predictability of the marketplace.
Further research is also necessary to better predict how changes in the design and parameters of a task can affect quality and completion speed. Ideally, we should have a framework that automatically optimizes all aspects of task design. Database systems hide the underlying complexity of data management, using query optimizers to pick the appropriate execution plans. Google Predict hides the complexity of predictive modeling by offering an auto-optimizing framework for classification. Crowdsourcing can benefit significantly from the development of a similar framework that provides comparable abstractions and automatic task optimizations.
References
Mechanical Turk Monitor. http://www.mturk-tracker.com.
Barabási, A.-L. 2005. The origin of bursts and heavy tails in human dynamics. Nature, 435:207-211.
Cobham, A. 1954. Priority assignment in waiting line problems. J. Oper. Res. Soc. Am. 2, 70-76.
Chilton, L. B., Horton, J. J., Miller, R. C., and Azenkot, S. 2010. Task search in a human computation market. In Proceedings of the ACM SIGKDD Workshop on Human Computation (Washington, DC, July 25, 2010). HCOMP '10. ACM, New York, NY, 1-9.
Ipeirotis, P. 2010. Demographics of Mechanical Turk. Working Paper CeDER-10-01, New York University, Stern School of Business. Available at http://hdl.handle.net/2451/29585.
Ross, J., Irani, L., Silberman, M. S., Zaldivar, A., and Tomlinson, B. 2010. Who are the crowdworkers?: Shifting demographics in Mechanical Turk. In Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems (Atlanta, Georgia, USA, April 10-15, 2010). CHI EA '10. ACM, New York, NY, 2863-2872.