
Sentiment-based Classification of Radical Text on the Web

Abstract

The total number of webpages has grown substantially since the birth of the Internet. So too has the number of webpages dedicated to radical yet subtle content. As these new circumstances have necessitated a guided data collection method, one that can sidestep the laborious manual methods that have been classically utilized, simple keyword analysis has not been sufficient to identify radical sites on Web 1.0 – pro-extremist, anti-extremist, and news sites, for example, may use the same keywords to discuss the same event but have a very different motivation. In an effort to explore this problem, we completed an exercise involving the use of a web-crawler to collect 20,000 webpages from five sentiment-based classes to assess their differences: (1) radical Right sites; (2) radical Islamic sites; (3) anti-extremist sites; (4) news source sites discussing extremism; and (5) sites that did not discuss extremism. Parts-of-Speech (POS) tagging was used to identify 198 of the most frequent keywords within the data, and the sentiment value for each of these keywords was calculated for each webpage using sentiment analysis. With these values, a decision tree was applied to three classification models. Results suggest that radical Islamic text can be classified at a much higher rate of success than radical Right text.
5\DQ6FULYHQVDQG5LFKDUG)UDQN
,QWHUQDWLRQDO&\EHU&ULPH5HVHDUFK&HQWUH 
6FKRRORI&ULPLQRORJ\6LPRQ)UDVHU8QLYHUVLW\
%XUQDE\&DQDGD
^UVFULYHQUIUDQN`#VIXFD
Abstract ² 7KH WRWDO QXPEHU RI ZHESDJHV KDV JURZQ
VXEVWDQWLDOO\ VLQFH WKH ELUWK RI WKH ,QWHUQHW 6R WRR KDYH WKH
QXPEHU RI ZHESDJHV GHGLFDWHG WR UDGLFDO \HW VXEWOH FRQWHQW $V
WKHVHQHZFLUFXPVWDQFHVKDYHQHFHVVLWDWHGDJXLGHGGDWDFROOHFWLRQ
PHWKRGRQHWKDWFDQVLGHVWHSWKHODERULRXVPDQXDOPHWKRGVWKDW
KDYH EHHQ FODVVLFDOO\ XWLOL]HG VLPSOH NH\ZRUG DQDO\VLV KDV QRW
EHHQVXIILFLHQWWRLGHQWLI\UDGLFDOVLWHVRQ:HE±SURH[WUHPLVW
DQWLH[WUHPLVW DQG QHZV VLWHV IRU H[DPSOH PD\ XVH WKH VDPH
NH\ZRUGV WR GLVFXVV WKH VDPH HYHQW EXW KDYH D YHU\ GLIIHUHQW
PRWLYDWLRQ,QDQHIIRUWWRH[SORUHWKLVSUREOHPZHFRPSOHWHGDQ
H[HUFLVH LQYROYLQJ WKH XVH RI D ZHEFUDZOHU WR FROOHFW 
ZHESDJHV IURP ILYH VHQWLPHQWEDVHG FODVVHV WR DVVHVV WKHLU
GLIIHUHQFHV  UDGLFDO 5LJKW VLWHV  UDGLFDO ,VODPLF VLWHV 
DQWLH[WUHPLVW VLWHV  QHZV VRXUFH VLWHV GLVFXVVLQJ H[WUHPLVP
DQGVLWHVWKDWGLGQRWGLVFXVVH[WUHPLVP3DUWVRI6SHHFK326
WDJJLQJ ZDV XVHG WR LGHQWLI\  RI WKH PRVW IUHTXHQW NH\ZRUGV
ZLWKLQ WKH GDWD DQG WKH VHQWLPHQW YDOXH IRU HDFK RI WKHVH
NH\ZRUGV ZDV FDOFXODWHG IRU HDFK ZHESDJH XVLQJ VHQWLPHQW
DQDO\VLV :LWK WKHVH YDOXHV D GHFLVLRQ WUHH ZDV DSSOLHG WR WKUHH
FODVVLILFDWLRQPRGHOV5HVXOWVVXJJHVWWKDWUDGLFDO,VODPLFWH[WFDQ
EHFODVVLILHG DW DPXFKKLJKHU UDWH RIVXFFHVV WKDQ UDGLFDO 5LJKW
WH[W
Keywords - Sentiment Analysis; Decision Trees; Extremism
,,1752'8&7,21
6LQFH WKH PLGV ZH KDYH VHHQ D UDSLG JURZWK LQ WKH
QXPEHURIUDGLFDOZHEVLWHVDFWLQJDVDKXERIWKHPRYHPHQWWKH\
UHSUHVHQW,QPDQ\UHVSHFWVWKH\VHUYHDVDFRQYHUJHQFHVHWWLQJ
IRUQHZFRPHUVWRIDPLOLDUL]HWKHPVHOYHVZLWKUDGLFDOWHDFKLQJV
>@FRQQHFWLQJXVHUVWRRWKHUVLPLODUZHEVLWHVZHEIRUXPV
DQGSDJHVRQWKH'DUN:HEWRQDPHEXWDIHZ>@&RXQWHU
H[WUHPLVPRUJDQL]DWLRQV FRQWLQXH WR ORRN IRU ZD\V WR LGHQWLI\
WKHVH VLWHV >@ \HW WKH ,QWHUQHW LV FRQVWDQWO\ JURZLQJ DQG
LQIRUPDWLRQLVEHLQJJHQHUDWHGDWDUDSLGSDFH>@7KLVKDVOHG
WR D JURZLQJ IORRG RI GDWD UHVXOWLQJ LQ PDQXDO GDWD DQDO\VLV
EHFRPLQJ OHVV HIILFLHQW RU HQWLUHO\ LQIHDVLEOH >@ ,Q UHVSRQVH
UHVHDUFKHUV KDYH GHYHORSHG GDWD FROOHFWLRQ WRROV WKDW DOORZ
GHFLVLRQVWREHPDGHDERXWWKHWKRXVDQGVRIZHESDJHVWKDWDUH
H[WUDFWHGDQGE\H[WHQVLRQFODVVLI\UDGLFDOWH[WRQWKH:HE
 5HVHDUFKHUVKDYHH[SORUHGWKLVFULWLFDOSRLQWRIGHSDUWXUHYLD
DXWRPDWHG FRPSXWDWLRQDO DQG VHPLDXWRPDWHG WRROV ERWK WR
LPSURYH WKH PHWKRGV RI FROOHFWLRQ DQG LGHQWLILFDWLRQ RI
H[WUHPLVW FRQWHQW &KHQ¶V Dark Web Project IRU H[DPSOH
SURGXFHGDQXPEHURIFRPSXWDWLRQDODQGGDWDFHQWULFVWXGLHVRQ
WKHFRQWHQW DQG VWUXFWXUH RI VXUIDFH OHYHO DQG 'DUN :HE VLWHV
FRQWDLQLQJH[WUHPLVWPDWHULDO>@,QOLJKWRIWKLVFRPSUHKHQVLYH
ZRUNWKHHQWLUHSURFHVVZDVDXWRPDWHGDQGWKH\GLGQRWDVVHVV
WKHVXEWOH ODQJXDJH ± LQ WKH (QJOLVK WH[W ± IRXQG RQ ZHEVLWHV
IHDWXULQJH[WUHPLVWFRQWHQW$UJXDEO\DIXOO\DXWRPDWHGV\VWHP
VKRXOGEHDYRLGHGZKHQKXPDQLQWHOOLJHQFHLVUHTXLUHG>@LH
LGHQWLI\LQJVXEWOHIRUPVRIUDGLFDOWH[WLQ(QJOLVK
 ,QWKHZRUNRI0HLDQG)UDQNWKHDXWKRUVGHYHORSHGDPRGHO
WKDW FRPELQHG VHQWLPHQW DQDO\VLV DQG D ZHEFUDZOHU ZLWK D
GHFLVLRQWUHHWRGLIIHUHQWLDWHSURH[WUHPLVWZHESDJHVIURPDQWL
H[WUHPLVW SDJHV QHZV SDJHV DQG SDJHV WKDW GLG QRW UHODWH WR
H[WUHPLVP +HUH WKH RYHUDOO JRDO RI WKLV VHPLDXWRPDWHG
DSSURDFKZDVWRFUHDWHDWRROWKDW PDGHSUHGLFWLRQV DERXW WKH
FRQWHQW IRXQG RQ WKH VLWHV LW GRZQORDGHG >@ :KLOH WKLV
WHFKQLTXH PDQDJHG WR FODVVLI\ H[WUHPLVW WH[W DW D KLJK UDWH RI
DFFXUDF\ LH  ERWK H[WUHPLVWEDVHG ZHEIRUXPV DQG
ZHEVLWHV RI H[WUHPLVW RUJDQL]DWLRQV ZHUH DQDO\]HG ZLWKLQ RQH
PRGHO$UJXDEO\WKHVHQWLPHQWIRXQGRQERWKW\SHVRISODWIRUPV
DUH GLIIHUHQW LQ WHUPV RI WRQH DQG VXEMHFW PDWWHU ,QGHHG WKLV
SUREOHPUHTXLUHVIXUWKHUH[SORUDWLRQ
,,0(7+2'6
7KHSXUSRVHRIWKLVVWXG\ZDVWRDGGWRWKHOLWHUDWXUHE\XVLQJ
DVHQWLPHQWDQDO\VLVWRRODQGDGHFLVLRQWUHHWRGLIIHUHQWLDWHSUR
H[WUHPLVWZHESDJHVIURPDQWLH[WUHPLVWSDJHVQHZVSDJHVDQG
SDJHVWKDWGLGQRWUHODWHWRH[WUHPLVP7KLVZDVGRQHE\EXLOGLQJ
WKUHHWH[WEDVHG FODVVLILFDWLRQ PRGHOV&KDSWHU,,$ )LUVWZH
XVHGWKH7HUURULVPDQG([WUHPLVP1HWZRUN([WUDFWRU7(1(
WR FROOHFW GDWD &KDSWHU  DQG 2SHQ1/3¶V 3DUWVRI6SHHFK
326DQDO\VLVWRGHYHORSDOLVWRINH\ZRUGV&KDSWHU,,&7KH
VHQWLPHQW H[SUHVVHG ZLWKLQ WKH ZHESDJHVZDV WKHQ FDOFXODWHG 
EDVHG RQ WKH 326 NH\ZRUGV GHWDLOLQJ WKH UHODWLYH VHQWLPHQW
VFRUHVIRUHDFKSDJH IRU HDFK RI WKH NH\ZRUGV &KDSWHU,,'
)LQDOO\DGHFLVLRQWUHHZDVEXLOWEDVHGRQWKHVHQWLPHQWVFRUHV
&KDSWHU,,(ZKLFKSURGXFHGWKHFODVVLILFDWLRQUHVXOWVIRUWKUHH
FODVVLILFDWLRQPRGHOV&KDSWHU,,)VHH)LJXUH
A. Webpage Data
5HVHDUFKHUVKDYHVXJJHVWHGWKDWWKHVHQWLPHQWIRXQGRQERWK
UDGLFDO 5LJKW > @ DQG UDGLFDO ,VODPLF > @ ZHEVLWHV DUH
HOXVLYH$VDPHDQVRIH[SORULQJWKLVFODLPRQWKHSURH[WUHPLVW
FRQWHQWDQGHYDOXDWLQJZKHWKHUWKHGHFLVLRQWUHHFRXOGFODVVLI\
FRQWHQW EDVHG RQ WKRVH VXEWOH LGHRORJLFDO PRWLYDWLRQV ILYH
GLVWLQFWDQGQRQRYHUODSSLQJFODVVHVRISDJHVZHUHDWWDLQHG
1) The radical Right class was selected through two methods: (1) a Google search of the keywords ‘extremist websites’, ‘white supremacy websites’, and other similar words; and (2) using an index of extremist sites generated by researchers and sources (for examples, see […]). This class contained […] webpages spread over […] websites.
2016 European Intelligence and Security Informatics Conference
978-1-5090-2857-3/16 $31.00 © 2016 IEEE
DOI 10.1109/EISIC.2016.41
2) 7KH radical Islamic FODVV ZDV VHOHFWHG WKURXJK WKH
NH\ZRUGV µH[WUHPLVW ZHEVLWHV¶ µMLKDG RUJDQL]DWLRQV¶
DQG SUHYLRXVO\ FRPSLOHG LQGH[HV >@ 7KLV FODVV
FRQWDLQHGZHESDJHVIURPZHEVLWHV
3) $Qanti-extremist FODVV LQFOXGHG LQWHOOLJHQFHDJHQFLHV
DQG RUJDQL]DWLRQV GHGLFDWHG WR FRXQWHULQJ H[WUHPLVP
DQG ZHUH LGHQWLILHG YLD DQ RQOLQH VHDUFK RI VXFK
NH\ZRUGVDVµFRXQWHUH[WUHPLVP¶DQGµDQWLH[WUHPLVP¶
JURXSV IRU H[DPSOH 7KH VDPSOH FRQWDLQHG 
ZHESDJHVIURPZHEVLWHV
4) $ news FODVV LQFOXGHG ZHEVLWHV LI WKH\ ZHUH VRXUFHV
WKDWUHSRUWHGRQH[WUHPLVWHYHQWV+HUHLWZDVDVVXPHG
WKDW WKH\ SUHVHQWHG D VWRU\ LQ D PRUH ³LPSDUWLDO´
PDQQHUWKDQSURH[WUHPLVWRUDQWLH[WUHPLVWVLWHV7KLV
VDPSOHFRQWDLQHGZHESDJHVIURPZHEVLWHV
5) $ILQDO FODVVotherVHUYHGDVWKHFRQWUROJURXSLQWKH
VDPSOH ZKHUH RQH ZRXOG QRW H[SHFW WR XQFRYHU
PDWHULDO UHODWLQJ WR H[WUHPLVP DQG LQFOXGHG WRSLFV
UHODWLQJ WR DUW DQG FXOWXUH HGXFDWLRQ HQWHUWDLQPHQW
IRRG WUDYHO DQG OLIHVW\OH VRFLDO PHGLD RQOLQH EORJV
DXWRPRELOHVVSRUWVDQGEXVLQHVV)RUWKLVFODVV
ZHESDJHVIURPZHEVLWHVZHUHVDPSOHG
7KUHH PRGHOV ZHUH WKHQ FRQVWUXFWHG IRU WKH SXUSRVHV RI
FRPSDULQJDQGFRQWUDVWLQJWKHLUFODVVLILFDWLRQUHVXOWV
Two-class model:radical RightDQGIslamicYHUVXV
anti-extremistnewsDQGotherZHESDJHV
Four-class model:radical RightDQGradicalIslamic
YHUVXV  anti-extremist YHUVXV  news YHUVXV 
otherZHESDJHV
Five-class model:radical RightYHUVXV  radical
IslamicYHUVXVanti-extremistYHUVXVnewsYHUVXV
other ZHESDJHV
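The relationship between the three models can be sketched as a relabelling of the five collected classes. This is only an illustration: the label strings and the function name below are assumptions made for the example and do not come from the study.

```python
# Hypothetical label names; the paper names the classes but no identifiers.
FIVE_CLASSES = ["radical_right", "radical_islamic", "anti_extremist", "news", "other"]


def relabel(five_class_label: str, model: str) -> str:
    """Map a five-class label onto the label used by the chosen model."""
    if five_class_label not in FIVE_CLASSES:
        raise ValueError(f"unknown class: {five_class_label}")
    radical = five_class_label in ("radical_right", "radical_islamic")
    if model == "five":
        # Every class is kept separate.
        return five_class_label
    if model == "four":
        # The two radical classes are merged; the rest stay separate.
        return "radical" if radical else five_class_label
    if model == "two":
        # Radical versus everything else.
        return "radical" if radical else "non_radical"
    raise ValueError(f"unknown model: {model}")
```

Under this reading, the same set of pages feeds all three models and only the target label changes.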
B. Terrorism and Extremism Network Extractor (TENE)
7KH7HUURULVP DQG([WUHPLVP1HWZRUN([WUDFWRU 7(1(
DFXVWRPZULWWHQFRPSXWHUSURJUDPWKDWZDVGHVLJQHGWRFROOHFW
YDVWDPRXQWVRIGDWDRQOLQHDXWRPDWLFDOO\EURZVHGDQGFDSWXUHG
DOORI WKHFRQWHQWIURPWKH ZHEVLWHVXVHGIRUFODVVLILFDWLRQ
EHWZHHQ-XO\  DQG -XO\IRU D PRUHGHWDLOHG
GHVFULSWLRQRIWKHFUDZOHUVHH>@
C. Parts-of-Speech (POS) Tagging
7KH LQLWLDO VWHS LQ DQDO\]LQJ WKH FRQWHQW RQ ZHESDJHV
DXWRPDWLFDOO\ZDVWRGHWHUPLQHWKHWRSLFVRIGLVFXVVLRQ7RGR
WKLVZHZDQWHG WR WDNH D GDWDGULYHQ DSSURDFK DQG LVRODWH WKH
SDUWLFXODUQRXQV ZLWK WKHKLJKHVWUDWH RI RFFXUUHQFHZLWKLQWKH
GDWDFROOHFWHG :HDVVXPHGWKDW WKHPRVWIUHTXHQWO\ GLVFXVVHG
WRSLFVZRXOGPRVWOLNHO\EHWKHRQHVLQZKLFKH[WUHPLVWFRQWHQW
ZDVOLNHO\WREH GHWHFWHG EDVHG RQ WKH ZRUN RI >@ )XUWKHU
QRXQVZHUHFKRVHQEHFDXVHWKH\DUHWKHZRUGVPRVWOLNHO\WREH
VXUURXQGHGE\ UHOHYDQW VHQWLPHQW WHUPV >@ QDPHVDQGSODFHV
DUHRIWHQGHVFULEHGRUGHQRWHGE\WKHDGMHFWLYHVOLQNHGWRWKHP
DQG DGMHFWLYHV DUH RIWHQ WKH ZRUGV WKDW KDYH VHQWLPHQW YDOXHV
DWWDFKHGWRWKHP>@%\VSHFLI\LQJDGMHFWLYHVDVNH\ZRUGVWKH
VHQWLPHQWRIWKHZRUGLWVHOIZRXOGEHORVW
3DUWVRI6SHHFK 326 WDJJLQJ ZDV XVHG WR SHUIRUP WKLV
DQDO\VLV326LVDGDWDDQDO\VLVPHWKRGWKDWFROOHFWVDQGGLYLGHV
VXSSOLHG WH[W LQWR ZRUG JURXSLQJV VXFK DV QRXQV DQG YHUEV
EDVHGRQWKHLUXVDJHDQGSRVLWLRQZLWKLQWKHVHQWHQFHIRUPRUH
LQIRUPDWLRQVHH>@
)URPHDFKFODVVZHVHOHFWHGRIWKHPRVWIUHTXHQWQRXQV
DQG DJJUHJDWHG WKHP LQWR RQH OLVW ZKLFK DIWHU UHPRYDO RI
GXSOLFDWHVDQGPLVFDWHJRUL]HGZRUGVV\PEROVDQGQRQZRUGV
IRU H[DPSOH ZH HQGHG ZLWK  NH\ZRUGV 7KLV ILQDO OLVW
IRUPHG WKH NH\ZRUG OLVW IRU WKH VHQWLPHQW DQDO\VLV 'RPDLQ
H[SHUWVPD\UHSODFHRUH[WHQGWKHOLVWRINH\ZRUGVZLWKUHOHYDQW
GRPDLQVSHFLILF ZRUGV +RZHYHU WKH H[SORUDWRU\ DQG GDWD
GULYHQQDWXUHRIWKHVWXG\VXJJHVWHGWKDWZHUHO\RQWKHGDWDWR
LGHQWLI\WKHUHOHYDQWNH\ZRUGV
D. Sentiment Analysis
$IWHU WKH NH\ZRUG OLVW ZDV GHYHORSHG LW ZDV QHFHVVDU\ WR
LGHQWLI\DQGHYDOXDWHWKHFRQWH[WVXUURXQGLQJWKHNH\ZRUGV7R
DOORZDSURSHUDQDO\VLVRIWKHZHESDJHVVHQWLPHQWDQDO\VLVZDV
XVHGWRKLJKOLJKWUHOHYDQWWH[W
6HQWLPHQWDQDO\VLVLVDGDWD FROOHFWLRQDQGDQDO\VLVPHWKRG
WKDW DOORZV IRU WKH DSSOLFDWLRQ RI VXEMHFWLYH ODEHOV DQG
FODVVLILFDWLRQV>@,WFDQHYDOXDWHWKHRSLQLRQVRILQGLYLGXDOVE\
RUJDQL]LQJGDWDLQWRGLVWLQFWFODVVHVDQGVHFWLRQVDQGDVVLJQLQJ
DQ LQGLYLGXDO¶V VHQWLPHQW ZLWK D QHJDWLYH RU SRVLWLYH SRODULW\
YDOXH>@,WDOVRSURYLGHV DPRUHWDUJHWHGYLHZRIDGDWDVHW E\
)LJXUH
±
'DWD&ROOHFWLRQDQG7UHH*HQHUDWLRQ3URFHVV
105
DOORZLQJIRUWKHGHPDUFDWLRQEHWZHHQFDVHVWKDWDUHVRXJKWDIWHU
DQGWKRVH ZLWKRXW DQ\ QRWDEOHUHOHYDQFH6LQFH WKH SXUSRVHRI
WKHFXUUHQWVWXG\ZDV QRW WR SXVK WKH ERXQGDULHV RIVHQWLPHQW
DQDO\VLVDOJRULWKP6HQWL6WUHQJWK>@ZDVXWLOL]HG
6HQWL6WUHQJWK LV -DYDEDVHG VRIWZDUH WKDW XVHV D VSHFLILF
DOJRULWKP WR UXQ WKURXJK ODUJH YROXPHV RI WH[W DQG FUHDWH
VHQWLPHQW VFRUHV IRU WKH VXSSOLHG GRFXPHQWV :KLOH WKHUH DUH
VHYHUDOFRQILJXUDWLRQVRI WKH VRIWZDUH ZHXWLOL]HGDNH\ZRUG
IRFXVHGPHWKRGDVDFHQWUDOIHDWXUHRI6HQWL6WUHQJWKLVLWVDELOLW\
WR HYDOXDWH VHQWLPHQW DURXQG DQ\ JLYHQ NH\ZRUG >@ 7KLV
SURFHVV LQYROYHV WKH XWLOL]DWLRQ RI D GLFWLRQDU\ RI FDWDORJXHG
WHUPV DQG +DUYDUG¶V JHQHUDO LQTXLUHU GDWDEDVH WR GHWHUPLQH
VHQWLPHQWYDOXHV>@,WORFDWHVZRUGVWKDWFRUUHVSRQGZLWKLWV
GLFWLRQDU\DQGGDWDEDVHDQGWKHQLWHPSOR\VDVWHPPLQJPHWKRG
WRHYDOXDWHDWH[WE\DVVLJQLQJSRODULW\YDOXHVRIHLWKHUSRVLWLYH
RUQHJDWLYHWRWKH ZRUGV 9DOXHV DUH DXJPHQWHG E\ FKDUDFWHUV
WKDWFDQLQIOXHQFHWKHYDOXHVDVVLJQHGWRWKHWH[WVXFKDVERRVWHU
ZRUGVQHJDWLYHZRUGVUHSHDWHGOHWWHUVUHSHDWHGQHJDWLYHWHUPV
DQWDJRQLVWLFZRUGVSXQFWXDWLRQDQGRWKHUGLVWLQFWLYHFKDUDFWHUV
VXLWHGIRUVWXG\LQJDQRQOLQHFRQWH[W>@IRUPRUHLQIRUPDWLRQ
RQ6HQWL6WUHQJWKVHH>@
E. Decision Tree
$WRROLQPDFKLQHOHDUQLQJ DQG GDWD PLQLQJ WKDW LV ZLGHO\
DFFHSWHG DPRQJVW DFDGHPLFV :DLNDWR (QYLURQPHQW IRU
.QRZOHGJH$QDO\VLV:(.$ZDVUXQ RQ WKH VHQWLPHQW GDWD
IRUHDFKRIWKHWKUHHFODVVLILFDWLRQPRGHOV,WVVWDQGDUG-WUHH
FODVVLILFDWLRQPHWKRGZDVXVHGEHFDXVHLWFRQWDLQVDQDOJRULWKP
IRUWH[WFODVVLILFDWLRQWKDWDOORZVIRUDUXOHEXLOGLQJSURFHVV>@
:(.$ZDVDSSOLHGXVLQJIROGFODVVLILFDWLRQRQWKHRXWSXWWHG
VHQWLPHQWWDEOHV VHDUFKLQJ IRU GLIIHUHQFHV LQVHQWLPHQWYDOXHV
EHWZHHQWKHFODVVHVRIVLWHV$VDUHVXOWWKLVDOJRULWKPSURGXFHG
DGHFLVLRQWUHHZLWKUHFXUVLYHOHDYHVDQGEUDQFKHVZLWKHDFKOHDI
UHSUHVHQWLQJDVSHFLILFVHWRIVHQWLPHQWWKUHVKROGV
,WLV ZLWKWKHVHWKUHVKROGVWKDWWKH GHFLVLRQWUHHHVWDEOLVKHG
ZKHWKHUDFHUWDLQZHESDJHZDVRUJDQL]HGLQWRWKHradicalright
radical Islamic pro-extremist anti-extremist news RU other
FODVV)RUH[DPSOHLIWKHNH\ZRUGµPDUULDJHZDVLQFOXGHGLQ
WKHWH[WZLWKDVHQWLPHQWYDOXHRIOHVVWKDQRUHTXDOWRDQGWKH
WHUPµSULFZDVDOVRLQWKHWH[WZLWKDVHQWLPHQWYDOXHRIOHVV
WKDQRUHTXDOWRDYDOXHRIWKHQWKH-SUXQHGWUHHLQGLFDWHG
WKDWRIWKHSDJHVIHOOZLWKLQWKHnewsFODVVLILFDWLRQ
F. Classification Results
$IWHU WKH - DOJRULWKPZDV UXQ RQ WKH VHQWLPHQWGDWDIRU
HDFK RI WKH WKUHH PRGHOV :(.$ FUHDWHG D PHDVXUH WKDW
VSHFLILHGWKHQXPEHURI DFFXUDWHSDJHV WKDWILWLQWRHDFKRIWKH
FRUUHFWFODVVHV,QWXUQDFRQIXVLRQPDWUL[GLVSOD\HGWKHQXPEHU
RISDJHVWKDWZHUHFRUUHFWO\DQGLQFRUUHFWO\LGHQWLILHGLHIDOVH
SRVLWLYHV DQG IDOVH QHJDWLYHV LQ HDFK PRGHO PHDVXUHG ZLWK
SUHFLVLRQDQGUHFDOO3UHFLVLRQLVDPHDVXUHZKLFKUHSUHVHQWVWKH
H[DFWQHVVRIWKHUXOHV,WFDQEHWKRXJKWRIDVWKHQXPEHURIWUXH
SRVLWLYHHQWLWLHVLGHQWLILHG E\WKHFODVVLILHU RXW RIWKHVHW RI DOO
HQWLWLHV LGHQWLILHG DV SRVLWLYH 7KH ODUJHU WKH YDULDELOLW\ LQ WKH
UHVXOWVWKHVPDOOHUWKHSUHFLVLRQZLOOEH5HFDOOLVDPHDVXUHRI
FRPSOHWHQHVVDQGLVWKHQXPEHURISRVLWLYHHQWLWLHVidentifiedRXW
RIall WUXHSRVLWLYHHQWLWLHV$ UHFDOOVFRUHRI SHUIHFWPHDQV
WKDWDOOWUXHSRVLWLYHVDUHLGHQWLILHGZLWKRXWDQ\IDOVHQHJDWLYHV
,,,5(68/76
7KHUHVXOWV ZHUH LQWHUSUHWHGLQWZR ZD\V )LUVWWKHRYHUDOO
FODVVLILFDWLRQ UHVXOWV ZHUH HYDOXDWHG IRU HDFK FODVVLILFDWLRQ
PRGHOIROORZHGE\ WKH TXDOLW\ RI WKH HQWLUH UXOH VHW IRU HDFK
+HUHWKHDQDO\VLVZDVODUJHO\IRFXVHGRQWKHSURH[WUHPLVWFODVV
A. Two-Class Model: Results and Quality of Rule-Set
7KH WZRFODVV WUHH DQDO\VLV LQGLFDWHG WKDW  RI
ZHESDJHV ZHUH VXFFHVVIXOO\ FODVVLILHG DFURVV FODVVHV 7KH
FRPELQHG radical Right DQG radical Islamic FODVV ZDV
VXFFHVVIXOO\ FODVVLILHG DW D FRQVLGHUDEO\ ORZHU UDWH WKDQ WKH
FRPELQHGanti-extremistnewsDQGotherFODVVLHRI
SDJHVZHUHDFFXUDWHO\LGHQWLILHG)XUWKHUPRUHWKHUHFDOORIWKH
HQWLUH UXOHVHW LQGLFDWHG WKDW DFWXDO WZR UDGLFDO FODVVHV ZHUH
FODVVLILHG DW D UDWH RI  LQ FRPSDULVRQ WR WKH QRQUDGLFDO
FODVVHV¶UHFDOOPHDVXUHRIVHHSUHFLVLRQDQGUHFDOOLQ7DEOH

B. Four-Class Model: Results and Quality of Rule-Set
7KH IRXUFODVV WUHH DQDO\VLV UHYHDOHG WKDW  RI
ZHESDJHV ZHUH DFFXUDWHO\ FODVVLILHG DFURVV FODVVHV
,QWHUHVWLQJO\news SDJHVKDGWKH KLJKHVWQXPEHURI FRUUHFWO\
FODVVLILHGSDJHV LH IROORZHGE\WKHFRPELQHG WZR
UDGLFDOFODVVHVLHZKLFKDJDLQVKRZHGPRGHUDWHO\KLJK
SUHFLVLRQ UDWHV LH  DQG PRGHUDWHO\ ORZ UHFDOO UDWHV
LHNewsFODVVVLWHVZHUHDOVRPLVFODVVLILHGDFURVV WKH
UHPDLQLQJWKUHHFODVVHVDWWKHKLJKHVWIUHTXHQF\VHH7DEOH
C. Five-Class Model: Results and Quality of Rule-Set
7KH ILYHFODVV WUHH DQDO\VLV LQGLFDWHG WKDW  RI WKH
SDJHVZHUHDFFXUDWHO\FODVVLILHG2ISDUWLFXODULQWHUHVWKHUHZHUH
WKHradical Right-DQGradical Islamic SDJHV)LUVWWKHFRQIXVLRQ
PDWUL[LQGLFDWHGWKDWDPHUHRIradical RightSDJHVZHUH
VXFFHVVIXOO\FODVVLILHG:LWKLQWKLVSDUWLFXODUFODVVnewsSDJHV
ZHUHPLVLGHQWLILHG DWWKHKLJKHVWUDWH LQWKHPRGHO RI
LWV SDJHV ZHUH PLVFODVVLILHG DV UDGLFDO ULJKWZLQJ 6HFRQG
 RI SDJHV WKDW IHDWXUHG UDGLFDO ,VODPLF FRQWHQW ZHUH
DFFXUDWHO\FODVVLILHGNewsSDJHVZHUHDOVRPLVFODVVLILHGDWWKH
KLJKHVWUDWHLHZLWKLQWKHradical Islamic FODVVNews
SDJHVDJDLQKDGWKHKLJKHVWQXPEHURIFRUUHFWO\FODVVLILHGSDJHV
DFURVVWKH HQWLUH VDPSOH VLPLODUWR WKH FODVVLILFDWLRQ
UHVXOWVLQWKHIRXUFODVVPRGHO0HDVXULQJWKHTXDOLW\RIWKHUXOH
IRU WKH ILYHFODVV PRGHO WKH DFWXDO radical Right SDJHV ZHUH
FODVVLILHGDW D ORZ UHFDOO UDWH RI  $WWKHRWKHU HQG RI WKH
VSHFWUXPERWKWKH SUHFLVLRQDQGUHFDOOPHDVXUHVLQGLFDWHGWKDW
radical IslamicSDJHVZHUHFODVVLILHGDWDKLJKHUUDWHRIVXFFHVV
WKDQWKH radical Right FODVVRIWKH radical IslamicSDJHV
ZHUHVXFFHVVIXOO\ FODVVLILHG DQGDFWXDOSDJHVZLWKLQ WKLVFODVV
ZHUHVXFFHVVIXOO\FODVVLILHGDWDUDWHRI6HH7DEOH
,9&21&/86,216
 :HVRXJKWWREXLOGRQDVHQWLPHQWJXLGHGZHEFUDZOHUWKDW
FRXOG DFFXUDWHO\ FODVVLI\ WKH VXEWOH \HW UDGLFDOEDVHG WH[W RQ
UDGLFDO 5LJKW DQG UDGLFDO ,VODPLF ZHESDJHV $ IHZ QRWDEOH
ILQGLQJV ZHUH SURGXFHG )LUVW FODVVLILFDWLRQ UHVXOWV IURP WKH
WZRDQGIRXUFODVVPRGHOVVXJJHVWHGWKDWWKHVHQWLPHQWJXLGHG
ZHEFUDZOHUODFNHGWKH RYHUDOO DELOLW\WRGLIIHUHQWLDWHEHWZHHQ
WKHVHQWLPHQWIRXQGRQWKHVHOHFWHGZHEVLWHVWKDWGLGDQGGLGQRW
SURPRWHUDGLFDO LGHRORJLHV 7KLVILQGLQJFRQWUDVWV WKDW RI >@
ZKHUH SURH[WUHPLVW ZHEIRUXP DQG ZHESDJH GDWD ZHUH
106
FODVVLILHGDWDKLJKUDWHRIDFFXUDF\6HFRQGWKHILYHFODVVPRGHO
VKRZHG WKDW UDGLFDO ,VODPLF SDJHV ZHUH FODVVLILHG DW D PXFK
KLJKHU UDWH RI VXFFHVV WKDQ UDGLFDO ULJKWZLQJ SDJHV 3UHYLRXV
UHVHDUFKVXJJHVWVWKDWWKHVHQWLPHQWRQ UDGLFDO 5LJKW ZHEVLWHV
HJ>@DQGUDGLFDO,VODPLFZHEVLWHVLH>@LVSUHVHQWHG
LQDVXEWOHPDQQHUERWKWRDSSHDOWRDZLGHUDXGLHQFHDQGUHFUXLW
QHZPHPEHUVIRUH[DPSOH +RZHYHURXU UHVXOWVOHQGVXSSRUW
IRU >@¶V DVVHUWLRQ WKDW ZKLOH UDGLFDO ,VODPLF VLWHV DWWHPSW WR
OHJLWLPL]HWKHLUHIIRUWVE\SUHVHQWLQJWKHPVHOYHVDVQHZVVRXUFH
ZHEVLWHVWKHVHQWLPHQWLVDOPRVWDOZD\VUHODWHGWRUDGLFDOWRSLFV
LH GLVFXVVLRQV RI YLROHQFH 7KDW VDLG D PRUH LQGHSWK
DQDO\VLVLV QHHGHGWR DVVHVVZKHWKHUWKHUHLVDFOHDUGLVWLQFWLRQ
EHWZHHQDQH[WUHPLVWZHEVLWHIHDWXULQJHOXVLYHPHVVDJHVDQGD
QHZVVLWH7KLVSUREOHPFRXOGEHH[SORUHGXVLQJRWKHUWRROVRI
FODVVLILFDWLRQVXFK DV UDQGRPIRUHVWVRU%D\HVLDQ PHWKRGV WR
VXSSRUW YHFWRU PDFKLQHV DQG QHXUDO QHWZRUNV )XWXUH VWXGLHV
VKRXOG DOVR LQWHJUDWH D TXDOLWDWLYH XQGHUVWDQGLQJ RI KRZ
PDFKLQHOHDUQLQJWRROVPDNHGHFLVLRQVDERXWWKHZHESDJHVWKDW
DUHYLVLWHG 'RLQJ VRPD\LQFUHDVH WKHUHOLDELOLW\RIWKH UHVXOWV
DQGLQFUHDVHWKHOLNHOLKRRGRILGHQWLI\LQJUDGLFDOWH[WRQOLQH
95()(5(1&(6
 / %DFN ³$U\DQV 5HDGLQJ $GRUQR &\EHU&XOWXUH DQG 7ZHQW\)LUVW
&HQWXU\ 5DFLVP´LQ Ethnic and Racial Studies, 25  SS

 /%RZPDQ*ULHYH³$QWLDERUWLRQ([WUHPLVP2QOLQH´LQFirst Monday,
14
 0&DLDQL 'GHOOD 3RUWD & :DJHPDQQMobilizing on the Extreme
Right: Germany, Italy, and the United States2[IRUG2[IRUG8QLYHUVLW\
3UHVV
 + &KHQ Dark Web: Exploring and Data Mining the Dark Side of the
Web1HZ<RUN6SULQJH
 .&RKHQ) -RKDQVVRQ/.DDWL-& 0RUN³'HWHFWLQJ/LQJXLVWLF
0DUNHUV IRU 5DGLFDO 9LROHQFH LQ 6RFLDO 0HGLD´ LQ Terrorism and
Political Violence, 26SS
 5)HOGPDQ³7HFKQLTXHV DQG$SSOLFDWLRQVIRU 6HQWLPHQW$QDO\VLV´LQ
Communications of the ACMSS
 0 +DOO (  )UDQN + *HRIIUH\ % 3IDKULQJHU 3  5HXWHPDQQ  ,
:LWWHQ ³7KH :(.$ 'DWD 0LQLQJ 6RIWZDUH$Q 8SGDWH´ LQ SIGKDD
Explorations, 11SS
 0 +X  / %LQJ ³0LQLQJ DQG 6XPPDUL]LQJ &XVWRPHU 5HYLHZV´ LQ
Proceedings of ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining
 ,QWHUQHW:RUOG 6WDWVInternet Growth Statistics 5HWULHYHG IURP
KWWSZZZLQWHUQHWZRUOGVWDWVFRPHPDUNHWLQJKWP
 -0HL5)UDQN ³6HQWLPHQW&UDZOLQJ([WUHPLVW &RQWHQW&ROOHFWLRQ
WKURXJKD6HQWLPHQW$QDO\VLV*XLGHG :HE&UDZOHU´LQProceedings of
the International Conference on Advances in Social Networks Analysis
and Mining
 06DJHPDQ³7KH6WDJQDWLRQLQ7HUURULVP5HVHDUFK´LQTerrorism and
Political ViolenceSS
 5 6FULYHQV *  'DYLHV 5  )UDQN  - 0HL ³6HQWLPHQWEDVHG
,GHQWLILFDWLRQRI5DGLFDO $XWKRUV 6,5$´ LQ Proceedings of the 2015
IEEE ICDM Workshop on Intelligence and Security Informatics
 36HLE'0-DQEHNGlobal Terrorism and New Media: The Post-Al
Qaeda Generation/RQGRQ5RXWOHGJH
 0 7KHOZDOO  . %XFNOH\ ³7RSLFEDVHG 6HQWLPHQW $QDO\VLVIRU WKH
6RFLDO:HE7KH 5ROHRI 0RRGDQG,VVXHUHODWHG:RUGV´LQ Journal of
the American Society for Information Science and Technology 
SS
 77KHW- 1D&.KRR³$VSHFWEDVHG 6HQWLPHQW$QDO\VLVRI0RYLH
5HYLHZV RQ 'LVFXVVLRQ %RDUGV´ LQ Journal of Information Science
SS
 <7VIDWL*:HLPDQQ³ZZZWHUURULVPFRP7HUURURQWKH,QWHUQHW´
LQStudies in Conflict & Terrorism, 25SS
 3UHGLFWHG3DJHV  

Radical RightDQGRadical
Islamic
Anti-extremist, News DQG
Other
3UHFLVLRQ 5HFDOO
$FWXDO3DJHV
Radical RightDQG
Radical Islamic   
Anti-extremist,
News DQG Other   
7DEOH
±
&RQIXVLRQ0DWUL[IURPWKH-7UHH$QDO\VLVIRUWKH7ZR&ODVV0RGHO
 3UHGLFWHG3DJHV  

Radical RightDQG
Radical IslamicAnti-extremist News Other 3UHFLVLRQ 5HFDOO
$FWXDO3DJHV
Radical RightDQG
Radical Islamic     
Anti-extremist      
News      
Other      
7DEOH
±
&RQIXVLRQ0DWUL[IURPWKH-7UHH$QDO\VLVIRUWKH)RX
U
&ODVV0RGHO
 3UHGLFWHG3DJHV  

Radical
Right
Radical
Islamic Anti-extremist News Other 3UHFLVLRQ 5HFDOO
$FWXDO3DJHV
Radical Right       
Radical Islamic       
Anti-extremist       
News       
Other       
7DEOH±&RQIXVLRQ0DWUL[IURPWKH-7UHH$QDO\VLVIRUWKH)LYH&ODVV0RGHO
107
... at the iCCrC, we focus primarily on text-based extremist content that has radical right-wing or jihadi leanings. for the former, radical right-wing material is characterized by racially, ethnically and sexually defined nationalism, which is typically framed in terms of white power and grounded in xenophobic and exclusionary understandings of the perceived threats posed by such groups as non-whites, Jews, immigrants, homosexuals, and feminists (see perry & Scrivens, 2016). for the latter, we define jihadi material as supportive of the creation of an expansionist islamic state or khalifa, the imposition of sharia law with violent jihad as a central component, and the use of local, national, and international grievances affecting muslims (see moghadam, 2008). ...
... the idea of this approach is based on a combination of the work associated with the Dark web project at the University of arizona (see Chen, 2012) and a previous project at the iCCrC that identified and explored online child exploitation websites (e.g., allsup, thomas, monk, frank, Joffres, Bouchard, frank, & westlake, 2011;frank, westlake, & Bouchard, 2010;monk, allsup, & frank, 2015;westlake & Bouchard, 2015;. tDC has since demonstrated its benefit in investigating online networks and communities in general (e.g., frank, macdonald, & monk, 2016;macdonald & frank, 2016, 2017macdonald, frank, mei, & monk, 2015;mikhaylov & frank, 2016Zulkarnine, frank, monk, mitchell, & Davies, 2016) and extremist content online in particular (e.g., Bouchard et al., 2014;Davies et al., 2015;frank et al., 2015;levey, Bouchard, hashimi, monk, & frank, 2016;mei & frank, 2015;Scrivens et al., 2017;Scrivens, Davies, & frank, 2018;Scrivens & frank, 2016;wong, frank, & allsup, 2015). tDC is a system that can be distributed across multiple virtual machines, depending on the number of machines that are available. ...
... sENtIMENt ANALysIs the use of keywords presents a useful first step in identifying large-scale patterns in extremist content online (e.g., Chalothorn & ellman, 2012;Bouchard et al., 2014;Davies et al., 2015;wong et al., 2015). however, the use of single keywords may lead to misleading interpretations of content (mei & frank, 2015;Scrivens & frank, 2016). if, for example, on a particular webpage, the words gun and control are found within close proximity of each other, it might be concluded that the page is discussing gun control. ...
Chapter
Full-text available
Purpose – This chapter examines how sentiment analysis and web-crawling technology can be used to conduct large-scale data analyses of extremist content online. Methods/approach – The authors describe a customized web-crawler that was developed for the purpose of collecting, classifying, and interpreting extremist content online and on a large scale, followed by an overview of a relatively novel machine learning tool, sentiment analysis, which has sparked the interest of some researchers in the field of terrorism and extremism studies. The authors conclude with a discussion of what they believe is the future applicability of sentiment analysis within the online political violence research domain. Findings – In order to gain a broader understanding of online extremism, or to improve the means by which researchers and practitioners “search for a needle in a haystack,” the authors recommend that social scientists continue to collaborate with computer scientists, combining sentiment analysis software with other classification tools and research methods, as well as validate sentiment analysis programs and adapt sentiment analysis software to new and evolving radical online spaces.
... SentiStreght can report binary (positive vs negative), trinary (positive/negative/neutral) and single scale (-4 to +4) sentiment results. From the reviewed articles, it was the most commonly used tool to determine sentiment [111,109,105,61,103,104,102,89,110]. ...
... It supports different NLP tasks, providing several options to analyse texts. Four articles used OpenNLP on the review [110,104,109,111]. ...
Preprint
Full-text available
Extremism research has grown as an open problem for several countries during recent years, especially due to the apparition of movements such as jihadism. This and other extremist groups have taken advantage of different approaches, such as the use of Social Media, to spread their ideology, promote their acts and recruit followers. Natural Language Processing (NLP) represents a way of detecting this type of content, and several authors make use of it to describe and discriminate the discourse held by this groups, with the final objective of detecting and preventing its spread. This survey aims to review the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art of this research area. The content includes a description and comparison of the frequently used NLP techniques, how they were applied, the insights they provided, the most frequently used NLP software tools and the availability of datasets and data sources for research. Finally, research questions are approached and answered with highlights from the review, while future trends, challenges and directions derived from these highlights are suggested.
... Yet in response to this problem, scholars have urged social scientists to collaborate with computer scientists (Conway, 2016;Scrivens, 2016), and computer scientists have embraced this collaboration, recommending that machine learning techniques, particularly semi-automated techniques that include human research decisions, be used to aid in the process of analyzing Big Data in terrorism and extremism studies (Brynielsson et al., 2013;Cohen et al., 2014). To some extent criminologists have begun to explore this critical point of departure via a customized web-crawler, extracting large bodies of text online that feature radical material and then using text-based analysis tools to assess the content (e.g., Bouchard et al., 2014;Burnap & Williams, 2016;Davies et al., 2015;Frank et al., 2015;Scrivens & Frank, 2016;Williams & Burnap, 2015). Similarly, computational-based research has been conducted to identify radical content on discussion forums (e.g., Fu, Abbasi, & Chen, 2010;Zhang et al., 2010;, Twitter accounts (e.g., Kaati, Omer, Prucha, & Shrestha, 2015), and videos on YouTube (e.g., Chen, 2012). ...
... Links were then recursively analyzed for further webpages, repeating the process until some user-specified termination condition applied (for more information on TENE, see Bouchard et al., 2014; see also Frank et al., 2015). This approach has demonstrated its benefit in investigating online networks and communities (see Bouchard et al., 2014;Davies et al., 2015;Frank et al., 2015;Macdonald, Frank, Mei, & Monk, 2015;Scrivens & Frank, 2016;Westlake & Bouchard, 2015;Wong et al., 2015). ...
Thesis
Full-text available
Criminologists have generally agreed that the Internet is not only a tool or resource for right-wing extremists to disseminate ideas and products, but also a site of important identity work, accomplished interactively through the exchange of radical ideas. Online discussion forums, amongst other interactive corners of the Web, have become an essential conduit for the radical right to air their grievances and bond around their “common enemy.” Yet overlooked in this discussion has been a macro-level understanding of the radical discussions that contribute to the broader collective identity of the extreme right online, as well as what constitutes “radical posting behaviour” within this context. Drawing from criminal career measures to facilitate this type of analysis, data was extracted from a sub-forum of the most notorious white supremacy forum online, Stormfront, which included 141,763 posts made by 7,014 authors over approximately 15 years. In study one of this dissertation, Sentiment-based Identification of Radical Authors (SIRA), a sentiment analysis-based algorithm that draws from traditional criminal career measures to evaluate authors’ opinions, was used to identify and, by extension, assess forum authors’ radical posting behaviours using a mixed-methods approach. Study two extended on study one by using SIRA to quantify authors’ group-level sentiment about their common enemies: Jews, Blacks, and LGBTQs. Study three further extended on studies one and two by analyzing authors’ radical posting trajectories with semi-parametric group-based modeling. Results highlighted the applicability of criminal career measures to study radical discussions online. Not only did this mixed-methods approach provide theoretical insight into what constitutes radical posting behaviour in a white supremacy forum, it also shed light on the communication patterns that contribute to the broader collective identity of the extreme right online.
... Their experiment was done on 768 randomly selected trending topics with over 18 classes and it gave an accuracy of 65% and 70% using the text-based and network-based classification models respectively. Khan et al. (2011) used a rule-based, domain-independent method to conduct sentence-level sentiment classification. Sentences were first categorized as subjective or objective, and the sentiment score was then calculated using SentiWordNet. ...
Article
The social media space has evolved into a large labyrinth of information exchange platforms, and with the growing adoption of different social media platforms there has been an increasing wave of interest in sentiment analysis as a paradigm for mining and analyzing users' opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis on social media entries with a specific focus on Twitter. Sentiment analysis consists of two broad approaches: machine learning, which uses classification techniques to classify text and is further categorized into supervised and unsupervised learning; and the lexicon-based approach, which uses a dictionary and, unlike the machine learning approach, requires no test or training dataset. The paper explores generic application areas including product/services analysis and security/terrorism investigations.
... In conclusion, we note the potential of this method to be broadly used by a range of actors and agencies. ...
Article
This article introduces a language‐based tool for addressing the role of religion in violent conflicts. Value predicate analysis (VPA) is an easily transportable, relatively uncomplicated early warning tool for measuring the probable near‐future behavior of modest‐sized religious groups in settings of potential conflict. We show that it is possible to identify a range of nine types of probable group behavior toward other groups. This approach significantly refines current binary assessments of violent/not‐violent group conduct. The authors (1) provide a warrant for diagnosing religion‐group behavior through performative analysis; (2) present a theoretical overview of VPA; (3) summarize their research, data analysis, and field collection methods; (4) present field test results; and (5) conclude with recommendations for further research.
Article
Extremism has grown as a global problem for society in recent years, especially after the emergence of movements such as jihadism. This and other extremist groups have taken advantage of different approaches, such as the use of social media, to spread their ideology, promote their acts and recruit followers. The extremist discourse, therefore, is reflected in the language used by these groups. Natural language processing (NLP) provides a way of detecting this type of content, and several authors make use of it to describe and discriminate the discourse held by these groups, with the final objective of detecting and preventing its spread. Following this approach, this survey aims to review the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art of this research area. The content includes a first conceptualization of the term extremism, the elements that compose an extremist discourse and the differences with other terms. After that, a review, description and comparison of the frequently used NLP techniques is presented, including how they were applied, the insights they provided, the most frequently used NLP software tools, descriptive and classification applications, and the availability of datasets and data sources for research. Finally, research questions are approached and answered with highlights from the review, while future trends, challenges and directions derived from these highlights are suggested towards stimulating further research in this exciting research area.
Conference Paper
As the data generated on the internet increases exponentially, developing guided data collection methods becomes more and more essential to the research process. This paper proposes an approach to building a self-guiding web-crawler to collect data specifically from extremist websites. The guidance component of the web-crawler is achieved through the use of sentiment-based classification rules which allow the crawler to make decisions on the content of the webpage it downloads. First, content from 2,500 webpages was collected for each of four sentiment-based classes: pro-extremist websites, anti-extremist websites, neutral news sites discussing extremism, and sites with no discussion of extremism. Then parts-of-speech tagging was used to find the most frequent keywords in these pages. Using sentiment software in conjunction with classification software, a decision tree was generated that could effectively discern which class a particular page would fall into. The resulting tree showed an 80% success rate at differentiating between the four classes and a 92% success rate at classifying specifically extremist pages. This decision tree was then applied to a randomly selected sample of pages for each class. The secondary test produced results similar to those of the primary test, which holds promise for future studies using this framework.
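The guidance idea in this abstract can be illustrated with a minimal sketch. It is not the authors' crawler or their learned decision tree: the function names `classify` and `guided_crawl`, the thresholds, the keyword labels, and the in-memory `pages` structure are all assumptions made for illustration. A hand-written threshold rule stands in for the decision tree, and the crawl only expands links from pages assigned to the target class.

```python
from collections import deque


def classify(features):
    """Hand-written stand-in for a learned decision tree (illustrative only).

    `features` maps a keyword to the page's sentiment score for that keyword.
    The thresholds below are invented and not taken from the paper.
    """
    if not features:
        return "no-discussion"
    mean = sum(features.values()) / len(features)
    if mean > 0.5:
        return "pro-extremist"
    if mean < -0.5:
        return "anti-extremist"
    return "news"


def guided_crawl(seed, pages, max_pages=10):
    """Breadth-first crawl that only follows links from on-target pages.

    `pages` is a hypothetical in-memory web: url -> (features, outgoing links).
    """
    queue = deque([seed])
    seen = {seed}
    kept = []
    while queue and len(kept) < max_pages:
        url = queue.popleft()
        features, links = pages[url]
        if classify(features) != "pro-extremist":
            continue  # off-target page: keep nothing and do not expand its links
        kept.append(url)
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return kept
```

In a real setting the rule in `classify` would be replaced by a tree trained on labelled pages, and `pages` by live downloads; the control flow of the crawl would stay the same.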
Article
This book analyses the actions, networks and frames of right wing extremism. While research on extreme right political parties is growing, the extreme right has only very rarely been studied as a social movement. To fill this gap, this volume compares the extreme right in Italy, Germany and the United States using some main concepts and methods developed in social movement studies. In particular, it describes the discourse, repertoires and organizational structures of the extreme right, and explains them on the basis of the discursive and political opportunities and resources available to them. A combination of empirical methods is used in order to collect and analyse data on the extreme right organizations. The frame analysis looks at the cognitive mechanisms that are relevant in influencing organizational and individual behaviour. The network analysis looks at the (inter-) organizational structural characteristics of the right-wing organizations. Finally, the protest event analysis allows for an empirical summary of the actions undertaken by right-wing extremists over the last decade. The substantive chapters address the organizational structure of the extreme right, the action repertoires of the extreme right as well as the framing concerning, respectively, the definition of the 'us', the struggle against modernity, old and new forms of racism, opposition to globalization and populism. Finally, in the conclusions, the authors reflect on the contributions that social movement studies give to the understanding of the phenomenon, as well as, vice-versa, how research on the extreme right could contribute to the theorization on social movements' dynamics. © Manuela Caiani, Donatella della Porta, and Claudius Wagemann 2012.
Article
Lone-wolf terrorism is a threat to the security of modern society, as was tragically shown in Norway on July 22, 2011, when Anders Behring Breivik carried out two terrorist attacks that resulted in a total of 77 deaths. Since lone wolves are acting on their own, information about them cannot be collected using traditional police methods such as infiltration or wiretapping. One way to attempt to discover them before it is too late is to search for various “weak signals” on the Internet, such as digital traces left in extremist web forums. With the right tools and techniques, such traces can be collected and analyzed. In this work, we focus on tools and techniques that can be used to detect weak signals in the form of linguistic markers for potential lone wolf terrorism.
Article
The nature of the Internet--the ease of access, the chaotic structure, the anonymity, and the international character--all furnish terrorist organizations with an easy and effective arena for action. The present research focuses on the use of the Internet by modern terrorist organizations and attempts to describe the uses terrorist organizations make of this new communication technology. Is the use of the Internet by terrorists different from that of other, "conventional" means of communication? How can governments respond to this new challenge? The population examined in this study is defined as the Internet sites of terrorist movements as found by a systematic search of the Internet, using various search engines. The sites were subjected to a qualitative content analysis, focusing on their rhetorical structures, symbols, persuasive appeals, and communication tactics. The study reveals differences and similarities between terrorist rhetoric online and in the conventional media.
Book
Global Terrorism and New Media carefully examines the content of terrorist websites and extremist television programming to provide a comprehensive look at how terrorist groups use new media today. Based partly on a content analysis of discussion boards and forums, the authors share their findings on how terrorism 1.0 is migrating to 2.0 where the interactive nature of new media is used to build virtual organization and community. Although the creative use of social networking tools such as Facebook may advance the reach of terrorist groups, the impact of their use of new media remains uncertain. The book pays particular attention to terrorist media efforts directed at women and children, which are evidence of the long-term strategy that some terrorist organizations have adopted, and the relationship between terrorists' media presence and actual terrorist activity. This volume also looks at the future of terrorism online and analyzes lessons learned from counterterrorism strategies. This book will be of much interest to students of terrorism studies, media and communication studies, security studies and political science.
Article
Despite over a decade of government funding and thousands of newcomers to the field of terrorist research, we are no closer to answering the simple question of “What leads a person to turn to political violence?” The state of stagnation with respect to this issue is partly due to the government strategy of funding research without sharing the necessary primary source information with academia, which has created an unbridgeable gap between academia and the intelligence community. This has led to an explosion of speculations with little empirical grounding in academia, which has the methodological skills but lacks data for a major breakthrough. Most of the advances in the field have come from historical archival research and analysis of a few field interviews. Nor has the intelligence community been able to achieve any breakthrough because of the structure and dynamic of this community and its lack of methodological rigor. This prevents creative analysis of terrorism protected from political concerns. The solution to this stagnation is to make non-sensitive data available to academia and to structure more effective discourse between the academic and intelligence communities in order to benefit from the complementary strengths in these two communities.
Article
General sentiment analysis for the social web has become increasingly useful for shedding light on the role of emotion in online communication and offline events in both academic research and data journalism. Nevertheless, existing general-purpose social web sentiment analysis algorithms may not be optimal for texts focussed around specific topics. This article introduces 2 new methods, mood setting and lexicon extension, to improve the accuracy of topic-specific lexical sentiment strength detection for the social web. Mood setting allows the topic mood to determine the default polarity for ostensibly neutral expressive text. Topic-specific lexicon extension involves adding topic-specific words to the default general sentiment lexicon. Experiments with 8 data sets show that both methods can improve sentiment analysis performance in corpora and are recommended when the topic focus is tightest.
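The two methods named in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the article's algorithm: the lexicon contents, the function names `extend_lexicon` and `score_text`, and the use of an exclamation mark as the test for "expressive" text are all hypothetical choices made here.

```python
import re

# Hypothetical general-purpose lexicon: word -> sentiment strength.
GENERAL_LEXICON = {"good": 2, "love": 3, "bad": -2, "hate": -4}


def extend_lexicon(general, topic_terms):
    """Lexicon extension: return a copy of the general lexicon with topic terms added."""
    extended = dict(general)
    extended.update(topic_terms)  # topic-specific entries override general ones
    return extended


def score_text(text, lexicon, topic_mood=0):
    """Sum the lexicon strengths of the words found in `text`.

    Mood setting (simplified): if no lexicon word matches but the text looks
    expressive (assumed here to mean it contains '!'), return `topic_mood`
    as the default polarity instead of 0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    matched = [lexicon[w] for w in words if w in lexicon]
    if matched:
        return sum(matched)
    if "!" in text:
        return topic_mood
    return 0
```

For example, adding a topic term such as `{"riot": -3}` lets "The riot spread" receive a negative score that the general lexicon alone would miss, while a bare "Wow!!!" takes its polarity from the chosen topic mood.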
Article
The main applications and challenges of one of the hottest research areas in computer science.
Article
This article examines the ways in which digital technology is being used in contemporary forms of racist culture within white nationalist movements. It argues that new types of racist culture are made possible in cyberspace. This both challenges popular conceptions of what 'The Racist' is supposed to look like and points to the ways in which technological innovation is reinvigorating anti-Semitism and racisms that work in and through the boundaries of nation-states. It is argued that it is possible to situate racism and white nationalism at the centre of the so-called postmodern condition.