
Sentiment-based Classification of Radical Text on the Web

Abstract

The total number of webpages has grown substantially since the birth of the Internet, and so too has the number of webpages dedicated to radical yet subtle content. These circumstances call for a guided data collection method that can sidestep the laborious manual methods classically used. Simple keyword analysis, however, is not sufficient to identify radical sites on Web 1.0: pro-extremist, anti-extremist, and news sites, for example, may use the same keywords to discuss the same event but with very different motivations. To explore this problem, we used a web-crawler to collect 20,000 webpages from five sentiment-based classes and assess their differences: (1) radical Right sites; (2) radical Islamic sites; (3) anti-extremist sites; (4) news source sites discussing extremism; and (5) sites that did not discuss extremism. Parts-of-Speech (POS) tagging was used to identify 198 of the most frequent keywords within the data, and the sentiment value for each of these keywords was calculated for each webpage using sentiment analysis. With these values, a decision tree was applied to three classification models. Results suggest that radical Islamic text can be classified at a much higher rate of success than radical Right text.
5\DQ6FULYHQVDQG5LFKDUG)UDQN
,QWHUQDWLRQDO&\EHU&ULPH5HVHDUFK&HQWUH 
6FKRRORI&ULPLQRORJ\6LPRQ)UDVHU8QLYHUVLW\
%XUQDE\&DQDGD
^UVFULYHQUIUDQN`#VIXFD
Keywords - Sentiment Analysis; Decision Trees; Extremism
,,1752'8&7,21
6LQFH WKH PLGV ZH KDYH VHHQ D UDSLG JURZWK LQ WKH
QXPEHURIUDGLFDOZHEVLWHVDFWLQJDVDKXERIWKHPRYHPHQWWKH\
UHSUHVHQW,QPDQ\UHVSHFWVWKH\VHUYHDVDFRQYHUJHQFHVHWWLQJ
IRUQHZFRPHUVWRIDPLOLDUL]HWKHPVHOYHVZLWKUDGLFDOWHDFKLQJV
>@FRQQHFWLQJXVHUVWRRWKHUVLPLODUZHEVLWHVZHEIRUXPV
DQGSDJHVRQWKH'DUN:HEWRQDPHEXWDIHZ>@&RXQWHU
H[WUHPLVPRUJDQL]DWLRQV FRQWLQXH WR ORRN IRU ZD\V WR LGHQWLI\
WKHVH VLWHV >@ \HW WKH ,QWHUQHW LV FRQVWDQWO\ JURZLQJ DQG
LQIRUPDWLRQLVEHLQJJHQHUDWHGDWDUDSLGSDFH>@7KLVKDVOHG
WR D JURZLQJ IORRG RI GDWD UHVXOWLQJ LQ PDQXDO GDWD DQDO\VLV
EHFRPLQJ OHVV HIILFLHQW RU HQWLUHO\ LQIHDVLEOH >@ ,Q UHVSRQVH
UHVHDUFKHUV KDYH GHYHORSHG GDWD FROOHFWLRQ WRROV WKDW DOORZ
GHFLVLRQVWREHPDGHDERXWWKHWKRXVDQGVRIZHESDJHVWKDWDUH
H[WUDFWHGDQGE\H[WHQVLRQFODVVLI\UDGLFDOWH[WRQWKH:HE
 5HVHDUFKHUVKDYHH[SORUHGWKLVFULWLFDOSRLQWRIGHSDUWXUHYLD
DXWRPDWHG FRPSXWDWLRQDO DQG VHPLDXWRPDWHG WRROV ERWK WR
LPSURYH WKH PHWKRGV RI FROOHFWLRQ DQG LGHQWLILFDWLRQ RI
H[WUHPLVW FRQWHQW &KHQ¶V Dark Web Project IRU H[DPSOH
SURGXFHGDQXPEHURIFRPSXWDWLRQDODQGGDWDFHQWULFVWXGLHVRQ
WKHFRQWHQW DQG VWUXFWXUH RI VXUIDFH OHYHO DQG 'DUN :HE VLWHV
FRQWDLQLQJH[WUHPLVWPDWHULDO>@,QOLJKWRIWKLVFRPSUHKHQVLYH
ZRUNWKHHQWLUHSURFHVVZDVDXWRPDWHGDQGWKH\GLGQRWDVVHVV
WKHVXEWOH ODQJXDJH ± LQ WKH (QJOLVK WH[W ± IRXQG RQ ZHEVLWHV
IHDWXULQJH[WUHPLVWFRQWHQW$UJXDEO\DIXOO\DXWRPDWHGV\VWHP
VKRXOGEHDYRLGHGZKHQKXPDQLQWHOOLJHQFHLVUHTXLUHG>@LH
LGHQWLI\LQJVXEWOHIRUPVRIUDGLFDOWH[WLQ(QJOLVK
 ,QWKHZRUNRI0HLDQG)UDQNWKHDXWKRUVGHYHORSHGDPRGHO
WKDW FRPELQHG VHQWLPHQW DQDO\VLV DQG D ZHEFUDZOHU ZLWK D
GHFLVLRQWUHHWRGLIIHUHQWLDWHSURH[WUHPLVWZHESDJHVIURPDQWL
H[WUHPLVW SDJHV QHZV SDJHV DQG SDJHV WKDW GLG QRW UHODWH WR
H[WUHPLVP +HUH WKH RYHUDOO JRDO RI WKLV VHPLDXWRPDWHG
DSSURDFKZDVWRFUHDWHDWRROWKDW PDGHSUHGLFWLRQV DERXW WKH
FRQWHQW IRXQG RQ WKH VLWHV LW GRZQORDGHG >@ :KLOH WKLV
WHFKQLTXH PDQDJHG WR FODVVLI\ H[WUHPLVW WH[W DW D KLJK UDWH RI
DFFXUDF\ LH  ERWK H[WUHPLVWEDVHG ZHEIRUXPV DQG
ZHEVLWHV RI H[WUHPLVW RUJDQL]DWLRQV ZHUH DQDO\]HG ZLWKLQ RQH
PRGHO$UJXDEO\WKHVHQWLPHQWIRXQGRQERWKW\SHVRISODWIRUPV
DUH GLIIHUHQW LQ WHUPV RI WRQH DQG VXEMHFW PDWWHU ,QGHHG WKLV
SUREOHPUHTXLUHVIXUWKHUH[SORUDWLRQ
,,0(7+2'6
7KHSXUSRVHRIWKLVVWXG\ZDVWRDGGWRWKHOLWHUDWXUHE\XVLQJ
DVHQWLPHQWDQDO\VLVWRRODQGDGHFLVLRQWUHHWRGLIIHUHQWLDWHSUR
H[WUHPLVWZHESDJHVIURPDQWLH[WUHPLVWSDJHVQHZVSDJHVDQG
SDJHVWKDWGLGQRWUHODWHWRH[WUHPLVP7KLVZDVGRQHE\EXLOGLQJ
WKUHHWH[WEDVHG FODVVLILFDWLRQ PRGHOV&KDSWHU,,$ )LUVWZH
XVHGWKH7HUURULVPDQG([WUHPLVP1HWZRUN([WUDFWRU7(1(
WR FROOHFW GDWD &KDSWHU  DQG 2SHQ1/3¶V 3DUWVRI6SHHFK
326DQDO\VLVWRGHYHORSDOLVWRINH\ZRUGV&KDSWHU,,&7KH
VHQWLPHQW H[SUHVVHG ZLWKLQ WKH ZHESDJHVZDV WKHQ FDOFXODWHG 
EDVHG RQ WKH 326 NH\ZRUGV GHWDLOLQJ WKH UHODWLYH VHQWLPHQW
VFRUHVIRUHDFKSDJH IRU HDFK RI WKH NH\ZRUGV &KDSWHU,,'
)LQDOO\DGHFLVLRQWUHHZDVEXLOWEDVHGRQWKHVHQWLPHQWVFRUHV
&KDSWHU,,(ZKLFKSURGXFHGWKHFODVVLILFDWLRQUHVXOWVIRUWKUHH
FODVVLILFDWLRQPRGHOV&KDSWHU,,)VHH)LJXUH

A. Webpage Data

Researchers have suggested that the sentiment found on both radical Right [ ] and radical Islamic [ ] websites is elusive. As a means of exploring this claim on the pro-extremist content, and evaluating whether the decision tree could classify content based on those subtle ideological motivations, five distinct and non-overlapping classes of pages were attained:
1) The radical Right class was selected through two methods: a Google search of the keywords ‘extremist websites’, ‘white supremacy websites’, and other similar words; and using an index of extremist sites generated by researchers and sources (for examples, see [ ]). This class contained […] webpages spread over […] websites.
2) The radical Islamic class was selected through the keywords ‘extremist websites’, ‘jihad organizations’, and previously compiled indexes [ ]. This class contained […] webpages from […] websites.
3) An anti-extremist class included intelligence agencies and organizations dedicated to countering extremism, and were identified via an online search of such keywords as ‘counter-extremism’ and ‘anti-extremism’ groups, for example. The sample contained […] webpages from […] websites.
4) A news class included websites if they were sources that reported on extremist events. Here it was assumed that they presented a story in a more “impartial” manner than pro-extremist or anti-extremist sites. This sample contained […] webpages from […] websites.
5) A final class, other, served as the control group in the sample, where one would not expect to uncover material relating to extremism, and included topics relating to art and culture, education, entertainment, food, travel and lifestyle, social media, online blogs, automobiles, sports, and business. For this class, […] webpages from […] websites were sampled.

2016 European Intelligence and Security Informatics Conference
978-1-5090-2857-3/16 $31.00 © 2016 IEEE
DOI 10.1109/EISIC.2016.41
7KUHH PRGHOV ZHUH WKHQ FRQVWUXFWHG IRU WKH SXUSRVHV RI
FRPSDULQJDQGFRQWUDVWLQJWKHLUFODVVLILFDWLRQUHVXOWV
Two-class model:radical RightDQGIslamicYHUVXV
anti-extremistnewsDQGotherZHESDJHV
Four-class model:radical RightDQGradicalIslamic
YHUVXV  anti-extremist YHUVXV  news YHUVXV 
otherZHESDJHV
Five-class model:radical RightYHUVXV  radical
IslamicYHUVXVanti-extremistYHUVXVnewsYHUVXV
other ZHESDJHV
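
The three groupings above amount to a relabeling of the five collected classes. The following is a minimal sketch in Python, offered as an illustration rather than the study's code: the function name `relabel` and the merged label `radical` are assumptions made here, while `non-radical` follows the wording used in the results.

```python
# The five page classes named in the text.
CLASSES = ["radical Right", "radical Islamic", "anti-extremist", "news", "other"]

# Classes that are merged in the two- and four-class models.
RADICAL = {"radical Right", "radical Islamic"}


def relabel(page_class: str, model: str) -> str:
    """Map a five-class label onto the label used by the given model.

    `model` is one of "two", "four", or "five" (hypothetical identifiers).
    """
    if page_class not in CLASSES:
        raise ValueError(f"unknown class: {page_class!r}")
    if model == "five":
        return page_class
    if model == "four":
        # Only the two radical classes are merged.
        return "radical" if page_class in RADICAL else page_class
    if model == "two":
        # Radical classes versus everything else.
        return "radical" if page_class in RADICAL else "non-radical"
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    for model in ("two", "four", "five"):
        print(model, sorted({relabel(c, model) for c in CLASSES}))
```

Keeping the five-class label on every page and deriving the coarser labels on demand means the same sentiment table can feed all three models.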

B. Terrorism and Extremism Network Extractor (TENE)

The Terrorism and Extremism Network Extractor (TENE), a custom-written computer program that was designed to collect vast amounts of data online, automatically browsed and captured all of the content from the websites used for classification between July […] and July […] (for a more detailed description of the crawler, see [ ]).

C. Parts-of-Speech (POS) Tagging

The initial step in analyzing the content on webpages automatically was to determine the topics of discussion. To do this, we wanted to take a data-driven approach and isolate the particular nouns with the highest rate of occurrence within the data collected. We assumed that the most frequently discussed topics would most likely be the ones in which extremist content was likely to be detected, based on the work of [ ]. Further, nouns were chosen because they are the words most likely to be surrounded by relevant sentiment terms [ ]; names and places are often described or denoted by the adjectives linked to them, and adjectives are often the words that have sentiment values attached to them [ ]. By specifying adjectives as keywords, the sentiment of the word itself would be lost.

Parts-of-Speech (POS) tagging was used to perform this analysis. POS is a data analysis method that collects and divides supplied text into word groupings, such as nouns and verbs, based on their usage and position within the sentence (for more information, see [ ]).

From each class, we selected […] of the most frequent nouns and aggregated them into one list. After removing duplicates and miscategorized words (symbols and non-words, for example), we ended with 198 keywords. This final list formed the keyword list for the sentiment analysis. Domain experts may replace or extend the list of keywords with relevant domain-specific words. However, the exploratory and data-driven nature of the study suggested that we rely on the data to identify the relevant keywords.

D. Sentiment Analysis

After the keyword list was developed, it was necessary to identify and evaluate the context surrounding the keywords. To allow a proper analysis of the webpages, sentiment analysis was used to highlight relevant text.

Sentiment analysis is a data collection and analysis method that allows for the application of subjective labels and classifications [ ]. It can evaluate the opinions of individuals by organizing data into distinct classes and sections, and assigning an individual’s sentiment a negative or positive polarity value [ ]. It also provides a more targeted view of a dataset by allowing for the demarcation between cases that are sought after and those without any notable relevance. Since the purpose of the current study was not to push the boundaries of sentiment analysis algorithms, SentiStrength [ ] was utilized.

Figure 1 – Data Collection and Tree Generation Process

SentiStrength is Java-based software that uses a specific algorithm to run through large volumes of text and create sentiment scores for the supplied documents. While there are several configurations of the software, we utilized a keyword-focused method, as a central feature of SentiStrength is its ability to evaluate sentiment around any given keyword [ ]. This process involves the utilization of a dictionary of catalogued terms and Harvard’s General Inquirer database to determine sentiment values [ ]. It locates words that correspond with its dictionary and database, and then it employs a stemming method to evaluate a text by assigning polarity values of either positive or negative to the words. Values are augmented by characters that can influence the values assigned to the text, such as booster words, negative words, repeated letters, repeated negative terms, antagonistic words, punctuation, and other distinctive characters suited for studying an online context [ ] (for more information on SentiStrength, see [ ]).

E. Decision Tree

A tool in machine learning and data mining that is widely accepted amongst academics, Waikato Environment for Knowledge Analysis (WEKA), was run on the sentiment data for each of the three classification models. Its standard J48 tree classification method was used because it contains an algorithm for text classification that allows for a rule-building process [ ]. WEKA was applied using […]-fold classification on the outputted sentiment tables, searching for differences in sentiment values between the classes of sites. As a result, this algorithm produced a decision tree with recursive leaves and branches, with each leaf representing a specific set of sentiment thresholds.

It is with these thresholds that the decision tree established whether a certain webpage was organized into the radical Right, radical Islamic, pro-extremist, anti-extremist, news, or other class. For example, if the keyword ‘marriage’ was included in the text with a sentiment value of less than or equal to […], and the term ‘pric’ was also in the text with a sentiment value of less than or equal to a value of […], then the J48 pruned tree indicated that […] of the pages fell within the news classification.

F. Classification Results

After the J48 algorithm was run on the sentiment data for each of the three models, WEKA created a measure that specified the number of accurate pages that fit into each of the correct classes. In turn, a confusion matrix displayed the number of pages that were correctly and incorrectly identified (i.e., false positives and false negatives) in each model, measured with precision and recall. Precision is a measure which represents the exactness of the rules. It can be thought of as the number of true positive entities identified by the classifier out of the set of all entities identified as positive. The larger the variability in the results, the smaller the precision will be. Recall is a measure of completeness, and is the number of positive entities identified out of all true positive entities. A recall score of […] (perfect) means that all true positives are identified without any false negatives.
,,,5(68/76
7KHUHVXOWV ZHUH LQWHUSUHWHGLQWZR ZD\V )LUVWWKHRYHUDOO
FODVVLILFDWLRQ UHVXOWV ZHUH HYDOXDWHG IRU HDFK FODVVLILFDWLRQ
PRGHOIROORZHGE\ WKH TXDOLW\ RI WKH HQWLUH UXOH VHW IRU HDFK
+HUHWKHDQDO\VLVZDVODUJHO\IRFXVHGRQWKHSURH[WUHPLVWFODVV

A. Two-Class Model: Results and Quality of Rule-Set

The two-class tree analysis indicated that […] of webpages were successfully classified across classes. The combined radical Right and radical Islamic class was successfully classified at a considerably lower rate than the combined anti-extremist, news and other class; i.e., […] of pages were accurately identified. Furthermore, the recall of the entire rule-set indicated that the two actual radical classes were classified at a rate of […], in comparison to the non-radical classes’ recall measure of […] (see precision and recall in Table 1).

B. Four-Class Model: Results and Quality of Rule-Set

The four-class tree analysis revealed that […] of webpages were accurately classified across classes. Interestingly, news pages had the highest number of correctly classified pages (i.e., […]), followed by the combined two radical classes (i.e., […]), which again showed moderately high precision rates (i.e., […]) and moderately low recall rates (i.e., […]). News class sites were also misclassified across the remaining three classes at the highest frequency (see Table 2).

C. Five-Class Model: Results and Quality of Rule-Set

The five-class tree analysis indicated that […] of the pages were accurately classified. Of particular interest here were the radical Right and radical Islamic pages. First, the confusion matrix indicated that a mere […] of radical Right pages were successfully classified. Within this particular class, news pages were misidentified at the highest rate in the model: […] of its pages were misclassified as radical right-wing. Second, […] of pages that featured radical Islamic content were accurately classified. News pages were also misclassified at the highest rate (i.e., […]) within the radical Islamic class. News pages again had the highest number of correctly classified pages across the entire sample ([…]), similar to the classification results in the four-class model. Measuring the quality of the rule for the five-class model, the actual radical Right pages were classified at a low recall rate of […]. At the other end of the spectrum, both the precision and recall measures indicated that radical Islamic pages were classified at a higher rate of success than the radical Right class: […] of the radical Islamic pages were successfully classified, and actual pages within this class were successfully classified at a rate of […] (see Table 3).

IV. CONCLUSIONS

We sought to build on a sentiment-guided web-crawler that could accurately classify the subtle yet radical-based text on radical Right and radical Islamic webpages. A few notable findings were produced. First, classification results from the two- and four-class models suggested that the sentiment-guided web-crawler lacked the overall ability to differentiate between the sentiment found on the selected websites that did and did not promote radical ideologies. This finding contrasts with that of [ ], where pro-extremist web-forum and webpage data were classified at a high rate of accuracy. Second, the five-class model showed that radical Islamic pages were classified at a much higher rate of success than radical right-wing pages. Previous research suggests that the sentiment on radical Right websites (e.g., [ ]) and radical Islamic websites (i.e., [ ]) is presented in a subtle manner, both to appeal to a wider audience and to recruit new members, for example. However, our results lend support for [ ]’s assertion that while radical Islamic sites attempt to legitimize their efforts by presenting themselves as news source websites, the sentiment is almost always related to radical topics (i.e., discussions of violence). That said, a more in-depth analysis is needed to assess whether there is a clear distinction between an extremist website featuring elusive messages and a news site. This problem could be explored using other tools of classification, such as random forests, Bayesian methods, support vector machines, and neural networks. Future studies should also integrate a qualitative understanding of how machine learning tools make decisions about the webpages that are visited. Doing so may increase the reliability of the results and increase the likelihood of identifying radical text online.
95()(5(1&(6
 / %DFN ³$U\DQV 5HDGLQJ $GRUQR &\EHU&XOWXUH DQG 7ZHQW\)LUVW
&HQWXU\ 5DFLVP´LQ Ethnic and Racial Studies, 25  SS

 /%RZPDQ*ULHYH³$QWLDERUWLRQ([WUHPLVP2QOLQH´LQFirst Monday,
14
 0&DLDQL 'GHOOD 3RUWD & :DJHPDQQMobilizing on the Extreme
Right: Germany, Italy, and the United States2[IRUG2[IRUG8QLYHUVLW\
3UHVV
 + &KHQ Dark Web: Exploring and Data Mining the Dark Side of the
Web1HZ<RUN6SULQJH
 .&RKHQ) -RKDQVVRQ/.DDWL-& 0RUN³'HWHFWLQJ/LQJXLVWLF
0DUNHUV IRU 5DGLFDO 9LROHQFH LQ 6RFLDO 0HGLD´ LQ Terrorism and
Political Violence, 26SS
 5)HOGPDQ³7HFKQLTXHV DQG$SSOLFDWLRQVIRU 6HQWLPHQW$QDO\VLV´LQ
Communications of the ACMSS
 0 +DOO (  )UDQN + *HRIIUH\ % 3IDKULQJHU 3  5HXWHPDQQ  ,
:LWWHQ ³7KH :(.$ 'DWD 0LQLQJ 6RIWZDUH$Q 8SGDWH´ LQ SIGKDD
Explorations, 11SS
 0 +X  / %LQJ ³0LQLQJ DQG 6XPPDUL]LQJ &XVWRPHU 5HYLHZV´ LQ
Proceedings of ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining
 ,QWHUQHW:RUOG 6WDWVInternet Growth Statistics 5HWULHYHG IURP
KWWSZZZLQWHUQHWZRUOGVWDWVFRPHPDUNHWLQJKWP
 -0HL5)UDQN ³6HQWLPHQW&UDZOLQJ([WUHPLVW &RQWHQW&ROOHFWLRQ
WKURXJKD6HQWLPHQW$QDO\VLV*XLGHG :HE&UDZOHU´LQProceedings of
the International Conference on Advances in Social Networks Analysis
and Mining
 06DJHPDQ³7KH6WDJQDWLRQLQ7HUURULVP5HVHDUFK´LQTerrorism and
Political ViolenceSS
 5 6FULYHQV *  'DYLHV 5  )UDQN  - 0HL ³6HQWLPHQWEDVHG
,GHQWLILFDWLRQRI5DGLFDO $XWKRUV 6,5$´ LQ Proceedings of the 2015
IEEE ICDM Workshop on Intelligence and Security Informatics
 36HLE'0-DQEHNGlobal Terrorism and New Media: The Post-Al
Qaeda Generation/RQGRQ5RXWOHGJH
 0 7KHOZDOO  . %XFNOH\ ³7RSLFEDVHG 6HQWLPHQW $QDO\VLVIRU WKH
6RFLDO:HE7KH 5ROHRI 0RRGDQG,VVXHUHODWHG:RUGV´LQ Journal of
the American Society for Information Science and Technology 
SS
 77KHW- 1D&.KRR³$VSHFWEDVHG 6HQWLPHQW$QDO\VLVRI0RYLH
5HYLHZV RQ 'LVFXVVLRQ %RDUGV´ LQ Journal of Information Science
SS
 <7VIDWL*:HLPDQQ³ZZZWHUURULVPFRP7HUURURQWKH,QWHUQHW´
LQStudies in Conflict & Terrorism, 25SS
 3UHGLFWHG3DJHV  

Radical RightDQGRadical
Islamic
Anti-extremist, News DQG
Other
3UHFLVLRQ 5HFDOO
$FWXDO3DJHV
Radical RightDQG
Radical Islamic   
Anti-extremist,
News DQG Other   
7DEOH
±
&RQIXVLRQ0DWUL[IURPWKH-7UHH$QDO\VLVIRUWKH7ZR&ODVV0RGHO
 3UHGLFWHG3DJHV  

Radical RightDQG
Radical IslamicAnti-extremist News Other 3UHFLVLRQ 5HFDOO
$FWXDO3DJHV
Radical RightDQG
Radical Islamic     
Anti-extremist      
News      
Other      
7DEOH
±
&RQIXVLRQ0DWUL[IURPWKH-7UHH$QDO\VLVIRUWKH)RX
U
&ODVV0RGHO
 3UHGLFWHG3DJHV  

Radical
Right
Radical
Islamic Anti-extremist News Other 3UHFLVLRQ 5HFDOO
$FWXDO3DJHV
Radical Right       
Radical Islamic       
Anti-extremist       
News       
Other       
7DEOH±&RQIXVLRQ0DWUL[IURPWKH-7UHH$QDO\VLVIRUWKH)LYH&ODVV0RGHO
... at the iCCrC, we focus primarily on text-based extremist content that has radical right-wing or jihadi leanings. for the former, radical right-wing material is characterized by racially, ethnically and sexually defined nationalism, which is typically framed in terms of white power and grounded in xenophobic and exclusionary understandings of the perceived threats posed by such groups as non-whites, Jews, immigrants, homosexuals, and feminists (see perry & Scrivens, 2016). for the latter, we define jihadi material as supportive of the creation of an expansionist islamic state or khalifa, the imposition of sharia law with violent jihad as a central component, and the use of local, national, and international grievances affecting muslims (see moghadam, 2008). ...
... the idea of this approach is based on a combination of the work associated with the Dark web project at the University of arizona (see Chen, 2012) and a previous project at the iCCrC that identified and explored online child exploitation websites (e.g., allsup, thomas, monk, frank, Joffres, Bouchard, frank, & westlake, 2011;frank, westlake, & Bouchard, 2010;monk, allsup, & frank, 2015;westlake & Bouchard, 2015;. tDC has since demonstrated its benefit in investigating online networks and communities in general (e.g., frank, macdonald, & monk, 2016;macdonald & frank, 2016, 2017macdonald, frank, mei, & monk, 2015;mikhaylov & frank, 2016Zulkarnine, frank, monk, mitchell, & Davies, 2016) and extremist content online in particular (e.g., Bouchard et al., 2014;Davies et al., 2015;frank et al., 2015;levey, Bouchard, hashimi, monk, & frank, 2016;mei & frank, 2015;Scrivens et al., 2017;Scrivens, Davies, & frank, 2018;Scrivens & frank, 2016;wong, frank, & allsup, 2015). tDC is a system that can be distributed across multiple virtual machines, depending on the number of machines that are available. ...
... sENtIMENt ANALysIs the use of keywords presents a useful first step in identifying large-scale patterns in extremist content online (e.g., Chalothorn & ellman, 2012;Bouchard et al., 2014;Davies et al., 2015;wong et al., 2015). however, the use of single keywords may lead to misleading interpretations of content (mei & frank, 2015;Scrivens & frank, 2016). if, for example, on a particular webpage, the words gun and control are found within close proximity of each other, it might be concluded that the page is discussing gun control. ...
Chapter
Full-text available
Purpose – This chapter examines how sentiment analysis and web-crawling technology can be used to conduct large-scale data analyses of extremist content online. Methods/approach – The authors describe a customized web-crawler that was developed for the purpose of collecting, classifying, and interpreting extremist content online and on a large scale, followed by an overview of a relatively novel machine learning tool, sentiment analysis, which has sparked the interest of some researchers in the field of terrorism and extremism studies. The authors conclude with a discussion of what they believe is the future applicability of sentiment analysis within the online political violence research domain. Findings – In order to gain a broader understanding of online extremism, or to improve the means by which researchers and practitioners “search for a needle in a haystack,” the authors recommend that social scientists continue to collaborate with computer scientists, combining sentiment analysis software with other classification tools and research methods, as well as validate sentiment analysis programs and adapt sentiment analysis software to new and evolving radical online spaces.
... SentiStreght can report binary (positive vs negative), trinary (positive/negative/neutral) and single scale (-4 to +4) sentiment results. From the reviewed articles, it was the most commonly used tool to determine sentiment [111,109,105,61,103,104,102,89,110]. ...
... It supports different NLP tasks, providing several options to analyse texts. Four articles used OpenNLP on the review [110,104,109,111]. ...
Preprint
Full-text available
Extremism research has grown as an open problem for several countries during recent years, especially due to the apparition of movements such as jihadism. This and other extremist groups have taken advantage of different approaches, such as the use of Social Media, to spread their ideology, promote their acts and recruit followers. Natural Language Processing (NLP) represents a way of detecting this type of content, and several authors make use of it to describe and discriminate the discourse held by this groups, with the final objective of detecting and preventing its spread. This survey aims to review the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art of this research area. The content includes a description and comparison of the frequently used NLP techniques, how they were applied, the insights they provided, the most frequently used NLP software tools and the availability of datasets and data sources for research. Finally, research questions are approached and answered with highlights from the review, while future trends, challenges and directions derived from these highlights are suggested.
... In the context of public safety and national security, ML algorithms can help law enforcement agencies to proactively identify potential threats by detecting extremist narratives or recruitment attempts within the vast amount of social media content [41]. On a similar note, social media companies can leverage these technologies to maintain community standards, flagging and removing harmful content, thereby preserving the platform's integrity and ensuring the safety of their users. ...
... In [27], a sentiment analysis tool and a decision tree are used to differentiate pro-extremist web pages from anti-extremist pages, news pages, and pages that did not relate to extremism. ...
Article
Full-text available
In this research paper, we propose a corpus for the task of detecting religious extremism in social networks and open sources and compare various machine learning algorithms for the binary classification problem using a previously created corpus, thereby checking whether it is possible to detect extremist messages in the Kazakh language. To do this, the authors trained models using six classic machine-learning algorithms such as Support Vector Machine, Decision Tree, Random Forest, K Nearest Neighbors, Naive Bayes, and Logistic Regression. To increase the accuracy of detecting extremist texts, we used various characteristics such as Statistical Features, TF-IDF, POS, LIWC, and applied oversampling and undersampling techniques to handle imbalanced data. As a result, we achieved 98% accuracy in detecting religious extremism in Kazakh texts for the collected dataset. Testing the developed machine learning models in various databases that are often found in everyday life “Jokes”, “News”, “Toxic content”, “Spam”, “Advertising” has also shown high rates of extremism detection.
... Their experiment was done on 768 randomly selected trending topics with over 18 classes and it gave an accuracy of 65% and 70% using the text-based and network-based classification models respectively. 49 Khan et al. (2011) used a rule-based domain-independent to conduct a sentence-level sentiment classification. Sentences were categorized first into subjective and objective sentences and the sentiment score was then calculated using SentiWord Net. ...
Article
The social media space has evolved into a large labyrinth of information exchange platforms, and with the growing adoption of different social media platforms there has been an increasing wave of interest in sentiment analysis as a paradigm for mining and analyzing users' opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis on social media entries, with a specific focus on Twitter. Sentiment analysis consists of two broad approaches: machine learning, which uses classification techniques to classify text and is further categorized into supervised and unsupervised learning; and the lexicon-based approach, which uses a dictionary and, unlike the machine learning approach, requires no test or training dataset. The paper explores generic application areas including product/services analysis and security/terrorism investigations.
Article
Social media platforms provide effective mediums for expressing opinions and thoughts on many topics openly. This protects our right to freedom of speech; however, the enormous reach of social media makes it a potential tool for widespread radicalization among the youth, irrespective of geographical and demographic boundaries. It is therefore necessary to effectively identify the content that is a source of mass online radicalization. In order to curb its propagation, security agencies need an automatic radicalization detection mechanism for mining the huge volumes of social media content. In this article, we propose an approach for detecting online radicalized accounts and quantifying the degree to which these user accounts are propagating radical content. We propose to use three novel features, i.e., similarity to domain, presence of radical content, and sentiment, to calculate the radicalness score for each online user. Our algorithm uses a CNN-LSTM-based technique to differentiate between radical and non-radical content with an accuracy of 93%. Our empirical results show that radicalness scores for known radicalized websites are higher than those for non-radical users. We believe this is the first attempt at quantifying the level of radicalization of users using scientific methods, which can be very helpful to national security agencies in tracking suspicious online users and stopping the spread of anti-national content on social media.
Article
Extremism has grown as a global problem for society in recent years, especially after the emergence of movements such as jihadism. This and other extremist groups have taken advantage of different approaches, such as the use of social media, to spread their ideology, promote their acts, and recruit followers. The extremist discourse is therefore reflected in the language used by these groups. Natural language processing (NLP) provides a way of detecting this type of content, and several authors make use of it to describe and discriminate the discourse held by these groups, with the final objective of detecting and preventing its spread. Following this approach, this survey aims to review the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art of this research area. The content includes a first conceptualization of the term extremism, the elements that compose an extremist discourse, and the differences from other terms. After that, a review, description, and comparison of the frequently used NLP techniques is presented, including how they were applied, the insights they provided, the most frequently used NLP software tools, descriptive and classification applications, and the availability of datasets and data sources for research. Finally, research questions are approached and answered with highlights from the review, while future trends, challenges, and directions derived from these highlights are suggested to stimulate further research in this area.
Article
In recent years, researchers have shown a vested interest in developing advanced information technologies, machine-learning algorithms, and risk-assessment tools to detect and analyze radical content online, with increased attention on identifying violent extremists or measuring digital pathways of violent radicalization. Yet overlooked in this evolving space has been a systematic examination of what constitutes radical posting behaviors in general. This study uses a sentiment analysis-based algorithm that adapts criminal career measures – and is guided by communication research on social influence – to develop and describe three radical posting behaviors (high-intensity, high-frequency, and high-duration) found on a sub-forum of the most conspicuous right-wing extremist forum. The results highlight the multi-dimensional nature of radical right-wing posting behaviors, many of which may inform future risk factor frameworks used by law enforcement and intelligence agencies to identify credible threats online.
Conference Paper
As the data generated on the internet increases exponentially, developing guided data collection methods becomes more and more essential to the research process. This paper proposes an approach to building a self-guiding web-crawler to collect data specifically from extremist websites. The guidance component of the web-crawler is achieved through the use of sentiment-based classification rules, which allow the crawler to make decisions based on the content of the webpage it downloads. First, content from 2,500 webpages was collected for each of four sentiment-based classes: pro-extremist websites, anti-extremist websites, neutral news sites discussing extremism, and sites with no discussion of extremism. Then parts-of-speech tagging was used to find the most frequent keywords in these pages. Sentiment software was used in conjunction with classification software to generate a decision tree that could effectively discern which class a particular page would fall into. The resulting tree showed an 80% success rate at differentiating between the four classes and a 92% success rate at classifying specifically extremist pages. This decision tree was then applied to a randomly selected sample of pages for each class. The results from the secondary test were similar to those of the primary test and hold promise for future studies using this framework.
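The feature-building step described in this abstract (a sentiment value per keyword per page, later fed to a decision tree) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the sentiment software or how keyword sentiment is computed, so the toy lexicon `TOY_LEXICON`, the context-window scoring, and the function names `keyword_sentiment` and `page_features` are all hypothetical.

```python
import re

# Hypothetical toy lexicon (word -> sentiment strength); the values are invented.
TOY_LEXICON = {"evil": -3, "destroy": -2, "condemn": -1, "peace": 2, "brave": 2}


def keyword_sentiment(text, keyword, window=3):
    """Average sentiment of the words within `window` tokens of each keyword occurrence.

    Assumption: context-window scoring stands in for the unnamed sentiment software.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token == keyword:
            # Words before and after the keyword, excluding the keyword itself.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            scores.append(sum(TOY_LEXICON.get(word, 0) for word in context))
    # Pages that never mention the keyword get a neutral value.
    return sum(scores) / len(scores) if scores else 0.0


def page_features(text, keywords):
    """One feature per keyword; a row like this could be passed to any decision-tree learner."""
    return {keyword: keyword_sentiment(text, keyword) for keyword in keywords}


print(page_features("They destroy the government", ["government", "media"]))
```

In this sketch each page becomes one row of keyword-sentiment values, and the class label (for example, pro-extremist or news) would be the target the tree learns to predict.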
Article
This book analyses the actions, networks and frames of right-wing extremism. Although research on extreme right political parties is growing, the extreme right has only very rarely been studied as a social movement. To fill this gap, this volume compares the extreme right in Italy, Germany and the United States using some main concepts and methods developed in social movement studies. In particular, it describes the discourse, repertoires and organizational structures of the extreme right, and explains them on the basis of the discursive and political opportunities and resources available to it. A combination of empirical methods is used in order to collect and analyse data on extreme right organizations. The frame analysis looks at the cognitive mechanisms that are relevant in influencing organizational and individual behaviour. The network analysis looks at the (inter-)organizational structural characteristics of the right-wing organizations. Finally, the protest event analysis allows for an empirical summary of the actions undertaken by right-wing extremists over the last decade. The substantive chapters address the organizational structure of the extreme right, the action repertoires of the extreme right, as well as the framing concerning, respectively, the definition of the 'us', the struggle against modernity, old and new forms of racism, opposition to globalization, and populism. Finally, in the conclusions, the authors reflect on the contributions that social movement studies make to the understanding of the phenomenon, as well as, vice versa, how research on the extreme right could contribute to the theorization of social movements' dynamics. © Manuela Caiani, Donatella della Porta, and Claudius Wagemann 2012.
Article
Lone-wolf terrorism is a threat to the security of modern society, as was tragically shown in Norway on July 22, 2011, when Anders Behring Breivik carried out two terrorist attacks that resulted in a total of 77 deaths. Since lone wolves are acting on their own, information about them cannot be collected using traditional police methods such as infiltration or wiretapping. One way to attempt to discover them before it is too late is to search for various “weak signals” on the Internet, such as digital traces left in extremist web forums. With the right tools and techniques, such traces can be collected and analyzed. In this work, we focus on tools and techniques that can be used to detect weak signals in the form of linguistic markers for potential lone wolf terrorism.
Article
The nature of the Internet--the ease of access, the chaotic structure, the anonymity, and the international character--all furnish terrorist organizations with an easy and effective arena for action. The present research focuses on the use of the Internet by modern terrorist organizations and attempts to describe the uses terrorist organizations make of this new communication technology. Is the use of the Internet by terrorists different from that of other, "conventional" means of communication? How can governments respond to this new challenge? The population examined in this study is defined as the Internet sites of terrorist movements as found by a systematic search of the Internet, using various search engines. The sites were subjected to a qualitative content analysis, focusing on their rhetorical structures, symbols, persuasive appeals, and communication tactics. The study reveals differences and similarities between terrorist rhetoric online and in the conventional media.
Book
Global Terrorism and New Media carefully examines the content of terrorist websites and extremist television programming to provide a comprehensive look at how terrorist groups use new media today. Based partly on a content analysis of discussion boards and forums, the authors share their findings on how terrorism 1.0 is migrating to 2.0 where the interactive nature of new media is used to build virtual organization and community. Although the creative use of social networking tools such as Facebook may advance the reach of terrorist groups, the impact of their use of new media remains uncertain. The book pays particular attention to terrorist media efforts directed at women and children, which are evidence of the long-term strategy that some terrorist organizations have adopted, and the relationship between terrorists' media presence and actual terrorist activity. This volume also looks at the future of terrorism online and analyzes lessons learned from counterterrorism strategies. This book will be of much interest to students of terrorism studies, media and communication studies, security studies and political science.
Article
Despite over a decade of government funding and thousands of newcomers to the field of terrorist research, we are no closer to answering the simple question of “What leads a person to turn to political violence?” The state of stagnation with respect to this issue is partly due to the government strategy of funding research without sharing the necessary primary source information with academia, which has created an unbridgeable gap between academia and the intelligence community. This has led to an explosion of speculations with little empirical grounding in academia, which has the methodological skills but lacks data for a major breakthrough. Most of the advances in the field have come from historical archival research and analysis of a few field interviews. Nor has the intelligence community been able to achieve any breakthrough because of the structure and dynamic of this community and its lack of methodological rigor. This prevents creative analysis of terrorism protected from political concerns. The solution to this stagnation is to make non-sensitive data available to academia and to structure more effective discourse between the academic and intelligence communities in order to benefit from the complementary strengths in these two communities.
Article
General sentiment analysis for the social web has become increasingly useful for shedding light on the role of emotion in online communication and offline events in both academic research and data journalism. Nevertheless, existing general-purpose social web sentiment analysis algorithms may not be optimal for texts focussed around specific topics. This article introduces 2 new methods, mood setting and lexicon extension, to improve the accuracy of topic-specific lexical sentiment strength detection for the social web. Mood setting allows the topic mood to determine the default polarity for ostensibly neutral expressive text. Topic-specific lexicon extension involves adding topic-specific words to the default general sentiment lexicon. Experiments with 8 data sets show that both methods can improve sentiment analysis performance in corpora and are recommended when the topic focus is tightest.
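The two methods named in this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the article's algorithm: the lexicon entries and strengths are invented, "expressive" text is approximated here by the presence of an exclamation mark, and the names `BASE_LEXICON`, `extend_lexicon`, and `score` are hypothetical.

```python
import re

# Hypothetical general-purpose lexicon (word -> sentiment strength); values are invented.
BASE_LEXICON = {"love": 3, "great": 2, "hate": -3, "awful": -2}


def extend_lexicon(base, topic_terms):
    """Lexicon extension: add topic-specific words without mutating the base lexicon."""
    merged = dict(base)
    merged.update(topic_terms)
    return merged


def score(text, lexicon, mood=0):
    """Sum the strengths of lexicon words found in the text.

    Mood setting (simplified): if no lexicon word matches but the text looks
    expressive, fall back to the topic's default polarity `mood`.
    Assumption: an exclamation mark is used as the only marker of expressiveness.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    if not hits and "!" in text:
        return mood
    return sum(hits)


# Toy usage: "riot" is neutral in the general lexicon but negative for this topic.
topic_lexicon = extend_lexicon(BASE_LEXICON, {"riot": -2})
print(score("The riot spread", BASE_LEXICON), score("The riot spread", topic_lexicon))
print(score("Wow!!!", BASE_LEXICON, mood=-1))
```

Keeping the base lexicon unchanged and building a merged copy per topic makes it easy to compare general and topic-specific scores on the same text.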
Article
The main applications and challenges of one of the hottest research areas in computer science.
Article
This article examines the ways in which digital technology is being used in contemporary forms of racist culture within white nationalist movements. It argues that new types of racist culture are made possible in cyberspace. This both challenges popular conceptions of what 'The Racist' is supposed to look like and points to the ways in which technological innovation is reinvigorating anti-Semitism and racisms that work in and through the boundaries of nation-states. It is argued that it is possible to situate racism and white nationalism at the centre of the so-called postmodern condition.