Content uploaded by Daniel Malone
All content in this area was uploaded by Daniel Malone on Jul 13, 2020
Developing a complex query to build a specialised corpus:
Reducing the issue of polysemous query terms
Aim
• Dual query-term groups
• Query Term Relevance (QTR)
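Outline
• Case study
• Developing a specialised-corpus query
• Issue of polysemous query terms
• Dual query-term groups
• Identifying candidate terms
• Query Term Relevance (QTR) and Relative Query Term Relevance (RQTR)
• Review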
Case study
Task: …
Aims of the project: …
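Developing a Specialised-corpus Query
Specialised corpora
Precision
• The proportion of retrieved texts that are relevant to the topic
Recall
• The proportion of all relevant texts in the database that the query retrieves
Text Relevance = Query Relevance (Gabrielatos, 2007)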
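A minimal sketch of the two measures over sets of text IDs, assuming the full set of relevant texts is known (in practice it usually is not, so recall can only be estimated). The function name and the toy IDs are hypothetical, not data from the project.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for the texts a query retrieves.

    precision = relevant texts retrieved / all texts retrieved
    recall    = relevant texts retrieved / all relevant texts in the database
    """
    relevant_retrieved = retrieved & relevant
    precision = len(relevant_retrieved) / len(retrieved) if retrieved else 0.0
    recall = len(relevant_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the query returns 5 texts, 3 of them relevant,
# out of 6 relevant texts in the whole database.
retrieved = {"t1", "t2", "t3", "t4", "t5"}
relevant = {"t1", "t2", "t3", "t6", "t7", "t8"}
print(precision_recall(retrieved, relevant))  # (0.6, 0.5)
```

A polysemous query term lowers precision: some of the texts it retrieves use the term in an unrelated sense and so count as retrieved but not relevant.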
The Query
• … (Gabrielatos, 2007)
Working Definition
• … a single … ideologically motivated perpetrator … acts without any direct support … terrorist …
Issue of Polysemous Query Terms
Dictionary Definition
• Lone wolf, n. … (U.S.) … figurative: (a) …; (b) …; attributive …; v. intransitive …
• Lone wolf, …
Problem
• Lone wolf is polysemous: a query using it on its own also retrieves texts in which it does not refer to terrorism
Example Domain: Sport
• Sports texts using lone wolf (also Lone Wolf, LONE WOLF) in a non-terrorism sense
Dual Query-term Groups
Solution to Polysemy Issue
• e.g. (lone wolf OR lone actor OR …)
Group A
lone wolf
(lone wolf OR lone actor OR …)
AND
Group B
terror*
(terrorism OR extremism OR …)
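The dual-group query ORs the terms within each group and ANDs the two groups, so a text is retrieved only if it contains at least one term from each group. The sketch below builds the query string and applies the same logic to plain text. The helper names and the two example sentences are invented, and treating the trailing * as a word-ending wildcard and a space or hyphen as interchangeable are assumptions about how the database matches terms.

```python
import re


def build_query(group_a: list[str], group_b: list[str]) -> str:
    """Join each group with OR, then join the two groups with AND."""
    return f"({' OR '.join(group_a)}) AND ({' OR '.join(group_b)})"


def term_pattern(term: str) -> re.Pattern:
    """Case-insensitive pattern for a query term.

    A trailing '*' matches any word ending (terror* -> terrorism, terrorist);
    words inside a multi-word term may be separated by spaces or hyphens.
    """
    wildcard = term.endswith("*")
    words = term.rstrip("*").split()
    body = r"[\s-]+".join(re.escape(w) for w in words)
    if wildcard:
        body += r"\w*"
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def matches(text: str, group_a: list[str], group_b: list[str]) -> bool:
    """True only if the text contains at least one term from EACH group."""
    def hit(group: list[str]) -> bool:
        return any(term_pattern(t).search(text) for t in group)
    return hit(group_a) and hit(group_b)


group_a = ["lone wolf", "lone actor"]
group_b = ["terror*", "terrorism", "extremism"]
print(build_query(group_a, group_b))
# (lone wolf OR lone actor) AND (terror* OR terrorism OR extremism)

sport = "The striker played as a lone wolf up front all season."
news = "Police described the attacker as a lone-wolf terrorist."
print(matches(sport, group_a, group_b))  # False: no Group B term
print(matches(news, group_a, group_b))   # True
```

The sport sentence is excluded because it contains no Group B term, which is the filtering effect the second group is meant to provide.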
Identifying Candidate Terms
1. Academic Literature
2. Pilot study
• … lone-wolf AND terror* …
Keyness Analysis of Pilot Corpus
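The slides do not show which keyness statistic or reference corpus was used. As one common option, here is a sketch of log-likelihood (G2), which compares a word's frequency in the pilot corpus with its frequency in a reference corpus. The function name and all counts are hypothetical.

```python
import math


def log_likelihood(freq_study: int, freq_ref: int,
                   size_study: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one word: study corpus vs reference corpus.

    freq_* are the word's frequencies; size_* are the corpus sizes in tokens.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: a word occurs 50 times in a 10,000-token pilot corpus
# and 10 times in a 100,000-token reference corpus.
score = log_likelihood(50, 10, 10_000, 100_000)
print(score > 3.84)  # True: above the chi-square critical value for p < 0.05, 1 d.f.
```

G2 on its own does not show direction, so positive keywords are usually taken to be those whose observed frequency in the study corpus exceeds the expected frequency.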
Concordance Check
• … lone …
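A concordance check reads each occurrence of a word in its immediate context, which is how non-terrorism uses of lone can be spotted. A minimal key-word-in-context (KWIC) sketch; the function name, the 30-character window and the sample sentences are hypothetical.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Key-word-in-context lines for whole-word, case-insensitive matches of `node`."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node word lines up in one column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


sample = ("The lone attacker had no accomplices. "
          "She was the lone runner from her club on a lonely road.")
for line in kwic(sample, "lone"):
    print(line)
```

The word boundaries (\b) stop lonely from being counted as an occurrence of lone.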
Binomials
Table: Modifiers and Nouns
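Assuming binomials here means modifier + noun pairs such as lone actor, candidate terms can be generated by pairing every modifier with every noun before each pair's QTR is checked. The word lists below use only terms that appear elsewhere in these slides (the rest of the table is not reproduced), the function name is hypothetical, and adding hyphenated spellings is an assumption prompted by the lone-wolf spelling used in the pilot study.

```python
from itertools import product


def binomials(modifiers: list[str], nouns: list[str], hyphenate: bool = True) -> list[str]:
    """Pair every modifier with every noun; optionally add hyphenated spellings."""
    terms = []
    for mod, noun in product(modifiers, nouns):
        terms.append(f"{mod} {noun}")
        if hyphenate:
            terms.append(f"{mod}-{noun}")
    return terms


candidates = binomials(["lone"], ["wolf", "actor", "offender", "radical"])
print(candidates)
# ['lone wolf', 'lone-wolf', 'lone actor', 'lone-actor', 'lone offender', 'lone-offender', 'lone radical', 'lone-radical']
```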
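Query Term Relevance (QTR) Measure
What?
• The proportion of texts containing a candidate term that also contain the core query term(s)
Purpose?
• To assess how relevant a candidate term is before it is added to the query
(Gabrielatos)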
Formula
$$\mathrm{QTR} = \frac{\text{texts returned by CQ AND CT}}{\text{texts returned by CT}}$$
where CQ = core query term(s) and CT = candidate term
(Gabrielatos, 2007)
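The QTR formula needs only two hit counts per candidate term. A minimal sketch; the function name is hypothetical, and the counts in the example are the lone actor, lone offender and lone radical rows of the Group A candidate-term table in these slides.

```python
def qtr(hits_cq_and_ct: int, hits_ct: int) -> float:
    """Query Term Relevance: the share of texts returned by the candidate
    term (CT) that are also returned by the core query term(s) (CQ)."""
    if hits_ct == 0:
        raise ValueError("the candidate term returns no texts")
    return hits_cq_and_ct / hits_ct


# (CT(A) & CQ(B), T) pairs from the Group A candidate-term table
for term, both, total in [("lone actor", 299, 311),
                          ("lone offender", 5, 15),
                          ("lone radical", 0, 2)]:
    print(term, round(qtr(both, total), 3))
# lone actor 0.961
# lone offender 0.333
# lone radical 0.0
```

The same function gives a group's baseline when the candidate term is replaced by that group's own core term.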
QTR Baseline
Baseline for Dual Query-term Groups
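Group A Baseline
$$\mathrm{QTR} = \frac{\text{texts returned by lone wolf AND terror*}}{\text{texts returned by lone-wolf}}$$
Group B Baseline
$$\mathrm{QTR} = \frac{\text{texts returned by terror* AND lone wolf}}{\text{texts returned by terror*}}$$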
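Because the candidate-term table reports QTR together with RQTR, and its values are reproduced by RQTR = (QTR - B) / B × 100, where B is the Group A baseline, B can be recovered from any row with a non-zero QTR as a consistency check. A sketch; the function name is hypothetical, and rounding in the table limits the result to about three decimal places. The lone radical row cannot be used, since RQTR = -100 makes the denominator zero.

```python
def baseline_from_row(hits_cq_and_ct: int, hits_ct: int, rqtr: float) -> float:
    """Invert RQTR = (QTR - B) / B * 100 to recover the baseline B."""
    qtr = hits_cq_and_ct / hits_ct
    return qtr / (1 + rqtr / 100)


# (CT(A) & CQ(B), T, RQTR) taken from the Group A candidate-term table
b_actor = baseline_from_row(299, 311, 58.922)
b_offender = baseline_from_row(5, 15, -44.900)
print(round(b_actor, 3), round(b_offender, 3))  # 0.605 0.605
```

Both rows imply the same baseline, about 0.605, which is the value to use when computing RQTR for other Group A candidate terms.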
Relative Query Term Relevance (RQTR)
Candidate Terms
• Positive: …
(Gabrielatos, 2007)
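CT (Group A)    CT(A) & CQ(B)    T      QTR      RQTR       RQTRn
lone actor      299              311    0.961    58.922     90.233
lone offender   5                15     0.333    -44.900    -44.900
lone radical    0                2      0        -100       -100
CT = candidate term; CQ(B) = Group B core query term (terror*); T = texts returned by the candidate term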
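The RQTR and RQTRn columns can be reproduced with the two definitions below: RQTR = (QTR - B) / B × 100, and RQTRn, which keeps that value for terms below the baseline but divides the gain above it by the maximum possible gain (1 - B), so that positive values top out at +100. A sketch; the function names are hypothetical, and B = 0.605 is the Group A baseline implied by the table's own rows (rounded), so the output agrees with the table to within about 0.01 rather than exactly.

```python
def rqtr(qtr: float, baseline: float) -> float:
    """Relative QTR: percentage difference between a term's QTR and the baseline."""
    return (qtr - baseline) / baseline * 100


def rqtr_n(qtr: float, baseline: float) -> float:
    """Normalised RQTR on a -100..+100 scale: gains above the baseline are
    divided by the maximum possible gain (1 - baseline); values below it equal RQTR."""
    if qtr >= baseline:
        return (qtr - baseline) / (1 - baseline) * 100
    return rqtr(qtr, baseline)


BASELINE_A = 0.605  # approximate Group A baseline, implied by the table rows

rows = {"lone actor": (299, 311), "lone offender": (5, 15), "lone radical": (0, 2)}
for term, (both, total) in rows.items():
    q = both / total
    print(f"{term:<14} QTR={q:.3f} RQTR={rqtr(q, BASELINE_A):8.3f} "
          f"RQTRn={rqtr_n(q, BASELINE_A):8.3f}")
```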
RQTRn Interpretation
+100: always co-occurs with the core query term
0: equal relevance (QTR = baseline)
-100: never co-occurs with the core query term
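The interpretation scale follows from the normalisation: QTR = 1 (every text with the candidate term also contains the core query term) gives +100, a QTR equal to the baseline gives 0, and QTR = 0 gives -100. A self-contained check; the function name and the baseline of 0.4 are hypothetical.

```python
def rqtr_n(qtr: float, baseline: float) -> float:
    """Normalised RQTR: (QTR - B) / (1 - B) * 100 at or above the baseline,
    (QTR - B) / B * 100 below it."""
    if qtr >= baseline:
        return (qtr - baseline) / (1 - baseline) * 100
    return (qtr - baseline) / baseline * 100


baseline = 0.4  # hypothetical baseline
for qtr, label in ((1.0, "always co-occurs"),
                   (baseline, "equal relevance"),
                   (0.0, "never co-occurs")):
    print(f"QTR={qtr:.1f} -> RQTRn={rqtr_n(qtr, baseline):+.0f} ({label})")
```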
Review
• Dual query-term groups: …
• Query Term Relevance: …
Final Query: …
Contact
References
• … Countering Lone-Actor Terrorism Series.
• … Discourse in the professions: Perspectives from corpus linguistics …
• Gabrielatos, C. (2007). Selecting query terms to build a specialised corpus from a restricted-access database. ICAME Journal, 31.
• Gabrielatos, C. … Corpus Linguistics Advanced Research Education and Training (CLARET) …
• Gill, P. (2015). Lone-actor terrorists: A behavioural analysis.
• Hamm, M. S., & Spaaij, R. (2017). The age of lone wolf terrorism.