Hybrid Learning to Rank for Financial Event Ranking
Fuli Feng12∗, Moxin Li2, Cheng Luo3, Ritchie Ng2, Tat-Seng Chua2
1Sea-NExT Joint Lab, 2National University of Singapore, 3MegaTech.AI
fulifeng93@gmail.com,limoxin@pku.edu.cn,luocheng@megatechai.com,ritchieng@u.nus.edu,dcscts@nus.edu.sg
ABSTRACT
The financial markets are moved by events such as the issuance of administrative orders. The participants in financial markets (e.g., traders) thus pay constant attention to financial news relevant to the financial asset (e.g., oil) of interest. Due to the large scale of the news stream, it is time and labor intensive to manually identify influential events that can move the price of the financial asset, pushing the financial participants to embrace automatic financial event ranking, which has received relatively little scrutiny to date.
In this work, we formulate the financial event ranking task, which aims to score financial news (document) according to its influence on the given asset (query). To solve this task, we propose a Hybrid News Ranking framework that, from the asset perspective, evaluates the influence of news articles by comparing their contents; and from the event perspective, assesses the influence over all query assets. Moreover, we resolve the dilemma between the essential requirement of sufficient labels for training the framework and the unaffordable cost of hiring domain experts for labeling the news. In particular, we design a cost-friendly system for news labeling that leverages the knowledge within published financial analyst reports. In this way, we construct three financial event ranking datasets. Extensive experiments on the datasets validate the effectiveness of the proposed framework and the rationality of solving financial event ranking through learning to rank.
CCS CONCEPTS
• Information systems → Document filtering; Information retrieval; Learning to rank; • Computing methodologies → Learning to rank.
KEYWORDS
learning to rank, document retrieval, finance
ACM Reference Format:
Fuli Feng, Moxin Li, Cheng Luo, Ritchie Ng, Tat-Seng Chua. 2021. Hybrid
Learning to Rank for Financial Event Ranking. In Proceedings of the 44th
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), July 11–15, 2021, Virtual Event, Canada. ACM,
New York, NY, USA, 11 pages. https://doi.org/10.1145/3404835.3462969
∗Corresponding author. This research is supported by the Sea-NExT Joint Lab.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGIR ’21, July 11–15, 2021, Virtual Event, Canada
©2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8037-9/21/07. . . $15.00
https://doi.org/10.1145/3404835.3462969
1 INTRODUCTION
The efficient market theory [16] states that financial markets quickly impound the publicly available information [3]. In other words, the prices of financial assets such as stocks and commodities¹ (see Figure 1(a) for an example) are quickly moved by financial events, especially the unanticipated ones such as the outbreak of infectious disease [1], the declaration of electoral victory [45], and the announcement of government intervention [48]. For instance, the global stock markets lost about two trillion dollars in value within the 24 hours after the declaration of the Brexit results [38]. It is undoubtedly essential for financial participants such as traders and analysts to quickly assess and react to financial events with the potential to move the asset price. Consequently, identifying events influential to the asset of interest has become a heavy workload that consumes tremendous time and energy of the financial participants due to the large volume of the news stream². Therefore, financial event³ ranking [5, 10] is an emergent requirement of great practical value. However, it has received relatively little scrutiny to date.
In this work, we formulate financial event ranking as a learning to rank task where the target asset is viewed as the query to retrieve the candidate news published within a lag until the query date (see Figure 1(b)). The key to solving this task lies in quantifying the influence on the query asset according to the contents of the news. Intuitively, several widely used information retrieval techniques can be applied to achieve the target, such as document classification [39], document retrieval [25] and news recommendation [18]. For instance, document retrieval models can learn the connection between the query and news contents from labeled query-news pairs. The models can thus emphasize news mentioning “Brazil” for the query of ferrous, since Brazil is the largest exporter of iron ore and frequently occurs in the influential news of ferrous. However, the direct usage of existing methods is insufficient to solve the financial event ranking problem due to a lack of consideration of the properties of financial markets. For instance, due to the connection between “Brazil” and ferrous, an existing method will recognize “Brazil” as a feature of influential news. It will promote the score of news mentioning Brazil in queries other than ferrous, such as gold where Brazil is not a key stakeholder, leading to improper rankings with false positive responses for the query of gold.
We argue that the key to bridging this gap lies in scrutinizing the influence from the perspectives of both asset and news. This is because a news article can simultaneously influence various financial assets due to their connections [17, 24]. Towards this end, we propose a Hybrid News Ranking (HNR) framework, which combines the asset and news perspectives. In particular, an influence quantification module evaluates the influence of financial events from the asset perspective by comparing their contents. From the query perspective, an influence allocation module assesses the influence of a news article across assets. The influence allocation module will align the influence scores of a news article on the homogeneous queries (e.g., ferrous and coal) and distance the influence scores on the heterogeneous queries (e.g., base metal and coal), which provides clues for eliminating the false positive responses. To ingeniously use such clues and combine the two perspectives, an influence mixer is carefully devised to learn integration strategies.

¹Note that the financial asset of a commodity is the corresponding future contract.
²https://blog.gdeltproject.org/the-datasets-of-gdelt-as-of-february-2016/.
³We interchangeably use news and event, which refer to the textual content of the news article or the textual description of the event.

Figure 1: Illustration of (a) the influence of financial events on the price movement of an example asset; and (b) the financial event ranking task. The candlesticks and bars represent the daily price movement and trading volume, respectively. The blue curve is the Dow Jones Industrial Average (DJI) index, which reflects the trend of the US stock market. Better viewed in colors.
Labeled data are indispensable for the training of HNR, which largely relies on deep neural networks to encode the query and news [25, 39]. However, it is extremely resource consuming to label the influence of financial events due to the large number of candidates and the reliance on experienced but expensive financial experts. To resolve the dilemma, we build a labeling system to identify the positive news for each asset from the corresponding analyst reports, which are published periodically (e.g., daily) and written by domain experts. In particular, the system consists of: mention extraction, which extracts events mentioned in the analyst report, and mention-news matching, which matches the extracted event with news reporting the event. As both stages can be accomplished with the assistance of automatic algorithms or common crowd workers, we largely reduce the cost and construct three large-scale datasets. Extensive experiments on the three datasets validate the effectiveness of the proposed HNR and the rationality of solving financial event ranking via learning to rank. The datasets and code are released at: https://github.com/fulifeng/Financial_Event_Ranking.
The main contributions are summarized as follows:
• We formulate the problem of financial event ranking and propose a Hybrid News Ranking framework.
• We build a cost-friendly system to label positive news and construct three datasets for financial news ranking.
• We conduct extensive experiments that validate the rationality and effectiveness of our proposal.
2 PROBLEM FORMULATION
To achieve financial event ranking, the target is to learn a scoring function ŷ = f(D, Q | Θ), which predicts the influence of a candidate news D on a query Q. Θ denotes the parameters of the function to be learned. D is a list of word IDs that encodes the contents of the news. Q is also a list of word IDs that corresponds to the name of the query asset, e.g., “base metal”⁴. The financial event ranking task is different from conventional document retrieval [25] for the following reasons: 1) queries with the same content (e.g., “base metal”) at different time-steps are viewed as different queries, but belong to the same type; 2) the candidates to be ranked at different time-steps do not overlap with each other; and 3) our problem has fixed types of queries instead of unlimited queries according to content. We denote the query types as a set Q where |Q| is the set size.
Training. The scoring function should identify the key patterns of news contents that can distinguish influential news from the common ones in a query-specific manner. A promising solution is to learn from labeled historical queries. We thus employ the supervised learning paradigm to optimize the parameters of the scoring function, which is formulated as:

Θ̂ = min_Θ Σ_{(⟨Q,D⟩,y)∈L} l(y, ŷ) + α‖Θ‖,    (1)

where L denotes a set of labeled query-news pairs, y = 1 for influential news, and y = 0 for negative samples randomly selected from the remaining candidates. l(·) is a loss function such as the binary cross-entropy loss. α adjusts the strength of regularization.
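As a concrete illustration of the objective in Equation (1), the sketch below computes the regularized binary cross-entropy over a toy set of labeled query-news pairs. The scores, labels, and parameter vector are hypothetical stand-ins, not values from the paper.

```python
import math

def bce(y, y_hat, eps=1e-12):
    # Binary cross-entropy for a single labeled query-news pair.
    return -(y * math.log(y_hat + eps) + (1 - y) * math.log(1 - y_hat + eps))

def objective(pairs, theta, alpha):
    # pairs: list of (label y, prediction y_hat); theta: parameter vector.
    # Sum of per-pair losses plus an L2 regularizer, as in Equation (1).
    loss = sum(bce(y, y_hat) for y, y_hat in pairs)
    reg = alpha * math.sqrt(sum(p * p for p in theta))
    return loss + reg

# Hypothetical predictions: one influential pair (y=1) and one sampled negative (y=0).
pairs = [(1, 0.9), (0, 0.2)]
print(round(objective(pairs, theta=[0.5, -0.3], alpha=0.0), 4))
```

Setting α = 0 switches the regularizer off, which matches the hyper-parameter choice reported in the implementation details (Section 5.1).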
⁴Technically, base metal includes four commodities: lead, copper, nickel, and zinc. In this work, we do not view a specific one of the four commodities as a query since they are typically discussed and analyzed as a group. Note that we can easily generalize to a specific commodity with its name as the query (e.g., “copper”).
Figure 2: Illustration of the proposed HNR framework where queries in the same color (e.g., Q_t^1 and Q_t^2) are homogeneous ones, and queries in different colors (e.g., Q_t^2 and Q_t^3) are heterogeneous ones.
Serving. As shown in Figure 1(b), the learned scoring function can serve each query Q_t at any time-step t. We add the subscript t to distinguish queries at different time-steps. Note that the query types at different time-steps (e.g., Q_{t−1} and Q_t) are equal. Let D_t denote the candidate set at time-step t, which consists of news published within a lag (e.g., [t−l, t]). Note that we remove duplicate news reporting the same event through SimHash [44] to reduce the size of the candidate set. Nevertheless, we still need to score thousands of news articles to select the top-K most influential ones for each query. Formally, the serving phase for query Q_t is:

sort({ŷ_t^i = f(D_t^i, Q_t | Θ̂) | D_t^i ∈ D_t}),    (2)

where sort(·) denotes a function that sorts the candidate news in descending order. Moreover, considering that the query set is fixed, the serving phase indeed performs one ranking for each query at the start of the time-step and serves all the incoming queries.
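The serving phase in Equation (2) amounts to scoring every candidate and sorting. A minimal sketch, with a hypothetical term-overlap scorer standing in for the learned f(·|Θ̂):

```python
def rank_candidates(candidates, query, score_fn, top_k=10):
    # Score every candidate news for the query and sort in descending order,
    # as in Equation (2); return the top-K most influential ones.
    scored = [(score_fn(doc, query), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

# Hypothetical stand-in for the learned scorer: overlap of query terms with the news.
def toy_score(doc, query):
    return len(set(doc.split()) & set(query.split()))

candidates = ["brazil iron ore export surges", "gold price steady", "iron ore freight rates"]
print(rank_candidates(candidates, "ferrous iron ore", toy_score, top_k=2))
```

Since the query set is fixed, this ranking would be computed once per query type at the start of each time-step and reused for all incoming requests.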
3 METHODOLOGY
In this section, we introduce the proposed HNR framework. As shown in Figure 2, it consists of three modules: the influence quantification module (Section 3.1), the influence allocation module (Section 3.2), and the influence mixer (Section 3.3).
3.1 Influence Quantification Module
Our main consideration in devising the influence quantification module is to meticulously assess the influence of a news article on a given query according to their contents. To achieve the target, the key lies in mining the connections between the query description and the news content, which is coherent with the target of document retrieval [7, 25, 33]. As such, we devise the influence quantification module as a document retrieval model where the input is a concatenation of the query and the candidate news [Q_t, D_t^i] and the output is the influence prediction, i.e., the probability that D_t^i is an influential event of Q_t. Formally,

ȳ_t^i = f_q([Q_t, D_t^i] | Θ_q),    (3)
where Θ_q denotes the model parameters to be learned. Inspired by the huge success of pre-trained language models in document retrieval [2, 9, 25, 31–33], we devise f_q(·) based on a deep Transformer such as BERT [13, 26], XLNet [54] or RoBERTa [27], which is pre-trained over a large-scale corpus in a self-supervised manner to encode the co-occurrence of words.
In particular, we follow the next sentence prediction paradigm [33] to format the query-news pair as [CLS, Q_t, SEP, D_t^i]. As shown in Figure 3, the query and the candidate news are concatenated with a [CLS] token at the beginning and a [SEP] token for separation.
Figure 3: Illustration of the influence quantification module.
After passing through the deep Transformer, each token obtains a representation h which encodes the textual patterns. The representation of the [CLS] token, h_[CLS], is passed through a fully connected (FC) layer to estimate the probability that D_t^i is a positive news article of query Q_t. Formally,

ȳ_t^i = f_q([CLS, Q_t, SEP, D_t^i] | Θ_q).    (4)

We can rank the news for query Q_t according to the prediction ȳ_t^i.
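A minimal sketch of the input packing and scoring head in Equations (3)-(4). The Transformer encoder is replaced here by a hypothetical stub that returns a fixed-size [CLS] representation; the token IDs and weights are illustrative, not from any real checkpoint.

```python
import math

def pack_pair(query_ids, news_ids, cls_id=101, sep_id=102):
    # Format the query-news pair as [CLS, Q_t, SEP, D_t^i], following the
    # next-sentence-prediction input layout described in Section 3.1.
    return [cls_id] + query_ids + [sep_id] + news_ids

def influence_score(h_cls, w, b):
    # FC layer + sigmoid on the [CLS] representation: the probability that
    # the news is an influential event of the query (Equation (4)).
    logit = sum(hi * wi for hi, wi in zip(h_cls, w)) + b
    return 1.0 / (1.0 + math.exp(-logit))

tokens = pack_pair([71, 72], [900, 901, 902])  # hypothetical word IDs
h_cls = [0.2, -0.1, 0.4]                       # stub encoder output
print(tokens, round(influence_score(h_cls, w=[1.0, 0.5, 2.0], b=0.0), 3))
```

In the actual framework, h_[CLS] would come from the fine-tuned RoBERTa encoder rather than a stub.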
3.2 Influence Allocation Module
Financial research [17, 24] has demonstrated the coupling effects across different assets. As such, the influences of a financial event on different assets are linked to each other rather than independent. We thus further devise an influence allocation module to evaluate the influence from the query perspective. Our main consideration for the module is to account for the connections of the queries in the evaluation of news influence. To achieve the target, the module is expected to consider the whole query set Q_t when evaluating the influence of a news article⁵. We thus devise the influence allocation module as a |Q|-way classification module, which is formulated as:

ỹ_t^i = f_n(D_t^i | Θ_n),    (5)

where ỹ_t^i ∈ R^{|Q|} denotes the influence predictions over the query set. In this way, the news influence is evaluated by comparison across the queries. Again, inspired by the success of pre-trained language models in text classification, we also devise the influence allocation module based on a Transformer. In particular, the input is formatted as [CLS, D_t^i], and the prediction is made through an FC layer from the representation of the [CLS] token.
It should be noted that the FC layer is parameterized by a mapping matrix W ∈ R^{|Q|×H} where H denotes the dimensionality of the latent representation. In this way, there are separate parameters for the influence evaluation of each query, i.e., each row of W corresponds to a query. Accordingly, queries with close parameters will be allocated similar influence scores and vice versa. Undoubtedly, the parameters of homogeneous queries will be pushed close to each other since those queries are influenced by similar news, even the same news sometimes. The heterogeneous queries will thus obtain parameters with relatively large distances. In this way, the influence allocation module is able to account for the homogeneous and heterogeneous relations between queries (cf. Figure 5).

⁵Note that the number of potential query types at different time-steps is always |Q|.
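The allocation head can be sketched as a |Q|-way softmax over W·h_[CLS], with one row of W per query type. The dimensions and weights below are toy values chosen to mimic the homogeneous/heterogeneous behavior described above.

```python
import math

def allocate_influence(h_cls, W, b):
    # |Q|-way classification head (Equation (5)): each row of W holds the
    # parameters of one query type, so scores are compared across queries.
    logits = [sum(w_i * h_i for w_i, h_i in zip(row, h_cls)) + b_q
              for row, b_q in zip(W, b)]
    exp = [math.exp(l) for l in logits]
    z = sum(exp)
    return [e / z for e in exp]  # influence distribution over the query set

# Toy setting: |Q| = 3 query types, hidden size H = 2.
W = [[1.0, 0.0],   # e.g., ferrous
     [0.9, 0.1],   # e.g., coal (homogeneous: close parameters)
     [-1.0, 0.5]]  # e.g., base metal (heterogeneous: distant parameters)
probs = allocate_influence([1.0, 0.2], W, b=[0.0, 0.0, 0.0])
print([round(p, 3) for p in probs])
```

Note how the two rows with close parameters yield close scores while the distant row yields a much smaller one, matching the alignment/distancing behavior the module is designed for.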
Both the influence quantification module and the influence allocation module have pros and cons. In the quantification module (Equation 4), the same classification parameters are shared across all queries, which faces the spurious correlation issue [46]. The positive news of a query will be assigned an exaggerated score on a heterogeneous query that the news is insufficient to influence. The allocation module can break some spurious correlations across queries owing to the query-specific parameters and the consideration of query relations. However, the allocation module ignores the query contents and may suffer from information loss.
3.3 Influence Mixer
A natural way to leverage the advantages of both modules is to aggregate their ranking scores. Inspired by the RUBi function [4], which is widely used for aggregating predictions, a straightforward mixer is formulated as:

ŷ_t^i = ȳ_t^i ⊙ power(ỹ_t^i, λ),    (6)

where ŷ_t^i ∈ R^{|Q|} is the final prediction of news D_t^i over all the queries; and ȳ_t^i ∈ R^{|Q|} denotes the scores from the quantification module, where we gather the output of Equation 4 across all queries. ⊙ denotes the element-wise product and power(·) is an element-wise power function. λ is a hyper-parameter to balance the contribution of the two modules.
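Under hypothetical scores, the element-wise mixer of Equation (6) can be sketched as:

```python
def rubi_mix(y_bar, y_tilde, lam=1.0):
    # Element-wise product of the quantification scores with the allocation
    # scores raised to the power lambda (Equation (6)).
    return [q * (a ** lam) for q, a in zip(y_bar, y_tilde)]

# Hypothetical scores over |Q| = 3 queries for one news article.
y_bar = [0.9, 0.8, 0.7]     # quantification module (shared parameters)
y_tilde = [0.6, 0.5, 0.05]  # allocation module (query-specific parameters)
print(rubi_mix(y_bar, y_tilde, lam=1.0))
```

Note how the small allocation score suppresses the third entry, which is the intended remedy for false positives such as Brazil-related news scored highly under the gold query.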
While this simple solution can achieve the target of combining the two perspectives, we consider that the mixer should account for the inter-module connections and inter-query connections within the scores. Inspired by the success of the Convolutional Neural Network (CNN) in recognizing local-region patterns, we further devise the influence mixer module as a CNN, which is formulated as:

ŷ_t^i = CNN(ȳ_t^i, ỹ_t^i).    (7)

In particular, the CNN consists of a stack layer, a column convolution layer, a row convolution layer, and an FC layer.
• Stack layer. The stack layer stacks the outputs of the two modules as a matrix Y_t^i = [ȳ_t^i; ỹ_t^i] ∈ R^{2×|Q|}, which facilitates observing the local-region patterns.
• Column convolution layer. The column convolution layer consists of 1D vertical filters to learn the rules for combining the two predictions. Formally,

C_t^i = ⟨F_c, Y_t^i⟩,    (8)

where F_c ∈ R^{K×2} denotes the filters of the convolution layer and C_t^i ∈ R^{K×|Q|} denotes the recognized signals. K is a hyper-parameter to adjust the number of filters.
• Row convolution layer. Similarly, this layer consists of 1D horizontal filters to recognize the inter-query patterns, which is formulated as,

R_t^i = ⟨F_r, C_t^i⟩,    (9)

where F_r ∈ R^{M×|Q|} are the M filters and R_t^i ∈ R^{K×M} represents the recognized signals.
• FC layer. The FC layer makes the final prediction by combining the recognized signals, which is formulated as,

ŷ_t^i = flatten(R_t^i)W + b,    (10)

where W ∈ R^{(K·M)×|Q|} and b ∈ R^{|Q|} are parameters to be learned. flatten(·) flattens a matrix into a vector.
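Because the vertical filters span both rows of the stacked matrix and the horizontal filters span all |Q| columns, Equations (8)-(10) reduce to matrix products. A pure-Python sketch with toy dimensions (K = 2 filters, M = 2 filters, |Q| = 3) and random illustrative weights:

```python
import random

def matmul(A, B):
    # Plain matrix product helper.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def cnn_mixer(y_bar, y_tilde, Fc, Fr, W, b):
    # Stack layer: Y = [y_bar; y_tilde] in R^{2 x |Q|}.
    Y = [y_bar, y_tilde]
    # Column convolution (Equation (8)): C = Fc Y in R^{K x |Q|}.
    C = matmul(Fc, Y)
    # Row convolution (Equation (9)): R = C Fr^T in R^{K x M}.
    R = matmul(C, [list(col) for col in zip(*Fr)])
    # FC layer (Equation (10)): flatten R and project back to R^{|Q|}.
    flat = [v for row in R for v in row]
    return [sum(f * w for f, w in zip(flat, col)) + b_q
            for col, b_q in zip(zip(*W), b)]

random.seed(0)
Q, K, M = 3, 2, 2
Fc = [[random.gauss(0, 1) for _ in range(2)] for _ in range(K)]
Fr = [[random.gauss(0, 1) for _ in range(Q)] for _ in range(M)]
W = [[random.gauss(0, 1) for _ in range(Q)] for _ in range(K * M)]
y_hat = cnn_mixer([0.9, 0.8, 0.1], [0.6, 0.5, 0.05], Fc, Fr, W, b=[0.0] * Q)
print(len(y_hat))  # one mixed score per query
```

The real mixer would learn F_c, F_r, W, and b from the collected influence scores (step 4 of Algorithm 1) rather than sampling them.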
In Figure 2(c), we depict the process of the CNN influence mixer with a simple example.
A key consideration for the training of the proposed HNR framework is saving memory and computation cost. Since both the quantification module and the allocation module consist of a deep Transformer, learning the parameters of the three modules in an end-to-end manner would double the memory cost, which poses higher requirements on the infrastructure and thus constrains the practical usage of HNR. To reduce the cost, a straightforward solution is sharing the Transformer across the two modules. Its disadvantages are twofold: 1) the training objective consists of three components, leading to huge overhead for hyper-parameter tuning; and 2) the two modules are coupled, making it hard for them to recognize complementary signals. Another solution is to train the three modules separately by optimizing Equation 1 over the labeled dataset L. Algorithm 1 illustrates the training procedure of HNR.
Algorithm 1 Training of HNR.
Input: Training data L, types of query Q.
1: Train the influence quantification module over L;
2: Train the influence allocation module over L;
3: Collect the influence scores from the two modules; ▷ Model inference
4: Train the influence mixer with the collected influence scores as features.
4 DATASETS
The key to building an HNR is the construction of labeled datasets. That is, from the historical candidate set D_t, labeling the positive news for the queries Q_t. The target is non-trivial to achieve because of: 1) the large size of D_t; and 2) the reliance on experienced analysts to evaluate the influence. Obviously, it is critical to resolve the reliance on domain expertise, which has indeed been encoded in the analyst reports written by the experts. Typically, on each trading day, we can find an analyst report for each query asset, which summarizes the market status and discusses the influential events⁶. As such, we can identify positive news for a query by extracting the events mentioned in the corresponding analyst report. In this way, we construct three datasets that correspond to the metal, agriculture, and chemical markets. As shown in Table 1, there are three types of queries in each dataset. These datasets have 340, 581, and 200 queries, respectively, which are reasonable sizes compared to the widely used relevance judgement benchmark TREC⁷.

⁶To write the analyst reports, the domain experts manually identify influential news. In other words, the target of HNR is to mimic the domain experts.
⁷https://trec.nist.gov/data/reljudge_eng.html.
Dataset     | #Queries | Types of query            | #Positive news per query | #Candidates per query
Metal       | 340      | Base metal, Ferrous, Coal | 4.89                     | 2459.58
Agriculture | 581      | Soy oil, Cotton, Sugar    | 2.51                     | 2390.89
Chemical    | 200      | PTA, MEG, Rubber          | 2.53                     | 2414.18
Table 1: Statistics of the three constructed datasets.
Figure 4: Illustration of the news labeling procedure: (a) extracting event mentions from the analyst reports; and (b) matching a mention with candidate news.
4.1 Data Collection
Analyst reports. We select Hongyuan Futures Co., Ltd. as our source⁸ to collect historical analyst reports since the reports are published in plain text instead of PDF files. As the company focuses on commodities, we select three commodity markets, metal, agriculture, and chemical, according to the popularity of reports, i.e., the number of reads. For each market, the analyst report is typically published on each trading day and is targeted at a group of commodities, e.g., base metal (copper, aluminium, etc.) and ferrous (screw thread steel and iron ore). From each market, we select three groups of commodities, leading to three types of queries in each dataset. The names of the commodity groups (e.g., “base metal”) are the queries (i.e., Q_t) on each trading day t.
Candidate news. To match the language of the collected analyst reports, we collect candidate news in Chinese from the largest portal websites in China for both financial news and commodity news⁹. We select the news posted within 48 hours before trading day t as the candidate news of query Q_t. We set 48 hours since the markets can quickly react to events [3], which means that “old” news cannot influence the markets anymore. In the data collection period from 2018-09 to 2020-06, the size of the candidate set is around 2,400 for each query. In other words, given a query, our task is to select the top-K influential news from thousands of candidates.
4.2 Labelling Procedure
Figure 4 illustrates the procedure to identify the positive news for a query Q_t from the corresponding analyst report, which consists of two phases: mention extraction and mention-news matching.
Mention extraction. Analyst reports typically follow the same template for mentioning financial events. As such, we define a set of rules based on the section titles and HTML styles to extract event mentions from the collected analyst reports. On average, we extract 4.89, 2.51, and 2.55 positive events (cf. Table 1) for the queries in the metal, agriculture, and chemical datasets, respectively. Note that we do not directly merge the extracted mentions into the corresponding candidate set for two reasons: 1) it would produce duplicate news, leading to biased evaluation; and 2) the mentions are typically rephrased by the analyst with linguistic properties different from common news articles. Due to such discrepancy, a model trained on mentions would fail in practical usage where the candidates are all common news articles.

⁸http://www.hongyuanqh.com/hyqhnew/hyyj/index.jsp?1=1&threeMenuid=00020001001500020001.
⁹Sina: https://finance.sina.com.cn/, Chinese Finance Online: http://www.jrj.com.cn/.
Mention-news matching. We thus match each event mention with its corresponding news, i.e., recognizing the news that reports the same event as the mention within the candidate set. To control the cost and quality, we perform the matching in two steps: 1) automatic matching, which evaluates the similarity between the mention and each candidate news; and 2) manual checking, where crowd workers check the top-3 most similar news to identify the positive one¹⁰. Note that checking whether two pieces of text describe the same event can be done without domain expertise, resolving the reliance on domain experts. As to the similarity evaluation algorithm, we leverage the public API provided by one of the largest search engines for Chinese news¹¹.
In this way, we identify the positive news for more than 99.9% of the extracted event mentions and discard the remaining cases. By checking the contents of the identified positive news, we confirm that these news articles cover a wide spectrum of events affecting the supply and demand of the commodities, such as geopolitical events, government policies, company announcements, and strikes, indicating the challenge of these datasets.
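The automatic matching step can be sketched as a bag-of-words cosine similarity that shortlists the top-3 candidates for manual checking. The vectorizer and texts below are simplified stand-ins for the search-engine API actually used.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(mention, candidates, top_n=3):
    # Rank candidate news by similarity to the extracted mention and return
    # the top-N for crowd workers to check (Section 4.2).
    m = Counter(mention.split())
    return sorted(candidates, key=lambda d: cosine(m, Counter(d.split())), reverse=True)[:top_n]

mention = "iron ore export tariff raised"
candidates = ["government raises iron ore export tariff",
              "sugar harvest beats forecast",
              "copper inventories fall",
              "iron ore shipments delayed at port"]
print(shortlist(mention, candidates))
```

Only the shortlisted articles reach the crowd workers, which is what keeps the labeling cost low.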
5 EXPERIMENTS
We conduct experiments on the three constructed datasets to answer the following research questions:
RQ1: To what extent can learning to rank techniques solve the financial event ranking problem?
RQ2: How effective is the proposed HNR compared to existing document retrieval methods?
RQ3: What are the factors that influence the effectiveness of the proposed HNR?
5.1 Experiment Settings
Evaluation protocols. We chronologically split each dataset into training, validation, and testing sets with a ratio of 7:1:2. That is, the most recent 20 percent of queries are treated as testing cases. Following conventional document retrieval work [25], we adopt the evaluation metrics of MAP, MRR (MRR1 and MRR3), and Recall (Rec3, Rec5, and Rec10). We report the average performance over the testing queries, where larger values indicate better performance.
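The chronological 7:1:2 split can be sketched as follows; `queries` stands in for the time-ordered query list of one dataset.

```python
def chrono_split(queries):
    # Chronological 7:1:2 split (Section 5.1): the most recent 20 percent of
    # queries form the test set, the preceding 10 percent the validation set.
    n = len(queries)
    n_train, n_val = n * 7 // 10, n // 10
    return queries[:n_train], queries[n_train:n_train + n_val], queries[n_train + n_val:]

train, val, test = chrono_split(list(range(340)))  # e.g., the 340 metal queries
print(len(train), len(val), len(test))
```

Splitting by time rather than at random avoids leaking future events into training, which matters because candidate sets at different time-steps do not overlap.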
Compared methods. We compare the proposed HNR with advanced document retrieval methods, including:
• BM25 [42]: This method is still widely used for document retrieval. We use an open-source implementation¹² of the method.
• ColBERT [25]: It is a Transformer-based ranking method that encodes the query and the document separately with the Transformer and scores a query-document pair by the similarity (inner product) of their representations.

¹⁰The manual checking ends at finding a positive news article or at the third round.
¹¹https://news.baidu.com/.
¹²https://github.com/dorianbrown/rank_bm25.
• Ret [33]: It is also a Transformer-based document retrieval method which concatenates the query and the document as input and uses a binary classification layer to calculate the matching score. It is the same as the influence quantification module in the proposed HNR.
• RetQE: It is an extension of Ret equipped with query expansion [43]. For a query Q_t, it extracts candidate terms from the positive news of Q_{t−1}, and adds the top-ranked terms into the input query according to their cosine similarity with the query. The expanded query is then fed into Ret during both model training and testing.
We also include two advanced document classification baselines:
• ClaM [47]: It fine-tunes the pre-trained Transformer with an additional layer to classify the documents into different types of queries. Documents are ranked by the probability over the class corresponding to the type of the input query. It is the same as the influence allocation module in the proposed HNR.
• ClaS [47]: Similar to ClaM, this method has a classification layer that predicts whether a document is positive or not. That is to say, we fine-tune a pre-trained Transformer for each dataset, where documents are ranked according to the probability given by the Transformer.
Implementation details. Except for BM25, the compared methods are implemented with PyTorch 1.4.0¹³ based on HuggingFace’s Transformers [53]. For pre-training, we use the checkpoint of Chinese RoBERTa with Whole Word Masking (named chinese-roberta-wwm-ext) released by [8]. For training the influence quantification module and the influence allocation module (i.e., fine-tuning RoBERTa), we set the maximum input length to 256 and update model parameters with AdamW [28]. We set the gradient accumulation steps to 2, gradient clipping to 2.0, the number of warmup steps to 100, the total training steps to 5,000, and the weight of the regularization term (i.e., α) to 0. The learning rate and batch size are selected according to the validation performance w.r.t. Rec10. As to the influence mixer, we set the coefficient of RUBi (i.e., λ) to 1, and tune the number of filters in the CNN.
5.2 Rationality of Learning to Rank (RQ1)
To validate the rationality of formulating the nancial event ranking
as a learning to rank task, we rst test the document classication
and retrieval methods. Table 2 summarizes the ranking performance
of the compared methods on the three datasets: metal, agriculture,
and chemical. Note that Upper_Bound represents the performance
of knowing the ground truth of the test queries, which can be seen
as the performance of domain experts. From the table, we have the
following observations:
•
The performance of deep Transformer-based methods are surpris-
ingly good on the three datasets. For instance, the performance
of ColBERT on metal w.r.t. Rec10 surpasses 0.982, which is very
close to the upper bound 0.994. The result means that the nancial
participants can access more than 98% of the inuential events
by only reading 10 top-ranked news from ColBERT, which will
help the participants can to save tremendous amount of time and
13https://pytorch.org/get- started/previous-versions/#v140.
Metal
MethodMAP MRR1 MRR3 Rec3 Rec5 Rec10
ClaM 0.924 0.925 0.959 0.666 0.835 0.978
ClaS 0.903 0.906 0.943 0.635 0.812 0.964
BM25 0.032 0.019 0.041 0.019 0.044 0.073
ColBERT 0.948 0.981 0.987 0.684 0.834 0.982
Ret 0.929 0.943 0.969 0.666 0.820 0.969
RetQE 0.923 0.906 0.953 0.653 0.833 0.977
Upper_Bound 1.0 1.0 1.0 0.718 0.877 0.994
Agriculture
Method MAP MRR1 MRR3 Rec3 Rec5 Rec10
ClaM 0.637 0.542 0.650 0.630 0.778 0.952
ClaS 0.550 0.402 0.545 0.554 0.729 0.878
BM25 0.059 0.028 0.056 0.059 0.075 0.123
ColBERT 0.633 0.505 0.634 0.640 0.814 0.945
Ret 0.640 0.514 0.656 0.657 0.792 0.952
RetQE 0.633 0.523 0.623 0.622 0.767 0.946
Upper_Bound 1.0 1.0 1.0 0.944 0.992 1.0
Chemical
MethodMAP MRR1 MRR3 Rec3 Rec5 Rec10
ClaM 0.380 0.216 0.405 0.421 0.529 0.737
ClaS 0.525 0.486 0.563 0.414 0.636 0.814
BM25 0.077 0.0 0.018 0.029 0.089 0.291
ColBERT 0.492 0.351 0.473 0.399 0.609 0.802
Ret 0.592 0.514 0.662 0.561 0.611 0.833
RetQE 0.542 0.459 0.599 0.510 0.625 0.834
Upper_Bound 1.0 1.0 1.0 0.887 0.961 0.997
Table 2: Ranking performance of document classication
and document retrieval models on the three datasets.
eort. The results thus validate the rationality and eectiveness
of learning to rank solutions for nancial event ranking.
•
The retrieval models achieve performance comparable to the classification models. Across the six metrics, both types of model achieve the best performance in some cases. For instance, ClaM achieves the best Rec5 on the metal market, while ColBERT achieves the best Rec5 on the agriculture market. These results reflect that both types have their pros and cons, i.e., neither the asset perspective nor the event perspective alone is sufficient to solve financial event ranking. As such, it is essential to build a hybrid solution that combines the two perspectives.
•
Among the retrieval models: a) BM25 achieves the worst performance because it only considers the occurrence of query terms. This result highlights the importance of understanding the news content, meaning that keyword-based filtering is not applicable to financial news. b) Ret performs better than RetQE in most cases, which means that the benefit of query expansion is limited in the financial event ranking problem. We postulate the reason to be the temporal fluctuation of financial events, i.e., positive events across different time-steps are not closely connected.
•
Among the classification models, ClaM performs better on the metal and agriculture markets, while ClaS achieves better performance on the chemical market. Recall that ClaM has separate classification parameters for each type of query, while ClaS shares all model parameters across queries. The chemical dataset is an imbalanced dataset with very few queries of MEG. We suspect
Session 1F: Applications 1
SIGIR ’21, July 11–15, 2021, Virtual Event, Canada
238
Metal
Method      MAP    MRR1   MRR3   Rec3   Rec5   Rec10
Ret         0.929  0.943  0.969  0.666  0.820  0.969
ClaM        0.924  0.925  0.959  0.666  0.835  0.978
HNR_RUBi    0.940  0.962  0.978  0.670  0.835  0.982
HNR_CNN     0.944  0.962  0.978  0.670  0.837  0.988

Agriculture
Method      MAP    MRR1   MRR3   Rec3   Rec5   Rec10
Ret         0.640  0.514  0.656  0.657  0.792  0.952
ClaM        0.637  0.542  0.650  0.630  0.778  0.952
HNR_RUBi    0.643  0.523  0.646  0.644  0.802  0.960
HNR_CNN     0.650  0.551  0.673  0.675  0.806  0.945

Chemical
Method      MAP    MRR1   MRR3   Rec3   Rec5   Rec10
Ret         0.592  0.514  0.662  0.561  0.611  0.833
ClaM        0.380  0.216  0.405  0.421  0.529  0.737
HNR_RUBi    0.427  0.270  0.441  0.401  0.566  0.809
HNR_CNN     0.512  0.459  0.545  0.400  0.627  0.834

Table 3: Performance of HNR on the three datasets. The best and worst performances on each dataset w.r.t. each metric are highlighted with bold font and underline, respectively.
that the inferior performance of ClaM on the chemical market is caused by the imbalanced dataset and the insufficient training on the rare class. Note that we need to choose between the two classification models to build the HNR framework. In this work, we simply select ClaM.
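To make the Rec@k numbers in Table 2 concrete: Rec@k is the fraction of a query's positive news recovered within the top-k ranked results. A minimal sketch (function name and document ids are ours, for illustration only):

```python
def recall_at_k(ranked_ids, positive_ids, k):
    """Fraction of positive documents that appear in the top-k of a ranking."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(positive_ids)) / len(positive_ids)

# A toy ranking with 3 positive news articles, 2 of which are in the top 5.
ranking = ["n1", "n2", "n3", "n4", "n5", "n6"]
positives = {"n1", "n4", "n6"}
print(recall_at_k(ranking, positives, 5))  # → 0.6666666666666666
```

Under this reading, ColBERT's Rec10 of 0.982 on metal means that, on average, over 98% of a query's positive news sits within its top-10 list.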
5.3 Effectiveness of HNR (RQ2)
We then investigate the effectiveness of the proposed HNR framework. In particular, we compare the ranking performance of four versions of HNR: 1) HNR_CNN, which applies the CNN mixer (Equation 7); 2) HNR_RUBi, which applies the RUBi mixer (Equation 6); 3) HNR without the influence allocation module (i.e., Ret); and 4) HNR without the influence quantification module (i.e., ClaM). Table 3 shows the performance of the four HNR versions on the three datasets. From the table, we have the following observations:
•
HNR_CNN performs better than HNR_RUBi in most cases, which shows the advantage of the CNN mixer module. That is to say, the query and news perspectives can be combined more accurately by considering the local-region patterns (i.e., the inter-module and inter-query connections).
•
In most cases, the hybrid models (i.e., HNR_RUBi and HNR_CNN) achieve performance gains over the single models (i.e., Ret and ClaM). In particular, across the cases, the hybrid models typically perform the best and seldom perform the worst among the four versions, indicating that the hybrid models successfully leverage the pros of both the quantification and allocation modules. These results therefore validate the effectiveness and rationality of the hybrid learning-to-rank framework in solving financial event ranking.
•
On the chemical dataset, as compared to Ret, HNR_CNN performs slightly better w.r.t. Rec5 and Rec10, but worse w.r.t. the remaining metrics, especially Rec3. This means that the influence mixer has to sacrifice the ranking at the head (e.g., top three) to recall more positive news on the chemical dataset. We postulate the reason to be that the allocation module (i.e., ClaM) performs much worse than the quantification module (i.e., Ret); this performance gap forces the mixer module to sacrifice the head part. Recall that the inferior performance of ClaM might be caused by the imbalance of the chemical dataset. This result thus suggests a potential future direction: enhancing the HNR framework by eliminating the impact of data imbalance.
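Equations 6 and 7 are not reproduced in this excerpt, so as a rough illustration only, the following sketches a RUBi-style score fusion in the spirit of Cadene et al. [4]: the quantification score is modulated by a sigmoid gate computed from the allocation module's output. The function names and the exact gating form are our assumptions, not the paper's mixer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rubi_fuse(quant_score, alloc_logit):
    """RUBi-style fusion sketch: the quantification module's relevance score
    is scaled by a sigmoid gate derived from the allocation module's logit
    for the target query."""
    return quant_score * sigmoid(alloc_logit)

# News strongly attributed to the target query (large positive logit)
# keeps most of its quantification score; weakly attributed news is damped.
print(rubi_fuse(0.9, 3.0))   # ≈ 0.857
print(rubi_fuse(0.9, -3.0))  # ≈ 0.043
```

A CNN mixer, by contrast, would convolve over the stacked per-query score vectors of both modules, which is what lets it exploit the inter-module and inter-query patterns discussed above.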
5.4 In-depth Analysis (RQ3)
Figure 5: Visualization of the query relations recognized by the influence allocation module.
Query relations. Recall that the parameters of the classification layer in the influence allocation module are expected to capture the query relations, i.e., homogeneous queries obtain close parameters. As such, we calculate the cosine similarity between each pair of parameter vectors and depict the similarities in Figure 5. From the figure, we can see that: 1) in the metal market, base metal and ferrous have the highest similarity, as both are widely used in industrial applications. Coal and ferrous also exhibit a high similarity, since coal is widely used in steel smelting. On the contrary, the smelting of base metals relies on electricity, making their similarity with coal very low. 2) In the agriculture market, soy oil and sugar obtain a high similarity since they are both used in the food industry. They thus have less similarity to cotton, which is mainly used in the textile industry. 3) In the chemical market, MEG and PTA are closely connected since both are industrial raw materials and are mainly used together to produce polyester. On the contrary, rubber is another kind of industrial material typically used in different applications. To summarize, the results justify that the influence allocation module indeed captures query relations, i.e., commodities with larger overlap in their uses are closer in the allocation module.
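The pairwise similarity computation described above can be sketched as follows; the per-query weight vectors here are hypothetical toy values, not the learned parameters.

```python
import math

def cosine(u, v):
    """Cosine similarity between two parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical classification-layer weight vectors, one per query:
# homogeneous queries should point in similar directions.
params = {
    "base_metal": [0.9, 0.1, 0.3],
    "ferrous":    [0.8, 0.2, 0.4],
    "coal":       [0.1, 0.9, 0.2],
}
for a in params:
    for b in params:
        if a < b:
            print(a, b, round(cosine(params[a], params[b]), 3))
```

With vectors like these, base metal and ferrous come out far more similar to each other than either is to coal, mirroring the pattern reported for Figure 5.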
Query-specific performance. We further investigate the effectiveness of HNR in a query-specific manner by comparing HNR_RUBi and HNR_CNN with Ret, i.e., the single influence quantification module, over different queries. In particular, we select a pair of homogeneous queries, base metal and ferrous, and a pair of heterogeneous queries, soy oil and cotton, according to whether the commodities have similar uses. Figure 6 shows the performance of the compared methods w.r.t. MAP. We omit the results w.r.t. the other metrics, which show similar trends, to save space. From the figures, we can observe that: 1) On the homogeneous queries, both HNR_RUBi and HNR_CNN show clear performance gains over Ret, which is attributed to the ability of HNR to consider query relations. That is, the evaluation of influence on one query might facilitate the evaluation for a homogeneous query. 2) On the heterogeneous queries, as compared to Ret, HNR performs better on one query but worse on the other. This means that the benefit of HNR mainly comes from accounting for the homogeneous query relations.
(a) Homogeneous Queries (b) Heterogeneous Queries
Figure 6: Performance of HNR and the quantification module on homogeneous queries and heterogeneous queries.
Combination strategy. We then investigate the combination strategies learned by the CNN mixer by observing the changes from the ranking of Ret to the ranking of HNR_CNN. In particular, given a query Q_t, we select the top-10 ranked news of Ret, extract the corresponding ranking from HNR_CNN, and then study the position changes of two types of news: 1) argmax(ỹ^i_t) = Q_t, which means the allocation module allocates the most influence to the target query Q_t; and 2) argmax(ỹ^i_t) ≠ Q_t, which means the most influence is allocated to a query other than Q_t. We resort to the inversion number [51] to quantitatively analyze the position changes of a type of news. In particular, for a type of news, we label it with 1 and the remaining news with 0 in the rankings from Ret and HNR_CNN, count the inversion number of each ranking, and calculate the increase rate of the inversion number from Ret to HNR_CNN. A positive increase rate means that HNR_CNN moves the selected type of news to the top positions. Figure 7 illustrates the average increase rate over all testing queries in the three datasets. From the figure, we can see that the increase rate of the first type (i.e., argmax(ỹ^i_t) = Q_t) is positive, while the increase rate of the second type (i.e., argmax(ỹ^i_t) ≠ Q_t) is negative. This means that the CNN mixer favors news with the maximum score from the allocation module, which is a reasonable strategy to combine the two modules.
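The inversion-number analysis above can be sketched as follows. We assume 0/1 labels and count pairs where a 1 is ranked above a 0, so moving the flagged news upward increases the count, matching the sign convention in the text.

```python
def inversion_number(labels):
    """Number of pairs (i, j), i < j, with labels[i] > labels[j];
    for 0/1 labels this counts how many 1s sit above 0s in the ranking."""
    return sum(
        1
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
        if labels[i] > labels[j]
    )

def increase_rate(labels_before, labels_after):
    """Relative change of the inversion number between two rankings."""
    before = inversion_number(labels_before)
    return (inversion_number(labels_after) - before) / before

# The flagged news (1s) moves from interleaved positions to the top two:
# a positive rate means the mixer promoted that type of news.
print(increase_rate([0, 1, 0, 1], [1, 1, 0, 0]))  # → 3.0
```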
Case study. We then conduct a qualitative analysis of the ranking results generated by HNR_CNN. As a reference, we compare the retrieval results from a widely used financial news search engine14. Note that we restrict the search engine to return news from either Sina or China Finance Online, so that the search engine has the same candidate set as HNR_CNN for a fair comparison. In particular, we set the query content and query date as “soy oil” and 2019-09-10, respectively. Figure 8 shows the top-5 news returned by HNR_CNN and the search engine. Note that the query has two positive events, which are labeled as T in the figure. From the figure, we can see that the top two retrieved results of the proposed HNR_CNN are exactly the ground-truth events. On the contrary, the search engine fails to recall any positive news. Obviously, the search engine focuses more on exact term matching between the query and the news contents, overlooking the influential news without explicit mention of the query terms. To summarize, this result further validates the effectiveness of solving financial event ranking as a learning-to-rank task through the proposed hybrid framework.

14 https://news.baidu.com/
Figure 7: The increase rate of the inversion number from the ranking of Ret to the ranking of HNR_CNN, where each news article is assigned a binary flag 1/0 according to the classification result from the influence allocation module (i.e., argmax(ỹ^i_t)). The blue column corresponds to assigning 1 to news satisfying argmax(ỹ^i_t) = Q_t. The yellow column corresponds to the opposite operation, assigning 1 to news with argmax(ỹ^i_t) ≠ Q_t.
Failure case analysis. To further shed light on the capability and the weaknesses of the proposed HNR, we dive into the failure cases of HNR_CNN on the three datasets. We define the failure cases in a ranking list as the negative news that is ranked before any positive news of the query. That is to say, the failure cases are news articles that occupy positions of the positive news. We summarize the properties of the failure cases as follows:
•
For the metal and chemical datasets, the failure cases mainly discuss agriculture in America or Brazil, such as the production of soybean. We suspect that such failure cases are caused by spurious correlations from two perspectives: 1) both America and Brazil are key stakeholders in the metal and chemical markets, and are frequently mentioned by the positive news of metal and chemical queries; and 2) a large portion of positive news in the metal and chemical datasets concerns production or export/import regulations, which are also frequently discussed topics for agriculture commodities, with very similar textual structures.
•
As to the agriculture dataset, a large portion of the failure cases
are about the current situation of COVID-19. We postulate the
reason to be that the epidemic is a key inuential factor of the
agriculture production and the global export and import. The
term “epidemic” frequently occurs in the positive news of agricul-
ture commodities. Therefore, the model lays strong attention on
this word and wrongly recognizes many epidemic-related news
as inuential news.
Given these failure cases, we believe that it is essential to further study spurious correlations in financial event ranking in the future.
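The failure-case definition used in this analysis can be sketched as follows. We read "ranked before any positive news" as "ranked above at least one positive news article", i.e., the negatives above the lowest-ranked positive; the document ids are hypothetical.

```python
def failure_cases(ranked_ids, positive_ids):
    """Negative news ranked above at least one positive news of the query:
    these articles occupy positions that positive news could have taken."""
    # Position of the lowest-ranked positive news article.
    last_pos = max(i for i, d in enumerate(ranked_ids) if d in positive_ids)
    return [d for d in ranked_ids[:last_pos] if d not in positive_ids]

# Hypothetical ranking for an agriculture query: a COVID-19 article and
# two other negatives sit above the last positive and count as failures.
ranking = ["n_covid", "n_export", "n_soy", "n_weather", "n_tariff"]
positives = {"n_export", "n_tariff"}
print(failure_cases(ranking, positives))  # → ['n_covid', 'n_soy', 'n_weather']
```

If all positives sit at the very top of the list, the function returns an empty list, i.e., the ranking has no failure cases for that query.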
6 RELATED WORK
Document retrieval. Neural ranking models have become promising solutions for document retrieval tasks with the help of deep neural networks and continuous word representations [40, 41]. After the
Figure 8: A case study showing the top-5 news titles returned by HNR_CNN and an existing search engine for the query “soy oil” on 2019-09-10. The label T/F denotes whether the news is labeled as a positive one in the dataset.
emergence of the powerful deep Transformer models, fine-tuning pre-trained language models on target datasets achieves state-of-the-art performance on various natural language understanding tasks. Owing to their extremely competitive performance, many researchers leverage pre-trained language models, such as BERT [13], for document retrieval. Nogueira and Cho [33] concatenate the query sentence with the document as input and use BERT as a binary classifier to calculate the matching score for each candidate document of the query, reaching state-of-the-art performance. To achieve a trade-off between the low efficiency and the outstanding performance of BERT, ColBERT [25] performs a quick “late interaction” over the pre-computed representations of queries and documents produced by BERT, thus accelerating the ranking while retaining competitive results. In addition, Zhan et al. [56] use the inner product of BERT pre-computed contextual embeddings as the initial retrieval score, which is the best first-step retrieval method. Instead of truncating the whole document, Yilmaz et al. [55] judge the relevance between the query and each sentence of the document and aggregate the sentence-level scores, achieving state-of-the-art performance on news retrieval test collections. Furthermore, BERT has been combined with useful retrieval techniques to pursue better performance [34], such as an alternative to query expansion [35]. Despite the success of these Transformer-based models, they focus on the query perspective. This paper studies financial news ranking, where it is critical to combine the perspectives of both query and document.
Financial news analysis. As an important source of financial information, financial news has received a surge of attention from the research community, where the focus is to predict the price movement of assets with consideration of the relevant financial news [14, 15, 23, 30, 50]. These works mainly explore neural network architectures such as embeddings [6, 11, 30], recurrent neural networks [14, 50], and hierarchical attention [23] to facilitate asset price prediction. Another line of research focuses on the sentiment analysis of financial news [12, 29, 49, 52], which mainly extends conventional sentiment analysis techniques to capture the properties of financial news, such as being number-intensive. In an orthogonal direction, this work studies a new task of financial news analysis, i.e., financial event ranking, which can augment the existing tasks by selecting the influential news as their inputs to eliminate potential noise. Beyond finance, event ranking has been studied in medicine [37], which, however, focuses on the prediction of future events rather than retrieving events that have already happened.
Hybrid learning to rank. A line of research has studied hybrid models for learning-to-rank tasks, largely focused on personalized recommendation [22, 36, 57]. The target of recommendation is to predict user preference over items, which naturally consists of two perspectives: the user perspective and the item perspective. Therefore, hybrid recommender systems are proposed to jointly consider the two perspectives, combining item ranking and user targeting in a single framework. Despite the success of these hybrid learning-to-rank methods in recommendation, they are not applicable to the financial event ranking task. This is because the existing methods can only handle items with interaction records, whereas all financial news articles are cold-start for queries. Lastly, research on multi-modal retrieval [19–21] can also consider different perspectives, but is focused on the heterogeneity across different modalities.
7 CONCLUSION
In this work, we highlighted the importance of financial event ranking, which is formulated as a learning-to-rank task. We explored the central theme of financial event ranking: from the model perspective, by proposing a Hybrid News Ranking framework; and from the data perspective, by building a labeling system and constructing three large-scale datasets. We conducted extensive experiments on the constructed datasets. The experimental results validate the rationality and effectiveness of solving financial event ranking through learning to rank. Moreover, the results justify the capability of the influence allocation module to encode query relations and the benefit of the hybrid learning-to-rank framework. Lastly, the results point out the issue of spurious correlation in financial document analysis, which is also faced in other domains.

In the future, we will consider tackling the negative transfer in HNR. Moreover, we will explore techniques to bridge the performance gap between the quantification and allocation modules. We will also extend the financial news ranking solutions to serve more languages. Lastly, we will explore solutions to spurious correlation in financial document analysis.
REFERENCES
[1] Scott R Baker, Nicholas Bloom, Steven J Davis, Kyle Kost, Marco Sammon, and Tasaneeya Viratyosin. 2020. The unprecedented stock market reaction to COVID-19. The Review of Asset Pricing Studies 10, 4 (2020), 742–758.
[2] Lila Boualili, Jose G Moreno, and Mohand Boughanem. 2020. MarkedBERT: Integrating Traditional IR Cues in Pre-trained Language Models for Passage Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1977–1980.
[3] Raymond M Brooks, Ajay Patel, and Tie Su. 2003. How the equity market responds to unanticipated events. The Journal of Business 76, 1 (2003), 109–133.
[4] Remi Cadene, Corentin Dancette, Matthieu Cord, Devi Parikh, et al. 2019. RUBi: Reducing unimodal biases for visual question answering. In Advances in Neural Information Processing Systems. 841–852.
[5] Diego Ceccarelli, Francesco Nidito, and Miles Osborne. 2016. Ranking financial tweets. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 527–528.
[6] Dawei Cheng, Fangzhou Yang, Xiaoyang Wang, Ying Zhang, and Liqing Zhang. 2020. Knowledge Graph-based Event Embedding Framework for Financial Quantitative Investments. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2221–2230.
[7] W Bruce Croft. 2019. The Importance of Interaction for Information Retrieval. In SIGIR, Vol. 19. 1–2.
[8] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. 2020. Revisiting Pre-Trained Models for Chinese Natural Language Processing. arXiv preprint arXiv:2004.13922 (2020).
[9] Zhuyun Dai and Jamie Callan. 2019. Deeper text understanding for IR with contextual neural language modeling. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 985–988.
[10] Xuan-Hong Dang, Syed Yousaf Shah, and Petros Zerfos. 2019. "The Squawk Bot": Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering. arXiv preprint arXiv:1912.10858 (2019).
[11] Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z Pan, and Huajun Chen. 2019. Knowledge-driven stock trend prediction and explanation via temporal convolutional network. In Companion Proceedings of The 2019 World Wide Web Conference. 678–685.
[12] Ann Devitt and Khurshid Ahmad. 2007. Sentiment polarity identification in financial news: A cohesion-based approach. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 984–991.
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. ACL, 4171–4186.
[14] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
[15] Xin Du and Kumiko Tanaka-Ishii. 2020. Stock Embeddings Acquired from News Articles and Price History, and an Application to Portfolio Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 3353–3363.
[16] Eugene F Fama. 1970. Efficient capital markets: A review of theory and empirical work. The Journal of Finance 25, 2 (1970), 383–417.
[17] Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua. 2019. Temporal relational ranking for stock prediction. ACM Transactions on Information Systems (TOIS) 37, 2 (2019), 1–30.
[18] Suyu Ge, Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020. Graph Enhanced Representation Learning for News Recommendation. In Proceedings of The Web Conference 2020. 2863–2869.
[19] Xianjing Han, Xuemeng Song, Yiyang Yao, Xin-Shun Xu, and Liqiang Nie. 2019. Neural compatibility modeling with probabilistic knowledge distillation. IEEE Transactions on Image Processing 29 (2019), 871–882.
[20] Richang Hong, Lei Li, Junjie Cai, Dapeng Tao, Meng Wang, and Qi Tian. 2017. Coherent semantic-visual indexing for large-scale image retrieval in the cloud. IEEE Transactions on Image Processing 26, 9 (2017), 4128–4138.
[21] Richang Hong, Yang Yang, Meng Wang, and Xian-Sheng Hua. 2015. Learning visual semantic relationships for efficient visual retrieval. IEEE Transactions on Big Data 1, 4 (2015), 152–161.
[22] Jun Hu and Ping Li. 2018. Collaborative multi-objective ranking. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1363–1372.
[23] Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 261–269.
[24] Qiang Ji, Elie Bouri, Rangan Gupta, and David Roubaud. 2018. Network causality structures among Bitcoin and other financial assets: A directed acyclic graph approach. The Quarterly Review of Economics and Finance 70 (2018), 203–213.
[25] Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 39–48.
[26] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In ICLR.
[27] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv e-prints (2019). arXiv:1907.11692
[28] Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
[29] Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond polarity: interpretable financial sentiment analysis with hierarchical query-driven attention. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 4244–4250.
[30] Ye Ma, Lu Zong, Yikang Yang, and Jionglong Su. 2019. News2vec: News network embedding with subnode information. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 4845–4854.
[31] Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. 2019. CEDR: Contextualized embeddings for document ranking. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1101–1104.
[32] Ping Nie, Yuyu Zhang, Xiubo Geng, Arun Ramamurthy, Le Song, and Daxin Jiang. 2020. DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1829–1832.
[33] Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. arXiv preprint arXiv:1901.04085 (2019).
[34] Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin. 2019. Multi-stage document ranking with BERT. arXiv preprint arXiv:1910.14424 (2019).
[35] Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. 2019. Document expansion by query prediction. arXiv preprint arXiv:1904.08375 (2019).
[36] Rasaq Otunba, Raimi A Rufai, and Jessica Lin. 2017. MPR: Multi-objective pairwise ranking. In Proceedings of the Eleventh ACM Conference on Recommender Systems. 170–178.
[37] Zhi Qiao, Shiwan Zhao, Cao Xiao, Xiang Li, Yong Qin, and Fei Wang. 2018. Pairwise-ranking based collaborative recurrent neural networks for clinical event prediction. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence.
[38] Isaac Quaye, Yinping Mu, Braimah Abudu, Ramous Agyare, et al. 2016. Review of Stock Markets' Reaction to New Events: Evidence from Brexit. Journal of Financial Risk Management 5, 04 (2016), 281.
[39] Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019. Classification and Clustering of Arguments with Contextualized Word Embeddings. In ACL. ACL, 567–578.
[40] Pengjie Ren, Zhumin Chen, Zhaochun Ren, Evangelos Kanoulas, Christof Monz, and Maarten de Rijke. 2021. Conversations with Search Engines: SERP-based Conversational Response Generation. ACM Transactions on Information Systems (TOIS) (2021).
[41] Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zhumin Chen, Zhaochun Ren, and Maarten de Rijke. 2021. Wizard of Search Engine: Access to Information Through Conversations with Search Engines. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[42] Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP 109 (1995), 109.
[43] Dwaipayan Roy, Debjyoti Paul, Mandar Mitra, and Utpal Garain. 2016. Using word embeddings for automatic query expansion. arXiv preprint arXiv:1606.07608 (2016).
[44] Caitlin Sadowski and Greg Levin. 2007. SimHash: Hash-based similarity detection. Technical report, Google (2007).
[45] Thomas Sattler. 2013. Do markets punish left governments? The Journal of Politics 75, 2 (2013), 343–356.
[46] Herbert A Simon. 1954. Spurious correlation: A causal interpretation. Journal of the American Statistical Association 49, 267 (1954), 467–479.
[47] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? In China National Conference on Chinese Computational Linguistics. Springer, 194–206.
[48] Eric T Swanson. 2020. Measuring the effects of Federal Reserve forward guidance and asset purchases on financial markets. Journal of Monetary Economics (2020).
[49] Matthias W Uhl. 2014. Reuters sentiment and stock returns. Journal of Behavioral Finance 15, 4 (2014), 287–298.
[50] Manuel R Vargas, Carlos EM dos Anjos, Gustavo LG Bichara, and Alexandre G Evsukoff. 2018. Deep learning for stock market prediction using technical indicators and financial news articles. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[51] Jeffrey Scott Vitter and Philippe Flajolet. 1990. Average-case analysis of algorithms and data structures. In Algorithms and Complexity. Elsevier, 431–524.
[52] Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 235–243.
[53] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. ArXiv abs/1910.03771 (2019).
[54] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In NeurIPS. Curran Associates, Inc., 5754–5764.
[55] Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, and Jimmy Lin. 2019. Cross-domain modeling of sentence-level evidence for document retrieval. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3481–3487.
[56] Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, and Shaoping Ma. 2020. RepBERT: Contextualized Text Embeddings for First-Stage Retrieval. arXiv preprint arXiv:2006.15498 (2020).
[57] Zhenyu Zhang and Juan Yang. 2018. Dual learning based multi-objective pairwise ranking. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–7.