
Hybrid Learning to Rank for Financial Event Ranking


Abstract

Financial markets are moved by events such as the issuance of administrative orders. Participants in financial markets (e.g., traders) thus pay constant attention to financial news relevant to the financial asset (e.g., oil) of interest. Because the news stream is so large, manually identifying the influential events that can move the price of a financial asset is time and labor intensive, which pushes financial participants to embrace automatic financial event ranking, a task that has received relatively little scrutiny to date. In this work, we formulate the financial event ranking task, which aims to score financial news (document) according to its influence on the given asset (query). To solve this task, we propose a Hybrid News Ranking framework that, from the asset perspective, evaluates the influence of news articles by comparing their contents; and from the event perspective, assesses the influence over all query assets. Moreover, we resolve the dilemma between the essential requirement of sufficient labels for training the framework and the unaffordable cost of hiring domain experts to label the news. In particular, we design a cost-friendly system for news labeling that leverages the knowledge within published financial analyst reports. In this way, we construct three financial event ranking datasets. Extensive experiments on the datasets validate the effectiveness of the proposed framework and the rationality of solving financial event ranking through learning to rank.
Fuli Feng (1, 2), Moxin Li (2), Cheng Luo (3), Ritchie Ng (2), Tat-Seng Chua (2)
(1) Sea-NExT Joint Lab, (2) National University of Singapore, (3) MegaTech.AI
CCS Concepts: Information systems → Document filtering; Learning to rank. Computing methodologies → Learning to rank.
Keywords: learning to rank, document retrieval, finance
ACM Reference Format:
Fuli Feng, Moxin Li, Cheng Luo, Ritchie Ng, Tat-Seng Chua. 2021. Hybrid Learning to Rank for Financial Event Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21), July 11–15, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 11 pages.
Corresponding author. This research is supported by the Sea-NExT Joint Lab.
SIGIR ’21, July 11–15, 2021, Virtual Event, Canada
©2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8037-9/21/07. . . $15.00
The efficient market theory [ ] states that financial markets quickly impound the publicly available information [ ]. In other words, the prices of financial assets such as stocks and commodities¹ (see Figure 1(a) for an example) are quickly moved by financial events, especially unanticipated ones such as the outbreak of infectious disease [ ], the declaration of electoral victory [ ] and the announcement of government intervention [48]. For instance, the global stock markets lost about two trillion dollars in value within the 24 hours after the declaration of the Brexit results [ ]. It is undoubtedly essential for financial participants such as traders and analysts to quickly assess and react to financial events with the potential to move the asset price. Consequently, identifying events influential to the asset of interest has become a heavy workload that burns tremendous time and energy of the financial participants due to the large volume of the news stream². Therefore, financial event ranking [ ] is an emergent requirement of great practical value. However, it has received relatively little scrutiny to date.
In this work, we formulate financial event ranking as a learning to rank task where the target asset is viewed as the query to retrieve the candidate news published within a lag until the query date (see Figure 1(b)). The key to solving this task lies in quantifying the influence on the query asset according to the contents of the news. Intuitively, several widely used information retrieval techniques can be applied to achieve the target, such as document classification [ ], document retrieval [ ] and news recommendation [ ]. For instance, document retrieval models can learn the connection between the query and news contents from labeled query-news pairs. The models can thus emphasize news mentioning “Brazil” for the query of ferrous since Brazil is one of the largest exporters of iron ore and frequently occurs in the influential news of ferrous. However, the direct usage of existing methods is insufficient to solve the financial event ranking problem because they do not consider the properties of financial markets. For instance, due to the connection between “Brazil” and ferrous, an existing method will recognize “Brazil” as a feature of influential news. It will then promote the score of news mentioning Brazil in queries other than ferrous, such as gold, where Brazil is not a key stakeholder, leading to improper rankings with false positive responses for the query of gold.
We argue that the key to bridging this gap lies in scrutinizing the influence from the perspectives of both asset and news. This is because a news article can simultaneously influence various financial assets due to their connections [17, 24]. Towards this end, we propose a Hybrid News Ranking (HNR) framework, which combines the asset and news perspectives. In particular, an influence quantification module evaluates the influence of financial events from the asset perspective by comparing their contents. From the query perspective, an influence allocation module assesses the influence of a news article across assets. The influence allocation module aligns the influence scores of a news article on homogeneous queries (e.g., ferrous and coal) and separates its influence scores on heterogeneous queries (e.g., base metal and coal), which provides clues for eliminating the false positive responses. To use such clues and combine the two perspectives, an influence mixer is devised to learn integration strategies.

¹ Note that the financial asset of a commodity is the corresponding futures contract.
² of-gdelt- as-of- february-2016/.
³ We interchangeably use news and event, which refer to the textual content of the news article or the textual description of the event.

Session 1F: Applications 1

Figure 1: Illustration of (a) the influence of financial events on the price movement of an example asset; and (b) the financial event ranking task. The candlesticks and bars represent the daily price movement and trading volume, respectively. The blue curve is the Dow Jones Industrial Average (DJI) index, which reflects the trend of the US stock market. Better viewed in color.
Labeled data are indispensable for the training of HNR, which largely relies on deep neural networks to encode the query and news [ ]. However, it is extremely resource consuming to label the influence of financial events due to the large number of candidates and the reliance on experienced but expensive financial experts. To resolve the dilemma, we build a labeling system to identify the positive news for each asset from the corresponding analyst reports, which are published periodically (e.g., daily) and written by domain experts. In particular, the system consists of two stages: mention extraction, which extracts events mentioned in the analyst report, and mention-news matching, which matches each extracted event with news reporting the event. As both stages can be accomplished with the assistance of automatic algorithms or common crowd workers, we largely reduce the cost and construct three large-scale datasets. Extensive experiments on the three datasets validate the effectiveness of the proposed HNR and the rationality of solving financial event ranking via learning to rank. The datasets and code are released at:
The main contributions are summarized as follows:
• We formulate the problem of financial event ranking and propose a Hybrid News Ranking framework.
• We build a cost-friendly system to label positive news and construct three datasets for financial news ranking.
• We conduct extensive experiments that validate the rationality and effectiveness of our proposal.
To achieve financial event ranking, the target is to learn a scoring function $\hat{y} = f(Q, D \mid \Theta)$, which predicts the influence of a candidate news $D$ on a query $Q$. $\Theta$ denotes the parameters of the function to be learned. $D$ is a list of word IDs that encodes the contents of the news. $Q$ is also a list of word IDs that corresponds to the name of the query asset, e.g., “base metal”⁴. The financial event ranking task is different from conventional document retrieval [ ] for the following reasons: 1) Queries with the same content (e.g., “base metal”) at different time-steps are viewed as different queries, but belong to the same type. 2) The candidates to be ranked at different time-steps do not overlap with each other. 3) Our problem has a fixed set of query types instead of an unlimited range of queries with arbitrary content. We denote the query types as a set $\mathcal{Q}$ where $|\mathcal{Q}|$ is the set size.

The scoring function $f(\cdot)$ should identify the key patterns of news contents that can distinguish influential news from common news in a query-specific manner. A promising solution is to learn from labeled historical queries. We thus employ the supervised learning paradigm to optimize the parameters of the scoring function, which is formulated as:

$$\hat{\Theta} = \arg\min_{\Theta} \sum_{(\langle Q, D \rangle, y) \in \mathcal{L}} l\big(f(Q, D \mid \Theta), y\big) + \alpha \lVert \Theta \rVert, \qquad (1)$$

where $\mathcal{L}$ denotes a set of labeled query-news pairs with $y = 1$ for influential news and $y = 0$ for negative samples randomly selected from the remaining candidates. $l(\cdot)$ is a loss function such as the binary cross-entropy loss. $\alpha$ adjusts the strength of regularization.
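As a rough illustration of the supervised objective, the following sketch sums a binary cross-entropy loss over labeled query-news pairs and adds a weighted penalty on the parameters. The scorer `toy_score`, the two-element parameter list, and the choice of a squared L2 penalty are assumptions made only for this example; they are not the model or the regularizer used in this work.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross-entropy of a predicted probability p against a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def objective(score_fn, theta, labeled_pairs, alpha):
    """Summed loss over labeled (query, news) pairs plus an alpha-weighted penalty."""
    data_loss = sum(bce(score_fn(q, d, theta), y) for (q, d), y in labeled_pairs)
    penalty = alpha * sum(w * w for w in theta)  # assumed form: squared L2 norm
    return data_loss + penalty

def toy_score(q, d, theta):
    """Hypothetical scorer: logistic function of the word-ID overlap of q and d."""
    overlap = len(set(q) & set(d))
    return 1.0 / (1.0 + math.exp(-(theta[0] * overlap + theta[1])))

labeled = [(([1, 2], [1, 2, 7]), 1),   # influential news, y = 1
           (([1, 2], [5, 6, 7]), 0)]   # randomly sampled negative, y = 0
loss = objective(toy_score, [1.0, -1.0], labeled, alpha=0.0)
```

Setting `alpha=0.0` mirrors the regularization weight reported in the implementation details; a positive value would add the penalty term to the same data loss.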
⁴ Technically, base metal includes four commodities: lead, copper, nickel, and zinc. In this work, we do not view a specific one of the four commodities as the query since they are typically discussed and analyzed as a group. Note that we can easily generalize to a specific commodity with its name as the query (e.g., “copper”).
Figure 2: Illustration of the proposed HNR framework where queries in the same color (e.g., $Q_t^1$ and $Q_t^2$) are homogeneous ones, and queries in different colors (e.g., $Q_t^2$ and $Q_t^3$) are heterogeneous ones.
As shown in Figure 1(b), the learned scoring function $f(\cdot)$ can serve each query $Q_t$ at any time-step $t$. We add the subscript $t$ to distinguish queries at different time-steps. Note that the query types at different time-steps are the same. Let $\mathcal{D}_t$ denote the candidate set at time-step $t$, which consists of news published within a lag before $t$. Note that we remove duplicate news reporting the same event through SimHash [ ] to reduce the size of the candidate set. Nevertheless, we still need to score thousands of news articles to select the top-$K$ most influential ones for each query. Formally, the serving phase for query $Q_t$ is:

$$\mathrm{sort}\big\{\hat{y}_t^i = f(Q_t, D_t^i \mid \hat{\Theta}) \,\big|\, D_t^i \in \mathcal{D}_t\big\}, \qquad (2)$$

where $D_t^i$ is the $i$-th candidate news in $\mathcal{D}_t$ and $\mathrm{sort}(\cdot)$ denotes a function that sorts the candidate news in descending order of score. Moreover, since the query set is fixed, the serving phase in effect performs one ranking for each query type at the start of the time-step and serves all incoming queries from the result.
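The serving phase can be sketched as scoring every candidate once per query type, sorting in descending order, and caching the top-K list. The scorer `toy_score` and the sample headlines below are hypothetical stand-ins; a trained scoring function would take their place.

```python
def rank_top_k(score_fn, query, candidates, k):
    """Score every candidate for one query and return the k highest-scoring ones."""
    scored = [(score_fn(query, news), news) for news in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending; ties keep input order
    return scored[:k]

def serve_all(score_fn, query_types, candidates, k):
    """The query set is fixed, so rank once per query type and reuse the result."""
    return {q: rank_top_k(score_fn, q, candidates, k) for q in query_types}

def toy_score(query, news):
    """Hypothetical scorer: fraction of query words that appear in the news text."""
    q_words = query.split()
    news_words = set(news.split())
    return sum(w in news_words for w in q_words) / len(q_words)

news_pool = ["iron ore exports from brazil fall",
             "gold demand rises",
             "base metal stocks dip"]
cache = serve_all(toy_score, ["base metal", "gold"], news_pool, k=2)
```

Each incoming query at the same time-step can then be answered by a dictionary lookup in `cache` rather than by a new pass over thousands of candidates.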
In this section, we introduce the proposed HNR framework. As shown in Figure 2, it consists of three modules: influence quantification module (Section 3.1), influence allocation module (Section 3.2), and influence mixer (Section 3.3).
3.1 Influence Quantification Module
Our main consideration in devising the influence quantification module is to meticulously assess the influence of a news article on a given query according to their contents. To achieve the target, the key lies in mining the connections between the query description and the news content, which is coherent with the target of document retrieval [ ]. As such, we devise the influence quantification module as a document retrieval model where the input is a concatenation of the query $Q_t$ and the candidate news $D_t^i$, and the output is the influence prediction, i.e., the probability that $D_t^i$ is an influential event of $Q_t$. Formally,

$$\hat{y}_t^i = f_q(Q_t, D_t^i \mid \Theta_q), \qquad (3)$$

where $\Theta_q$ denotes the model parameters to be learned. Inspired by the huge success of pre-trained language models in document retrieval [ ], we devise $f_q(\cdot)$ based on a deep Transformer such as BERT [ ], XLNet [ ] or RoBERTa [ ], which is pre-trained over a large-scale corpus in a self-supervised manner to encode the co-occurrence of words.
In particular, we follow the next sentence prediction paradigm [ ] to format the query-news pair as $[\mathrm{CLS}, Q_t, \mathrm{SEP}, D_t^i]$. As shown in Figure 3, the query and the candidate news are concatenated with a [CLS] token at the beginning and a [SEP] token for separation. After passing through the deep Transformer, each token obtains a latent representation which encodes the textual patterns. The representation of the [CLS] token is passed through a fully connected (FC) layer to estimate the probability that $D_t^i$ is a positive news of query $Q_t$. Formally,

$$\hat{y}_t^i = f_q\big([\mathrm{CLS}, Q_t, \mathrm{SEP}, D_t^i]\big). \qquad (4)$$

We can rank the news for query $Q_t$ according to the prediction $\hat{y}_t^i$.

Figure 3: Illustration of the influence quantification module.
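The input format of the quantification module can be illustrated with plain token lists. This is a minimal sketch: the helper name `format_pair` is hypothetical, the tokens are ordinary strings rather than the vocabulary IDs a real tokenizer would produce, and truncating only the news side to fit the 256-token limit from the experiment settings is an assumption.

```python
def format_pair(query_tokens, news_tokens, max_len=256):
    """Build [CLS] query [SEP] news, truncating the news to fit max_len tokens."""
    budget = max_len - len(query_tokens) - 2  # room left after [CLS], query, [SEP]
    if budget < 0:
        raise ValueError("query alone exceeds max_len")
    return ["[CLS]"] + list(query_tokens) + ["[SEP]"] + list(news_tokens[:budget])

pair = format_pair(["base", "metal"], ["w"] * 300)
short = format_pair(["gold"], ["demand", "rises"])
```

The representation at position 0 (the [CLS] token) is the one the FC layer would read to produce the influence probability.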
3.2 Influence Allocation Module
Financial research [ ] has demonstrated the coupling effects across different assets. As such, the influences of a financial event on different assets are linked to each other rather than independent. We thus further devise an influence allocation module to evaluate the influence from the query perspective. Our main consideration for the module is to account for the connections among the queries in the evaluation of news influence. To achieve the target, the module is expected to consider the whole query set $\mathcal{Q}$ when evaluating the influence of a news $D_t^i$. We thus devise the influence allocation module as a $|\mathcal{Q}|$-way classification module, which is formulated as:

$$\hat{\mathbf{y}}_t^i = f_a(D_t^i \mid \Theta_a), \qquad (5)$$

where $\hat{\mathbf{y}}_t^i \in \mathbb{R}^{|\mathcal{Q}|}$ denotes the influence predictions over the query set and $\Theta_a$ denotes the parameters of the module. In this way, the news influence is evaluated by comparison across the queries. Again, inspired by the success of pre-trained language models in text classification, we also devise the influence allocation module based on a Transformer. In particular, the input is formatted as $[\mathrm{CLS}, D_t^i]$, and the prediction is made through an FC layer from the representation of the [CLS] token.
It should be noted that the FC layer is parameterized by a mapping matrix $\mathbf{W} \in \mathbb{R}^{|\mathcal{Q}| \times H}$, where $H$ denotes the dimensionality of the latent representation. In this way, there are separate parameters for the influence evaluation of each query, i.e., each row of $\mathbf{W}$ corresponds to a query. Accordingly, queries with close parameters will be allocated similar influence scores and vice versa. The parameters of homogeneous queries will be pushed close to each other since these queries are influenced by similar news, and sometimes even by the same news. The heterogeneous queries will thus obtain parameters with a relatively large distance. In this way, the influence allocation module is able to account for the homogeneous and heterogeneous relations between queries (cf. Figure 5).

⁵ Note that the number of potential query types at different time-steps is always $|\mathcal{Q}|$.
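The effect of query-specific rows in the FC layer can be seen in a small numeric sketch. The weights, the two-dimensional [CLS] representation, and the use of a per-query sigmoid are all assumptions chosen for illustration; the rows for ferrous and coal are deliberately close, while the row for base metal is far from both.

```python
import math

def allocate(h_cls, W, b):
    """|Q|-way head: one row of W and one bias per query type (sigmoid is an assumption)."""
    logits = [sum(w * h for w, h in zip(row, h_cls)) + bias
              for row, bias in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

query_types = ["ferrous", "coal", "base metal"]
W = [[1.0, 0.0],    # ferrous
     [0.9, 0.1],    # coal: close to the ferrous row (homogeneous)
     [-1.0, 0.5]]   # base metal: far from the coal row (heterogeneous)
b = [0.0, 0.0, 0.0]
scores = allocate([2.0, 0.0], W, b)  # hypothetical [CLS] representation with H = 2
```

With these toy values the ferrous and coal scores stay near each other, while the base metal score is pushed far away, which is the behaviour the paragraph above describes.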
Both the influence quantification module and the influence allocation module have pros and cons. In the quantification module (Equation 4), the same classification parameters are shared across all queries, which raises the spurious correlation issue [ ]: the positive news of a query will be assigned an exaggerated score on a heterogeneous query that the news is insufficient to influence. The allocation module can break some spurious correlations across queries owing to the query-specific parameters and the consideration of query relations. However, the allocation module ignores the query contents and may suffer from information loss.
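3.3 Influence Mixer
A natural way to leverage the advantages of both modules is to aggregate their ranking scores. Inspired by the RUBi function [ ], which is widely used for aggregating predictions, a straightforward mixer is formulated as:

$$\bar{\mathbf{y}}_t^i = \hat{\mathbf{y}}_t^i \odot \mathrm{power}(\tilde{\mathbf{y}}_t^i, \beta), \qquad (6)$$

where $\bar{\mathbf{y}}_t^i \in \mathbb{R}^{|\mathcal{Q}|}$ is the final prediction of news $D_t^i$ over all the queries; $\hat{\mathbf{y}}_t^i$ denotes the scores from the allocation module (Equation 5); and $\tilde{\mathbf{y}}_t^i \in \mathbb{R}^{|\mathcal{Q}|}$ denotes the scores from the quantification module, where we gather the output of Equation 4 across all queries. $\odot$ denotes the element-wise product and $\mathrm{power}(\cdot)$ is an element-wise power function. $\beta$ is a hyper-parameter to balance the contribution of the two modules.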
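The straightforward mixer amounts to a few lines of arithmetic. In this sketch the allocation scores are multiplied element-wise by the quantification scores raised to a power; which module receives the exponent, the parameter name `beta`, and the example scores are assumptions for illustration.

```python
def rubi_mix(alloc_scores, quant_scores, beta=1.0):
    """Element-wise product of allocation scores and powered quantification scores."""
    return [a * (q ** beta) for a, q in zip(alloc_scores, quant_scores)]

alloc = [0.9, 0.1]   # allocation module: news matters for query 0, not query 1
quant = [0.8, 0.7]   # quantification module: exaggerated score on query 1
mixed = rubi_mix(alloc, quant, beta=1.0)
```

In this toy case the exaggerated quantification score on the second query is damped by the low allocation score, while the first query keeps a high mixed score.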
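While this simple solution can achieve the target of combining the two perspectives, we consider that the mixer should account for the inter-module connections and inter-query connections within the scores. Inspired by the success of Convolutional Neural Networks (CNNs) in recognizing local-region patterns, we further devise the influence mixer module as a CNN, which is formulated as:

$$\bar{\mathbf{y}}_t^i = \mathrm{CNN}\big(\hat{\mathbf{y}}_t^i, \tilde{\mathbf{y}}_t^i\big), \qquad (7)$$

where $\hat{\mathbf{y}}_t^i$ and $\tilde{\mathbf{y}}_t^i$ are the scores from the allocation module and the quantification module, respectively, and $\bar{\mathbf{y}}_t^i$ is the final prediction. In particular, the CNN consists of a stack layer, column convolution layer, row convolution layer, and FC layer.

Stack layer. The stack layer stacks the outputs of the two modules as a matrix $\mathbf{Y}_t^i = [\hat{\mathbf{y}}_t^i, \tilde{\mathbf{y}}_t^i] \in \mathbb{R}^{2 \times |\mathcal{Q}|}$, which can facilitate observing the local-region patterns.

Column convolution layer. The column convolution layer consists of 1D vertical filters to learn the rules for combining the two predictions. Formally,

$$\mathbf{C}_t^i = \langle \mathbf{F}_c, \mathbf{Y}_t^i \rangle, \qquad (8)$$

where $\mathbf{F}_c \in \mathbb{R}^{K \times 2}$ denotes the filters of the convolution layer and $\mathbf{C}_t^i \in \mathbb{R}^{K \times |\mathcal{Q}|}$ denotes the recognized signals. $K$ is a hyper-parameter to adjust the number of filters.

Row convolution layer. Similarly, this layer consists of 1D horizontal filters to recognize the inter-query patterns, which is formulated as,

$$\mathbf{R}_t^i = \langle \mathbf{F}_r, \mathbf{C}_t^i \rangle, \qquad (9)$$

where $\mathbf{F}_r \in \mathbb{R}^{M \times |\mathcal{Q}|}$ are the filters and $\mathbf{R}_t^i \in \mathbb{R}^{K \times M}$ the recognized signals.

FC layer. The FC layer makes the final prediction by combining the recognized signals, which is formulated as,

$$\bar{\mathbf{y}}_t^i = \mathbf{W}^\top \mathrm{flatten}(\mathbf{R}_t^i) + \mathbf{b}, \qquad (10)$$

where $\mathbf{W} \in \mathbb{R}^{(KM) \times |\mathcal{Q}|}$ and $\mathbf{b} \in \mathbb{R}^{|\mathcal{Q}|}$ are parameters to be learned. $\mathrm{flatten}(\cdot)$ flattens a matrix into a vector.

In Figure 2(c), we depict the process of the CNN influence mixer with a simple example.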
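The shapes flowing through the four layers can be traced with a small pure-Python sketch. It assumes that each convolution reduces to a matrix product with filters of the stated sizes (K x 2 for columns, M x |Q| for rows) and that no activation function is applied; the filter values and scores are arbitrary illustrations, not learned parameters.

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cnn_mix(alloc, quant, F_c, F_r, W, b):
    Y = [alloc, quant]                    # stack layer: 2 x |Q|
    C = matmul(F_c, Y)                    # column convolution: K x |Q|
    R = matmul(C, transpose(F_r))         # row convolution: K x M
    flat = [v for row in R for v in row]  # flatten: K*M values
    out = matmul([flat], W)[0]            # FC layer: (K*M) x |Q| -> |Q|
    return [o + bias for o, bias in zip(out, b)]

alloc = [0.9, 0.1, 0.2]                   # |Q| = 3
quant = [0.8, 0.7, 0.1]
F_c = [[1.0, 0.0], [0.0, 1.0]]            # K = 2 vertical filters
F_r = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # M = 2 horizontal filters
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
final = cnn_mix(alloc, quant, F_c, F_r, W, [0.0, 0.0, 0.0])
```

The column filters see one query at a time (both module scores for that query), whereas the row filters see all queries at once, which is where inter-query patterns can be picked up.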
A key consideration for the training of the proposed HNR framework is saving memory and computation cost. Since both the quantification module and the allocation module consist of deep Transformers, learning the parameters of the three modules in an end-to-end manner will double the memory cost, which poses higher requirements on the infrastructure and thus constrains the practical usage of HNR. To reduce the cost, a straightforward solution is to share the Transformer across the two modules. Its disadvantages are twofold: 1) the training objective consists of three components, leading to a huge overhead for hyper-parameter tuning; and 2) the two modules are coupled, making it hard for them to recognize complementary signals. Another solution is to train the three modules separately by optimizing Equation 1 over the labeled dataset $\mathcal{L}$. Algorithm 1 illustrates the training procedure of HNR, which follows this separate-training solution.

Algorithm 1 Training of HNR.
Input: Training data $\mathcal{L}$, types of query $\mathcal{Q}$.
1: Train the influence quantification module over $\mathcal{L}$;
2: Train the influence allocation module over $\mathcal{L}$;
3: Collect the influence scores from the two modules; ▷ Model inference
4: Train the influence mixer with the collected influence scores as features.
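The four steps of Algorithm 1 can be sketched as a staged pipeline in which the two scoring modules are fitted first and then frozen while their scores are collected as mixer features. The `fit_*` callables, the `(news, positive query types)` data layout, and the stub models are all hypothetical placeholders for the actual Transformer fine-tuning.

```python
def train_hnr(labeled, query_types, fit_quant, fit_alloc, fit_mixer):
    quant = fit_quant(labeled)                        # step 1
    alloc = fit_alloc(labeled)                        # step 2
    features = []
    for news, positives in labeled:                   # step 3: inference only
        q_scores = [quant(q, news) for q in query_types]
        a_scores = alloc(news)
        targets = [1 if q in positives else 0 for q in query_types]
        features.append((a_scores, q_scores, targets))
    mixer = fit_mixer(features)                       # step 4
    return quant, alloc, mixer

# Stub "trainers" so the sketch runs end to end (not real models).
collected = []
def fit_quant(labeled):
    return lambda q, news: 1.0 if q in news else 0.0
def fit_alloc(labeled):
    return lambda news: [0.5, 0.5]
def fit_mixer(features):
    collected.extend(features)
    return lambda a, q: [x * y for x, y in zip(a, q)]

data = [("coal output cut", {"coal"})]
quant, alloc, mixer = train_hnr(data, ["coal", "sugar"], fit_quant, fit_alloc, fit_mixer)
```

Because the mixer only consumes score vectors, the two Transformers never need to be held in memory for joint back-propagation, which is the cost saving the separate-training solution aims for.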
The key to building an HNR is the construction of labeled datasets, that is, labeling the positive news for the queries $Q_t$ from the historical candidate set $\mathcal{D}_t$. The target is non-trivial to achieve because of: 1) the large size of $\mathcal{D}_t$; and 2) the reliance on experienced analysts to evaluate the influence. It is critical to resolve the reliance on domain expertise, which has in fact already been encoded in the analyst reports written by the experts. Typically, on each trading day, we can find an analyst report for each query asset, which summarizes the market status and discusses the influential events⁶. As such, we can identify positive news for a query by extracting the events mentioned in the corresponding analyst report. In this way, we construct three datasets that correspond to the metal, agriculture, and chemical markets. As shown in Table 1, there are three types of queries in each dataset. These datasets have 340, 581, and 200 queries, respectively, which is a reasonable size compared to the widely used relevance judgement benchmark TREC⁷.

⁶ To write the analyst reports, the domain experts manually identify influential news. In other words, the target of HNR is to mimic the domain experts.
Dataset      #Queries  Types of query             #Positive news per query  #Candidate news per query
Metal        340       Base metal, Ferrous, Coal  4.89                      2459.58
Agriculture  581       Soy oil, Cotton, Sugar     2.51                      2390.89
Chemical     200       PTA, MEG, Rubber           2.53                      2414.18
Table 1: Statistics of the three constructed datasets.
Figure 4: Illustration of the news labeling procedure: a) extracting event mentions from the analyst reports; and b) matching mentions with candidate news.
4.1 Data Collection
Analyst reports. We select Hongyuan Futures Co., Ltd. as our source⁸ to collect historical analyst reports since the reports are published in plain text instead of PDF files. As the company focuses on commodities, we select three commodity markets: metal, agriculture, and chemical, according to the popularity of reports, i.e., the number of reads. For each market, the analyst report is typically published on each trading day and is targeted at a group of commodities, e.g., base metal (copper, aluminium, etc.) and ferrous (screw thread steel and iron ore). From each market, we select three groups of commodities, leading to three types of query in each dataset. The names of the commodity groups (e.g., “base metal”) are the queries (i.e., $Q_t$) on each trading day $t$.
Candidate news. To match the language of the collected analyst reports, we collect candidate news in Chinese from the largest portal websites in China for both financial news and commodity news⁹. We select the news posted within 48 hours before trading day $t$ as the candidate news of query $Q_t$. We set the lag to 48 hours since the markets react quickly to events [ ], which means that “old” news cannot influence the markets anymore. In the data collection period from 2018-09 to 2020-06, the size of the candidate set is around 2,400 for each query. In other words, given a query, our task is to select the top-$K$ influential news from thousands of candidates.
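Selecting the candidate set for a trading day is a simple time-window filter. In the sketch below the dictionary layout of a news item, the sample timestamps, and the choice to include the window start while excluding the trading-day timestamp itself are assumptions for illustration.

```python
from datetime import datetime, timedelta

def candidate_set(news_items, trading_day, lag_hours=48):
    """Keep news posted in the lag_hours before trading_day (window start inclusive)."""
    start = trading_day - timedelta(hours=lag_hours)
    return [n for n in news_items if start <= n["posted"] < trading_day]

day = datetime(2020, 6, 1, 9, 0)
pool = [
    {"title": "recent", "posted": datetime(2020, 5, 31, 10, 0)},
    {"title": "stale", "posted": datetime(2020, 5, 29, 8, 0)},
    {"title": "future", "posted": datetime(2020, 6, 1, 12, 0)},
]
kept = [n["title"] for n in candidate_set(pool, day)]
```

Only the item posted inside the 48-hour window survives; older news is treated as already priced in, and news posted after the query time is not yet available.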
4.2 Labelling Procedure
Figure 4 illustrates the procedure to identify the positive news for a query $Q_t$ from the corresponding analyst report, which consists of two phases: mention extraction and mention-news matching.

Mention extraction. Analyst reports typically follow the same template for mentioning financial events. As such, we define a set of rules based on the section titles and HTML styles to extract event mentions from the collected analyst reports. On average, we extract 4.89, 2.51, and 2.55 positive events (cf. Table 1) for the queries in the metal, agriculture, and chemical datasets, respectively. Note that we do not directly merge the extracted mentions into the corresponding candidate set for two reasons: 1) it will introduce duplicate news, leading to biased evaluation; and 2) the mentions are typically rephrased by the analyst with linguistic properties different from common news articles. Due to such discrepancy, a model trained on mentions will fail in practical usage where the candidates are all common news articles.

⁹ Sina: https://, Chinese Finance Online:
Mention-news matching. We thus match each event mention with its corresponding news, i.e., we recognize the news within the candidate set that reports the same event as the mention. To control the cost and quality, we perform the matching in two steps: 1) automatic matching, which evaluates the similarity between the mention and each candidate news; and 2) manual checking, where crowd workers check the top-3 most similar news to identify the positive one¹⁰. Note that checking whether two pieces of text describe the same event can be done without domain expertise, which resolves the reliance on domain experts. As to the similarity evaluation algorithm, we leverage the public API provided by one of the largest search engines for Chinese news¹¹.
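The automatic matching step can be sketched as ranking candidates by a text similarity and passing the three best to a human checker. The similarity service used in this work is not named here, so the sketch substitutes a simple word-level Jaccard similarity purely as a stand-in; the function names and sample texts are hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity (a stand-in for the external similarity API)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def top_matches(mention, candidates, k=3, sim=jaccard):
    """Return the k candidates most similar to the mention, best first."""
    ranked = sorted(candidates, key=lambda news: sim(mention, news), reverse=True)
    return ranked[:k]

mention = "brazil iron ore exports fall"
candidates = ["iron ore exports from brazil fall sharply",
              "sugar prices climb",
              "brazil election result",
              "coal imports rise"]
shortlist = top_matches(mention, candidates)
```

A crowd worker would then read `shortlist` in order and stop at the first item that reports the same event as the mention, or after the third item.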
In this way, we identify the positive news for more than 99.9% of the extracted event mentions and discard the remaining cases. By checking the contents of the identified positive news, we confirm that the news covers a wide spectrum of events affecting the supply and demand of the commodities, such as geopolitical events, government policies, company announcements, and strikes, indicating the challenge of these datasets.
We conduct experiments on the three constructed datasets to answer the following research questions: RQ1: To what extent can learning to rank techniques solve the financial event ranking problem? RQ2: How effective is the proposed HNR as compared to existing document retrieval methods? RQ3: What are the factors that influence the effectiveness of the proposed HNR?
5.1 Experiment Settings
Evaluation protocols. We chronologically split each dataset into training, validation, and testing with a ratio of 7:1:2. That is, the most recent 20 percent of queries are treated as testing cases. Following conventional document retrieval work [ ], we adopt the evaluation metrics of MAP, MRR (MRR1 and MRR3), and Recall (Rec3, Rec5, and Rec10). We report the average performance over testing queries, where a larger value indicates better performance.
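The evaluation protocol can be made concrete with a short sketch of the chronological split and per-query metrics. The definitions below are the common textbook ones (recall at a cutoff, reciprocal rank truncated at a cutoff, and average precision) and are assumed rather than taken from this work; the function names and toy ranking are hypothetical.

```python
def chronological_split(queries, ratios=(7, 1, 2)):
    """Split date-sorted queries (oldest first) into train, validation, test."""
    n, total = len(queries), sum(ratios)
    a = n * ratios[0] // total
    b = n * (ratios[0] + ratios[1]) // total
    return queries[:a], queries[a:b], queries[b:]

def recall_at_k(ranked, positives, k):
    return len(set(ranked[:k]) & positives) / len(positives)

def mrr_at_k(ranked, positives, k):
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in positives:
            return 1.0 / rank
    return 0.0  # no positive within the cutoff

def average_precision(ranked, positives):
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in positives:
            hits += 1
            total += hits / rank
    return total / len(positives)

train, valid, test = chronological_split(list(range(10)))
ranked, positives = ["a", "b", "c", "d"], {"b", "d"}
rec3 = recall_at_k(ranked, positives, 3)
mrr3 = mrr_at_k(ranked, positives, 3)
ap = average_precision(ranked, positives)
```

Under this recall definition, a query with more positives than the cutoff cannot reach a recall of 1 even with a perfect ranking, which is consistent with the Upper_Bound rows for Rec3 and Rec5 in Table 2 being below 1.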
Compared methods. We compare the proposed HNR with advanced document retrieval methods, including:
• BM25 [42]: This method is still widely used for document retrieval. We use an open source implementation¹² of the method.
• ColBERT [25]: It is a Transformer-based ranking method that encodes the query and the document separately with the Transformer and scores each query-document pair by the similarity (inner product) of their representations.
• Ret [33]: It is also a Transformer-based document retrieval method, which concatenates the query and the document as input and uses a binary classification layer to calculate the matching score. It is the same as the influence quantification module in the proposed HNR.
• RetQE: It is an extension of Ret equipped with query expansion [ ]. For a query $Q_t$, it extracts candidate terms from the positive news, and adds the top-ranked terms into the input query according to their cosine similarity with the query. The expanded query is then fed into Ret during both model training and testing.
We also include two advanced document classification baselines:
• ClaM [47]: It fine-tunes the pre-trained Transformer with an additional layer to classify the documents into different types of queries. Documents are ranked by the probability of the class that corresponds to the type of the input query. It is the same as the influence allocation module in the proposed HNR.
• ClaS [47]: Similar to ClaM, this method has a classification layer that predicts whether a document is positive or not. That is to say, we fine-tune a pre-trained Transformer for each dataset, where documents are ranked according to the probability given by the classification layer.

¹⁰ The manual checking ends at finding a positive news or the third round.
Implementation details. Except for BM25, the compared methods are implemented with PyTorch 1.4.0¹³ based on HuggingFace’s Transformers [ ]. As the pre-trained model, we use the checkpoint of Chinese RoBERTa with Whole Word Masking (named chinese-roberta-wwm-ext) released by [ ]. For training the influence quantification module and the influence allocation module (i.e., fine-tuning RoBERTa), we set the maximum input length as 256 and update model parameters with AdamW [ ]. We set the gradient accumulation step as 2, the gradient clipping threshold as 2.0, the number of warmup steps as 100, the total training steps as 5,000, and the weight for the regularization term (i.e., $\alpha$) as 0. The learning rate and batch size are selected according to the validation performance w.r.t. Rec10. As to the influence mixer, we set the coefficient of RUBi (i.e., the power hyper-parameter of the straightforward mixer in Section 3.3) as 1, and tune the number of filters in the CNN.
5.2 Rationality of Learning to Rank (RQ1)
To validate the rationality of formulating financial event ranking as a learning to rank task, we first test the document classification and retrieval methods. Table 2 summarizes the ranking performance of the compared methods on the three datasets: metal, agriculture, and chemical. Note that Upper_Bound represents the performance of knowing the ground truth of the test queries, which can be seen as the performance of domain experts. From the table, we have the following observations:
The performance of deep Transformer-based methods are surpris-
ingly good on the three datasets. For instance, the performance
of ColBERT on metal w.r.t. Rec10 surpasses 0.982, which is very
close to the upper bound 0.994. The result means that the nancial
participants can access more than 98% of the inuential events
by only reading 10 top-ranked news from ColBERT, which will
help the participants can to save tremendous amount of time and
13 started/previous-versions/#v140.
MethodMAP MRR1 MRR3 Rec3 Rec5 Rec10
ClaM 0.924 0.925 0.959 0.666 0.835 0.978
ClaS 0.903 0.906 0.943 0.635 0.812 0.964
BM25 0.032 0.019 0.041 0.019 0.044 0.073
ColBERT 0.948 0.981 0.987 0.684 0.834 0.982
Ret 0.929 0.943 0.969 0.666 0.820 0.969
RetQE 0.923 0.906 0.953 0.653 0.833 0.977
Upper_Bound 1.0 1.0 1.0 0.718 0.877 0.994
Method MAP MRR1 MRR3 Rec3 Rec5 Rec10
ClaM 0.637 0.542 0.650 0.630 0.778 0.952
ClaS 0.550 0.402 0.545 0.554 0.729 0.878
BM25 0.059 0.028 0.056 0.059 0.075 0.123
ColBERT 0.633 0.505 0.634 0.640 0.814 0.945
Ret 0.640 0.514 0.656 0.657 0.792 0.952
RetQE 0.633 0.523 0.623 0.622 0.767 0.946
Upper_Bound 1.0 1.0 1.0 0.944 0.992 1.0
MethodMAP MRR1 MRR3 Rec3 Rec5 Rec10
ClaM 0.380 0.216 0.405 0.421 0.529 0.737
ClaS 0.525 0.486 0.563 0.414 0.636 0.814
BM25 0.077 0.0 0.018 0.029 0.089 0.291
ColBERT 0.492 0.351 0.473 0.399 0.609 0.802
Ret 0.592 0.514 0.662 0.561 0.611 0.833
RetQE 0.542 0.459 0.599 0.510 0.625 0.834
Upper_Bound 1.0 1.0 1.0 0.887 0.961 0.997
Table 2: Ranking performance of document classication
and document retrieval models on the three datasets.
eort. The results thus validate the rationality and eectiveness
of learning to rank solutions for nancial event ranking.
The retrieval models achieve performance that is comparable
to classification models. Across the six metrics, both types of
model achieve the best performance in some cases. For instance,
ClaM achieves the best Rec5 on the metal market, while ColBERT
achieves the best Rec5 on the agriculture market. These results
reflect that both types have their pros and cons, i.e., neither
the asset perspective nor the event perspective alone is sufficient
to solve financial event ranking. As such, it is essential to build
a hybrid solution that combines the two perspectives.
Among the retrieval models, a) BM25 achieves the worst performance
because it only considers the occurrence of query terms.
This result highlights the importance of understanding the news
content, which means that keyword-based filtering is not applicable
to financial news. b) Ret performs better than RetQE in most
cases, which means that the benefit of query expansion is limited
in the financial event ranking problem. We postulate the reason
to be the temporal fluctuation of financial events, i.e., positive
events across different time-steps are not closely connected.
Among the classification models, ClaM performs better on the
metal and agriculture markets, while ClaS achieves better performance
on the chemical market. Recall that ClaM has separate
classification parameters for each type of query while ClaS shares
all model parameters across queries. The chemical dataset is an
unbalanced dataset with very few queries of MEG. We suspect
Session 1F: Applications 1
SIGIR ’21, July 11–15, 2021, Virtual Event, Canada
Metal
Method        MAP    MRR1   MRR3   Rec3   Rec5   Rec10
Ret           0.929  0.943  0.969  0.666  0.820  0.969
ClaM          0.924  0.925  0.959  0.666  0.835  0.978
HNR_RUBi      0.940  0.962  0.978  0.670  0.835  0.982
HNR_CNN       0.944  0.962  0.978  0.670  0.837  0.988

Agriculture
Method        MAP    MRR1   MRR3   Rec3   Rec5   Rec10
Ret           0.640  0.514  0.656  0.657  0.792  0.952
ClaM          0.637  0.542  0.650  0.630  0.778  0.952
HNR_RUBi      0.643  0.523  0.646  0.644  0.802  0.960
HNR_CNN       0.650  0.551  0.673  0.675  0.806  0.945

Chemical
Method        MAP    MRR1   MRR3   Rec3   Rec5   Rec10
Ret           0.592  0.514  0.662  0.561  0.611  0.833
ClaM          0.380  0.216  0.405  0.421  0.529  0.737
HNR_RUBi      0.427  0.270  0.441  0.401  0.566  0.809
HNR_CNN       0.512  0.459  0.545  0.400  0.627  0.834

Table 3: Performance of HNR on the three datasets. The best
and worst performances on each dataset w.r.t. each metric
are highlighted with bold font and underline, respectively.
that the inferior performance of ClaM on the chemical market is
caused by the unbalanced dataset and the insufficient training
on the rare class. Note that we need to choose between the two
classification models to build the HNR framework. In this work,
we simply select ClaM.
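As a reading aid for the metrics in Table 2, the following is a minimal sketch of recall at k and average precision for a single query, assuming binary relevance labels listed in ranked order. The function names and the toy ranking are hypothetical and are not taken from the paper's code. The last helper illustrates one plausible reason why Upper_Bound stays below 1.0 for Rec3 and Rec5: a query with more positive news than k cannot reach full recall within the top k, even with a perfect ranking.

```python
from typing import Sequence


def recall_at_k(labels: Sequence[int], k: int) -> float:
    """Fraction of a query's positive news that appears in the top-k positions."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    return sum(labels[:k]) / n_pos


def average_precision(labels: Sequence[int]) -> float:
    """Mean of precision values taken at the rank of each positive news."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_pos


def oracle_recall_at_k(n_pos: int, k: int) -> float:
    """Recall at k of a perfect ranking for a query with n_pos positive news."""
    return min(k, n_pos) / n_pos if n_pos else 0.0


# Hypothetical ranking for one query: 1 = positive news, 0 = negative news.
ranked = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(recall_at_k(ranked, 3), recall_at_k(ranked, 10))
print(average_precision(ranked))
print(oracle_recall_at_k(n_pos=5, k=3))
```

Averaging `average_precision` over all test queries gives MAP under these assumptions; the exact definitions used for MRR1 and MRR3 are not restated in this section, so they are left out of the sketch.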
5.3 Effectiveness of HNR (RQ2)
We then investigate the effectiveness of the proposed HNR framework.
In particular, we compare the ranking performance of four
versions of the proposed HNR: 1) HNR_CNN that applies the CNN
mixer (Equation 7); 2) HNR_RUBi that applies the RUBi mixer
(Equation 6); 3) HNR without the influence allocation module (i.e., Ret);
and 4) HNR without the influence quantification module (i.e., ClaM).
Table 3 shows the performance of the four HNR versions on the
three datasets. From the table, we have the following observations:
HNR_CNN performs better than HNR_RUBi in most cases, which
shows the advantages of the CNN mixer module. That is to say,
the query and news perspectives can be more accurately combined
by considering the local-region patterns (i.e., the inter-module
and inter-query connections).
In most cases, the hybrid models (i.e., HNR_RUBi and HNR_CNN)
achieve performance gain over the single models (i.e., Ret and
ClaM). In particular, across the cases, the hybrid models typically
perform the best and seldom perform the worst among the four
versions, which indicates that the hybrid models successfully
leverage the pros of both the quantification and the allocation
modules. Therefore, these results validate the effectiveness and
the rationality of the hybrid learning-to-rank framework in solving
financial event ranking.
On the chemical dataset, as compared to Ret, HNR_CNN performs
slightly better w.r.t. Rec5 and Rec10, while worse w.r.t. the
remaining metrics, especially Rec3. The result means that the
influence mixer has to sacrifice the ranking at the head (e.g., top
three) to recall more positive news on the chemical dataset. We
postulate the reason to be that the allocation module (i.e., ClaM)
performs much worse than the quantification module (i.e., Ret).
That is, the performance gap forces the mixer module to sacrifice
the head part. Recall that the inferior performance of ClaM might
be due to the imbalance of the chemical dataset. This result
thus suggests a potential future direction to enhance the HNR
framework by eliminating the impact of data imbalance.
5.4 In-depth Analysis (RQ3)
Figure 5: Visualization of the query relations recognized by
the influence allocation module.
Query relations. Recall that the parameters of the classification
layer in the influence allocation module are expected to capture the
query relations, i.e., homogeneous queries obtain close parameters.
As such, we calculate the cosine similarity between each pair of
parameters and depict the similarities in Figure 5. From the figure,
we can see that: 1) In the metal market, base metal and ferrous,
which are both widely used in industry applications, have the
highest similarity. Coal and ferrous also exhibit a high similarity since
coal is widely used in steel smelting. On the contrary, the smelting
of base metals relies on electricity, making their similarity with coal
very low. 2) In the agriculture market, soy oil and sugar obtain
high similarity since they are both used in the food industry. They
thus have less similarity to cotton, which is mainly used in the
textile industry. 3) In the chemical industry, MEG and PTA are
closely connected since they are both industrial raw materials and
mainly used together to produce polyester. On the contrary, rubber
is another kind of industrial material typically used in different
applications. To summarize, the results justify that the influence
allocation module indeed captures query relations, i.e., commodities
with larger overlap in their uses are closer in the allocation module.
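The pairwise comparison behind Figure 5 can be sketched as follows, assuming each query owns one parameter vector in the classification layer. The vectors, their dimensionality, and the query names `q1` to `q3` are made-up placeholders for illustration only; they are not values from the trained model.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical per-query classification parameters (placeholder numbers).
query_params: Dict[str, List[float]] = {
    "q1": [0.9, 0.1, 0.3],
    "q2": [0.8, 0.2, 0.4],
    "q3": [0.1, 0.9, 0.2],
}

# Similarity for every unordered pair of queries, as in a heat-map cell.
similarity: Dict[Tuple[str, str], float] = {
    (a, b): cosine(query_params[a], query_params[b])
    for a, b in combinations(query_params, 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

In this toy setup `q1` and `q2` point in nearly the same direction and therefore score close to 1, which is the pattern the paragraph above reads as "homogeneous queries obtain close parameters".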
Query-specific performance. We further investigate the effectiveness
of HNR in a query-specific manner by comparing HNR_RUBi
and HNR_CNN with Ret, i.e., the single influence quantification
module, over different queries. In particular, we select a pair of
homogeneous queries, base metal and ferrous, and a pair of
heterogeneous queries, soy oil and cotton, according to whether the
commodities have similar uses. Figure 6 shows the performance
of the compared methods w.r.t. MAP. To save space, we omit the
results w.r.t. the other metrics, which show similar trends. From
the figures, we can observe that: 1) On the homogeneous queries,
both HNR_RUBi and HNR_CNN show clear performance gain over
Ret, which is attributed to the ability of HNR to consider query
relations. That is, the evaluation of influence on a query might facilitate
the evaluation for a homogeneous query. 2) On the heterogeneous
queries, as compared to Ret, HNR performs better on one query but
worse on the other. This means that the benefit of HNR mainly
comes from accounting for the homogeneous query relations.
Figure 6: Performance of HNR and the quantification module
on (a) homogeneous queries and (b) heterogeneous queries.
Combination strategy. We then investigate the combination strategies
learned by the CNN mixer by observing the changes from the
ranking of Ret to the ranking of HNR_CNN. In particular, given
a query Q_t, we select the top-10 ranked news of Ret, extract the
corresponding ranking from HNR_CNN, and then study the position
changes of two types of news: 1) news for which the argmax of the
allocation module's output equals Q_t, which
means the allocation module allocates the most influence to the
target query Q_t; and 2) news for which the argmax differs from
Q_t, which means the most influence is allocated to a query other
than Q_t. We resort to the inversion number [ ] to quantitatively
analyze the position changes of a type of news. In particular, for
a type of news, we label them
with 1 and the remaining news with 0 in the rankings from Ret
and HNR_CNN, count the inversion number of each ranking, and
calculate the increase rate of the inversion number from Ret to
HNR_CNN. A positive increase rate means that HNR_CNN
moves the selected type of news to the top positions. Figure 7
illustrates the average increase rate over all testing queries in the
three datasets. From the figure, we can see that the increase rate of
the first type (i.e., argmax equal to Q_t) is positive, while the increase
rate of the second type (i.e., argmax different from Q_t) is negative. The
result means that the CNN mixer favors news with the maximum
score from the allocation module, which is a reasonable strategy to
combine the two modules.
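The inversion-number analysis can be sketched as below. This is a minimal illustration under two assumptions that the text does not spell out: an inversion is counted as a pair in which a news flagged 1 is ranked above a news flagged 0, and the increase rate is the relative change of that count. The function names and the toy flag sequences are hypothetical.

```python
from typing import List, Optional


def inversion_number(flags: List[int]) -> int:
    """Count pairs (i < j) with flags[i] == 1 and flags[j] == 0."""
    inversions = 0
    ones_seen = 0
    for flag in flags:
        if flag == 1:
            ones_seen += 1
        else:
            # Every 1 seen so far sits above this 0 and forms one inversion.
            inversions += ones_seen
    return inversions


def increase_rate(before: List[int], after: List[int]) -> Optional[float]:
    """Relative change of the inversion number; None if the baseline is zero."""
    inv_before = inversion_number(before)
    inv_after = inversion_number(after)
    if inv_before == 0:
        return None
    return (inv_after - inv_before) / inv_before


# Flags of the same five news in two rankings (toy example):
# the second ranking moves the flagged news to the top.
flags_ret = [0, 1, 0, 1, 0]
flags_hybrid = [1, 1, 0, 0, 0]
print(inversion_number(flags_ret), inversion_number(flags_hybrid))
print(increase_rate(flags_ret, flags_hybrid))
```

With this convention, pushing flagged news upward raises the count, so a positive rate matches the reading given in the paragraph above.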
Case study. We then conduct a qualitative analysis on the ranking
results generated by HNR_CNN. As a reference, we compare the
retrieval results from a widely used financial news search engine.
Note that we restrict the search engine to return news from either
Sina or China Finance Online, so that the search engine has the
same candidate set as HNR_CNN for fair comparison. In particular,
we set the query content and query date as “soy oil” and 2019-09-10,
respectively. Figure 8 shows the returned top-5 news of HNR_CNN
and the search engine. Note that the query has two positive events,
which are labeled as T in the figure. From the figure, we can see that
the top two retrieved results of the proposed HNR_CNN are exactly
the ground truth events. On the contrary, the search engine fails
to recall any positive news. The search engine apparently focuses
more on exact term matching between the query and the news
contents, overlooking the influential news without explicit mention
of the query terms. To summarize, this result further validates the
effectiveness of solving financial event ranking as a learning to
rank task through the proposed hybrid framework.
Figure 7: The increase rate of the inversion number from the
ranking of Ret to the ranking of HNR_CNN, where the news
is assigned a binary flag 1/0 according to the classification
result from the influence allocation module (i.e., the argmax of
its output). The blue column corresponds to assigning 1 to news
whose argmax equals Q_t. The yellow column corresponds to the
opposite operation that assigns 1 to news whose argmax differs
from Q_t.
Failure case analysis. To further shed light on the capability and
the weakness of the proposed HNR, we dive into the failure cases
of HNR_CNN on the three datasets. We define the failure cases
in a ranking list as the negative news that is ranked before any
positive news of the query. That is to say, the failure cases are news
articles that occupy the positions of the positive news. We summarize
the properties of the failure cases as follows:
For the metal and chemical datasets, the failure cases mainly
discuss agriculture in America or Brazil, such as the production
of soybean. We suspect that such failure cases are caused
by spurious correlations from two perspectives: 1) both America
and Brazil are key stakeholders in the metal and chemical
markets, which are frequently mentioned by the positive news
of metal and chemical queries; and 2) a large portion of positive
news in the metal and chemical datasets is about production
or export/import regulations, which are also frequently
discussed topics for agriculture commodities with very similar
textual structures.
As to the agriculture dataset, a large portion of the failure cases
are about the current situation of COVID-19. We postulate the
reason to be that the epidemic is a key influential factor of
agriculture production and global export and import. The
term “epidemic” frequently occurs in the positive news of agriculture
commodities. Therefore, the model pays strong attention to
this word and wrongly recognizes many epidemic-related news
as influential news.
Given these failure cases, we believe that it is essential to further
study the spurious correlation in financial event ranking in the future.
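The failure-case definition above can be sketched in a few lines. The sketch reads "ranked before any positive news" as "ranked ahead of at least one positive news of the query", which is an interpretation rather than something the text states explicitly; the function name and the news identifiers are hypothetical.

```python
from typing import List, Set


def failure_cases(ranked_ids: List[str], positives: Set[str]) -> List[str]:
    """Negative news ranked ahead of at least one positive news of the query."""
    # Position of the lowest-ranked positive news; -1 if none was retrieved.
    last_pos = max(
        (i for i, news_id in enumerate(ranked_ids) if news_id in positives),
        default=-1,
    )
    # Every negative above that position takes a slot a positive could hold.
    return [n for n in ranked_ids[: last_pos + 1] if n not in positives]


ranking = ["n1", "p1", "n2", "p2", "n3"]  # toy ranking for one query
print(failure_cases(ranking, {"p1", "p2"}))
```

Here `n3` is not counted because no positive news is ranked below it.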
Document retrieval. Neural ranking models have become promising
solutions for document retrieval tasks with the help of deep neural
networks and continuous word representations [ ]. After the
Figure 8: A case study showing the top-5 news titles returned by HNR_CNN and an existing search engine for the query “soy oil”
on 2019-09-10. The label T/F denotes whether the news is labeled as a positive one in the dataset.
emergence of the powerful deep Transformer models, fine-tuning
the pre-trained language models on target datasets achieves
state-of-the-art performances on various natural language understanding
tasks. Owing to their extremely competitive performances, many
researchers leverage pre-trained language models, such as BERT
[ ], for document retrieval. Nogueira and Cho take
the query sentence with the document as input and use BERT as a
binary classifier to calculate the matching score for each candidate
document of the query, reaching state-of-the-art performance. To
achieve a trade-off between the low efficiency and the outstanding
performance of BERT, ColBERT [ ] performs a quick “late interaction”
over the pre-computed representations of the queries and the
documents produced by BERT, thus accelerating the ranking and
retaining a competitive result. In addition, Zhan et al. [56] use the
inner product of BERT pre-computed contextual embeddings as the
initial retrieval score, which is the best first-step retrieval method.
Instead of truncating the whole document, Yilmaz et al. [55] estimate
the relevance between the query and each sentence of the document
and aggregate the sentence-level scores, which achieves
state-of-the-art performance on news retrieval test collections. Furthermore,
BERT is combined with useful retrieval techniques to pursue better
performance [ ], such as an alternative to query expansion [ ].
Despite the success of these Transformer-based models, they are
focused on the query perspective. This paper studies financial news
ranking where it is critical to combine the perspectives of both
query and document.
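The "late interaction" mentioned above can be illustrated with a small sketch of the MaxSim-style score used by ColBERT: every query token embedding is matched to its most similar document token embedding, and the maxima are summed. The sketch assumes the token embeddings are already computed and L2-normalized, so a dot product acts as cosine similarity; the two-dimensional vectors are toy values, and no real encoder is involved.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum over query tokens of the best match among document tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy, pre-computed token embeddings (placeholders for encoder outputs).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

print(late_interaction_score(query, doc_a))
print(late_interaction_score(query, doc_b))
```

Because document embeddings do not depend on the query, they can be computed offline, and only this cheap matching step runs at query time, which is where the efficiency gain described above comes from.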
Financial news analysis. As an important source of financial
information, financial news has received a surge of attention
from the research community, where the focus is to predict
the price movement of assets with consideration of the relevant
financial news [ ]. These works mainly explore neural
network architectures such as embedding [ ], recurrent
neural network [ ], and hierarchical attention [ ] to facilitate
asset price prediction. Another line of research focuses on the
sentiment analysis of financial news [ ], which mainly
extends the conventional techniques of sentiment analysis to
capture the properties of financial news, such as being number intensive.
In an orthogonal direction, this work studies a new task of financial
news analysis, i.e., financial event ranking, which can augment the
existing tasks by selecting the influential news as their inputs to
eliminate some potential noise. Beyond finance, event ranking has
been studied in medicine [ ]; however, that work focuses on the prediction
of future events rather than retrieving events that have already happened.
Hybrid learning to rank. A line of research has studied hybrid
models for learning to rank tasks, which is largely focused on
personalized recommendation [ ]. The target of recommendation
is to predict the user preference over items, which naturally
consists of two perspectives: the user and item perspectives.
Therefore, hybrid recommender systems are proposed to jointly consider
the two perspectives, combining item ranking and user targeting
in a single framework. Despite the success of these hybrid
learning to rank methods in recommendation, they are not applicable
to the financial event ranking task. This is because the existing
methods can only handle items with interaction records, whereas all
financial news are cold-start for queries. Lastly, research on
multi-modal retrieval [ ] can also consider different perspectives,
but is focused on the heterogeneity across different modalities.
In this work, we highlighted the importance of financial event ranking,
which is formulated as a learning to rank task. We explored
the central theme of financial event ranking from the model
perspective by proposing a Hybrid News Ranking framework, and
from the data perspective by building a labeling system and
constructing three large-scale datasets. We conducted extensive
experiments on the constructed datasets. The experimental results
validate the rationality and effectiveness of solving financial event
ranking through learning to rank. Moreover, the results justify
the capability of the influence allocation module to encode query
relations and the benefit of the hybrid learning to rank framework.
Lastly, the results point out the issue of spurious correlation in
financial document analysis, which is also faced in other domains.
In the future, we will consider tackling the negative transfer
in HNR. Moreover, we will explore techniques to bridge the
performance gap between the quantification and allocation modules.
We will also extend the financial news ranking solutions to serve
more languages. Lastly, we will explore solutions to spurious
correlation in financial document analysis.
Scott R Baker, Nicholas Bloom, Steven J Davis, Kyle Kost, Marco Sammon, and
Tasaneeya Viratyosin. 2020. The unprecedented stock market reaction to COVID-
19. The Review of Asset Pricing Studies 10, 4 (2020), 742–758.
Lila Boualili, Jose G Moreno, and Mohand Boughanem. 2020. MarkedBERT: Inte-
grating Traditional IR Cues in Pre-trained Language Models for Passage Retrieval.
In Proceedings of the 43rd International ACM SIGIR Conference on Research and
Development in Information Retrieval. 1977–1980.
Raymond M Brooks, Ajay Patel, and Tie Su. 2003. How the equity market
responds to unanticipated events. The Journal of Business 76, 1 (2003), 109–133.
Remi Cadene, Corentin Dancette, Matthieu Cord, Devi Parikh, et al. 2019. Rubi:
Reducing unimodal biases for visual question answering. In Advances in neural
information processing systems. 841–852.
Diego Ceccarelli, Francesco Nidito, and Miles Osborne. 2016. Ranking financial
tweets. In Proceedings of the 39th International ACM SIGIR conference on Research
and Development in Information Retrieval. 527–528.
Dawei Cheng, Fangzhou Yang, Xiaoyang Wang, Ying Zhang, and Liqing Zhang.
2020. Knowledge Graph-based Event Embedding Framework for Financial Quan-
titative Investments. In Proceedings of the 43rd International ACM SIGIR Conference
on Research and Development in Information Retrieval. 2221–2230.
W Bruce Croft. 2019. The Importance of Interaction for Information Retrieval..
In SIGIR, Vol. 19. 1–2.
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu.
2020. Revisiting Pre-Trained Models for Chinese Natural Language Processing.
arXiv preprint arXiv:2004.13922 (2020).
Zhuyun Dai and Jamie Callan. 2019. Deeper text understanding for IR with
contextual neural language modeling. In Proceedings of the 42nd International
ACM SIGIR Conference on Research and Development in Information Retrieval.
Xuan-Hong Dang, Syed Yousaf Shah, and Petros Zerfos. 2019. " The Squawk Bot":
Joint Learning of Time Series and Text Data Modalities for Automated Financial
Information Filtering. arXiv preprint arXiv:1912.10858 (2019).
Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z Pan, and Huajun
Chen. 2019. Knowledge-driven stock trend prediction and explanation via temporal
convolutional network. In Companion Proceedings of The 2019 World Wide
Web Conference. 678–685.
Ann Devitt and Khurshid Ahmad. 2007. Sentiment polarity identification in
financial news: A cohesion-based approach. In Proceedings of the 45th annual
meeting of the association of computational linguistics. 984–991.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT:
Pre-training of Deep Bidirectional Transformers for Language Understanding. In
NAACL-HLT. ACL, 4171–4186.
Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for
event-driven stock prediction. In Twenty-fourth international joint conference on
artificial intelligence.
Xin Du and Kumiko Tanaka-Ishii. 2020. Stock Embeddings Acquired from News
Articles and Price History, and an Application to Portfolio Optimization. In
Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics. 3353–3363.
Eugene F Fama. 1970. Efficient capital markets: A review of theory and empirical
work. The journal of Finance 25, 2 (1970), 383–417.
Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua.
2019. Temporal relational ranking for stock prediction. ACM Transactions on
Information Systems (TOIS) 37, 2 (2019), 1–30.
Suyu Ge, Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020. Graph
Enhanced Representation Learning for News Recommendation. In Proceedings of
The Web Conference 2020. 2863–2869.
Xianjing Han, Xuemeng Song, Yiyang Yao, Xin-Shun Xu, and Liqiang Nie. 2019.
Neural compatibility modeling with probabilistic knowledge distillation. IEEE
Transactions on Image Processing 29 (2019), 871–882.
Richang Hong, Lei Li, Junjie Cai, Dapeng Tao, Meng Wang, and Qi Tian. 2017.
Coherent semantic-visual indexing for large-scale image retrieval in the cloud.
IEEE Transactions on Image Processing 26, 9 (2017), 4128–4138.
Richang Hong, Yang Yang, Meng Wang, and Xian-Sheng Hua. 2015. Learning
visual semantic relationships for efficient visual retrieval. IEEE Transactions on
Big Data 1, 4 (2015), 152–161.
Jun Hu and Ping Li. 2018. Collaborative multi-objective ranking. In Proceedings of
the 27th ACM International Conference on Information and Knowledge Management.
Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening
to chaotic whispers: A deep learning framework for news-oriented stock trend
prediction. In Proceedings of the eleventh ACM international conference on web
search and data mining. 261–269.
Qiang Ji, Elie Bouri, Rangan Gupta, and David Roubaud. 2018. Network causality
structures among Bitcoin and other financial assets: A directed acyclic graph
approach. The Quarterly Review of Economics and Finance 70 (2018), 203–213.
Omar Khattab and Matei Zaharia. 2020. Colbert: Efficient and effective passage
search via contextualized late interaction over bert. In Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in Information
Retrieval. 39–48.
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised
Learning of Language Representations. In ICLR.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019.
Roberta: A robustly optimized bert pretraining approach. arXiv e-prints (2019).
Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization.
arXiv preprint arXiv:1711.05101 (2017).
Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He.
2018. Beyond polarity: interpretable financial sentiment analysis with hierarchical
query-driven attention. In Proceedings of the 27th International Joint Conference
on Artificial Intelligence. 4244–4250.
Ye Ma, Lu Zong, Yikang Yang, and Jionglong Su. 2019. News2vec: News network
embedding with subnode information. In Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing (EMNLP-IJCNLP). 4845–4854.
Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. 2019. CEDR:
Contextualized embeddings for document ranking. In Proceedings of the 42nd
International ACM SIGIR Conference on Research and Development in Information
Retrieval. 1101–1104.
Ping Nie, Yuyu Zhang, Xiubo Geng, Arun Ramamurthy, Le Song, and Daxin Jiang.
2020. DC-BERT: Decoupling Question and Document for Efficient Contextual
Encoding. In Proceedings of the 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval. 1829–1832.
Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT.
arXiv preprint arXiv:1901.04085 (2019).
Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin. 2019. Multi-stage
document ranking with BERT. arXiv preprint arXiv:1910.14424 (2019).
Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. 2019. Document
expansion by query prediction. arXiv preprint arXiv:1904.08375 (2019).
Rasaq Otunba, Raimi A Rufai, and Jessica Lin. 2017. Mpr: Multi-objective pairwise
ranking. In Proceedings of the Eleventh ACM Conference on Recommender Systems.
Zhi Qiao, Shiwan Zhao, Cao Xiao, Xiang Li, Yong Qin, and Fei Wang. 2018.
Pairwise-ranking based collaborative recurrent neural networks for clinical event
prediction. In Proceedings of the Twenty-Seventh International Joint Conference on
Artificial Intelligence.
Isaac Quaye, Yinping Mu, Braimah Abudu, Ramous Agyare, et al. 2016. Review
of Stock Markets’ Reaction to New Events: Evidence from Brexit. Journal of
financial risk management 5, 04 (2016), 281.
Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian
Stab, and Iryna Gurevych. 2019. Classification and Clustering of Arguments with
Contextualized Word Embeddings. In ACL. ACL, 567–578.
Pengjie Ren, Zhumin Chen, Zhaochun Ren, Evangelos Kanoulas, Christof Monz,
and Maarten de Rijke. 2021. Conversations with Search Engines: SERP-based
Conversational Response Generation. ACM Transactions on Information Systems
(TOIS) (2021).
Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zhumin Chen,
Zhaochun Ren, and Maarten de Rijke. 2021. Wizard of Search Engine: Access
to Information Through Conversations with Search Engines. In Proceedings of
the 44th International ACM SIGIR Conference on Research and Development in
Information Retrieval.
Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu,
Mike Gatford, et al. 1995. Okapi at TREC-3. Nist Special Publication Sp 109 (1995),
Dwaipayan Roy, Debjyoti Paul, Mandar Mitra, and Utpal Garain. 2016. Using
word embeddings for automatic query expansion. arXiv preprint arXiv:1606.07608
Caitlin Sadowski and Greg Levin. 2007. Simhash: Hash-based similarity detection.
Technical report, Google (2007).
Thomas Sattler. 2013. Do markets punish left governments? The Journal of
Politics 75, 2 (2013), 343–356.
Herbert A Simon. 1954. Spurious correlation: A causal interpretation. Journal of
the American statistical Association 49, 267 (1954), 467–479.
Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune bert
for text classification?. In China National Conference on Chinese Computational
Linguistics. Springer, 194–206.
Eric T Swanson. 2020. Measuring the effects of Federal Reserve forward guidance
and asset purchases on financial markets. Journal of Monetary Economics (2020).
Matthias W Uhl. 2014. Reuters sentiment and stock returns. Journal of Behavioral
Finance 15, 4 (2014), 287–298.
Manuel R Vargas, Carlos EM dos Anjos, Gustavo LG Bichara, and Alexandre G
Evsukoff. 2018. Deep learning for stock market prediction using technical indicators
and financial news articles. In 2018 International Joint Conference on Neural
Networks (IJCNN). IEEE, 1–8.
Jeffrey Scott Vitter and Philippe Flajolet. 1990. Average-case analysis of algorithms
and data structures. In Algorithms and Complexity. Elsevier, 431–524.
Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the
4th International Joint Conference on Natural Language Processing of the AFNLP.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue,
Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe
Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu,
Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest,
and Alexander M. Rush. 2019. HuggingFace’s Transformers: State-of-the-art
Natural Language Processing. ArXiv abs/1910.03771 (2019).
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov,
and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language
understanding. In NeurIPS. Curran Associates, Inc., 5754–5764.
Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, and Jimmy Lin. 2019.
Cross-domain modeling of sentence-level evidence for document retrieval. In
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Pro-
cessing and the 9th International Joint Conference on Natural Language Processing
(EMNLP-IJCNLP). 3481–3487.
Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, and Shaoping Ma. 2020. Rep-
BERT: Contextualized Text Embeddings for First-Stage Retrieval. arXiv preprint
arXiv:2006.15498 (2020).
Zhenyu Zhang and Juan Yang. 2018. Dual learning based multi-objective pairwise
ranking. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE,
... Event Extraction. Event extraction (EE) is a critical task in public opinion monitoring and financial field [8,13,16,33]. It mainly has two key methods, pipeline and joint model. ...
Full-text available
Previous studies about event-level sentiment analysis (SA) usually model the event as a topic, a category or target terms, while the structured arguments (e.g., subject, object, time and location) that have potential effects on the sentiment are not well studied. In this paper, we redefine the task as structured event-level SA and propose an End-to-End Event-level Sentiment Analysis ($\textit{E}^{3}\textit{SA}$) approach to solve this issue. Specifically, we explicitly extract and model the event structure information for enhancing event-level SA. Extensive experiments demonstrate the great advantages of our proposed approach over the state-of-the-art methods. Noting the lack of the dataset, we also release a large-scale real-world dataset with event arguments and sentiment labelling for promoting more researches\footnote{The dataset is available at}.
... The later targets to judge the quality of financial texts to facilitate selecting valuable posts from the large volume data stream. Existing methods mainly solve the problem as a regression task by applying conventional machine learning models [11,31]. For instance, Feng et al. ranks influential news articles to a commodity through document retrieval models. ...
Conference Paper
Full-text available
Financial texts (e.g., economic news) play an important role in predicting stock prices. The effects of texts of different semantics (e.g., launching a product and reporting a small product bug) last for different time horizons. Despite the importance of timing in stock prediction, there is currently no research that accounts for the time horizon of financial texts. Aiming to bridge this gap, we propose a new prediction solution, termed FreqNet, which explicitly associates texts with trading patterns in different frequencies. Equipped with such an association, FreqNet is able to adaptively infer the time horizon of each text without labeled data on time horizon, and aggregate the texts and trading patterns into better stock representations to facilitate price prediction. Extensive experiments on two datasets validate the effectiveness of FreqNet on time horizon-aware text modeling with improvements over state-of-the-art methods.
... Financial data (e.g., financial news, annual financial reports) play a critical role in improving the quality of financial services and minimizing the risks of financial activities (e.g., portfolio selection [43], stock trading strategy analysis [31], stock price movement prediction [14,36]). Existing studies report that financial data from Web media (e.g., financial news and discussion boards) have become increasingly salient for analyzing stock markets [27]. ...
The completeness (in terms of content) of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a huge amount of time carefully checking every financial document against the relevant content requirements, which prescribe the information types to be included in financial documents (e.g., the description of shares' issue conditions). Although several techniques have been proposed to automatically detect certain types of information in documents in various application domains, they provide limited support to help regulators automatically identify the text chunks related to financial information types, due to the complexity of financial documents and the diversity of the sentences characterizing an information type. In this paper, we propose FITI, an artificial intelligence (AI)-based method for tracing content requirements in financial documents. Given a new financial document, FITI selects a set of candidate sentences for efficient information type identification. Then, FITI uses a combination of rule-based and data-centric approaches, leveraging information retrieval (IR) and machine learning (ML) techniques that analyze the words, sentences, and contexts related to an information type, to rank candidate sentences. Finally, using a list of indicator phrases related to each information type, a heuristic-based selector, which considers both the sentence ranking and the domain-specific phrases, determines a list of sentences corresponding to each information type. We evaluated FITI by assessing its effectiveness in tracing financial content requirements in 100 financial documents. Experimental results show that FITI provides accurate identification with average precision and recall values of 0.824 and 0.646, respectively. Furthermore, FITI can detect about 80% of missing information types in financial documents.
In this article, we address the problem of answering complex information needs by conducting conversations with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs in search scenarios or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this article: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset. SaaC is built based on a multi-turn conversational search dataset, where we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator, which enables us to generate more accurate responses. We carry out experiments to show that CaSE is able to outperform strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset and the code for CaSE and all models used for comparison to facilitate future research on this topic.
The methods of Gürkaynak et al. (2005a) are extended to separately identify surprise changes in the federal funds rate, forward guidance, and large-scale asset purchases (LSAPs) for each FOMC announcement from July 1991 to June 2019. Forward guidance and LSAPs had substantial and highly statistically significant effects on Treasury yields, corporate bond yields, stock prices, and exchange rates, comparable in magnitude to the effects of the federal funds rate in normal times. These effects were all very persistent, with the exception of the very large and perhaps special March 2009 “QE1” announcement for LSAPs.
No previous infectious disease outbreak, including the Spanish Flu, has affected the stock market as forcefully as the COVID-19 pandemic. In fact, previous pandemics left only mild traces on the U.S. stock market. We use text-based methods to develop these points with respect to large daily stock market moves back to 1900 and with respect to overall stock market volatility back to 1985. We also evaluate potential explanations for the unprecedented stock market reaction to the COVID-19 pandemic. The evidence we amass suggests that government restrictions on commercial activity and voluntary social distancing, operating with powerful effects in a service-oriented economy, are the main reasons the U.S. stock market reacted so much more forcefully to COVID-19 than to previous pandemics in 1918–1919, 1957–1958, and 1968.