# Hybrid Learning to Rank for Financial Event Ranking

Fuli Feng12, Moxin Li2, Cheng Luo3, Ritchie Ng2, Tat-Seng Chua2
1Sea-NExT Joint Lab, 2National University of Singapore, 3MegaTech.AI
fulifeng93@gmail.com,limoxin@pku.edu.cn,luocheng@megatechai.com,ritchieng@u.nus.edu,dcscts@nus.edu.sg
ABSTRACT
The nancial markets are moved by events such as the issuance of
administrative orders. The participants in nancial markets (e.g.,
traders) thus pay constant attention to nancial news relevant to
the nancial asset (e.g., oil) of interest. Due to the large scale of
news stream, it is time and labor intensive to manually identify
inuential events that can move the price of the nancial asset,
pushing the nancial participants to embrace automatic nancial
event ranking, which has received relatively little scrutiny to date.
In this work, we formulate the nancial event ranking task,
which aims to score nancial news (document) according to its
inuence to the given asset (query). To solve this task, we propose
aHybrid News Ranking framework that, from the asset perspective,
evaluates the inuence of news articles by comparing their contents;
and from the event perspective, accesses the inuence over all query
assets. Moreover, we resolve the dilemma between the essential
requirement of sucient labels for training the framework and the
unaordable cost of hiring domain experts for labeling the news. In
particular, we design a cost-friendly system for news labeling that
leverages the knowledge within published nancial analyst reports.
In this way, we construct three nancial event ranking datasets.
Extensive experiments on the datasets validate the eectiveness
of the proposed framework and the rationality of solving nancial
event ranking through learning to rank.
CCS CONCEPTS
• Information systems → Document filtering; Information retrieval; Learning to rank; • Computing methodologies → Learning to rank.
KEYWORDS
learning to rank, document retrieval, finance
ACM Reference Format:
Fuli Feng, Moxin Li, Cheng Luo, Ritchie Ng, Tat-Seng Chua. 2021. Hybrid Learning to Rank for Financial Event Ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), July 11–15, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3404835.3462969
Corresponding author. This research is supported by the Sea-NExT Joint Lab.
1 INTRODUCTION

The efficient market theory [16] states that financial markets quickly impound publicly available information [3]. In other words, the prices of financial assets such as stocks and commodities¹ (see Figure 1(a) for an example) are quickly moved by financial events, especially unanticipated ones such as the outbreak of infectious disease [1], the declaration of electoral victory [45], and the announcement of government intervention [48]. For instance, the global stock markets lost about two trillion dollars in value within the 24 hours after the declaration of the Brexit results [38]. It is undoubtedly essential for financial participants such as traders and analysts to quickly assess and react to financial events with the potential to move the asset price. Consequently, identifying events influential to the asset of interest has become a heavy workload that burns tremendous time and energy of the financial participants due to the large volume of the news stream². Therefore, financial event³ ranking [5, 10] is an emergent requirement of great practical value. However, it has received relatively little scrutiny to date.

In this work, we formulate financial event ranking as a learning to rank task where the target asset is viewed as the query to retrieve the candidate news published within a lag until the query date (see Figure 1(b)). The key to solving this task lies in quantifying the influence on the query asset according to the contents of the news. Intuitively, several widely used information retrieval techniques can be applied to achieve the target, such as document classification [39], document retrieval [25], and news recommendation [18]. For instance, document retrieval models can learn the connection between the query and news contents from labeled query-news pairs.
The models can thus emphasize news mentioning "Brazil" for the query of ferrous, since Brazil is the largest exporter of iron ore and frequently occurs in the influential news of ferrous. However, the direct usage of existing methods is insufficient to solve the financial event ranking problem due to the lack of consideration of the properties of financial markets. For instance, due to the connection between "Brazil" and ferrous, an existing method will recognize "Brazil" as a feature of influential news. It will promote the score of news mentioning Brazil in queries other than ferrous, such as gold, where Brazil is not a key stakeholder, leading to improper rankings with false positive responses for the query of gold.

We argue that the key to bridging this gap lies in scrutinizing the influence from the perspectives of both asset and news. This is because a news article can simultaneously influence various financial assets due to their connections [17, 24]. Towards this end, we propose a Hybrid News Ranking (HNR) framework, which combines the asset and news perspectives. In particular, an influence quantification module evaluates the influence of financial events from the asset perspective by comparing their contents.

1 Note that the financial asset of a commodity is the corresponding future contract.
2 https://blog.gdeltproject.org/the-datasets-of-gdelt-as-of-february-2016/.
3 We interchangeably use news and event, which refer to the textual content of the news article or the textual description of the event.

Figure 1: Illustration of (a) the influence of financial events on the price movement of an example asset; and (b) the financial event ranking task. The candlesticks and bars represent the daily price movement and trading volume, respectively. The blue curve is the Dow Jones Industrial Average (DJI) index, which reflects the trend of the US stock market. Better viewed in colors.
From the query perspective, an influence allocation module assesses the influence of a news article across assets. The influence allocation module will align the influence scores of a news article on homogeneous queries (e.g., ferrous and coal) and distance the influence scores on heterogeneous queries (e.g., base metal and coal), which provides clues for eliminating the false positive responses. To ingeniously use such clues and combine the two perspectives, an influence mixer is carefully devised to learn integration strategies.

Labeled data are indispensable for the training of HNR, which largely relies on deep neural networks to encode the query and news [25, 39]. However, it is extremely resource consuming to label the influence of financial events due to the large number of candidates and the reliance on experienced but expensive financial experts. To resolve the dilemma, we build a labeling system to identify the positive news for each asset from the corresponding analyst reports, which are published periodically (e.g., daily) and written by domain experts. In particular, the system consists of: mention extraction, which extracts events mentioned in the analyst report, and mention-news matching, which matches the extracted event with news reporting the event. As both stages can be accomplished with the assistance of automatic algorithms or common crowd workers, we largely reduce the cost and construct three large-scale datasets. Extensive experiments on the three datasets validate the effectiveness of the proposed HNR and the rationality of solving financial event ranking via learning to rank. The datasets and code are released at: https://github.com/fulifeng/Financial_Event_Ranking.

The main contributions are summarized as follows:
• We formulate the problem of financial event ranking and propose a Hybrid News Ranking framework.
• We build a cost-friendly system to label positive news and construct three datasets for financial news ranking.
• We conduct extensive experiments that validate the rationality and effectiveness of our proposal.

2 PROBLEM FORMULATION

To achieve financial event ranking, the target is to learn a scoring function ŷ = f(D, Q | Θ), which predicts the influence of a candidate news D on a query Q. Θ denotes the parameters of the function to be learned. D is a list of word IDs that encodes the contents of the news. Q is also a list of word IDs that corresponds to the name of the query asset, e.g., "base metal"⁴. The financial event ranking task is different from conventional document retrieval [25] for the following reasons: 1) Queries with the same content (e.g., "base metal") at different time-steps are viewed as different queries, but belong to the same type. 2) The candidates to be ranked at different time-steps do not overlap with each other. 3) Our problem has fixed types of query instead of unlimited queries according to content. We denote the query types as a set Q, where |Q| is the set size.

Training. The scoring function should identify the key patterns of news contents that can distinguish influential news from common ones in a query-specific manner. A promising solution is to learn from labeled historical queries. We thus employ the supervised learning paradigm to optimize the parameters of the scoring function, which is formulated as:

    Θ̂ = arg min_Θ Σ_{(⟨Q,D⟩, y) ∈ L} l(y, ŷ) + α‖Θ‖,    (1)

where L denotes a set of labeled query-news pairs, y = 1 for influential news, and y = 0 for negative samples randomly selected from the remaining candidates. l(·) is a loss function such as the binary cross-entropy loss, and α adjusts the strength of regularization.

4 Technically, base metal includes four commodities: lead, copper, nickel, and zinc. In this work, we do not view a specific one of the four commodities as a query since they are typically discussed and analyzed as a group. Note that we can easily generalize to a specific commodity with its name as the query (e.g., "copper").
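To make Eq. (1) concrete, the following is a minimal, self-contained sketch of the pointwise objective: binary cross-entropy over labeled query-news pairs, uniform random negative sampling, and an L2 penalty. The helper names (`score_fn`, `sample_negatives`) are illustrative and not taken from the paper's released code.

```python
import math
import random

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy l(y, y_hat) from Eq. (1)."""
    return -(y * math.log(y_hat + eps) + (1 - y) * math.log(1 - y_hat + eps))

def sample_negatives(candidates, positives, k, seed=0):
    """y = 0 samples are drawn uniformly from the non-positive candidates."""
    pool = [c for c in candidates if c not in positives]
    return random.Random(seed).sample(pool, k)

def training_objective(labeled_pairs, score_fn, params, alpha=1e-4):
    """Sum of pointwise losses over labeled (Q, D, y) triples plus an
    L2 penalty alpha * ||Theta||^2 on the parameters."""
    data_loss = sum(bce_loss(y, score_fn(q, d)) for q, d, y in labeled_pairs)
    return data_loss + alpha * sum(p * p for p in params)
```

In practice the minimization over Θ is carried out by a gradient-based optimizer (the paper uses AdamW, per Section 5.1).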
Figure 2: Illustration of the proposed HNR framework, where queries in the same color (e.g., Q¹_t and Q²_t) are homogeneous ones, and queries in different colors (e.g., Q²_t and Q³_t) are heterogeneous ones.

Serving. As shown in Figure 1(b), the learned scoring function can serve each query Q_t at any time-step t. We add the subscript t to distinguish queries at different time-steps. Note that the query types at different time-steps (e.g., Q_{t-1} and Q_t) are equal. Let D_t denote the candidate set at time-step t, which consists of news published within a lag (e.g., [t − l, t]). Note that we remove duplicate news reporting the same event through SimHash [44] to reduce the size of the candidate set. Nevertheless, we still need to score thousands of news articles to select the top-K most influential ones for each query. Formally, the serving phase for query Q_t is:

    sort({ŷⁱ_t = f(Dⁱ_t, Q_t | Θ̂) | Dⁱ_t ∈ D_t}),    (2)

where sort(·) denotes a function that sorts the candidate news in descending order. Moreover, considering that the query set is fixed, the serving phase indeed performs one ranking for each query at the start of the time-step and serves all the coming queries.

3 METHODOLOGY

In this section, we introduce the proposed HNR framework. As shown in Figure 2, it consists of three modules: the influence quantification module (Section 3.1), the influence allocation module (Section 3.2), and the influence mixer (Section 3.3).

3.1 Influence Quantification Module

Our main consideration in devising the influence quantification module is to meticulously assess the influence of a news article on a given query according to their contents. To achieve the target, the key lies in mining the connections between the query description and the news content, which is coherent with the target of document retrieval [7, 25, 33].
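The serving phase of Eq. (2) reduces to scoring every deduplicated candidate and sorting in descending order; a minimal sketch, with `score_fn` standing in for the learned f(D, Q | Θ̂):

```python
def serve_query(query, candidates, score_fn, top_k=10):
    """Serving phase of Eq. (2): score every (deduplicated) candidate for
    the query, sort by predicted influence in descending order, keep top-K."""
    scored = sorted(candidates, key=lambda doc: score_fn(doc, query), reverse=True)
    return scored[:top_k]
```

Because the query set is fixed, this ranking needs to be computed only once per query type at the start of each time-step.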
As such, we devise the influence quantification module as a document retrieval model where the input is a concatenation of the query and the candidate news, [Q_t, Dⁱ_t], and the output is the influence prediction, i.e., the probability that Dⁱ_t is an influential event for Q_t. Formally,

    ȳⁱ_t = f_q([Q_t, Dⁱ_t] | Θ_q),    (3)

where Θ_q denotes the model parameters to be learned. Inspired by the huge success of pre-trained language models in document retrieval [2, 9, 25, 31–33], we devise f_q(·) based on a deep Transformer such as BERT [13, 26], XLNet [54], or RoBERTa [27], which is pre-trained over a large-scale corpus in a self-supervised manner to encode the co-occurrence of words. In particular, we follow the next sentence prediction paradigm [33] to format the query-news pair as [CLS, Q_t, SEP, Dⁱ_t]. As shown in Figure 3, the query and the candidate news are concatenated with a [CLS] token at the beginning and a [SEP] token for separation.

Figure 3: Illustration of the influence quantification module.

After passing through the deep Transformer, each token obtains a representation h which encodes the textual patterns. The representation of the [CLS] token, h_[CLS], is passed through a fully connected (FC) layer to estimate the probability that Dⁱ_t is a positive news article for query Q_t. Formally,

    ȳⁱ_t = f_q([CLS, Q_t, SEP, Dⁱ_t] | Θ_q).    (4)

We can rank the news for query Q_t according to the prediction ȳⁱ_t.

3.2 Influence Allocation Module

Financial research [17, 24] has demonstrated coupling effects across different assets. As such, the influences of a financial event on different assets are linked to each other rather than independent. We thus further devise an influence allocation module to evaluate the influence from the query perspective. Our main consideration for the module is to account for the connections among the queries in the evaluation of news influence. To achieve the target, the module is expected to consider the whole query set Q_t when evaluating the influence of a news article⁵.
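Structurally, the influence quantification module of Eqs. (3)-(4) is a cross-encoder: a Transformer reads [CLS, Q_t, SEP, Dⁱ_t] and an FC head on h_[CLS] emits the influence probability. The following PyTorch sketch uses a tiny randomly initialised encoder as a stand-in for the pre-trained RoBERTa the paper fine-tunes; the class and argument names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class InfluenceQuantifier(nn.Module):
    """Cross-encoder sketch of Eq. (4): encode [CLS, Q_t, SEP, D_t^i] with a
    Transformer and pass the [CLS] representation through an FC layer to get
    the probability that the news is influential for the query. A tiny
    randomly initialised encoder stands in for the pre-trained RoBERTa."""
    def __init__(self, vocab_size=1000, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.fc = nn.Linear(hidden, 1)  # binary influence head on h_[CLS]

    def forward(self, token_ids):       # (batch, seq_len); position 0 is [CLS]
        h = self.encoder(self.embed(token_ids))
        return torch.sigmoid(self.fc(h[:, 0])).squeeze(-1)
```

In the real system the token IDs come from the pre-trained model's tokenizer applied to the concatenated query and news text.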
We thus devise the influence allocation module as a |Q|-way classification module, which is formulated as:

    ỹⁱ_t = f_n(Dⁱ_t | Θ_n),    (5)

where ỹⁱ_t ∈ R^{|Q|} denotes the influence predictions over the query set. In this way, the news influence is evaluated by comparison across the queries. Again, inspired by the success of pre-trained language models in text classification, we also devise the influence allocation module based on a Transformer. In particular, the input is formatted as [CLS, Dⁱ_t], and the prediction is made through an FC layer from the representation of the [CLS] token.

It should be noted that the FC layer is parameterized by a mapping matrix W ∈ R^{|Q|×H}, where H denotes the dimensionality of the latent representation. In this way, there are separate parameters for the influence evaluation of each query, i.e., each row of W corresponds to a query. Accordingly, queries with close parameters will be allocated similar influence scores, and vice versa. Undoubtedly, the parameters of homogeneous queries will be pushed close to each other since the queries are influenced by similar news, even the same news sometimes. The heterogeneous queries will thus obtain parameters with relatively large distances. In this way, the influence allocation module is able to account for the homogeneous and heterogeneous relations between queries (cf. Figure 5).

Both the influence quantification module and the influence allocation module have pros and cons. In the quantification module (Equation 4), the same classification parameters are shared across all queries, which faces the spurious correlation issue [46]. The positive news of a query will be assigned an exaggerated score on a heterogeneous query that the news is insufficient to influence.

5 Note that the number of potential query types at different time-steps is always |Q|.
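A matching sketch of the influence allocation module (Eq. (5)): the news alone is encoded and a |Q|-way head scores all query types at once, with each row of the head's weight matrix playing the role of W's per-query parameters. As before, the tiny encoder is a placeholder for the pre-trained model, and the softmax output is our choice of activation for the sketch rather than a detail stated in the paper.

```python
import torch
import torch.nn as nn

class InfluenceAllocator(nn.Module):
    """Sketch of Eq. (5): the news alone ([CLS, D_t^i]) is encoded and a
    |Q|-way head W (one row per query type) predicts the influence over the
    whole query set, so homogeneous queries end up with nearby rows of W."""
    def __init__(self, num_queries=3, vocab_size=1000, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.W = nn.Linear(hidden, num_queries)  # W in R^{|Q| x H}

    def forward(self, token_ids):                # (batch, seq_len)
        h_cls = self.encoder(self.embed(token_ids))[:, 0]
        return torch.softmax(self.W(h_cls), dim=-1)  # scores over |Q| queries
```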
The allocation module can break some spurious correlations across queries owing to the query-specific parameters and the consideration of query relations. However, the allocation module ignores the query contents and may suffer from information loss.

3.3 Influence Mixer

A natural way to leverage the advantages of both modules is to aggregate their ranking scores. Inspired by the RUBi function [4], which is widely used for aggregating predictions, a straightforward mixer is formulated as:

    ŷⁱ_t = ȳⁱ_t ⊙ power(ỹⁱ_t, λ),    (6)

where ŷⁱ_t ∈ R^{|Q|} is the final prediction of news Dⁱ_t over all the queries; ȳⁱ_t ∈ R^{|Q|} denotes the scores from the quantification module, where we gather the output of Equation 4 across all queries; ⊙ denotes the element-wise product; and power(·) is an element-wise power function. λ is a hyper-parameter to balance the contributions of the two modules.

While this simple solution can achieve the target of combining the two perspectives, we consider that the mixer should account for the inter-module connections and inter-query connections within the scores. Inspired by the success of the Convolutional Neural Network (CNN) in recognizing local-region patterns, we further devise the influence mixer module as a CNN, which is formulated as:

    ŷⁱ_t = CNN(ȳⁱ_t, ỹⁱ_t).    (7)

In particular, the CNN consists of a stack layer, a column convolution layer, a row convolution layer, and an FC layer.

Stack layer. The stack layer stacks the outputs of the two modules as a matrix Yⁱ_t = [ȳⁱ_t; ỹⁱ_t] ∈ R^{2×|Q|}, which facilitates observing the local-region patterns.

Column convolution layer. The column convolution layer consists of 1D vertical filters to learn the rules for combining the two predictions. Formally,

    Cⁱ_t = ⟨F_c, Yⁱ_t⟩,    (8)

where F_c ∈ R^{K×2} denotes the filters of the convolution layer and Cⁱ_t ∈ R^{K×|Q|} denotes the recognized signals. K is a hyper-parameter to adjust the number of filters.
Row convolution layer. Similarly, this layer consists of 1D horizontal filters to recognize the inter-query patterns, which is formulated as

    Rⁱ_t = ⟨F_r, Cⁱ_t⟩,    (9)

where F_r ∈ R^{M×|Q|} are the M filters and Rⁱ_t ∈ R^{K×M} represents the recognized signals.

FC layer. The FC layer makes the final prediction by combining the recognized signals, which is formulated as

    ŷⁱ_t = flatten(Rⁱ_t) W + b,    (10)

where W ∈ R^{(K·M)×|Q|} and b ∈ R^{|Q|} are parameters to be learned, and flatten(·) flattens a matrix into a vector. In Figure 2(c), we depict the process of the CNN influence mixer with a simple example.

A key consideration for the training of the proposed HNR framework is saving memory and computation cost. Since both the quantification module and the allocation module consist of a deep Transformer, learning the parameters of the three modules in an end-to-end manner will double the memory cost, which poses higher requirements on the infrastructure and thus constrains the practical usage of HNR. To reduce the cost, a straightforward solution is sharing the Transformer across the two modules. Its disadvantages are twofold: 1) the training objective consists of three components, leading to huge overhead for hyper-parameter tuning; and 2) the two modules are coupled, making it hard for them to recognize complementary signals. Another solution is to train the three modules separately by optimizing Equation 1 over the labeled dataset L. Algorithm 1 illustrates the training of HNR.

Algorithm 1 Training of HNR.
Input: Training data L, types of query Q.
1: Train the influence quantification module over L;
2: Train the influence allocation module over L;
3: Collect the influence scores from the two modules;  ▷ model inference
4: Train the influence mixer with the collected influence scores as features.

4 DATASETS

The key to building an HNR is the construction of labeled datasets. That is, from the historical candidate set D_t, labeling the positive news for the queries Q_t.
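Assuming each module produces a |Q|-dimensional score vector per news item, the RUBi-style mixer of Eq. (6) and the CNN mixer of Eqs. (7)-(10) can be sketched in PyTorch as follows; the filter counts K and M are the hyper-parameters tuned in Section 5.1, and the ReLU activations are our assumption.

```python
import torch
import torch.nn as nn

def rubi_mix(y_quant, y_alloc, lam=1.0):
    """Eq. (6): element-wise product with a power-scaled allocation score."""
    return y_quant * y_alloc.pow(lam)

class InfluenceMixer(nn.Module):
    """Eqs. (7)-(10): stack the two score vectors into a 2 x |Q| matrix,
    apply K vertical (column) filters, then M horizontal (row) filters
    spanning all |Q| columns, flatten the K x M signal map, and project
    back to |Q| final scores."""
    def __init__(self, num_queries=3, k_filters=4, m_filters=4):
        super().__init__()
        self.col_conv = nn.Conv2d(1, k_filters, kernel_size=(2, 1))           # F_c
        self.row_conv = nn.Conv2d(1, m_filters, kernel_size=(1, num_queries)) # F_r
        self.fc = nn.Linear(k_filters * m_filters, num_queries)

    def forward(self, y_quant, y_alloc):               # each (batch, |Q|)
        y = torch.stack([y_quant, y_alloc], dim=1)     # stack layer: (batch, 2, |Q|)
        c = torch.relu(self.col_conv(y.unsqueeze(1)))  # (batch, K, 1, |Q|)
        c = c.squeeze(2).unsqueeze(1)                  # (batch, 1, K, |Q|)
        r = torch.relu(self.row_conv(c))               # (batch, M, K, 1) ~ R in R^{K x M}
        return self.fc(r.flatten(1))                   # Eq. (10): (batch, |Q|)
```

Per Algorithm 1, the mixer is trained on score vectors collected from the two frozen modules, so no Transformer gradients flow through it.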
The target is non-trivial to achieve because of: 1) the large size of D_t; and 2) the reliance on experienced analysts to evaluate the influence. Obviously, it is critical to resolve the reliance on domain expertise, which has indeed been encoded in the analyst reports written by the experts. Typically, on each trading day, we can find an analyst report for each query asset, which summarizes the market status and discusses the influential events⁶. As such, we can identify positive news for a query by extracting the events mentioned in the corresponding analyst report. In this way, we construct three datasets that correspond to the metal, agriculture, and chemical markets. As shown in Table 1, there are three types of queries in each dataset. These datasets have 340, 581, and 200 queries, respectively, which are of reasonable size compared to the widely used relevance judgement benchmark TREC⁷.

6 To write the analyst reports, the domain experts manually identify influential news. In other words, the target of HNR is to mimic the domain experts.
7 https://trec.nist.gov/data/reljudge_eng.html.

Table 1: Statistics of the three constructed datasets.

| Dataset     | #Queries | Types of query            | #Positive news per query | #Candidates per query |
|-------------|----------|---------------------------|--------------------------|-----------------------|
| Metal       | 340      | Base metal, Ferrous, Coal | 4.89                     | 2459.58               |
| Agriculture | 581      | Soy oil, Cotton, Sugar    | 2.51                     | 2390.89               |
| Chemical    | 200      | PTA, MEG, Rubber          | 2.53                     | 2414.18               |

Figure 4: Illustration of the news labeling procedure: (a) extracting event mentions from the analyst reports; and (b) matching mentions with candidate news.

4.1 Data Collection

Analyst reports. We select Hongyuan Futures Co., Ltd. as our source⁸ to collect historical analyst reports, since the reports are published in plain text instead of PDF files. As the company focuses on commodities, we select three commodity markets, metal, agriculture, and chemical, according to the popularity of the reports, i.e., the number of reads.
For each market, the analyst report is typically published on each trading day and is targeted at a group of commodities, e.g., base metal (copper, aluminium, etc.) and ferrous (screw thread steel and iron ore). From each market, we select three commodity groups, leading to three types of queries in each dataset. The names of the commodity groups (e.g., "base metal") are the queries (i.e., Q_t) on each trading day t.

Candidate news. To match the language of the collected analyst reports, we collect candidate news in Chinese from the largest portal websites in China for both financial news and commodity news⁹. We select the news posted within 48 hours before trading day t as the candidate news of query Q_t. We set 48 hours since the markets can quickly react to an event [3], which means that "old" news cannot influence the markets anymore. In the data collection period from 2018-09 to 2020-06, the size of the candidate set is around 2,400 for each query. In other words, given a query, our task is to select the top-K influential news from thousands of candidates.

4.2 Labelling Procedure

Figure 4 illustrates the procedure to identify the positive news for a query Q_t from the corresponding analyst report, which consists of two phases: mention extraction and mention-news matching.

Mention extraction. Analyst reports typically follow the same template for mentioning financial events. As such, we define a set of rules based on the section titles and HTML styles to extract event mentions from the collected analyst reports. On average, we extract 4.89, 2.51, and 2.55 positive events (cf. Table 1) for the queries in the metal, agriculture, and chemical datasets, respectively.

8 http://www.hongyuanqh.com/hyqhnew/hyyj/index.jsp?1=1&threeMenuid=00020001001500020001.
9 Sina: https://finance.sina.com.cn/, Chinese Finance Online: http://www.jrj.com.cn/.
Note that we do not directly merge the extracted mentions into the corresponding candidate set for two reasons: 1) it would lead to duplicate news, resulting in biased evaluation; and 2) the mentions are typically rephrased by the analyst with linguistic properties different from common news articles. Due to such discrepancy, a model trained on mentions would fail in practical usage where the candidates are all common news articles.

Mention-news matching. We thus match the event mention with its corresponding news, i.e., recognizing the news that reports the same event as the mention within the candidate set. To control the cost and quality, we perform the matching in two steps: 1) automatic matching, which evaluates the similarity between the mention and each candidate news; and 2) manual checking, where crowd workers check the top-3 most similar news articles to identify the positive one¹⁰. Note that checking whether two pieces of text describe the same event can be done without domain expertise, resolving the reliance on domain experts. As to the similarity evaluation algorithm, we leverage the public API provided by one of the largest search engines for Chinese news¹¹. In this way, we identify the positive news for more than 99.9% of the extracted event mentions and discard the remaining cases. By checking the contents of the identified positive news, we confirm that these news articles cover a wide spectrum of events affecting the supply and demand of the commodities, such as geopolitical events, government policies, company announcements, and strikes, indicating the challenge of these datasets.

5 EXPERIMENTS

We conduct experiments on the three constructed datasets to answer the following research questions:
RQ1: To what extent can learning to rank techniques solve the financial event ranking problem?
RQ2: How effective is the proposed HNR compared to existing document retrieval methods?
RQ3: What are the factors that influence the effectiveness of the proposed HNR?
5.1 Experiment Settings

Evaluation protocols. We chronologically split each dataset into training, validation, and testing with a ratio of 7:1:2. That is, the most recent 20 percent of queries are treated as testing cases. Following conventional document retrieval work [25], we adopt the evaluation metrics of MAP, MRR (MRR1 and MRR3), and Recall (Rec3, Rec5, and Rec10). We report the average performance over testing queries, where a larger value indicates better performance.

Compared methods. We compare the proposed HNR with advanced document retrieval methods, including:
• BM25 [42]: This method is still widely used for document retrieval. We use an open-source implementation¹² of the method.
• ColBERT [25]: A Transformer-based ranking method that encodes the query and the document separately with the Transformer and scores a query-document pair by the similarity (inner product) of their representations.
• Ret [33]: Also a Transformer-based document retrieval method, which concatenates the query and the document as input and uses a binary classification layer to calculate the matching score. It is the same as the influence quantification module in the proposed HNR.
• RetQE: An extension of Ret equipped with query expansion [43]. For a query Q_t, it extracts candidate terms from the positive news of Q_{t-1}, and adds the top-ranked terms into the input query according to their cosine similarity with the query. The expanded query is then fed into Ret during both model training and testing.

10 The manual checking ends at finding a positive news article or at the third round.
11 https://news.baidu.com/.
12 https://github.com/dorianbrown/rank_bm25.

We also include two advanced document classification baselines:
• ClaM [47]: It fine-tunes the pre-trained Transformer with an additional layer to classify the documents into different types of queries.
Documents are ranked by the probability of the class that corresponds to the type of the input query. It is the same as the influence allocation module in the proposed HNR.
• ClaS [47]: Similar to ClaM, this method has a classification layer that predicts whether a document is positive or not. That is to say, we fine-tune a pre-trained Transformer for each dataset, where documents are ranked according to the probability given by the Transformer.

Implementation details. Except for BM25, the compared methods are implemented with PyTorch 1.4.0¹³ based on HuggingFace's Transformers [53]. For pre-training, we use the checkpoint of Chinese RoBERTa with Whole Word Masking (named chinese-roberta-wwm-ext) released by [8]. For the training of the influence quantification module and the influence allocation module (i.e., fine-tuning RoBERTa), we set the maximum input length to 256 and update model parameters with AdamW [28]. We set the gradient accumulation steps to 2, gradient clipping to 2.0, the number of warmup steps to 100, the total training steps to 5,000, and the weight of the regularization term (i.e., α) to 0. The learning rate and batch size are selected according to the validation performance w.r.t. Rec10. As to the influence mixer, we set the coefficient of RUBi (i.e., λ) to 1, and tune the number of filters in the CNN.

5.2 Rationality of Learning to Rank (RQ1)

To validate the rationality of formulating financial event ranking as a learning to rank task, we first test the document classification and retrieval methods. Table 2 summarizes the ranking performance of the compared methods on the three datasets: metal, agriculture, and chemical. Note that Upper_Bound represents the performance of knowing the ground truth of the test queries, which can be seen as the performance of domain experts. From the table, we have the following observations:
• The performance of deep Transformer-based methods is surprisingly good on the three datasets.
For instance, the performance of ColBERT on metal w.r.t. Rec10 surpasses 0.982, which is very close to the upper bound of 0.994. This means that financial participants can access more than 98% of the influential events by reading only the 10 top-ranked news articles from ColBERT, saving them a tremendous amount of time and effort. The results thus validate the rationality and effectiveness of learning-to-rank solutions for financial event ranking.

^13 https://pytorch.org/get-started/previous-versions/#v140

**Table 2: Ranking performance of document classification and document retrieval models on the three datasets.**

*Metal*

| Method | MAP | MRR1 | MRR3 | Rec3 | Rec5 | Rec10 |
| --- | --- | --- | --- | --- | --- | --- |
| ClaM | 0.924 | 0.925 | 0.959 | 0.666 | 0.835 | 0.978 |
| ClaS | 0.903 | 0.906 | 0.943 | 0.635 | 0.812 | 0.964 |
| BM25 | 0.032 | 0.019 | 0.041 | 0.019 | 0.044 | 0.073 |
| ColBERT | 0.948 | 0.981 | 0.987 | 0.684 | 0.834 | 0.982 |
| Ret | 0.929 | 0.943 | 0.969 | 0.666 | 0.820 | 0.969 |
| RetQE | 0.923 | 0.906 | 0.953 | 0.653 | 0.833 | 0.977 |
| Upper_Bound | 1.0 | 1.0 | 1.0 | 0.718 | 0.877 | 0.994 |

*Agriculture*

| Method | MAP | MRR1 | MRR3 | Rec3 | Rec5 | Rec10 |
| --- | --- | --- | --- | --- | --- | --- |
| ClaM | 0.637 | 0.542 | 0.650 | 0.630 | 0.778 | 0.952 |
| ClaS | 0.550 | 0.402 | 0.545 | 0.554 | 0.729 | 0.878 |
| BM25 | 0.059 | 0.028 | 0.056 | 0.059 | 0.075 | 0.123 |
| ColBERT | 0.633 | 0.505 | 0.634 | 0.640 | 0.814 | 0.945 |
| Ret | 0.640 | 0.514 | 0.656 | 0.657 | 0.792 | 0.952 |
| RetQE | 0.633 | 0.523 | 0.623 | 0.622 | 0.767 | 0.946 |
| Upper_Bound | 1.0 | 1.0 | 1.0 | 0.944 | 0.992 | 1.0 |

*Chemical*

| Method | MAP | MRR1 | MRR3 | Rec3 | Rec5 | Rec10 |
| --- | --- | --- | --- | --- | --- | --- |
| ClaM | 0.380 | 0.216 | 0.405 | 0.421 | 0.529 | 0.737 |
| ClaS | 0.525 | 0.486 | 0.563 | 0.414 | 0.636 | 0.814 |
| BM25 | 0.077 | 0.0 | 0.018 | 0.029 | 0.089 | 0.291 |
| ColBERT | 0.492 | 0.351 | 0.473 | 0.399 | 0.609 | 0.802 |
| Ret | 0.592 | 0.514 | 0.662 | 0.561 | 0.611 | 0.833 |
| RetQE | 0.542 | 0.459 | 0.599 | 0.510 | 0.625 | 0.834 |
| Upper_Bound | 1.0 | 1.0 | 1.0 | 0.887 | 0.961 | 0.997 |

The retrieval models achieve performance comparable to the classification models. Across the six metrics, both types of models achieve the best performance in some cases. For instance, ClaM achieves the best Rec5 on the metal market, while ColBERT achieves the best Rec5 on the agriculture market.
These results reect that both types have their pros and cons, i.e., neither asset perspective nor the event perspective is sucient to solve nancial event ranking. As such, it is essential to build a hybrid solution to combine the two perspectives. Among the retrieval models, a) BM25 achieves the worst perfor- mance because it only considers the occurrence of query terms. This result highlights the importance of understanding the news content, which means that keyword-based ltering is not applica- ble for nancial news. b) Ret performs better than RetQE in most cases, which means that the benet of query expansion is limited in the nancial event ranking problem. We postulate the reason to be the temporal uctuation of the nancial events, i.e., positive events across dierent time-steps are not closely connected. Among the classication models, ClaM performs better on the metal and agriculture markets, while ClaS achieves better per- formance on the chemical market. Recall that ClaM has separate classication parameters for each type of query while ClaS shares all model parameters across queries. The chemical dataset is a unbalanced dataset with very few queries of MEG. We suspect Session 1F: Applications 1 SIGIR ’21, July 11–15, 2021, Virtual Event, Canada 238 Metal MethodMAP MRR1 MRR3 Rec3 Rec5 Rec10 Ret 0.929 0.943 0.969 0.666 0.820 0.969 ClaM 0.924 0.925 0.959 0.666 0.835 0.978 HNB_RUBi 0.940 0.962 0.978 0.670 0.835 0.982 HNB_CNN 0.944 0.962 0.978 0.670 0.837 0.988 Agriculture Ret 0.640 0.514 0.656 0.657 0.792 0.952 ClaM 0.637 0.542 0.650 0.630 0.778 0.952 HNB_RUBi 0.643 0.523 0.646 0.644 0.802 0.960 HNB_CNN 0.650 0.551 0.673 0.675 0.806 0.945 Chemical MethodMAP MRR1 MRR3 Rec3 Rec5 Rec10 Ret 0.592 0.514 0.662 0.561 0.611 0.833 ClaM 0.380 0.216 0.405 0.421 0.529 0.737 HNR_RUBi 0.427 0.270 0.441 0.401 0.566 0.809 HNR_CNN 0.512 0.459 0.545 0.400 0.627 0.834 Table 3: Performance of HNR on the three datasets. 
The best and worst performances on each dataset w.r.t. each metric are highlighted with bold font and underline, respectively. that the inferior performance of ClaM on the chemical market is caused by the unbalanced dataset and the insucient training on the rare class. Note that we need to choose between the two classication models to build the HNR framework. In this work, we simply select ClaM. 5.3 Eectiveness of HNR (RQ2) We then investigate the eectiveness of the proposed HNR frame- work. In particular, we compare the ranking performance of four versions of the proposed HNR: 1) HNR_CNN that applies the CNN mixer (Equation 7); 2) HNR_RUBi that applies the RUBi mixer (Equa- tion 6); 3) HNR without the inuence allocation module (i.e., Ret); and 4) HNR without the inuence quantication module (i.e., ClaM). Table 3 shows the performance of the four HNR versions on the three datasets. From the table, we have the following observations: HNB_CNN performs better than HNB_RUBi in most cases, which shows the advantages of the CNN mixer module. That is to say, the query and news perspectives can be more accurately com- bined by considering the local-region patterns (i.e., the inter- module and inter-query connections). In most cases, the hybrid models (i.e., HNB_RUBi and HNB_CNN) achieve performance gain over the single models (i.e., Ret and Clam). In particular, across the cases, the hybrid models typically perform the best and seldom perform the worst among the four versions, which indicates that the hybrid models successfully leverage the pros of both the quantication and the allocation modules. Therefore, these results validate the eectiveness and the rationality of the hybrid learning-to-rank framework in solv- ing nancial event ranking. On the chemical dataset, as compared to Ret, HNB_CNN per- forms slightly better w.r.t. Rec5 and Rec10, while worse w.r.t. the remaining metrics, especially Rec3. 
The result means that the inuence mixer has to sacrice the ranking at the head (e.g., top three) to recall more positive news on the chemical dataset. We postulate the reason to be that the allocation module (i.e., ClaM) performs much worse than the quantication module (i.e., Ret). That is, the performance gap forces the mixer module to sacrice the head part. Recall that the inferior performance of ClaM might because of the unbalance of the chemical dataset. This result thus suggests a potential future direction to enhance the HNR framework by eliminating the impact of data unbalance. 5.4 In-depth Analysis (RQ3) Figure 5: Visualization of the query relations recognized by the inuence allocation module. Query relations. Recall that the parameters of the classication layer in the inuence allocation module are expected to capture the query relations, i.e., homogeneous queries obtain close parameters. As such, we calculate the cosine similarity between each pair of parameters and depict the similarities in Figure 5. From the gure, we can see that: 1) in the metal market, base metal and ferrous have the highest similarity, which are both widely used in industry applications. Coal and ferrous also exhibit a high similarity since coal is widely used in steel smelting. On the contrary, the smelting of base metals relies on electricity, making their similarity with coal very low. 2) In the agriculture market, soy oil and sugar obtains high similarity since they are both used in the food industry. They thus have less similarity to cotton, which is mainly used in the textile industry. 3) In the chemical industry, MEG and PTA are closely connected since they are both industry raw materials and mainly used together to produce polyester. On the contrary, rubber is another kind of industry materials typically used in dierent applications. 
To summarize, the results justify that the influence allocation module indeed captures query relations, i.e., commodities with a larger overlap in their uses are closer in the allocation module.

**Query-specific performance.** We further investigate the effectiveness of HNR in a query-specific manner by comparing HNR_RUBi and HNR_CNN with Ret, i.e., the single influence quantification module, over different queries. In particular, we select a pair of homogeneous queries, base metal and ferrous, and a pair of heterogeneous queries, soy oil and cotton, according to whether the commodities have similar uses. Figure 6 shows the performance of the compared methods w.r.t. MAP. We omit the results w.r.t. the other metrics to save space, as they show similar trends. From the figures, we can observe that: 1) On the homogeneous queries, both HNR_RUBi and HNR_CNN show a clear performance gain over Ret, which is attributed to the ability of HNR to consider query relations. That is, the evaluation of influence on one query might facilitate the evaluation for a homogeneous query. 2) On the heterogeneous queries, as compared to Ret, HNR performs better on one query but worse on the other. This means that the benefit of HNR mainly comes from accounting for the homogeneous query relations.

Figure 6: Performance of HNR and the quantification module on (a) homogeneous queries and (b) heterogeneous queries.

**Combination strategy.** We then investigate the combination strategies learned by the CNN mixer by observing the changes from the ranking of Ret to the ranking of HNR_CNN.
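This analysis rests on the inversion number of a binary labeling of the ranked list; the following minimal sketch (function names ours, not from the paper's code) shows the bookkeeping under one natural reading of the definition: an inversion is a flagged item ranked above an unflagged one, so a positive increase rate means the flagged news moved toward the top.

```python
# Sketch of the inversion-number statistic behind Figure 7. A ranking is
# reduced to a 0/1 sequence, with 1 flagging the news type under study;
# here an inversion is a pair where a flagged (1) item is ranked above an
# unflagged (0) item, so more inversions means the flagged news sits
# higher in the list.

def inversions(flags):
    """Count pairs (i, j) with i < j, flags[i] == 1 and flags[j] == 0."""
    inv, ones = 0, 0
    for f in flags:
        if f == 1:
            ones += 1
        else:
            inv += ones
    return inv

def increase_rate(flags_ret, flags_hnr):
    """Relative change of the inversion number from Ret's ranking to
    HNR_CNN's; positive means the flagged news moved toward the top."""
    base = inversions(flags_ret)
    return (inversions(flags_hnr) - base) / base if base else 0.0
```

For example, moving the single flagged item of [0, 1, 0] to the first position yields an increase rate of 1.0.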
In particular, given a query Q_t, we select the top-10 ranked news of Ret, extract the corresponding ranking from HNR_CNN, and then study the position changes of two types of news: 1) news with argmax(ỹ_t^i) = Q_t, meaning the allocation module allocates the most influence to the target query Q_t; and 2) news with argmax(ỹ_t^i) ≠ Q_t, meaning the most influence is allocated to a query other than Q_t. We resort to the inversion number [51] to quantitatively analyze the position changes of a type of news. In particular, for a type of news, we label those news articles with 1 and the remaining news with 0 in the rankings from Ret and HNR_CNN, count the inversion number of each ranking, and calculate the increase rate of the inversion number from Ret to HNR_CNN. A positive increase rate means that HNR_CNN moves the selected type of news to the top positions. Figure 7 illustrates the average increase rate over all testing queries in the three datasets. From the figure, we can see that the increase rate of the first type (i.e., argmax(ỹ_t^i) = Q_t) is positive, while the increase rate of the second type (i.e., argmax(ỹ_t^i) ≠ Q_t) is negative. This means that the CNN mixer favors news that receives its maximum score from the allocation module on the target query, which is a reasonable strategy to combine the two modules.

**Case study.** We then conduct a qualitative analysis of the ranking results generated by HNR_CNN. As a reference, we compare against the retrieval results from a widely used financial news search engine^14. Note that we restrict the search engine to return news from either Sina or China Finance Online, so that the search engine has the same candidate set as HNR_CNN for a fair comparison. In particular, we set the query content and query date as "soy oil" and 2019-09-10, respectively. Figure 8 shows the returned top-5 news of HNR_CNN and the search engine. Note that the query has two positive events, which are labeled as T in the figure.
From the gure, we can see that the top two retrieved results of the proposed HNR_RNN are exactly the ground truth events. On the contrary, the search engine fails to recall any positive news. Obviously, the search engine focuses more on exact term matching between the query and the news contents, overlooking the inuential news without explicit mention of the query terms. To summarize, this result further validates the eectiveness of solving nancial event ranking as a learning to rank task through the proposed hybrid framework. 14https://news.baidu.com/. Figure 7: The increase rate of the inversion number from the ranking of Ret to the ranking of HNB_CNN, where the news is assigned a binary ag 1/0 according the classication re- sult from the inuence allocation module (i.e., arдmax(˜ yi t)). The blue column corresponds to assigning 1 to news satisfy- ing arдmax(˜ yi t)=Qt. The yellow corresponds to an opposite operation that assigns 1 to news with arдmax(˜ yi t),Qt. Failure case analysis. To further shed light on the capability and the weakness of the proposed HNR, we dive into the failure cases of HNR_CNN on the three datasets. We dene the failure cases in a ranking list as the negative news that is ranked before any positive news of the query. That is to say, the failure cases are news articles occupy opportunities of the positive news. We summarize the properties of the failure cases as follow: For the metal and chemical datasets, the failure cases are mainly discussing the agriculture in America or Brazil, such as the pro- duction of soybean. 
We suspect that such failure cases are caused by spurious correlations from two perspectives: 1) both America and Brazil are key stakeholders in the metal and chemical markets, and are frequently mentioned by the positive news of metal and chemical queries; and 2) a large portion of positive news in the metal and chemical datasets concerns production or export/import regulations, which are also frequently discussed topics for agriculture commodities, with very similar textual structures.

As to the agriculture dataset, a large portion of the failure cases are about the current situation of COVID-19. We postulate the reason to be that the epidemic is a key influential factor for agricultural production and global export and import. The term "epidemic" frequently occurs in the positive news of agriculture commodities. Therefore, the model pays strong attention to this word and wrongly recognizes many epidemic-related news articles as influential news.

Given these failure cases, we believe that it is essential to further study spurious correlations in financial event ranking in the future.

Figure 8: A case study showing the top-5 news titles returned by HNR_CNN and the existing search engine for the query "soy oil" on 2019-09-10. The label T/F denotes whether the news is labeled as a positive one in the dataset.

## 6 RELATED WORK

**Document retrieval.** Neural ranking models have become promising solutions for document retrieval tasks with the help of deep neural networks and continuous word representations [40, 41]. After the emergence of powerful deep Transformer models, fine-tuning pre-trained language models on target datasets achieves state-of-the-art performance on various natural language understanding tasks. Owing to their extremely competitive performance, many researchers leverage pre-trained language models, such as BERT [13], for document retrieval.
Nogueira and Cho [33] concatenate the query with the document as input and use BERT as a binary classifier to calculate the matching score for each candidate document of the query, reaching state-of-the-art performance. To trade off the low efficiency against the outstanding performance of BERT, ColBERT [25] performs a quick "late interaction" over the pre-computed representations of the queries and the documents produced by BERT, thus accelerating the ranking while retaining competitive results. In addition, Zhan et al. [56] use the inner product of BERT's pre-computed contextual embeddings as the initial retrieval score, which is the best first-stage retrieval method. Instead of truncating the whole document, Yilmaz et al. [55] judge the relevance between the query and each sentence of the document and aggregate the sentence-level scores, which achieves state-of-the-art performance on news retrieval test collections. Furthermore, BERT has been combined with useful retrieval techniques to pursue better performance [34], such as an alternative to query expansion [35]. Despite the success of these Transformer-based models, they focus on the query perspective. This paper studies financial news ranking, where it is critical to combine the perspectives of both query and document.

**Financial news analysis.** As an important source of financial information, financial news has received a surge of attention from the research community, where the focus is to predict the price movement of assets with consideration of the relevant financial news [14, 15, 23, 30, 50]. These works mainly explore neural network architectures such as embeddings [6, 11, 30], recurrent neural networks [14, 50], and hierarchical attention [23] to facilitate asset price prediction.
Another line of research focuses on the sentiment analysis of financial news [12, 29, 49, 52], which mainly extends conventional sentiment analysis techniques to capture properties of financial news, such as being number-intensive. In an orthogonal direction, this work studies a new financial news analysis task, i.e., financial event ranking, which can augment the existing tasks by selecting the influential news as their inputs to eliminate potential noise. Beyond finance, event ranking has been studied in medicine [37], which, however, focuses on the prediction of future events rather than the retrieval of events that have happened.

**Hybrid learning to rank.** A line of research has studied hybrid models for learning to rank tasks, largely focused on personalized recommendation [22, 36, 57]. The target of recommendation is to predict the user preference over items, which naturally consists of two perspectives: the user perspective and the item perspective. Therefore, hybrid recommender systems are proposed to jointly consider the two perspectives, combining item ranking and user targeting in a single framework. Despite the success of hybrid learning to rank in recommendation, these methods are not applicable to the financial event ranking task, because they can only handle items with interaction records, whereas all financial news articles are cold-start with respect to queries. Lastly, research on multimodal retrieval [19–21] can also consider different perspectives, but is focused on the heterogeneity across different modalities.

## 7 CONCLUSION

In this work, we highlighted the importance of financial event ranking, which we formulated as a learning to rank task. We explored the central theme of financial event ranking from the model perspective by proposing a Hybrid News Ranking framework, and from the data perspective by building a labeling system and constructing three large-scale datasets.
We conducted extensive experiments on the constructed datasets. The experimental results validate the rationality and effectiveness of solving financial event ranking through learning to rank. Moreover, the results justify the capability of the influence allocation module to encode query relations and the benefit of the hybrid learning-to-rank framework. Lastly, the results point out the issue of spurious correlation in financial document analysis, which is also faced in other domains.

In the future, we will consider tackling the negative transfer in HNR. Moreover, we will explore techniques to bridge the performance gap between the quantification and allocation modules. We will also extend the financial news ranking solutions to serve more languages. Lastly, we will explore solutions to spurious correlation in financial document analysis.

## REFERENCES

[1] Scott R Baker, Nicholas Bloom, Steven J Davis, Kyle Kost, Marco Sammon, and Tasaneeya Viratyosin. 2020. The unprecedented stock market reaction to COVID-19. The Review of Asset Pricing Studies 10, 4 (2020), 742–758.
[2] Lila Boualili, Jose G Moreno, and Mohand Boughanem. 2020. MarkedBERT: Integrating Traditional IR Cues in Pre-trained Language Models for Passage Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1977–1980.
[3] Raymond M Brooks, Ajay Patel, and Tie Su. 2003. How the equity market responds to unanticipated events. The Journal of Business 76, 1 (2003), 109–133.
[4] Remi Cadene, Corentin Dancette, Matthieu Cord, Devi Parikh, et al. 2019. RUBi: Reducing unimodal biases for visual question answering. In Advances in Neural Information Processing Systems. 841–852.
[5] Diego Ceccarelli, Francesco Nidito, and Miles Osborne. 2016. Ranking financial tweets.
In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 527–528.
[6] Dawei Cheng, Fangzhou Yang, Xiaoyang Wang, Ying Zhang, and Liqing Zhang. 2020. Knowledge Graph-based Event Embedding Framework for Financial Quantitative Investments. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2221–2230.
[7] W Bruce Croft. 2019. The Importance of Interaction for Information Retrieval. In SIGIR, Vol. 19. 1–2.
[8] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. 2020. Revisiting Pre-Trained Models for Chinese Natural Language Processing. arXiv preprint arXiv:2004.13922 (2020).
[9] Zhuyun Dai and Jamie Callan. 2019. Deeper text understanding for IR with contextual neural language modeling. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 985–988.
[10] Xuan-Hong Dang, Syed Yousaf Shah, and Petros Zerfos. 2019. "The Squawk Bot": Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering. arXiv preprint arXiv:1912.10858 (2019).
[11] Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z Pan, and Huajun Chen. 2019. Knowledge-driven stock trend prediction and explanation via temporal convolutional network. In Companion Proceedings of The 2019 World Wide Web Conference. 678–685.
[12] Ann Devitt and Khurshid Ahmad. 2007. Sentiment polarity identification in financial news: A cohesion-based approach. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 984–991.
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. ACL, 4171–4186.
[14] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction.
In Twenty-Fourth International Joint Conference on Artificial Intelligence.
[15] Xin Du and Kumiko Tanaka-Ishii. 2020. Stock Embeddings Acquired from News Articles and Price History, and an Application to Portfolio Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 3353–3363.
[16] Eugene F Fama. 1970. Efficient capital markets: A review of theory and empirical work. The Journal of Finance 25, 2 (1970), 383–417.
[17] Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua. 2019. Temporal relational ranking for stock prediction. ACM Transactions on Information Systems (TOIS) 37, 2 (2019), 1–30.
[18] Suyu Ge, Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020. Graph Enhanced Representation Learning for News Recommendation. In Proceedings of The Web Conference 2020. 2863–2869.
[19] Xianjing Han, Xuemeng Song, Yiyang Yao, Xin-Shun Xu, and Liqiang Nie. 2019. Neural compatibility modeling with probabilistic knowledge distillation. IEEE Transactions on Image Processing 29 (2019), 871–882.
[20] Richang Hong, Lei Li, Junjie Cai, Dapeng Tao, Meng Wang, and Qi Tian. 2017. Coherent semantic-visual indexing for large-scale image retrieval in the cloud. IEEE Transactions on Image Processing 26, 9 (2017), 4128–4138.
[21] Richang Hong, Yang Yang, Meng Wang, and Xian-Sheng Hua. 2015. Learning visual semantic relationships for efficient visual retrieval. IEEE Transactions on Big Data 1, 4 (2015), 152–161.
[22] Jun Hu and Ping Li. 2018. Collaborative multi-objective ranking. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1363–1372.
[23] Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 261–269.
[24] Qiang Ji, Elie Bouri, Rangan Gupta, and David Roubaud.
2018. Network causality structures among Bitcoin and other financial assets: A directed acyclic graph approach. The Quarterly Review of Economics and Finance 70 (2018), 203–213.
[25] Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 39–48.
[26] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In ICLR.
[27] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv e-prints (2019). arXiv:1907.11692
[28] Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
[29] Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond polarity: interpretable financial sentiment analysis with hierarchical query-driven attention. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 4244–4250.
[30] Ye Ma, Lu Zong, Yikang Yang, and Jionglong Su. 2019. News2vec: News network embedding with subnode information. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 4845–4854.
[31] Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian. 2019. CEDR: Contextualized embeddings for document ranking. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1101–1104.
[32] Ping Nie, Yuyu Zhang, Xiubo Geng, Arun Ramamurthy, Le Song, and Daxin Jiang. 2020.
DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1829–1832.
[33] Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. arXiv preprint arXiv:1901.04085 (2019).
[34] Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, and Jimmy Lin. 2019. Multi-stage document ranking with BERT. arXiv preprint arXiv:1910.14424 (2019).
[35] Rodrigo Nogueira, Wei Yang, Jimmy Lin, and Kyunghyun Cho. 2019. Document expansion by query prediction. arXiv preprint arXiv:1904.08375 (2019).
[36] Rasaq Otunba, Raimi A Rufai, and Jessica Lin. 2017. MPR: Multi-objective pairwise ranking. In Proceedings of the Eleventh ACM Conference on Recommender Systems. 170–178.
[37] Zhi Qiao, Shiwan Zhao, Cao Xiao, Xiang Li, Yong Qin, and Fei Wang. 2018. Pairwise-ranking based collaborative recurrent neural networks for clinical event prediction. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence.
[38] Isaac Quaye, Yinping Mu, Braimah Abudu, Ramous Agyare, et al. 2016. Review of Stock Markets' Reaction to New Events: Evidence from Brexit. Journal of Financial Risk Management 5, 04 (2016), 281.
[39] Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, and Iryna Gurevych. 2019. Classification and Clustering of Arguments with Contextualized Word Embeddings. In ACL. ACL, 567–578.
[40] Pengjie Ren, Zhumin Chen, Zhaochun Ren, Evangelos Kanoulas, Christof Monz, and Maarten de Rijke. 2021. Conversations with Search Engines: SERP-based Conversational Response Generation. ACM Transactions on Information Systems (TOIS) (2021).
[41] Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zhumin Chen, Zhaochun Ren, and Maarten de Rijke. 2021. Wizard of Search Engine: Access to Information Through Conversations with Search Engines.
In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[42] Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP 109 (1995), 109.
[43] Dwaipayan Roy, Debjyoti Paul, Mandar Mitra, and Utpal Garain. 2016. Using word embeddings for automatic query expansion. arXiv preprint arXiv:1606.07608 (2016).
[44] Caitlin Sadowski and Greg Levin. 2007. SimHash: Hash-based similarity detection. Technical report, Google (2007).
[45] Thomas Sattler. 2013. Do markets punish left governments? The Journal of Politics 75, 2 (2013), 343–356.
[46] Herbert A Simon. 1954. Spurious correlation: A causal interpretation. Journal of the American Statistical Association 49, 267 (1954), 467–479.
[47] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? In China National Conference on Chinese Computational Linguistics. Springer, 194–206.
[48] Eric T Swanson. 2020. Measuring the effects of Federal Reserve forward guidance and asset purchases on financial markets. Journal of Monetary Economics (2020).
[49] Matthias W Uhl. 2014. Reuters sentiment and stock returns. Journal of Behavioral Finance 15, 4 (2014), 287–298.
[50] Manuel R Vargas, Carlos EM dos Anjos, Gustavo LG Bichara, and Alexandre G Evsukoff. 2018. Deep learning for stock market prediction using technical indicators and financial news articles. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[51] Jeffrey Scott Vitter and Philippe Flajolet. 1990. Average-case analysis of algorithms and data structures. In Algorithms and Complexity. Elsevier, 431–524.
[52] Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification.
In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 235–243.
[53] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2019. HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv abs/1910.03771 (2019).
[54] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In NeurIPS. Curran Associates, Inc., 5754–5764.
[55] Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, and Jimmy Lin. 2019. Cross-domain modeling of sentence-level evidence for document retrieval. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 3481–3487.
[56] Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, and Shaoping Ma. 2020. RepBERT: Contextualized Text Embeddings for First-Stage Retrieval. arXiv preprint arXiv:2006.15498 (2020).
[57] Zhenyu Zhang and Juan Yang. 2018. Dual learning based multi-objective pairwise ranking. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–7.
Preprint
Full-text available
Previous studies on event-level sentiment analysis (SA) usually model the event as a topic, a category, or target terms, while the structured arguments (e.g., subject, object, time and location) that potentially affect the sentiment are not well studied. In this paper, we redefine the task as structured event-level SA and propose an End-to-End Event-level Sentiment Analysis (E³SA) approach to solve this issue. Specifically, we explicitly extract and model the event structure information to enhance event-level SA. Extensive experiments demonstrate the great advantages of our proposed approach over the state-of-the-art methods. Noting the lack of a suitable dataset, we also release a large-scale real-world dataset with event arguments and sentiment labelling to promote further research (the dataset is available at https://github.com/zhangqi-here/E3SA).
... The latter aims to judge the quality of financial texts so as to select valuable posts from the large-volume data stream. Existing methods mainly treat the problem as a regression task, applying conventional machine learning models [11,31]. For instance, Feng et al. rank news articles by their influence on a commodity through document retrieval models. ...
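The document-retrieval framing mentioned here, treating an asset as the query and news articles as the documents, can be illustrated with a classic BM25 scorer. This is a minimal stdlib-only sketch; the toy query and headlines are invented for illustration, and this is not the actual model from the cited work.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency and inverse document frequency per query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    idf = {t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        s = 0.0
        for t in query_terms:
            if tf[t]:
                s += idf[t] * tf[t] * (k1 + 1) / (
                    tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# toy "asset query" and news headlines (illustrative only)
query = "crude oil supply".split()
news = [
    "opec agrees to cut crude oil output".split(),
    "tech giant unveils new smartphone model".split(),
    "oil inventories rise as crude demand weakens".split(),
]
scores = bm25_scores(query, news)
# rank headlines by relevance to the asset, most relevant first
ranking = sorted(range(len(news)), key=scores.__getitem__, reverse=True)
```

Under this framing, the unrelated smartphone headline shares no terms with the query and scores zero, so it ranks last.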
Conference Paper
Full-text available
Financial texts (e.g., economic news) play an important role in predicting stock prices. The effects of texts of different semantics (e.g., launching a product and reporting a small product bug) last for different time horizons. Despite the importance of timing in stock prediction, there is currently no research that accounts for the time horizon of financial texts. Aiming to bridge this gap, we propose a new prediction solution, termed FreqNet, which explicitly associates texts with trading patterns in different frequencies. Equipped with such an association, FreqNet is able to adaptively infer the time horizon of each text without labeled data on time horizon, and aggregate the texts and trading patterns into better stock representations to facilitate price prediction. Extensive experiments on two datasets validate the effectiveness of FreqNet on time horizon-aware text modeling with improvements over state-of-the-art methods.
... Financial data (e.g., financial news, annual financial reports) play a critical role in improving the quality of financial services and minimizing the risks of financial activities (e.g., portfolio selection [43], stock trading strategy analysis [31], stock price movements prediction [14,36]). Existing studies report that financial data from Web media (e.g., financial news and discussion boards) has become increasingly salient for analyzing stock markets [27]. ...
Preprint
Full-text available
The completeness (in terms of content) of financial documents is a fundamental requirement for investment funds. To ensure completeness, financial regulators spend a huge amount of time carefully checking every financial document against the relevant content requirements, which prescribe the information types to be included in financial documents (e.g., the description of shares' issue conditions). Although several techniques have been proposed to automatically detect certain types of information in documents in various application domains, they provide limited support to help regulators automatically identify the text chunks related to financial information types, due to the complexity of financial documents and the diversity of the sentences characterizing an information type. In this paper, we propose FITI, an artificial intelligence (AI)-based method for tracing content requirements in financial documents. Given a new financial document, FITI selects a set of candidate sentences for efficient information type identification. Then, FITI uses a combination of rule-based and data-centric approaches, leveraging information retrieval (IR) and machine learning (ML) techniques that analyze the words, sentences, and contexts related to an information type, to rank candidate sentences. Finally, using a list of indicator phrases related to each information type, a heuristic-based selector, which considers both the sentence ranking and the domain-specific phrases, determines a list of sentences corresponding to each information type. We evaluated FITI by assessing its effectiveness in tracing financial content requirements in 100 financial documents. Experimental results show that FITI provides accurate identification with average precision and recall values of 0.824 and 0.646, respectively. Furthermore, FITI can detect about 80% of missing information types in financial documents.
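The final stage described above combines a sentence ranking with domain-specific indicator phrases. A minimal sketch of such a heuristic selector follows; the function name, threshold, and sample data are illustrative assumptions, not FITI's actual implementation.

```python
def select_sentences(sentences, scores, indicator_phrases, top_k=3, threshold=0.5):
    """Heuristic selector sketch: always keep sentences containing a
    domain-specific indicator phrase, then fill up to top_k slots with
    high-scoring candidates from the ranking."""
    selected = set()
    # rule-based pass: an indicator phrase is treated as strong evidence
    for i, s in enumerate(sentences):
        if any(p in s.lower() for p in indicator_phrases):
            selected.add(i)
    # ranking pass: add remaining candidates in descending score order
    for i in sorted(range(len(sentences)), key=scores.__getitem__, reverse=True):
        if len(selected) >= top_k:
            break
        if scores[i] >= threshold:
            selected.add(i)
    return sorted(selected)

# hypothetical candidate sentences with ranking scores
sentences = [
    "The shares may only be issued under the conditions described below.",
    "The fund invests primarily in European equities.",
    "The annual meeting was held in Luxembourg.",
]
scores = [0.9, 0.6, 0.1]
indicators = ["issued under the conditions"]
picked = select_sentences(sentences, scores, indicators)
```

The first sentence is kept by the phrase rule regardless of its score, the second passes the score threshold, and the low-scoring third is dropped.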
Article
In this article, we address the problem of answering complex information needs by conducting conversations with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs in search scenarios or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this article: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of a state-of-the-art pipeline for conversations with search engines, Conversations with Search Engines (CaSE), using this dataset. SaaC is built based on a multi-turn conversational search dataset, where we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator, which enables us to generate more accurate responses. We carry out experiments to show that CaSE is able to outperform strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset and the code for CaSE and all models used for comparison to facilitate future research on this topic.
Article
The methods of Gürkaynak et al. (2005a) are extended to separately identify surprise changes in the federal funds rate, forward guidance, and large-scale asset purchases (LSAPs) for each FOMC announcement from July 1991 to June 2019. Forward guidance and LSAPs had substantial and highly statistically significant effects on Treasury yields, corporate bond yields, stock prices, and exchange rates, comparable in magnitude to the effects of the federal funds rate in normal times. These effects were all very persistent, with the exception of the very large and perhaps special March 2009 “QE1” announcement for LSAPs.
Article
No previous infectious disease outbreak, including the Spanish Flu, has affected the stock market as forcefully as the COVID-19 pandemic. In fact, previous pandemics left only mild traces on the U.S. stock market. We use text-based methods to develop these points with respect to large daily stock market moves back to 1900 and with respect to overall stock market volatility back to 1985. We also evaluate potential explanations for the unprecedented stock market reaction to the COVID-19 pandemic. The evidence we amass suggests that government restrictions on commercial activity and voluntary social distancing, operating with powerful effects in a service-oriented economy, are the main reasons the U.S. stock market reacted so much more forcefully to COVID-19 than to previous pandemics in 1918–1919, 1957–1958, and 1968.