Figure 1 - available via license: Creative Commons Attribution 4.0 International
A) Statistics on the queries and passages of the MS-MARCO corpus. Queries are usually phrased as questions or short phrases and average 6 words, whereas passages are mostly declarative paragraphs averaging 56 words. B) Example queries and passages from the MS-MARCO corpus.
Source publication
Passage retrieval aims to retrieve relevant passages from large collections of the open-domain corpus. Contextual Masked Auto-Encoding has been proven effective in representation bottleneck pre-training of a monolithic dual-encoder for passage retrieval. Siamese or fully separated dual-encoders are often adopted as basic retrieval architecture in t...
Context in source publication
Context 1
... often adopts Siamese or fully separated manners for encoding queries and passages into their latent embedding spaces. We statistically analyze the average length and syntax of the query and passage sets (Figure 1) in the MS-MARCO (Nguyen et al., 2016) collections, a large real-world web search dataset. The statistics show significant differences between the traits of queries and passages. ...
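The length comparison described above can be illustrated with a minimal sketch. It assumes that "length" means the number of whitespace-separated words, which the source does not specify. The helper name `average_word_count` and the toy texts are hypothetical and are not drawn from MS-MARCO.

```python
from statistics import mean


def average_word_count(texts):
    """Return the mean number of whitespace-separated words per text."""
    if not texts:
        raise ValueError("texts must be non-empty")
    return mean(len(text.split()) for text in texts)


# Hypothetical toy data for illustration only; not taken from MS-MARCO.
queries = [
    "what is a dual encoder",
    "average passage length ms marco",
]
passages = [
    "A dual encoder maps queries and passages into a shared embedding space.",
    "Passages in web search collections are usually declarative paragraphs.",
]

query_avg = average_word_count(queries)
passage_avg = average_word_count(passages)
print(f"queries: {query_avg:.1f} words, passages: {passage_avg:.1f} words")
```

Running the same function over the real query and passage files would show whether a gap of the kind reported in Figure 1 appears, though a different tokenization would shift the exact averages.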
Similar publications
Models based on the extended SU(3) C × SU(3) L × U(1) X (331) gauge group usually follow a common pattern: two families of left-handed quarks are placed in anti-triplet representations of the SU(3) L group; the remaining quark family, as well as the left-handed leptons, are assigned to triplets (or vice-versa). In this work we present a flipped 331...
Person Re-Identification (ReID) aims to retrieve relevant individuals in non-overlapping camera images and has a wide range of applications in the field of public safety. In recent years, with the development of Vision Transformer (ViT) and self-supervised learning techniques, the performance of person ReID based on self-supervised pre-training has...
We propose an approach to referring expression generation (REG) in visually grounded dialogue that is meant to produce referring expressions (REs) that are both discriminative and discourse-appropriate. Our method constitutes a two-stage process. First, we model REG as a text- and image-conditioned next-token prediction task. REs are autoregressive...
Due to the rare occurrence of anomalous events, a typical approach to anomaly detection is to train an autoencoder (AE) with normal data only so that it learns the patterns or representations of the normal training data. At test time, the trained AE is expected to well reconstruct normal but to poorly reconstruct anomalous data. However, contrary t...