Conference Paper

Long-context Transformers: A survey

No full-text available; the PDF can be requested directly from the authors.

... In this section, we cover the key notions of modeling transformations using neural networks [4], [5], [6], [10], [12], [13], learning and analyzing transformation-invariant representations [1], [2], [7], [11], and how deep learning has ushered in a new era of machine learning outside of computer vision [3], [8], [9]. ...
... This makes Transformers inefficient when working with long sequences. Therefore, many researchers have turned to solving this problem optimally using stochastic and heuristic approaches [13]. ...
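For context (this illustration is not part of the cited work): standard dense self-attention materializes an n × n score matrix, so compute and memory grow quadratically with sequence length n. A minimal sketch of the memory footprint, assuming float32 scores and a single attention head:

bytes_per_float = 4  # float32
for n in (512, 4096, 32768):
    # One n x n attention score matrix per head
    score_matrix_bytes = n * n * bytes_per_float
    print(f"n={n:6d}: score matrix ~ {score_matrix_bytes / 2**20:.1f} MiB per head")

This quadratic growth is the bottleneck that the sparse, low-rank, and kernelized attention variants listed in the references below aim to reduce.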
Chapter
Full-text available
The future in which autonomous vehicles will become commonplace is very near. The biggest companies around the world are testing autonomous vehicles. But despite the rapid progress in this area, there are still unresolved technical problems that prevent the spread of self-driving vehicles. As a result, we are still quite far from “self-driving” cars, even though marketing suggests otherwise. Of course, driver assistance technologies capable of keeping the lane, braking, and following the road rules (under human supervision) are entering the market thanks to Tesla. Nowadays, there is a problem: an autonomous car may suddenly stop. Some colors cause panic in self-driving vehicles, becoming a safety threat. This research paper proposes a deep-neural-network model that identifies color combinations to prevent panic in autonomous cars by combining outputs from event-based cameras. Finally, we show the advantages of using event-based vision and that this approach outperforms algorithms based on standard cameras.
Article
Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck, as one needs to store the activations in order to calculate gradients using backpropagation. We present the Reversible Residual Network (RevNet), a variant of ResNets where each layer's activations can be reconstructed exactly from the next layer's. Therefore, the activations for most layers need not be stored in memory during backpropagation. We demonstrate the effectiveness of RevNets on CIFAR-10, CIFAR-100, and ImageNet, establishing nearly identical classification accuracy to equally-sized ResNets, even though the activation storage requirements are independent of depth.
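The reversibility described above can be made concrete with a minimal sketch (illustrative stand-in functions, not the authors' implementation): the block splits its input into two halves and uses additive coupling, so the inputs can be reconstructed exactly from the outputs and intermediate activations need not be stored for backpropagation.

import numpy as np

def rev_block_forward(x1, x2, F, G):
    # Additive coupling: (x1, x2) -> (y1, y2)
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    # Exact reconstruction of the inputs from the outputs
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Toy stand-ins for the block's residual sub-networks (illustrative only)
F = lambda x: np.tanh(x)
G = lambda x: 0.5 * x

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
y1, y2 = rev_block_forward(x1, x2, F, G)
r1, r2 = rev_block_inverse(y1, y2, F, G)
assert np.allclose(x1, r1) and np.allclose(x2, r2)  # inputs recovered exactly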
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
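To make the attention mechanism above concrete, here is a minimal sketch of scaled dot-product attention and its multi-head composition (single sequence, no masking, dropout, batching, or learned biases; all names and shapes are illustrative rather than the paper's reference code):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (n, d_head) arrays for one head
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d_head)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    # X: (n, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)
    n, d_model = X.shape
    d_head = d_model // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = [
        scaled_dot_product_attention(
            Q[:, i * d_head:(i + 1) * d_head],
            K[:, i * d_head:(i + 1) * d_head],
            V[:, i * d_head:(i + 1) * d_head],
        )
        for i in range(h)
    ]
    return np.concatenate(heads, axis=-1) @ Wo         # (n, d_model)

rng = np.random.default_rng(0)
n, d_model, h = 16, 64, 8
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h)       # (16, 64)

Splitting the model dimension across h heads keeps the total cost of multi-head attention close to that of a single full-dimension head.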
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Devlin
Longformer: The Long-Document Transformer
  • Iz Beltagy
  • Matthew E. Peters
  • Arman Cohan
Linformer: Self-Attention with Linear Complexity
  • Wang
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
  • Dai
Reformer: The Efficient Transformer
  • Nikita Kitaev
  • Łukasz Kaiser
  • Anselm Levskaya
Synthesizer: Rethinking Self-Attention for Transformer Models
  • Tay
Long Range Arena: A Benchmark for Efficient Transformers
  • Tay
Big Bird: Transformers for Longer Sequences
  • Zaheer
Generating Long Sequences with Sparse Transformers
  • Child
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  • Dosovitskiy
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
  • Katharopoulos