Ashish Vaswani’s research while affiliated with Mountain View College and other places


Publications (12)


Decoding the neural representation of story meanings across languages
  • Article

September 2017 · 361 Reads · 85 Citations

Drawing from a common lexicon of semantic units, humans fashion narratives whose meaning transcends that of their individual utterances. However, while brain regions that represent lower-level semantic units, such as words and sentences, have been identified, questions remain about the neural representation of narrative comprehension, which involves inferring cumulative meaning. To address these questions, we exposed English, Mandarin, and Farsi native speakers to native language translations of the same stories during fMRI scanning. Using a new technique in natural language processing, we calculated the distributed representations of these stories (capturing the meaning of the stories in high-dimensional semantic space), and demonstrate that using these representations we can identify the specific story a participant was reading from the neural data. Notably, this was possible even when the distributed representations were calculated using stories in a different language than the participant was reading. Our results reveal that identification relied on a collection of brain regions most prominently located in the default mode network. These results demonstrate that neuro-semantic encoding of narratives happens at levels higher than individual semantic units and that this encoding is systematic across both individuals and languages. Hum Brain Mapp, 2017.
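The identification step described above, matching a reader's neural data against distributed story representations, can be illustrated with a generic leave-one-out sketch: fit a ridge map from brain space to the semantic space, then check whether the predicted vector is closest to the correct story. This is a reconstruction under simplifying assumptions (the `identify_stories` helper, closed-form ridge, and toy sizes are illustrative), not the authors' pipeline:

```python
import numpy as np

def identify_stories(brain_data, story_vecs, alpha=1.0):
    """Leave-one-out story identification sketch (not the authors' exact pipeline).

    brain_data: (n_stories, n_voxels) fMRI response patterns, one row per story.
    story_vecs: (n_stories, d) distributed story representations.
    For each held-out story, fit a ridge map from brain space to semantic space
    on the remaining stories, then count the trial as correct if the predicted
    vector is closest (cosine similarity) to the held-out story's representation.
    """
    n = len(brain_data)
    correct = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        X, Y = brain_data[train], story_vecs[train]
        # Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y
        W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)
        pred = brain_data[i] @ W
        sims = story_vecs @ pred / (
            np.linalg.norm(story_vecs, axis=1) * np.linalg.norm(pred) + 1e-9)
        correct += int(np.argmax(sims) == i)
    return correct / n  # fraction of stories identified correctly

rng = np.random.default_rng(0)
vecs = rng.normal(size=(10, 300))                     # 10 stories, 300-d embeddings
brains = vecs @ rng.normal(size=(300, 50)) + 0.1 * rng.normal(size=(10, 50))
print(identify_stories(brains, vecs))                 # leave-one-out accuracy on toy data
```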


One Model To Learn Them All

June 2017 · 477 Reads · 230 Citations

Deep learning yields great results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit greatly from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
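The "sparsely-gated layers" mentioned above are mixture-of-experts layers in which a gating network selects only a few experts per input. Below is a minimal numpy sketch of the gating idea, assuming linear experts and a top-k of 2; the `sparse_moe` helper, sizes, and k are illustrative, not the paper's configuration:

```python
import numpy as np

def sparse_moe(x, expert_weights, gate_weights, k=2):
    """Toy sparsely-gated mixture-of-experts layer (illustrative, not the paper's).

    x: (d_in,) input vector.
    expert_weights: list of (d_in, d_out) matrices, one linear "expert" each.
    gate_weights: (d_in, n_experts) gating matrix.
    Only the top-k experts by gate score are evaluated; their outputs are
    combined with softmax weights renormalized over those k experts.
    """
    gate_logits = x @ gate_weights                          # (n_experts,)
    top_k = np.argsort(gate_logits)[-k:]                    # indices of the k best experts
    logits = gate_logits[top_k] - gate_logits[top_k].max()  # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return sum(p * (x @ expert_weights[i]) for p, i in zip(probs, top_k))

rng = np.random.default_rng(0)
experts = [rng.normal(size=(16, 8)) for _ in range(4)]      # 4 linear experts
gates = rng.normal(size=(16, 4))
print(sparse_moe(rng.normal(size=16), experts, gates).shape)  # (8,)
```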


Attention Is All You Need

June 2017 · 28,563 Reads · 110,953 Citations

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
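The attention mechanism at the core of the Transformer is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The numpy sketch below is a minimal single-head version, omitting the multi-head projections, masking, and dropout of the full model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Omits masking and the multi-head projections of the full model.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n_queries, d_v)

# Example: 4 query positions attending over 6 key/value positions.
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 64)), rng.normal(size=(6, 64))
V = rng.normal(size=(6, 32))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 32)
```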


Decoding the Neural Representation of Story Meanings across Languages

March 2017 · 143 Reads · 2 Citations

Drawing from a common lexicon of semantic units, humans fashion narratives whose meaning transcends that of their individual utterances. However, while brain regions that represent lower-level semantic units, such as words and sentences, have been identified, questions remain about the neural representation of narrative comprehension, which involves inferring cumulative meaning. To address these questions, we exposed English, Mandarin and Farsi native speakers to native language translations of the same stories during fMRI scanning. Using a new technique in natural language processing, we calculated the distributed representations of these stories (capturing the meaning of the stories in high-dimensional semantic space), and demonstrate that using these representations we can identify the specific story a participant was reading from the neural data. Notably, this was possible even when the distributed representations were calculated using stories in a different language than the participant was reading. Relying on over 44 billion classifications, our results reveal that identification relied on a collection of brain regions most prominently located in the default mode network. These results demonstrate that neuro-semantic encoding of narratives happens at levels higher than individual semantic units and that this encoding is systematic across both individuals and languages.


Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States

December 2016 · 9 Reads · 13 Citations

Transactions of the Association for Computational Linguistics

Transition-based approaches based on local classification are attractive for dependency parsing due to their simplicity and speed, despite producing results slightly below the state-of-the-art. In this paper, we propose a new approach for approximate structured inference for transition-based parsing that produces scores suitable for global scoring using local models. This is accomplished with the introduction of error states in local training, which add information about incorrect derivation paths typically left out completely in locally-trained models. Using neural networks for our local classifiers, our approach achieves 93.61% accuracy for transition-based dependency parsing in English.


Unsupervised Neural Hidden Markov Models
  • Article
  • Full-text available

September 2016 · 65 Reads · 6 Citations

In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model. We evaluate our approach on tag induction. Our approach outperforms existing generative models and is competitive with the state of the art, though with a simpler model that is easily extended to include additional context.
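One way to read "neuralizing" here: the HMM's transition and emission distributions come from differentiable parameterizations rather than count tables, so the forward-algorithm likelihood can be optimized by gradient methods. The numpy sketch below uses softmaxed embedding dot products as a stand-in parameterization; this is an illustrative assumption, not the paper's exact architecture:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def neural_hmm_loglik(obs, state_emb, word_emb, trans_proj):
    """Log-likelihood of a neurally parameterized HMM (illustrative sketch).

    obs: list of word ids.
    state_emb: (n_states, d) hidden-state embeddings.
    word_emb: (n_words, d) word embeddings.
    trans_proj: (d, n_states) matrix mapping a state embedding to transition logits.
    Emission and transition distributions are softmaxed dot products; the scaled
    forward algorithm then accumulates log p(obs).
    """
    n_states = len(state_emb)
    emis = softmax(state_emb @ word_emb.T)        # (n_states, n_words)
    trans = softmax(state_emb @ trans_proj)       # (n_states, n_states)
    alpha = np.full(n_states, 1.0 / n_states) * emis[:, obs[0]]  # uniform start
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for w in obs[1:]:
        alpha = (alpha @ trans) * emis[:, w]      # forward recursion
        loglik += np.log(alpha.sum())             # rescale to avoid underflow
        alpha /= alpha.sum()
    return loglik

rng = np.random.default_rng(0)
n_states, n_words, d = 5, 20, 8
print(neural_hmm_loglik([3, 7, 7, 1], rng.normal(size=(n_states, d)),
                        rng.normal(size=(n_words, d)), rng.normal(size=(d, n_states))))
```

In tag induction, the parameters would be trained to maximize this likelihood, and the most probable state sequence would then serve as the induced tags.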

Supertagging with LSTMs
[Figure 1: We add a language model between supertags.]

June 2016 · 344 Reads · 86 Citations

In this paper we present new state-of-the-art performance on CCG supertagging and parsing. Our model outperforms existing approaches by an absolute gain of 1.5%. We analyze the performance of several neural models and demonstrate that while feed-forward architectures can compete with bidirectional LSTMs on POS tagging, models that encode the complete sentence are necessary for the long range syntactic information encoded in supertags.
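The contrast the abstract draws, feed-forward classifiers versus models that encode the complete sentence, is essentially the one below. This PyTorch sketch of a bidirectional-LSTM tagger uses illustrative sizes; the `BiLSTMSupertagger` class, dimensions, and tag-set size are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class BiLSTMSupertagger(nn.Module):
    """Minimal bidirectional-LSTM tagger sketch; sizes are illustrative."""

    def __init__(self, vocab_size, n_supertags, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The bidirectional LSTM sees the whole sentence, unlike a
        # fixed-window feed-forward classifier.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_supertags)  # forward + backward states

    def forward(self, word_ids):                  # word_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))    # (batch, seq_len, 2 * hidden)
        return self.out(h)                        # per-token supertag logits

# A CCG category set has a few hundred supertags; 400 here is just a placeholder.
tagger = BiLSTMSupertagger(vocab_size=10_000, n_supertags=400)
logits = tagger(torch.randint(0, 10_000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 400])
```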





Citations (12)


... Modern neural constituency parsers typically fall into one of three camps: chart-based parsers (Stern et al., 2017; Gaddy et al., 2018; Kitaev and Klein, 2018; Mrini et al., 2020; Tian et al., 2020; Tenney et al., 2019; Jawahar et al., 2019; Li et al., 2020; Murty et al., 2022), transition-based parsers (Zhang and Clark, 2009; Cross and Huang, 2016; Vaswani and Sagae, 2016; Vilares and Gómez-Rodríguez, 2018; Fernandez Astudillo et al., 2020), or sequence-to-sequence-style parsers (Vinyals et al., 2015; Kamigaito et al., 2017; Suzuki et al., 2018; Fernández-González and Gómez-Rodríguez, 2020; Nguyen et al., 2021; Yang and Tu, 2022). ...

Reference:

Approximating CKY with Transformers
Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States
  • Citing Article
  • December 2016

Transactions of the Association for Computational Linguistics

... Brain activity evoked by concepts when watching naturalistic movies, as well as when listening to passages of spoken narrative, has been found to share a common basis across individuals (e.g., [20,21,27]). Recent studies suggest commonality of neural representations of sentences across individuals [52], while others show that the brain representations of concepts ([11,37]) and of sentences [55] in different languages have a common basis. These recent neuroimaging studies suggest that concept- and language-related mental representations are based on distinguishable patterns of brain activation and have a common cortical activation picture across individuals and across languages. ...

Decoding the Neural Representation of Story Meanings across Languages
  • Citing Preprint
  • March 2017

... Focusing on higher-level discourse representations, Dehghani et al. (2017) showed that vector representations of narratives (i.e., embeddings reflecting their high-level meaning) can predict brain responses in the default mode network, and that the story representation supporting the mapping could be switched across languages; however, differently from our procedure, the (decoding) models were not transferred across languages but rather re-trained with story representations from other languages, given that the story representations could not be projected onto the same space. This study therefore leaves open the question of whether the general mapping between linguistic and neural representations is actually shared across languages. ...

Decoding the neural representation of story meanings across languages
  • Citing Article
  • September 2017

... While such multi-task learning architectures have been used in natural language processing (Collobert and Weston, 2008), machine translation (Johnson et al., 2017), speech recognition (Seltzer and Droppo, 2013), computer vision (Zhang et al., 2014), and content recommendation (Ma et al., 2018), they have rarely been applied to spatio-temporal forecasting problems in ride-hailing systems. Previously, for spatio-temporal forecasting problems in ride-hailing systems, deep learning was applied to deal only with the problem at hand, which limits its efficiency since repeated effort is required for each problem (Kaiser et al., 2017). In this study, a spatio-temporal multi-task learning architecture with mixture-of-experts is developed for forecasting multiple spatio-temporal tasks within a city as well as across cities. ...

One Model To Learn Them All
  • Citing Article
  • June 2017

... Machine Translation (MT) and Large Language Models (LLMs) represent the most sophisticated intersection of academic research and business application in the translation industry. MT systems, such as Google Translate or DeepL, are built on decades of academic research in computational linguistics (Vaswani et al., 2017), while LLMs like GPT-4 leverage vast amounts of data and advanced algorithms to produce highly accurate translations (Koehn, 2020). ...

Attention Is All You Need
  • Citing Article
  • June 2017

... NER in a high-resource language (i.e., English) is also called high-resource NER. English NER models [11] with good performance can be pretrained thanks in part to these sufficient labeled resources. By contrast, languages other than English are still not fully studied due to the lack of labeled data. ...

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning
  • Citing Conference Paper
  • January 2016

... Reverse Embedding: Reverse word embedding, also called word decoding or word reconstruction, is the process of converting a numerical representation, typically a vector, back into its corresponding word or textual representation (see Fig. 4). This inverse of word embedding maps continuous vector representations in a high-dimensional space back onto their respective words or text [37]. ...

Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies
  • Citing Conference Paper
  • January 2016

... Making predictions using the fMRI recordings and the extracted features: the second step was to train a ridge regression model to learn a mapping from brain representations to neural network representations. This approach followed previous work [38,36,39,28,21] that fit a linear function with a ridge penalty to map from brain to neural network representations. Although ridge regression is a relatively simple model, we chose it since it has previously been demonstrated to be useful [37]. ...

Aligning context-based statistical models of language with brain activity during reading
  • Citing Conference Paper
  • January 2014

... Dou et al. (2015) showed that the mapping between monolingual word embeddings gives a good base distribution for the decipherment process. Dou et al. (2014) proposed to learn word alignment and decipherment jointly. ...

Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation
  • Citing Conference Paper
  • January 2014