Publications

  • Source
    Anupam Khattri · Aditya Joshi · Pushpak Bhattacharyya · Mark James Carman
    ABSTRACT: Sarcasm understanding may require information beyond the text itself, as in the case of 'I absolutely love this restaurant!' which may be sarcastic, depending on the contextual situation. We present the first quantitative evidence to show that historical tweets by an author can provide additional context for sarcasm detection. Our sarcasm detection approach uses two components: a contrast-based predictor (that identifies if there is a sentiment contrast within a target tweet), and a historical tweet-based predictor (that identifies if the sentiment expressed towards an entity in the target tweet agrees with sentiment expressed by the author towards that entity in the past).
    WASSA at EMNLP 2015, Lisbon, Portugal; 09/2015
  • Source
    Aditya Joshi · Anoop Kunchukuttan · Pushpak Bhattacharyya · Mark James Carman
    ABSTRACT: Sarcasm detection is a recent innovation in sentiment analysis research. However, sarcasm generation has received no attention. We present a sarcasm-generation module for chatbots. The uniqueness of 'SarcasmBot' is that it generates a sarcastic response for a user input. SarcasmBot implements eight rule-based sarcasm generators, each of which generates a certain type of sarcastic expression. One of these sarcasm generators is selected at run-time, based on properties of user input such as question type, number of entities, etc. We evaluate our sarcasm-generation module in two ways: (a) a qualitative evaluation on three parameters: coherence, grammatical correctness and sarcastic nature, where all scores are above 0.69 out of 1, and (b) a comparative evaluation between SarcasmBot and ALICE, where a majority of our human evaluators are able to identify the output of SarcasmBot among two outputs in 70.97% of test examples.
    WISDOM at SIGKDD, Sydney, Australia; 08/2015
  • Source
    Aditya Joshi · Vinita Sharma · Pushpak Bhattacharyya
    ABSTRACT: The relationship between context incongruity and sarcasm has been studied in linguistics. We present a computational system that harnesses context incongruity as a basis for sarcasm detection. Our statistical sarcasm classifiers incorporate two kinds of incongruity features: explicit and implicit. We show the benefit of our incongruity features for two text forms: tweets and discussion forum posts. Our system also outperforms two past works (with F-score improvement of 10-20%). We also show how our features can capture inter-sentential incongruity.
    ACL 2015, Beijing, China; 07/2015
  • Source
    Aditya Joshi · Abhijit Mishra · Balamurali Ar · Pushpak Bhattacharyya · Mark James Carman
    ABSTRACT: Alcohol abuse may lead to unsociable behavior such as crime, drunk driving, or privacy leaks. We introduce automatic drunk-texting prediction as the task of identifying whether a text was written when under the influence of alcohol. We experiment with tweets labeled using hashtags as distant supervision. Our classifiers use a set of N-gram and stylistic features to detect drunk tweets. Our observations present the first quantitative evidence that text contains signals that can be exploited to detect drunk-texting.
    ACL 2015, Beijing, China; 07/2015
  • Source
    ABSTRACT: An acid test for any new Word Sense Disambiguation (WSD) algorithm is its performance against the Most Frequent Sense (MFS). The field of WSD has found the MFS baseline very hard to beat. Clearly, if WSD researchers had access to MFS values, their striving to better this heuristic would push the WSD frontier. However, getting MFS values requires sense-annotated corpora in enormous amounts, which is out of bounds for most languages, even if their WordNets are available. In this paper, we propose an unsupervised method for MFS detection from untagged corpora, which exploits word embeddings. We compare the word embedding of a word with all its sense embeddings and obtain the predominant sense as the one with the highest similarity. We observe significant performance gain for Hindi WSD over the WordNet First Sense (WFS) baseline. As for English, the SemCor baseline is bettered for those words whose frequency is greater than 2. Our approach is language and domain independent.
    NAACL, Denver, Colorado; 06/2015
  • Source
    Hanumant Redkar · Sudha Bhingardive · Diptesh Kanojia · Pushpak Bhattacharyya
    ABSTRACT: WordNet is an online lexical resource which expresses unique concepts in a language. English WordNet, developed at Princeton University, was the first WordNet. Over a period of time, many language WordNets were developed by various organizations all over the world. It has always been a challenge to store the WordNet data. Some WordNets are stored using a file system and some are stored using different database models. In this paper, we present the World WordNet Database Structure, which can be used to efficiently store the WordNet information of all languages of the world. This design can be adapted by most language WordNets to store information such as synset data, semantic and lexical relations, ontology details, language-specific features, linguistic information, etc. An attempt is made to develop Application Programming Interfaces to manipulate the data from these databases. This database structure can help in various Natural Language Processing applications like Multilingual Information Retrieval, Word Sense Disambiguation, Machine Translation, etc.
    Association for the Advancement of Artificial Intelligence Conference (AAAI 2015), Austin, Texas; 01/2015
  • Source
    Diptesh Kanojia · Manish Srivastava · Raj Dabre · Pushpak Bhattacharyya
    ABSTRACT: We present a Parallel Corpora Management tool that aids parallel corpora generation for the task of Machine Translation (MT). It takes source and target text of a corpus for any language pair in text file format, or zip archives containing multiple corresponding text files. Then, it provides lexicographers with a helpful interface for manual translation / validation, and gives out the corrected text files as output. It provides various dictionary references as help within the interface, which increase the productivity and efficiency of a lexicographer. It also provides automatic translation of the source sentence using an integrated MT system. The tool interface includes a corpora management system which facilitates maintenance of parallel corpora by assigning roles such as manager, lexicographer, etc. We have designed a novel tool that provides aids like references to various dictionary sources such as Wordnets, Shabdkosh, Wiktionary, etc. We also provide manual word alignment correction which is visualized in the tool and can lead to its gamification in the future, thus providing a valuable source of word / phrase alignments.
    International Conference on Natural Language Processing (ICON 2014), Goa, India; 12/2014
  • Source
    ABSTRACT: We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPB-SMT) systems for five Indian language pairs, namely Bengali-Hindi, English-Hindi, Marathi-Hindi, Tamil-Hindi, and Telugu-Hindi, in three domains each: HEALTH, TOURISM and GENERAL. We named them PanchBhoota, as these systems are elemental in nature. We used a very simple approach to train, tune, and test them using the cdec toolkit. We hope that this work will motivate Indian Language Machine Translation researchers to look deeper into the field of HPB-SMT, which is known to perform better than Phrase Based Statistical Machine Translation.
    SMT Contest in International Conference on Natural Language Processing (ICON 2014), Goa, India; 12/2014
  • Source
    ABSTRACT: WordNet is a large lexical resource expressing distinct concepts in a language. The synset is a basic building block of the WordNet. In this paper, we introduce a web-based lexicographer's interface, 'Synskarta', which is developed to create synsets from source language to target language with special reference to Sanskrit WordNet. We focus on the introduction and implementation of Synskarta and how it can help to overcome the limitations of the existing system. Further, we highlight its features, advantages, limitations and user evaluations. Finally, we mention the scope for enhancements to Synskarta and its usefulness in the entire IndoWordNet community.
    International Conference on Natural Language Processing (ICON) 2014, Goa, India; 12/2014
  • Proceedings of the Ninth Workshop on Statistical Machine Translation; 06/2014
  • Shubham Gautam · Pushpak Bhattacharyya
    Proceedings of the Ninth Workshop on Statistical Machine Translation; 06/2014
  • Aditya Joshi · Abhijit Mishra · Pushpak Bhattacharyya
    Workshop on Approaches to Subjectivity and Sentiment Analysis (WASSA) at ACL 2014; 06/2014
  • Source
    Diptesh Kanojia · Pushpak Bhattacharyya · Raj Dabre · Siddhartha Gunti · Manish Shrivastava
    ABSTRACT: The task of Word Sense Disambiguation (WSD) incorporates in its definition the role of 'context'. We present our work on the development of a tool which allows for automatic acquisition and ranking of 'context clues' for WSD. These clue words are extracted from the contexts of words appearing in a large monolingual corpus. This mined collection of contextual clues forms a discrimination net in the sense that, for targeted WSD, navigation of the net leads to the correct sense of a word given its context. Utilizing this resource, we intend to develop efficient and lightweight WSD based on lookup and navigation of a memory-resident knowledge base, thereby avoiding the heavy computation which often prevents incorporation of any serious WSD in MT and search. The need for large quantities of sense-marked data can also be reduced.
    Global Wordnet Conference (GWC 2014), Tartu, Estonia; 01/2014
  • Source
    Aditya Joshi · Pushpak Bhattacharyya · Abhijit Mishra
    ABSTRACT: (To be published)
    Association For Computational Linguistics Conference 2014; 01/2014
  • Source
    ABSTRACT: In this paper, we report our methods and results of using, for the first time, a semi-automatic approach to enhance an Indian language Wordnet. We apply our methods to enhancing an already existing Sanskrit Wordnet created from Hindi Wordnet (which is created from Princeton Wordnet) using the expansion approach. We base our experiment on an existing bilingual Sanskrit-English dictionary and show how lemmas in this dictionary can be mapped to Princeton Wordnet, through which corresponding Sanskrit synsets can be populated by Sanskrit lexemes. Our method also shows how the absence of resources for a pair of languages need not be an obstacle if another resource for one of them is available. Sanskrit being historically related to languages of the Indo-European family, we believe that this semi-automatic approach will help enhance Wordnets of other Indian languages of the same family.
    GWC, Tartu, Estonia; 01/2014
  • Source
    Kashyap Popat · Balamurali A.R · Pushpak Bhattacharyya · Gholamreza Haffari
    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 08/2013
  • Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations; 08/2013
  • Ankit Ramteke · Akshat Malu · Pushpak Bhattacharyya · Saketha J Nath
    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); 08/2013
  • Anoop Kunchukuttan · Ritesh Shah · Pushpak Bhattacharyya
    Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task; 08/2013
  • Abhijit Mishra · Pushpak Bhattacharyya · Michael Carl
    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); 08/2013
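
The NAACL 2015 abstract listed above describes detecting the most frequent sense by comparing a word's embedding with each of its sense embeddings and keeping the sense with the highest similarity. The following is a minimal sketch of that selection step only, not the authors' implementation. It assumes cosine similarity as the similarity measure and assumes the sense embeddings are already available (the abstract does not say how they are built); the function names, sense labels, and toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predominant_sense(word_vec, sense_vecs):
    """Return the sense label whose embedding is most similar to the word embedding.

    sense_vecs maps a sense label to its embedding; how those embeddings are
    obtained is outside this sketch (assumed given).
    """
    return max(sense_vecs, key=lambda sense: cosine(word_vec, sense_vecs[sense]))


# Toy, hand-made vectors purely for illustration (not real embeddings).
word_vec = [0.9, 0.1, 0.0]
sense_vecs = {
    "bank#finance": [1.0, 0.2, 0.0],
    "bank#river": [0.0, 0.3, 1.0],
}
best = predominant_sense(word_vec, sense_vecs)
print(best)
```

With real data, `word_vec` would come from embeddings trained on an untagged corpus, which is what makes the approach unsupervised.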