Samar Husain’s research while affiliated with Indian Institute of Technology Delhi and other places


Publications (38)


Figures: reading times at the critical verb in Experiments 1 and 2; first-pass reading time and regression path duration at the critical verb in Experiments 3 and 4 (error bars show 95% confidence intervals); magnitudes of effects (derived from the linear mixed models) across the four experiments, with 95% uncertainty intervals.

Dependency Resolution Difficulty Increases with Distance in Persian Separable Complex Predicates: Evidence for Expectation and Memory-Based Accounts
  • Article
  • Full-text available

March 2016 · 153 Reads · 32 Citations

Samar Husain · Shravan Vasishth

Delaying the appearance of a verb in a noun-verb dependency tends to increase processing difficulty at the verb; one explanation for this locality effect is decay and/or interference of the noun in working memory. Surprisal, an expectation-based account, predicts that delaying the appearance of a verb either renders it no more predictable or more predictable, leading respectively to a prediction of no effect of distance or a facilitation. Recently, Husain et al. (2014) suggested that when the exact identity of the upcoming verb is predictable (strong predictability), increasing argument-verb distance leads to facilitation effects, which is consistent with surprisal; but when the exact identity of the upcoming verb is not predictable (weak predictability), locality effects are seen. We investigated Husain et al.'s proposal using Persian complex predicates (CPs), which consist of a non-verbal element—a noun in the current study—and a verb. In CPs, once the noun has been read, the exact identity of the verb is highly predictable (strong predictability); this was confirmed using a sentence completion study. In two self-paced reading (SPR) and two eye-tracking (ET) experiments, we delayed the appearance of the verb by interposing a relative clause (Experiments 1 and 3) or a long PP (Experiments 2 and 4). We also included a simple Noun-Verb predicate configuration with the same distance manipulation; here, the exact identity of the verb was not predictable (weak predictability). Thus, the design crossed Predictability Strength and Distance. We found that, consistent with surprisal, the verb in the strong predictability conditions was read faster than in the weak predictability conditions. Furthermore, greater verb-argument distance led to slower reading times; strong predictability did not neutralize or attenuate the locality effects. As regards the effect of distance on dependency resolution difficulty, these four experiments present evidence in favor of working memory accounts of argument-verb dependency resolution, and against the surprisal-based expectation account of Levy (2008). However, another expectation-based measure, entropy, which was computed using the offline sentence completion data, predicts reading times in Experiment 1 but not in the other experiments. Because participants tend to produce more ungrammatical continuations in the long-distance condition in Experiment 1, we suggest that forgetting due to memory overload leads to greater entropy at the verb.
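The two expectation-based quantities discussed above, surprisal and entropy, can both be estimated from sentence-completion counts. The sketch below is a minimal illustration of these standard definitions; the completion counts and function names are invented for illustration and are not the study's data or analysis pipeline.

```python
import math
from collections import Counter

def surprisal(counts: Counter, continuation: str) -> float:
    """Surprisal in bits: -log2 P(continuation | context),
    with P estimated from completion counts."""
    total = sum(counts.values())
    return -math.log2(counts[continuation] / total)

def entropy(counts: Counter) -> float:
    """Shannon entropy in bits over the completion distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical completion counts for one item (not the study's data):
completions = Counter({"kardan": 18, "zadan": 4, "other": 2})
print(surprisal(completions, "kardan"))  # low surprisal: the verb is highly predictable
print(entropy(completions))              # low entropy: few competing continuations
```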


Integration and prediction difficulty in Hindi sentence comprehension: Evidence from an eye-tracking corpus

July 2015 · 288 Reads · 49 Citations

Journal of Eye Movement Research

This is the first attempt at characterizing reading difficulty in Hindi using naturally occurring sentences. We created the Potsdam-Allahabad Hindi Eyetracking Corpus by recording eye-movement data from 30 participants at the University of Allahabad, India. The target stimuli were 153 sentences selected from the beta version of the Hindi-Urdu treebank. We find that word- or low-level predictors (syllable length, unigram and bigram frequency) affect first-pass reading times, regression path duration, total reading time, and outgoing saccade length. An increase in syllable length results in longer fixations, and an increase in word unigram and bigram frequency leads to shorter fixations. Longer syllable length and higher frequency lead to longer outgoing saccades. We also find that two predictors of sentence comprehension difficulty, integration and storage cost, have an effect on reading difficulty. Integration cost (Gibson, 2000) was approximated by calculating the distance (in words) between a dependent and head; and storage cost (Gibson, 2000), which measures difficulty of maintaining predictions, was estimated by counting the number of predicted heads at each point in the sentence. We find that integration cost mainly affects outgoing saccade length, and storage cost affects total reading times and outgoing saccade length. Thus, word-level predictors have an effect in both early and late measures of reading time, while predictors of sentence comprehension difficulty tend to affect later measures. This is, to our knowledge, the first demonstration using eye-tracking that both integration and storage cost influence reading difficulty.
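A minimal sketch of how the two sentence-level predictors described above could be approximated from a dependency tree: integration cost as the distance in words between a dependent and its head, and storage cost as the number of words whose head has not yet appeared (a simple proxy for "predicted heads"). The toy tree and the exact operationalization are assumptions for illustration, not the paper's implementation.

```python
def integration_cost(heads):
    """heads[i] = 0-based index of the head of word i (None for the root).
    Integration cost at word i: distance in words to its head."""
    return [abs(i - h) if h is not None else 0 for i, h in enumerate(heads)]

def storage_cost(heads):
    """Storage cost at word i: number of words up to and including i whose
    head has not yet appeared (a simple proxy for predicted heads)."""
    costs = []
    for i in range(len(heads)):
        pending = sum(1 for j in range(i + 1) if heads[j] is not None and heads[j] > i)
        costs.append(pending)
    return costs

# Toy 4-word sentence: words 0-2 all depend on word 3 (the verb).
heads = [3, 3, 3, None]
print(integration_cost(heads))  # [3, 2, 1, 0]
print(storage_cost(heads))      # [1, 2, 3, 0]
```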


Locality and Expectation in Persian Separable Complex Predicates

Processing cost is known to increase with dependency distance (Gibson 2000). However, the expectation-based account (Hale 2001, Levy 2008) predicts that delaying the appearance of a verb renders it more predictable and therefore easier to process. We tested the predictions of these two opposing accounts using complex predicates in Persian. One type of complex predicate is a Noun-Verb configuration in which the verb is highly predictable given the noun. We delayed the appearance of the verb by interposing a relative clause (Expt 1, 42 subjects) or a single long PP (Expt 2, 43 subjects); the precritical region (the phrase before the verb) in both the short and long conditions was a short PP. Locality accounts such as Gibson (2000) predict a slowdown at the verb (the real verb) due to increased Noun-Verb distance, whereas expectation accounts predict that distance should not adversely affect processing time at the verb (because the conditional probability of the verb given the preceding context is close to 1; this was established with an offline sentence completion study). As a control, we included a simple predicate (Noun-Verb) configuration; the same distance manipulation was applied here as for complex predicates. In the control, locality accounts predict a slowdown in the long-distance condition, but expectation accounts predict a speedup due to the increasing probability of the verb appearing given the left context. Thus, we had a 2x2 design (high vs. low predictability; short vs. long distance). In Expt 1, we found a main effect of distance (t=4.24): reading time (RT) was longer in the long-distance conditions; a nested comparison showed that this effect was due to the low-predictable (simple predicate) conditions. In addition, both high-predictable conditions were read faster than the low-predictable conditions (t=3.49). Expt 2, which had a long intervening PP, showed an even stronger main effect of distance (t=6.04) than Expt 1: RTs in the long conditions were slower than in the short conditions, and the locality effects were equally strong in the high- and low-predictable cases. As in Expt 1, we saw faster RTs in the high-predictable conditions. A combined analysis of the two experiments revealed a main effect of prediction (t=3.55) and a main effect of distance (t=4.30), as well as a marginal three-way interaction between experiment, distance, and prediction (t=-1.94). Thus, we find clear effects of locality in both experiments, and we also find evidence for expectation effects: the high-predictable verbs are read faster than the low-predictable verbs. The fact that we do not see facilitation with increased distance at the verb, in spite of high predictability, might be due to increased difficulty in prediction maintenance under processing load. Recall that the locality effect in Expt 1 is driven only by the low-predictable conditions, while in Expt 2 both high- and low-predictable conditions are affected. In Expt 2, the intervener is a long, uninterrupted phrase, whereas in Expt 1 the intervener consists of a short RC followed by a PP. Processing a single long intervening phrase may be harder than processing two different phrases, reminiscent of the sausage machine proposal of Frazier and Fodor (1978). The results suggest that the complexity of intervening material is critical for prediction maintenance. Although we found evidence for both locality and expectation effects, a key prediction of the expectation account was not validated: delaying the appearance of a verb (predictable or not) did not facilitate processing.
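For concreteness, a minimal sketch of how a 2x2 design like this (predictability x distance) can be analyzed with a linear mixed model. The file name, column names, use of statsmodels, and by-subject-only random intercepts are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format reading-time data at the critical verb
# (file name and column names are assumptions for illustration).
df = pd.read_csv("reading_times.csv")  # columns: subject, item, rt, pred, dist

# Sum-code the two factors so the coefficients read as main effects of the 2x2 design.
df["pred_c"] = df["pred"].map({"high": 0.5, "low": -0.5})
df["dist_c"] = df["dist"].map({"long": 0.5, "short": -0.5})
df["log_rt"] = np.log(df["rt"])

# Linear mixed model with by-subject random intercepts (published analyses of such
# designs typically also include by-item random effects; omitted here for brevity).
model = smf.mixedlm("log_rt ~ pred_c * dist_c", df, groups=df["subject"])
print(model.fit().summary())
```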


Strong Expectations Cancel Locality Effects: Evidence from Hindi

June 2014 · 370 Reads · 74 Citations

Expectation-driven facilitation (Hale, 2001, Levy, 2008) and locality-driven retrieval difficulty (Gibson, 1998, 2000; Lewis & Vasishth, 2005) are widely recognized to be two critical factors in incremental sentence processing; there is accumulating evidence that both can influence processing difficulty. However, it is unclear whether and how expectations and memory interact. We first confirm a key prediction of the expectation account: a Hindi self-paced reading study shows that when an expectation for an upcoming part of speech is dashed, building a rarer structure consumes more processing time than building a less rare structure. This is a strong validation of the expectation-based account. In a second study, we show that when the expectation is strong, i.e., when a particular verb is predicted, strong facilitation effects are seen when the appearance of the verb is delayed; however, when expectation is weak, i.e., when only the part of speech "verb" is predicted but a particular verb is not predicted, the facilitation disappears and a tendency towards a locality effect is seen. The interaction seen between expectation strength and distance shows that strong expectations cancel locality effects, and that weak expectations allow locality effects to emerge.


An Automatic Approach to Treebank Error Detection Using a Dependency Parser

March 2013 · 35 Reads · 6 Citations

Lecture Notes in Computer Science

Treebanks play an important role in the development of various natural language processing tools. Amongst other things, they provide crucial language-specific patterns that are exploited by various machine learning techniques. Quality control in any treebanking project is therefore extremely important. Manual validation of the treebank is one of the steps that is generally necessary to ensure good annotation quality. Needless to say, manual validation requires a lot of human time and effort. In this paper, we present an automatic approach which helps in detecting potential errors in a treebank. We use a dependency parser to detect such errors. By using this tool, validators can validate a treebank in less time and with reduced human effort.
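A hedged sketch of the general idea (the paper's actual system is not reproduced here): re-parse the annotated sentences with a trained dependency parser and flag tokens where the automatic parse disagrees with the treebank annotation as candidates for manual re-validation.

```python
def flag_candidate_errors(gold_parses, auto_parses):
    """Each parse is a list of (head, label) pairs, one per token.
    Tokens where the automatic parse disagrees with the gold annotation are
    returned as candidate annotation errors for manual inspection."""
    candidates = []
    for sent_id, (gold, auto) in enumerate(zip(gold_parses, auto_parses)):
        for tok_id, ((g_head, g_lab), (a_head, a_lab)) in enumerate(zip(gold, auto)):
            if (g_head, g_lab) != (a_head, a_lab):
                candidates.append((sent_id, tok_id, (g_head, g_lab), (a_head, a_lab)))
    return candidates
```

Disagreements are only candidates: many will reflect parser mistakes rather than annotation errors, which is why flagged tokens still go to human validators.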


Towards a psycholinguistically motivated dependency grammar for Hindi

January 2013 · 26 Reads · 2 Citations

The overall goal of our work is to build a dependency grammar-based human sentence processor for Hindi. As a first step towards this end, in this paper we present a dependency grammar that is motivated by psycholinguistic concerns. We describe the components of the grammar that have been automatically induced using a Hindi dependency treebank. We relate some aspects of the grammar to relevant ideas in the psycholinguistics literature. In the process, we also extract statistics and patterns for phenomena that are interesting from a processing perspective. We finally present an outline of a dependency grammar-based human sentence processor for Hindi.
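As an illustration of the kind of treebank-derived statistics such grammar induction can rely on, the sketch below counts head direction and mean dependency length per relation from simplified CoNLL-style tuples; the input format is an assumption for illustration, not the paper's pipeline.

```python
from collections import defaultdict

def relation_statistics(conll_sentences):
    """conll_sentences: list of sentences, each a list of (token_id, head_id, deprel)
    with 1-based token ids and head_id == 0 for the root.
    Returns, per relation, head-direction counts and mean dependency length."""
    stats = defaultdict(lambda: {"left": 0, "right": 0, "lengths": []})
    for sent in conll_sentences:
        for tok_id, head_id, deprel in sent:
            if head_id == 0:
                continue  # skip root attachments
            direction = "left" if head_id < tok_id else "right"
            stats[deprel][direction] += 1
            stats[deprel]["lengths"].append(abs(head_id - tok_id))
    return {rel: {"left": s["left"], "right": s["right"],
                  "mean_length": sum(s["lengths"]) / len(s["lengths"])}
            for rel, s in stats.items()}
```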


Intra-chunk dependency annotation: expanding Hindi inter-chunk annotated treebank

July 2012 · 285 Reads · 13 Citations

Prudhvi Kosaraju · Samar Husain · [...]

We present two approaches (rule-based and statistical) for automatically annotating intra-chunk dependencies in Hindi. The intra-chunk dependencies are added to the dependency trees for Hindi which are already annotated with inter-chunk dependencies. Thus, the intra-chunk annotator finally provides a fully parsed dependency tree for a Hindi sentence. In this paper, we first describe the guidelines for marking intra-chunk dependency relations. Although the guidelines are for Hindi, they can easily be extended to other Indian languages. These guidelines are used for framing the rules in the rule-based approach. For the statistical approach, we use MaltParser, a data-driven parser. A part of the ICON 2010 tools contest data for Hindi is used for training and testing MaltParser. The same set is used for testing the rule-based approach.
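A hedged sketch of what a rule-based intra-chunk expander can look like: every non-head token in a chunk is attached to the chunk head, with the label chosen by a simple POS-based rule table. The rules, labels, and toy chunk below are invented for illustration and do not reproduce the annotation guidelines described in the paper.

```python
def expand_chunk(tokens, head_index, rules):
    """tokens: list of (word, pos) inside one chunk.
    head_index: position of the chunk head within the chunk.
    rules: dict mapping a dependent POS tag to an intra-chunk label.
    Returns (dependent_index, head_index, label) arcs for the chunk."""
    arcs = []
    for i, (word, pos) in enumerate(tokens):
        if i == head_index:
            continue
        label = rules.get(pos, "intra_default")  # fallback label (illustrative)
        arcs.append((i, head_index, label))
    return arcs

# Illustrative rule table and chunk (not from the actual guidelines):
rules = {"JJ": "nmod_adj", "DEM": "det", "QC": "nmod_quant"}
chunk = [("do", "QC"), ("bade", "JJ"), ("ghar", "NN")]   # "two big houses"
print(expand_chunk(chunk, head_index=2, rules=rules))
# [(0, 2, 'nmod_quant'), (1, 2, 'nmod_adj')]
```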


Analyzing Parser Errors: To Improve Parsing Accuracy And To Inform Treebanking Decisions

January 2012 · 101 Reads · 10 Citations

Linguistic Issues in Language Technology

We present a detailed error analysis of a transition-based dependency parser trained on a Hindi dependency treebank. Parser error analysis has not previously been examined systematically from the point of view of treebanking, and this work intends to contribute in this area. We address two main questions in this paper: Can the parsing of certain structures be made easier by using alternative analyses for these structures? Are there certain linguistic cues implicit (or missing) in the current treebank that can be made explicit (or added) in order to make the parsing of complex constructions easier? These questions guide us in examining the potential benefits of parser error analysis during treebanking. Through our experiments and analysis we were able to shed light on the causes of errors and subsequently to improve the performance of the parser.
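A small sketch of the kind of per-relation error breakdown such an analysis typically starts from (attachment vs. labeling errors per gold dependency relation); the token representation is an assumption for illustration, not the paper's tooling.

```python
from collections import Counter

def per_relation_errors(gold, predicted):
    """gold, predicted: lists of (head, label) pairs, one per token.
    Counts, per gold relation, how often the head or the label is wrong."""
    head_errors, label_errors, totals = Counter(), Counter(), Counter()
    for (g_head, g_lab), (p_head, p_lab) in zip(gold, predicted):
        totals[g_lab] += 1
        if p_head != g_head:
            head_errors[g_lab] += 1
        if p_lab != g_lab:
            label_errors[g_lab] += 1
    return {rel: {"n": totals[rel],
                  "head_err": head_errors[rel] / totals[rel],
                  "label_err": label_errors[rel] / totals[rel]}
            for rel in totals}
```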


Figures: constraint graph for sentence 1; illustrations of Experiments 1 and 2; UAS of all the experiments; the improvement in the accuracies is spread across different kinds of relations.

Linguistically rich graph based data driven parsing for Hindi

In this paper we show how linguistic knowledge can be incorporated during graph-based parsing. We use MSTParser and show that using a constraint graph, instead of a complete graph, to extract a spanning tree improves parsing accuracy. A constraint graph is formed by using the linguistic knowledge of a constraint-based parsing system. Through a series of experiments we formulate the optimal constraint graph that gives us the best accuracy. These experiments show that some of the previous MSTParser errors can be corrected consistently; they also reveal the limitations of the proposed approach.
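A minimal sketch of the core idea, under the simplifying assumption that the constraint-based component can be reduced to a boolean arc-admissibility test: arcs ruled out by the constraints are masked before maximum-spanning-tree decoding, so the decoder effectively searches the constraint graph rather than the complete graph. The functions below are placeholders, not MSTParser's API.

```python
import numpy as np

def constrain_scores(scores, allowed):
    """scores[h, d]: arc score for head h -> dependent d (complete graph).
    allowed[h, d]: True if the constraint-based system permits the arc.
    Disallowed arcs get -inf so MST decoding (e.g. Chu-Liu/Edmonds) never
    selects them, which amounts to decoding over the constraint graph."""
    masked = scores.copy()
    masked[~allowed] = -np.inf
    return masked

# Toy example: 1 root + 3 words; constraints forbid word 3 heading word 1.
scores = np.random.rand(4, 4)
allowed = np.ones((4, 4), dtype=bool)
allowed[3, 1] = False
masked = constrain_scores(scores, allowed)
```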


Identification of Conjunct Verbs in Hindi and Its Effect on Parsing Accuracy

February 2011 · 111 Reads · 15 Citations

Lecture Notes in Computer Science

This paper introduces work on the identification of conjunct verbs in Hindi. The paper first investigates which noun-verb combinations make a conjunct verb in Hindi, using a set of linguistic diagnostics. We then examine which of these diagnostics can be used as features in a MaxEnt-based automatic identification tool. Finally, we use this tool to incorporate certain features into a graph-based dependency parser and show an improvement over the previous best Hindi parsing accuracy.
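For concreteness, a hedged sketch of a MaxEnt-style classifier over the kind of binary diagnostics the paper describes, using scikit-learn's logistic regression (a maximum-entropy model); the feature names and toy examples are invented for illustration and are not the paper's feature set or data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy noun-verb pairs with hypothetical diagnostic features (not the paper's features).
X = [
    {"noun_takes_case_marker": 0, "verb_is_light": 1, "noun_verb_adjacent": 1},
    {"noun_takes_case_marker": 1, "verb_is_light": 0, "noun_verb_adjacent": 0},
    {"noun_takes_case_marker": 0, "verb_is_light": 1, "noun_verb_adjacent": 0},
    {"noun_takes_case_marker": 1, "verb_is_light": 0, "noun_verb_adjacent": 1},
]
y = [1, 0, 1, 0]  # 1 = conjunct verb, 0 = not a conjunct verb

clf = make_pipeline(DictVectorizer(), LogisticRegression())
clf.fit(X, y)
print(clf.predict([{"noun_takes_case_marker": 0, "verb_is_light": 1, "noun_verb_adjacent": 1}]))
```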


Citations (32)


... The second phase of the methodology covers the manual revision of the automatic parses produced during the first phase. The revision of ParlaMint-It relies on the assumption that annotation errors can be either random or systematic (Agrawal et al., 2013): the former are heterogeneous, while systematic errors can be identified by searching for recurrent error patterns. To identify patterns of systematic and recurring erroneous dependency relations in the automatically annotated ParlaMint-It corpus we adopted and specialized the methodology first introduced by Alzetta et al. (2017), which leverages the LISCA (Linguistically-driven Selection of Correct Arcs) algorithm (Dell'Orletta et al., 2013a). ...

Reference:

Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches
An Automatic Approach to Treebank Error Detection Using a Dependency Parser
  • Citing Conference Paper
  • March 2013

Lecture Notes in Computer Science

... Based on evidence from the decay and interference literature, as well as the dependency locality literature, we would expect to find that OTOH is read slower when following intervening material. However, some studies have shown that locality effects can only be detected when the expectation for the predicted element is weak, and not when the expectation is strong (Husain et al., 2014, see also Campanelli et al., 2018; Nicenboim et al., 2015; Stone et al., 2020, but see Safavi et al., 2016). Hence, it is possible that strong expectations can override the decay or interference effect of intervening material by keeping the anticipated material active in working memory. ...

Dependency Resolution Difficulty Increases with Distance in Persian Separable Complex Predicates: Evidence for Expectation and Memory-Based Accounts

... Thirdly, for model training and validation against human reading patterns (specifically, eye fixations on words), we sourced publicly accessible datasets for each language, containing texts of a uniform style. We utilized the Potsdam-Allahabad Hindi Eye-tracking Corpus (PAC, Husain et al., 2015; Vasishth, 2021) comprising word-level eye-tracking data from 30 individuals reading 83 sentences sourced from newspapers. For English, our source was the Multilingual Eye-tracking Corpus (MECO, Siegelman et al., 2022), which includes eye-tracking data captured from 46 participants reading 112 encyclopedic sentences. ...

Integration and prediction difficulty in Hindi sentence comprehension: Evidence from an eye-tracking corpus

Journal of Eye Movement Research

... A natural way to decide which syntactic representation is the best is to choose the one for which a standard parser will achieve the highest parsing performance (Schwartz et al., 2012; Husain and Agrawal, 2012; Noro et al., 2005). Implementing this general principle faces two challenges: i) defining a learning criterion that can predict which dependency structure will be the easiest to learn ii) finding a way to explore a potentially large number of annotation schemes that describe all combinations of several design decisions. ...

Analyzing Parser Errors: To Improve Parsing Accuracy And To Inform Treebanking Decisions
  • Citing Article
  • January 2012

Linguistic Issues in Language Technology

... The identification of chunks, syntactically related non-overlapping groups of words (Tjong Kim Sang and Buchholz, 2000), was used mainly in shallow parsing strategies (Federici et al., 1996). Clausal parsing was designed to parse Hindi texts (Husain et al., 2011). However, there is no work on exploiting chunks for full-scale parsing. ...

Clausal parsing helps data-driven dependency parsing: Experiments with Hindi

... Using this model, the most notable works of treebanking on Indian languages started with the Anncorra [12] which provided the guidelines for interchunk dependency relations in Hindi whose revised version is reported in [14]. Again, Kosaraju et al. [35] incorporated 12 new intrachunk tags to the dependency trees of AnnCorra [12]. Begum et al. [6] introduced a dependency annotation scheme targeting to annotate a large scale Hindi corpus. ...

Intra-chunk dependency annotation: expanding Hindi inter-chunk annotated treebank

... Hindi, predictions about verbs seem to be particularly strong and robust (Husain et al., 2014;Levy and Keller, 2013;Vasishth, 2003). Here, we ask whether Hindi grammar also results in robust predictions about event structure: in particular, whether an event is culminated or not. ...

Strong Expectations Cancel Locality Effects: Evidence from Hindi

... One important contribution of this paper is the evaluation methodology. Previous work (Husain et al., 2007; Gustavii, 2005) on preposition translation measured only accuracy gains with respect to simple baselines, and focused on small sets of frequent prepositions. Our methodology measures both precision and recall over all prepositions occurring in a small corpus of randomly chosen sentences. ...

Simple Preposition Correspondence: A problem in English to Indian language Machine Translation
  • Citing Article
  • June 2007

... An effort has been previously made in grammar driven parsing for Hindi by us (Gupta et al., 2008) where the focus was not to mark relations in a broad coverage sense but to mark certain easily identifiable relations using a rule base. In this paper, we show improvements in results over our previous work by including some additional linguistic features which help in identifying relations better. ...

A Rule Based Approach for Automatic Annotation of a Hindi Tree bank
  • Citing Article
  • January 2008