Philippe Blache

Université d'Avignon et des Pays du Vaucluse, Avignon, Provence-Alpes-Côte d'Azur, France

Publications (104)

  • Source
First TextLink Action Conference, Louvain-la-Neuve; 01/2015
  • Source
    Houda Oufaida, Omar Nouali, Philippe Blache
ABSTRACT: Automatic text summarization aims to produce summaries for one or more texts using automatic techniques. In this paper, we propose a novel statistical summarization system for Arabic texts. Our system uses a clustering algorithm and an adapted discriminant analysis method, mRMR (minimum Redundancy and Maximum Relevance), to score terms. First, through mRMR analysis, terms are ranked according to their discriminative and coverage power. Second, we propose a novel sentence extraction algorithm which selects sentences with top-ranked terms and maximum diversity. Our system uses minimal language-dependent processing: sentence splitting, tokenization and root extraction. Experimental results on the EASC and TAC 2011 MultiLingual datasets show that our approach is competitive with state-of-the-art systems.
    12/2014; 26(4). DOI:10.1016/j.jksuci.2014.06.008
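The mRMR term-ranking step described in this abstract can be sketched as a greedy selection: at each step, keep the candidate term whose relevance score, minus its mean redundancy with the terms already selected, is highest. The sketch below is an illustrative reconstruction, not the authors' implementation; in the paper, relevance and redundancy are derived from term/cluster statistics of the Arabic corpus.

```python
# Greedy mRMR (minimum Redundancy, Maximum Relevance) term ranking.
# `relevance` maps term -> score; `redundancy` maps an ordered term
# pair (min, max) -> pairwise redundancy. Both are illustrative inputs.

def mrmr_rank(terms, relevance, redundancy, k):
    """Greedily pick k terms maximising relevance minus the mean
    redundancy with the terms already selected."""
    selected = []
    candidates = set(terms)
    while candidates and len(selected) < k:
        def score(t):
            if not selected:
                return relevance[t]
            mean_red = sum(redundancy[(min(t, s), max(t, s))]
                           for s in selected) / len(selected)
            return relevance[t] - mean_red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

A highly relevant term is thus demoted if it is redundant with terms already ranked, which is what gives the summary both discriminative power and coverage.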
  • Bertille Pallaud, Stéphane Rauzy, Philippe Blache
    10/2013; DOI:10.4000/tipa.995
Stéphane Rauzy, Philippe Blache
ABSTRACT: We present in this paper a robust method for predicting reading times. Robustness first comes from the conception of the difficulty model, which is based on a morpho-syntactic surprisal index. This metric is not only a good predictor, as shown in the paper, but also intrinsically robust (because it relies on POS-tagging instead of parsing). Second, robustness also concerns data analysis: we propose to enlarge the scope of reading processing units by using syntactic chunks instead of words. As a result, words with null reading time do not need any special treatment or filtering. It appears that working at the chunk scale smooths out the variability inherent to different readers' strategies. The pilot study presented in this paper applies this technique to a new resource we have built, enriching a French treebank with eye-tracking data and difficulty prediction measures.
    Eye-Tracking and Natural Language Processing; 12/2012
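The surprisal index mentioned here is, in its simplest form, the negative log-probability of a token given its context; summing it over a syntactic chunk gives a chunk-level difficulty score. The following is a minimal sketch under a toy bigram POS model, not the richer morpho-syntactic model of the paper:

```python
import math

# Surprisal of a token under a bigram POS model: -log2 P(tag | prev tag).
# The transition probabilities passed in are illustrative stand-ins.

def surprisal(prob):
    """Surprisal in bits of an event with probability prob."""
    return -math.log2(prob)

def chunk_difficulty(pos_sequence, bigram_probs):
    """Sum token surprisals over a chunk, given P(tag | previous tag)."""
    total = 0.0
    for prev, cur in zip(pos_sequence, pos_sequence[1:]):
        total += surprisal(bigram_probs[(prev, cur)])
    return total
```

Aggregating at the chunk level is what lets zero-reading-time words contribute without special filtering: their surprisal is simply folded into the chunk total.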
  • Source
ABSTRACT: This paper focuses on the representation and querying of knowledge-based multimodal data. This work is part of the OTIM project, which aims at processing multimodal annotations of a large conversational French speech corpus. Within OTIM, we aim at providing linguists with a unique framework to encode and manipulate numerous linguistic domains (from prosody to gesture). Linguists commonly use Typed Feature Structures (TFS) to provide a uniform view of multimodal annotations, but such a representation cannot be used within an applicative framework. Moreover, TFS expressibility is limited to hierarchical and constituency relations and does not suit linguistic domains that need, for example, to represent temporal relations. To overcome these limits, we propose an ontological approach based on Description Logics (DL) for the description of linguistic knowledge, and we provide an applicative framework based on OWL DL (Web Ontology Language) and the query language SPARQL.
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12); 01/2012
  • Source
    Journée Atala; 04/2011
  • Source
Workshop Proceedings of the 9th International Conference on Terminology and Artificial Intelligence - WS2: Ontology and Lexicon: new insights; 01/2011
  • Source
    Laurianne Sitbon, Patrice Bellot, Philippe Blache
ABSTRACT: This paper introduces readability constraints in relevance measures for document retrieval and summarisation. The readability constraints are specifically estimated for dyslexic readers. The optimal integration rate is estimated at around 30% from the observation of performances on the CLEF and DUC evaluation campaigns.
  • Source
    Laurianne Sitbon, Patrice Bellot, Philippe Blache
ABSTRACT: We propose a new way of estimating relevance that takes some non-informational user needs into account. This is achieved using a linear function, which has the advantage of being simple, efficient, and directly controllable by the user. The experiments are conducted on TREC and CLEF ad hoc task data and on DUC data. Lastly, the readability constraints are specifically estimated for dyslexic readers.
    Document Numerique 10/2010; 13(1):161-185. DOI:10.3166/dn.13.1.161-185
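The linear function this abstract refers to can be illustrated as a weighted mix of a topical relevance score and a readability score, with the mixing rate exposed to the user (the related study above estimates the optimum at roughly 30%). Function and parameter names below are illustrative, not from the paper:

```python
# Linear integration of relevance and readability. `rate` is the share
# given to readability; 0.3 reflects the ~30% optimum reported above.

def combined_score(relevance, readability, rate=0.3):
    """Linear mix: (1 - rate) * relevance + rate * readability."""
    return (1 - rate) * relevance + rate * readability

def rerank(docs, rate=0.3):
    """docs: list of (doc_id, relevance, readability); highest score first."""
    return sorted(docs, key=lambda d: combined_score(d[1], d[2], rate),
                  reverse=True)
```

With such a mix, a slightly less relevant but much more readable document can outrank a topically stronger one, which is the intended behaviour for dyslexic readers.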
  • Source
ABSTRACT: We propose in this paper a broad-coverage approach for multimodal annotation of conversational data. Large annotation projects addressing the question of multimodal annotation bring together many different kinds of information from different domains, with different levels of granularity. We present in this paper the first results of the OTIM project aiming at developing conventions and tools for multimodal annotation.
  • Source
ABSTRACT: The multiplication of annotation schemes and coding formats is a severe limitation for interoperability. We propose in this paper an approach specifying the annotation scheme in terms of typed feature structures, which are in a second step translated into XML schemas, from which data are encoded. This approach guarantees that no information is lost when translating one format into another.
  • Source
ABSTRACT: We present in this paper an automatic summarization technique for Arabic texts, based on RST. We first present a corpus study which enabled us to specify, following empirical observations, a set of relations and rhetorical frames. Then, we present our method to automatically summarize Arabic texts. Finally, we present the architecture of the ARSTResume system. This method is based on Rhetorical Structure Theory (Mann, 1988) and uses linguistic knowledge. The method relies on three pillars. The first consists in locating the rhetorical relations between the minimal units of the text by applying rhetorical rules; one of these units is the nucleus (the segment necessary to maintain coherence) and the other can be either a nucleus or a satellite (an optional segment). The second pillar is the representation and simplification of the RST-tree that represents the input text in hierarchical form. The third pillar is the selection of sentences for the final summary, which takes into account the type of the rhetorical relations chosen for the extract.
    ICEIS 2010 - Proceedings of the 12th International Conference on Enterprise Information Systems, Volume 2, AIDSS, Funchal, Madeira, Portugal, June 8 - 12, 2010; 01/2010
  • Source
ABSTRACT: Large annotation projects, typically those addressing the question of multimodal annotation in which many different kinds of information have to be encoded, have to elaborate precise and high-level annotation schemes. Doing this requires first defining the structure of the information: the different objects and their organization. This stage has to be as independent as possible from the coding language constraints. This is the reason why we propose a preliminary formal annotation model, represented with typed feature structures. This representation requires a precise definition of the different objects, their properties (or features) and their relations, represented in terms of type hierarchies. This approach has been used to specify the annotation scheme of a large multimodal annotation project (OTIM) and experimented in the annotation of a multimodal corpus (CID, Corpus of Interactional Data). This project aims at collecting, annotating and exploiting a dialogue video corpus in a multimodal perspective (including speech and gesture modalities). The corpus itself is made of 8 hours of dialogues, fully transcribed and richly annotated (phonetics, syntax, pragmatics, gestures, etc.).
    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta; 01/2010
  • Source
    Philippe Blache, Laurent Prévot
ABSTRACT: We present in this paper a formal approach for the representation of multimodal information. This approach, thanks to the use of typed feature structures and hypergraphs, generalizes existing ones (typically annotation graphs) in several ways. It first proposes a homogeneous representation of different types of information (nodes and relations) coming from different domains (speech, gestures). Second, it makes it possible to specify constraints representing the interaction between the different modalities, in the perspective of developing multimodal grammars.
    COLING 2010, 23rd International Conference on Computational Linguistics, Posters Volume, 23-27 August 2010, Beijing, China; 01/2010
  • Source
ABSTRACT: The construction, annotation and exploitation of multimodal conversational corpora constitute a major challenge for research in the language sciences. We present here the CID (LPL) corpus and the first results of a study on backchannel signals (BC), vocal and/or gestural phenomena produced by the listener to signal attention to the discourse. Our objective is to improve the formal and functional typologies of BCs and to account for the role of the discursive and prosodic cues that favour their production. Exploiting annotations from the different linguistic levels simultaneously requires querying tools; we tested a data-extraction method based on XSLT. Our preliminary results provide elements of discussion for later investigation of the role of prosodic cues in particular.
  • Source
    Philippe Blache
    ABSTRACT: We present in this paper a formal and computational scheme in the perspective of broad-coverage multimodal annotation. We propose in particular to introduce the notion of annotation hypergraphs in which primary and secondary data are represented by means of the same structure.
  • Source
    Laurianne Sitbon, Patrice Bellot, Philippe Blache
  • Source
    Laurianne Sitbon, Patrice Bellot, Philippe Blache
ABSTRACT: When users cannot find a word, they may think of semantically related words that could be used in an automatic process to help them. This paper presents an evaluation of lexical resources and semantic networks for modelling mental associations. A corpus of associations has been constructed for this evaluation. It is composed of 20 low-frequency target words, each associated 5 times by 20 users. In the experiments we look for the target word in propositions made from the associated words using 5 different resources. The results show that even if each resource has a useful specificity, the global recall is low. An experiment to extract common semantic features of several associations showed that we cannot expect to see the target word below a rank of 20 propositions.
    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, 26 May - 1 June 2008, Marrakech, Morocco; 01/2008
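The evaluation protocol this abstract describes (looking for the target word among candidates proposed from the user's association words) can be sketched as a simple voting scheme over a lexical resource. The dictionary-based resource below is a toy stand-in for the five real resources evaluated in the paper:

```python
from collections import Counter

# For each target word, pool the candidates a resource proposes for the
# user's association words, rank them by how many associations support
# them, and check the rank of the true target.

def rank_candidates(associations, resource):
    """Rank candidate words by how many association words propose them
    (via the resource), most-supported first."""
    votes = Counter()
    for assoc in associations:
        for candidate in resource.get(assoc, ()):
            votes[candidate] += 1
    return [w for w, _ in votes.most_common()]

def target_rank(target, associations, resource):
    """1-based rank of the target among candidates, or None if absent."""
    ranked = rank_candidates(associations, resource)
    return ranked.index(target) + 1 if target in ranked else None
```

Averaging `target_rank` over a corpus of (target, associations) pairs gives exactly the kind of recall-at-rank figure the abstract reports.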
  • Source
    Philippe Blache, Roxane Bertrand, Gaëlle Ferré
ABSTRACT: The paper presents a project of the Laboratoire Parole et Langage which aims at collecting, annotating and exploiting a corpus of spoken French in a multimodal perspective. The project directly meets the present needs in linguistics, where a growing number of researchers have become aware of the fact that a theory of communication which aims at describing real interactions should take into account the complexity of these interactions. However, in order to take such complexity into account, linguists should have access to spoken corpora annotated in different fields. The paper presents the annotation schemes used in phonetics, morphology and syntax, prosody, and gestuality at the LPL, together with the type of linguistic description made from the annotations, illustrated by two examples.
    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, 26 May - 1 June 2008, Marrakech, Morocco; 01/2008
  • Source
    Laurianne Sitbon, Patrice Bellot, Philippe Blache
ABSTRACT: This paper introduces readability constraints in relevance measures for document retrieval and summarisation. The readability constraints are specifically estimated for dyslexic readers. The optimal integration rate is estimated at around 30% from the observation of performances on the CLEF and DUC evaluation campaigns. KEYWORDS: document retrieval, summarisation, readability, dyslexia
    COnférence en Recherche d'Infomations et Applications - CORIA 2008, 5th French Information Retrieval Conference, Trégastel, France, March 12-14, 2008. Proceedings; 01/2008

Publication Stats

387 Citations
2.37 Total Impact Points

Institutions

  • 2010
• Université d'Avignon et des Pays du Vaucluse
      • Laboratoire Informatique d'Avignon (EA 4128)
      Avignon, Provence-Alpes-Côte d'Azur, France
  • 2003–2010
    • French National Centre for Scientific Research
      Paris, Île-de-France, France
  • 2009
    • Aix-Marseille Université
      • Laboratoire Parole et Langage (UMR 7309)
      Marseille, Provence-Alpes-Côte d'Azur, France
  • 1992
    • Université de Neuchâtel
      Neuchâtel, Switzerland