Pierre Nugues

Lund University, Lund, Skåne, Sweden

Publications (90)
14.92 Total Impact

  •
    ABSTRACT: The concept of culturomics was born out of the availability of massive amounts of textual data and the desire to make sense of cultural and language phenomena over time. Thus far, however, culturomics has only made use of, and shown the great potential of, statistical methods. In this paper, we present a vision for a knowledge-based culturomics that complements traditional culturomics. We discuss the possibilities and challenges of combining knowledge-based methods with statistical methods and address the major challenges that arise from the nature of the data: the diversity of sources, changes in language over time, and the temporal dynamics of information in general. We address all layers needed for knowledge-based culturomics, from natural language processing and relations to summaries and opinions.
    International Journal on Digital Libraries 04/2015; 15(2-4). DOI:10.1007/s00799-015-0139-1
  • Hakan Jonsson, Pierre Nugues
    ABSTRACT: We present a comparative exploratory analysis of two proximity networks of mobile phone users, the Proximates network and the Reality Mining network. Data for both networks were collected from mobile phones carried by two groups of users, with periodic Bluetooth scans performed to detect the proximity of other mobile phones. The Reality Mining project took place in 2004-2005 at MIT, while Proximates took place in Sweden in 2012-2013. We show that the differences in sampling strategy between the two networks have effects on both static and dynamic metrics. We also find that fundamental metrics of the static Proximates network capture social interaction characteristics better than those of the static Reality Mining network.
    2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP); 04/2014
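    A minimal sketch of the kind of network construction involved: periodic Bluetooth sightings are aggregated into a static, weighted contact graph whose node degrees can then be compared across datasets. The scan record format and node labels are illustrative assumptions, not the formats used in either dataset.

    ```python
    from collections import defaultdict

    def build_proximity_network(scans):
        """Aggregate Bluetooth sightings (scanner, seen_device) into a
        weighted undirected contact graph: the edge weight is the number
        of scans in which the two devices saw each other."""
        weights = defaultdict(int)
        for scanner, seen in scans:
            # Store each pair in canonical order so (a, b) == (b, a).
            edge = tuple(sorted((scanner, seen)))
            weights[edge] += 1
        return dict(weights)

    def degrees(network):
        """Number of distinct contacts per device (static-network degree)."""
        contacts = defaultdict(set)
        for a, b in network:
            contacts[a].add(b)
            contacts[b].add(a)
        return {node: len(peers) for node, peers in contacts.items()}

    scans = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C")]
    net = build_proximity_network(scans)
    ```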
  •
    ABSTRACT: A smartphone is a personal device and as such usually hosts multiple public user identities such as a phone number, an email address, and a Facebook account. As each smartphone has a unique Bluetooth MAC address, Bluetooth discovery can be combined with the user's registration of a Facebook account on a server. This makes it possible to identify a nearby smartphone belonging to a given Facebook contact. Using this capability, we designed Memorit, an application that handles reminders alerting the user when a contact is nearby. This way, it is possible to trigger a user-defined reminder, for example to give back a book, when a registered contact comes into proximity. Data collected from Memorit will allow the study of pervasive social context. This paper gives an overview of Memorit, its features and implementation, and evaluates its applicability through a user study.
    2014 IEEE International Conference on Pervasive Computing and Communication Workshops (PERCOM WORKSHOPS); 03/2014
  •
    ABSTRACT: The massive amounts of text data made available through the Google Books digitization project have inspired a new field of big-data textual research. Named culturomics, this field has attracted the attention of a growing number of scholars over recent years. However, initial studies based on these data have been criticized for not referring to relevant work in linguistics and language technology. This paper provides some ideas, thoughts, and first steps towards a new culturomics initiative, based this time on Swedish data, which pursues a more knowledge-based approach than previous work in this emerging field. The amount of new Swedish text produced daily, together with older texts being digitized in cultural heritage projects, grows at an accelerating rate. These volumes of digital text have grown far beyond the capacity of human readers, leaving automated semantic processing as the only realistic option for accessing and using the information they contain. The aim of our recently initiated research program is to advance the state of the art in language technology resources and methods for the semantic processing of big Swedish text, extracting and correlating information from large volumes of text using a combination of knowledge-based and statistical methods.
    Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing; 10/2013
  • Maj Stenmark, Pierre Nugues
    ABSTRACT: In this paper, we introduce a method to use written natural language instructions to program assembly tasks for industrial robots. In our application, we used a state-of-the-art semantic and syntactic parser together with semantically rich world and skill descriptions to create high-level symbolic task sequences. From these sequences, we generated executable code for both virtual and physical robot systems. Our focus lies on the applicability of these methods in an industrial setting with real-time constraints.
    Robotics (ISR), 2013 44th International Symposium on; 01/2013
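    The step from parsed sentences to a symbolic task sequence can be sketched as follows; the frame layout, the skill names, and the `skill_map` lookup are illustrative assumptions, not the representations used in the paper.

    ```python
    def frames_to_task(frames, skill_map):
        """Translate predicate-argument frames produced by a semantic
        parser into a symbolic task sequence of (skill, argument) steps."""
        task = []
        for frame in frames:
            skill = skill_map.get(frame["verb"])
            if skill is None:
                raise ValueError("no skill registered for verb: " + frame["verb"])
            task.append((skill, frame.get("object")))
        return task

    # Hypothetical parse of "Pick up the shield can and place it in the fixture."
    frames = [{"verb": "pick", "object": "shield_can"},
              {"verb": "place", "object": "fixture"}]
    skills = {"pick": "GraspSkill", "place": "PlaceSkill"}
    ```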
    Detection, Representation, and Exploitation of Events in the Semantic Web, Workshop in conjunction with the 11th International Semantic Web Conference 2012, Boston, Massachusetts, USA, 12 November 2012; 01/2012
  •
    ABSTRACT: This paper describes the structure of the LTH coreference solver used in the closed track of the CoNLL 2012 shared task (Pradhan et al., 2012). The solver core is a mention classifier that uses Soon et al. (2001)'s algorithm and features extracted from the dependency graphs of the sentences. This system builds on Björkelund and Nugues (2011)'s solver, which we extended so that it can be applied to the three languages of the task: English, Chinese, and Arabic. We designed a new mention detection module that removes pleonastic pronouns, prunes constituents, and recovers mentions when they do not exactly match a noun phrase. We carefully redesigned the features so that they reflect more complex linguistic phenomena as well as discourse properties. Finally, we introduced a minimal cluster model grounded in the first mention of an entity. We optimized the feature sets for the three languages: we carried out an extensive evaluation of pairs of features and complemented the single features with associations that improved the CoNLL score. We obtained the respective scores of 59.57, 56.62, and 48.25 on English, Chinese, and Arabic on the development set, 59.36, 56.85, and 49.43 on the test set, and the combined official score of 55.21.
    Joint Conference on EMNLP and CoNLL 2012 -- Shared Task; 01/2012
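    The pairwise resolution strategy at the core of such a solver can be sketched as below: each mention is compared with its preceding candidate antecedents, scanning right to left, and linked to the first one the mention-pair classifier accepts. The toy classifier and mention list are illustrative stand-ins for the trained model and real features.

    ```python
    def soon_style_links(mentions, is_coreferent):
        """Closest-first pairwise linking in the style of Soon et al. (2001):
        mention j is linked to the nearest preceding mention i that the
        pairwise classifier accepts as a coreferent antecedent."""
        links = {}
        for j in range(1, len(mentions)):
            for i in range(j - 1, -1, -1):
                if is_coreferent(mentions[i], mentions[j]):
                    links[j] = i
                    break  # closest accepted antecedent wins
        return links

    # Toy "classifier": exact string match, plus one hand-coded pronoun rule.
    mentions = ["Obama", "the president", "he", "Obama"]
    toy = lambda a, b: a == b or (b == "he" and a == "Obama")
    links = soon_style_links(mentions, toy)
    ```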
  • Peter Exner, Pierre Nugues
    SLTC 2012, The Fourth Swedish Language Technology Conference Lund, October 24-26, 2012; 01/2012
  • Peter Exner, Pierre Nugues
    Web of Linked Entities Workshop in conjunction with the 11th International Semantic Web Conference (ISWC 2012), Boston, USA, November 11, 2012; 01/2012
  • Peter Exner, Pierre Nugues
    LREC 2012, Eighth International Conference on Language Resources and Evaluation; 01/2012
  •
    ABSTRACT: Model-based systems in control are a means to utilize human knowledge efficiently and achieve high performance. While models consisting of formalized knowledge are used during the engineering step, running systems usually do not contain a high-level, symbolic representation of the control and most of its properties, which are typically reduced to named numerical parameters. On a system level, beyond the plant data, there is also a need to represent the meaning of the data so that deployment and fault analysis can be augmented with partly automated inference based on the semantics of the data. To that end, we extended the formalized knowledge traditionally used in control to include the control purpose, engineering assumptions, quality, involved state machines, and so on. We then represented the control semantics in a format that allows easier extraction of information using querying and reasoning. It aims at making knowledge in control engineering reusable so that it can be shipped together with the control systems. We implemented prototypes that include the automatic conversion of plant data from AutomationML into RDF triples, as well as the automated extraction of control properties, the conversion of parameters, and their storage in the same triple store. Although these techniques are standard within the semantic web community, we believe that our robotic prototypes for semantic control represent a novel approach.
  •
    ABSTRACT: Robots used in manufacturing today are tailored to their tasks by system integration based on expert knowledge concerning both production and machine control. For upcoming generations of even more flexible robot solutions, in applications such as dexterous assembly, robot setup and programming get even more challenging. Reuse of solutions in terms of parameters, controls, process tuning, and software modules in general then becomes increasingly important. There has been valuable progress in the reuse of automation solutions when machines comply with standards and behave according to nominal models. However, more flexible robots with sensor-based manipulation skills and cognitive functions for human interaction are far too complex to manage this way, and solutions are rarely reusable since knowledge is either implicit in imperative software or not captured in machine-readable form. We propose techniques that build on existing knowledge by converting structured data into an RDF-based knowledge base. Through enhancements of industrial control systems and available engineering tools, such knowledge can be gradually extended as part of the interaction during the definition of the robot task.
    Assembly and Manufacturing (ISAM), 2011 IEEE International Symposium on; 06/2011
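    The conversion of structured data into subject-predicate-object triples can be sketched as follows. Triples are shown as plain tuples rather than through a full RDF library, and the vocabulary URIs and record fields are illustrative, not those used in the paper.

    ```python
    def triplify(record):
        """Convert a flat record about a robot component into
        subject-predicate-object triples: the "id" field becomes the
        subject URI, every other field becomes one predicate/object pair."""
        base = "http://example.org/robot#"  # hypothetical namespace
        subject = base + record["id"]
        triples = []
        for key, value in record.items():
            if key != "id":
                triples.append((subject, base + key, value))
        return triples

    record = {"id": "gripper1", "type": "Gripper", "maxForce": "40N"}
    triples = triplify(record)
    ```

    In a real system the tuples would be loaded into a triple store and queried via SPARQL; the tuple form keeps the sketch dependency-free.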
  • Anders Björkelund, Pierre Nugues
    ABSTRACT: In this paper, we describe a coreference solver based on the extensive use of lexical features and features extracted from dependency graphs of the sentences. The solver uses Soon et al. (2001)'s classical resolution algorithm based on a pairwise classification of the mentions. We applied this solver to the closed track of the CoNLL 2011 shared task (Pradhan et al., 2011). We carried out a systematic optimization of the feature set using cross-validation that led us to retain 24 features. Using this set, we reached a MUC score of 58.61 on the test set of the shared task. We analyzed the impact of the features on the development set and we show the importance of lexicalization as well as of properties related to dependency links in coreference resolution.
    Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task; 06/2011
  • Peter Exner, Pierre Nugues
    ABSTRACT: Although event models and corresponding RDF vocabularies are becoming available, the collection of events still requires an initial manual encoding to produce the data. In this paper, we describe a system based on semantic role labeling (SRL) to collect events automatically from text and convert them into the LODE model. Furthermore, the system automatically links extracted event properties to the external resources DBpedia and GeoNames. We applied our system to 10% of the English Wikipedia and evaluated its performance. We managed to extract 27,500 high-confidence event instances. Although SRL is not an error-free technique, we show that it is an effective tool, as the definition of the arguments (or roles) used in our analysis and the event properties are, most of the time, nearly identical. We evaluated the results on a randomly selected sample of 100 events and we report F-measures of up to 73. The extracted events are available online from a SPARQL endpoint.
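    The mapping from a semantic-role frame to LODE event triples can be sketched as below. The frame layout and the event/entity URIs are illustrative assumptions; the property names (involvedAgent, atPlace, atTime) are drawn from the LODE vocabulary, but the paper's exact role-to-property mapping may differ.

    ```python
    def srl_frame_to_lode(event_id, frame):
        """Map a semantic-role frame (predicate plus labeled arguments)
        to LODE-style event triples, keeping only roles with a known
        LODE counterpart."""
        lode = "http://linkedevents.org/ontology/"
        ev = "http://example.org/event/" + event_id  # hypothetical event URI
        role_map = {"A0": lode + "involvedAgent",
                    "AM-LOC": lode + "atPlace",
                    "AM-TMP": lode + "atTime"}
        triples = [(ev, "rdf:type", lode + "Event")]
        for role, filler in frame["roles"].items():
            if role in role_map:
                triples.append((ev, role_map[role], filler))
        return triples

    frame = {"predicate": "found",
             "roles": {"A0": "dbpedia:Henry_Ford", "AM-TMP": "1903"}}
    event_triples = srl_frame_to_lode("e1", frame)
    ```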
  •
    ABSTRACT: This paper describes a named entity recognition (NER) system for short text messages (SMS) running on a mobile platform. Most NER systems deal with text that is structured, formal, well written, with a good grammatical structure and few spelling errors. SMS text messages lack these qualities and instead use a shorthand, mixed language studded with emoticons, which makes NER a challenge on this kind of material. We implemented a system that recognizes named entities from SMSes written in Swedish and that runs on an Android cellular telephone. The entities extracted are locations, names, dates, times, and telephone numbers, with the idea that the extraction of these entities could be utilized by other applications running on the telephone. We started from a regular expression implementation that we complemented with classifiers using logistic regression. We optimized the recognition so that incoming text messages could be processed on the telephone with a fast response time. We reached an F-score of 86 for strict matches and 89 for partial matches.
    Procedia - Social and Behavioral Sciences 01/2011; 27:178–187. DOI:10.1016/j.sbspro.2011.10.596
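    The regular-expression layer of such a hybrid system can be sketched as below: high-precision patterns catch phone numbers and times, while a statistical classifier (the logistic regression stage, not shown) would handle names and locations. The patterns are simplified Swedish-style approximations, not the ones used in the paper.

    ```python
    import re

    # Illustrative high-precision patterns for Swedish-style SMS text.
    PATTERNS = {
        "PHONE": re.compile(r"\b0\d{1,3}[- ]?\d{5,8}\b"),        # e.g. 046-2229890
        "TIME": re.compile(r"\b([01]?\d|2[0-3])[.:][0-5]\d\b"),  # e.g. 18.30, 9:05
    }

    def tag_sms(text):
        """Return (label, surface form, start offset) for every pattern hit,
        sorted by position in the message."""
        entities = []
        for label, pattern in PATTERNS.items():
            for m in pattern.finditer(text):
                entities.append((label, m.group(), m.start()))
        return sorted(entities, key=lambda e: e[2])

    result = tag_sms("ring 046-2229890 innan 18.30")
    ```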
  •
    ABSTRACT: This paper reports on a new service using text mining on SMS data: SMSTrends. The service extracts trends in the form of keywords from SMS messages sent and received by ad hoc location-based communities of users. Trends are then presented to the user through a phone widget, which is regularly updated to show the latest trends. This lets the user see what the user community is texting about and makes her aware of what is going on in this community. Privacy considerations of the service are governed by user expectations and regulations. Brenner and Wang discussed mining of personal communication in operator bit pipes. We expand on this by looking deeper into privacy and regulatory aspects through the specific example of SMSTrends. In particular, we introduce the use of adaptive location granularity selection.
    Intelligence in Next Generation Networks (ICIN), 2010 14th International Conference on; 11/2010
  •
    ABSTRACT: This paper describes a knowledge integration framework for robotics, whose goal is to represent, store, adapt, and distribute knowledge across engineering platforms. The architecture abstracts the components as data sources, where data are available in the AutomationML data exchange format. AutomationML is an ongoing standardization initiative that aims at unifying the data representation and APIs used by engineering tools. A triplification procedure converts the native formats used by data sources into RDF triples and then exposes them via a SPARQL endpoint. The triplification step has been implemented for the CAEX top-level and logic data parts of AutomationML, where the conversion uses XSLT rules.
    Robotics (ISR), 2010 41st International Symposium on and 2010 6th German Conference on Robotics (ROBOTIK); 07/2010
  •
    ABSTRACT: This demonstration presents a high-performance syntactic and semantic dependency parser. The system consists of a pipeline of modules that carry out the tokenization, lemmatization, part-of-speech tagging, dependency parsing, and semantic role labeling of a sentence. The system's two main components draw on improved versions of a state-of-the-art dependency parser (Bohnet, 2009) and semantic role labeler (Björkelund et al., 2009) developed independently by the authors. The system takes a sentence as input and produces a syntactic and semantic annotation in the CoNLL 2009 format. The processing time for a sentence typically ranges from 10 to 1000 milliseconds. The predicate-argument structures in the final output are visualized in the form of segments, which are more intuitive for a user.
    COLING 2010, 23rd International Conference on Computational Linguistics, Demonstrations Volume, 23-27 August 2010, Beijing, China; 01/2010
  • Stefan Karlsson, Pierre Nugues
    ABSTRACT: This paper describes experiments to extract discourse relations holding between two text spans in Swedish. We considered three relation types: cause-explanation-evidence (CEV), contrast, and elaboration, and we extracted word pairs eliciting these relations. We determined a list of Swedish cue phrases that explicitly mark the relations and we learned the word pairs automatically from a corpus of 60 million words. We evaluated the method by building two-way classifiers and obtained the following results: Contrast vs. Other 67.9%, CEV vs. Other 57.7%, and Elaboration vs. Other 52.2%. The conclusion is that this technique, possibly with improvements or modifications, seems usable to capture discourse relations in Swedish.
    Advances in Natural Language Processing, 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 16-18, 2010; 01/2010
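    The word-pair statistic behind this approach can be sketched as follows: every cross-span pair (w1, w2), with w1 from the first span and w2 from the second, votes for the relations it co-occurred with during training. The toy Swedish examples and the plain count-based scoring are illustrative simplifications of the paper's classifiers.

    ```python
    from collections import defaultdict

    def train_pair_counts(examples):
        """Count cross-span word pairs per relation label."""
        counts = defaultdict(lambda: defaultdict(int))
        for span1, span2, relation in examples:
            for w1 in span1.split():
                for w2 in span2.split():
                    counts[(w1, w2)][relation] += 1
        return counts

    def classify(counts, span1, span2):
        """Score each relation by summing the counts of the observed
        cross-span word pairs; fall back to "Other" when nothing matches."""
        scores = defaultdict(int)
        for w1 in span1.split():
            for w2 in span2.split():
                for rel, c in counts[(w1, w2)].items():
                    scores[rel] += c
        return max(scores, key=scores.get) if scores else "Other"

    examples = [("det regnade", "gatan blev blöt", "CEV"),
                ("det regnade", "solen sken", "Contrast")]
    counts = train_pair_counts(examples)
    ```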
  • Peter Nilsson, Pierre Nugues
    ABSTRACT: This paper describes a search procedure to discover optimal feature sets for dependency parsers. The search applies to the shift-reduce algorithm and the feature sets are extracted from the parser configuration. The initial feature is limited to the first word in the input queue. Then, the procedure uses a set of rules founded on the assumption that topological neighbors of significant features in the dependency graph may also have a significant contribution. The search can be fully automated and the level of greediness adjusted with the number of features examined at each iteration of the discovery procedure. Using our automated feature discovery on two corpora, the Swedish corpus in CoNLL-X and the English corpus in CoNLL 2008, and a single-parser system, we reached results comparable to or better than the best scores reported in these evaluations. The CoNLL 2008 test set contains, in addition to a Wall Street Journal (WSJ) section, an out-of-domain sample from the Brown corpus. With sets of 15 features, we obtained a labeled attachment score of 84.21 for Swedish, 88.11 on the WSJ test set, and 81.33 on the Brown test set.
    COLING 2010, 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 23-27 August 2010, Beijing, China; 01/2010
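    The greedy core of such a feature search can be sketched as below (greediness level 1, i.e. one feature added per iteration). The `evaluate` callable stands in for training and scoring a parser with a given feature set, and the feature names and toy scores are illustrative assumptions.

    ```python
    def greedy_feature_search(candidates, evaluate):
        """Greedy forward selection: repeatedly add the candidate feature
        that most improves the score, stopping when no candidate helps."""
        selected, best = [], evaluate([])
        remaining = list(candidates)
        while remaining:
            # Score every extension of the current set and keep the best one.
            top_score, top_feat = max(
                (evaluate(selected + [f]), f) for f in remaining)
            if top_score <= best:
                break  # no remaining feature improves the score
            best = top_score
            selected.append(top_feat)
            remaining.remove(top_feat)
        return selected, best

    # Toy evaluation: each feature contributes a fixed score delta.
    weights = {"queue0_word": 3, "stack0_pos": 2, "noise_feat": -1}
    evaluate = lambda fs: sum(weights[f] for f in fs)
    selected, score = greedy_feature_search(list(weights), evaluate)
    ```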

Publication Stats

866 Citations
14.92 Total Impact Points

Institutions

  • 2002–2014
    • Lund University
      • Department of Computer Science
      Lund, Skåne, Sweden
  • 2005
    • Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
      Caen, Lower Normandy, France
  • 2001
    • Université de Caen Basse-Normandie
      Caen, Lower Normandy, France