Sara Lafia

University of Michigan | U-M · Inter-University Consortium for Political and Social Research

Doctor of Philosophy

About

Publications: 19
Reads: 1,778
Citations: 63
Introduction
My interests include data curation, discovery, and visualization; the Semantic Web; and GIS education. I recently completed my PhD in Geography at UC Santa Barbara.
Additional affiliations
September 2014 - July 2020
University of California, Santa Barbara
Position
  • PhD Student
Education
September 2014 - July 2020
September 2009 - June 2014
California State Polytechnic University, Pomona
Field of study
  • Urban Planning, Geography, Regenerative Studies

Publications (19)
Article
Full-text available
Data archives are an important source of high-quality data in many fields, making them ideal sites to study data reuse. By studying data reuse through citation networks, we are able to learn how hidden research communities – those that use the same scientific datasets – are organized. This paper analyzes the community structure of an authoritative...
Preprint
Full-text available
Discovering authoritative links between publications and the datasets that they use can be a labor-intensive process. We introduce a natural language processing pipeline that retrieves and reviews publications for informal references to research datasets, which complements the work of data librarians. We first describe the components of the pipelin...
Preprint
Full-text available
Data archives are an important source of high quality data in many fields, making them ideal sites to study data reuse. By studying data reuse through citation networks, we are able to learn how hidden research communities - those that use the same scientific datasets - are organized. This paper analyzes the community structure of an authoritative...
Article
Despite large public investments in facilitating the secondary use of data, there is little information about the specific factors that predict data's reuse. Using data download logs from the Inter‐university Consortium for Political and Social Research (ICPSR), this study examines how data properties, curation decisions, and repository funding mod...
Preprint
Data citations provide a foundation for studying research data impact. Collecting and managing data citations is a new frontier in archival science and scholarly communication. However, the discovery and curation of research data citations is labor-intensive. Data citations that reference unique identifiers (i.e., DOIs) are readily findable; however...
Preprint
Full-text available
Data curation is the process of making a dataset fit-for-use and archivable. It is critical to data-intensive science because it makes complex data pipelines possible, makes studies reproducible, and makes data (re)usable. Yet the complexities of the hands-on, technical, and intellectual work of data curation are frequently overlooked or downplayed....
Article
Full-text available
Research in geographic information science has not yet found clear answers to the questions of what geographic information is about or what a geographic information system (GIS) contains. This lack of consensus makes it especially challenging to teach and learn GIS. Existing pedagogical approaches either focus on the representational level of data...
Preprint
Full-text available
This paper describes a machine learning approach for annotating and analyzing data curation work logs at ICPSR, a large social sciences data archive. The systems we studied track curation work and coordinate team decision-making at ICPSR. Repository staff use these systems to organize, prioritize, and document curation work done on datasets, making...
Article
Full-text available
The institutional review of interdisciplinary bodies of research lacks methods to systematically produce higher-level abstractions. Abstraction methods, like the “distant reading” of corpora, are increasingly important for knowledge discovery in the sciences and humanities. We demonstrate how abstraction methods complement the metrics on which rese...
Preprint
Institutional reviews typically rely on scientometrics, like the h-index and impact factors of their participants, to assess research productivity. Productivity is not the only review criterion, however, and scientometrics can be difficult to generate and compare in multidisciplinary settings. “Distant reading” methods from the Digital Humanities ca...
Conference Paper
Full-text available
It is challenging for scholars to discover thematically related research in a multidisciplinary setting, such as that of a university library. In this work, we use spatialization techniques to convey the relatedness of research themes without requiring scholars to have specific knowledge of disciplinary search terminology. We approach this task con...
Conference Paper
Full-text available
We describe a method and system design for improved data discovery in an integrated network of open geospatial data that supports collaborative policy development between governments and local constituents. Metadata about civic data (such as thematic categories, user-generated tags, geo-references, or attribute schemata) primarily rely on technical...
Conference Paper
Full-text available
Access to public data in the United States and elsewhere has steadily increased as governments have launched geospatially-enabled web portals like Socrata, CKAN, and Esri Hub. However, data discovery in these portals remains a challenge for the average user. Differences between users' colloquial search terms and authoritative metadata impede data d...
Article
Full-text available
Georeferencing is the process of aligning a text description of a geographic location with a spatial location based on a geographic coordinate system. Training aids are commonly created around the georeferencing process to disseminate community standards and ideas, guide accurate georeferencing, inform users about new tools, and help users evaluate...
Article
Current publishing practices in academia tend to result in datasets that are difficult to discover. This is because datasets are not well-integrated across academic domains and they are often not linked to the documents that reference them. For these reasons, discovering datasets across domains can be challenging; for example, discovering archeolog...
Conference Paper
Farmers face pressure to respond to unpredictable weather, the spread of pests, and other variable events on their farms. This paper proposes a framework for data aggregation from diverse sources that extracts named places impacted by events relevant to agricultural practices. Our vision is to couple natural language processing, geocoding, and exis...
Conference Paper
Full-text available
We explore the idea of spatial lenses as pieces of software interpreting data sets in a particular spatial view of an environment. The lenses serve to prepare the data sets for subsequent analysis in that view. Examples include a network lens to view places in a literary text, or a field lens to interpret pharmacy sales in terms of seasonal allergy...
Article
Full-text available
Academic libraries have always supported research across disciplines by integrating access to diverse contents and resources. They now have the opportunity to reinvent their role in facilitating interdisciplinary work by offering researchers new ways of sharing, curating, discovering, and linking research data. Spatial data and metadata support thi...

Projects (2)
Project
In the larger context of a project on using spatial information and GIS for information search at libraries, we are now focusing on how to make research data discoverable and accessible. The two central ideas are (1) to complement search by author and theme with search by the location(s) that a data set is about, and (2) to link publications to their data and vice versa. http://spatial.ucsb.edu/research/spatial-discovery
Project
This project aims to identify and formally specify what spatial information is about, at a level above data models but independent of particular application domains. The set of core concepts consists of four content concepts (field, object, network, event) and three quality concepts (granularity, accuracy, provenance). It serves as a formal basis for asking and answering spatial questions.