(A) Efficiency of communication from focal field i to target field j (E_ij = 1 − C_ij) decays with the distance to field j differently for different fields. Here we show the slowest decay (option pricing, dashed line) and the fastest decay (environmental toxicology, solid line) out of the 60 fields (see supporting information for others). (B) Decay rate γ plotted for behavioral science fields (orange) and biological science fields (blue). Focal fields with fast decay tend to have few fields nearby with which communication is efficient. Those with slow decay have several neighbors with relatively small cultural holes, that is, efficient communication. Distance is computed using the normalized shortest path: the average shortest path from a paper in field i to a paper in field j divided by the average shortest path from a paper in field i to another paper in field i. We subtract 1 from this value so that the normalized shortest path from a field to itself is 0. This normalization allows us to account for differences in citation norms that cause focal fields to be tightly or loosely connected, that is, to have a short or long average path distance within field.
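The normalized shortest path described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy graph, the names toy_graph, field_a, and field_b, the choice to treat citation links as undirected and unweighted, and the choice to skip unreachable pairs are all hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_shortest_path(graph, sources, targets):
    """Mean shortest-path length over (source, target) pairs with source != target.

    Assumption: unreachable pairs are skipped rather than counted as infinite.
    """
    total, count = 0, 0
    for s in sources:
        dist = bfs_distances(graph, s)
        for t in targets:
            if t != s and t in dist:
                total += dist[t]
                count += 1
    if count == 0:
        raise ValueError("no reachable pairs between the given paper sets")
    return total / count


def normalized_distance(graph, field_i, field_j):
    """Average path i -> j divided by average path i -> i, minus 1."""
    between = average_shortest_path(graph, field_i, field_j)
    within = average_shortest_path(graph, field_i, field_i)
    return between / within - 1.0


# Hypothetical toy citation graph: a chain a1 - a2 - b1 - b2 (undirected).
toy_graph = {
    "a1": ["a2"],
    "a2": ["a1", "b1"],
    "b1": ["a2", "b2"],
    "b2": ["b1"],
}
field_a = ["a1", "a2"]
field_b = ["b1", "b2"]

print(normalized_distance(toy_graph, field_a, field_a))  # 0.0 by construction
print(normalized_distance(toy_graph, field_a, field_b))
```

In this toy example the within-field average for field_a is 1 hop and the average from field_a to field_b is 2 hops, so the normalized distance from field_a to field_b is 2/1 − 1 = 1, and the distance from field_a to itself is 0, as the caption requires.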

Source publication
Article
Full-text available
Divergent interests, expertise, and language form cultural barriers to communication. No formalism has been available to characterize these “cultural holes.” Here we use information theory to measure cultural holes and demonstrate our formalism in the context of scientific communication using papers from JSTOR. We extract scientific fields from the...

Similar publications

Conference Paper
Full-text available
In modern digital cinema production, extremely large volumes of footage data (on the order of tens of TB) are captured every day. Cataloging and reviewing such footage is currently a largely manual and time-consuming process. In our work, we aim at technical quality aspects, such as correct exposure, color compatibility of adjacent shots...
Article
Full-text available
Different library departments must work together, both formally and informally, in implementing encoded archival description and in repackaging descriptive information about archival collections to other formats, particularly machine-readable cataloging. The authors, one a technical services librarian and the other a special collections archivist,...
Article
Full-text available
Summarizing the most recent studies about scientific approaches to the theme of building reuse, we propose a metadesign strategy applied to the single hall churches in Catania’s historic centre. We have performed a census and analyses on them from a morphological, technical and thermo-physical point of view. We also conducted an analysis of the u...
Article
Full-text available
Metadata librarian positions have been increasing in academic and research libraries in the last decade, paralleling the expanded provision of, and thus description of and access to, digital resources. Library literature has only begun to explore the significance and implications of this new, still evolving role. In the context of a twenty-first-ce...
Article
Full-text available
In the current competitive and dynamic environment, libraries must remain agile and flexible, as well as open to new ideas and ways of working. Based on a comparative case study of two academic libraries in Belgium, this research study investigates the opportunities of using Time-Driven Activity-Based Costing (TDABC) to benchmark library processes....

Citations

... It is especially prevalent in scholarly writing, where researchers use a rich repertoire of lexical choices to communicate. However, niche vocabularies can become a barrier between fields (Vilhena et al., 2014; Martínez and Mammola, 2021; Freeling et al., 2019), and between scientists and the general public (Liu et al., 2022; August et al., 2020a; Cervetti et al., 2015; Freeling et al., 2021). Identifying scholarly jargon is an initial step for designing resources and tools that can increase the readability and reach of science (August et al., 2022a; Plavén-Sigray et al., 2017; Rakedzon et al., 2017). ...
... Language differences among subsets of data can be measured using a variety of approaches, from geometric to information theoretic (Ramesh Kashyap et al., 2021; Vilhena et al., 2014; Aharoni and Goldberg, 2020). We calculate the association of a word's type or sense to subfields using normalized pointwise mutual information (NPMI). ...
... The linguistic insularity of science varies across fields. For example, Vilhena et al. (2014) found that phrase-level jargon separates biological sciences more so than behavioral and social sciences. In addition, articles written by social scientists are ... (Table 3: Top five words that have senses associated with each field, S_f(t) > 0.1, ordered by the difference Δ between word-level sense and type NPMI.) ...
Preprint
Full-text available
Scholarly text is often laden with jargon, or specialized language that divides disciplines. We extend past work that characterizes science at the level of word types, by using BERT-based word sense induction to find additional words that are widespread but overloaded with different uses across fields. We define scholarly jargon as discipline-specific word types and senses, and estimate its prevalence across hundreds of fields using interpretable, information-theoretic metrics. We demonstrate the utility of our approach for science of science and computational sociolinguistics by highlighting two key social implications. First, we measure audience design, and find that most fields reduce jargon when publishing in general-purpose journals, but some do so more than others. Second, though jargon has varying correlation with articles' citation rates within fields, it nearly always impedes interdisciplinary impact. Broadly, our measurements can inform ways in which language could be revised to serve as a bridge rather than a barrier in science.
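The normalized pointwise mutual information (NPMI) mentioned in the excerpt above has a standard definition: the pointwise mutual information of two events divided by the negative log of their joint probability, which bounds the score between -1 and 1. The sketch below shows that standard formula only. How the citing paper counts words, senses, and subfields is not given here, so the count-based inputs and the function name npmi are assumptions made for illustration.

```python
import math


def npmi(joint_count, word_count, field_count, total_count):
    """Standard NPMI from raw counts (hypothetical count-based interface).

    joint_count: occurrences of the word inside the field
    word_count:  occurrences of the word anywhere
    field_count: total tokens in the field
    total_count: total tokens in the whole corpus
    """
    p_xy = joint_count / total_count
    p_x = word_count / total_count
    p_y = field_count / total_count
    if p_xy == 0:
        return -1.0  # limiting value: the word never occurs in the field
    if p_xy == 1:
        raise ValueError("NPMI is undefined when the joint probability is 1")
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


# A word that occurs only in one field scores close to 1.
print(npmi(joint_count=10, word_count=10, field_count=10, total_count=100))
# A word spread independently of the field scores 0.
print(npmi(joint_count=25, word_count=50, field_count=50, total_count=100))
```

Scores near 1 mark a word as strongly tied to one field, scores near 0 mark a word whose use is independent of the field, and negative scores mark a word that the field tends to avoid.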
... Our approach builds on the long-standing tradition in the science of science that uses citation networks and text analysis of scientific papers to embody the flow of ideas in science and map its structure, as well as the distribution and spread of knowledge within it [5, 10–14]. Yet the citation networks and the textual similarity between fields are not always aligned. ...
... Yet the citation networks and the textual similarity between fields are not always aligned. There are commonly more citations between fields than we would expect on the basis of the textual similarity of their papers, or conversely, more similarity in the text than we would expect given the number of citations flowing between those fields [12, 15]. ...
... In this line of thinking, the misalignment of citations and textual similarity is simply beside the point and does not impact the larger goal of mapping science. In the science of science, meanwhile, the misalignment is taken as a sign that any model of diffusion or communication between scientific fields needs to take both citations and textual similarity into account [11, 12, 15]. ...
Article
Full-text available
Citations and text analysis are both used to study the distribution and flow of ideas between researchers, fields and countries, but the resulting flows are rarely equal. We argue that the differences in these two flows capture a growing global inequality in the production of scientific knowledge. We offer a framework called ‘citational lensing’ to identify where citations should appear between countries but are absent given that what is embedded in their published abstract texts is highly similar. This framework also identifies where citations are overabundant given lower similarity. Our data come from nearly 20 million papers across nearly 35 years and 150 fields from the Microsoft Academic Graph. We find that scientific communities increasingly centre research from highly active countries while overlooking work from peripheral countries. This inequality is likely to pose substantial challenges to the growth of novel ideas.
... In our research agenda, we leverage research in natural language processing, information retrieval, data mining and human-computer interaction and draw concepts from multiple disciplines. For example, efforts in metascience focus on sociological factors that influence the evolution of science [25], e.g., analyses of information silos that impede mutual understanding and interaction [53] and analyses of macro-scale ...
Preprint
We stand at the foot of a significant inflection in the trajectory of scientific discovery. As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge and discourse. We now read and write papers in digitized form, and a great deal of the formal and informal processes of science are captured digitally -- including papers, preprints and books, code and datasets, conference presentations, and interactions in social networks and communication platforms. The transition has led to the growth of a tremendous amount of information, opening exciting opportunities for computational models and systems that analyze and harness it. In parallel, exponential growth in data processing power has fueled remarkable advances in AI, including self-supervised neural models capable of learning powerful representations from large-scale unstructured text without costly human supervision. The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself. However, the explosion of scientific data, results and publications stands in stark contrast to the constancy of human cognitive capacity. While scientific knowledge is expanding with rapidity, our minds have remained static, with severe limitations on the capacity for finding, assimilating and manipulating information. We propose a research agenda of task-guided knowledge retrieval, in which systems counter humans' bounded capacity by ingesting corpora of scientific knowledge and retrieving inspirations, explanations, solutions and evidence synthesized to directly augment human performance on salient tasks in scientific endeavors. We present initial progress on methods and prototypes, and lay out important opportunities and challenges ahead with computational approaches that have the potential to revolutionize science.
... Learning to interpret protein structures is therefore one of the fundamental tasks of a student in an introductory biochemistry course. This topic is traditionally considered difficult, and analysis of semantic distance between fields shows that molecular biology and biochemistry are culturally isolated from other disciplines (3). Therefore, a large corpus of field-specific language must be learned starting in the introductory classes, even without considering the information-packed graphical symbology used to express chemical structures. ...
Article
Full-text available
A major challenge for science educators is teaching foundational concepts while introducing their students to current research. Here we describe an active learning module developed to teach protein structure fundamentals while supporting ongoing research in enzyme discovery. It can be readily implemented in both entry-level and upper-division college biochemistry or biophysics courses. Preactivity lectures introduced fundamentals of protein secondary structure and provided context for the research projects, and a homework assignment familiarized students with 3-dimensional visualization of biomolecules with UCSF Chimera, a free protein structure viewer. The activity is an online survey in which students compare structure elements in papain, a well-characterized cysteine protease from Carica papaya, to novel homologous proteases identified from the genomes of an extremophilic microbe (Halanaerobium praevalens) and 2 carnivorous plants (Drosera capensis and Cephalotus follicularis). Students were then able to identify, with varying levels of accuracy, a number of structural features in cysteine proteases that could expedite the identification of novel or biochemically interesting cysteine proteases for experimental validation in a university laboratory. Student responses to a postactivity survey were largely positive and constructive, describing points in the activity that could be improved and indicating that the activity was an engaging way to learn about protein structure.
... Based on these Wikipedia pages, we computed the communication burden of every idea as an indicator of the idea's novelty to history [57]. In particular, the communication burden addresses the rare occurrence (i.e., novelty) of a word or text when the texts belong to different categories (i.e., one idea vs. all Wikipedia pages) [57]. For every idea, we first computed the communication burdens of all words: given the Wikipedia pages collection Q, a word w, and the focused idea d, the communication burden of w was calculated as follows [57]: where p_{w,d} is equal to the number of times w appeared in d divided by the number of all words' appearances in d; p_{w,Q} equals the number of times w appeared in Q divided by the number of all words' appearances in Q. ...
Article
Full-text available
Previous studies demonstrate that people with less professional knowledge can achieve higher performance than those with more professional knowledge in creative activities. However, the factors related to this phenomenon remain unclear. Based on previous discussions in cognitive science, we hypothesised that people with different amounts of professional knowledge have varying attention deployment patterns, leading to different creative performances. To examine our hypothesis, we analysed two datasets collected from a web-based survey and a popular online shopping website, Amazon.com (United States). We found that during information processing, people with less professional knowledge tended to divide their attention, which positively affected creative performance. In contrast, people with more professional knowledge tended to concentrate their attention, which had a negative effect. Our results shed light on the relation between the amount of professional knowledge and attention deployment patterns, thereby enabling a deeper understanding of the factors underlying the different creative performances of people with varying amounts of professional knowledge.
... Learning to interpret protein structures is therefore one of the fundamental tasks of a student in an introductory biochemistry course. This topic is traditionally considered difficult, and analysis of semantic distance between fields shows that molecular biology and biochemistry are culturally isolated from other disciplines (3). This means a large corpus of field-specific language must be learned starting in the introductory classes, even without considering the information-packed graphical symbology used to express chemical structures. ...
Preprint
Full-text available
A major challenge for science educators is teaching foundational concepts while introducing their students to current research. Here we describe an active learning module developed to teach protein structure fundamentals while supporting ongoing research in enzyme discovery. It can be readily implemented in both entry-level and upper-division college biochemistry or biophysics courses. Pre-activity lectures introduced fundamentals of protein secondary structure and provided context for the research projects, while a homework assignment familiarized students with 3D visualization of biomolecules using UCSF Chimera, a free protein structure viewer. The activity is an online survey in which students compare structure elements in papain, a well-characterized cysteine protease from Carica papaya, to novel homologous proteases identified from the genomes of an extremophilic microbe (Halanaerobium praevalens) and two carnivorous plants (Drosera capensis and Cephalotus follicularis). Students were then able to identify, with varying levels of accuracy, a number of structural features in cysteine proteases that could expedite the identification of novel or biochemically interesting cysteine proteases for experimental validation in a university laboratory. Student responses to a post-activity survey were largely positive and constructive, indicating that the activity helped them learn about protein structure and describing points in the activity that could be improved.
... The scientific literature consists of complex models and theories, specialized language, and an endless diversity of continuously emerging concepts. Connecting blindly across these cultural boundaries requires significant cognitive effort [42], translating to time and resources most researchers are unlikely to have at their disposal to enter unfamiliar research territory. Our vision in this paper is to develop an approach that boosts scientific innovation and builds bridges across scientific communities, by helping scientists discover authors that spark new ideas for research. ...
Preprint
Full-text available
Scientific silos can hinder innovation. These information "filter bubbles" and the growing challenge of information overload limit awareness across the literature, making it difficult to keep track of even narrow areas of interest, let alone discover new ones. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these bubbles. In response, we describe Bridger, a system for facilitating discovery of scholars and their work, to explore design tradeoffs among relevant and novel recommendations. We construct a faceted representation of authors using information extracted from their papers and inferred personas. We explore approaches both for recommending new content and for displaying it in a manner that helps researchers to understand the work of authors with whom they are unfamiliar. In studies with computer science researchers, our approach substantially improves users' abilities to do so. We develop an approach that locates commonalities and contrasts between scientists, retrieving partially similar authors rather than aiming for strict similarity. We find this approach helps users discover authors useful for generating novel research ideas of relevance to their work, at a higher rate than a state-of-the-art neural model. Our analysis reveals that Bridger connects authors who have different citation profiles, publish in different venues, and are more distant in social co-authorship networks, raising the prospect of bridging diverse communities and facilitating discovery.
... Information theory scholars seek to expand their frameworks to incorporate meaning into the analysis of scientific communication (Leydesdorff et al., 2018, 2017). For instance, Vilhena et al. (2014) find that structural holes (cf. Pachucki and Breiger, 2010) and cultural holes overlap but do not coincide in science, underlining the importance of studying not only citation networks but also the content of scientific communication. ...
Article
Full-text available
How do new scientific ideas diffuse? Computational studies reveal how network structures facilitate or obstruct diffusion; qualitative studies demonstrate that diffusion entails the continuous translation and transformation of ideas. This article bridges these computational and qualitative approaches to study diffusion as a complex process of continuous adaptation. As a case study, we analyze the spread of Granovetter's Strength of Weak Ties hypothesis, published in American Journal of Sociology in 1973. Through network analysis, topic modeling and a close reading of a diffusion network created using Web of Science data, we study how different communities in this network interpret and develop Granovetter's hypothesis in distinct ways. We further trace how these communities originate, merge and split, and examine how central scholars emerge as community leaders or brokers in the diffusion process.
... Standardized terminology specific to given disciplines, i.e., jargon, plays an important role in scientific communication by making explicit the assumptions that underlie concepts (Fauth et al. 1996) and by compressing language so that specialists can communicate about technical terms without long explanations (Jones et al. 1997, Vilhena 2014). But in the absence of standardized terminology, there can be miscommunication as well as run-away creation and use of new jargon that enables long-established ideas to be presented as novel, and thereby hinders the maturation of ideas (Velland 2010). ...
Article
Full-text available
The very presence of predators can strongly influence flexible prey‐traits such as behavior, morphology, life history and physiology. In a rapidly growing body of literature representing diverse ecological systems, these trait (or “fear”) responses have been shown to influence prey fitness‐components and density, and to have indirect effects on other species. However, this broad and exciting literature is burdened with inconsistent terminology that is likely hindering the development of inclusive frameworks and general advances in ecology. We examine the diverse terminology used in the literature, and discuss pros and cons of the many terms used. Common problems include the same term being used for different processes, and many different terms being used for the same process. To mitigate terminological barriers, we developed a conceptual framework that explicitly distinguishes the multiple predation‐risk effects studied. These multiple effects, along with suggested standardized terminology, are: risk‐induced trait responses (i.e., effects on prey traits), interaction modifications (i.e., effects on prey‐other species interactions), nonconsumptive effects (i.e., effects on the fitness and density of the prey), and trait‐mediated indirect effects (i.e., the effects on the fitness and density of other species). We apply the framework to three well studied systems to highlight how it can illuminate commonalities and differences among study systems. By clarifying and elucidating conceptually similar processes, the framework and standardized terminology can facilitate communication of insights and methodologies across systems and foster cross‐disciplinary perspectives.
... Others show that small teams tend to produce work that introduces novel and disruptive ideas in science and technology, whereas large teams tend to develop existing ideas further (Wu et al. 2019). Advances in text analysis have enabled scholars to shed new light on similarities and differences between disciplines as well (e.g., McMahan & Evans 2018, Vilhena et al. 2014). For example, McMahan & Evans (2018) develop a measure to capture the ambiguity of language within scientific articles. ...
... Ambiguous language also produces more integrated citation streams, which stimulates more involved academic debate. Vilhena et al. (2014) draw on the concept of cultural holes to map differences in disciplinary jargon within and across fields. Although these language-based gaps do not map neatly onto structural holes within citation networks, they nevertheless inhibit efficient communication between scientists. ...
Article
Full-text available
The integration of social science with computer science and engineering fields has produced a new area of study: computational social science. This field applies computational methods to novel sources of digital data such as social media, administrative records, and historical archives to develop theories of human behavior. We review the evolution of this field within sociology via bibliometric analysis and in-depth analysis of the following subfields where this new work is appearing most rapidly: (a) social network analysis and group formation; (b) collective behavior and political sociology; (c) the sociology of knowledge; (d) cultural sociology, social psychology, and emotions; (e) the production of culture; (f) economic sociology and organizations; and (g) demography and population studies. Our review reveals that sociologists are not only at the center of cutting-edge research that addresses longstanding questions about human behavior but also developing new lines of inquiry about digital spaces as well. We conclude by discussing challenging new obstacles in the field, calling for increased attention to sociological theory, and identifying new areas where computational social science might be further integrated into mainstream sociology. Expected final online publication date for the Annual Review of Sociology, Volume 46 is July 30, 2020. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.