Reconstruction of Socio-Semantic Dynamics in Sciences-Society Networks: Methodology and Epistemology of large textual corpora analysis, Communication to the Science and Democracy Network / Annual Meeting, Paris 25-27 June 2012

Marc Barbier and Jean-Philippe Cointet (INRA SenS, CorTexT Digital Platform of IFRIS)
Abstract:
Until recently, the description, light modeling and interpretation of the socio-cognitive
dynamics of science-society relations required a constructivist approach, in which scholars
collected, read, classified and interpreted sets of texts, archives, interviews, etc.
The growing mass of data produced in the so-called Knowledge Society owes a lot to the
acceleration and profusion of digital tools that are now widely used in different areas of
human activity: work, culture, leisure, political expression, etc. Social scientists now largely
acknowledge that the various modes of interaction brought by new information and
communication technologies are changing the very nature of micro-politics and the
expression of the self. In our view, the conditions for producing knowledge from a Science &
Technology Studies point of view have changed too, for at least three reasons:
- the deluge of electronic sources of data overloads our capacity for enquiry;
- S&TS dynamics now intertwine heterogeneous actors, matters of fact and matters of
concern coming from different arenas, which calls for an integrated understanding of
knowledge production and circulation;
- nevertheless, new digital infrastructures specifically designed for the social sciences and
humanities make it possible to equip scientists with tools that enable them to tackle
the complexity of the dynamics of heterogeneous textual corpora and to develop innovative
analytical methodologies that will bring new insights and renewed capacities to
investigate contemporary issues.
Many researchers are currently struggling to set up projects and build facilities that make
these digital infrastructures concrete, but the involvement and interest of the STS
communities remain timid, possibly held back by a foundational skepticism towards
technological promises of any kind. The aim of this communication is (1) to discuss some of
the epistemic problems that arise from the use of digital platforms in STS aimed at
developing our capacity to enquire into science and technology in society; and (2) to
present the main developments carried out within the CorTexT platform, as well as
their driving principles.
1 Introduction
The growing mass of data produced in the so-called Knowledge Society owes a lot to the
acceleration and profusion of electronic data affecting most areas of human activity: work,
culture, leisure, political expression, etc. The social sciences now widely acknowledge that the
various modes of interaction brought by new communication technologies are changing the
very nature of micro-politics and the expression of the self. In our view, the conditions for
producing knowledge from a Science & Technology Studies point of view have changed too,
for three reasons:
(i) The deluge of electronic sources of data overloads our capacity for enquiry;
(ii) S&TS dynamics now intertwine heterogeneous actors, matters of fact and matters
of concern coming from different arenas, calling for an integrated understanding of
knowledge production and circulation;
(iii) New digital infrastructures specifically designed for social sciences and humanities
make it possible to equip social scientists with platforms enabling the innovative
analysis of heterogeneous textual corpora.
But an abundance of information is certainly not equivalent to an abundance of knowledge.
Many critical points of view assume the opposite, claiming that the growth of knowledge is
far from proportional to the growth of information. This could perhaps explain why many
researchers in the STS communities remain timid in the face of the challenge and the possible
use of such tools and instruments, possibly held back by a foundational skepticism
towards technological promises of any kind. This is why the use of informetric or
webometric technologies has to be accompanied by a critical discussion of the epistemic
problems that may arise from the use of digital platforms in STS when they aim at
developing our capacity to enquire into science and technology in society. Serious questions
are undoubtedly raised: to what extent is the constructivist approach to data changed when
large corpora are mobilized, parsed and analyzed with machines that black-box
statistical inferences, term extraction algorithms or graph analysis metrics? What new
technical empowerments are needed to extend interpretative strategies in STS
empirical work? Are we simply gaining the benefit of exploring bigger sets of data, or does this
change the nature of our enquiry and the array of matters of fact? Do these new
methodologies open new perspectives for the diachronic and multi-level analysis of the
production, use, circulation and contestation of the scientific enterprise in society?
This communication proposes a first framework that makes room for epistemological
questions as well as methodological and technological issues. In a first section, we
try to provide a frame for questioning the epistemic problems that may arise from digital
platforms when they are used to develop our capacity to enquire into science and technology in
society. We think it is necessary to distinguish between epistemological and
technological issues; the latter are addressed in a second section: what is at stake, from a
technical and organizational point of view, in designing a data platform for STS. In the final
section, we present the various types of analysis offered within the CorTexT
platform, as well as its driving principles, and thus invite our colleagues to use it.
2 Epistemological issues
2.1 A changing context of enquiry
The ICT revolution has given rise to a large number of new patterns of communication and
collaborative tasks in various and rather segmented spheres of human activity: in the private
sphere of inter-individual exchanges; in the public sphere, even redefining the micro-politics
of using information systems; and in the economic sphere, where it has spawned a new business
sector and engaged every type of professional or organized activity in new ways of being at work.
The Web has therefore become a research topic in its own right for a range of emerging
research fields such as bibliometrics, scientometrics, webometrics or informetrics, which adopt
the idea of measuring the information contained in flows of data fueled by the
many types of resources, structures and technologies that also build the infrastructure of this
circulation (Thelwall et al., 2006; Bar-Ilan, 2008).
Beyond the classical use of bibliographical datasets provided by WOS-like[1] databases to
perform citation and bibliometric studies, many social scientists consider the web as a field
of enquiry in its own right and are adopting (or considering adopting) hybrid quanti-qualitative
methods. Counting hits, nodes and links is surely not sufficient to turn
information into knowledge, but it may help when it comes to answering questions grounded
in cultural studies, science studies or political science. It may even be a necessity, given
that the digital information being produced echoes changes in human
activity, which goes online for so many reasons and purposes, and bearing in mind that the
relations established between entities are not virtual but are actually the result of free
associations technically operated through multiple communication techniques. Nevertheless, the
Web as a collective phenomenon does not translate into a unified humanity sharing cognitive
resources for the production of a common understanding. Political science instead
insists on the necessity of analyzing the web as a mosaic space (Rogers, 2008), a view that
opposes a balkanized model (Sunstein, 2008) of the politics of the web to the long-lasting
vision of one unified small world.
2.2 Empowering the idiom of the co-production
Jasanoff's mapping of the idiom of co-production, and of the specificity of a co-productionist
account of science-society relations (Jasanoff, 2006), offers a good starting point for
positioning the practical problem faced by many STS scholars who deal with the ever-growing
mass and availability of information when classifying and interpreting numerous sets of texts,
archives, interviews, etc. (see the previous section).
2.2.1 Dualist or even symmetric description of science in society and of society in science
A first characteristic of the over-presence of data is that a large array of practical resources
for the public understanding of scientific activities and production is available. One might
make the “easy-going” hypothesis that the worlds of science, innovation and scientific expertise
are more accessible to “lay-thinkers”. But the web is far from being a large Wikipedia for
training purposes. A second characteristic of the web, known to many of us, is that it
represents a territory to be occupied with meaningful declarations, conceived as a
resource for action through communication by the many actors engaged in science-society
debates, on both sides. A very large mass of positions, claims and advocacies, referring
to science in civic debate or to society in research policy, has become
particularly easy to access. This does not mean that expression on the web saturates the
set of matters of fact and matters of concern that have to be empirically accounted for. Many
of the classical in-depth, situated, longitudinal works of enquiry are still required to go
beyond an account of communicational strategy, which is to some extent addressed by media studies.
Nevertheless, the nature of this communication is related to the ways of being of
contemporary research activities, or to those of civic and activist interference with
technoscience. The task has become harder, since the availability of such discourses, and their
profusion in the case of controversies, constitutes a sort of arena of political engagement, which
empowers a symmetric description but at the same time blurs the clarity of discourses.
As a result, it becomes difficult to avoid accounting for communicational
discourses online.
[1] Web of Science is provided by ISI, part of Thomson Reuters.
2.2.2 The non-linear explanation of the shaping of science and technology in context
The second idiom is related to the fact that explanations of the shaping of science and
technology have to be considered in context, meaning that the potential of any scientific or
technological script does not perform the world by itself, even if one were given a laboratory.
Symmetrically, the performativity of any techno-scientific achievement on a given society
corresponds to a redistribution of power relations, which calls on STS scholars to ask questions
about the “how” and the “why” rather than strictly producing heavy or thick accounts of successes
or failures. Given these stakes, the extended traceability of many activities, and of communication
acts themselves, is a resource for keeping this ambition high. More than ever, the
mobilization of ICT in scientific practices and the relative openness of science to public
scrutiny represent a possible extension for non-linear explanations. Here too, one can argue
for the benefit of ad hoc techniques able to capture traces and make them
“become talkative” within the interpretative stance of co-constructionists.
2.3 The stakes of the politics of knowledge
The classical STS reflection on co-construction particularly aims at developing a
symmetric account of the ordering of Nature through knowledge and technology and of the
ordering of Society through power and culture. This separation has long been questioned by
Actor-Network Theory, which points out that the process of co-production should be considered
through the monad of translation and free association, taking into account the performativity of
both human and non-human actants, as well as of crystallized arrangements and settings of a hybrid
nature, recalling Stengers' cosmopolitan views.
For many proponents of ANT, the agnostic reading of performativity tends not to consider any
type of politics that would not be at work within the web of translations, meaning that only
the necessity of a full account of the co-production process should drive the interpretation of
the large-scale or micro-politics of translations. This position does fit the purpose of symmetry,
which avoids introducing the critical views and the political stance of the one who analyzes and
interprets reality. Still, this methodological prudence raises a problem: that of the closure of
the politics of any process of co-production. To put the argument briefly: it is one thing to
adopt a symmetric attitude towards power relations within the co-production process, and another
to consider the asymmetry of the positions of actants in terms of their capacity and force to act on
the governance of those processes. ANT scholars would argue that there is no outside and inside in
the politics of knowledge, only long or short actor-networks; nevertheless, not all actants have
a laboratory with which to raise the world, particularly when those actants are non-human or of a
heterogeneous nature (hybrid arrangements and temporary settings). Moreover, as pointed out by
many scholars who study the governance of Science, Technology and Innovation (Borras
and Edler, 2012), co-production has become a matter of politics that are not necessarily
assembled in one political process dedicated to the particular co-production process
under study. This means that one might consider the existence of segmented areas of polities
dealing with co-production: the area of a specific organization working on its own identity
building, the politics of a particular institution or institutional framework at work in the
governance of knowledge, the politics of establishing research policies that connect epistemic
communities with stakeholders of innovation, and the politics of re-presentation in
representative institutions. It therefore seems particularly difficult to establish, empirically and
theoretically, a politics of hybrids that would simply pass through this fragmentation
of the politics of the state of knowledge.
Quoting Silvia Gherardi, we could support the idea that “if we are to determine the linkage
among the various connections in action along the spiral from the individual to the institution, we must
abandon the idea that the social order is aggregated or negotiated by a plurality of dissonant voices
which eventually blend together to resemble a musical canon” (Gherardi, 2006: 220). Another way
of phrasing this idea in our context is to say that the politics of knowledge does not set up an
isotropic field of representations about the co-production of nature and society. We should
therefore ask, in relation to the issues at stake within polities and in between them, where and
how co-production is produced and governed in a texture of practices that are, of course, situated
and enacted, but that at the same time take place in a political situation which has been designed,
or which has emerged, to perform and support the realization of co-production. We could
thus argue that actors of the politics of knowledge are reflexively conscious of being engaged
in it, and that this considerably changes the type of investigation we should develop, as well as
the type of engagement of the STS community as a specific “bound of knowledge” that reflects on the
process under study.
From this perspective, it is as necessary to work on the disciplines of power as on the subjectivation
of the apparatuses and technologies of power relations that are proposed or imposed by a sovereign
or legitimate centre. This is what M. Foucault confirmed when he later said that “the
‘dispositif’ is essentially of a strategic nature; it follows that it deals with a certain tampering with
power-struggles, with a rational and concerted intervention within those power-struggles, either to
develop them in a specific direction, or to block them, or else to stabilize them and to use them. The
‘dispositif’, thus, is always encapsulated in relations of power, but it is also always linked to one or more
knowledge bounds that sprang from it, but that also empower its creation” (Dits et écrits, Volume
III, p. 299 sq., our translation).
It follows from this particular epistemological debate about the politics of co-production that
we have to treat the wide availability of arguments springing from the
fragmented arenas of the politics of knowledge as both a resource and a challenge: a resource,
since we draw on them for our professional engagement as researchers; a
challenge, since they are possibly active within the world under study. This complexity thus
calls for a more systematic and responsible attitude in STS in order to capture this state of
availability and fluidity of the arguments that spring from the various “opinion wells” available
on the web. Seeking to capture and interpret the fragmentation of the politics of knowledge on any
specific issue or pool of issues is therefore a methodological response that echoes the
epistemological problems we have just tried to set out. This methodological response, if
considered legitimate and constructive for STS scholars whatever their orientation,
interactional or constitutive (Jasanoff, 2006: 18-19), raises a technological challenge that
one could call a Platform for STS.
3 Designing a digital platform for STS
3.1 The challenges
We can distinguish between three types of challenges:
- Scientific, insofar as the modeling of socio-cognitive dynamics from massive
observational data remains a young research field. We assume that this field will not
reach maturity until we combine a fine-grained analysis of textual content (relying on
ad hoc computational linguistics rather than plain term statistics) with multi-level
models of complex, hybrid systems (in particular involving heterogeneous networks
at various scales) and a qualitative understanding of the appraisal of these models by
users and practitioners (featuring social studies of science).
- Instrumental, since a comprehensive study of the public space requires taking into
account heterogeneous sources. This implies a variety of methodological and
technical challenges, which are only partly solved as of today and remain the focus
of ongoing research efforts: massive textual corpora processing, knowledge
extraction (from the identification of named entities to the characterization of
utterance endorsement and, more broadly, “hedging”), visualization of multi-level
data in such a way that the dynamics of heterogeneous entities taking place at various time
scales are both accurately and ergonomically represented.
- Political: social sciences have clearly identified the phenomena of co-production of
knowledge and socio-political issues (see previous section), but without proposing a
systematic methodology of analysis of events, traces and discourses that ground the
science-society debates in the communication sphere. Besides, we need to bridge the
gap between qualitative small-scale studies and actual large-scale dynamics that can
be spread over different arenas. Therefore, a better understanding of the dynamics of
frames from the in-vivo observation of textual traces corpora in these arenas should
allow a better identification of opportunities/moments of mutation and modulation
of the scientific-technical trajectories. On the other hand, automatically drawn maps
can help practitioners to improve, for instance, the quality of deliberations.
In this context, we wish to target empirical domains, mostly defined by research matters of
enquiry related to matters of concern in society, where we expect to
witness the most significant urgency in the coming years, in order to be able to describe
the black-boxing of normative choices that accompanies any process of shaping the social and
the natural in scientific and technological productions. Those political embeddings have to be
brought to light, and possibly accounted for, in order to point out the existence of
alternatives or the fact that previous orders are about to be erased. Moreover, we also wish to
equip the co-constructionist project of opening up the interpretation of forthcoming
reality, in a predictive but not ballistic attempt, with the willingness to shed light on the
relation of communities or interest groups to possible common futures.
3.2 Working on free-associations or the come-back of the co-word analysis with a socio-semantic pulse
3.2.1 Background
Co-word analysis is a small branch of network analysis, largely grounded in Actor-
Network Theory (Callon et al., 1983) and in the implementation of specific algorithms for
mapping scientific knowledge. Born in relation to the evaluation and policy of science
(Callon et al., 1986; Law et al., 1988), co-word analysis is a critical extension of the early
co-citation approaches (Small, 1973) and depends largely on full-text indexing
techniques. The relevance of co-word analysis for mapping large scientific domains has
been criticized with regard to the significance of the relationships between words and their
context of enunciation (see, more recently, Leydesdorff & Hellsten, 2006). It should thus be
noted that other types of characterization exist and that we propose only one possible way of
characterizing knowledge dynamics.
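As an illustration only (this is not the CorTexT implementation), the basic counting step behind a co-word map can be sketched in Python. The toy documents, the function names and the choice of the equivalence index c_ab^2 / (c_a * c_b) as a normalisation are assumptions made for this example; real corpora would first require a term extraction step.

```python
from collections import Counter
from itertools import combinations


def coword_counts(documents):
    """Count term occurrences and term-pair co-occurrences per document."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in documents:
        # Each term counts once per document; sorting keeps pair keys ordered.
        unique_terms = sorted(set(terms))
        term_counts.update(unique_terms)
        pair_counts.update(combinations(unique_terms, 2))
    return term_counts, pair_counts


def equivalence_index(term_counts, pair_counts, a, b):
    """Normalised association strength c_ab^2 / (c_a * c_b), between 0 and 1."""
    a, b = sorted((a, b))
    if term_counts[a] == 0 or term_counts[b] == 0:
        return 0.0
    return pair_counts[(a, b)] ** 2 / (term_counts[a] * term_counts[b])


# Hypothetical toy corpus: each document is reduced to a list of index terms.
documents = [
    ["gmo", "risk", "regulation"],
    ["gmo", "risk"],
    ["gmo", "yield"],
    ["risk", "regulation"],
]
term_counts, pair_counts = coword_counts(documents)
strength = equivalence_index(term_counts, pair_counts, "risk", "gmo")
```

The pairs and their strengths can then be read as the weighted edges of a term network, which is the object that a mapping step would lay out.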
At present, the evolution of the analysis of scientific networks is largely attached to the
question of characterizing the collaborative and cognitive dynamics of knowledge production
(Powell et al., 2005) and to the emergence of multi- or trans-disciplinary fields of
research (Lucio-Arias, Leydesdorff, 2007) or paradigmatic fields of research (Chavalarias,
Cointet, 2008). Tracing and mapping knowledge in scientific databases or in other electronic
sources raises a huge range of problems for many disciplines dealing with information. This
is also the case for co-word analysis (Mogoutov, Kahane, 2007). More locally, in relation to
specific areas of research, mapping heterogeneous networks appears to help in understanding
the social dynamics of research activities (Cambrosio, Keating, Mogoutov, 2004; Cambrosio et
al., 2006; Bourret et al., 2006).
Beyond this precise inherited domain, the elaboration of new methods for modeling socio-cognitive
dynamics can impact a much wider strategic and economic field of activity: new tools
to explore digital libraries, support for computer-assisted scientific innovation and strategic
intelligence (Nederhof and Van Wijk, 1997; Valverde et al., 2007), knowledge extraction
(He, 1999; Sintchenko et al., 2010), and the tracking of debates and controversies in blogs, media,
online forums and, more broadly, the digital public space at large (Lazer et al., 2009;
Sunstein, 2007).
Using those tools interactively with members of a scientific community is a significant way
of carrying out a kind of participatory sociology of scientific knowledge, one that notably
tries to avoid an evaluative perspective and instead to co-design a situation in which tools
are used in a comprehensive way and for the purpose of maieutic intervention. This approach
to network mapping based on co-word analysis, conducted in interaction with a scientific
community, shares many ideas with the shift in the use of such tools from a scientific
context to a science policy context (Noyons, 2001).
3.2.2 The Socio-semantic turn
Alongside semantic analysis, which has long been fostered in the sociology of science by the
pioneers of co-word analysis as well as by the Natural Language Processing research community,
Social Network Analysis (SNA) has been developing into a “normal science” (Freeman, 2004) for
decades, building its own tools, paradigms and conferences. Connecting the social dynamics
drawn from the observation of the various interactions between actors with the very nature of
their exchanges, encapsulated in their shared production, is a much more recent endeavor. The
socio-semantic turn tries to bridge the gap between purely structural accounts of social
dynamics and a more precise account of the very practices of agents. Following Giddens
(1981), “the structure is both the medium and outcome of the social practices it recursively
organizes”. As a result, understanding social structure, and more importantly its
dynamics, is only possible if one analyzes both individual dynamics and
global structure. We claim that semantic analysis is a viable strategy for tracking human
practices, at least when it comes to studying knowledge communities. Analyzing both the social
interactions linking actors to one another and the production and exchange of
knowledge is key to providing a realistic description of the social dynamics at stake in a given
community.
The socio-semantic turn (Roth, 2006) thus proposes to extend the scope of SNA by
integrating knowledge into the very dynamics of the system, in a way that tries to give back
some agency to actors, who are no longer reduced to interchangeable nodes in a social
network but who, through their practices, build and move within a larger semantic
landscape.
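To make the idea concrete, a socio-semantic network can be thought of as three sets of weighted links built from the same records: actor-actor links, concept-concept links and actor-concept links. The following Python sketch is only one possible reading of this idea, under the assumption that each record lists the actors who produced a text and the concepts it mobilizes; the data, names and structure are hypothetical and do not describe the platform's internal model.

```python
from collections import Counter
from itertools import combinations


def build_socio_semantic_network(records):
    """Build social, semantic and socio-semantic link counts from records.

    Each record is a pair (actors, concepts) describing one document.
    """
    social = Counter()    # actor-actor links (joint production)
    semantic = Counter()  # concept-concept links (co-occurrence)
    usage = Counter()     # actor-concept links (who mobilizes what)
    for actors, concepts in records:
        actors = sorted(set(actors))
        concepts = sorted(set(concepts))
        social.update(combinations(actors, 2))
        semantic.update(combinations(concepts, 2))
        usage.update((actor, concept) for actor in actors for concept in concepts)
    return social, semantic, usage


def concepts_of(usage, actor):
    """Return the set of concepts an actor is linked to: its semantic position."""
    return {concept for (who, concept) in usage if who == actor}


# Hypothetical records: (actors, concepts) for three documents.
records = [
    (["alice", "bob"], ["gmo", "risk"]),
    (["bob", "carol"], ["risk", "regulation"]),
    (["alice"], ["gmo", "yield"]),
]
social, semantic, usage = build_socio_semantic_network(records)
alice_profile = concepts_of(usage, "alice")
```

In such a structure an actor is no longer described only by its neighbors in the social network but also by the region of the semantic network it occupies, which is what allows two structurally similar actors to be told apart.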
3.3 A multi-arena perspective
If the objective of an STS digital platform is to follow the dynamics of issues, then it should
first be noted that issues emerge and are transformed in various social places: there is no unique
public space, but rather a multiplicity of them. The “frame struggle” around those issues should
be systematically addressed in the different arenas pertinent to a given case study (and
consequently in corpora coming from different sources). Arenas thus correspond to the
different public spaces where stakes are defined.
The platform thus aims to offer a common solution for the analysis of various types
of arenas and related data sources: scientific production (publications, patents), international
press articles, web content, legal production, etc. Socio-semantic modeling is sufficiently rich
and generic to provide a general framework for analyzing the dynamics pertaining to each arena.
Yet an issue cannot be described simply by summing the observations made in different arenas.
To get a realistic account of the issue, one should also consider how the arenas interact. The
global dynamics of the issue are certainly influenced by the overall circulation of entities
(human or non-human actors, pieces of knowledge, problems, promises, concerns...) in these
heterogeneous spaces. This point remains a challenge from both a technical standpoint (arenas
may be multilingual, linguistic genres may bias tool outputs depending on the arena considered,
etc.) and a methodological one (modeling the coupling between two dynamics observed in
different arenas).
4 Presentation of the Digital Platform “CorTexT”[2]
4.1 Principles of the Design of the Platform
Our objective is distinct from the purposes of scientific knowledge production and of
Research Evaluation. The main objective of a platform for STS is to design innovative
methods and tools to model the empirical dynamics pertaining to various public issues: we aim
to apply advanced NLP techniques and complex network analysis to heterogeneous textual
corpora in order to track the dynamics of contemporary topics. Though one would find a lot
of continuity in terms of technological and algorithmic questions and design, our project
rather insists on developing a technology, that is, on enabling new capacities through a
co-design mode of development, which in turn may renew the precise modeling strategy we set up.
Frame, or framing, is a widespread notion used in sociology, media studies and
communication research. In our case, we define framing as the way actors try to make sense
of an issue by structuring its associated concepts. Actors who impose their frames will tend
to impose their own perception of the questions on the public. In this respect, our goal is
to define the different frames supported by actors and to describe their dynamics: how are frames
built, set, bridged, aligned, extended and transformed? Frames evolve according to the
underlying landscape, which constrains and enacts socio-semantic dynamics.
4.2 A multi-level modeling of the empirical heterogeneous dynamics
We assume that, with the help of heterogeneous network analysis and fine-grained NLP,
frame analysis can be translated into realistic quantitative modeling, which in turn could pave
the way to new empirical findings and theoretical breakthroughs. Indeed, socio-semantic
network analysis offers the opportunity to operationalize the notion of frame dynamics.
Frames are structures emerging in a bottom-up fashion from the socio-semantic network.
The description and prediction of phylogenetic phenomena (i.e. the development of
clusters/patterns and their continuation or disappearance from one period to another) should
provide us with an operational dynamical model of frames. Moreover, this framework will
make it possible to better understand how different heterogeneous networks are coupled and
co-evolve (here, socio-semantic networks built from sources coming from various
arenas).
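The continuation or disappearance of clusters from one period to another can be illustrated with a very simple matching rule. The Python sketch below links a cluster of terms in one period to a cluster in the next when their Jaccard similarity reaches a threshold; the similarity measure, the threshold value of 0.3, the cluster contents and all names are assumptions chosen for the example, not the reconstruction model used by the platform.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_clusters(earlier, later, threshold=0.3):
    """Link clusters of consecutive periods whose term sets overlap enough."""
    links = []
    for name_e, terms_e in earlier.items():
        for name_l, terms_l in later.items():
            score = jaccard(terms_e, terms_l)
            if score >= threshold:
                links.append((name_e, name_l, score))
    return links


# Hypothetical clusters of terms detected in two consecutive periods.
period_1 = {"A": {"gmo", "risk", "regulation"}, "B": {"yield", "seed"}}
period_2 = {"C": {"gmo", "risk", "labelling"}, "D": {"climate", "carbon"}}

links = link_clusters(period_1, period_2)
continued = {earlier_name for earlier_name, _, _ in links}
disappeared = set(period_1) - continued                 # no successor found
emerged = set(period_2) - {later_name for _, later_name, _ in links}  # no ancestor
```

Chaining such links over successive periods yields a tree-like history of clusters, in which branches that stop correspond to disappearing patterns and unlinked newcomers to emerging ones.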
4.3 The attention to user interfaces and the use of visual mapping
Science mapping has always been one of the driving objectives of scientometrics (De Solla
Price, 1976). Mapping science obviously echoes the epistemological objective of portraying
science as a space of its own. Network analysis made enormous progress during
the 2000s, offering efficient and convenient methods through the network spatialization
algorithms built into most libraries and network representation software.
[2] The Digital Platform CorTexT is a project of IFRIS, with the support of the LABEX SITES. The
INRA SenS lab has particularly dedicated resources and skills to developing this project.
Moreover, mapping provides an intermediary object recalling familiar cognitive habits
regarding classical geographical representations. If bibliometrics, scientometrics or, more
generally, webometrics approaches have bloomed over the last ten years, the immediate
attractiveness of maps is certainly one of the reasons. Yet network mapping has certainly not
reached the same maturity as geographical mapping: topological representation actually
requires some training for practitioners and may otherwise yield misinterpretation, as Euclidean
distances on a spatialized network only try to optimally approximate actual topological
distances.
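The gap between distances read on a map and distances in the network can be illustrated with a small sketch. The code below is only an assumption-laden example, not a description of any spatialization algorithm used by a specific tool: it measures, for a hand-made layout, the summed squared difference between Euclidean distance on the map and shortest-path length in the network (a simple "stress" score). The graph, the coordinates and the function names are hypothetical.

```python
import math
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: number of links on the shortest path from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def layout_stress(adjacency, positions):
    """Sum of squared gaps between map distance and hop distance over all node pairs."""
    nodes = sorted(adjacency)
    stress = 0.0
    for i, u in enumerate(nodes):
        hops = hop_distances(adjacency, u)
        for v in nodes[i + 1:]:
            if v in hops:  # ignore pairs that are not connected at all
                euclid = math.dist(positions[u], positions[v])
                stress += (euclid - hops[v]) ** 2
    return stress


# A 4-node chain a - b - c - d, drawn in two different ways (toy coordinates).
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
straight = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
folded = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

print(layout_stress(chain, straight))  # map distances equal hop distances
print(layout_stress(chain, folded))    # "a" and "d" look close but are 3 links apart
```

The straight drawing has a stress of 0.0, whereas the folded one scores about 4.69, mostly because "a" and "d" sit next to each other on the map while being three links apart in the network. This is the kind of visual proximity that an untrained reader may over-interpret.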
The CorTexT platform clearly pursues the “tradition” of “knowledge mapping”, proposing its
own visualization strategy for spatializing networks³ as well as alternative strategies
of information representation, better suited in particular to monitoring the dynamical properties of
the systems under study.
5 How do we practice?
5.1 The modularity of the Platform: Dataset, analysis and visualization
The methodological steps are threefold (see Figure 1):
i) Back Office: defining and collecting the corpora that shall be analyzed in each
domain of application;
ii) Middle Office: implementing linguistic and dynamical reconstruction models;
iii) Front Office: mapping framing dynamics in each application field as user-friendly
interfaces for sociologists or a larger audience (end-users at large), enabling them to
build, manipulate and navigate socio-cognitive reconstructions.
[Figure 1 diagram. Inputs: scientific corpus, press corpus, blog corpus, web and other arenas. Stages: 1. Linguistic Processing (web data collection; extraction of sources, targets and hedges); 2. Socio-cognitive Modeling (multi-arena socio-semantic networks, multi-level reconstruction); 3. Issue Mapping (web services for the analysis of public issues; public dissemination). Example node labels in the network panels: pig farming, green algae, wastewater, tourism, economic development, critical judgment.]
Figure 1: Processing chain of the project. Textual data are collected in every pertinent arena and
processed with advanced NLP tools (Task 1) to build socio-semantic networks, whose dynamics are
modeled (Task 2) to enable sociologists or end-users to investigate frame dynamics in each domain
(Task 3).
3 Spatialization occurs at two levels: node positions are constrained both by their relations to other nodes
in the network and by the higher-level community they belong to.
These steps are not carried out without interaction between those who run the platform and
those who use it. The production of local or global representations of the socio-cognitive
dynamics observed in the various cases under study relies on a bridge
between the modeling effort previously described and the contribution of sociologists, whose
interpretation will enrich the empirical reconstructions.
5.2 The nature and meaning of Datasets: Arenas identification and corpus collection and normalization
We gather the datasets stemming from various arenas and set them up in a shared architecture. In
this first phase, sociologists precisely define which sources are pertinent to the public issue
they target. At the moment, we are able to propose analyses of various sources corresponding
to various arenas:
• scientific arena, which we mainly track through scientific publications (WOS
(Thomson Web Of Science), Pubmed - Medline, Cab), project databases (Cordis,
NSF) and, if necessary, patent databases (Patstat);
• media arena, which essentially corresponds to press articles (both online and offline
press). We essentially make use of dedicated databases like Factiva to collect French-
and English-speaking thematic corpora;
• legal arena, which is investigated through the construction of domain-specific corpora
from legifrance, parlex, eurolex, etc., according to the scope of the issue under study;
• public opinion arena, which is of course a crucial “space” for all types of domains: we
perform a systematic monitoring of the blogosphere as well as, when necessary, crawls
of specific forums and websites.
Beyond the definition of pertinent sources, it is necessary to design a strategy for defining the
appropriate perimeter of these corpora. Delineation and extension with lexical or citation-based
strategies should be applied each time it is necessary.
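As an illustration of what a lexical extension strategy may look like, the sketch below performs one round of query expansion: documents matched by a few seed terms form a core, and terms that are frequent in that core are added to the query used to delineate the corpus. This is a deliberately naive example under stated assumptions, not the procedure applied on the platform: the function names, the toy corpus and the 0.75 frequency threshold are hypothetical, and a realistic strategy would also compare term frequencies against a background corpus so as not to add generic vocabulary.

```python
from collections import Counter


def extend_query(documents, seed_terms, min_share=0.75):
    """One round of lexical extension.

    documents: list of sets of terms (one set per document).
    Returns the seed terms plus every term appearing in at least
    `min_share` of the documents matched by the seeds.
    """
    seeds = set(seed_terms)
    core = [doc for doc in documents if doc & seeds]
    if not core:
        return seeds
    counts = Counter(term for doc in core for term in doc)
    frequent = {term for term, n in counts.items() if n / len(core) >= min_share}
    return seeds | frequent


def delineate(documents, terms):
    """Indices of the documents that contain at least one query term."""
    query = set(terms)
    return [i for i, doc in enumerate(documents) if doc & query]


# Toy corpus, invented for illustration only.
corpus = [
    {"green algae", "nitrates", "pig farming"},
    {"green algae", "nitrates", "tourism"},
    {"nitrates", "water quality"},
    {"football"},
]
query = extend_query(corpus, {"green algae"})
selected = delineate(corpus, query)
print(sorted(query), selected)
```

Starting from the single seed "green algae", the extended query also contains "nitrates", which brings in the third document while the unrelated fourth one stays out of the perimeter.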
5.3 The nature and meaning of maps: visualization and user interface
The platform aims at equipping researchers with tools for monitoring issue dynamics. This
aim calls for designing web 2.0 tools relying as much as possible on open source
libraries. We benefit from previous consortium experience in designing online interfaces to
produce innovative and informative web services. These interactive representations enable
users to circulate easily between micro and macro levels (from specific documents to
high-level general trends), switch between different arenas, or choose to focus on either actor or
semantic dynamics. Users are thus able to use a series of representation applications to
analyze data collected in each arena. These modules, some of which are still under
development, help STS scholars to appraise the dynamics of co-construction according to
different viewpoints at various resolutions: actors/coalitions, terms/frames, static/dynamic,
mono-arena/multi-arena.
Equipped with such analytical capacities, researchers can produce and share visualizations
and analyses, and ultimately produce a collaborative interpretation of the socio-political
dynamics at stake. Sociologists, as prime end-users, are naturally deeply committed to the
conceptual design of these tools, and their feedback will help enhance them.
6 Conclusion
Classically, issue frame analysis gives birth to qualitative theoretical models that are well
known in Grounded Theory and situated action theorizing, which bring insightful intuitions
but are not designed to be systematically tested against empirical data. Those technologies
currently accompany many social scientists in the grounded interpretation of their
ethnographical work. At the same time, attempts to appraise scientific and technological
dynamics in society with quantitative analysis have largely remained an open challenge for
two reasons. Firstly, it has always been difficult to produce a significantly faithful
representation of the circulation and positioning of actors and their concerns, principally
because these attempts relied on too sparse an analysis of textual traces or simply because they had to
choose between focusing on actors' interaction networks (SNA studies) or on their concerns
(purely semantic studies), failing to integrate both dimensions within the same framework.
Secondly, they often overlooked the diversity of arenas where public issues emerge and are
being transformed.
As a result, the sophisticated socio-semantic nature of public issues remains largely
under-exploited and hence under-observed. Public controversies during the last decades have
involved science and technology to a large extent: on the one hand, the development of
techno-sciences has raised increasing concerns in terms of collective risks; on the other hand,
innovation is connected to key social issues in domains such as energy, health, food security
and carbon management. Such topics have become essentially political inasmuch as they
deal with possible or disputed futures and are the focus of massive
public and private funding. While this could have remained a classical field of study for
social science, contemporary problems are accompanied by a tremendous amount of
dynamic data due to the proliferation of expertise, public inquiries, audits, think tanks and
web 2.0 discussion platforms: the introduction of modeling, algorithmic and text processing
methods is needed to capture the knowledge dynamics characterizing contemporary issues
and for social scientists not to remain myopic. We aim to develop a new stream of modeling
techniques to appraise these issues in a heterogeneous way (considering actors and
concepts), with constructed topics (modeling issue frames rather than counting terms), and
able to capture the coupling dynamics between various arenas (who is early, influential,
winning over whom; inspired by whom, and deriving from which earlier issues). We
expect the potential benefits of this integrated innovation over existing local
approaches to be critical.
References
1) Bar-Ilan J., (2008). Informetrics at the beginning of the 21st century - A review, Journal of Informetrics,
2: 1–52.
2) Borrás S. and Edler J., (2012). The Governance of Change in Socio-Technical and Innovation Systems:
Some Pillars for Theory-Building, Communication to the Jean Monnet International Workshop,
“The Governance of Innovation and Socio-Technical Systems: Theorising and Explaining
Change”, Copenhagen Business School, Denmark, March 1st-2nd, 2012.
3) Bourret P., Mogoutov A., Julian-Reynier C., and Cambrosio A., (2006). A New Clinical Collective for
French Cancer Genetics: A Heterogeneous Mapping Analysis, Science, Technology, & Human
Values, 31 (4): 431-464.
4) Callon M., Law J., and Rip A., (1986). Mapping the Dynamics of Science and Technology. London: The
MacMillan Press Ltd.
5) Callon M., Courtial J.P., Turner W.A., and Bauin S., (1983). From translations to problematic networks:
An introduction to co-word analysis, Social Science Information, 22: 191-235.
6) Cambrosio A., Keating P., Mercier S., Lewison G., and Mogoutov A., (2006). Mapping the
emergence and development of translational cancer research, European Journal of Cancer, 24:
3140-3148.
7) Cambrosio A., Keating P., and Mogoutov A., (2004). Mapping collaborative work and innovation in
biomedicine: a computer assisted analysis of antibody reagent workshops, Social Studies of
Science, 34 (3): 325-364.
8) Chavalarias D. and Cointet J.P., (2008). Bottom-up scientific field detection for dynamical and hierarchical
science mapping, methodology and case study, Scientometrics, 75 (1): 37-50.
9) Foucault M., (1994). Dits et Écrits, Volume III, Paris: Gallimard.
10) Freeman L.C., (2004). The Development of Social Network Analysis: A Study in the Sociology of
Science. Vancouver: BC Press.
11) Gherardi S., (2006). Organizational knowledge: the place of workplace learning, Oxford: Blackwell
Publishing.
12) Giddens A., (1981). Agency, institution, and time-space analysis. In Advances in Social Theory and
Methodology: Toward an Integration of Micro- and Macro-sociologies, Knorr-Cetina K. and Cicourel
A.V. (eds), Routledge, 8.
13) He Q., (1999). Knowledge Discovery through Co-Word Analysis. Library Trends.
14) Jasanoff S. (ed.), (2006). States of knowledge: the co-production of science and social order, Routledge.
15) Jones D.S., Cambrosio A., and Mogoutov A., (2011). Detection and characterization of translational
research in cancer and cardiovascular medicine. Journal of Translational Medicine, 9: 57.
16) Lazer D., Pentland A., Adamic L., Aral S., Barabasi A.-L., Brewer D., Christakis N., et al.,
(2009). Computational Social Science. Science, 323 (5915): 721–723.
doi:10.1126/science.1167742
17) Leydesdorff L. and Hellsten I., (2006). Measuring the meaning of words in contexts: An automated analysis
of controversies about 'Monarch butterflies,' 'Frankenfoods,' and 'stem cells', Scientometrics, 67
(2): 231-258.
18) Lucio-Arias D. and Leydesdorff L., (2007). Knowledge emergence in scientific communication: from
"fullerenes" to "nanotubes", Scientometrics, 70 (3): 603-632.
19) Nederhof A. and Van Wijk E., (1997). Mapping the social and behavioral sciences world-wide:
Use of maps in portfolio analysis of national research efforts. Scientometrics, 40 (2): 237–276.
doi:http://dx.doi.org/10.1007/BF02457439
20) Noyons E., (2001). Bibliometric mapping of science in a policy context, Scientometrics, 50 (1): 83-98.
21) Powell W.W., White D.R., Koput K.W., and Owen-Smith J., (2005). Network dynamics and field
evolution: the growth of interorganizational collaboration in the life sciences, American Journal of
Sociology, 110, pp. 901975.
22) Price D.J. de S., (1976). A General Theory of Bibliometric and Other Cumulative Advantage
Processes. Journal of the American Society for Information Science and Technology, 27 (5-6): 292–
306.
23) Rogers R., (2008). The Politics of Web Space,
24) Roth C., (2006). Co-evolution in Epistemic Networks: Reconstructing Social Complex Systems.
Structure and Dynamics: eJournal of Anthropological and Related Sciences, 1 (3), article 2.
25) Sintchenko V., Anthony S., Phan X., Lin F., and Coiera E., (2010). A PubMed-Wide
Associational Study of Infectious Diseases.
26) Small H., (1973). Co-citation in scientific literature: A new measure of the relationship between
publications, Journal of the American Society for Information Science, 24: 265-269.
27) Sunstein C.R., (2007). Republic.com 2.0. Princeton University Press.
28) Thelwall M., Vaughan L., and Björneborn L., (2006). Information Retrieval. Webometrics, Annual Review
of Information Science and Technology, 39 (1): 81-135.
29) Valverde S., Sole R.V., Bedau M.A., and Packard N.H., (2007). Topology and Evolution of
Technology Innovation Networks. Physical Review E, 76 (5), 056118.