The Planetary System: Executable Science, Technology, Engineering and Math Papers

Christoph Lange, Michael Kohlhase, Catalin David, Deyan Ginev, Andrea Kohlhase, Bogdan Matican, Stefan Mirea, Vyacheslav Zholudev

Journal Article: 03/2011;

Abstract

Executable scientific papers contain more than laid-out text for reading. They
contain, or link to, machine-comprehensible representations of the scientific
findings or experiments they describe. Client-side players can thus enable
readers to "check, manipulate and explore the result space" [9]. We have realized
executable papers in the STEM domain with the Planetary system. Semantic
annotations associate the papers with a content commons holding the background
ontology, the annotations are exposed as Linked Data, and a frontend player
application hooks modular interactive services into the semantic annotations.

Source: arXiv

Computer Science, Jacobs University Bremen, Germany
{ch.lange,m.kohlhase,c.david,d.ginev,a.kohlhase,
b.matican,s.mirea,v.zholudev}@jacobs-university.de
1 Application Context: STEM Document Collections
The Planetary system [7] is a semantic social environment for document
collections in Science, Technology, Engineering and Mathematics (STEM). STEM
documents have in common that they describe concepts using mathematical
formulæ. These are composed of mathematical symbols (operators, functions,
etc.), which have in turn been defined, as more foundational mathematical
concepts, in mathematical documents. Thus, there is a dynamically growing
ontology of domain knowledge. The domain knowledge is structured along the
following, largely independent dimensions [12, 16]: (i) logical and functional
structures, (ii) narrative and rhetorical document structures, (iii) information
on how to present all of the former to the reader (such as the notation of
mathematical symbols), (iv) application-specific structures (e.g. for physics),
(v) administrative metadata, and (vi) users’ discussions about artifacts of
domain knowledge.
We have set up Planetary instances for the following paradigmatic document
collections: (i) a browser for the ePrint arχiv [2], (ii) a reincarnation of the
PlanetMath mathematical encyclopedia [21] (from which the name Planetary
derives), (iii) a companion site to the general computer science (GenCS) lecture
of the second author [15, 8], and (iv) an atlas of theories of formal logic [19].
This list is ordered by increasing machine-comprehensibility of the
representation and thus, as explained below, by increasing “executability” of
the respective papers.
All instances support browsing and fine-grained discussion. The PlanetMath and
GenCS collections are editable, as in a wiki¹, whereas the arχiv and Logic Atlas
corpora have been imported from external sources and are presented read-only.
We have prepared demos of selected services in all of these instances.
¹ Planetary reuses technology of our earlier semantic wiki SWiM [17].
arXiv:1103.1482v1 [cs.DL] 8 Mar 2011
2 Key Technology: Semantics-Preserving Transformations
Documents published in Planetary become flexible, adaptive interfaces to a
content commons of domain objects, context, and their relations. Planetary
achieves this by providing an integrated user experience: a set of interactions
with documents, based on an extensible set of client-side and server-side
services that draw on explicit (and thus machine-understandable) representations
in the content commons. We have implemented or reused ontologies for all
structures of STEM knowledge ([16] gives an overview). Annotations of papers
with terms from these ontologies act as hooks for local interactive services. By
translation, Planetary makes the structural ontologies editable in the same way
as the papers, so that the community can adapt and extend them to their needs.
The sources of the papers are maintained in LaTeX or the semantic mathematical
markup language OMDoc [14]. For querying and information retrieval, and for
interlinking with external knowledge (including discussions about concepts in
the papers, but also remote Linked Datasets), we extract their semantic
structural outlines to an RDF representation, which is accessible to external
services via a SPARQL endpoint and as Linked Data [8]. For human-comprehensible
presentation, we transform the sources to XHTML+MathML+SVG [8]. These papers
gain their “executability” from embedded semantic annotations: Content MathML²
embedded into formulæ [3], and an RDFa subgraph of the above-mentioned RDF
representation embedded into XHTML and SVG.
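To illustrate how a service could read such formula annotations, the following
minimal Python sketch extracts the symbols from a Content MathML fragment. The
fragment, the function names `symbols_in` and `symbol_uri`, and the choice of
Python are assumptions made for illustration (Planetary's own services run as
client-side scripts). The URI scheme follows the OpenMath convention
cdbase/cd#name and may differ from the identifiers Planetary actually uses.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical annotation: Content MathML for "x + 1". The content dictionary
# "arith1" and its symbol "plus" exist in OpenMath, but this fragment is made
# up for illustration and is not taken from a Planetary document.
FRAGMENT = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <csymbol cd="arith1">plus</csymbol>
    <ci>x</ci>
    <cn>1</cn>
  </apply>
</math>
"""


def symbols_in(mathml):
    """Return (content dictionary, symbol name) pairs in document order."""
    root = ET.fromstring(mathml.strip())
    pairs = []
    for node in root.iter("{%s}csymbol" % MATHML_NS):
        pairs.append((node.get("cd", ""), (node.text or "").strip()))
    return pairs


def symbol_uri(cd, name, base="http://www.openmath.org/cd"):
    """Build an identifier of the form base/cd#name (assumed URI scheme)."""
    return "%s/%s#%s" % (base, cd, name)


if __name__ == "__main__":
    for cd, name in symbols_in(FRAGMENT):
        print(symbol_uri(cd, name))
```

A service hooked into a formula could pass such an identifier to the server to
look up the definition of the symbol.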
The amount of semantic annotation depends on the source representation:
(i) The arχiv corpus (more than 500,000 scientific publications) has LaTeX
sources, most of which merely make the section structure of a document
machine-comprehensible, but hardly the fine-grained functional structures of
mathematical formulæ, statements (definition, axiom, theorem, proof, etc.), and
theories. We have transformed the papers to XHTML+MathML, preserving semantic
properties like formula and document structure [2]. (ii) The PlanetMath corpus
is maintained inside Planetary; it additionally features subject classification
metadata and semi-automatically annotated concept links [10], which we preserve
as RDFa. (iii) The GenCS corpus is maintained in sTeX, a semantics-extended
LaTeX [13], inside Planetary. sTeX makes explicit the functional structures of
formulæ, statements, and theories, narrative and rhetorical structures,
information on notation, as well as (via an RDFa-like extensibility) arbitrary
administrative and application-specific metadata. This structural markup is
preserved as Content MathML and RDFa in the human-comprehensible output. In this
translation, OMDoc, an XML language semantically equivalent to sTeX, serves as
an intermediate representation. (iv) The Logic Atlas is imported into Planetary
from an external OMDoc source but otherwise treated analogously to the GenCS
corpus.
² Or the semantically equivalent OpenMath [4].
3 Demo: Interactive Services and the Planetary API

Fig. 1. Interacting with an arχiv article via FoldingBar, InfoBar, and localized
discussions. On the right: localized folding inside formulæ
Our demo focuses on how Planetary makes STEM papers executable: by hooking
interactive services into the annotations that the semantics-preserving
translations put into the human-comprehensible presentations of the papers.
Services are accessible locally via a context menu for each object with
(fine-grained) semantic annotations, e.g. a subterm of a formula, or via the
“InfoBar”, as shown in fig. 1. The menu has one entry per service available in
the current context; the InfoBar indicates the services available for the
information objects in each line of the paper.
In the image on the right of fig. 1, we selected a subterm and requested to fold
it, i.e. to simplify its display by replacing it with an ellipsis. The
FoldingBar on the left, similar to those of source code IDEs, enables folding
document structures, and the InfoBar icons on the right indicate the
availability of local discussions. Clicking them highlights all items with
discussions; clicking any of these items yields an icon menu as shown in the
center. The icon menu for the discussion service allows for reporting problems
or asking questions using a STEM-specifically extended argumentation ontology
[18]. The richer semantic markup of the GenCS and Logic Atlas collections
enables services that utilize logical and functional structures, reflected by a
different icon menu. Fig. 2 demonstrates looking up a definition and exploring
the prerequisites of a concept. The definition lookup service obtains the URI of
a symbol from the annotation of a formula and queries the server for the
corresponding definition. The server-side part of the prerequisite navigation
service obtains the transitive closure of all dependencies of a given item and
returns them as an annotated SVG graph. Computational services make mathematical
formulæ truly executable: the user can send a selected expression to a computer
algebra web service for evaluation or graphing [6], or have unit conversions
applied to measurable quantities [11]. Finally, besides these existing services,
we will demonstrate the ease of realizing additional services, within the
Planetary environment or outside it. The API for services running as scripts
in client-side documents is essentially defined by the in-document annotations,
the underlying structural ontologies that are retrievable from the content
commons, the possibility to execute queries against the content commons, and the
extensibility of the client-side user interface.
Fig. 2. Definition Lookup and Prerequisites Navigation
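The core of the prerequisite navigation service, computing the transitive
closure of an item's dependencies, can be sketched as follows. This is a minimal
illustration under stated assumptions, not Planetary's implementation: the
dependency graph `DEPENDS_ON`, its concept names, and the function
`prerequisites` are hypothetical, and rendering the result as an annotated SVG
graph is left out.

```python
from collections import deque

# Hypothetical dependency graph: each concept maps to the concepts its
# definition directly relies on. The names are made up for illustration.
DEPENDS_ON = {
    "group": ["monoid"],
    "monoid": ["semigroup", "unit"],
    "semigroup": ["magma"],
    "magma": ["set"],
    "unit": ["set"],
    "set": [],
}


def prerequisites(item, depends_on):
    """Return all direct and indirect dependencies of item, breadth-first.

    Each dependency is listed once; the visited set also guards against
    cyclic dependencies.
    """
    found = []
    visited = {item}
    queue = deque(depends_on.get(item, []))
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        found.append(current)
        queue.extend(depends_on.get(current, []))
    return found


if __name__ == "__main__":
    print(prerequisites("group", DEPENDS_ON))
```

A breadth-first traversal lists the closest prerequisites first, which suits a
navigation view; any other graph traversal yields the same set of items.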
4 Related Work
Like a semantic wiki, Planetary supports editing and discussing resources.
Many wikis support LaTeX formulæ, but without fine-grained semantic annotation.
They can merely render formulæ in a human-readable way but not make them
executable. The Living Document [5] environment enables users to annotate and
share life science documents and interlink them with Web knowledge bases,
turning – like Planetary – every single paper into a portal for exploring the
underlying network. However, life science knowledge structures, e.g. proteins and
genes, are relatively flat, compared to the tree-like and context-sensitive formulæ
of STEM. State-of-the-art math e-learning systems, including ActiveMath [1]
and MathDox [20], also make papers executable. However, they do not preserve
the semantic structure of these papers in their human-readable output, which
makes it harder for developers to embed additional services into papers.
5 Conclusion and Outlook
Planetary makes documents executable on top of a content commons backed
by structural ontologies. Apart from mastering semantic markup (which we
alleviate with dedicated editing and transformation technology), document
authors, as well as authors of structural ontologies, only need expertise in
their own domain. In particular, no system-level programming is necessary: the
semantic representations act as a high-level conceptual interface between
content authors and the system and service developers. Even for developers the
barrier is low: a considerably new service can be realized as a client-side
script that runs a query against the content commons. This separation of
concerns ensures long-term compatibility of the knowledge hosted in a Planetary
instance with future demands.
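As a rough sketch of how small such a query-based service can be, the following
Python snippet builds the request URL for a SPARQL query sent via HTTP GET, as
defined by the SPARQL protocol (`query` parameter). The endpoint URL, the `ex:`
vocabulary, and the function `definition_lookup_url` are assumptions made for
illustration and are not taken from Planetary; an actual client-side script
would be written in the browser's scripting language.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "http://example.org/sparql"
QUERY_TEMPLATE = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?definition WHERE {{
  ?definition ex:defines <{symbol}> .
}}
""".strip()

# Characters that must not appear inside <...> in a SPARQL query.
FORBIDDEN = set('<>"{}|\\^`')


def definition_lookup_url(symbol_uri, endpoint=ENDPOINT):
    """Return a GET URL asking the endpoint which item defines symbol_uri."""
    if any(c in FORBIDDEN or c.isspace() for c in symbol_uri):
        raise ValueError("not a valid IRI for a SPARQL query: %r" % symbol_uri)
    query = QUERY_TEMPLATE.format(symbol=symbol_uri)
    # urlencode percent-encodes the query text, including '#' and newlines.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(definition_lookup_url("http://www.openmath.org/cd/arith1#plus"))
```

The check on the symbol URI keeps a malformed annotation from altering the
structure of the query.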
References
[1] ActiveMath. url: http://www.activemath.org.
[2] arXMLiv Build System. url: http://arxivdemo.mathweb.org.
[3] MathML 3.0. url: http://www.w3.org/TR/MathML3.
[4] Open Math 2.0. 2004. url: http://www.openmath.org/standard/om20.
[5] A. García et al. “Semantic Web and Social Web heading towards Living
Documents in the Life Sciences”. In: Web Semantics 8.2–3 (2010).
[6] C. David, C. Lange, and F. Rabe. “Interactive Documents as Interfaces to
Computer Algebra Systems: JOBAD and Wolfram|Alpha”. In: CALCULEMUS
(Emerging Trends). 2010.
[7] C. David et al. “eMath 3.0: Building Blocks for a social and semantic Web
for online mathematics & ELearning”. In: Workshop on Mathematics and
ICT. 2010. url: http://kwarc.info/kohlhase/papers/malog10.pdf.
[8] C. David, M. Kohlhase, C. Lange, F. Rabe, N. Zhiltsov, and V. Zholudev.
“Publishing Math Lecture Notes as Linked Data”. In: ESWC. 2010.
[9] Executable Paper Challenge. url: http://www.executablepapers.com.
[10] J. Gardner, A. Krowne, and L. Xiong. “NNexus: Towards an Automatic
Linker for a Massively-Distributed Collaborative Corpus”. In: IEEE
Transactions on Knowledge and Data Engineering 21.6 (2009).
[11] J. Giceva, C. Lange, and F. Rabe. “Integrating Web Services into Active
Mathematical Documents”. In: MKM/Calculemus Proceedings. LNAI 5625.
Springer, 2009.
[12] A. Kohlhase, M. Kohlhase, and C. Lange. “Dimensions of Formality: A
Case Study for MKM in Software Engineering”. In: Intelligent Computer
Mathematics. LNAI 6167. Springer, 2010.
[13] A. Kohlhase, M. Kohlhase, and C. Lange. “sTeX – A System for Flexible
Formalization of Linked Data”. In: I-Semantics. 2010.
[14] M. Kohlhase. OMDoc – An open markup format for mathematical documents
[Version 1.2]. LNAI 4180. Springer, 2006.
[15] M. Kohlhase et al. Planet GenCS. url: http://gencs.kwarc.info.
[16] C. Lange. “Ontologies and Languages for Representing Mathematical
Knowledge on the Semantic Web”. submitted to Semantic Web Journal.
url: http://www.semantic-web-journal.net/underreview.
[17] C. Lange. “SWiM – A semantic wiki for mathematical knowledge
management”. In: ESWC. 2008.
[18] C. Lange et al. “Expressing Argumentative Discussions in Social Media
Sites”. In: Social Data on the Web Workshop at ISWC. 2008.
[19] Logic Atlas and Integrator. url: http://logicatlas.omdoc.org.
[20] MathDox – Interactive Mathematics. url: http://www.mathdox.org.
[21] PlanetMath Redux. url: http://planetmath.mathweb.org.