Dimensions of Formality: A Case Study for MKM in Software Engineering

Andrea Kohlhase, Michael Kohlhase, Christoph Lange

Journal Article: 04/2010; arXiv: 1004.5071

Abstract

We study the formalization of a collection of documents created for a Software Engineering project from an MKM perspective. We analyze how document and collection markup formats can cope with an open-ended, multi-dimensional space of primary and secondary classifications and relationships. We show that RDFa-based extensions of MKM formats, employing flexible "metadata" relationships referencing specific vocabularies for distinct dimensions, are well-suited to encode this and to put it into service. This formalized knowledge can be used for enriching interactive document browsing, for enabling multi-dimensional metadata queries over documents and collections, and for exporting Linked Data to the Semantic Web and thus enabling further reuse. Comment: To appear in The 9th International Conference on Mathematical Knowledge Management: MKM 2010

Source: arXiv

Andrea Kohlhase (1), Michael Kohlhase (2), and Christoph Lange (2)
(1) German Research Center for Artificial Intelligence (DFKI)
Andrea.Kohlhase@dfki.de
(2) Computer Science, Jacobs University Bremen
{m.kohlhase,ch.lange}@jacobs-university.de
1 Introduction
The field of Mathematical Knowledge Management (MKM) tries to model mathematical objects and their relationships, their creation and publication processes, and their management requirements. In [CF09, 237 ff.] Carette and Farmer analyzed “six major lenses through which researchers view MKM”: the document, library, formal, digital, interactive, and process lenses. Quite obviously, there is a gap between the formal aspects {“library”, “formal”, “digital”}, which relate to machine use of mathematical knowledge, and the informal ones {“document”, “interactive”, “process”}, which relate to human use.
In the FormalSafe project [For08] at the German Research Center for Artificial Intelligence (DFKI) Bremen, a main goal is the integration of project documents into a computer-supported software development process. MKM techniques are used to bridge the gap between informally stated user requirements and formal verification. One of the FormalSafe case studies is based on the documents of the SAMS project (“Sicherungskomponente für Autonome Mobile Systeme [Safety Component for Autonomous Mobile Systems]”, see [FHL+08]) at DFKI. The SAMS objective was to develop a safety component for autonomous mobile service robots and to get it certified as SIL-3 standard compliant in the course of three years. On the one hand, certification required the verification of certain safety properties in the code documents with the proof checker Isabelle [NPW02]. On the other hand, it required the software development process to follow the V-Model (fig. 1). This mandates, e.g., that relevant document fragments get justified and linked to corresponding fragments in a successive document refinement process (along the arms of the ‘V’ from the upper left over the bottom to the upper right, and between the arms, in fig. 1).
The final publication of this paper is available at www.springerlink.com.
arXiv:1004.5071v1 [cs.SE] 28 Apr 2010
Fig. 1. Documents in the V-Model
The collection of SAMS documents (we call it “SAMSDocs” [SAM09]) promised an interesting case study for FormalSafe, as system development under the V-Model regime resulted in a highly interconnected collection of design documents, certification documents, code, formal specifications, and formal proofs. Furthermore, it was assumed that adding semantics to SAMSDocs would be comparatively easy, as it was developed under strong formalization pressure.
In this paper we report on, and draw conclusions from, the SAMSDocs formalization, particularly the formalization of its LaTeX documents. In section 2, we document the process and identify inherent, distinct formality levels and the multi-dimensionality of the formalized structures. Real information needs (drawn from three use cases in the SAMS context) turn out in section 3 to be multi-dimensional. This motivates our exploration of multi-dimensional markup in section 4. Section 5 showcases the feasibility of multi-dimensional services with MKM technology enabled by multi-dimensional structured representations, and section 6 concludes the paper.
2 Dimensions of Formality in SAMSDocs
In this paper, we are especially interested in the question “What should we sensibly formalize in a document collection and can MKM methods cope?”. Note that we understand “to formalize” as “making implicit knowledge explicit” and not as “making something fully formal”.
The SAMS project was organized as a typical Software Engineering project; its collection of documents SAMSDocs therefore has a prototypical composition of distinct document types like contract, code, or manual. Thus, SAMSDocs presents a good base for a case study with respect to our question. Fig. 2 shows the concrete distribution over the document formats used in SAMSDocs. Requirements analysis, system and module specifications, reviews, and the final manual were mainly written in LaTeX, only roughly a sixth in MS Word. The implementation in Misra-C contains Isabelle theorem prover calls.

Format         Files    #
LaTeX          *.tex    251
MS Word        *.doc    61
Isabelle       *.thy    33
Misra-C Code   *.c      40
Fig. 2. SAMSDocs
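The “roughly a sixth” figure can be checked against the file counts in fig. 2. The following is a minimal sketch; which denominator the authors had in mind is an assumption, so both candidates are computed (the helper name `share` is introduced here for illustration only).

```python
# File counts per format, copied from fig. 2.
file_counts = {"LaTeX": 251, "MS Word": 61, "Isabelle": 33, "Misra-C": 40}


def share(fmt, among):
    """Fraction of files in format `fmt` among the formats listed in `among`."""
    total = sum(file_counts[f] for f in among)
    return file_counts[fmt] / total


# Assumption 1: the share is taken over all 385 files in the collection.
word_of_all = share("MS Word", file_counts)
# Assumption 2: the share is taken over the 312 LaTeX and MS Word files only.
word_of_prose = share("MS Word", ["LaTeX", "MS Word"])

print(f"MS Word among all files:       {word_of_all:.3f}")
print(f"MS Word among LaTeX + MS Word: {word_of_prose:.3f}")
```

Under the first assumption the share is about 0.158, which is close to one sixth (about 0.167); under the second it is about 0.196, closer to one fifth.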
The first stinging, but unsurprising, observation was that the level of formality of the documents in SAMSDocs varies considerably, because distinct purposes create distinct formality requirements. For instance, the contract document serves as a communication medium between the customer and the contractor. Here, underspecification is an important tool, whereas it is regarded as harmful in the fine-granular module specifications and as a fatal flaw in input logic for a theorem prover. Since this issue was already present in the set of LaTeX documents, we focused on just these.
For the formalization of this subset in SAMSDocs we used the sTeX system [Koh08], a semantic extension of LaTeX. It allows publishing documents both as high-quality human-readable PDF and as formal machine-processable OMDoc [Koh06] via LaTeXML [SKG+10]. Our formalization process revealed early on that previous sTeX applications (based on OMDoc 1.2) were too rigid for stepwise semantic markup. Fortunately, sTeX also allows for the OMDoc 1.3 scheme of metadata via RDFa [ABMP08] annotations (see [Koh10]). In particular, we could ‘invent’ our own vocabulary for markup on demand without extending OMDoc. This new vocabulary consists of SAMSDocs-specific metadata properties and relationship types. We call the process of adding this pre-formal markup to SAMSDocs (semantic) preloading. Concretely, we extended sTeX to sTeX-SD (sTeX for SAMSDocs) by adding LaTeXML bindings for all SAMS-specific TeX macros and environments used in SAMSDocs, thus preserving the original PDF document layouts while also generating meaningful OMDoc.
Fig. 3. The Formalization Workflow with sTeX-SD [translated by the authors]
Let us look at an example of such an sTeX extension within our formalization workflow (see fig. 3). We started out with a TeX document (upper left), which compiled to the PDF seen on the upper right. Here, we have a simple, two-dimensional table, which is realized with the LaTeX environment tabular. Semantically, this table contains a list of symbols for document states with their definitions, e.g. “i. B.” for “in Bearbeitung [in progress]”. As such definition tables were used throughout the project, we developed the environment SDTab-def and the macro SDdef as sTeX extensions. We determined the OMDoc output for these to be a symbol together with its definition element (for each use of SDdef in place of the respective table row) and, moreover, to group all of them into a theory (by using SDTab-def). Preloading the TeX table by employing SDTab-def and SDdef turned it into an sTeX document (middle of fig. 3) while keeping the original PDF table structure. Running LaTeXML on this sTeX document produces the OMDoc output shown in the lower area of fig. 3.
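The mapping just described (one symbol with its definition per table row, all grouped into one theory) can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not the LaTeXML binding used in sTeX-SD: the function name, the identifier scheme, and the second table row are hypothetical, and the element and attribute names are modelled loosely on the description above rather than copied from fig. 3 or the OMDoc schema.

```python
import xml.etree.ElementTree as ET


def definition_table_to_theory(theory_id, rows):
    """Group one symbol/definition pair per table row into one theory element.

    Hypothetical helper: it mimics the role of SDTab-def (the grouping)
    and SDdef (one row), but is not part of sTeX-SD.
    """
    theory = ET.Element("theory", {"id": theory_id})
    for index, (symbol_text, meaning) in enumerate(rows, start=1):
        # The naming scheme for symbols is an assumption made for this sketch.
        name = f"{theory_id}.sym{index}"
        symbol = ET.SubElement(theory, "symbol", {"name": name})
        symbol.text = symbol_text
        definition = ET.SubElement(theory, "definition", {"for": name})
        definition.text = meaning
    return theory


rows = [
    ("i. B.", "in Bearbeitung [in progress]"),  # row quoted in the text
    ("abg.", "abgeschlossen [completed]"),      # hypothetical row, illustration only
]
theory = definition_table_to_theory("document-states", rows)
print(ET.tostring(theory, encoding="unicode"))
```

The point of the sketch is that the visual table layout and the semantic grouping are separate concerns: the same rows feed both the PDF table and the theory element.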
Mathematical, structural relationships have a privileged status in sTeX: their command sequence/environment syntax is analogous to the native XML element and attribute names in OMDoc. Since many objects and relationships induce formal representations for Isabelle, it seemed possible to semantically mark them up with a logic-inspired structure. But in the formalization process it soon became apparent that (important) knowledge implicit in SAMSDocs did not refer to the ‘primary’ structure aimed at with the use of sTeX. Instead, this knowledge was concerned with a space of less formal, ‘secondary’ classifications and relationships. Thus, our second observation pertains to the substance of formalizations. Even though we wanted to find out what we can sensibly formalize, we had assumed this to mean how much we can sensibly formalize. Therefore, we were rather surprised to find distinct formality structures realized in our sTeX extension. In the following we report on these structures.
We grouped the macros and environments of sTeX-SD in fig. 5 according to what induced them. In particular, we distinguished the following triggers:
– “objects” (document fragments viewed as autonomous elements) and
– their net of relationships via the collection,
– documents and
– their organizational handling, and
– the project itself and thus, its own scheme of meaningful relationships.
Fig. 4. s is Braking Distance?
For instance, in the system specification we marked a recap of a definition of the braking distance function for straight-ahead driving s_G as an object and referenced it from within the assertion seen in fig. 4. In the module specification, s_G was then meticulously specified. This document fragment is connected to the original one via a refinement relationship from the V-Model, which determined the creation process of the collection. Documents induce layout structures like sections or subsections, and they are themselves organized, for example, under a version management scheme. In the workflow in fig. 3 we already showcased a project-specific element, the definition table, with its meaning. Interestingly, we cannot compare formality in one group with the formality in another. For example, we cannot decide whether a document completely marked up with the object-induced structures is more formal than one fully semantically enhanced by the version management markup. As these grouped structures interact only relatively lightly, we can consider them as independent dimensions of a formality space that is reified in the formalization process of a document collection.
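One way to picture this incomparability is to treat the formality of a document as a vector with one entry per dimension and to compare vectors componentwise. The sketch below is an illustration under that assumption: the numeric “degree of markup” per dimension and the names `FormalityProfile` and `compare` are hypothetical and do not appear in sTeX-SD.

```python
from dataclasses import dataclass

# Dimension names follow the triggers listed above.
DIMENSIONS = ("objects", "relationships", "documents", "organization", "project")


@dataclass(frozen=True)
class FormalityProfile:
    # One hypothetical degree of markup in [0, 1] per entry of DIMENSIONS.
    levels: tuple

    def dominates(self, other):
        """True if this profile is at least as formal in every dimension."""
        return all(a >= b for a, b in zip(self.levels, other.levels))


def compare(a, b):
    """Componentwise (partial) order on formality profiles."""
    a_ge_b, b_ge_a = a.dominates(b), b.dominates(a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "more formal"
    if b_ge_a:
        return "less formal"
    return "incomparable"


# Fully marked up with object-induced structures only.
object_markup = FormalityProfile((1.0, 0.0, 0.0, 0.0, 0.0))
# Fully marked up with version management (organizational) markup only.
version_markup = FormalityProfile((0.0, 0.0, 0.0, 1.0, 0.0))
# Marked up in both dimensions.
both = FormalityProfile((1.0, 0.0, 0.0, 1.0, 0.0))

print(compare(object_markup, version_markup))  # incomparable
print(compare(both, object_markup))            # more formal
```

Because the order is only partial, no single scalar “formality level” falls out of it, which matches the observation that the two example documents cannot be ranked against each other.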
Concretely, sTeX-SD covers the following dimensions and consists of the listed extension macros/environments (with attributes in [·] where sensible):
Fig. 5. Formality Dimensions in sTeX-SD
Formalizing object structures is not always straightforward, since many of the documents contain recaps or previews of material that is introduced in other documents/parts (e.g. to make them self-contained). Compare for example fig. 4 and fig. 6, which are actually clippings from the system specification “KonzeptBremsmodell.pdf”. Note the use of s and s_G, both pointing in fig. 4 to the braking distance function for straight-ahead driving (which is obvious from the local context), whereas in fig. 6 s represents the general arc length function of a circle, which is different in principle from the braking distance, but coincides here.
Fig. 6. Yet another Braking Distance s?
We also realized that sTeX itself had already integrated another formality dimension besides the logic-inspired one, the one concerned with document layout: a typical document layout is structured into established parts like sections or modules. If we want to keep this grouping information in the formal XML