Engineering Higher-Order Modules in SML/NJ

George Kuan and David MacQueen
University of Chicago
Abstract. SML/NJ and other Standard ML variants extend the ML
module system with higher-order functors, elevating the module lan-
guage to a full functional language. In this paper, we describe the im-
plementation of the higher-order module system in SML/NJ, which is
unique in providing “true” higher-order behavior at the static level. This
second generation implementation of higher-order modules in SML/NJ
is based on three key techniques: unique internal variables (entity vari-
ables) for naming static entities, a factorization of the static information
in both basic modules and functors into signatures and realizations, and
a static lambda calculus we call the entity calculus with static “effects”
to represent the type-level mapping performed by a functor. This system
implements MacQueen-Tofte’s re-elaboration semantics without having
to re-elaborate functor bodies or appeal to fragile stamp properties.
1 Introduction
The ML module system has evolved considerably over the past 25 years. One of
the Standard ML of New Jersey (SML/NJ) compiler’s more significant extensions
is support for higher-order functors, achieved by allowing structures, including
functor parameters and results, to contain functors as components. MacQueen
and Tofte [16] describe the original semantics for higher-order functors, which
has a strong policy regarding how functors propagate type information through
functor applications. We will refer to the MacQueen-Tofte higher-order functor
behavior as true higher-order behavior. This model of higher-order functors was
first implemented in SML/NJ Version 0.93 (1993), using techniques described in
Crégut and MacQueen [2]. This first-generation implementation was based on
an earlier implementation of first-order functors, and its adaptation to handle
higher-order functors was rather ad hoc and complex. Here we describe the sec-
ond generation implementation used in current versions of SML/NJ, which is sig-
nificantly simpler and more principled. Our focus will be on the representations
and processes in the elaboration phase of the compiler. The relatively straight-
forward elaboration of the dynamic semantics of the module system through
abstract syntax is beyond the scope of this paper.
1.1 SML 97 Module System
The ML module system provides a set of constructs for expressing large-scale pro-
gram architecture, and is also the means for defining and enforcing abstractions.
Basic modules, called structures, are collections of types, values, and hierar-
chically nested modules. Signatures express static interfaces of structures, being
analogues of types for structures. A signature consists of a collection of type,
value, and module specifications, specifying their kinds, types, and signatures,
respectively. A functor is a module-level function formed by parameterizing a
structure (the functor body) with respect to a structure variable constrained by
a signature. Signatures and structures have a many-to-many relationship: mul-
tiple structures can match a single signature, and multiple signatures can be
ascribed to a single structure.
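As a minimal sketch of these constructs in SML source, the following declares a signature, a structure that matches it, and a functor parameterized over any structure matching that signature. The names ORD, TYPE_ONLY, IntOrd, and MaxFn are illustrative assumptions and do not come from the paper. The two ascriptions show one structure matched against two different signatures.

```sml
(* Illustrative names only; not taken from the paper. *)
signature ORD =
sig
  type t
  val le : t * t -> bool
end

(* A smaller signature that the same structure also matches. *)
signature TYPE_ONLY = sig type t end

structure IntOrd =
struct
  type t = int
  fun le (x : int, y : int) = x <= y
end

(* One structure, two ascribed signatures. *)
structure IntOrd1 : ORD = IntOrd
structure IntOrd2 : TYPE_ONLY = IntOrd

(* A functor: a structure parameterized by a structure variable X : ORD. *)
functor MaxFn (X : ORD) =
struct
  type t = X.t
  fun max (a : t, b : t) : t = if X.le (a, b) then b else a
end

structure IntMax = MaxFn (IntOrd1)
val m = IntMax.max (3, 7)
```

Because the ascription of ORD to IntOrd1 is transparent, IntMax.t is known to be int, so the call on integer literals type checks.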
1.2 Higher-order Functors
The need for higher-order functors arises naturally in a module system with
functors. Just as a first-order functor is formed by abstracting over the name of
an external structure imported by a structure definition, a higher-order functor
is defined by abstracting over the name of an external functor. The module
abstracted over can be either a basic structure or a functor. So, for orthogonality,
we should be able to abstract with respect to both structure names and functor
names over both structure and functor definitions. However, this extension does
raise some significant issues for design, semantics, and implementation.
When abstracting over the name of either an imported structure or func-
tor, the parameter is described by a signature, which expresses all the static
interface information about the parameter that the client structure is allowed
to know. In the case of first-order structures (with no functor components), the
signature language is capable of expressing a fairly complete static description
of a given structure using definitional specs and where clauses to pin down the
type¹ components. But when we abstract over an imported structure, then we
normally use a looser, less exact signature for the functor parameter because the
parameter types can vary from one application to another. Signatures may be
looser in two senses. First, for some of the parameter’s tycons, we may specify
only kinds (arities) rather than definitions (e.g., sig type ('a,'b) t end specifies a
tycon t that has arity 2), with the definitions to be supplied later by the argument
structures to which the functor is applied. Second, the argument structure
may contain excess components, which will be coerced away during application,
or value components that are more polymorphic than what is specified in the
signature.
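The two senses of looseness can be seen in a small SML sketch. The names PARAM, Arg, and UseFn are hypothetical and chosen only for illustration. The parameter signature gives only the arity of t, while the argument supplies its definition, an excess component, and a value that is more polymorphic than its specification.

```sml
(* Illustrative names only; not taken from the paper. *)
signature PARAM =
sig
  type ('a, 'b) t            (* only the arity (2) is specified *)
  val first : (int, bool) t -> int
end

structure Arg =
struct
  type ('a, 'b) t = 'a * 'b  (* supplies the definition *)
  fun first (x, _) = x       (* more polymorphic: 'a * 'b -> 'a *)
  val extra = "not in PARAM" (* excess component, coerced away *)
end

functor UseFn (P : PARAM) =
struct
  (* Inside the functor only the components specified in PARAM are visible. *)
  fun get (p : (int, bool) P.t) : int = P.first p
end

structure U = UseFn (Arg)
val n = U.get (42, true)
```

After the application, the definition of t from Arg is propagated to the result, so U.get accepts an ordinary pair.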
A functor can express complex static-level computations mapping its input
tycons to its output tycons, and we call this mapping the functor static action,
or simply the functor action. The defining characteristic of true higher-order
static behavior in functors is the faithful propagation of functor actions through
functor application. A functor action may involve static effects in the form of
the generation of new tycons, introduced either by datatype declarations or
by opaque (“sealed”) signature ascriptions. However, functor signatures, which
consist of just a named parameter signature and a result signature, are only
capable of expressing very simple functor actions where the result tycons can
be defined directly in terms of the parameter tycons. Thus functor signatures
have a very limited ability to describe functor static actions. This means that a
complete description of the static content of a functor must include information
beyond the functor signature, in all but the simplest cases.
¹ Actually type constructors, but it is a common and convenient abuse of terminology
to refer to types when we mean type constructors (abbreviated as tycons).
On the other hand, when a functor G is a component of the parameter of a
functor F, G is formal, and all we know about G is its functor signature. When
elaborating the body of F we need to determine the static effect of applying the
formal G. This means that we need to be able to synthesize a default functor
action from a functor signature. When F is applied to an actual parameter with
its own version of G, that G’s actual functor action should be used in place of
the approximation derived from G’s specification in F’s parameter signature. In
other words, the static action of F should be parameterized with respect to the
static action of G. This approach is the essence of true higher-order behavior.
The standard example illustrating this point is the Application functor²:
signature SIG = sig type t end
functor F(functor G(Y: SIG): SIG
          structure A: SIG): SIG
  = G(A)
functor Id(X: SIG) = struct type t = X.t end
functor Const(X: SIG) = struct type t = int end
structure B: SIG = struct type t = bool end
structure R1 = F(Id, B)     (* R1.t = bool *)
structure R2 = F(Const, B)  (* R2.t = int *)
Here the action of functor Id maps its argument tycon to itself (λt. t), while
the action of Const maps any argument tycon to int (λt. int). Applications
of F invoke F’s action, which in turn invokes the functor actions of its functor
parameters.
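The idea that the action of F is parameterized by the action of G can be mimicked with ordinary functions. The sketch below is a deliberately crude model and not the compiler's representation: a tycon is reduced to its name, a functor action is a function on such names, and generativity and nested structures are ignored. The names tycon, action, idAction, constAction, and fAction are assumptions made for this example.

```sml
(* A toy model: a tycon is just a name, and a functor action is a function
   on tycons. This ignores generativity and nested structures. *)
type tycon = string
type action = tycon -> tycon

val idAction : action = fn t => t          (* action of Id: maps t to t *)
val constAction : action = fn _ => "int"   (* action of Const: maps any t to int *)

(* The action of F takes the action of its functor parameter G and applies
   it to the tycon t of its structure parameter A. *)
fun fAction (g : action, a : tycon) : tycon = g a

val r1 = fAction (idAction, "bool")      (* models R1.t *)
val r2 = fAction (constAction, "bool")   (* models R2.t *)
```

In this model r1 and r2 differ only because different actions were passed in, which mirrors how R1.t and R2.t differ in the example.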
1.3 Overview
The following two sections describe the module elaboration in SML/NJ. Section
2 describes the internal static representations of types, signatures, structures,
and functors. The main ideas are the use of internal entity variables and paths
for relative references to tycons, the factorization of the static representations
of modules into signatures and associated realizations of the signatures, and a
static lambda calculus of entity expressions we dub the entity calculus used as
the realization part of functors.
² A common alternative solution to this problem is applicative functor semantics [13].
However, such semantics cannot capture generative functor actions. Applicative
functor semantics is also fragile in the presence of aliasing of structures and examples
where functor arguments are not named (A-normalized) or are aliased.
Section 3 covers the processes involved in the elaboration of modules, which
create and utilize the representations in Section 2. These processes include the
basic elaboration of signature, structure, and functor declarations, the static
aspect of functor application, and signature matching. An important subsidiary
process is signature instantiation, which is used in the elaboration of functors
and the application of formal functors (e.g., G in the example above).
Section 4 contains a short discussion of performance and scalability, and
Section 5 covers related work. We conclude in Section 6.
2 Semantic Objects
In the core ML language, a tycon always has a fixed identity such as a primitive
type or some specific user-defined type, e.g., type t = int. We call these
tycons nonvolatile. As discussed in Section 1.2, a functor parameter signature
may specify only the kind and name of a tycon without defining it precisely.
Such a tycon is volatile because its actual definition is supplied upon each functor
application, and it can vary from one application to another. Tycons defined
in terms of volatile tycons are also considered volatile. For example, in the signature
sig type t type u = t list end, t and u are both volatile. Although volatile
tycons bear some resemblance to abstract types, volatility is not the same as
abstractness. The definition of a volatile tycon will be eventually determined,
e.g. by the actual parameter passed to a functor, after which the tycon may
become nonvolatile. However, a future definition of a volatile tycon cannot play
a role while type checking the functor itself, because it is not yet available.
2.1 Entity Paths
Following Harper and Lillibridge [9], we use a variation on internal names, which
we call entity variables, to provide a robust means to refer to tycons, structures,
and functors that avoids the problems associated with shadowing of symbolic
names. Here entity refers to static entities, the internal representation of any-
thing that may contain or produce static information in the form of tycons. This
includes tycons themselves, structures, and functors. Entity variables are unique
by construction – a given entity variable will be used in only one place and entity
variable bindings cannot be shadowed. Sequences of entity variables called entity
paths are used to refer to an entity that is located inside a hierarchy of nested
structures.
Consider the example in Fig. 1. Assuming that e_A is an entity variable for
A, and so on, type A.B.u can be referred to by the entity path e_A, e_B, e_u. An
entity path is similar to a symbolic path except it refers to an internal entity
rather than a syntactic object. Moreover, an entity path is robust, in that there
will always be a valid entity path for any entity even when no corresponding
symbolic path exists due to shadowing.
signature S =
sig
  structure A :
  sig
    type t
    structure B :
    sig
      type u
      val x : t
    end
    val y : t * B.u
  end
end
INTERNAL SIG =
sig
  structure A (e_A) :
  sig
    type t (e_t)
    structure B (e_B) :
    sig
      type u (e_u)
      val x : [e_t]
    end
    val y : [e_t] * [e_B, e_u]
  end
end
Fig. 1. A syntactic signature and its internal representation
2.2 Internal Representations of Signatures
The internal representation of a signature is basically a list of pairs of component
names and their specifications. Each static component is also assigned a fresh
entity variable as part of its specification. Hereafter, signature will refer to this
internal representation as distinct from syntactic signatures, which are in the
surface language. We construct signature representations either by translating a
syntactic signature expression or by inferring a signature from a basic structure
expression. Using the entity variables in a signature, we can map a symbolic
path for a static component to a corresponding entity path. In Fig. 1, INTERNAL SIG
is the translation of the syntactic signature S. We can traverse this
signature following a symbolic path A.B.u, collecting the corresponding entity
path e_A, e_B, e_u as we go.
We represent volatile tycon occurrences in value specifications by an entity
path relativized to the scope of the occurrence. For example, the spec for value y
has the relativized form [e_t] * [e_B, e_u]. Due to the presence of volatile tycons, the
signature is an incomplete representation of the static information in a structure
which matches the signature. The representation of a structure matching a given
signature supplements the signature with a realization that maps entity variables
and paths for volatile tycons to actual tycons, thus defining them.
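The translation from a symbolic path to an entity path described in this section can be sketched as a traversal over a toy signature representation. This is only a model under simplifying assumptions: value types are omitted, entity variables are plain strings, and the names spec, sign, find, and toEntityPath are hypothetical rather than taken from the SML/NJ sources.

```sml
(* A toy model of internal signatures; constructor and function names are
   illustrative, not those of the SML/NJ sources. *)
type entvar = string

datatype spec
  = TycSpec of entvar                 (* type component with its entity variable *)
  | StrSpec of entvar * sign          (* substructure: entity variable and signature *)
  | ValSpec                           (* values carry no entity variable *)
withtype sign = (string * spec) list  (* component name paired with its spec *)

(* INTERNAL SIG from Fig. 1, with the value types omitted. *)
val internalSig : sign =
  [("A", StrSpec ("e_A",
      [("t", TycSpec "e_t"),
       ("B", StrSpec ("e_B", [("u", TycSpec "e_u"), ("x", ValSpec)])),
       ("y", ValSpec)]))]

fun find (name : string, sg : sign) : spec option =
  case List.find (fn (n, _) => n = name) sg of
      SOME (_, sp) => SOME sp
    | NONE => NONE

(* Follow a symbolic path, collecting entity variables along the way. *)
fun toEntityPath (sg : sign, path : string list) : entvar list option =
  case path of
      [] => NONE
    | [name] =>
        (case find (name, sg) of
             SOME (TycSpec ev) => SOME [ev]
           | SOME (StrSpec (ev, _)) => SOME [ev]
           | _ => NONE)
    | name :: rest =>
        (case find (name, sg) of
             SOME (StrSpec (ev, sub)) =>
               Option.map (fn p => ev :: p) (toEntityPath (sub, rest))
           | _ => NONE)

val p = toEntityPath (internalSig, ["A", "B", "u"])
val q = toEntityPath (internalSig, ["A", "y"])
```

Here p is the entity path for A.B.u, while q is NONE because value components are not static entities and have no entity variable.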
2.3 Structure Realization
A structure realization is a finite map from entity variables to entities. An entity
variable for a tycon component is mapped to a tycon. An entity variable
for a substructure is mapped to another structure realization. In the case of a
functor component, its entity variable is mapped to a functor realization, which
will be described in Section 2.5. Because structure realizations contain only
static entities, value specifications have no corresponding mapping. Because a
structure realization may contain nested structure realizations, it can be thought
of as a tree where the edges are labeled by the entity variables, internal nodes
are subtrees (structure realizations), and leaves are tycons or functor realizations.
For example, Fig. 2 shows a structure M matching signature S and the
corresponding structure realization that complements S with entities from M.
structure M =
struct
  structure A =
  struct
    type t = int
    structure B =
    struct
      type u = bool
      val x = 1
    end
    val y = (1, true)
  end
end
Structure realization of M (edges labeled by entity variables, leaves are tycons):
e_A ↦ { e_t ↦ int,
        e_B ↦ { e_t ↦ int,
                e_u ↦ bool } }
Fig. 2. A structure and structure realization matching signature S
The seemingly duplicate e_t edge under node B may look peculiar. Because
substructures such as B may be selected out by later declarations such as
structure B' = M.A.B, the structure realization of B must be able to stand on its own.
Consequently, we need to close the structure realization for B by including the
mapping for e_t, which is not a local component of B.
Looking up the type for value spec A.y from the previous section given this
matching realization takes two steps. First, we construct the entity path to each
of the tycons in the type expression [e_t] * [e_B, e_u], namely e_A, e_t and e_A, e_B, e_u
respectively, by traversing the signature. Then we look up the entity paths by
following the corresponding edges in the structure realization to get int and bool
respectively.
A generative type declaration in a structure, such as a datatype declaration, will
produce a fresh tycon with a unique identifying stamp. Other type declarations
will define a tycon component in terms of an existing tycon or as an abbreviation
for a type expression or type function. All these tycons will be found as leaf
nodes in the structure’s realization tree, accessed by entity paths that specify
their location in the structure hierarchy.
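The tree view of a structure realization and the lookup of entity paths can be sketched as follows. This is a simplified model, not the SML/NJ data structure: tycons are shown only by name, functor realizations are left out, and the names entity, Tycon, Str, and lookup are assumptions made for this example.

```sml
(* A toy model of structure realizations; names are illustrative. *)
type entvar = string

datatype entity
  = Tycon of string                      (* leaf: a tycon, shown by name *)
  | Str of (entvar * entity) list        (* internal node: nested realization *)

(* Realization of M from Fig. 2. B's subtree repeats e_t so that it is closed. *)
val realB = Str [("e_t", Tycon "int"), ("e_u", Tycon "bool")]
val realA = Str [("e_t", Tycon "int"), ("e_B", realB)]
val realM = Str [("e_A", realA)]

(* Follow an entity path edge by edge from the given node. *)
fun lookup (ent : entity, path : entvar list) : entity option =
  case (ent, path) of
      (_, []) => SOME ent
    | (Str binds, ev :: rest) =>
        (case List.find (fn (v, _) => v = ev) binds of
             SOME (_, sub) => lookup (sub, rest)
           | NONE => NONE)
    | (Tycon _, _ :: _) => NONE

(* The two tycons in the type of A.y, [e_t] * [e_B, e_u], seen from the root. *)
val tyT = lookup (realM, ["e_A", "e_t"])
val tyU = lookup (realM, ["e_A", "e_B", "e_u"])

(* B's realization stands on its own: e_t resolves without going through A. *)
val tyTinB = lookup (realB, ["e_t"])
```

The last lookup illustrates why the e_t edge is duplicated under B: a structure selected out of M.A still has to resolve the relativized type of its own components.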
2.4 Full Signatures
When we put together a signature and a compatible structure realization, i.e.,
a realization that at least maps all the entity paths in the signature, we have
a complete static description of a structure, which we call a full signature. For
a structure expression such as the one in Fig. 2 that has no explicit signature
ascribed, the elaborator will construct a full signature for the structure, including
a synthesized signature and matching realization.
Structure declarations can also explicitly ascribe syntactic signature expres-
sions to a structure:
structure M1 : S = M
structure M2 : S =
struct
  structure A =
  struct
    type t = real
    structure B =
    struct
      type u = string
      val x = 1.0
    end
    val y = (1.0, "string")
  end
end
Although M1 and M2’s realizations obviously differ, their full signatures will
share the same signature representation INTERNAL SIG, representing the syntactic
signature S. The sharing of common signature information among all the
structures matching an explicit signature³ is one advantage of the factorization.
2.5 Functor Entities and the Entity Calculus
The complete static description of a functor is called a full functor signature,
and it is also factored into a functor signature and a functor realization. The
internal representation of a functor signature, which we informally write as
(X (e_x) : SIG_PARAM) : SIG_BODY, consists of a parameter signature SIG_PARAM
and a functor body signature SIG_BODY whose specifications may mention the
bound parameter X via the associated parameter entity variable e_x. Both SIG_PARAM
and SIG_BODY are internal representations of signatures and thus are
decorated with entity variables and use entity paths to reference volatile entities.
While the functor signature specifies the fixed shape of the parameter and
result, information that is common to all calls of the functor, the functor realization
describes how the structure realization of the functor body structure
is computed in terms of the structure realization of the parameter structure.
The structure realization deals with the part of the information that varies from
call to call. This is where the signature-realization factorization clarifies the semantics
of functors. The functor realization is an entity function of the form
λe_x.strexp where strexp is a structure entity expression that the compiler evaluates
to a structure realization for the functor body. These entity expressions
and functions are formalized by an applied, call-by-value λ-calculus called the
entity calculus (Fig. 3). Terms in the entity calculus express static information
and are evaluated only during compilation, specifically when elaborating functor
applications (see Section 3).
³ We always infer a full signature for a structure, even if the structure declaration has
an ascription. But we immediately match that full signature with the ascribed signature,
producing a realization for the ascribed signature, as described in Section 3.
tycon     ::= Formal(tycon)
            | Def(typeexp)
            | Data(ConsName of typeexp)
            | entity path
strexp    ::= STRUCTURE{entitydec}
            | fctexp(strexp)
            | FORM{sig}
            | entity path
entitydec ::= type e_x = tycon
            | structure e_x = strexp
            | functor e_x = fctexp
            | entitydec, entitydec
fctexp    ::= λe_x.strexp[entityenv]
            | entity path
Fig. 3. A simplified entity calculus
The tycon expressions include Formals representing dummy tycons that are
specified in a functor parameter. A typeexp is a type expression that may contain
applied occurrences of tycons. A Def tycon defines a tycon as an abbreviation
for a type expression. A Data tycon corresponds to the tycon for a datatype
with the given constructor name and constructor type parameter. For simplicity,
we are assuming only one data constructor per datatype. Entity paths are the
relativized tycon references described earlier.
Entity declarations bind entity variables to an appropriate kind of entity
expression. A functor body may contain free occurrences of entities such as
a tycon, structure, or functor declared in an outer functor, and these volatile
entities are denoted by entity paths (see Section 2.6 for an example). Thus, the
functor realization for a higher-order functor requires a closure environment, and
the correct form of a functor realization is an entity function closure of the form
λe_x.strexp[entityenv] where entityenv is an entity environment mapping all free
entity variables to entities. An entityenv has exactly the same representation as
a structure realization. Section 2.6 will further explain the need for a closure
environment. Structure entity expressions include a form for basic structures,
which encapsulates an entity declaration for its static components, entity paths
to refer to structure entities bound in the local entity environment, applications,
and a special form (FORM) for functors in formal parameters, which will be
explained in Section 3.
Consider the following example:
functor F(X: sig type t val x : t end) =
struct
  datatype u = A of X.t
  type v = X.t * u list
  fun f (x : X.t) : u = A x
end
The above functor is represented by a functor signature and a functor realization.
The inferred functor signature is:
(X (e_X) : sig type t (e_t) val x : [e_t] end)
: sig
    type u (e_u) type v (e_v)
    val f : [e_X, e_t] -> [e_u]
  end
where e_X, e_t, e_u, and e_v are fresh entity variables.
The realization for the functor has to specify how realizations for the static
components of the result (entities for the types u and v) are constructed given
a structure realization for X, which includes a tycon entity for X.t. The functor
realization for F is the entity function:
λe_X.STRUCTURE{type e_u = tycon_u, type e_v = tycon_v}
where tycon_u and tycon_v are tycon entity expressions for the datatype u and
type abbreviation v:
tycon_u = Data(A of [e_X, e_t])    tycon_v = Def([e_X, e_t] * [e_u] list)
Here the closure environment can be empty, assuming the functor is defined
at top level (it is also closed, having no references to nonlocal volatile entities).
When this functor is applied to an argument structure, the argument structure
is coerced by signature matching (described in Section 3) with the parameter
signature, yielding a structure realization for the parameter signature. This parameter
realization is bound to the entity variable e_X and the body of the entity
function is evaluated in the resulting entity environment.
The body specifies that a structure realization is to be constructed, whose
contents will be defined by a sequence of two entity declarations. e_u will be bound
to a new datatype generated from the datatype specification, with the associated
entity paths referencing imported types being evaluated relative to the evaluation
entity environment, the entity environment at that point of elaboration.
Similarly, the definition of type v will be instantiated by evaluating its embedded
entity paths in that same entity environment extended with the binding of e_u.
In particular, in the application F(struct type t = int end), the realization bound
to e_X will be the entity environment {e_t ↦ int} (representing the environment
as a sequence of mappings), and the evaluation entity environment is
{e_X ↦ {e_t ↦ int}}. Evaluating tycon_u in this environment yields a fresh
datatype corresponding to the definition datatype u = A of int and tycon_v yields
an instantiated type abbreviation for the definition type v = int * (u list).
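The evaluation just described can be sketched with a small interpreter for a fragment of the entity calculus. This is a hedged model, not the SML/NJ implementation: it covers only type declarations in a first-order functor body, represents stamps as integers from a counter, and uses invented names (ty, tycexp, entity, resolve, evalTyc, evalDecs, apply).

```sml
type entvar = string

(* Type expressions: entity paths, constant tycons, references to generated
   datatypes (by stamp), products, and lists. *)
datatype ty
  = Path of entvar list
  | Con of string
  | Gen of int
  | Prod of ty * ty
  | ListOf of ty

(* Tycon entity expressions (a subset of Fig. 3). *)
datatype tycexp
  = Def of ty
  | Data of string * ty          (* constructor name and its argument type *)

(* Entities: evaluated tycons and structure realizations. *)
datatype entity
  = TAbbrev of ty                (* type abbreviation with paths resolved *)
  | TData of int * string * ty   (* fresh stamp, constructor, argument type *)
  | Str of (entvar * entity) list

val stamp = ref 0
fun fresh () = (stamp := !stamp + 1; !stamp)

fun lookup (ent, []) = SOME ent
  | lookup (Str binds, ev :: rest) =
      (case List.find (fn (v, _) => v = ev) binds of
           SOME (_, sub) => lookup (sub, rest)
         | NONE => NONE)
  | lookup (_, _ :: _) = NONE

(* Replace every entity path in a type by the entity it denotes in env. *)
fun resolve env ty =
  case ty of
      Path p =>
        (case lookup (Str env, p) of
             SOME (TAbbrev t) => t
           | SOME (TData (s, _, _)) => Gen s
           | _ => raise Fail "unbound entity path")
    | Prod (a, b) => Prod (resolve env a, resolve env b)
    | ListOf a => ListOf (resolve env a)
    | _ => ty

fun evalTyc env (Def ty) = TAbbrev (resolve env ty)
  | evalTyc env (Data (c, ty)) = TData (fresh (), c, resolve env ty)

(* Evaluate declarations left to right; each sees the earlier bindings. *)
fun evalDecs env [] = []
  | evalDecs env ((ev, te) :: rest) =
      let val ent = evalTyc env te
      in (ev, ent) :: evalDecs ((ev, ent) :: env) rest end

(* Realization of F: parameter variable and body declarations. *)
val fParam = "e_X"
val fBody =
  [("e_u", Data ("A", Path ["e_X", "e_t"])),
   ("e_v", Def (Prod (Path ["e_X", "e_t"], ListOf (Path ["e_u"]))))]

(* Apply a closed entity function to an argument realization. *)
fun apply (param, body) arg = Str (evalDecs [(param, arg)] body)

(* F(struct type t = int end) *)
val result = apply (fParam, fBody) (Str [("e_t", TAbbrev (Con "int"))])
```

In this sketch a second application of the same entity function would draw a new stamp for u, which is how the model reflects the generative effect of the datatype declaration.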
2.6 Higher-order Functors
The preceding example involves the classic case of a first-order functor defined
at top level, i.e., one not defined within another functor, and in such a case
the closure environment of the functor realization can be empty. Here we show that
higher-order functors can require a nontrivial closure environment. Consider the
following example:
functor F(X: sig type t end) =
struct
  datatype u = C of X.t
  functor G(Y: sig type v val x : v * u * X.t end) =
  struct
    datatype s = D of X.t * u -> Y.v
  end
end
We can see that all the tycons X.t, u, Y.v, and s are volatile in the sense that
their actual bindings are to be determined later, when F is applied. When we
relativize the specification of datatype s with respect to these volatile tycons,
we get:
tycon_s = Data(D of [e_X, e_t] * [e_u] -> [e_Y, e_v])
Now consider an application of F, structure A = F(struct type t = int end). When
this expression is evaluated, we will develop an entity environment that binds
e_X and its extension [e_X, e_t] as before, and the definition of u will give rise to a
new datatype that will be bound to e_u. As before, the realization of functor G
will involve a lambda expression in our entity calculus:
λe_Y.STRUCTURE{type e_s = tycon_s}
But note that this term binds only the entity variable e_Y, leaving e_X and e_u
occurring free in tycon_s. So the lambda term is not closed. As usual, we need
to close it by supplying a closure environment, namely the entity environment
mentioned above that binds e_X and e_u.
Now when we apply A.G we will add a binding of e_Y to its closure environment
and use this when evaluating the body of the lambda term for G. For instance,
after
structure B = A.G(struct type v = bool val x = (true, A.C 3, 1) end)
the datatype B.s = Data(D of int * A.u -> bool).
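The role of the closure environment can be sketched with a small evaluator in which a functor declaration captures the entity environment at its point of definition. This is a toy model under stated assumptions, not the SML/NJ implementation: only datatype and functor declarations are modeled, stamps are integers from a counter, and all names, including the entity variable e_G for the functor G, are invented for the example.

```sml
type entvar = string

datatype ty
  = Path of entvar list
  | Con of string
  | Gen of int                       (* reference to a generated datatype *)
  | Prod of ty * ty
  | Arrow of ty * ty

datatype entity
  = Abbrev of ty
  | Data of int * string * ty        (* stamp, constructor, argument type *)
  | Str of entenv
  | Fct of entvar * dec list * entenv  (* closure: parameter, body, environment *)
and dec
  = DataDec of entvar * string * ty
  | FctDec of entvar * entvar * dec list  (* functor variable, parameter, body *)
withtype entenv = (entvar * entity) list

val stamp = ref 0
fun fresh () = (stamp := !stamp + 1; !stamp)

fun lookup (ent, []) = SOME ent
  | lookup (Str binds, ev :: rest) =
      (case List.find (fn (v, _) => v = ev) binds of
           SOME (_, sub) => lookup (sub, rest)
         | NONE => NONE)
  | lookup (_, _ :: _) = NONE

fun resolve (env : entenv) ty =
  case ty of
      Path p =>
        (case lookup (Str env, p) of
             SOME (Abbrev t) => t
           | SOME (Data (s, _, _)) => Gen s
           | _ => raise Fail "unbound entity path")
    | Prod (a, b) => Prod (resolve env a, resolve env b)
    | Arrow (a, b) => Arrow (resolve env a, resolve env b)
    | _ => ty

(* Evaluate declarations in order; a functor declaration captures the
   environment at its point of definition as its closure environment. *)
fun evalDecs (env : entenv) [] = []
  | evalDecs env (d :: rest) =
      let
        val bind =
          case d of
              DataDec (ev, c, ty) => (ev, Data (fresh (), c, resolve env ty))
            | FctDec (ev, param, body) => (ev, Fct (param, body, env))
      in
        bind :: evalDecs (bind :: env) rest
      end

(* Bind the parameter on top of the closure environment, then run the body. *)
fun apply (Fct (param, body, closure)) arg =
      Str (evalDecs ((param, arg) :: closure) body)
  | apply _ _ = raise Fail "not a functor"

(* Realization of F: binds e_u, then the functor G (named e_G here). *)
val fFct =
  Fct ("e_X",
       [DataDec ("e_u", "C", Path ["e_X", "e_t"]),
        FctDec ("e_G", "e_Y",
          [DataDec ("e_s", "D",
             Arrow (Prod (Path ["e_X", "e_t"], Path ["e_u"]),
                    Path ["e_Y", "e_v"]))])],
       [])

(* structure A = F(struct type t = int end) *)
val realA = apply fFct (Str [("e_t", Abbrev (Con "int"))])

(* structure B = A.G(struct type v = bool ... end) *)
val fctG = valOf (lookup (realA, ["e_G"]))
val realB = apply fctG (Str [("e_v", Abbrev (Con "bool"))])
val tycS = valOf (lookup (realB, ["e_s"]))
```

The closure stored for G holds the bindings of e_X and e_u from the application of F, so the body of G can resolve [e_X, e_t] and [e_u] even though G itself binds only e_Y. In the result, Gen 1 stands for the datatype A.u generated when F was applied.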
3 Elaboration
Elaboration is the translation of simple syntax trees produced by the parser
into (1) a typed abstract syntax for use in subsequent compiler stages, and
(2) a static environment mapping identifiers defined at top-level to their static
representations. As mentioned in the introduction, we are focusing exclusively
on how the static environment is produced; the construction of abstract syntax
is relatively straightforward in comparison.
At the core language level, the elaborator does type checking and type infer-
ence for value declarations, and produces static bindings mapping type names to
tycon representations, and variable, data constructor, and exception constructor
names to their types. At the module level, the elaborator translates signature,
structure, and functor expressions and declarations into the internal representa-
tions described in Section 2. The type information is recorded as new bindings
added to the static environment, which is used for elaborating later compilation
units (e.g. source files) that import them. An initial static environment contains
predefined modules, types, and values (the Basis libraries).
Elaboration can be broken down into a set of subtasks. The main tasks are
elaborating signature expressions, structure expressions, and functor declara-
tions, and these involve subsidiary processes including functor application, signa-
ture matching, and signature instantiation. Signature expressions and structure
expressions often occur as the definiens in a declaration, but they can also occur
“in-lined”, in an ascription in the case of signatures, or as a functor parameter
or functor body in the case of structures.
Elaboration modes. It is useful to distinguish two contexts in which elabora-
tion takes place: functor context, where the expression or declaration elaborated
occurs within the body of a functor, and top level, when outside of any func-
tor. Elaboration in a functor context is more complicated, because in addition
to performing the usual type-checking and static environment building tasks,
it must also “compile” declarations to the entity calculus expressions used to
encode the functor static action. Thus in a functor context elaboration must
operate in dual, simultaneous, modes. We use the term direct elaboration for the
basic mode that deals with type checking and translation to static representa-
tions, while entity compilation refers to the parallel process of compiling static
declarations into the entity calculus. Direct elaboration occurs in both contexts,
while entity compilation is relevant only to the functor context. In practice, to
simplify the code, we always perform both modes of elaboration and if we are
in top level mode we discard the unneeded byproducts of entity compilation.
The extra work involved in unnecessary entity compilation is not a significant
overhead.
Functor volatile entities. A related factor associated specifically with functor
mode is that static entities constructed within a functor (and the functor pa-
rameter itself) are volatile, as opposed to entities constructed in top level mode,
which are fixed and hence nonvolatile. During functor elaboration, the functor
volatile entities are virtual or potential, in the sense that the actual entities
will be created later at functor application time. However, in the direct elabora-
tion mode volatile entities need to be represented by dummy entities to support
type checking, so they will have static representations in the “working” static
environment used for direct elaboration of functor bodies. In embedded, in-line
signatures, and in compiled entity calculus expressions, references to volatile en-
tities (e.g. structures and tycons) must be “relativized” by translating them into
entity paths.
The process of relativizing references to volatile entities, and the interpre-
tation of the resulting entity paths, require that two additional parameters be
provided to the elaboration process. An entity environment is threaded through
to be used (1) to interpret entity paths of functor volatile entities in embedded
signatures, and (2) to construct closure environments for structure realizations
and functor realizations (entity functions). Elaboration of declarations will add
new entity variable bindings to this entity environment. The second new ingre-
dient is called an entity path context. It is an inverse environment that maps
dummy volatile entities to their entity paths, and it is used for relativization of
references to those dummy entities. So a complete schematic of the inputs and
outputs of elaboration is shown in Fig. 4.
[Figure: the elaborate process. Inputs: syntax trees, static environment, entity environment, entity path context. Outputs: typed abstract syntax, static environment, entity environment, entity declaration, entity path context.]
Fig. 4. Schematic for elaboration
Signature elaboration. The specifications in the body of the signature are trans-
lated into a mapping from component names to internal specs in the form of
formal tycons for tycon specs, types for values, and internal representations of
signatures and functor signatures for structure and functor elements respectively.
Each static element (tycon or module) is assigned a fresh entity variable. The
types of value elements, data constructors, and types occurring in definitional
type specs, are relativized by replacing local tycon references (which we can call
signature volatiles to distinguish them from functor volatiles) with entity paths.
If the signature is in-line in a functor context, it may contain functor volatiles,
which are also relativized. Any where type constraints are elaborated and pushed
inward to the type specifications they apply to. Sharing constraints are recorded
in a normalized form as pairs of paths.
Structure elaboration. There are several cases for structure expression elabora-
tion, corresponding to the syntactic forms for such expressions (e.g., structures
declared in-line struct ... end, structure symbolic paths A, A.B, and functor ap-
plications). A symbolic name or path for a structure is simply looked up in the
current static environment, returning a full signature for the structure. A basic
in-line structure expression struct decls end form is elaborated in the following
steps:
1. elaborate the body declarations decls, yielding a static environment envBody,
an entity environment, and an entity declaration entitydec;
2. derive from envBody a signature and matching structure realization (entity
environment), and combine them to create a full signature;
3. return the full signature from step 2, a structure entity expression
STRUCTURE{entitydec}, and the entity environment from step 1.
Signature matching. When a signature is ascribed to a structure in a structure
declaration, or when a functor is applied to a structure, implicitly ascribing
the parameter signature to the argument, we must verify that the structure in
question matches the signature. This is a kind of module-level type checking,
but it also has a coercive effect, producing a modified structure realization that
is exactly conforming to the ascribed signature (similar to coercive subtyping).
Signature matching involves scanning the specifications in the signature and
verifying that the matching structure satisfies these specifications. There are
two modes of signature matching. Opaque signature matching generates fresh
tycons for signature volatile tycon specifications, whereas transparent signature
matching uses the corresponding tycons from the matching structure.
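The two matching modes can be observed from the source language. The sketch below uses hypothetical names (COUNTER, Impl, T, O); it shows only the user-visible behavior of transparent and opaque ascription, not the elaborator's internals.

```sml
(* Hypothetical example; all names are assumptions made for illustration. *)
signature COUNTER =
sig
  type t
  val zero  : t
  val next  : t -> t
  val toInt : t -> int
end

structure Impl =
struct
  type t = int
  val zero = 0
  fun next n = n + 1
  fun toInt n = n
  val hidden = "not in COUNTER"   (* dropped by the coercion in both modes *)
end

(* Transparent matching: T.t is the tycon taken from Impl, i.e. int. *)
structure T : COUNTER = Impl
val three : int = T.next (T.next (T.next T.zero))   (* accepted because T.t = int *)

(* Opaque matching: O.t is a fresh abstract tycon. *)
structure O :> COUNTER = Impl
val two : int = O.toInt (O.next (O.next O.zero))

(* Both of the following would be rejected by the type checker:
     val bad : int = O.next O.zero    O.t is not known to be int
     val h = T.hidden                 hidden is not specified in COUNTER *)
```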
Signature instantiation. At a couple of points during elaboration, we have only
a signature on hand when what we need is a full signature. To synthesize a full
signature from the signature, we need to produce a dummy structure realization
for the signature. Signature instantiation is the process of creating a “free”
structure realization for a signature. This process is nontrivial because of sharing
specifications, which were introduced to address an issue called the diamond
import problem. This problem refers to the scenario where a functor parameter
contains tycons, possibly nested inside different argument structures, that must
be identical for the functor body to type check. This required identification of
types, which is called type sharing (see Pierce and Harper’s account [10]), must
be explicitly specified within parameter signatures using one of three syntactic
mechanisms, namely:
1. definitional specifications of tycons,
2. where clauses modifying signature expressions (an indirect form of defini-
tional specs), and
3. equational sharing constraints identifying different tycons by name or path.
The realization produced by instantiation includes fresh formal tycons for each
type component, chosen to satisfy the signature’s sharing constraints, and only
those sharing constraints (i.e., no incidental sharing not forced by the
specifications). The algorithm used for signature instantiation is adapted from
the Paterson-Wegman linear unification algorithm [20].
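To make the three mechanisms concrete, here is a small sketch in which two parameter structures must agree on a tycon t for the functor body to type check. The names A, B, F1, F2, F3, IntA, and IntB are hypothetical, and the example is a simplified stand-in for the diamond import scenario rather than a reproduction of it.

```sml
(* Hypothetical example; all names are assumptions made for illustration. *)
signature A = sig type t  val x : t end
signature B = sig type t  val f : t -> string end

(* 1. Definitional specification of the tycon. *)
functor F1 (structure X : A
            structure Y : sig type t = X.t  val f : t -> string end) =
struct val s = Y.f X.x end

(* 2. A where clause modifying the signature expression. *)
functor F2 (structure X : A
            structure Y : B where type t = X.t) =
struct val s = Y.f X.x end

(* 3. An equational sharing constraint. *)
functor F3 (structure X : A
            structure Y : B
            sharing type X.t = Y.t) =
struct val s = Y.f X.x end

(* Actual arguments must provide at least the required sharing. *)
structure IntA = struct type t = int  val x = 42 end
structure IntB = struct type t = int  val f = Int.toString end

structure R1 = F1 (structure X = IntA structure Y = IntB)
structure R2 = F2 (structure X = IntA structure Y = IntB)
structure R3 = F3 (structure X = IntA structure Y = IntB)
```

Without one of the three constraints, the body expression Y.f X.x would be ill-typed, because the instantiated parameter would give X.t and Y.t distinct formal tycons.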
When instantiating a functor specification in a signature, we must create
a corresponding functor realization. This will be, as usual, an entity function,
but one where the body of the lambda abstraction is the special structure entity
expression form (FORM{sig}) containing only the formal functor signature. How
this is evaluated will be explained below.
Functor application. When a functor is applied, the argument structure expres-
sion is elaborated, and then signature matching is performed to verify that it
matches the parameter signature and to coerce the argument structure realiza-
tion to a realization for the parameter signature, yielding a full signature for the
coerced argument. The functor realization, which is an entity calculus lambda-
abstraction complete with a closure entity environment, is then applied to the
argument realization using a conventional call-by-value, environment-based in-
terpreter for the entity calculus. As usual, this entails extending the closure
environment with a binding of the argument realization to the lambda-bound
entity variable, and then evaluating the body structure expression with respect
to this extended entity environment.
This is the standard case. But in a functor context, the functor being applied
may be an element of an outer functor parameter, i.e. a formal functor. Suppose,
for example, the elaborator encountered the following program:
funsig FS () = sig type t end
functor F (X : sig functor G : FS end) =
struct
  structure M = X.G ()
end
functor H0 () = struct type t = int end
structure FR0 = F (struct functor G = H0 end)
where FS is a functor signature where the parameter signature is empty and the
result signature specifies a single tycon t. When elaborating the application of
functor X.G in the direct mode, we do not seem to have a functor realization
for X.G because that will only be supplied by an actual parameter (as in the
definition of FR0). We solve this problem by synthesizing a special entity function
from the functor signature FS. This entity function is, as usual, a closure of a
lambda expression, but the body of this lambda expression is a special form
of structure entity expression that simply wraps the body signature from FS:
FORM{sig type t end}. When elaborating X.G() in functor F’s body, we evaluate
this new form of structure expression by instantiating sig type t end with respect
to the evaluation environment. In this case, this will create a fresh abstract tycon
as the realization of t. This allows the type checking of the body of F to proceed
with no information about actual parameters other than that they match the
signature of X.
The entity declaration corresponding to M in the lambda abstraction for F
applies the relativized entity path for G to an empty structure entity expression:
structure eM = [eX, eG](STRUCTURE{})
When this declaration is evaluated at the call of F defining FR0, [eX, eG] will
evaluate to the entity function for H0, and this function will define the entity
binding of et to be int. So FR0.t is int. On the other hand, in the following
example, the definition of functor H1 uses an opaque ascription to cause a new
abstract tycon to be generated for t on each application.
functor H1 () = struct type t = int end :> sig type t end
structure FR1 = F (struct functor G = H1 end)
FR1.t will be a new abstract tycon. Thus, while the direct mode elaboration
of the body of F has to assume a conservative approximation to the functor
action of the G parameter, the application of F uses the actual functor action
associated with G in the argument. This technique is the key to supporting true
higher-order functor semantics.
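The evaluation scheme described in this section can be sketched as a toy interpreter. This is a deliberately simplified model under stated assumptions, not the SML/NJ implementation: entity variables are strings, FORM carries only the list of tycon entity variables that its signature specifies, fresh tycons are numbered by a global counter, and all constructor and function names (TYC, STR, FCT, evalStr, lookPath, and so on) are hypothetical.

```sml
(* Toy model of the entity calculus; every name here is an assumption. *)
type evar = string

datatype tycon = INT | FRESH of int            (* FRESH n: abstract tycon with stamp n *)

datatype entity
  = TYC of tycon
  | STR of env                                 (* structure realization *)
  | FCT of evar * strexp * env                 (* closure: parameter, body, closure env *)
and strexp
  = STRUCTURE of (evar * entexp) list          (* entity declarations *)
  | APPLY of evar list * strexp                (* functor at an entity path, argument *)
  | FORM of evar list                          (* stands in for FORM{sig} *)
and entexp
  = TYCEXP of tycon
  | STREXP of strexp
  | LAMBDA of evar * strexp
withtype env = (evar * entity) list

exception Unbound of evar

fun lookup (env : env, v : evar) : entity =
  case List.find (fn (v', _) => v' = v) env of
      SOME (_, e) => e
    | NONE => raise Unbound v

(* Follow an entity path such as [eX, eG] through nested realizations. *)
fun lookPath (env, [v]) = lookup (env, v)
  | lookPath (env, v :: rest) =
      (case lookup (env, v) of
           STR env' => lookPath (env', rest)
         | _ => raise Unbound v)
  | lookPath (_, []) = raise Unbound "<empty path>"

val stamp = ref 0
fun freshTycon () = (stamp := !stamp + 1; FRESH (!stamp))

fun evalStr (env, STRUCTURE decs) =
      (* evaluate declarations in order; later ones see earlier bindings *)
      let fun step ((v, ee), (binds, env')) =
            let val e = evalEnt (env', ee)
            in ((v, e) :: binds, (v, e) :: env') end
          val (binds, _) = List.foldl step ([], env) decs
      in STR binds end
  | evalStr (env, APPLY (path, argExp)) =
      (case lookPath (env, path) of
           FCT (param, body, closureEnv) =>
             let val arg = evalStr (env, argExp)          (* call-by-value *)
             in evalStr ((param, arg) :: closureEnv, body) end
         | _ => raise Fail "APPLY: not a functor entity")
  | evalStr (_, FORM tycs) =
      (* instantiate the formal signature: one fresh tycon per tycon spec *)
      STR (map (fn v => (v, TYC (freshTycon ()))) tycs)
and evalEnt (_, TYCEXP tc) = TYC tc
  | evalEnt (env, STREXP se) = evalStr (env, se)
  | evalEnt (env, LAMBDA (param, body)) = FCT (param, body, env)

(* Body of F: structure eM = [eX, eG](STRUCTURE{}) *)
val bodyF = STRUCTURE [("eM", STREXP (APPLY (["eX", "eG"], STRUCTURE [])))]
val env0 = [("eF", evalEnt ([], LAMBDA ("eX", bodyF)))]

(* FR0 = F (struct functor G = H0 end), with H0 binding et to int. *)
val h0 = LAMBDA ("eP", STRUCTURE [("et", TYCEXP INT)])
val fr0 = evalStr (env0, APPLY (["eF"], STRUCTURE [("eG", h0)]))

(* Direct mode: eX is bound to a formal instance whose G is a FORM closure. *)
val xFormal = evalStr ([], STRUCTURE [("eG", LAMBDA ("eP", FORM ["et"]))])
val direct = evalStr ([("eX", xFormal)], bodyF)
```

In this model, the binding of et inside fr0 is TYC INT, while the same path in direct yields a FRESH tycon, which mirrors the contrast between applying F to an actual argument and elaborating its body against the formal parameter.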
Functor elaboration. Functor elaboration involves several new problems. One
issue is how to deal with references to the formal parameter structure in the
body, both during elaboration of the body and later during application of the
functor. As we have seen in the earlier example, when applying the functor, the
functor parameter will be represented by an entity variable that serves as the
formal parameter of the entity function.
During direct mode elaboration of the functor body, we bind the parameter
name to a full signature for the parameter structure obtained by instantiating the
parameter signature. This instantiation can serve as a formal representative of
all possible actual arguments because it embodies the minimal required sharing
among its tycon components. Any actual parameter will have to satisfy at least
as much sharing.
Now having bound the formal parameter symbol to the instantiation of the
parameter signature in the static environment, the body structure of the functor
is elaborated. This produces a full signature and a structure entity expression
for the functor body. A functor signature is created by combining the parameter
signature and the signature part of the body full signature. The functor’s entity
function is created by wrapping a lambda abstraction around the body’s struc-
ture entity expression, and closing it with respect to the entity environment in
which the functor is elaborated.
4 Discussion
The type information generated during elaboration in ML can grow quite large,
and experience with early, relatively naive versions of the elaborator
demonstrated that the size of static data structures can become a real resource
bottleneck. Although we do not have systematic experimental data comparing the
efficiency of the current implementation with simpler versions, the current
implementation has scaled well in practice and copes easily with large and
complex programs. We conjecture that sharing signatures is a considerable win,
and we believe that the factorization of modules into signatures and
realizations is a key part of the scalability of SML/NJ’s elaborator. Although
hash-consing of type information turned out to be necessary in the FLINT
intermediate language, this technique does not seem to be required in the front
end, probably in part because of the sharing of signature information.
5 Related Work
Although the literature developing module system semantics is rich, there are
few accounts of the implementation approaches and techniques. As far as the
authors know, this paper is one of the few besides Crégut and MacQueen [2], which
reported on an earlier implementation. In that implementation, the internal rep-
resentations and algorithms were considerably more baroque and less principled.
Before the implementation of the entity path and signature-realization factor-
ization, the compiler relied on comparison of stamp creation times to index into
several arrays containing the relevant static information. The former design was
fragile and insufficiently abstract. This new design is a clear advancement that
greatly simplifies the implementation.
Most of the literature focuses on the ML module system. The module systems of
Haskell [3] and Scheme [7] are primarily concerned with namespace management
through explicit import and export syntax. Because neither language has an
equivalent of functors in its module language, and Scheme moreover has no type
components, they are not directly comparable to ML module systems. The several
proposals addressing module system semantics and design
fall along a continuum with the abstract approach on
one end and the operational approach on the other. The former, a term coined
by Shao [23], refers to the type-theoretic accounts in Harper-Lillibridge [9] and
Leroy [12]. The latter refers to the approach embodied in MacQueen-Tofte and
the Definition of Standard ML [17]. Several accounts [5, 11, 14, 22] follow the
abstract approach closely. Module systems from that pedigree generally do not
have internal representations of signatures distinct from syntactic signatures.
Type equivalence and generative types are generally modeled by a simple nom-
inal check and existential types respectively. Thus, they do not support true
higher-order functor semantics. The TILT [11], Moscow ML [22], and OCaml
compilers are implementations from this line of development.
The Definition [17] does not support higher-order functors. The semantic
objects in its treatment differ from ours primarily in our use of entity environ-
ments and entity expressions. The Definition has a notion of type realizations,
which are maps from type names to tycons, and instantiation of both signatures
and functor signatures, producing a static environment and a pair of static en-
vironments with a set of flexible names. In contrast to SML/NJ, the result of
functor instantiation is only an approximation of our functor realization – there
are no entity functions to express functor actions. Signatures in the Definition
explicitly name the volatile tycons, but there are no analogues of entity variables
associated with tycon, structure, and functor specs.
Other proposals fall somewhere in between the abstract and operational ap-
proaches. Biswas [1] and Shao [23] propose type-theoretic accounts that support
limited forms of higher-order functors. Both of these module systems can rep-
resent some functor actions (which they refer to as “argument-to-result depen-
dency”) in functor signatures. Biswas utilizes higher-order variables that have
about the same expressiveness as applicative functors in OCaml. A variant of
Biswas’s design is implemented in the Moscow ML compiler. Shao’s solution
uses a higher-order tycon that serves a similar role. Unlike Biswas and SML/NJ,
Shao’s account admits syntactic signatures that can express some functor actions
in terms of higher-order type constructor expressions.
More recent variations of the ML module system such as Dreyer’s RMC [4]
and MixML [6] express type abstraction using an existential type discipline fol-
lowing Mitchell and Plotkin [18] and Russo [22]. Signature matching is non-
coercive, though coercions are definable in the module language [5]. Montagu
and Rémy [19] develop a more modular form of the existential type calculus by
splitting open and pack into separate scoping and witness packing/unpacking
constructs to address the tension between modularity and existential-encoded
abstract types as pointed out by MacQueen [15]. None of these accounts handles
true higher-order functor semantics.
6 Conclusion
The module system implementation in SML/NJ has proven itself to be very
scalable [8, 21]. SML/NJ is self-hosting and compiles a wide range of Standard
ML programs. This is due at least in part to the robust module system
implementation. The implementation is both practical and principled. The current
incarnation of SML/NJ’s implementation of higher-order modules is a marked
improvement over the previously reported version. The implementation is a
demonstration of how to design a lambda calculus-like language to efficiently
represent static functor actions. Some directions for future investigation include
the relationship between the static lambda calculus and the syntactic module
language, especially the signature language, and the implications of this language
for true separate compilation.
References
1. Sandip K. Biswas. Higher-order functors with transparent signatures. In POPL
’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles
of programming languages, pages 154–163, New York, NY, USA, 1995. ACM.
2. P. Crégut and D. MacQueen. An implementation of higher-order functors. In
ACM SIGPLAN Workshop on Standard ML and its Applications, June 1994.
3. Iavor S. Diatchki, Mark P. Jones, and Thomas Hallgren. A formal specification
of the Haskell 98 module system. In Haskell ’02: Proceedings of the 2002 ACM
SIGPLAN workshop on Haskell, pages 17–28, New York, NY, USA, 2002. ACM.
4. Derek Dreyer. A type system for recursive modules. In ICFP ’07: Proceedings
of the 2007 ACM SIGPLAN international conference on Functional programming,
pages 289–302, New York, NY, USA, 2007. ACM.
5. Derek Dreyer, Karl Crary, and Robert Harper. A type system for higher-order
modules. In POPL ’03: Proceedings of the 30th ACM SIGPLAN-SIGACT sym-
posium on Principles of programming languages, pages 236–249, New York, NY,
USA, 2003. ACM.
6. Derek Dreyer and Andreas Rossberg. Mixin’ up the ML module system. In ICFP
’08: Proceeding of the 13th ACM SIGPLAN international conference on Functional
programming, pages 307–320, New York, NY, USA, 2008. ACM.
7. Matthew Flatt. Composable and compilable macros: you want it when? In ICFP
’02: Proceedings of the seventh ACM SIGPLAN international conference on Func-
tional programming, pages 72–83, New York, NY, USA, 2002. ACM.
8. Ronald Garcia, Jaakko Järvi, Andrew Lumsdaine, Jeremy G. Siek, and Jeremiah
Willcock. An extended comparative study of language support for generic pro-
gramming. J. Funct. Program., 17(2):145–205, 2007.
9. Robert Harper and Mark Lillibridge. A type-theoretic approach to higher-order
modules with sharing. In POPL ’94: Proceedings of the 21st ACM SIGPLAN-
SIGACT symposium on Principles of programming languages, pages 123–137, New
York, NY, USA, 1994. ACM.
10. Robert Harper and Benjamin C. Pierce. Advanced Topics in Types and Program-
ming Languages, chapter Design Considerations for ML-Style Module Systems.
MIT Press, 2005.
11. Robert Harper and Chris Stone. An interpretation of Standard ML in type the-
ory. Technical Report CMU–CS–97–147, CMU, Pittsburgh, PA, June 1997. (Also
published as Fox Memorandum CMU–CS–FOX–97–01.).
12. Xavier Leroy. Manifest types, modules, and separate compilation. In POPL ’94:
Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of pro-
gramming languages, pages 109–122, New York, NY, USA, 1994. ACM.
13. Xavier Leroy. Applicative functors and fully transparent higher-order modules.
In POPL ’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on
Principles of programming languages, pages 142–153, New York, NY, USA, 1995.
ACM.
14. Xavier Leroy. A modular module system. J. Funct. Program., 10(3):269–303, 2000.
15. David B. MacQueen. Using dependent types to express modular structure. In
POPL ’86: Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Prin-
ciples of programming languages, pages 277–286, New York, NY, USA, 1986. ACM.
16. David B. MacQueen and Mads Tofte. A semantics for higher-order functors. In
ESOP ’94: Proceedings of the 5th European Symposium on Programming, pages
409–423, London, UK, 1994. Springer-Verlag.
17. Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The Definition
of Standard ML - Revised. The MIT Press, May 1997.
18. John C. Mitchell and Gordon D. Plotkin. Abstract types have existential types.
In POPL ’85: Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on
Principles of programming languages, pages 37–51, New York, NY, USA, 1985.
ACM.
19. Benoît Montagu and Didier Rémy. Modeling abstract types in modules with open
existential types. In Proceedings of the 36th ACM Symposium on Principles of Pro-
gramming Languages (POPL’09), pages 63–74, Savannah, Georgia, USA, January
2009.
20. M. S. Paterson and M. N. Wegman. Linear unification. In STOC ’76: Proceedings
of the eighth annual ACM symposium on Theory of computing, pages 181–186,
New York, NY, USA, 1976. ACM.
21. Norman Ramsey. ML module mania: A type-safe, separately compiled, extensible
interpreter. Electr. Notes Theor. Comput. Sci., 148(2):181–209, 2006.
22. Claudio V. Russo. Types for Modules. PhD thesis, Edinburgh University, 1998.
23. Zhong Shao. Transparent modules with fully syntactic signatures. In ICFP ’99:
Proceedings of the fourth ACM SIGPLAN international conference on Functional
programming, pages 220–232, New York, NY, USA, 1999. ACM.