Engineering Higher-Order Modules in SML/NJ

George Kuan and David MacQueen

University of Chicago

Abstract. SML/NJ and other Standard ML variants extend the ML module system with higher-order functors, elevating the module language to a full functional language. In this paper, we describe the implementation of the higher-order module system in SML/NJ, which is unique in providing “true” higher-order behavior at the static level. This second generation implementation of higher-order modules in SML/NJ is based on three key techniques: unique internal variables (entity variables) for naming static entities, a factorization of the static information in both basic modules and functors into signatures and realizations, and a static lambda calculus we call the entity calculus with static “effects” to represent the type-level mapping performed by a functor. This system implements MacQueen-Tofte’s re-elaboration semantics without having to re-elaborate functor bodies or appeal to fragile stamp properties.

1 Introduction

The ML module system has evolved considerably over the past 25 years. One of

the Standard ML of New Jersey (SML/NJ) compiler’s more signiﬁcant extensions

is support for higher-order functors, achieved by allowing structures, including

functor parameters and results, to contain functors as components. MacQueen

and Tofte [16] describe the original semantics for higher-order functors, which

has a strong policy regarding how functors propagate type information through

functor applications. We will refer to the MacQueen-Tofte higher-order functor

behavior as true higher-order behavior. This model of higher-order functors was

first implemented in SML/NJ Version 0.93 (1993), using techniques described in Crégut and MacQueen [2]. This first generation implementation was based on an earlier implementation of first-order functors, and its adaptation to handle higher-order functors was rather ad hoc and complex. Here we describe the second generation implementation used in current versions of SML/NJ, which is significantly simpler and more principled. Our focus will be on the representations and processes in the elaboration phase of the compiler. The relatively straightforward elaboration of the dynamic semantics of the module system through abstract syntax is beyond the scope of this paper.

1.1 SML 97 Module System

The ML module system provides a set of constructs for expressing large-scale pro-

gram architecture, and is also the means for deﬁning and enforcing abstractions.

Basic modules, called structures, are collections of types, values, and hierar-

chically nested modules. Signatures express static interfaces of structures, being

analogues of types for structures. A signature is comprised of a collection of type,

value, and module speciﬁcations, specifying their kinds, types, and signatures,

respectively. A functor is a module-level function formed by parameterizing a

structure (the functor body) with respect to a structure variable constrained by

a signature. Signatures and structures have a many-to-many relationship: mul-

tiple structures can match a single signature, and multiple signatures can be

ascribed to a single structure.

1.2 Higher-order Functors

The need for higher-order functors arises naturally in a module system with

functors. Just as a ﬁrst-order functor is formed by abstracting over the name of

an external structure imported by a structure deﬁnition, a higher-order functor

is deﬁned by abstracting over the name of an external functor. The module

abstracted over can be either a basic structure or a functor. So, for orthogonality,

we should be able to abstract with respect to both structure names and functor

names over both structure and functor deﬁnitions. However, this extension does

raise some signiﬁcant issues for design, semantics, and implementation.

When abstracting over the name of either an imported structure or functor, the parameter is described by a signature, which expresses all the static interface information about the parameter that the client structure is allowed to know. In the case of first-order structures (with no functor components), the signature language is capable of expressing a fairly complete static description of a given structure using definitional specs and where clauses to pin down the type¹ components. But when we abstract over an imported structure, then we normally use a looser, less exact signature for the functor parameter because the parameter types can vary from one application to another. Signatures may be looser in two senses. First, for some of the parameter’s tycons, we may specify only kinds (arities) rather than definitions (e.g., sig type ('a,'b) t end specifies a tycon t that has arity 2), with the definitions to be supplied later by the argument structures to which the functor is applied. Second, the argument structure may contain excess components, which will be coerced away during application, or value components that are more polymorphic than what is specified in the signature.
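Both kinds of looseness can be seen in a small Standard ML sketch. The names LOOSE, Use, and Arg are hypothetical and chosen only for this illustration: the parameter signature gives t by arity alone and id at a monomorphic type, while the argument supplies a definition for t, a more polymorphic id, and an extra component.

```sml
(* Parameter signature: t is specified only by its arity (2),
   and id is specified at a monomorphic type. *)
signature LOOSE =
sig
  type ('a, 'b) t
  val id : int -> int
end

functor Use (X : LOOSE) =
struct
  val three = X.id 3
end

structure Arg =
struct
  type ('a, 'b) t = 'a * 'b   (* supplies the definition of t *)
  fun id x = x                (* more polymorphic: 'a -> 'a *)
  val extra = "ignored"       (* excess component, coerced away *)
end

structure R = Use (Arg)
```

Inside Use, X.extra is not visible and X.id can only be used at type int -> int, even though Arg provides more.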

A functor can express complex static-level computations mapping its input

tycons to its output tycons, and we call this mapping the functor static action,

or simply the functor action. The deﬁning characteristic of true higher-order

static behavior in functors is the faithful propagation of functor actions through

functor application. A functor action may involve static eﬀects in the form of

the generation of new tycons, introduced either by datatype declarations or

by opaque (“sealed”) signature ascriptions. However, functor signatures, which consist of just a named parameter signature and a result signature, are only capable of expressing very simple functor actions where the result tycons can be defined directly in terms of the parameter tycons. Thus functor signatures have a very limited ability to describe functor static actions. This means that a complete description of the static content of a functor must include information beyond the functor signature, in all but the simplest cases.

¹ Actually type constructors, but it is a common and convenient abuse of terminology to refer to types when we mean type constructors (abbreviated as tycons).

On the other hand, when a functor G is a component of the parameter of a

functor F, G is formal, and all we know about G is its functor signature. When

elaborating the body of F we need to determine the static eﬀect of applying the

formal G. This means that we need to be able to synthesize a default functor

action from a functor signature. When F is applied to an actual parameter with

its own version of G, that G’s actual functor action should be used in place of

the approximation derived from G’s speciﬁcation in F’s parameter signature. In

other words, the static action of F should be parameterized with respect to the

static action of G. This approach is the essence of true higher-order behavior.

The standard example illustrating this point is the Application functor²:

    signature SIG = sig type t end

    functor F (functor G (Y : SIG) : SIG
               structure A : SIG) : SIG
      = G(A)

    functor Id (X : SIG) = struct type t = X.t end

    functor Const (X : SIG) = struct type t = int end

    structure B : SIG = struct type t = bool end

    structure R1 = F(Id, B)     (* R1.t = bool *)

    structure R2 = F(Const, B)  (* R2.t = int *)

Here the action of functor Id maps its argument tycon to itself (λt. t), while

the action of Const maps any argument tycon to int (λt. int). Applications

of F invoke F’s action, which in turn invokes the functor actions of its functor

parameters.
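The idea that F's action is parameterized by G's action can be modelled directly with ordinary functions. The following is only a sketch under simplifying assumptions: the datatype tycon and the names idAction, constAction, and fAction are hypothetical, and each structure is reduced to its single type component t.

```sml
(* Hypothetical model: a structure matching SIG is reduced to its tycon t. *)
datatype tycon = Int | Bool | List of tycon

(* A functor action maps the argument tycon to the result tycon. *)
type action = tycon -> tycon

val idAction : action = fn t => t        (* action of Id:    λt. t   *)
val constAction : action = fn _ => Int   (* action of Const: λt. int *)

(* Action of F: it takes the action of its functor parameter G and the
   tycon of its structure parameter A; the body of F is G(A). *)
fun fAction (g : action, a : tycon) : tycon = g a

val r1 = fAction (idAction, Bool)      (* models R1.t *)
val r2 = fAction (constAction, Bool)   (* models R2.t *)
```

In this model r1 is Bool and r2 is Int, mirroring the comments on R1 and R2 in the example.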

1.3 Overview

The following two sections describe the module elaboration in SML/NJ. Section

2 describes the internal static representations of types, signatures, structures,

and functors. The main ideas are the use of internal entity variables and paths

for relative references to tycons, the factorization of the static representations

of modules into signatures and associated realizations of the signatures, and a static lambda calculus of entity expressions, which we dub the entity calculus, used as the realization part of functors.

² A common alternative solution to this problem is applicative functor semantics [13]. However, such semantics cannot capture generative functor actions. Applicative functor semantics is also fragile in the presence of aliasing of structures and in examples where functor arguments are not named (A-normalized) or are aliased.

Section 3 covers the processes involved in the elaboration of modules, which

create and utilize the representations in Section 2. These processes include the

basic elaboration of signature, structure, and functor declarations, the static

aspect of functor application, and signature matching. An important subsidiary

process is signature instantiation, which is used in the elaboration of functors

and the application of formal functors (e.g., G in the example above).

Section 4 contains a short discussion of performance and scalability, and

Section 5 covers related work. We conclude in Section 6.

2 Semantic Objects

In the core ML language, a tycon always has a fixed identity such as a primitive type or some specific user-defined type, e.g., type t = int. We call these tycons nonvolatile. As discussed in Section 1.2, a functor parameter signature may specify only the kind and name of a tycon without defining it precisely. Such a tycon is volatile because its actual definition is supplied upon each functor application, and it can vary from one application to another. Tycons defined in terms of volatile tycons are also considered volatile. For example, in the signature sig type t type u = t list end, t and u are both volatile. Although volatile tycons bear some resemblance to abstract types, volatility is not the same as abstractness. The definition of a volatile tycon will be eventually determined, e.g. by the actual parameter passed to a functor, after which the tycon may become nonvolatile. However, a future definition of a volatile tycon cannot play a role while type checking the functor itself, because it is not yet available.

2.1 Entity Paths

Following Harper and Lillibridge [9], we use a variation on internal names, which

we call entity variables, to provide a robust means to refer to tycons, structures,

and functors that avoids the problems associated with shadowing of symbolic

names. Here entity refers to static entities, the internal representation of any-

thing that may contain or produce static information in the form of tycons. This

includes tycons themselves, structures, and functors. Entity variables are unique

by construction – a given entity variable will be used in only one place and entity

variable bindings cannot be shadowed. Sequences of entity variables called entity

paths are used to refer to an entity that is located inside a hierarchy of nested

structures.

Consider the example in Fig. 1. Assuming that eA is an entity variable for A, and so on, type A.B.u can be referred to by the entity path eA, eB, eu. An entity path is similar to a symbolic path except it refers to an internal entity rather than a syntactic object. Moreover, an entity path is robust, in that there will always be a valid entity path for any entity even when no corresponding symbolic path exists due to shadowing.

    signature S =
    sig
      structure A :
      sig
        type t
        structure B :
        sig
          type u
          val x : t
        end
        val y : t * B.u
      end
    end

    INTERNAL SIG =
    sig
      structure A (eA) :
      sig
        type t (et)
        structure B (eB) :
        sig
          type u (eu)
          val x : [et]
        end
        val y : [et] * [eB, eu]
      end
    end

Fig. 1. A syntactic signature and its internal representation

2.2 Internal Representations of Signatures

The internal representation of a signature is basically a list of pairs of component

names and their speciﬁcations. Each static component is also assigned a fresh

entity variable as part of its speciﬁcation. Hereafter, signature will refer to this

internal representation as distinct from syntactic signatures, which are in the

surface language. We construct signature representations either by translating a

syntactic signature expression or by inferring a signature from a basic structure

expression. Using the entity variables in a signature, we can map a symbolic

path for a static component to a corresponding entity path. In Fig. 1, INTERNAL SIG is the translation from the syntactic signature S. We can traverse this signature following a symbolic path A.B.u, collecting the corresponding entity path eA, eB, eu as we go.
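The traversal from symbolic path to entity path can be sketched as follows. This is a minimal model, not the SML/NJ representation: the datatype spec, the type sign, and the functions find and entityPath are hypothetical names, entity variables are plain strings, and value specs carry no type information.

```sml
type evar = string

datatype spec
  = TycSpec of evar                          (* type component and its entity variable *)
  | StrSpec of evar * (string * spec) list   (* substructure: entity variable, signature *)
  | ValSpec                                  (* value component: no entity variable *)

(* A signature is a list of (component name, specification) pairs. *)
type sign = (string * spec) list

fun find (_, []) = NONE
  | find (name : string, (n, s) :: rest) =
      if n = name then SOME s else find (name, rest)

(* Walk the signature along a symbolic path, collecting entity variables. *)
fun entityPath (sg : sign, path : string list) : evar list option =
  case path of
      [] => SOME []
    | name :: rest =>
        (case find (name, sg) of
             SOME (TycSpec ev) => if null rest then SOME [ev] else NONE
           | SOME (StrSpec (ev, sub)) =>
               (case entityPath (sub, rest) of
                    SOME p => SOME (ev :: p)
                  | NONE => NONE)
           | _ => NONE)

(* The signature of Fig. 1, with value types omitted. *)
val internalSig : sign =
  [("A", StrSpec ("eA",
      [("t", TycSpec "et"),
       ("B", StrSpec ("eB", [("u", TycSpec "eu"), ("x", ValSpec)])),
       ("y", ValSpec)]))]

val p = entityPath (internalSig, ["A", "B", "u"])
```

Here p is SOME ["eA", "eB", "eu"]. A path that ends at a value component yields NONE, since values have no entity variable.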

We represent volatile tycon occurrences in value speciﬁcations by an entity

path relativized to the scope of the occurrence. For example, the spec for value y

has the relativized form [et]∗[eB, eu]. Due to the presence of volatile tycons, the

signature is an incomplete representation of the static information in a structure

which matches the signature. The representation of a structure matching a given

signature supplements the signature with a realization that maps entity variables

and paths for volatile tycons to actual tycons, thus deﬁning them.

2.3 Structure Realization

A structure realization is a finite map from entity variables to entities. An entity variable for a tycon component is mapped to a tycon. An entity variable for a substructure is mapped to another structure realization. In the case of a functor component, its entity variable is mapped to a functor realization, which will be described in Section 2.5. Because structure realizations contain only static entities, value specifications have no corresponding mapping. Because a structure realization may contain nested structure realizations, it can be thought of as a tree where the edges are labeled by the entity variables, internal nodes are subtrees (structure realizations), and leaves are tycons or functor realizations. For example, Fig. 2 shows a structure M matching signature S and the corresponding structure realization that complements S with entities from M.

    structure M =
    struct
      structure A =
      struct
        type t = int
        structure B =
        struct
          type u = bool
          val x = 1
        end
        val y = (1, true)
      end
    end

[Tree diagram, rendered as nested mappings:]

    {eA ↦ {et ↦ int, eB ↦ {et ↦ int, eu ↦ bool}}}

Fig. 2. A structure and structure realization matching signature S

The seemingly duplicate et edge under node B may look peculiar. Because substructures such as B may be selected out by later declarations such as structure B' = M.A.B, the structure realization of B must be able to stand on its own. Consequently, we need to close the structure realization for B by including the mapping for et, which is not a local component of B.

Looking up the type for value spec A.y from the previous section given this matching realization takes two steps. First, we construct the entity path to each of the tycons in the type expression [et] * [eB, eu], namely eA, et and eA, eB, eu respectively, by traversing the signature. Then we look up the entity paths by following the corresponding edges in the structure realization to get int and bool respectively.
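The second step, following edges in the realization tree, can be sketched as below. The datatype entity and the functions lookupVar and lookupPath are hypothetical names for illustration, and tycons are represented simply by their names.

```sml
datatype entity
  = Tycon of string                  (* leaf: an actual tycon, shown by name *)
  | Str of (string * entity) list    (* node: a nested structure realization *)

fun lookupVar (_, []) = NONE
  | lookupVar (ev : string, (v, e) :: rest) =
      if v = ev then SOME e else lookupVar (ev, rest)

(* Follow an entity path edge by edge from the root of a realization. *)
fun lookupPath (e : entity, []) = SOME e
  | lookupPath (Str bindings, ev :: rest) =
      (case lookupVar (ev, bindings) of
           SOME e' => lookupPath (e', rest)
         | NONE => NONE)
  | lookupPath (Tycon _, _ :: _) = NONE

(* The realization of Fig. 2; B's realization is closed by repeating et. *)
val realB = Str [("et", Tycon "int"), ("eu", Tycon "bool")]
val realM = Str [("eA", Str [("et", Tycon "int"), ("eB", realB)])]

val t1 = lookupPath (realM, ["eA", "et"])         (* first tycon in the type of A.y *)
val t2 = lookupPath (realM, ["eA", "eB", "eu"])   (* second tycon in the type of A.y *)
```

Because realB carries its own binding for et, a lookup of et still succeeds when B is selected out and used on its own.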

A generative type declaration in a structure, such as a datatype declaration, will

produce a fresh tycon with a unique identifying stamp. Other type declarations

will deﬁne a tycon component in terms of an existing tycon or as an abbreviation

for a type expression or type function. All these tycons will be found as leaf

nodes in the structure’s realization tree, accessed by entity paths that specify

their location in the structure hierarchy.

2.4 Full Signatures

When we put together a signature and a compatible structure realization, i.e.,

a realization that at least maps all the entity paths in the signature, we have

a complete static description of a structure, which we call a full signature. For

a structure expression such as the one in Fig. 2 that has no explicit signature

ascribed, the elaborator will construct a full signature for the structure, including

a synthesized signature and matching realization.

Structure declarations can also explicitly ascribe syntactic signature expressions to a structure:

    structure M1 : S = M

    structure M2 : S =
    struct
      structure A =
      struct
        type t = real
        structure B =
        struct
          type u = string
          val x = 1.0
        end
        val y = (1.0, "string")
      end
    end

Although M1 and M2’s realizations obviously differ, their full signatures will share the same signature representation INTERNAL SIG, representing the syntactic signature S. The sharing of common signature information among all the structures matching an explicit signature³ is one advantage of the factorization.

2.5 Functor Entities and the Entity Calculus

The complete static description of a functor is called a full functor signature,

and it is also factored into a functor signature and a functor realization. The

internal representation of a functor signature, which we informally write as

(X (ex):SIGPARAM) : SIGBODY, consists of a parameter signature SIGPARAM

and a functor body signature SIGBODY whose speciﬁcations may mention the

bound parameter X via the associated parameter entity variable ex. Both SIGPARAM and SIGBODY are internal representations of signatures and thus are decorated with entity variables and use entity paths to reference volatile entities.

While the functor signature specifies the fixed shape of the parameter and result, information that is common to all calls of the functor, the functor realization describes how the structure realization of the functor body structure is computed in terms of the structure realization of the parameter structure. The structure realization deals with the part of the information that varies from call to call. This is where the signature-realization factorization clarifies the semantics of functors. The functor realization is an entity function of the form λex.strexp where strexp is a structure entity expression that the compiler evaluates to a structure realization for the functor body. These entity expressions and functions are formalized by an applied, call-by-value λ-calculus called the entity calculus (Fig. 3). Terms in the entity calculus express static information and are evaluated only during compilation, specifically when elaborating functor applications (see Section 3).

³ We always infer a full signature for a structure, even if the structure declaration has an ascription. But we immediately match that full signature with the ascribed signature, producing a realization for the ascribed signature, as described in Section 3.

    tycon     ::= Formal(tycon)
                | Def(typeexp)
                | Data(ConsName of typeexp)
                | entity path

    strexp    ::= STRUCTURE{entitydec}
                | fctexp(strexp)
                | FORM{sig}
                | entity path

    entitydec ::= type ex = tycon
                | structure ex = strexp
                | functor ex = fctexp
                | entitydec, entitydec

    fctexp    ::= λex.strexp[entityenv]
                | entity path

Fig. 3. A simplified entity calculus

The tycon expressions include Formals representing dummy tycons that are

speciﬁed in a functor parameter. A typeexp is a type expression that may contain

applied occurrences of tycons. A Def tycon deﬁnes a tycon as an abbreviation

for a type expression. A Data tycon corresponds to the tycon for a datatype

with the given constructor name and constructor type parameter. For simplicity,

we are assuming only one data constructor per datatype. Entity paths are the

relativized tycon references described earlier.

Entity declarations bind entity variables to an appropriate kind of entity expression. A functor body may contain free occurrences of entities such as a tycon, structure, or functor declared in an outer functor, and these volatile entities are denoted by entity paths (see Section 2.6 for an example). Thus, the functor realization for a higher-order functor requires a closure environment, and the correct form of a functor realization is an entity function closure of the form λex.strexp[entityenv] where entityenv is an entity environment mapping all free entity variables to entities. An entityenv has exactly the same representation as a structure realization. Section 2.6 will further explain the need for a closure environment. Structure entity expressions include a form for basic structures, which encapsulates an entity declaration for its static components, entity paths to refer to structure entities bound in the local entity environment, applications, and a special form (FORM) for functors in formal parameters, which will be explained in Section 3.

Consider the following example:

    functor F (X : sig type t val x : t end) =
    struct
      datatype u = A of X.t
      type v = X.t * u list
      fun f (x : X.t) : u = A x
    end

The above functor is represented by a functor signature and a functor realization. The inferred functor signature is:

    (X (eX) : sig type t (et) val x : [et] end)
    : sig
        type u (eu) type v (ev)
        val f : [eX, et] -> [eu]
      end

where eX, et, eu, and ev are fresh entity variables.

The realization for the functor has to specify how realizations for the static components of the result (entities for the types u and v) are constructed given a structure realization for X, which includes a tycon entity for X.t. The functor realization for F is the entity function:

    λeX.STRUCTURE{type eu = tycon_u, type ev = tycon_v}

where tycon_u and tycon_v are tycon entity expressions for the datatype u and type abbreviation v:

    tycon_u = Data(A of [eX, et])    tycon_v = Def([eX, et] * [eu] list)

Here the closure environment can be empty, assuming the functor is defined at top level (it is also closed, having no references to nonlocal volatile entities). When this functor is applied to an argument structure, the argument structure is coerced by signature matching (described in Section 3) with the parameter signature, yielding a structure realization for the parameter signature. This parameter realization is bound to the entity variable eX and the body of the entity function is evaluated in the resulting entity environment.

The body specifies that a structure realization is to be constructed, whose contents will be defined by a sequence of two entity declarations. eu will be bound to a new datatype generated from the datatype specification, with the associated entity paths referencing imported types being evaluated relative to the evaluation entity environment, the entity environment at that point of elaboration. Similarly, the definition of type v will be instantiated by evaluating its embedded entity paths in that same entity environment extended with the binding of eu. In particular, in the application F(struct type t = int end), the realization bound to eX will be the entity environment {et ↦ int} (representing the environment as a sequence of mappings), and the evaluation entity environment is {eX ↦ {et ↦ int}}. Evaluating tycon_u in this environment yields a fresh datatype corresponding to the definition datatype u = A of int and tycon_v yields an instantiated type abbreviation for the definition type v = int * (u list).
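The evaluation just described can be sketched as a small interpreter. This is a simplified model under stated assumptions, not the SML/NJ code: all datatype and function names are hypothetical, only the Def and Data tycon forms are covered, functor realizations are plain triples, and fresh stamps come from a global counter.

```sml
type evar = string

datatype ty                        (* relativized type expression *)
  = Path of evar list              (* entity path to a tycon *)
  | Con of string * ty list        (* fixed tycon applied to arguments *)

datatype tycexp                    (* two of the tycon forms of Fig. 3 *)
  = Def of ty
  | Data of string * ty            (* constructor name, argument type *)

datatype rty                       (* evaluated type: no entity paths left *)
  = Prim of string
  | Fresh of int * string * rty    (* generated datatype: stamp, constructor, argument *)
  | App of string * rty list

datatype entity
  = Tyc of rty
  | Str of (evar * entity) list    (* structure realization / entity environment *)

exception Unbound of evar

fun lookupVar (ev, []) = raise Unbound ev
  | lookupVar (ev : evar, (v, e : entity) :: rest) =
      if v = ev then e else lookupVar (ev, rest)

fun lookupPath (env, [ev]) = lookupVar (ev, env)
  | lookupPath (env, ev :: rest) =
      (case lookupVar (ev, env) of
           Str sub => lookupPath (sub, rest)
         | Tyc _ => raise Unbound ev)
  | lookupPath (_, []) = raise Unbound "<empty path>"

fun evalTy env (Path p) =
      (case lookupPath (env, p) of
           Tyc t => t
         | Str _ => raise Unbound (hd p))
  | evalTy env (Con (name, args)) = App (name, map (evalTy env) args)

val stamp = ref 0   (* assumed source of fresh stamps *)

fun evalTyc env (Def ty) = evalTy env ty
  | evalTyc env (Data (con, ty)) =
      (stamp := !stamp + 1; Fresh (!stamp, con, evalTy env ty))

(* Evaluate declarations in order; each binding is visible to later ones. *)
fun evalDecs env [] = []
  | evalDecs env ((ev, tx) :: rest) =
      let val binding = (ev, Tyc (evalTyc env tx))
      in binding :: evalDecs (binding :: env) rest end

(* Apply λparam.STRUCTURE{body}[closure] to an argument realization. *)
fun applyFct (param, body, closure) arg =
  Str (evalDecs ((param, arg) :: closure) body)

val tyconU = Data ("A", Path ["eX", "et"])
val tyconV = Def (Con ("*", [Path ["eX", "et"], Con ("list", [Path ["eu"]])]))
val realF = ("eX", [("eu", tyconU), ("ev", tyconV)], [])

(* Models F(struct type t = int end). *)
val result = applyFct realF (Str [("et", Tyc (Prim "int"))])
```

Applying realF a second time to the same argument produces a datatype with a different stamp, which is how this sketch models the generative static effect of the datatype declaration.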

2.6 Higher-order functors

The preceding example involves the classic case of a ﬁrst-order functor deﬁned

at top level, i.e., one not deﬁned within another functor, and in such a case

the closure environment of the functor realization can be empty. Here we show

higher-order functors can require a nontrivial closure environment. Consider the

following example:

    functor F (X : sig type t end) =
    struct
      datatype u = C of X.t
      functor G (Y : sig type v val x : v * u * X.t end) =
      struct
        datatype s = D of X.t * u -> Y.v
      end
    end

We can see that all the tycons X.t, u, Y.v, and s are volatile in the sense that their actual bindings are to be determined later, when F is applied. When we relativize the specification of datatype s with respect to these volatile tycons, we get:

    tycon_s = Data(D of [eX, et] * [eu] -> [eY, ev])

Now consider an application of F, structure A = F(struct type t = int end). When this expression is evaluated, we will develop an entity environment that binds eX and its extension [eX, et] as before, and the definition of u will give rise to a new datatype that will be bound to eu. As before, the realization of functor G will involve a lambda expression in our entity calculus:

    λeY.STRUCTURE{type es = tycon_s}

But note that this term binds only the entity variable eY, leaving eX and eu occurring free in tycon_s. So the lambda term is not closed. As usual, we need to close it by supplying a closure environment, namely the entity environment mentioned above that binds eX and eu.

Now when we apply A.G we will add a binding of eY to its closure environment and use this when evaluating the body of the lambda term for G. For instance, after

    structure B = A.G(struct type v = bool val x = (true, A.C 3, 1) end)

the datatype B.s is Data(D of int * A.u -> bool).
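The role of the closure environment can be sketched with a variant of the entity calculus in which a functor declaration evaluates to a closure. Everything here is an illustrative assumption: the datatype and function names are hypothetical, the entity variable eG for the functor component G is invented for the example, and generated datatypes are rendered as strings of the form #stamp(con of type) so that the result is easy to read.

```sml
type evar = string

datatype ty                                  (* relativized type expressions *)
  = Path of evar list
  | Tuple of ty * ty
  | Arrow of ty * ty

datatype dec                                 (* entity declarations *)
  = DataDec of evar * string * ty            (* type ev = Data(con of ty) *)
  | FctDec of evar * evar * dec list         (* functor ev = λparam.STRUCTURE{decs} *)

datatype entity
  = Tyc of string                            (* resolved tycon, shown by name *)
  | Str of (evar * entity) list              (* structure realization *)
  | Fct of evar * dec list * (evar * entity) list   (* closure λparam.body[env] *)

exception Unbound of evar

fun lookupVar (ev, []) = raise Unbound ev
  | lookupVar (ev : evar, (v, e : entity) :: rest) =
      if v = ev then e else lookupVar (ev, rest)

fun lookupPath (env, [ev]) = lookupVar (ev, env)
  | lookupPath (env, ev :: rest) =
      (case lookupVar (ev, env) of
           Str sub => lookupPath (sub, rest)
         | _ => raise Unbound ev)
  | lookupPath (_, []) = raise Unbound "<empty path>"

fun render env (Path p) =
      (case lookupPath (env, p) of
           Tyc name => name
         | _ => raise Unbound (hd p))
  | render env (Tuple (a, b)) = render env a ^ " * " ^ render env b
  | render env (Arrow (a, b)) = render env a ^ " -> " ^ render env b

val stamp = ref 0   (* assumed source of fresh stamps *)

fun evalDecs env [] = []
  | evalDecs env (d :: rest) =
      let
        val binding =
          case d of
              DataDec (ev, con, ty) =>
                (stamp := !stamp + 1;
                 (ev, Tyc ("#" ^ Int.toString (!stamp) ^ "(" ^ con ^ " of "
                           ^ render env ty ^ ")")))
            | FctDec (ev, param, body) =>
                (ev, Fct (param, body, env))   (* capture the current environment *)
      in
        binding :: evalDecs (binding :: env) rest
      end

fun apply (Fct (param, body, closure), arg) =
      Str (evalDecs ((param, arg) :: closure) body)
  | apply _ = raise Unbound "<not a functor>"

val tyconS = Arrow (Tuple (Path ["eX", "et"], Path ["eu"]), Path ["eY", "ev"])
val realF = Fct ("eX",
                 [DataDec ("eu", "C", Path ["eX", "et"]),
                  FctDec ("eG", "eY", [DataDec ("es", "D", tyconS)])],
                 [])

val realA = apply (realF, Str [("et", Tyc "int")])          (* models A = F(...) *)
val Str aBindings = realA
val realB = apply (lookupVar ("eG", aBindings),             (* models B = A.G(...) *)
                   Str [("ev", Tyc "bool")])
```

In this model the entity bound to es in realB is rendered as #2(D of int * #1(C of int) -> bool): the paths through eX and eu are resolved from the closure captured when F was applied, and only eY comes from the argument to A.G.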

3 Elaboration

Elaboration is the translation of simple syntax trees produced by the parser

into (1) a typed abstract syntax for use in subsequent compiler stages, and

(2) a static environment mapping identiﬁers deﬁned at top-level to their static

representations. As mentioned in the introduction, we are focusing exclusively

on how the static environment is produced; the construction of abstract syntax

is relatively straightforward in comparison.

At the core language level, the elaborator does type checking and type infer-

ence for value declarations, and produces static bindings mapping type names to

tycon representations, and variable, data constructor, and exception constructor

names to their types. At the module level, the elaborator translates signature,

structure, and functor expressions and declarations into the internal representa-

tions described in Section 2. The type information is recorded as new bindings

added to the static environment, which is used for elaborating later compilation

units (e.g. source ﬁles) that import them. An initial static environment contains

predeﬁned modules, types, and values (the Basis libraries).

Elaboration can be broken down into a set of subtasks. The main tasks are

elaborating signature expressions, structure expressions, and functor declara-

tions, and these involve subsidiary processes including functor application, signa-

ture matching, and signature instantiation. Signature expressions and structure

expressions often occur as the deﬁniens in a declaration, but they can also occur

“in-lined”, in an ascription in the case of signatures, or as a functor parameter

or functor body in the case of structures.

Elaboration modes. It is useful to distinguish two contexts in which elabora-

tion takes place: functor context, where the expression or declaration elaborated

occurs within the body of a functor, and top level, when outside of any func-

tor. Elaboration in a functor context is more complicated, because in addition

to performing the usual type-checking and static environment building tasks,

it must also “compile” declarations to the entity calculus expressions used to

encode the functor static action. Thus in a functor context elaboration must

operate in dual, simultaneous, modes. We use the term direct elaboration for the

basic mode that deals with type checking and translation to static representa-

tions, while entity compilation refers to the parallel process of compiling static

declarations into the entity calculus. Direct elaboration occurs in both contexts,

while entity compilation is relevant only to the functor context. In practice, to

simplify the code, we always perform both modes of elaboration and if we are

in top level mode we discard the unneeded byproducts of entity compilation.

The extra work involved in unnecessary entity compilation is not a signiﬁcant

overhead.

Functor volatile entities. A related factor associated specifically with functor mode is that static entities constructed within a functor (and the functor parameter itself) are volatile, as opposed to entities constructed in top level mode, which are fixed and hence nonvolatile. During functor elaboration, the functor volatile entities are virtual or potential, in the sense that the actual entities will be created later at functor application time. However, in the direct elaboration mode volatile entities need to be represented by dummy entities to support type checking, so they will have static representations in the “working” static environment used for direct elaboration of functor bodies. In embedded, in-line signatures, and in compiled entity calculus expressions, references to volatile entities (e.g. structures and tycons) must be “relativized” by translating them into entity paths.

The process of relativizing references to volatile entities, and the interpre-

tation of the resulting entity paths, require that two additional parameters be

provided to the elaboration process. An entity environment is threaded through

to be used (1) to interpret entity paths of functor volatile entities in embedded

signatures, and (2) to construct closure environments for structure realizations

and functor realizations (entity functions). Elaboration of declarations will add

new entity variable bindings to this entity environment. The second new ingre-

dient is called an entity path context. It is an inverse environment that maps

dummy volatile entities to their entity paths, and it is used for relativization of

references to those dummy entities. So a complete schematic of the inputs and

outputs of elaboration is shown in Fig. 4.

[Figure: the elaborate process. Inputs: syntax trees, static environment, entity environment, entity path context. Outputs: typed abstract syntax, static environment, entity environment, entity declaration, entity path context.]

Fig. 4. Schematic for elaboration
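The entity path context, used as an inverse environment for relativization, can be sketched as follows. This is a hypothetical model: dummy volatile tycons are identified by integer stamps, the context is an association list, and the names ty, context, and relativize are not taken from SML/NJ.

```sml
type evar = string

datatype ty
  = Dummy of int          (* dummy volatile tycon, identified by a stamp *)
  | Fixed of string       (* nonvolatile tycon, e.g. int *)
  | Rel of evar list      (* relativized reference: an entity path *)
  | Pair of ty * ty

(* Entity path context: maps dummy tycon stamps to their entity paths. *)
type context = (int * evar list) list

(* Replace every dummy tycon known to the context by its entity path. *)
fun relativize (ctx : context) ty =
  case ty of
      Dummy s =>
        (case List.find (fn (s', _) => s' = s) ctx of
             SOME (_, p) => Rel p
           | NONE => ty)
    | Pair (a, b) => Pair (relativize ctx a, relativize ctx b)
    | _ => ty

val ctx : context = [(1, ["eX", "et"]), (2, ["eu"])]
val relTy = relativize ctx (Pair (Dummy 1, Pair (Dummy 2, Fixed "int")))
```

Nonvolatile tycons such as Fixed "int" are left untouched, since only volatile entities need relative references.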

Signature elaboration. The speciﬁcations in the body of the signature are trans-

lated into a mapping from component names to internal specs in the form of

formal tycons for tycon specs, types for values, and internal representations of

signatures and functor signatures for structure and functor elements respectively.

Each static element (tycon or module) is assigned a fresh entity variable. The

types of value elements, data constructors, and types occurring in deﬁnitional

type specs, are relativized by replacing local tycon references (which we can call

signature volatiles to distinguish them from functor volatiles) with entity paths.

If the signature is in-line in a functor context, it may contain functor volatiles,

which are also relativized. Any where type constraints are elaborated and pushed

inward to the type speciﬁcations they apply to. Sharing constraints are recorded

in a normalized form as pairs of paths.

Structure elaboration. There are several cases for structure expression elabora-

tion, corresponding to the syntactic forms for such expressions (e.g., structures

declared in-line struct ... end, structure symbolic paths A, A.B, and functor ap-

plications). A symbolic name or path for a structure is simply looked up in the

current static environment, returning a full signature for the structure. A basic

in-line structure expression of the form struct decls end is elaborated in the following steps:

1. elaborate the body declarations decls, yielding a static environment envBody, an entity environment, and an entity declaration entitydec;

2. derive from envBody a signature and matching structure realization (entity environment), and combine them to create a full signature;

3. return the full signature from step 2, a structure entity expression STRUCTURE{entitydec}, and the entity environment from step 1.

Signature matching. When a signature is ascribed to a structure in a structure

declaration, or when a functor is applied to a structure, implicitly ascribing

the parameter signature to the argument, we must verify that the structure in

question matches the signature. This is a kind of module-level type checking,

but it also has a coercive eﬀect, producing a modiﬁed structure realization that

is exactly conforming to the ascribed signature (similar to coercive subtyping).

Signature matching involves scanning the speciﬁcations in the signature and

verifying that the matching structure satisﬁes these speciﬁcations. There are

two modes of signature matching. Opaque signature matching generates fresh

tycons for signature volatile tycon speciﬁcations, whereas transparent signature

matching uses the corresponding tycons from the matching structure.
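The two matching modes and the coercive effect can be observed at the source level. The example below is only an illustration; the names `S`, `A`, `T`, and `O` are made up for this purpose.

```sml
signature S =
sig
  type t
  val x : t
  val show : t -> string
end

structure A =
struct
  type t = int
  val x = 3
  val show = Int.toString
  val extra = "not in S"
end

structure T : S = A      (* transparent matching: T.t is int *)
structure O :> S = A     (* opaque matching: O.t is a fresh abstract tycon *)

val y = T.x + 1          (* accepted because T.t = int *)
val s = O.show O.x       (* accepted: O is used only through S *)
(* val z = O.x + 1          rejected: O.t is abstract *)
(* val e = T.extra          rejected: extra is dropped by the coercion *)
```

In both modes the resulting structure has exactly the components named in `S`, which is the coercive effect described above.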

Signature instantiation. At a couple of points during elaboration, we have only

a signature on hand when what we need is a full signature. To synthesize a full

signature from the signature, we need to produce a dummy structure realization

for the signature. Signature instantiation is the process of creating a “free”

structure realization for a signature. This process is nontrivial because of sharing

speciﬁcations, which were introduced to address an issue called the diamond

import problem. This problem refers to the scenario where a functor parameter

contains tycons, possibly nested inside diﬀerent argument structures, that must

be identical for the functor body to type check. This required identiﬁcation of

types, which is called type sharing (see Harper and Pierce’s account [10]), must

be explicitly speciﬁed within parameter signatures using one of three syntactic

mechanisms, namely:

1. deﬁnitional speciﬁcations of tycons,

2. where clauses modifying signature expressions (an indirect form of deﬁni-

tional specs), and

3. equational sharing constraints identifying diﬀerent tycons by name or path.
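The three mechanisms can be compared on a simplified, flat version of the problem. In this sketch the signatures `LEX` and `PARSE` and the functors `Front1`, `Front2`, and `Front3` are hypothetical names chosen for illustration; each functor body type checks only because the two `sym` tycons are forced to be identical.

```sml
signature LEX = sig type sym val next : string -> sym end
signature PARSE = sig type sym val accept : sym -> bool end

(* 1. definitional specification of the tycon *)
functor Front1 (structure L : LEX
                structure P : sig
                                type sym = L.sym
                                val accept : sym -> bool
                              end) =
struct fun run s = P.accept (L.next s) end

(* 2. where clause modifying a signature expression *)
functor Front2 (structure L : LEX
                structure P : PARSE where type sym = L.sym) =
struct fun run s = P.accept (L.next s) end

(* 3. equational sharing constraint *)
functor Front3 (structure L : LEX
                structure P : PARSE
                sharing type L.sym = P.sym) =
struct fun run s = P.accept (L.next s) end

structure Lex = struct type sym = string fun next (s : string) = s end
structure Parse = struct type sym = string fun accept s = (s = "ok") end

structure F1 = Front1 (structure L = Lex structure P = Parse)
structure F2 = Front2 (structure L = Lex structure P = Parse)
structure F3 = Front3 (structure L = Lex structure P = Parse)
```

Without one of these specifications, `L.sym` and `P.sym` would be unrelated formal tycons inside the functor body and `P.accept (L.next s)` would be rejected.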

The realization produced by signature instantiation includes fresh formal tycons

for each type component, chosen to satisfy the signature’s sharing constraints and

only those sharing constraints (i.e., no incidental sharing not forced by the

specifications). The algorithm used for signature instantiation is adapted from

the Paterson-Wegman linear unification algorithm [20].
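The effect of instantiation on sharing can be sketched with a naive union-find over component paths. This is deliberately not the Paterson-Wegman algorithm and not SML/NJ's code: it is a quadratic stand-in that only shows the intended result, namely one fresh tycon (modeled as an integer stamp) per equivalence class of shared paths. All names below are hypothetical.

```sml
type path = string                      (* e.g. "A.t"; hypothetical encoding *)

(* parents holds (child, parent) links; a path with no link is a root *)
fun find (parents : (path * path) list) (p : path) : path =
  case List.find (fn (q, _) => q = p) parents of
      SOME (_, r) => find parents r
    | NONE => p

(* merge the classes of a and b by linking one root to the other *)
fun union ((a, b), parents) =
  let val ra = find parents a
      val rb = find parents b
  in if ra = rb then parents else (ra, rb) :: parents end

(* Assign one fresh stamp per sharing class, and to no other pair of paths. *)
fun instantiate (tycons : path list, sharing : (path * path) list) =
  let
    val parents = List.foldl union [] sharing
    val next = ref 0
    fun fresh () = (next := !next + 1; !next)
    fun assign (p, (stamps, result)) =
      let val r = find parents p
      in case List.find (fn (q, _) => q = r) stamps of
             SOME (_, s) => (stamps, (p, s) :: result)
           | NONE => let val s = fresh ()
                     in ((r, s) :: stamps, (p, s) :: result) end
      end
    val (_, result) = List.foldl assign ([], []) tycons
  in List.rev result end

val stamps =
  instantiate (["A.t", "B.t", "C.t", "D.u"],
               [("A.t", "B.t"), ("B.t", "C.t")])
```

In this run `A.t`, `B.t`, and `C.t` receive the same stamp because the constraints force them together, while `D.u` receives its own, so no incidental sharing is introduced.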

When instantiating a functor speciﬁcation in a signature, we must create

a corresponding functor realization. This will be, as usual, an entity function,

but one where the body of the lambda abstraction is the special structure entity

expression form (FORM{sig}) containing only the formal functor signature. How

this is evaluated will be explained below.

Functor application. When a functor is applied, the argument structure expres-

sion is elaborated, and then signature matching is performed to verify that it

matches the parameter signature and to coerce the argument structure realiza-

tion to a realization for the parameter signature, yielding a full signature for the

coerced argument. The functor realization, which is an entity calculus lambda-

abstraction complete with a closure entity environment, is then applied to the

argument realization using a conventional call-by-value, environment-based in-

terpreter for the entity calculus. As usual, this entails extending the closure

environment with a binding of the argument realization to the lambda-bound

entity variable, and then evaluating the body structure expression with respect

to this extended entity environment.
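The standard case can be sketched as a small environment-based evaluator. The constructors below (`Var`, `Tyc`, `Struct`, `Lambda`, `App`) are a hypothetical simplification of the entity calculus, with entity variables modeled as strings; the real interpreter handles more forms.

```sml
type entVar = string

datatype entExp                          (* entity expressions *)
  = Var of entVar list                   (* entity path *)
  | Tyc of string                        (* a known tycon, e.g. "int" *)
  | Struct of (entVar * entExp) list     (* STRUCTURE{entity decls} *)
  | Lambda of entVar * entExp            (* functor abstraction *)
  | App of entExp * entExp               (* functor application *)

datatype entity
  = TycE of string
  | StrE of env                          (* structure realization *)
  | FctE of entVar * entExp * env        (* closure *)
withtype env = (entVar * entity) list

fun lookup (env : env, v : entVar) : entity =
  case List.find (fn (x, _) => x = v) env of
      SOME (_, e) => e
    | NONE => raise Fail ("unbound entity variable " ^ v)

fun lookupPath (env, [v]) = lookup (env, v)
  | lookupPath (env, v :: rest) =
      (case lookup (env, v) of
           StrE inner => lookupPath (inner, rest)
         | _ => raise Fail "not a structure")
  | lookupPath (_, []) = raise Fail "empty path"

fun eval (env : env) (exp : entExp) : entity =
  case exp of
      Var path => lookupPath (env, path)
    | Tyc name => TycE name
    | Struct decls =>
        let (* later declarations see earlier ones *)
          fun step ((v, e), (scope, own)) =
            let val ent = eval scope e
            in ((v, ent) :: scope, (v, ent) :: own) end
          val (_, own) = List.foldl step (env, []) decls
        in StrE own end
    | Lambda (v, body) => FctE (v, body, env)
    | App (f, arg) =>
        (case eval env f of
             FctE (v, body, closure) =>
               let val argEnt = eval env arg      (* call-by-value *)
               in eval ((v, argEnt) :: closure) body end
           | _ => raise Fail "not a functor")

(* functor F (X : sig type t end) = struct type u = X.t end, applied to
   struct type t = int end *)
val f = Lambda ("eX", Struct [("eU", Var ["eX", "eT"])])
val arg = Struct [("eT", Tyc "int")]
val result = eval [] (App (f, arg))
```

The application binds `eX` to the argument realization in the closure environment and evaluates the body there, so `eU` is realized as `int`.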

This is the standard case. But in a functor context, the functor being applied

may be an element of an outer functor parameter, i.e. a formal functor. Suppose,

for example, the elaborator encountered the following program:

    funsig FS () = sig type t end
    functor F (X : sig functor G : FS end) =
    struct
      structure M = X.G ()
    end
    functor H0 () = struct type t = int end
    structure FR0 = F (struct functor G = H0 end)

where FS is a functor signature where the parameter signature is empty and the

result signature speciﬁes a single tycon t. When elaborating the application of

functor X.G in the direct mode, we do not seem to have a functor realization

for X.G because that will only be supplied by an actual parameter (as in the

deﬁnition of FR0). We solve this problem by synthesizing a special entity function

from the functor signature FS. This entity function is, as usual, a closure of a

lambda expression, but the body of this lambda expression is a special form

of structure entity expression that simply wraps the body signature from FS:

FORM{sig type t end}. When elaborating X.G() in functor F’s body, we evaluate

this new form of structure expression by instantiating sig type t end with respect

to the evaluation environment. In this case, this will create a fresh abstract tycon

as the realization of t. This allows the type checking of the body of F to proceed

with no information about actual parameters other than that they match the

signature of X.

The entity declaration corresponding to M in the lambda abstraction for F

applies the relativized entity path for G to an empty structure entity expression:

    structure eM = [eX, eG] (STRUCTURE{})

When this declaration is evaluated at the call of F defining FR0, [eX, eG] will

evaluate to the entity function for H0, and this function will define the entity

binding of et to be int. So FR0.t is int. On the other hand, in the following

example, the deﬁnition of functor H1 uses an opaque ascription to cause a new

abstract tycon to be generated for t on each application.

    functor H1 () = struct type t = int end :> sig type t end
    structure FR1 = F (struct functor G = H1 end)

FR1.t will be a new abstract tycon. Thus, while the direct mode elaboration

of the body of F has to assume a conservative approximation to the functor

action of the G parameter, each application of F uses the actual functor action

associated with G in the argument. This technique is the key to supporting true

higher-order functor semantics.
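The behavior of F with the formal functor, with H0, and with H1 can be reproduced in a toy evaluator. Everything here is an assumption made for illustration: `Form` stands in for FORM{sig} and lists only the entity variables of the signature's tycon specs, `Fresh` models the generative effect of the opaque ascription in H1, and stamps are integers.

```sml
type entVar = string

datatype entExp
  = Var of entVar list
  | Const of string                      (* known tycon such as int *)
  | Fresh                                (* generate a new abstract tycon *)
  | Struct of (entVar * entExp) list     (* STRUCTURE{...} *)
  | Form of entVar list                  (* FORM{sig}, tycon specs only *)
  | Lambda of entVar * entExp
  | App of entExp * entExp

datatype entity
  = Known of string
  | Abstract of int                      (* stamp of an abstract tycon *)
  | StrE of env
  | FctE of entVar * entExp * env
withtype env = (entVar * entity) list

val stamp = ref 0
fun newTyc () = (stamp := !stamp + 1; Abstract (!stamp))

fun lookup (env : env, path : entVar list) : entity =
  case path of
      [] => raise Fail "empty path"
    | v :: rest =>
        (case List.find (fn (x, _) => x = v) env of
             NONE => raise Fail ("unbound " ^ v)
           | SOME (_, ent) =>
               (case (rest, ent) of
                    ([], _) => ent
                  | (_, StrE inner) => lookup (inner, rest)
                  | _ => raise Fail "not a structure"))

fun eval (env : env) (exp : entExp) : entity =
  case exp of
      Var path => lookup (env, path)
    | Const name => Known name
    | Fresh => newTyc ()
    | Struct decls =>
        let
          fun step ((v, e), (scope, own)) =
            let val ent = eval scope e
            in ((v, ent) :: scope, (v, ent) :: own) end
          val (_, own) = List.foldl step (env, []) decls
        in StrE own end
    | Form specs => StrE (map (fn v => (v, newTyc ())) specs)  (* instantiate *)
    | Lambda (v, body) => FctE (v, body, env)
    | App (f, arg) =>
        (case eval env f of
             FctE (v, body, closure) =>
               eval ((v, eval env arg) :: closure) body
           | _ => raise Fail "not a functor")

(* Entity function for F: eM = [eX, eG] (STRUCTURE{}) *)
val bodyF = Struct [("eM", App (Var ["eX", "eG"], Struct []))]
val top = [("eF", eval [] (Lambda ("eX", bodyF)))]
fun applyF g = eval top (App (Var ["eF"], Struct [("eG", g)]))

val formalG = Lambda ("eP", Form ["eT"])                  (* synthesized from FS *)
val h0 = Lambda ("eP", Struct [("eT", Const "int")])      (* H0 *)
val h1 = Lambda ("eP", Struct [("eT", Fresh)])            (* H1, opaque result *)

fun tOfM (StrE s) = lookup (s, ["eM", "eT"])
  | tOfM _ = raise Fail "not a structure"

val direct = tOfM (applyF formalG)   (* stands in for direct mode elaboration *)
val fr0 = tOfM (applyF h0)
val fr1a = tOfM (applyF h1)
val fr1b = tOfM (applyF h1)
```

With the formal functor the tycon is abstract, with H0 it is `int`, and two applications involving H1 yield two different abstract stamps, matching the behavior described for FR0 and FR1.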

Functor elaboration. Functor elaboration involves several new problems. One

issue is how to deal with references to the formal parameter structure in the

body, both during elaboration of the body and later during application of the

functor. As we have seen in the earlier example, when applying the functor, the

functor parameter will be represented by an entity variable that serves as the

formal parameter of the entity function.

During direct mode elaboration of the functor body, we bind the parameter

name to a full signature for the parameter structure obtained by instantiating the

parameter signature. This instantiation can serve as a formal representative of

all possible actual arguments because it embodies the minimal required sharing

among its tycon components. Any actual parameter will have to satisfy at least

as much sharing.

Now having bound the formal parameter symbol to the instantiation of the

parameter signature in the static environment, the body structure of the functor

is elaborated. This produces a full signature and a structure entity expression

for the functor body. A functor signature is created by combining the parameter

signature and the signature part of the body full signature. The functor’s entity

function is created by wrapping a lambda abstraction around the body’s struc-

ture entity expression, and closing it with respect to the entity environment in

which the functor is elaborated.

4 Discussion

The type information generated during elaboration in ML can grow quite large,

and experience with early, relatively naive versions of the elaborator demonstrated

that the size of static data structures can become a real resource bottleneck.

Although we do not have systematic experimental data comparing the efficiency

of the current implementation with simpler versions, the current implementation

has scaled well in practice and copes easily with large and complex programs.

Sharing signatures is conjectured to be a considerable win. We believe that the

factorization of modules into signatures and realizations is a key part of the

scalability of SML/NJ’s elaborator. Although hash-consing of type information

turned out to be necessary in

the FLINT intermediate language, this technique does not seem to be required

in the front end, and this is probably partly due to the sharing of signature

information.

5 Related Work

Although the literature developing module system semantics is rich, there are

few accounts of implementation approaches and techniques. As far as the authors

know, this paper is one of the few besides Crégut and MacQueen [2], which

reported on an earlier implementation. In that implementation, the internal rep-

resentations and algorithms were considerably more baroque and less principled.

Before the implementation of the entity path and signature-realization factor-

ization, the compiler relied on comparison of stamp creation times to index into

several arrays containing the relevant static information. The former design was

fragile and insuﬃciently abstract. This new design is a clear advancement that

greatly simpliﬁes the implementation.

Most of the literature focuses on the ML module system. The module systems of

Haskell [3] and Scheme [7] are primarily concerned with namespace management

through explicit import and export syntax. Because neither has an equivalent of

functors in its module language, and Scheme modules additionally have no type

components, they are not directly comparable to ML module systems. The several

proposals addressing module system semantics and design can be placed on a

continuum with the abstract approach on one end and the operational approach

on the other. The former, a term coined

by Shao [23], refers to the type-theoretic accounts in Harper-Lillibridge [9] and

Leroy [12]. The latter refers to the approach embodied in MacQueen-Tofte and

the Deﬁnition of Standard ML [17]. Several accounts [5, 11, 14, 22] follow the

abstract approach closely. Module systems from that pedigree generally do not

have internal representations of signatures distinct from syntactic signatures.

Type equivalence and generative types are generally modeled by a simple nom-

inal check and existential types respectively. Thus, they do not support true

higher-order functor semantics. The TILT [11], Moscow ML [22], and OCaml

compilers are implementations from this line of development.

The Deﬁnition [17] does not support higher-order functors. The semantic

objects in its treatment diﬀer from ours primarily in our use of entity environ-

ments and entity expressions. The Deﬁnition has a notion of type realizations,

which are maps from type names to tycons, and instantiation of both signatures

and functor signatures, producing a static environment and a pair of static en-

vironments with a set of ﬂexible names. In contrast to SML/NJ, the result of

functor instantiation is only an approximation of our functor realization – there

are no entity functions to express functor actions. Signatures in the Deﬁnition

explicitly name the volatile tycons, but there are no analogues of entity variables

associated with tycon, structure, and functor specs.

Other proposals fall somewhere in between the abstract and operational ap-

proaches. Biswas [1] and Shao [23] propose type-theoretic accounts that support

limited forms of higher-order functors. Both of these module systems can rep-

resent some functor actions (which they refer to as “argument-to-result depen-

dency”) in functor signatures. Biswas utilizes higher-order variables that have

about the same expressiveness as applicative functors in OCaml. A variant of

Biswas’s design is implemented in the Moscow ML compiler. Shao’s solution

uses a higher-order tycon that serves a similar role. Unlike Biswas and SML/NJ,

Shao’s account admits syntactic signatures that can express some functor actions

in terms of higher-order type constructor expressions.

More recent variations of the ML module system such as Dreyer’s RMC [4]

and MixML [6] express type abstraction using an existential type discipline fol-

lowing Mitchell and Plotkin [18] and Russo [22]. Signature matching is non-

coercive, though coercions are deﬁnable in the module language [5]. Montagu

and Rémy [19] develop a more modular form of the existential type calculus by

splitting open and pack into separate scoping and witness packing/unpacking

constructs to address the tension between modularity and existential-encoded

abstract types as pointed out by MacQueen [15]. None of these accounts handles

true higher-order functor semantics.

6 Conclusion

The module system implementation in SML/NJ has proven itself to be very

scalable [8, 21]. SML/NJ is self-hosting and compiles a wide range of Standard

ML programs. This is due at least in part to the robust module system

implementation. The implementation is both practical and principled. The current

incarnation of SML/NJ’s implementation of higher-order modules is a marked

improvement over the previously reported version. The implementation is a

demonstration of how to design a lambda calculus-like language to eﬃciently

represent static functor actions. Some directions for future investigation include

the relationship between the static lambda calculus and the syntactic module

language, especially the signature language, and the implications of this language

for true separate compilation.

References

1. Sandip K. Biswas. Higher-order functors with transparent signatures. In POPL

’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles

of programming languages, pages 154–163, New York, NY, USA, 1995. ACM.

2. P. Crégut and D. MacQueen. An implementation of higher-order functors. In

ACM SIGPLAN Workshop on Standard ML and its Applications, June 1994.

3. Iavor S. Diatchki, Mark P. Jones, and Thomas Hallgren. A formal speciﬁcation

of the Haskell 98 module system. In Haskell ’02: Proceedings of the 2002 ACM

SIGPLAN workshop on Haskell, pages 17–28, New York, NY, USA, 2002. ACM.

4. Derek Dreyer. A type system for recursive modules. In ICFP ’07: Proceedings

of the 2007 ACM SIGPLAN international conference on Functional programming,

pages 289–302, New York, NY, USA, 2007. ACM.

5. Derek Dreyer, Karl Crary, and Robert Harper. A type system for higher-order

modules. In POPL ’03: Proceedings of the 30th ACM SIGPLAN-SIGACT sym-

posium on Principles of programming languages, pages 236–249, New York, NY,

USA, 2003. ACM.

6. Derek Dreyer and Andreas Rossberg. Mixin’ up the ML module system. In ICFP

’08: Proceedings of the 13th ACM SIGPLAN international conference on Functional

programming, pages 307–320, New York, NY, USA, 2008. ACM.

7. Matthew Flatt. Composable and compilable macros: you want it when? In ICFP

’02: Proceedings of the seventh ACM SIGPLAN international conference on Func-

tional programming, pages 72–83, New York, NY, USA, 2002. ACM.

8. Ronald Garcia, Jaakko Järvi, Andrew Lumsdaine, Jeremy G. Siek, and Jeremiah

Willcock. An extended comparative study of language support for generic pro-

gramming. J. Funct. Program., 17(2):145–205, 2007.

9. Robert Harper and Mark Lillibridge. A type-theoretic approach to higher-order

modules with sharing. In POPL ’94: Proceedings of the 21st ACM SIGPLAN-

SIGACT symposium on Principles of programming languages, pages 123–137, New

York, NY, USA, 1994. ACM.

10. Robert Harper and Benjamin C. Pierce. Advanced Topics in Types and Program-

ming Languages, chapter Design Considerations for ML-Style Module Systems.

MIT Press, 2005.

11. Robert Harper and Chris Stone. An interpretation of Standard ML in type the-

ory. Technical Report CMU–CS–97–147, CMU, Pittsburgh, PA, June 1997. (Also

published as Fox Memorandum CMU–CS–FOX–97–01.).

12. Xavier Leroy. Manifest types, modules, and separate compilation. In POPL ’94:

Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of pro-

gramming languages, pages 109–122, New York, NY, USA, 1994. ACM.

13. Xavier Leroy. Applicative functors and fully transparent higher-order modules.

In POPL ’95: Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on

Principles of programming languages, pages 142–153, New York, NY, USA, 1995.

ACM.

14. Xavier Leroy. A modular module system. J. Funct. Program., 10(3):269–303, 2000.

15. David B. MacQueen. Using dependent types to express modular structure. In

POPL ’86: Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Prin-

ciples of programming languages, pages 277–286, New York, NY, USA, 1986. ACM.

16. David B. MacQueen and Mads Tofte. A semantics for higher-order functors. In

ESOP ’94: Proceedings of the 5th European Symposium on Programming, pages

409–423, London, UK, 1994. Springer-Verlag.

17. Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The Deﬁnition

of Standard ML - Revised. The MIT Press, May 1997.

18. John C. Mitchell and Gordon D. Plotkin. Abstract types have existential types.

In POPL ’85: Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on

Principles of programming languages, pages 37–51, New York, NY, USA, 1985.

ACM.

19. Benoît Montagu and Didier Rémy. Modeling abstract types in modules with open

existential types. In Proceedings of the 36th ACM Symposium on Principles of Pro-

gramming Languages (POPL’09), pages 63–74, Savannah, Georgia, USA, January

2009.

20. M. S. Paterson and M. N. Wegman. Linear uniﬁcation. In STOC ’76: Proceedings

of the eighth annual ACM symposium on Theory of computing, pages 181–186,

New York, NY, USA, 1976. ACM.

21. Norman Ramsey. ML module mania: A type-safe, separately compiled, extensible

interpreter. Electr. Notes Theor. Comput. Sci., 148(2):181–209, 2006.

22. Claudio V. Russo. Types for Modules. PhD thesis, Edinburgh University, 1998.

23. Zhong Shao. Transparent modules with fully syntactic signatures. In ICFP ’99:

Proceedings of the fourth ACM SIGPLAN international conference on Functional

programming, pages 220–232, New York, NY, USA, 1999. ACM.