Standard ML


Standard ML is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.
SML is a modern dialect of ML, the programming language used in the Logic for Computable Functions theorem-proving project. It is distinctive among widely used languages in that it has a formal specification, given as typing rules and operational semantics in The Definition of Standard ML.

Language

Standard ML is a functional programming language with some impure features. Programs written in Standard ML consist of expressions to be evaluated, as opposed to statements or commands, although some expressions return a trivial "unit" value and are only evaluated for their side-effects.
Like all functional programming languages, a key feature of Standard ML is the function, which is used for abstraction. For instance, the factorial function can be expressed as:
fun factorial n =
if n = 0 then 1 else n * factorial

A Standard ML compiler is required to infer the static type int -> int of this function without user-supplied type annotations. I.e., it has to deduce that n is only used with integer expressions, and must therefore itself be an integer, and that all value-producing expressions within the function return integers.
The same function can be expressed with clausal function definitions where the if-then-else conditional is replaced by a sequence of templates of the factorial function evaluated for specific values, separated by '|', which are tried one by one in the order written until a match is found:
fun factorial 0 = 1
| factorial n = n * factorial

This can be rewritten using a case statement like this:
val rec factorial =
fn n => case n of 0 => 1
| n => n * factorial

or in iterative:

fun factorialIT value =
let
val flag = ref value and i = ref 1
in
while !flag <> 0 do ;
!i
end;

or as a lambda function:
val rec factorial = fn 0 => 1 | n => n * factorial

Here, the keyword val introduces a binding of an identifier to a value, fn introduces the definition of an anonymous function, and case introduces a sequence of patterns and corresponding expressions.
Using a local function, this function can be rewritten in a more efficient tail recursive style.
fun factorial n = let
fun lp = acc
| lp = lp
in
lp
end

The encapsulation of an invariant-preserving tail-recursive tight loop with one or more accumulator parameters inside an invariant-free outer function, as seen here, is a common idiom in Standard ML, and appears with great frequency in SML code.

Type synonyms

A type synonym is defined with the type keyword. Here is a type synonym for points in the plane, and functions computing the distances between two points, and the area of a triangle with the given corners as per Heron's formula..
type loc = real * real
fun dist, ) = let
val dx = x1 - x0
val dy = y1 - y0
in
Math.sqrt
end
fun heron = let
val ab = dist
val bc = dist
val ac = dist
val perim = ab + bc + ac
val s = perim / 2.0
in
Math.sqrt * * )
end

Algebraic datatypes and pattern matching

Standard ML provides strong support for algebraic datatypes. An ML datatype can be thought of as a disjoint union of tuples. They are easy to define and easy to program with, in large part because of Standard ML's pattern matching as well as most Standard ML implementations' pattern exhaustiveness checking and pattern redundancy checking.
A datatype is defined with the datatype keyword, as in
datatype shape
= Circle of loc * real
| Square of loc * real
| Triangle of loc * loc * loc

Note: datatypes, not type synonyms, are necessary to define recursive constructors.
Order matters in pattern matching: patterns that are textually first are tried first. Pattern matching can be syntactically embedded in function definitions as follows:
fun area = 3.14 * r * r
| area = s * s
| area = heron

Note that subcomponents whose values are not needed in a particular computation are elided with underscores, or so-called wildcard patterns.
The so-called "clausal form" style function definition, where patterns appear immediately after the function name, is merely syntactic sugar for
fun area shape =
case shape
of Circle => 3.14 * r * r
| Square => s * s
| Triangle => heron

Pattern exhaustiveness checking will make sure each case of the datatype has been accounted for, and will produce a warning if not. The following pattern is inexhaustive:
fun center = c
| center ) =

There is no pattern for the Triangle case in the center function. The compiler will issue a warning that the pattern is inexhaustive, and if, at runtime, a Triangle is passed to this function, the exception Match will be raised.
The set of clauses in the following function definition is exhaustive and not redundant:
fun hasCorners = false
| hasCorners _ = true

If control gets past the first pattern, we know the value must be either a Square or a Triangle. In either of those cases, we know the shape has corners, so we can return true without discriminating which case we are in.
The pattern in the second clause of the following function is redundant:
fun f ) = x+y
| f = 1.0
| f _ = 0.0

Any value that matches the pattern in the second clause will also match the pattern in the first clause, so the second clause is unreachable. Therefore, this definition as a whole exhibits redundancy, and causes a compile-time warning.
C programmers can use tagged unions, dispatching on tag values, to accomplish what ML accomplishes with datatypes and pattern matching. Nevertheless, while a C program decorated with appropriate checks will be in a sense as robust as the corresponding ML program, those checks will of necessity be dynamic; ML provides a set of static checks that give the programmer a high degree of confidence in the correctness of the program at compile time.
Note that in object-oriented programming languages, such as Java, a disjoint union can be expressed by designing class hierarchies. However, as opposed to class hierarchies, ADTs are closed. This makes ADT extensible in a way that is orthogonal to the extensibility of class hierarchies. Class hierarchies can be extended with new subclasses but no new methods, while ADTs can be extended to provide new behavior for all existing constructors, but do not allow defining new constructors.

Higher-order functions

Functions can consume functions as arguments:
fun applyToBoth f x y =

Functions can produce functions as return values:
fun constantFn k = let
fun const anything = k
in
const
end

fun constantFn k =

Functions can also both consume and produce functions:
fun compose = let
fun h x = f
in
h
end

fun compose =

The function List.map from the basis library is one of the most commonly used higher-order functions in Standard ML:
fun map _ =
| map f = f x :: map f xs

fun map f xs = let
fun m = List.rev acc
| m = m
in
m
end

Exceptions

Exceptions are raised with the raise keyword, and handled with pattern matching handle constructs.
exception Undefined
fun max = x
| max = let val m = max xs in if x > m then x else m end
| max = raise Undefined
fun main xs = let
val msg = handle Undefined => "empty list...there is no max!"
in
print
end

The exception system can be exploited to implement non-local exit, an optimization technique suitable for functions like the following.
exception Zero
fun listProd ns = let
fun p = 1
| p = raise Zero
| p = h * p t
in
handle Zero => 0
end

When the exception Zero is raised in the 0 case, control leaves the function p altogether. Consider the alternative: the value 0 would be returned to the most recent awaiting frame, it would be multiplied by the local value of h, the resulting value would be returned in turn to the next awaiting frame, and so on. The raising of the exception allows control to leapfrog directly over the entire chain of frames and avoid the associated computation. It has to be noted that the same optimization could have been obtained by using a tail recursion for this example.

Module system

Standard ML has an advanced module system, allowing programs to be decomposed into hierarchically organized structures of logically related type and value declarations. SML modules provide not only namespace control but also abstraction, in the sense that they allow programmers to define abstract data types.
Three main syntactic constructs comprise the SML module system: structures, signatures and functors. A structure is a module; it consists of a collection of types, exceptions, values and structures packaged together into a logical unit. A signature is an interface, usually thought of as a type for a structure: it specifies the names of all the entities provided by the structure as well as the arities of type components, the types of value components, and signatures for substructures. The definitions of type components may or may not be exported; type components whose definitions are hidden are abstract types. Finally, a functor is a function from structures to structures; that is, a functor accepts one or more arguments, which are usually structures of a given signature, and produces a structure as its result. Functors are used to implement generic data structures and algorithms.
For example, the signature for a queue data structure might be:
signature QUEUE =
sig
type 'a queue
exception QueueError
val empty : 'a queue
val isEmpty : 'a queue -> bool
val singleton : 'a -> 'a queue
val insert : 'a * 'a queue -> 'a queue
val peek : 'a queue -> 'a
val remove : 'a queue -> 'a * 'a queue
end

This signature describes a module that provides a parameterized type queue of queues, an exception called QueueError, and six values providing the basic operations on queues. One can now implement the queue data structure by writing a structure with this signature:
structure TwoListQueue :> QUEUE =
struct
type 'a queue = 'a list * 'a list
exception QueueError
val empty =
fun isEmpty = true
| isEmpty _ = false

fun singleton a =
fun insert =
| insert =

fun peek = raise QueueError
| peek = a

fun remove = raise QueueError
| remove =
| remove =

end

This definition declares that TwoListQueue is an implementation of the QUEUE signature. Furthermore, the opaque ascription states that any type components whose definitions are not provided in the signature should be treated as abstract, meaning that the definition of a queue as a pair of lists is not visible outside the module. The body of the structure provides bindings for all of the components listed in the signature.
To use a structure, one can access its type and value members using "dot notation". For instance, a queue of strings would have type string TwoListQueue.queue, the empty queue is TwoListQueue.empty, and to remove the first element from a queue called q one would write TwoListQueue.remove q.
One popular algorithm for breadth-first search of trees makes uses of queues. Here we present a version of that algorithm parameterized over an abstract queue structure:
functor BFS =
struct
datatype 'a tree
= E
| T of 'a * 'a tree * 'a tree
fun bfsQ : 'a list =
if Q.isEmpty q then
else let
val = Q.remove q
in case t
of E => bfsQ q'
| T => let
val q = Q.insert
in
x :: bfsQ q

end
end
fun bfs t = bfsQ
end

Please note that inside the BFS structure, the program has no access to the particular queue representation in play. More concretely, there is no way for the program to, say, select the first list in the two-list queue representation, if that is indeed the representation being used. This data abstraction mechanism makes the breadth-first code truly agnostic to the queue representation choice.
This is in general desirable; in the present case, the queue structure can safely maintain any of the various logical invariants on which its correctness depends behind the bulletproof wall of abstraction.

Libraries

Standard

The SML Basis Library has been standardized and ships with most implementations. It provides modules for trees, arrays and other data structures as well as input/output and system interfaces.

Third party

For numerical computing, a Matrix module exists, https://www.cs.cmu.edu/afs/cs/project/pscico/pscico/src/matrix/README.html.
For graphics, cairo-sml is an open source interface to the Cairo graphics library.
For machine learning, a library for graphical models exists.

Code examples

Snippets of SML code are most easily studied by entering them into a "top-level", also known as a read-eval-print loop or REPL. This is an interactive session that prints the inferred types of resulting or defined expressions. Many SML implementations provide an interactive REPL, including SML/NJ:
$ sml
Standard ML of New Jersey v110.52
-
Code can then be entered at the "-" prompt. For example, to calculate 1+2*3:

- 1 + 2 * 3;
val it = 7 : int

The top-level infers the type of the expression to be "int" and gives the result "7".

Hello world

The following program, "hello.sml":

print "Hello world!\n";

can be compiled with MLton:
$ mlton hello.sml
and executed:
$./hello
Hello world!

Insertion sort

Insertion sort for lists of integers is expressed concisely as follows:

fun ins =
| ins = if then n::ns else h::
val insertionSort = List.foldr ins

This can be made polymorphic by abstracting over the ordering operator. Here we use the symbolic name << for that operator.

fun ins' << = let
fun i =
| i = if << then n::ns else h::i
in
i
end
fun insertionSort' << = List.foldr

The type of insertionSort' is -> -> .
An example call is:

- insertionSort' op< ;
val it = : int list

Mergesort

Here, the classic mergesort algorithm is implemented in three functions: split, merge and mergesort.
The function split is implemented with a local function named loop, which has two additional parameters. The local function loop is written in a tail-recursive style; as such it can be compiled efficiently. This function makes use of SML's pattern matching syntax to differentiate between non-empty list and empty list cases. For stability, the input list ns is reversed before being passed to loop.

local
fun loop = loop
| loop =
| loop =
in
fun split ns = loop
end

The local-in-end syntax could be replaced with a let-in-end syntax, yielding the equivalent definition:
fun split ns = let
fun loop = loop
| loop =
| loop =
in
loop
end

As with split, merge also uses a local function loop for efficiency. The inner loop is defined in terms of cases: when two non-empty lists are passed, when one non-empty list is passed, and when two empty lists are passed. Note the use of the underscore as a wildcard pattern.
This function merges two "ascending" lists into one ascending list. Note how the accumulator out is built "backwards", then reversed with List.rev before being returned. This is a common technique—build a list backwards, then reverse it before returning it. In SML, lists are represented as a linked list of conses, and thus it is efficient to prepend an element to a list, but inefficient to append an element to a list. The extra pass over the list is a linear time operation, so while this technique requires more wall clock time, the asymptotics are not any worse.

fun merge lt = let
fun loop =
if lt then loop
else loop
| loop = loop
| loop = loop
| loop = List.rev out
in
loop
end

The main function.

fun mergesort lt xs = let
val merge' = merge lt
fun ms =
| ms =
| ms xs = let
val = split xs
in
merge'
end
in
ms xs
end

Also note that the code makes no mention of variable types, with the exception of the :: and syntax which signify lists. This code will sort lists of any type, so long as a consistent ordering function lt can be defined. Using Hindley–Milner type inference, the compiler is capable of inferring the types of all variables, even complicated types such as that of the lt function.

Quicksort

Quicksort can be expressed as follows. This generic quicksort consumes an order operator <<.

fun quicksort << xs = let
fun qs =
| qs =
| qs = let
val = List.partition xs
in
qs less @ p :: qs more
end
in
qs xs
end

Writing a language interpreter

Note the relative ease with which a small expression language is defined and processed.

exception Err
datatype ty
= IntTy
| BoolTy
datatype exp
= True
| False
| Int of int
| Not of exp
| Add of exp * exp
| If of exp * exp * exp
fun typeOf = BoolTy
| typeOf = BoolTy
| typeOf = IntTy
| typeOf = if typeOf e = BoolTy then BoolTy else raise Err
| typeOf =
if andalso then IntTy else raise Err
| typeOf =
if typeOf e1 <> BoolTy then raise Err
else if typeOf e2 <> typeOf e3 then raise Err
else typeOf e2

fun eval = True
| eval = False
| eval = Int n
| eval =

| eval = let
val = eval e1
val = eval e2
in
Int
end
| eval =
if eval e1 = True then eval e2 else eval e3
fun chkEval e =

Example usage on correctly typed and incorrectly typed examples:

- val e1 = Add, Int);
val e1 = Add : exp
- chkEval e1;
val it = Int 3 : exp
- val e2 = Add;
val e2 = Add : exp
- chkEval e2;
uncaught exception Err

Arbitrary-precision factorial function (libraries)

In SML, the IntInf module provides arbitrary-precision integer arithmetic. Moreover, integer literals may be used as arbitrary-precision integers without the programmer having to do anything.
The following program "fact.sml" implements an arbitrary-precision factorial function and prints the factorial of 120:
fun fact n : IntInf.int =
if n=0 then 1 else n * fact
val =
print

and can be compiled and run with:

$ mlton fact.sml
$./fact
66895029134491270575881180540903725867527463331380298102956713523016335
57244962989366874165271984981308157637893214090552534408589408121859898
481114389650005964960521256960000000000000000000000000000

Numerical derivative (higher-order functions)

Since SML is a functional programming language, it is easy to create and pass around functions in SML programs. This capability has an enormous number of applications. Calculating the numerical derivative of a function is one such application. The following SML function "d" computes the numerical derivative of a given function "f" at a given point "x":

- fun d delta f x =
- f ) / ;
val d = fn : real -> -> real -> real

This function requires a small value "delta". A good choice for delta when using this algorithm is the cube root of the machine epsilon.
The type of the function "d" indicates that it maps a "float" onto another function with the type " -> real -> real". This allows us to partially apply arguments. This functional style is known as currying. In this case, it is useful to partially apply the first argument "delta" to "d", to obtain a more specialised function:

- val d = d 1E~8;
val d = fn : -> real -> real

Note that the inferred type indicates that the replacement "d" is expecting a function with the type "real -> real" as its first argument. We can compute a numerical approximation to the derivative of at with:

- d 3.0;
val it = 25.9999996644 : real

The correct answer is ;.
The function "d" is called a "higher-order function" because it accepts another function as an argument.
Curried and higher-order functions can be used to eliminate redundant code. For example, a library may require functions of type a -> b, but it is more convenient to write functions of type a * c -> b where there is a fixed relationship between the objects of type a and c. A higher order function of type -> can factor out this commonality. This is an example of the adapter pattern.

Discrete wavelet transform (pattern matching)

The 1D Haar wavelet transform of an integer-power-of-two-length list of numbers can be implemented very succinctly in SML and is an excellent example of the use of pattern matching over lists, taking pairs of elements off the front and storing their sums and differences on the lists "s" and "d", respectively:

- fun haar l = let
fun aux d = s :: d
| aux s d = aux s d
| aux s d = aux t
| aux _ _ _ = raise Empty
in
aux l
end;
val haar = fn : int list -> int list

For example:

- haar ;
val it = : int list

Pattern matching is a useful construct that allows complicated transformations to be represented clearly and succinctly. Moreover, SML compilers turn pattern matches into efficient code, resulting in programs that are not only shorter but also faster.

Implementations

Many SML implementations exist, including:
All of these implementations are open-source and freely available. Most are implemented themselves in SML. There are no longer any commercial SML implementations. Harlequin once produced a commercial IDE and compiler for SML called MLWorks. The company is now defunct. MLWorks passed on to Xanalys and was later acquired by Ravenbrook Limited on 2013-04-26 and open sourced.

Major projects using SML

The IT University of Copenhagen's entire enterprise architecture is implemented in around 100,000 lines of SML, including staff records, payroll, course administration and feedback, student project management, and web-based self-service interfaces.
The Isabelle proof assistant is written in SML.
SML is widely used in-house by compiler and chip designers, for example by ARM.