Posts Tagged: fsharp


28
Dec 15

A safer way to use F# with SQL CLR

Every blog post I’ve read about using F# with SQL CLR involves turning off SQL CLR security for the entire database like so:

alter database mydatabase set trustworthy on

This is because there is not currently a “safe” FSharp.Core, and so even when building with –standalone you must load the assemblies in to SQL as “unsafe”.

In our business we deal with real live bank data, and I would never be able to sell this to our ops team. Fortunately, there is a way to allow unsafe assemblies on a per-assembly basis as I found out here. It’s a little more complex, but not awful.

I’m going to gloss over some of the wiring things up code here, as it can be found in many other places.

1) As with any SQL CLR, you need to turn it on before you can use it


SP_CONFIGURE 'clr enabled', 1
GO
RECONFIGURE
GO

2) You must sign your assemblies to-be-loaded with an asymmetric key pair.

In F# you do this by adding an attribute to your assembly. I like to put these kinds of things in my AssemblyInfo.fs. You can find out more information on how to create a public-private key pair here. I also recommend compiling with the F# –standalone flag, so as to avoid having to pull in FSharp.Core as well.


[<assembly:AssemblyKeyFileAttribute("MyKeyPair.snk")>]
do ()

3) Make sure you’re in the Master database.


USE [Master]
GO

4) Pull in that asymmetric key into SQL Server and give it a name.


CREATE ASYMMETRIC KEY FSHARP_CLR_Key
FROM EXECUTABLE FILE = 'C:\MyProj\bin\Release\FSHARPCLR.dll'
GO

5) Create a login linked to this asymmetric key. You also need to give this user external assembly access.


CREATE LOGIN FSHARP_CLR_Login 
FROM ASYMMETRIC KEY FSHARP_CLR_Key

GRANT EXTERNAL ACCESS ASSEMBLY TO FSHARP_CLR_Login

GO

6) Move to the database where you’ll be deploying your assembly.


USE [Resources]
GO

7) Make a database user to run under.


CREATE USER FSHARP_CLR_Login 
FOR LOGIN FSHARP_CLR_Login
GO

8) Pull your assembly into SQL!

It’s still unsafe, but in this case it’s special dispensation for a single assembly with controllable permissions instead of whole-hog access.


CREATE ASSEMBLY FSHARP_CLR 
FROM 'C:\MyProj\bin\Release\FSHARPCLR.dll'
WITH PERMISSION_SET=UNSAFE
GO

9) Wire up the API with CREATE FUNCTION or CREATE PROCEDURE calls as you would normally.


CREATE FUNCTION [dbo].[StringDistance]
   (@name1 [nvarchar](64),
    @name2 [nvarchar](64))
RETURNS float
AS
EXTERNAL NAME [FSHARP_CLR].[Namespace.ClassName]
              .[CalcStringDistance]
GO

10) And now we can call it easily in SQL!


SELECT dbo.StringDistance('Richard', 'Rick')

That’s it!

This code was tested in SQL Server 2008r2. Please let me know if there are any future incompatibilities.


7
Dec 14

Developing an Algorithm in F#: Fast Rotational Alignments with Gosper’s Hack

This post is for the 7th day of the 2014 F# Advent Calendar.

It has been said that functional languages can’t be as fast as their imperative cousins because of all of the allocation and garbage collection, this is patently false (as far as F# is concerned at least) largely because the CLR has value types.

Recently Louis Thrall and I ran into an interesting problem at work: how do you efficiently compare all rotations of one array to another to select the best? This is easy when the arrays are the same size, just shift the indices around:

Same Size Rotations

Which is equivalent to a nested for loop, the top one rotating and the inner doing comparisons.

But what about differently sized arrays? It turns out this is more complicated, even for comparing a size 2 array to size 3:

mixedrotations

Like comparing beads on two strings, there is much shifting as the indices are rotated around. The extra freedom from this shifting means that more comparisons must be made even though there total number of combined elements is smaller.

This problem also has an interesting structure, as it costs the most when one array is about half the size of the other. When the smaller array is size 1 then it simply needs to be moved around, and when the arrays are equal in size it’s a simple matter of rotations.

After some research Lou noted that this we were “counting the number of cyclic suborders of size k of a cyclic order of size n” which produces k(n choose k) different configurations.

knplot

Now that we had bounds and knew what we were doing was reasonable we had to figure out an algorithm for this process. It occurred to us that it would be ideal to pre-compute the comparison values to avoid repeated comparisons. Given this matrix of comparison values it wasn’t too difficult to determine a procedure for checking all possible configurations.

matrixpaths

Each time a pair is chosen, it further constrains what the rest of the assignments may be. The bottom level is somewhat obvious, just select each of the possible starting combinations down the y-axis. In the next column you potentially have a range of possible “downward” (using clock arithmetic) selections bounded by the difference in lengths and what has already been fixed.

Now it was time to implement, and the first stab was nice and concise and functional and worked pretty well.

 1: open System
 2: 
 3: // arr1 is always equal in size or smaller than arr2
 4: let bestRot (f: 'a -> 'a -> float) (arr1: 'a []) (arr2: 'a []) =
 5:     // Pre-calculate similarities
 6:     let scores = 
 7:         Array2D.init arr1.Length arr2.Length 
 8:                     (fun i j -> f arr1.[i] arr2.[j])
 9:     // inner function for recursively finding paths
10:     // col = previous column, prow = previous row
11:     // df = degrees of freedom, path = path accumulator
12:     let rec inner col prow df path =
13:         seq {
14:             // when we're out of columns to assign we're done
15:             if col >= arr1.Length then yield path |> List.rev
16:             else
17:                 // We have df "degrees of freedom" left
18:                 for d = 1 to df + 1 do 
19:                     // Clock arithmetic for the row
20:                     let nrow = (prow + d) % arr2.Length
21:                     // Recurse yielding out all further paths
22:                     yield! inner (col + 1) nrow (df + 1 - d) 
23:                                  ((col, nrow) :: path)           
24:         }
25:     let res, score =
26:         // the difference in length 
27:         // is the starting "degrees of freedom"
28:         let diff = arr2.Length - arr1.Length
29:         seq {
30:             // for each y-axis starting point
31:             for y = 0 to arr2.Length - 1 do
32:                 // 1 selected, r is the starting point (on y), 
33:                 //    starting is always 0 on x. 
34:                 // ((0, y) :: []) is the accumulator 
35:                 //    with the initial (0, y) coordinates
36:                 yield! inner 1 y diff ((0, y) :: [])
37:         } 
38:         // Sum each path to find total similarity
39:         |> Seq.map 
40:             (fun l -> l, l |> Seq.sumBy (fun (i,j) -> scores.[i,j]))
41:         // Get the path with the highest similarity
42:         |> Seq.maxBy snd
43:     // Create output array and copy in the results
44:     let out = Array.create arr1.Length Int32.MinValue
45:     for (i,j) in res do
46:         out.[i] <- j
47:     // Return results
48:     out, score
namespace System
val bestRot : ('a -> 'a -> float) -> 'a [] -> 'a [] -> int [] * float

Full name: Snippet.bestRot

val f : ('a -> 'a -> float)
Multiple items
val float : 'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

  type: float<'Measure>
  implements: IComparable
  implements: IConvertible
  implements: IFormattable
  implements: IComparable<float<'Measure>>
  implements: IEquatable<float<'Measure>>
  inherits: ValueType

——————–
type float = Double

Full name: Microsoft.FSharp.Core.float

  type: float
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

val arr1 : 'a []

  type: 'a []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<'a>
  implements: Collections.Generic.ICollection<'a>
  implements: seq<'a>
  implements: Collections.IEnumerable
  inherits: Array

val arr2 : 'a []

  type: 'a []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<'a>
  implements: Collections.Generic.ICollection<'a>
  implements: seq<'a>
  implements: Collections.IEnumerable
  inherits: Array

val scores : float [,]

  type: float [,]
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IEnumerable
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  inherits: Array

module Array2D

from Microsoft.FSharp.Collections

val init : int -> int -> (int -> int -> 'T) -> 'T [,]

Full name: Microsoft.FSharp.Collections.Array2D.init

property Array.Length: int
val i : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val j : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val inner : (int -> int -> int -> (int * int) list -> seq<(int * int) list>)
val col : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val prow : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val df : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val path : (int * int) list

  type: (int * int) list
  implements: Collections.IStructuralEquatable
  implements: IComparable<List<int * int>>
  implements: IComparable
  implements: Collections.IStructuralComparable
  implements: Collections.Generic.IEnumerable<int * int>
  implements: Collections.IEnumerable

Multiple items
val seq : seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Core.Operators.seq

——————–
type seq<'T> = Collections.Generic.IEnumerable<'T>

Full name: Microsoft.FSharp.Collections.seq<_>

  type: seq<'T>
  inherits: Collections.IEnumerable

Multiple items
module List

from Microsoft.FSharp.Collections

——————–
type List<'T> =
  | ( [] )
  | ( :: ) of 'T * 'T list
  with
    interface Collections.IEnumerable
    interface Collections.Generic.IEnumerable<'T>
    member Head : 'T
    member IsEmpty : bool
    member Item : index:int -> 'T with get
    member Length : int
    member Tail : 'T list
    static member Cons : head:'T * tail:'T list -> 'T list
    static member Empty : 'T list
  end

Full name: Microsoft.FSharp.Collections.List<_>

  type: List<'T>
  implements: Collections.IStructuralEquatable
  implements: IComparable<List<'T>>
  implements: IComparable
  implements: Collections.IStructuralComparable
  implements: Collections.Generic.IEnumerable<'T>
  implements: Collections.IEnumerable

val rev : 'T list -> 'T list

Full name: Microsoft.FSharp.Collections.List.rev

val d : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val nrow : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val res : (int * int) list

  type: (int * int) list
  implements: Collections.IStructuralEquatable
  implements: IComparable<List<int * int>>
  implements: IComparable
  implements: Collections.IStructuralComparable
  implements: Collections.Generic.IEnumerable<int * int>
  implements: Collections.IEnumerable

val score : float

  type: float
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

val diff : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val y : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

module Seq

from Microsoft.FSharp.Collections

val map : ('T -> 'U) -> seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map

val l : (int * int) list

  type: (int * int) list
  implements: Collections.IStructuralEquatable
  implements: IComparable<List<int * int>>
  implements: IComparable
  implements: Collections.IStructuralComparable
  implements: Collections.Generic.IEnumerable<int * int>
  implements: Collections.IEnumerable

val sumBy : ('T -> 'U) -> seq<'T> -> 'U (requires member ( + ) and member get_Zero)

Full name: Microsoft.FSharp.Collections.Seq.sumBy

val maxBy : ('T -> 'U) -> seq<'T> -> 'T (requires comparison)

Full name: Microsoft.FSharp.Collections.Seq.maxBy

val snd : ('T1 * 'T2) -> 'T2

Full name: Microsoft.FSharp.Core.Operators.snd

val out : int []

  type: int []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<int>
  implements: Collections.Generic.ICollection<int>
  implements: seq<int>
  implements: Collections.IEnumerable
  inherits: Array

type Array =
  class
    member Clone : unit -> obj
    member CopyTo : System.Array * int -> unit
    member CopyTo : System.Array * int64 -> unit
    member GetEnumerator : unit -> System.Collections.IEnumerator
    member GetLength : int -> int
    member GetLongLength : int -> int64
    member GetLowerBound : int -> int
    member GetUpperBound : int -> int
    member GetValue : int [] -> obj
    member GetValue : int -> obj
    member GetValue : int64 -> obj
    member GetValue : int64 [] -> obj
    member GetValue : int * int -> obj
    member GetValue : int64 * int64 -> obj
    member GetValue : int * int * int -> obj
    member GetValue : int64 * int64 * int64 -> obj
    member Initialize : unit -> unit
    member IsFixedSize : bool
    member IsReadOnly : bool
    member IsSynchronized : bool
    member Length : int
    member LongLength : int64
    member Rank : int
    member SetValue : obj * int -> unit
    member SetValue : obj * int [] -> unit
    member SetValue : obj * int64 -> unit
    member SetValue : obj * int64 [] -> unit
    member SetValue : obj * int * int -> unit
    member SetValue : obj * int64 * int64 -> unit
    member SetValue : obj * int * int * int -> unit
    member SetValue : obj * int64 * int64 * int64 -> unit
    member SyncRoot : obj
    static member AsReadOnly<'T> : 'T [] -> System.Collections.ObjectModel.ReadOnlyCollection<'T>
    static member BinarySearch : System.Array * obj -> int
    static member BinarySearch<'T> : 'T [] * 'T -> int
    static member BinarySearch : System.Array * obj * System.Collections.IComparer -> int
    static member BinarySearch<'T> : 'T [] * 'T * System.Collections.Generic.IComparer<'T> -> int
    static member BinarySearch : System.Array * int * int * obj -> int
    static member BinarySearch<'T> : 'T [] * int * int * 'T -> int
    static member BinarySearch : System.Array * int * int * obj * System.Collections.IComparer -> int
    static member BinarySearch<'T> : 'T [] * int * int * 'T * System.Collections.Generic.IComparer<'T> -> int
    static member Clear : System.Array * int * int -> unit
    static member ConstrainedCopy : System.Array * int * System.Array * int * int -> unit
    static member ConvertAll<'TInput,'TOutput> : 'TInput [] * System.Converter<'TInput,'TOutput> -> 'TOutput []
    static member Copy : System.Array * System.Array * int -> unit
    static member Copy : System.Array * System.Array * int64 -> unit
    static member Copy : System.Array * int * System.Array * int * int -> unit
    static member Copy : System.Array * int64 * System.Array * int64 * int64 -> unit
    static member CreateInstance : System.Type * int -> System.Array
    static member CreateInstance : System.Type * int [] -> System.Array
    static member CreateInstance : System.Type * int64 [] -> System.Array
    static member CreateInstance : System.Type * int * int -> System.Array
    static member CreateInstance : System.Type * int [] * int [] -> System.Array
    static member CreateInstance : System.Type * int * int * int -> System.Array
    static member Exists<'T> : 'T [] * System.Predicate<'T> -> bool
    static member Find<'T> : 'T [] * System.Predicate<'T> -> 'T
    static member FindAll<'T> : 'T [] * System.Predicate<'T> -> 'T []
    static member FindIndex<'T> : 'T [] * System.Predicate<'T> -> int
    static member FindIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
    static member FindIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
    static member FindLast<'T> : 'T [] * System.Predicate<'T> -> 'T
    static member FindLastIndex<'T> : 'T [] * System.Predicate<'T> -> int
    static member FindLastIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
    static member FindLastIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
    static member ForEach<'T> : 'T [] * System.Action<'T> -> unit
    static member IndexOf : System.Array * obj -> int
    static member IndexOf<'T> : 'T [] * 'T -> int
    static member IndexOf : System.Array * obj * int -> int
    static member IndexOf<'T> : 'T [] * 'T * int -> int
    static member IndexOf : System.Array * obj * int * int -> int
    static member IndexOf<'T> : 'T [] * 'T * int * int -> int
    static member LastIndexOf : System.Array * obj -> int
    static member LastIndexOf<'T> : 'T [] * 'T -> int
    static member LastIndexOf : System.Array * obj * int -> int
    static member LastIndexOf<'T> : 'T [] * 'T * int -> int
    static member LastIndexOf : System.Array * obj * int * int -> int
    static member LastIndexOf<'T> : 'T [] * 'T * int * int -> int
    static member Resize<'T> : 'T [] * int -> unit
    static member Reverse : System.Array -> unit
    static member Reverse : System.Array * int * int -> unit
    static member Sort : System.Array -> unit
    static member Sort<'T> : 'T [] -> unit
    static member Sort : System.Array * System.Array -> unit
    static member Sort : System.Array * System.Collections.IComparer -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] -> unit
    static member Sort<'T> : 'T [] * System.Collections.Generic.IComparer<'T> -> unit
    static member Sort<'T> : 'T [] * System.Comparison<'T> -> unit
    static member Sort : System.Array * int * int -> unit
    static member Sort : System.Array * System.Array * System.Collections.IComparer -> unit
    static member Sort<'T> : 'T [] * int * int -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * System.Collections.Generic.IComparer<'TKey> -> unit
    static member Sort : System.Array * System.Array * int * int -> unit
    static member Sort : System.Array * int * int * System.Collections.IComparer -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int -> unit
    static member Sort<'T> : 'T [] * int * int * System.Collections.Generic.IComparer<'T> -> unit
    static member Sort : System.Array * System.Array * int * int * System.Collections.IComparer -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int * System.Collections.Generic.IComparer<'TKey> -> unit
    static member TrueForAll<'T> : 'T [] * System.Predicate<'T> -> bool
  end

Full name: System.Array

  type: Array
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IEnumerable
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable

val create : int -> 'T -> 'T []

Full name: Microsoft.FSharp.Collections.Array.create

type Int32 =
  struct
    member CompareTo : obj -> int
    member CompareTo : int -> int
    member Equals : obj -> bool
    member Equals : int -> bool
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> System.TypeCode
    member ToString : unit -> string
    member ToString : string -> string
    member ToString : System.IFormatProvider -> string
    member ToString : string * System.IFormatProvider -> string
    static val MaxValue : int
    static val MinValue : int
    static member Parse : string -> int
    static member Parse : string * System.Globalization.NumberStyles -> int
    static member Parse : string * System.IFormatProvider -> int
    static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> int
    static member TryParse : string * int -> bool
    static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * int -> bool
  end

Full name: System.Int32

  type: Int32
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

field Int32.MinValue = -2147483648

However, after real world trials we found that it was taking an astonishing 20% of our runtime. We had to find a better way. We soon found Gosper’s hack which did mostly what we wanted. It gave us the combinations, and we found we could just rotate each of those around to give us all of our cyclic suborders.

And so we went about rewriting our nice functional sample to use Gosper’s Hack to avoid the need to generate paths and also to be (nearly) minimally allocating to avoid garbage collection overhead.

 1: open System
 2: 
 3: // arr1 is always equal in size or smaller than arr2
 4: let bestRotGos (f: 'a -> 'a -> float) (arr1: 'a []) (arr2: 'a []) =
 5:     // Start Gosper's Hack Machinery Bootstrap
 6:     let bitsConfig = Array.create arr1.Length Int32.MinValue
 7:    
 8:     let inline setBits v =
 9:         let inline getBit x i = x &&& (1 <<< i) <> 0
10:         let mutable idx = 0
11:         for i = 0 to 31 do
12:             if getBit v i then
13:                 bitsConfig.[idx] <- i
14:                 idx <- idx + 1
15:  
16:     let inline nextGosper set =
17:         let c = set &&& -set
18:         let r = set + c
19:         r + (((r^^^set)/c)>>>2)
20:         
21:     let limit =
22:         let n = arr2.Length
23:         (1 <<< n)
24:  
25:     let mutable set =
26:         let k = arr1.Length
27:         (1 <<< k) - 1    
28:     // End Gosper's Hack Machinery Bootstrap
29: 
30:     // Pre-calculate sims and init placeholder variables
31:     let scores = Array2D.init arr1.Length arr2.Length 
32:                     (fun i j -> f arr1.[i] arr2.[j])    
33:                         
34:     let mutable maxScore = Double.NegativeInfinity
35:     let bestConfig = Array.create arr1.Length Int32.MinValue
36: 
37:     while (set < limit) do
38:         setBits set // Turn "set" bits into indices in "bitsConfig"
39: 
40:         // For each rotation
41:         for i = 0 to bitsConfig.Length - 1 do
42: 
43:             // calculate score and compare with previous best
44:             let mutable totalScore = 0.0
45:             for i = 0 to bitsConfig.Length - 1 do
46:                 let tokenScore = scores.[i,bitsConfig.[i]]
47:                 totalScore <- totalScore + tokenScore
48:             if totalScore > maxScore then
49:                 // and replace if better
50:                 maxScore <- totalScore
51:                 Array.blit bitsConfig 0 
52:                            bestConfig 0 
53:                            bitsConfig.Length 
54: 
55:             // Rotate the array
56:             let firstElement = bitsConfig.[0]
57:             for i = 0 to bitsConfig.Length - 2 do
58:                 bitsConfig.[i] <- bitsConfig.[i + 1]
59:             bitsConfig.[bitsConfig.Length - 1] <- firstElement
60:         
61:         set <- nextGosper set // Put the next combination in "set"
62:     bestConfig, maxScore
namespace System
val bestRotGos : ('a -> 'a -> float) -> 'a [] -> 'a [] -> int [] * float

Full name: Snippet.bestRotGos

val f : ('a -> 'a -> float)
Multiple items
val float : 'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

  type: float<'Measure>
  implements: IComparable
  implements: IConvertible
  implements: IFormattable
  implements: IComparable<float<'Measure>>
  implements: IEquatable<float<'Measure>>
  inherits: ValueType

——————–
type float = Double

Full name: Microsoft.FSharp.Core.float

  type: float
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

val arr1 : 'a []

  type: 'a []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<'a>
  implements: Collections.Generic.ICollection<'a>
  implements: seq<'a>
  implements: Collections.IEnumerable
  inherits: Array

val arr2 : 'a []

  type: 'a []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<'a>
  implements: Collections.Generic.ICollection<'a>
  implements: seq<'a>
  implements: Collections.IEnumerable
  inherits: Array

val bitsConfig : int []

  type: int []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<int>
  implements: Collections.Generic.ICollection<int>
  implements: seq<int>
  implements: Collections.IEnumerable
  inherits: Array

type Array =
  class
    member Clone : unit -> obj
    member CopyTo : System.Array * int -> unit
    member CopyTo : System.Array * int64 -> unit
    member GetEnumerator : unit -> System.Collections.IEnumerator
    member GetLength : int -> int
    member GetLongLength : int -> int64
    member GetLowerBound : int -> int
    member GetUpperBound : int -> int
    member GetValue : int [] -> obj
    member GetValue : int -> obj
    member GetValue : int64 -> obj
    member GetValue : int64 [] -> obj
    member GetValue : int * int -> obj
    member GetValue : int64 * int64 -> obj
    member GetValue : int * int * int -> obj
    member GetValue : int64 * int64 * int64 -> obj
    member Initialize : unit -> unit
    member IsFixedSize : bool
    member IsReadOnly : bool
    member IsSynchronized : bool
    member Length : int
    member LongLength : int64
    member Rank : int
    member SetValue : obj * int -> unit
    member SetValue : obj * int [] -> unit
    member SetValue : obj * int64 -> unit
    member SetValue : obj * int64 [] -> unit
    member SetValue : obj * int * int -> unit
    member SetValue : obj * int64 * int64 -> unit
    member SetValue : obj * int * int * int -> unit
    member SetValue : obj * int64 * int64 * int64 -> unit
    member SyncRoot : obj
    static member AsReadOnly<'T> : 'T [] -> System.Collections.ObjectModel.ReadOnlyCollection<'T>
    static member BinarySearch : System.Array * obj -> int
    static member BinarySearch<'T> : 'T [] * 'T -> int
    static member BinarySearch : System.Array * obj * System.Collections.IComparer -> int
    static member BinarySearch<'T> : 'T [] * 'T * System.Collections.Generic.IComparer<'T> -> int
    static member BinarySearch : System.Array * int * int * obj -> int
    static member BinarySearch<'T> : 'T [] * int * int * 'T -> int
    static member BinarySearch : System.Array * int * int * obj * System.Collections.IComparer -> int
    static member BinarySearch<'T> : 'T [] * int * int * 'T * System.Collections.Generic.IComparer<'T> -> int
    static member Clear : System.Array * int * int -> unit
    static member ConstrainedCopy : System.Array * int * System.Array * int * int -> unit
    static member ConvertAll<'TInput,'TOutput> : 'TInput [] * System.Converter<'TInput,'TOutput> -> 'TOutput []
    static member Copy : System.Array * System.Array * int -> unit
    static member Copy : System.Array * System.Array * int64 -> unit
    static member Copy : System.Array * int * System.Array * int * int -> unit
    static member Copy : System.Array * int64 * System.Array * int64 * int64 -> unit
    static member CreateInstance : System.Type * int -> System.Array
    static member CreateInstance : System.Type * int [] -> System.Array
    static member CreateInstance : System.Type * int64 [] -> System.Array
    static member CreateInstance : System.Type * int * int -> System.Array
    static member CreateInstance : System.Type * int [] * int [] -> System.Array
    static member CreateInstance : System.Type * int * int * int -> System.Array
    static member Exists<'T> : 'T [] * System.Predicate<'T> -> bool
    static member Find<'T> : 'T [] * System.Predicate<'T> -> 'T
    static member FindAll<'T> : 'T [] * System.Predicate<'T> -> 'T []
    static member FindIndex<'T> : 'T [] * System.Predicate<'T> -> int
    static member FindIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
    static member FindIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
    static member FindLast<'T> : 'T [] * System.Predicate<'T> -> 'T
    static member FindLastIndex<'T> : 'T [] * System.Predicate<'T> -> int
    static member FindLastIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
    static member FindLastIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
    static member ForEach<'T> : 'T [] * System.Action<'T> -> unit
    static member IndexOf : System.Array * obj -> int
    static member IndexOf<'T> : 'T [] * 'T -> int
    static member IndexOf : System.Array * obj * int -> int
    static member IndexOf<'T> : 'T [] * 'T * int -> int
    static member IndexOf : System.Array * obj * int * int -> int
    static member IndexOf<'T> : 'T [] * 'T * int * int -> int
    static member LastIndexOf : System.Array * obj -> int
    static member LastIndexOf<'T> : 'T [] * 'T -> int
    static member LastIndexOf : System.Array * obj * int -> int
    static member LastIndexOf<'T> : 'T [] * 'T * int -> int
    static member LastIndexOf : System.Array * obj * int * int -> int
    static member LastIndexOf<'T> : 'T [] * 'T * int * int -> int
    static member Resize<'T> : 'T [] * int -> unit
    static member Reverse : System.Array -> unit
    static member Reverse : System.Array * int * int -> unit
    static member Sort : System.Array -> unit
    static member Sort<'T> : 'T [] -> unit
    static member Sort : System.Array * System.Array -> unit
    static member Sort : System.Array * System.Collections.IComparer -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] -> unit
    static member Sort<'T> : 'T [] * System.Collections.Generic.IComparer<'T> -> unit
    static member Sort<'T> : 'T [] * System.Comparison<'T> -> unit
    static member Sort : System.Array * int * int -> unit
    static member Sort : System.Array * System.Array * System.Collections.IComparer -> unit
    static member Sort<'T> : 'T [] * int * int -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * System.Collections.Generic.IComparer<'TKey> -> unit
    static member Sort : System.Array * System.Array * int * int -> unit
    static member Sort : System.Array * int * int * System.Collections.IComparer -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int -> unit
    static member Sort<'T> : 'T [] * int * int * System.Collections.Generic.IComparer<'T> -> unit
    static member Sort : System.Array * System.Array * int * int * System.Collections.IComparer -> unit
    static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int * System.Collections.Generic.IComparer<'TKey> -> unit
    static member TrueForAll<'T> : 'T [] * System.Predicate<'T> -> bool
  end

Full name: System.Array

  type: Array
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IEnumerable
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable

val create : int -> 'T -> 'T []

Full name: Microsoft.FSharp.Collections.Array.create

property Array.Length: int
type Int32 =
  struct
    member CompareTo : obj -> int
    member CompareTo : int -> int
    member Equals : obj -> bool
    member Equals : int -> bool
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> System.TypeCode
    member ToString : unit -> string
    member ToString : string -> string
    member ToString : System.IFormatProvider -> string
    member ToString : string * System.IFormatProvider -> string
    static val MaxValue : int
    static val MinValue : int
    static member Parse : string -> int
    static member Parse : string * System.Globalization.NumberStyles -> int
    static member Parse : string * System.IFormatProvider -> int
    static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> int
    static member TryParse : string * int -> bool
    static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * int -> bool
  end

Full name: System.Int32

  type: Int32
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

field Int32.MinValue = -2147483648
val setBits : (int -> unit)
val v : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val getBit : (int -> int32 -> bool)
val x : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val i : int32

  type: int32
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val mutable idx : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val i : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val nextGosper : ('b -> 'd) (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val set : 'b (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val c : 'b (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val r : 'b (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val limit : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val n : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val mutable set : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val k : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val scores : float [,]

  type: float [,]
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IEnumerable
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  inherits: Array

module Array2D

from Microsoft.FSharp.Collections

val init : int -> int -> (int -> int -> 'T) -> 'T [,]

Full name: Microsoft.FSharp.Collections.Array2D.init

val j : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

val mutable maxScore : float

  type: float
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

type Double =
  struct
    member CompareTo : obj -> int
    member CompareTo : float -> int
    member Equals : obj -> bool
    member Equals : float -> bool
    member GetHashCode : unit -> int
    member GetTypeCode : unit -> System.TypeCode
    member ToString : unit -> string
    member ToString : string -> string
    member ToString : System.IFormatProvider -> string
    member ToString : string * System.IFormatProvider -> string
    static val MinValue : float
    static val MaxValue : float
    static val Epsilon : float
    static val NegativeInfinity : float
    static val PositiveInfinity : float
    static val NaN : float
    static member IsInfinity : float -> bool
    static member IsNaN : float -> bool
    static member IsNegativeInfinity : float -> bool
    static member IsPositiveInfinity : float -> bool
    static member Parse : string -> float
    static member Parse : string * System.Globalization.NumberStyles -> float
    static member Parse : string * System.IFormatProvider -> float
    static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> float
    static member TryParse : string * float -> bool
    static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * float -> bool
  end

Full name: System.Double

  type: Double
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

field Double.NegativeInfinity = -Infinity
val bestConfig : int []

  type: int []
  implements: ICloneable
  implements: Collections.IList
  implements: Collections.ICollection
  implements: Collections.IStructuralComparable
  implements: Collections.IStructuralEquatable
  implements: Collections.Generic.IList<int>
  implements: Collections.Generic.ICollection<int>
  implements: seq<int>
  implements: Collections.IEnumerable
  inherits: Array

val mutable totalScore : float

  type: float
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

val tokenScore : float

  type: float
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<float>
  implements: IEquatable<float>
  inherits: ValueType

val blit : 'T [] -> int -> 'T [] -> int -> int -> unit

Full name: Microsoft.FSharp.Collections.Array.blit

val firstElement : int

  type: int
  implements: IComparable
  implements: IFormattable
  implements: IConvertible
  implements: IComparable<int>
  implements: IEquatable<int>
  inherits: ValueType

A quick test shows that the new algorithm is about seven times faster across a variety of test inputs:

 1: let f x y = if x = y then 1.0 else 0.0
 2: 
 3: let cmp = [|"one"; "two"; "three"; "four"|]
 4: let cmps = 
 5:     [| for i = 0 to cmp.Length - 1 do 
 6:            for j = i to cmp.Length - 1 do 
 7:                // Longer array is second
 8:                yield cmp.[j..], cmp.[i..] 
 9:     |] 
10: 
11: #time
12: for i = 0 to 1000000 do
13:     for c1,c2 in cmps do 
14:         bestRot f c1 c2 |> ignore
15: //Real: 00:00:41.800, CPU: 00:00:41.812, 
16: // GC gen0: 10175, gen1: 6, gen2: 1
17: #time 
18: 
19: #time
20: for i = 0 to 1000000 do
21:     for c1,c2 in cmps do 
22:         bestRotGos f c1 c2 |> ignore
23: //Real: 00:00:06.378, CPU: 00:00:06.375, 
24: // GC gen0: 689, gen1: 1, gen2: 0
25: #time 
26: 
val f : 'a -> 'a -> float (requires equality)

Full name: Snippet.f

val x : 'a (requires equality)
val y : 'a (requires equality)
val cmp : string []

Full name: Snippet.cmp

  type: string []
  implements: System.ICloneable
  implements: System.Collections.IList
  implements: System.Collections.ICollection
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.IStructuralEquatable
  implements: System.Collections.Generic.IList<string>
  implements: System.Collections.Generic.ICollection<string>
  implements: seq<string>
  implements: System.Collections.IEnumerable
  inherits: System.Array

val cmps : (string [] * string []) []

Full name: Snippet.cmps

  type: (string [] * string []) []
  implements: System.ICloneable
  implements: System.Collections.IList
  implements: System.Collections.ICollection
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.IStructuralEquatable
  implements: System.Collections.Generic.IList<string [] * string []>
  implements: System.Collections.Generic.ICollection<string [] * string []>
  implements: seq<string [] * string []>
  implements: System.Collections.IEnumerable
  inherits: System.Array

val i : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

property System.Array.Length: int
val j : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

val c1 : string []

  type: string []
  implements: System.ICloneable
  implements: System.Collections.IList
  implements: System.Collections.ICollection
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.IStructuralEquatable
  implements: System.Collections.Generic.IList<string>
  implements: System.Collections.Generic.ICollection<string>
  implements: seq<string>
  implements: System.Collections.IEnumerable
  inherits: System.Array

val c2 : string []

  type: string []
  implements: System.ICloneable
  implements: System.Collections.IList
  implements: System.Collections.ICollection
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.IStructuralEquatable
  implements: System.Collections.Generic.IList<string>
  implements: System.Collections.Generic.ICollection<string>
  implements: seq<string>
  implements: System.Collections.IEnumerable
  inherits: System.Array

val ignore : 'T -> unit

Full name: Microsoft.FSharp.Core.Operators.ignore

Of course, there are a few more things that could be done to make this even faster mostly related to further reducing allocations. For example, we could potentially use larger fixed sized arrays if we knew our input was bounded and pass in the arrays as references letting the calling function handle proper recycling.

However, at this point the function no longer appeared anywhere near significant when profiling our overall usage and so we found this was fast enough. It’s not worth making things uglier and more prone to breakage for minor gains when what we have right now is a referentially transparent function with predictable behavior.


18
Jul 13

Come join me at the SkillsMatter F# Tutorials NYC 2013

Last year was our first NYC F# tutorials and they were just amazing (you can read about them here) but this year’s are going to be even better. We’ve got a lineup including some of the most talented teachers in the F# community, and the tickets are extremely inexpensive as conferences and training events go.

Looking to learn F#? Our beginner track is jam packed with hands on exercises. It’s was amazing to see what just two days of training can do. A C# co-worker of mine was a beginner track attendee last year and delivered a project in F# just the next week.

Already have some serious F# skills? In our advanced track we’ve got a lineup that will push those skills to the limit. I personally am particularly excited to dig into the F# compiler with Don and Tomas.

Now that I’ve had my say, here’s the official spiel:

On the back of the success of the 2013 edition, the Progressive F# Tutorials return to New York in September – this time packing an even bigger punch! With F# UG lead Rick Minerich at the helm, we’ve put together a expert filled line-up – featuring Don Syme (creator of F#), Tomas Petricek, and Miguel de Icaza. The Tutorials will be split in two – a beginners track for those eager to unleash F#’s full power, and a ‘meaty track’ for those more experience f#pers amongst you! Each session will be a 4 hour hands-on deep dive, brushing aside the traditional format of conferences to allow you to truly immerse into the subject topic.

Want to get involved? We’re giving a special community 20% discount! Just go ahead and enter SkillsMatter_Community on the booking form and the team at Skills Matter will look forward to welcoming you to NYC this September!

Check out our schedule.
Purchase tickets.
Read about last year’s tutorials.

Are you as excited as I am yet?


3
Jul 13

On Type Safety, Representable States and Erlang

Close your eyes and imagine your program as a function that takes a set of inputs and produces a set of outputs. I know this may seem overly simple, but a set of actions in a GUI can be thought of as a set of inputs, and a set of resulting side effects to a database can be seen as a new state of the world being returned.

Now focus on its input space. This space is comprised as all possible combinations of all possible inputs. In this set some will be well defined for your program and some not. An example of a not well defined input could be as simple as an incorrect database connection string, as straightforward as an incorrect combination of flags on a console application, or as difficult to detect as a date with month and day transposed.

Input Space

A program thought of in this way is a fractal-like thing, a program made of little smaller programs, made of smaller programs yet. However, there’s no guarantee that each of these smaller programs will treat of a piece of data in exactly the same way as others. In addition to any initial validation, any top-level inputs which cause other inputs to be given to sub-programs where they are not properly handled are similarly considered not well defined. Consider these three approaches to making your program safer by reducing the size of incorrect input space:

First, you can increase the size of the blue circle with explicit input checking. This means numerous validations to ensure the program exits with proper notification when incorrect inputs are given. However, the program is fractal, and so if we want to be safe we’ll need to reproduce many of these checks fractally. A great example of this is handling null values and we all know how that turns out.

Another approach is to shrink the size of the red circle. We can do this by making fewer incorrect states representable with types. Because we know that all of the potential states are valid once encoded, we only need to do our checks once while marshalling our data into a well-typed representation. This eliminates almost all need for repeated validation, limited only by how far your type system will take you. Even better, with newer language features (such as F# type providers) we can eliminate much of this marshalling phase, this is however similarly limited by how far the schema of the data will take you.

A third approach, available only in some situations but which I find extremely fascinating, is to build everything in such a way as the entire program logs the error and resets its state when an incorrect input is found. Most paradoxically, in this case the more fragile you make your system, the safer it is (as long as you ensure that external state changes are the very last thing done, and that they’re done transactionally). This seems to be the Erlang philosophy and the only flaw I can find with it is shared with most type systems. That is, you can’t implicitly account for ambiguous inputs or state spaces that your type system can’t constrain.


27
Mar 13

Setting up F# Interactive for Machine Learning with Large Datasets

Before getting started with Machine Learning in F# Interactive it’s best to prepare it for large datasets and external 64-bit libraries so you don’t get blindsided with strange errors when you happen to cross the line. The good news is it’s a simple process that should only take a few minutes.

The first step is to go into the configuration and set fsi to 64-bit. It’s only matter of changing a boolean value buried deep in the Visual Studio settings. First, Go into Tools->Settings.

Tools->Settings

Then find the “F# Tools” section on the left and select the “F# Interactive” subsection.

tools_settings_fsi

Finally, set “64-bit F# Interactive” to true and click OK.

tools_settings_fsi_64-bit

What this does is set Visual Studio to use “FsiAnyCPU.exe” for the F# Interactive window instead of 32-bit “Fsi.exe”.

Now, after we restart Visual Studio, your F# Interactive is running with as many bits as your operating system can handle. However, if we want to support really big matrices we’re going to need to go a bit further. If we want really large arrays, that is greater than 2 gigabytes, we’re going to need to fiddle with the F# Interactive application config and enable the “gcAllowVeryLargeObjects” attribute.

For .NET 4.5 on Windows 7, Windows 8 and Windows Sever 2008R2 the standard directory for both the fsi exeuctables and their application configs is:

“C:\Program Files (x86)\Microsoft SDKs\F#\3.0\Framework\v4.0”

Navigate there and open “FsiAnyCPU.exe.config” in your favorite text editor. Then under the <runtime> tag add:

<gcAllowVeryLargeObjects enabled="true" />

When you’re done it should look like:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
    <legacyUnhandledExceptionPolicy enabled="true" />
    <assemblyBinding 
      xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity
          name="FSharp.Core"
          publicKeyToken="b03f5f7f11d50a3a"
          culture="neutral"/>
        <bindingRedirect
          oldVersion="2.0.0.0"
          newVersion="4.3.0.0"/>
        <bindingRedirect
          oldVersion="4.0.0.0"
          newVersion="4.3.0.0"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

Just save and restart Visual Studio and you’re done! Your F# Interactive can now handle large datasets and loading external 64-bit native libraries.


4
Sep 12

Levenshtein Distance and the Triangle Inequality

Levenshtein distance is one of my favorite algorithms. On the surface it seems so very simple, but when you spend some time thinking hard on it deep insights are waiting to be had.

The first and most important thing about Levenshtein distance is it’s actually a metric distance. That is, it obeys the triangle inequality. For most other string distance measurements this property doesn’t hold.

The Vector Triangle Inequality

The Vector Triangle Inequality

This might not seem like such a big deal, but this property gives the measurements meaning in a context larger than just the pair. For example, it allows you to embed your pair distance measurements into a higher dimensional space and so use it for things like clustering.

This was one of the first insights that Levenshtein distance gave me: A measurement doesn’t need to give you an absolute location in space to be useful for telling you where you are, it just has to tell you how far away everything else is. But what is it about Levenshtein distance that gives it this property? It’s not immediately obvious to most people, at least it wasn’t to me.

First, let’s consider a naive implementation of the Wagner-Fisher algorithm for Levenshtein distance. As stated above, here the triangle inequality holds.

 1: let wagnerFischer (s: string) (t: string) =
 2:     let m = s.Length
 3:     let n = t.Length
 4:     let d = Array2D.create (m + 1) (n + 1) 0
 5: 
 6:     for i = 0 to m do d.[i, 0] <- i
 7:     for j = 0 to n do d.[0, j] <- j    
 8: 
 9:     for j = 1 to n do
10:         for i = 1 to m do
11:             if s.[i-1] = t.[j-1] then
12:                 d.[i, j] <- d.[i-1, j-1]
13:             else
14:                 d.[i, j] <-
15:                     List.min
16:                         [
17:                             // a deletion
18:                             d.[i-1, j  ] + 1; 
19:                             // an insertion
20:                             d.[i  , j-1] + 1; 
21:                             // a substitution
22:                             d.[i-1, j-1] + 1; 
23:                         ]
24:     d.[m,n]

Now compare this with an incorrect version of an extension called Damerau–Levenshtein distance (or restricted edit distance). This change adds support for Jaro-Winkler like transpositions to the original algorithm. However, in the process of adding just this minor tweak we lose the triangle inequality.

26: let damerauLevenshtein (s: string) (t: string) =
27:     let m = s.Length
28:     let n = t.Length
29:     let d = Array2D.create (m + 1) (n + 1) 0
30: 
31:     for i = 0 to m do d.[i, 0] <- i
32:     for j = 0 to n do d.[0, j] <- j    
33: 
34:     for j = 1 to n do
35:         for i = 1 to m do
36:             // 1 if a substitution
37:             // 0 if no change
38:             let cost = if s.[i-1] = t.[j-1] then 0 else 1
39:             d.[i, j] <-
40:                 List.min
41:                     [
42:                         // a deletion
43:                         d.[i-1, j  ] + 1; 
44:                         // an insertion
45:                         d.[i  , j-1] + 1; 
46:                         // a substitution or nothing
47:                         d.[i-1, j-1] + cost;
48:                     ]
49:             if // boundary check
50:                i > 1 && j > 1 
51:                // transposition check
52:             && s.[i-1] = t.[j-2] && s.[i-2] = t.[j-1] 
53:             then // the lesser of a transposition or current cost
54:                 d.[i, j] <- min d.[i,j] (d.[i-2, j-2] + cost)
55:     d.[m,n]

val wagnerFischer : string -> string -> int

Full name: Snippet.wagnerFischer

val s : string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>

Multiple items

val string : 'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

——————–

type string = System.String

Full name: Microsoft.FSharp.Core.string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>

val t : string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>

val m : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

property System.String.Length: int
val n : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

val d : int [,]

  type: int [,]
  implements: System.ICloneable
  implements: System.Collections.IList
  implements: System.Collections.ICollection
  implements: System.Collections.IEnumerable
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.IStructuralEquatable
  inherits: System.Array

module Array2D

from Microsoft.FSharp.Collections

val create : int -> int -> 'T -> 'T [,]

Full name: Microsoft.FSharp.Collections.Array2D.create

val i : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

val j : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

Multiple items

module List

from Microsoft.FSharp.Collections

——————–

type List<'T> =
  | ( [] )
  | ( :: ) of 'T * 'T list
  with
    interface System.Collections.IEnumerable
    interface System.Collections.Generic.IEnumerable<'T>
    member Head : 'T
    member IsEmpty : bool
    member Item : index:int -> 'T with get
    member Length : int
    member Tail : 'T list
    static member Cons : head:'T * tail:'T list -> 'T list
    static member Empty : 'T list
  end

Full name: Microsoft.FSharp.Collections.List<_>

  type: List<'T>
  implements: System.Collections.IStructuralEquatable
  implements: System.IComparable<List<'T>>
  implements: System.IComparable
  implements: System.Collections.IStructuralComparable
  implements: System.Collections.Generic.IEnumerable<'T>
  implements: System.Collections.IEnumerable

val min : 'T list -> 'T (requires comparison)

Full name: Microsoft.FSharp.Collections.List.min

val damerauLevenshtein : string -> string -> int

Full name: Snippet.damerauLevenshtein

val cost : int

  type: int
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<int>
  implements: System.IEquatable<int>
  inherits: System.ValueType

val min : 'T -> 'T -> 'T (requires comparison)

Full name: Microsoft.FSharp.Core.Operators.min

It seems like such a simple and obvious addition to the algorithm. Just what is it about about the way we’ve added transpositions that ruins the magic? We’ve just put something like wormholes to our little universe. That’s right, the simple addition of transpositions in this way implies a universe where some combinations of characters treat space differently than everything else. The easiest way to prove this is the case is to give the definition of the triangle inequality for metric spaces a read.

From Wikipedia’s Triangle Inequality article:
In a metric space M with metric d, the triangle inequality is a requirement upon distance: d(x, z) <= d(x, y) + d(y, z) for all x, y, z in M. That is, the distance from x to z is at most as large as the sum of the distance from x to y and the distance from y to z.

From this, it’s easy to construct a counterexample for our broken Damerau-Levenshtein distance simply by exploiting the transpositions.

Damerau-Levenshtein distance not satisfying the triangle inequality.

As you can see in this picture 4 is most certainly greater than 1 + 2, and so the triangle inequality is broken. Consider also the pathology that this example shows in the algorithm. Why didn’t it just go along the irkc –> rick –> rcik path when it’s obviously less expensive?

Levenshtein distance satisfying the triangle inequality.

For comparison, if we measure those same pairs with standard Levenshtein distance everything is just peachy.

So we now know of at least one case which causes the triangle inequality to fail, does this imply what causes it to succeed? I think yes, at least in a limited sense. We can see that with Levenshtein distance any given pair of characters is considered independently, changes at each only happen once and are exactly one character in size. As each pair in the strings is considered, small changes push them further and further apart, but in discrete and equal units for discrete and equal changes. While in our Damerau-Levenshtein distance implementation we greedily perform operations of a larger size and then never revisit their implications, standard Levenshtein is reversibly symmetric both in how it treats locations over the string as well as in how it treats the characters themselves due to its uniform granularity. The uniform granularity of the changes ensures all important paths are explored.

Can transpositions be reliably counted with a different approach? We’ll find out the answer to this question next time.


31
Aug 12

What is good API design?

Some say that API design is one of the hardest things in programming. A few even go as far as to say you should have at least 10 years of experience to even attempt it. While I think this process can be sped up almost an order of magnitude by good mentorship, at one time or another we’ve all suffered under the API of an inexperienced programmer. Though, this does raise the question: what exactly is it about building libraries that can take up to 10 years to learn?

I was lucky in that I got a strict API education early on. Right out of college I joined Atalasoft, a company for which the API was the product and so was under the strictest of scrutiny. My mentor was Steve Hawley, a man who has spent much of his life solving difficult problems and wrapping them up in nice little packages. Steve had little patience for babysitting as he always had a lot on his plate and so under him I was forced to learn very quickly.

His philosophy, which was never explicitly stated, I call 90-9-.9. For 90% of the users you want the problem solved out of the box with just a couple of lines of code that can be cut and pasted. Here defaults matter the most. For the next 9% you’re aiming for simple configuration; something that can be easily figured out from the documentation or resolved in just a few minutes by the support team. Then there’s the .9% who will want to bend your libraries in all kinds of twisted ways, sometimes for performance or and other times some wacky (but workable) use case you never thought of. It’s completely fine to sacrifice the experience of the .9% for the sake of everyone else, just make sure it’s possible to get what they want done and that your documentation will show them the way.

Finally, there’s the unmentioned .1% who you’ll never make happy because they’re mistaken about the capabilities of your product. Better to either ignore them, or do market research to see if they’re worth the cost of extending your library to pull them in.

A great example of this is Atalasoft’s barcode product. A lot of effort went into carefully tuning it to preprocess most scanned documents without issue. After preprocessing it will by default go whole hog and try every kind of possible barcode type that you have a license for. This is still quite fast, fast enough for anyone with a small time scanning operation. Sometimes for folks doing large scale batch scanning on expensive equipment it’s just not fast enough though, so they can configure which barcode scanners are used by changing a simple enumeration property. Once in a while they get folks doing things that are a bit more difficult, like for example maybe trying to scan a barcode wrapped around a banana. For this there are events that let you interrupt, tweak and replace whole chunks of the barcode engine. But the guy who wants to read the barcodes he hand shaved into the side of the dogs in his pet store? Sorry pal, you’re better off finding another product.

When I first saw this it seemed like bad design. The whole component is like a frickin’ monolithic program with an event based do-it-yourself plugin system! You see though, aesthetic beauty as judged by an architecture astronaut isn’t what Atalasoft is optimizing for. They’re optimizing for reduction of the customer support burden. As much as I dislike object oriented programming for writing the internals of libraries like these, I think there’s no better paradigm for exposing a single simple interface that allows for manipulation at all of these levels.

Now, for the past two years I’ve been in charge of the APIs at Bayard Rock, a completely different kind of company. We do research and development primarily for anti-money laundering. This means lots of little experiments and the occasional medium-scale project which will later be integrated into our sister company’s larger infrastructure. In the vast majority of cases Atalasoft-style monolithic black-boxing wouldn’t be helpful at all. We only have one customer and we work with them to tailor our external APIs directly to their needs.

However, code reuse at a fine grained level is much more important at Bayard Rock than it was at Atalasoft. In this context what matters most is the construction of large libraries full of many small categorized functions which we can first use to put together experiments quickly (that is, through simple composition and without comprehensive unit tests) but later still feel confident about shipping in our product. We’re optimizing for experimentation and the rapid development of components that we trust enough to ship. It should come as no surprise that here typed functional programming wins hands down.

So, what is good API design? It depends, and that’s why it’s so hard.


3
Oct 11

Advice for Getting Started with F#

I had a great time at NYC Code Camp this last weekend. About half the people in my talk already knew F# and were there to talk about Type Providers, the other half just came to see what this F# thing was all about. This post is to help those in the second half begin their F# learning adventure.

The first thing anyone who is looking to get started with F# should do is work through the tutorial examples on tryfsharp.org. Getting some hands-on time with the basics is extremely important, especially if you haven’t done much functional programming before.

Once you’ve got the basics down, the next step is thinking through some real problems in F#. For me, this was writing an Ant Colony simulation. With the version linked to here you can actually compile and then try your own ant behavior code against mine! Other great options include the Coding Dojo Katas or Project Euler (for the mathematically inclined).

It’s best if you do most of your play using .fsx files and F# interactive at first. Just highlighting code and hitting Alt-Enter makes the process of learning so much faster. However, if you insist on working with unit tests I suggest Xunit as it doesn’t require the use of classes for hosting your tests.

At first it’s also best to ignore most of F#’s features, it can be overwhelming otherwise. Try your best to focus on comprehensions. These may not always be the best way to solve a problem, but they allow you to write very imperative code right in the middle of a bunch of functional stuff and so are kind of an escape hatch for when you don’t know how to do what you want functionally. Eventually you’ll want to move on to recursion and folding though. When you’re ready to make the jump read this post which talks about how to go about it.

As you’re playing with F#, don’t be afraid to look things up along the way when you get stuck. There’s quite a few good resources around for F# including the F# Programming WikiBook, the code samples provided by the F# team, and a number of print books including Professional F# 2.0 (shameless plug). When all else fails the MSDN documentation for F# is actually very good!

I can say from experience that at first you will likely experience quite a few syntax errors. Don’t worry too much about it, your brain needs to build up a model of how the types and syntax work. Once you’ve gotten over that hurdle I promise you’ll be amazed by how different the programming world looks.


27
Sep 11

Record Linkage Algorithms in F# – Extensions to Jaro-Winkler Distance (Part 3)

While writing the previous article on tokenized matching I realized I left out some important background information on Jaro-Winkler distance.

The Vector Triangle Inequality

The Vector Triangle Inequality

First, there’s something important to know about the Jaro-Winkler distance: it’s not a metric distance and so does not obey the triangle inequality. That is, if you found the JW distance between strings A and B, and then found the JW distance between strings B and C, those results would have no relationship with JW distance between strings A and C. This may not seem like a big deal, but it means Jaro-Winkler distance can’t be used to embed strings in a metric space and so is a poor algorithm choice for many types of clustering. This will be an important point in future articles.

Second, it can be very helpful to extend the results of Jaro-Winkler based on the nature of your own data and your use of the algorithm. To better support my own use case I’ve made changes put the emphasis on better token alignment.

 1: let jaroWinklerMI (t1:string) (t2:string) = 
 2:     // Optimizations for easy to calculate cases
 3:     if t1.Length = 0 || t2.Length = 0 then 0.0
 4:     elif t1 = t2 then 1.0
 5:     else
 6:         // Even more weight for the first char
 7:         let score = jaroWinkler t1 t2
 8:         let p = 0.2 //percentage of score from new metric
 9:         let b = if t1.[0] = t2.[0] then 1.0 else 0.0
10:         ((1.0 - p) * score) + (p * b)

Beyond the optimization for empty strings and those which are exactly the same, you can see here that I weight the first character even more heavily. This is due to my data being very initial heavy.

To compensate for the frequent use of middle initials I count Jaro-Winkler distance as 80% of the score, while the remaining 20% is fully based on the first character matching. The value of p here was determined by the results of heavy experimentation and hair pulling. Before making this extension initials would frequently align incorrectly.

12: let scoreNamePairs (t1:string) (t2:string) =  
13:     //Raise jaro to a power in order to over-weight better matches        
14:     jaroWinklerMI t1 t2 ** 2.0

I also take the square of the result of jaroWinklerMI to weight better matches even more heavily. I found that in doing this I was able to get much more reliable matching. To understand how this works take a gander at this plot.

As you already know, multiplying any number greater than 0 but less than 1 by itself will give you a smaller number. However are you might intuit, the smaller the number the greater the proportional reduction. As you can see here, anything less than 1 takes a hit, but worse matches get dragged down significantly more.

Initially I was frustrated by bad alignments which would sometimes be chosen over better ones when two or more tokens were both fairly close, but not great. After seeing a variation on this squaring technique used for matrix convergence the thought occurred to me: why not see if it helps with token alignment? After implementing this I saw a huge improvement in results: incorrect alignments completely disappeared!

It’s often surprising where inspiration will come from.

Edit: The above code and it’s composition with Gale-Shapely is now available in my github repository.

val jaroWinklerMI : string -> string -> float

Full name: Snippet.jaroWinklerMI

val t1 : string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>

Multiple items

val string : 'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

——————–

type string = System.String

Full name: Microsoft.FSharp.Core.string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>

val t2 : string

  type: string
  implements: System.IComparable
  implements: System.ICloneable
  implements: System.IConvertible
  implements: System.IComparable<string>
  implements: seq<char>
  implements: System.Collections.IEnumerable
  implements: System.IEquatable<string>

property System.String.Length: int
val score : float

  type: float
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<float>
  implements: System.IEquatable<float>
  inherits: System.ValueType

val p : float

  type: float
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<float>
  implements: System.IEquatable<float>
  inherits: System.ValueType

val b : float

  type: float
  implements: System.IComparable
  implements: System.IFormattable
  implements: System.IConvertible
  implements: System.IComparable<float>
  implements: System.IEquatable<float>
  inherits: System.ValueType

val scoreNamePairs : string -> string -> float

Full name: Snippet.scoreNamePairs


19
Sep 11

Imperative Pseudocode to Pure Functional Algorithm with Gale-Shapely and F#

Continuing from last time, let’s look at how one goes from imperative pseudocode to pure functional using Gale-Shapely as an example.

Overall, to convert an algorithm from imperative to functional is a fairly simple process once you understand how to convert from a while loop to recursion with accumulators. This post is just a more advanced version of what I talked about in my From Imperative to Computation Expressions post. In fact, if you haven’t already I suggest you read it first. I’ll be assuming you understand how to perform that conversion in this article.  If you’re still having trouble, another good article which is a bit easier is The Ted Neward F# Folding Challenge.

The conversion of an algorithm is a two stage process: first identify algorithmic cases, implied mutable data structures, and alternate immutable data structures.  Then convert the given loops into recursion while leveraging pattern matching.  I know it may seem difficult at first, but after reading this I think you’ll have a good idea of how to get there.

So, let’s identify the cases.

function stableMatching {   
    Initialize all m ? M and w ? W to free
    // Loop Termination Case on
    // check for “free m with unproposed w” and
 
    // implicit ranking of W for each m in M
   
    while ? free man m who still has a
            woman w to propose to {      
       w = m’s highest ranked such woman
               who he has not proposed to yet
      
// She’s Single Case 
       if w is free
         (m, w) become engaged
       // She’s Engaged Case 
       else some pair (m’, w) already exists
         // She Picks m Sub-Case 

         if w prefers m to m’
           (m, w) become engaged
           m’ becomes free
         // She Picks m’ Sub-Case 
         else
           (m’, w) remain engaged
    }
}

Starting from the top, we can just ignore the imperative initialization.  However, you’ll notice that a great deal is going on in that while clause. 

First, it defines our loop termination by looking for a “free m who still has a w to propose to”.  This is interesting because it seems to be begging for some kind of data structure lookup.  Second it defines a implicit ranking of each member of W by each member of M.  Note that this is not mentioned in the initialization phase.

The rest of this algorithm reads like a simple list of cases.  To make this plain, let’s list them out along with state changes.

  1. no single m in M with unproposed w 
    — execution terminates
  2. m proposes to unengaged w
    — w is no longer in m proposals
    — m is no longer single
    — m and w are engaged
  3. m proposes to w engaged to m’
    a.  She switches to m
          — w is no longer in m proposals
          — m is no longer single
          — m and w are engaged
          — m’ is single
    b.  She stays with m’
          — w is no longer in m proposals
          — m’ and w stay engaged
          — m remains single

The mechanics of what exactly we do in these cases depends on our data structures.  So we can’t get much further without figuring that out.  To identify what data structures are needed we’ll need to identify what exactly it is that needs to be kept track of.  Only once we know that can we pick our immutable data structures appropriately. 

Looking above, it appears that we’ll need to keep track of:

  1. Given a woman, is she engaged and, if so, who is she engaged to? (need to look it up)
  2. Which men are single? (just need one at a time, order doesn’t matter)
  3. For each man, which is his next most desirable match that he has not yet proposed to? (just need one at a time, order matters)

Note that we don’t have to keep track of which men are engaged to which women, the question is never asked.  This is a minimal set of requirements here.  No extra stuff allowed.  I’m serious.

Moving on, in an imperative mutable version you might use these data structures:

  1. An array indexed by w to the m index engaged to (-1 if single)
  2. An array indexed by m to the w index engaged to (-1 if single)
  3. An array indexed by m where each element is an ordered linked list of unproposed w indices.

However, what would satisfy these same requirements but be immutable?  Well, if we step out of the mutation mindset it becomes obvious pretty quickly:

  1. An engagement map of women to men
  2. An immutable linked list of single men
  3. Define a man to include his index as well as an ordered unproposed woman immutable linked list

Instead of scanning arrays, or querying mutable structures we simply decompose our data structures as much as needed and recompose them into the form we need on the next call.   This may sound difficult, but it’s really not.  The fact that we have pattern matching and our cases can be defined in terms of the shape of our data structures actually makes this very simple to do.  That is, once you get the hang of it.

To see this in action, let’s break up the code in the last article and discuss each chunk.

 1: open System
 2: 
 3: // a Bachelor is an identity index and an 
 4: // ordered list of women indicies to approach.
 5: type Bachelor = int * int list

Here we see the definition of a bachelor, it’s a tuple of an integer and a list of integers. This is effectively a pair containing an index and a list of woman indices. Note that in this example all men are of type Bachelor.

 7: // Some notation:
 8: // wi = woman index (int)
 9: // mi = man index (int)
10: // mi' = woman's current partner index (int)
11: // m = man with index and unapproached women indices (Bachelor)
12: // mSingle = men that are single (Bachelor list)
13: // wEngaged = engagements from women to men (int, Bachelor)

We’ll keep this notation around just so you have easy reference.

15: let funGS (comp: _ -> _ -> float) (M: _ array) (W: _ array) =
16:   let Windices = [ 0 .. W.Length - 1 ]
17:   // List of men with women in order of desire  
18:   let Munproposed = 
19:     List.init M.Length 
20:       (fun mi -> 
21:            let sortFun wi = 1.0 - (comp M.[mi] W.[wi])
22:            mi, Windices |> List.sortBy sortFun)

Windices is just a list of numbers from 0 to the number of woman tokens. Doing this first makes the calculation of Munproposed less expensive.

Munproposed is our initial set of all single men. Here List.init is used to call the given lambda to generate each element. As you can see by the last line within that lambda, each element of the list is a tuple containing the index of that man and a list of women indices sorted by the given desirability function, sortFun.

23:   // Recursively solve stable marriages
24:   let rec findMarriages mSingle wEngaged =
25:     match mSingle with
26:     // No single guys left with desired women, we're done
27:     | [] -> wEngaged
28:     // Guy is out of luck, remove from singles
29:     | (mi, []) :: bachelors -> findMarriages bachelors wEngaged

These are our first two cases, if the mSingle list pattern matches with [], we know it’s empty and so we are done.

In the second case we are pattern matching on the structure of singles list as well as it’s first element.  The syntax (mi, []) matches on a tuple of an index and an empty inner list.  This list is the list of yet unproposed to women and so if it’s empty we know that this guy is out of options and so it’s ok to drop him from the list of bachelors.  We do this by simply recursing on bachelors, which is the tail of the mSingle list.

30:     // He's got options!
31:     | (mi, wi :: rest) :: bachelors -> 
32:       let m = mi, rest

This is our general purpose match, and is the structure we expect to see in most cases.  Here the head is fully decomposed into a man index (mi), his next first choice woman index (wi), the rest of the women (rest) and the rest of the single men (bachelors).

Immediately afterward we define m to be that given man index (mi) and the rest of the women tokens (rest).  As two tokens are only ever compared once, m should no longer contain wi in his ordered list of proposals.  This is the form he will take until the algorithm ends or he is evaluated again after being placed in the mSingle pool.

33:       match wEngaged |> Map.tryFind wi with
34:       // She's single! m is now engaged!
35:       | None -> findMarriages bachelors (wEngaged |> Map.add wi m)
36:       // She's already engaged, let the best man win!
37:       | Some (m') –> 
38: let mi', _ = m'

Now that we have fully matched out everything masculine, it’s time to turn to the feminine. 

Here we look in the wEngaged map to find out if wi is single.  This is done by a using Map.tryFind which returns None if the given key is not in the map and Some(value) if it is.

If the Map.tryFind call returns None we know she is single and so recurse with the rest of our single men (bachelors) and our wEngaged map is extended via Map.add to contain a new mapping from the woman index (wi) to the given man (m).

If she’s engaged our pattern match automatically decomposes the given option type and we know exactly who she is engaged to (m’).

39:         if comp W.[wi] M.[mi] > comp W.[wi] M.[mi'] then 
40:           // Congrats mi, he is now engaged to wi
41:           // The previous suitor (mi') is bested 
42:           findMarriages 
43:             (m' :: bachelors) 
44:             (wEngaged |> Map.add wi m)
45:         else
46:           // The current bachelor (mi) lost, better luck next time
47:           findMarriages 
48:             (m :: bachelors) 
49:             wEngaged

This is the core comparison logic.  Here we determine which bachelor gets her hand.  This is pretty straightforward from an imperative perspective as we’re using the oh-so familiar if statement.

In the first case the new contender (m) wins.  We append the loser (m’) back on to the head of what will be mSingle next time around with the syntax (m’ :: bachelors).  Similar to what happens if she’s single, we add a mapping from the woman index (wi) to the man instance (m) via the call to Map.add.  Note that this will override the former mapping from wi to m’.

If the current bachelor loses we simply append him back to the head of the bachelors list and try again next time around.  You’ll get her next time bro.

50:   findMarriages Munproposed Map.empty
51:   // Before returning, remove unproposed lists from man instances  
52:   |> Map.map (fun wi m -> let mi, _ = m in mi)  

The final section is just how we set up our recursion.  We start with the full list of single men and their ranked lady lists (Munproposed) and an empty map of relationships (Map.empty).  When the algorithm exits we make one last pass through our returned wEngaged map in order to strip out all of the lists of unproposed to women.  It’s just baggage now anyway.

Well that’s how you get from imperative pseudocode to a pure functional implementation.  All of the code here, as well as a really nasty imperative version to compare it to is available on my GitHub page

If you liked this post, have any questions, or feel like I missed something important I hope you’ll comment here on it.  We’ll all benefit from the knowledge shared. 

namespace System
type Bachelor = int * int list

Full name: Snippet.Bachelor

Multiple items

val int : ‘T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

——————–

type int<‘Measure> = int

Full name: Microsoft.FSharp.Core.int<_>

  type: int<‘Measure>

  implements: IComparable

  implements: IConvertible

  implements: IFormattable

  implements: IComparable<int<‘Measure>>

  implements: IEquatable<int<‘Measure>>

  inherits: ValueType

——————–

type int = int32

Full name: Microsoft.FSharp.Core.int

  type: int

  implements: IComparable

  implements: IFormattable

  implements: IConvertible

  implements: IComparable<int>

  implements: IEquatable<int>

  inherits: ValueType

type ‘T list = List<‘T>

Full name: Microsoft.FSharp.Collections.list<_>

  type: ‘T list

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<‘T>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<‘T>

  implements: Collections.IEnumerable

val funGS : (‘a -> ‘a -> float) -> ‘a array -> ‘a array -> Map<int,int>

Full name: Snippet.funGS

val comp : (‘a -> ‘a -> float)
Multiple items

val float : ‘T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–

type float<‘Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

  type: float<‘Measure>

  implements: IComparable

  implements: IConvertible

  implements: IFormattable

  implements: IComparable<float<‘Measure>>

  implements: IEquatable<float<‘Measure>>

  inherits: ValueType

——————–

type float = Double

Full name: Microsoft.FSharp.Core.float

  type: float

  implements: IComparable

  implements: IFormattable

  implements: IConvertible

  implements: IComparable<float>

  implements: IEquatable<float>

  inherits: ValueType

Multiple items

val M : ‘a array

  type: ‘a array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘a>

  implements: Collections.Generic.ICollection<‘a>

  implements: seq<‘a>

  implements: Collections.IEnumerable

  inherits: Array

——————–

val M : ‘a array

  type: ‘a array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘a>

  implements: Collections.Generic.ICollection<‘a>

  implements: seq<‘a>

  implements: Collections.IEnumerable

  inherits: Array

type ‘T array = ‘T []

Full name: Microsoft.FSharp.Core.array<_>

  type: ‘T array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘T>

  implements: Collections.Generic.ICollection<‘T>

  implements: seq<‘T>

  implements: Collections.IEnumerable

  inherits: Array

Multiple items

val W : ‘a array

  type: ‘a array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘a>

  implements: Collections.Generic.ICollection<‘a>

  implements: seq<‘a>

  implements: Collections.IEnumerable

  inherits: Array

——————–

val W : ‘a array

  type: ‘a array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘a>

  implements: Collections.Generic.ICollection<‘a>

  implements: seq<‘a>

  implements: Collections.IEnumerable

  inherits: Array

val Windices : int list

  type: int list

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<int>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<int>

  implements: Collections.IEnumerable

val W : ‘a array

  type: ‘a array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘a>

  implements: Collections.Generic.ICollection<‘a>

  implements: seq<‘a>

  implements: Collections.IEnumerable

  inherits: Array

property Array.Length: int
val Munproposed : (int * int list) list

  type: (int * int list) list

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<int * int list>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<int * int list>

  implements: Collections.IEnumerable

Multiple items

module List

from Microsoft.FSharp.Collections

——————–

type List<‘T> =

  | ( [] )

  | ( :: ) of ‘T * ‘T list

  with

    interface Collections.IEnumerable

    interface Collections.Generic.IEnumerable<‘T>

    member Head : ‘T

    member IsEmpty : bool

    member Item : index:int -> ‘T with get

    member Length : int

    member Tail : ‘T list

    static member Cons : head:’T * tail:’T list -> ‘T list

    static member Empty : ‘T list

  end

Full name: Microsoft.FSharp.Collections.List<_>

  type: List<‘T>

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<‘T>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<‘T>

  implements: Collections.IEnumerable

val init : int -> (int -> ‘T) -> ‘T list

Full name: Microsoft.FSharp.Collections.List.init

val M : ‘a array

  type: ‘a array

  implements: ICloneable

  implements: Collections.IList

  implements: Collections.ICollection

  implements: Collections.IStructuralComparable

  implements: Collections.IStructuralEquatable

  implements: Collections.Generic.IList<‘a>

  implements: Collections.Generic.ICollection<‘a>

  implements: seq<‘a>

  implements: Collections.IEnumerable

  inherits: Array

val mi : int

  type: int

  implements: IComparable

  implements: IFormattable

  implements: IConvertible

  implements: IComparable<int>

  implements: IEquatable<int>

  inherits: ValueType

val sortFun : (int -> float)
val wi : int

  type: int

  implements: IComparable

  implements: IFormattable

  implements: IConvertible

  implements: IComparable<int>

  implements: IEquatable<int>

  inherits: ValueType

val sortBy : (‘T -> ‘Key) -> ‘T list -> ‘T list (requires comparison)

Full name: Microsoft.FSharp.Collections.List.sortBy

val findMarriages : ((int * int list) list -> Map<int,(int * int list)> -> Map<int,(int * int list)>)
val mSingle : (int * int list) list

  type: (int * int list) list

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<int * int list>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<int * int list>

  implements: Collections.IEnumerable

val wEngaged : Map<int,(int * int list)>

  type: Map<int,(int * int list)>

  implements: IComparable

  implements: Collections.Generic.IDictionary<int,(int * int list)>

  implements: Collections.Generic.ICollection<Collections.Generic.KeyValuePair<int,(int * int list)>>

  implements: seq<Collections.Generic.KeyValuePair<int,(int * int list)>>

  implements: Collections.IEnumerable

val bachelors : (int * int list) list

  type: (int * int list) list

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<int * int list>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<int * int list>

  implements: Collections.IEnumerable

val rest : int list

  type: int list

  implements: Collections.IStructuralEquatable

  implements: IComparable<List<int>>

  implements: IComparable

  implements: Collections.IStructuralComparable

  implements: Collections.Generic.IEnumerable<int>

  implements: Collections.IEnumerable

val m : int * int list
Multiple items

module Map

from Microsoft.FSharp.Collections

——————–

type Map<‘Key,’Value (requires comparison)> =

  class

    interface Collections.IEnumerable

    interface IComparable

    interface Collections.Generic.IEnumerable<Collections.Generic.KeyValuePair<‘Key,’Value>>

    interface Collections.Generic.ICollection<Collections.Generic.KeyValuePair<‘Key,’Value>>

    interface Collections.Generic.IDictionary<‘Key,’Value>

    new : elements:seq<‘Key * ‘Value> -> Map<‘Key,’Value>

    member Add : key:’Key * value:’Value -> Map<‘Key,’Value>

    member ContainsKey : key:’Key -> bool

    override Equals : obj -> bool

    member Remove : key:’Key -> Map<‘Key,’Value>

    member TryFind : key:’Key -> ‘Value option

    member Count : int

    member IsEmpty : bool

    member Item : key:’Key -> ‘Value with get

  end

Full name: Microsoft.FSharp.Collections.Map<_,_>

  type: Map<‘Key,’Value>

  implements: IComparable

  implements: Collections.Generic.IDictionary<‘Key,’Value>

  implements: Collections.Generic.ICollection<Collections.Generic.KeyValuePair<‘Key,’Value>>

  implements: seq<Collections.Generic.KeyValuePair<‘Key,’Value>>

  implements: Collections.IEnumerable

val tryFind : ‘Key -> Map<‘Key,’T> -> ‘T option (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.tryFind

union case Option.None: Option<‘T>
val add : ‘Key -> ‘T -> Map<‘Key,’T> -> Map<‘Key,’T> (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.add

union case Option.Some: ‘T -> Option<‘T>
val m’ : int * int list
val mi’ : int

  type: int

  implements: IComparable

  implements: IFormattable

  implements: IConvertible

  implements: IComparable<int>

  implements: IEquatable<int>

  inherits: ValueType

val empty<‘Key,’T (requires comparison)> : Map<‘Key,’T> (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.empty

val map : (‘Key -> ‘T -> ‘U) -> Map<‘Key,’T> -> Map<‘Key,’U> (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.map

val alignJaroWinkler : (‘a array -> ‘a array -> Map<int,int>)

Full name: Snippet.alignJaroWinkler