7
Dec 14

## Developing an Algorithm in F#: Fast Rotational Alignments with Gosper’s Hack

This post is for the 7th day of the 2014 F# Advent Calendar.

It has been said that functional languages can’t be as fast as their imperative cousins because of all of the allocation and garbage collection, this is patently false (as far as F# is concerned at least) largely because the CLR has value types.

Recently Louis Thrall and I ran into an interesting problem at work: how do you efficiently compare all rotations of one array to another to select the best? This is easy when the arrays are the same size, just shift the indices around:

Which is equivalent to a nested for loop, the top one rotating and the inner doing comparisons.

But what about differently sized arrays? It turns out this is more complicated, even for comparing a size 2 array to size 3:

Like comparing beads on two strings, there is much shifting as the indices are rotated around. The extra freedom from this shifting means that more comparisons must be made even though there total number of combined elements is smaller.

This problem also has an interesting structure, as it costs the most when one array is about half the size of the other. When the smaller array is size 1 then it simply needs to be moved around, and when the arrays are equal in size it’s a simple matter of rotations.

After some research Lou noted that this we were “counting the number of cyclic suborders of size k of a cyclic order of size n” which produces k(n choose k) different configurations.

Now that we had bounds and knew what we were doing was reasonable we had to figure out an algorithm for this process. It occurred to us that it would be ideal to pre-compute the comparison values to avoid repeated comparisons. Given this matrix of comparison values it wasn’t too difficult to determine a procedure for checking all possible configurations.

Each time a pair is chosen, it further constrains what the rest of the assignments may be. The bottom level is somewhat obvious, just select each of the possible starting combinations down the y-axis. In the next column you potentially have a range of possible “downward” (using clock arithmetic) selections bounded by the difference in lengths and what has already been fixed.

Now it was time to implement, and the first stab was nice and concise and functional and worked pretty well.

``` 1: open System
2:
3: // arr1 is always equal in size or smaller than arr2
4: let bestRot (f: 'a -> 'a -> float) (arr1: 'a []) (arr2: 'a []) =
5:     // Pre-calculate similarities
6:     let scores =
7:         Array2D.init arr1.Length arr2.Length
8:                     (fun i j -> f arr1.[i] arr2.[j])
9:     // inner function for recursively finding paths
10:     // col = previous column, prow = previous row
11:     // df = degrees of freedom, path = path accumulator
12:     let rec inner col prow df path =
13:         seq {
14:             // when we're out of columns to assign we're done
15:             if col >= arr1.Length then yield path |> List.rev
16:             else
17:                 // We have df "degrees of freedom" left
18:                 for d = 1 to df + 1 do
19:                     // Clock arithmetic for the row
20:                     let nrow = (prow + d) % arr2.Length
21:                     // Recurse yielding out all further paths
22:                     yield! inner (col + 1) nrow (df + 1 - d)
23:                                  ((col, nrow) :: path)
24:         }
25:     let res, score =
26:         // the difference in length
27:         // is the starting "degrees of freedom"
28:         let diff = arr2.Length - arr1.Length
29:         seq {
30:             // for each y-axis starting point
31:             for y = 0 to arr2.Length - 1 do
32:                 // 1 selected, r is the starting point (on y),
33:                 //    starting is always 0 on x.
34:                 // ((0, y) :: []) is the accumulator
35:                 //    with the initial (0, y) coordinates
36:                 yield! inner 1 y diff ((0, y) :: [])
37:         }
38:         // Sum each path to find total similarity
39:         |> Seq.map
40:             (fun l -> l, l |> Seq.sumBy (fun (i,j) -> scores.[i,j]))
41:         // Get the path with the highest similarity
42:         |> Seq.maxBy snd
43:     // Create output array and copy in the results
44:     let out = Array.create arr1.Length Int32.MinValue
45:     for (i,j) in res do
46:         out.[i] <- j
47:     // Return results
48:     out, score```
namespace System
val bestRot : ('a -> 'a -> float) -> 'a [] -> 'a [] -> int [] * float

Full name: Snippet.bestRot

val f : ('a -> 'a -> float)
Multiple items
val float : 'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

type: float<'Measure>
implements: IComparable
implements: IConvertible
implements: IFormattable
implements: IComparable<float<'Measure>>
implements: IEquatable<float<'Measure>>
inherits: ValueType

——————–
type float = Double

Full name: Microsoft.FSharp.Core.float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val arr1 : 'a []

type: 'a []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<'a>
implements: Collections.Generic.ICollection<'a>
implements: seq<'a>
implements: Collections.IEnumerable
inherits: Array

val arr2 : 'a []

type: 'a []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<'a>
implements: Collections.Generic.ICollection<'a>
implements: seq<'a>
implements: Collections.IEnumerable
inherits: Array

val scores : float [,]

type: float [,]
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IEnumerable
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
inherits: Array

module Array2D

from Microsoft.FSharp.Collections

val init : int -> int -> (int -> int -> 'T) -> 'T [,]

Full name: Microsoft.FSharp.Collections.Array2D.init

property Array.Length: int
val i : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val j : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val inner : (int -> int -> int -> (int * int) list -> seq<(int * int) list>)
val col : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val prow : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val df : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val path : (int * int) list

type: (int * int) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * int>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * int>
implements: Collections.IEnumerable

Multiple items
val seq : seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Core.Operators.seq

——————–
type seq<'T> = Collections.Generic.IEnumerable<'T>

Full name: Microsoft.FSharp.Collections.seq<_>

type: seq<'T>
inherits: Collections.IEnumerable

Multiple items
module List

from Microsoft.FSharp.Collections

——————–
type List<'T> =
| ( [] )
| ( :: ) of 'T * 'T list
with
interface Collections.IEnumerable
interface Collections.Generic.IEnumerable<'T>
member IsEmpty : bool
member Item : index:int -> 'T with get
member Length : int
member Tail : 'T list
static member Cons : head:'T * tail:'T list -> 'T list
static member Empty : 'T list
end

Full name: Microsoft.FSharp.Collections.List<_>

type: List<'T>
implements: Collections.IStructuralEquatable
implements: IComparable<List<'T>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<'T>
implements: Collections.IEnumerable

val rev : 'T list -> 'T list

Full name: Microsoft.FSharp.Collections.List.rev

val d : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val nrow : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val res : (int * int) list

type: (int * int) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * int>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * int>
implements: Collections.IEnumerable

val score : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val diff : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val y : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

module Seq

from Microsoft.FSharp.Collections

val map : ('T -> 'U) -> seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map

val l : (int * int) list

type: (int * int) list
implements: Collections.IStructuralEquatable
implements: IComparable<List<int * int>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<int * int>
implements: Collections.IEnumerable

val sumBy : ('T -> 'U) -> seq<'T> -> 'U (requires member ( + ) and member get_Zero)

Full name: Microsoft.FSharp.Collections.Seq.sumBy

val maxBy : ('T -> 'U) -> seq<'T> -> 'T (requires comparison)

Full name: Microsoft.FSharp.Collections.Seq.maxBy

val snd : ('T1 * 'T2) -> 'T2

Full name: Microsoft.FSharp.Core.Operators.snd

val out : int []

type: int []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<int>
implements: Collections.Generic.ICollection<int>
implements: seq<int>
implements: Collections.IEnumerable
inherits: Array

type Array =
class
member Clone : unit -> obj
member CopyTo : System.Array * int -> unit
member CopyTo : System.Array * int64 -> unit
member GetEnumerator : unit -> System.Collections.IEnumerator
member GetLength : int -> int
member GetLongLength : int -> int64
member GetLowerBound : int -> int
member GetUpperBound : int -> int
member GetValue : int [] -> obj
member GetValue : int -> obj
member GetValue : int64 -> obj
member GetValue : int64 [] -> obj
member GetValue : int * int -> obj
member GetValue : int64 * int64 -> obj
member GetValue : int * int * int -> obj
member GetValue : int64 * int64 * int64 -> obj
member Initialize : unit -> unit
member IsFixedSize : bool
member IsSynchronized : bool
member Length : int
member LongLength : int64
member Rank : int
member SetValue : obj * int -> unit
member SetValue : obj * int [] -> unit
member SetValue : obj * int64 -> unit
member SetValue : obj * int64 [] -> unit
member SetValue : obj * int * int -> unit
member SetValue : obj * int64 * int64 -> unit
member SetValue : obj * int * int * int -> unit
member SetValue : obj * int64 * int64 * int64 -> unit
member SyncRoot : obj
static member BinarySearch : System.Array * obj -> int
static member BinarySearch<'T> : 'T [] * 'T -> int
static member BinarySearch : System.Array * obj * System.Collections.IComparer -> int
static member BinarySearch<'T> : 'T [] * 'T * System.Collections.Generic.IComparer<'T> -> int
static member BinarySearch : System.Array * int * int * obj -> int
static member BinarySearch<'T> : 'T [] * int * int * 'T -> int
static member BinarySearch : System.Array * int * int * obj * System.Collections.IComparer -> int
static member BinarySearch<'T> : 'T [] * int * int * 'T * System.Collections.Generic.IComparer<'T> -> int
static member Clear : System.Array * int * int -> unit
static member ConstrainedCopy : System.Array * int * System.Array * int * int -> unit
static member ConvertAll<'TInput,'TOutput> : 'TInput [] * System.Converter<'TInput,'TOutput> -> 'TOutput []
static member Copy : System.Array * System.Array * int -> unit
static member Copy : System.Array * System.Array * int64 -> unit
static member Copy : System.Array * int * System.Array * int * int -> unit
static member Copy : System.Array * int64 * System.Array * int64 * int64 -> unit
static member CreateInstance : System.Type * int -> System.Array
static member CreateInstance : System.Type * int [] -> System.Array
static member CreateInstance : System.Type * int64 [] -> System.Array
static member CreateInstance : System.Type * int * int -> System.Array
static member CreateInstance : System.Type * int [] * int [] -> System.Array
static member CreateInstance : System.Type * int * int * int -> System.Array
static member Exists<'T> : 'T [] * System.Predicate<'T> -> bool
static member Find<'T> : 'T [] * System.Predicate<'T> -> 'T
static member FindAll<'T> : 'T [] * System.Predicate<'T> -> 'T []
static member FindIndex<'T> : 'T [] * System.Predicate<'T> -> int
static member FindIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
static member FindIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
static member FindLast<'T> : 'T [] * System.Predicate<'T> -> 'T
static member FindLastIndex<'T> : 'T [] * System.Predicate<'T> -> int
static member FindLastIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
static member FindLastIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
static member ForEach<'T> : 'T [] * System.Action<'T> -> unit
static member IndexOf : System.Array * obj -> int
static member IndexOf<'T> : 'T [] * 'T -> int
static member IndexOf : System.Array * obj * int -> int
static member IndexOf<'T> : 'T [] * 'T * int -> int
static member IndexOf : System.Array * obj * int * int -> int
static member IndexOf<'T> : 'T [] * 'T * int * int -> int
static member LastIndexOf : System.Array * obj -> int
static member LastIndexOf<'T> : 'T [] * 'T -> int
static member LastIndexOf : System.Array * obj * int -> int
static member LastIndexOf<'T> : 'T [] * 'T * int -> int
static member LastIndexOf : System.Array * obj * int * int -> int
static member LastIndexOf<'T> : 'T [] * 'T * int * int -> int
static member Resize<'T> : 'T [] * int -> unit
static member Reverse : System.Array -> unit
static member Reverse : System.Array * int * int -> unit
static member Sort : System.Array -> unit
static member Sort<'T> : 'T [] -> unit
static member Sort : System.Array * System.Array -> unit
static member Sort : System.Array * System.Collections.IComparer -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] -> unit
static member Sort<'T> : 'T [] * System.Collections.Generic.IComparer<'T> -> unit
static member Sort<'T> : 'T [] * System.Comparison<'T> -> unit
static member Sort : System.Array * int * int -> unit
static member Sort : System.Array * System.Array * System.Collections.IComparer -> unit
static member Sort<'T> : 'T [] * int * int -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * System.Collections.Generic.IComparer<'TKey> -> unit
static member Sort : System.Array * System.Array * int * int -> unit
static member Sort : System.Array * int * int * System.Collections.IComparer -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int -> unit
static member Sort<'T> : 'T [] * int * int * System.Collections.Generic.IComparer<'T> -> unit
static member Sort : System.Array * System.Array * int * int * System.Collections.IComparer -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int * System.Collections.Generic.IComparer<'TKey> -> unit
static member TrueForAll<'T> : 'T [] * System.Predicate<'T> -> bool
end

Full name: System.Array

type: Array
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IEnumerable
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable

val create : int -> 'T -> 'T []

Full name: Microsoft.FSharp.Collections.Array.create

type Int32 =
struct
member CompareTo : obj -> int
member CompareTo : int -> int
member Equals : obj -> bool
member Equals : int -> bool
member GetHashCode : unit -> int
member GetTypeCode : unit -> System.TypeCode
member ToString : unit -> string
member ToString : string -> string
member ToString : System.IFormatProvider -> string
member ToString : string * System.IFormatProvider -> string
static val MaxValue : int
static val MinValue : int
static member Parse : string -> int
static member Parse : string * System.Globalization.NumberStyles -> int
static member Parse : string * System.IFormatProvider -> int
static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> int
static member TryParse : string * int -> bool
static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * int -> bool
end

Full name: System.Int32

type: Int32
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

field Int32.MinValue = -2147483648

However, after real world trials we found that it was taking an astonishing 20% of our runtime. We had to find a better way. We soon found Gosper’s hack which did mostly what we wanted. It gave us the combinations, and we found we could just rotate each of those around to give us all of our cyclic suborders.

And so we went about rewriting our nice functional sample to use Gosper’s Hack to avoid the need to generate paths and also to be (nearly) minimally allocating to avoid garbage collection overhead.

``` 1: open System
2:
3: // arr1 is always equal in size or smaller than arr2
4: let bestRotGos (f: 'a -> 'a -> float) (arr1: 'a []) (arr2: 'a []) =
5:     // Start Gosper's Hack Machinery Bootstrap
6:     let bitsConfig = Array.create arr1.Length Int32.MinValue
7:
8:     let inline setBits v =
9:         let inline getBit x i = x &&& (1 <<< i) <> 0
10:         let mutable idx = 0
11:         for i = 0 to 31 do
12:             if getBit v i then
13:                 bitsConfig.[idx] <- i
14:                 idx <- idx + 1
15:
16:     let inline nextGosper set =
17:         let c = set &&& -set
18:         let r = set + c
19:         r + (((r^^^set)/c)>>>2)
20:
21:     let limit =
22:         let n = arr2.Length
23:         (1 <<< n)
24:
25:     let mutable set =
26:         let k = arr1.Length
27:         (1 <<< k) - 1
28:     // End Gosper's Hack Machinery Bootstrap
29:
30:     // Pre-calculate sims and init placeholder variables
31:     let scores = Array2D.init arr1.Length arr2.Length
32:                     (fun i j -> f arr1.[i] arr2.[j])
33:
34:     let mutable maxScore = Double.NegativeInfinity
35:     let bestConfig = Array.create arr1.Length Int32.MinValue
36:
37:     while (set < limit) do
38:         setBits set // Turn "set" bits into indices in "bitsConfig"
39:
40:         // For each rotation
41:         for i = 0 to bitsConfig.Length - 1 do
42:
43:             // calculate score and compare with previous best
44:             let mutable totalScore = 0.0
45:             for i = 0 to bitsConfig.Length - 1 do
46:                 let tokenScore = scores.[i,bitsConfig.[i]]
47:                 totalScore <- totalScore + tokenScore
48:             if totalScore > maxScore then
49:                 // and replace if better
50:                 maxScore <- totalScore
51:                 Array.blit bitsConfig 0
52:                            bestConfig 0
53:                            bitsConfig.Length
54:
55:             // Rotate the array
56:             let firstElement = bitsConfig.[0]
57:             for i = 0 to bitsConfig.Length - 2 do
58:                 bitsConfig.[i] <- bitsConfig.[i + 1]
59:             bitsConfig.[bitsConfig.Length - 1] <- firstElement
60:
61:         set <- nextGosper set // Put the next combination in "set"
62:     bestConfig, maxScore```
namespace System
val bestRotGos : ('a -> 'a -> float) -> 'a [] -> 'a [] -> int [] * float

Full name: Snippet.bestRotGos

val f : ('a -> 'a -> float)
Multiple items
val float : 'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

type: float<'Measure>
implements: IComparable
implements: IConvertible
implements: IFormattable
implements: IComparable<float<'Measure>>
implements: IEquatable<float<'Measure>>
inherits: ValueType

——————–
type float = Double

Full name: Microsoft.FSharp.Core.float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val arr1 : 'a []

type: 'a []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<'a>
implements: Collections.Generic.ICollection<'a>
implements: seq<'a>
implements: Collections.IEnumerable
inherits: Array

val arr2 : 'a []

type: 'a []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<'a>
implements: Collections.Generic.ICollection<'a>
implements: seq<'a>
implements: Collections.IEnumerable
inherits: Array

val bitsConfig : int []

type: int []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<int>
implements: Collections.Generic.ICollection<int>
implements: seq<int>
implements: Collections.IEnumerable
inherits: Array

type Array =
class
member Clone : unit -> obj
member CopyTo : System.Array * int -> unit
member CopyTo : System.Array * int64 -> unit
member GetEnumerator : unit -> System.Collections.IEnumerator
member GetLength : int -> int
member GetLongLength : int -> int64
member GetLowerBound : int -> int
member GetUpperBound : int -> int
member GetValue : int [] -> obj
member GetValue : int -> obj
member GetValue : int64 -> obj
member GetValue : int64 [] -> obj
member GetValue : int * int -> obj
member GetValue : int64 * int64 -> obj
member GetValue : int * int * int -> obj
member GetValue : int64 * int64 * int64 -> obj
member Initialize : unit -> unit
member IsFixedSize : bool
member IsSynchronized : bool
member Length : int
member LongLength : int64
member Rank : int
member SetValue : obj * int -> unit
member SetValue : obj * int [] -> unit
member SetValue : obj * int64 -> unit
member SetValue : obj * int64 [] -> unit
member SetValue : obj * int * int -> unit
member SetValue : obj * int64 * int64 -> unit
member SetValue : obj * int * int * int -> unit
member SetValue : obj * int64 * int64 * int64 -> unit
member SyncRoot : obj
static member BinarySearch : System.Array * obj -> int
static member BinarySearch<'T> : 'T [] * 'T -> int
static member BinarySearch : System.Array * obj * System.Collections.IComparer -> int
static member BinarySearch<'T> : 'T [] * 'T * System.Collections.Generic.IComparer<'T> -> int
static member BinarySearch : System.Array * int * int * obj -> int
static member BinarySearch<'T> : 'T [] * int * int * 'T -> int
static member BinarySearch : System.Array * int * int * obj * System.Collections.IComparer -> int
static member BinarySearch<'T> : 'T [] * int * int * 'T * System.Collections.Generic.IComparer<'T> -> int
static member Clear : System.Array * int * int -> unit
static member ConstrainedCopy : System.Array * int * System.Array * int * int -> unit
static member ConvertAll<'TInput,'TOutput> : 'TInput [] * System.Converter<'TInput,'TOutput> -> 'TOutput []
static member Copy : System.Array * System.Array * int -> unit
static member Copy : System.Array * System.Array * int64 -> unit
static member Copy : System.Array * int * System.Array * int * int -> unit
static member Copy : System.Array * int64 * System.Array * int64 * int64 -> unit
static member CreateInstance : System.Type * int -> System.Array
static member CreateInstance : System.Type * int [] -> System.Array
static member CreateInstance : System.Type * int64 [] -> System.Array
static member CreateInstance : System.Type * int * int -> System.Array
static member CreateInstance : System.Type * int [] * int [] -> System.Array
static member CreateInstance : System.Type * int * int * int -> System.Array
static member Exists<'T> : 'T [] * System.Predicate<'T> -> bool
static member Find<'T> : 'T [] * System.Predicate<'T> -> 'T
static member FindAll<'T> : 'T [] * System.Predicate<'T> -> 'T []
static member FindIndex<'T> : 'T [] * System.Predicate<'T> -> int
static member FindIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
static member FindIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
static member FindLast<'T> : 'T [] * System.Predicate<'T> -> 'T
static member FindLastIndex<'T> : 'T [] * System.Predicate<'T> -> int
static member FindLastIndex<'T> : 'T [] * int * System.Predicate<'T> -> int
static member FindLastIndex<'T> : 'T [] * int * int * System.Predicate<'T> -> int
static member ForEach<'T> : 'T [] * System.Action<'T> -> unit
static member IndexOf : System.Array * obj -> int
static member IndexOf<'T> : 'T [] * 'T -> int
static member IndexOf : System.Array * obj * int -> int
static member IndexOf<'T> : 'T [] * 'T * int -> int
static member IndexOf : System.Array * obj * int * int -> int
static member IndexOf<'T> : 'T [] * 'T * int * int -> int
static member LastIndexOf : System.Array * obj -> int
static member LastIndexOf<'T> : 'T [] * 'T -> int
static member LastIndexOf : System.Array * obj * int -> int
static member LastIndexOf<'T> : 'T [] * 'T * int -> int
static member LastIndexOf : System.Array * obj * int * int -> int
static member LastIndexOf<'T> : 'T [] * 'T * int * int -> int
static member Resize<'T> : 'T [] * int -> unit
static member Reverse : System.Array -> unit
static member Reverse : System.Array * int * int -> unit
static member Sort : System.Array -> unit
static member Sort<'T> : 'T [] -> unit
static member Sort : System.Array * System.Array -> unit
static member Sort : System.Array * System.Collections.IComparer -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] -> unit
static member Sort<'T> : 'T [] * System.Collections.Generic.IComparer<'T> -> unit
static member Sort<'T> : 'T [] * System.Comparison<'T> -> unit
static member Sort : System.Array * int * int -> unit
static member Sort : System.Array * System.Array * System.Collections.IComparer -> unit
static member Sort<'T> : 'T [] * int * int -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * System.Collections.Generic.IComparer<'TKey> -> unit
static member Sort : System.Array * System.Array * int * int -> unit
static member Sort : System.Array * int * int * System.Collections.IComparer -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int -> unit
static member Sort<'T> : 'T [] * int * int * System.Collections.Generic.IComparer<'T> -> unit
static member Sort : System.Array * System.Array * int * int * System.Collections.IComparer -> unit
static member Sort<'TKey,'TValue> : 'TKey [] * 'TValue [] * int * int * System.Collections.Generic.IComparer<'TKey> -> unit
static member TrueForAll<'T> : 'T [] * System.Predicate<'T> -> bool
end

Full name: System.Array

type: Array
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IEnumerable
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable

val create : int -> 'T -> 'T []

Full name: Microsoft.FSharp.Collections.Array.create

property Array.Length: int
type Int32 =
struct
member CompareTo : obj -> int
member CompareTo : int -> int
member Equals : obj -> bool
member Equals : int -> bool
member GetHashCode : unit -> int
member GetTypeCode : unit -> System.TypeCode
member ToString : unit -> string
member ToString : string -> string
member ToString : System.IFormatProvider -> string
member ToString : string * System.IFormatProvider -> string
static val MaxValue : int
static val MinValue : int
static member Parse : string -> int
static member Parse : string * System.Globalization.NumberStyles -> int
static member Parse : string * System.IFormatProvider -> int
static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> int
static member TryParse : string * int -> bool
static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * int -> bool
end

Full name: System.Int32

type: Int32
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

field Int32.MinValue = -2147483648
val setBits : (int -> unit)
val v : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val getBit : (int -> int32 -> bool)
val x : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val i : int32

type: int32
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val mutable idx : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val i : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val nextGosper : ('b -> 'd) (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val set : 'b (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val c : 'b (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val r : 'b (requires member ( ~- ) and member ( &&& ) and member ( + ) and member ( ^^^ ) and member ( / ) and member ( + ) and member ( >>> ))
val limit : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val n : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val mutable set : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val k : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val scores : float [,]

type: float [,]
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IEnumerable
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
inherits: Array

module Array2D

from Microsoft.FSharp.Collections

val init : int -> int -> (int -> int -> 'T) -> 'T [,]

Full name: Microsoft.FSharp.Collections.Array2D.init

val j : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val mutable maxScore : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

type Double =
struct
member CompareTo : obj -> int
member CompareTo : float -> int
member Equals : obj -> bool
member Equals : float -> bool
member GetHashCode : unit -> int
member GetTypeCode : unit -> System.TypeCode
member ToString : unit -> string
member ToString : string -> string
member ToString : System.IFormatProvider -> string
member ToString : string * System.IFormatProvider -> string
static val MinValue : float
static val MaxValue : float
static val Epsilon : float
static val NegativeInfinity : float
static val PositiveInfinity : float
static val NaN : float
static member IsInfinity : float -> bool
static member IsNaN : float -> bool
static member IsNegativeInfinity : float -> bool
static member IsPositiveInfinity : float -> bool
static member Parse : string -> float
static member Parse : string * System.Globalization.NumberStyles -> float
static member Parse : string * System.IFormatProvider -> float
static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> float
static member TryParse : string * float -> bool
static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * float -> bool
end

Full name: System.Double

type: Double
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

field Double.NegativeInfinity = -Infinity
val bestConfig : int []

type: int []
implements: ICloneable
implements: Collections.IList
implements: Collections.ICollection
implements: Collections.IStructuralComparable
implements: Collections.IStructuralEquatable
implements: Collections.Generic.IList<int>
implements: Collections.Generic.ICollection<int>
implements: seq<int>
implements: Collections.IEnumerable
inherits: Array

val mutable totalScore : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val tokenScore : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val blit : 'T [] -> int -> 'T [] -> int -> int -> unit

Full name: Microsoft.FSharp.Collections.Array.blit

val firstElement : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

A quick test shows that the new algorithm is about seven times faster across a variety of test inputs:

``` 1: let f x y = if x = y then 1.0 else 0.0
2:
3: let cmp = [|"one"; "two"; "three"; "four"|]
4: let cmps =
5:     [| for i = 0 to cmp.Length - 1 do
6:            for j = i to cmp.Length - 1 do
7:                // Longer array is second
8:                yield cmp.[j..], cmp.[i..]
9:     |]
10:
11: #time
12: for i = 0 to 1000000 do
13:     for c1,c2 in cmps do
14:         bestRot f c1 c2 |> ignore
15: //Real: 00:00:41.800, CPU: 00:00:41.812,
16: // GC gen0: 10175, gen1: 6, gen2: 1
17: #time
18:
19: #time
20: for i = 0 to 1000000 do
21:     for c1,c2 in cmps do
22:         bestRotGos f c1 c2 |> ignore
23: //Real: 00:00:06.378, CPU: 00:00:06.375,
24: // GC gen0: 689, gen1: 1, gen2: 0
25: #time
26: ```
val f : 'a -> 'a -> float (requires equality)

Full name: Snippet.f

val x : 'a (requires equality)
val y : 'a (requires equality)
val cmp : string []

Full name: Snippet.cmp

type: string []
implements: System.ICloneable
implements: System.Collections.IList
implements: System.Collections.ICollection
implements: System.Collections.IStructuralComparable
implements: System.Collections.IStructuralEquatable
implements: System.Collections.Generic.IList<string>
implements: System.Collections.Generic.ICollection<string>
implements: seq<string>
implements: System.Collections.IEnumerable
inherits: System.Array

val cmps : (string [] * string []) []

Full name: Snippet.cmps

type: (string [] * string []) []
implements: System.ICloneable
implements: System.Collections.IList
implements: System.Collections.ICollection
implements: System.Collections.IStructuralComparable
implements: System.Collections.IStructuralEquatable
implements: System.Collections.Generic.IList<string [] * string []>
implements: System.Collections.Generic.ICollection<string [] * string []>
implements: seq<string [] * string []>
implements: System.Collections.IEnumerable
inherits: System.Array

val i : int

type: int
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<int>
implements: System.IEquatable<int>
inherits: System.ValueType

property System.Array.Length: int
val j : int

type: int
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<int>
implements: System.IEquatable<int>
inherits: System.ValueType

val c1 : string []

type: string []
implements: System.ICloneable
implements: System.Collections.IList
implements: System.Collections.ICollection
implements: System.Collections.IStructuralComparable
implements: System.Collections.IStructuralEquatable
implements: System.Collections.Generic.IList<string>
implements: System.Collections.Generic.ICollection<string>
implements: seq<string>
implements: System.Collections.IEnumerable
inherits: System.Array

val c2 : string []

type: string []
implements: System.ICloneable
implements: System.Collections.IList
implements: System.Collections.ICollection
implements: System.Collections.IStructuralComparable
implements: System.Collections.IStructuralEquatable
implements: System.Collections.Generic.IList<string>
implements: System.Collections.Generic.ICollection<string>
implements: seq<string>
implements: System.Collections.IEnumerable
inherits: System.Array

val ignore : 'T -> unit

Full name: Microsoft.FSharp.Core.Operators.ignore

Of course, there are a few more things that could be done to make this even faster mostly related to further reducing allocations. For example, we could potentially use larger fixed sized arrays if we knew our input was bounded and pass in the arrays as references letting the calling function handle proper recycling.

However, at this point the function no longer appeared anywhere near significant when profiling our overall usage and so we found this was fast enough. It’s not worth making things uglier and more prone to breakage for minor gains when what we have right now is a referentially transparent function with predictable behavior.

27
Sep 11

## Record Linkage Algorithms in F# – Extensions to Jaro-Winkler Distance (Part 3)

While writing the previous article on tokenized matching I realized I left out some important background information on Jaro-Winkler distance.

The Vector Triangle Inequality

First, there’s something important to know about the Jaro-Winkler distance: it’s not a metric distance and so does not obey the triangle inequality. That is, if you found the JW distance between strings A and B, and then found the JW distance between strings B and C, those results would have no relationship with JW distance between strings A and C. This may not seem like a big deal, but it means Jaro-Winkler distance can’t be used to embed strings in a metric space and so is a poor algorithm choice for many types of clustering. This will be an important point in future articles.

Second, it can be very helpful to extend the results of Jaro-Winkler based on the nature of your own data and your use of the algorithm. To better support my own use case I’ve made changes put the emphasis on better token alignment.

``` 1: let jaroWinklerMI (t1:string) (t2:string) =
2:     // Optimizations for easy to calculate cases
3:     if t1.Length = 0 || t2.Length = 0 then 0.0
4:     elif t1 = t2 then 1.0
5:     else
6:         // Even more weight for the first char
7:         let score = jaroWinkler t1 t2
8:         let p = 0.2 //percentage of score from new metric
9:         let b = if t1.[0] = t2.[0] then 1.0 else 0.0
10:         ((1.0 - p) * score) + (p * b)
```

Beyond the optimization for empty strings and those which are exactly the same, you can see here that I weight the first character even more heavily. This is due to my data being very initial heavy.

To compensate for the frequent use of middle initials I count Jaro-Winkler distance as 80% of the score, while the remaining 20% is fully based on the first character matching. The value of p here was determined by the results of heavy experimentation and hair pulling. Before making this extension initials would frequently align incorrectly.

```12: let scoreNamePairs (t1:string) (t2:string) =
13:     //Raise jaro to a power in order to over-weight better matches
14:     jaroWinklerMI t1 t2 ** 2.0```

I also take the square of the result of jaroWinklerMI to weight better matches even more heavily. I found that in doing this I was able to get much more reliable matching. To understand how this works take a gander at this plot.

As you already know, multiplying any number greater than 0 but less than 1 by itself will give you a smaller number. However are you might intuit, the smaller the number the greater the proportional reduction. As you can see here, anything less than 1 takes a hit, but worse matches get dragged down significantly more.

Initially I was frustrated by bad alignments which would sometimes be chosen over better ones when two or more tokens were both fairly close, but not great. After seeing a variation on this squaring technique used for matrix convergence the thought occurred to me: why not see if it helps with token alignment? After implementing this I saw a huge improvement in results: incorrect alignments completely disappeared!

It’s often surprising where inspiration will come from.

Edit: The above code and it’s composition with Gale-Shapely is now available in my github repository.

val jaroWinklerMI : string -> string -> float

Full name: Snippet.jaroWinklerMI

val t1 : string

type: string
implements: System.IComparable
implements: System.ICloneable
implements: System.IConvertible
implements: System.IComparable<string>
implements: seq<char>
implements: System.Collections.IEnumerable
implements: System.IEquatable<string>

Multiple items

val string : 'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

——————–

type string = System.String

Full name: Microsoft.FSharp.Core.string

type: string
implements: System.IComparable
implements: System.ICloneable
implements: System.IConvertible
implements: System.IComparable<string>
implements: seq<char>
implements: System.Collections.IEnumerable
implements: System.IEquatable<string>

val t2 : string

type: string
implements: System.IComparable
implements: System.ICloneable
implements: System.IConvertible
implements: System.IComparable<string>
implements: seq<char>
implements: System.Collections.IEnumerable
implements: System.IEquatable<string>

property System.String.Length: int
val score : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val p : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val b : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val scoreNamePairs : string -> string -> float

Full name: Snippet.scoreNamePairs

19
Sep 11

## Imperative Pseudocode to Pure Functional Algorithm with Gale-Shapely and F#

Continuing from last time, let’s look at how one goes from imperative pseudocode to pure functional using Gale-Shapely as an example.

Overall, to convert an algorithm from imperative to functional is a fairly simple process once you understand how to convert from a while loop to recursion with accumulators. This post is just a more advanced version of what I talked about in my From Imperative to Computation Expressions post. In fact, if you haven’t already I suggest you read it first. I’ll be assuming you understand how to perform that conversion in this article.  If you’re still having trouble, another good article which is a bit easier is The Ted Neward F# Folding Challenge.

The conversion of an algorithm is a two stage process: first identify algorithmic cases, implied mutable data structures, and alternate immutable data structures.  Then convert the given loops into recursion while leveraging pattern matching.  I know it may seem difficult at first, but after reading this I think you’ll have a good idea of how to get there.

So, let’s identify the cases.

function stableMatching {
Initialize all m ? M and w ? W to free
// Loop Termination Case on
// check for “free m with unproposed w” and

// implicit ranking of W for each m in M

while ? free man m who still has a
woman w to propose to {
w = m’s highest ranked such woman
who he has not proposed to yet

// She’s Single Case
if w is free
(m, w) become engaged
// She’s Engaged Case
else some pair (m’, w) already exists
// She Picks m Sub-Case

if w prefers m to m’
(m, w) become engaged
m’ becomes free
// She Picks m’ Sub-Case
else
(m’, w) remain engaged
}
}

Starting from the top, we can just ignore the imperative initialization.  However, you’ll notice that a great deal is going on in that while clause.

First, it defines our loop termination by looking for a “free m who still has a w to propose to”.  This is interesting because it seems to be begging for some kind of data structure lookup.  Second it defines a implicit ranking of each member of W by each member of M.  Note that this is not mentioned in the initialization phase.

The rest of this algorithm reads like a simple list of cases.  To make this plain, let’s list them out along with state changes.

1. no single m in M with unproposed w
— execution terminates
2. m proposes to unengaged w
— w is no longer in m proposals
— m is no longer single
— m and w are engaged
3. m proposes to w engaged to m’
a.  She switches to m
— w is no longer in m proposals
— m is no longer single
— m and w are engaged
— m’ is single
b.  She stays with m’
— w is no longer in m proposals
— m’ and w stay engaged
— m remains single

The mechanics of what exactly we do in these cases depends on our data structures.  So we can’t get much further without figuring that out.  To identify what data structures are needed we’ll need to identify what exactly it is that needs to be kept track of.  Only once we know that can we pick our immutable data structures appropriately.

Looking above, it appears that we’ll need to keep track of:

1. Given a woman, is she engaged and, if so, who is she engaged to? (need to look it up)
2. Which men are single? (just need one at a time, order doesn’t matter)
3. For each man, which is his next most desirable match that he has not yet proposed to? (just need one at a time, order matters)

Note that we don’t have to keep track of which men are engaged to which women, the question is never asked.  This is a minimal set of requirements here.  No extra stuff allowed.  I’m serious.

Moving on, in an imperative mutable version you might use these data structures:

1. An array indexed by w to the m index engaged to (-1 if single)
2. An array indexed by m to the w index engaged to (-1 if single)
3. An array indexed by m where each element is an ordered linked list of unproposed w indices.

However, what would satisfy these same requirements but be immutable?  Well, if we step out of the mutation mindset it becomes obvious pretty quickly:

1. An engagement map of women to men
2. An immutable linked list of single men
3. Define a man to include his index as well as an ordered unproposed woman immutable linked list

Instead of scanning arrays, or querying mutable structures we simply decompose our data structures as much as needed and recompose them into the form we need on the next call.   This may sound difficult, but it’s really not.  The fact that we have pattern matching and our cases can be defined in terms of the shape of our data structures actually makes this very simple to do.  That is, once you get the hang of it.

To see this in action, let’s break up the code in the last article and discuss each chunk.

``` 1: open System
2:
3: // a Bachelor is an identity index and an
4: // ordered list of women indicies to approach.
5: type Bachelor = int * int list```

Here we see the definition of a bachelor, it’s a tuple of an integer and a list of integers. This is effectively a pair containing an index and a list of woman indices. Note that in this example all men are of type Bachelor.

``` 7: // Some notation:
8: // wi = woman index (int)
9: // mi = man index (int)
10: // mi' = woman's current partner index (int)
11: // m = man with index and unapproached women indices (Bachelor)
12: // mSingle = men that are single (Bachelor list)
13: // wEngaged = engagements from women to men (int, Bachelor)```

We’ll keep this notation around just so you have easy reference.

```15: let funGS (comp: _ -> _ -> float) (M: _ array) (W: _ array) =
16:   let Windices = [ 0 .. W.Length - 1 ]
17:   // List of men with women in order of desire
18:   let Munproposed =
19:     List.init M.Length
20:       (fun mi ->
21:            let sortFun wi = 1.0 - (comp M.[mi] W.[wi])
22:            mi, Windices |> List.sortBy sortFun)```

Windices is just a list of numbers from 0 to the number of woman tokens. Doing this first makes the calculation of Munproposed less expensive.

Munproposed is our initial set of all single men. Here List.init is used to call the given lambda to generate each element. As you can see by the last line within that lambda, each element of the list is a tuple containing the index of that man and a list of women indices sorted by the given desirability function, sortFun.

```23:   // Recursively solve stable marriages
24:   let rec findMarriages mSingle wEngaged =
25:     match mSingle with
26:     // No single guys left with desired women, we're done
27:     | [] -> wEngaged
28:     // Guy is out of luck, remove from singles
29:     | (mi, []) :: bachelors -> findMarriages bachelors wEngaged```

These are our first two cases, if the mSingle list pattern matches with [], we know it’s empty and so we are done.

In the second case we are pattern matching on the structure of singles list as well as it’s first element.  The syntax (mi, []) matches on a tuple of an index and an empty inner list.  This list is the list of yet unproposed to women and so if it’s empty we know that this guy is out of options and so it’s ok to drop him from the list of bachelors.  We do this by simply recursing on bachelors, which is the tail of the mSingle list.

```30:     // He's got options!
31:     | (mi, wi :: rest) :: bachelors ->
32:       let m = mi, rest```

This is our general purpose match, and is the structure we expect to see in most cases.  Here the head is fully decomposed into a man index (mi), his next first choice woman index (wi), the rest of the women (rest) and the rest of the single men (bachelors).

Immediately afterward we define m to be that given man index (mi) and the rest of the women tokens (rest).  As two tokens are only ever compared once, m should no longer contain wi in his ordered list of proposals.  This is the form he will take until the algorithm ends or he is evaluated again after being placed in the mSingle pool.

```33:       match wEngaged |> Map.tryFind wi with
34:       // She's single! m is now engaged!
35:       | None -> findMarriages bachelors (wEngaged |> Map.add wi m)
36:       // She's already engaged, let the best man win!
37:       | Some (m') –> 38:         let mi', _ = m'```

Now that we have fully matched out everything masculine, it’s time to turn to the feminine.

Here we look in the wEngaged map to find out if wi is single.  This is done by a using Map.tryFind which returns None if the given key is not in the map and Some(value) if it is.

If the Map.tryFind call returns None we know she is single and so recurse with the rest of our single men (bachelors) and our wEngaged map is extended via Map.add to contain a new mapping from the woman index (wi) to the given man (m).

If she’s engaged our pattern match automatically decomposes the given option type and we know exactly who she is engaged to (m’).

```39:         if comp W.[wi] M.[mi] > comp W.[wi] M.[mi'] then
40:           // Congrats mi, he is now engaged to wi
41:           // The previous suitor (mi') is bested
42:           findMarriages
43:             (m' :: bachelors)
44:             (wEngaged |> Map.add wi m)
45:         else
46:           // The current bachelor (mi) lost, better luck next time
47:           findMarriages
48:             (m :: bachelors)
49:             wEngaged```

This is the core comparison logic.  Here we determine which bachelor gets her hand.  This is pretty straightforward from an imperative perspective as we’re using the oh-so familiar if statement.

In the first case the new contender (m) wins.  We append the loser (m’) back on to the head of what will be mSingle next time around with the syntax (m’ :: bachelors).  Similar to what happens if she’s single, we add a mapping from the woman index (wi) to the man instance (m) via the call to Map.add.  Note that this will override the former mapping from wi to m’.

If the current bachelor loses we simply append him back to the head of the bachelors list and try again next time around.  You’ll get her next time bro.

```50:   findMarriages Munproposed Map.empty
51:   // Before returning, remove unproposed lists from man instances
52:   |> Map.map (fun wi m -> let mi, _ = m in mi)  ```

The final section is just how we set up our recursion.  We start with the full list of single men and their ranked lady lists (Munproposed) and an empty map of relationships (Map.empty).  When the algorithm exits we make one last pass through our returned wEngaged map in order to strip out all of the lists of unproposed to women.  It’s just baggage now anyway.

Well that’s how you get from imperative pseudocode to a pure functional implementation.  All of the code here, as well as a really nasty imperative version to compare it to is available on my GitHub page

If you liked this post, have any questions, or feel like I missed something important I hope you’ll comment here on it.  We’ll all benefit from the knowledge shared.

namespace System
type Bachelor = int * int list

Full name: Snippet.Bachelor

Multiple items

val int : ‘T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

——————–

type int<‘Measure> = int

Full name: Microsoft.FSharp.Core.int<_>

type: int<‘Measure>

implements: IComparable

implements: IConvertible

implements: IFormattable

implements: IComparable<int<‘Measure>>

implements: IEquatable<int<‘Measure>>

inherits: ValueType

——————–

type int = int32

Full name: Microsoft.FSharp.Core.int

type: int

implements: IComparable

implements: IFormattable

implements: IConvertible

implements: IComparable<int>

implements: IEquatable<int>

inherits: ValueType

type ‘T list = List<‘T>

Full name: Microsoft.FSharp.Collections.list<_>

type: ‘T list

implements: Collections.IStructuralEquatable

implements: IComparable<List<‘T>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<‘T>

implements: Collections.IEnumerable

val funGS : (‘a -> ‘a -> float) -> ‘a array -> ‘a array -> Map<int,int>

Full name: Snippet.funGS

val comp : (‘a -> ‘a -> float)
Multiple items

val float : ‘T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–

type float<‘Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

type: float<‘Measure>

implements: IComparable

implements: IConvertible

implements: IFormattable

implements: IComparable<float<‘Measure>>

implements: IEquatable<float<‘Measure>>

inherits: ValueType

——————–

type float = Double

Full name: Microsoft.FSharp.Core.float

type: float

implements: IComparable

implements: IFormattable

implements: IConvertible

implements: IComparable<float>

implements: IEquatable<float>

inherits: ValueType

Multiple items

val M : ‘a array

type: ‘a array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘a>

implements: Collections.Generic.ICollection<‘a>

implements: seq<‘a>

implements: Collections.IEnumerable

inherits: Array

——————–

val M : ‘a array

type: ‘a array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘a>

implements: Collections.Generic.ICollection<‘a>

implements: seq<‘a>

implements: Collections.IEnumerable

inherits: Array

type ‘T array = ‘T []

Full name: Microsoft.FSharp.Core.array<_>

type: ‘T array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘T>

implements: Collections.Generic.ICollection<‘T>

implements: seq<‘T>

implements: Collections.IEnumerable

inherits: Array

Multiple items

val W : ‘a array

type: ‘a array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘a>

implements: Collections.Generic.ICollection<‘a>

implements: seq<‘a>

implements: Collections.IEnumerable

inherits: Array

——————–

val W : ‘a array

type: ‘a array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘a>

implements: Collections.Generic.ICollection<‘a>

implements: seq<‘a>

implements: Collections.IEnumerable

inherits: Array

val Windices : int list

type: int list

implements: Collections.IStructuralEquatable

implements: IComparable<List<int>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<int>

implements: Collections.IEnumerable

val W : ‘a array

type: ‘a array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘a>

implements: Collections.Generic.ICollection<‘a>

implements: seq<‘a>

implements: Collections.IEnumerable

inherits: Array

property Array.Length: int
val Munproposed : (int * int list) list

type: (int * int list) list

implements: Collections.IStructuralEquatable

implements: IComparable<List<int * int list>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<int * int list>

implements: Collections.IEnumerable

Multiple items

module List

from Microsoft.FSharp.Collections

——————–

type List<‘T> =

| ( [] )

| ( :: ) of ‘T * ‘T list

with

interface Collections.IEnumerable

interface Collections.Generic.IEnumerable<‘T>

member IsEmpty : bool

member Item : index:int -> ‘T with get

member Length : int

member Tail : ‘T list

static member Cons : head:’T * tail:’T list -> ‘T list

static member Empty : ‘T list

end

Full name: Microsoft.FSharp.Collections.List<_>

type: List<‘T>

implements: Collections.IStructuralEquatable

implements: IComparable<List<‘T>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<‘T>

implements: Collections.IEnumerable

val init : int -> (int -> ‘T) -> ‘T list

Full name: Microsoft.FSharp.Collections.List.init

val M : ‘a array

type: ‘a array

implements: ICloneable

implements: Collections.IList

implements: Collections.ICollection

implements: Collections.IStructuralComparable

implements: Collections.IStructuralEquatable

implements: Collections.Generic.IList<‘a>

implements: Collections.Generic.ICollection<‘a>

implements: seq<‘a>

implements: Collections.IEnumerable

inherits: Array

val mi : int

type: int

implements: IComparable

implements: IFormattable

implements: IConvertible

implements: IComparable<int>

implements: IEquatable<int>

inherits: ValueType

val sortFun : (int -> float)
val wi : int

type: int

implements: IComparable

implements: IFormattable

implements: IConvertible

implements: IComparable<int>

implements: IEquatable<int>

inherits: ValueType

val sortBy : (‘T -> ‘Key) -> ‘T list -> ‘T list (requires comparison)

Full name: Microsoft.FSharp.Collections.List.sortBy

val findMarriages : ((int * int list) list -> Map<int,(int * int list)> -> Map<int,(int * int list)>)
val mSingle : (int * int list) list

type: (int * int list) list

implements: Collections.IStructuralEquatable

implements: IComparable<List<int * int list>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<int * int list>

implements: Collections.IEnumerable

val wEngaged : Map<int,(int * int list)>

type: Map<int,(int * int list)>

implements: IComparable

implements: Collections.Generic.IDictionary<int,(int * int list)>

implements: Collections.Generic.ICollection<Collections.Generic.KeyValuePair<int,(int * int list)>>

implements: seq<Collections.Generic.KeyValuePair<int,(int * int list)>>

implements: Collections.IEnumerable

val bachelors : (int * int list) list

type: (int * int list) list

implements: Collections.IStructuralEquatable

implements: IComparable<List<int * int list>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<int * int list>

implements: Collections.IEnumerable

val rest : int list

type: int list

implements: Collections.IStructuralEquatable

implements: IComparable<List<int>>

implements: IComparable

implements: Collections.IStructuralComparable

implements: Collections.Generic.IEnumerable<int>

implements: Collections.IEnumerable

val m : int * int list
Multiple items

module Map

from Microsoft.FSharp.Collections

——————–

type Map<‘Key,’Value (requires comparison)> =

class

interface Collections.IEnumerable

interface IComparable

interface Collections.Generic.IEnumerable<Collections.Generic.KeyValuePair<‘Key,’Value>>

interface Collections.Generic.ICollection<Collections.Generic.KeyValuePair<‘Key,’Value>>

interface Collections.Generic.IDictionary<‘Key,’Value>

new : elements:seq<‘Key * ‘Value> -> Map<‘Key,’Value>

member Add : key:’Key * value:’Value -> Map<‘Key,’Value>

member ContainsKey : key:’Key -> bool

override Equals : obj -> bool

member Remove : key:’Key -> Map<‘Key,’Value>

member TryFind : key:’Key -> ‘Value option

member Count : int

member IsEmpty : bool

member Item : key:’Key -> ‘Value with get

end

Full name: Microsoft.FSharp.Collections.Map<_,_>

type: Map<‘Key,’Value>

implements: IComparable

implements: Collections.Generic.IDictionary<‘Key,’Value>

implements: Collections.Generic.ICollection<Collections.Generic.KeyValuePair<‘Key,’Value>>

implements: seq<Collections.Generic.KeyValuePair<‘Key,’Value>>

implements: Collections.IEnumerable

val tryFind : ‘Key -> Map<‘Key,’T> -> ‘T option (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.tryFind

union case Option.None: Option<‘T>
val add : ‘Key -> ‘T -> Map<‘Key,’T> -> Map<‘Key,’T> (requires comparison)

union case Option.Some: ‘T -> Option<‘T>
val m’ : int * int list
val mi’ : int

type: int

implements: IComparable

implements: IFormattable

implements: IConvertible

implements: IComparable<int>

implements: IEquatable<int>

inherits: ValueType

val empty<‘Key,’T (requires comparison)> : Map<‘Key,’T> (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.empty

val map : (‘Key -> ‘T -> ‘U) -> Map<‘Key,’T> -> Map<‘Key,’U> (requires comparison)

Full name: Microsoft.FSharp.Collections.Map.map

val alignJaroWinkler : (‘a array -> ‘a array -> Map<int,int>)

Full name: Snippet.alignJaroWinkler

6
Sep 11

## Record Linkage Algorithms in F# – Jaro-Winkler Distance (Part 2)

Last time we dove into the Jaro distance algorithm and picked apart how each of its components are calculated. However, from a modern perspective Jaro alone is a rather weak method of string matching. It was Winkler’s extension that brought this algorithm into widespread modern use.

Matthew Jaro’s insight when inventing the Jaro distance algorithm was that humans tend to make certain kinds of errors. However, while working on the 1990 census William Winkler had an additional insight: these kinds of errors are much more likely to occur later in the string. By simply wrapping Jaro’s algorithm to forgive some penalty according to the similarity of the first few characters, Winkler was able to get significantly better results with little additional overhead.

Here we see that the Jaro-Winkler distance (dw) is equal to the result of the Jaro distance (dj) plus one minus that same value times some weighted metric (lp). This is a great mathematical trick for two reasons. First, as long as the weighted metric (lp) doesn’t exceed 1, the final result will stay within the 0-1 range of the Jaro metric. Second, it guarantees that the result of Jaro-Winkler will never be lower than the result of Jaro alone. It effectively lets Winkler enrich the results of Jaro by filling in the remaining gap.

The meaning of the weighted metric (lp) is even more simple. Here l is just the number of initial characters which match, up to a maximum of four. Meanwhile p is a weight which can be up to 1/4, or one over the maximum possible value of l.

As p gets larger and larger, more and more of the gap will be filled in by Winkler’s extension. When p is 1/4, strings where first four characters match will always be given a perfect score. Just as we’d never want strings like “JOHN” and “JOHNSON” to be considered a perfect match, we’d never want to use a p value of 1/4. After much experimentation, Winkler recommends using a p value of 0.1, which is also what I use.

Now that we’ve covered how the math works, let’s take a look at an actual implementation. If you’d like to see an implementation of the Jaro distance algorithm take a look at the previous installment of Record Linkage Algorithms in F#.

``` 1: /// Calculate the Jaro-Winkler distance of s1 and s2
2: let jaroWinkler s1 s2 =
3:     let jaroScore = jaro s1 s2
4:     // Accumulate the number of matching initial characters
5:     let maxLength = (min s1.Length s2.Length) - 1
6:     let rec calcL i acc =
7:         if i > maxLength || s1.[i] <> s2.[i] then acc
8:         else calcL (i + 1) (acc + 1.0)
9:     let l = min (calcL 0 0.0) 4.0
10:     // Calculate the JW distance
11:     let p = 0.1
12:     jaroScore + (l * p * (1.0 - jaroScore))```

In my implementation I’ve allowed for some slight inefficiency in order to make it easier to play with the p value and number of characters examined. So far I’ve found Winkler’s choice of four characters to be the best, which incidentally has been shown as a great number of initial characters to look at when using a number of record linkage algorithms on things in the English language. However, I suspect that other values may work better when working in other languages.

The math here is so simple that I don’t feel it’s worth breaking down further, but I’ve included my tests built from the examples in the Jaro-Winkler distance Wikipedia article. For a fuller understanding just break out the debugger and play a bit on your own.

```14: open Xunit
15:
16: [<Fact>]
17: let ``Jaro-Winkler identity test`` () =
18:     let result = jaroWinkler "RICK" "RICK"
19:     Assert.Equal("1.000", String.Format("{0:0.000}", result))
20:
21: [<Fact>]
22: let ``Jaro-Winkler martha test`` () =
23:     let result = jaroWinkler "MARTHA" "MARHTA"
24:     Assert.Equal("0.961", String.Format("{0:0.000}", result))
25:
26: [<Fact>]
27: let ``Jaro-Winkler dwayne test`` () =
28:     let result = jaroWinkler "DWAYNE" "DUANE"
29:     Assert.Equal("0.840", String.Format("{0:0.000}", result))
30:
31: [<Fact>]
32: let ``Jaro-Winkler dixon test`` () =
33:     let result = jaroWinkler "DIXON" "DICKSONX"
34:     Assert.Equal("0.813", String.Format("{0:0.000}", result))
35: ```

val jaroWinkler : 'a -> 'b -> float

Full name: Snippet.jaroWinkler

Calculate the Jaro-Winkler distance of s1 and s2

val s1 : 'a
val s2 : 'b
val jaroScore : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val maxLength : int

type: int
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<int>
implements: System.IEquatable<int>
inherits: System.ValueType

val min : 'T -> 'T -> 'T (requires comparison)

Full name: Microsoft.FSharp.Core.Operators.min

val calcL : (int -> float -> float)
val i : int

type: int
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<int>
implements: System.IEquatable<int>
inherits: System.ValueType

val acc : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val l : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val p : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

val result : float

type: float
implements: System.IComparable
implements: System.IFormattable
implements: System.IConvertible
implements: System.IComparable<float>
implements: System.IEquatable<float>
inherits: System.ValueType

module String

from Microsoft.FSharp.Core

Multiple items

type Format<'Printer,'State,'Residue,'Result,'Tuple> = PrintfFormat<'Printer,'State,'Residue,'Result,'Tuple>

Full name: Microsoft.FSharp.Core.Format<_,_,_,_,_>

type: Format<'Printer,'State,'Residue,'Result,'Tuple>
inherits: PrintfFormat<'Printer,'State,'Residue,'Result>

——————–

type Format<'Printer,'State,'Residue,'Result> = PrintfFormat<'Printer,'State,'Residue,'Result>

Full name: Microsoft.FSharp.Core.Format<_,_,_,_>

In the next installment of Record Linkage Algorithms in F# we’ll take a look a method for efficient token based matching with Jaro-Winkler. Stay tuned!

Edit: All current code for this series is now available on github.

2
Sep 11

## Record Linkage Algorithms in F# – Jaro-Winkler Distance (Part 1)

When first approaching the task of record linkage I was initially overwhelmed by the huge number of different algorithms available for comparing strings. Now I know that the secret to finding your way in this sea of algorithms is two fold. First, know that many are outdated and have newer and better implementations, so they can be ignored. Second, of the newest extension of each algorithm, each has a particular use depending on the kind of data you’re comparing.

Jaro-Winkler distance is a particularly good choice when comparing short strings of human entered data, such as names. This is due to it’s relative robustness against letter transpositions and it’s weighting of similarity toward the beginning of the string. When it comes to comparing names, a slightly customized version of tokenized Jaro-Winkler is one of the first things in my toolbox that I reach for.

Jaro-Winkler distance also has the benefit of being a rather easy to understand algorithm. It’s composed of two parts, Jaro’s original algorithm and Winkler’s extension.  In this installment we’ll roll up our sleeves and dig into the first part of this algorithm, Jaro distance.

This is the equation for Jaro distance. It’s made up of the average of three sub-calculations.

1. The ratio of matching characters to the length of the first string.
2. The ratio of matching characters to the length of the second string.
3. The ratio of non-transpositions to the number matching of characters.

My implementation has been optimized a bit, but I think it’s still rather easy to understand.

``` 1: open System
2:
3: /// Given an offset and a radius from that offset,
4: /// does mChar exist in that part of str?
5: let inline existsInWin (mChar: char) (str: string) offset rad =
6:     let startAt = max 0 (offset - rad)
7:     let endAt = min (offset + rad) (String.length str - 1)
8:     if endAt - startAt < 0 then false
9:     else
10:         let rec exists index =
11:             if str.[index] = mChar then true
12:             elif index = endAt then false
13:             else exists (index + 1)
14:         exists startAt
15:
16: /// The jaro distance between s1 and s2
17: let jaro s1 s2 =
18:
19:     // The radius is half of the lesser
20:     // of the two string lengths rounded up.
22:         let minLen =
23:             min (String.length s1) (String.length s2) in
24:               minLen / 2 + minLen % 2
25:
26:     // An inner function which recursively finds the number
27:     // of matched characters within the radius.
28:     let commonChars (chars1: string) (chars2: string) =
29:         let rec inner i result =
30:             match i with
31:             | -1 -> result
32:             | _ -> if existsInWin chars1.[i] chars2 i matchRadius
33:                    then inner (i - 1) (chars1.[i] :: result)
34:                    else inner (i - 1) result
35:         inner (chars1.Length - 1) []
36:
37:     // The sets of common characters and their lengths as floats
38:     let c1 = commonChars s1 s2
39:     let c2 = commonChars s2 s1
40:     let c1length = float (List.length c1)
41:     let c2length = float (List.length c2)
42:
43:     // The number of transpositions within
44:     // the sets of common characters.
45:     let transpositions =
46:         let rec inner cl1 cl2 result =
47:             match cl1, cl2 with
48:             | [], _ | _, [] -> result
49:             | c1h :: c1t, c2h :: c2t ->
50:                 if c1h <> c2h
51:                 then inner c1t c2t (result + 1.0)
52:                 else inner c1t c2t result
53:         let mismatches = inner c1 c2 0.0
54:         // If one common string is longer than the other
55:         // each additional char counts as half a transposition
56:         (mismatches + abs (c1length - c2length)) / 2.0
57:
58:     let s1length = float (String.length s1)
59:     let s2length = float (String.length s2)
60:     let tLength = max c1length c2length
61:
62:     // The jaro distance as given by
63:     // 1/3 ( m2/|s1| + m1/|s2| + (mc-t)/mc )
64:     let result = (c1length / s1length +
65:                   c2length / s2length +
66:                   (tLength - transpositions) / tLength)
67:                  / 3.0
68:
69:     // This is for cases where |s1|, |s2| or m are zero
70:     if Double.IsNaN result then 0.0 else result```

We’re quite lucky to have examples of the algorithm in action available right in the Wikipedia article to use as unit tests. When optimizing, you couldn’t ask for more.

Take ‘DWAYNE’ and ‘DUANE’ for example. According to the article we should end up with the following results:

So, let’s break it down. The lesser of the two strings, DUANE, is 5 characters long. When we set minLen to 5 in (minLen / 2 + minLen % 2 is 3), we get a radius of 3.

Next we find the common characters in each direction. Comparing at DWAYNE to DUANE with a radius of 3 we can see that D, A, N and E will match, giving us a c1 = “DANE”. Similarly, comparing at DUANE to DWAYNE we get exactly the same thing. That is, c2 = “DANE” as well.

As you might have guessed by the fact both sets of common strings are the same, we have zero transpositions in this instance.

So now, we just plug in the numbers and get 1/3 * (4/6 + 4/5 + (4-0)/4) = 0.822. Not too difficult, eh?

For a better understanding of how transpositions end up working out, try walking through the debugger with the MARTHA vs MARHTA test below.

```72: open Xunit
73:
74: [<Fact>]
75: let ``Jaro identity test`` () =
76:     let result = jaro "RICK" "RICK"
77:     Assert.Equal("1.000", String.Format("{0:0.000}", result))
78:
79: [<Fact>]
80: let ``Jaro martha test`` () =
81:     let result = jaro "MARTHA" "MARHTA"
82:     Assert.Equal("0.944", String.Format("{0:0.000}", result))
83:
84: [<Fact>]
85: let ``Jaro dwayne test`` () =
86:     let result = jaro "DWAYNE" "DUANE"
87:     Assert.Equal("0.822", String.Format("{0:0.000}", result))
88:
89: [<Fact>]
90: let ``Jaro dixon test`` () =
91:     let result = jaro "DIXON" "DICKSONX"
92:     Assert.Equal("0.767", String.Format("{0:0.000}", result))
93:
94: ```

namespace System
val existsInWin : char -> string -> 'a -> 'b -> bool (requires member ( – ) and member ( + ))

Full name: Snippet.existsInWin

Given an offset and a radius from that office,
does mChar exist in that part of str?

val mChar : char

type: char
implements: IComparable
implements: IConvertible
implements: IComparable<char>
implements: IEquatable<char>
inherits: ValueType

Multiple items

val char : 'T -> char (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.char

——————–

type char = Char

Full name: Microsoft.FSharp.Core.char

type: char
implements: IComparable
implements: IConvertible
implements: IComparable<char>
implements: IEquatable<char>
inherits: ValueType

val str : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

Multiple items

val string : 'T -> string

Full name: Microsoft.FSharp.Core.Operators.string

——————–

type string = String

Full name: Microsoft.FSharp.Core.string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val offset : 'a (requires member ( – ) and member ( + ))
val rad : 'b (requires member ( – ) and member ( + ))
val startAt : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val max : 'T -> 'T -> 'T (requires comparison)

Full name: Microsoft.FSharp.Core.Operators.max

val endAt : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val min : 'T -> 'T -> 'T (requires comparison)

Full name: Microsoft.FSharp.Core.Operators.min

type String =
class
new : char -> string
new : char * int * int -> string
new : System.SByte -> string
new : System.SByte * int * int -> string
new : System.SByte * int * int * System.Text.Encoding -> string
new : char [] * int * int -> string
new : char [] -> string
new : char * int -> string
member Chars : int -> char
member Clone : unit -> obj
member CompareTo : obj -> int
member CompareTo : string -> int
member Contains : string -> bool
member CopyTo : int * char [] * int * int -> unit
member EndsWith : string -> bool
member EndsWith : string * System.StringComparison -> bool
member EndsWith : string * bool * System.Globalization.CultureInfo -> bool
member Equals : obj -> bool
member Equals : string -> bool
member Equals : string * System.StringComparison -> bool
member GetEnumerator : unit -> System.CharEnumerator
member GetHashCode : unit -> int
member GetTypeCode : unit -> System.TypeCode
member IndexOf : char -> int
member IndexOf : string -> int
member IndexOf : char * int -> int
member IndexOf : string * int -> int
member IndexOf : string * System.StringComparison -> int
member IndexOf : char * int * int -> int
member IndexOf : string * int * int -> int
member IndexOf : string * int * System.StringComparison -> int
member IndexOf : string * int * int * System.StringComparison -> int
member IndexOfAny : char [] -> int
member IndexOfAny : char [] * int -> int
member IndexOfAny : char [] * int * int -> int
member Insert : int * string -> string
member IsNormalized : unit -> bool
member IsNormalized : System.Text.NormalizationForm -> bool
member LastIndexOf : char -> int
member LastIndexOf : string -> int
member LastIndexOf : char * int -> int
member LastIndexOf : string * int -> int
member LastIndexOf : string * System.StringComparison -> int
member LastIndexOf : char * int * int -> int
member LastIndexOf : string * int * int -> int
member LastIndexOf : string * int * System.StringComparison -> int
member LastIndexOf : string * int * int * System.StringComparison -> int
member LastIndexOfAny : char [] -> int
member LastIndexOfAny : char [] * int -> int
member LastIndexOfAny : char [] * int * int -> int
member Length : int
member Normalize : unit -> string
member Normalize : System.Text.NormalizationForm -> string
member PadLeft : int -> string
member PadLeft : int * char -> string
member PadRight : int -> string
member PadRight : int * char -> string
member Remove : int -> string
member Remove : int * int -> string
member Replace : char * char -> string
member Replace : string * string -> string
member Split : char [] -> string []
member Split : char [] * int -> string []
member Split : char [] * System.StringSplitOptions -> string []
member Split : string [] * System.StringSplitOptions -> string []
member Split : char [] * int * System.StringSplitOptions -> string []
member Split : string [] * int * System.StringSplitOptions -> string []
member StartsWith : string -> bool
member StartsWith : string * System.StringComparison -> bool
member StartsWith : string * bool * System.Globalization.CultureInfo -> bool
member Substring : int -> string
member Substring : int * int -> string
member ToCharArray : unit -> char []
member ToCharArray : int * int -> char []
member ToLower : unit -> string
member ToLower : System.Globalization.CultureInfo -> string
member ToLowerInvariant : unit -> string
member ToString : unit -> string
member ToString : System.IFormatProvider -> string
member ToUpper : unit -> string
member ToUpper : System.Globalization.CultureInfo -> string
member ToUpperInvariant : unit -> string
member Trim : unit -> string
member Trim : char [] -> string
member TrimEnd : char [] -> string
member TrimStart : char [] -> string
static val Empty : string
static member Compare : string * string -> int
static member Compare : string * string * bool -> int
static member Compare : string * string * System.StringComparison -> int
static member Compare : string * string * System.Globalization.CultureInfo * System.Globalization.CompareOptions -> int
static member Compare : string * string * bool * System.Globalization.CultureInfo -> int
static member Compare : string * int * string * int * int -> int
static member Compare : string * int * string * int * int * bool -> int
static member Compare : string * int * string * int * int * System.StringComparison -> int
static member Compare : string * int * string * int * int * bool * System.Globalization.CultureInfo -> int
static member Compare : string * int * string * int * int * System.Globalization.CultureInfo * System.Globalization.CompareOptions -> int
static member CompareOrdinal : string * string -> int
static member CompareOrdinal : string * int * string * int * int -> int
static member Concat : obj -> string
static member Concat : obj [] -> string
static member Concat<'T> : System.Collections.Generic.IEnumerable<'T> -> string
static member Concat : System.Collections.Generic.IEnumerable<string> -> string
static member Concat : string [] -> string
static member Concat : obj * obj -> string
static member Concat : string * string -> string
static member Concat : obj * obj * obj -> string
static member Concat : string * string * string -> string
static member Concat : obj * obj * obj * obj -> string
static member Concat : string * string * string * string -> string
static member Copy : string -> string
static member Equals : string * string -> bool
static member Equals : string * string * System.StringComparison -> bool
static member Format : string * obj -> string
static member Format : string * obj [] -> string
static member Format : string * obj * obj -> string
static member Format : System.IFormatProvider * string * obj [] -> string
static member Format : string * obj * obj * obj -> string
static member Intern : string -> string
static member IsInterned : string -> string
static member IsNullOrEmpty : string -> bool
static member IsNullOrWhiteSpace : string -> bool
static member Join : string * string [] -> string
static member Join : string * obj [] -> string
static member Join<'T> : string * System.Collections.Generic.IEnumerable<'T> -> string
static member Join : string * System.Collections.Generic.IEnumerable<string> -> string
static member Join : string * string [] * int * int -> string
end

Full name: System.String

type: String
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val length : string -> int

Full name: Microsoft.FSharp.Core.String.length

val exists : (int -> bool)
val index : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val jaro : string -> string -> float

Full name: Snippet.jaro

The jaro distance between s1 and s2

val s1 : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val s2 : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val minLen : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val commonChars : (string -> string -> char list)
val chars1 : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val chars2 : string

type: string
implements: IComparable
implements: ICloneable
implements: IConvertible
implements: IComparable<string>
implements: seq<char>
implements: Collections.IEnumerable
implements: IEquatable<string>

val inner : (int -> char list -> char list)
val i : int

type: int
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<int>
implements: IEquatable<int>
inherits: ValueType

val result : char list

type: char list
implements: Collections.IStructuralEquatable
implements: IComparable<List<char>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<char>
implements: Collections.IEnumerable

property String.Length: int
val c1 : char list

type: char list
implements: Collections.IStructuralEquatable
implements: IComparable<List<char>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<char>
implements: Collections.IEnumerable

val c2 : char list

type: char list
implements: Collections.IStructuralEquatable
implements: IComparable<List<char>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<char>
implements: Collections.IEnumerable

val c1length : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

Multiple items

val float : 'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

——————–

type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>

type: float<'Measure>
implements: IComparable
implements: IConvertible
implements: IFormattable
implements: IComparable<float<'Measure>>
implements: IEquatable<float<'Measure>>
inherits: ValueType

——————–

type float = Double

Full name: Microsoft.FSharp.Core.float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

Multiple items

module List

from Microsoft.FSharp.Collections

——————–

type List<'T> =
| ( [] )
| ( :: ) of 'T * 'T list
with
interface Collections.IEnumerable
interface Collections.Generic.IEnumerable<'T>
member IsEmpty : bool
member Item : index:int -> 'T with get
member Length : int
member Tail : 'T list
static member Cons : head:'T * tail:'T list -> 'T list
static member Empty : 'T list
end

Full name: Microsoft.FSharp.Collections.List<_>

type: List<'T>
implements: Collections.IStructuralEquatable
implements: IComparable<List<'T>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<'T>
implements: Collections.IEnumerable

val length : 'T list -> int

Full name: Microsoft.FSharp.Collections.List.length

val c2length : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val transpositions : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val inner : ('a list -> 'a list -> float -> float) (requires equality)
val cl1 : 'a list (requires equality)

type: 'a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<'a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<'a>
implements: Collections.IEnumerable

val cl2 : 'a list (requires equality)

type: 'a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<'a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<'a>
implements: Collections.IEnumerable

val result : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val c1h : 'a (requires equality)
val c1t : 'a list (requires equality)

type: 'a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<'a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<'a>
implements: Collections.IEnumerable

val c2h : 'a (requires equality)
val c2t : 'a list (requires equality)

type: 'a list
implements: Collections.IStructuralEquatable
implements: IComparable<List<'a>>
implements: IComparable
implements: Collections.IStructuralComparable
implements: Collections.Generic.IEnumerable<'a>
implements: Collections.IEnumerable

val mismatches : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val abs : 'T -> 'T (requires member Abs)

Full name: Microsoft.FSharp.Core.Operators.abs

val s1length : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val s2length : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

val tLength : float

type: float
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

type Double =
struct
member CompareTo : obj -> int
member CompareTo : float -> int
member Equals : obj -> bool
member Equals : float -> bool
member GetHashCode : unit -> int
member GetTypeCode : unit -> System.TypeCode
member ToString : unit -> string
member ToString : string -> string
member ToString : System.IFormatProvider -> string
member ToString : string * System.IFormatProvider -> string
static val MinValue : float
static val MaxValue : float
static val Epsilon : float
static val NegativeInfinity : float
static val PositiveInfinity : float
static val NaN : float
static member IsInfinity : float -> bool
static member IsNaN : float -> bool
static member IsNegativeInfinity : float -> bool
static member IsPositiveInfinity : float -> bool
static member Parse : string -> float
static member Parse : string * System.Globalization.NumberStyles -> float
static member Parse : string * System.IFormatProvider -> float
static member Parse : string * System.Globalization.NumberStyles * System.IFormatProvider -> float
static member TryParse : string * float -> bool
static member TryParse : string * System.Globalization.NumberStyles * System.IFormatProvider * float -> bool
end

Full name: System.Double

type: Double
implements: IComparable
implements: IFormattable
implements: IConvertible
implements: IComparable<float>
implements: IEquatable<float>
inherits: ValueType

Double.IsNaN(d: float) : bool

String.Format(format: string, args: obj []) : string

String.Format(format: string, arg0: obj) : string

String.Format(provider: IFormatProvider, format: string, args: obj []) : string

String.Format(format: string, arg0: obj, arg1: obj) : string

String.Format(format: string, arg0: obj, arg1: obj, arg2: obj) : string

I hope you enjoyed this installment of Record Linkage Algorithms in F#. Next time we’ll explore the Winkler extension to this algorithm and take a look at why weighting errors earlier in the string more heavily ends up giving significantly better results.

Edit: All current code for this series is now available on github.