Category: Algorithms

Developing an Algorithm in F#: Fast Rotational Alignments with Gosper’s Hack
This post is for the 7th day of the 2014 F# Advent Calendar. It has been said that functional languages can’t be as fast as their imperative cousins because of all of the allocation and garbage collection, this is patently false (as far as F# is concerned at least) largely because the CLR has value types.…

Levenshtein Distance and the Triangle Inequality
Levenshtein distance is one of my favorite algorithms. On the surface it seems so very simple, but when you spend some time thinking hard on it deep insights are waiting to be had. The first and most important thing about Levenshtein distance is it’s actually a metric distance. That is, it obeys the triangle inequality. For…

Record Linkage Algorithms in F# – Extensions to JaroWinkler Distance (Part 3)
While writing the previous article on tokenized matching I realized I left out some important background information on JaroWinkler distance. First, there’s something important to know about the JaroWinkler distance: it’s not a metric distance and so does not obey the triangle inequality. That is, if you found the JW distance between strings A and B, and then…

Imperative Pseudocode to Pure Functional Algorithm with GaleShapely and F#
Continuing from last time, let’s look at how one goes from imperative pseudocode to pure functional using GaleShapely as an example. Overall, to convert an algorithm from imperative to functional is a fairly simple process once you understand how to convert from a while loop to recursion with accumulators. This post is just a more advanced…

Record Linkage in F# – Token Matching, Stable Marriages and the GaleShapley algorithm
This post was originally published September 13th, 2011. Initially, one of the biggest problems I found when trying to marry records was the god awful quality of much of data I often have to work with. It’s mostly old mainframe and database data with truncated fields, limited character sets and fields with nonsensical contents. Even…

Record Linkage Algorithms in F# – JaroWinkler Distance (Part 2)
Last time we dove into the Jaro distance algorithm and picked apart how each of its components are calculated. However, from a modern perspective Jaro alone is a rather weak method of string matching. It was Winkler’s extension that brought this algorithm into widespread modern use. Matthew Jaro’s insight when inventing the Jaro distance algorithm was that…