# Levenshtein Distance and the Triangle Inequality

Levenshtein distance is one of my favorite algorithms. On the surface it seems simple, but spend some time thinking hard about it and deep insights are waiting to be had.

The first and most important thing about Levenshtein distance is that it's actually a metric. That is, among other properties, it obeys the triangle inequality. For most other string distance measurements this property doesn't hold.

This might not seem like such a big deal, but this property gives the measurements meaning in a context larger than just the pair being compared. For example, it allows you to embed your pairwise distance measurements into a higher-dimensional space and so use them for things like clustering.

This was one of the first insights that Levenshtein distance gave me: A measurement doesn’t need to give you an absolute location in space to be useful for telling you where you are, it just has to tell you how far away everything else is. But what is it about Levenshtein distance that gives it this property? It’s not immediately obvious to most people, at least it wasn’t to me.

First, let’s consider a naive implementation of the Wagner–Fischer algorithm for Levenshtein distance. As stated above, here the triangle inequality holds.

```fsharp
let wagnerFischer (s: string) (t: string) =
    let m = s.Length
    let n = t.Length
    let d = Array2D.create (m + 1) (n + 1) 0

    for i = 0 to m do d.[i, 0] <- i
    for j = 0 to n do d.[0, j] <- j

    for j = 1 to n do
        for i = 1 to m do
            if s.[i-1] = t.[j-1] then
                d.[i, j] <- d.[i-1, j-1]
            else
                d.[i, j] <-
                    List.min
                        [
                            // a deletion
                            d.[i-1, j  ] + 1;
                            // an insertion
                            d.[i  , j-1] + 1;
                            // a substitution
                            d.[i-1, j-1] + 1;
                        ]
    d.[m, n]
```
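As a quick sanity check (assuming the `wagnerFischer` function above is in scope; the expected values are the standard results for this well-known example pair):

```fsharp
// "kitten" -> "sitten" -> "sittin" -> "sitting": three edits
printfn "%d" (wagnerFischer "kitten" "sitting")  // 3
// symmetry, another metric property: d(s, t) = d(t, s)
printfn "%d" (wagnerFischer "sitting" "kitten")  // 3
```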

Now compare this with an incorrect version of an extension called Damerau–Levenshtein distance (or restricted edit distance, also known as optimal string alignment). This change adds support for transpositions of adjacent characters, similar to those in Jaro–Winkler, to the original algorithm. However, in the process of adding just this minor tweak we lose the triangle inequality.

```fsharp
let damerauLevenshtein (s: string) (t: string) =
    let m = s.Length
    let n = t.Length
    let d = Array2D.create (m + 1) (n + 1) 0

    for i = 0 to m do d.[i, 0] <- i
    for j = 0 to n do d.[0, j] <- j

    for j = 1 to n do
        for i = 1 to m do
            // 1 if a substitution
            // 0 if no change
            let cost = if s.[i-1] = t.[j-1] then 0 else 1
            d.[i, j] <-
                List.min
                    [
                        // a deletion
                        d.[i-1, j  ] + 1;
                        // an insertion
                        d.[i  , j-1] + 1;
                        // a substitution or nothing
                        d.[i-1, j-1] + cost;
                    ]
            if // boundary check
               i > 1 && j > 1
               // transposition check
               && s.[i-1] = t.[j-2] && s.[i-2] = t.[j-1]
            then // the lesser of a transposition or current cost
                d.[i, j] <- min d.[i, j] (d.[i-2, j-2] + cost)
    d.[m, n]
```

It seems like such a simple and obvious addition to the algorithm. Just what is it about the way we’ve added transpositions that ruins the magic? We’ve just added something like wormholes to our little universe. That’s right, the simple addition of transpositions in this way implies a universe where some combinations of characters treat space differently than everything else. The easiest way to prove this is the case is to give the definition of the triangle inequality for metric spaces a read.

From Wikipedia’s Triangle Inequality article:

> In a metric space M with metric d, the triangle inequality is a requirement upon distance: d(x, z) <= d(x, y) + d(y, z) for all x, y, z in M. That is, the distance from x to z is at most as large as the sum of the distance from x to y and the distance from y to z.

From this, it’s easy to construct a counterexample for our broken Damerau–Levenshtein distance simply by exploiting the transpositions.
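Here is one such counterexample, a sketch using the classic trio "CA", "AC", and "ABC" (the `damerauLevenshtein` function is repeated in compact form so this snippet stands alone):

```fsharp
// compact copy of the restricted Damerau–Levenshtein from above
let damerauLevenshtein (s: string) (t: string) =
    let m = s.Length
    let n = t.Length
    let d = Array2D.create (m + 1) (n + 1) 0
    for i = 0 to m do d.[i, 0] <- i
    for j = 0 to n do d.[0, j] <- j
    for j = 1 to n do
        for i = 1 to m do
            let cost = if s.[i-1] = t.[j-1] then 0 else 1
            d.[i, j] <-
                List.min
                    [ d.[i-1, j] + 1        // a deletion
                      d.[i, j-1] + 1        // an insertion
                      d.[i-1, j-1] + cost ] // a substitution or nothing
            if i > 1 && j > 1 && s.[i-1] = t.[j-2] && s.[i-2] = t.[j-1] then
                d.[i, j] <- min d.[i, j] (d.[i-2, j-2] + cost)
    d.[m, n]

// going through "AC" is cheaper than going directly:
let viaTransposition = damerauLevenshtein "CA" "AC"   // 1, a single transposition
let viaInsertion     = damerauLevenshtein "AC" "ABC"  // 1, a single insertion
let direct           = damerauLevenshtein "CA" "ABC"  // 3

// d("CA", "ABC") > d("CA", "AC") + d("AC", "ABC"), that is 3 > 1 + 1,
// so the triangle inequality fails.
```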