15
Oct 12

What do you do with a drunken coder?

What do you do with a drunken coder,
What do you do with a drunken coder,
What do you do with a drunken coder,
Ear-ly in the morning?

Put ’em on the build and make ’em fix it,
Put ’em on the build and make ’em fix it,
Put ’em on the build and make ’em fix it,
Ear-ly in the morning.

Give ’em an intern and make ’em teach ’em,
Give ’em an intern and make ’em teach ’em,
Give ’em an intern and make ’em teach ’em,
Ear-ly in the morning.

Fill ’is cube with kitchen products,
Fill ’is cube with kitchen products,
Fill ’is cube with kitchen products,
Ear-ly in the morning.

Give ’em the legacy and make ’em test it,
Give ’em the legacy and make ’em test it,
Give ’em the legacy and make ’em test it,
Ear-ly in the morning.

(From @lojikil)
Put ’em in a meeting and make ’em sit there,
Put ’em in a meeting and make ’em sit there,
Put ’em in a meeting and make ’em sit there,
Ear-ly in the morning.

(also à la @lojikil)
Make ’em file a change request form,
Make ’em file a change request form,
Make ’em file a change request form,
Ear-ly in the morning.

(From @JobVranish)
Trap ’em in a monad till he’s sober,
Trap ’em in a monad till he’s sober,
Trap ’em in a monad till he’s sober,
Ear-ly in the morning.

Give ’em a lecture on corporate policy,
Give ’em a lecture on corporate policy,
Give ’em a lecture on corporate policy,
Ear-ly in the morning.

Fun times on the high Twitter seas, me mateys. Come on over and post your own: #WhatDoYouDoWithADrunkenCoder


04
Sep 12

Levenshtein Distance and the Triangle Inequality

Levenshtein distance is one of my favorite algorithms. On the surface it seems so very simple, but when you spend some time thinking hard about it, deep insights are waiting to be had.

The first and most important thing about Levenshtein distance is that it’s actually a metric distance. That is, it obeys the triangle inequality. For most other string distance measurements this property doesn’t hold.

The Vector Triangle Inequality

This might not seem like such a big deal, but this property gives the measurements meaning in a context larger than just the pair. For example, it allows you to embed your pair distance measurements into a higher dimensional space and so use it for things like clustering.

This was one of the first insights that Levenshtein distance gave me: A measurement doesn’t need to give you an absolute location in space to be useful for telling you where you are, it just has to tell you how far away everything else is. But what is it about Levenshtein distance that gives it this property? It’s not immediately obvious to most people, at least it wasn’t to me.

First, let’s consider a naive implementation of the Wagner–Fischer algorithm for Levenshtein distance. As stated above, here the triangle inequality holds.

let wagnerFischer (s: string) (t: string) =
    let m = s.Length
    let n = t.Length
    let d = Array2D.create (m + 1) (n + 1) 0

    for i = 0 to m do d.[i, 0] <- i
    for j = 0 to n do d.[0, j] <- j

    for j = 1 to n do
        for i = 1 to m do
            if s.[i-1] = t.[j-1] then
                d.[i, j] <- d.[i-1, j-1]
            else
                d.[i, j] <-
                    List.min
                        [
                            // a deletion
                            d.[i-1, j  ] + 1;
                            // an insertion
                            d.[i  , j-1] + 1;
                            // a substitution
                            d.[i-1, j-1] + 1;
                        ]
    d.[m, n]

Now compare this with an incorrect version of an extension called Damerau–Levenshtein distance (or restricted edit distance). This change adds support for Jaro–Winkler-style transpositions to the original algorithm. However, in the process of adding just this minor tweak we lose the triangle inequality.

let damerauLevenshtein (s: string) (t: string) =
    let m = s.Length
    let n = t.Length
    let d = Array2D.create (m + 1) (n + 1) 0

    for i = 0 to m do d.[i, 0] <- i
    for j = 0 to n do d.[0, j] <- j

    for j = 1 to n do
        for i = 1 to m do
            // 1 if a substitution
            // 0 if no change
            let cost = if s.[i-1] = t.[j-1] then 0 else 1
            d.[i, j] <-
                List.min
                    [
                        // a deletion
                        d.[i-1, j  ] + 1;
                        // an insertion
                        d.[i  , j-1] + 1;
                        // a substitution or nothing
                        d.[i-1, j-1] + cost;
                    ]
            if // boundary check
               i > 1 && j > 1
               // transposition check
            && s.[i-1] = t.[j-2] && s.[i-2] = t.[j-1]
            then // the lesser of a transposition or current cost
                d.[i, j] <- min d.[i,j] (d.[i-2, j-2] + cost)
    d.[m, n]


It seems like such a simple and obvious addition to the algorithm. Just what is it about the way we’ve added transpositions that ruins the magic? We’ve added something like wormholes to our little universe. That’s right: the simple addition of transpositions in this way implies a universe where some combinations of characters treat space differently than everything else. The easiest way to prove this is the case is to give the definition of the triangle inequality for metric spaces a read.

From Wikipedia’s Triangle Inequality article:
In a metric space M with metric d, the triangle inequality is a requirement upon distance: d(x, z) <= d(x, y) + d(y, z)
for all x, y, z in M. That is, the distance from x to z is at most as large as the sum of the distance from x to y and the distance from y to z.

From this, it’s easy to construct a counterexample for our broken Damerau-Levenshtein distance simply by exploiting the transpositions.

Damerau-Levenshtein distance not satisfying the triangle inequality.

As you can see in this picture, 4 is most certainly greater than 1 + 2, and so the triangle inequality is broken. Consider also the pathology this example exposes in the algorithm: why didn’t it just take the irkc → rick → rcik path when that’s obviously less expensive?

Levenshtein distance satisfying the triangle inequality.

For comparison, if we measure those same pairs with standard Levenshtein distance everything is just peachy.
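As a quick check, and assuming the two functions defined above are in scope, the distances from the figures can be computed directly (the string triple here is the one from the pictures):

```fsharp
// Checking the figures' numbers with the two functions defined above.
let a, b, c = "irkc", "rick", "rcik"

// Restricted Damerau–Levenshtein: the direct distance is 4, but the
// two-hop path through "rick" costs only 2 + 1 = 3, so the triangle
// inequality fails.
printfn "%b" (damerauLevenshtein a c > damerauLevenshtein a b + damerauLevenshtein b c)  // true

// Standard Levenshtein: the direct distance never exceeds the detour.
printfn "%b" (wagnerFischer a c <= wagnerFischer a b + wagnerFischer b c)  // true
```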

So now that we know of at least one case which causes the triangle inequality to fail, does this tell us what causes it to succeed? I think yes, at least in a limited sense. With Levenshtein distance any given pair of characters is considered independently: changes at each position happen exactly once and are exactly one character in size. As each pair in the strings is considered, small changes push them further and further apart, but in discrete and equal units for discrete and equal changes. Our Damerau–Levenshtein implementation, by contrast, greedily performs operations of a larger size and then never revisits their implications. Standard Levenshtein is reversibly symmetric both in how it treats locations over the string and in how it treats the characters themselves, due to its uniform granularity. That uniform granularity ensures all important paths are explored.

Can transpositions be reliably counted with a different approach? We’ll find out the answer to this question next time.


31
Aug 12

What is good API design?

Some say that API design is one of the hardest things in programming. A few even go as far as to say you should have at least 10 years of experience to even attempt it. While I think this process can be sped up almost an order of magnitude by good mentorship, at one time or another we’ve all suffered under the API of an inexperienced programmer. Though, this does raise the question: what exactly is it about building libraries that can take up to 10 years to learn?

I was lucky in that I got a strict API education early on. Right out of college I joined Atalasoft, a company for which the API was the product and so was under the strictest of scrutiny. My mentor was Steve Hawley, a man who has spent much of his life solving difficult problems and wrapping them up in nice little packages. Steve had little patience for babysitting as he always had a lot on his plate and so under him I was forced to learn very quickly.

His philosophy, which was never explicitly stated, I call 90-9-.9. For 90% of the users you want the problem solved out of the box with just a couple of lines of code that can be cut and pasted. Here defaults matter the most. For the next 9% you’re aiming for simple configuration; something that can be easily figured out from the documentation or resolved in just a few minutes by the support team. Then there’s the .9% who will want to bend your libraries in all kinds of twisted ways, sometimes for performance and other times for some wacky (but workable) use case you never thought of. It’s completely fine to sacrifice the experience of the .9% for the sake of everyone else; just make sure it’s possible for them to get what they want done and that your documentation will show them the way.

Finally, there’s the unmentioned .1% who you’ll never make happy because they’re mistaken about the capabilities of your product. Better to either ignore them, or do market research to see if they’re worth the cost of extending your library to pull them in.

A great example of this is Atalasoft’s barcode product. A lot of effort went into carefully tuning it to preprocess most scanned documents without issue. After preprocessing, it will by default go whole hog and try every possible barcode type you have a license for. This is still quite fast, fast enough for anyone with a small-time scanning operation. For folks doing large-scale batch scanning on expensive equipment it’s sometimes not fast enough, though, so they can configure which barcode readers are used by changing a simple enumeration property. Once in a while customers want something a bit more difficult, for example trying to scan a barcode wrapped around a banana. For this there are events that let you interrupt, tweak, and replace whole chunks of the barcode engine. But the guy who wants to read the barcodes he hand-shaved into the side of the dogs in his pet store? Sorry pal, you’re better off finding another product.

When I first saw this it seemed like bad design. The whole component is like a frickin’ monolithic program with an event based do-it-yourself plugin system! You see though, aesthetic beauty as judged by an architecture astronaut isn’t what Atalasoft is optimizing for. They’re optimizing for reduction of the customer support burden. As much as I dislike object oriented programming for writing the internals of libraries like these, I think there’s no better paradigm for exposing a single simple interface that allows for manipulation at all of these levels.

Now, for the past two years I’ve been in charge of the APIs at Bayard Rock, a completely different kind of company. We do research and development primarily for anti-money laundering. This means lots of little experiments and the occasional medium-scale project which will later be integrated into our sister company’s larger infrastructure. In the vast majority of cases Atalasoft-style monolithic black-boxing wouldn’t be helpful at all. We only have one customer and we work with them to tailor our external APIs directly to their needs.

However, code reuse at a fine grained level is much more important at Bayard Rock than it was at Atalasoft. In this context what matters most is the construction of large libraries full of many small categorized functions which we can first use to put together experiments quickly (that is, through simple composition and without comprehensive unit tests) but later still feel confident about shipping in our product. We’re optimizing for experimentation and the rapid development of components that we trust enough to ship. It should come as no surprise that here typed functional programming wins hands down.

So, what is good API design? It depends, and that’s why it’s so hard.


19
Jul 12

Functional Programming is Dead, Long Live Expression-Oriented Programming

It’s funny how over time the meaning of a technical word will converge to something halfway between what the experts intended and some fuzzy notion consisting of the most easily graspable components of that idea. In this inevitable process an idea is stripped of all of its flavor and is reduced to a set of bullet points graspable in an hour long presentation. Over the last few years this has happened to functional programming, right along with its popularization.

From Wikipedia:

  • First-class and higher-order functions
  • Pure functions
  • Recursion

Now that almost every language has tacked-on “functional features”, the functional party is over. The term has become just as perverted as Object-Oriented is to its original idea. It seems as though these days all it takes is lambda expressions and a higher order functions library to claim your language supports functional programming. Most of these languages don’t even bother to include any kind of proper support for simple tail recursion, much less efficient co-recursion or function composition. Oh, and any kind of inclination toward even encouraging purity? You wish.

But this isn’t necessarily a bad thing. The term functional isn’t at all evocative of the actual properties that make functional languages so wonderful. The term we should have been using all along is Expression-Oriented Programming. It’s the composition of expressions, the building of programs by sticking together little modular pieces, that makes functional languages great and first class functions are just a small part of enabling that. Expression-Oriented Programming tends towards first classing everything.

However, even the term first class is too weak to pin down this concept. All first class means is “as good as most other things” and this can still imply a really awful lowest common denominator. Just take a look at Microsoft’s C#. Sure, functions are first class, but it’s still a pathetic attempt at emulating what is possible in functional programming languages because the rest of the language isn’t.

Let’s end with a simple example to drive home the point. In C#, because the switch statement doesn’t produce an expression, you can’t assign its result to a variable. You even get an error that tells you so.

However, F# does a much better job of supporting Expression-Oriented Programming as almost every language construct outside of type definitions is itself an expression.
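A minimal sketch of that difference (the function names here are invented for illustration):

```fsharp
// In F#, match is an expression: its result can be bound directly.
let describe n =
    match n with
    | 0            -> "zero"
    | n when n < 0 -> "negative"
    | _            -> "positive"

// if/then/else is an expression too, which is why F# needs no ternary operator.
let parity n = if n % 2 = 0 then "even" else "odd"

printfn "%s" (describe (-3))  // negative
printfn "%s" (parity 4)       // even
```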

Expression-Oriented Programming is a simple idea, just programming with little composable parts, but it leads to beautiful and expressive code. It is at the core of why programs in functional languages are small, simple, and less error prone. We should give credit where the credit is due: not just to functions, which are but one small player in the Expression-Oriented story.


16
Jul 12

The Fresh Prince of Bell Labs

Inspired by:

In Petoskey, Michigan born and raised
Tinkering with gadgets I spent most of my days

Chillin out, maxin, relaxing with Boole,
And working on my master’s outside of the school

When a couple of Nazis who were up to no good,
Started making trouble in my neighborhood

I got one Alfred Nobel and Uncle Sam had confabs
said “Go out and help your country at Bell Labs”


21
Jun 12

Program Configuration Space Complexity

Taking the Coursera Probabilistic Graphical Models course has had me thinking a lot about complexity. Lately, program state complexity in the context of testing in particular.

How many discrete tests would it take to ensure every combination of inputs is correct? Let’s do some back-of-the-napkin math and see what we come up with. Please excuse my amateurish LaTeX.

First, let’s start with a thought experiment. Imagine a program (in any language) in which all of the configuration options are expressed as enumerations. Now imagine that each configuration option has some impact on the code of every other.

\prod_{i=1}^{k} |C_i|

where C_1 through C_k are the configuration options and |C_i| is the number of values option i can take.

We’ll call this our worst case testing complexity as it completely ignores how everything is wired up. This effectively treats the entire program as a giant black box.

The terrifying thing to consider is that the addition of a simple Boolean configuration option can double your testing complexity. Take a moment for that to sink in: just adding a flag to a console application potentially doubles the test space.

This previous description only applies to a program where every configuration option is global. What about nice well separated classes?


If we consider each class as its own little program, a non-static class depends only on the cardinality of its internal state space times the cardinality of each of its methods’ input spaces:

\sum_{k \in classes} \left( |S_k| \times \sum_{m \in methods(k)} |I_m| \right)

where S_k is the internal state space of class k and I_m is the input space of method m.

For comparison, how might pure functional programs look? With no mutable state, every |S_k| is 1, so:

\sum_{f \in functions} |I_f|


Interesting that the testing complexity of a pure functional program is equivalent to that of object oriented programming done only with immutable classes. Each only has a single state so the whole product over options term drops away and we are left summing over all of the methods in our program. Now how many object oriented languages give nice semantics for immutability? That’s another question entirely.

Admittedly, it seems as though those very states, when taken out, could potentially turn into a roughly equal number of new classes, thus giving no benefit in terms of test count. However, in my experience this is not the case. Will a given class state really change the behavior of every single method? Consider also that if those states can change mid-computation, things become much more complex.

We seem to be stuck with the cardinality of our method or function inputs as the primary driver of testing complexity. Essentially this puts us back at our original thought experiment, just in manageable chunks that are useful for pinpoint debugging. We’ll never be able to test every state. The best you can do is use a random testing tool like QuickCheck.
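In F# that role is played by FsCheck, a QuickCheck port; a small sketch, assuming the FsCheck package is referenced:

```fsharp
open FsCheck

// FsCheck samples random inputs from the (often enormous) input space
// and reports a shrunken counterexample when a property fails.

// A property that holds for all lists:
Check.Quick (fun (xs: int list) -> List.rev (List.rev xs) = xs)

// A property that does NOT hold; FsCheck will find and shrink a
// counterexample such as [0; 1]:
Check.Quick (fun (xs: int list) -> List.rev xs = xs)
```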

As for languages which do not restrict inputs by type at all, they explode the testing space by forcing consideration of how every construct might interact with a given function. This could be seen as a new very large term in both the object-oriented and functional models above.

Finally, this makes me lament the lack of any kind of range value restriction in popular typed languages. Once we’ve done away with statefulness it’s the most glaringly obvious area where we can reduce complexity. The good news is we might start seeing some help from academia along these lines within the next few years.


11
Jun 12

NYC Progressive F# Tutorials 2012 in Retrospect

It was the best of times, it was the… Actually, it was all just fantastic.

On the beginner track Chris Marinos, Phil Trelford, and Tomas Petricek worked tirelessly in teaching F#. I was thoroughly impressed with the quality of their tutorials and would recommend seeking any of them out if you wish to become more proficient in F#. By the time we got to the contest at the end almost everyone was ready to compete with just a little help getting started.

On the advanced track attendees made type providers with Keith Battocchi, learned to use Adam Mlocek’s fantastic numerical computing library FCore (you can’t get any closer to Matlab syntax), and got some hands on time with the Microsoft Cloud Numerics team. Paulmichael Blasucci ran the contest on that side and noted that few had trouble and that the competition was just brutal.

On both sides the rooms were almost always packed and I didn’t hear a single complaint about the quality of the tutorials or speakers. Everyone was just thrilled to be there and stayed focused on learning the entire time. I’ve seen few conferences that kept the attendees so enthralled. Kudos to everyone involved for making such a great event happen.

Now, on to the links and materials (they’ll be updated as they come in).

Event Photos are up on Facebook!

Beginner Track:
- Chris Marinos’s F# Koans
- Phillip Trelford’s Writeup on Day 1 and Day 2

Advanced Track:
- Keith Battocchi’s Type Provider Tutorial Code and Slides/Docs
- Adam Mlocek and Tomas Petricek’s FCore Slides and Code
- Roope Astala’s Cloud Numerics Tutorial Slides and Tutorial Examples

Both Tracks:
I’m planning a Silverlight release of the contest code with the contestant sample AI in an upcoming blog post. For now, you can find the code in its github repo.

Social Media:
Chris Marinos’s Blog and Twitter
Phil Trelford’s Blog and Twitter
Tomas Petricek’s Blog and Twitter
Don Syme’s Blog and Twitter
The Cloud Numerics Blog
SkillsMatter on Twitter

Going Further:
F# Training in London and NY
FCore Library


15
Apr 12

What Microsoft MVP means to me

It wasn’t long after college that I found myself blogging regularly about the technology I was using. I have pretty good writing skills and am damn good with code, so before long I was easily breaking 10K hits per post. Having a platform to share my ideas and knowledge was exhilarating and fun, but it didn’t mean much career-wise. I wasn’t particularly passionate about C# or the CLR and would have been just as happy blogging about Java.

But after about two years out of college everything changed. I went to a talk by Rich Hickey on Clojure. Rich walked us through a completely functional and massively parallel ant colony simulation which repeatedly blew my mind. Four hours after walking into that talk I came out a different person. I knew then that I had been going about this whole programming business wrong. I knew then that everything was much harder than necessary and I was wasting huge amounts of time grooming my object oriented garden.

Now, I worked for a .NET shop, so Clojure at work was out of the question, but I had seen posts around about a new language from Microsoft Research. It was also a functional language, but one with different properties, properties that could be used to get stronger guarantees on code correctness.

Over the next week I feverishly built my own ant colony simulation in F#. While I struggled with the new type system at first, I found the ML syntax to be a joy to read after the fact. The code was also remarkably robust, it was a simple matter to inject and swap between many different thread communication models. I soon became convinced that I had found something even better than Clojure. A passion grew inside me like I had never felt before. Other people had to know that there was a better way.

Soon I found myself giving talks at every Code Camp and user group meeting that would have me. Most viewed my enthusiasm skeptically, but I was pushed on by the open-minded few who watched my presentations with enthralled attention. Of course, some meetings were better than others. After one particularly great night at a meeting in central Connecticut, I had a line of about ten people who were just dying to know more.

I also worked with Michael de la Maza and Talbot Crowell to start the first F# User Group. Getting speakers locally was a challenge, so in most cases we resorted to having people speak over Live Meeting. I worked on this as much for myself as to help spread the word about F#. It was fantastic to hear from others using the language and to learn about things I had never even considered before. Even after moving on to NYC, I still reminisce about the early days, and I’m still very proud of our loyal group members.

Now all of this, from learning F# to starting the group, had taken only about a year. What followed was an even more overwhelming whirlwind of life-changing events.

I’m not sure how it came about, but someone had noticed my passion. I was given a free trip to Microsoft PDC where I had dinner with the Microsoft language teams. Chris Smith carted me around and introduced me to everyone (he’s a very popular fellow). Conversation after conversation was loaded with interesting ideas and fresh perspectives. I had one of the best times of my life.

Then came the MVP, shortly followed by Professional F# 2.0, then the MVP Summit where I was able to spend a day with the F# team right in their offices! To spend so much time talking with the people I admired so much for building this wonderful language was a dream come true. And still, it continues with more conferences and events, meeting very smart people and hearing fantastic new ideas. The whirlwind hasn’t stopped, even three years later, and it’s been a fantastic ride.

I wouldn’t give it up for anything in the world.


10
Apr 12

F# Event Madness, Spring 2012 Edition

Upcoming Speaking Engagements:

Great Lakes Functional Programming Conference — May 5, Ann Arbor, MI

I’m very excited to be giving the Keynote at the first Great Lakes Functional Programming Conference, I’d suggest signing up but it’s already sold out!

Progressive F# Tutorials NYC - June 5/6, 2012 NYC

I’ve spent a ton of time over the last few months helping put together the first ever F# conference in the USA. Many of the most well known speakers from the F# community will be there giving hands on tutorials. There’s also both a beginner and advanced track so no matter your skill level you will certainly learn something.

CodeStock - June 15/16, 2012 Knoxville, TN

From what I hear, CodeStock is just about as much fun as you can legally have at a programmer conference. I’m proud to be giving one of four F# talks this year.

Recordings of Recent Events:

Why F#? With Richard Minerich and Phillip Trelford

Scott was kind enough to give us a shot at the “why should I use F# again?” question on his podcast. I hope Phil and I were able to convince at least a few folks to play with it a bit.

Barb: How I built a simple dynamic programming language in F#

I finally felt Barb was ready for show and tell at the NYC F# User Group. I would be grateful for any feedback you might have.


21
Jan 12

Musicians, Mechanics, and Mathematicians

Thank you all for your comments on my previous post, I appreciate the time you all took in sharing your perspectives very much.  Many of you have brought up great analogies to demonstrate how you feel and in reading these responses I realized I must not have been very clear.

There are some musical geniuses who have composed great works without having been taught even the basics of music theory. However, this doesn’t mean they’re not doing math. The human brain excels at building approximate mathematical models, and a rare few minds are capable of building exceedingly complex ones. Still, formal knowledge of the patterns of music allows a musician both to play the same song in new and interesting ways and to see the underlying connections between different pieces. As a composer it informs how rhythms and melody can be juxtaposed or fitted together to create a desired effect, in a way much more profound than trial and error. It expands the musician’s mind and makes them better at what they do.

Another great example is that of the wrench wielding mechanic. There are a great many mechanics who went to trade school and learned the high level basics of engines and a long list of procedures. They might not understand the details of combustion or material science but they can replace brake pads or swap in a new timing belt without too much difficulty. After many years of experience some may have even built a mental model so superb that they can take you for a spin around the block and tell you exactly what’s wrong.

And still, as the mechanic reaches for their bolts and wrench they might not think of the underlying mathematics of what they are about to perform. Yet you can be sure the people who made those tools worried over it greatly. If they hadn’t, the wrench would not fit the head or, worse, the bolt might shear under the stress applied. While they surely tested many bolts before shipping their product, they certainly didn’t do so before first creating a formal model of how the tools were shaped and how they would perform. Even if tools made without such a model happened to work, they probably wouldn’t work very well, and to sell them would be grossly negligent.

Yet I can’t be the only one who has suffered many near catastrophes at the hands of inept mechanics over the years, from the time an air bubble left in my brake line after a brake change made my car roll out into traffic, to the punctured gas tank that almost left me stranded at the side of the road. One might wonder if they even bothered testing their work.

Some might think programmers shouldn’t be beholden to the same strictness, as our creations aren’t usually capable of quite so much damage. Instead, the worst things most are liable to do are destroying important information, providing incorrect data to critical systems, leaking private information, sharing unsalted (or worse, unencrypted) passwords, or causing people to become unable to access their bank or medical records. No big deal really, just shoveling data.

I’d love to see every programmer taking the time to learn deeply about the mathematical modeling of data and programs, but I know that’s not reasonable. However, it takes just a little bit of learning to leverage tools with very complex underlying mathematics made by others. You don’t need to be an expert in category theory to use Haskell any more than you need to be an expert in set theory to use SQL. F# and Scala are even more accessible as they have access to all of the libraries and patterns you would be familiar with as a programmer who works in .NET or Java.

So, I’m not asking that you go out and spend years studying your way to the equivalent of a PhD. Instead what I ask is that you just take a little time to understand what’s possible and then use tools made by people who do have that kind of deep understanding.

I know I wouldn’t want to drive a car with parts made by someone who didn’t use models, would you?

Huge thanks to @danfinch and @TheColonial for proofreading.