The Promise of F# Language Type Providers

In most software domains you can safely stick with one or two languages and, because the tools you are using are fairly easy to replicate, you’ll find almost anything you might need to finish your project. This isn’t true in data science and data engineering, however. Whether it’s a hyper-optimized data structure or a cutting-edge machine learning technique, you often have only a single language or platform to choose from.

Even worse, when you want to build a system that uses one or more platform-specific components, things can become quite an engineering mess. No matter what you do, you can’t avoid the high cost of serialization and marshaling. This makes some combinations of tools non-options for some problems. You often make trade-offs that you shouldn’t need to make, for example using a worse algorithm just because the better option hasn’t been written for your platform.

In .NET this is a particularly bad problem. There are quite a few dedicated people working on open source libraries, but they are tiny in number compared to the Matlab, Python, R or Java communities. Meanwhile, Microsoft Research has several fantastic libraries with overly restrictive licenses that make them impossible to use commercially. These libraries drive away academic competition, but at the same time can’t be used outside of academia. It’s a horrible situation.

Thankfully, there is a silver lining in this dark cloud. With the release of F# 3.0 in VS 2012 we were given a new language feature called Type Providers. Type Providers are compiler plugins that generate types at compile time and can run arbitrary code to do it. Initially, these were designed for accessing databases and getting types from the schema for free, but when Howard Mansell released the R Language Type Provider everything changed. We realized that we had a way to build slick typed APIs on top of almost any other language.

This means that now it doesn’t matter if someone has written the algorithm or data structure for our platform as long as there’s a Type Provider for a platform where it has been done. The tedious work of building lots of little wrapped sub-programs is completely gone. It shouldn’t even matter if the kind of calculation you’d like to do is fast on your native platform, as you can just transparently push it to another. Of course, we still must pay the price of marshaling, but we can do it in a controlled way by dealing in handles to the other platform’s variables.
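To make the "dealing in handles" idea concrete, here is a minimal sketch in plain F#. It does not use any real Type Provider: `MockPlatform`, `Handle`, and every member name below are hypothetical stand-ins invented for illustration, and the "other platform" is just an in-memory dictionary. The point is only that data crosses the boundary when you explicitly send or fetch it, while intermediate results stay on the other side.

```fsharp
open System.Collections.Generic

/// An opaque reference to a value that lives on the "other" platform.
type Handle = Handle of int

/// Hypothetical stand-in for a foreign runtime (R, Matlab, ...).
/// Values stay inside `store`; calling code only ever sees handles.
type MockPlatform() =
    let store = Dictionary<int, float[]>()
    let mutable nextId = 0
    let mutable marshaled = 0   // counts elements copied across the boundary

    // Store a value on the platform side and hand back a reference to it.
    let put (xs: float[]) =
        let id = nextId
        nextId <- nextId + 1
        store.[id] <- xs
        Handle id

    /// Copy data from F# into the platform (marshaling cost is paid here).
    member this.Send(xs: float[]) =
        marshaled <- marshaled + xs.Length
        put (Array.copy xs)

    /// Compute entirely on the platform side; nothing is counted as marshaled.
    member this.Scale(factor: float, h: Handle) =
        let (Handle id) = h
        put (Array.map (fun x -> x * factor) store.[id])

    /// Another platform-side computation; the result is again only a handle.
    member this.Sum(h: Handle) =
        let (Handle id) = h
        put [| Array.sum store.[id] |]

    /// Copy a result back into F# (marshaling cost is paid again).
    member this.Fetch(h: Handle) =
        let (Handle id) = h
        let xs = store.[id]
        marshaled <- marshaled + xs.Length
        Array.copy xs

    member this.ElementsMarshaled = marshaled

let platform = MockPlatform()
let data = platform.Send([| 1.0; 2.0; 3.0 |])   // three elements cross over
let scaled = platform.Scale(10.0, data)         // stays on the other side
let total = platform.Sum(scaled)                // stays on the other side
let result = platform.Fetch(total)              // one element comes back

printfn "%A (elements marshaled: %d)" result platform.ElementsMarshaled
```

In this toy version only four elements are ever copied, no matter how many intermediate steps run in between. A real language Type Provider would add generated, typed members on top of this pattern, but that part is not shown here.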

The language Type Providers themselves are a bit immature for the moment, but the idea is sound and the list is growing. There are now the beginnings of an IKVM Type Provider (for Java) and I’m working on a Matlab Type Provider. The Matlab Provider doesn’t yet have all of the functionality I am aiming for, but I’ve been working on it for several months and it’s quite usable now. All that’s left is for someone to start in on a Python Type Provider and we’ll practically have all of the data science bases covered.

It’s an exciting time to be an F#’er.


10 comments

  1. Nice article, looking forward to more on applications/limitations of type providers.

    Possible typo:

    “Of course, we still must pay the price of marshaling, but we can do it in a controlled way by dealing in handles the other platform’s variables.”

  2. Rachel Reese and I were chatting about Type Providers at CodeStock. I want one that takes advantage of conventions in my database. Every table in my schema has a “base class” if you will that contains an Active bit. Some tables also have an EffectiveDate and a TerminationDate to indicate that rows are in effect at the time of the query. I envision a type provider that knows how to use these bits of data to refine the generated queries. In your experience, how hard would it be for me to mod the SQL type provider? I’d like to be able to add computation expressions to the provider to make the rule set generic.

    • This sounds interesting, and totally possible, but modifying the SQL provider isn’t going to be easy. The one downside to Type Providers in general is that they’re not extensible. Also, the SQL providers are particularly complex because they implement enhanced computation expressions to facilitate the LINQ syntax. If you feel like poking around, the source is on the F# GitHub repo: https://github.com/fsharp/fsharp/tree/master/src/fsharp/FSharp.Data.TypeProviders

      • Thanks, I’ll add that to the list of things I’d love to work on. The idea is simple but powerful: write an expression that handles active record strategy for your data source, add it to the standard provider and go. My clients use a multi-stage data retention policy that typically involves keeping some data in the warehouse for a while after it has expired. Being able to filter out these inactive records automatically with a rule would be quite helpful.

  3. Hi,

    Great to see the enthusiasm behind the article. I like your broad brush analysis of the horror we face too often.

    Thanks.

    I also liked your mention of Infer.NET. I was working on some closely related systems a couple of decades ago. Very powerful, especially when you look beyond probability distributions and look at more powerful representations of the computation! You’re right it has enormous potential in F#. For my money I can envisage something that’s “within” F# in a deeper way. Thanks again.

    Maybe another typo, was “Initially, these were designed for accessing databases” intended?

  4. Very cool indeed. I saw your talk at QCon and I can see how leveraging all these different platforms would make life better. Do you know of any good resources for learning how to create type providers?

  5. [...] Richard Minerich posted “The Promise of F# Language Type Providers“. [...]
