Tag: Entity Resolution
-
How Machine Learning Products are Different (Part 2, Entity Resolution Checklists)
Last time I talked a bit about the context of my experience with Machine Learning products and the high-level issues we had getting customers to switch to what was clearly a better product. This time I get into some technical examples from these checklists and try to demonstrate the conflict. The first and most…
-
How Machine Learning Products are Different (Part 1, High Level)
First, some context. My direct experience over the last decade+ has been mostly B2B, selling enterprise-grade software to large corporates, and so my perspective is skewed this way. I have had many conversations with folks working in the small business and consumer spaces and know those can be very different worlds. When you sell…
-
Developing an Algorithm in F#: Fast Rotational Alignments with Gosper’s Hack
This post is for the 7th day of the 2014 F# Advent Calendar. It has been said that functional languages can’t be as fast as their imperative cousins because of all of the allocation and garbage collection. This is patently false (as far as F# is concerned, at least), largely because the CLR has value types.…
-
Bad Data is the Real Problem
Big data is the buzzword du jour, and why not? Companies like Google with huge server farms are doing amazing things leveraging huge amounts of data and processing power. It’s all very sexy, but these researchers get to pick and choose the data they work with. They can maximize their research gains by pushing the cutting edge…
-
Record Linkage Algorithms in F# – Jaro-Winkler Distance (Part 2)
Last time we dove into the Jaro distance algorithm and picked apart how each of its components is calculated. However, from a modern perspective, Jaro alone is a rather weak method of string matching. It was Winkler’s extension that brought this algorithm into widespread modern use. Matthew Jaro’s insight when inventing the Jaro distance algorithm was that…