Benedict Evans’s article “Ways to Think About Machine Learning” is so spot on that I need to quote it:
What, then, are the washing machines of machine learning, for real companies? I think there are two sets of tools for thinking about this. The first is to think in terms of a procession of types of data and types of question:
- Machine learning may well deliver better results for questions you’re already asking about data you already have, simply as an analytic or optimization technique. For example, our portfolio company Instacart built a system to optimize the routing of its personal shoppers through grocery stores that delivered a 50% improvement (this was built by just three engineers, using Google’s open-source tools Keras and Tensorflow).
- Machine learning lets you ask new questions of the data you already have. For example, a lawyer doing discovery might search for ‘angry’ emails, or ‘anxious’ or anomalous threads or clusters of documents, as well as doing keyword searches.
- Third, machine learning opens up new data types to analysis – computers could not really read audio, images or video before and now, increasingly, that will be possible.
Five years ago, if you gave a computer a pile of photos, it couldn’t do much more than sort them by size. A ten year old could sort them into men and women, a fifteen year old into cool and uncool and an intern could say ‘this one’s really interesting’. Today, with ML, the computer will match the ten year old and perhaps the fifteen year old. It might never get to the intern. But what would you do if you had a million fifteen year olds to look at your data? What calls would you listen to, what images would you look at, and what file transfers or credit card payments would you inspect?
Indeed, I think one could propose a whole list of unhelpful ways of talking about current developments in machine learning. For example:
- Data is the new oil
- Google and China (or Facebook, or Amazon, or BAT) have all the data
- AI will take all the jobs
- And, of course, saying AI itself.
More useful things to talk about, perhaps, might be:
- Enabling technology layers
- Relational databases.
[...] Google ‘has all the data’, or that IBM has an actual thing called ‘Watson’. Really, this is always the mistake in looking at automation: with each wave of automation, we imagine we’re creating something anthropomorphic or something with general intelligence.
By the 1990s, pretty much all enterprise software was a relational database – PeopleSoft and CRM and SuccessFactors and dozens more all ran on relational databases. No-one looked at SuccessFactors or Salesforce and said “that will never work because Oracle has all the database” – rather, this technology became an enabling layer that was part of everything.
So, this is a good grounding way to think about ML today – it’s a step change in what we can do with computers, and that will be part of many different products for many different companies. Eventually, pretty much everything will have ML somewhere inside and no-one will care.