A Q & A with Pedro Domingos: Author of ‘The Master Algorithm’
Pedro Domingos, University of Washington professor of computer science and engineering, is the author of “The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World.”
A popular science romp through one of today’s hottest scientific topics, the book is an essential primer on machine learning. It unveils the deep ideas behind the algorithms that increasingly pick our books, find our dates, filter email, manage investments and run our lives — and what informed consumers and citizens ought to know about them.
Domingos, who will speak at Seattle’s Town Hall at 7:30 p.m. on Sept. 22, answered a few questions about the book.
What is machine learning, and how might a person encounter it in a typical day?
PD: Machine learning is the automation of discovery — computers learning by themselves by generalizing from data instead of having to be programmed by us. It’s like the scientific method on steroids: formulate hypotheses, test them against the data, refine them — except computers can do it millions of times faster than humans.
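The "formulate, test, refine" loop Domingos describes can be made concrete with a toy sketch (mine, not from the book): the program enumerates candidate hypotheses, here simple threshold rules, and keeps whichever best explains the labeled data.

```python
# Toy illustration of learning as automated hypothesis search:
# formulate a hypothesis, test it against the data, keep the best.

def learn_threshold(examples):
    """examples: list of (value, label) pairs, label being True/False.
    Returns the threshold t whose rule "True iff value >= t" fits best."""
    best_t, best_correct = None, -1
    for t in sorted({v for v, _ in examples}):   # formulate a hypothesis
        correct = sum((v >= t) == label          # test it against the data
                      for v, label in examples)
        if correct > best_correct:               # refine: keep the best so far
            best_t, best_correct = t, correct
    return best_t

data = [(1, False), (2, False), (5, True), (7, True), (9, True)]
print(learn_threshold(data))  # → 5, the boundary separating the labels
```

Real learning algorithms search vastly richer hypothesis spaces, but the loop is the same, run millions of times faster than a human scientist could.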
Google uses machine learning to decide which Web pages to show you, Amazon and Netflix to recommend books and movies, Twitter and Facebook to select posts for your feed. Siri uses learning algorithms to understand what you say and predict what you want to do. Spam filters use it as well. Retailers use it to decide which goods to stock and how to lay out their stores. If you receive a credit card offer, chances are a learning algorithm picked you. At many companies, when you apply for a job, a learning algorithm screens your resume. Online dating sites use machine learning to match their users — there are children alive today who wouldn’t have been born if not for machine learning. In other words, machine learning is involved in pretty much everything we do these days.
Why is it important for someone who isn’t a computer scientist to understand principles of machine learning?
PD: Learning algorithms make a lot of decisions on your behalf every day. As we just saw, they can determine not just what goods you buy but also whether you’ll get a job or even who your lifetime companion will be. If these algorithms are a black box to you, you have no control over where they will take you. Think of a car as an analogy: only engineers and mechanics need to understand how the engine works, but you need to know how to drive it. In the future cars will drive themselves, but you’ll have to know how to drive learning algorithms — and right now you probably don’t even know where the steering wheel or the pedals are.
Your book talks about what different “tribes” in machine learning research might contribute to curing cancer, and what their approaches lack. Why focus on that question?
PD: Curing cancer is one of the most important problems in the world — perhaps the most important problem — and machine learning has a big part to play in solving it. What makes cancer hard is that it’s not one disease, but many. Every patient’s cancer is different, and it mutates as it grows, so there’s no one-size-fits-all solution. The cure for cancer is a learning program that predicts which drug to use for which cancer by looking at the tumor’s genome, the patient’s genome and medical history, etc. But none of the current approaches to machine learning is able to solve the problem all by itself, so it’s a great illustration of both what each approach brings to the table and what it’s missing.
What is the difference between the algorithms that Netflix and Amazon use to recommend products you might like? Why is it important for consumers to be aware of these differences?
PD: Like every company, Netflix and Amazon each use the algorithms that best serve their purposes. Netflix loses money on blockbusters, so its recommendation system steers you toward obscure British TV shows from the ’70s, which cost it virtually nothing. The machine-learning smarts are all in picking shows you’ll actually like even though you’ve never heard of them. Amazon, on the other hand, has no particular interest in recommending rare products that sell only in small quantities; selling larger quantities of fewer products actually simplifies its logistics. So its recommendation system leans more on how popular each product is among buyers of the products you’ve bought before. The problem, if you don’t know any of this, is that you wind up doing what the companies want you to do instead of what you want to do.
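The contrast can be caricatured in a few lines. This is a hypothetical sketch of the two incentives described above, not either company's actual code: one scoring rule down-weights popular items to push the long tail, the other rewards popularity.

```python
# Two hypothetical recommender scoring rules reflecting different incentives.
# "affinity" is how much the model thinks you'd like the item (0 to 1);
# "popularity" is how many other people consume it.

def long_tail_score(affinity, popularity):
    # Reward items you'd like that few others watch (the Netflix incentive).
    return affinity / (1 + popularity)

def popularity_score(affinity, popularity):
    # Reward items you'd like that also move in volume (the Amazon incentive).
    return affinity * popularity

obscure = (0.9, 10)      # you'd love it; almost nobody watches it
blockbuster = (0.8, 1000)  # you'd like it; everybody watches it

print(long_tail_score(*obscure) > long_tail_score(*blockbuster))    # → True
print(popularity_score(*blockbuster) > popularity_score(*obscure))  # → True
```

The same viewer, with the same tastes, gets opposite recommendations depending on whose objective the algorithm is optimizing.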
If you know — even just roughly — how the learning algorithms work, you can make them work for you by deliberately teaching them, by choosing the companies whose machine learning agrees best with you and by demanding that the learning algorithms let you explicitly say things like “This is what I want, not that,” and “Here’s where you went wrong.”
How did Obama’s chief scientist — who was a machine learning expert — use four simple questions to help win the 2012 election?
PD: Rayid Ghani and his team of data scientists used machine learning to predict the answers to four questions for each individual swing voter, using all the data about them they could get their hands on. The questions were: How likely is he to support Obama? To show up at the polls? To respond to the campaign’s reminders to do so? And to change his mind about the election based on a conversation about a specific issue? Then, every night, they ran a program called “the Optimizer” to choose which voters to target the following day based on the results of the machine learning. In contrast, Mitt Romney’s campaign used standard polling and targeted broad demographic categories like “suburban middle-aged woman.” The result? Even though the race was close, Obama carried all the swing states but one and won the election.
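The targeting logic can be sketched roughly as follows. The campaign's actual models and "Optimizer" are not public, so this is only a plausible illustration: score each voter on the four questions, then contact those for whom a contact has the highest expected payoff.

```python
# Hedged sketch of four-question voter targeting (illustrative, not the
# campaign's real method). Each voter gets four predicted probabilities:
# (support, turnout, respond-to-contact, persuadable-by-contact).

def contact_value(p_support, p_turnout, p_respond, p_persuade):
    """Rough expected gain from contacting one voter: reachable,
    persuadable voters who aren't already safe supporters."""
    return p_respond * p_persuade * (1 - p_support * p_turnout)

voters = {
    "voter_a": (0.9, 0.9, 0.5, 0.1),  # safe supporter: little to gain
    "voter_b": (0.5, 0.6, 0.8, 0.6),  # undecided and reachable: high value
}
targets = sorted(voters, key=lambda v: contact_value(*voters[v]), reverse=True)
print(targets)  # → ['voter_b', 'voter_a']
```

Rerun nightly on fresh predictions, a ranking like this tells the field operation exactly whose door to knock on the next day.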
How is a machine learning expert more like a farmer than a factory worker?
PD: Factory-made goods have to be assembled piece by piece, step by step, all the way from the raw materials. In contrast, crops grow on their own, with a bit of help from the farmer. Traditional computer programs are like factory-made goods; software engineers write them line by line, which is an incredibly time-consuming and error-prone process. In contrast, a machine learning expert grows programs from data in the same way that a farmer grows crops from nutrients. In our case, the seeds are learning algorithms, and big data makes the soil incredibly fertile.
What is the relationship between machine learning and artificial intelligence?
PD: The goal of artificial intelligence is to get computers to do things that in the past required human intelligence. One of those things — perhaps the hallmark of human intelligence — is the ability to learn from experience. So machine learning is a subfield of artificial intelligence, but these days it’s so successful that it’s outgrown its proud parent and has become a stand-alone field, often known by other names like data science and predictive analytics.
Lots of plot lines have been built around sentient computers that go awry or take over the world or do harm. Is this something to worry about, or are there other potential dangers?
PD: “The Terminator” scenario of an evil AI deciding to take over the world and exterminate humanity is not really something to take seriously. It’s based on confusing being intelligent with being human, when in fact the two are very different things. The robots in the movies are always humans in disguise, but real robots aren’t. Computers could be infinitely intelligent and not pose any danger to us, provided we set the goals and all they do is figure out how to achieve them — like curing cancer.
On the other hand, computers can easily make serious mistakes by not understanding what we asked them to do or by not knowing enough about the real world, like the proverbial sorcerer’s apprentice. The cure for that is to make them more intelligent. People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world.
What has machine learning enabled university scientists and researchers to do that wouldn’t have been possible before?
PD: Machine learning is revolutionizing science by making it possible to understand much more complex phenomena than before. With it, we can apply the scientific method to vast quantities of data that no unaided human could hope to come to grips with. Biologists use machine learning to build models of the cell based on data from DNA sequencers, gene expression microarrays, and so on. Astronomers use it to automatically create catalogs of stars and galaxies from sky surveys. Physicists use it to suss out the new particles from the masses of data generated by particle colliders. Neuroscientists use it to build detailed maps of the brain, literally neuron by neuron. Social scientists use it to understand how large social networks, with millions or billions of people, behave. It’s not an exaggeration to say that machine learning and big data have ushered in a new era in science.
What is the “Master Algorithm” and how far are we from finding it?
PD: The Master Algorithm is a single algorithm capable of discovering all knowledge — past, present and future — from data. The human brain is a kind of master algorithm. So is evolution. Each has given rise to a different machine learning school, as have a number of other ideas, like symbolism. Each school has its own master algorithm: for the connectionists it’s something called backpropagation, for the symbolists it’s inverse deduction, and so on. But, as we saw, what we really need is a single algorithm that combines the capabilities of all of them. When will we find it? It’s hard to predict, because scientific progress is not linear. It could happen tomorrow, or it could take many decades. One of my fondest hopes in writing the eponymous book is that it will inspire a bright kid somewhere to come up with the key idea that we’ve all been missing — and make the Master Algorithm a reality, with all the extraordinary benefits for humanity that will follow.