Episode 86, August 21, 2019
T.J. Hazen: Most of the questions are fact-based questions like, who did something, or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. How can we make this technology work for real problems that our enterprise customers are bringing in?
Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.
Host: The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.
On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world. That and much more on this episode of the Microsoft Research Podcast.
Host: T.J. Hazen, welcome to the podcast!
T.J. Hazen: Thanks for having me.
Host: Researchers like to situate their research, and I like to situate my researchers so let’s get you situated. You are a Senior Principal Research Manager in the Engineering and Applied Research group at Microsoft Research in Montreal. Tell us what you do there. What are the big questions you’re asking, what are the big problems you’re trying to solve, what gets you up in the morning?
T.J. Hazen: Well, I’ve spent my whole career working in speech and language understanding, and I think the primary goal of everything I do is to try to be able to answer questions. So, people have questions and we’d like the computer to be able to provide answers. So that’s sort of the high-level goal, how do we go about answering questions? Now, answers can come from many places.
T.J. Hazen: A lot of the systems that you’re probably aware of like Siri for example, or Cortana or Bing or Google, any of them…
T.J. Hazen: …the answers typically come from structured places, databases that contain information, and for years these models have been built in a very domain-specific way. If you want to know the weather, somebody built a system to tell you about the weather.
T.J. Hazen: And somebody else might build a system to tell you about the age of your favorite celebrity and somebody else might have written a system to tell you about the sports scores, and each of them can be built to handle that very specific case. But that limits the range of questions you can ask because you have to curate all this data, you have to put it into structured form. And right now, what we’re worried about is, how can you answer questions more generally, about anything? And the internet is a wealth of information. The internet has got tons and tons of documents on every topic, you know, in addition to the obvious ones like Wikipedia. If you go into any enterprise domain, you’ve got manuals about how their operation works. You’ve got policy documents. You’ve got financial reports. And it’s not typical that all this information is going to be curated by somebody. It’s just sitting there in text. So how can we answer any question about anything that’s sitting in text? We don’t have a million or five million or ten million librarians doing this for us…
T.J. Hazen: …uhm, but the information is there, and we need a way to get at it.
Host: Is that what you are working on?
T.J. Hazen: Yes, that’s exactly what we’re working on. I think one of the difficulties with today’s systems is, they seem really smart…
T.J. Hazen: Sometimes. Sometimes they give you fantastically accurate answers. But then you can just ask a slightly different question and it can fall on its face.
T.J. Hazen: That’s the real gap between what the models currently do, which is, you know, really good pattern matching some of the time, versus something that can actually understand what your question is and know when the answer that it’s giving you is correct.
Host: Let’s talk a bit about your group, which, out of Montreal, is Engineering and Applied Research. And that’s an interesting umbrella at Microsoft Research. You’re technically doing fundamental research, but your focus is a little different from some of your pure research peers. How would you differentiate what you do from others in your field?
T.J. Hazen: Well, I think there’s two aspects to this. The first is that the lab up in Montreal was created as an offshoot of an acquisition. Microsoft bought Maluuba, which was a startup that was doing really incredible deep learning research, but at the same time they were a startup and they needed to make money. So, they also had this very talented engineering team in place to be able to take the research that they were doing in deep learning and apply it to problems where it could go into products for customers.
T.J. Hazen: When you think about that need that they had to actually build something, you could see why they had a strong engineering team.
T.J. Hazen: Now, when I joined, I wasn’t with them when they were a startup, I actually joined them from Azure where I was working with outside customers in the Azure Data Science Solution team, and I observed lots of problems that our customers have. And when I saw this new team that we had acquired and we had turned into a research lab in Montreal, I said I really want to be involved because they have exactly the type of technology that can solve customer problems and they have this engineering team in place that can actually deliver on turning from a concept into something real.
T.J. Hazen: So, I joined, and I had this agreement with my manager that we would focus on real problems. They were now part of the research environment at Microsoft, but I said that doesn’t restrict us on thinking about blue sky, far-afield research. We can go and talk to product teams and say what are the real problems that are hindering your products, you know, what are the difficulties you have in actually making something real? And we could focus our research to try to solve those difficult problems. And if we’re successful, then we have an immediate product that could be beneficial.
Host: Well in any case, you’re swimming someplace in a “we could do this immediately” but you have permission to take longer, or is there a mandate, as you live in this engineering and applied research group?
T.J. Hazen: I think there’s a mandate to solve hard problems. I think that’s the mandate of research. If it wasn’t a hard problem, then somebody…
Host: …would already have a product.
T.J. Hazen: …in the product team would already have a solution, right? So, we do want to tackle hard problems. But we also want to tackle real problems. That’s, at least, the focus of our team. And there’s plenty of people doing blue sky research and that’s an absolute need as well. You know, we can’t just be thinking one or two years ahead. Research should also be thinking five, ten, fifteen years ahead.
Host: So, there’s a whole spectrum there.
T.J. Hazen: So, there’s a spectrum. But there is a real need, I think, to fill that gap between taking an idea that works well in a lab and turning it into something that works well in practice for a real problem. And that’s the key. And many of the problems that have been solved by Microsoft have not just been blue sky ideas, but they’ve come from this problem space where a real product says, ahh, we’re struggling with this. So, it could be anything. It can be, like, how does Bing efficiently rank documents over billions of documents? You don’t just solve that problem by thinking about it, you have to get dirty with the data, you have to understand what the real issues are. So, many of these research problems that we’re focusing on, and we’re focusing on, how do you answer questions out of documents when the questions could be arbitrary, and on any topic? And you’ve probably experienced this, if you are going into a search site for your company, that company typically doesn’t have the advantage of having a big Bing infrastructure behind it that’s collecting all this data and doing sophisticated machine learning. Sometimes it’s really hard to find an answer to your question. And, you know, the tricks that people use can be creative and inventive but oftentimes, trying to figure out what the right keywords are to get you to an answer is not the right thing.
Host: You work closely with engineers on the path from research to product. So how does your daily proximity to the people that reify your ideas as a researcher impact the way you view, and do, your work as a researcher?
T.J. Hazen: Well, I think when you’re working in this applied research and engineering space, as opposed to a pure research space, it really forces you to think about the practical implications of what you’re building. How easy is it going to be for somebody else to use this? Is it efficient? Is it going to run at scale? All of these problems are problems that engineers care a lot about. And sometimes researchers just say, let me solve the problem first and everything else is just engineering. If you say that to an engineer, they’ll be very frustrated because you don’t want to bring something to an engineer that works ten times slower than needs to be, uses ten times more memory. So, when you’re in close proximity to engineers, you’re thinking about these problems as you are developing your methods.
Host: Interesting, because those two things, I mean, you could come up with a great idea that would do it and you pay a performance penalty in spades, right?
T.J. Hazen: Yeah, yeah. So, sometimes it’s necessary. Sometimes you don’t know how to do it and you just say let me find a solution that works and then you spend ten years actually trying to figure out how to make it work in a real product.
T.J. Hazen: And I’d rather not spend that time. I’d rather think about, you know, how can I solve something and have it be effective as soon as possible?
Host: Let’s talk about human language technologies. They’ve been referred to by some of your colleagues as “the crown jewel of AI.” Speech and language comprehension is still a really hard problem. Give us a lay of the land, both in the field in general and at Microsoft Research specifically. What’s hope and what’s hype, and what are the common misconceptions that run alongside the remarkable strides you actually are making?
T.J. Hazen: I think that word we mentioned already: understand. That’s really the key of it. Or comprehend is another way to say it. What we’ve developed doesn’t really understand, at least when we’re talking about general purpose AI. So, the deep learning mechanisms that people are working on right now that can learn really sophisticated things from examples. They do an incredible job of learning specific tasks, but they really don’t understand what they’re learning.
T.J. Hazen: So, they can discover complex patterns that can associate things. So in the vision domain, you know, if you’re trying to identify objects, and then you go in and see what the deep learning algorithm has learned, it might have learned features that are like, uh, you know, if you’re trying to identify a dog, it learns features that would say, oh, this is part of a leg, or this is part of an ear, or this is part of the nose, or this is the tail. It doesn’t know what these things are, but it knows they all go together. And the combination of them will make a dog. And it doesn’t know what a dog is either. But the idea that you could just feed data in and you give it some labels, and it figures everything else out about how to associate that label with that, that’s really impressive learning, okay? But it’s not understanding. It’s just really sophisticated pattern-matching. And the same is true in language. We’ve gotten to the point where we can answer general-purpose questions and it can go and find the answer out of a piece of text, and it can do it really well in some cases, and like, some of the examples we’ll give it, we’ll give it “who” questions and it learns that “who” questions should contain proper names or names of organizations. And “when” questions should express concepts of time. It doesn’t know anything about what time is, but it’s figured out the patterns about, how can I relate a question like “when” to an answer that contains time expression? And that’s all done automatically. There’s no features that somebody sits down and says, oh, this is a month and a month means this, and this is a year, and a year means this. And a month is a part of a year. Expert AI systems of the past would do this. They would create ontologies and they would describe things about how things are related to each other and they would write rules. 
And within limited domains, they would work really, really well if you stayed within a nice, tightly constrained part of that domain. But as soon as you went out and asked something else, it would fall on its face. And so, we can’t really generalize that way efficiently. If we want computers to be able to learn arbitrarily, we can’t have a human behind the scene creating an ontology for everything. That’s the difference between understanding and crafting relationships and hierarchies versus learning from scratch. We’ve gotten to the point now where the algorithms can learn all these sophisticated things, but they really don’t understand the relationships the way that humans understand it.
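As a rough illustration of the hand-written-rule approach described above, here is a minimal Python sketch. The rules, the passage, and the function name `answer` are all hypothetical and chosen only for illustration; this is not a system discussed in the episode. It handles “who” and “when” questions with hard-coded patterns and has nothing to offer once a question falls outside them.

```python
import re

# Hand-written rules, one per question word (hypothetical, for illustration).
RULES = {
    # "who" -> two or more capitalized words in a row, e.g. a full name
    "who": re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"),
    # "when" -> a four-digit year between 1000 and 2099
    "when": re.compile(r"\b(?:1\d{3}|20\d{2})\b"),
}

def answer(question, passage):
    """Return the first span matching the rule for the question word, if any."""
    first_word = question.split()[0].lower()
    rule = RULES.get(first_word)
    if rule is None:
        return None  # no rule was written for this kind of question
    match = rule.search(passage)
    return match.group(0) if match else None

# Made-up passage used only to exercise the rules.
PASSAGE = "Jane Doe joined the team in 2015."

print(answer("Who joined the team?", PASSAGE))
print(answer("When did she join?", PASSAGE))
print(answer("How do I reset the device?", PASSAGE))
```

The last call returns `None`: nobody wrote a rule for “how” questions, which is the narrow-domain limitation the rule-based systems ran into.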
Host: Go back to the, sort of, the lay of the land, and how I sharpened that by saying, what’s hope and what’s hype? Could you give us a “TBH” answer?
T.J. Hazen: Well, what’s hope is that we can actually find reasonable answers to an extremely wide range of questions. What’s hype is that the computer will actually understand, at some deep and meaningful level, what this answer actually means. I do think that we’re going to grow our understanding of algorithms and we’re going to figure out ways that we can build algorithms that could learn more about relationships and learn more about reasoning, learn more about common sense, but right now, they’re just not at that level of sophistication yet.
Host: All right. Well let’s do the podcast version of your NERD Lunch and Learn. Tell us what you are working on in machine reading comprehension, or MRC, and what contributions you are making to the field right now.
T.J. Hazen: You know, NERD is short for New England Research and Development Center…
Host: I did not!
T.J. Hazen: …which is where I physically work.
T.J. Hazen: Even though I work closely and am affiliated with the Montreal lab, I work out of the lab in Cambridge, Massachusetts, and NERD has a weekly Lunch and Learn where people present the work they’re doing, or the research that they’re working on, and at one of these Lunch and Learns, I gave this talk on machine reading comprehension. Machine reading comprehension, in its simplest version, is being able to take a question and then being able to find the answer anywhere in some collection of text. As we’ve already mentioned, it’s not really “comprehending” at this point, it’s more just very sophisticated pattern-matching. But it works really well in many circumstances. And even on tasks like the Stanford Question Answering Dataset, which is a common competition that people have competed in, question answering by computer has achieved a human level of parity.
T.J. Hazen: Okay. But that task itself is somewhat simple because most of the questions are fact-based questions like, who did something or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. Like, how can we make this technology work for real problems that our enterprise customers are bringing in? So, we have customers coming in saying, I want to be able to answer any question in our financial policies, or our auditing guidelines, or our operations manual. And people don’t ask “who” or “when” questions of their operations manual. They ask questions like, how do I do something? Or explain some process to me. And those answers are completely different. They tend to be longer and more complex and you don’t always, necessarily, find a short, simple answer that’s well situated in some context.
T.J. Hazen: So, our focus at MSR Montreal is to take this machine reading comprehension technology and apply it into these new areas where our customers are really expressing that there’s a need.
Host: Well, let’s go a little deeper, technically, on what it takes to enable or teach machines to answer questions, and this is key, with limited data. That’s part of your equation, right?
T.J. Hazen: Right, right. So, when we go to a new task, uh, so if a company comes to us and says, oh, here’s our operations manual, they often have this expectation, because we’ve achieved human parity on some dataset, that we can answer any question out of that manual. But when we test the general-purpose models that have been trained on these other tasks on these manuals, they don’t generally work well. And these models have been trained on hundreds of thousands, if not millions, of examples, depending on what datasets you’ve been using. And it’s not reasonable to ask a company to collect that level of data in order to be able to answer questions about their operations manual. But we need something. We need some examples of what are the types of questions, because we have to understand what types of questions they ask, we need to understand the vocabulary. We’ll try to learn what we can from the manual itself. But without some examples, we don’t really understand how to answer questions in these new domains. But what we discovered through some of the techniques that are available, transfer learning is what we refer to as sort of our model adaptation, how do you learn from data in some new domain and take an existing model and make it adapt to that domain? We call that transfer learning. We can actually use transfer learning to do really well in a new domain without requiring a ton of data. So, our goal is to have it be examples like hundreds of examples, not tens of thousands of examples.
Host: How’s that working now?
T.J. Hazen: It works surprisingly well. I’m always amazed at how well these machine learning algorithms work with all the techniques that are available now. These models are very complex. When we’re talking about our question answering model, it has hundreds of millions of parameters and what you’re talking about is trying to adjust a model that is hundreds of millions of parameters with only hundreds of examples and, through a variety of different techniques where we can avoid what we call overfitting, we can allow the generalizations that are learned from all this other data to stay in place while still adapting it so it does well in this specific domain. So, yeah, I think we’re doing quite well. We’re still exploring, you know, what are the limits?
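One common way to adapt a large model with only a handful of examples is to freeze the pretrained part and train a small task-specific layer on top, with some regularization to limit overfitting. The episode does not say which techniques the team actually uses, so the Python sketch below is only an assumption-laden toy: `FROZEN_BASE` stands in for the pretrained model, and the data, learning rate, and weight decay are made up.

```python
import math

# Stand-in for a large pretrained model: a fixed feature map that is never
# updated. A real base model would have hundreds of millions of weights.
FROZEN_BASE = [[0.5, 0.1, -0.3], [-0.2, 0.8, 0.4]]

def base_features(x):
    """Map a raw input to features using the frozen base."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_BASE]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5, weight_decay=0.01):
    """Train only a small logistic-regression head on top of the frozen base."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in examples:
            feats = base_features(x)
            score = sum(w * f for w, f in zip(weights, feats)) + bias
            error = sigmoid(score) - label
            # Weight decay pulls the head toward zero, a simple guard
            # against overfitting when examples are scarce.
            weights = [w - lr * (error * f + weight_decay * w)
                       for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict(x, weights, bias):
    feats = base_features(x)
    return 1 if sum(w * f for w, f in zip(weights, feats)) + bias > 0 else 0

# A handful of made-up in-domain examples: (input, label).
IN_DOMAIN_EXAMPLES = [
    ([1.0, 0.0, 0.0], 1),
    ([0.0, 1.0, 0.0], 0),
    ([2.0, 0.0, 0.0], 1),
    ([0.0, 2.0, 0.0], 0),
]

head_weights, head_bias = train_head(IN_DOMAIN_EXAMPLES)
print([predict(x, head_weights, head_bias) for x, _ in IN_DOMAIN_EXAMPLES])
```

Only three numbers (two weights and a bias) are learned here, so four examples are enough; whatever the base already encodes stays untouched.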
T.J. Hazen: And we’re still trying to figure out how to make it work so that an outside company can easily create the dataset, put the dataset into a system, push a button. The engineering for that and the research for that is still ongoing, but I think we’re pretty close to being able to, you know, provide a solution for this type of problem.
Host: All right. Well I’m going to push in technically because to me, it seems like that would be super hard for a machine. We keep referring to these techniques… Do we have to sign an NDA, as listeners?
T.J. Hazen: No, no. I can explain stuff that’s out…
Host: Yeah, do!
T.J. Hazen: … in the public domain. So, there are two common underlying technical components that make this work. One is called word embeddings and the other is called attention. Word embeddings are a mechanism where it learns how to take words or phrases and express them in what we call vector space.
T.J. Hazen: So, it turns them into a collection of numbers. And it does this by figuring out what types of words are similar to each other based on the context that they appear in, and then placing them together in this vector space, so they’re nearby each other. So, we would learn, that let’s say, city names are all similar because they appear in similar contexts. And so, therefore, Boston and New York and Montreal, they should all be close together in this vector space.
T.J. Hazen: And blue and red and yellow should be close together. And then advances were made to figure this out in context. So that was the next step, because some words have multiple meanings.
T.J. Hazen: So, you know, if you have a word like apple, sometimes it refers to a fruit and it should be near orange and banana, but sometimes it refers to the company and it should be near Microsoft and Google. So, we’ve developed context dependent ones, so that says, based on the context, I’ll place this word into this vector space so it’s close to the types of things that it really represents in that context.
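To make the “nearby in vector space” idea concrete, here is a minimal Python sketch. The four-dimensional vectors are hand-picked for illustration, not learned, and averaging a word with its context is only a crude stand-in for the learned context-dependent embeddings described above.

```python
import math

# Hypothetical embeddings; dimensions loosely mean
# [city-like, color-like, fruit-like, company-like].
EMBEDDINGS = {
    "boston":    [1.0, 0.0, 0.0, 0.1],
    "montreal":  [0.9, 0.1, 0.0, 0.0],
    "red":       [0.0, 1.0, 0.1, 0.0],
    "yellow":    [0.1, 0.9, 0.2, 0.0],
    "banana":    [0.0, 0.2, 1.0, 0.0],
    "orange":    [0.0, 0.3, 0.9, 0.0],
    "microsoft": [0.1, 0.0, 0.0, 1.0],
    "google":    [0.1, 0.0, 0.0, 0.9],
    "apple":     [0.0, 0.1, 0.6, 0.6],  # ambiguous: part fruit, part company
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other vocabulary word whose vector is closest to `word`."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine_similarity(EMBEDDINGS[word], EMBEDDINGS[w]))

def contextual_vector(word, context_words):
    """Crude context-dependent vector: average the word with its context."""
    vectors = [EMBEDDINGS[word]] + [EMBEDDINGS[c] for c in context_words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

print(nearest("boston"))
print(nearest("red"))

fruit_sense = contextual_vector("apple", ["banana"])
company_sense = contextual_vector("apple", ["microsoft"])
print(cosine_similarity(fruit_sense, EMBEDDINGS["orange"]))
print(cosine_similarity(company_sense, EMBEDDINGS["google"]))
```

With these toy vectors, “boston” lands next to “montreal” and “red” next to “yellow”, while “apple” moves toward “orange” or toward “google” depending on the word it appears with.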
T.J. Hazen: That’s the first part. And you can learn these word embeddings from massive amounts of data. So, we start off with a model that’s learned on far more data than we actually have question and answer data for. The second part is called attention and that’s how you associate things together. And it’s the attention mechanisms that learn things like a word like “who” has to attend to words like person names or company names. And a word like “when” has to attend to…
T.J. Hazen: …time. And those associations are learned through this attention mechanism. And again, we can actually learn a lot of associations between things just from looking at raw text without actually having it annotated.
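The association between a question word and the tokens it should attend to can be sketched with scaled dot-product attention, one widely used formulation; the episode does not specify which variant the team uses. In the Python sketch below, the query and key vectors are set by hand for illustration, whereas in a real model they are learned from data.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical tokens of a passage and hand-set key vectors;
# dimensions loosely mean [person-like, time-like].
TOKENS = ["Jane", "arrived", "yesterday"]
KEYS = [[2.0, 0.0], [0.2, 0.2], [0.0, 2.0]]

# Hand-set query vectors for two question words.
QUERIES = {"who": [2.0, 0.0], "when": [0.0, 2.0]}

def most_attended(question_word):
    """Return the passage token that receives the largest attention weight."""
    weights = attention_weights(QUERIES[question_word], KEYS)
    return TOKENS[weights.index(max(weights))]

print(most_attended("who"))
print(most_attended("when"))
```

Here “who” puts most of its weight on the name and “when” on the time expression, purely because their vectors line up, with no notion of what a person or a time is.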
T.J. Hazen: Once we’ve learned all that, we have a base, and that base tells us a lot about how language works. And then we just have to have it focus on the task, okay? So, depending on the task, we might have a small amount of data and we feed in examples in that small amount, but it takes advantage of all the stuff that it’s learned about language from all these, you know, rich data that’s out there on the web. And so that’s how it can learn these associations even if you don’t give it examples in your domain, but it’s learned a lot of these associations from all the raw data.
T.J. Hazen: And so, that’s the base, right? You’ve got this base of all this raw data and then you train a task-specific thing, like a question answering system, but even then, what we find is that, if we train a question answering system on basic facts, it doesn’t always work well when you go to operation manuals or other things. So, then we have to have it adapt.
T.J. Hazen: But, like I said, that base is very helpful because it’s already learned a lot of characteristics of language just by observing massive amounts of text.
Host: I’d like you to predict the future. No pressure. What’s on the horizon for machine reading comprehension research? What are the big challenges that lie ahead? I mean, we’ve sort of laid the land out on what we’re doing now. What next?
T.J. Hazen: Yeah. Well certainly, more complex questions. What we’ve been talking about so far is still fairly simple in the sense that you have a question, and we try to find passages of text that answer that question. But sometimes a question actually requires that you get multiple pieces of evidence from multiple places and you somehow synthesize them together. So, a simple example we call the multi-hop example. If I ask a question like, you know, where was Barack Obama’s wife born? I have to figure out first, who is Barack Obama’s wife? And then I have to figure out where she was born. And those pieces of information might be in two different places.
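The two-step lookup in that example can be sketched in Python over a tiny hand-built fact table. The table, the relation names, and the function `multi_hop` are hypothetical; a real system would have to find each fact in text rather than in a dictionary, which is the hard part.

```python
# Tiny hand-built fact table: (entity, relation) -> value.
FACTS = {
    ("Barack Obama", "spouse"): "Michelle Obama",
    ("Michelle Obama", "birthplace"): "Chicago",
}

def multi_hop(start, relations):
    """Follow a chain of relations, feeding each answer into the next lookup."""
    entity = start
    for relation in relations:
        entity = FACTS.get((entity, relation))
        if entity is None:
            return None  # one missing piece of evidence breaks the chain
    return entity

# "Where was Barack Obama's wife born?" becomes two hops.
print(multi_hop("Barack Obama", ["spouse", "birthplace"]))
```

Neither fact alone answers the question; the answer only appears once the result of the first hop is used as the subject of the second.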
T.J. Hazen: So that’s what we call a multi-hop question. And then, sometimes, we have to do some operation on the data. So, you could say, you know like, what players, you know, from one Super Bowl team also played on another Super Bowl team? Well there, what you have to do is, you have to get the list of all the players from both teams and then you have to do an intersection between them to figure out which ones are the same on both. So that’s an operation on the data…
T.J. Hazen: …and you can imagine that there’s lots of questions like that where the information is there, but it’s not enough to just show the person where the information is. You also would like to go a step further and actually do the computation for that. That’s a step that we haven’t done, like, how do you actually go from mapping text to text, and saying these two things are associated, to mapping text to some sequence of operations that will actually give you an exact answer. And, you know, it can be quite difficult. I can give you a very simple example. Like, just answering a question, yes or no, out of text, is not a solved problem. Let’s say I have a question where someone says, I’m going to fly to London next week. Am I allowed to fly business class according to my policies from my company, right? We can have a system that would be really good at finding the section of the policy that says, you know, if you are a VP-level or higher and you are flying overseas, you can fly business class, otherwise, no. Okay? But, you know, if we actually want the system to answer yes or no, we have to actually figure out all the details, like okay, who’s asking the question? Are they a VP? Where are they located? Oh, they’re in New York. What does flying overseas mean?
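Once the relevant facts are in structured form, the operations themselves are short, as the hedged Python sketch below shows for the roster intersection and the business-class policy. Everything in it is hypothetical: the player names, the `Traveler` fields, the set of senior levels, and above all the definition of “overseas”, which is exactly the kind of detail that currently has to be crafted by hand.

```python
from dataclasses import dataclass

def players_on_both(roster_a, roster_b):
    """Operation on data: the players who appear on both rosters."""
    return sorted(set(roster_a) & set(roster_b))

@dataclass
class Traveler:
    level: str         # job level, e.g. "VP" or "Engineer"
    home_country: str  # where the traveler is based

# Hypothetical reading of "VP-level or higher".
SENIOR_LEVELS = {"VP", "SVP", "CEO"}

def is_overseas(home_country, destination_country):
    # Simplifying assumption: any trip to another country counts as overseas.
    # A real policy would need a more careful, hand-crafted definition.
    return home_country != destination_country

def may_fly_business(traveler, destination_country):
    """Yes/no answer to the policy question, given structured facts."""
    return (traveler.level in SENIOR_LEVELS
            and is_overseas(traveler.home_country, destination_country))

# Made-up rosters and travelers.
print(players_on_both(["Avery", "Blake", "Casey"], ["Casey", "Drew", "Avery"]))
print(may_fly_business(Traveler("VP", "USA"), "UK"))
print(may_fly_business(Traveler("Engineer", "USA"), "UK"))
```

The code is trivial; what is missing today is the step that turns a free-text question and a policy paragraph into these inputs and this sequence of operations.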
Host: Right. There are layers.
T.J. Hazen: Right. So that type of comprehension, you know, we’re not quite there yet for all types of questions. Usually these things have to be crafted by hand for specific domains. So, all of these things about how can you answer complex questions, and even simple things like common sense, like, things that we all know… Um. And so, my manager, Andrew McNamara, he was supposed to be here with us, one of his favorite examples is this concept of coffee being black. But if you spill coffee on your shirt, do you have a black stain on your shirt? No, you’ve got a brown stain on your shirt. And that’s just common knowledge. That is, you know, a common-sense thing that computers may not understand.
Host: You’re working on research, and ultimately products or product features, that make people think they can talk to their machines and that their machines can understand and talk back to them. So, is there anything you find disturbing about this? Anything that keeps you up at night? And if so, how are you dealing with it?
T.J. Hazen: Well, I’m certainly not worried about the fact that people can ask questions of the computer and the computer can give them answers. What I’m trying to get at is something that’s helpful and can help you solve tasks. In terms of the work that we do, yeah, there are actually issues that concern me. So, one of the big ones is, even if a computer can say, oh, I found a good answer for you, here’s the answer, it doesn’t know anything about whether that answer is true. If you go and ask your computer, was the Holocaust real? and it finds an article on the web that says no, the Holocaust was a hoax, do I want my computer to show that answer? No, I don’t. But…
Host: Or the moon landing…!
T.J. Hazen: …if all you are doing is teaching the computer about word associations, it might think that’s a perfectly reasonable answer without actually knowing that this is a horrible answer to be showing. So yeah, the moon landing, vaccinations… The easy way that people can defame people on the internet, you know, even if you ask a question that might seem like a fact-based question, you can get vast differences of opinion on this and you can get extremely biased and untrue answers. And how does a computer actually understand that some of these things are not things that we should represent as truth, right? Especially if your goal is to find a truthful answer to a question.
Host: All right. So, then what do we do about that? And by we, I mean you!
T.J. Hazen: Well, I have been working on this problem a little bit with the Bing team. And one of the things that we discovered is that if you can determine that a question is phrased in a derogatory way, that usually means the search results that you’re going to get back are probably going to be phrased in a derogatory way. So, even if we don’t understand the answer, we can just be very careful about what types of questions we actually want to answer.
Host: Well, what does the world look like if you are wildly successful?
T.J. Hazen: I want the systems that we build to just make life easier for people. If you have an information task, the world is successful if you get that piece of information and you don’t have to work too hard to get it. We call it task completion. If you have to struggle to find an answer, then we’re not successful. But if you can ask a question, and we can get you the answer, and you go, yeah, that’s the answer, that’s success to me. And we’ll be wildly successful if the types of things where that happens become more and more complex. You know, where if someone can start asking questions where you are synthesizing data and computing answers from multiple pieces of information, for me, that’s the wildly successful part. And we’re not there yet with what we’re going to deliver into product, but it’s on the research horizon. It will be incremental. It’s not going to happen all at once. But I can see it coming, and hopefully by the time I retire, I can see significant progress in that direction.
Host: Off script a little… will I be talking to my computer, my phone, a HoloLens? Who am I asking? Where am I asking? What device? Is that so “out there” as well?
T.J. Hazen: Uh, yeah, I don’t know how to think about where devices are going. You know, when I was a kid, I watched the original Star Trek, you know, and everything on there, it seemed like a wildly futuristic thing, you know? And then fifteen, twenty years later, everybody’s got their own little “communicator.”
Host: Oh my gosh.
T.J. Hazen: And so, uh, you know, the fact that we’re now beyond where Star Trek predicted we would be, you know, that itself, is impressive to me. So, I don’t want to speculate where the devices are going. But I do think that this ability to answer questions, it’s going to get better and better. We’re going to be more interconnected. We’re going to have more access to data. The range of things that computers will be able to answer is going to continue to expand. And I’m not quite sure exactly what it looks like in the future, to be honest, but, you know, I know it’s going to get better and easier to get information. I’m a little less worried about, you know, what the form factor is going to be. I’m more worried about how I’m going to actually answer questions reliably.
Host: Well, it’s story time. Tell us a little bit about yourself, your life, your path to MSR. How did you get interested in computer science research and how did you land where you are now, working from Microsoft Research in New England for Montreal?
T.J. Hazen: Right. Well, I’ve never been one to long-term plan for things. I’ve always gone from what I find interesting to the next thing I find interesting. I never had a really serious, long-term goal. I didn’t wake up some morning when I was seven and say, oh, I want to be a Principal Research Manager at Microsoft in my future! I didn’t even know what Microsoft was when I was seven. I went to college and I just knew I wanted to study computers. I didn’t know really what that meant at the time, it just seemed really cool.
T.J. Hazen: I had an Apple II when I was a kid and I learned how to do some basic programming. And then I, you know, was going through my course work. I was, in my junior year, I was taking a course in audio signal processing and in the course of that class, we got into a discussion about speech recognition, which to me was, again, it was Star Trek. It was something I saw on TV. Of course, now it was Next Generation…!
T.J. Hazen: But you know, you watch the next generation of Star Trek and they’re talking to the computer and the computer is giving them answers and here somebody is telling me you know there’s this guy over in the lab for computer science, Victor Zue, and he’s building systems that recognize speech and give answers to questions! And to me, that was science-fiction. So, I went over and asked the guy, you know, I heard you’re building a system, and can I do my bachelor’s thesis on this? And he gave me a demo of the system – it was called Voyager – and he asked a question, I don’t remember the exact question, but it was probably something like, show me a map of Harvard Square. And the system starts chugging along and it’s showing results on the screen as it’s going. And it literally took about two minutes for it to process the whole thing. It was long enough that he actually explained to me how the entire system worked while it was processing. But then it came back, and it popped up a map of Harvard Square on the screen. And I was like, ohhh my gosh, this is so cool, I have to do this! So, I did my bachelor’s thesis with him and then I stayed on for graduate school. And by seven years later, we had a system that was running in real time. We had a publicly available system in 1997 that you could call up on a toll-free number and you could ask for weather reports and weather information for anywhere in the United States. And so, the idea that it went from something that was “Star Trek” to something that I could pick up my phone, call a number and, you know, show my parents, this is what I’m working on, it was astonishing how fast that developed! I stayed on in that field with that research group. I was at MIT for another fifteen years after I graduated. At some point, a lot of the things that we were doing, they moved from the research lab to actually being real.
T.J. Hazen: So, like twenty years after I went and asked to do my bachelor’s thesis, Siri comes out, okay? And so that was our goal. They were like, twenty years ago, we should be able to have a device where you can talk to it and it gives you answers and twenty years later there it was. So, that, for me, that was a cue that maybe it’s time to go where the action is, which was in companies that were building these things. Once you have a large company like Microsoft or Google throwing their resources behind these hard problems, then you can’t compete when you’re in academia for that space. You know, you have to move on to something harder and more far out. But I still really enjoyed it. So, I joined Microsoft to work on Cortana…
T.J. Hazen: …when we were building the first version of Cortana. And I spent a few years working on that. I’ve worked on some Bing products. I then spent some time in Azure trying to transfer these things so that companies that had the similar types of problems could solve their problems on Azure with our technology.
Host: And then we come full circle to…
T.J. Hazen: Then full circle, yeah. You know, once I realized that some of the stuff that customers were asking for wasn’t quite ready yet, I said, let me go back to research and see if I can improve that. It’s fantastic to see something through all the way to product, but once you’re successful and you have something in a product, it’s nice to then say, okay, what’s the next hard problem? And then start over and work on the next hard problem.
Host: Before we wrap up, tell us one interesting thing about yourself, maybe it’s a trait, a characteristic, a life event, a side quest, whatever… that people might not know, or be able to find on a basic web search, that’s influenced your career as a researcher?
T.J. Hazen: Okay. You know, when I was a kid, maybe about eleven years old, the Rubik’s Cube came out. And I got fascinated with it. And I wanted to learn how to solve it. And a kid down the street from my cousin had taught himself from a book how to solve it. And he taught me. His name was Jonathan Cheyer. And he was actually in the first national speed Rubik’s Cube solving competition. It was on this TV show, That’s Incredible. I don’t know if you remember that TV show.
Host: I do.
T.J. Hazen: It turned out what he did was, he had learned what is now known as the simple solution. And I learned it from him. And I didn’t realize it until many years later, but what I learned was an algorithm. I learned, you know, a sequence of steps to solve a problem. And once I got into computer science, I discovered all that problem-solving I was doing with the Rubik’s Cube and figuring out what are the steps to solve a problem, that’s essentially what things like machine learning are doing. What are the steps to figure out, what are the features of something, what are the steps I have to do to solve the problem? I didn’t realize that at the time, but the idea of being able to break down a hard problem like solving a Rubik’s Cube, and figuring out what are the stages to get you there, is interesting. Now, here’s the interesting fact. So, Jonathan Cheyer, his older brother is Adam Cheyer. Adam Cheyer is one of the co-founders of Siri.
Host: Oh my gosh. Are you kidding me?
T.J. Hazen: So, I met the kid when I was young, and we didn’t really stay in touch. I discovered, you know, many years later that Adam Cheyer was actually the older brother of this kid who taught me the Rubik’s Cube years and years earlier, and Jonathan ended up at Siri also. So, it’s an interesting coincidence that we ended up working in the same field after all those years from this Rubik’s Cube connection!
Host: You see, this is my favorite question now because I’m getting the broadest spectrum of little things that influenced and triggered something…!
Host: At the end of every podcast, I give my guests a chance for the proverbial last word. Here’s your chance to say anything you want to would-be researchers, both applied and otherwise, who might be interested in working on machine reading comprehension for real-world applications.
T.J. Hazen: Well, I could say all the things that you would expect me to say, like you should learn about deep learning algorithms and you should possibly learn Python because that’s what everybody is using these days, but I think the single most important thing that I could tell anybody who wants to get into a field like this is that you need to explore it and you need to figure out how it works and do something in depth. Don’t just get some instruction set or some high-level overview on the internet, run it on your computer and then say, oh, I think I understand this. Like get into the nitty-gritty of it. Become an expert. And the other thing I could say is, of all the people I’ve met who are extremely successful, the thing that sets them apart isn’t so much, you know, what they learned, it’s the initiative that they took. So, if you see a problem, try to fix it. If you see a problem, try to find a solution for it. And I say this to people who work for me. If you really want to have an impact, don’t just do what I tell you to do, but explore, think outside the box. Try different things. OK? I’m not going to have the answer to everything, so therefore, if I don’t have the answer to everything, then if you’re only doing what I’m telling you to do, then we both, together, aren’t going to have the answer. But if you explore things on your own and take the initiative and try to figure out something, that’s the best way to really be successful.
Host: T.J. Hazen, thanks for coming in today, all the way from the east coast to talk to us. It’s been delightful.
T.J. Hazen: Thank you. It’s been a pleasure.
To learn more about Dr. T.J. Hazen and how researchers and engineers are teaching machines to answer complicated questions, visit Microsoft.com/research