
Microsoft and Nokia collaborate to accelerate digital transformation and Industry 4.0 for communications service providers and enterprises

Companies announce their first joint solutions combining Microsoft cloud, AI and machine learning expertise with Nokia’s leadership across mission-critical networking and communications

REDMOND, Wash., and ESPOO, Finland, Nov. 5, 2019 – Microsoft and Nokia today announced a strategic collaboration to accelerate transformation and innovation across industries with cloud, Artificial Intelligence (AI) and Internet of Things (IoT). By bringing together Microsoft cloud solutions and Nokia’s expertise in mission-critical networking, the companies are uniquely positioned to help enterprises and communications service providers (CSPs) transform their businesses. As Microsoft’s Azure, Azure IoT, Azure AI and Machine Learning solutions combine with Nokia’s LTE/5G-ready private wireless solutions, IP, SD-WAN, and IoT connectivity offerings, the companies will drive industrial digitalization and automation across enterprises, and enable CSPs to offer new services to enterprise customers.

BT is the first global communications service provider to offer its enterprise customers a managed service that integrates Microsoft Azure cloud and Nokia SD-WAN solutions. BT customers can access this through a customer automated delegated rights service, which enables BT to manage both the customer Azure vWAN and the unique Agile Connect SD-WAN, based on Nokia’s Nuage SD-WAN 2.0.

“Bringing together Microsoft’s expertise in intelligent cloud solutions and Nokia’s strength in building business and mission-critical networks will unlock new connectivity and automation scenarios,” said Jason Zander, executive vice president, Microsoft Azure. “We’re excited about the opportunities this will create for our joint customers across industries.”

“We are thrilled to unite Nokia’s mission-critical networks with Microsoft’s cloud solutions,” said Kathrin Buvac, President of Nokia Enterprise and Chief Strategy Officer. “Together, we will accelerate the digital transformation journey towards Industry 4.0, driving economic growth and productivity for both enterprises and service providers.”

The cloud and IoT have ushered in the fourth industrial revolution, or Industry 4.0, wherein enterprises are embracing data to automate and streamline processes across all aspects of their businesses. By joining forces, the two companies are bringing solutions to market that will simplify and accelerate this journey for enterprises, as well as enable CSPs to play a key role in helping their customers realize the potential of industrial digitalization and automation while also optimizing and better differentiating their own businesses.

Accelerating digital transformation for enterprises

Microsoft and Nokia are partnering to help accelerate digital transformation for enterprises by offering connectivity and Azure IoT solutions that unlock connected scenarios across multiple industries including digital factories, smart cities, warehouses, healthcare settings, and transportation hubs such as ports, airports and more.

The Nokia Digital Automation Cloud (Nokia DAC) 5G-ready industrial-grade private wireless broadband solution with on-premises Azure elements will enable a wide variety of secure industrial automation solutions that require more reliable connectivity, efficient coverage and better mobility than traditional Wi-Fi networks provide. Examples include connected smart tools and machines on manufacturing floors that increase productivity, flexibility and safety for workers, and autonomous vehicles and robots in industrial environments that improve automation, efficiency and overall safety.

Enabling new enterprise services offered by service providers

Nokia’s Nuage SD-WAN 2.0 solution now enables service providers to offer integration with Microsoft Azure Virtual WAN for branch to cloud connectivity, with the companies planning to offer more options for branch internet connectivity in 2020. By automating branch and hybrid WAN connectivity, enterprises will have simplified, faster access to cloud applications such as Office 365, integrated security from branch-to-branch and branch-to-Azure and reduced risk of configuration errors causing security or connectivity issues.

Furthermore, the companies are integrating Nokia’s Worldwide IoT Network Grid (WING) with Azure IoT Central to make the onboarding, deployment, management and servicing of IoT solutions seamless. This integration provides CSPs with the opportunity to offer their enterprises a single platform including vertical solutions to enable secure connected IoT services, such as asset tracking and machine monitoring on a national or global scale. Enterprises will be able to use Azure IoT Central and partner solutions for faster and easier enablement and implementation of their IoT applications together with Nokia’s IoT connectivity solutions.

Driving digital transformation for CSPs

Microsoft and Nokia are collaborating to host Nokia’s Analytics, Virtualization and Automation (AVA) cognitive services solutions on Azure. These AI solutions will enable CSPs to move out of private data centers and into the Azure cloud to realize cost savings and transform operations for 5G. Predictive Video Analytics is an example of a joint solution that will ensure optimal video experiences for CSP subscribers, improving reliability by up to 60 percent.

About Microsoft

Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.

About Nokia

We create the technology to connect the world. We develop and deliver the industry’s only end-to-end portfolio of network equipment, software, services and licensing that is available globally. Our customers include communications service providers whose combined networks support 6.1 billion subscriptions, as well as enterprises in the private and public sector that use our network portfolio to increase productivity and enrich lives.

Through our research teams, including the world-renowned Nokia Bell Labs, we are leading the world to adopt end-to-end 5G networks that are faster, more secure and capable of revolutionizing lives, economies and societies. Nokia adheres to the highest ethical business standards as we create technology with social purpose, quality and integrity. www.nokia.com

For more information, press only:

Microsoft Media Relations, WE Communications for Microsoft, (425) 638-7777, rrt@we-worldwide.com

Nokia Communications, +358 10 448 4900, press.services@nokia.com


Podcast: How machines are learning to ace the reading comprehension exam

Dr. T.J. Hazen

Episode 86, August 21, 2019

The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.

On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world.

Transcript

T.J. Hazen: Most of the questions are fact-based questions like, who did something, or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. How can we make this technology work for real problems that our enterprise customers are bringing in?

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.

On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: T.J. Hazen, welcome to the podcast!

T.J. Hazen: Thanks for having me.

Host: Researchers like to situate their research, and I like to situate my researchers, so let’s get you situated. You are a Senior Principal Research Manager in the Engineering and Applied Research group at Microsoft Research in Montreal. Tell us what you do there. What are the big questions you’re asking, what are the big problems you’re trying to solve, what gets you up in the morning?

T.J. Hazen: Well, I’ve spent my whole career working in speech and language understanding, and I think the primary goal of everything I do is to try to be able to answer questions. So, people have questions and we’d like the computer to be able to provide answers. So that’s sort of the high-level goal, how do we go about answering questions? Now, answers can come from many places.

Host: Right.

T.J. Hazen: A lot of the systems that you’re probably aware of like Siri for example, or Cortana or Bing or Google, any of them…

Host: Right.

T.J. Hazen: …the answers typically come from structured places, databases that contain information, and for years these models have been built in a very domain-specific way. If you want to know the weather, somebody built a system to tell you about the weather.

Host: Right.

T.J. Hazen: And somebody else might build a system to tell you about the age of your favorite celebrity and somebody else might have written a system to tell you about the sports scores, and each of them can be built to handle that very specific case. But that limits the range of questions you can ask because you have to curate all this data, you have to put it into structured form. And right now, what we’re worried about is, how can you answer questions more generally, about anything? And the internet is a wealth of information. The internet has got tons and tons of documents on every topic, you know, in addition to the obvious ones like Wikipedia. If you go into any enterprise domain, you’ve got manuals about how their operation works. You’ve got policy documents. You’ve got financial reports. And it’s not typical that all this information is going to be curated by somebody. It’s just sitting there in text. So how can we answer any question about anything that’s sitting in text? We don’t have a million or five million or ten million librarians doing this for us…

Host: Right.

T.J. Hazen: …uhm, but the information is there, and we need a way to get at it.

Host: Is that what you are working on?

T.J. Hazen: Yes, that’s exactly what we’re working on. I think one of the difficulties with today’s systems is, they seem really smart…

Host: Right?

T.J. Hazen: Sometimes. Sometimes they give you fantastically accurate answers. But then you can just ask a slightly different question and it can fall on its face.

Host: Right.

T.J. Hazen: That’s the real gap between what the models currently do, which is, you know, really good pattern matching some of the time, versus something that can actually understand what your question is and know when the answer that it’s giving you is correct.

Host: Let’s talk a bit about your group, which, out of Montreal, is Engineering and Applied Research. And that’s an interesting umbrella at Microsoft Research. You’re technically doing fundamental research, but your focus is a little different from some of your pure research peers. How would you differentiate what you do from others in your field?

T.J. Hazen: Well, I think there’s two aspects to this. The first is that the lab up in Montreal was created as an offshoot of an acquisition. Microsoft bought Maluuba, which was a startup that was doing really incredible deep learning research, but at the same time they were a startup and they needed to make money. So, they also had this very talented engineering team in place to be able to take the research that they were doing in deep learning and apply it to problems where it could go into products for customers.

Host: Right.

T.J. Hazen: When you think about that need that they had to actually build something, you could see why they had a strong engineering team.

Host: Yeah.

T.J. Hazen: Now, when I joined, I wasn’t with them when they were a startup, I actually joined them from Azure where I was working with outside customers in the Azure Data Science Solution team, and I observed lots of problems that our customers have. And when I saw this new team that we had acquired and we had turned into a research lab in Montreal, I said I really want to be involved because they have exactly the type of technology that can solve customer problems and they have this engineering team in place that can actually deliver on turning from a concept into something real.

Host: Right.

T.J. Hazen: So, I joined, and I had this agreement with my manager that we would focus on real problems. They were now part of the research environment at Microsoft, but I said that doesn’t restrict us on thinking about blue sky, far-afield research. We can go and talk to product teams and say what are the real problems that are hindering your products, you know, what are the difficulties you have in actually making something real? And we could focus our research to try to solve those difficult problems. And if we’re successful, then we have an immediate product that could be beneficial.

Host: Well in any case, you’re swimming someplace in a “we could do this immediately” but you have permission to take longer, or is there a mandate, as you live in this engineering and applied research group?

T.J. Hazen: I think there’s a mandate to solve hard problems. I think that’s the mandate of research. If it wasn’t a hard problem, then somebody…

Host: …would already have a product.

T.J. Hazen: …in the product team would already have a solution, right? So, we do want to tackle hard problems. But we also want to tackle real problems. That’s, at least, the focus of our team. And there’s plenty of people doing blue sky research and that’s an absolute need as well. You know, we can’t just be thinking one or two years ahead. Research should also be thinking five, ten, fifteen years ahead.

Host: So, there’s a whole spectrum there.

T.J. Hazen: So, there’s a spectrum. But there is a real need, I think, to fill that gap between taking an idea that works well in a lab and turning it into something that works well in practice for a real problem. And that’s the key. And many of the problems that have been solved by Microsoft have not just been blue sky ideas, but they’ve come from this problem space where a real product says, ahh, we’re struggling with this. So, it could be anything. It can be, like, how does Bing efficiently rank documents over billions of documents? You don’t just solve that problem by thinking about it, you have to get dirty with the data, you have to understand what the real issues are. So, many of these research problems that we’re focusing on, and we’re focusing on, how do you answer questions out of documents when the questions could be arbitrary, and on any topic? And you’ve probably experienced this, if you are going into a search site for your company, that company typically doesn’t have the advantage of having a big Bing infrastructure behind it that’s collecting all this data and doing sophisticated machine learning. Sometimes it’s really hard to find an answer to your question. And, you know, the tricks that people use can be creative and inventive but oftentimes, trying to figure out what the right keywords are to get you to an answer is not the right thing.

Host: You work closely with engineers on the path from research to product. So how does your daily proximity to the people that reify your ideas as a researcher impact the way you view, and do, your work as a researcher?

T.J. Hazen: Well, I think when you’re working in this applied research and engineering space, as opposed to a pure research space, it really forces you to think about the practical implications of what you’re building. How easy is it going to be for somebody else to use this? Is it efficient? Is it going to run at scale? All of these problems are problems that engineers care a lot about. And sometimes researchers just say, let me solve the problem first and everything else is just engineering. If you say that to an engineer, they’ll be very frustrated because you don’t want to bring something to an engineer that works ten times slower than it needs to, or uses ten times more memory. So, when you’re in close proximity to engineers, you’re thinking about these problems as you are developing your methods.

Host: Interesting, because those two things, I mean, you could come up with a great idea that would do it and you pay a performance penalty in spades, right?

T.J. Hazen: Yeah, yeah. So, sometimes it’s necessary. Sometimes you don’t know how to do it and you just say let me find a solution that works and then you spend ten years actually trying to figure out how to make it work in a real product.

Host: Right.

T.J. Hazen: And I’d rather not spend that time. I’d rather think about, you know, how can I solve something and have it be effective as soon as possible?

(music plays)

Host: Let’s talk about human language technologies. They’ve been referred to by some of your colleagues as “the crown jewel of AI.” Speech and language comprehension is still a really hard problem. Give us a lay of the land, both in the field in general and at Microsoft Research specifically. What’s hope and what’s hype, and what are the common misconceptions that run alongside the remarkable strides you actually are making?

T.J. Hazen: I think that word we mentioned already: understand. That’s really the key of it. Or comprehend is another way to say it. What we’ve developed doesn’t really understand, at least when we’re talking about general purpose AI. So, the deep learning mechanisms that people are working on right now that can learn really sophisticated things from examples. They do an incredible job of learning specific tasks, but they really don’t understand what they’re learning.

Host: Right.

T.J. Hazen: So, they can discover complex patterns that can associate things. So in the vision domain, you know, if you’re trying to identify objects, and then you go in and see what the deep learning algorithm has learned, it might have learned features that are like, uh, you know, if you’re trying to identify a dog, it learns features that would say, oh, this is part of a leg, or this is part of an ear, or this is part of the nose, or this is the tail. It doesn’t know what these things are, but it knows they all go together. And the combination of them will make a dog. And it doesn’t know what a dog is either. But the idea that you could just feed data in and you give it some labels, and it figures everything else out about how to associate that label with that, that’s really impressive learning, okay? But it’s not understanding. It’s just really sophisticated pattern-matching. And the same is true in language. We’ve gotten to the point where we can answer general-purpose questions and it can go and find the answer out of a piece of text, and it can do it really well in some cases, and like, some of the examples we’ll give it, we’ll give it “who” questions and it learns that “who” questions should contain proper names or names of organizations. And “when” questions should express concepts of time. It doesn’t know anything about what time is, but it’s figured out the patterns about, how can I relate a question like “when” to an answer that contains time expression? And that’s all done automatically. There’s no features that somebody sits down and says, oh, this is a month and a month means this, and this is a year, and a year means this. And a month is a part of a year. Expert AI systems of the past would do this. They would create ontologies and they would describe things about how things are related to each other and they would write rules. And within limited domains, they would work really, really well if you stayed within a nice, tightly constrained part of that domain. But as soon as you went out and asked something else, it would fall on its face. And so, we can’t really generalize that way efficiently. If we want computers to be able to learn arbitrarily, we can’t have a human behind the scene creating an ontology for everything. That’s the difference between understanding and crafting relationships and hierarchies versus learning from scratch. We’ve gotten to the point now where the algorithms can learn all these sophisticated things, but they really don’t understand the relationships the way that humans understand it.

Host: Go back to the, sort of, the lay of the land, and how I sharpened that by saying, what’s hope and what’s hype? Could you give us a “TBH” answer?

T.J. Hazen: Well, what’s hope is that we can actually find reasonable answers to an extremely wide range of questions. What’s hype is that the computer will actually understand, at some deep and meaningful level, what this answer actually means. I do think that we’re going to grow our understanding of algorithms and we’re going to figure out ways that we can build algorithms that could learn more about relationships and learn more about reasoning, learn more about common sense, but right now, they’re just not at that level of sophistication yet.

Host: All right. Well let’s do the podcast version of your NERD Lunch and Learn. Tell us what you are working on in machine reading comprehension, or MRC, and what contributions you are making to the field right now.

T.J. Hazen: You know, NERD is short for New England Research and Development Center…

Host: I did not know that!

T.J. Hazen: …which is where I physically work.

Host: Okay…

T.J. Hazen: Even though I work closely and am affiliated with the Montreal lab, I work out of the lab in Cambridge, Massachusetts, and NERD has a weekly Lunch and Learn where people present the work they’re doing, or the research that they’re working on, and at one of these Lunch and Learns, I gave this talk on machine reading comprehension. Machine reading comprehension, in its simplest version, is being able to take a question and then being able to find the answer anywhere in some collection of text. As we’ve already mentioned, it’s not really “comprehending” at this point, it’s more just very sophisticated pattern-matching. But it works really well in many circumstances. And even on tasks like the Stanford Question Answering Dataset, it’s a common competition that people have competed in, question answering, by computer, has achieved a human level of parity on that task.
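For readers who want to see the basic extractive setup in action, the open-source Hugging Face transformers library exposes a question-answering pipeline trained on SQuAD-style data. This is a minimal public illustration of the general technique, not the MSR system discussed in the episode; the library and its default model are assumptions of the sketch.

```python
# A minimal sketch of extractive reading comprehension using the public
# Hugging Face "transformers" library (an assumption of this example; it is
# not the MSR system discussed in the episode).
from transformers import pipeline

# Downloads a default model fine-tuned on SQuAD-style question answering.
qa = pipeline("question-answering")

context = (
    "The Stanford Question Answering Dataset is a reading comprehension "
    "benchmark built from Wikipedia articles. Systems are scored on their "
    "ability to extract the answer span from a passage."
)

result = qa(question="What are systems scored on?", context=context)
print(result["answer"], round(result["score"], 3))  # extracted answer span
```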

Host: Mm-hmm.

T.J. Hazen: Okay. But that task itself is somewhat simple because most of the questions are fact-based questions like, who did something or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. Like, how can we make this technology work for real problems that our enterprise customers are bringing in? So, we have customers coming in saying, I want to be able to answer any question in our financial policies, or our auditing guidelines, or our operations manual. And people don’t ask “who” or “when” questions of their operations manual. They ask questions like, how do I do something? Or explain some process to me. And those answers are completely different. They tend to be longer and more complex and you don’t always, necessarily, find a short, simple answer that’s well situated in some context.

Host: Right.

T.J. Hazen: So, our focus at MSR Montreal is to take this machine reading comprehension technology and apply it into these new areas where our customers are really expressing that there’s a need.

Host: Well, let’s go a little deeper, technically, on what it takes to enable or teach machines to answer questions, and this is key, with limited data. That’s part of your equation, right?

T.J. Hazen: Right, right. So, when we go to a new task, uh, so if a company comes to us and says, oh, here’s our operations manual, they often have this expectation, because we’ve achieved human parity on some dataset, that we can answer any question out of that manual. But when we test the general-purpose models that have been trained on these other tasks on these manuals, they don’t generally work well. And these models have been trained on hundreds of thousands, if not millions, of examples, depending on what datasets you’ve been using. And it’s not reasonable to ask a company to collect that level of data in order to be able to answer questions about their operations manual. But we need something. We need some examples of what are the types of questions, because we have to understand what types of questions they ask, we need to understand the vocabulary. We’ll try to learn what we can from the manual itself. But without some examples, we don’t really understand how to answer questions in these new domains. But what we discovered through some of the techniques that are available, transfer learning is what we refer to as sort of our model adaptation, how do you learn from data in some new domain and take an existing model and make it adapt to that domain? We call that transfer learning. We can actually use transfer learning to do really well in a new domain without requiring a ton of data. So, our goal is to have it be examples like hundreds of examples, not tens of thousands of examples.
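As a rough sketch of that adaptation recipe, the snippet below starts from a model already fine-tuned on a large public QA dataset, freezes most of its layers, and updates only the top of the network on an in-domain example. The model choice, the freezing heuristic and the toy example are illustrative assumptions, not the team’s actual pipeline.

```python
# A sketch of transfer learning for QA: adapt a SQuAD-pretrained model with
# very little in-domain data. Model choice, freezing heuristic and the toy
# example are assumptions for illustration, not MSR Montreal's pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "distilbert-base-uncased-distilled-squad"  # already trained for QA
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

# Freeze embeddings and the two lowest transformer blocks; with only
# hundreds of examples, limiting trainable parameters helps avoid
# overfitting while keeping what the base model learned about language.
for pname, param in model.named_parameters():
    if "embeddings" in pname or ".layer.0." in pname or ".layer.1." in pname:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5)

# One hypothetical operations-manual example; a real run would loop over a
# few hundred of these.
question = "How do I reset the unit?"
context = "To reset the unit, hold the power button for ten seconds."
answer = "hold the power button for ten seconds"
enc = tokenizer(question, context, return_tensors="pt")

# Map the answer's character span to token positions for the span loss.
start_char = context.index(answer)
end_char = start_char + len(answer) - 1
start_pos = enc.char_to_token(start_char, sequence_index=1)
end_pos = enc.char_to_token(end_char, sequence_index=1)

loss = model(**enc,
             start_positions=torch.tensor([start_pos]),
             end_positions=torch.tensor([end_pos])).loss
loss.backward()
optimizer.step()
```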

Host: How’s that working now?

T.J. Hazen: It works surprisingly well. I’m always amazed at how well these machine learning algorithms work with all the techniques that are available now. These models are very complex. When we’re talking about our question answering model, it has hundreds of millions of parameters and what you’re talking about is trying to adjust a model that is hundreds of millions of parameters with only hundreds of examples and, through a variety of different techniques where we can avoid what we call overfitting, we can allow the generalizations that are learned from all this other data to stay in place while still adapting it so it does well in this specific domain. So, yeah, I think we’re doing quite well. We’re still exploring, you know, what are the limits?

Host: Right.

T.J. Hazen: And we’re still trying to figure out how to make it work so that an outside company can easily create the dataset, put the dataset into a system, push a button. The engineering for that and the research for that is still ongoing, but I think we’re pretty close to being able to, you know, provide a solution for this type of problem.

Host: All right. Well I’m going to push in technically because to me, it seems like that would be super hard for a machine. We keep referring to these techniques… Do we have to sign an NDA, as listeners?

T.J. Hazen: No, no. I can explain stuff that’s out…

Host: Yeah, do!

T.J. Hazen: … in the public domain. So, there are two common underlying technical components that make this work. One is called word embeddings and the other is called attention. Word embeddings are a mechanism where it learns how to take words or phrases and express them in what we call vector space.

Host: Okay.

T.J. Hazen: So, it turns them into a collection of numbers. And it does this by figuring out what types of words are similar to each other based on the context that they appear in, and then placing them together in this vector space, so they’re nearby each other. So, we would learn, that let’s say, city names are all similar because they appear in similar contexts. And so, therefore, Boston and New York and Montreal, they should all be close together in this vector space.

Host: Right.

T.J. Hazen: And blue and red and yellow should be close together. And then advances were made to figure this out in context. So that was the next step, because some words have multiple meanings.

Host: Right.

T.J. Hazen: So, you know, if you have a word like apple, sometimes it refers to a fruit and it should be near orange and banana, but sometimes it refers to the company and it should be near Microsoft and Google. So, we’ve developed context dependent ones, so that says, based on the context, I’ll place this word into this vector space so it’s close to the types of things that it really represents in that context.
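A toy version of that vector-space idea, with made-up three-dimensional vectors standing in for the high-dimensional embeddings learned from real corpora:

```python
# A toy illustration of the "vector space" idea: words that appear in
# similar contexts get nearby vectors. These 3-d vectors are invented for
# readability; real embeddings have hundreds of dimensions and are learned
# from large text corpora.
import numpy as np

embeddings = {
    "boston":   np.array([0.9, 0.1, 0.0]),
    "montreal": np.array([0.8, 0.2, 0.1]),   # another city: nearby
    "blue":     np.array([0.0, 0.9, 0.3]),   # a color: far from cities
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["boston"], embeddings["montreal"]))  # high (~0.98)
print(cosine(embeddings["boston"], embeddings["blue"]))      # low (~0.10)
```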

Host: Right.

T.J. Hazen: That’s the first part. And you can learn these word embeddings from massive amounts of data. So, we start off with a model that’s learned on far more data than we actually have question and answer data for. The second part is called attention and that’s how you associate things together. And it’s the attention mechanisms that learn things like a word like “who” has to attend to words like person names or company names. And a word like “when” has to attend to…

Host: Time.

T.J. Hazen: …time. And those associations are learned through this attention mechanism. And again, we can actually learn on a lot of associations between things just from looking at raw text without actually having it annotated.
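The association itself can be pictured as a dot product between a query vector and key vectors, with the scores normalized by a softmax; the numbers below are invented solely to show the mechanism:

```python
# A minimal sketch of the attention idea: a query vector for "when" scores
# each context token, and tokens that look like time expressions receive
# most of the weight. Vectors are illustrative, not learned.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Query for the question word and keys for three context tokens.
q = np.array([1.0, 0.0])                       # "when"
keys = np.array([[0.9, 0.1],                   # "1997"  (time-like)
                 [0.1, 0.9],                   # "Boston"
                 [0.2, 0.8]])                  # "weather"

weights = softmax(keys @ q)   # dot-product attention, scaling omitted
print(weights)                # highest weight lands on the time expression
```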

Host: Mm-hmm.

T.J. Hazen: Once we’ve learned all that, we have a base, and that base tells us a lot about how language works. And then we just have to have it focus on the task, okay? So, depending on the task, we might have a small amount of data and we feed in examples in that small amount, but it takes advantage of all the stuff that it’s learned about language from all these, you know, rich data that’s out there on the web. And so that’s how it can learn these associations even if you don’t give it examples in your domain, but it’s learned a lot of these associations from all the raw data.

Host: Right.

T.J. Hazen: And so, that’s the base, right? You’ve got this base of all this raw data and then you train a task-specific thing, like a question answering system, but even then, what we find is that, if we train a question answering system on basic facts, it doesn’t always work well when you go to operation manuals or other things. So, then we have to have it adapt.

Host: Sure.

T.J. Hazen: But, like I said, that base is very helpful because it’s already learned a lot of characteristics of language just by observing massive amounts of text.

(music plays)

Host: I’d like you to predict the future. No pressure. What’s on the horizon for machine reading comprehension research? What are the big challenges that lie ahead? I mean, we’ve sort of laid the land out on what we’re doing now. What next?

T.J. Hazen: Yeah. Well certainly, more complex questions. What we’ve been talking about so far is still fairly simple in the sense that you have a question, and we try to find passages of text that answer that question. But sometimes a question actually requires that you get multiple pieces of evidence from multiple places and you somehow synthesize them together. So, a simple example we call the multi-hop example. If I ask a question like, you know, where was Barack Obama’s wife born? I have to figure out first, who is Barack Obama’s wife? And then I have to figure out where she was born. And those pieces of information might be in two different places.

Host: Right.

T.J. Hazen: So that’s what we call a multi-hop question. And then, sometimes, we have to do some operation on the data. So, you could say, you know like, what players, you know, from one Super Bowl team also played on another Super Bowl team? Well there, what you have to do is, you have to get the list of all the players from both teams and then you have to do an intersection between them to figure out which ones are the same on both. So that’s an operation on the data…
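Once the two rosters have been pulled out of text, that final step is an ordinary set operation rather than another span extraction; with hypothetical rosters:

```python
# A toy version of the "operation on the data" example: after the two
# player lists are extracted from text, the answer is a set intersection,
# not another text span. Rosters are hypothetical.
team_a = {"Smith", "Jones", "Garcia", "Lee"}
team_b = {"Lee", "Brown", "Garcia"}

print(team_a & team_b)  # players who appeared on both Super Bowl teams
```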

Host: Right.

T.J. Hazen: …and you can imagine that there’s lots of questions like that where the information is there, but it’s not enough to just show the person where the information is. You also would like to go a step further and actually do the computation for that. That’s a step that we haven’t done, like, how do you actually go from mapping text to text, and saying these two things are associated, to mapping text to some sequence of operations that will actually give you an exact answer. And, you know, it can be quite difficult. I can give you a very simple example. Like, just answering a question, yes or no, out of text, is not a solved problem. Let’s say I have a question where someone says, I’m going to fly to London next week. Am I allowed to fly business class according to my policies from my company, right? We can have a system that would be really good at finding the section of the policy that says, you know, if you are a VP-level or higher and you are flying overseas, you can fly business class, otherwise, no. Okay? But, you know, if we actually want the system to answer yes or no, we have to actually figure out all the details, like okay, who’s asking the question? Are they a VP? Where are they located? Oh, they’re in New York. What does flying overseas mean??
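Written out by hand, that yes/no decision looks roughly like the sketch below; the rule, field names and region table are hypothetical stand-ins for the facts a real system would have to resolve:

```python
# A hand-crafted sketch of the reasoning a yes/no answer actually requires:
# the retrieved policy text alone is not enough, the system must also
# ground facts about the asker. The rule and data are hypothetical.
def region(city: str) -> str:
    # Even "overseas" needs grounding; a real system would use reference data.
    regions = {"New York": "North America", "London": "Europe"}
    return regions.get(city, "Unknown")

def may_fly_business(level: str, origin: str, destination: str) -> bool:
    # Policy: VP-level or higher may fly business class on overseas trips.
    is_vp_or_higher = level in {"VP", "SVP", "EVP"}
    is_overseas = region(origin) != region(destination)
    return is_vp_or_higher and is_overseas

print(may_fly_business("VP", "New York", "London"))  # True
```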

Host: Right. There are layers.

T.J. Hazen: Right. So that type of comprehension, you know, we’re not quite there yet for all types of questions. Usually these things have to be crafted by hand for specific domains. So, all of these things about how can you answer complex questions, and even simple things like common sense, like, things that we all know… Um. And so, my manager, Andrew McNamara, he was supposed to be here with us, one of his favorite examples is this concept of coffee being black. But if you spill coffee on your shirt, do you have a black stain on your shirt? No, you’ve got a brown stain on your shirt. And that’s just common knowledge. That is, you know, a common-sense thing that computers may not understand.

Host: You’re working on research, and ultimately products or product features, that make people think they can talk to their machines and that their machines can understand and talk back to them. So, is there anything you find disturbing about this? Anything that keeps you up at night? And if so, how are you dealing with it?

T.J. Hazen: Well, I’m certainly not worried about the fact that people can ask questions of the computer and the computer can give them answers. What I’m trying to get at is something that’s helpful and can help you solve tasks. In terms of the work that we do, yeah, there are actually issues that concern me. So, one of the big ones is, even if a computer can say, oh, I found a good answer for you, here’s the answer, it doesn’t know anything about whether that answer is true. If you go and ask your computer, was the Holocaust real? and it finds an article on the web that says no, the Holocaust was a hoax, do I want my computer to show that answer? No, I don’t. But…

Host: Or the moon landing…!

T.J. Hazen: …if all you are doing is teaching the computer about word associations, it might think that’s a perfectly reasonable answer without actually knowing that this is a horrible answer to be showing. So yeah, the moon landing, vaccinations… The easy way that people can defame people on the internet, you know, even if you ask a question that might seem like a fact-based question, you can get vast differences of opinion on this and you can get extremely biased and untrue answers. And how does a computer actually understand that some of these things are not things that we should represent as truth, right? Especially if your goal is to find a truthful answer to a question.

Host: All right. So, then what do we do about that? And by we, I mean you!

T.J. Hazen: Well, I have been working on this problem a little bit with the Bing team. And one of the things that we discovered is that if you can determine that a question is phrased in a derogatory way, that usually means the search results that you’re going to get back are probably going to be phrased in a derogatory way. So, even if we don’t understand the answer, we can just be very careful about what types of questions we actually want to answer.

Host: Well, what does the world look like if you are wildly successful?

T.J. Hazen: I want the systems that we build to just make life easier for people. If you have an information task, the world is successful if you get that piece of information and you don’t have to work too hard to get it. We call it task completion. If you have to struggle to find an answer, then we’re not successful. But if you can ask a question, and we can get you the answer, and you go, yeah, that’s the answer, that’s success to me. And we’ll be wildly successful if the types of things where that happens become more and more complex. You know, where if someone can start asking questions where you are synthesizing data and computing answers from multiple pieces of information, for me, that’s the wildly successful part. And we’re not there yet with what we’re going to deliver into product, but it’s on the research horizon. It will be incremental. It’s not going to happen all at once. But I can see it coming, and hopefully by the time I retire, I can see significant progress in that direction.

Host: Off script a little… will I be talking to my computer, my phone, a HoloLens? Who am I asking? Where am I asking? What device? Is that so “out there” as well?

T.J. Hazen: Uh, yeah, I don’t know how to think about where devices are going. You know, when I was a kid, I watched the original Star Trek, you know, and everything on there, it seemed like a wildly futuristic thing, you know? And then fifteen, twenty years later, everybody’s got their own little “communicator.”

Host: Oh my gosh.

T.J. Hazen: And so, uh, you know, the fact that we’re now beyond where Star Trek predicted we would be, you know, that itself, is impressive to me. So, I don’t want to speculate where the devices are going. But I do think that this ability to answer questions, it’s going to get better and better. We’re going to be more interconnected. We’re going to have more access to data. The range of things that computers will be able to answer is going to continue to expand. And I’m not quite sure exactly what it looks like in the future, to be honest, but, you know, I know it’s going to get better and easier to get information. I’m a little less worried about, you know, what the form factor is going to be. I’m more worried about how I’m going to actually answer questions reliably.

Host: Well it’s story time. Tell us a little bit about yourself, your life, your path to MSR. How did you get interested in computer science research and how did you land where you are now working from Microsoft Research in New England for Montreal?

T.J. Hazen: Right. Well, I’ve never been one to long-term plan for things. I’ve always gone from what I find interesting to the next thing I find interesting. I never had a really serious, long-term goal. I didn’t wake up some morning when I was seven and say, oh, I want to be a Principal Research Manager at Microsoft in my future! I didn’t even know what Microsoft was when I was seven. I went to college and I just knew I wanted to study computers. I didn’t know really what that meant at the time, it just seemed really cool.

Host: Yeah.

T.J. Hazen: I had an Apple II when I was a kid and I learned how to do some basic programming. And then I, you know, was going through my course work. I was, in my junior year, I was taking a course in audio signal processing and in the course of that class, we got into a discussion about speech recognition, which to me was, again, it was Star Trek. It was something I saw on TV. Of course, now it was Next Generation….!

Host: Right!

T.J. Hazen: But you know, you watch the next generation of Star Trek and they’re talking to the computer and the computer is giving them answers and here somebody is telling me you know there’s this guy over in the lab for computer science, Victor Zue, and he’s building systems that recognize speech and give answers to questions! And to me, that was science-fiction. So, I went over and asked the guy, you know, I heard you’re building a system, and can I do my bachelor’s thesis on this? And he gave me a demo of the system – it was called Voyager – and he asked a question, I don’t remember the exact question, but it was probably something like, show me a map of Harvard Square. And the system starts chugging along and it’s showing results on the screen as it’s going. And it literally took about two minutes for it to process the whole thing. It was long enough that he actually explained to me how the entire system worked while it was processing. But then it came back, and it popped up a map of Harvard Square on the screen. And I was like, ohhh my gosh, this is so cool, I have to do this! So, I did my bachelor’s thesis with him and then I stayed on for graduate school. And by seven years later, we had a system that was running in real time. We had a publicly available system in 1997 that you could call up on a toll-free number and you could ask for weather reports and weather information for anywhere in the United States. And so, the idea that it went from something that was “Star Trek” to something that I could pick up my phone, call a number and, you know, show my parents, this is what I’m working on, it was astonishing how fast that developed! I stayed on in that field with that research group. I was at MIT for another fifteen years after I graduated. At some point, a lot of the things that we were doing, they moved from the research lab to actually being real.

Host: Right.

T.J. Hazen: So, like twenty years after I went and asked to do my bachelor’s thesis, Siri comes out, okay? And so that was our goal. They were like, twenty years ago, we should be able to have a device where you can talk to it and it gives you answers and twenty years later there it was. So, that, for me, that was a cue that maybe it’s time to go where the action is, which was in companies that were building these things. Once you have a large company like Microsoft or Google throwing their resources behind these hard problems, then you can’t compete when you’re in academia for that space. You know, you have to move on to something harder and more far out. But I still really enjoyed it. So, I joined Microsoft to work on Cortana…

Host: Okay…

T.J. Hazen: …when we were building the first version of Cortana. And I spent a few years working on that. I’ve worked on some Bing products. I then spent some time in Azure trying to transfer these things so that companies that had the similar types of problems could solve their problems on Azure with our technology.

Host: And then we come full circle to…

T.J. Hazen: Then full circle, yeah. You know, once I realized that some of the stuff that customers were asking for wasn’t quite ready yet, I said, let me go back to research and see if I can improve that. It’s fantastic to see something through all the way to product, but once you’re successful and you have something in a product, it’s nice to then say, okay, what’s the next hard problem? And then start over and work on the next hard problem.

Host: Before we wrap up, tell us one interesting thing about yourself, maybe it’s a trait, a characteristic, a life event, a side quest, whatever… that people might not know, or be able to find on a basic web search, that’s influenced your career as a researcher?

T.J. Hazen: Okay. You know, when I was a kid, maybe about eleven years old, the Rubik’s Cube came out. And I got fascinated with it. And I wanted to learn how to solve it. And a kid down the street from my cousin had taught himself from a book how to solve it. And he taught me. His name was Jonathan Cheyer. And he was actually in the first national speed Rubik’s Cube solving competition. It was on this TV show, That’s Incredible. I don’t know if you remember that TV show.

Host: I do.

T.J. Hazen: It turned out what he did was, he had learned what is now known as the simple solution. And I learned it from him. And I didn’t realize it until many years later, but what I learned was an algorithm. I learned, you know, a sequence of steps to solve a problem. And once I got into computer science, I discovered all that problem-solving I was doing with the Rubik’s Cube and figuring out what are the steps to solve a problem, that’s essentially what things like machine learning are doing. What are the steps to figure out, what are the features of something, what are the steps I have to do to solve the problem? I didn’t realize that at the time, but the idea of being able to break down a hard problem like solving a Rubik’s Cube, and figuring out what are the stages to get you there, is interesting. Now, here’s the interesting fact. So, Jonathan Cheyer, his older brother is Adam Cheyer. Adam Cheyer is one of the co-founders of Siri.

Host: Oh my gosh. Are you kidding me?

T.J. Hazen: So, I met the kid when I was young, and we didn’t really stay in touch. I discovered, you know, many years later that Adam Cheyer was actually the older brother of this kid who taught me the Rubik’s Cube years and years earlier, and Jonathan ended up at Siri also. So, it’s an interesting coincidence that we ended up working in the same field after all those years from this Rubik’s Cube connection!

Host: You see, this is my favorite question now because I’m getting the broadest spectrum of little things that influenced and triggered something…!

Host: At the end of every podcast, I give my guests a chance for the proverbial last word. Here’s your chance to say anything you want to would-be researchers, both applied and otherwise, who might be interested in working on machine reading comprehension for real-world applications.

T.J. Hazen: Well, I could say all the things that you would expect me to say, like you should learn about deep learning algorithms and you should possibly learn Python because that’s what everybody is using these days, but I think the single most important thing that I could tell anybody who wants to get into a field like this is that you need to explore it and you need to figure out how it works and do something in depth. Don’t just get some instruction set or some high-level overview on the internet, run it on your computer and then say, oh, I think I understand this. Like get into the nitty-gritty of it. Become an expert. And the other thing I could say is, of all the people I’ve met who are extremely successful, the thing that sets them apart isn’t so much, you know, what they learned, it’s the initiative that they took. So, if you see a problem, try to fix it. If you see a problem, try to find a solution for it. And I say this to people who work for me. If you really want to have an impact, don’t just do what I tell you to do, but explore, think outside the box. Try different things. OK? I’m not going to have the answer to everything, so therefore, if I don’t have the answer to everything, then if you’re only doing what I’m telling you to do, then we both, together, aren’t going to have the answer. But if you explore things on your own and take the initiative and try to figure out something, that’s the best way to really be successful.

Host: T.J. Hazen, thanks for coming in today, all the way from the east coast to talk to us. It’s been delightful.

T.J. Hazen: Thank you. It’s been a pleasure.

(music plays)

To learn more about Dr. T.J. Hazen and how researchers and engineers are teaching machines to answer complicated questions, visit Microsoft.com/research


From predicting performance to preventing injuries: How machine learning is unlocking the secrets of human movement

Launched in 2006, P3 is the first facility to apply a data-driven approach to understanding how elite competitors move. It uses advanced sports-science strategies to assess and train athletes in ways that will revolutionize pro sports – and, eventually, the bodies and abilities of weekend warriors, says P3 founder Marcus Elliott.

“We are challenging them and measuring them. But we’re not interested in how high they jump or how fast they accelerate,” Elliott says. “We’re interested in the mechanics of how they jump, how they accelerate and decelerate. It’s helping us unlock the secrets of human movement.”

Working directly with players and their agents or families, P3 has evaluated members of the past six NBA draft classes, amassing a database of more than 600 current and former NBA athletes.

Volleyball player Cassandra Strickland leaps at P3.

Some of P3’s clients include NBA stars Luka Doncic and Zach LaVine plus athletes from the NFL, Major League Baseball, international soccer, track and field and more.

Many of those NBA clients, like Philadelphia 76ers guard Josh Richardson, return to P3 each summer for re-testing to pinpoint whether their movement patterns have developed asymmetries that could cause injury, or to reconfirm the health of the physical systems they use to leap, land, stop and start, fueling their on-court edge.

“This is my fifth off-season now at P3,” Richardson says. “When I started with them during my NBA draft preparation, I immediately saw that their approach was different and that it could help me have the best chance to improve my athleticism. Every off-season I get to see exactly where I am physically compared to where I was before – and compared to other NBA players.

“They are able to help me identify where I might be at risk of injury and where I can improve physically. It’s important for me to know that the training I am doing is specific to my unique needs,” Richardson says.

To collect all that granular data, P3 outfitted its lab with a high-speed camera system manufactured by Simi Reality Motion Systems GmbH, a German company within the ZF Group and a Microsoft partner.

Simi offers markerless, motion-capture software that removes the need for athletes to wear tracking sensors while they play or train. Simi also works with seven Major League Baseball clubs, deploying high-speed camera systems to those stadiums to record every pitch during every game since the 2017 season.

Simi’s software digitizes the pitchers’ arm angles and related body movements, spanning 42 different joint centers across 24,000 pitches thrown per team per season. That produces hundreds of billions of data points that are uploaded and processed on Microsoft Azure, enabling teams to create in-depth biomechanical analyses for the players, says Pascal Russ, Simi’s CEO.

An athlete’s workout at P3 produces data on his body angles and movements.

“The first team that deploys this effectively on the field to pick lineups or to see which pitch angles worked well against which batters is going to see a huge separation between them and the other teams not using this,” Russ says.

“It’s freakishly accurate.”

While Russ foresees this technology eventually remaking baseball, such seismic shifts already are occurring in the NBA through P3’s player assessments, says Benedikt Jocham, Simi’s U.S. chief operations officer.

“We provide the software solution that can quantify the movement and analyze, for example, how much pressure and torque a person is putting on various body parts,” Jocham says. “P3 adds the magic sauce. They are wizards at figuring out what it all means and making sense out of it for athletes.”

After the cameras record a player’s movements in the P3 lab, those datasets are loaded into Azure, where machine-learning algorithms reveal which other, similarly assessed NBA players that player’s physical systems most resemble. The algorithm then assigns the player to one of several clusters, or branches, that predict how his basketball career may unfold, Elliott says.
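The article does not disclose P3’s actual features, algorithm or number of branches, but the clustering step it describes can be sketched with k-means on made-up biomechanical features:

```python
# A minimal sketch of the clustering step described above, using k-means on
# invented biomechanical features (P3's real features, model and cluster
# count are not public). scikit-learn is an assumption, not P3's stack.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Rows: athletes. Columns: e.g. peak deceleration force, jump asymmetry,
# lateral drive - purely illustrative stand-ins.
features = rng.normal(size=(600, 3))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
print(kmeans.labels_[:10])  # each new athlete is assigned to a "branch"
```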

One branch, for example, contains athletes who had a brief NBA experience and never became significant players. Another branch encompasses players who were impactful during their first three or four seasons then sustained serious injuries that depleted their skills. In still another branch, players share rare combinations of length, power and force that fed elite careers – and they remained healthy.

“The human eye is good at measuring size and maybe estimating weight, and very bad at comparing athletes’ physical systems and movement symmetries to one another,” Elliott says. “But we can measure those things in the lab and the machine tells us how young athletes are most alike.

“It’s a solid foothold into an area of sports science that has been out of sight until now,” he says.

Cleveland Indians prospect Will Benson scans his workout data with P3 biomechanist Ben Johnson.

The data is also helping to shatter long-held theories that successful NBA players who, at first glance, lack the size, jumping ability or quickness of traditional stars are merely compensating by tapping unmeasurable intangibles such as “intuition” or “IQ” or “heart.”

“That’s how people once would have defined (2017-18 NBA most valuable player) James Harden, as somebody who just has this super-high basketball IQ,” Elliott says. “Maybe he does. But he also has a better stopping or braking system than anybody we’ve ever assessed in the NBA.

“That creates competitive advantages,” he adds. “There’s Newtonian physics behind these advantages.”

Case in point: Dallas Mavericks rookie Luka Doncic. In its pre-draft assessment of Doncic one year ago, P3 identified that same hidden performance metric – the elite ability to stop quickly. P3 knew, before his NBA Draft, that Doncic and Harden were in the same player branch. Doncic posted a stunning first pro season.

Aaron Gordon training on the P3 indoor track.

The insights also help athletes avoid injuries by adopting new training techniques to change unhealthy movement patterns revealed in the data, says Elliott, who previously served as the first director of sports science in MLB (for the Seattle Mariners) and as the first director of sports science in the NFL (for the New England Patriots).

Every NBA player or draft prospect assessed by P3 receives a report that highlights their injury risks and compares them to league peers based on performance.

“Athletes come to us because they trust us to take better care of their bodies than would happen anywhere else,” Elliott says. “Traditionally, and still today, when these bad things happen to players, everyone says, ‘Oh, that was a freak injury.’ I’m just telling you that the machine learning models predict a whole lot of these.

“I can’t imagine a world where out of nowhere you suffer, say, a right tibial stress fracture – not your left one, not your femur, it’s your tibia, out of nowhere,” he adds. “Without a doubt, these are not random events. Sports science just has not been very good about identifying them.”

Eventually, this same information may become available to amateur athletes and everyone else, Elliott says. The same technologies could predict, for example, that a weekend warrior has too much force going through the left leg while jumping or landing plus a tiny but unhealthy rotation of the left knee and femur, causing too much friction, and, eventually, an erosion of the left knee cartilage.
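
As a toy illustration of the kind of signal Elliott describes, the snippet below computes a left/right landing-force asymmetry index from hypothetical force-plate readings; the values and the flagging threshold are assumptions, not clinical standards.

```python
# Toy example: flag a left/right landing-force imbalance from hypothetical
# force-plate readings. Values and threshold are illustrative only.
import numpy as np

left_peak = np.array([2100.0, 2150.0, 2080.0])   # newtons per landing (made up)
right_peak = np.array([1800.0, 1750.0, 1820.0])

mean_l, mean_r = left_peak.mean(), right_peak.mean()
asymmetry_pct = abs(mean_l - mean_r) / ((mean_l + mean_r) / 2) * 100

if asymmetry_pct > 10:  # threshold is an assumption, not a clinical standard
    side = "left" if mean_l > mean_r else "right"
    print(f"Flag: {asymmetry_pct:.1f}% landing-force asymmetry loading the {side} leg")
```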

“What if you identified that when you were 30 or 20, instead of learning when you’re 50 that your cartilage is gone? That really is the future,” Elliott says.

“The power of machine learning and (Microsoft) Artificial Intelligence are going to help us unlock these secrets in ways that have never existed. We’re already doing it but it’s only in the early days of what I think is going to be a revolution in this space,” he says. “It’s coming. It’s definitely coming.”

Top photo: Stanley Johnson, a forward with the NBA’s New Orleans Pelicans, moves laterally inside an exercise band at the P3 lab. (All photos courtesy of P3.)


Driving lessons for autonomous vehicles

Paul Shieh, Founder and CEO of Linker Networks, says his company is now working with global auto manufacturers that are trying to create AI systems that can drive vehicles with flawless image recognition. To attain that, the systems use machine learning to recognize millions of digital images of objects, including other vehicles, roads, signs, pedestrians, and a myriad of other features.

To do that, images of all these things must first be identified and labeled.

Shieh explains, “At present, many companies are finding it difficult to hire thousands of workers that want to manually do this image work. It is labor-intensive and time-consuming. Moreover, each worker must maintain unrelenting focus on the task, leaving open the possibility of natural human error. A single mistake is all it takes to affect a dataset’s quality and drag down the overall performance, and therefore the safety level, of a model.”


As an example, Shieh says labeling a single car takes a worker up to 30 seconds to complete – meaning a thousand workers would need more than a year to process larger quantities of images, say 100 million.
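
A rough sanity check of that estimate (only the 30-second and 100-million figures come from Shieh; the objects-per-image count and working year are assumptions):

```python
# Back-of-envelope check of the estimate above. Only the 30-second and
# 100-million figures come from the article; the rest are assumptions.
seconds_per_object = 30                  # up to 30 s to label one car
objects_per_image = 3                    # assumed: a few objects per image
images = 100_000_000
workers = 1_000
work_seconds_per_year = 250 * 8 * 3600   # assumed: 250 eight-hour days

total_seconds = images * objects_per_image * seconds_per_object
years = total_seconds / (workers * work_seconds_per_year)
print(f"{years:.2f} worker-years")       # ~1.25 years under these assumptions
```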

But imagine being able to label all that data in a single click. That is the promise of auto-labeling – Linker Networks’ latest AI venture.

Inventing the fast track

The system labels digital images using a pre-trained model and transfer learning – a method that lets machines apply existing knowledge to similar scenarios. For example, systems trained to recognize cars can apply the same algorithm to recognize other vehicles, like buses or trucks.

“If you input an image with about a hundred cars in it and hit the auto-label button, most of them will be auto-labeled in just a few seconds with very high accuracy,” Shieh says. “That saves a lot of time and improves image recognition quality.”
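
Linker Networks has not published its implementation, but the workflow Shieh describes can be sketched with an off-the-shelf detector pre-trained on a general dataset such as COCO: the model proposes boxes and labels, confident detections are auto-accepted, and uncertain ones are routed to a human reviewer. The file name and confidence cutoff below are assumptions.

```python
# Illustrative auto-labeling sketch, not Linker Networks' actual code: a
# detector pre-trained on COCO proposes labeled boxes; low-confidence
# detections are routed to a human reviewer.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]

image = read_image("street_scene.jpg")  # hypothetical input image
with torch.no_grad():
    prediction = model([preprocess(image)])[0]

for box, label, score in zip(
    prediction["boxes"], prediction["labels"], prediction["scores"]
):
    name = categories[int(label)]
    if score >= 0.8:  # auto-accept confident detections (cutoff is assumed)
        print("auto-labeled:", name, [round(v) for v in box.tolist()])
    else:             # uncertain detections go to a human for review
        print("needs review:", name, f"{float(score):.2f}")
```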

Employees like Cindy Chao, who used to do manual labeling, have been upskilled to do quality control of the auto-labeling algorithms, also known as machine teaching.

Accuracy rates have also increased. At the same time, manual inspections and corrections are still carried out to ensure close to 100 percent data accuracy.

The process allows millions of images to be labeled in less than a day, which is a 70 percent reduction in time compared to manual labeling. The company is also seeing cost savings of more than 60 percent.

Shieh shares, “Linker’s auto-labeling model uses the Microsoft Azure Machine Learning service to reduce costs, boost productivity and improve accuracy by enabling customers to handpick images to auto-label and store.”


Employees that used to do manual labeling have been upskilled to do quality control of the auto-labeling algorithms, also known as machine teaching. The AI model seeks to gain knowledge from people rather than extracting knowledge from data alone. With people guiding the AI systems to learn the things that they already know, the job requires critical thinking and involves fewer repetitive, monotonous tasks.

“Linker’s data scientists are able to focus on developing the AI and let Azure take care of scaling their AI training jobs,” Shieh explained.

Other possibilities

Ultimately with AI, the company’s goal is for auto manufacturers to build smarter, safer vehicles. With auto-labeling technology, Linker Networks envisions safe self-driving capability in the near future.

Besides autonomous driving, auto-labeling can be used in factories to detect product defects, identify theft at retail stores and profile vehicles to strengthen security. Shieh said, “The auto-labeling system allows us to take advantage of all the benefits of AI, empowering humans to do what they do best, while improving efficiency and safety.”


Deep Learning Indaba 2018 conference strengthens African contributions to machine learning


Images ©2018 Deep Learning Indaba.

At the 30th Conference on Neural Information Processing Systems (NIPS) in 2016, one of the world’s foremost gatherings on machine learning, there was not a single accepted paper from a researcher at an African institution. In fact, for the last decade, the entire African continent has been absent from the contemporary machine learning landscape. The following year, a group of researchers set out to change this, founding a world-class machine learning conference that would strengthen African machine learning – the Deep Learning Indaba.

The first Deep Learning Indaba took place at Wits University in South Africa. The indaba (a Zulu word for a gathering or meeting) was a runaway success, with almost 300 participants representing 22 African countries and 33 African research institutes. It was a week-long event of teaching, sharing and debate around the state of the art in machine learning and artificial intelligence that aimed to be a catalyst for strengthening machine learning in Africa.


Attendees at Deep Learning Indaba 2017, held at Wits University, South Africa.

Now in its second year, the Deep Learning Indaba will be held September 9-14 at Stellenbosch University in South Africa, and Microsoft is proud to be a sponsor.

The conference offers an exciting line-up of talks, hands-on workshops, poster sessions and networking/mentoring events. Once again it has attracted a star-studded guest speaker list: Google Brain lead and TensorFlow co-creator Jeff Dean; DeepMind lead Nando de Freitas; and AlphaGo lead David Silver. Microsoft is flying in top researchers as well: Katja Hofmann will speak about reinforcement learning and Project Malmo (check out her recent podcast episode), Konstantina Palla will present on generative models and healthcare, and Timnit Gebru will talk about fairness and ethics in AI.

The missing continent

The motivation behind this conference really resonated with me. When I heard about it, I knew I wanted to contribute to the 2018 Indaba, and I was excited that Microsoft was already signed up as a headline sponsor and had our own Danielle Belgrave on the advisory board.


African countries represented at the 2017 Deep Learning Indaba.


Dr. Tempest van Schaik, Software Engineer, AI & Data Science

I graduated from the University of the Witwatersrand (“Wits”) in Johannesburg, South Africa, with a degree in biomedical engineering and a degree in electrical engineering, not unlike some of the conference organizers. In 2010, I came to the United Kingdom to pursue my PhD at Imperial College London and stayed on to work in the UK, joining Microsoft in 2017 as a software engineer in machine learning.

In my eight years working in the UK in the tech community, I have seldom come across African scientists, engineers and researchers sharing their work on the international stage. During my PhD studies, I was acutely aware of the Randlord monuments flanking my department’s building, despite the absence of any South Africans inside the department. At scientific conferences in Asia, Europe and the USA, I scanned the schedule for African institutions but seldom found them. Fellow Africans that I do find are usually working abroad. I have come to learn that Africa, a continent bigger than the USA, China, India, and Europe put together, has little visible global participation in science and technology. The reasons are numerous, with affordability being just one factor. I have felt the disappointment of trying to get a Tanzanian panelist to a tech conference in the USA. We realized that even if we could raise sufficient funds for his participation, the money would have achieved so much more in his home country that he couldn’t justify spending it on a conference.

Of all tech areas, perhaps it is artificial intelligence in particular that needs African participation. Countries such as China and the UK are gearing up for the next industrial revolution, creating plans for retraining and increasing digital skills. Those who are left behind could face disruption due to AI and automation and might not be able to benefit from the fruits of AI. Another reason to increase African participation in AI is to reduce algorithmic bias that can arise when a narrow section of society develops technology.

A quote from the Indaba 2017 report perhaps says it best: “The solutions of contemporary AI and machine learning have been developed for the most part in the developed world. As Africans, we continue to be receivers of the current advances in machine learning. To address the challenges facing our societies and countries, Africans must be owners, shapers and contributors of the advances in machine learning and artificial intelligence.”


Attendees at Deep Learning Indaba 2017

Diversity

One of the goals of the conference is to increase diversity in the field. To quote the organizers, “It is critical for Africans, and women and black people in particular, to be appropriately represented in the advances that are to be made.” The make-up of the Indaba in its first two years is already impressive and leads by example to show how to organize a diverse and inclusive conference. From the Code of Conduct to the organizing committee, the advisory board, the speakers and attendees, you see a group of brilliant and diverse people in every sense.

The 2018 Women in Machine Learning lineup.

The Indaba’s quest for diversity aligns with another passion of mine: increasing women’s participation in STEM. Since my days of being the lonely woman in electrical engineering lectures, things have been improving. There seems to be more awareness today about attracting and retaining women in STEM by improving workplace culture. However, there’s still a long way to go; in the UK, where I work, only 11% of the engineering workforce is female, according to a 2017 survey. I have found great support and encouragement from women-in-tech communities and events such as PyLadies/RLadies London and AI Club For Gender Minorities, and saw the Indaba as an opportunity to pay it forward and link up with like-minded women globally. So, I’m very pleased to say that on the evening of September 10 at the Indaba, Microsoft is hosting a Women in Machine Learning event.

Indaba – a gathering.

The aim of our evening is to encourage, support and unite women in machine learning. Each of our panelists will describe her personal career journey and her experiences as a woman in machine learning. As there will be a high number of students in attendance, our panel also highlights diverse career paths, from academia to industrial research, to applied machine learning, to start-ups. Our panel consists of Sarah Brown (Brown University, USA), Konstantina Palla (Microsoft Research, UK), Muthoni Wanyoike (InstaDeep, Kenya), Kathleen Siminyu (Africa’s Talking, Kenya) and myself from Microsoft Commercial Software Engineering (UK). We look forward to seeing you there!


The power of machine learning to change—and maybe even save—the world

In the last two decades, the impact of artificial intelligence (AI) has grown from a very small community of data scientists to something that is woven into many people’s daily lives. Machine learning, computer vision, and other AI disciplines—supported by the cloud—are helping people achieve more, from mundane tasks, like avoiding a traffic jam, to revolutionary breakthroughs, like curing cancer.

Over the past year, Microsoft has been on a journey to apply these transformative technologies to the world’s biggest environmental challenges. On July 12, 2017, Microsoft launched AI for Earth as a $2 million program in London, with a goal of providing AI and cloud tools to researchers working on the frontlines of environmental challenges in the areas of agriculture, water, biodiversity, and climate change.

Since that time, AI for Earth has grown into a $50 million, five-year program, with 112 grantees in 27 countries and seven featured projects. People are using machine learning and computer vision to learn more than previously possible about our planet and how it’s changing, and increasingly using these insights to chart a better future.

These are big goals, but we’re confident in our ability to get there because we know how advanced our tools like machine learning and computer vision already are. Consider machine learning. We have come a long way from the simple pattern-matching of ELIZA. Fifteen years ago, when I got my degree in artificial intelligence, problems like facial recognition, machine translation, and speech recognition were dreams of the field, and now they are solved problems. Among other things, machine learning can group similar items together, detect unusual occurrences, and construct mathematical models of historical data to make future predictions.
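
Those three capabilities are easy to demonstrate concretely. The snippet below is a generic illustration on synthetic data, not code from any AI for Earth project:

```python
# Generic illustration of the three capabilities just described, using
# synthetic data; this is not code from any AI for Earth project.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
readings = rng.normal(size=(500, 2))  # stand-in for field observations

# 1. Group similar items together (clustering).
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(readings)

# 2. Detect unusual occurrences (anomaly detection); -1 marks an anomaly.
outliers = IsolationForest(random_state=0).fit_predict(readings)

# 3. Model historical data to make future predictions (regression).
years = np.arange(2000, 2018, dtype=float).reshape(-1, 1)
trend = 0.02 * years.ravel() + rng.normal(scale=0.05, size=len(years))
forecast = LinearRegression().fit(years, trend).predict([[2025.0]])

print(groups[:10], int((outliers == -1).sum()), float(forecast[0]))
```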

These techniques are incredibly helpful for sorting through large amounts of data. Today, we’re excited to share a new story about the power of this technology that also helps answer a basic question: what is the value of AI when we don’t have massive amounts of data already waiting to be processed? This is an issue for many individuals and organizations working in the field of biodiversity, especially when the species are very small, travel great distances, and are hidden from public view.

That’s precisely the challenge we set out to address recently at the most magical place in the world – Walt Disney World Resort. Purple martins are yearly visitors to Disney, nesting at the park before returning to the Brazilian Amazon. Disney scientists have been working with the purple martin community for the past 20 years, providing homes for the families and studying the conservation of the species with more than 170 nests each year. Despite their annual visits, there is still a lot to be learned about the nesting behavior of these birds, in part because they nest in enclosed structures known as gourds. Some of what is known is troubling – the species is in decline, with an estimated population drop of 40 percent since 1966.

How do you close this data gap quickly to better understand the species and protect their future? Enter AI. Tiny connected homes, including cameras and cloud-connected sensors, were installed; combined with computer vision, they began to deliver data on behaviors that were infrequently observed, like the hatching, care and growth of purple martins. External factors, like temperature, humidity and air pressure, were also recorded. Disney and Microsoft hope to expand this work, and AI will help pull all this data together to deliver insights, in hopes of inspiring the next generation of conservationists to protect the purple martins for the future.

While this is our newest story, this work is happening across the world. We’re proud to support AI-enabled solutions for biodiversity, including:

PAWS: Machine learning to predict poaching. Spearheaded by a team of researchers at USC, an AI for Earth partner, with additional work being done by a member of the team now at Carnegie Mellon University, an AI for Earth grantee, the Protection Assistant for Wildlife Security (PAWS) processes data about previous poaching activities in an area and creates optimized routes for rangers to patrol based on where poaching is most likely to occur. These routes are also randomized to keep poachers from learning and adapting to patrol patterns. Currently, the PAWS algorithm is being improved so that it can incorporate new information that rangers see while on patrol—such as human footprints—to alter the proposed patrol route in real-time.

Access to ranger patrol data is key. That’s why PAWS partnered with the Uganda Wildlife Authority at Queen Elizabeth National Park. They had collected 14 years of patrol data and more than 125,000 observations on animal sightings, snares, animal remains, and other signs of poaching. PAWS is now being used in several parks, and the system has led to more observations of poacher activities per kilometer than were possible without technology.
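
The published PAWS research relies on game-theoretic security models, so the sketch below captures only the basic loop in simplified form: learn a poaching-risk score per grid cell from historical patrol observations, then randomize patrol targets in proportion to risk. All features and data here are hypothetical.

```python
# Simplified sketch of the core loop only; the actual PAWS system uses
# game-theoretic security models. All features and data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical features per park grid cell: distance to road, distance to
# water, animal density. Label: poaching sign found on past patrols.
cells = rng.random((1000, 3))
snare_found = ((cells[:, 2] > 0.7) ^ (rng.random(1000) < 0.1)).astype(int)

# Learn a poaching-risk score per cell from historical observations.
model = RandomForestClassifier(n_estimators=100, random_state=0)
risk = model.fit(cells, snare_found).predict_proba(cells)[:, 1]

# Randomized patrol: sample cells in proportion to predicted risk rather
# than always visiting the top-ranked cells, so patterns are hard to learn.
patrol = rng.choice(len(cells), size=10, replace=False, p=risk / risk.sum())
print("Cells to patrol today:", sorted(patrol.tolist()))
```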

Wildbook: Machine learning and computer vision to identify species. One of our newest featured projects, Wild Me, is showing what is possible by pushing the limits of computer vision, with an AI tool that smartly identifies, captions, and moderates pictures. Researchers often have little meaningful data on species. But computer vision makes it possible to tap into an explosion of images, available for free or at a low cost from camera traps, drones, professional photographers, safari-goers, and citizen scientists. Wild Me is not only using computer vision to identify images of zebras, for example, but is also identifying the individual animals in photos—helping to address a fundamental problem in conservation. If we can identify individual animals, then this eliminates the need for physically tagging them, which can harm the animal.

This new data on animals then goes into Wildbook, the platform developed by Wild Me. Using machine learning, it’s possible to either match an animal within the database or determine that the individual is new. Once an animal is identified, it can be tracked in other photographs. Wildbook stores information about the animals, such as their location at a specific time, in a fully developed database. This combination of AI tools and human ingenuity makes it possible to connect information about sightings with additional relevant data, enabling new science, conservation, and education at unprecedented scales and resolution. With a much more detailed and useful picture of what is happening, researchers and other decision-makers are able to implement new, more effective conservation strategies.
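
Wildbook’s production matchers are far more sophisticated, but the underlying pattern – embed each photo, nearest-neighbor match against known individuals, and register a new individual when no match clears a threshold – can be sketched simply. The embedding function below is a stand-in for a learned model.

```python
# General idea only; Wildbook's production matchers are more advanced.
# Each photo becomes an embedding; nearest-neighbor search decides
# "known individual" versus "new individual".
import numpy as np

def embed(photo):
    """Stand-in for a learned embedding network (stripes/spots -> vector)."""
    v = photo.ravel()[:128].astype(float)
    return v / np.linalg.norm(v)

database = {}  # individual id -> reference embedding

def identify(photo, threshold=0.9):
    query = embed(photo)
    best_id, best_sim = None, -1.0
    for animal_id, ref in database.items():
        sim = float(query @ ref)  # cosine similarity (both unit length)
        if sim > best_sim:
            best_id, best_sim = animal_id, sim
    if best_sim >= threshold:
        return best_id                # sighting matched to a known animal
    new_id = f"individual_{len(database)}"
    database[new_id] = query          # first sighting: register the animal
    return new_id

rng = np.random.default_rng(7)
print(identify(rng.random((64, 64))))   # registers individual_0
print(identify(rng.random((64, 64))))   # likely a new individual
```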

We see incredible potential and tremendous progress in our grantees’ work and in the explosive pace at which new algorithms are being built, refined, and made publicly available. And these are just a few of the grantees, featured projects, and partners we’re working with in the area of biodiversity; there’s equally exciting work in water, agriculture, and climate change that we look forward to sharing in the near future on this blog. Check out the amazing organizations and individuals we’re supporting, apply for a grant to join us or our new partnership with National Geographic Society, or just follow our progress on Twitter by following @Microsoft_Green, or me at @jennifermarsman.



How Microsoft uses machine learning to fight social engineering attacks

Machine learning is a key driver in the constant evolution of security technologies at Microsoft. Machine learning allows Microsoft 365 to scale next-gen protection capabilities and enhance cloud-based, real-time blocking of new and unknown threats. Just in the last few months, machine learning has helped us to protect hundreds of thousands of customers against ransomware, banking Trojan, and coin miner malware outbreaks.

But how does machine learning stack up against social engineering attacks?

Social engineering gives cybercriminals a way to get into systems and slip through defenses. Security investments, including the integration of advanced threat protection services in Windows, Office 365, and Enterprise Mobility + Security into Microsoft 365, have significantly raised the cost of attacks. The hardening of Windows 10 and Windows 10 in S mode, the advancement of browser security in Microsoft Edge, and the integrated stack of endpoint protection platform (EPP) and endpoint detection and response (EDR) capabilities in Windows Defender Advanced Threat Protection (Windows Defender ATP) further raise the bar in security. Attackers intent on overcoming these defenses to compromise devices are increasingly reliant on social engineering, banking on the susceptibility of users to open the gate to their devices.

Modern social engineering attacks use non-portable executable (PE) files like malicious scripts and macro-laced documents, typically in combination with social engineering lures. Every month, Windows Defender AV detects non-PE threats on over 10 million machines. These threats may be delivered as email attachments, through drive-by web downloads, removable drives, browser exploits, etc. The most common non-PE threat file types are JavaScript and VBScript.

Figure 1. Ten most prevalent non-PE threat file types encountered by Windows Defender AV

Non-PE threats are typically used as intermediary downloaders designed to deliver more dangerous executable malware payloads. Due to their flexibility, non-PE files are also used in various stages of the attack chain, including lateral movement and establishing fileless persistence. Machine learning allows us to scale protection against these threats in real-time, often protecting the first victim (patient zero).

Catching social engineering campaigns big and small

In mid-May, a small-scale, targeted spam campaign started distributing spear phishing emails that spoofed a landscaping business in Calgary, Canada. The attack was observed targeting fewer than 100 machines, mostly located in Canada. The spear phishing emails asked target victims to review an attached PDF document.

When opened, the PDF document presents itself as a “secure document” that requires action – a very common social engineering technique used in enterprise phishing attacks. To view the supposed “secure document”, the target victim is instructed to click a link within the PDF, which opens a malicious website with a sign-in screen that asks for enterprise credentials.

Phished credentials can then be used for further attacks, including CEO fraud, additional spam campaigns, or remote access to the network for data theft or ransomware. Our machine learning blocked the PDF file as malware (Trojan:Script/Cloxer.A!cl) from the get-go, helping prevent the attack from succeeding. 

Figure 2. Phishing email campaign with PDF attachment

Beyond targeted credential phishing attacks, we commonly see large-scale malware campaigns that use emails with archive attachments containing malicious VBScript or JavaScript files. These emails typically masquerade as an outstanding invoice, package delivery, or parking ticket, and instruct targets of the attack to refer to the attachment for more details. If the target opens the archive and runs the script, the malware typically downloads and runs further threats like ransomware or coin miners.

Figure 3. Typical social engineering email campaign with an archive attachment containing a malicious script

Malware campaigns like these, whether limited and targeted or large-scale and random, occur frequently. Attackers go to great lengths to avoid detection by heavily obfuscating code and modifying their attack code for each spam wave. Traditional methods of manually writing signatures identifying patterns in malware cannot effectively stop these attacks. The power of machine learning is that it is scalable and can be powerful enough to detect noisy, massive campaigns, but also specific enough to detect targeted attacks with very few signals. This flexibility means that we can stop a wide range of modern attacks automatically at the onset.

Machine learning models zero in on non-executable file types

To fight social engineering attacks, we build and train specialized machine learning models that are designed for specific file types.

Building high-quality specialized models requires good features for describing each file. For each file type, the full contents of hundreds of thousands of files are analyzed using large-scale distributed computing. Using machine learning, the best features that describe the content of each file type are selected. These features are deployed to the Windows Defender AV client to assist in describing the content of each file to machine learning models.

In addition to these ML-learned features, the models leverage expert researcher-created features and other useful file metadata to describe content. Because these ML models are trained for specific file types, they can home in on the metadata of these file types.
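
Microsoft has not published these features or models, but the general pattern – file-type-specific features feeding a lightweight classifier – can be illustrated for JavaScript. Every feature, sample and threshold below is a hypothetical stand-in:

```python
# Generic illustration of the approach described above; Microsoft's actual
# features and models are not public. Extract features tailored to one
# file type (JavaScript) and train a lightweight classifier on them.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def js_features(script):
    """Hypothetical JavaScript-specific features; real systems select
    features automatically from the contents of huge file corpora."""
    return [
        len(script),
        script.count("eval("),                             # dynamic code execution
        script.count("unescape("),                         # classic deobfuscation
        len(re.findall(r"\\x[0-9a-fA-F]{2}", script)),     # hex-escaped strings
        max((len(t) for t in script.split()), default=0),  # longest token
    ]

# Tiny made-up training set; production models train on massive corpora.
benign = ['console.log("hello");', "function add(a, b) { return a + b; }"]
malicious = ['eval(unescape("%75%72%6c"));', 'var p = "\\x68\\x74\\x74\\x70"; eval(p);']

X = np.array([js_features(s) for s in benign + malicious])
y = np.array([0] * len(benign) + [1] * len(malicious))
clf = LogisticRegression().fit(X, y)

# Expected to flag this obfuscated one-liner as suspicious (class 1).
print(clf.predict([js_features('eval(unescape("%61%62"));')]))
```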

Figure 4. Specialized file type-specific client ML models are paired with heavier cloud ML models to classify and protect against malicious script files in real-time

When the Windows Defender AV client encounters an unknown file, lightweight local ML models search for suspicious characteristics in the file’s features. Metadata for suspicious files are sent to the cloud protection service, where an array of bigger ML classifiers evaluate the file in real-time.

In both the client and the cloud, specialized file-type ML classifiers add to generic ML models to create multiple layers of classifiers that detect a wide range of malicious behavior. In the backend, deep-learning neural network models identify malicious scripts based on their full file content and behavior during detonation in a controlled sandbox. If a file is determined malicious, it is not allowed to run, preventing infection at the onset.

File type-specific ML classifiers are part of metadata-based ML models in the Windows Defender AV cloud protection service, which can make a verdict on suspicious files within a fraction of a second.
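
Schematically, that client/cloud split works like a two-stage filter: a cheap local check decides which files’ metadata ever reach the heavier cloud classifiers, whose votes are combined into a verdict. The sketch below is illustrative only; the real features, models and thresholds are not public.

```python
# Schematic of the layered flow described above (illustrative only): a
# lightweight client model screens files; only suspicious ones have their
# metadata sent to heavier cloud classifiers for a real-time verdict.
from dataclasses import dataclass

@dataclass
class FileMetadata:
    size: int
    entropy: float   # high entropy often indicates packing/obfuscation
    eval_calls: int

def client_model(meta):
    """Cheap local check: flag anything remotely suspicious."""
    return meta.entropy > 6.0 or meta.eval_calls > 0

def cloud_ensemble(meta):
    """Stand-in for an array of larger cloud classifiers whose votes are
    combined into a verdict within a fraction of a second."""
    votes = [
        meta.entropy > 7.0,                        # 'packed content' classifier
        meta.eval_calls >= 2,                      # 'script behavior' classifier
        meta.size < 2048 and meta.eval_calls > 0,  # 'tiny dropper' classifier
    ]
    return "malicious" if sum(votes) >= 2 else "clean"

def scan(meta):
    if not client_model(meta):
        return "clean"            # verdict made locally; nothing leaves the device
    return cloud_ensemble(meta)   # metadata sent to the cloud protection service

print(scan(FileMetadata(size=1024, entropy=7.4, eval_calls=3)))  # -> malicious
```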

Figure 5. Layered machine learning models in Windows Defender ATP

File type-specific ML classifiers are also leveraged by ensemble models that learn and combine results from the whole array of cloud classifiers. This produces a comprehensive cloud-based machine learning stack that can protect against script-based attacks, including zero-day malware and highly targeted attacks. For example, the targeted phishing attack in mid-May was caught by a specialized PDF client-side machine learning model, as well as several cloud-based machine learning models, protecting customers in real-time.

Microsoft 365 threat protection powered by artificial intelligence and data sharing

Social engineering attacks that use non-portable executable (PE) threats are pervasive in today’s threat landscape; the impact of combating these threats through machine learning is far-reaching.

Windows Defender AV combines local machine learning models, behavior-based detection algorithms, generics, and heuristics with a detonation system and powerful ML models in the cloud to provide real-time protection against polymorphic malware. Expert input from researchers, advanced technologies like Antimalware Scan Interface (AMSI), and rich intelligence from the Microsoft Intelligent Security Graph continue to enhance next-generation endpoint protection platform (EPP) capabilities in Windows Defender Advanced Threat Protection.

In addition to antivirus, components of Windows Defender ATP’s interconnected security technologies defend against the multiple elements of social engineering attacks. Windows Defender SmartScreen in Microsoft Edge (also now available as a Google Chrome extension) blocks access to malicious URLs, such as those found in social engineering emails and documents. Network protection blocks malicious network communications, including those made by malicious scripts to download payloads. Attack surface reduction rules in Windows Defender Exploit Guard block Office-, script-, and email-based threats used in social engineering attacks. And Windows Defender Application Control can block the installation of untrusted applications, including the malware payloads of intermediary downloaders. These security solutions protect Windows 10 and Windows 10 in S mode from social engineering attacks.

Further, Windows Defender ATP endpoint detection and response (EDR) uses the power of machine learning and AMSI to unearth script-based attacks that “live off the land”. Windows Defender ATP allows security operations teams to detect and mitigate breaches and cyberattacks using advanced analytics and a rich detection library. With the April 2018 Update, automated investigation and advanced hunting capabilities further enhance Windows Defender ATP. Sign up for a free trial.

Machine learning also powers Office 365 Advanced Threat Protection to detect non-PE attachments in social engineering spam campaigns that distribute malware or steal user credentials. This enhances the Office 365 ATP comprehensive and multi-layered solution to protect mailboxes, files, online storage, and applications against threats.

These and other technologies power Microsoft 365 threat protection to defend the modern workplace. In Windows 10 April 2018 Update, we enhanced signal sharing across advanced threat protection services in Windows, Office 365, and Enterprise Mobility + Security through the Microsoft Intelligent Security Graph. This integration enables these technologies to automatically update protection and detection and orchestrate remediation across Microsoft 365.

Gregory Ellison and Geoff McDonald
Windows Defender Research


Talk to us

Questions, concerns, or insights on this story? Join discussions at the Microsoft community and Windows Defender Security Intelligence.

Follow us on Twitter @WDSecurity and Facebook Windows Defender Security Intelligence.