
A conversation with Kevin Scott: What’s next in AI

For example, I’ve been playing around with an experimental system I built for myself using GPT-3 designed to help me write a science fiction book, which is something that I’ve wanted to do since I was a teenager. I have notebooks full of synopses I’ve created for theoretical books, describing what the books are about and the universes where they take place. With this experimental tool, I have been able to get the logjam broken. When I wrote a book the old-fashioned way, if I got 2,000 words out of a day, I’d feel really good about myself. With this tool, I’ve had days where I can write 6,000 words in a day, which for me feels like a lot. It feels like a qualitatively more energizing process than what I was doing before.

This is the “copilot for everything” dream—that you would have a copilot that could sit alongside you as you’re doing any kind of cognitive work, helping you not just get more done, but also enhancing your creativity in new and exciting ways.

This increase in productivity is clearly a boost to your satisfaction. Why do these tools bring more joy to work?

All of us use tools to do our work. Some of us really enjoy acquiring the tools and mastering them and figuring out how to deploy them in a super effective way to do the thing that we’re trying to do. I think that is part of what’s going on here. In many cases, people now have new and interesting and fundamentally more effective tools than they’ve had before. We did a study that found the use of no-code and low-code tools had a more than 80% positive impact on users’ work satisfaction, overall workload and morale. Especially for tools that are in their relatively early stages, that’s just a huge benefit to see.

For some workers, it’s literally enhancing that core flow that you get into when you’re doing the work; it speeds you up. It’s like having a better set of running shoes to go run a race or marathon. This is exactly what we’re seeing with the experiences developers are having with Copilot; they are reporting that Copilot helps them stay in the flow and keeps their minds sharper during what used to be boring and repetitive tasks. And when AI tools can help to eliminate drudgery from a job, something that is super repetitive or annoying or that was getting in their way of getting to the thing that they really enjoy, it unsurprisingly improves satisfaction.

Personally, these tools let me be in flow state longer than I was before. The enemy of creative flow is distraction and getting stuck. I get to a point where I don’t know quite how to solve the next thing, or the next thing is, like, “I’ve got to go look this thing up. I’ve got to context switch out of what I was doing to go solve the subproblem.” These tools increasingly solve the subproblem for me so that I stay in the flow.

In addition to GitHub Copilot and DALL∙E 2, AI is showing up in Microsoft products and services in other ways. How is next-generation AI improving current products such as Teams and Word?

An impressionist oil painting of a woman on a video call.
This is the big untold story of AI. To date, most of AI’s benefits are spread across 1,000 different things where you may not even fully appreciate how much of the product experience that you’re getting is coming from a machine learned system.

For example, we’re sitting here in this Teams call on video and, in the system, there are all these parameters that were learned by a machine learning algorithm. There are jitter buffers for the audio system to smooth out the communication. The blur behind you on your screen is a machine learning algorithm at work. There are more than a dozen machine learning systems that make this experience more delightful for the both of us. And that is certainly true across Microsoft.

We’ve gone from machine learning in a few places to literally 1,000 machine learning things spread across different products, everything from how your Outlook email client works, your predictive text in Word, your Bing search experience, to what your feed looks like in Xbox Cloud Gaming and LinkedIn. There’s AI all over the place making these products better.

One of the big things that has changed in the past two years is it used to be the case that you would have a model that was specialized to each one of these tasks that we have across all our products. Now you have a single model that gets used in lots of places because they’re broadly useful. Being able to invest in these models that become more powerful with scale—and then having all the things built on top of the model benefit simultaneously from improvements that you’re making—is tremendous.

Microsoft’s AI research and development continues through initiatives such as AI4Science and AI for Good. What excites you most about this area of AI?

An impressionist oil painting of a group of scientists in a nuclear lab.
The most challenging problems we face as a society right now are in the sciences. How do you cure these intractably complicated diseases? How do you prepare yourself for the next pandemic? How do you provide affordable, high-quality healthcare to an aging population? How do you help educate more kids at scale in the skills that they will need for the future? How do you develop technologies that will reverse some of the negative effects of carbon emissions into the atmosphere? We’re exploring how to take some of these exciting developments in AI to those problems.

The models in these basic science applications have the same scaling properties as large language models. You build a model, you get it into some self-supervised mode where it’s learning from a simulation or it’s learning from its own ability to observe a particular domain, and then the model that you get out of it lets you dramatically change the performance of an application—whether you’re doing a computational fluid dynamics simulation or you’re doing molecular dynamics for drug design.

There’s immense opportunity there. This means better medicines, it means maybe we can find the catalyst we don’t have yet to fix our carbon emission problem, it means across the board accelerating how scientists and other folks with big ideas can work to try to solve society’s biggest challenges.

How have breakthroughs in computing techniques and hardware contributed to the advances in AI?

The fundamental thing underlying almost all of the recent progress we’ve seen in AI is how critical scale has proven to be. It turns out that models trained on more data with more compute power just have a much richer and more generalized set of capabilities. If we want to keep driving this progress further—and to be clear, right now we don’t see any end to the benefits of increased scale—we need to optimize and scale up our compute power as much as we possibly can.

We announced our first Azure AI supercomputer two years ago, and at our Build developer conference this year I shared that we now have multiple supercomputing systems that we’re pretty sure are the largest and most powerful AI supercomputers in the world today. We and OpenAI use this infrastructure to train nearly all of our state-of-the-art large models, whether that’s our Turing, Z-code and Florence models at Microsoft or the GPT, DALL∙E and Codex models at OpenAI. And we just recently announced a collaboration with NVIDIA to build a supercomputer powered by Azure infrastructure combined with NVIDIA GPUs.

Supercomputer image generated by a producer using DALL∙E 2.

Some of this progress has just been via brute force compute scale with bigger and bigger clusters of GPUs. But maybe even a bigger breakthrough is the layer of software that optimizes how models and data are distributed across these giant systems, both to train the models and then to serve them to customers. If we’re going to put forth these large models as platforms that people can create with, they can’t only be accessible to the tiny number of tech companies in the world with enough resources to build giant supercomputers.

So, we’ve invested a ton in software like DeepSpeed to boost training efficiency, and the ONNX Runtime for inference. They optimize for cost and latency and generally help us make bigger AI models more accessible and valuable for people. I’m super proud of the teams we have working on these technologies because Microsoft is really leading the industry here, and we’re open sourcing all of it so others can keep improving.
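To make the inference side concrete, here is a minimal sketch of serving an exported model with the ONNX Runtime Python package; the model file, input shape and tensor names are placeholders, not any specific Microsoft model.

```python
# Minimal sketch: running an exported model with ONNX Runtime.
# "model.onnx" and the input shape are illustrative placeholders; the real
# values depend on whichever model you export.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")      # load the exported graph
input_name = session.get_inputs()[0].name         # discover the input tensor name

batch = np.zeros((1, 128), dtype=np.int64)        # dummy input batch
outputs = session.run(None, {input_name: batch})  # run inference
print(outputs[0].shape)
```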

These advances are all playing out amid an ongoing concern that AI is going to impact jobs. How do you think about the issue of AI and jobs?

We live in a time of extraordinary complexity and historic macroeconomic change, and as we look out 5, 10 years into the future, even to just achieve a net neutral balance for the whole world, we’re going to need new forms of productivity for all of us to be able to continue enjoying progress. We want to be building these AI tools as platforms that lots of people can use to build businesses and solve problems. We believe that these platforms democratize access to AI to far more people. With them, you’ll get a richer set of problems solved and you’ll have a more diverse group of people being able to participate in the creation of technology.

With the previous instantiation of AI, you needed a huge amount of expertise just to get started. Now you can call Azure Cognitive Services, you can call the Azure OpenAI Service and build complicated products on top of these things without necessarily having to be so expert at AI that you’ve got to be able to train your own large model from scratch.


Online math tutoring service uses AI to help boost students’ skills and confidence

Like many students around the world, Eithne, 14, in Chorley, United Kingdom, was struggling to keep up in math at school after more than a year of COVID-19 related disruptions. In June 2021, her parents signed her up for a summer program offered by Eedi, an online math tutoring service.

“Just dealing with lockdown, she hadn’t had enough of a really good background,” said her mother, Arianna. “She missed most of the Year 7 Maths, then Year 8. So, we thought, ‘Let’s give it a go, let’s see where she needs a bit of help.’”

Newly enrolled students on Eedi are asked to take a dynamic quiz of 10 multiple choice diagnostic questions that the service uses to learn where students struggle most in math. This information allows the service to place students on a learning pathway to overcome those specific obstacles, or misconceptions.

“We ask them a question based roughly on their age group and then we say, ‘Well, what’s the next best question to ask them based on their previous answer?’” explained Iris Hulls, the head of operations at Eedi. “We learn as much about them as possible to predict either growth or comfort topics for them.”

The dynamic quiz is powered by AI developed by researchers at the Microsoft Research Lab in Cambridge, United Kingdom, who specialize in machine learning algorithms that help people make decisions.

The AI uses each answer to predict the probability the student will correctly answer each of thousands of other possible next questions and then weighs those probabilities to decide what question to ask next to pinpoint knowledge gaps.
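A minimal sketch of that selection loop is below, assuming a hypothetical `predict_correct_prob` model that estimates how likely the student is to answer a candidate question correctly; the uncertainty-based scoring rule is an illustrative stand-in, not Eedi’s actual criterion.

```python
# Sketch of a next-best-question loop (illustrative, not Eedi's actual system).
# `predict_correct_prob` is an assumed model: given the answers so far, it returns
# the probability that the student answers a candidate question correctly.
from typing import Callable, Dict, List

def pick_next_question(
    answers_so_far: Dict[str, bool],
    candidate_questions: List[str],
    predict_correct_prob: Callable[[Dict[str, bool], str], float],
) -> str:
    def informativeness(question: str) -> float:
        p = predict_correct_prob(answers_so_far, question)
        # Questions the model is least sure about (p near 0.5) reveal the most
        # about where the student's knowledge gaps are.
        return p * (1.0 - p)

    return max(candidate_questions, key=informativeness)
```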

The information gleaned from the quiz is akin to what a teacher might learn from a one-on-one conversation with a student, explained Cheng Zhang, a Microsoft principal researcher at the lab who led the development of the machine learning model that powers Eedi’s dynamic quiz.

“If the student doesn’t know 3 times 7, we may want to ask 1 plus 1,” Zhang said. “We want to adapt the quiz based on the previous answer.”

Once students’ misconceptions are identified, the Eedi platform slots students onto a learning pathway that helps them overcome their misconceptions and do better in math at school.

Eithne was slotted onto a pathway that included a review of topics covered in Year 8 and prepared her for success in Year 9, including geometry.

“It’s very good for finding your weaknesses and your strengths and being able to understand why you’re maybe not as good in this one area,” Eithne said. “You’re able to realize, ‘I’ve been doing this wrong for ages.’”

Eithne, 14, in Chorley, United Kingdom, gained confidence in math through lessons on Eedi, an online tutoring service that uses AI developed by Microsoft. Photo by Jonathan Banks.

Good questions, good data

The success of Microsoft’s next-best-question model hinges on the data used to train it, noted Zhang. In Eedi’s case, these are thousands of vetted, high-quality diagnostic questions developed specifically to help teachers identify student misconceptions about math topics.

“Our technology is just an enhancer that makes this high-quality data give more insights,” Zhang said.

Diagnostic questions are well-thought-through multiple choice questions that have one correct answer and three wrong answers, with each wrong answer designed to reveal a specific misconception.

“Maths lends itself quite well to this kind of multiple-choice assessment because more often than not there’s a right answer and these wrong answers; it’s much less subjective than some of the humanities subjects,” said Craig Barton, an Eedi co-founder and the company’s director of education.

Barton latched on to the power of diagnostic questions when, as a math teacher, he attended a training course on formative assessments and learned that well-formulated wrong answers can provide insight to why a student is struggling.

“In the past, it was always kids got things right, which is fine, or they got things wrong and then I had to start doing detective work to figure out where they were going wrong,” he said. “That’s okay if you work one-to-one, but if you’ve got 30 kids in a class, that’s potentially quite time consuming.”

Good diagnostic questions, Barton said, must be clear and unambiguous, check for one thing, be answerable in 20 seconds, link each wrong answer to a misconception and ensure that a student is unable to answer it correctly while having a key misconception.

“This notion that the kids can’t get it right whilst having a key misconception is the hardest one to factor in, but it’s probably the most important,” he said.

For example, consider the question: “Which of the following is a multiple of 6? – A: 20, B: 62, C: 24, or D: 26.”

According to Barton, on the surface this is a decent question. That’s because students could think a “multiple” means the “6” is the first digit (B) or the last digit (D), or the student could have difficulty with their multiplication tables and select A. The correct answer is C: 24.

“But the major flaw in this question is if you don’t know the difference between a factor and a multiple, you could get this question right, whereas experience will tell us that the biggest misconception students have with multiples is they mix them up with factors,” he said.

A better question to ask, then, is, “Which of these is a multiple of 15? – A: 1, B: 5, C: 60 or D: 55.” That’s because the possible answers include factors and multiples. The correct answer is C: 60. A student who confuses factors with multiples might instead pick A: 1 or B: 5, and a student who needs work on multiplication might pick D: 55.

“When you write these things, you’ve really got to think, ‘What are all the different ways kids can go wrong and how am I going to capture those in three wrong answers?’” Barton explained.
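One way to picture the data behind such questions is a record that ties each wrong answer to the misconception it reveals; the structure below is purely illustrative, not Eedi’s schema.

```python
# Illustrative representation of a diagnostic question; not Eedi's actual schema.
from dataclasses import dataclass
from typing import Dict

@dataclass
class DiagnosticQuestion:
    prompt: str
    correct: str                    # letter of the correct choice
    choices: Dict[str, str]         # letter -> answer text
    misconceptions: Dict[str, str]  # wrong letter -> misconception it reveals

multiple_of_15 = DiagnosticQuestion(
    prompt="Which of these is a multiple of 15?",
    correct="C",
    choices={"A": "1", "B": "5", "C": "60", "D": "55"},
    misconceptions={
        "A": "confuses factors with multiples",
        "B": "confuses factors with multiples",
        "D": "multiplication-table error",
    },
)
```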

In this diagnostic question, the correct answer is “B: 4.” Students who answer “A: 20” took only the first step to find the mean, totaling the numbers. “C: 3” represents confusion between the concepts of median and mean. “D: 2” mixes up the concepts of mode and mean.

Teacher tools to online tutor

After the workshop, Barton went home and wrote about 50 diagnostic questions and tested them out on students in his class. They worked.

Barton is also a math book author and podcaster with thousands of followers on social media. He used his influence to spread the word on diagnostic questions and collaborated with Eedi co-founder Simon Woodhead to build an online database with thousands of diagnostic questions for teachers to access for their lesson planning.

“Then I thought, ‘Wait a minute, we could do something a bit better than this,’” Barton said. “’Imagine if the kids could answer the questions online and we could capture that data and then, before you know it, we’ve got insights into specific areas where students struggle.’”

The website exploded in popularity and attracted investors as well as the attention of Hulls, who along with colleagues was exploring options to use data to scale and make the benefits of math tutoring accessible to more families. The team formed Eedi. An advisor introduced them to Zhang and her team’s research on the next-best-question algorithm, which aims to accelerate decision making by gathering and analyzing relevant personal information.

At the time, the Microsoft researchers were working on healthcare scenarios, using AI to help doctors more efficiently make decisions about what tests to order to diagnose patient ailments.

For example, if a patient walks into an emergency room with a hurt arm, the doctor will ask a series of questions leading up to an X-ray, such as “How did you hurt your arm?” and, “Can you move your fingers?” instead of, “Do you have a cold?” because the answer will reveal relevant information for this patient’s treatment. The next-best-question algorithm automates this information gathering process.

The advisor thought the model would work well with Eedi’s dataset of diagnostic questions, automating the collection of information a tutor could glean from a one-on-one conversation with a student.

“We were aware that we had collected a lot of data. We wanted to do smarter stuff with our data; we wanted to be able to predict what misconceptions students might have before they even answer questions,” said Woodhead, who is Eedi’s chief data scientist.

The Eedi team worked with the Microsoft researchers to train the model on their diagnostic questions to efficiently pinpoint where students need the most support in math.

The model works without collecting any personal identifying information from the students, Woodhead noted.

“It doesn’t need to know a name. It doesn’t need to know an email address. It’s looking at patterns,” he said.

From this information, the system can pinpoint the best lessons for students to take on Eedi. Without that guidance, students tend to rely on strategies they’re already using at school, which isn’t the right starting point for the majority of students who are looking for a private tutor, according to Hulls.

“It really helps direct the children and their families at home to know where to start,” she said.


How AI makes developers’ lives easier, and helps everybody learn to develop software

Ever since Ada Lovelace, a polymath often considered the first computer programmer, proposed in 1843 using holes punched into cards to solve mathematical equations on a never-built mechanical computer, software developers have been translating their solutions to problems into step-by-step instructions that computers can understand.

That’s now changing, according to Kevin Scott, Microsoft’s chief technology officer.

Today, AI-powered software development tools are allowing people to build software solutions using the same language that they use when they talk to other people. These AI-powered tools translate natural language into the programming languages that computers understand.

“That allows you, as a developer, to have an intent to accomplish something in your head that you can express in natural language and this technology translates it into code that achieves the intent you have,” Scott said. “That’s a fundamentally different way of thinking about development than we’ve had since the beginning of software.”

This paradigm shift is driven by Codex, a machine learning model from AI research and development company OpenAI that can translate natural language commands into code in more than a dozen programming languages.

Codex descended from GPT-3, OpenAI’s natural language model that was trained on petabytes of language data from the internet. Codex was trained on this language data as well as code from GitHub software repositories and other public sources.
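For a rough sense of what “natural language in, code out” looks like to a developer, here is a hedged sketch using the legacy openai Python SDK (pre-1.0); the model name, prompt and key handling are assumptions, and the original Codex preview models have since been retired.

```python
# Rough sketch of natural-language-to-code generation (illustrative only).
# Assumes the legacy openai Python SDK (pre-1.0) and a Codex-style completion
# model; the model name and API key handling are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    '"""\n'
    "Write a Python function that returns the n-th Fibonacci number.\n"
    '"""\n'
)

response = openai.Completion.create(
    model="code-davinci-002",  # assumed Codex-style model name
    prompt=prompt,
    max_tokens=150,
    temperature=0,
)

print(response.choices[0].text)  # the generated code suggestion
```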

“It makes coding more productive in terms of removing not-so-fun work and also helping you remember things you might have forgotten and helping you with the approach to solve problems,” Peter Welinder, vice president of products and partnerships for OpenAI, said of Codex.

In this example, a creator working in the graphics rendering engine Babylon.js entered the natural language command, “create a model of the solar system” into the text box and the AI-powered software translated the command into code for a solar system model.

The increase in productivity that Codex brings to software development is a game changer, according to Scott. It allows developers to accomplish many tasks in two minutes that previously took two hours.

“And oftentimes, the things that the tools are doing is they are helping you to very quickly go through the least interesting parts of your job so that you can get to the most interesting parts of your job, which makes the qualitative experience of creating much more pleasant and stimulating and fun,” he said.

AI and code come together

Microsoft and OpenAI formed a partnership in 2019 to accelerate breakthroughs in AI – including jointly developing some of the world’s most powerful AI supercomputers – and deliver them to developers to build the next generation of AI applications through Azure OpenAI Service.

Microsoft subsidiary GitHub also worked with OpenAI to integrate Codex into GitHub Copilot, a downloadable extension for software development programs such as Visual Studio Code. The tool uses Codex to draw context from a developer’s existing code to suggest additional lines of code and functions. Developers can also describe what they want to accomplish in natural language, and Copilot will draw on its knowledge base and current context to surface an approach or solution.

GitHub Copilot, released in a technical preview in June 2021, today suggests about 35% of the code written in popular languages like Java and Python by the tens of thousands of developers in the technical preview who regularly use it. GitHub Copilot will move to general availability this summer, bringing this AI-assisted coding capability to millions of professional developers, Microsoft announced today at its Microsoft Build developer conference.

“A lot of software has common frameworks and pieces of scaffolding. Copilot does such an awesome job of doing all that for you so you can focus your energy and your creativity on the things that you’re trying to solve uniquely,” said Julia Liuson, president of the developer division at Microsoft, which includes GitHub.

Julia Liuson, president of the developer division at Microsoft, which includes GitHub, expects that today’s tools will be the first wave of AI-assisted development. Photo courtesy of Microsoft.

As more developers experiment with Codex and GitHub Copilot, more clues to the potential of AI-assisted development are emerging, according to Welinder. For example, natural language documentation inside most software programs is sparse. Users of GitHub Copilot create this documentation by default as they use the tool.

“You get a bunch of comments in the code just from the nature of telling Copilot what to do,” he said. “You’re documenting the code as you go, which is mind-blowing.”

These comments, in turn, serve as a teaching tool for other developers, who often study other programs to learn how to solve specific problems in their own programs. The ability of Codex to translate from code to natural language is another way developers can learn as they program, which will lower the barrier of entry to coding, Welinder added.

From low code to no code

Meanwhile, AI-powered low code and no code tools, such as those available through Microsoft Power Platform, aim to enable billions of people to develop the software applications they need to solve their unique problems: from an audiologist digitizing simple paper forms to transform hearing loss prevention in Australia, to a tool that relieves employees of a family-owned business of manual data-entry work, to an enterprise-grade solution that processes billions of dollars of COVID-19 loan forgiveness claims for small businesses.

Today, the hundreds of millions of people who are comfortable working with formulas in Microsoft Excel, a spreadsheet program, could easily bring these skills into Power Platform where they can build these types of software applications, according to Charles Lamanna, Microsoft corporate vice president of business applications and platform.

Charles Lamanna, Microsoft corporate vice president of business applications and platform, believes AI-powered tools will enable billions of people to develop software. Photo by Dan DeLong for Microsoft.

“One of the big pushes we’ve been doing is to go to the next level, to go from hundreds of millions of people that can use these tools to billions of people that can use these tools,” he said. “And the only way we think we can actually do that is to go from low code to no code by using AI-powered development.”

To do this, Lamanna’s team first integrated GPT-3 with Microsoft Power Apps for a feature called Power App Ideas, which allows people to create applications using conversational language in Power Fx, an open-source programming language for low code development with its origins in Microsoft Excel. The next step, announced at Build, is a feature called Power Apps express design, which leverages AI models from Azure Cognitive Services to turn drawings, images, PDFs and Figma design files into software applications.

“We’ve made it so that we can do image recognition and map it to the constructs that exist within an application. We understand what’s a button, what’s a grouping, what’s a text box and generate an application automatically based on those drawings without you having to understand and wire up all these different components,” Lamanna said.
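A toy sketch of that mapping step is below: it takes hypothetical vision-model detections (a label plus a bounding box) and emits a simple component list. The labels and output format are illustrative assumptions, not the Power Apps express design pipeline.

```python
# Toy sketch: turning detected sketch elements into a simple component list.
# The element labels and output format are illustrative assumptions; this is
# not the actual Power Apps express design pipeline.
from typing import Dict, List

def elements_to_components(detections: List[Dict]) -> List[Dict]:
    """Map vision-model detections (label + bounding box) to UI components."""
    label_to_component = {"button": "Button", "textbox": "TextInput", "label": "Label"}
    components = []
    for det in detections:
        kind = label_to_component.get(det["label"])
        if kind is None:
            continue  # ignore shapes we don't recognize
        components.append({"type": kind, "x": det["box"][0], "y": det["box"][1]})
    # Lay components out top-to-bottom in the order they appear on the sketch.
    return sorted(components, key=lambda c: c["y"])

example = [
    {"label": "label", "box": [10, 20, 200, 40]},
    {"label": "textbox", "box": [10, 60, 200, 90]},
    {"label": "button", "box": [10, 110, 100, 140]},
]
print(elements_to_components(example))
```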


A new AI-powered feature called Power Apps express design helps turn sketches and other images into the bones of an app, helping people with little or no coding experience develop software.

This transition from low code to no code on the back of AI follows a general trend of computing becoming more accessible over time, he added. Personal computers were rare 40 years ago, spreadsheets were uncommon 30 years ago, internet access was limited 20 years ago, for example. Until recently, video and photo editing were reserved for experts.

Software development should also become more accessible, Lamanna said.

“If we want everybody to be a developer, we can’t plan on teaching everyone how to write Python code or JavaScript. That’s not possible. But it is possible if we create the right experiences and get them in front of enough people who can click and drag and drop and use concepts that are familiar to create amazing solutions,” he said.

Developers for the software-powered future

GitHub Copilot as well as the low code and no code offerings available via the Power Platform are the first phase of AI-powered development, according to Liuson. She envisions AI-powered models and tools that will help developers of all ability levels clean data, check code for errors, debug programs and explain what blocks of code mean in natural language.

These features are part of a larger vision of AI-powered tools that could serve as assistants that help developers more quickly find solutions to their problems and help anyone who wants to build an application go from an idea in their head to a piece of software that works.

“As a developer, we all have days that we have pulled out our hair, saying, ‘Why is this thing not working?’ And we consult with a more senior developer who points us in the right direction,” Liuson said. “When Copilot can go, ‘Hey here are the four different things that are common with this pattern of problem,’ that will be huge.”

This new era of AI-assisted software development can lead to greater developer productivity, satisfaction and efficiency and make software development more natural and accessible to more people, according to Scott.

For example, a gamer could use natural language to program non-player characters in Minecraft to accomplish tasks such as build structures, freeing the gamer to attend to other, more pressing tasks. Graphic designers can use natural language to build 3D scenes in the graphics rendering engine Babylon.js. Teachers can use 3D creation and collaboration tools like FrameVR to speak into existence a metaverse world such as a moonscape with rovers and an American flag.

“You can describe to the AI system what you want to accomplish,” Scott said. “It can try to figure out what it is you meant and show you part of the solution and then you can refine what the model is showing you. It’s this iterative cycle that’s free flowing and natural.”

These tools, Scott added, will also swell the ranks of developers in a world that will be increasingly powered by software.

“Because the future is so dependent on software, we want a broad and inclusive set of people participating in its creation,” he said. “We want people from all sorts of backgrounds and points of view to be able to use the most powerful technology they can lay their hands on to solve the problems that they have, to help them build their businesses and create prosperity for their families and their communities.”


Top photo: Kevin Scott, Microsoft chief technology officer, said AI-powered tools help developers get from thoughts in their heads to code. Photo courtesy of Microsoft.

John Roach writes about Microsoft research and innovation. Follow him on Twitter.


From Mona Lisa searches to translation, AI research is improving our products

The evolution from research to product

It’s one thing for a Microsoft researcher to use all the available bells and whistles, plus Azure’s powerful computing infrastructure, to develop an AI-based machine translation model that can perform as well as a person on a narrow research benchmark with lots of data. It’s quite another to make that model work in a commercial product.

To tackle the human parity challenge, three research teams used deep neural networks and applied other cutting-edge training techniques that mimic the way people might approach a problem to provide more fluent and accurate translations. Those included translating sentences back and forth between English and Chinese and comparing results, as well as repeating the same translation over and over until its quality improves.

“In the beginning, we were not taking into account whether this technology was shippable as a product. We were just asking ourselves if we took everything in the kitchen sink and threw it at the problem, how good could it get?” Menezes said. “So we came up with this research system that was very big, very slow and very expensive just to push the limits of achieving human parity.”

“Since then, our goal has been to figure out how we can bring this level of quality — or as close to this level of quality as possible — into our production API,” Menezes said.

Someone using Microsoft Translator types in a sentence and expects a translation in milliseconds, Menezes said. So the team needed to figure out how to make its big, complicated research model much leaner and faster. But as they were working to shrink the research system algorithmically, they also had to broaden its reach exponentially — not just training it on news articles but on anything from handbooks and recipes to encyclopedia entries.

To accomplish this, the team employed a technique called knowledge distillation, which involves creating a lightweight “student” model that learns from translations generated by the “teacher” model with all the bells and whistles, rather than the massive amounts of raw parallel data that machine translation systems are generally trained on. The goal is to engineer the student model to be much faster and less complex than its teacher, while still retaining most of the quality.

In one example, the team found that the student model could use a simplified decoding algorithm to select the best translated word at each step, rather than the usual method of searching through a huge space of possible translations.
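A minimal sketch of that recipe, sequence-level distillation plus simplified decoding, is below; `teacher` and `student` are assumed translation models with generic generate and train interfaces, so this is illustrative rather than Microsoft Translator’s production code.

```python
# Sketch of sequence-level knowledge distillation for translation (illustrative,
# not Microsoft Translator's production code). `teacher` and `student` are assumed
# seq2seq models exposing simple generate/train interfaces.
from typing import Iterable, List, Tuple

def distill(teacher, student, source_sentences: Iterable[str], epochs: int = 1):
    # 1. The large teacher labels the source data with its own translations.
    pairs: List[Tuple[str, str]] = [
        (src, teacher.generate(src)) for src in source_sentences  # assumed interface
    ]
    # 2. The lightweight student is trained to reproduce the teacher's outputs,
    #    rather than learning directly from massive raw parallel corpora.
    for _ in range(epochs):
        for src, tgt in pairs:
            student.train_step(src, tgt)  # assumed interface
    return student

def serve_translation(student, source: str) -> str:
    # At serving time the student can use simplified, greedy decoding: keep the
    # single best word at each step instead of searching a huge space of options.
    return student.generate(source, beam_size=1)  # assumed interface
```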

The researchers also developed a different approach to dual learning, which takes advantage of “round trip” translation checks. For example, if a person learning Japanese wants to check and see if a letter she wrote to an overseas friend is accurate, she might run the letter back through an English translator to see if it makes sense. Machine learning algorithms can also learn from this approach.

In the research model, the team used dual learning to improve the model’s output. In the production model, the team used dual learning to clean the data that the student learned from, essentially throwing out sentence pairs that represented inaccurate or confusing translations, Menezes said. That preserved a lot of the technique’s benefit without requiring as much computing.
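Here is a sketch of how a round-trip check can clean training pairs, assuming a backward translation model and a generic similarity score; the threshold and scoring are illustrative, not the team’s exact procedure.

```python
# Illustrative round-trip filtering of training pairs; not the team's exact procedure.
# `bwd` translates target -> source, and `similarity` scores two source-language
# sentences between 0 and 1 (both are assumed components).
from typing import Callable, Iterable, List, Tuple

def clean_pairs(
    pairs: Iterable[Tuple[str, str]],
    bwd: Callable[[str], str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.6,
) -> List[Tuple[str, str]]:
    kept = []
    for src, tgt in pairs:
        round_trip = bwd(tgt)  # translate the target back into the source language
        # Keep the pair only if the round trip still resembles the original sentence.
        if similarity(src, round_trip) >= threshold:
            kept.append((src, tgt))
    return kept
```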

With lots of trial and error and engineering, the team developed a recipe that allowed the machine translation student model — which is simple enough to operate in a cloud API — to deliver real-time results that are nearly as accurate as the more complex teacher, Menezes said.

Arul Menezes, Microsoft distinguished engineer and founder of Microsoft Translator. Photo by Dan DeLong.

Improving search with multi-task learning

In the rapidly evolving AI landscape, where new language understanding models are constantly introduced and improved upon by others in the research community, Bing’s search experts are always on the hunt for new and promising techniques. Unlike the old days, in which people might type in a keyword and click through a list of links to get to the information they’re looking for, users today increasingly search by asking a question — “How much would the Mona Lisa cost?” or “Which spider bites are dangerous?” — and expect the answer to bubble up to the top.

“This is really about giving the customers the right information and saving them time,” said Rangan Majumder, partner group program manager of search and AI in Bing. “We are expected to do the work on their behalf by picking the most authoritative websites and extracting the parts of the website that actually shows the answer to their question.”

To do this, not only does an AI model have to pick the most trustworthy documents, but it also has to develop an understanding of the content within each document, which requires proficiency in any number of language understanding tasks.

Last June, Microsoft researchers were the first to develop a machine learning model that surpassed the estimate for human performance on the General Language Understanding Evaluation (GLUE) benchmark, which measures mastery of nine different language understanding tasks ranging from sentiment analysis to text similarity and question answering. Their Multi-Task Deep Neural Network (MT-DNN) solution employed both knowledge distillation and multi-task learning, which allows the same model to train on and learn from multiple tasks at once and to apply knowledge gained in one area to others.
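The core idea, a shared encoder with one small head per task, can be sketched in a few lines of PyTorch; the layer sizes and task list below are illustrative, and this is not the MT-DNN architecture itself.

```python
# Sketch of multi-task learning with a shared encoder and per-task heads (PyTorch).
# Layer sizes and the task list are illustrative; this is not the MT-DNN model.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size: int = 30000, hidden: int = 256, tasks=None):
        super().__init__()
        tasks = tasks or {"sentiment": 2, "similarity": 1, "qa_relevance": 2}
        # Shared text encoder: every task trains these weights.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # One lightweight output head per task.
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, n) for t, n in tasks.items()})

    def forward(self, token_ids: torch.Tensor, task: str) -> torch.Tensor:
        x = self.embed(token_ids)
        _, h = self.encoder(x)          # final hidden state summarizes the text
        return self.heads[task](h[-1])  # task-specific prediction

# Training alternates batches from different tasks, so knowledge captured in the
# shared encoder on one task can help the others.
```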

Bing’s experts this fall incorporated core principles from that research into their own machine learning model, which they estimate has improved answers in up to 26 percent of all questions sent to Bing in English markets. It also improved caption generation — or the links and descriptions lower down on the page — in 20 percent of those queries. Multi-task deep learning led to some of the largest improvements in Bing question answering and captions, which have traditionally been done independently, by using a single model to perform both.

For instance, the new model can answer the question “How much does the Mona Lisa cost?” with a bolded numerical estimate: $830 million. In the answer below, it first has to know that the word cost is looking for a number, but it also has to understand the context within the answer to pick today’s estimate over the older value of $100 million in 1962. Through multi-task training, the Bing team built a single model that selects the best answer, decides whether it should trigger and chooses which exact words to bold.

This screenshot of Bing search results illustrates how natural language understanding research is improving the way Bing answers questions like “How much does the Mona Lisa cost?” A new AI model released this fall understands the language and context of the question well enough to distinguish between the two values in the answer — $100 million in 1962 and $830 million in 2018 — and highlight the more recent value in bold. Image by Microsoft.

Earlier this year, Bing engineers open sourced their code to pretrain large language representations on Azure. Building on that same code, Bing engineers working on Project Turing developed their own neural language representation, a general language understanding model that is pretrained to understand key principles of language and is reusable for other downstream tasks. It masters these by learning how to fill in the blanks when words are removed from sentences, similar to the popular children’s game Mad Libs.

“You take a Wikipedia document, remove a phrase and the model has to learn to predict what phrase should go in the gap only by the words around it,” Majumder said. “And by doing that it’s learning about syntax, semantics and sometimes even knowledge. This approach blows other things out of the water because when you fine tune it for a specific task, it’s already learned a lot of the basic nuances about language.”
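A toy version of that “fill in the blank” data preparation is sketched below; the mask token and span length are arbitrary choices, and this is not the Project Turing pretraining code.

```python
# Toy sketch of building "fill in the blank" training examples from raw text.
# The mask token and span length are illustrative; this only prepares data and
# is not the Project Turing pretraining code.
import random

MASK = "<mask>"

def make_cloze_example(sentence: str, span_len: int = 2, seed: int = 0):
    rng = random.Random(seed)
    words = sentence.split()
    start = rng.randrange(0, max(1, len(words) - span_len))
    target = words[start:start + span_len]                # the phrase to predict
    masked = words[:start] + [MASK] + words[start + span_len:]
    return " ".join(masked), " ".join(target)

text = "The Mona Lisa hangs in the Louvre museum in Paris"
masked, target = make_cloze_example(text)
print(masked)   # e.g. "The Mona <mask> in the Louvre museum in Paris"
print(target)   # e.g. "Lisa hangs"
```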

To teach the pretrained model how to tackle question answering and caption generation, the Bing team applied the multi-task learning approach developed by Microsoft Research to fine tune the model on multiple tasks at once. When a model learns something useful from one task, it can apply those learnings to the other areas, said Jianfeng Gao, partner research manager in the Deep Learning Group at Microsoft Research.

For example, he said, when a person learns to ride a bike, she has to master balance, which is also a useful skill in skiing. Relying on those lessons from bicycling can make it easier and faster to learn how to ski, as compared with someone who hasn’t had that experience, he said.

“In some sense, we’re borrowing from the way human beings work. As you accumulate more and more experience in life, when you face a new task you can draw from all the information you’ve learned in other situations and apply them,” Gao said.

Like the Microsoft Translator team, the Bing team also used knowledge distillation to convert their large and complex model into a leaner model that is fast and cost-effective enough to work in a commercial product.

And now, that same AI model working in Microsoft Search in Bing is being used to improve question answering when people search for information within their own company. If an employee types a question like “Can I bring a dog to work?” into the company’s intranet, the new model can recognize that a dog is a pet and pull up the company’s pet policy for that employee — even if the word dog never appears in that text. And it can surface a direct answer to the question.

“Just like we can get answers for Bing searches from the public web, we can use that same model to understand a question you might have sitting at your desk at work and read through your enterprise documents and give you the answer,” Majumder said.

Top image: Microsoft investments in natural language understanding research are improving the way Bing answers search questions like “How much does the Mona Lisa cost?” Image by Musée du Louvre/Wikimedia Commons. 


Jennifer Langston writes about Microsoft research and innovation. Follow her on Twitter.


AI boot camp aims to draw more teen girls into computer science

As an engineering student at the University of Pennsylvania in the 1990s, Didem Un Ates was one of only five women in a graduating class of 180. Today, she’s on a mission to drastically change those numbers.

Un Ates is part of a Microsoft team that launched “Alice envisions the future,” a boot camp for girls focused on artificial intelligence. The first event in Athens – packed with keynote speeches, panel discussions and hands-on workshops – helped spark the passion for AI in 160 girls from 16 countries.

After witnessing the success of the inaugural event, the team took the show on the road, first to London last October, and then to New York in March.

Registration is now open for two more “Girls in AI” hackathons for girls 14 to 18, which are scheduled for next month:

Registration is free and attendees do not need a laptop or any experience with coding – just a curiosity about AI and a creative mind. The first 80 students to register will be accepted.

About 50 girls between the ages of 14 and 18 attended the “Girls in AI” hackathon in New York, tackling subjects ranging from human-centered design and AI ethics to machine learning.

Un Ates said the transformation over the course of the weekend can be astounding. Girls who may come into the program shy, timid and hesitant of delving into advanced technology such as AI can leave the program with an entirely different mindset.

“They may have heard of AI, but they don’t exactly know what it means or what a hackathon means. But by the end of Sunday, there are all these super-excited, confident individuals who cannot stop talking about how they are going to build a business out of their project,” said Un Ates, senior director of customer care intelligence for the Microsoft Business Applications Group, Cloud & AI.


Winning teams from the hackathons are eligible to enter Microsoft’s AI for Good Idea Challenge, an international contest for developers, students and data scientists who use AI to tackle some of society’s greatest obstacles. The deadline for entries is June 26.

Un Ates says she is devoted to evangelizing STEM education – and artificial intelligence specifically – because of the dire underrepresentation of women in the field.

“Only 12% of artificial intelligence and machine learning experts are female,” Un Ates noted. “And we have the opportunity to change that.”

According to the U.S. National Center for Education Statistics, in 1985 women accounted for roughly 37% of all computer science undergraduate students. Today, that number is 12%. According to a recent WIRED & Element AI study, only 12% of machine learning researchers are women.

Un Ates said it’s important that women are well-represented in computer science both because of the perspective they bring to the field and because of the job opportunities the field can offer.

And that is exactly what Microsoft’s “Girls in AI” hackathons are designed to accomplish. According to the team’s event website, the curriculum gives teenage girls “the chance to utilize AI and machine learning techniques to tackle global challenges in a holistic manner.” The two-day event will give attendees an understanding of design thinking, strategy and business model development, ethics, social responsibility and pitching skills.

The “Alice Envisions the Future” hackathon program is just one of the ways Microsoft is working to get more girls and young women involved in computer science. Microsoft also offers DigiGirlz Days, one-day events designed to provide girls with a better understanding of what a career in technology is like, and DigiGirlz High Tech Camp, a program developed 19 years ago to help dispel stereotypes in the high-tech industry.



How AI is helping kids bridge language gaps

How did you learn to talk?

Probably something like this: Your infant brain, a hotbed of neurological activity, picked up on your parents’ speech tones and facial expressions. You started to mimic their sounds, interpret their emotions and identify relatives from strangers. And one day, about a year into life, you pointed and started saying a few meaningful words with slobbery glee.

But many children, particularly those diagnosed with autism spectrum disorder, acquire language in different ways. Worldwide, one in 160 children is diagnosed with ASD. In the United States, it is one in 59 children — and approximately 40 percent of this group is non-verbal.


Learning from superheroes and puppies

Lois Jean Brady and Matthew Guggemos, co-founders of Bay Area-based iTherapy who are speech pathologists and autism specialists, are tackling the growing prevalence of autism-related speech challenges with InnerVoice, an artificial intelligence-powered app whose customizable avatars simulate social cues. The app animates avatars of superheroes, puppies, stuffed animals and people to help young children who have difficulties with language and expression pair words with meanings and practice conversation.

iTherapy received a Microsoft AI for Accessibility grant in 2018. The program provides grants as well as technology and expertise to individuals, groups and companies passionate about creating tools that make the world more inclusive. iTherapy is using the grant to integrate the Azure AI platform to enhance its generated speech, image recognition and facial animation.

A young boy at the iTherapy clinic uses InnerVoice chat bot to describe his photo of a Teddy bear.
A five-year-old student using Zyrobotics to learn to read at Ranch Santa Gertrudes Elementary.

“I think for sure that the AI component was the missing link,” says Guggemos of the app. “How do you use words, and what do words mean? What does a symbol represent? How do you use AI to develop problems that require language to solve?”

How a hippo helps teach speech 

AI is also proving an exciting development in speech and language improvement for Zyrobotics, an Atlanta-based educational technology company that was the first beneficiary of the AI for Accessibility program in 2018. Zyrobotics is using Azure Machine Learning to help its ReadAble Storiez educational tool interpret when a student needs assistance.


ReadAble Storiez uses an avatar of a hippo to help students with learning disabilities such as dyslexia and other challenges such as stuttering, pauses and heavy accents.

Ayanna Howard, the company’s founder and a professor of robotics, was first motivated to create ReadAble Storiez while watching a teacher use Zyrobotics’ Counting Zoo app with a child. The teacher turned to her and said, “Can you have this app do more than just read with him? I think it’s fantastic that it helps improve his math – could it also help him improve his reading?”

Howard also found teachers mentioning the challenges of dyslexia in the classroom. “I was like, ‘Oh, what happens if you have a reading disability?’ I then learned that signs of dyslexia in children aren’t picked up until much later, typically when schools start standardized testing. I realized we needed an intervention much earlier and that we could do that with Counting Zoo.”

Learning models that don’t take individualized challenges into account, or don’t address the speech patterns of kids, “tend to fail,” Howard says. ReadAble Storiez employs a custom speech model and a sophisticated “tutor” to convert speech to text and measure accuracy, fluency and the child’s reading improvement.
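One simple way to picture the “measure accuracy” step is to align the recognized words with the target passage; the word-level measure below is a generic illustration, not Zyrobotics’ scoring model.

```python
# Illustrative reading-accuracy check: compare recognized speech (as text) against
# the passage the child was asked to read. This is a generic word-level measure,
# not Zyrobotics' actual scoring model.
from difflib import SequenceMatcher

def reading_accuracy(target_text: str, recognized_text: str) -> float:
    target = target_text.lower().split()
    spoken = recognized_text.lower().split()
    matcher = SequenceMatcher(a=target, b=spoken)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct / max(1, len(target))  # fraction of target words read correctly

print(reading_accuracy("the little dog ran fast", "the litle dog ran"))  # ~0.6
```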

‘It blew my mind!’

Howard is pleased with the program’s early success. “While they were reading a book, kids were correcting themselves,” she says. “As a technologist, you say your stuff works, but I’m sitting there with the kids and I’m blown away, ‘It really does work!’ It’s thrilling to see that what works in the lab actually works in the real world, in the child’s environment. The [avatar] would provide feedback, and a child would be like, ‘I didn’t say a word right. Can I try again?’ It blew my mind. That was the affirmation. Our solution was on track and on target.”

Brady, who came up with the idea for InnerVoice after studying and writing a book on apps for people with autism, reflects on the impact it has made. She cites an example of working with a student who is non-verbal and used the app to communicate with an avatar of himself.

“He would take a picture of an apple, and an avatar would read it as ‘apple,’ and then he would write it down, ‘apple.’ Until then, I hadn’t even thought of that strategy.”

A mother uses InnerVoice to work on communication skills with her young daughter.

Brady and Guggemos imagine the benefits of AI-assisted communication beyond their target audience. They are working with people with dementia, head injuries and strokes. “Many communication apps just talk for you,” Brady adds. “Ours spans many aspects of communication for everybody — even English-language learners. Why wouldn’t I try that? It provides a model. There’s a coffee cup on the table, take a picture of it. How do you say that?”

Howard dreams of Zyrobotics helping to close the gap between mainstream learners and students with learning disabilities. To start, this fall Zyrobotics will introduce ReadAble Storiez to classrooms in the Los Nietos, California, school district, where learning disabilities track high. The company will also apply AI to its suite of STEM Storiez, a series of nine interactive and inclusive books that help children ages 3 to 7 engage with science, math, engineering and technology.

The AI for Accessibility program has been instrumental in getting Zyrobotics off the ground with ReadAble Storiez. “If we hadn’t gotten the grant, we’d be in phase zero,” Howard says. “We run on grants to ensure we provide access to learning technologies for all students. We need to be out there for kids that need us.”

The grant gave Brady and Guggemos the technology to take InnerVoice to the next level. “Our kids need this technology,” Brady says. “It’s not a luxury. We want to keep adding the best stuff. Microsoft really propelled us forward in that arena.”

Top image: A young boy at the iTherapy clinic uses InnerVoice chat bot to describe his photo of a Teddy bear. 
