
Microsoft shares what’s next in machine learning at NVIDIA GTC


Finding scalable solutions for today’s global challenges requires forward-thinking, transformative tools. As environmental, economic, and public health concerns mount, Microsoft Azure is addressing these challenges head-on with high-performance computing (HPC), AI, and machine learning. The behind-the-scenes power for everything from MRI scans to energy management and financial services, these technologies are equipping customers and developers with innovative solutions that break through the boundaries of what’s possible in data and compute, paving the way for growth opportunities that span industries and applications around the world.

Microsoft Azure is committed to unlocking these new opportunities for our customers, providing the broadest range of NVIDIA GPUs at the edge, on-premises, in the cloud, and for hybrid environments.

At NVIDIA GTC we will demonstrate this commitment by showing how Azure’s advanced HPC capabilities and AI/machine learning in the cloud are driving transformation and making an impact together with NVIDIA’s latest technology.

Microsoft Azure’s collaboration with NVIDIA was developed with our customers in mind and focused on opening new doors to innovation with graphics processing unit (GPU) acceleration in the cloud.

Learn more by registering today for NVIDIA GTC, a free, online event running September 19 to 22, 2022.


Get a chance to win an NVIDIA Jetson Nano or swag box

In both of our sessions, you have a chance to win a swag box complete with an HPC t-shirt and mug, or a Jetson Nano. Attend these sessions and don’t forget to look for the special link to enter!

Microsoft Sessions at NVIDIA GTC

The new SDK and CLI in Azure Machine Learning.
Bala Venkataraman, Principal Program Manager, Microsoft.

Video on demand

Azure Machine Learning is committed to simplifying the adoption of its platform for training and production. In 2022, we announced the general availability of Azure Machine Learning CLI v2 and the preview of Azure Machine Learning Python SDK v2. Both launches demonstrate our continued focus on making workflows easier and on managing their entire lifecycle, from training single jobs to pipelines and model deployments. In this session, learn about the key improvements in usability and productivity, and the new features that come with the command-line interface (CLI) and software development kit (SDK) v2 of Azure Machine Learning.

Operationalize large model training on Azure Machine Learning using multi-node NVIDIA A100 GPUs.
Sharmeelee Bijlani, Program Manager Azure Machine Learning, Microsoft; Razvan Tanase, Principal Engineering Manager Azure Machine Learning, Microsoft.

Wednesday, September 21, 10:00 to 10:50 AM PDT (1:00 to 1:50 PM EDT, 7:00 to 7:50 AM CEST)

In recent years, deep learning models have grown exponentially in size, demonstrating an acute need for customers to train and fine-tune them using large-scale data infrastructure, advanced GPUs, and an immense amount of memory. Fortunately, developers can now use simple training pipelines on Azure Machine Learning to train large models running on the latest multi-node NVIDIA GPUs. This session will describe these software innovations to customers through Azure Machine Learning (including a fully optimized PyTorch environment) that offers great performance and an easy-to-use interface for large-scale training. We’ll also highlight the power of Azure Machine Learning through experiments using 1,024 A100 Tensor Core GPUs to scale the training of a two-trillion parameter model with a streamlined user experience at 1,000 plus GPU scale.

Watch Party #1: Operationalize large-model training on Azure Machine Learning using multi-node NVIDIA A100 GPUs.
Mary Howell, NVIDIA.

Wednesday, September 21, 3:00 to 3:30 PM PDT

In this GTC Watch Party, we will be replaying our Operationalize Large-Model Training on Azure Machine Learning using Multi-Node NVIDIA A100 GPUs session. Participants will be joined by experts from across Microsoft and NVIDIA who bring fresh insights and experiences to the table, taking the session to a whole new level of understanding. Interaction is core to our GTC Watch Parties, and we encourage you to join the discussion with any comments or questions.

Watch Party #2: Operationalize large-model training on Azure Machine Learning using multi-node NVIDIA A100 GPUs.
Gabrielle Davelaar, AI Technical Specialist, Microsoft; Maxim Salnikov, Senior Azure GTM Manager, Microsoft; Henk Boelman, Senior Cloud Advocate–AI and Machine Learning, Microsoft; Alexander Young, Technical Marketing Engineer, NVIDIA; Ulrich Knechtel, Microsoft Partner Manager (EMEA), NVIDIA.

Thursday, September 22, 2:00 to 3:30 PM CEST (5:00 to 6:30 AM PDT, 8:00 to 9:30 AM EDT)

In this GTC Watch Party, we will be replaying our Operationalize Large-Model Training on Azure Machine Learning using Multi-Node NVIDIA A100 GPUs session. Participants will be joined by experts from across Microsoft and NVIDIA who bring fresh insights and experiences to the table, taking the session to a whole new level of understanding. Interaction is core to our GTC Watch Parties, and we encourage you to join the discussion with any comments or questions.

Microsoft is helping customers across industries step up, transforming AI and machine learning at the Edge

Nuance’s Dragon Ambient eXperience helps doctors document care faster with AI on Azure

Nuance developed an AI-based clinical solution that automatically turns doctor-patient conversations into accurate medical notes. Built with Azure and PyTorch, this solution saves doctors transcription time, reducing administrative burdens and helping them conduct more focused, higher-quality interactions with their patients.

Energy utility Elva builds a highly secure DevOps platform with Azure infrastructure and network security services

Elva looked to build a secure, cloud-first DevOps platform that could meet Norway’s data residency and compliance requirements, delivering automated services that would help develop network grid technology. Using Azure DDoS Protection, Azure Web Application Firewall, and Azure Kubernetes Service, Elva realized its goal, enhancing its in-house development and data integration capabilities. 

The Royal Bank of Canada creates personalized offers while protecting data privacy with Azure confidential computing

The Royal Bank of Canada (RBC) partnered with Microsoft to create a privacy-preserving multi-party data sharing platform built on Azure confidential computing. Called VCR, this solution enables RBC to personalize offerings and protect privacy at the same time, creating exceptional digital experiences that clients can trust.  


Recapping 2022 moments with Azure and NVIDIA technologies

Azure NC A100 v4-series

At Microsoft, our NC series virtual machines allow customers access to almost limitless AI hardware infrastructure so they can be productive quickly. Last summer, we leveled up, announcing the general availability of Azure NC A100 v4 series virtual machines. Powered by NVIDIA A100 80GB PCIe Tensor Core GPUs and 3rd Gen AMD EPYC™ processors, these virtual machines help our customers gain insights faster, innovate with speed, and do more with less, and they are the most performant and cost-competitive NC series offering for a diverse set of workloads.

DeepSpeed on Azure

Azure Machine Learning uses large fleets of the latest NVIDIA GPUs powered by NVIDIA Quantum InfiniBand interconnects to tackle large-scale AI training and tuning. Last July, we announced a breakthrough in our software stack, using DeepSpeed and 1,024 NVIDIA A100 GPUs to scale the training of a two trillion parameter model with a streamlined user experience at 1,000 plus GPU scale. We are bringing these software innovations to you through Azure Machine Learning (including a fully optimized PyTorch environment) that offers great performance and an easy-to-use interface for large-scale training.

NVads A10 v5 virtual machines

Traditionally, graphics-heavy visualization workloads that run in the cloud require virtual machines with full GPUs that are both costly and inflexible. To combat this, we introduced the first GPU-partitioned (GPU-P) virtual machine offering in the cloud, and just last July, we announced the general availability of NVads A10 v5 GPU accelerated virtual machines. Azure is the first public cloud to offer GPU partitioning on NVIDIA GPUs, and our new NVads A10 v5 virtual machines are designed to offer the right choice for any workload and provide optimum configurations for both single-user and multi-session environments. Dig into our latest virtual machine innovation.

NVIDIA Jetson AGX Orin-powered edge AI devices now available

Microsoft is pleased to announce that the NVIDIA Jetson AGX Orin SoM is now powering Azure Certified edge devices from industry-leading device builders including AAEON, Advantech, and AVerMedia, along with the NVIDIA Jetson AGX Orin developer kit.

Developers and solution builders can now leverage powerful NVIDIA Jetson AGX Orin devkits and production modules with Microsoft Azure to create, deploy, and operate powerful AI solutions at the edge, accelerating product development and deployment at scale. The NVIDIA Orin Nano modules have set a new baseline for entry-level edge AI and robotics, building on the momentum behind the Jetson Orin platform worldwide. Stay tuned for new Jetson Orin NX and Orin Nano partner products launching to meet customer needs in AI solution development.

NVIDIA DLI training powered by Azure

We’re proud to host NVIDIA Deep Learning Institute (DLI) training at NVIDIA GTC again this year, with instructor-led workshops around accelerated computing, accelerated data science, and deep learning. Hosted on Microsoft Azure, these sessions enable and empower you to leverage NVIDIA GPUs on the Microsoft Azure platform to solve the world’s most interesting and relevant problems. Register for a DLI workshop today.

Join us at NVIDIA GTC

In collaboration with NVIDIA, Microsoft delivers purpose-built AI, machine learning, and HPC solutions in the cloud to meet even the most demanding real-world applications at scale. Join us at NVIDIA GTC September 19 to 22, to see how every enterprise can leverage the power of GPUs at the edge, on-premises, in the cloud, and for hybrid solutions.

Learn more


Just say the magic word: Using language to program robots

LaTTe paper and video | Trajectory Transformer paper and video | GitHub code

Language is the most intuitive way for us to express how we feel and what we want. However, despite recent advancements in artificial intelligence, it is still very hard to control a robot using natural language instructions. Free-form commands such as “Robot, please go a little slower when you pass close to my TV” or “Stay far away from the swimming pool!” are hard to parse into actionable robot behaviors, and most human-robot interfaces today still rely on complex strategies, such as directly programming cost functions that define the desired behavior.

With our latest work, we attempt to change this reality by introducing LaTTe, the Language Trajectory Transformer. LaTTe is a deep machine learning model that lets us send language commands to robots in an intuitive way. Given an input sentence from the user, the model fuses it with camera images of the objects that the robot observes in its surroundings and outputs the desired robot behavior.

As an example, think of a user trying to control a robot barista that’s moving a wine bottle. Our method allows a non-technical user to control the robot’s behavior only using words, in a natural and simple interface. We will explain how we can achieve this in detail through this post. 

Continue reading to learn more about this technology, or check out the papers, videos, and code linked at the top of this post.

Unlocking the potential of language for robotics 

The field of robotics traditionally uses task-specific programming modules, which need to be re-designed by an expert even if there are minor changes in robot hardware, environment, or operational objectives. This inflexible approach is ripe for innovation with the latest advances in machine learning, which emphasizes reusable modules that generalize well over large domains.

Given the intuitive and effective nature of language for general communication, it would be simpler if one could just tell the robot how they want it to behave as opposed to having to reprogram the entire stack every time a change is needed. While large language models such as BERT, GPT-3, and Megatron-Turing have radically improved the quality of machine-generated text and our ability to solve natural language processing tasks, and models like CLIP extend these capabilities to multi-modal domains combining vision and language, we still see few examples of language being applied in robotics.

The goal of our work is to leverage information contained in existing vision-language pre-trained models to fill the gap in existing tools for human-robot interaction. Even though natural language is the richest form of communication between humans, modeling human-robot interactions using language is challenging because we often require vast amounts of data to train models, or classically, force the user to operate within a rigid set of instructions. To tackle these challenges, our framework makes use of two key ideas: first, we employ large pre-trained language models to provide rich user intent representations, and second, we align geometrical trajectory data with natural language jointly with the use of a multi-modal attention mechanism. 

We test our model on multiple robotic platforms, from manipulators to drones, and show that its functionality is agnostic to the robot form factor, dynamics, and motion controller. Our goal is to enable a factory worker to quickly reconfigure a robot arm’s trajectory to stay farther away from fragile objects, or to allow a drone pilot to command the drone to slow down when close to buildings – all without requiring immense technical expertise.

Combining language and geometry into a single robotics model 

Our overall goal is to provide a flexible interface for human-robot interaction within the context of trajectory reshaping that is agnostic to robotic platforms. We assume that the robot’s behavior is expressed through a 3D trajectory over time, and that the user provides a natural language command to reshape its behavior which relates to particular things in the scene, such as the objects in the robot workspace. Our trajectory generation system outputs a sequence of waypoints in XYZ and velocities, which are calculated by fusing scene geometry, scene images, and the user’s language input. The diagram below shows an overview of the system:

LaTTe is composed of several building blocks, which can be categorized into the feature extractors, geometric encoder, and a final trajectory decoder. We use a pre-trained language model encoder, BERT, to produce semantic features from the user’s input. The use of a large language model creates more flexibility in the natural language input, allowing the use of synonyms and less training data, given that the encoder has already been trained with a massive text corpus. In addition, we use the pre-trained text encoder from the vision-language model CLIP to extract latent embeddings from both the user’s text and the pictures of each object in the scene. We then compute a similarity vector between the embeddings, and use this information to identify target objects the user is referring to through their language command. 
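The object-identification step can be sketched in a few lines. The embeddings below are hypothetical stand-ins for real CLIP outputs (actual CLIP embeddings are high-dimensional vectors produced by the pre-trained encoders); only the cosine-similarity selection logic is representative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for CLIP embeddings of the user's command
# (about a wine bottle) and of one picture per object in the scene.
text_embedding = [0.9, 0.1, 0.2]
object_embeddings = {
    "wine_bottle": [0.8, 0.2, 0.1],
    "tv": [0.1, 0.9, 0.3],
    "plant": [0.2, 0.1, 0.9],
}

# Similarity vector between the text and each object image; the
# highest-scoring object is taken as the one the user refers to.
similarities = {name: cosine_similarity(text_embedding, emb)
                for name, emb in object_embeddings.items()}
target = max(similarities, key=similarities.get)
print(target)  # wine_bottle
```

In the real model these similarity scores are not a final answer but an additional input feature that the downstream network attends over.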


As for the geometric information, we employ a Transformer encoder network to extract features related to the original robot’s trajectory as well as the 3D position of each one of the objects in the scene. In a practical scenario we can use off-the-shelf object detectors to obtain the position and pictures of each significant object. 

Finally, all the geometric, language, and visual information is fused together in a Transformer decoder block. Similar to what happens in a machine translation problem (for example, translating a sentence from English to German), the information from the Transformer encoder network is used by the Transformer decoder to generate one waypoint of the output trajectory at a time in a loop. The training process uses a range of procedurally generated synthetic data with multiple trajectory shapes and random object categories. We use multiple images for each object, which we obtain by web crawling through Bing Images.
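The decoding loop described above can be sketched as follows. The `decoder_step` function is a trivial stand-in for the real Transformer decoder (which attends over the fused language, vision, and geometry features); only the autoregressive structure, where each new waypoint is conditioned on the partial trajectory generated so far, is representative:

```python
# `decoder_step` stands in for the Transformer decoder. The real model
# attends over the encoder features; this placeholder just moves 10% of
# the way from the last waypoint toward a goal so the loop is runnable.
def decoder_step(encoder_features, partial_trajectory):
    x, y, z = partial_trajectory[-1]
    gx, gy, gz = encoder_features["goal"]
    return (x + 0.1 * (gx - x), y + 0.1 * (gy - y), z + 0.1 * (gz - z))

def generate_trajectory(encoder_features, start, num_waypoints):
    """Autoregressive decoding: emit one waypoint at a time, feeding
    the partial trajectory back into the decoder at every step."""
    trajectory = [start]
    for _ in range(num_waypoints):
        trajectory.append(decoder_step(encoder_features, trajectory))
    return trajectory

traj = generate_trajectory({"goal": (1.0, 1.0, 0.5)},
                           start=(0.0, 0.0, 0.0), num_waypoints=5)
```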


What can we do with this model? 

We conducted several experiments in simulated and real-life environments to test the effectiveness of LaTTe. We also tested different form factors (manipulators, drones, and a hexapod robot) in a multitude of scenarios to show the capability of LaTTe to adapt to various robot platforms. 

Examples with manipulators: 

Examples with aerial vehicles: 

Examples with a hexapod robot: 

Bringing robotics to a wider audience 

We are excited to release these technologies with the aim of bringing robotics within the reach of a wider audience. Given the burgeoning applications of robots in several domains, it is imperative to design human-robot interfaces that are intuitive and easy to use. Our goal when designing such interfaces is to afford flexibility and precision of action, while ensuring that little to no technical training is required for novel users. Our Language Trajectory Transformer (LaTTe) framework takes a big step in this direction.

This work is being undertaken by a multidisciplinary team at Microsoft Autonomous Systems Research together with the Munich Institute of Robotics and Machine Intelligence (MIRMI) at TU Munich. The researchers included in this project are: Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Sai Vemprala and Rogerio Bonatti. 


MLCube and Podman

MLCube is a new open-source, container-based infrastructure specification introduced to enable reproducibility in Python-based machine learning workflows. It can utilize tools such as Podman, Singularity, and Docker. Execution on remote platforms is also supported. One of the chairs of the MLCommons Best Practices working group that is developing MLCube is Diane Feddema from Red Hat. This introductory article explains how to run the hello world MLCube example using Podman on Fedora Linux.

Yazan Monshed has written a very helpful introduction to Podman on Fedora which gives more details on some of the steps used here.

First install the necessary dependencies.

sudo dnf -y update
sudo dnf -y install podman git virtualenv \
    policycoreutils-python-utils

Then, following the documentation, set up a virtual environment and get the example code. To ensure reproducibility, use a specific commit, as the project is being actively improved.

virtualenv -p python3 ./env_mlcube
source ./env_mlcube/bin/activate
git clone https://github.com/mlcommons/mlcube_examples.git
cd ./mlcube_examples/hello_world
git checkout 5fe69bd
pip install mlcube mlcube-docker
mlcube describe

Now change the runner command from docker to podman by editing the file $HOME/mlcube.yaml so that the line

docker: docker

becomes

docker: podman

If you are on a computer with x86_64 architecture, you can get the container using

mlcube configure --mlcube=. --platform=docker

You will see a number of options:

? Please select an image:
▸ registry.fedoraproject.org/mlcommons/hello_world:0.0.1
  registry.access.redhat.com/mlcommons/hello_world:0.0.1
  docker.io/mlcommons/hello_world:0.0.1
  quay.io/mlcommons/hello_world:0.0.1

Choose docker.io/mlcommons/hello_world:0.0.1 to obtain the container.

If you are not on a computer with x86_64 architecture, you will need to build the container. Change the file $HOME/mlcube.yaml so that the line

build_strategy: pull

becomes

build_strategy: auto

and then build the container using

mlcube configure --mlcube=. --platform=docker

To run the tests, you may need to set SELinux permissions in the directories appropriately. You can check that SELinux is enabled by typing

sudo sestatus

which should give you output similar to

SELinux status: enabled
...

Josphat Mutai, Christopher Smart and Daniel Walsh explain that you need to be careful in setting appropriate SELinux policies for files used by containers. Here, you will allow the container to read and write to the workspace directory.

sudo semanage fcontext -a -t container_file_t "$PWD/workspace(/.*)?"
sudo restorecon -Rv $PWD/workspace

Now verify the directory policy by checking that

ls -Z

gives output similar to

unconfined_u:object_r:user_home_t:s0 Dockerfile
unconfined_u:object_r:user_home_t:s0 README.md
unconfined_u:object_r:user_home_t:s0 mlcube.yaml
unconfined_u:object_r:user_home_t:s0 requirements.txt
unconfined_u:object_r:container_file_t:s0 workspace

Now run the example

mlcube run --mlcube=. --task=hello --platform=docker
mlcube run --mlcube=. --task=bye --platform=docker

Finally, check that the output

cat workspace/chats/chat_with_alice.txt

has text similar to

Hi, Alice! Nice to meet you.
Bye, Alice! It was great talking to you.

You can create your own MLCube as described here. Contributions to the MLCube examples repository are welcome. Udica is a new project that promises more fine-grained SELinux policy controls for containers that are easy for system administrators to apply. Active development of these projects is ongoing. Testing and providing feedback on them would help make secure data management on systems with SELinux easier and more effective.


New Jigsaw tool fixes bugs in machine-written software


Large pre-trained language models such as GPT-3, Codex, and others can be tuned to generate code from natural language specifications of programmer intent. Such automated models have the potential to improve productivity for every programmer in the world. But since the models can struggle to understand program semantics, the quality of the resulting code can’t be guaranteed.

In our research paper, Jigsaw: Large Language Models meet Program Synthesis, which has been accepted at the International Conference on Software Engineering (ICSE 2022), we introduce a new tool that can improve the performance of these large language models. Jigsaw deploys post-processing techniques that understand the programs’ syntax and semantics and then leverages user feedback to improve future performance. Jigsaw is designed to synthesize code for Python Pandas API using multi-modal inputs.

Our experience suggests that as these large language models evolve for synthesizing code from intent, Jigsaw can play an important role in improving the accuracy of the systems.

The promise, and perils, of machine-written software

Large language models like OpenAI’s Codex are redefining the landscape of programming. A software developer, while solving a programming task, can provide a description in English for an intended code fragment and Codex can synthesize the intended code in languages like Python or JavaScript. However, the synthesized code might be incorrect and might even fail to compile or run. Codex users are responsible for vetting the code before using it. With Project Jigsaw, we aim to automate some of this vetting to boost the productivity of developers who are using large language models like Codex for code synthesis.

Suppose Codex provides a code fragment to a software developer. The developer might then undertake a basic vetting by checking whether the code compiles. If it doesn’t compile, then the developer might be able to use the error messages of the compiler to repair it. Once the code eventually does compile, a typical developer will test it on an input to check whether the code is producing the intended output or not. Again, the code might fail (raise an exception or produce incorrect output) and the developer would need to repair it further. We show that this process can be completely automated. Jigsaw takes as input an English description of the intended code, as well as an I/O example. In this way, it pairs an input with the associated output, and provides the quality assurance that the output Python code will compile and generate the intended output on the provided input.

In our ICSE 2022 paper, Jigsaw: Large Language Models meet Program Synthesis, we evaluate this approach on Python Pandas. Pandas is a widely used API in data science, with hundreds of functions for manipulating dataframes, or tables with rows and columns. Instead of asking a developer to memorize the usage of all these functions, an arguably better approach is to use Jigsaw. With Jigsaw, the user provides a description of the intended transformation in English, an input dataframe, and the corresponding output dataframe, and then lets Jigsaw synthesize the intended code. For example, suppose a developer wants to remove the prefix “Name: ” from the column “country” in the table below. Using Pandas, this can be solved by performing the following operation:

df['country'] = df['country'].str.replace('Name: ', '')
Two tables with two columns labeled “country” and “val”, identifying three countries. In the first table, the rows are labeled: Name: India, Name: USA, and UK. In the second table, the word “Name” is removed, so the rows read: India, USA, UK. The “val” column remains the same in both the tables.
Figure 1: Input dataframe and output dataframe. Jigsaw removes the superfluous word “Name: ” from the column labelled “country”.

A developer who is new to Pandas will need to figure out the functions and their arguments to put together this code fragment, or post the query and example to a forum like Stack Overflow and wait for a good Samaritan to respond. In addition, they might have to tweak the response, at times considerably, based on the context. In contrast, it is much more convenient to provide the English query with an input-output table (or dataframe).
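For reference, the transformation in Figure 1 can be reproduced with standard Pandas (the values in the val column below are placeholders, since the figure does not show them):

```python
import pandas as pd

# Input dataframe from Figure 1 (val values are placeholders).
df = pd.DataFrame({
    "country": ["Name: India", "Name: USA", "UK"],
    "val": [1, 2, 3],
})

# Strip the literal "Name: " prefix from the country column.
df["country"] = df["country"].str.replace("Name: ", "", regex=False)
print(df["country"].tolist())  # ['India', 'USA', 'UK']
```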

How Jigsaw works

Jigsaw takes the English query and pre-processes it with appropriate context to build an input that can be fed to a large language model. The model is treated as a black box and Jigsaw has been evaluated both with GPT-3 and Codex. The advantage of this design is that it enables plug-and-play with the latest and greatest available models. Once the model generates an output code, Jigsaw checks whether it satisfies the I/O example. If so, then Jigsaw is done! The model output is already correct. In our experiments, we found this happened about 30% of the time. If the code fails, then the repair process starts in a post-processing phase.
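The I/O check itself is conceptually simple: run the candidate code on the example input and compare the result with the expected output. A minimal sketch, assuming the candidate reads a dataframe named dfin and assigns its result to dfout (variable names and the unsandboxed exec are our simplification):

```python
import pandas as pd

def satisfies_io_example(candidate_code, input_df, expected_df):
    """Run the model's candidate code on the example input and check
    whether it reproduces the expected output dataframe."""
    env = {"pd": pd, "dfin": input_df.copy()}
    try:
        exec(candidate_code, env)  # a real system would sandbox this step
    except Exception:
        return False               # code raised: candidate is rejected
    result = env.get("dfout")
    return isinstance(result, pd.DataFrame) and result.equals(expected_df)

dfin = pd.DataFrame({"x": [1, 2, 3]})
expected = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6]})
ok = satisfies_io_example("dfout = dfin.assign(y=dfin['x'] * 2)", dfin, expected)
print(ok)  # True
```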

A flowchart showing inputs pre-processed before being fed into large language models including GPT-3, Codex, and others. The post-process output is returned to the end-user for verification. If they find the output incorrect, it is edited by them, and the learning is fed back into the pre-process and post-process mechanisms to improve them further.
Figure 2: Inputs are pre-processed before being fed into large language models including GPT-3, Codex, and others. The post-process output is returned to the end-user for verification and editing, if necessary. The learnings are fed back into the pre-process and post-process mechanisms to improve them further.

During post-processing, Jigsaw applies three kinds of transformations to repair the code. Each of these transformations is motivated by the failure modes that we have observed in GPT-3 and Codex. Surprisingly, both GPT-3 and Codex fail in similar ways and hence Jigsaw’s post-processing to address these failure modes is useful for both.

Variable transformations

We have observed that Codex can produce output that uses incorrect variable names. For example, most publicly available code uses names like df1, df2, etc. for dataframes. So, the Codex output also uses these names. Now, if the developer uses g1, g2, etc. as dataframe names, the Codex output is probably going to use df1, df2, etc. and fail. Other times Codex confuses variable names provided to it. For instance, it produces df2.merge(df1) instead of df1.merge(df2). To fix these kinds of errors, Jigsaw replaces names in Codex-generated code with all possible names in the scope until it finds a program that satisfies the I/O example. We find this simple transformation to be quite useful in many cases.
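A minimal sketch of this name-substitution search (the details here are our own, not the paper's): try mappings from the model's variable names onto the names actually in scope, and keep the first candidate that passes the I/O check, represented below by a toy predicate:

```python
import itertools

def variable_transformations(code, model_names, scope_names, passes_io_check):
    """Rebind the model's variable names (e.g. df1, df2) to the names
    actually in scope (e.g. g1, g2) until a candidate passes the I/O
    check. Assumes model names and scope names do not overlap."""
    for perm in itertools.permutations(scope_names, len(model_names)):
        candidate = code
        for old, new in zip(model_names, perm):
            candidate = candidate.replace(old, new)
        if passes_io_check(candidate):
            return candidate
    return None

# Toy I/O check: pretend only "g2.merge(g1)" reproduces the example.
fixed = variable_transformations(
    "df1.merge(df2)", ["df1", "df2"], ["g1", "g2"],
    passes_io_check=lambda c: c == "g2.merge(g1)",
)
print(fixed)  # g2.merge(g1)
```

In practice the check would execute each candidate against the I/O example, as described above, rather than compare strings.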

Argument transformations

Sometimes Codex generated code calls the expected API functions but with some of the arguments incorrect. For example:

a.) Query – Drop all the rows that are duplicated in column ‘inputB’

dfout = dfin.drop_duplicates(subset=['inputB']) # Model
dfout = dfin.drop_duplicates(subset=['inputB'],keep=False) # Correct

b.) Query – Replace Canada with CAN in column country of df

df = df.replace({'Canada':'CAN'}) # Model
df = df.replace({'country':{'Canada':'CAN'}}) # Correct

To fix such errors, Jigsaw systematically enumerates over all possible arguments, using the function and argument sequences generated by Codex as a starting point, until it finds a program that satisfies the I/O example.
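This enumeration can be sketched as follows, using the drop_duplicates example from above; the candidate argument values are our illustration, not Jigsaw's actual search space:

```python
import itertools
import pandas as pd

def argument_transformations(input_df, expected_df):
    """Enumerate candidate keyword arguments for the function the model
    picked (drop_duplicates here) until one satisfies the I/O example."""
    candidates = {"subset": [None, ["inputB"]], "keep": ["first", "last", False]}
    keys = list(candidates)
    for values in itertools.product(*candidates.values()):
        kwargs = {k: v for k, v in zip(keys, values) if v is not None}
        result = input_df.drop_duplicates(**kwargs).reset_index(drop=True)
        if result.equals(expected_df):
            return kwargs
    return None

# "Drop all the rows that are duplicated in column 'inputB'."
dfin = pd.DataFrame({"inputA": [1, 2, 3], "inputB": ["x", "y", "y"]})
expected = pd.DataFrame({"inputA": [1], "inputB": ["x"]})
print(argument_transformations(dfin, expected))
# {'subset': ['inputB'], 'keep': False}
```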

AST-to-AST transformations

An AST (abstract-syntax-tree) is a representation of code in the form of a tree. Since models like Codex work at a syntactic level, they might produce code which is syntactically very close to the intended program, but some characters might be incorrect. For example:

a.) Query – Select rows of dfin where value in bar is 38 or >60

dfout = dfin[dfin['bar']==38|dfin['bar']>60] # Model
dfout = dfin[(dfin['bar']==38)|(dfin['bar']>60)] # Correct

Mistake – missing parentheses change precedence and cause exception

b.) Query – Count the number of duplicated rows in df

out = df.duplicated() # Model
out = df.duplicated().sum() # Correct

Mistake – missing required summation to get the count

To fix this failure mode, Jigsaw provides AST-to-AST transformations that are learned over time. The user would need to fix the code themselves — then the Jigsaw UI will capture the edit, generalize the edit to a more widely applicable transformation, and learn this transformation. With usage, the number of transformations increases, and Jigsaw becomes more and more effective.
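As an illustration only, here is what one such edit might look like using Python's ast module: rewriting `x = expr` into `x = expr.sum()`, as in the duplicated-rows repair above. Jigsaw learns such transformations from user corrections; this one is hard-coded, and ast.unparse requires Python 3.9 or later:

```python
import ast

def append_sum_call(code):
    """Rewrite `x = expr` into `x = expr.sum()`: one hard-coded example
    of an AST-to-AST edit (Jigsaw learns these from user corrections)."""
    tree = ast.parse(code)
    assign = tree.body[0]            # the single assignment statement
    assign.value = ast.Call(         # wrap the right-hand side in .sum()
        func=ast.Attribute(value=assign.value, attr="sum", ctx=ast.Load()),
        args=[],
        keywords=[],
    )
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)         # requires Python 3.9+

print(append_sum_call("out = df.duplicated()"))  # out = df.duplicated().sum()
```

Working on trees rather than strings is what lets a single learned edit generalize across different variable names and expressions.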

Evaluation

We evaluated Codex and Jigsaw (with Codex) on various datasets and measured accuracy, which is the percentage of tasks in the dataset where the system produces the intended result. Codex gives an accuracy of about 30% out-of-the-box, which is what is expected from OpenAI’s paper as well. Jigsaw improves the accuracy to >60% and, through user feedback, the accuracy improves to >80%.

The road ahead

We have released the datasets that we used to evaluate Jigsaw in the public domain. Each dataset includes multiple tasks, where each task has an English query and an I/O example. Solving a task requires generating Pandas code that maps the input dataframe provided to the corresponding output dataframe. We hope that this dataset will help evaluate and compare other systems. Although there are datasets where the tasks have only English queries or only I/O examples, the Jigsaw datasets are the first to contain both English queries and the associated I/O examples.

As these language models continue to evolve and become more powerful, we believe that Jigsaw will still be required to provide guardrails and make these models viable in real-world scenarios. This just addresses the tip of the iceberg of research problems in this area, and many questions remain to be answered:

  1. Can these language models be trained to learn semantics associated with code?
  2. Can better preprocessing and postprocessing steps be integrated into Jigsaw? For example, we are looking at static analysis techniques to improve the post-processing.
  3. Are I/O examples effective for other APIs apart from Python Pandas? How do we tackle scenarios where I/O examples are not available? How do we adapt Jigsaw for languages like JavaScript and general code in Python?
  4. The developer overhead of providing an example over just providing a natural language query needs further evaluation and investigation.

These are some of the interesting directions we are pursuing. As we refine and improve Jigsaw, we believe it can play an important role in improving programmer productivity through automation. We continue to work on generalizing our experience with the Python Pandas API to work across other APIs and other languages.

Other contributors:

Naman Jain, Research fellow at Microsoft Research India Lab

Skanda Vaidyanath, Intern at Microsoft Research India Lab, currently pursuing master’s degree at Stanford

Posted on Leave a comment

Lobe app aims to make it easy for anyone to train machine learning models

Sean Cusack has been a backyard beekeeper for 10 years and a tinkerer for longer. That’s how he and an entomologist friend got talking about building an early warning system to alert hive owners to potentially catastrophic threats.

They envisioned installing a motion-sensor-activated camera at a beehive entrance and using machine learning to remotely identify when invaders like mites or wasps or potentially even the Asian giant hornet were getting in.

“A threat like that could kill your hive in a couple of hours, and it’d be game over,” Cusack said. “But had you known within 10 minutes of it happening and could get out there and get involved, you could potentially rescue whole colonies.”

It wasn’t until Cusack heard about Lobe, an app that aims to make machine learning easier for people to use and helps them train models without writing code, that he saw a manageable way to bring the project to reality.

“I’m pretty tech savvy, but when I’d tried to do some machine learning things in the past I found it to be pretty intimidating or overwhelming to put all the pieces of the puzzle together,” said Cusack, a Microsoft software engineer who normally works in enterprise web development. “Lobe immediately clicked for me.”

The free app, which Microsoft is making available today in public preview, helps people with no data science experience import images into Lobe and easily label them to create a machine learning dataset. Lobe automatically selects the right machine learning architecture and starts training without any setup or configuration. Users can evaluate the model’s strengths and weaknesses with real-time visual results, play with the model and offer feedback to boost performance.

Today, Lobe supports image classification, but Microsoft says it plans to expand to other model and data types in the future.

Once training is done, the models can be easily exported to run on industry standard platforms and work in apps, websites or devices. That allows people to create end-to-end machine learning solutions at home or in the workplace, such as creating an alert when a resident raccoon gets into their garbage or flagging when an employee in a dangerous situation isn’t wearing a helmet.

Toyon berries surrounded by a white box
To begin using Lobe, people import images of the things they want Lobe to recognize, like this Toyon berry shrub. The app automatically selects and begins training a machine learning model. Photo by Mike Matas, Microsoft.

Early customers include The Nature Conservancy, which is using the Lobe app as part of a larger project to map and protect Caribbean marine resources and pick out which vacation photos uploaded by tourists visiting those regions relate to whale and dolphin watching.

Other customers have used Lobe to build apps that can help identify harmful plants like poison oak on a hike, or that use a camera to send an alert when they accidentally leave the garage door open or when the street parking spot in front of their house opens up.

“Lobe is taking what is a sophisticated and complex piece of technology and making it actively fun,” said Bill Barnes, manager for Lobe, which Microsoft acquired and began incubating in 2018. “What we find is that it inspires people. It fills them with confidence that they can actually use machine learning. And when you have confidence you become more creative and start looking around and asking ‘What other stuff can I do with this?’”

Lobe, which is available for download on Windows or Mac computers, uses open-source machine learning architectures and transfer learning to train custom machine learning models on the user’s own machine. All the data is kept private, with no internet connection or logins required. Because training is automatic, people can start by simply importing images of the things they want Lobe to recognize.

In Cusack’s beehive project, which he proved out during the latest Microsoft Hackathon, he used a motion sensor camera that took pictures of honeybees as they flew into the hive, as well as invaders like wasps, earwigs and the Asian giant hornet. Because sightings of the hornet in the wild are still rare, Cusack printed out pictures, attached them to sticks and stuck them in the beehive to mimic an invasive threat.

Lobe used these images to create a machine learning model that can distinguish among the different insects and run on a small Raspberry Pi device at the entrance of the hive to alert owners to trouble.

Lobe fills a sweet spot for customers looking for a simple and quick way to get started with machine learning using their PCs or Macs without requiring any dependency on the cloud, Microsoft says. It complements Azure AI’s services for customers looking to leverage cloud computing capabilities.

“We really want to empower more people to leverage machine learning and try it for the first time,” said Jake Cohen, Lobe senior program manager. “We want them to be able to use it in ways that they either could not before or didn’t realize they could before.”

A screenshot of the Lobe app showing a grid of plant photos
Lobe simplifies the process of machine learning into three easy steps: collect and label images, train a model and understand its results, and play to improve it. Photos by Mike Matas, Microsoft.

The Nature Conservancy is using Lobe to support its Mapping Ocean Wealth project, which seeks to map how and where tourism, fishing and other activities are potentially affecting important ocean resources — with the goal of helping officials in five Caribbean nations make more informed conservation and economic decisions.

The nonprofit is using Lobe to flag vacation photos depicting whale or dolphin watching activities that visitors to those countries have uploaded to a popular travel website. The photos have been stripped of all personal information but retain geographic data, which can help give decision makers a rough idea of how popular those nature-based tourism activities are in different locations.

“There are a lot of good fishing maps, there are a lot of good shipping maps and maps that show where different habitats are. But it’s actually quite hard to capture spatial patterns of what tourists are doing and where and at what intensity,” said Kate Longley-Wood, ocean mapping coordinator for The Nature Conservancy. “So we’ve found that these crowdsourced datasets can be really helpful in filling those gaps.”

Before using Lobe, The Nature Conservancy had to contract with data science researchers and students to create a custom machine learning model that could identify tourists engaging with coral reefs. But Lobe has allowed the nonprofit to do that same work in house, using staff who have no programming or data science experience.

To train the model, Longley-Wood collected two sets of images and imported them into Lobe. The first set contained “whale and dolphin watching” vacation photos of people clearly engaged in those activities. The second contained “not whale or dolphin” images: pictures of open water, other types of boats, people snorkeling.

One advantage of Lobe is that it’s very easy to see where the model is getting things wrong and quickly improve its accuracy, Longley-Wood said. If the model gets confused and incorrectly labels a picture of a person swimming next to a boat as a whale watching photo, you can correct it with the click of a button.

Another early customer, Chris Cachor, is a software engineer for Sincro, an Ansira company focused on automotive marketing. He helps local car dealerships get the best performance out of social media ads.

People are less likely to engage with ads featuring stock images of a car model for sale, as opposed to an authentic photo of the car as it appears on the lot, Cachor said. Yet scripts designed to flag generic car photos haven’t always been able to keep up with increasingly sophisticated computer-generated imagery, he said.

Cachor said he’d thought about using machine learning to automate that task, but the tools he had run across seemed too cumbersome and time consuming to learn. With Lobe, he was able to import and label examples of stock, computer-generated and authentic car images. Within minutes, he had his first version of a computer vision model to weed out photos that are less likely to perform well in ads.

“It was so cool to see results right away without it becoming a weekend-long academic project,” Cachor said. “It kind of took you from zero to 60 really quick.”

Top image: A backyard beekeeper used Lobe, a free app that helps people train custom machine learning models, to create a device that can distinguish between bees entering a hive and invader insects that threaten the colony. Video by Getty Images.

Jennifer Langston writes about Microsoft research and innovation. Follow her on Twitter.

Posted on Leave a comment

New nanodegree program offers chance to develop machine learning skills

Earlier this year, we empowered over 10,000 students from all over the world to learn the basics of machine learning over the course of four months. We are excited to announce the next stage of skilling with the availability of an advanced machine learning nanodegree on Udacity. Starting today, students can enroll for the Machine Learning Engineer for Microsoft Azure Nanodegree Program.

This new nanodegree program offers students the opportunity to develop deeper technical skills in machine learning (ML). Students will strengthen their skills by building and deploying sophisticated ML models using Azure Machine Learning. They will learn how to train ML models, manage ML pipelines, and tune hyperparameters to improve model performance. Once the model is ready, students will learn how to operationalize the model with the right MLOps practices, including automation, CI/CD, and monitoring.

Students will get hands-on exposure with built-in Azure labs that are designed to help them put theory into practice, all within Udacity’s classroom environment. To round it out, students will have the opportunity to show off their talents by completing a capstone project based on a real-life data science scenario. By the end of this program, students will also be well-prepared to earn the Azure Data Scientist Associate certification.

We also want to congratulate the top 300 students of the introductory ML course who are receiving a scholarship for the Nanodegree program. Here are five such scholars sharing their experiences from the introductory course:

“This is an opportunity to master ML in Azure, get coached by industry experts, and build a solid machine learning portfolio for career advancement. I believe that the scholarship opportunity will bring me a step closer to actualizing my dream,” Ijeoma Ndu said.

Like Ijeoma, many of these students are looking to this nanodegree program to either further their careers or make a career switch. Join our scholarship winners in taking the nanodegree program. Sign up today!

Explore Azure courses on Udacity

Posted on Leave a comment

Spam Classification with ML-Pack

Introduction

ML-Pack is a small-footprint C++ machine learning library that can be easily integrated into other programs. It is an actively developed open source project released under a BSD-3 license. Machine learning has gained popularity due to the large amount of electronic data that can be collected. Some other popular machine learning frameworks include TensorFlow, MXNet, PyTorch, Chainer and PaddlePaddle; however, these are designed for more complex workflows than ML-Pack. On Fedora, ML-Pack is packaged by its lead developer Ryan Curtin. In addition to a command line interface, ML-Pack has bindings for Python and Julia. Here, we will focus on the command line interface, since this may be useful for system administrators to integrate into their workflows.

Installation

You can install ML-Pack on the Fedora command line using

$ sudo dnf -y install mlpack mlpack-bin

You can also install the documentation, development headers and Python bindings by using …

$ sudo dnf -y install mlpack-doc \
mlpack-devel mlpack-python3

though they will not be used in this introduction.

Example

As an example, we will train a machine learning model to classify spam SMS messages. To keep this article brief, Linux commands will not be fully explained, but you can find out more about them by using the man command. For example, for the first command used below, wget,

$ man wget

will tell you that wget downloads files from the web, and list the options you can use with it.

Get a dataset

We will use an example spam dataset in Indonesian provided by Yudi Wibisono

$ wget https://drive.google.com/file/d/1-stKadfTgJLtYsHWqXhGO3nTjKVFxm_Q/view
$ unzip dataset_sms_spam_bhs_indonesia_v1.zip

Pre-process dataset

We will try to classify a message as spam or ham by the number of occurrences of each word in the message. We first change the file line endings, remove line 243, which is missing a label, and then remove the header from the dataset. Next, we split the data into two files, labels and messages. Since the label is the last character of each line, each line is reversed so that the label can be cut into one file and the remaining message into another.

$ tr '\r' '\n' < dataset_sms_spam_v1.csv > dataset.txt
$ sed '243d' dataset.txt > dataset1.csv
$ sed '1d' dataset1.csv > dataset.csv
$ rev dataset.csv | cut -c1 | rev > labels.txt
$ rev dataset.csv | cut -c2- | rev > messages.txt
$ rm dataset.csv
$ rm dataset1.csv
$ rm dataset.txt

Machine learning works on numeric data, so we will use labels of 1 for spam and 0 for ham. The dataset contains three labels: 0, normal SMS (ham); 1, fraud (spam); and 2, promotion (spam). Since we treat all spam the same, both promotions and fraud will be labelled 1.

$ tr '2' '1' < labels.txt > labels.csv
$ rm labels.txt

The next step is to convert all text in the messages to lower case and, for simplicity, remove punctuation and any symbols that are not spaces, line endings or in the range a-z (one would need to expand this range of symbols for production use)

$ tr '[:upper:]' '[:lower:]' < \
messages.txt > messagesLower.txt
$ tr -Cd 'abcdefghijklmnopqrstuvwxyz \n' < \
messagesLower.txt > messagesLetters.txt
$ rm messagesLower.txt

We now obtain a sorted list of unique words used (this step may take a few minutes, so use nice to give it a low priority while you continue with other tasks on your computer).

$ nice -20 xargs -n1 < messagesLetters.txt > temp.txt
$ sort temp.txt > temp2.txt
$ uniq temp2.txt > words.txt
$ rm temp.txt
$ rm temp2.txt

We then create a matrix in which, for each message, the frequency of each word’s occurrence is counted (see the Wikipedia articles on the document-term matrix and the bag-of-words model). This requires a few lines of code, so the full script, which should be saved as ‘makematrix.sh’, is below

#!/bin/bash
declare -a words=()
declare -a letterstartind=()
declare -a letterstart=()
letter=" "
i=0
lettercount=0
while IFS= read -r line; do
  labels[$((i))]=$line
  let "i++"
done < labels.csv
i=0
while IFS= read -r line; do
  words[$((i))]=$line
  firstletter="$( echo $line | head -c 1 )"
  if [ "$firstletter" != "$letter" ]
  then
    letterstartind[$((lettercount))]=$((i))
    letterstart[$((lettercount))]=$firstletter
    letter=$firstletter
    let "lettercount++"
  fi
  let "i++"
done < words.txt
letterstartind[$((lettercount))]=$((i))
echo "Created list of letters"
touch wordfrequency.txt
rm wordfrequency.txt
touch wordfrequency.txt
messagenum=0
while IFS= read -r line; do
  let "messagenum++"
  declare -a wordcount=()
  declare -a wordarray=()
  read -r -a wordarray <<< "$line"
  # zero the count for every word in the vocabulary
  for (( j=0; j<${#words[@]}; j++ )); do
    wordcount[$((j))]=0
  done
  # for each word in the message, search only the section of the
  # vocabulary whose entries start with the same letter
  for word in "${wordarray[@]}"; do
    firstletter="$( echo $word | head -c 1 )"
    for (( k=0; k<lettercount; k++ )); do
      if [ "${letterstart[$((k))]}" == "$firstletter" ]
      then
        for (( j=${letterstartind[$((k))]}; j<${letterstartind[$((k+1))]}; j++ )); do
          if [ "${words[$((j))]}" == "$word" ]
          then
            let "wordcount[$((j))]++"
          fi
        done
      fi
    done
  done
  echo "${wordcount[@]}" >> wordfrequency.txt
  echo "Processed message ""$messagenum"
done < messagesLetters.txt
# Create csv file
tr ' ' ',' < wordfrequency.txt > data.csv

Since Bash is an interpreted language, this simple implementation can take up to 30 minutes to complete. If using the above Bash script on your primary workstation, run it as a task with low priority so that you can continue with other work while you wait:

$ nice -20 bash makematrix.sh

Once the script has finished running, split the data into testing (30%) and training (70%) sets:

$ mlpack_preprocess_split \
--input_file data.csv \
--input_labels_file labels.csv \
--training_file train.data.csv \
--training_labels_file train.labels.csv \
--test_file test.data.csv \
--test_labels_file test.labels.csv \
--test_ratio 0.3 \
--verbose

Train a model

Now train a logistic regression model:

$ mlpack_logistic_regression \
--training_file train.data.csv \
--labels_file train.labels.csv --lambda 0.1 \
--output_model_file lr_model.bin

Test the model

Finally we test our model by producing predictions,

$ mlpack_logistic_regression \
--input_model_file lr_model.bin \
--test_file test.data.csv \
--output_file lr_predictions.csv

and comparing the predictions with the exact results,

$ export incorrect=$(diff -U 0 lr_predictions.csv \
test.labels.csv | grep '^@@' | wc -l)
$ export tests=$(wc -l < lr_predictions.csv)
$ echo "scale=2; 100 * ( 1 - $((incorrect)) \
/ $((tests)))" | bc

This gives approximately 90% validation rate, similar to that obtained here.

The dataset is composed of approximately 50% spam messages, so the validation rates are quite good without doing much parameter tuning. In typical cases, datasets are unbalanced with many more entries in some categories than in others. In these cases a good validation rate can be obtained by mispredicting the class with a few entries. Thus to better evaluate these models, one can compare the number of misclassifications of spam, and the number of misclassifications of ham. Of particular importance in applications is the number of false positive spam results as these are typically not transmitted. The script below produces a confusion matrix which gives a better indication of misclassification. Save it as ‘confusion.sh’

#!/bin/bash
declare -a labels
declare -a lr
i=0
while IFS= read -r line; do
  labels[i]=$line
  let "i++"
done < test.labels.csv
i=0
while IFS= read -r line; do
  lr[i]=$line
  let "i++"
done < lr_predictions.csv
TruePositiveLR=0
FalsePositiveLR=0
TrueZeroLR=0
FalseZeroLR=0
Positive=0
Zero=0
for i in "${!labels[@]}"; do
  if [ "${labels[$i]}" == "1" ]
  then
    let "Positive++"
    if [ "${lr[$i]}" == "1" ]
    then
      let "TruePositiveLR++"
    else
      let "FalseZeroLR++"
    fi
  fi
  if [ "${labels[$i]}" == "0" ]
  then
    let "Zero++"
    if [ "${lr[$i]}" == "0" ]
    then
      let "TrueZeroLR++"
    else
      let "FalsePositiveLR++"
    fi
  fi
done
echo "Logistic Regression"
echo "Total spam" $Positive
echo "Total ham" $Zero
echo "Confusion matrix"
echo "                Predicted class"
echo "                  Ham | Spam"
echo "               ----------------"
echo " Actual| Ham  | " $TrueZeroLR " | " $FalsePositiveLR
echo " class | Spam | " $FalseZeroLR " | " $TruePositiveLR
echo ""

then run the script

$ bash confusion.sh

You should get output similar to

Logistic Regression
Total spam 183
Total ham 159
Confusion matrix
                Predicted class
                  Ham | Spam
               ----------------
 Actual| Ham  |  128  |  31
 class | Spam |  26  |  157

which indicates a reasonable level of classification. Other methods you can try in ML-Pack for this problem include Naive Bayes, random forest, decision tree, AdaBoost and perceptron.

To improve the error rate, you can try other pre-processing methods on the initial dataset. Neural networks can give up to 99.95% validation rates, see for example here, here and here. However, using these techniques with ML-Pack cannot be done on the command line interface at present and is best covered in another post.

For more on ML-Pack, please see the documentation.

Posted on Leave a comment

NBA announces new multiyear partnership with Microsoft to redefine and personalize the fan experience

Microsoft becomes an Official Technology Partner for the NBA; together the companies will create a direct-to-consumer platform that delivers new fan engagement experiences and enhanced streaming capabilities powered by Microsoft Azure and its AI capabilities.

Logos for the NBA and Microsoft alongside a basketball

NEW YORK — April 16, 2020 — The National Basketball Association (NBA) and Microsoft Corp. on Thursday announced a new multiyear alliance that will transform the way fans experience the NBA. As part of this collaboration, Microsoft will become the Official Artificial Intelligence Partner and an Official Cloud and Laptop Partner for the NBA, Women’s National Basketball Association (WNBA), NBA G League, and USA Basketball beginning with the 2020-21 NBA season.

Microsoft and NBA Digital — co-managed by the NBA and Turner Sports — will create a new, innovative, direct-to-consumer platform on Microsoft Azure that will use machine learning and artificial intelligence to deliver next-generation, personalized game broadcasts and other content offerings as well as integrate the NBA’s various products and services from across its business. The platform will reimagine how fans engage with the NBA from their devices by customizing and localizing experiences for the NBA’s global fanbase, which includes the 1.8 billion social media followers across all league, team and player accounts.

Beyond delivering live and on-demand game broadcasts through Microsoft Azure, the NBA’s vast array of data sources and extensive historical video archive will be surfaced to fans through state-of-the-art machine learning, cognitive search and advanced data analytics solutions. This will create a more personalized fan experience that tailors the content to the preferences of the fan, rewards participation, and provides more insights and analysis than ever. Additionally, this platform will enable the NBA to uncover unique insights and add new dimensions to the game for fans, coaches and broadcasters. The companies will also explore additional ways technology can be used to enhance the NBA’s business and game operations.

As part of the partnership, Microsoft will become the entitlement partner of the NBA Draft Combine beginning next season and an associate partner of future marquee events, including NBA All-Star, MGM Resorts NBA Summer League and WNBA All-Star.

“We are thrilled to serve as the official AI partner of the NBA,” said Satya Nadella, CEO, Microsoft. “Together, we’ll bring fans closer to the game and players they love with new personalized experiences powered by Microsoft Azure.”

“This partnership with Microsoft will help us redefine the way our fans experience NBA basketball,” said Adam Silver, NBA commissioner. “Our goal, working with Microsoft, is to create customized content that allows fans — whether they are in an NBA arena or watching from anywhere around the world — to immerse themselves in all aspects of the game and engage directly with our teams and players.”

About the NBA

The NBA is a global sports and media business built around four professional sports leagues: the National Basketball Association, the Women’s National Basketball Association, the NBA G League and the NBA 2K League. The NBA has established a major international presence with games and programming in 215 countries and territories in 47 languages, and merchandise for sale in more than 100,000 stores in 100 countries on six continents. NBA rosters at the start of the 2019-20 season featured 108 international players from 38 countries and territories. NBA Digital’s assets include NBA TV, NBA.com, the NBA App and NBA League Pass. The NBA has created one of the largest social media communities in the world, with 1.8 billion likes and followers globally across all league, team, and player platforms. Through NBA Cares, the league addresses important social issues by working with internationally recognized youth-serving organizations that support education, youth and family development, and health-related causes.

About Microsoft

Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.

For more information, press only:

Microsoft Media Relations, WE Communications for Microsoft, (425) 638-7777, [email protected]

Chase Kressel, NBA, [email protected]

Note to editors: For more information, news and perspectives from Microsoft, please visit the Microsoft News Center at http://news.microsoft.com. Web links, telephone numbers and titles were correct at time of publication, but may have changed. For additional assistance, journalists and analysts may contact Microsoft’s Rapid Response Team or other appropriate contacts listed at https://news.microsoft.com/microsoft-public-relations-contacts.

Posted on Leave a comment

Microsoft and Nokia collaborate to accelerate digital transformation and Industry 4.0 for communications service providers and enterprises

Companies announce their first joint solutions combining Microsoft cloud, AI and machine learning expertise with Nokia’s leadership across mission-critical networking and communications

REDMOND, Wash., and ESPOO, Finland — Nov. 5, 2019 — Microsoft and Nokia today announced a strategic collaboration to accelerate transformation and innovation across industries with cloud, Artificial Intelligence (AI) and Internet of Things (IoT). By bringing together Microsoft cloud solutions and Nokia’s expertise in mission-critical networking, the companies are uniquely positioned to help enterprises and communications service providers (CSPs) transform their businesses. As Microsoft’s Azure, Azure IoT, Azure AI and Machine Learning solutions combine with Nokia’s LTE/5G-ready private wireless solutions, IP, SD-WAN, and IoT connectivity offerings, the companies will drive industrial digitalization and automation across enterprises, and enable CSPs to offer new services to enterprise customers.

BT is the first global communications service provider to offer its enterprise customers a managed service that integrates Microsoft Azure cloud and Nokia SD-WAN solutions. BT customers can access this through a customer automated delegated rights service, which enables BT to manage both the customer Azure vWAN and the unique Agile Connect SD-WAN, based on Nokia’s Nuage SD-WAN 2.0.

“Bringing together Microsoft’s expertise in intelligent cloud solutions and Nokia’s strength in building business and mission-critical networks will unlock new connectivity and automation scenarios,” said Jason Zander, executive vice president, Microsoft Azure. “We’re excited about the opportunities this will create for our joint customers across industries.”

“We are thrilled to unite Nokia’s mission-critical networks with Microsoft’s cloud solutions,” said Kathrin Buvac, President of Nokia Enterprise and Chief Strategy Officer. “Together, we will accelerate the digital transformation journey towards Industry 4.0, driving economic growth and productivity for both enterprises and service providers.”

The cloud and IoT have ushered in the fourth industrial revolution, or Industry 4.0, wherein enterprises are embracing data to automate and streamline processes across all aspects of their businesses. By joining forces, the two companies are bringing solutions to market that will simplify and accelerate this journey for enterprises, as well as enable CSPs to play a key role in helping their customers realize the potential of industrial digitalization and automation while also optimizing and better differentiating their own businesses.

Accelerating digital transformation for enterprises

Microsoft and Nokia are partnering to help accelerate digital transformation for enterprises by offering connectivity and Azure IoT solutions that unlock connected scenarios across multiple industries including digital factories, smart cities, warehouses, healthcare settings, and transportation hubs such as ports, airports and more.

The Nokia Digital Automation Cloud (Nokia DAC) 5G-ready industrial-grade private wireless broadband solution with on-premise Azure elements will enable a wide variety of secure industrial automation solutions that require more reliable connectivity, efficient coverage and better mobility than traditional Wi-Fi networks provide. For example, connected smart tools and machines on manufacturing floors that enable increased productivity, flexibility and safety for workers, or autonomous vehicles and robots in industrial environments that improve automation, efficiency and overall safety.

Enabling new enterprise services offered by service providers

Nokia’s Nuage SD-WAN 2.0 solution now enables service providers to offer integration with Microsoft Azure Virtual WAN for branch to cloud connectivity, with the companies planning to offer more options for branch internet connectivity in 2020. By automating branch and hybrid WAN connectivity, enterprises will have simplified, faster access to cloud applications such as Office 365, integrated security from branch-to-branch and branch-to-Azure and reduced risk of configuration errors causing security or connectivity issues.

Furthermore, the companies are integrating Nokia’s Worldwide IoT Network Grid (WING) with Azure IoT Central to make the onboarding, deployment, management and servicing of IoT solutions seamless. This integration provides CSPs with the opportunity to offer their enterprises a single platform including vertical solutions to enable secure connected IoT services, such as asset tracking and machine monitoring on a national or global scale. Enterprises will be able to use Azure IoT Central and partner solutions for faster and easier enablement and implementation of their IoT applications together with Nokia’s IoT connectivity solutions.

Driving digital transformation for CSPs

Microsoft and Nokia are collaborating to host Nokia’s Analytics, Virtualization and Automation (AVA) cognitive services solutions on Azure. These AI solutions will enable CSPs to move out of private data centers and into the Azure cloud to realize cost savings and transform operations for 5G. Predictive Video Analytics is an example of a joint solution that will ensure optimal video experiences for CSP subscribers, improving reliability by up to 60 percent.

About Microsoft

Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.

About Nokia

We create the technology to connect the world. We develop and deliver the industry’s only end-to-end portfolio of network equipment, software, services and licensing that is available globally. Our customers include communications service providers whose combined networks support 6.1 billion subscriptions, as well as enterprises in the private and public sector that use our network portfolio to increase productivity and enrich lives.

Through our research teams, including the world-renowned Nokia Bell Labs, we are leading the world to adopt end-to-end 5G networks that are faster, more secure and capable of revolutionizing lives, economies and societies. Nokia adheres to the highest ethical business standards as we create technology with social purpose, quality and integrity. www.nokia.com

For more information, press only:

Microsoft Media Relations, WE Communications for Microsoft, (425) 638-7777, [email protected]

Nokia Communications, +358 10 448 4900, [email protected]


Podcast: How machines are learning to ace the reading comprehension exam

Dr. TJ Hazen

Episode 86, August 21, 2019

The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.

On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world.

Transcript

T.J. Hazen: Most of the questions are fact-based questions like, who did something, or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. How can we make this technology work for real problems that our enterprise customers are bringing in?

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.

On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: T.J. Hazen, welcome to the podcast!

T.J. Hazen: Thanks for having me.

Host: Researchers like to situate their research, and I like to situate my researchers so let’s get you situated. You are a Senior Principal Research Manager in the Engineering and Applied Research group at Microsoft Research in Montreal. Tell us what you do there. What are the big questions you’re asking, what are the big problems you’re trying to solve, what gets you up in the morning?

T.J. Hazen: Well, I’ve spent my whole career working in speech and language understanding, and I think the primary goal of everything I do is to try to be able to answer questions. So, people have questions and we’d like the computer to be able to provide answers. So that’s sort of the high-level goal, how do we go about answering questions? Now, answers can come from many places.

Host: Right.

T.J. Hazen: A lot of the systems that you’re probably aware of like Siri for example, or Cortana or Bing or Google, any of them…

Host: Right.

T.J. Hazen: …the answers typically come from structured places, databases that contain information, and for years these models have been built in a very domain-specific way. If you want to know the weather, somebody built a system to tell you about the weather.

Host: Right.

T.J. Hazen: And somebody else might build a system to tell you about the age of your favorite celebrity and somebody else might have written a system to tell you about the sports scores, and each of them can be built to handle that very specific case. But that limits the range of questions you can ask because you have to curate all this data, you have to put it into structured form. And right now, what we’re worried about is, how can you answer questions more generally, about anything? And the internet is a wealth of information. The internet has got tons and tons of documents on every topic, you know, in addition to the obvious ones like Wikipedia. If you go into any enterprise domain, you’ve got manuals about how their operation works. You’ve got policy documents. You’ve got financial reports. And it’s not typical that all this information is going to be curated by somebody. It’s just sitting there in text. So how can we answer any question about anything that’s sitting in text? We don’t have a million or five million or ten million librarians doing this for us…

Host: Right.

T.J. Hazen: …uhm, but the information is there, and we need a way to get at it.

Host: Is that what you are working on?

T.J. Hazen: Yes, that’s exactly what we’re working on. I think one of the difficulties with today’s systems is, they seem really smart…

Host: Right?

T.J. Hazen: Sometimes. Sometimes they give you fantastically accurate answers. But then you can just ask a slightly different question and it can fall on its face.

Host: Right.

T.J. Hazen: That’s the real gap between what the models currently do, which is, you know, really good pattern matching some of the time, versus something that can actually understand what your question is and know when the answer that it’s giving you is correct.

Host: Let’s talk a bit about your group, which, out of Montreal, is Engineering and Applied Research. And that’s an interesting umbrella at Microsoft Research. You’re technically doing fundamental research, but your focus is a little different from some of your pure research peers. How would you differentiate what you do from others in your field?

T.J. Hazen: Well, I think there’s two aspects to this. The first is that the lab up in Montreal was created as an offshoot of an acquisition. Microsoft bought Maluuba, which was a startup that was doing really incredible deep learning research, but at the same time they were a startup and they needed to make money. So, they also had this very talented engineering team in place to be able to take the research that they were doing in deep learning and apply it to problems where it could go into products for customers.

Host: Right.

T.J. Hazen: When you think about that need that they had to actually build something, you could see why they had a strong engineering team.

Host: Yeah.

T.J. Hazen: Now, when I joined, I wasn’t with them when they were a startup, I actually joined them from Azure where I was working with outside customers in the Azure Data Science Solution team, and I observed lots of problems that our customers have. And when I saw this new team that we had acquired and we had turned into a research lab in Montreal, I said I really want to be involved because they have exactly the type of technology that can solve customer problems and they have this engineering team in place that can actually deliver on turning from a concept into something real.

Host: Right.

T.J. Hazen: So, I joined, and I had this agreement with my manager that we would focus on real problems. They were now part of the research environment at Microsoft, but I said that doesn’t restrict us on thinking about blue sky, far-afield research. We can go and talk to product teams and say what are the real problems that are hindering your products, you know, what are the difficulties you have in actually making something real? And we could focus our research to try to solve those difficult problems. And if we’re successful, then we have an immediate product that could be beneficial.

Host: Well in any case, you’re swimming someplace in a “we could do this immediately” but you have permission to take longer, or is there a mandate, as you live in this engineering and applied research group?

T.J. Hazen: I think there’s a mandate to solve hard problems. I think that’s the mandate of research. If it wasn’t a hard problem, then somebody…

Host: …would already have a product.

T.J. Hazen: …in the product team would already have a solution, right? So, we do want to tackle hard problems. But we also want to tackle real problems. That’s, at least, the focus of our team. And there’s plenty of people doing blue sky research and that’s an absolute need as well. You know, we can’t just be thinking one or two years ahead. Research should also be thinking five, ten, fifteen years ahead.

Host: So, there’s a whole spectrum there.

T.J. Hazen: So, there’s a spectrum. But there is a real need, I think, to fill that gap between taking an idea that works well in a lab and turning it into something that works well in practice for a real problem. And that’s the key. And many of the problems that have been solved by Microsoft have not just been blue sky ideas, but they’ve come from this problem space where a real product says, ahh, we’re struggling with this. So, it could be anything. It can be, like, how does Bing efficiently rank documents over billions of documents? You don’t just solve that problem by thinking about it, you have to get dirty with the data, you have to understand what the real issues are. So, that’s where many of these research problems come from. And we’re focusing on: how do you answer questions out of documents when the questions could be arbitrary, and on any topic? And you’ve probably experienced this, if you are going into a search site for your company, that company typically doesn’t have the advantage of having a big Bing infrastructure behind it that’s collecting all this data and doing sophisticated machine learning. Sometimes it’s really hard to find an answer to your question. And, you know, the tricks that people use can be creative and inventive, but oftentimes, trying to figure out what the right keywords are to get you to an answer is not the right thing.

Host: You work closely with engineers on the path from research to product. So how does your daily proximity to the people that reify your ideas as a researcher impact the way you view, and do, your work as a researcher?

T.J. Hazen: Well, I think when you’re working in this applied research and engineering space, as opposed to a pure research space, it really forces you to think about the practical implications of what you’re building. How easy is it going to be for somebody else to use this? Is it efficient? Is it going to run at scale? All of these problems are problems that engineers care a lot about. And sometimes researchers just say, let me solve the problem first and everything else is just engineering. If you say that to an engineer, they’ll be very frustrated because you don’t want to bring something to an engineer that works ten times slower than it needs to be, or uses ten times more memory. So, when you’re in close proximity to engineers, you’re thinking about these problems as you are developing your methods.

Host: Interesting, because those two things, I mean, you could come up with a great idea that would do it and you pay a performance penalty in spades, right?

T.J. Hazen: Yeah, yeah. So, sometimes it’s necessary. Sometimes you don’t know how to do it and you just say let me find a solution that works and then you spend ten years actually trying to figure out how to make it work in a real product.

Host: Right.

T.J. Hazen: And I’d rather not spend that time. I’d rather think about, you know, how can I solve something and have it be effective as soon as possible?

(music plays)

Host: Let’s talk about human language technologies. They’ve been referred to by some of your colleagues as “the crown jewel of AI.” Speech and language comprehension is still a really hard problem. Give us a lay of the land, both in the field in general and at Microsoft Research specifically. What’s hope and what’s hype, and what are the common misconceptions that run alongside the remarkable strides you actually are making?

T.J. Hazen: I think that word we mentioned already: understand. That’s really the key of it. Or comprehend is another way to say it. What we’ve developed doesn’t really understand, at least when we’re talking about general purpose AI. So, the deep learning mechanisms that people are working on right now that can learn really sophisticated things from examples. They do an incredible job of learning specific tasks, but they really don’t understand what they’re learning.

Host: Right.

T.J. Hazen: So, they can discover complex patterns that can associate things. So in the vision domain, you know, if you’re trying to identify objects, and then you go in and see what the deep learning algorithm has learned, it might have learned features that are like, uh, you know, if you’re trying to identify a dog, it learns features that would say, oh, this is part of a leg, or this is part of an ear, or this is part of the nose, or this is the tail. It doesn’t know what these things are, but it knows they all go together. And the combination of them will make a dog. And it doesn’t know what a dog is either. But the idea that you could just feed data in and you give it some labels, and it figures everything else out about how to associate that label with that, that’s really impressive learning, okay? But it’s not understanding. It’s just really sophisticated pattern-matching. And the same is true in language. We’ve gotten to the point where we can answer general-purpose questions and it can go and find the answer out of a piece of text, and it can do it really well in some cases, and like, some of the examples we’ll give it, we’ll give it “who” questions and it learns that “who” questions should contain proper names or names of organizations. And “when” questions should express concepts of time. It doesn’t know anything about what time is, but it’s figured out the patterns about, how can I relate a question like “when” to an answer that contains time expression? And that’s all done automatically. There’s no features that somebody sits down and says, oh, this is a month and a month means this, and this is a year, and a year means this. And a month is a part of a year. Expert AI systems of the past would do this. They would create ontologies and they would describe things about how things are related to each other and they would write rules. 
And within limited domains, they would work really, really well if you stayed within a nice, tightly constrained part of that domain. But as soon as you went out and asked something else, it would fall on its face. And so, we can’t really generalize that way efficiently. If we want computers to be able to learn arbitrarily, we can’t have a human behind the scene creating an ontology for everything. That’s the difference between understanding and crafting relationships and hierarchies versus learning from scratch. We’ve gotten to the point now where the algorithms can learn all these sophisticated things, but they really don’t understand the relationships the way that humans understand it.

Host: Go back to the, sort of, the lay of the land, and how I sharpened that by saying, what’s hope and what’s hype? Could you give us a “TBH” answer?

T.J. Hazen: Well, what’s hope is that we can actually find reasonable answers to an extremely wide range of questions. What’s hype is that the computer will actually understand, at some deep and meaningful level, what this answer actually means. I do think that we’re going to grow our understanding of algorithms and we’re going to figure out ways that we can build algorithms that could learn more about relationships and learn more about reasoning, learn more about common sense, but right now, they’re just not at that level of sophistication yet.

Host: All right. Well let’s do the podcast version of your NERD Lunch and Learn. Tell us what you are working on in machine reading comprehension, or MRC, and what contributions you are making to the field right now.

T.J. Hazen: You know, NERD is short for New England Research and Development Center…

Host: I did not!

T.J. Hazen: …which is where I physically work.

Host: Okay…

T.J. Hazen: Even though I work closely and am affiliated with the Montreal lab, I work out of the lab in Cambridge, Massachusetts, and NERD has a weekly Lunch and Learn where people present the work they’re doing, or the research that they’re working on, and at one of these Lunch and Learns, I gave this talk on machine reading comprehension. Machine reading comprehension, in its simplest version, is being able to take a question and then being able to find the answer anywhere in some collection of text. As we’ve already mentioned, it’s not really “comprehending” at this point, it’s more just very sophisticated pattern-matching. But it works really well in many circumstances. And even on tasks like the Stanford Question Answering Dataset, a common competition that people have competed in, question answering by computer has achieved human parity.

Host: Mm-hmm.

T.J. Hazen: Okay. But that task itself is somewhat simple because most of the questions are fact-based questions like, who did something or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. Like, how can we make this technology work for real problems that our enterprise customers are bringing in? So, we have customers coming in saying, I want to be able to answer any question in our financial policies, or our auditing guidelines, or our operations manual. And people don’t ask “who” or “when” questions of their operations manual. They ask questions like, how do I do something? Or explain some process to me. And those answers are completely different. They tend to be longer and more complex and you don’t always, necessarily, find a short, simple answer that’s well situated in some context.

Host: Right.

T.J. Hazen: So, our focus at MSR Montreal is to take this machine reading comprehension technology and apply it into these new areas where our customers are really expressing that there’s a need.

Host: Well, let’s go a little deeper, technically, on what it takes to enable or teach machines to answer questions, and this is key, with limited data. That’s part of your equation, right?

T.J. Hazen: Right, right. So, when we go to a new task, uh, so if a company comes to us and says, oh, here’s our operations manual, they often have this expectation, because we’ve achieved human parity on some dataset, that we can answer any question out of that manual. But when we test the general-purpose models that have been trained on these other tasks on these manuals, they don’t generally work well. And these models have been trained on hundreds of thousands, if not millions, of examples, depending on what datasets you’ve been using. And it’s not reasonable to ask a company to collect that level of data in order to be able to answer questions about their operations manual. But we need something. We need some examples of what are the types of questions, because we have to understand what types of questions they ask, we need to understand the vocabulary. We’ll try to learn what we can from the manual itself. But without some examples, we don’t really understand how to answer questions in these new domains. But what we discovered through some of the techniques that are available, transfer learning is what we refer to as sort of our model adaptation, how do you learn from data in some new domain and take an existing model and make it adapt to that domain? We call that transfer learning. We can actually use transfer learning to do really well in a new domain without requiring a ton of data. So, our goal is to have it be examples like hundreds of examples, not tens of thousands of examples.
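The adaptation idea Dr. Hazen describes, keeping what a large pretrained model has learned while adjusting only a small part on a few hundred in-domain examples, can be sketched with a deliberately tiny toy model. Everything below (the base weights, the synthetic data, the head size) is an illustrative assumption, not the actual Azure or MSR system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came from large-scale pretraining; they stay frozen,
# standing in for the "base" that already knows a lot about language.
W_base = rng.normal(size=(32, 8))

def features(x):
    # Frozen base representation; only the small head below is adapted.
    return np.tanh(x @ W_base)

# A few hundred labeled examples from the new domain (toy synthetic data).
X = rng.normal(size=(200, 32))
true_w = rng.normal(size=8)
y = (features(X) @ true_w > 0).astype(float)

# Adapt only the 8-parameter task head, not the 32x8 base weights -- this
# is what keeps a handful of examples from overwhelming the model.
w = np.zeros(8)
lr = 0.5
F = features(X)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-F @ w))   # sigmoid predictions
    w -= lr * F.T @ (p - y) / len(y)   # logistic-loss gradient step

accuracy = float(np.mean((F @ w > 0) == (y > 0.5)))
```

Freezing most parameters is only one of several techniques for avoiding overfitting during adaptation; it is used here because it is the simplest to show in a few lines.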

Host: How’s that working now?

T.J. Hazen: It works surprisingly well. I’m always amazed at how well these machine learning algorithms work with all the techniques that are available now. These models are very complex. When we’re talking about our question answering model, it has hundreds of millions of parameters and what you’re talking about is trying to adjust a model that is hundreds of millions of parameters with only hundreds of examples and, through a variety of different techniques where we can avoid what we call overfitting, we can allow the generalizations that are learned from all this other data to stay in place while still adapting it so it does well in this specific domain. So, yeah, I think we’re doing quite well. We’re still exploring, you know, what are the limits?

Host: Right.

T.J. Hazen: And we’re still trying to figure out how to make it work so that an outside company can easily create the dataset, put the dataset into a system, push a button. The engineering for that and the research for that is still ongoing, but I think we’re pretty close to being able to, you know, provide a solution for this type of problem.

Host: All right. Well I’m going to push in technically because to me, it seems like that would be super hard for a machine. We keep referring to these techniques… Do we have to sign an NDA, as listeners?

T.J. Hazen: No, no. I can explain stuff that’s out…

Host: Yeah, do!

T.J. Hazen: … in the public domain. So, there are two common underlying technical components that make this work. One is called word embeddings and the other is called attention. Word embeddings are a mechanism where it learns how to take words or phrases and express them in what we call vector space.

Host: Okay.

T.J. Hazen: So, it turns them into a collection of numbers. And it does this by figuring out what types of words are similar to each other based on the context that they appear in, and then placing them together in this vector space, so they’re nearby each other. So, we would learn that, let’s say, city names are all similar because they appear in similar contexts. And so, therefore, Boston and New York and Montreal, they should all be close together in this vector space.

Host: Right.

T.J. Hazen: And blue and red and yellow should be close together. And then advances were made to figure this out in context. So that was the next step, because some words have multiple meanings.

Host: Right.

T.J. Hazen: So, you know, if you have a word like apple, sometimes it refers to a fruit and it should be near orange and banana, but sometimes it refers to the company and it should be near Microsoft and Google. So, we’ve developed context dependent ones, so that says, based on the context, I’ll place this word into this vector space so it’s close to the types of things that it really represents in that context.

Host: Right.
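The vector-space picture Dr. Hazen is describing can be sketched with toy embeddings. The vectors below are made up by hand purely for illustration; real embeddings are learned from context over massive corpora:

```python
import numpy as np

# Hand-made toy "embeddings": city names point one way, colors another.
emb = {
    "boston":   np.array([0.90, 0.80, 0.10]),
    "montreal": np.array([0.85, 0.75, 0.15]),
    "blue":     np.array([0.10, 0.20, 0.95]),
    "red":      np.array([0.15, 0.10, 0.90]),
}

def cosine(a, b):
    # Cosine similarity: near 1.0 means the words sit close together
    # in the vector space; near 0 means they are unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["boston"], emb["montreal"]))  # high: similar contexts
print(cosine(emb["boston"], emb["blue"]))      # low: different contexts
```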

T.J. Hazen: That’s the first part. And you can learn these word embeddings from massive amounts of data. So, we start off with a model that’s learned on far more data than we actually have question and answer data for. The second part is called attention and that’s how you associate things together. And it’s the attention mechanisms that learn things like a word like “who” has to attend to words like person names or company names. And a word like “when” has to attend to…

Host: Time.

T.J. Hazen: …time. And those associations are learned through this attention mechanism. And again, we can actually learn on a lot of associations between things just from looking at raw text without actually having it annotated.

Host: Mm-hmm.

T.J. Hazen: Once we’ve learned all that, we have a base, and that base tells us a lot about how language works. And then we just have to have it focus on the task, okay? So, depending on the task, we might have a small amount of data and we feed in examples in that small amount, but it takes advantage of all the stuff that it’s learned about language from all these, you know, rich data that’s out there on the web. And so that’s how it can learn these associations even if you don’t give it examples in your domain, but it’s learned a lot of these associations from all the raw data.

Host: Right.

T.J. Hazen: And so, that’s the base, right? You’ve got this base of all this raw data and then you train a task-specific thing, like a question answering system, but even then, what we find is that, if we train a question answering system on basic facts, it doesn’t always work well when you go to operation manuals or other things. So, then we have to have it adapt.

Host: Sure.

T.J. Hazen: But, like I said, that base is very helpful because it’s already learned a lot of characteristics of language just by observing massive amounts of text.
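The attention mechanism mentioned above can be sketched as scaled dot-product attention, the common formulation in current models. The four-dimensional query and key vectors below are toy assumptions chosen so that a "when" question lines up with a time expression; real models learn these vectors:

```python
import numpy as np

def softmax(z):
    # Turn raw scores into weights that are positive and sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy learned vectors: the question word "when" as the query, and
# candidate answer words as keys.
query = np.array([1.0, 0.0, 1.0, 0.0])            # "when"
keys = np.array([
    [1.0, 0.1, 0.9, 0.0],   # "1969"      (a time expression)
    [0.0, 1.0, 0.1, 0.9],   # "Boston"    (a place)
    [0.1, 0.9, 0.0, 1.0],   # "Armstrong" (a person)
])

# Score each key against the query, scale, and normalize: the weights
# say how strongly "when" attends to each candidate word.
scores = keys @ query / np.sqrt(len(query))
weights = softmax(scores)
```

Here the largest weight lands on the time expression, which is the kind of question-to-answer association the attention mechanism learns from raw text.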

(music plays)

Host: I’d like you to predict the future. No pressure. What’s on the horizon for machine reading comprehension research? What are the big challenges that lie ahead? I mean, we’ve sort of laid the land out on what we’re doing now. What next?

T.J. Hazen: Yeah. Well certainly, more complex questions. What we’ve been talking about so far is still fairly simple in the sense that you have a question, and we try to find passages of text that answer that question. But sometimes a question actually requires that you get multiple pieces of evidence from multiple places and you somehow synthesize them together. So, a simple example we call the multi-hop example. If I ask a question like, you know, where was Barack Obama’s wife born? I have to figure out first, who is Barack Obama’s wife? And then I have to figure out where she was born. And those pieces of information might be in two different places.

Host: Right.

T.J. Hazen: So that’s what we call a multi-hop question. And then, sometimes, we have to do some operation on the data. So, you could say, you know like, what players, you know, from one Super Bowl team also played on another Super Bowl team? Well there, what you have to do is, you have to get the list of all the players from both teams and then you have to do an intersection between them to figure out which ones are the same on both. So that’s an operation on the data…

Host: Right.

T.J. Hazen: …and you can imagine that there’s lots of questions like that where the information is there, but it’s not enough to just show the person where the information is. You also would like to go a step further and actually do the computation for that. That’s a step that we haven’t done, like, how do you actually go from mapping text to text, and saying these two things are associated, to mapping text to some sequence of operations that will actually give you an exact answer. And, you know, it can be quite difficult. I can give you a very simple example. Like, just answering a question, yes or no, out of text, is not a solved problem. Let’s say I have a question where someone says, I’m going to fly to London next week. Am I allowed to fly business class according to my policies from my company, right? We can have a system that would be really good at finding the section of the policy that says, you know, if you are a VP-level or higher and you are flying overseas, you can fly business class, otherwise, no. Okay? But, you know, if we actually want the system to answer yes or no, we have to actually figure out all the details, like okay, who’s asking the question? Are they a VP? Where are they located? Oh, they’re in New York. What does flying overseas mean??

Host: Right. There are layers.

T.J. Hazen: Right. So that type of comprehension, you know, we’re not quite there yet for all types of questions. Usually these things have to be crafted by hand for specific domains. So, all of these things about how can you answer complex questions, and even simple things like common sense, like, things that we all know… And so, my manager, Andrew McNamara, who was supposed to be here with us, has a favorite example: this concept of coffee being black. But if you spill coffee on your shirt, do you have a black stain on your shirt? No, you’ve got a brown stain on your shirt. And that’s just common knowledge. That is, you know, a common-sense thing that computers may not understand.
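The "players from both Super Bowl teams" question earlier in this segment is a good example of mapping text to an operation on retrieved data rather than to a span of text. The rosters below are invented for illustration, not real data:

```python
# Toy rosters, as if retrieved from two documents (hypothetical names).
team_a = {"Smith", "Jones", "Brady", "Nguyen"}
team_b = {"Brady", "Garcia", "Nguyen", "Lee"}

# "Which players were on both teams?" maps to a set intersection over the
# two retrieved lists -- a computation on the data, not a highlighted span.
both = team_a & team_b
print(sorted(both))   # ['Brady', 'Nguyen']
```

Recognizing that a question calls for an intersection (or a count, a comparison, a date difference) and then executing it reliably is exactly the step Dr. Hazen says current systems have not yet solved in general.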

Host: You’re working on research, and ultimately products or product features, that make people think they can talk to their machines and that their machines can understand and talk back to them. So, is there anything you find disturbing about this? Anything that keeps you up at night? And if so, how are you dealing with it?

T.J. Hazen: Well, I’m certainly not worried about the fact that people can ask questions of the computer and the computer can give them answers. What I’m trying to get at is something that’s helpful and can help you solve tasks. In terms of the work that we do, yeah, there are actually issues that concern me. So, one of the big ones is, even if a computer can say, oh, I found a good answer for you, here’s the answer, it doesn’t know anything about whether that answer is true. If you go and ask your computer, was the Holocaust real? and it finds an article on the web that says no, the Holocaust was a hoax, do I want my computer to show that answer? No, I don’t. But…

Host: Or the moon landing…!

T.J. Hazen: …if all you are doing is teaching the computer about word associations, it might think that’s a perfectly reasonable answer without actually knowing that this is a horrible answer to be showing. So yeah, the moon landing, vaccinations… The easy way that people can defame people on the internet, you know, even if you ask a question that might seem like a fact-based question, you can get vast differences of opinion on this and you can get extremely biased and untrue answers. And how does a computer actually understand that some of these things are not things that we should represent as truth, right? Especially if your goal is to find a truthful answer to a question.

Host: All right. So, then what do we do about that? And by we, I mean you!

T.J. Hazen: Well, I have been working on this problem a little bit with the Bing team. And one of the things that we discovered is that if you can determine that a question is phrased in a derogatory way, that usually means the search results that you’re going to get back are probably going to be phrased in a derogatory way. So, even if we don’t understand the answer, we can just be very careful about what types of questions we actually want to answer.

Host: Well, what does the world look like if you are wildly successful?

T.J. Hazen: I want the systems that we build to just make life easier for people. If you have an information task, the world is successful if you get that piece of information and you don’t have to work too hard to get it. We call it task completion. If you have to struggle to find an answer, then we’re not successful. But if you can ask a question, and we can get you the answer, and you go, yeah, that’s the answer, that’s success to me. And we’ll be wildly successful if the types of things where that happens become more and more complex. You know, where if someone can start asking questions where you are synthesizing data and computing answers from multiple pieces of information, for me, that’s the wildly successful part. And we’re not there yet with what we’re going to deliver into product, but it’s on the research horizon. It will be incremental. It’s not going to happen all at once. But I can see it coming, and hopefully by the time I retire, I can see significant progress in that direction.

Host: Off script a little… will I be talking to my computer, my phone, a HoloLens? Who am I asking? Where am I asking? What device? Is that so “out there” as well?

T.J. Hazen: Uh, yeah, I don’t know how to think about where devices are going. You know, when I was a kid, I watched the original Star Trek, you know, and everything on there, it seemed like a wildly futuristic thing, you know? And then fifteen, twenty years later, everybody’s got their own little “communicator.”

Host: Oh my gosh.

T.J. Hazen: And so, uh, you know, the fact that we’re now beyond where Star Trek predicted we would be, you know, that itself, is impressive to me. So, I don’t want to speculate where the devices are going. But I do think that this ability to answer questions, it’s going to get better and better. We’re going to be more interconnected. We’re going to have more access to data. The range of things that computers will be able to answer is going to continue to expand. And I’m not quite sure exactly what it looks like in the future, to be honest, but, you know, I know it’s going to get better and easier to get information. I’m a little less worried about, you know, what the form factor is going to be. I’m more worried about how I’m going to actually answer questions reliably.

Host: Well, it’s story time. Tell us a little bit about yourself, your life, your path to MSR. How did you get interested in computer science research, and how did you land where you are now, working from New England for Microsoft Research Montreal?

T.J. Hazen: Right. Well, I’ve never been one to long-term plan for things. I’ve always gone from what I find interesting to the next thing I find interesting. I never had a really serious, long-term goal. I didn’t wake up some morning when I was seven and say, oh, I want to be a Principal Research Manager at Microsoft in my future! I didn’t even know what Microsoft was when I was seven. I went to college and I just knew I wanted to study computers. I didn’t know really what that meant at the time, it just seemed really cool.

Host: Yeah.

T.J. Hazen: I had an Apple II when I was a kid and I learned how to do some basic programming. And then I, you know, was going through my course work. I was, in my junior year, I was taking a course in audio signal processing and in the course of that class, we got into a discussion about speech recognition, which to me was, again, it was Star Trek. It was something I saw on TV. Of course, now it was Next Generation…!

Host: Right!

T.J. Hazen: But you know, you watch the next generation of Star Trek and they’re talking to the computer and the computer is giving them answers and here somebody is telling me you know there’s this guy over in the lab for computer science, Victor Zue, and he’s building systems that recognize speech and give answers to questions! And to me, that was science-fiction. So, I went over and asked the guy, you know, I heard you’re building a system, and can I do my bachelor’s thesis on this? And he gave me a demo of the system – it was called Voyager – and he asked a question, I don’t remember the exact question, but it was probably something like, show me a map of Harvard Square. And the system starts chugging along and it’s showing results on the screen as it’s going. And it literally took about two minutes for it to process the whole thing. It was long enough that he actually explained to me how the entire system worked while it was processing. But then it came back, and it popped up a map of Harvard Square on the screen. And I was like, ohhh my gosh, this is so cool, I have to do this! So, I did my bachelor’s thesis with him and then I stayed on for graduate school. And by seven years later, we had a system that was running in real time. We had a publicly available system in 1997 that you could call up on a toll-free number and you could ask for weather reports and weather information for anywhere in the United States. And so, the idea that it went from something that was “Star Trek” to something that I could pick up my phone, call a number and, you know, show my parents, this is what I’m working on, it was astonishing how fast that developed! I stayed on in that field with that research group. I was at MIT for another fifteen years after I graduated. At some point, a lot of the things that we were doing, they moved from the research lab to actually being real.

Host: Right.

T.J. Hazen: So, like twenty years after I went and asked to do my bachelor’s thesis, Siri comes out, okay? And so that was our goal. They were like, twenty years ago, we should be able to have a device where you can talk to it and it gives you answers, and twenty years later, there it was. So, that, for me, that was a cue that maybe it’s time to go where the action is, which was in companies that were building these things. Once you have a large company like Microsoft or Google throwing their resources behind these hard problems, then you can’t compete when you’re in academia for that space. You know, you have to move on to something harder and more far out. But I still really enjoyed it. So, I joined Microsoft to work on Cortana…

Host: Okay…

T.J. Hazen: …when we were building the first version of Cortana. And I spent a few years working on that. I’ve worked on some Bing products. I then spent some time in Azure trying to transfer these things so that companies that had the similar types of problems could solve their problems on Azure with our technology.

Host: And then we come full circle to…

T.J. Hazen: Then full circle, yeah. You know, once I realized that some of the stuff that customers were asking for wasn’t quite ready yet, I said, let me go back to research and see if I can improve that. It’s fantastic to see something through all the way to product, but once you’re successful and you have something in a product, it’s nice to then say, okay, what’s the next hard problem? And then start over and work on the next hard problem.

Host: Before we wrap up, tell us one interesting thing about yourself, maybe it’s a trait, a characteristic, a life event, a side quest, whatever… that people might not know, or be able to find on a basic web search, that’s influenced your career as a researcher?

T.J. Hazen: Okay. You know, when I was a kid, maybe about eleven years old, the Rubik’s Cube came out. And I got fascinated with it. And I wanted to learn how to solve it. And a kid down the street from my cousin had taught himself from a book how to solve it. And he taught me. His name was Jonathan Cheyer. And he was actually in the first national speed Rubik’s Cube solving competition. It was on this TV show, That’s Incredible. I don’t know if you remember that TV show.

Host: I do.

T.J. Hazen: It turned out what he did was, he had learned what is now known as the simple solution. And I learned it from him. And I didn’t realize it until many years later, but what I learned was an algorithm. I learned, you know, a sequence of steps to solve a problem. And once I got into computer science, I discovered all that problem-solving I was doing with the Rubik’s Cube and figuring out what are the steps to solve a problem, that’s essentially what things like machine learning are doing. What are the steps to figure out, what are the features of something, what are the steps I have to do to solve the problem? I didn’t realize that at the time, but the idea of being able to break down a hard problem like solving a Rubik’s Cube, and figuring out what are the stages to get you there, is interesting. Now, here’s the interesting fact. So, Jonathan Cheyer, his older brother is Adam Cheyer. Adam Cheyer is one of the co-founders of Siri.

Host: Oh my gosh. Are you kidding me?

T.J. Hazen: So, I met the kid when I was young, and we didn’t really stay in touch. I discovered, you know, many years later that Adam Cheyer was actually the older brother of this kid who taught me the Rubik’s Cube years and years earlier, and Jonathan ended up at Siri also. So, it’s an interesting coincidence that we ended up working in the same field after all those years from this Rubik’s Cube connection!

Host: You see, this is my favorite question now because I’m getting the broadest spectrum of little things that influenced and triggered something…!

Host: At the end of every podcast, I give my guests a chance for the proverbial last word. Here’s your chance to say anything you want to would-be researchers, both applied and otherwise, who might be interested in working on machine reading comprehension for real-world applications.

T.J. Hazen: Well, I could say all the things that you would expect me to say, like you should learn about deep learning algorithms and you should possibly learn Python because that’s what everybody is using these days, but I think the single most important thing that I could tell anybody who wants to get into a field like this is that you need to explore it and you need to figure out how it works and do something in depth. Don’t just get some instruction set or some high-level overview on the internet, run it on your computer and then say, oh, I think I understand this. Like get into the nitty-gritty of it. Become an expert. And the other thing I could say is, of all the people I’ve met who are extremely successful, the thing that sets them apart isn’t so much, you know, what they learned, it’s the initiative that they took. So, if you see a problem, try to fix it. If you see a problem, try to find a solution for it. And I say this to people who work for me. If you really want to have an impact, don’t just do what I tell you to do, but explore, think outside the box. Try different things. OK? I’m not going to have the answer to everything, so therefore, if I don’t have the answer to everything, then if you’re only doing what I’m telling you to do, then we both, together, aren’t going to have the answer. But if you explore things on your own and take the initiative and try to figure out something, that’s the best way to really be successful.

Host: T.J. Hazen, thanks for coming in today, all the way from the east coast to talk to us. It’s been delightful.

T.J. Hazen: Thank you. It’s been a pleasure.

(music plays)

To learn more about Dr. T.J. Hazen and how researchers and engineers are teaching machines to answer complicated questions, visit Microsoft.com/research