
Microsoft Research 2019 reflection—a year of progress on technology’s toughest challenges

Collage of images from 2019

Research is about achieving long-term goals, often through incremental progress. As the year comes to an end, it’s a good time to step back and reflect on the work that researchers at Microsoft and their collaborators have done to advance the state of the art in computing, particularly by increasing the capabilities and reach of AI and delivering technology experiences that are more inclusive, secure, and accessible. This covers only a sliver of all the amazing work Microsoft Research has accomplished this year, and we encourage you to discover more of the hundreds of projects undertaken in 2019 by exploring our blog further.

Improving the reach and accessibility of AI and machine learning

Machine learning has made a tremendous impact on people’s everyday lives, especially in the latter half of this decade, while also raising significant policy and societal issues for research to address. This year, Microsoft researchers and their collaborators worked to improve the capabilities of machine learning systems and explored new models that can take the discipline further, using approaches that make these systems more accessible and inclusive.

In deep learning, Jianfeng Gao’s team released MT-DNN, a model for learning universal language embeddings that combines the strengths of multi-task learning and the language model pre-training of BERT, helping systems quickly develop the semantic understanding necessary for natural language processing. And Xu Tan and his collaborators at Microsoft Research Asia developed MASS, a pre-training method that outperforms existing models at sequence-to-sequence language generation.

In the coming years, breakthroughs in machine learning will emerge from exploring new models beyond the current foundation of Markov decision processes, particularly as reinforcement learning—a data-hungry approach generally suited to simulation scenarios—becomes more applicable to real-world settings. In this podcast, John Langford and Rafah Hosn discuss these new directions in reinforcement learning and their applications to everyday computing, including the real-world RL now deployed in Personalizer, an Azure Cognitive Service. Langford and Alekh Agarwal also hosted a webinar on the foundations of real-world reinforcement learning.
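
To make the contextual-bandit style of real-world RL concrete, here is a minimal epsilon-greedy sketch in Python. It is a toy illustration of the learning loop behind services like Personalizer, not the actual system; the contexts, actions, and rewards are all invented for the example.

```python
import random

# Toy contextual bandit: pick an action for a context, observe a
# reward (for example, click / no click), update value estimates online.
actions = ["headline_a", "headline_b", "headline_c"]
counts = {}    # (context, action) -> number of times tried
values = {}    # (context, action) -> running mean reward
epsilon = 0.1  # exploration rate

def choose(context):
    if random.random() < epsilon:                       # explore
        return random.choice(actions)
    return max(actions,                                 # exploit best estimate
               key=lambda a: values.get((context, a), 0.0))

def update(context, action, reward):
    key = (context, action)
    counts[key] = counts.get(key, 0) + 1
    mean = values.get(key, 0.0)
    values[key] = mean + (reward - mean) / counts[key]  # incremental mean

# One interaction: observe context, act, log reward, learn.
ctx = "mobile_morning"
a = choose(ctx)
r = 1.0 if a == "headline_b" else 0.0                   # simulated feedback
update(ctx, a, r)
```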

Many machine learning applications benefit from training with very large datasets; however, many potential uses simply do not have enough data for typical approaches to be effective. Enter machine teaching, where domain experts can build bespoke AI models with little data—and no machine learning expertise. In this podcast, Riham Mansour discusses (among other things) LUIS, one of the first Microsoft products to deploy machine teaching concepts in real-world scenarios.

group photo at NeurIPS conference

Researchers from Microsoft labs in Redmond, Montreal, New England, Cambridge (UK), India, and Asia came together for NeurIPS 2019. This year, over 300 Microsoft researchers attended the conference and participated in various events.

Another project aimed at further democratizing AI is Justin Harris’s Decentralized & Collaborative AI on Blockchain framework, which enables users to train and maintain models and datasets on the Ethereum network. At NeurIPS 2019, Debadeepta Dey and collaborators presented Project Petridish, an efficient forward neural architecture search algorithm that helps identify suitable neural architectures for a given machine learning task. And Adith Swaminathan and Emre Kiciman’s February blog post explores researchers’ work to improve causal inference modeling, which helps AI better understand “what if” scenarios in a wide variety of contexts.

Enabling responsible, inclusive, human-centered innovation

2019 kicked off with the inaugural ACM FAT* Conference in Atlanta, which focused on fairness, accountability, and transparency in socio-technical systems. Microsoft Research presented four papers at the conference, covering gender bias in occupation classification, the role of data-driven decision making in reinforcing or amplifying injustices, strategic manipulation of algorithmic decision systems, and the fair allocation of items in scenarios without money. This work came from the FATE research group at Microsoft, which studies the complex social implications of AI, machine learning, data science, large-scale experimentation, and automation.

At May’s CHI Conference on Human Factors in Computing Systems, Saleema Amershi and her collaborators presented a set of guidelines for human-AI interaction design that brings together more than 20 years of research, recommendations, and best practices around effective interaction with AI-infused systems. Bringing this work together will help designers manage user expectations, moderate the level of autonomy, resolve ambiguity, and provide users with awareness of how systems learn from their behavior.

To ensure that machine learning systems effectively do the jobs we deploy them to do, we must develop a deeper understanding of how they succeed and fail. This paper, from Microsoft researchers Ram Shankar Siva Kumar and Jeffrey Snover and their collaborators at Harvard, articulates the various ways machine learning systems can fail—either through intentional adversarial attacks or unintentional failures in which the output is formally correct but unwanted.

Helping to train autonomous systems that can be trusted in real-world applications, the open-source simulator AirSim provides realistic and detailed testing environments. This year, it played host to the NeurIPS competition Game of Drones. In the drone race challenge, participants competed against a Microsoft Research opponent on the same track, working with a level of strategy and maneuvering not generally offered by contests of this kind. Microsoft researchers and collaborators who organized the competition plan to keep it open and add new racing environments. Visit the GitHub repository for more information.

In January, Jenn Wortman Vaughan and Hanna Wallach hosted a webinar on Fairness in Machine Learning, demonstrating how to make detecting and mitigating biases a first-order priority in the development and deployment of machine learning systems.

Creating human-computer interaction that works for all

This year, Microsoft researchers continued their work to make computing more natural, comfortable, and accessible for everyone. At the ACM CHI Conference on Human Factors in Computing Systems, researchers presented a number of papers and demos exploring how to support accessibility for users with cognitive or sensory disabilities. These include studies on whether web browsers’ “reading mode” is truly helpful for people with dyslexia and tools to make VR more accessible for people with low vision (including tunnel vision, brightness sensitivity, and low visual acuity).

Also presented at CHI was Microsoft Soundscape, a project that uses 3D audio cues to enhance situational awareness and assist with navigation. (You can try the app yourself here.) In this op-ed in the Toronto Sun, Microsoft researcher Bill Buxton elaborates on the importance of work like this, noting that 1 billion people worldwide have some form of disability, making it imperative that we create technologies that support personal autonomy.

Speaking of sound, Nikunj Raghuvanshi’s podcast explores the physics of audio and discusses Project Triton, an acoustic system that models how sound waves behave so that the audio in 3D game environments can be as rich and immersive as the graphics. Project Triton is available for any game via the Unity and Unreal game engine plugins, as part of Project Acoustics, powered by Azure.

At the ACM Symposium on User Interface Software and Technology, Microsoft researchers presented a number of projects that make virtual environments more realistic, tactile, and navigable. DreamWalker is a project that augments a real-world walking experience with virtual reality—detecting the user’s surroundings in real time and generating a virtual world that accounts for their path and any obstacles—so that you can walk to work in Seattle, but through a virtual Manhattan. Mise-Unseen is a project that uses gaze detection to modify or replace elements of a virtual world while the user’s attention is directed elsewhere. And CapstanCrunch is a VR controller that leverages centuries-old technology, once used to control ropes on sailing ships, to provide effective and inexpensive haptic feedback.

Architectural designer Jenny Sabin installing Ada at the Microsoft campus.

Meanwhile in the physical world, Microsoft researchers partnered in May with students at the Brooklyn Public Library’s Fashion Academy to weave technology into their designs using Project Alava, which aims to develop microcontroller-based systems that are simple to build and code for people with a limited computer science background. At their end-of-program fashion show, students showed garments that incorporated LEDs, motion sensors, and motors. You can read about other areas where Microsoft researchers are working at the intersection of art and science here, including Ada, a first-of-its-kind architectural pavilion that incorporates AI, on display at Microsoft Research Redmond.

Breakthroughs in security, storage, systems, and applications

2019 saw continued progress in the development and adoption of homomorphic encryption, which enables computation on encrypted data, helping to preserve privacy. Microsoft SEAL has become one of the world’s most popular homomorphic encryption libraries, with broad adoption in both academia and industry. In February, Microsoft took the next step in democratizing homomorphic encryption by releasing SEAL for .NET. (The Microsoft SEAL library is open source and available on GitHub.)
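
To see what “computation on encrypted data” means, here is a toy additively homomorphic scheme (Paillier) in plain Python. This is a conceptual sketch with deliberately tiny, insecure parameters; Microsoft SEAL implements far more capable lattice-based schemes (BFV and CKKS) behind a different API.

```python
from math import gcd

# Toy Paillier cryptosystem: additively homomorphic. Insecure toy primes;
# real deployments use primes on the order of 1024 bits or more.
p, q = 104729, 104723
n = p * q
n2 = n * n
g = n + 1                                      # standard choice of generator
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                           # Python 3.8+ modular inverse

def encrypt(m, r):
    # c = g^m * r^n mod n^2  (r should be random and coprime to n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c1, c2 = encrypt(20, r=7), encrypt(22, r=11)
c_sum = (c1 * c2) % n2            # multiplying ciphertexts adds plaintexts
assert decrypt(c_sum) == 42       # 20 + 22, computed without decrypting inputs
```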

In August 2019, Microsoft researchers joined their industry and academic peers for the Homomorphic Encryption Standards Meeting. The group will reconvene at Microsoft Research in Redmond this coming February. Watch our webinar to learn more about homomorphic encryption, and listen to this October podcast with Craig Costello for an overview of the year’s developments in cryptography generally, including efforts to prepare for a post-quantum future.

In April, Project Everest took another step forward in its work to build a verified, secure HTTPS ecosystem with the release of EverCrypt, the first fully verified cryptographic provider to meet the security needs of the TLS protocol. Project Everest is a collaboration between Microsoft, Inria, and Carnegie Mellon University.

By 2023, it’s expected that more than 100 zettabytes of data will be stored in the cloud. To meet that need, Project Silica is developing the first-ever storage technology designed from the media up for use in the cloud. This year, the team collaborated with Warner Bros. on a proof of concept, storing the 1978 film Superman on a nearly indestructible piece of glass roughly the size of a drink coaster. This work is part of the Optics for the Cloud Research Alliance, which you can learn more about here or on the Microsoft Research Podcast. Meanwhile, researchers at Microsoft and the University of Washington achieved a “Hello, World!” moment in April for a different way to meet our growing storage needs: They demonstrated the first fully automated system to store and retrieve data in manufactured DNA. (For more on the intersection between computing and biology, listen to this podcast featuring Andrew Phillips, who leads the Biological Computation Group at Microsoft Research Cambridge.)
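
The core encoding idea behind DNA storage is simple to sketch: map digital bits onto the four nucleotides. The toy codec below packs 2 bits per base; the real Microsoft and University of Washington system additionally uses error-correcting codes, avoids problematic sequences such as long homopolymer runs, and attaches addressing primers for random access.

```python
# Toy DNA storage codec: 2 bits per nucleotide, no error correction.
BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS = {v: k for k, v in BASE.items()}

def encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    bits = "".join(BITS[b] for b in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"Hello, World!")   # 13 bytes -> 52 bases
assert decode(strand) == b"Hello, World!"
```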

Cambridge researchers Andy Gordon and Simon Peyton Jones demonstrated the practical impact of fundamental research by exploring how ideas from programming language research could improve one of the world’s most common business applications: the spreadsheet. In this January blog post, they detail how their collaboration with the Microsoft Excel team led to product improvements such as cells that can contain first-class records linked to external data sources and formulas that can compute array values that “spill” into adjacent cells.

At the ACM International Conference on Web Search and Data Mining, Microsoft researchers presented new work in extreme classification, a research area that promises to dramatically improve the speed and quality of algorithms that can answer multiple-choice questions involving uncertainty, where there could be multiple correct answers. Among other things, this work can lead to more relevant recommendations and search results. In this blog post from February, Manik Varma of Microsoft Research India provides a deep dive into extreme classification.
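
As a toy illustration of the multi-label, top-k flavor of these problems (at a scale nowhere near “extreme”), here is a one-vs-rest sketch with scikit-learn on synthetic data; production extreme classifiers rely on specialized tree- and embedding-based methods to scale to millions of labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                 # 200 items, 20 features
Y = (rng.random((200, 5)) < 0.2).astype(int)   # 5 labels, sparse multi-label

# One binary classifier per label; extreme classification replaces this
# brute-force approach with structures that scale to millions of labels.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
scores = clf.predict_proba(X[:1])              # per-label relevance scores
top_k = np.argsort(scores[0])[::-1][:3]        # recommend 3 most relevant labels
```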

Thanks to gains in computer vision, particularly object detection and classification, video analysis has become far more accurate; however, fast and affordable real-time video analysis is lagging. In December, Microsoft researchers Ganesh Ananthanarayanan and Yuanchao Shu hosted a webinar on Project Rocket, an extensible software stack that leverages the edge and cloud to meet the needs of video analytics applications.

In April, the Microsoft Research Podcast turned its attention to databases—particularly the need for imperative programming that allows for good software engineering practices like modularity, readability, and reusability. In this episode, Karthik Ramachandra discusses Froid, an extensible and language-agnostic framework for imperative functions in databases, which is available as “Scalar UDF Inlining” in Microsoft SQL Server 2019.

Open-source tools and data for the research community

Throughout the year, researchers from Microsoft made a number of projects open source for the benefit of the academic community, including the following:

    • SandDance, a data visualization tool included in Azure Data Studio, Visual Studio Code, and Power BI
    • TensorWatch, an AI debugging and visualization tool
    • PhoneticMatching, a component of Maluuba’s natural language understanding platform
    • SpaceFusion, a learning paradigm that brings together a palette of different deep learning models for conversational AI
    • Icecaps, a toolkit for conversation modeling
    • Icebreaker, a deep generative model that minimizes the amount and cost of data required to train a machine learning model

Building on last year’s announcement of Microsoft Research Open Data—an Azure-based repository for sharing datasets—the company developed a set of data use agreements, released them on GitHub, and adopted them for a number of public datasets. This work aims to make research data more readily available in the cloud and to encourage the reproducibility of research.

Supporting and honoring the research community

This year, Microsoft Research introduced the Ada Lovelace Fellowship to support diverse talent from underrepresented groups pursuing doctorates in computing-related fields. You can read about the fellows and their research here. Ten doctoral students were also awarded two-year fellowships as part of the PhD Fellowship program, supporting research in photonics, systems and networking, and AI. Additionally, Microsoft Research awarded Microsoft Research Faculty Fellowships to five early-career faculty members pursuing high-impact breakthrough research. You can read about their work here.

A number of researchers at Microsoft received awards and honors throughout 2019—check out the full list of recipients here.

Finally, we are saying goodbye to Harry Shum, who is leaving the company in February after 23 years, and hello to Microsoft CTO and EVP Kevin Scott, who has assumed Shum’s responsibilities as head of the Microsoft Artificial Intelligence and Research Group. Listen to Scott on the Microsoft Research Podcast here.

We hope you had a good year, and we look forward to a 2020 full of collaboration and exciting breakthroughs. Happy holidays.

To stay up to date on all things research at Microsoft, follow our blog and subscribe to our newsletter and the Microsoft Research Podcast. You can also follow us on Facebook, Twitter, YouTube, and Instagram.


Game of Drones competition aims to advance autonomous systems

Image from Game of Drones simulation

Drone racing has transformed from a niche activity sparked by enthusiastic hobbyists to an internationally televised sport. In parallel, computer vision and machine learning are making rapid progress, along with advances in agile trajectory planning, control, and state estimation for quadcopters. These advances enable increased autonomy and reliability for drones. More recently, the unmanned aerial vehicle (UAV) research community has begun to tackle the drone-racing problem. This has given rise to competitions, with the goal of beating human performance in drone racing.

At the thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019), the AirSim research team is working together with Stanford University and the University of Zurich to further democratize drone-racing research by hosting a simulation-based competition, Game of Drones. We are hosting the competition on Microsoft AirSim, our Unreal Engine-based simulator for multirotors. The competition focuses on trajectory planning and control, computer vision, and opponent drone avoidance. This is achieved via three tiers:

  • Tier 1 – Planning only: The participant’s drone races head-to-head with a Microsoft Research opponent racer. The goal is to go through all gates in the minimum possible time without hitting the opponent drone. Ground truth poses for the gates, the opponent drone, and the participant drone are provided, accessible via our application programming interfaces (APIs). The opponent racer follows a minimum jerk trajectory through randomized waypoints selected in each gate’s cross section.
  • Tier 2 – Perception only: This is a time trial format in which participants are provided with noisy gate poses. There’s no opponent drone. The next gate will not always be in view, but the noisy pose returned by our API will steer the drone roughly in the right direction, after which vision-based control is necessary.
  • Tier 3 – Perception and planning: This combines Tiers 1 and 2. Given a ground truth state estimate for the participant drone and noisy estimates for the gates, the goal is to race against the opponent racer without colliding with it.

The animation on the left below shows the ground truth gate poses (Tier 1), while the animation on the right shows the noisy gate poses (Tier 2 and Tier 3). In each animation, the drone is tracking a minimum jerk trajectory using one of our competition APIs.

Image shows the ground truth gate poses

The following animation shows a segment of one of our racing tracks with two drones racing against each other. Here “drone_2” (pink spline) is the opponent racer going through randomized waypoints in each gate cross section, while “drone_1” (yellow spline) is a representative competitor going through the gate centers.

This animation shows a segment of one of our racing tracks with two drones racing against each other
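
For reference, the minimum jerk trajectory mentioned above has a well-known closed form between two rest-to-rest waypoints: the quintic whose time scaling minimizes integrated squared jerk. The standalone Python sketch below is illustrative and independent of the competition APIs; the gate coordinates are hypothetical.

```python
import numpy as np

def min_jerk(p0, pf, T, n=50):
    """Minimum jerk path from p0 to pf over T seconds: the quintic that
    minimizes integrated squared jerk with zero endpoint velocity and
    acceleration."""
    p0, pf = np.asarray(p0, float), np.asarray(pf, float)
    t = np.linspace(0.0, T, n)
    tau = t / T                                 # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5  # min-jerk time scaling
    return t, p0 + (pf - p0) * s[:, None]       # timestamps, (n, 3) positions

# Hypothetical gate centers in NED coordinates (z is negative above ground).
times, pts = min_jerk([0.0, 0.0, -2.0], [10.0, 4.0, -3.0], T=3.0)
```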

The competition is being run in two stages—an initial qualification round and a final round. A set of training binaries with configurable racetracks was made available to participants at the outset, for prototyping and verification of algorithms on arbitrary racetracks. In the qualification stage (October 15 to November 21), teams were asked to submit entries for any or all of the three competition tiers. 117 teams registered for the competition worldwide, with 16 unique entries appearing on the qualification leaderboard.

We are now running the final round of the competition and the corresponding leaderboard is available here. All of the information for the competition is available at our GitHub repository, along with the training, qualification, and final race environments.

Engineering-wise, we introduced some new APIs in AirSim specifically for the competition, and we’re continually adding more features as we get feedback.
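
As a flavor of what commanding a racer looks like, here is a minimal sketch using AirSim’s standard Python client. The competition binaries expose additional race-specific calls documented in our GitHub repository, which are not shown here, and the gate object name below is a placeholder.

```python
import airsim

client = airsim.MultirotorClient()   # connect to the running AirSim binary
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()

# Query a gate's pose from the simulator and fly toward it.
# "Gate00" is a placeholder; object names depend on the race environment.
gate = client.simGetObjectPose("Gate00")
client.moveToPositionAsync(
    gate.position.x_val, gate.position.y_val, gate.position.z_val,
    velocity=5.0).join()
```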

In the long term, we intend to keep the competition open, and we will be adding more racing environments after NeurIPS 2019. While the first iteration brought an array of new features to AirSim, many essential ingredients are still needed for trustworthy autonomy in real-world scenarios and effective simulation-to-reality transfer of learned policies. These include reliable state estimation; camera sensor models and motion blur; robustness to environmental conditions like weather, brightness, and diversity in the texture and shape of the drone racing gates; and robustness to the dynamics of the quadcopter. Over the next iterations, we aim to extend the competition to focus on these components of autonomy as well.

For more of the exciting work Microsoft is doing with AirSim, see our blog post on Ignite 2019.

Acknowledgements: This work would not have been possible without the substantial team effort behind the scenes by all members of the organizing team—Ratnesh Madaan, Nicholas Gyde, Keiko Nagami, Matthew Brown, Sai Vemprala, Tim Taubner, Eric Cristofalo, Paul Stubbs, Jim Piavis, Guada Casuso, Mac Schwager, Davide Scaramuzza, and Ashish Kapoor.


Microsoft Research Open Data Project: Evolving our standards for data access and reproducible research

Datasets compilation for Open Data

Last summer we announced Microsoft Research Open Data—an Azure-based repository-as-a-service for sharing datasets—to encourage the reproducibility of research and make research data assets readily available in the cloud. Among other things, the project started a conversation between the community and Microsoft’s legal team about dataset licensing. Inspired by these conversations, our legal team developed a set of brand-new data use agreements and released them for public comment on GitHub earlier this year.

Today we’re excited to announce that Microsoft Research Open Data will be adopting these data use agreements for several datasets that we offer.

Diving a bit deeper into the new data use agreements

The Open Use of Data Agreement (O-UDA) is intended for use by an individual or organization that is able to distribute data for unrestricted uses, and for which there is no privacy or confidentiality concern. It is not appropriate for datasets that might include materials subject to privacy laws (such as the GDPR or HIPAA) or other unlicensed third-party materials. The O-UDA meets the open definition: it does not impose any restriction with respect to the use or modification of data other than ensuring that attribution and limitation-of-liability information is passed downstream. In the research context, this implies that users of the data need to cite the corresponding publication with which the data is associated. This aids the findability and reusability of data, an important tenet in the FAIR guiding principles for scientific data management and stewardship.

We also recognize that in certain cases, datasets useful for AI and research analysis may not be able to be fully “open” under the O-UDA. For example, they may contain third-party copyrighted materials, such as text snippets or images, from publicly available sources. The law permits their use for research, so following the principle that research data should be “as open as possible, as closed as necessary,” we developed the Computational Use of Data Agreement (C-UDA) to make data available for research while respecting other interests. We will prefer the O-UDA where possible, but we see the C-UDA as a useful tool for ensuring that researchers continue to have access to important and relevant datasets.

Datasets that reflect the goals of our project

The following examples reference datasets that have adopted the Open Use of Data Agreement (O-UDA).

Location data for geo-privacy research

Microsoft researcher John Krumm and collaborators collected GPS data from 21 people who carried a GPS receiver in the Seattle area. Users who provided their data agreed to it being shared as long as certain geographic regions were deleted. This work covers key research on privacy preservation of GPS data as evidenced in the corresponding paper, “Exploring End User Preferences for Location Obfuscation, Location-Based Services, and the Value of Location,” which was accepted at the Twelfth ACM International Conference on Ubiquitous Computing (UbiComp 2010). The paper has been cited 147 times, including for research that builds upon this work to further the field of preservation of geo-privacy for location-based services providers.
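
The exact redaction procedure for this dataset is described in the paper; purely as a toy illustration of region-based deletion, a GPS track can be filtered against a set of sensitive locations like this (the coordinates and radius are hypothetical):

```python
import math

def redact(track, sensitive, radius_m=500):
    """Drop GPS points within radius_m of any sensitive (lat, lon) location."""
    def dist_m(a, b):
        # Equirectangular approximation; adequate for small distances.
        dlat = math.radians(b[0] - a[0])
        dlon = math.radians(b[1] - a[1]) * math.cos(math.radians(a[0]))
        return 6371000 * math.hypot(dlat, dlon)
    return [p for p in track
            if all(dist_m(p, s) > radius_m for s in sensitive)]

track = [(47.6205, -122.3493), (47.6097, -122.3331)]
home = [(47.6205, -122.3493)]            # hypothetical sensitive location
public_track = redact(track, home)       # first fix removed before sharing
```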

Hand gestures data for computer vision

Another example dataset is that of labeled hand images and video clips collected by researchers Eyal Krupka, Kfir Karmon, and others. The research addresses an important computer vision and machine learning problem that deals with developing a hand-gesture-based interface language. The data was recorded using depth cameras and has labels that cover joints and fingertips. The two datasets included are FingersData, which contains 3,500 labeled depth frames of various hand poses, and GestureClips, which contains 140 gesture clips (100 of these contain labeled hand gestures and 40 contain non-gesture activity). The research associated with this dataset is available in the paper “Toward Realistic Hands Gesture Interface: Keeping it Simple for Developers and Machines,” which was published in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems.

Question-Answer data for machine reading comprehension

Finally, the FigureQA dataset, generated by researchers Samira Ebrahimi Kahou, Adam Atkinson, Adam Trischler, Yoshua Bengio, and collaborators, introduces a visual reasoning task specific to graphical plots and figures. The dataset has 180,000 figures with 1.3 million question-answer pairs in the training set. More details are available in the paper “FigureQA: An Annotated Figure Dataset for Visual Reasoning” and the corresponding Microsoft Research Blog post. The dataset is pivotal to developing more powerful visual question answering and reasoning models, which could improve the accuracy of AI systems involved in decision making based on charts and graphs.

The data agreements are a part of our larger goals

The Microsoft Research Open Data project was conceived from the start to reflect Microsoft Research’s commitment to fostering open science and research, and to achieve this without compromising the ethics of collecting and sharing data. Our goal is to make it easier for researchers to maintain the provenance of data while having the ability to reference and build upon it.

The addition of the new data agreements to Microsoft Research Open Data’s feature set is an exciting step in furthering our mission.

Acknowledgements: This work would not have been possible without the substantial team effort by Dave Green, Justin Colannino, Gretchen Deo, Sarah Kim, Emily McReynolds, Mario Madden, Emily Schlesinger, Elaine Peterson, Leila Stevenson, Dave Baskin, and Sergio Loscialo.


Helping first responders achieve more with autonomous systems and AirSim

With inputs from: Elizabeth Bondi (Harvard University), Bob DeBortoli (Oregon State University), Balinder Malhi (Microsoft) and Jim Piavis (Microsoft)

Autonomous systems have the potential to improve safety for people in dangerous jobs, particularly first responders. However, deploying these systems is a difficult task that requires extensive research and testing.

In April, we explored the complexities and challenges present in the development of autonomous systems and how technologies such as AirSim provide a pragmatic way to address these challenges. Microsoft believes that the key to building robust and safe autonomous systems is providing a system with a wide range of training experiences to properly expose it to many scenarios before it can be deployed in the real world. This ensures training is done in a meaningful way—similar to how a student might be trained to tackle complex tasks through a curriculum curated by a teacher.

With autonomous systems, first responders gain sight into the unknown

One way Microsoft trains autonomous systems is through participating in unique research opportunities focused on solving real-world challenges, like aiding first responders in hazardous scenarios. Recently, our collaborators at Carnegie Mellon University and Oregon State University, collectively named Team Explorer, demonstrated technological breakthroughs in this area during their first-place win at the first round of the DARPA Subterranean (SubT) Challenge.

Snapshots from the AirSim simulation showing the effects of different conditions such as water vapor, dust and heavy smoke. Such variations in conditions can provide useful data when building robust autonomous systems.


The DARPA SubT Challenge aspires to further the technologies that would augment difficult operations underground. Specifically, the challenge focuses on methods to map, navigate, and search complex underground environments, including human-made tunnel systems, urban underground, and natural cave networks. Imagine constrained environments that are several kilometers long and structured in unique ways, with regular or irregular geological topologies and patterns. Weather or other hazardous conditions, due to poor ventilation or poisonous gases, often make first responders’ work even more dangerous.

Team Explorer engaged in autonomous search and detection of several artifacts within a man-made system of tunnels. The end-to-end solution the team created required many different complex components to work across the challenging circuit, including mobility, mapping, navigation, and detection.

Microsoft’s Autonomous Systems team worked closely with Team Explorer to provide a high-definition simulation environment to help with the challenge. The team used AirSim to create an intricate maze of man-made tunnels in a virtual world that was representative of real-world tunnels in both complexity and size. The virtual world was a hybrid synthesis: a team of artists used reference material from real-world mines to modularly generate a network of interconnected tunnels spanning two kilometers in length and spread over a large area.

Additionally, the simulation included robotic vehicles—wheeled robots as well as unmanned aerial vehicles (UAVs)—and a suite of sensors that adorned the autonomous agents. AirSim provided a rich platform that Team Explorer could use to test their methods and to generate training experiences for creating various decision-making components for the autonomous agents.

At the center of the challenge was the robots’ ability to perceive the underground terrain and discover objects (such as human survivors, backpacks, cellular phones, fire extinguishers, and power drills) while adjusting to different weather and lighting conditions. Multimodal perception is important in challenging environments, and AirSim’s ability to simulate a wide variety of sensors, and to fuse their outputs, can provide a competitive edge. One of the most important sensors is LIDAR, and in AirSim the physical process of generating point clouds is carefully reconstructed in software, so the sensor on the simulated robot uses the same configuration parameters (such as number of channels, range, points per second, rotations per second, and horizontal/vertical FOVs) as those found on the real vehicle.
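
As a sketch of how such simulated sensor data is consumed, AirSim’s Python client returns each LIDAR sweep as a flat point cloud; the sensor and vehicle names below are assumptions and must match whatever is configured in the simulator’s settings.json.

```python
import numpy as np
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()

# Fetch one LIDAR sweep; names depend on the sensors configured in settings.json.
lidar = client.getLidarData(lidar_name="LidarSensor1", vehicle_name="Drone1")
points = np.array(lidar.point_cloud, dtype=np.float32).reshape(-1, 3)
print(f"{len(points)} points this sweep")  # x, y, z in the vehicle frame
```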

It is challenging to train perception modules based on deep learning models to detect the target objects using LIDAR point clouds and RGB cameras. While curated datasets, such as ScanNet and MS COCO, exist for more canonical applications, none exist for underground exploration applications. Creating a real dataset for underground environments is expensive because a dedicated team is needed to first deploy the robot, gather the data, and then label the captured data. Microsoft’s ability to create near-realistic autonomy pipelines in AirSim means that we can rapidly generate labeled training data for a subterranean environment.

Detecting animal poaching through drone simulations

With autonomous systems, the issues with data collection are further exacerbated for applications that involve first responders, since the collection process is itself dangerous. Such challenges were present in our collaboration with Air Shepherd and USC to help counter wildlife poaching.

The central task in this collaboration was the development of UAVs equipped with thermal infrared cameras that can fly through national parks at night to search for poachers and animals. The project had several challenges, the largest of which was acquiring the data needed for both training and testing. For example, labeling a real-world dataset provided by Air Shepherd took approximately 800 hours over the course of six months. This produced 39,380 labeled frames and approximately 180,000 individual poacher and animal labels on those frames. This data was used to build a prototype detection system called SPOT but did not produce acceptable precision and recall values.

AirSim was then used to create a simulation in which virtual UAVs flew over virtual environments, like those found in the Central African savanna, at altitudes of 200 to 400 feet above ground level. The simulation took on the difficult task of detecting poachers and wildlife, both during the day and at night, and ultimately increased detection precision by 35.2%.

Driving innovation through simulation

Access to simulation environments means that we have a near-infinite data generation machine, where different simulation parameters can be chosen to generate experiences at will. This capability is foundational to testing and debugging autonomous systems that would eventually be provably robust and certified. We continue to investigate such fuzzing and falsification frameworks for various AI systems.

Holistic challenges such as the DARPA SubT Challenge and partnerships with organizations like Air Shepherd allow researchers and developers to build complete solutions that cover a wide array of research topics. There are many research challenges at the intersection of robotics, simulation, and machine intelligence, and we continue to invest in toolchains that enable researchers and developers to build safe and useful simulations and robots.

We invite readers to explore AirSim on our GitHub repository and join our journey to build toolchains in collaboration with the community. The AirSim man-made cave network environment was co-created with Team Explorer for the DARPA SubT Challenge and is publicly available to researchers and developers.


New Microsoft fellowship program empowers faculty research through Azure cloud computing

August 1, 2019 | By Jamie Harper, Vice-President, US Education

Microsoft is expanding its support for academic researchers through the new Microsoft Investigator Fellowship. This fellowship is designed to empower researchers of all disciplines who plan to make an impact with research and teaching using the Microsoft Azure cloud computing platform.

From predicting traffic jams to advancing the Internet of Things, Azure has continued to evolve with the times, and this fellowship aims to keep Azure at the forefront of new ideas in cloud computing. Microsoft fellowships likewise have a long history of supporting researchers, seeking to promote diversity and promising academic research in the field of computing. This fellowship adds to that legacy, highlighting the significance of Azure in education both now and into the future.

Full-time faculty at degree-granting colleges or universities in the United States who hold PhDs are eligible to apply. The fellowship supports faculty who are currently conducting research, advising graduate students, and teaching in a classroom, and who currently use or plan to use Microsoft Azure in research, teaching, or both.

Fellows will receive $100,000 annually for two years to support their research. Fellows will also be invited to attend multiple events during this time, where they will make connections with other faculty from leading universities and Microsoft. They will have the opportunity to participate in the greater academic community as well. Members of the cohort will also be offered various training and certification opportunities.

When reviewing submissions, Microsoft will evaluate the proposed future research and teaching impact of Azure. This will include consideration of how the Microsoft Azure cloud computing platform will be leveraged, whether in scale, scope, or unique ways, for research, teaching, or both.

Candidates should submit their proposals directly on the fellowship website by August 16, 2019. Recipients will be announced in September 2019.

We encourage you to submit your proposal! For more information on the Microsoft Investigator Fellowship, please check out the fellowship website.


Podcast: The brave new world of cloud-scale systems and networking with Microsoft Research Asia’s Dr. Lidong Zhou

Dr. Lidong Zhou

Episode 82, June 26, 2019

If you’re like me, you’re no longer amazed by how all your technologies can work for you. Rather, you’ve begun to take for granted that they simply should work for you. Instantly. All together. All the time. The fact that you’re not amazed is a testimony to the work that people like Dr. Lidong Zhou, Assistant Managing Director of Microsoft Research Asia, do every day. He oversees some of the cutting-edge systems and networking research that goes on behind the scenes to make sure you’re not amazed when your technologies work together seamlessly but rather, can continue to take it for granted that they will!

Today, Dr. Zhou talks about systems and networking research in an era of unprecedented systems complexity and what happens when old assumptions don’t apply to new systems, explains how projects like CloudBrain are taking aim at real-time troubleshooting to address cloud-scale, network-related problems like “gray failure,” and tells us why he believes now is the most exciting time to be a systems and networking researcher.



Lidong Zhou: We have seen a lot of advances in, for example, machine learning and deep learning. So, one thing that we have been looking into is how we can leverage all those new technologies in machine learning and deep learning and apply it to deal with the complexity in systems.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: If you’re like me, you’re no longer amazed by how all your technologies can work for you. Rather, you’ve begun to take for granted that they simply should work for you. Instantly. All together. All the time. The fact that you’re not amazed is a testimony to the work that people like Dr. Lidong Zhou, Assistant Managing Director of Microsoft Research Asia, do every day. He oversees some of the cutting-edge systems and networking research that goes on behind the scenes to make sure you’re not amazed when your technologies work together seamlessly but rather, can continue to take it for granted that they will!

Today, Dr. Zhou talks about systems and networking research in an era of unprecedented systems complexity and what happens when old assumptions don’t apply to new systems, explains how projects like CloudBrain are taking aim at real-time troubleshooting to address cloud-scale, network-related problems like “gray failure,” and tells us why he believes now is the most exciting time to be a systems and networking researcher. That and much more on this episode of the Microsoft Research Podcast.

Host: Lidong Zhou, welcome to the podcast.

Lidong Zhou: Yes. It’s great to be here.

Host: As the Assistant Managing Director of MSR Asia, you are, among other things, responsible for overseeing research in systems and networking, and I know you’ve done a lot of research in systems and networking over the course of your career as well. So, in broad strokes, what do you do and why do you do it? What gets you up in the morning?

Lidong Zhou: Yeah, I think, you know, this is one of the most exciting times to do research in systems and networking. And we have already seen that advances in, you know, systems and networking have been pushing the envelope in many technologies. We’ve seen the internet, the web, web search, big data, and all the way to the artificial intelligence and cloud computing that, you know, everybody kind of relies on these days.

Host: Yeah.

Lidong Zhou: All those advances have created challenges of unprecedented complexity, scale and a lot of dynamism. So, my understanding, you know, of systems is always, you know, a system is about bringing order to chaos, right? The chaotic situation. So, we are actually in a very chaotic situation where things change so fast and there are a lot of, you know, new technologies coming. And so, when we talk about systems research, it’s really about transforming all those unorganized pieces into a unified whole, right? That’s why, you know, we’re very excited about all those challenges. And also, we realized over the years that it’s actually not just the typical systems expertise – when we talk about distributed systems, operating systems or networking – that’s actually not enough to address the challenges we’re facing. Like, you have to actually also master other fields like, you know, database systems and programming languages, compilers, hardware, and also in artificial intelligence and machine learning and deep learning. And what I do at Microsoft Research Asia, is to put together a team with a diverse set of expertise and inspire the team to take on those big challenges together by, you know, working together, and, you know, that’s a very exciting job to have.

Host: I love the “order out of chaos” representation… if you’ve ever been involved in software code writing, you write this here and someone else is writing that there, and it has to work together, and you’ve got ten other people writing… and we all just take for granted, on my end, it’s going to work. And if it doesn’t, I curse my computer!

Lidong Zhou: Yes, that’s our problem!

Host: Well, I had Hsiao-Wuen Hon on the podcast in November for the 20th anniversary of the lab there, and he talked about the mission to, in essence, both advance the theory and practice of computing, in general. Your own nearly twenty-year career has been about advancing the theory and practice of distributed systems, particularly. So, talk about some of the initiatives you’ve been part of and technical contributions you’ve made to distributed systems over the years. You’ve just come off the heels of talking about the complexities. Now, how have you seen it evolve over those years?

Lidong Zhou: You know, I think we are getting into the year of distributed systems. Being a distributed systems person, we always believe, you know, what we’re working on is the most important piece. You know, I think Microsoft Research is really a great place to connect theory and practice, because we are constantly exposed to very difficult technical challenges from the product teams. They’re tackling very difficult problems, and we also have the luxury of stepping back and thinking deeply about the problems we’re facing and thinking about what kinds of new theories we want to develop, what new methodologies we can develop to address those problems. I remember, you know, in early 2000, when Microsoft started doing web search, and we had a meeting with the dev manager, who was actually in charge of architecting the web search system. And so, we had a, you know, very interesting discussion. We talked about, you know, how we were doing research in distributed systems, how we had to deal with, you know, a lot of problems when services fail. So, we have to make sure that the whole service actually stays correct in the face of all kinds of problems that you can see in a distributed system. I remember at that time, we had Roy Levin, Leslie Lamport, you know, a lot of colleagues, and we talked about protocols. And, at the beginning, the dev manager basically said, oh yeah, I know, you know, it’s complicated to deal with all these failures, but it’s actually under control. And a couple months later, he came back and said, oh, you know, there’s so many corner cases. It’s just beyond our capability of reasoning about the correctness. And we need the protocols that we were talking about. But it’s also interesting that, you know, in developing those protocols, we tend to make some assumptions. Say, okay, you know, we can tolerate a certain number of failures. And one question that the general manager asked was, you know, what happens if we have more than that number of failures in the system, right? And from a practical point of view, you have to deal with those kinds of situations. In theory, when you work on theory, then, you know, you can say, okay, let’s make an assumption and let’s just work under that assumption. So, we see that there’s a difference between theory and practice. The nice thing about working at Microsoft Research is you can actually get exposed to those real problems and keep you honest about what assumptions are reasonable, what assumptions are not reasonable. And then you think about, you know, what is the best way of solving those problems in a more general sense rather than just solving a particular problem?

Host: Your work in networked computer systems is somewhat analogous to another passion of yours that I’m going to call “networked human systems.” In other words, your desire to build community among systems researchers. How are you going about that? I’m particularly interested in your Asia Pacific Systems workshop and the results you’ve seen come out of that.

Lidong Zhou: So, I moved to Microsoft Research Asia in late 2008, and, when I was in the United States, clearly there is a very strong systems community. And, over the years, we’ve also seen that community sort of expanding into Europe. So, the European systems community sort of started the systems workshop, and eventually it evolved into a conference called EuroSys, and very successfully. And you know we see a lot of people getting into systems and networking because of the community, because of the influence of those conferences. And the workshop has been very successful in gathering momentum in the region. And so, in 2010, I remember it was Chandu Thekkath and Rama Kotla who were my colleagues at Microsoft Research, and they basically had this idea that maybe we should start something also in the Asia Pacific region. At that time, I was already working in Beijing, and I thought, you know, this is also part of my obligation. So, in 2010, we started the first Asia Pacific systems workshop. And it was a humble beginning. We had probably about thirty submissions and accepted probably a dozen. It was a good workshop, but it was a very humble beginning, as I said. But what happened after that was really beyond our expectation. It’s like, you know, we just planted a seed, and the community sort of picked it up and grew with it. And, you know, it’s very satisfying to see that we’re actually going to have the tenth workshop in Hangzhou in August. If you look at the organizing committee, they are really you know all world-class researchers from all over the world. It’s not just from a particular region, but you know really, all the experts across the world contributed to the success of this workshop over the last, you know, almost ten years now. And the impact that this workshop has is actually pretty tremendous.

Host: What would you attribute it to?

Lidong Zhou: I think it’s really, first of all, this is the natural trend, right? You go from… the U.S. was leading in systems research, and then it expanded to Europe. And it’s just a natural trajectory to expand further to Asia Pacific given that, you know, a lot of technological advances are happening in Asia. And the other reason, you know, is that the community really came together. There are a lot of top systems researchers who originally, just like me, came from the Asia Pacific region. So, we have a lot of incentive and commitment to give back.

Host: Right.

Lidong Zhou: And all those enthusiasms, passion, or the willingness to help young researchers in the region, I mean those actually contributed to the success of the workshop, in my view.

Host: Well, you were recently involved in hosting another interesting workshop, or conference: The Symposium on Operating Systems Principles, right?

Lidong Zhou: Right.

Host: SOSP?

Lidong Zhou: SOSP.

Host: And this was in Shanghai in 2017. It’s the premier conference for computer systems technology. And as I understand, it’s about as hard to win the bid for as the Olympics!

Lidong Zhou: Yes, almost.

Host: So why was it important to host this conference for you, and how do you think it will help broaden the reach of the systems community worldwide?

Lidong Zhou: So, SOSP is one of the most important systems conferences and traditionally, it has been held in the U.S. and later on, they started rotating into Europe. And it was really a very interesting journey that we went through, along with Professor Haibo Chen who is from Shanghai Jiao Tong University. We started pitching for having SOSP in the Asia Pacific region in 2011. That was like six years before we actually succeeded! We pitched three times. But overall, even for the first time, the community was very supportive in many ways, so that we’d be very careful to make sure that the first one is going to be a success. And in 2017, when Haibo and I opened the conference, I was actually very happy that I didn’t have to be there to make another pitch! I was essentially opening the conference. And it was very successful in the sense that we had a record number of attendees, over eight hundred people…

Host: Wow.

Lidong Zhou: …and we had almost the same number, if not a little bit more, from the U.S. and Europe. And we had, you know, many more people from the region, which was what we intended.

Host: Mm-hmm.

Lidong Zhou: And having the conference in the Asia Pacific is actually very significant to the region. We’re seeing more and more high-quality work and papers in those top conferences from the Asia Pacific region, you know, from Korea, India, China, and many other countries.

Host: Right.

Lidong Zhou: And I’d like to believe that what we have done sort of helped a little bit in those regards.

(music plays)

Host: Let’s talk about the broader topic of education for a minute. This is really, really important for the systems talent pipeline around the world. And perhaps the biggest challenge is expanding and improving university-level education for this talent pipeline. MSRA has been hosting a systems education workshop for the past three years. The fourth is coming up this summer, and none other than Turing Award winner John Hopcroft has praised it as “a step toward improving education and cultivating world-class talent.” And he also said a fifth of the world’s talent is in the Asia Pacific region, so we’d better get over there. Tell us about this ongoing workshop.

Lidong Zhou: Yeah, actually John really inspired us to get this started I think more than three years ago.

Host: Mm-hmm.

Lidong Zhou: And I think we’re seeing a need to improve, you know, systems education. But more importantly, I think, for MSR Asia, one of the things that we’re very proud of doing is connecting educators and researchers from all over the world, especially connecting people from, you know, the U.S. and Europe with those in the Asia Pacific region. And the other thing that we are also very proud of doing is cultivating the next generation of computer scientists. And certainly, as you said, you know, the most important thing is education. And during the process, what we found, is that there are a lot of professors who share the same passion. And we’re talking about, you know, a couple of professors, Lorenzo Alvisi from Cornell and Robbert van Renesse from Cornell and Geoff Voelker from UCSD… they actually came all the way from the U.S. just to be at the workshop, talking to all the systems professors from all over the country in China. And so, I attended those workshops myself. The first one was five days, and the next two were, like, three days. It’s a huge time commitment.

Host: Yeah.

Lidong Zhou: But you see all the passion from those professors. They’re really into improving teaching. They’re trying to figure out, you know, how to make students more engaged, how to get them excited about systems, even how to design experiments, all those aspects. And, you know, we’re really optimistic that with those passionate professors, we’re going to see a very strong new generation of systems researchers. And this is, you know, I think the kind of impact we really want to see from a perspective of, you know, Microsoft Research Asia. It’s not just about making the lab successful, but, if we can make an impact in the community in terms of talent, in terms of the quality of education, that’s much more satisfying.

Host: Before we get into specific work, I’d like you to talk about what you’d referred to as a fundamental shift in the way we need to design systems – and by we, I mean you – in the era of cloud computing and AI. You’ve suggested that things have changed enough that the older methodologies and principles aren’t valid anymore. So, unpack that for us. What’s changed and what needs to happen to build next-gen systems?

Lidong Zhou: Yeah, that’s a great question. I’ll continue with the story about building fault-tolerant systems. So, in the last thirty years, we have been working on systems reliability, and we have developed a lot of techniques, a lot of protocols, and we think it will solve all the problems. But if you look at how this thread of work started, it really started in the late seventies when we were looking at the reliability of airplanes, and so on. Of course, you know, there are assumptions we make about the kinds of failures in those kinds of systems. And we sort of generalize those protocols so that it can be applicable up until now. But if you look at the cloud, it’s much more complicated, in many dimensions. And the system also evolves very quickly. And a lot of assumptions we make actually start to break. And even though we have applied all these well-known techniques, that’s just not enough. So, that’s one aspect. The other aspect is, it used to be that, you know, the system we build, we can sort of understand how it works, right? And now, the complexity has already gone beyond our own understanding, you know. We can’t reason about how the system behaves. On the other hand, we have seen a lot of advances in, for example, machine learning and deep learning. So, one thing that we have been looking into is how we can leverage all those new technologies in machine learning and deep learning and apply it to deal with the complexity in systems. And that’s, you know, another very fascinating area that we’re looking into as well.

Host: Yeah. Well, let’s get specific now. Another super interesting area of research deals with exceptions and failures in the cloud-scale era and how you’re dealing with what you call “gray failure.” And you’ve also called it the gray swan (which I want you to explain) or the Achilles heel of cloud-scale systems. So how did you handle exceptions and failures in a somewhat less complex, pre-cloud era and what new methodologies are you trying to implement now?

Lidong Zhou: Right. So, as I mentioned, in the older days, we were targeting those systems with assumptions about failures, right? Like crash failures, you know, a component can fail… when it fails, it crashes. It stops working. And nowadays, we realize, you know, this kind of assumption no longer holds. So, this is why we defined a new type of failure called gray failure. We were thinking about what kind of name to give to this very interesting new line of research that we’re starting, so we called it gray swan. People already know about the black swan or the gray rhino. So first of all, because we’re talking about the cloud, we want something not as heavy as a rhino!

Host: Right.

Lidong Zhou: We want something that can fly. And the reason we call it gray is because, you know, a system component is no longer just black or white. It could be in a weird state where, from some of the observers, it’s actually behaving correctly, but from the others, it’s actually not. And that turns out to be behind many of the major problems that we’re seeing in the cloud. And it has some components of the black swan in the sense that some of the assumptions we’re making break. So that’s why everything we build on top of that assumption starts to break down. So, for example, I mentioned the assumption about failure, right? If you think that a component either crashed or is correct, then it’s a very simple kind of world, right? But if that’s not the case, then all the protocols that worked under that assumption cease to work. It also has this connection with the gray rhino, because the gray rhino is this problem that everybody sort of sees coming, and it’s a very major problem, but people tend to ignore it for the wrong reasons. And in our case, we know that, for the cloud, all those service disruptions happen all the time, and there are actually failures all over the place. It’s just very hard to figure out which ones are important. But we know something big is going to happen at some point, right? So, we try to use this notion of gray swan to describe this new line of thinking where, you know, we really think about failures that are not just crash failures, and not even, you know, Byzantine failures, where it’s essentially arbitrary failures. But there’s something in between that we should reason about, and then use those failure models to reason about the correctness of the whole service.
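
(To make the observer-dependent flavor of a gray failure concrete, here is a minimal sketch in Python. It is purely illustrative, not Microsoft code; the switch, ports, and probe setup are invented.)

```python
# Purely illustrative sketch of a gray failure: the component's health
# depends on who is observing it, so no single probe sees the whole truth.
class NetworkSwitch:
    """A switch that silently drops traffic on one port while others work."""
    def __init__(self, bad_port: int):
        self.bad_port = bad_port

    def forward(self, port: int, packet: bytes) -> bool:
        # Neither a crash nor arbitrary behavior: one port just goes dark.
        return port != self.bad_port

switch = NetworkSwitch(bad_port=7)

# Observer A: a health prober whose traffic uses port 3. Verdict: healthy.
observer_a_ok = all(switch.forward(3, b"ping") for _ in range(100))

# Observer B: an application whose flows hash to port 7. Verdict: broken.
observer_b_ok = all(switch.forward(7, b"ping") for _ in range(100))

print(observer_a_ok, observer_b_ok)  # True False -> a "gray" disagreement
```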

Host: So, does the word catastrophic enter into this at all? Or is it…

Lidong Zhou: Yes! That could be catastrophic. Eventually.

Host: How does that kind of thinking play into what you’re doing?

Lidong Zhou: If you look at the cloud system, it’s like a rhino sort of charging towards you, and before it hits you, there’s a lot of dust, and, you know, noise and other things. But you just don’t know when and how something bad is going to happen, right? And it could be catastrophic. It has actually happened a couple of times already. And so, one of the things we try to do is to figure out when and how bad things could happen to prevent catastrophic failures…

Host: Right.

Lidong Zhou: …from all the dust and maybe, you know, other signals we have in the system. There are signals. It’s just we don’t know how to leverage them.

Host: Part of your approach to coping with gray failures is a line of research you call CloudBrain.

Lidong Zhou: Right.

Host: And it’s all about automatic troubleshooting for the cloud. It’s actually a huge issue because of the remarkable complexity of the systems. So, tell us how CloudBrain, and what you call DeepView, are actually helping operators – the people that have to deal with it on the ground – simplify how they write troubleshooting algorithms.

Lidong Zhou: Mm-hmm. So, I think CloudBrain is one of the efforts that we have to deal with gray failures. And remember, you know, we talked about the challenges that come from the complexity of the system or the scale of the system. It really has, you know, a huge number of components interacting with each other. But on the other hand, we can really leverage the scale of the system to help us in terms of, you know, diagnosis, detecting problems, even figuring out where the problem is. And this is the premise of the CloudBrain project. So, it actually has three components, three ideas. The first one is really the notion of near-real-time monitoring. So instead of trying to look at the logs after the fact and then analyze what happened, we try to have a pulse on what the system is doing, how it’s doing, and so on. So that’s the first component. And the second component is we really want to form a global view. So, it’s not just one observation we make about the system, but really observations from all over the system combined, so we can actually understand how the system is behaving and which part is actually having a problem. And then, the third part is, once you have, you know, all these global observations that come in real time, then we can use statistical methods to really reason about, you know, what’s abnormal and so on. So, this is where we really leverage the scale, the huge amount of data…

Host: Right.

Lidong Zhou: …that used to be a challenge and now it becomes an opportunity for us to actually come up with new solutions to handle the complexity of the system.

Host: So how does that help an operator simplify writing an algorithm?

Lidong Zhou: Right, so now, the operator actually has all the data in near real time. And, you know, you can write this very simple algorithm that operates on the data sort of like a SQL query.

Host: Right.

Lidong Zhou: Right? And then it can emit signals and, you know, tell people that something’s wrong or something’s correct, or maybe we have to pay attention to a part of the system that seems to have some problems.
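
(As a rough illustration of that “simple algorithm over near-real-time data” idea, an operator-level check can be little more than an aggregate query that flags outliers against the fleet-wide baseline. The schema, numbers, and threshold below are invented for illustration; they are not CloudBrain’s actual interface.)

```python
# Toy version of an operator's troubleshooting "query" over near-real-time,
# fleet-wide telemetry: flag nodes whose error rate is a statistical outlier.
import pandas as pd

telemetry = pd.DataFrame({        # one row per node, refreshed continuously
    "node":       [f"n{i}" for i in range(1, 9)],
    "error_rate": [0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.01, 0.35],
})

mean = telemetry["error_rate"].mean()
std = telemetry["error_rate"].std()

# Leverage the scale of the system: "abnormal" is defined relative to peers,
# not by a hand-tuned per-node threshold.
suspects = telemetry[telemetry["error_rate"] > mean + 2 * std]
print(suspects)                   # n8 stands out from the rest of the fleet
```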

Host: So where is this gray failure research, with all its pieces and parts, in the pipeline for production?

Lidong Zhou: Overall, we are not at the stage where we have solved all the problems, but we have pieces of technology we developed to solve some specific problems. DeepView and CloudBrain are, you know, two projects that have already been incorporated in Azure to deal with network-related problems, for example.

Host: Mm-hmm.

Lidong Zhou: But, you know, we are far from solving the problem. It’s really sort of a research agenda that we set out probably for years to come. And one idea that we have been working on, which is actually very interesting, is that we really have to change how we view programs. In the past, for defensive programming, we have been trained to handle exceptions, and it turns out that handling exceptions in a large, complex system is not enough. So, one of the ideas that we’ve been thinking about is changing exception handling into exception or error reporting. So, you start to collect all those signals. We talked about, you know, the dust when the…

Host: Right.

Lidong Zhou: …rhino comes charging at you. So, you have to really collect those dusts towards one place so that you can actually reason about the behavior of the system. And that’s, you know, one of those major shifts…

Host: Yeah.

Lidong Zhou: …that, you know, we see coming even in how we develop systems.

Host: Right.

Lidong Zhou: Not just, you know, after the fact, we already have this beast and now we need to understand what’s going on.

Host: Right.

Lidong Zhou: So those methodologies, I think, are where we’re pushing. You know, it’s not just solving a specific problem. We have an incident; we try to solve this problem. Yeah, we can do that. But more importantly… this goes back to the theory meets practice…

Host: Right.

Lidong Zhou: …so, we need to move beyond looking at the specific instances and think about, you know, what methodologies we should adopt to change the status quo completely.

Host: So how do you implement, then, a brand-new thing? I mean, we talked about the beast that already exists, and is growing. What are you proposing with your research?

Lidong Zhou: Right, so, this is always a hard problem. We already have something running, and it has to keep running, and now it has a lot of problems we need to solve. So, one of the ways we deal with those challenges is trying to solve the current problems. You know, like CloudBrain and DeepView sort of try to fit into the current practice. But for some other projects, what we do is, like, you know, what I talked about, changing from exception handling to error reporting. We actually built a system that can automatically transform a piece of code that does error handling in the traditional way into a piece of code that does error reporting in the way that we desire.
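
(A heavily simplified sketch of that shift, for readers who want to see it in code. This is illustrative Python, not the team’s actual tool; the component name is invented, and the collector here is just an in-memory list standing in for a real telemetry pipeline.)

```python
# Illustrative only: a decorator that turns local exception handling into
# error reporting, so failure signals land in one central place.
import functools
import time
import traceback

REPORTS = []  # central place where the "dust" is collected

def report_errors(component: str):
    """Report every failure to a central collector, then still fail loudly."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                REPORTS.append({
                    "component": component,
                    "error": repr(exc),
                    "stack": traceback.format_exc(),
                    "time": time.time(),
                })
                raise  # reporting replaces silent handling, not failure itself
        return wrapper
    return decorate

@report_errors("storage.frontend")
def read_block(block_id: int) -> bytes:
    raise IOError(f"block {block_id} unreachable")  # stand-in for real work
```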

Host: Right.

Lidong Zhou: And that helps because we don’t want everybody to rewrite the whole code base.

Host: No.

Lidong Zhou: It’s just not possible. So, we have to find ways to help developers sort of do the transformation and also live with the current boundaries of the system. And hopefully, gradually, we’ll move in the right direction.

Host: Yeah, I think you see that in just about every place software exists: there’s a legacy system, and you’ve got to retrofit some stuff, which adds complexity to it.

Lidong Zhou: That’s right.

Host: But you can’t just make everyone throw out what they’re already using. So, this is a big challenge. I’m glad you’re on the job.

(music plays)

Host: Well, we talked about what gets you up in the morning and all the work you’re doing to make sure that everything goes right… that is basically what you’re doing, is trying to make everything go right…

Lidong Zhou: Right.

Host: …but as we know – as you know more than I know – something always goes wrong!

Lidong Zhou: Right, unfortunately.

Host: The rhino… So, given what you see in your work every day, is there anything that keeps you up at night?

Lidong Zhou: Yes, I think we’re realizing that the kinds of distributed systems we’re designing, or building, are becoming more and more important. They’re becoming part of the sort of critical infrastructure of our society. And that puts a lot of burden on us to make sure that whatever we’re building can be mission critical.

Host: Right.

Lidong Zhou: And you know, we have a lot of researchers working on formal methods and verification, just to make sure that the core of the system can be verified, which gives some assurance that it’s actually working correctly. And, you know, we talked about applying machine learning and deep learning mechanisms, but those are statistical. So sometimes – actually, naturally – there are cases where they break. So how we can safeguard this kind of system from what you call catastrophic issues is another thing that we have been putting a lot of thought into. And we’re not short of challenges, especially on making the cloud infrastructure really, you know, mission critical!

Host: Lidong, tell us your story. How did you end up at Microsoft Research, and how did you develop your path to the positions you hold right now?

Lidong Zhou: Yeah, looking back, I remember when I finished my PhD, I started job hunting and I got, you know, a couple of offers, and I talked to my advisor. Of course, that’s what you do when you’re a graduate student. And he basically gave me a very simple piece of advice. He basically said, well, just go where you can find the best colleagues, colleagues of maybe, you know, Turing Award caliber. So, I ended up going to the Microsoft Research lab where, at that time, we didn’t have a Turing Award winner, but within ten years, we had two! So that was how things started. Looking back, what was really important was the quality of colleagues I had, especially in the early stages of my career. That’s where I learned how to do research, in some sense. It’s not about getting papers published; it’s internal passion that drives research. And I think the first phase of my career was more about personal development. I remember being pushed by my manager at the time, Roy Levin, to get out of my comfort zone. I started as a sort of technical contributor, but then I was pushed to lead a project, and there are always new challenges that you face. And you get a lot of support from your colleagues to get to the next stage, and that’s very satisfying. And then I went to MSR Asia, where I later became a manager of a research group, and I think that’s sort of the second phase of my career, where it’s not about my personal career development. It’s also about building a team and how you can contribute to other people’s success. And that turns out to be even more satisfying, to see the impact you can have on other people’s careers and their success. And also, during that period of time, I realized that it’s not just about your own team. You know, we can build the best systems research team in Asia Pacific, but it’s more satisfying if you can contribute to the community. And we talked about starting the workshop and getting the conference into Asia Pacific, and, you know, a lot of other things that we do to contribute to society, including, you know, talent fostering and many other things. And those, in my mind, become even more critical as we move on in our careers.

Host: Yeah.

Lidong Zhou: So, I view this as sort of the three stages of my career. It started with personal development, learning, you know, what it means to love what you do and do what you love. And then you think about how you can contribute to other people’s success and increase your ability to influence and impact others positively. And finally, what you can contribute to society, to the community. And I’ve been very fortunate to have been working with a lot of great, you know, leaders and colleagues, and I’ve learned a lot along the way. And I remember, you know, I worked with a lot of product teams as well. And they also offered a lot of career advice and support. So, this is just, you know, my story, I guess.

Host: You know, it sounds to me like almost a metaphor. You know, you start with yourself, you grow and mature outwards to others, and then the broader community impact that ultimately a mature person wants to see happen, right?

Lidong Zhou: I hope so!

Host: I get the sense that it is!

Lidong Zhou: It’s just about seeking the truth. It’s not about, you know, getting papers published. It’s not about, you know, chasing fame or all those things that make us lose sight of what the true meaning of research is. It’s not about all these results that we try to get, but truly, it’s about finding the truth and enjoying the process along the way.

Host: At the end of each podcast, I ask my guests to give some parting advice to our listeners. What big, unsolved problems do you see on the horizon for researchers who may just be getting their feet wet with systems and networking research?

Lidong Zhou: Well, I think they are very fortunate to be young researchers in systems and networking now. I remember I was talking to Butler Lampson when I started my career in 2003, and he said, you know, he was feeling lucky that he was doing all the work in the late seventies and early eighties because it was the right time to see a paradigm shift. And I think, now, we are at the point where we’re going to see another major paradigm shift, just like, you know, the folks at Xerox PARC. What they did was, essentially, to define computing for the next thirty years. Even now, we’re sort of living in the world that they defined, looking at the PC, even with the phone. I mean, that’s just a different form factor, right? They sort of defined the mouse, the laser printer, all the things that we know about, and the user interface. And the reason that happened at that time was because computing was becoming, you know, more powerful, moving from supercomputers to personal computing, because…

Host: Right.

Lidong Zhou: …you know, we can pack so much computation power into a small machine. And now, I think, you know, the computation power has reached another milestone where computing capability is going to be everywhere. And we’re going to have intelligence everywhere around us. The boundary between sort of the virtual world in computers and our physical world will disappear. And that will lead to really paradigm-shifting opportunities where we figure out, you know, what computing really means in the next, you know, ten years, twenty years. And this is what I would encourage everyone to focus on, rather than just incremental improvements to the protocols and so on. Because we are really seeing a lot of assumptions being invalidated. And we really have to look at the world in a very different way, from, you know, how we interact with sort of the computing capability to how we expose computing capability to do what we need to do. And it’s not just doing computing in front of a computer but, you know, doing everything with sort of the computing capability around us. And that’s just exciting to imagine. And I can’t even describe what the future will look like, but it’s up to our young researchers to really make it a reality.

Host: Lidong Zhou, it’s been an absolute pleasure. Thanks for joining us in the booth today.

Lidong Zhou: Thank you, Gretchen. Really a pleasure.

(music plays)

To learn more about Dr. Lidong Zhou and how researchers are working to bring order out of systems and networking chaos, visit


Project Triton and the physics of sound with Microsoft Research’s Dr. Nikunj Raghuvanshi

Episode 68, March 20, 2019

If you’ve ever played video games, you know that for the most part, they look a lot better than they sound. That’s largely due to the fact that audible sound waves are much longer – and a lot more crafty – than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels – in rooms, around corners, behind walls, out doors – and he’s using computational physics to do it.

Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes.


Final Transcript

Nikunj Raghuvanshi: In a game scene, you will have multiple rooms, you’ll have caves, you’ll have courtyards, you’ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it’s making its way around geometry. And the question becomes how do you compute even the direct sound? Even the initial sound’s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: If you’ve ever played video games, you know that for the most part, they look a lot better than they sound. That’s largely due to the fact that audible sound waves are much longer – and a lot more crafty – than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels – in rooms, around corners, behind walls, out doors – and he’s using computational physics to do it.

Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes. That and much more on this episode of the Microsoft Research Podcast.

Host: Nikunj Raghuvanshi, welcome to the podcast.

Nikunj Raghuvanshi: I’m glad to be here!

Host: You are a senior researcher in MSR’s Interactive Media Group, and you situate your research at the intersection of computational acoustics and graphics. Specifically, you call it “fast computational physics for interactive audio/visual applications.”

Nikunj Raghuvanshi: Yep, that’s a mouthful, right?

Host: It is a mouthful. So, unpack that! How would you describe what you do and why you do it? What gets you up in the morning?

Nikunj Raghuvanshi: Yeah, so my passion is physics. I really like the mixture of computers and physics. So, the way I got into this was, many, many years ago, I picked up this book on C++ and it was describing graphics and stuff. And I didn’t understand half of it, and there was a color plate in there. It took me two days to realize that those were not photographs, they were generated by a machine, and I was like, somebody took a photo of a world that doesn’t exist. So, that is what excites me. I was like, this is amazing. This is as close to magic as you can get. And then I used to build these little simulations, and the exciting thing is, you just code up these laws of physics into a machine and you see all this behavior emerge out of it. And you didn’t tell the world to do this or that. It’s just basic Newtonian physics. So, that is computational physics. And when you try to do this for games, the challenge is you have to be super-fast. You have 1/60th of a second to render the next frame, to produce the next buffer of audio, right? So, that’s the fast portion. How do you take all these laws and compute the results fast enough that it can happen at 1/60th of a second, repeatedly? So, that’s where the computer science enters the physics part of it. And that’s the sort of mixture of things I like to work in.

Host: You’ve said that light and sound, or video and audio, work together to make gaming, augmented reality, virtual reality, believable. Why are the visual components so much more advanced than the audio? Is it because the audio is the poor relation in this equation, or is it that much harder to do?

Nikunj Raghuvanshi: It is kind of both. Humans are visually dominant creatures, right? Visuals are what is on our conscious mind, and when you describe the world, our language is so visual, right? Even for sound, sometimes we use visual metaphors to describe things. So, that is part of it. And part of it is also that for sound, the physics is in many ways tougher, because you have much longer wavelengths and you need to model wave diffraction, wave scattering and all these things to produce a believable simulation. So, that is the physical aspect of it. And also, there’s a perceptual aspect. Our brain has evolved in a world where both audio and visual cues exist, and our brain is very clever. It exploits the physical aspects of both, which give us separate information, unique information. So, visuals give you line-of-sight, high resolution, right? But audio is lower resolution directionally, and it goes around corners. It goes around rooms. That’s why if you put on your headphones and just listen to music at a loud volume, you are a danger to everybody on the street, because you have no awareness.

Host: Right.

Nikunj Raghuvanshi: So, audio is the awareness part of it.

Host: That is fascinating because you’re right. What you can see is what is in front of you, but you could hear things that aren’t in front of you.

Nikunj Raghuvanshi: Yeah.

Host: You can’t see behind you, but you can hear behind you.

Nikunj Raghuvanshi: Absolutely, you can hear behind yourself and you can hear around stuff, around corners. You can hear stuff you don’t see, and that’s important for anticipating stuff.

Host: Right.

Nikunj Raghuvanshi: People coming towards you and things like that.

Host: So, there’s all kinds of people here that are working on 3D sound and head-related transfer functions and all that.

Nikunj Raghuvanshi: Yeah, Ivan’s group.

Host: Yeah! How is your work interacting with that?

Nikunj Raghuvanshi: So, that work is about, if I tell you the spatial sound field around your head, how does it translate into a personal experience in your two ears? So, the HRTF modeling is about that aspect. My work with John Snyder is about, how does the sound propagate in the world, right?

Host: Interesting.

Nikunj Raghuvanshi: So, if there is a sound down a hallway, what happens during the time it gets from there up to your head? That’s our work.

Host: I want you to give us a snapshot of the current state-of-the-art in computational acoustics, and there are apparently two main approaches in the field. What are they, what’s the case for each, and where do you land on this spectrum?

Nikunj Raghuvanshi: So, there’s a lot of work in room acoustics where people are thinking about, okay, what makes a concert hall sound great? Can you simulate a concert hall before you build it, so you know how it’s going to sound? And, based on the constraints in those areas, people have used a lot of ray tracing approaches, which borrow from a lot of literature in graphics. And for graphics, ray tracing is the main technique, and it works really well, because the idea is you’re using a short-wavelength approximation. So, light wavelengths are submicron, and if they hit something, they get blocked. But the analogy I like to use is, sound is very different; the wavelengths are much bigger. So, you can hold your thumb out in front of you and blot out the sun, but you are going to have a hard time blocking out the sound of thunder with a thumb held out in front of your ear, because the waves will just wrap around. And that’s what motivates our approach, which is to actually go back to the physical laws and say, instead of doing the short-wavelength approximation for sound, we revisit and say, maybe sound needs the more fundamental wave equation to be solved, to actually model these diffraction effects for us. The usual thinking is that, you know, in games, you want a certain set of perceptual cues. We want walls to occlude sound, we want a small room to reverberate less, we want a large hall to reverberate more. And the thought is, why are we solving this expensive partial differential equation again? Can’t we just find some shortcut to jump straight to the answer instead of going through this long-winded route of physics? And our answer has been that you really have to do all the hard work, because there’s a ton of information that’s folded in, and what seems easy to us as humans isn’t quite so easy for a computer, and there’s no neat trick to get you straight to the perceptual answer you care about.
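
(For a flavor of what “solving the wave equation” means computationally, here is a textbook one-dimensional finite-difference sketch in Python. It is far simpler than the 3-D solvers Project Triton uses and is included only as an illustration; all grid sizes and constants are arbitrary.)

```python
# Textbook 1-D wave equation, p_tt = c^2 * p_xx, solved with explicit
# finite differences. Illustrative only; real acoustics solvers work in 3-D.
import numpy as np

c, dx = 343.0, 0.01            # speed of sound (m/s), grid spacing (m)
dt = 0.5 * dx / c              # time step chosen to satisfy CFL stability
n = 400
prev = np.zeros(n)             # pressure at time t - dt
curr = np.zeros(n)             # pressure at time t
curr[n // 2] = 1.0             # an initial pressure impulse mid-grid

coef = (c * dt / dx) ** 2
for _ in range(500):
    nxt = np.zeros(n)          # endpoints stay zero: simple reflecting ends
    nxt[1:-1] = (2 * curr[1:-1] - prev[1:-1]
                 + coef * (curr[2:] - 2 * curr[1:-1] + curr[:-2]))
    prev, curr = curr, nxt

print(float(np.abs(curr).max()))   # the impulse has propagated outward
```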

(music plays)

Host: Much of the work in audio and acoustic research is focused on indoor sound, where the sound source is within the line of sight and the listener can see what they’re listening to…

Nikunj Raghuvanshi: Um-hum.

Host: …and you mentioned that the concert hall has a rich literature in this field. So, what’s the gap in the literature when we move from the concert hall to the computer, specifically in virtual environments?

Nikunj Raghuvanshi: Yeah, so games and virtual reality, the key demand they have is the scene is not one room, and with time it has become much more difficult. So, a concert hall is terrible if you can’t see the people who are playing the sound, right? So, it allows for a certain set of assumptions that work extremely nicely. The direct sound, which is the initial sound, which is perceptually very critical, just goes in a straight line from source to listener. You know the distance so you can just use a simple formula and you know exactly how loud the initial sound is at the person. But in a game scene, you will have multiple rooms, you’ll have caves, you’ll have courtyards, you’ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it’s making its way around geometry. And the question becomes, how do you compute even the direct sound? Even the initial sound’s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly. So, that’s the challenge.

Host: All right. So, let’s talk about how you’re addressing it. A recent paper that you’ve published made some waves, sound waves probably. No pun intended… It’s called Parametric Directional Coding for Precomputed Sound Propagation. Another mouthful. But it’s a great paper and the technology is so cool. Talk about this research that you’re doing.

Nikunj Raghuvanshi: Yeah. So, our main idea is, actually, to look at the literature in lighting again and see the kind of path they followed to address this computational challenge of how you do these extensive simulations and still hit that stringent CPU budget in real time. And one of the key ideas is you precompute. You cheat. You just look at the scene and compute everything you need to compute beforehand, right? Instead of trying to do it on the fly during the game. So, it does introduce the limitation that the scene has to be static. But then you can do these very nice physical computations and you can ensure that the whole thing is robust, it is accurate, it doesn’t suffer from all the sort of corner cases that approximations tend to suffer from, and you have your result. You basically have a giant look-up table. If somebody tells you that the source is over there and the listener is over here, tell me what the loudness of the sound would be. We just say, okay, we have this giant table; we’ll just go look it up for you. And that is the main way we bring the CPU usage under control. But it generates a knock-on challenge: now we have this huge table, this huge amount of data that we’ve stored, and it’s 6-dimensional. The source can move in 3 dimensions and the listener can move in 3 dimensions. So, we have this giant table, which is terabytes or even more of data.

Host: Yeah.

Nikunj Raghuvanshi: And the game’s typical budget is like 100 megabytes. So, the key challenge we’re facing is, how do we fit everything in that? How do we take this data and extract out something salient that people listen to and use that? So, you start with the full computation, you start as close to nature as possible, and then we say, okay, now what would a person hear out of this? Right? Now, instead of taking a shortcut, let’s think about, okay, a person hears the direction the sound comes from. If there is a doorway, the sound should come from the doorway. So, we pick out these perceptual parameters that are salient for human perception, and then we store those. That’s the crucial way you bring down this enormous data set to a memory budget that’s feasible.
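
(A cartoon of that precompute-then-look-up recipe, sketched in Python. The cell size, field names, and parameter values are all invented; the real encoding is far more compact and sophisticated.)

```python
# Cartoon of the precomputed look-up idea: quantize positions into cells,
# store a few perceptual parameters per (source cell, listener cell) pair
# offline, then answer runtime queries with a dictionary lookup.
from dataclasses import dataclass

@dataclass
class AcousticParams:
    loudness_db: float       # how loud the initial sound arrives
    arrival_dir: tuple       # direction the initial sound arrives from
    reverb_time_s: float     # how long the space reverberates

CELL = 2.0                   # meters; coarse quantization keeps the table small

def cell(pos):
    return tuple(int(p // CELL) for p in pos)

# In the real pipeline this table is filled offline by the wave simulation;
# two hand-made entries stand in for it here.
TABLE = {
    (cell((0, 0, 0)), cell((5, 0, 0))): AcousticParams(-12.0, (1.0, 0.0, 0.0), 0.4),
    (cell((0, 0, 0)), cell((5, 4, 0))): AcousticParams(-20.0, (0.6, 0.8, 0.0), 1.1),
}

def query(source, listener):
    """Runtime cost is a table lookup, not a wave simulation."""
    return TABLE.get((cell(source), cell(listener)))

print(query((0.5, 0.3, 0.0), (5.2, 0.1, 0.0)))  # -> the first entry's params
```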

Host: So, that’s the paper.

Nikunj Raghuvanshi: Um-hum.

Host: And how has it played out in practice, or in project, as it were?

Nikunj Raghuvanshi: So, a little bit of history on this: we had a paper at SIGGRAPH 2010, me and John Snyder and some academic collaborators, and at that point, we were trying to think of just physical accuracy. So, we took the physical data and we were trying to stay as close to physical reality as possible, and we were rendering that. And around 2012, we got to talking with Gears of War, the studio, and we were going through what the budgets would be, how things would be. And we were like, we need… this needs to… this is gigabytes, it needs to go to megabytes…

Host: Really?

Nikunj Raghuvanshi: …very quickly. And that’s when we were like, okay, let’s simplify. Like, what are the four, like, most basic things that you really want from an acoustic system? Like, walls should occlude sound and things like that. So, we kind of rewound and came to it from this perceptual viewpoint that I was just describing. Let’s keep only what’s necessary. And that’s how we were able to ship this in 2016 in Gears of War 4, by just rewinding and doing this process.

Host: How is that playing into, you know… Project Triton is the big project that we’re talking about. How would you describe what that’s about and where it’s going? Is it everything you’ve just described, or are there… other aspects to it?

Nikunj Raghuvanshi: Yeah. Project Triton is this idea that you should precompute the wave physics, instead of starting with approximations. Approximate later. That’s one idea of Project Triton. And the second is, if you want to make it feasible for real games and real virtual reality and augmented reality, switch to perceptual parameters. Extract that out of this physical simulation and then you have something feasible. And the path we are on now, which brings me back to the recent paper you mentioned…

Host: Right.

Nikunj Raghuvanshi: …is, in Gears of War, we shipped some set of parameters. We were like, these make a big difference. But one thing we lacked was if the sound is, say, in a different room and you are separated by a doorway, you would hear the right loudness of the sound, but its direction would be wrong. Its direction would be straight through the wall, going from source to listener.

Host: Interesting.

Nikunj Raghuvanshi: And that’s an important spatial cue. It helps you orient yourself when sounds funnel through doorways.

Host: Right.

Nikunj Raghuvanshi: Right? And it’s a cue that sound designers really look for and try to hand-tune to get good ambiances going. So, in the recent 2018 paper, that’s what we fixed. We call this portaling. It’s a made-up word for this effect of sounds going around doorways, but that’s what we’re modeling now.

Host: Is this new stuff? I mean, people have tackled these problems for a long time.

Nikunj Raghuvanshi: Yeah.

Host: Are you people the first ones to come up with this, the portaling and…?

Nikunj Raghuvanshi: I mean, the basic ideas have been around. People know that, perceptually, this is important, and there are approaches to try to tackle this, but I’d say, because we’re using wave physics, this problem becomes much easier because you just have the waves diffract around the edge. With ray tracing you face the difficult problem that you have to trace out the rays “intelligently” somehow to hit an edge, which is like hitting a bullseye, right?

Host: Right.

Nikunj Raghuvanshi: You have to hit the edge exactly so that the ray can wrap around it. So, it becomes really difficult. Most practical ray tracing systems don’t try to deal with this edge diffraction effect because of that. Although there are academic approaches to it, in practice it becomes difficult. But as I’ve worked on this over the years, I’ve kind of realized these are the real advantages of this. The disadvantages are pretty clear: it’s slow, right? So, you have to precompute. But we’re realizing, over time, that going to physics has these advantages.

Host: Well, but the precompute part is innovative in terms of a thought process on how you would accomplish the speed-up…

Nikunj Raghuvanshi: There have been papers on precomputed acoustics academically before, but this realization that mixing precomputation and extracting these perceptual parameters? That is a recipe that makes a lot of practical sense. Because a third thing that I haven’t mentioned yet is, going to the perceptual domain, now the sound designer can make sense of the numbers coming out of this whole system. Because it’s loudness. It’s reverberation time, how long the sound is reverberating. And these are numbers that are super-intuitive to sound designers; they already deal with them. So, now what you are telling them is, hey, you used to start with a blank world, which had nothing, right? Like the world before the act of creation, there’s nothing. It’s just empty space and you are trying to make things reverberate this way or that. Now you don’t need to do that. Now physics will execute first, on the actual scene with the actual materials, and then you can say, I don’t like what physics did here or there, let me tweak it, let me modify what the real result is and make it meet the artistic goals I have for my game.

(music plays)

Host: We’ve talked about indoor audio modeling, but let’s talk about the outdoors for now and the computational challenges of making natural outdoor sounds sound convincing.

Nikunj Raghuvanshi: Yeah.

Host: How have people hacked it in the past and how does your work in ambient sound propagation move us forward here?

Nikunj Raghuvanshi: Yeah, we’ve hacked it in the past! Okay. This is something we realized on Gears of War because the parameters we use were borrowed, again, from the concert hall literature and, because they’re parameters informed by concert halls, things sound like halls and rooms. Back in the days of Doom, this tech would have been great because it was all indoors and rooms, but in Gears of War, we have these open spaces and it doesn’t sound quite right. Outdoors sounds like a huge hall and you know, how do we do wind ambiances and rain that’s outdoors? And so, we came up with a solution for them at that time which we called “outdoorness.” It’s, again, an invented word.

Host: Outdoorness.

Nikunj Raghuvanshi: Outdoorness.

Host: I’m going to use that. I like it.

Nikunj Raghuvanshi: Because the idea it’s trying to convey is, it’s not a binary indoor/outdoor. When you are crossing a doorway or a threshold, you expect a smooth transition. You expect that, I’m not hearing rain inside, I’m feeling nice and dry and comfortable and now I’m walking into the rain…

Host: Yeah.

Nikunj Raghuvanshi: …and you want the smooth transition on it. So, we built sort of a custom tech to do that outdoor transition. But it got us thinking about, what’s the right way to do this? How do you produce the right sort of spatial impression of, there’s rain outside, it’s coming through a doorway, the doorway is to my left, and as you walk, it spreads all around you. You are standing in the middle of rain now and it’s all around you. So, we wanted to create that experience. So, the ambient sound propagation work was an intern project, and we finished it up with our collaborators at Cornell. And that was about, how do you model extended sound sources? Again, going back to concert halls, usually people have dealt with point-like sources, which might have a directivity pattern. But rain is like a million little drops. If you try to model each and every drop, that’s not going to get you anywhere. So, that’s what the paper is about: how do we treat it as one aggregate? We produce an aggregate sort of energy distribution of that thing, along with its directional characteristics, and just encode that.

Host: And just encode it.

Nikunj Raghuvanshi: And just encode it.
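
(A rough sketch of that aggregation idea in Python, with made-up numbers: rather than simulating a million raindrops at runtime, collapse them into a small directional energy histogram at the listener and encode that once. The paper’s actual encoding is more principled than this.)

```python
# Made-up numbers, real idea: collapse many raindrop sources into one
# compact directional energy histogram at the listener, encoded once.
import numpy as np

rng = np.random.default_rng(0)
drops = rng.uniform(-10.0, 10.0, size=(100_000, 2))  # drop positions (x, y)
listener = np.array([0.0, 0.0])

vecs = drops - listener
dist = np.linalg.norm(vecs, axis=1) + 1e-6            # avoid divide-by-zero
energy = 1.0 / dist**2                                # inverse-square falloff
angles = np.arctan2(vecs[:, 1], vecs[:, 0])           # incoming direction

# 100,000 point sources become 16 directional bins: the "aggregate".
bins = np.linspace(-np.pi, np.pi, 17)
hist, _ = np.histogram(angles, bins=bins, weights=energy)

print(hist.round(2))   # roughly uniform: the rain surrounds the listener
```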

Host: How is it working?

Nikunj Raghuvanshi: It works nice. It sounds good. To my ears it sounds great.

Host: Well you know, and you’re the picky one, I would imagine.

Nikunj Raghuvanshi: Yeah. I’m the picky one and also when you are doing iterations for a paper, you also completely lose objectivity at some point. So, you’re always looking for others to get some feedback.

Host: Here, listen to this.

Nikunj Raghuvanshi: Well, reviewers give their feedback, so, yeah.

Host: Sure. Okay. Well, kind of riffing on that, there’s another project going on that I’d love for you to talk as much as you can about called Project Acoustics and kind of the future of where we’re going with this. Talk about that.

Nikunj Raghuvanshi: That’s really exciting. So, up to now, Project Triton was an internal tech which we managed to propagate from research into actual Microsoft product, internally.

Host: Um-hum.

Nikunj Raghuvanshi: Project Acoustics is being led by Noel Cross’s team in Azure Cognition. And what they’re doing is turning it into a product that’s externally usable. So, trying to democratize this technology so it can be used by any game audio team anywhere backed by Azure compute to do the precomputation.

Host: Which is key, the Azure compute.

Nikunj Raghuvanshi: Yeah, because you know, it took us a long time, with Gears of War to figure out, okay, where is all this precompute going to happen?

Host: Right.

Nikunj Raghuvanshi: We had to figure out the whole cluster story for ourselves, how to get the machines, how to procure them, and there’s a big headache of arranging compute for yourself. And so that’s, logistically, a key problem that people face when they try to think of precomputed acoustics. On the run-time side of Project Acoustics, we are going to have plug-ins for all the standard game audio engines and everything. So, that makes things simpler on that side. But a key blocker, in my view, was always this question of, where are you going to precompute? So, now the answer is simple. You get your Azure Batch account and you just send your stuff up there and it just computes.

Host: Send it to the cloud and the cloud will rain it back down on you.

Nikunj Raghuvanshi: Yes. It will send down data.

Host: Who is your audience for Project Acoustics?

Nikunj Raghuvanshi: Project Acoustics, the audience is the whole game audio industry. And our real hope is that we’ll see some uptake on it when we announce it at GDC in March. We want as many teams as possible, small, big, medium, everybody, to start using this, because we feel there’s a positive feedback loop that can be set up: designers realize that they have these new tools available, tools that have shipped in Triple-A games, so they do work, and they give us feedback. If they use these tools, we hope that they can produce new audio experiences that are distinctly different, so that they can then say to their tech director, or somebody, for the next game, we need more CPU budget, because we’ve shown you value. So, a big exercise was how to fit this within current budgets, so people can produce these examples of novel possible experiences and argue for more, to increase the budget for audio and kind of bring it on par with graphics over time, as you alluded to earlier.

Host: You know, if we get nothing across in this podcast, it’s like, people, pay attention to good audio. Give it its props. Because it needs it. Let’s talk briefly about some of the other applications for computational acoustics. Where else might it be awesome to have a layer of realism with audio computing?

Nikunj Raghuvanshi: One of the applications that I find very exciting is for audio rendering for people who are blind. I had the opportunity to actually show the demo of our latest system to Daniel Kish, who, if you don’t know, he’s the human echo-locator. And he uses clicks from his mouth to actually locate geometry around him and he’s always oriented. He’s an amazing person. So that was a collaboration, actually, we had with a team in the Garage. They released a game called Ear Hockey and it was a nice collaboration, like there was a good exchange of ideas over there. That’s nice because I feel that’s a whole different application where it can have a potential social positive impact. The other one that’s very interesting to me is that we lived in 2-D desktop screens for a while and now computing is moving into the physical world. That’s the sort of exciting thing about mixed reality, is moving compute out into this world. And then the acoustics of the real world being folded into the sounds of virtual objects becomes extremely important. If something virtual is right behind the wall from you, you don’t want to listen to it with full loudness. That would completely break the realism of something being situated in the real world. So, from that viewpoint, good light transport and good sound propagation are both required things for the future compute platform in the physical world. So that’s a very exciting future direction to me.

(music plays)

Host: It’s about this time in the podcast I ask all my guests the infamous “what keeps you up at night?” question. And when you and I talked before, we went down kind of two tracks here, and I felt like we could do a whole podcast on it, but sadly we can’t… But let’s talk about what keeps you up at night. Ironically to tee it up here, it deals with both getting people to use your technology…

Nikunj Raghuvanshi: Um-hum.

Host: And keeping people from using your technology.

Nikunj Raghuvanshi: No! I want everybody to use the technology. But I’d say, like, five years ago, what used to keep me up at night was, how are we going to ship this thing in Gears of War? Now what’s keeping me up at night is, how do we make Project Acoustics succeed, and how do we, you know, expand the adoption of it and, in a small way, try to move the game audio industry forward a bit and help artists do the artistic expression they need to do in games? So, that’s what I’m thinking right now: how can we move things forward in that direction? I frankly look at video games as an art form. And I’ve gamed a lot in my time. To be honest, not all of it was art; I was enjoying myself a lot and I wasted some time playing games. But we all have our ways to unwind and waste time. But good games can be amazing. They can be much better than a Hollywood movie in terms of what you leave them with. And I just want to contribute in my small way to that. Giving artists the tools to maybe make the next great story, you know.

Host: All right. So, let’s do talk a little bit, though, about this idea of you make a really good game…

Nikunj Raghuvanshi: Um-hum.

Host: Suddenly, you’ve got a lot of people spending a lot of time. I won’t say wasting. But we have to address the nature of gaming, and the fact that there are you know… you’re upstream of it. You are an artist, you are a technologist, you are a scientist…

Nikunj Raghuvanshi: Um-hum.

Host: And it’s like I just want to make this cool stuff.

Nikunj Raghuvanshi: Yeah.

Host: Downstream, it’s people want people to use it a lot. So, how do you think about that and the responsibilities of a researcher in this arena?

Nikunj Raghuvanshi: Yeah. You know, this reminds me of Kurt Vonnegut’s book, Cat’s Cradle. There’s a scientist who makes ice-nine, and it freezes the whole planet or something. So, you see things about video games in the news and stuff. But I frankly feel that the kind of games I’ve participated in making, these games are very social experiences. People meet in the games a lot. Like, Sea of Thieves is all about, you get a bunch of friends together, you’re sitting on the couch together, and you’re just going crazy on these pirate ships and trying to just have fun. So, they are not the sort of games where a person is being separated from society by the act of gaming, just immersed in the screen and not participating in the world. They are kind of the opposite. So, games have all these aspects. And so, I personally feel pretty good about the games I’ve contributed to. I can at least say that.

Host: So, I like to hear personal stories of the researchers that come on the podcast. So, tell us a little bit about yourself. When did you know you wanted to do science for a living and how did you go about making that happen?

Nikunj Raghuvanshi: Science for a living? I was the guy in 6th grade who’d get up and say I want to be a scientist. So, that was then, but what got me really hooked was graphics, initially. Like I told you, I found the book which had these color plates and I was like, wow, that’s awesome! So, I was at UNC Chapel Hill, graphics group, and I studied graphics for my graduate studies. And then, in my second or third year, my advisor, Ming Lin, she does a lot of research in physical simulations. How do we make water look nice in physical simulations? Lots of it is CGI. How do you model that? How do you model cloth? How do you model hair? So, there’s all this physics for that. And so, I took a course with her and I was like, you know what? I want to do audio because you get a different sense, right? It’s simulation, not for visuals, but you get to hear stuff. I’m like okay, this is cool. This is different. So, I did a project with her and I published a paper on sound synthesis. So, like how rigid bodies, like objects rolling and bouncing around and sliding make sound, just from physical equations. And I found a cool technique and I was like okay, let me do acoustics with this. It’s going to be fun. And I’m going to publish another paper in a year. And here I am, still trying to crack that problem of how to do acoustics in spaces!

Host: Yeah, but what a place to be. And speaking of that, you have a really interesting story about how you ended up at Microsoft Research and brought your entire PhD code base with you.

Nikunj Raghuvanshi: Yeah. It was an interesting time. So, when I was graduating, MSR was my number one choice because I was always thinking of this technology as, it would be great if games used this one day. This is the sort of thing that would have a good application in games. And then, around that time, I got hired into MSR; it was a multicore incubation back then, and my group was looking at how these multicore systems enable all sorts of cool new things. And one of the things my hiring manager was looking at was how we can do physically based sound synthesis and propagation. That’s what my PhD was, so they licensed the whole code base and I built on that.

Host: You don’t see that very often.

Nikunj Raghuvanshi: Yeah, it was nice.

Host: That’s awesome. Well, Nikunj, as we close, I always like to ask guests to give some words of wisdom or advice or encouragement, however it looks to you. What would you say to the next generation of researchers who might want to make sound sound better?

Nikunj Raghuvanshi: Yeah, it’s an exciting area. It’s super-exciting right now. Because even like just to start from more technical stuff, there are so many problems to solve with acoustic propagation. I’d say we’ve taken just the first step of feasibility, maybe a second one with Project Acoustics, but we’re right at the beginning of this. And we’re thinking there are so many missing things, like outdoors is one thing that we’ve kind of fixed up a bit, but we’re going towards what sorts of effects can you model in the future? Like directional sources is one we’re looking at, but there are so many problems. I kind of think of it as the 1980s of graphics when people first figured out that you can make this work. You can make light propagation work. What are the things that you need to do to make it ever closer to reality? And we’re still at it. So, I think we’re at that phase with acoustics. We’ve just figured out this is one way that you can actually ship in practical applications and we know there are deficiencies in its realism in many, many places. So, I think of it as a very rich area that students can jump in and start contributing.

Host: Nowhere to go but up.

Nikunj Raghuvanshi: Yes. Absolutely!

Host: Nikunj Raghuvanshi, thank you for coming in and talking with us today.

Nikunj Raghuvanshi: Thanks for having me.

(music plays)

To learn more about Dr. Nikunj Raghuvanshi and the science of sound simulation, visit


Email overload: Using machine learning to manage messages, commitments

As email continues to be not only an important means of communication but also an official record of information and a tool for managing tasks, schedules, and collaborations, making sense of everything moving in and out of our inboxes will only get more difficult. The good news is there’s a method to the madness of staying on top of your email, and Microsoft researchers are drawing on this behavior to create tools to support users. Two teams working in the space will be presenting papers at this year’s ACM International Conference on Web Search and Data Mining February 11–15 in Melbourne, Australia.

“Identifying the emails you need to pay attention to is a challenging task,” says Partner Researcher and Research Manager Ryen White of Microsoft Research, who manages a team of about a dozen scientists and engineers and typically receives 100 to 200 emails a day. “Right now, we end up doing a lot of that on our own.”

According to the McKinsey Global Institute, professionals spend 28 percent of their time on email, so thoughtful support tools have the potential to make a tangible difference.

“We’re trying to bring in machine learning to make sense of a huge amount of data to make you more productive and efficient in your work,” says Senior Researcher and Research Manager Ahmed Hassan Awadallah. “Efficiency could come from a better ability to handle email, getting back to people faster, not missing things you would have missed otherwise. If we’re able to save some of that time so you could use it for your actual work function, that would be great.”

Email deferral: Deciding now or later

Awadallah has been studying the relationship between individuals and their email for years, exploring how machine learning can better support users in their email responses and help make information in inboxes more accessible. During these studies, he and fellow researchers began noticing varying behavior among users. Some tackled email-related tasks immediately, while others returned to messages multiple times before acting. The observations led them to wonder: How do users manage their messages, and how can we help them make the process more efficient?

“There’s this term called ‘email overload,’ where you have a lot of information flowing into your inbox and you are struggling to keep up with all the incoming messages,” explains Awadallah, “and different people come up with different strategies to cope.”

In “Characterizing and Predicting Email Deferral Behavior,” Awadallah and his coauthors reveal the inner workings of one such common strategy: email deferral, which they define as seeing an email but waiting until a later time to address it.

The team’s goal was twofold: to gain a deep understanding of deferral behavior and to build a predictive model that could help users in their deferral decisions and follow-up responses. The team—a collaboration between Microsoft Research’s Awadallah, Susan Dumais, and Bahareh Sarrafzadeh, lead author on the paper and an intern at the time, and Christopher Lin, Chia-Jung Lee, and Milad Shokouhi of the Microsoft Search, Assistant and Intelligence group—dedicated a significant amount of resources to the former.

“AI and machine learning should be inspired by the behavior people are doing right now,” says Awadallah.

The probability of deferring an email based on the workload of the user as measured by the number of unhandled emails. The number of unhandled emails is one of many features Awadallah and his coauthors used in training their deferral prediction model.

The team interviewed 15 subjects and analyzed the email logs of 40,000 anonymous users, finding that people defer for several reasons: They need more time and resources to respond than they have in that moment, or they’re juggling more immediate tasks. They also factor in who the sender is and how many others have been copied. Some of the more interesting reasons revolved around perception and boundaries, with users delaying (or not) in order to set expectations about how quickly they respond to messages.

The researchers used this information to create a dataset of features—such as the message length, the number of unanswered emails in an inbox, and whether a message was human- or machine-generated—to train a model to predict whether a message is deferred. The model has the potential to significantly improve the email experience, says Awadallah. For example, email clients could use such a model to remind users about emails they’ve deferred or even forgotten about, saving them the effort they would have spent searching for those emails and reducing the likelihood of missing important ones.
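
(To make the modeling setup concrete, here is a hedged sketch of a deferral predictor in Python. The features mirror those described above, but the data, the labeling rule, and the model choice are invented for illustration; they are not the paper’s actual dataset or model.)

```python
# Illustrative deferral-prediction setup: per-message features -> a binary
# label, "was this email deferred?" All data below is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(20, 2000, n),    # message length (characters)
    rng.integers(0, 80, n),       # unhandled emails in inbox (workload)
    rng.integers(0, 2, n),        # machine-generated? (0/1)
    rng.integers(0, 15, n),       # number of recipients copied
])
# Synthetic rule standing in for real logs: busy users tend to defer long,
# human-written emails.
y = ((X[:, 0] > 800) & (X[:, 1] > 30) & (X[:, 2] == 0)).astype(int)

model = GradientBoostingClassifier().fit(X[:800], y[:800])
print("holdout accuracy:", model.score(X[800:], y[800:]))
```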

“If you have decided to leave an email for later, in many cases, you either just rely on memory or more primitive controls that your mail client provides like flagging your message or marking the message unread, and while these are useful strategies, we found that they do not provide enough support for users,” says Awadallah.

Commitment detection: A promise is a promise

Among the deluge of incoming emails are outgoing messages containing promises we make—promises to provide information, set up meetings, or follow up with coworkers—and losing track of them has ramifications.

“Meeting your commitments is incredibly important in collaborative settings and helps build your reputation and establish trust,” says Ryen White.

Current commitment detection tools, such as those available in Cortana, are pretty effective, but there’s room for further advancement. White, lead author Hosein Azarbonyad, who was interning with Microsoft at the time of the work, and coauthor Microsoft Research Principal Applied Scientist Robert Sim seek to tackle one particular obstacle in their paper “Domain Adaptation for Commitment Detection in Email”: bias in the datasets available to train commitment detection models.

Researcher access is generally limited to public corpora, which tend to be specific to the industry they're from. In this case, the team used public datasets of email from the energy company Enron and an unspecified tech startup referred to as "Avocado." They found a significant disparity between models trained and evaluated on the same collection of emails and models trained on one collection and applied to another; the cross-domain models performed noticeably worse.

“We want to learn transferable models,” explains White. “That’s the goal—to learn algorithms that can be applied to problems, scenarios, and corpora that are related but different to those used during training.”

To accomplish this, the group turned to transfer learning, which has been effective in other scenarios where datasets aren’t representative of the environments in which they’ll ultimately be deployed. In their paper, the researchers train their models to remove bias by identifying and devaluing certain information using three approaches: feature-level adaptation, sample-level adaptation, and an adversarial deep learning approach that uses an autoencoder.

Emails contain many words and phrases, some more likely to signal a commitment—"I will," "I shall," "let you know"—than others. In the Enron corpus, domain-specific words like "Enron," "gas," and "energy" may be overweighted in any model trained from it. Feature-level adaptation attempts to replace or transform these domain-specific terms, or features, with similar domain-specific features in the target domain, explains Sim. For instance, "Enron" might be replaced with "Avocado," and "energy forecast" might be replaced with a relevant tech industry term. Sample-level adaptation, meanwhile, elevates emails in the training dataset that resemble emails in the target domain, downweighting those that aren't very similar. So if an Enron email is "Avocado-like," the researchers give it more weight during training.
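As a rough illustration of the sample-level idea, the sketch below weights each source-domain (Enron-style) email by its similarity to the target-domain (Avocado-style) corpus before training; the emails, labels, and TF-IDF cosine-similarity measure are all invented for illustration.

```python
# Sketch of sample-level adaptation: weight source emails by their
# similarity to the target corpus, then train with those weights.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

source_emails = ["I will send the gas forecast tomorrow",  # commitment
                 "The weekly energy report is attached"]   # no commitment
source_labels = [1, 0]
target_emails = ["I shall follow up after the app release",
                 "Let you know once the sprint review ends"]

vec = TfidfVectorizer().fit(source_emails + target_emails)
S, T = vec.transform(source_emails), vec.transform(target_emails)

# Weight = mean cosine similarity of each source email to the target
# corpus, so "Avocado-like" Enron emails count more during training.
weights = cosine_similarity(S, T).mean(axis=1)

clf = LogisticRegression().fit(S, source_labels, sample_weight=weights)
print(clf.predict(vec.transform(["I will follow up on the release"])))
```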

General schema of the proposed neural autoencoder model used for commitment detection.

The most novel—and successful—of the three techniques is the adversarial deep learning approach, which in addition to training the model to recognize commitments also trains the model to perform poorly at distinguishing between the emails it’s being trained on and the emails it will evaluate; this is the adversarial aspect. Essentially, the network receives negative feedback when it indicates an email source, training it to be bad at recognizing which domain a particular email comes from. This has the effect of minimizing or removing domain-specific features from the model.

“There’s something counterintuitive to trying to train the network to be really bad at a classification problem, but it’s actually the nudge that helps steer the network to do the right thing for our main classification task, which is, is this a commitment or not,” says Sim.
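Here is a minimal PyTorch sketch of that adversarial mechanism using a gradient-reversal layer, one standard way to implement this kind of training; the layer sizes and toy batch are illustrative assumptions, and the paper's full model also involves an autoencoder.

```python
# Sketch of adversarial domain adaptation via gradient reversal (PyTorch).
# The shared encoder learns commitment detection while receiving reversed
# gradients from the domain head, so it "unlearns" domain-specific cues.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # flip the gradient: be *bad* at domain ID

encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
commitment_head = nn.Linear(128, 2)  # commitment vs. not
domain_head = nn.Linear(128, 2)      # Enron vs. Avocado

x = torch.randn(8, 300)               # toy batch of email embeddings
y_commit = torch.randint(0, 2, (8,))  # toy commitment labels
y_domain = torch.randint(0, 2, (8,))  # toy domain labels

h = encoder(x)
loss = (F.cross_entropy(commitment_head(h), y_commit)
        + F.cross_entropy(domain_head(GradReverse.apply(h)), y_domain))
loss.backward()  # encoder gets reversed gradients from the domain head
```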

Empowering users to do more

The two papers align with the broader Microsoft goal of empowering individuals to do more, targeting a space full of opportunity for greater productivity and efficiency.

Reflecting on his own email habits, which have him in his inbox frequently throughout the day, White questions the cost-benefit of some of that behavior.

“If you think about it rationally, it’s like, ‘Wow, this is a thing that occupies a lot of our time and attention. Do we really get the return on that investment?’” he says.

He and other Microsoft researchers are confident that continued exploration of the tools needed to support users can help them feel better about the answer.


Podcast: Putting the ‘human’ in human computer interaction with Haiyan Zhang


Haiyan Zhang, Innovation Director

Episode 62, February 6, 2019

Haiyan Zhang is a designer, technologist and maker of things (really cool technical things) who currently holds the unusual title of Innovation Director at the Microsoft Research lab in Cambridge, England. There, she applies her unusual skillset to a wide range of unusual solutions to real-life problems, many of which draw on novel applications of gaming technology in serious areas like healthcare.

On today’s podcast, Haiyan talks about her unique “brain hack” approach to the human-centered design process, and discusses a wide range of projects, from the connected play experience of Zanzibar, to Fizzyo, which turns laborious breathing exercises for children with cystic fibrosis into a video game, to Project Emma, an application of haptic vibration technology that, somewhat curiously, offsets the effects of tremors caused by Parkinson’s disease.


Episode Transcript

Haiyan Zhang: We started out going very broad, and looking at lots of different solutions out there, not necessarily just for tremor, but across the spectrum to address different symptoms of Parkinson’s disease. And this is actually really part of this whole design thinking methodology which is to look at analogous experiences. So, taking your core problem and then looking at adjacent spaces where there might be solutions in a completely different area that can inform upon the challenge that you are tackling.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Haiyan Zhang is a designer, technologist and maker of things (really cool technical things) who currently holds the unusual title of Innovation Director at the Microsoft Research lab in Cambridge, England. There, she applies her unusual skillset to a wide range of unusual solutions to real-life problems, many of which draw on novel applications of gaming technology in serious areas like healthcare.

On today’s podcast, Haiyan talks about her unique “brain hack” approach to the human-centered design process, and discusses a wide range of projects, from the connected play experience of Zanzibar, to Fizzyo, which turns laborious breathing exercises for children with cystic fibrosis into a video game, to Project Emma, an application of haptic vibration technology that, somewhat curiously, offsets the effects of tremors caused by Parkinson’s disease. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: Haiyan Zhang, welcome to the podcast.

Haiyan Zhang: Hi, thanks Gretchen. Great to be here.

Host: You are the Innovation Director at MSR Cambridge in England, which is a super interesting title. What is an Innovation Director? What does an Innovation Director do? What gets an Innovation Director up in the morning?

Haiyan Zhang: I guess it is quite an unusual title. It’s a kind of a bespoke role, I would say, because of my quite unusual background, I guess. Part of what I do is look at how technology can be applied in real use cases in the world to create business impact, within Microsoft and outside of Microsoft, and to make those connections between our deeply technical research with applied product groups across the company.

Host: So, is this a job that existed at MSR in Cambridge or did you arrive with this unique set of talents and skills and background and ability, and bring the job with you?

Haiyan Zhang: I would say it’s something I brought with me and it’s evolving over time. (laughs)

Host: Well, unpack that a little bit. How has it evolved since you began? When did you begin?

Haiyan Zhang: So, I actually joined Microsoft about five and a half years ago and I actually initially joined as part of the Xbox organization, running an innovation team in Xbox in London and looking at new play experiences for kids, for teens, that were completely outside of the box. And then from that, I transitioned into Microsoft Research. And part of my team also continued on that research in terms of creating completely new technology experiences around entertainment. And more recently, I’m working across the lab with various projects to see how we can connect our sort of fundamental computer science work better with products across Microsoft in terms of Azure cloud infrastructure, in terms of Xbox and gaming, in terms of Office and productivity.

Host: You’ve been in high-tech for nearly twenty years and you’ve worked in engineering and user experience and research… R&D, hardware, service design, etc., and even out in the “blue-sky envisioning space.” So, that brings a lot to the party in the form of one person. (laughter) Quite frankly, I’m impressed. How has your experience in each, or all, of these areas informed how you approached the research you do today?

Haiyan Zhang: Well thanks, Gretchen. I’m really… I’m quite honored to be on the podcast actually because I’m so impressed with all the researchers that you’ve been interviewing across all the MSR labs. So, I would say that, in the research work that I do, I bring a very human-centered lens to looking at technology. So I undertake a full, human-centered design process starting from talking to people, getting empathy with people, trying to extract insight from what people really need and then going deeply into the technical research to develop prototypes, technology ideas to support those needs, and then deploying those prototypes in the field to understand how that can be improved and how we can evolve our technology thinking.

Host: Let’s talk about design thinking, then, for a minute. I don’t know if you’d call it discrete from computational thinking or any other kind of thinking, but it seems to be a buzz phrase right now. So, as a self-described designer, technologist and maker of things, how would you define design thinking?

Haiyan Zhang: So, I would say that design thinking is not separate from computational thinking, it’s a layer above. It’s just an approach to problem-solving, and it’s basically a tool kit that allows you to utilize different methods to really gain an understanding of people’s needs, to gain an understanding of insight into how people’s lives can be improved through technology, and then tools around prototyping and evaluating those prototypes. So, I would say that it is not, in itself, a scientific method, but it can be used to improve and augment your existing practice.

Host: Let’s get specific now and talk about some of those projects that you’ve been working on, starting with Project Zanzibar. What was the inspiration behind this project? How did you bring it to life and how does it embody your idea of connected play experiences that you’ve talked about?

Haiyan Zhang: I think there is a rich history in computer science of tangible user interfaces. You know, some of the early work at Xerox PARC even or at the MIT Media Lab around how we can create these seamless interactions between people, between their physical environment and between a digital universe. And I think the approach we had to Zanzibar was that the most fruitful area for exploration in tangible user interfaces would be to enable kids to play and learn through physicality. Through interacting with physical objects that were augmented with virtual information, because we’re really trying to tap into this idea of multi-modal learning and learning through play. So, just coming from this initial approach, we dove very deeply into how would we invent a completely new technology platform to enable people to very seamlessly manipulate objects in a natural way using gestures, and then bring about some new digital experiences layered on top of that, that were games or education scenarios and then sort of bringing those together in terms of really fundamental technology invention, but also applications that could demonstrate what that technology could do.

Host: Right. Well, and it’s too bad that this is an audio-only experience here on the podcast because there’s a really cool overview of this project on the Microsoft Research website and it’s a very visual, artifact-based approach to playing with computers.

Haiyan Zhang: Yeah, yeah. And I encourage everyone to visit the project page and take a look at some of the videos and our prototypes that we have published.

Host: Right. So, what was the thinking behind tying in the artifact and the digital?

Haiyan Zhang: You know, there’s this rich history of research with physical objects and we’ve proven out that physical/digital interaction is a great way forward in terms of novel interactions between people and computing. But the pragmatics of these systems have not been ideal. You know, if you have to be sat at your desk and there has to be an overhead camera, usually a lot of research projects involve this or there’s occlusion in terms of where your hand can be and where the physical objects can be because the cameras won’t be able to track it. So, what we set out to do was think about well, how would you design a technology platform that overcomes a lot of these barriers to these platforms so that we can then be freed up to think about those scenarios, but we can also empower other researchers who are doing research in this space to think about those scenarios. So, our research group, we had this idea of leveraging NFC, but leveraging it in terms of an NFC antenna array so that we could track objects in a 2-D space. And then the additional novelty was also layering that with a capacitive multi-touch layer so that we could track both the objects in terms of the physical IDs of the objects on top of this surface area. The capacitive multi-touch would enhance that tracking that the NFC provided, but also, we could track hand gestures, both in terms of multi-touch gestures on top of the surface and also some hover gestures just above the surface as well.

(music plays)

Host: Let’s talk a bit about another really cool project that you’re working on. I know Cambridge, your lab, is deeply, and maybe even uniquely, invested in researching and developing technologies to improve healthcare, and you have a couple projects going on in this area. One of them, Project Fizzyo. I’ll spell it. It’s F (as in Frank)-i-z-z-y-o. Tell us about this project. How did it come about? What’s the technology behind it and how does it work?

Haiyan Zhang: So, Fizzyo really started as a collaboration with the BBC and we were inspired by one family. The mom, Vicky, she has four kids and two of her boys have cystic fibrosis, they have a genetic condition where their internal organs are constantly secreting mucus. And so, every day, twice a day, the boys have to do this laborious breathing exercise to expel the mucus from their lungs, and it involves breathing into these plastic apparatuses. And they basically apply pressure to your breath so that when you breathe, it creates an oscillating effect in your lungs and loosens the mucus, and then it culminates in you coughing and trying to cough out the mucus from your lungs. They’re usually plastic devices, where, as you blow, the air kind of enters a chamber and there might be some sort of mechanism that oscillates the air like a ball-bearing that bounces up and down, and so they are very low-fi, so there’s no digital aspect to these devices. And you can imagine, these kids, they are having to do these exercises from a very early age, from as early as they can remember, twice a day for 30 minutes, for an hour at a time. It’s really intensive and it can be, you know, if not painful, at least really uncomfortable to do. And I actually tried to do this once and I felt really light-headed. I actually couldn’t do one session of it. And also, the kids, they want to be outside playing with their friends. You know, they don’t want to be stuck indoors doing this all the time. And there is no direct thread from doing the exercise to feeling an improvement because the activity is about maintenance, so you are trying to maintain your health because if you don’t clear the mucus from your lungs, infection can set in and that means going to the hospital, that means getting antibiotics. And so, it’s a very challenging thing for Vicky, their mom, to be jostling them, be harassing them to do this all the time. And she said that her role has really changed with her kids and that she’s no longer a mom, she’s sort of nagging them all the time. And so, we visited with the family to really understand their plight. And she asked, you know, can we create a piece of technology that can help us in getting the kids to do this kind of physio, the treatment is a type of physio. And so, we actually came up with this idea together where she said, you know, the boys really love to play video games so, what if we could create a way for the boys to be playing a video game as they are undertaking this exercise. So, we started this process of prototyping and developing a digital attachment, a sensor, that attaches to all these various different physio devices. And as the patient is expelling, is breathing out, the sensor actually senses the breath and transmits that digital signal to a tablet and we can translate that signal into controls for a video game. And we’re also able to upload that to the cloud, to do further analysis on that medical treatment.
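As a rough illustration of the signal path Zhang describes, the sketch below turns a stream of breath-pressure samples into discrete exhalation events a game could consume; the threshold, sample values, and function names are invented for illustration and are not the Fizzyo firmware.

```python
# Toy sketch of the Fizzyo-style pipeline: breath-pressure samples in,
# "exhalation" events out. Threshold and data are illustrative only.
THRESHOLD = 0.4  # normalized pressure above which we count an exhalation

def exhalation_events(pressure_samples):
    """Yield (start_index, peak_pressure) for each detected exhalation."""
    in_breath, start, peak = False, 0, 0.0
    for i, p in enumerate(pressure_samples):
        if p > THRESHOLD and not in_breath:
            in_breath, start, peak = True, i, p
        elif in_breath:
            peak = max(peak, p)
            if p <= THRESHOLD:           # breath ended: emit an event
                in_breath = False
                yield start, peak

samples = [0.0, 0.1, 0.5, 0.8, 0.9, 0.6, 0.2, 0.0, 0.5, 0.7, 0.1]
for start, peak in exhalation_events(samples):
    print(f"exhalation at sample {start}, peak {peak:.1f} -> game input")
```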

Host: Wow. How is it working?

Haiyan Zhang: We started this project about two and a half years ago. It’s been a long process, but a really fruitful and rewarding one. So, we started out with just some early prototypes, just using off-the-shelf electronics to get the breath sensor working just right. We added a single button, because we realized if you were just using the breath to play video games, it’s actually really challenging. And then, within the team, our industrial designer, Greg Saul, designed the physical attachment. We developed our own sensor board and we had it manufactured along with the product design. And we partnered with University College London, their physiotherapy department, and the Great Ormond Street Hospital in London where they’ve deployed over a hundred of these units with kids across the country to do a long-term trial. So actually, when we first met with the University College London physiotherapy department, I mean, this is a department that they’ve spent their entire careers working with kids in this domain. And they had never had any contact with the computer science department. This was not a digital research area. When they first met us, and they saw, on the computer screen, someone breathing out, and a graph showing that breath, the peak of that breath, one of the heads of the department that we were working with, she started to cry because she said that in her entire career, she had never seen physio data visualized in this way. It was just incredible for her.

Host: Wow.

Haiyan Zhang: And so, we decided to partner, and they’ve been amazing because, through this journey, they’ve gone to meet people in the computer science department, they initiated masters’ degrees incorporating data science and digital understanding. They just hired their first data scientist in order to leverage the platform that we’ve built to do further analysis to improve the health of these kids. And they said that even though this kind of exercise has been around for decades, no one has actually done a definitive, long-term study to track the efficacy of this kind of exercise to health, to outcomes. You know, because I think past studies have really relied on keeping paper diaries, answering questionnaires, but no one has done that digital study, which is what the power of Internet of Things can really bring you, which is tracking in the background in a very precise way.

Host: Talk about the role of machine learning. How do any of the new methodologies in computer science like machine learning methods and techniques play into this?

Haiyan Zhang: You know, what’s really interesting with machine learning is the availability of data. And, you know, we understand that what has driven this AI revolution is now the availability of large data sets to actually be able to develop new ML algorithms and models. And in many cases, especially in healthcare, there is a lack of data. So, I think throughout different areas of computer science research, there’s a real need to kind of connect the dots and actually develop IoT solutions that can start at the beginning and capture the data, because it’s only through cleverly capturing valid data, that we can then do the machine learning in the back end once we’ve done the data collection. And so, I think the Fizzyo project is a really good proof point of that in that we started out with IoT in order to gather the information that tracks the health exercises. And we just sort of deployed in the UK, so as we’ve collected this data, we’re now able to look at that and start to do some predictions around long-term health. So, you know, some of the questions that physiotherapy researchers are trying to answer, if kids are very adherent to this kind of exercise, if they are doing what they are being told, they are doing this twice a day for the duration that they are supposed to be doing it, does that mean, in six months’ time or a year’s time, their number of days in hospital is going to be reduced? Does it actually impact how much time they are spending being ill? If we see a trailing-off of this exercise, does that mean that we’ll see an increase in infection rates? So, with the data that we’re collecting, we’re now working with a different part of Microsoft, they’re called the Microsoft Commercial Software Engineering team, who are actively delving into projects around AI for good and they are going to be working with UCL to do some of this clustering and developing models around health prediction. So, clustering the patients into different cohorts to understand if there are predictive factors around how they are doing the exercises and how much time they are going to be spending in hospital in the years to come.
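The kind of cohort analysis Zhang mentions might start with something as simple as clustering patients on adherence features; the features and numbers below are invented purely to illustrate the shape of that step.

```python
# Illustrative sketch: cluster patients into adherence cohorts before
# modeling outcomes. Feature columns and values are invented.
import numpy as np
from sklearn.cluster import KMeans

# rows = patients; columns = [sessions_per_week, avg_minutes_per_session,
#                             12-week adherence trend]
adherence = np.array([
    [14, 30,  0.1],
    [13, 28,  0.0],
    [ 4, 10, -0.5],
    [ 5, 12, -0.4],
    [ 9, 20, -0.1],
])

cohorts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(adherence)
print(cohorts)  # e.g., a "steady" cohort and a "declining" cohort
```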

Host: Well, it almost would be hard for me to get more excited about something than what you just described in Project Fizzyo, but there is another project to talk about which is Project Emma. This is so cool it’s even been featured on a documentary series there in the UK called The Big Life Fix. And it didn’t just start with a specific idea, but with a specific person. Tell us the story of Emma.

Haiyan Zhang: Yes! So, again, Project Emma started with a single person, with Emma Lawton, who, when she was 28 years old, was diagnosed with early onset Parkinson’s disease. And, it had been five years since her diagnosis and some of her symptoms had progressed quite quickly and one of them was an active tremor. So, her tremor would get worse as she started to write or draw. And this really affected how she went about her day-to-day work because she was a creative director, a graphic designer and day-to-day she would be in client meetings, talking with people and trying to sketch out what they meant in terms of the ideas that they had. And she would not be able to do that. And when I first met with her, she would sit with a colleague and her colleague would actually draw on her behalf. So, she really was looking for some kind of technology intervention to help her. And, we started out going very broad, and looking at lots of different solutions out there, not necessarily just for tremor, but across the spectrum to address different symptoms of Parkinson’s disease. And this is actually really part of this whole design thinking methodology which is to look at analogous experiences. So, taking your core problem and then looking at adjacent spaces where there might be solutions in a completely different area that can inform upon the challenge that you are tackling. So, we looked at lots of different solutions for other kinds of symptoms and of course, there was a lot of desk research. It was reading research papers that had been published over the decades that looked at tremors specifically. So, I think the two aspects that really influenced our thinking, one was around going to visit with a local charity called Parkinson’s UK and we were asking them to show us their catalogue of widgets and devices that they sold to Parkinson’s patients that helped them in their every day. And on the table, there was a digital metronome. So, you know, when you’re playing the piano you see musicians, they have this ticking metronome. And I asked, you know, so why is there a metronome on the table? And the lady said, well, for some Parkinson’s patients, they have a symptom called freezing of gait and this is where when you are walking along, your legs suddenly freeze, and you lose control of your legs. And so, sometimes people find that if they take out this metronome and they turn it on and it makes this rhythmic ticking sound, it somehow distracts their brain into being able to walk again, which is really kind of odd. There’s been a little bit of literature around this. In the literature it’s called cueing, it’s a cueing effect, but it doesn’t apply to tremor. But, for me, it sort of signaled an interesting brain hack, and signaled kind of underlying what might be going on in your brain when you have Parkinson’s disease. At the same time, there had been a number of papers around using vibration on the muscles to try to ameliorate tremor, to try to address it, to varying effect. And not specifically looking at Parkinson’s but looking at other kinds of tremor diseases like essential tremor, dystonia. And so, we developed a hypothesis and in order to test out the hypothesis, we developed a prototype which was a wearable device for the wrist that had a number of vibrating motors on it. So, it would apply vibration to the wrist in a rhythmic fashion in order to somehow circumvent the mechanism that was causing the tremor. And of course, we had a number of other hypotheses, too. 
This was not the only hypothesis. We had other devices that worked in a completely different way that was more about mechanically stopping the tremor, mechanically countering the tremor. And this device actually worked really well. So, we were surprised, but very, very happy, and so this is the direction that we took in order to further develop this product.
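For a sense of how simple the prototype's rhythmic pattern could be, here is a toy sketch that pulses a few wrist motors in sequence; the motor count, timings, and the set_motor stand-in are all assumptions, not the actual Project Emma firmware.

```python
# Toy sketch of a rhythmic haptic pattern: coin-cell motors around the
# wrist pulsed in sequence. All names and timings are illustrative.
import time

MOTORS = 4        # motors spaced around the wrist band
PULSE_S = 0.1     # each motor vibrates for 100 ms
PERIOD_S = 0.25   # one pulse every 250 ms, giving a steady rhythm

def set_motor(index, on):
    # Stand-in for real hardware I/O (e.g., a PWM motor driver).
    print(f"motor {index}: {'ON' if on else 'off'}")

def run_pattern(cycles=2):
    for _ in range(cycles):
        for m in range(MOTORS):
            set_motor(m, True)
            time.sleep(PULSE_S)
            set_motor(m, False)
            time.sleep(PERIOD_S - PULSE_S)

run_pattern()
```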

Host: Right. So, drilling in, I do want to mention that there is a video on this, on the website as well. It’s a video that made me cry. I think it made you cry, and it made Emma cry. We’re all just puddles of tears, because it’s so fantastic. And so, this kind of circles back to research writ large, and experimenting with ideas that may not necessarily be super, what we would call high-tech, maybe they are kind of low-fi, you know, a vibration tool that can keep you from shaking. So, how did it play out? How did you prototype this? Give us a little overview of your process.

Haiyan Zhang: For us, it was a very simple prototyping exercise. We took some off-the-shelf coin cell motors and developed, basically, a haptic-type bracelet, and then we had an app that let you program the haptics on the bracelet. And that’s what we sort of experimented with. So, it’s research from the haptics area of computer science, which is really about mechanisms for use in VR or sensing something about the digital world, now applied to this medical domain.

(music plays)

Host: You have a diverse slate of projects going on at any given time and your teams are really diverse. So, I want you to talk, specifically, about the composition of skills and expertise that are required to bring some of these really fascinating research projects to life, and ultimately to market. Who is on your team and what do they bring to the party?

Haiyan Zhang: Well, I think there’s just something really unique about Microsoft Research and Microsoft Research Cambridge, in particular, we have such a broad portfolio of projects, but also expertise in the different computer science fields, that we can sort of pull together these multidisciplinary teams to go after a single topic. So, within our lab we have social scientists doing user research, gaining real insight into how people behave, how people think about various technologies. We have designers that are exploring user interfaces, exploring products to bring these ideas to life. We have, you know, computer vision specialists. We have machine learning specialists. We have natural language processing people, systems researchers, and security researchers and, obviously, healthcare researchers. So, it’s that broad outlook that I think can really push forward in terms of technology innovation and really emphasizing the applications for people, for improving society as a whole.

Host: I ask all my guests some form of the question is there anything that keeps you up at night. And I know that many people, mainly parents, are worried that their kids are too engaged with screens or not spending enough time in real life and so on. What would you say to them, and is there anything that keeps you up at night about sort of the broader swath of what you are working on?

Haiyan Zhang: You know, on the topic of screen time, obviously it’s something that we really wrestled with in the Zanzibar research specifically, which is thinking about how you could interact with physical objects instead of a digital screen, and also bringing that kind of bigger interaction surface between family and between friends so they could interact together. You know, at the same time, I would say that culture is constantly changing and how we live our lives is constantly changing. We’ve only seen the internet be really embedded in our lives in the last, I’d say, fifteen or twenty years. I think when we were younger, we had television and there were no computers, and so, as I say, culture is constantly evolving. How we’re growing, how we’re living is constantly evolving. It’s important for parents to evaluate this changing landscape of technology and to figure out what is the best thing to do with their kids. And maybe you don’t have to rely on how you grew up, but to kind of evaluate that our kids are getting the right kind of social interaction, getting the right amount of parental support and quality time with their family. I think that’s what is important, but to accept that how we’re growing is changing.

Host: What about the idea of the internet of things and privacy when we’re talking about toys and kids?

Haiyan Zhang: Mmm, yeah, it is something we really have to watch out for, and um you know, we’ve seen some bad examples of the toy industry jumping ahead too far and enabling toys to be connected 24/7 and conversing with kids and what does that really mean? I’ve seen some really great research out of the MIT Media Lab where there was a researcher really looking at how kids are conversing with AI, with different AI agents and their mental model of these AI agents. So, I think that’s a really great piece of research to look at, but also maybe to expand upon. As a research community, if we’re thinking about kids, to understand that how kids are interacting with AI is going to be more commonplace, and rather than trying to avoid it, to really tackle it head-on and see how we can improve the principles around designing AI, how we can inform companies in the market out there of what is the ethical approach to doing this so that kids really understand what AI is as they are growing up with it.

Host: We’re coming up on an event at Microsoft Research called I Chose STEM and it’s all about encouraging women to… well, choose STEM! As an area of study or a career.

Haiyan Zhang: Yeah.

Host: So, tell us the story of how you chose it? What got you interested in a career in high-tech in general, and maybe even high-tech research specifically? Who were your influences?

Haiyan Zhang: I have, I guess, a slightly unique background in that I was born in China and at the time it was a very kind of Communist education that I had when I was growing up. And my family moved to Australia when I was 8 years old. And I was always very technical and very nerdy. But I never thought about technology as a career. I actually wanted to study law when I was in high school. And computing was just something where I was sort of, you know, it was kind of fun, but I never thought about it as a career. And I’d say in the last year of high school, I decided to switch and do computer science and I realized that I was actually really good at computer science. I guess what led me to choose STEM is just the – I think the fun and creativity you can have with programming. You know, I would always come up with my own little creative exercises to write on the computer. It wasn’t the rote exercises, it was the ability to kind of be creative with this technical tool that really got me excited. I think at the same time, I love this huge effort within our industry to really focus on getting more women, more girls into technology, into STEM education, and we really want to increase representation, increase sort of equal representation. At the same time, I think I found it, at times, to be, you know, challenging to be the only woman in the room. You know, when I was in computer science, sometimes I’d be, you know, one of three women in the lecture theater or something. I think we need to adopt this kind of pioneer mindset so that we can go into these new areas, go into a room where you’re the only person, where you’re unique in that room and you have something to contribute and don’t be afraid to speak up. I think that’s a really important mindset and skill for anybody to have.

Host: No interview would be complete if I didn’t ask my guest to predict the future. No pressure, Haiyan. Seriously though, you are living on the cutting edge of technology research which is what this podcast is all about. And so what advice or encouragement – you’ve just kind of given some – would you give to any of our listeners across the board who might be interested or inspired by what you are doing? Who is a good fit for the research you do?

Haiyan Zhang: My advice would be, especially in the research domain, to develop that deep research expertise, but to keep a holistic outlook. I think the research landscape is changing in that we are going to be working in more multidisciplinary teams, working across departments. You know, sometimes it’s the healthcare department, the physiotherapy department, with the computer science department. It’s through the connection of these disparate fields that I think we’re going to see dramatic impact from technology. And I think for researchers to have that holistic outlook, to visit other departments, to understand what are the challenges beyond their own group, I think is really, really important. And develop collaboration skills and techniques.

Host: Haiyan Zhang, it’s been a delight. Thanks for joining us today.

Haiyan Zhang: Thanks so much, Gretchen. It’s been a real pleasure, thank you.


Podcast with Dr. Rico Malvar, manager of Microsoft Research’s NExT Enable group

Rico Malvar, Chief Scientist and Distinguished Engineer

Episode 61, January 30, 2019

From his deep technical roots as a principal researcher and founder of the Communications, Collaboration and Signal Processing group at MSR, through his tenure as Managing Director of the lab in Redmond, to his current role as Distinguished Engineer, Chief Scientist for Microsoft Research and manager of the MSR NExT Enable group, Dr. Rico Malvar has seen – and pretty well done – it all.

Today, Dr. Malvar recalls his early years at a fledgling Microsoft Research, talks about the exciting work he oversees now, explains why designing with the user is as important as designing for the user, and tells us how a challenge from an ex-football player with ALS led to a prize-winning hackathon project and produced the core technology that allows you to type on a keyboard without your hands and drive a wheelchair with your eyes.


Episode Transcript

Rico Malvar: At some point, the leader of the team, Alex Kipman, came to us and says, oh, we want to do a new controller. What if you just spoke to the machine, made gestures and we could recognize everything? You say, that sounds like sci-fi. And then we said, no, wait a second, but to detect gestures, we need specialized computer vision. We’ve been doing computer vision for 15 years. To identify your voice, we need speech recognition. We’ve also been doing speech recognition for 15 years. Oh, but now there may be other sounds and multiple people… oh, but just a little over 10 years ago, we started these microphone arrays. They are acoustic antennas. And I said, wait a second, we actually have all the core elements, we could actually do this thing!

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: From his deep technical roots as a principal researcher and founder of the Communications, Collaboration and Signal Processing group at MSR, through his tenure as Managing Director of the lab in Redmond, to his current role as Distinguished Engineer, Chief Scientist for Microsoft Research and manager of the MSR NExT Enable group, Dr. Rico Malvar has seen – and pretty well done – it all.

Today, Dr. Malvar recalls his early years at a fledgling Microsoft Research, talks about the exciting work he oversees now, explains why designing with the user is as important as designing for the user, and tells us how a challenge from an ex-football player with ALS led to a prize-winning hackathon project and produced the core technology that allows you to type on a keyboard without your hands and drive a wheelchair with your eyes. That and much more on this episode of the Microsoft Research Podcast.

Host: Rico Malvar, welcome to the podcast.

Rico Malvar: It’s a pleasure to be with you, Gretchen.

Host: You’re a Distinguished Engineer and Chief Scientist at Microsoft Research. How would you define your current role? What gets you up in the morning?

Rico Malvar: Ha ha! Uh, yeah, by chief scientist, it means I tell everybody what to do, very simple. (laughing) Yeah… Not really, but Chief Scientist is basically a way for me to have my fingers and eyes, in particular, on everything going on at Microsoft Research. So, I have an opportunity to interact with, essentially, all the labs, many of the groups, and find opportunities to do collaborative projects. And that is really super-exciting. And it’s really hard to be on top of what everybody is doing. It’s quite the opposite of telling people what to do, it’s like trying to follow up on what they are doing.

Host: It’s um – on some level herding cats?

Rico Malvar: It’s not even herding. It’s where are they??

Host: You got to find the cats.

Rico Malvar: Find the cats, yeah.

Host: Well, talk a little bit about your role as Distinguished Engineer. What does that entail, what does that mean?

Rico Malvar: That’s basically… there’s a whole set of us. We have Distinguished Engineers and Technical Fellows, which are at the top of our technical ladder. And the idea is a little bit recognition of some of the contributions we’ve done in the technical area, but it’s mostly our responsibility to go after big technical problems and not think just about the group you’re in, but think about the company, what the company needs, and how the technology in that particular area should be evolving. My area, in particular, on the technical side, is signal processing, data compression, media compression. And these days, with audio and video entering the internet, that matters a lot. But also a few other areas, but that’s the idea. The idea is that what are the big problems in technology, how can we drive new things, how can we watch out for new things coming up at the company level?

Host: You know, those two things that you mentioned, drive things and anticipate things, are two kind of different gears and two different, I won’t say skillsets, but maybe it’s having your brain in two places.

Rico Malvar: You are right. It’s not completely different skillsets but driving and following are both important and one helps the other. And it’s very important for us to do both.

Host: Let’s go back to your roots a little bit. When you started here at Microsoft Research, you were a principal researcher and the founder and manager of what was called the Communications, Collaboration and Signal Processing group at MSR. So, tell us a little bit about the work you used to do and give us a short “where are they now?” snapshot of that group.

Rico Malvar: Yeah, that name is funny. That name was a bad example of what happens when you get too democratic about choosing names: we got everybody in the team to give ideas and then it got all complicated and we ended up with a little bit of everything and came up with a boring name instead of a cool one. But it was a very descriptive name, which was good. It was just called Signal Processing when we started, and then it evolved to Communication, Collaboration and Signal Processing because of the new things we were doing. For example, we had a big project in the collaboration area, which was the prototype of a system that later evolved to become the RoundTable product. And that’s not just signal processing, it’s collaboration. Well, we put in collaboration. But people use it to communicate, so it’s also communication, so okay, put it all in the name. It’s just like that. And on your question of where people are, a cool thing is that we had a combination of expertise in the team to be able to do things like RoundTable. So, we had computer vision experts, we had distributed systems experts, we had streaming media experts and we had audio experts. On the last one, for example, in audio, we later evolved a new group doing specifically audio signal processing, which is now led by Ivan Tashev, who was a member of my team and now has his own team. He already participated in your podcast, so it’s nice to see the interesting challenges in those areas continue. And we keep evolving, as you know. The groups are always changing, modifying, renewing.

Host: In fact, that leads into my next question. Microsoft Research, as an entity, has evolved quite a bit since it was formed in 1991. And you were Managing Director in the mid-2000’s from like 2007 to 2010?

Rico Malvar: ‘10. Of the lab here in Redmond, yeah.

Host: Yeah. So, tell us a little bit about the history of the organization in the time you’ve been here.

Rico Malvar: Yeah. It’s great. One thing I really like about Microsoft Research is, first, that it started early with the top leaders in the company always believing in the concept. So, Bill Gates started Microsoft Research, driven by Nathan Myhrvold, who was the CTO at the time, and it was a no-brainer for them to start Microsoft Research. They found Rick Rashid, who was our first leader of MSR. And I had the pleasure of reporting to Rick for many years. And the vision he put in place, which stands to this day, is: let’s really push the limits of technology. We don’t start by thinking how this is going to help Microsoft, we start by thinking how we push the technology, how it helps people. Later, we will figure out how it’s going to help Microsoft. And to this date, that’s how we operate. With the difference being, maybe, that in the old days, the lab was more of a classical research lab. Almost everything was pivoted on research projects.

Host: Sure.

Rico Malvar: Which is great, and many, many of them generated good technology or even new products for the company. I was just talking about RoundTable as one example, and we have several. Of course, the vast majority fail because research is a business of failure and we all know that! We submit ten papers for publication, two or three get accepted. That is totally fine, and we keep playing the game. And we do the papers as a validation and also as a way to interact with the community. And both are extremely valuable to us, so we can have a better understanding of whether we are pushing the state-of-the-art. And today, the new Microsoft Research puts even a little more emphasis on the impact side. We still want to push the state-of-the-art, we still do innovative things, but we want to spend a little more effort on making those things real.

Host: Yeah.

Rico Malvar: On helping the company. And even the company, itself, evolved to a point where that has even a higher value from Satya, our CEO, down. It is the mission of the company to empower people to do more. But empowering is not just developing the technology, it’s packaging it, shipping it in the right way, making products that actually leverage that. So, I would say the new MSR gets even more into, okay, what it takes to make this real.

Host: Well, let’s talk a little bit about Microsoft Research NExT. Give our listeners what I would call your elevator pitch of Microsoft NExT. What does it stand for, how does it fit in the portfolio of Microsoft Research? I kind of liken it to pick-up basketball, only with scientists and more money, but you do it more justice than I do!

Rico Malvar: That’s funny. Yeah, NExT is actually a great idea. As I said, we’re always evolving. And then, when Peter Lee came in, and also Harry Shum, our new leader, they thought hard about diversifying the approaches in which we do research. So, we still have the Microsoft Research labs, the part that is a bit more traditional in the sense that the research is mostly pivoted by areas. We have a graphics team, natural language processing group, human computer interaction, systems, and so forth. Many, many of them. When you go to NExT, the idea is different. One way to achieve potentially even more impact is to pivot some of those activities, not by area, but by project, by impact goal. Oh, because of this technology and that technology, maybe we have an opportunity to do X, where X is this new project. Oh, but the first technology we’re going to need is computer vision, the other one is hardware architecture. Oops, we’re going to need people in all those areas together in a project team. And Peter Lee has been driving that, always trying to find disruptive, high impact things so that we can take on new challenges. And lots of things are coming up from this new model which we call NExT, which is New Experiences in Technology.

Host: I actually didn’t know that, what the acronym stood for. I just thought it was, what’s NExT, right?

Rico Malvar: Of course, that is a cool acronym. Peter did a much better job than we did on the CCSP thing.

Host: I love it.

(music plays)

Host: Well, let’s talk about Enable, the group. There’s a fascinating story of how this all got started and it involves a former football player and what’s now called the yearly hackathon. Tell us the story.

Rico Malvar: That is exactly right. It all started when that famous football player, ex-football player, Steve Gleason, still a good partner of ours, is still a consultant to my team… Steve is a totally impressive person. He got diagnosed with ALS, and ALS is a very difficult disease because you basically lose mobility. And at some point in life, your organs may lose their ability to function, so, most people actually don’t survive ALS. But with some mitigations you can prolong, a little bit, and technology can help. Steve, actually, we quote him saying, “Until there is a cure for ALS, technology is the cure.” This is very inspiring. And he created a foundation, Team Gleason, that really does a wonderful job of securing resources and distributing resources to people with ALS. They really, really make a difference in the community. And he came to us almost five years ago, and we were toying with the idea of creating this hackathon, which is a company-wide effort to create hack-projects. And then in one of those, actually the first one we did, in 2014, Steve told us, “You know what guys, I want to be able to do more. In particular, I want to be able to argue with my wife and play with my son. So, I need to communicate, and I need to move. My eyes still work, this eye tracking thing might be the way to go. Do you want to do something with that?” The hackathon team really got inspired by the challenge and within a very short period of time, they created an eye tracking system where you look at the computer and then there’s a keyboard and you can look at the keys and type at the keys by looking. And there is a play button so you can compose sentences and then speak out with your eyes.

Host: That’s amazing.

Rico Malvar: And they also created an interface where they put buttons, similar to a joystick, on the screen. You look at those, and the wheelchair moves in the direction of where you are selecting. They did a nice overlay between the buttons and the video, so it’s almost like they put the computer, mounted it on the wheelchair, you look through the computer, the camera shows what’s in front of you, and then the wheelchair goes. With lots of safety things like a stop button. And it was very successful, that project. In fact, it won the first prize.

Host: The hackathon prize?

Rico Malvar: On the hackathon prize. And then, a little bit later, Peter and I were thinking about where to go on new projects. And then Peter really suggested, Rico, what about that hackathon thing? That seems to be quite impactful, so maybe we want to develop that technology further. What do you think? I said, well if I had a team… (laughs) we could do that…

Host: (sings) If I only had a team…

Rico Malvar: (sings) If I only had a team… And then Peter said, ehh, how many people you need? I don’t know, six, seven to start. I said, okay, let’s go do it. It was as easy as that.

Host: Well, let’s talk a little bit more about the hackathon. Like you said, it’s about in its fifth year. And, as I understand it, it’s kind of a ground-up approach. Satya replaced the annual “executive-inspirational-talk-top-down” kind of summer event with, hey, let’s get the whole company involved in invention. I would imagine it’s had a huge impact on the company at large. But how would you describe the role of the hackathon for people in Microsoft Research now? It seems like a lot of really interesting things have come out of that summer event.

Rico Malvar: You know, for us, it was a clear thing, because Microsoft Research was always bottom-up. I mean, we literally don’t tell researchers what to do. People, researchers, engineers, designers, managers, they all have great ideas, right? And they come up with those great ideas. When they click enough, they start developing something and we look from the top and say, that sounds good, keep going, right? So, we try to foster the most promising ones. But the idea of bottom-up was already there.

Host: Yeah.

Rico Malvar: When we look at the hackathon, we say, hey, thanks to Satya and the new leadership of Microsoft, the company’s embracing this concept of moving bottom-up. There’s The Garage. The Garage has been involved with many of those hackathons. Garage has been a driver and supporter of the hackathon. So, to us, it was like, hey, great, that’s how we work! And now we’re going to do more collaboration with the rest of the company.

Host: You have a fantastic and diverse group of researchers working with you, many of whom have been on the podcast already and been delightful. Who and what does it take to tackle big issues, huge ideas like hands-free keyboards and eye tracking and 3-D sound?

Rico Malvar: Right. One important concept, and it’s particularly important for Enable, is that we really need to pay attention to the user. Terms such as “user-centric” – yeah, they sound like cliché – but especially in accessibility, this is super important. For example, in our Enable team, the area working with eye tracking, our main intended users were people with ALS, since the motivation came from Steve Gleason. And then, in our team, Ann Paradiso, who is our user experience manager, she created what we call the PALS program. PALS means Person with ALS. And we actually brought people with ALS in their wheelchairs and everything to our lab and discussed ideas with them. So, they were not just testers, they were brainstorming with us on the design and technologies…

Host: Collaborators.

Rico Malvar: Collaborators. They loved doing it. They really felt, wow, I’m in this condition but I can contribute to something meaningful and we will make it better for the next generation…

Host: Sure.

Rico Malvar: …of people with this. So, this concept of strong user understanding through user design and user research, particularly on accessibility, makes a big difference.

Host: Mmm hmm. Talk a little bit about the technical side of things. What kinds of technical lines of inquiry are you really focusing on right now? I think our listeners are really curious about what they’re studying and how that might translate over here if they wanted to…

Rico Malvar: That’s a great question. Many of the advancements today are associated with artificial intelligence, AI, because of all the applications of AI, including in our projects. AI is typically a bunch of algorithms and data manipulation in finding patterns in data and so forth. But AI, itself, doesn’t talk to the user. You still need the last mile of the interfaces, the new interface. Is the AI going to appear to the user as a voice? Or as something on the screen? How is the user going to interact with the AI? So, we need new interfaces. And then, with the evolution of technology, we can develop novel interfaces. Eye tracking being an example. If I tell you that you’re going to control your computer with your eyes, you’re going to say, what? What does that mean? If I tell you, you’re going to control the computer with your voice, you say, oh yeah, I’ve been doing that for a while. With the eye tracking for a person with a disability, they immediately get it and say, a-ha! I know what it means, and I want to use that. For everybody, suppose, for example, that you are having your lunch break and you want to browse the news on the internet, get up to date on a topic of interest. But you’re eating a sandwich. Your hands are busy, your mouth is busy, but your eyes are free. You could actually flip around pages, do a lot of things, just with your eyes and you don’t need to worry about cleaning your hands and touching the computer because you don’t need to touch the computer. And you can think, in the future, where you may not even need your eyes. I may read your thoughts directly. And, at some point, it’s just a matter of time. It’s not that far away. We are going to read your thoughts directly.

Host: That’s both exciting and scary. Ummmm…

Rico Malvar: Yes.

Host: What does it take to say, all right, we’re going to make a machine be able to look at your eyes and tell you back what you are doing?

Rico Malvar: Yeah, you see, it’s a specialized version of computer vision. It’s basically cameras that look at your eyes. In fact, the sensor works by first illuminating your eyes with bright IR lights, infrared, so it doesn’t bother you because you can’t see it. But now you have this bright image that the camera, which can see IR, is looking at, and then models, with a little bit of AI and a little bit of just graphics and computer vision and signal modeling, make an estimate of the position of your eyes and associate that with elements on the screen. So, it’s almost as if you have a cursor on the screen.

Host: Okay.

Rico Malvar: That is controlled with your eyes, very similar to a mouse, with the difference that the eye control works better if we don’t display the cursor. With the mouse, you actually should display the cursor…

Host: Ooohhh, interesting….

Rico Malvar: …with eye control, the cursor works better if it is invisible. But you see the idea there is that you do need specialists, you need folks who understand that. And sometimes you do a combination of some of that understanding being in the group, so we need to be the top leaders in that technology, or we partner with partners that have a piece of the technology. For example, for the eye tracking, we put much more emphasis on designing the proper user interfaces and user experiences, because there are companies that do a good job introducing eye tracking devices. So, we leverage the eye tracking devices that these companies produce.
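The last step Malvar describes, turning noisy gaze estimates into a stable on-screen target, can be sketched with a simple exponential filter; the smoothing constant and sample values below are illustrative assumptions.

```python
# Sketch: smooth raw (x, y) gaze estimates into a stable cursor target
# with an exponential moving average. ALPHA is illustrative.
ALPHA = 0.2  # lower = smoother but laggier

def smooth_gaze(samples, alpha=ALPHA):
    """samples: iterable of (x, y) gaze estimates in screen pixels."""
    sx = sy = None
    for x, y in samples:
        if sx is None:
            sx, sy = float(x), float(y)   # seed with the first sample
        else:
            sx += alpha * (x - sx)        # pull gently toward new sample
            sy += alpha * (y - sy)
        yield sx, sy

raw = [(100, 100), (104, 98), (230, 240), (101, 102), (99, 100)]
for sx, sy in smooth_gaze(raw):
    print(f"cursor target: ({sx:.0f}, {sy:.0f})")
```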

Host: And behind that, you are building on machine learning technologies, on computer vision technologies and… um… so…

Rico Malvar: Correct. For example, a typical one is the keyboard driven by your eyes. You still want to have a predictive keyboard.

Host: Sure.

Rico Malvar: So, as you are typing the letters, it guesses. But how you present the guess is very interesting, because when you are using a typical keyboard, your eyes are looking at the letters and your fingers are typing on the keys. With an eye control keyboard, your eyes have to do everything. So, how you design the interface should be different.

Host: Yeah.

Rico Malvar: And we’ve learned and designed good ways to make that different.
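
As a concrete illustration of the guessing Malvar describes, here is a minimal sketch of prefix-based word prediction; the word-frequency table is a toy assumption, and a production predictive keyboard would rely on a far richer language model.

```python
# A minimal sketch of prefix-based word prediction for an eye-typing keyboard.
# The tiny frequency table is illustrative; a real predictive keyboard would
# use a much richer language model.
from collections import Counter

VOCAB = Counter({"hello": 120, "help": 90, "held": 15, "hero": 40})

def predict(prefix, vocab=VOCAB, k=3):
    """Return the k most frequent words starting with the typed prefix."""
    candidates = [(word, count) for word, count in vocab.items()
                  if word.startswith(prefix)]
    candidates.sort(key=lambda wc: -wc[1])
    return [word for word, _ in candidates[:k]]

print(predict("he"))  # ['hello', 'help', 'hero']
```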

Host: If I’m looking at the screen and I’m moving my eyes, how does it know when I’m done, you know, like that’s the letter I want? Do I just laser beam the…??

Rico Malvar: You said you would be asking deep technical questions and you are. That one, we use the concept that we call “dwelling.” As you look around the keyboard, remember that I told you we don’t display the cursor?

Host: Right.

Rico Malvar: So, as the position where you look, the focus of your eye, lands on a particular letter, we highlight that letter. It can be a different color, it can be a lighter shade of grey…

Host: Gotcha.

Rico Malvar: So, as you move around, you see the highlight moving around. If you want to type a particular letter, once you get to that letter, you stop moving for a little bit, let's say half a second. That's a dwell. You dwell on that letter a little bit and we measure the dwell. And there's a little bit of AI to learn the proper dwell time for each user.
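
Here is a minimal sketch of how dwell-based selection might be implemented, assuming gaze samples arrive as (timestamp, key) pairs from a hypothetical tracker; the half-second default and the adaptation rule are illustrative stand-ins for the learned, per-user dwell times Malvar mentions.

```python
# A minimal sketch of dwell-based selection. Gaze samples are assumed to
# arrive as (timestamp, key) pairs from a hypothetical tracker; the default
# dwell time and the adaptation rule are illustrative stand-ins for the
# learned, per-user dwell times described above.
class DwellSelector:
    def __init__(self, dwell_time=0.5):
        self.dwell_time = dwell_time  # seconds the gaze must rest on a key
        self.current_key = None
        self.dwell_start = None

    def update(self, timestamp, key):
        """Feed one gaze sample; return the key if a dwell just completed."""
        if key != self.current_key:
            self.current_key = key  # gaze moved: restart the dwell timer
            self.dwell_start = timestamp
            return None
        if key is not None and timestamp - self.dwell_start >= self.dwell_time:
            self.dwell_start = timestamp  # re-arm so the key can repeat
            return key
        return None

    def adapt(self, error_rate):
        """Nudge the dwell time: faster for accurate users, slower otherwise."""
        target = 0.3 if error_rate < 0.05 else 0.7
        self.dwell_time += 0.1 * (target - self.dwell_time)

# Example: the gaze rests on "A" long enough to type it.
selector = DwellSelector()
for t, key in [(0.0, "A"), (0.3, "A"), (0.6, "A")]:
    if selector.update(t, key):
        print("typed A")  # fires at t = 0.6, after 0.6 s on the same key
```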

(music plays)

Host: One thing I'm fascinated by, not just here, but in scientific ventures everywhere, is the research "success story." The one that chronicles the path of a blue-sky research idea to instantiation in a product. And, I know, over and over, researchers have told me, research is generally a slow business, so it's not like, oh, the overnight success story, but there are a lot of hard-won success stories, or stories that blossomed over multiple years of serendipitous discovery. Do you have any stories you could share about things that started out as a harebrained idea and that millions of people are now using?

Rico Malvar: You know, there are so many examples. I particularly like the story of Kinect, which was actually not a product developed by Microsoft Research, but in close collaboration with Microsoft Research. It was the Kinect team, at the time, in Windows. At some point, the leader of the team, Alex Kipman, came to us and said, oh, we want to do a new controller. What if you just spoke to the machine, made gestures, and we could recognize everything? We said, that sounds like sci-fi. So, naahhh, that doesn't work. But Alex was very insistent. And then we said, no, wait a second, to detect gestures, we need specialized computer vision. We've been doing computer vision for 15 years. To identify your voice, we need speech recognition. We've also been doing speech recognition for 15 years. Oh, but now there may be other sounds, and there may be multiple people… oh, but just a little over 10 years ago, we started on these microphone arrays. They are acoustic antennas. They can tune to the sound of whoever is speaking, all of that.

Host: Directional.

Rico Malvar: The directional sound input. And I said, wait a second, we actually have all the core elements, we could actually do this thing. So, after the third or fourth meeting, I said, okay Alex, I think we can do that. And he said, great, you have two years to do it. What??? Yeah, because we need to ship on this particular date. And it all worked. I doubt there's another institution or company that could have produced that, because we had been doing what was, apparently, "blue-sky" work for many years, but we had created all those technologies, and when the need arose, I said, a-ha, we can put them all together.
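
The "acoustic antenna" idea is classically realized as delay-and-sum beamforming: delay each microphone's signal so that sound from a chosen direction lines up, then average. The sketch below is a simplified illustration under assumed geometry and sample rate, not the Kinect pipeline.

```python
# A minimal sketch of delay-and-sum beamforming, the textbook version of the
# "acoustic antenna" idea: delay each microphone's signal so that sound from
# a chosen direction lines up, then average. The geometry, sample rate, and
# integer-sample delays are simplifications, not the Kinect pipeline.
import numpy as np

SPEED_OF_SOUND = 343.0  # meters per second
SAMPLE_RATE = 16000     # samples per second

def delay_and_sum(signals, mic_positions, angle_rad):
    """Steer a linear mic array toward angle_rad; return the combined signal.

    signals:       array of shape (num_mics, num_samples)
    mic_positions: mic x-coordinates in meters along the array axis
    """
    steered = np.zeros(signals.shape[1])
    for sig, x in zip(signals, mic_positions):
        # Extra distance the wavefront travels to this mic, in samples.
        delay = int(round(x * np.cos(angle_rad) / SPEED_OF_SOUND * SAMPLE_RATE))
        # np.roll wraps around; a real implementation would pad instead.
        steered += np.roll(sig, -delay)
    return steered / len(signals)
```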

Host: Where is Kinect today?

Rico Malvar: Kinect used to be a peripheral device for Xbox. We changed it into an IoT device. So, there's a new Kinect kit that connects to Azure, so people can do Kinect-like things, not just for games but for everything. And all the technology that supports that is now in Azure.

Host: So, Rico, you have a reputation for being an optimist. You’ve actually said as much yourself.

Rico Malvar: (laughs) Yes, I am!

Host: Plus, you work with teams on projects that are making the lives of people with disabilities, and others, profoundly better. But I know some of the projects you've worked on fall somewhere within the bounds of medical interventions.

Rico Malvar: Mmm-hmm.

Host: So, is there anything about what you do that keeps you up at night, anything we should be concerned about?

Rico Malvar: Yeah, you know, when you are helping a person with a disability, sometimes what you are doing can be seen as, is that a treatment, is that a medical device? In most cases, it is not. But the answers to those questions can be complicated, and there can be regulations. And of course, Microsoft is a super-responsible company, and if anything is regulated, of course, we are going to pay attention to the regulations. But some of those are complex, so doing right by the regulations can take a significant amount of work. My team has to spend time, sometimes in collaboration with our legal team, to make sure we do the right things. And I hope also that we will help evolve those regulations, potentially by working with the regulatory bodies, educating them on the evolution of the technology. Because in almost all areas of technology, not just this one, regulations tend to be behind. It's hard to move, and understandably so. So, the fact that we have to spend significant effort dealing with that does keep me up at night a little bit. But we do our best.

Host: You know, there’s a bit of a Wild West mentality where you have to, like you say, educate. And so, in a sense what I hear you saying is that, as you take responsibility for what you are doing, you are helping to shape and inform the way the culture onboards these things.

Rico Malvar: Exactly right, yes. Exactly right.

Host: So, how would you frame that for people out there? How do we, how do you, help move the culture into a space that better understands what's going on and can onboard these technologies responsibly?

Rico Malvar: That is a great question. And you see, for example, in areas such as AI, artificial intelligence, people are naturally afraid of how far AI can go. What kinds of things could it do?

Host: Yeah.

Rico Malvar: Can we regulate so that there will be some control over how it's developed? And Microsoft has taken the stance that we have to be very serious about AI. We have to be ethical, we have to preserve privacy and all of those things. So, instead of waiting for regulation and regulatory frameworks to develop, let's help them. So, we were founders, not just me, but the company and especially the Microsoft Research AI team, of the Partnership on AI, together with other companies, to actually say, no, let's be proactive about that.

(music plays)

Host: Tell us a bit about Rico Malvar. Let’s go further back than your time here at MSR and tell us how you got interested in technology, technology research. How did you end up here at Microsoft Research?

Rico Malvar: Okay, on the first question, how I got interested in technology: it didn't take me long. I think I was 8 years old when my dad gave me an electronics kit, and I started playing with that thing and said, a-ha! That's what I want to do when I grow up. So, I went through high school taking courses in electronics, and then I went to college to become an electrical engineer, and I loved the academic environment, I loved doing research. So, I knew I wanted to do grad school. I was lucky enough to be accepted at MIT, and when I arrived there, I was like, boy, this place is tough! And it was tough! But when I finished and went back to my home country, I created the signal processing group at the school there, and I was lucky to get fair amounts of funding, so we did lots of cool things. And then, one day, some colleagues at a company here in the US called me back in Brazil and said, hey, our director of research decided to do something else. Do you want to apply for the position? And I told my wife, hey, there's a job opening in the US, what about that? She said, well, go talk to them. And I came and talked to them. They made me an offer. And then it took us about a whole month discussing, are we going to move our whole family to another country? Hey, we lived there before, because I studied there, so it's not so bad. And maybe it's going to be good for the kids. Let's go. If something doesn't work, we move back. So… here we are. But that was not Microsoft. That was another company at the time, a company called PictureTel, which was actually the leading company in professional video conferencing systems.

Host: Oh, okay.

Rico Malvar: So, we were pushing the state of the art on how you compress video and audio and those kinds of things. And I was working happily there for about four years, and then one day I look at Microsoft and I say, wow, Microsoft Research is growing fast. Then one afternoon, I said, ah, okay, and I sent an email to the CTO of Microsoft saying, you guys are great, you are developing all these groups. You don't yet have a group on signal processing. And signal processing is important, because one day we're going to be watching video on our computers via the internet and all of that, so you should be investing more in it. And I see you already have Windows Media Player. Anyway, if you want to do research in signal processing, here's my CV. I could build and lead a group for you doing that. And then I tell my wife and she goes, you did what?? You sent an email to the CTO of Microsoft??

Host: Who was it at the time?

Rico Malvar: It was Nathan Myhrvold.

Host: Nathan.

Rico Malvar: And she said, nah. I say, what do I have to lose? Worst case, they don't respond, and life is good. I have a good job here. It's all good. And that was on a Sunday afternoon. Monday morning, I get an email from Microsoft. Hey, my name is Suzanne. I work in recruiting. I'm coordinating your interview trip. I said, alright! And then I showed the email to my wife and she was like, what? It worked? Whoa! And it actually was a great time. The environment here, from day one, since the interviews, the openness of everybody, of management, the possibilities and the desire of Microsoft to say, yeah, let's explore this area and that area. One big word here is diversity. Diversity of people, diversity of areas. It is so broad. And that's super exciting. So, I was almost saying, whatever offer they make me, I'll take it! Fortunately, they made a reasonable one, so it wasn't too hard to make that decision.

Host: Well, two things I take away from what you’ve just told me. You keep using the word lucky and I think that has less to do with it than you are making it out to be. Um, because there’s a lot of really smart people here that say, I was so lucky that they offered me this. It’s like, no, they’re lucky to have you, actually. But also, the idea that if you don’t ask, you are never going to know whether you could have or not. I think that’s a wonderful story of boldness and saying why not?

Rico Malvar: Yeah. And in fact, boldness is very characteristic of Microsoft Research. We’re not afraid. We have an idea, we just go and execute. And we’re fortunate, and I’m not going to say lucky, I’m going to say fortunate, that we’re in a company that sees that and gives us the resources to do so.

Host: Rico, I like to ask all my guests, as we come to the end of our conversation, to offer some parting thoughts to our listeners. I think what you just said is a fantastic parting thought. But maybe there’s more. So, what advice or wisdom would you pass on to what we might call the next generation of technical researchers? What’s important for them to know? What qualities should they be cultivating in their lives and work in order to be successful in this arena?

Rico Malvar: I would go back to boldness and diversity. Boldness, as you've already highlighted, Gretchen: if you have an idea, and it's not just too rough an idea, you know a thing or two about why it actually could work, go after it! Give it a try. Especially if you are young. Don't worry if you fail at many things. I failed at many things in my life. But what matters is not the failures. You learn from the failures and you do it again. And the other one is diversity. Always think diversity in all the dimensions, all kinds of people, everywhere in the world. It doesn't matter their gender, race, ethnicity, upbringing, rich or poor, wherever they come from, everybody can have cool ideas. The person you least expect to invent something might be the one inventing. So, listen to everybody, because that diversity is great. And remember the diversity of users. Don't assume that all users are the same. Go learn what users really think. If you are not sure whether Idea A or Idea B is better, go talk to them. Try things out, test, get their opinions. So, push diversity on both sides, diversity in the creation and diversity in who is going to use your technology. And don't assume you know. In fact, Satya has been pushing the whole company toward that, putting us in a growth mindset, which basically means keep learning, right? Because if we do that, that diversity will expand and then we'll be able to do more.

Host: Rico Malvar, I’m so glad that I finally got you on the podcast. It’s been delightful. Thanks for joining us today.

Rico Malvar: It has been a pleasure. Thanks for inviting me.

(music plays)

To learn more about Dr. Rico Malvar and how research for people with disabilities is enabling people of all abilities, visit