
Helping first responders achieve more with autonomous systems and AirSim

With inputs from: Elizabeth Bondi (Harvard University), Bob DeBortoli (Oregon State University), Balinder Malhi (Microsoft) and Jim Piavis (Microsoft)

Autonomous systems have the potential to improve safety for people in dangerous jobs, particularly first responders. However, deploying these systems is a difficult task that requires extensive research and testing.

In April, we explored the complexities and challenges present in the development of autonomous systems and how technologies such as AirSim provide a pragmatic way to address them. Microsoft believes that the key to building robust and safe autonomous systems is providing the system with a wide range of training experiences that properly expose it to many scenarios before it is deployed in the real world. This ensures training is done in a meaningful way—similar to how a student might be trained to tackle complex tasks through a curriculum curated by a teacher.

With autonomous systems, first responders gain sight into the unknown

One way Microsoft trains autonomous systems is through participating in unique research opportunities focused on solving real-world challenges, like aiding first responders in hazardous scenarios. Recently, our collaborators at Carnegie Mellon University and Oregon State University, collectively named Team Explorer, demonstrated technological breakthroughs in this area during their first-place win at the first round of the DARPA Subterranean (SubT) Challenge.

Snapshots from the AirSim simulation showing the effects of different conditions such as water vapor, dust, and heavy smoke. Such variations in conditions can provide useful data when building robust autonomous systems.

The DARPA SubT Challenge aspires to further the technologies that would augment difficult operations underground. Specifically, the challenge focuses on methods to map, navigate, and search complex underground environments, including human-made tunnel systems, urban underground, and natural cave networks. Imagine constrained environments that are several kilometers long and structured in unique ways with regular or irregular geological topologies and patterns. Weather and other hazardous conditions, such as poor ventilation or poisonous gases, often make first responders’ work even more dangerous.

Team Explorer engaged in autonomous search and detection of several artifacts within a man-made system of tunnels. The end-to-end solution that the team created required many different complex components to work across the challenging circuit including mobility, mapping, navigation, and detection.

Microsoft’s Autonomous Systems team worked closely with Team Explorer to provide a high-definition simulation environment to help with the challenge. The team used AirSim to create an intricate maze of man-made tunnels in a virtual world that was representative of such real-world tunnels in both complexity and size. The virtual world was a hybrid synthesis: a team of artists used reference material from real-world mines to modularly generate a network of interconnected tunnels spanning two kilometers in length and spread over a large area.

Additionally, the simulation included robotic vehicles—wheeled robots as well as unmanned aerial vehicles (UAVs)—and a suite of sensors that adorned the autonomous agents. AirSim provided a rich platform that Team Explorer could use to test their methods and generate training experiences for creating various decision-making components for the autonomous agents.

At the center of the challenge was the robots’ ability to perceive the underground terrain and discover objects (such as human survivors, backpacks, cellular phones, fire extinguishers, and power drills) while adjusting to different weather and lighting conditions. Multimodal perception is important in such challenging environments, and AirSim’s ability to simulate a wide variety of sensors, along with their fusion, can provide a competitive edge. One of the most important sensors is LIDAR, and in AirSim the physical process of generating point clouds is carefully reconstructed in software, so the sensor used on the robot in simulation uses the same configuration parameters (such as number of channels, range, points per second, rotations per second, horizontal/vertical field of view, and more) as those found on the real vehicle.
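
For a concrete sense of how such a simulated sensor is exercised, here is a minimal sketch using AirSim's Python client to read lidar returns. The vehicle and sensor names are assumptions for illustration; the channel, range, and field-of-view parameters would be declared in the simulation's settings.json to mirror the physical unit.

```python
# Minimal sketch: reading simulated lidar returns via AirSim's Python API.
# "Drone1" and "LidarFront" are hypothetical names; in practice the sensor is
# declared in settings.json with the channel count, range, points-per-second,
# rotation rate, and FOV values matching the real device.
import numpy as np
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()

lidar_data = client.getLidarData(lidar_name="LidarFront", vehicle_name="Drone1")
if len(lidar_data.point_cloud) >= 3:
    # The point cloud arrives as a flat list of floats; reshape to N x 3 (x, y, z).
    points = np.array(lidar_data.point_cloud, dtype=np.float32).reshape(-1, 3)
    print(f"{points.shape[0]} points at timestamp {lidar_data.time_stamp}")
```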

It is challenging to train perception modules based on deep learning models to detect the target objects using LIDAR point clouds and RGB cameras. While curated datasets, such as ScanNet and MS COCO, exist for more canonical applications, none exist for underground exploration applications. Creating a real dataset for underground environments is expensive because a dedicated team is needed to first deploy the robot, gather the data, and then label the captured data. Microsoft’s ability to create near-realistic autonomy pipelines in AirSim means that we can rapidly generate labeled training data for a subterranean environment.
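
As one illustration of how simulation can shortcut manual labeling, the hedged sketch below captures a paired RGB and segmentation frame from AirSim; the segmentation image acts as an automatic pixel-level label. The camera name and output path are assumptions for this example.

```python
# Illustrative sketch: capturing a paired RGB and segmentation frame from AirSim,
# giving automatically labeled training data. Camera "0" and the output folder
# are assumptions for this example.
import os
import airsim

client = airsim.VehicleClient()
client.confirmConnection()
os.makedirs("dataset", exist_ok=True)

responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Scene),         # RGB frame
    airsim.ImageRequest("0", airsim.ImageType.Segmentation),  # per-pixel labels
])
for tag, response in zip(["rgb", "seg"], responses):
    # Responses are compressed PNG bytes by default; save them side by side.
    airsim.write_file(os.path.join("dataset", f"frame_0000_{tag}.png"),
                      response.image_data_uint8)
```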

Detecting animal poaching through drone simulations

With autonomous systems, the issues with data collection are further exacerbated for applications that involve first responders, since the collection process is itself dangerous. Such challenges were present in our collaboration with Air Shepherd and USC to help counter wildlife poaching.

The central task in this collaboration was the development of UAVs equipped with thermal infrared cameras that can fly through national parks at night to search for poachers and animals. The project had several challenges, the largest of which was that building such a system requires data for both training and testing. For example, labeling a real-world dataset provided by Air Shepherd took approximately 800 hours over the course of six months, producing 39,380 labeled frames and approximately 180,000 individual poacher and animal labels on those frames. This data was used to build a prototype detection system called SPOT, but it did not produce acceptable precision and recall values.

AirSim was then used to create a simulation in which virtual UAVs flew over virtual environments, like those found in the Central African savanna, at altitudes from 200 to 400 feet above ground level. The simulation took on the difficult task of detecting poachers and wildlife, both during the day and at night, and ultimately increased the precision of detection through imaging by 35.2%.
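
For a flavor of how such nighttime aerial thermal data can be produced in simulation, here is a hedged sketch that captures an infrared-style image from a simulated drone at roughly 300 feet. The pose, camera name, and file name are assumptions, and AirSim's infrared image type is only an approximation of a real thermal sensor.

```python
# Hedged sketch: capturing an infrared-style frame from a simulated UAV at about
# 300 ft (~91 m) above ground. The pose, camera "0", and file name are assumptions;
# AirSim's Infrared image type only approximates a true thermal camera.
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()

# Place the vehicle ~91 m above the origin (NED coordinates: negative Z is up).
pose = airsim.Pose(airsim.Vector3r(0, 0, -91), airsim.to_quaternion(0, 0, 0))
client.simSetVehiclePose(pose, True)  # True = ignore collisions while teleporting

responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.Infrared, False, True)
])
airsim.write_file("night_flight_ir.png", responses[0].image_data_uint8)
```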

Driving innovation through simulation

Access to simulation environments means that we have a near-infinite data generation machine, where different simulation parameters can be chosen to generate experiences at will. This capability is foundational for testing and debugging autonomous systems that will eventually need to be provably robust and certified. We continue to investigate such fuzzing and falsification frameworks for various AI systems.

Holistic challenges such as the DARPA SubT Challenge, and partnerships with organizations like Air Shepherd, allow researchers and developers to build complete solutions that cover a wide array of research topics. There are many research challenges at the intersection of robotics, simulation, and machine intelligence, and we continue to invest in them on our journey to build toolchains that enable researchers and developers to build safe and useful simulations and robots.

We invite readers to explore AirSim on our GitHub repository and to join us on our journey to build toolchains in collaboration with the community. The AirSim environment of networked man-made caves was co-created with Team Explorer for the DARPA SubT Challenge and is publicly available to researchers and developers.


New Microsoft fellowship program empowers faculty research through Azure cloud computing

August 1, 2019 | By Jamie Harper, Vice-President, US Education

Microsoft is expanding its support for academic researchers through the new Microsoft Investigator Fellowship. This fellowship is designed to empower researchers of all disciplines who plan to make an impact with research and teaching using the Microsoft Azure cloud computing platform.

From predicting traffic jams to advancing the Internet of Things, Azure has continued to evolve with the times, and this fellowship aims to keep Azure at the forefront of new ideas in the cloud computing space. Microsoft fellowships, too, have a long history of supporting researchers and promoting diversity and promising academic research in the field of computing. This fellowship adds to that legacy and highlights the significance of Azure in education, both now and into the future.

Full-time faculty at degree-granting colleges or universities in the United States who hold PhDs are eligible to apply. The fellowship supports faculty who are currently conducting research, advising graduate students, teaching in the classroom, and who plan to use or already use Microsoft Azure in research, teaching, or both.

Fellows will receive $100,000 annually for two years to support their research. Fellows will also be invited to attend multiple events during this time, where they will make connections with other faculty from leading universities and Microsoft. They will have the opportunity to participate in the greater academic community as well. Members of the cohort will also be offered various training and certification opportunities.

When reviewing submissions, Microsoft will evaluate the proposed future research and teaching impact of Azure. This will include consideration of how the Microsoft Azure cloud computing platform will be leveraged, whether in scale, scope, or unique ways, for research, teaching, or both.

Candidates should submit their proposals directly on the fellowship website by August 16, 2019. Recipients will be announced in September 2019.

We encourage you to submit your proposal! For more information on the Microsoft Investigator Fellowship, please check out the fellowship website.


Podcast: The brave new world of cloud-scale systems and networking with Microsoft Research Asia’s Dr. Lidong Zhou

Dr. Lidong Zhou

Episode 82, June 26, 2019

If you’re like me, you’re no longer amazed by how all your technologies can work for you. Rather, you’ve begun to take for granted that they simply should work for you. Instantly. All together. All the time. The fact that you’re not amazed is a testimony to the work that people like Dr. Lidong Zhou, Assistant Managing Director of Microsoft Research Asia, do every day. He oversees some of the cutting-edge systems and networking research that goes on behind the scenes to make sure you’re not amazed when your technologies work together seamlessly but rather, can continue to take it for granted that they will!

Today, Dr. Zhou talks about systems and networking research in an era of unprecedented systems complexity and what happens when old assumptions don’t apply to new systems, explains how projects like CloudBrain are taking aim at real-time troubleshooting to address cloud-scale, network-related problems like “gray failure,” and tells us why he believes now is the most exciting time to be a systems and networking researcher.

Transcript

Lidong Zhou: We have seen a lot of advances in, for example, machine learning and deep learning. So, one thing that we have been looking into is how we can leverage all those new technologies in machine learning and deep learning and apply it to deal with the complexity in systems.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: If you’re like me, you’re no longer amazed by how all your technologies can work for you. Rather, you’ve begun to take for granted that they simply should work for you. Instantly. All together. All the time. The fact that you’re not amazed is a testimony to the work that people like Dr. Lidong Zhou, Assistant Managing Director of Microsoft Research Asia, do every day. He oversees some of the cutting-edge systems and networking research that goes on behind the scenes to make sure you’re not amazed when your technologies work together seamlessly but rather, can continue to take it for granted that they will!

Today, Dr. Zhou talks about systems and networking research in an era of unprecedented systems complexity and what happens when old assumptions don’t apply to new systems, explains how projects like CloudBrain are taking aim at real-time troubleshooting to address cloud-scale, network-related problems like “gray failure,” and tells us why he believes now is the most exciting time to be a systems and networking researcher. That and much more on this episode of the Microsoft Research Podcast.

Host: Lidong Zhou, welcome to the podcast.

Lidong Zhou: Yes. It’s great to be here.

Host: As the Assistant Managing Director of MSR Asia, you are, among other things, responsible for overseeing research in systems and networking, and I know you’ve done a lot of research in systems and networking over the course of your career as well. So, in broad strokes, what do you do and why do you do it? What gets you up in the morning?

Lidong Zhou: Yeah, I think, you know, this is one of the most exciting times to do research in systems and networking. And we have already seen that advances in, you know, systems and networking have been pushing the envelope in many technologies. We’ve seen the internet, the web, web search, big data, and all the way to the artificial intelligence and cloud computing that, you know, everybody kind of relies on these days.

Host: Yeah.

Lidong Zhou: All those advances have created challenges of unprecedented complexity, scale and a lot of dynamism. So, my understanding, you know, of systems is always, you know, a system is about bringing order to chaos, right? The chaotic situation. So, we are actually in a very chaotic situation where things change so fast and there are a lot of, you know, new technologies coming. And so, when we talk about systems research, it’s really about transforming all those unorganized pieces into a unified whole, right? That’s why, you know, we’re very excited about all those challenges. And also, we realized over the years that it’s actually not just the typical systems expertise – when we talk about distributed systems, operating systems or networking – that’s actually not enough to address the challenges we’re facing. Like, you have to actually also master other fields like, you know, database systems and programming languages, compilers, hardware, and also in artificial intelligence and machine learning and deep learning. And what I do at Microsoft Research Asia, is to put together a team with a diverse set of expertise and inspire the team to take on those big challenges together by, you know, working together, and, you know, that’s a very exciting job to have.

Host: I love the “order out of chaos” representation… if you’ve ever been involved in software code writing, you write this here and someone else is writing that there, and it has to work together, and you’ve got ten other people writing… and we all just take for granted, on my end, it’s going to work. And if it doesn’t, I curse my computer!

Lidong Zhou: Yes, that’s our problem!

Host: Well, I had Hsiao-Wuen Hon on the podcast in November for the 20th anniversary of the lab there, and he talked about the mission to, in essence, both advance the theory and practice of computing, in general. Your own nearly twenty-year career has been about advancing the theory and practice of distributed systems, particularly. So, talk about some of the initiatives you’ve been part of and technical contributions you’ve made to distributed systems over the years. You’ve just come off the heels of talking about the complexities. Now, how have you seen it evolve over those years?

Lidong Zhou: You know, I think we are getting into the era of distributed systems. Being a distributed systems person, we always believe, you know, what we’re working on is the most important piece. You know, I think Microsoft Research is really a great place to connect theory and practice, because we are constantly exposed to very difficult technical challenges from the product teams. They’re tackling very difficult problems, and we also have the luxury of stepping back and thinking deeply about the problems we’re facing and thinking about what kinds of new theories we want to develop, what new methodologies we can develop to address those problems. I remember, you know, in early 2000, when Microsoft started doing web search, and we had a meeting with the dev manager, who was actually in charge of architecting the web search system. And so, we had a, you know, very interesting discussion. We talked about, you know, how we were doing research in distributed systems, how we had to deal with, you know, a lot of problems when services fail. So, we have to make sure that the whole service actually stays correct in the face of all kinds of problems that you can see in a distributed system. I remember at that time, we had Roy Levin, Leslie Lamport, you know, a lot of colleagues, and we talked about protocols. And, at the beginning, the dev manager basically said, oh yeah, I know, you know, it’s complicated to deal with all these failures, but it’s actually under control. And a couple months later, he came back and said, oh, you know, there’s so many corner cases. It’s just beyond our capability of reasoning about the correctness. And we need the protocols that we were talking about. But it’s also interesting that, you know, in developing those protocols, we tend to make some assumptions. Say, okay, you know, we can tolerate a certain number of failures. And one question that the general manager asked was, you know, what happens if we have more than that number of failures in the system, right? And from a practical point of view, you have to deal with those kinds of situations. In theory, when you work on theory, then, you know, you can say, okay, let’s make an assumption and let’s just work under that assumption. So, we see that there’s a difference between theory and practice. The nice thing about working at Microsoft Research is you can actually get exposed to those real problems, and that keeps you honest about what assumptions are reasonable, what assumptions are not reasonable. And then you think about, you know, what is the best way of solving those problems in a more general sense rather than just solving a particular problem?

Host: Your work in networked computer systems is somewhat analogous to another passion of yours that I’m going to call “networked human systems.” In other words, your desire to build community among systems researchers. How are you going about that? I’m particularly interested in your Asia Pacific Systems workshop and the results you’ve seen come out of that.

Lidong Zhou: So, I moved to Microsoft Research Asia in late 2008, and, when I was in the United States, clearly there is a very strong systems community. And, over the years, we’ve also seen that community sort of expanding into Europe. So, the European systems community sort of started the systems workshop, and eventually it evolved into a conference called EuroSys, and very successfully. And you know we see a lot of people getting into systems and networking because of the community, because of the influence of those conferences. And the workshop has been very successful in gathering momentum in the region. And so, in 2010, I remember it was Chandu Thekkath and Rama Kotla who were my colleagues at Microsoft Research, and they basically had this idea that maybe we should start something also in the Asia Pacific region. At that time, I was already working in Beijing, and I thought, you know, this is also part of my obligation. So, in 2010, we started the first Asia Pacific systems workshop. And it was a humble beginning. We had probably about thirty submissions and accepted probably a dozen. It was a good workshop, but it was a very humble beginning, as I said. But what happened after that was really beyond our expectation. It’s like, you know, we just planted a seed, and the community sort of picked it up and grew with it. And, you know, it’s very satisfying to see that we’re actually going to have the tenth workshop in Hangzhou in August. If you look at the organizing committee, they are really you know all world-class researchers from all over the world. It’s not just from a particular region, but you know really, all the experts across the world contributed to the success of this workshop over the last, you know, almost ten years now. And the impact that this workshop has is actually pretty tremendous.

Host: What would you attribute it to?

Lidong Zhou: I think it’s really, first of all, this is the natural trend, right? You go from… the U.S. was leading in systems research and, and then expanded to Europe. And it’s just a natural trajectory to expand further to Asia Pacific given, you know, a lot of, you know, technological advances are happening in Asia. And the other, you know, reason is because the community really came together. There are a lot of top systems researchers that originally, just like me, came from the Asia Pacific region. So, we have a lot of incentives and commitment to give back.

Host: Right.

Lidong Zhou: And all those enthusiasms, passion, or the willingness to help young researchers in the region, I mean those actually contributed to the success of the workshop, in my view.

Host: Well, you were recently involved in hosting another interesting workshop, or conference: The Symposium on Operating Systems Principles, right?

Lidong Zhou: Right.

Host: SOSP?

Lidong Zhou: SOSP.

Host: And this was in Shanghai in 2017. It’s the premier conference for computer systems technology. And as I understand, it’s about as hard to win the bid for as the Olympics!

Lidong Zhou: Yes, almost.

Host: So why was it important to host this conference for you, and how do you think it will help broaden the reach of the systems community worldwide?

Lidong Zhou: So, SOSP is one of the most important systems conferences and traditionally, it has been held in the U.S. and later on, they started rotating into Europe. And it was really a very interesting journey that we went through, along with Professor Haibo Chen who is from Shanghai Jiao Tong University. We started pitching for having SOSP in the Asia Pacific region in 2011. That was like six years before we actually succeeded! We pitched three times. But overall, even for the first time, the community was very supportive in many ways, so that we’d be very careful to make sure that the first one is going to be a success. And in 2017, when Haibo and I opened the conference, I was actually very happy that I didn’t have to be there to make another pitch! I was essentially opening the conference. And it was very successful in the sense that we had a record number of attendees, over eight hundred people…

Host: Wow.

Lidong Zhou: …and we had almost the same number, if not a little bit more, from the U.S. and Europe. And we had, you know, many more people from the region, which was what we intended.

Host: Mm-hmm.

Lidong Zhou: And having the conference in the Asia Pacific is actually very significant to the region. We’re seeing more and more high-quality work and papers in those top conferences from the Asia Pacific region, you know, from Korea, India, China, and many other countries.

Host: Right.

Lidong Zhou: And I’d like to believe that what we have done sort of helped a little bit in those regards.

(music plays)

Host: Let’s talk about the broader topic of education for a minute. This is really, really important for the systems talent pipeline around the world. And perhaps the biggest challenge is expanding and improving university-level education for this talent pipeline. MSRA has been hosting a systems education workshop for the past three years. The fourth is coming up this summer, and none other than Turing Award winner John Hopcroft has praised it as “a step toward improving education and cultivating world-class talent.” And he also said a fifth of the world’s talent is in the Asia Pacific region, so we’d better get over there. Tell us about this ongoing workshop.

Lidong Zhou: Yeah, actually John really inspired us to get this started I think more than three years ago.

Host: Mm-hmm.

Lidong Zhou: And I think we’re seeing a need to improve, you know, systems education. But more importantly, I think, for MSR Asia, one of the things that we’re very proud of doing is connecting educators and researchers from all over the world, especially connecting people from, you know, the U.S. and Europe with those in the Asia Pacific region. And the other thing that we are also very proud of doing is cultivating the next generation of computer scientists. And certainly, as you said, you know, the most important thing is education. And during the process, what we found, is that there are a lot of professors who share the same passion. And we’re talking about, you know, a couple of professors, Lorenzo Alvisi from Cornell and Robbert van Renesse from Cornell and Geoff Voelker from UCSD… they actually came all the way from the U.S. just to be at the workshop, talking to all the systems professors from all over the country in China. And so, I attended those workshops myself. The first one was five days, and the next two were, like, three days. It’s a huge time commitment.

Host: Yeah.

Lidong Zhou: But you see all the passion from those professors. They’re really into improving teaching. They’re trying to figure out, you know, how to make students more engaged, how to get them excited about systems, even how to design experiments, all those aspects. And, you know, we’re really optimistic that with those passionate professors, we’re going to see a very strong new generation of systems researchers. And this is, you know, I think the kind of impact we really want to see from a perspective of, you know, Microsoft Research Asia. It’s not just about making the lab successful, but, if we can make an impact in the community in terms of talent, in terms of the quality of education, that’s much more satisfying.

Host: Before we get into specific work, I’d like you to talk about what you’d referred to as a fundamental shift in the way we need to design systems – and by we, I mean you – in the era of cloud computing and AI. You’ve suggested that things have changed enough that the older methodologies and principles aren’t valid anymore. So, unpack that for us. What’s changed and what needs to happen to build next-gen systems?

Lidong Zhou: Yeah, that’s a great question. I’ll continue with the story about building fault-tolerant systems. So, in the last thirty years, we have been working on systems reliability, and we have developed a lot of techniques, a lot of protocols, and we think it will solve all the problems. But if you look at how this thread of work started, it really started in the late seventies when we were looking at the reliability of airplanes, and so on. Of course, you know, there are assumptions we make about the kinds of failures in those kinds of systems. And we sort of generalize those protocols so that it can be applicable up until now. But if you look at the cloud, it’s much more complicated, in many dimensions. And the system also evolves very quickly. And a lot of assumptions we make actually start to break. And even though we have applied all these well-known techniques, that’s just not enough. So, that’s one aspect. The other aspect is, it used to be that, you know, the system we build, we can sort of understand how it works, right? And now, the complexity has already gone beyond our own understanding, you know. We can’t reason about how the system behaves. On the other hand, we have seen a lot of advances in, for example, machine learning and deep learning. So, one thing that we have been looking into is how we can leverage all those new technologies in machine learning and deep learning and apply it to deal with the complexity in systems. And that’s, you know, another very fascinating area that we’re looking into as well.

Host: Yeah. Well, let’s get specific now. Another super interesting area of research deals with exceptions and failures in the cloud-scale era and how you’re dealing with what you call “gray failure.” And you’ve also called it the gray swan (which I want you to explain) or the Achilles heel of cloud-scale systems. So how did you handle exceptions and failures in a somewhat less complex, pre-cloud era and what new methodologies are you trying to implement now?

Lidong Zhou: Right. So, as I mentioned, in the older days, we were targeting those systems with assumptions about failures, right? Like crash failures, you know, a component can fail… when it fails, it crashes. It stops working. And nowadays, we realize, you know, this kind of assumption no longer holds. So, this is why we define a new type of failure called gray failure. And, thinking about what kind of name to give to this very interesting new line of research that we’re starting, we called it gray swan. People already know about black swan or gray rhino. So first of all, because we’re talking about the cloud, we want something not as heavy as a rhino!

Host: Right.

Lidong Zhou: We want something that can fly. And the reason we call it gray is because, you know, a systems component is no longer just black or white. It could be in a weird state where, to some of the observers, it’s actually behaving correctly, but to others, it’s actually not. And that turns out to be behind many of the major problems that we’re seeing in the cloud. And it has sort of some components of black swan in the sense that some of the assumptions we’re making break. So that’s why everything we build on top of that assumption starts to break down. So, for example, I mentioned the assumption about failure, right? If you think that it either crashed or it’s correct, then it’s a very simple kind of world, right? But if it’s not the case, then all the protocols that will work under that assumption will cease to work. It also has this connection with gray rhino because gray rhino is this problem that everybody sort of sees coming, and it’s a very major problem, but people tend to ignore it for the wrong reason. And in our case, we know that, for the cloud, all those service disruptions happen all the time, and there are actually failures all over the place. It’s just very hard to figure out which ones are important. But we know something big is going to happen at some point, right? So, we try to use this notion of gray swan to describe this new line of thinking where, you know, we really think about failures that are not just crash failures or not even, you know, Byzantine failures where it’s essentially arbitrary failures. But there’s something in between that we should reason about, and then using those to reason about the correctness of the whole service.

Host: So, does the word catastrophic enter into this at all? Or is it…

Lidong Zhou: Yes! That could be catastrophic. Eventually.

Host: How does that kind of thinking play into what you’re doing?

Lidong Zhou: If you look at the cloud system, it’s like a rhino sort of charging towards you, and before it hits you, there’s a lot of dust and, you know, noise and other things. But you just don’t know when and how something bad is going to happen, right? And it could be catastrophic. It has actually happened a couple of times already. And so, one of the things we try to do is to figure out when and how bad things could happen to prevent catastrophic failures…

Host: Right.

Lidong Zhou: …from all the dust and maybe, you know, other signals we have in the system. There are signals. It’s just we don’t know how to leverage them.

Host: Part of your approach to coping with gray failures is a line of research you call CloudBrain.

Lidong Zhou: Right.

Host: And it’s all about automatic troubleshooting for the cloud. It’s actually a huge issue because of the remarkable complexity of the systems. So, tell us how CloudBrain, and what you call DeepView, is actually helping operators – the people that have to deal with it on the ground – simplify how they write troubleshooting algorithms.

Lidong Zhou: Mm-hmm. So, I think CloudBrain is one of the efforts that we have to deal with gray failures. And remember, you know, we talked about the challenges that come from the complexity of the system or the scale of the system. It really has, you know, a huge number of components interacting with each other. But on the other hand, we can really leverage the scale of the system to help us in terms of, you know, diagnosis, detecting problems, even figuring out where the problem is. And this is the premise of the CloudBrain project. So, it has actually three components, three ideas. The first one is really the notion of near-real-time monitoring. And so instead of trying to look at the logs after the fact and then analyze what happened, we try to have a pulse on what the system is doing, how it’s doing, and so on. So that’s the first component. And the second component is we really want to form a global view. So, it’s not just one observation we make about a system, but really observations from all over the system combined, so we can actually understand how a system is behaving and which part is actually having a problem. And then, the third part is, once you have, you know, all these global observations that come in real time, then we can use statistical methods to really reason about, you know, what’s abnormal and so on. So, this is where we really leverage the scale, the huge amount of data…

Host: Right.

Lidong Zhou: …that used to be a challenge and now it becomes an opportunity for us to actually come up with new solutions to handle the complexity of the system.

Host: So how does that help an operator simplify writing an algorithm?

Lidong Zhou: Right, so now, the operator actually has all the data in near real time. And, you know, you can write this very simple algorithm that operates on the data sort of like a SQL query.

Host: Right.

Lidong Zhou: Right? And then it can emit signals and you know tell people that something’s wrong or something’s correct, or maybe we have to pay attention to part of the system that seems to have some problems.
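
To make the "simple algorithm over near-real-time data" idea concrete, here is a hypothetical sketch, not the actual CloudBrain or DeepView interface: an operator-style rule that flags components whose latency drifts well above the fleet-wide norm.

```python
# Hypothetical sketch (not the actual CloudBrain/DeepView interface): an
# operator-style rule over near-real-time telemetry that flags components whose
# average latency sits far above the fleet-wide norm.
import statistics
from collections import defaultdict

def flag_anomalies(records, threshold=1.0):
    """records: iterable of (component_id, latency_ms) observations."""
    by_component = defaultdict(list)
    for component, latency in records:
        by_component[component].append(latency)

    fleet = [l for latencies in by_component.values() for l in latencies]
    mean, stdev = statistics.mean(fleet), statistics.pstdev(fleet)

    # Flag components more than `threshold` standard deviations above the mean.
    return [c for c, latencies in by_component.items()
            if stdev > 0 and (statistics.mean(latencies) - mean) / stdev > threshold]

# Example: one component is markedly slower than the rest.
telemetry = [("vm-01", 5.1), ("vm-02", 4.9), ("vm-03", 5.3),
             ("vm-04", 45.0), ("vm-04", 50.2)]
print(flag_anomalies(telemetry))  # -> ['vm-04']
```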

Host: So where is this gray failure research, with all its pieces and parts, in the pipeline for production?

Lidong Zhou: Overall, we are not at the stage where we solve all the problems, but we have pieces of the technology we developed to solve some specific problems. DeepView and CloudBrain are, you know, the two projects that have already been incorporated in Azure to deal with network-related problems, for example.

Host: Mm-hmm.

Lidong Zhou: But, you know, we are far from solving the problem. It’s really sort of a research agenda that we set out probably for years to come. And one idea that we have been working on, which is actually very interesting, is that we really have to change how we view programs. In the past, for defensive programming, we have been trained to handle exceptions, and it turns out that handling exceptions in a large, complex system is not enough. So, one of the ideas that we’ve been thinking about is changing exception handling into exception or error reporting. So, you start to collect all those signals. We talked about, you know, the dust when the…

Host: Right.

Lidong Zhou: …rhino comes charging at you. So, you have to really collect those dusts towards one place so that you can actually reason about the behavior of the system. And that’s, you know, one of those major shifts…

Host: Yeah.

Lidong Zhou: …that, you know, we see coming even in how we develop systems.

Host: Right.

Lidong Zhou: Not just, you know, after the fact, we already have this beast and now we need to understand what’s going on.

Host: Right.

Lidong Zhou: So those methodologies, I think, is where we’re pushing. You know, it’s not just solving a specific problem. We have an incident; we try to solve this problem. Yeah, we can do that. But more importantly… this goes back to the theory meets practice…

Host: Right.

Lidong Zhou: …so, we need to come out of looking at the specific instances, but think about, you know, what methodologies we should adopt to change the status completely.

Host: So how do you implement, then, a brand-new thing? I mean, we talked about the beast that already exists, and is growing. What are you proposing with your research?

Lidong Zhou: Right, so, this is always a hard problem. We already have something running, and it has to keep running, and now it has a lot of problems we need to solve. So, one of the ways we deal with those challenges is trying to solve the current problems. You know, like CloudBrain and DeepView sort of try to fit into the current practice. But for some other projects, what we do is like, you know, what I talked about, changing from exception handling to error reporting – for that, we actually built a system that can automatically transform a piece of code that does error handling in the traditional way into a piece of code that does error reporting in the way that we desire.

Host: Right.

Lidong Zhou: And that helps because we don’t want everybody to rewrite the whole code base.

Host: No.

Lidong Zhou: It’s just not possible. So, we have to find ways to help developers to sort of do the transformation and also live with the current boundaries of the system. And we hopefully, gradually, we’ll move towards the right direction.
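
As a rough illustration of the shift Dr. Zhou describes, here is a hypothetical Python sketch, not the actual transformation tooling: the same call site keeps its fallback, but instead of silently swallowing the failure it reports the signal to a central collector where fleet-wide reasoning can happen.

```python
# Rough illustration of moving from local exception *handling* to exception
# *reporting* (a hypothetical sketch, not the actual transformation tooling).
import logging

logger = logging.getLogger("error_reports")

# Before: the failure is swallowed locally; the rest of the system never sees it.
def read_config_handling(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""  # silent local fallback

# After: the same fallback, but the signal (the "dust") is reported centrally,
# so global, near-real-time reasoning over many such signals becomes possible.
def read_config_reporting(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as err:
        logger.warning("config read failed: path=%s error=%s", path, err)
        return ""
```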

Host: Yeah, I think you see that just about every place software exists, there’s a legacy system. You’ve got to retrofit some stuff, and that adds complexity to it.

Lidong Zhou: That’s right.

Host: But you can’t just make everyone throw out what they’re already using. So, this is a big challenge. I’m glad you’re on the job.

(music plays)

Host: Well, we talked about what gets you up in the morning and all the work you’re doing to make sure that everything goes right… that is basically what you’re doing, is trying to make everything go right…

Lidong Zhou: Right.

Host: …but as we know – as you know more than I know – something always goes wrong!

Lidong Zhou: Right, unfortunately.

Host: The rhino… So, given what you see in your work every day, is there anything that keeps you up at night?

Lidong Zhou: Yes, I think we’re realizing that the kinds of distributed systems we’re designing, or building, are becoming more and more important. They’re becoming part of the sort of critical infrastructure of our society. And that puts a lot of burden on us to make sure that whatever we’re building can be mission critical.

Host: Right.

Lidong Zhou: And you know we have a lot of researchers working on formal methods, verification, just to make sure that the core of the system can be verifiable, will give some assurance that it’s actually working correctly. And, you know, we talked about applying machine learning and deep learning mechanisms, but it’s statistical. So sometimes – actually, naturally – there are cases where it breaks. So how we can safeguard this kind of system from what you call catastrophic issues, and this is also another thing that we have been putting a lot of thought into. And we’re not short of challenges, especially on making the cloud infrastructure really, you know, mission critical!

Host: Lidong, tell us your story. How did you end up at Microsoft Research, and how did you develop your path to the positions you hold right now?

Lidong Zhou: Yeah, looking back, I remember when I finished my PhD, I started job hunting and I got, you know, a couple of offers, and I talked to my advisor. Of course, that’s what you do when you’re a graduate student. And he basically gave me a very simple piece of advice. He basically said, well, just go where you can find the best colleagues, the colleagues with maybe, you know, Turing-Award caliber. So, I ended up going to Microsoft Research Lab where, at that time, we didn’t have a Turing Award winner, but within ten years, we had two! So that was how things started. Looking back, what’s really important is the quality of colleagues you have, especially in the early stages of my career. I learned how to do research in some sense. It’s not about getting papers published. It’s internal passion that drives research and I think the first phase of my career is more on personal development. I remember being pushed by my manager at the time, Roy Levin, to get out of my comfort zone. We started as a sort of technical contributor, but then, I was pushed to lead a project and there are always new challenges that you face. And you get a lot of support from your colleagues to get to the next stage, and that’s very satisfying. And then I went to MSR Asia, where I later became a manager of a research group, and I think that’s sort of the second phase of my career, where it’s not about my personal career development. It’s also about building a team and how you can contribute to other people’s success. And that turns out to be even more satisfying to see the impact you can have on other people’s careers and their success. And also, during that period of time, I also realized that it’s not just about your own team. You know, we can build the best systems research team in Asia Pacific, but it’s more satisfying if you can contribute to the community. And we talked about starting the workshop and getting the conference into Asia Pacific, and, you know, a lot of other things that we do to contribute to society, including, you know, the talent fostering and many other things. And those, in my mind, are becoming even more critical as we move on in our career.

Host: Yeah.

Lidong Zhou: So, I view this as sort of the three stages of my career. It started with personal development, learning, you know, what it means to love what you do and do what you love. And then you think about how you can contribute to other people’s success and increase your ability to influence others and impact others, and positively. And finally, in what you can contribute to the society, to the community. And I’ve been very fortunate to have been working with a lot of great, you know, leaders and colleagues, and I’ve learned a lot along the way. And I remember you know I worked with a lot of product teams as well. And they also offered a lot of career advice and support. So, this is just, you know, my story, I guess.

Host: You know, it sounds to me like almost a metaphor. You know, you start with yourself, you grow and mature outwards to others, and then the broader community impact that ultimately a mature person wants to see happen, right?

Lidong Zhou: I hope so!

Host: I get the sense that it is!

Lidong Zhou: It’s just about seeking the truth. It’s not about, you know, getting papers published. It’s not about, you know, chasing fame or, you know, all those things that we start to lose sight of, you know, what the true meaning of research is. It’s not about all these results that we try to get, but truly, it’s about finding the truth and enjoying the process along the way.

Host: At the end of each podcast, I ask my guests to give some parting advice to our listeners. What big, unsolved problems do you see on the horizon for researchers who may just be getting their feet wet with systems and networking research?

Lidong Zhou: Well, I think they are very fortunate to be a young researcher in systems and networking now. I remember I was talking to But[ler] Lampson when I started my career in 2003, and he said, you know, he was feeling lucky that he was doing all the work in the late seventies and early eighties because it was the right time to see a paradigm shift. And I think, now, we are at the point that we’re going to see another major paradigm shift, just like, you know, folks in Xerox PARC. What they did was, essentially, to define computing for the next thirty years. Even now, we’re sort of living in the world that they defined, looking at the PC, even with the phone. I mean, that’s just a different form factor, right? They sort of defined the mouse, the laser printer, all the things that we know about, and the user interface. And the reason that happened at that time was because the computing was becoming, you know, more powerful from supercomputers now to personal computing, because…

Host: Right.

Lidong Zhou: …you know, we can pack so much computation power into a small machine. And now, I think, you know, the computation power has reached another milestone where computing capability is going to be everywhere. And we’re going to have intelligence everywhere around us. The boundary between sort of the virtual world in computers and our physical world will disappear. And that will lead to really paradigm-shifting opportunities where we figure out, you know, what computing really means in the next, you know, ten years, twenty years. And this is what I would encourage everyone to focus on rather than just incremental improvements to the protocols and so on. Because we are really seeing a lot of assumptions being invalidated. And we really have to look at the world from a very different view and from, you know, how we interact with sort of the computing capability and how we expose computing capability to do what we need to do. And it’s not just doing computing in front of a computer but, you know, doing everything with sort of the computing capability around us. And that’s just exciting to imagine. And I can’t even describe what the future will look like, but it’s up to our young researchers to really make it a reality.

Host: Lidong Zhou, it’s been an absolute pleasure. Thanks for joining us in the booth today.

Lidong Zhou: Thank you, Gretchen. Really a pleasure.

(music plays)

To learn more about Dr. Lidong Zhou and how researchers are working to bring order out of systems and networking chaos, visit Microsoft.com/research


Project Triton and the physics of sound with Microsoft Research’s Dr. Nikunj Raghuvanshi

Episode 68, March 20, 2019

If you’ve ever played video games, you know that for the most part, they look a lot better than they sound. That’s largely due to the fact that audible sound waves are much longer – and a lot more crafty – than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels – in rooms, around corners, behind walls, out doors – and he’s using computational physics to do it.

Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes.

Final Transcript

Nikunj Raghuvanshi: In a game scene, you will have multiple rooms, you’ll have caves, you’ll have courtyards, you’ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it’s making its way around geometry. And the question becomes how do you compute even the direct sound? Even the initial sound’s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: If you’ve ever played video games, you know that for the most part, they look a lot better than they sound. That’s largely due to the fact that audible sound waves are much longer – and a lot more crafty – than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels – in rooms, around corners, behind walls, out doors – and he’s using computational physics to do it.

Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes. That and much more on this episode of the Microsoft Research Podcast.

Host: Nikunj Raghuvanshi, welcome to the podcast.

Nikunj Raghuvanshi: I’m glad to be here!

Host: You are a senior researcher in MSR’s Interactive Media Group, and you situate your research at the intersection of computational acoustics and graphics. Specifically, you call it “fast computational physics for interactive audio/visual applications.”

Nikunj Raghuvanshi: Yep, that’s a mouthful, right?

Host: It is a mouthful. So, unpack that! How would you describe what you do and why you do it? What gets you up in the morning?

Nikunj Raghuvanshi: Yeah, so my passion is physics. I really like the mixture of computers and physics. So, the way I got into this was, many, many years ago, I picked up this book on C++ and it was describing graphics and stuff. And I didn’t understand half of it, and there was a color plate in there. It took me two days to realize that those are not photographs, they were generated by a machine, and I was like, somebody took a photo of a world that doesn’t exist. So, that is what excites me. I was like, this is amazing. This is as close to magic as you can get. And then the idea was I used to build these little simulations and I was like the exciting thing is you just code up these laws of physics into a machine and you see all this behavior emerge out of it. And you didn’t tell the world to do this or that. It’s just basic Newtonian physics. So, that is computational physics. And when you try to do this for games, the challenge is you have to be super-fast. You have 1/60th of a second to render the next frame to produce the next buffer of audio. Right? So, that’s the fast portion. How do you take all these laws and compute the results fast enough that it can happen at 1/60th of a second, repeatedly? So, that’s where the computer science enters the physics part of it. So, that’s the sort of mixture of things where I like to work in.

Host: You’ve said that light and sound, or video and audio, work together to make gaming, augmented reality, virtual reality, believable. Why are the visual components so much more advanced than the audio? Is it because the audio is the poor relation in this equation, or is it that much harder to do?

Nikunj Raghuvanshi: It is kind of both. Humans are visually dominant creatures, right? Because visuals are what is on our conscious mind and when you describe the world, our language is so visual, right? Even for sound, sometimes we use visual metaphors to describe things. So, that is part of it. And part of it is also that for sound, the physics is in many ways tougher because you have much longer wavelengths and you need to model wave diffraction, wave scattering and all these things to produce a believable simulation. And so, that is the physical aspect of it. And also, there’s a perceptual aspect. Our brain has evolved in a world where both audio/visual cues exist, and our brain is very clever. It goes for the physical aspects of both that give us separate information, unique information. So, visuals give you line-of-sight, high resolution, right? But audio is lower resolution directionally, but it goes around corners. It goes around rooms. That’s why if you put on your headphones and just listen to music at a loud volume, you are a danger to everybody on the street because you have no awareness.

Host: Right.

Nikunj Raghuvanshi: So, audio is the awareness part of it.

Host: That is fascinating because you’re right. What you can see is what is in front of you, but you could hear things that aren’t in front of you.

Nikunj Raghuvanshi: Yeah.

Host: You can’t see behind you, but you can hear behind you.

Nikunj Raghuvanshi: Absolutely, you can hear behind yourself and you can hear around stuff, around corners. You can hear stuff you don’t see, and that’s important for anticipating stuff.

Host: Right.

Nikunj Raghuvanshi: People coming towards you and things like that.

Host: So, there’s all kinds of people here that are working on 3D sound and head-related transfer functions and all that.

Nikunj Raghuvanshi: Yeah, Ivan’s group.

Host: Yeah! How is your work interacting with that?

Nikunj Raghuvanshi: So, that work is about, if I tell you the spatial sound field around your head, how does it translate into a personal experience in your two ears? So, the HRTF modeling is about that aspect. My work with John Snyder is about, how does the sound propagate in the world, right?

Host: Interesting.

Nikunj Raghuvanshi: So, if there is a sound down a hallway, what happens during the time it gets from there up to your head? That’s our work.

Host: I want you to give us a snapshot of the current state-of-the-art in computational acoustics and there’s apparently two main approaches in the field. What are they, and what’s the case for each and where do you land in this spectrum?

Nikunj Raghuvanshi: So, there’s a lot of work in room acoustics where people are thinking about, okay, what makes a concert hall sound great? Can you simulate a concert hall before you build it, so you know how it’s going to sound? And, based on the constraints in those areas, people have used a lot of ray tracing approaches which borrow from a lot of literature in graphics. And for graphics, ray tracing is the main technique, and it works really well, because the idea is you’re using a short wavelength approximation. So, light wavelengths are submicron and if they hit something, they get blocked. But the analogy I like to use is sound is very different, the wavelengths are much bigger. So, you can hold your thumb out in front of you and blot out the sun, but you are going to have a hard time blocking out the sound of thunder with a thumb held out in front of your ear because the waves will just wrap around. And, that’s what motivates our approach, which is to actually go back to the physical laws and say, instead of doing the short wavelength approximation for sound, we revisit and say, maybe sound needs the more fundamental wave equation to be solved, to actually model these diffraction effects for us. The usual thinking is that, you know, in games, you are thinking about how we want a certain set of perceptual cues. We want walls to occlude sound, we want a small room to reverberate less. We want a large hall to reverberate more. And the thought is, why are we solving this expensive partial differential equation again? Can’t we just find some shortcut to jump straight to the answer instead of going through this long-winded route of physics? And our answer has been that you really have to do all the hard work because there’s a ton of information that’s folded in and what seems easy to us as humans isn’t quite so easy for a computer, and there’s no neat trick to get you straight to the perceptual answer you care about.

(music plays)

Host: Much of the work in audio and acoustic research is focused on indoor sound, where the sound source is within the line of sight and the audience, the listener, can see what they’re listening to…

Nikunj Raghuvanshi: Um-hum.

Host: …and you mentioned that the concert hall has a rich literature in this field. So, what’s the gap in the literature when we move from the concert hall to the computer, specifically in virtual environments?

Nikunj Raghuvanshi: Yeah, so games and virtual reality, the key demand they have is the scene is not one room, and with time it has become much more difficult. So, a concert hall is terrible if you can’t see the people who are playing the sound, right? So, it allows for a certain set of assumptions that work extremely nicely. The direct sound, which is the initial sound, which is perceptually very critical, just goes in a straight line from source to listener. You know the distance so you can just use a simple formula and you know exactly how loud the initial sound is at the person. But in a game scene, you will have multiple rooms, you’ll have caves, you’ll have courtyards, you’ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it’s making its way around geometry. And the question becomes, how do you compute even the direct sound? Even the initial sound’s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly. So, that’s the challenge.

Host: All right. So, let’s talk about how you’re addressing it. A recent paper that you’ve published made some waves, sound waves probably. No pun intended… It’s called Parametric Directional Coding for Pre-computed Sound Propagation. Another mouthful. But it’s a great paper and the technology is so cool. Talk about this research that you’re doing.

Nikunj Raghuvanshi: Yeah. So, our main idea is, actually, to look at the literature in lighting again and see the kind of path they followed to deal with this computational challenge of how you do these extensive simulations and still hit that stringent CPU budget in real time. And one of the key ideas is you precompute. You cheat. You just look at the scene and compute everything you need to compute beforehand, right? Instead of trying to do it on the fly during the game. So, it does introduce the limitation that the scene has to be static. But then you can do these very nice physical computations and you can ensure that the whole thing is robust, it is accurate, it doesn’t suffer from all the sort of corner cases that approximations tend to suffer from, and you have your result. You basically have a giant look-up table. If somebody tells you that the source is over there and the listener is over here, tell me what the loudness of the sound would be. We just say okay, we have this giant table, we’ll just go look it up for you. And that is the main way we bring the CPU usage under control. But it generates a knock-on challenge that now we have this huge table, there’s this huge amount of data that we’ve stored and it’s 6-dimensional. The source can move in 3 dimensions and the listener can move in 3 dimensions. So, we have this giant table which is terabytes or even more of data.

Host: Yeah.

Nikunj Raghuvanshi: And the game’s typical budget is like 100 megabytes. So, the key challenge we’re facing is, how do we fit everything in that? How do we take this data and extract out something salient that people listen to and use that? So, you start with full computation, you start as close to nature as possible, and then we’re saying okay, now what would a person hear out of this? Right? Now, let’s do that activity of, instead of doing a shortcut, thinking about okay, a person hears the direction the sound comes from. If there is a doorway, the sound should come from the doorway. So, we pick out these perceptual parameters that are salient for human perception and then we store those. That’s the crucial way you bring this enormous data set down to a memory budget that’s feasible.
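As a rough illustration of the bake-then-look-up idea described here, the following Python sketch simulates offline, keeps only a couple of perceptual parameters per source/listener pair, and does a cheap nearest-neighbor lookup at runtime. The function names, probe grids, parameter set, and the simulate call are hypothetical stand-ins, not Project Triton’s actual code or API.

import numpy as np

def encode_perceptual(impulse_response, sample_rate=48000):
    # Reduce a simulated impulse response to a few salient parameters.
    energy = impulse_response ** 2
    loudness_db = 10 * np.log10(energy.sum() + 1e-12)       # overall loudness
    decay = np.cumsum(energy[::-1])[::-1]                    # Schroeder-style decay curve
    below_60db = np.argmax(decay < decay[0] * 1e-6)          # first sample 60 dB down
    reverb_time_s = (below_60db or len(energy)) / sample_rate
    return np.array([loudness_db, reverb_time_s], dtype=np.float32)

def bake(scene, listener_probes, source_grid, simulate):
    # Offline: store only the encoded parameters for every (listener, source) pair.
    # listener_probes and source_grid are (N, 3) arrays of positions.
    table = np.zeros((len(listener_probes), len(source_grid), 2), dtype=np.float32)
    for i, listener in enumerate(listener_probes):
        for j, source in enumerate(source_grid):
            impulse_response = simulate(scene, source, listener)  # expensive wave simulation
            table[i, j] = encode_perceptual(impulse_response)
    return table                                              # megabytes instead of terabytes

def query(table, listener_probes, source_grid, listener_pos, source_pos):
    # Runtime: nearest-neighbor lookup, cheap enough for dozens of moving sources per frame.
    i = int(np.argmin(np.linalg.norm(listener_probes - listener_pos, axis=1)))
    j = int(np.argmin(np.linalg.norm(source_grid - source_pos, axis=1)))
    loudness_db, reverb_time_s = table[i, j]
    return loudness_db, reverb_time_s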

Host: So, that’s the paper.

Nikunj Raghuvanshi: Um-hum.

Host: And how has it played out in practice, or in project, as it were?

Nikunj Raghuvanshi: So, a little bit of history on this is, we had a paper at SIGGRAPH 2010, John Snyder and I and some academic collaborators, and at that point, we were trying to think of just physical accuracy. So, we took the physical data and we were trying to stay as close to physical reality as possible and we were rendering that. And around 2012, we got to talking with Gears of War, the studio, and we were going through what the budgets would be, how things would be. And we were like, we need… this needs to… this is gigabytes, it needs to go to megabytes…

Host: Really?

Nikunj Raghuvanshi: …very quickly. And that’s when we were like, okay, let’s simplify. Like, what are the four, like, most basic things that you really want from an acoustic system? Like walls should occlude sound and things like that. So, we kind of rewound and came to it from this perceptual viewpoint that I was just describing. Let’s keep only what’s necessary. And that’s how we were able to ship this in 2016 in Gears of War 4, by just rewinding and going through this process.

Host: How is that playing in to, you know… Project Triton is the big project that we’re talking about. How would you describe what that’s about and where it’s going? Is it everything you’ve just described or is there… other aspects to it?

Nikunj Raghuvanshi: Yeah. Project Triton is this idea that you should precompute the wave physics, instead of starting with approximations. Approximate later. That’s one idea of Project Triton. And the second is, if you want to make it feasible for real games and real virtual reality and augmented reality, switch to perceptual parameters. Extract that out of this physical simulation and then you have something feasible. And the path we are on now, which brings me back to the recent paper you mentioned…

Host: Right.

Nikunj Raghuvanshi: …is, in Gears of War, we shipped some set of parameters. We were like, these make a big difference. But one thing we lacked was if the sound is, say, in a different room and you are separated by a doorway, you would hear the right loudness of the sound, but its direction would be wrong. Its direction would be straight through the wall, going from source to listener.

Host: Interesting.

Nikunj Raghuvanshi: And that’s an important spatial cue. It helps you orient yourself when sounds funnel through doorways.

Host: Right.

Nikunj Raghuvanshi: Right? And it’s a cue that sound designers really look for and try to hand-tune to get good ambiances going. So, in the recent 2018 paper, that’s what we fixed. We call this portaling. It’s a made-up word for this effect of sounds going around doorways, but that’s what we’re modeling now.

Host: Is this new stuff? I mean, people have tackled these problems for a long time.

Nikunj Raghuvanshi: Yeah.

Host: Are you people the first ones to come up with this, the portaling and…?

Nikunj Raghuvanshi: I mean, the basic ideas have been around. People know that, perceptually, this is important, and there are approaches to try to tackle this, but I’d say, because we’re using wave physics, this problem becomes much easier because you just have the waves diffract around the edge. With ray tracing you face the difficult problem that you have to trace out the rays “intelligently” somehow to hit an edge, which is like hitting a bullseye, right?

Host: Right.

Nikunj Raghuvanshi: So that the ray can wrap around the edge. So, it becomes really difficult. Most practical ray tracing systems don’t try to deal with this edge diffraction effect because of that. Although there are academic approaches to it, in practice it becomes difficult. But as I’ve worked on this over the years, I’ve kind of realized these are the real advantages of this. The disadvantages are pretty clear: it’s slow, right? So, you have to precompute. But we’re realizing, over time, that going to physics has these advantages.

Host: Well, but the precompute part is innovative in terms of a thought process on how you would accomplish the speed-up…

Nikunj Raghuvanshi: There have been papers on precomputed acoustics academically before, but this realization of mixing precomputation with extracting these perceptual parameters? That is a recipe that makes a lot of practical sense. Because a third thing that I haven’t mentioned yet is that, going to the perceptual domain, now the sound designer can make sense of the numbers coming out of this whole system. Because it’s loudness. It’s reverberation time, how long the sound is reverberating. And these are numbers that are super-intuitive to sound designers; they already deal with them. So, now what you are telling them is, hey, you used to start with a blank world, which had nothing, right? Like the world before the act of creation, there’s nothing. It’s just empty space and you are trying to make things reverberate this way or that. Now you don’t need to do that. Now physics will execute first, on the actual scene with the actual materials, and then you can say, I don’t like what physics did here or there, let me tweak it, let me modify what the real result is and make it meet the artistic goals I have for my game.

(music plays)

Host: We’ve talked about indoor audio modeling, but let’s talk about the outdoors for now and the computational challenges to making natural outdoor sounds, sound convincing.

Nikunj Raghuvanshi: Yeah.

Host: How have people hacked it in the past and how does your work in ambient sound propagation move us forward here?

Nikunj Raghuvanshi: Yeah, we’ve hacked it in the past! Okay. This is something we realized on Gears of War because the parameters we use were borrowed, again, from the concert hall literature and, because they’re parameters informed by concert halls, things sound like halls and rooms. Back in the days of Doom, this tech would have been great because it was all indoors and rooms, but in Gears of War, we have these open spaces and it doesn’t sound quite right. Outdoors sounds like a huge hall and you know, how do we do wind ambiances and rain that’s outdoors? And so, we came up with a solution for them at that time which we called “outdoorness.” It’s, again, an invented word.

Host: Outdoorness.

Nikunj Raghuvanshi: Outdoorness.

Host: I’m going to use that. I like it.

Nikunj Raghuvanshi: Because the idea it’s trying to convey is, it’s not a binary indoor/outdoor. When you are crossing a doorway or a threshold, you expect a smooth transition. You expect that, I’m not hearing rain inside, I’m feeling nice and dry and comfortable and now I’m walking into the rain…

Host: Yeah.

Nikunj Raghuvanshi: …and you want the smooth transition on it. So, we built a sort of custom tech to do that outdoor transition. But it got us thinking about, what’s the right way to do this? How do you produce the right sort of spatial impression of, there’s rain outside, it’s coming through a doorway, the doorway is to my left, and as you walk, it spreads all around you. You are standing in the middle of rain now and it’s all around you. So, we wanted to create that experience. So, the ambient sound propagation work was an intern project and now we’ve finished it up with our collaborators at Cornell. And that was about, how do you model extended sound sources? So, again, going back to concert halls, usually people have dealt with point-like sources which might have a directivity pattern. But rain is like a million little drops. If you try to model each and every drop, that’s not going to get you anywhere. So, that’s what the paper is about: how do you treat it as one aggregate that somebody gave us? We produce an aggregate sort of energy distribution of that thing along with its directional characteristics and just encode that.

Host: And just encode it.

Nikunj Raghuvanshi: And just encode it.
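One way to picture that aggregate encoding, as a rough sketch rather than the paper’s actual method: collapse many individual drop positions into a single directional energy histogram at the listener. The bin counts and the propagation_gain placeholder are illustrative assumptions.

import numpy as np

def direction_to_bin(direction, n_azimuth=8, n_elevation=4):
    # Map a unit arrival-direction vector to a coarse azimuth/elevation bin.
    azimuth = (np.arctan2(direction[1], direction[0]) + np.pi) / (2 * np.pi)
    elevation = (np.arcsin(np.clip(direction[2], -1.0, 1.0)) + np.pi / 2) / np.pi
    return (min(int(azimuth * n_azimuth), n_azimuth - 1),
            min(int(elevation * n_elevation), n_elevation - 1))

def aggregate_extended_source(drop_positions, listener_pos, propagation_gain):
    # Collapse many point emitters (e.g. raindrops) into one directional energy histogram.
    histogram = np.zeros((8, 4))
    for drop in drop_positions:
        offset = drop - listener_pos
        distance = np.linalg.norm(offset) + 1e-6
        energy = propagation_gain(drop, listener_pos) / distance ** 2  # gain times distance falloff
        histogram[direction_to_bin(offset / distance)] += energy
    return histogram / max(histogram.sum(), 1e-12)            # normalized directional distribution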

Host: How is it working?

Nikunj Raghuvanshi: It works nice. It sounds good. To my ears it sounds great.

Host: Well you know, and you’re the picky one, I would imagine.

Nikunj Raghuvanshi: Yeah. I’m the picky one and also when you are doing iterations for a paper, you also completely lose objectivity at some point. So, you’re always looking for others to get some feedback.

Host: Here, listen to this.

Nikunj Raghuvanshi: Well, reviewers give their feedback, so, yeah.

Host: Sure. Okay. Well, kind of riffing on that, there’s another project going on that I’d love for you to talk as much as you can about called Project Acoustics and kind of the future of where we’re going with this. Talk about that.

Nikunj Raghuvanshi: That’s really exciting. So, up to now, Project Triton was an internal tech which we managed to propagate from research into actual Microsoft product, internally.

Host: Um-hum.

Nikunj Raghuvanshi: Project Acoustics is being led by Noel Cross’s team in Azure Cognition. And what they’re doing is turning it into a product that’s externally usable. So, trying to democratize this technology so it can be used by any game audio team anywhere backed by Azure compute to do the precomputation.

Host: Which is key, the Azure compute.

Nikunj Raghuvanshi: Yeah, because you know, it took us a long time, with Gears of War to figure out, okay, where is all this precompute going to happen?

Host: Right.

Nikunj Raghuvanshi: We had to figure out the whole cluster story for ourselves, how to get the machines, how to procure them, and there’s a big headache in arranging compute for yourself. And so that’s, logistically, a key problem that people face when they try to think of precomputed acoustics. On the run-time side of Project Acoustics, we are going to have plug-ins for all the standard game audio engines and everything. So, that makes things simpler on that side. But a key blocker in my view was always this question of, where are you going to precompute? So, now the answer is simple. You get your Azure Batch account and you just send your stuff up there and it just computes.

Host: Send it to the cloud and the cloud will rain it back down on you.

Nikunj Raghuvanshi: Yes. It will send down data.

Host: Who is your audience for Project Acoustics?

Nikunj Raghuvanshi: For Project Acoustics, the audience is the whole game audio industry. And our real hope is that we’ll see some uptake on it when we announce it at GDC in March. We want as many teams as possible, small, big, medium, everybody, to start using this, and to give us feedback, because we feel there’s a positive feedback loop that can be set up: designers realize that they have these new tools available, tools that have shipped in Triple A games, so they do work. If they use these tools, we hope that they can produce new audio experiences that are distinctly different, so that then they can say to their tech director, or somebody, for the next game, we need more CPU budget, because we’ve shown you value. So, a big exercise was how to fit this within current budgets so people can produce these examples of novel possible experiences and argue for more. So, to increase the budget for audio and kind of bring it on par with graphics over time, as you alluded to earlier.

Host: You know, if we get nothing across in this podcast, it’s like, people, pay attention to good audio. Give it its props. Because it needs it. Let’s talk briefly about some of the other applications for computational acoustics. Where else might it be awesome to have a layer of realism with audio computing?

Nikunj Raghuvanshi: One of the applications that I find very exciting is for audio rendering for people who are blind. I had the opportunity to actually show the demo of our latest system to Daniel Kish, who, if you don’t know, he’s the human echo-locator. And he uses clicks from his mouth to actually locate geometry around him and he’s always oriented. He’s an amazing person. So that was a collaboration, actually, we had with a team in the Garage. They released a game called Ear Hockey and it was a nice collaboration, like there was a good exchange of ideas over there. That’s nice because I feel that’s a whole different application where it can have a potential social positive impact. The other one that’s very interesting to me is that we lived in 2-D desktop screens for a while and now computing is moving into the physical world. That’s the sort of exciting thing about mixed reality, is moving compute out into this world. And then the acoustics of the real world being folded into the sounds of virtual objects becomes extremely important. If something virtual is right behind the wall from you, you don’t want to listen to it with full loudness. That would completely break the realism of something being situated in the real world. So, from that viewpoint, good light transport and good sound propagation are both required things for the future compute platform in the physical world. So that’s a very exciting future direction to me.

(music plays)

Host: It’s about this time in the podcast I ask all my guests the infamous “what keeps you up at night?” question. And when you and I talked before, we went down kind of two tracks here, and I felt like we could do a whole podcast on it, but sadly we can’t… But let’s talk about what keeps you up at night. Ironically to tee it up here, it deals with both getting people to use your technology…

Nikunj Raghuvanshi: Um-hum.

Host: And keeping people from using your technology.

Nikunj Raghuvanshi: No! I want everybody to use the technology. But I’d say, like five years ago, what used to keep me up at night was, how are we going to ship this thing in Gears of War? Now what’s keeping me up at night is, how do we make Project Acoustics succeed and how do we, you know, expand the adoption of it and, in a small way, try to improve, move the game audio industry forward a bit and help artists do the artistic expression they need to do in games? So, that’s what I’m thinking right now, how can we move things forward in that direction? I frankly look at video games as an art form. And I’ve gamed a lot in my time. To be honest, not all of it was art; I was enjoying myself a lot and I wasted some time playing games. But we all have our ways to unwind and waste time. But good games can be amazing. They can be much better than a Hollywood movie in terms of what you leave them with. And I just want to contribute in my small way to that. Giving artists the tools to maybe make the next great story, you know.

Host: All right. So, let’s do talk a little bit, though, about this idea of you make a really good game…

Nikunj Raghuvanshi: Um-hum.

Host: Suddenly, you’ve got a lot of people spending a lot of time. I won’t say wasting. But we have to address the nature of gaming, and the fact that there are you know… you’re upstream of it. You are an artist, you are a technologist, you are a scientist…

Nikunj Raghuvanshi: Um-hum.

Host: And it’s like I just want to make this cool stuff.

Nikunj Raghuvanshi: Yeah.

Host: Downstream, it’s people want people to use it a lot. So, how do you think about that and the responsibilities of a researcher in this arena?

Nikunj Raghuvanshi: Yeah. You know, this reminds me of Kurt Vonnegut’s book, Cat’s Cradle? There’s a scientist who makes ice-nine and it freezes the whole planet or something. So, you see things about video games in the news and stuff. But I frankly feel that the kind of games I’ve participated in making, these games are very social experiences. People meet in these games a lot. Like Sea of Thieves is all about, you get a bunch of friends together, you’re sitting on the couch together, and you’re just going crazy on these pirate ships and trying to just have fun. So, they are not the sort of games where a person is being separated from society by the act of gaming, just immersed in the screen and not participating in the world. They are kind of the opposite. So, games have all these aspects. And so, I personally feel pretty good about the games I’ve contributed to. I can at least say that.

Host: So, I like to hear personal stories of the researchers that come on the podcast. So, tell us a little bit about yourself. When did you know you wanted to do science for a living and how did you go about making that happen?

Nikunj Raghuvanshi: Science for a living? I was the guy in 6th grade who’d get up and say I want to be a scientist. So, that was then, but what got me really hooked was graphics, initially. Like I told you, I found the book which had these color plates and I was like, wow, that’s awesome! So, I was at UNC Chapel Hill, graphics group, and I studied graphics for my graduate studies. And then, in my second or third year, my advisor, Ming Lin, she does a lot of research in physical simulations. How do we make water look nice in physical simulations? Lots of it is CGI. How do you model that? How do you model cloth? How do you model hair? So, there’s all this physics for that. And so, I took a course with her and I was like, you know what? I want to do audio because you get a different sense, right? It’s simulation, not for visuals, but you get to hear stuff. I’m like okay, this is cool. This is different. So, I did a project with her and I published a paper on sound synthesis. So, like how rigid bodies, like objects rolling and bouncing around and sliding make sound, just from physical equations. And I found a cool technique and I was like okay, let me do acoustics with this. It’s going to be fun. And I’m going to publish another paper in a year. And here I am, still trying to crack that problem of how to do acoustics in spaces!

Host: Yeah, but what a place to be. And speaking of that, you have a really interesting story about how you ended up at Microsoft Research and brought your entire PhD code base with you.

Nikunj Raghuvanshi: Yeah. It was an interesting time. So, when I was graduating, MSR was my number one choice because I was always thinking of this technology as, it would be great if games used this one day. This is the sort of thing that would have a good application in games. And then, around that time, I got hired at MSR. It was a multicore incubation back then, and my group was looking at how these multicore systems enable all sorts of cool new things. And one of the things my hiring manager was looking at was how we could do physically based sound synthesis and propagation. So, that’s what my PhD was, so they licensed the whole code base and I built on that.

Host: You don’t see that very often.

Nikunj Raghuvanshi: Yeah, it was nice.

Host: That’s awesome. Well, Nikunj, as we close, I always like to ask guests to give some words of wisdom or advice or encouragement, however it looks to you. What would you say to the next generation of researchers who might want to make sound sound better?

Nikunj Raghuvanshi: Yeah, it’s an exciting area. It’s super-exciting right now. Because even like just to start from more technical stuff, there are so many problems to solve with acoustic propagation. I’d say we’ve taken just the first step of feasibility, maybe a second one with Project Acoustics, but we’re right at the beginning of this. And we’re thinking there are so many missing things, like outdoors is one thing that we’ve kind of fixed up a bit, but we’re going towards what sorts of effects can you model in the future? Like directional sources is one we’re looking at, but there are so many problems. I kind of think of it as the 1980s of graphics when people first figured out that you can make this work. You can make light propagation work. What are the things that you need to do to make it ever closer to reality? And we’re still at it. So, I think we’re at that phase with acoustics. We’ve just figured out this is one way that you can actually ship in practical applications and we know there are deficiencies in its realism in many, many places. So, I think of it as a very rich area that students can jump in and start contributing.

Host: Nowhere to go but up.

Nikunj Raghuvanshi: Yes. Absolutely!

Host: Nikunj Raghuvanshi, thank you for coming in and talking with us today.

Nikunj Raghuvanshi: Thanks for having me.

(music plays)

To learn more about Dr. Nikunj Raghuvanshi and the science of sound simulation, visit Microsoft.com/research

post

Email overload: Using machine learning to manage messages, commitments

As email continues to be not only an important means of communication but also an official record of information and a tool for managing tasks, schedules, and collaborations, making sense of everything moving in and out of our inboxes will only get more difficult. The good news is there’s a method to the madness of staying on top of your email, and Microsoft researchers are drawing on this behavior to create tools to support users. Two teams working in the space will be presenting papers at this year’s ACM International Conference on Web Search and Data Mining February 11–15 in Melbourne, Australia.

“Identifying the emails you need to pay attention to is a challenging task,” says Partner Researcher and Research Manager Ryen White of Microsoft Research, who manages a team of about a dozen scientists and engineers and typically receives 100 to 200 emails a day. “Right now, we end up doing a lot of that on our own.”

According to the McKinsey Global Institute, professionals spend 28 percent of their time on email, so thoughtful support tools have the potential to make a tangible difference.

“We’re trying to bring in machine learning to make sense of a huge amount of data to make you more productive and efficient in your work,” says Senior Researcher and Research Manager Ahmed Hassan Awadallah. “Efficiency could come from a better ability to handle email, getting back to people faster, not missing things you would have missed otherwise. If we’re able to save some of that time so you could use it for your actual work function, that would be great.”

Email deferral: Deciding now or later

Awadallah has been studying the relationship between individuals and their email for years, exploring how machine learning can better support users in their email responses and help make information in inboxes more accessible. During these studies, he and fellow researchers began noticing varying behavior among users. Some tackled email-related tasks immediately, while others returned to messages multiple times before acting. The observations led them to wonder: How do users manage their messages, and how can we help them make the process more efficient?

“There’s this term called ‘email overload,’ where you have a lot of information flowing into your inbox and you are struggling to keep up with all the incoming messages,” explains Awadallah, “and different people come up with different strategies to cope.”

In “Characterizing and Predicting Email Deferral Behavior,” Awadallah and his coauthors reveal the inner workings of one such common strategy: email deferral, which they define as seeing an email but waiting until a later time to address it.

The team’s goal was twofold: to gain a deep understanding of deferral behavior and to build a predictive model that could help users in their deferral decisions and follow-up responses. The team—a collaboration between Microsoft Research’s Awadallah, Susan Dumais, and Bahareh Sarrafzadeh, lead author on the paper and an intern at the time, and Christopher Lin, Chia-Jung Lee, and Milad Shokouhi of the Microsoft Search, Assistant and Intelligence group—dedicated a significant amount of resources to the former.

“AI and machine learning should be inspired by the behavior people are doing right now,” says Awadallah.

The probability of deferring an email based on the workload of the user as measured by the number of unhandled emails. The number of unhandled emails is one of many features Awadallah and his coauthors used in training their deferral prediction model.

The team interviewed 15 subjects and analyzed the email logs of 40,000 anonymous users, finding that people defer for several reasons: They need more time and resources to respond than they have in that moment, or they’re juggling more immediate tasks. They also factor in who the sender is and how many others have been copied. Some of the more interesting reasons revolved around perception and boundaries: delaying, or not delaying, in order to set expectations about how quickly they respond to messages.

The researchers used this information to create a dataset of features—such as the message length, the number of unanswered emails in an inbox, and whether a message was human- or machine-generated—to train a model to predict whether a message is deferred. The model has the potential to significantly improve the email experience, says Awadallah. For example, email clients could use such a model to remind users about emails they’ve deferred or even forgotten about, saving them the effort they would have spent searching for those emails and reducing the likelihood of missing important ones.
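As a hedged illustration of the kind of feature-based classifier described above (the feature names and the choice of gradient-boosted trees are assumptions for the sketch, not the authors’ exact setup):

from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature columns, loosely following the features named in the article.
FEATURES = [
    "message_length",         # longer messages may take more time to handle
    "unhandled_inbox_count",  # current workload of the user
    "is_machine_generated",   # newsletter/notification vs. human-written mail
    "num_recipients",         # how many others were copied
]

def train_deferral_model(X_train, y_train):
    # X_train: rows of the features above; y_train: 1 if the email was deferred.
    model = GradientBoostingClassifier()
    model.fit(X_train, y_train)
    return model

def should_offer_reminder(model, email_features, threshold=0.5):
    # If the model thinks this email will be deferred, the client could surface a reminder later.
    return model.predict_proba([email_features])[0, 1] >= threshold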

“If you have decided to leave an email for later, in many cases, you either just rely on memory or more primitive controls that your mail client provides like flagging your message or marking the message unread, and while these are useful strategies, we found that they do not provide enough support for users,” says Awadallah.

Commitment detection: A promise is a promise

Among the deluge of incoming emails are outgoing messages containing promises we make—promises to provide information, set up meetings, or follow up with coworkers—and losing track of them has ramifications.

“Meeting your commitments is incredibly important in collaborative settings and helps build your reputation and establish trust,” says Ryen White.

Current commitment detection tools, such as those available in Cortana, are pretty effective, but there’s room for further advancement. White, lead author Hosein Azarbonyad, who was interning with Microsoft at the time of the work, and coauthor Microsoft Research Principal Applied Scientist Robert Sim seek to tackle one particular obstacle in their paper “Domain Adaptation for Commitment Detection in Email”: bias in the datasets available to train commitment detection models.

Researcher access is generally limited to public corpora, which tend to be specific to the industry they’re from. In this case, the team used public datasets of email from the energy company Enron and an unspecified tech startup referred to as “Avocado.” They found a significant disparity between models trained and evaluated on the same collection of emails and models trained on one collection and applied to another; the latter model failed to perform as well.

“We want to learn transferable models,” explains White. “That’s the goal—to learn algorithms that can be applied to problems, scenarios, and corpora that are related but different to those used during training.”

To accomplish this, the group turned to transfer learning, which has been effective in other scenarios where datasets aren’t representative of the environments in which they’ll ultimately be deployed. In their paper, the researchers train their models to remove bias by identifying and devaluing certain information using three approaches: feature-level adaptation, sample-level adaptation, and an adversarial deep learning approach that uses an autoencoder.

Emails contain a great variety of words and phrases, some more likely to be related to a commitment—“I will,” “I shall,” “let you know”—than others. In the Enron corpus, domain-specific words like “Enron,” “gas,” and “energy” may be overweighted in any model trained from it. Feature-level adaptation attempts to replace or transform these domain-specific terms, or features, with similar domain-specific features in the target domain, explains Sim. For instance, “Enron” might be replaced with “Avocado,” and “energy forecast” might be replaced with a relevant tech industry term. The sample level, meanwhile, aims to elevate emails in the training dataset that resemble emails in the target domain, downgrading those that aren’t very similar. So if an Enron email is “Avocado-like,” the researchers give it more weight during training.
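A rough sketch of that sample-level idea, weighting each source-domain email by how “Avocado-like” it is; TF-IDF cosine similarity is an assumed stand-in for whatever resemblance measure the authors actually used:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sample_weights(source_emails, target_emails):
    # Weight each source (e.g. Enron) email by its similarity to the target (e.g. Avocado) corpus.
    vectorizer = TfidfVectorizer(max_features=20000)
    vectorizer.fit(source_emails + target_emails)
    source_vectors = vectorizer.transform(source_emails)
    target_centroid = np.asarray(vectorizer.transform(target_emails).mean(axis=0))
    similarity = cosine_similarity(source_vectors, target_centroid).ravel()
    return similarity / similarity.sum()   # pass as sample_weight when training the commitment detector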

General schema of the proposed neural autoencoder model used for commitment detection.

The most novel—and successful—of the three techniques is the adversarial deep learning approach, which in addition to training the model to recognize commitments also trains the model to perform poorly at distinguishing between the emails it’s being trained on and the emails it will evaluate; this is the adversarial aspect. Essentially, the network receives negative feedback when it indicates an email source, training it to be bad at recognizing which domain a particular email comes from. This has the effect of minimizing or removing domain-specific features from the model.

“There’s something counterintuitive to trying to train the network to be really bad at a classification problem, but it’s actually the nudge that helps steer the network to do the right thing for our main classification task, which is, is this a commitment or not,” says Sim.
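The paper’s model is built around an autoencoder; the sketch below shows only the adversarial “domain confusion” piece of the idea, using a standard gradient-reversal layer in PyTorch. Layer sizes and the pre-computed email embedding input are assumptions for illustration.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass, flipped gradient on the backward pass:
    # the "negative feedback" that makes the encoder bad at telling domains apart.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class DomainAdversarialCommitmentModel(nn.Module):
    def __init__(self, input_dim=300, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.commitment_head = nn.Linear(hidden_dim, 2)  # commitment vs. not
        self.domain_head = nn.Linear(hidden_dim, 2)      # e.g. Enron vs. Avocado

    def forward(self, email_embedding):
        features = self.encoder(email_embedding)
        commitment_logits = self.commitment_head(features)
        # Reversed gradients push the shared encoder toward domain-invariant features.
        domain_logits = self.domain_head(GradReverse.apply(features))
        return commitment_logits, domain_logits

Training would minimize an ordinary cross-entropy loss on both heads; because the domain head’s gradient is reversed before it reaches the encoder, improving the domain classifier actively scrubs domain-specific signal out of the shared features.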

Empowering users to do more

The two papers are aligned with the greater Microsoft goal of empowering individuals to do more, tapping into an ability to be more productive in a space full of opportunity for increased efficiency.

Reflecting on his own email usage, which finds him interacting with his email frequently throughout the day, White questions the cost-benefit of some of the behavior.

“If you think about it rationally, it’s like, ‘Wow, this is a thing that occupies a lot of our time and attention. Do we really get the return on that investment?’” he says.

He and other Microsoft researchers are confident they can help users feel better about the answer with the continued exploration of the tools needed to support them.

post

Podcast: Putting the ‘human’ in human computer interaction with Haiyan Zhang

Haiyan Zhang, Innovation Director

Episode 62, February 6, 2019

Haiyan Zhang is a designer, technologist and maker of things (really cool technical things) who currently holds the unusual title of Innovation Director at the Microsoft Research lab in Cambridge, England. There, she applies her unusual skillset to a wide range of unusual solutions to real-life problems, many of which draw on novel applications of gaming technology in serious areas like healthcare.

On today’s podcast, Haiyan talks about her unique “brain hack” approach to the human-centered design process, and discusses a wide range of projects, from the connected play experience of Zanzibar, to Fizzyo, which turns laborious breathing exercises for children with cystic fibrosis into a video game, to Project Emma, an application of haptic vibration technology that, somewhat curiously, offsets the effects of tremors caused by Parkinson’s disease.

Episode Transcript

Haiyan Zhang: We started out going very broad, and looking at lots of different solutions out there, not necessarily just for tremor, but across the spectrum to address different symptoms of Parkinson’s disease. And this is actually really part of this whole design thinking methodology which is to look at analogous experiences. So, taking your core problem and then looking at adjacent spaces where there might be solutions in a completely different area that can inform upon the challenge that you are tackling.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Haiyan Zhang is a designer, technologist and maker of things (really cool technical things) who currently holds the unusual title of Innovation Director at the Microsoft Research lab in Cambridge, England. There, she applies her unusual skillset to a wide range of unusual solutions to real-life problems, many of which draw on novel applications of gaming technology in serious areas like healthcare.

On today’s podcast, Haiyan talks about her unique “brain hack” approach to the human-centered design process, and discusses a wide range of projects, from the connected play experience of Zanzibar, to Fizzyo, which turns laborious breathing exercises for children with cystic fibrosis into a video game, to Project Emma, an application of haptic vibration technology that, somewhat curiously, offsets the effects of tremors caused by Parkinson’s disease. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: Haiyan Zhang, welcome to the podcast.

Haiyan Zhang: Hi, thanks Gretchen. Great to be here.

Host: You are the Innovation Director at MSR Cambridge in England, which is a super interesting title. What is an Innovation Director? What does an Innovation Director do? What gets an Innovation Director up in the morning?

Haiyan Zhang: I guess it is quite an unusual title. It’s a kind of a bespoke role, I would say, because of my quite unusual background, I guess. Part of what I do is look at how technology can be applied in real use cases in the world to create business impact, within Microsoft and outside of Microsoft, and to make those connections between our deeply technical research with applied product groups across the company.

Host: So, is this a job that existed at MSR in Cambridge or did you arrive with this unique set of talents and skills and background and ability, and bring the job with you?

Haiyan Zhang: I would say it’s something I brought with me and it’s evolving over time. (laughs)

Host: Well, unpack that a little bit. How has it evolved since you began? When did you begin?

Haiyan Zhang: So, I actually joined Microsoft about five and a half years ago and I actually initially joined as part of the Xbox organization, running an innovation team in Xbox in London and looking at new play experiences for kids, for teens, that were completely outside of the box. And then from that, I transitioned into Microsoft Research. And part of my team also continued on that research in terms of creating completely new technology experiences around entertainment. And more recently, I’m working across the lab with various projects to see how we can connect our sort of fundamental computer science work better with products across Microsoft in terms of Azure cloud infrastructure, in terms of Xbox and gaming, in terms of Office and productivity.

Host: You’ve been in high-tech for nearly twenty years and you’ve worked in engineering and user experience and research… R&D, hardware, service design, etc., and even out in the “blue-sky envisioning space.” So, that brings a lot to the party in the form of one person. (laughter) Quite frankly, I’m impressed. How has your experience in each, or all, of these areas informed how you approached the research you do today?

Haiyan Zhang: Well thanks, Gretchen. I’m really… I’m quite honored to be on the podcast actually because I’m so impressed with all the researchers that you’ve been interviewing across all the MSR labs. So, I would say that, in the research work that I do, I bring a very human-centered lens to looking at technology. So I undertake a full, human-centered design process starting from talking to people, getting empathy with people, trying to extract insight from what people really need and then going deeply into the technical research to develop prototypes, technology ideas to support those needs, and then deploying those prototypes in the field to understand how that can be improved and how we can evolve our technology thinking.

Host: Let’s talk about design thinking, then, for a minute. I don’t know if you’d call it discrete from computational thinking or any other kind of thinking, but it seems to be a buzz phrase right now. So, as a self-described designer, technologist and maker of things, how would you define design thinking?

Haiyan Zhang: So, I would say that design thinking is not separate from computational thinking, it’s a layer above. It’s just an approach to problem-solving, and it’s basically a tool kit that allows you to utilize different methods to really gain an understanding of people’s needs, to gain an understanding of insight into how people’s lives can be improved through technology, and then tools around prototyping and evaluating those prototypes. So, I would say that it is not, in itself, a scientific method, but it can be used to improve and augment your existing practice.

Host: Let’s get specific now and talk about some of those projects that you’ve been working on, starting with Project Zanzibar. What was the inspiration behind this project? How did you bring it to life and how does it embody your idea of connected play experiences that you’ve talked about?

Haiyan Zhang: I think there is a rich history in computer science of tangible user interfaces. You know, some of the early work at Xerox PARC even or at the MIT Media Lab around how we can create these seamless interactions between people, between their physical environment and between a digital universe. And I think the approach we had to Zanzibar was that the most fruitful area for exploration in tangible user interfaces would be to enable kids to play and learn through physicality. Through interacting with physical objects that were augmented with virtual information, because we’re really trying to tap into this idea of multi-modal learning and learning through play. So, just coming from this initial approach, we dove very deeply into how we would invent a completely new technology platform to enable people to very seamlessly manipulate objects in a natural way using gestures, and then bring about some new digital experiences layered on top of that, that were games or education scenarios, and then sort of bringing those together in terms of really fundamental technology invention, but also applications that could demonstrate what that technology could do.

Host: Right. Well, and it’s too bad that this is an audio-only experience here on the podcast because there’s a really cool overview of this project on the Microsoft Research website and it’s a very visual, artifact-based approach to playing with computers.

Haiyan Zhang: Yeah, yeah. And I encourage everyone to visit the project page and take a look at some of the videos and our prototypes that we have published.

Host: Right. So, what was the thinking behind tying in the artifact and the digital?

Haiyan Zhang: You know, there’s this rich history of research with physical objects and we’ve proven out that physical/digital interaction is a great way forward in terms of novel interactions between people and computing. But the pragmatics of these systems have not been ideal. You know, if you have to be sat at your desk and there has to be an overhead camera (a lot of research projects involve this), or there’s occlusion in terms of where your hand can be and where the physical objects can be because the cameras won’t be able to track it. So, what we set out to do was think about, well, how would you design a technology platform that overcomes a lot of these barriers, so that we can then be freed up to think about those scenarios, but we can also empower other researchers who are doing research in this space to think about those scenarios. So, our research group, we had this idea of leveraging NFC, but leveraging it in terms of an NFC antenna array so that we could track objects in a 2-D space. And then the additional novelty was also layering that with a capacitive multi-touch layer so that we could track the objects, meaning the physical IDs of the objects, on top of this surface area. The capacitive multi-touch would enhance the tracking that the NFC provided, but also, we could track hand gestures, both in terms of multi-touch gestures on top of the surface and also some hover gestures just above the surface as well.

(music plays)

Host: Let’s talk a bit about another really cool project that you’re working on. I know Cambridge, your lab, is deeply, and maybe even uniquely, invested in researching and developing technologies to improve healthcare, and you have a couple projects going on in this area. One of them, Project Fizzyo. I’ll spell it. It’s F (as in Frank)-i-z-z-y-o. Tell us about this project. How did it come about? What’s the technology behind it and how does it work?

Haiyan Zhang: So, Fizzyo really started as a collaboration with the BBC and we were inspired by one family. The mom, Vicky, she has four kids and two of her boys have cystic fibrosis, they have a genetic condition where their internal organs are constantly secreting mucus. And so, every day, twice a day, the boys have to do this laborious breathing exercise to expel the mucus from their lungs, and it involves breathing into these plastic apparatuses. And they basically apply pressure to your breath so that when you breathe, it creates an oscillating effect in your lungs that agitates the mucus, and then it culminates in you coughing and trying to cough out the mucus from your lungs. They’re usually plastic devices, where as you blow, the air kind of enters a chamber and there might be some sort of mechanism that oscillates the air, like a ball-bearing that bounces up and down, and so they are very low-fi, so there’s no digital aspect to these devices. And you can imagine, these kids, they are having to do these exercises from a very early age, from as early as they can remember, twice a day for 30 minutes, for an hour at a time. It’s really intensive and it can be, you know, if not painful, at least really uncomfortable to do. And I actually tried to do this once and I felt really light-headed. I actually couldn’t do one session of it. And also, the kids, they want to be outside playing with their friends. You know, they don’t want to be stuck indoors doing this all the time. And there is no thread from doing the exercise to feeling an improvement, because the activity is about maintenance, so you are trying to maintain your health because if you don’t clear the mucus from your lungs, infection can set in and that means going to the hospital, that means getting antibiotics. And so, it’s a very challenging thing for Vicky, their mom, to be jostling them, be harassing them to do this all the time. And she said that her role has really changed with her kids and that she’s no longer a mom, she’s sort of nagging them all the time. And so, we visited with the family to really understand their plight. And she asked, you know, can we create a piece of technology that can help us in getting the kids to do this kind of physio, the treatment is a type of physio. And so, we actually came up with this idea together where she said, you know, the boys really love to play video games so, what if we could create a way for the boys to be playing a video game as they are undertaking this exercise. So, we started this process of prototyping and developing a digital attachment, a sensor, that attaches to all these various different physio devices. And as the patient is expelling, is breathing out, the sensor actually senses the breath and transmits that digital signal to a tablet and we can translate that signal into controls for a video game. And we’re also able to upload that to the cloud, to do further analysis on that medical treatment.
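An illustrative sketch (not the actual Fizzyo firmware or SDK) of how a stream of breath-pressure readings might be turned into simple game events; the thresholds and event names are made up:

def breath_to_game_events(pressure_samples, start_threshold=0.15, end_threshold=0.05):
    # Yield (event, value) tuples from a stream of normalized pressure readings.
    exhaling = False
    peak = 0.0
    for pressure in pressure_samples:
        if not exhaling and pressure >= start_threshold:
            exhaling, peak = True, pressure
            yield ("exhale_start", pressure)     # e.g. the character starts charging a jump
        elif exhaling and pressure <= end_threshold:
            exhaling = False
            yield ("exhale_end", peak)           # jump height scaled by breath strength
        elif exhaling:
            peak = max(peak, pressure)
            yield ("exhale_level", pressure)     # continuous control while blowing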

Host: Wow. How is it working?

Haiyan Zhang: We started this project about two and a half years ago. It’s been a long process, but a really fruitful and rewarding one. So, we started out with just some early prototypes, just using off-the-shelf electronics to get the breath sensor working just right. We added a single button, because we realized if you were just using the breath to play video games, it’s actually really challenging. And then, within the team, our industrial designer, Greg Saul, designed the physical attachment. We developed our own sensor board and we had it manufactured along with the product design. And we partnered with University College London, their physiotherapy department, and the Great Ormond Street Hospital in London where they’ve deployed over a hundred of these units with kids across the country to do a long-term trial. So actually, when we first met with the University College London physiotherapy department, I mean, this is a department that they’ve spent their entire careers working with kids in this domain. And they had never had any contact with the computer science department. This was not a digital research area. When they first met us, and they saw, on the computer screen, someone breathing out, and a graph showing that breath, the peak of that breath, one of the heads of the department that we were working with, she started to cry because she said that in her entire career, she had never seen physio data visualized in this way. It was just incredible for her.

Host: Wow.

Haiyan Zhang: And so, we decided to partner, and they’ve been amazing because, through this journey, they’ve gone to meet people in the computer science department, they initiated masters’ degrees incorporating data science and digital understanding. They just hired their first data scientist in order to leverage the platform that we’ve built to do further analysis to improve the health of these kids. And they said that even though this kind of exercise has been around for decades, no one has actually done a definitive, long-term study to track the efficacy of this kind of exercise to health, to outcomes. You know, because I think past studies have really relied on keeping paper diaries, answering questionnaires, but no one has done that digital study, which is what the power of Internet of Things can really bring you, which is tracking in the background in a very precise way.

Host: Talk about the role of machine learning. How do any of the new methodologies in computer science like machine learning methods and techniques play into this?

Haiyan Zhang: You know, what’s really interesting with machine learning is the availability of data. And, you know, we understand that what has driven this AI revolution is now the availability of large data sets to actually be able to develop new ML algorithms and models. And in many cases, especially in healthcare, there is a lack of data. So, I think throughout different areas of computer science research, there’s a real need to kind of connect the dots and actually develop IoT solutions that can start at the beginning and capture the data, because it’s only through cleverly capturing valid data that we can then do the machine learning in the back end once we’ve done the data collection. And so, I think the Fizzyo project is a really good proof point of that in that we started out with IoT in order to gather the information that tracks the health exercises. And we’ve just sort of deployed in the UK, so as we’ve collected this data, we’re now able to look at it and start to do some predictions around long-term health. So, you know, some of the questions that physiotherapy researchers are trying to answer: if kids are very adherent to this kind of exercise, if they are doing what they are being told, doing this twice a day for the duration that they are supposed to be doing it, does that mean, in six months’ time or a year’s time, their number of days in hospital is going to be reduced? Does it actually impact how much time they are spending being ill? If we see a trailing-off of this exercise, does that mean that we’ll see an increase in infection rates? So, with the data that we’re collecting, we’re now working with a different part of Microsoft, they’re called the Microsoft Commercial Software Engineering team, who are actively delving into projects around AI for good and they are going to be working with UCL to do some of this clustering and developing models around health prediction. So, clustering the patients into different cohorts to understand if there are predictive factors around how they are doing the exercises and how much time they are going to be spending in hospital in the years to come.
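A hypothetical sketch of that cohort-clustering idea, grouping patients by simple adherence features and comparing hospital days per cohort; the feature set and the choice of k-means are assumptions for illustration only.

import numpy as np
from sklearn.cluster import KMeans

def cluster_patients(adherence_features, hospital_days, n_cohorts=3):
    # adherence_features: one row per patient, e.g. [sessions_per_day, avg_session_minutes, breath_quality]
    # hospital_days: numpy array of observed days in hospital per patient
    cohorts = KMeans(n_clusters=n_cohorts, n_init=10).fit_predict(adherence_features)
    for cohort in range(n_cohorts):
        mask = cohorts == cohort
        print(f"cohort {cohort}: n={mask.sum()}, "
              f"mean hospital days={np.mean(hospital_days[mask]):.1f}")
    return cohorts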

Host: Well, it almost would be hard for me to get more excited about something than what you just described in Project Fizzyo, but there is another project to talk about which is Project Emma. This is so cool it’s even been featured on a documentary series there in the UK called The Big Life Fix. And it didn’t just start with a specific idea, but with a specific person. Tell us the story of Emma.

Haiyan Zhang: Yes! So, again, Project Emma started with a single person, with Emma Lawton, who, when she was 28 years old, she was diagnosed with early onset Parkinson’s disease. And, it had been five years since her diagnosis and some of her symptoms had progressed quite quickly and one of them was an active tremor. So, her tremor would get worse as she started to write or draw. And this really affected how she went about her day-to-day work because she was a creative director, a graphic designer, and day-to-day she would be in client meetings, talking with people and trying to sketch out what they meant in terms of the ideas that they had. And she would not be able to do that. And when I first met with her, she would sit with a colleague and her colleague would actually draw on her behalf. So, she really was looking for some kind of technology intervention to help her. And, we started out going very broad, and looking at lots of different solutions out there, not necessarily just for tremor, but across the spectrum to address different symptoms of Parkinson’s disease. And this is actually really part of this whole design thinking methodology which is to look at analogous experiences. So, taking your core problem and then looking at adjacent spaces where there might be solutions in a completely different area that can inform upon the challenge that you are tackling. So, we looked at lots of different solutions for other kinds of symptoms and of course, there was a lot of desk research. It was reading research papers that had been published over the decades that looked at tremors specifically. So, I think the two aspects that really influenced our thinking, one was around going to visit with a local charity called Parkinson’s UK and we were asking them to show us their catalogue of widgets and devices that they sold to Parkinson’s patients that helped them in their everyday lives. And on the table, there was a digital metronome. So, you know, when you’re playing the piano you see musicians, they have this ticking metronome. And I asked, you know, so why is there a metronome on the table? And the lady said, well, for some Parkinson’s patients, they have a symptom called freezing of gait and this is where, when you are walking along, your legs suddenly freeze, and you lose control of your legs. And so, sometimes people find that if they take out this metronome and they turn it on and it makes this rhythmic ticking sound, it somehow distracts their brain into being able to walk again, which is really kind of odd. There’s been a little bit of literature around this. In the literature it’s called cueing, it’s a cueing effect, but it doesn’t apply to tremor. But, for me, it sort of signaled an interesting brain hack, and signaled kind of underlying what might be going on in your brain when you have Parkinson’s disease. At the same time, there had been a number of papers around using vibration on the muscles to try to ameliorate tremor, to try to address it, to varying effect. And not specifically looking at Parkinson’s but looking at other kinds of tremor diseases like essential tremor, dystonia. And so, we developed a hypothesis and in order to test out the hypothesis, we developed a prototype which was a wearable device for the wrist that had a number of vibrating motors on it. So, it would apply vibration to the wrist in a rhythmic fashion in order to somehow circumvent the mechanism that was causing the tremor. And of course, we had a number of other hypotheses, too.
This was not the only hypothesis. We had other devices that worked in completely different ways, more about mechanically stopping the tremor, mechanically countering the tremor. And this vibration device actually worked really well. So, we were surprised, but very, very happy, and so this is the direction that we took in order to further develop this product.

Host: Right. So, drilling in, I do want to mention that there is a video on this, on the website as well. It’s a video that made me cry. I think it made you cry, and it made Emma cry. We’re all just puddles of tears, because it’s so fantastic. And so, this kind of circles back to research writ large, and experimenting with ideas that may not necessarily be super, what we would call high-tech, maybe they are kind of low-fi, you know, a vibration tool that can keep you from shaking. So, how did it play out? How did you prototype this? Give us a little overview of your process.

Haiyan Zhang: For us, it was a very simple prototyping exercise. We took some off-the-shelf coin cell motors and developed, basically, a haptic-type bracelet, and then we had an app that let you program the haptics on the bracelet. And that’s what we experimented with. So, we took research from the haptics area of computer science, which is usually about mechanisms for use in VR or for sensing something about the digital world, and applied it to this medical domain.
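To make that concrete, here is a minimal sketch of the kind of rhythmic pattern an app like that might push to a wrist-worn bracelet. The motor count, cycle timing and the set_motor call are illustrative assumptions, not the actual Project Emma app or firmware.

```python
import time

NUM_MOTORS = 4          # assumed number of vibration motors around the wrist
CYCLE_SECONDS = 0.5     # assumed rhythm: one motor pulse every half second

def set_motor(index: int, on: bool) -> None:
    """Hypothetical driver call; a real bracelet would expose something like this
    over Bluetooth or a serial link. Here we just print what would be sent."""
    print(f"motor {index} -> {'ON' if on else 'off'}")

def run_rhythmic_pattern(cycles: int) -> None:
    """Pulse each motor in turn so the vibration travels around the wrist,
    giving a steady, metronome-like rhythm rather than a constant buzz."""
    for step in range(cycles * NUM_MOTORS):
        motor = step % NUM_MOTORS
        set_motor(motor, True)
        time.sleep(CYCLE_SECONDS / 2)   # vibrate for half the cycle
        set_motor(motor, False)
        time.sleep(CYCLE_SECONDS / 2)   # rest for the other half

if __name__ == "__main__":
    run_rhythmic_pattern(cycles=4)
```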

(music plays)

Host: You have a diverse slate of projects going on at any given time and your teams are really diverse. So, I want you to talk, specifically, about the composition of skills and expertise that are required to bring some of these really fascinating research projects to life, and ultimately to market. Who is on your team and what do they bring to the party?

Haiyan Zhang: Well, I think there’s just something really unique about Microsoft Research, and Microsoft Research Cambridge in particular: we have such a broad portfolio of projects, but also expertise in the different computer science fields, that we can pull together these multidisciplinary teams to go after a single topic. So, within our lab we have social scientists doing user research, gaining real insight into how people behave, how people think about various technologies. We have designers that are exploring user interfaces, exploring products to bring these ideas to life. We have, you know, computer vision specialists. We have machine learning specialists. We have natural language processing people, systems researchers, and security researchers and, obviously, healthcare researchers. So, it’s that broad outlook that I think can really push technology innovation forward while really emphasizing the applications for people, for improving society as a whole.

Host: I ask all my guests some form of the question, is there anything that keeps you up at night? And I know that many people, mainly parents, are worried that their kids are too engaged with screens or not spending enough time in real life and so on. What would you say to them, and is there anything that keeps you up at night about sort of the broader swath of what you are working on?

Haiyan Zhang: You know, on the topic of screen time, obviously it’s something that we really wrestled with in the Zanzibar research specifically, which is thinking about how you could interact with physical objects instead of a digital screen, and also bringing that kind of bigger interaction surface between family and between friends so they could interact together. You know, at the same time, I would say that culture is constantly changing and how we live our lives is constantly changing. We’ve only seen the internet be really embedded in our lives in the last fifteen or twenty years. When we were younger, we had television and there were no computers, and so, I say culture is constantly evolving. How we’re growing, how we’re living is constantly evolving. It’s important for parents to evaluate this changing landscape of technology and to figure out what is the best thing to do with their kids. And maybe you don’t have to rely on how you grew up, but rather evaluate whether our kids are getting the right kind of social interaction, getting the right amount of parental support and quality time with their family. I think that’s what is important, but also to accept that how we’re growing is changing.

Host: What about the idea of the internet of things and privacy when we’re talking about toys and kids?

Haiyan Zhang: Mmm, yeah, it is something we really have to watch out for, and, you know, we’ve seen some bad examples of the toy industry jumping ahead too far and enabling toys to be connected 24/7 and conversing with kids, and what does that really mean? I’ve seen some really great research out of the MIT Media Lab where a researcher was really looking at how kids converse with different AI agents and their mental model of these AI agents. So, I think that’s a really great piece of research to look at, but also maybe to expand upon. As a research community, if we’re thinking about kids, we should understand that kids interacting with AI is going to become more commonplace, and rather than trying to avoid it, really tackle it head-on and see how we can improve the principles around designing AI, how we can inform companies in the market of what the ethical approach to doing this is, so that kids really understand what AI is as they are growing up with it.

Host: We’re coming up on an event at Microsoft Research called I Chose STEM and it’s all about encouraging women to… well, choose STEM! As an area of study or a career.

Haiyan Zhang: Yeah.

Host: So, tell us the story of how you chose it? What got you interested in a career in high-tech in general, and maybe even high-tech research specifically? Who were your influences?

Haiyan Zhang: I have, I guess, a slightly unique background in that I was born in China, and at the time it was a very kind of Communist education that I had when I was growing up. And my family moved to Australia when I was 8 years old. And I was always very technical and very nerdy. But I never thought about technology as a career. I actually wanted to study law when I was in high school. And computing was just something where, you know, it was kind of fun, but I never thought about it as a career. And I’d say in the last year of high school, I decided to switch and do computer science, and I realized that I was actually really good at computer science. I guess what led me to choose STEM is just the – I think the fun and creativity you can have with programming. You know, I would always come up with my own little creative exercises to write on the computer. It wasn’t the rote exercises, it was the ability to kind of be creative with this technical tool that really got me excited. At the same time, I love this huge effort within our industry to really focus on getting more women, more girls into technology, into STEM education, and we really want to increase representation, increase sort of equal representation. At the same time, I think I found it, at times, to be, you know, challenging to be the only woman in the room. You know, when I was in computer science, sometimes I’d be, you know, one of three women in the lecture theater or something. I think we need to adopt this kind of pioneer mindset so that we can go into these new areas, go into a room where you’re the only person, where you’re unique in that room, and you have something to contribute and don’t be afraid to speak up. I think that’s a really important mindset and skill for anybody to have.

Host: No interview would be complete if I didn’t ask my guest to predict the future. No pressure, Haiyan. Seriously though, you are living on the cutting edge of technology research which is what this podcast is all about. And so what advice or encouragement – you’ve just kind of given some – would you give to any of our listeners across the board who might be interested or inspired by what you are doing? Who is a good fit for the research you do?

Haiyan Zhang: My advice would be, especially in the research domain, to develop that deep research expertise, but to keep a holistic outlook. I think the research landscape is changing in that we are going to be working in more multidisciplinary teams, working across departments. You know, sometimes it’s the healthcare department, the physiotherapy department, with the computer science department. It’s through the connection of these disparate fields that I think we’re going to see dramatic impact from technology. And I think for researchers to have that holistic outlook, to visit other departments, to understand what are the challenges beyond their own group, I think is really, really important. And develop collaboration skills and techniques.

Host: Haiyan Zhang, it’s been a delight. Thanks for joining us today.

Haiyan Zhang: Thanks so much, Gretchen. It’s been a real pleasure, thank you.

post

Podcast with Dr. Rico Malvar, manager of Microsoft Research’s NExT Enable group

Rico Malvar, Chief Scientist and Distinguished Engineer

Episode 61, January 30, 2019

From his deep technical roots as a principal researcher and founder of the Communications, Collaboration and Signal Processing group at MSR, through his tenure as Managing Director of the lab in Redmond, to his current role as Distinguished Engineer, Chief Scientist for Microsoft Research and manager of the MSR NExT Enable group, Dr. Rico Malvar has seen – and pretty well done – it all.

Today, Dr. Malvar recalls his early years at a fledgling Microsoft Research, talks about the exciting work he oversees now, explains why designing with the user is as important as designing for the user, and tells us how a challenge from an ex-football player with ALS led to a prize-winning hackathon project and produced the core technology that allows you to type on a keyboard without your hands and drive a wheelchair with your eyes.


Episode Transcript

Rico Malvar: At some point, the leader of the team, Alex Kipman, came to us and says, oh, we want to do a new controller. What if you just spoke to the machine, made gestures and we could recognize everything? You say, that sounds like sci-fi. And then we said, no, wait a second, but to detect gestures, we need specialized computer vision. We’ve been doing computer vision for 15 years. To identify your voice, we need speech recognition. We’ve also been doing speech recognition for 15 years. Oh, but now there may be other sounds and multiple people… oh, but just a little over 10 years ago, we started these microphone arrays. They are acoustic antennas. And I said, wait a second, we actually have all the core elements, we could actually do this thing!

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: From his deep technical roots as a principal researcher and founder of the Communications, Collaboration and Signal Processing group at MSR, through his tenure as Managing Director of the lab in Redmond, to his current role as Distinguished Engineer, Chief Scientist for Microsoft Research and manager of the MSR NExT Enable group, Dr. Rico Malvar has seen – and pretty well done – it all.

Today, Dr. Malvar recalls his early years at a fledgling Microsoft Research, talks about the exciting work he oversees now, explains why designing with the user is as important as designing for the user, and tells us how a challenge from an ex-football player with ALS led to a prize-winning hackathon project and produced the core technology that allows you to type on a keyboard without your hands and drive a wheelchair with your eyes. That and much more on this episode of the Microsoft Research Podcast.

Host: Rico Malvar, welcome to the podcast.

Rico Malvar: It’s a pleasure to be with you, Gretchen.

Host: You’re a Distinguished Engineer and Chief Scientist at Microsoft Research. How would you define your current role? What gets you up in the morning?

Rico Malvar: Ha ha! Uh, yeah, by chief scientist, it means I tell everybody what to do, very simple. (laughing) Yeah… Not really, but Chief Scientist is basically a way for me to have my fingers and eyes, in particular, on everything going on at Microsoft Research. So, I have an opportunity to interact with, essentially, all the labs, many of the groups, and find opportunities to do collaborative projects. And that is really super-exciting. And it’s really hard to be on top of what everybody is doing. It’s quite the opposite of telling people what to do; it’s more like trying to follow up on what they are doing.

Host: It’s, um, on some level, herding cats?

Rico Malvar: It’s not even herding. It’s where are they??

Host: You got to find the cats.

Rico Malvar: Find the cats, yeah.

Host: Well, talk a little bit about your role as Distinguished Engineer. What does that entail, what does that mean?

Rico Malvar: That’s basically… there’s a whole set of us. We have Distinguished Engineers and Technical Fellows, who are at the top of our technical ladder. And the idea is a little bit of recognition of some of the contributions we’ve made in the technical area, but it’s mostly our responsibility to go after big technical problems and to think not just about the group you’re in, but about the company, what the company needs, and how the technology in that particular area should be evolving. My area, in particular, on the technical side, is signal processing, data compression, media compression. And these days, with audio and video all over the internet, that matters a lot. There are also a few other areas, but that’s the idea. The idea is, what are the big problems in technology, how can we drive new things, how can we watch out for new things coming up at the company level?

Host: You know, those two things that you mentioned, drive things and anticipate things, are two kind of different gears and two different, I won’t say skillsets, but maybe it’s having your brain in two places.

Rico Malvar: You are right. It’s not completely different skillsets but driving and following are both important and one helps the other. And it’s very important for us to do both.

Host: Let’s go back to your roots a little bit. When you started here at Microsoft Research, you were a principal researcher and the founder and manager of what was called the Communications, Collaboration and Signal Processing group at MSR. So, tell us a little bit about the work you used to do and give us a short “where are they now?” snapshot of that group.

Rico Malvar: Yeah, that name is funny. That name was a bad example of what happens when you get too democratic about choosing names: we got everybody in the team to give ideas, and then it got all complicated, and we ended up with a little bit of everything and came up with a boring name instead of a cool one. But it was a very descriptive name, which was good. It was just called Signal Processing when we started, and then it evolved to Communication, Collaboration and Signal Processing because of the new things we were doing. For example, we had a big project in the collaboration area, the prototype of a system which later evolved to become the RoundTable product. And that’s not just signal processing, it’s collaboration. Well, so we put in collaboration. But people use it to communicate, so it’s also communication. Okay, put it all in the name. So, it’s just like that. And on your question of where people are, a cool thing is that we had a combination of expertise in the team to be able to do things like RoundTable. So, we had computer vision experts, we had distributed systems experts, we had streaming media experts and we had audio experts. On that last area, for example, we later evolved a new group doing specifically audio signal processing, which is now led by Ivan Tashev, who was a member of my team and now has his own team. He already participated in your podcast, so it’s nice to see the interesting challenges in those areas continue. And we keep evolving, as you know. The groups are always changing, modifying, renewing.

Host: In fact, that leads into my next question. Microsoft Research, as an entity, has evolved quite a bit since it was formed in 1991. And you were Managing Director in the mid-2000s, from like 2007 to 2010?

Rico Malvar: ‘10. Of the lab here in Redmond, yeah.

Host: Yeah. So, tell us a little bit about the history of the organization in the time you’ve been here.

Rico Malvar: Yeah. It’s great. One thing I really like about Microsoft Research is, first, that it started early with the top leaders in the company always believing in the concept. So, Bill Gates started Microsoft Research, driven by Nathan Myhrvold, who was the CTO at the time, and it was a no-brainer for them to start Microsoft Research. They found Rick Rashid, who was our first leader of MSR. And I had the pleasure of reporting to Rick for many years. And the vision he put in place, which holds to this day, is: let’s really push the limits of technology. We don’t start by thinking how this is going to help Microsoft, we start by thinking how we push the technology, how it helps people. Later, we will figure out how it’s going to help Microsoft. And to this day, that’s how we operate. The difference, maybe, is that in the old days, the lab was more of a classical research lab. Almost everything was pivoted on research projects.

Host: Sure.

Rico Malvar: Which is great, and many, many of them generated good technology or even new products for the company. I was just talking about RoundTable as one example, and we have several. Of course, the vast majority fail, because research is a business of failure and we all know that! We submit ten papers for publication, two or three get accepted. That is totally fine, and we keep playing the game. And we do the papers as validation and also as a way to interact with the community. And both are extremely valuable to us, so we can have a better understanding of whether we are pushing the state of the art. And today, the new Microsoft Research puts even a little more emphasis on the impact side. We still want to push the state of the art, we still do innovative things, but we want to spend a little more effort on making those things real.

Host: Yeah.

Rico Malvar: On helping the company. And the company itself has evolved to a point where that has even higher value, from Satya, our CEO, on down. It is the mission of the company to empower people to do more. But empowering is not just developing the technology, it’s packaging it, shipping it in the right way, making products that actually leverage that. So, I would say the new MSR gets even more into, okay, what it takes to make this real.

Host: Well, let’s talk a little bit about Microsoft Research NExT. Give our listeners what I would call your elevator pitch of Microsoft NExT. What does it stand for, how does it fit in the portfolio of Microsoft Research? I kind of liken it to pick-up basketball, only with scientists and more money, but you do it more justice than I do!

Rico Malvar: That’s funny. Yeah, NExT is actually a great idea. As I said, we’re always evolving. And then, when Peter Lee came in, and Harry Shum became our new leader, they thought hard about diversifying the approaches in which we do research. So, we still have the Microsoft Research labs, the part that is a bit more traditional in the sense that the research is mostly pivoted by areas. We have a graphics team, a natural language processing group, human computer interaction, systems, and so forth. Many, many of them. When you go to NExT, the idea is different. One way to achieve potentially even more impact is to pivot some of those activities not by area, but by project, by impact goal. Oh, because of this technology and that technology, maybe we have an opportunity to do X, where X is this new project. Oh, but the first technology is computer vision, the other one is hardware architecture. So we’re going to need people from all those areas together on a project team. And Peter Lee has been driving that, always trying to find disruptive, high-impact things so that we can take on new challenges. And lots of things are coming up from this new model, which we call NExT, which is New Experiences in Technology.

Host: I actually didn’t know that, what the acronym stood for. I just thought it was, what’s NExT, right?

Rico Malvar: Of course, that is a cool acronym. Peter did a much better job than we did on the CCSP thing.

Host: I love it.

(music plays)

Host: Well, let’s talk about Enable, the group. There’s a fascinating story of how this all got started and it involves a former football player and what’s now called the yearly hackathon. Tell us the story.

Rico Malvar: That is exactly right. It all started with that famous football player, ex-football player, Steve Gleason, who is still a good partner of ours and still a consultant to my team… Steve is a totally impressive person. He got diagnosed with ALS, and ALS is a very difficult disease because you basically lose mobility. And at some point in life, your organs may lose their ability to function, so most people actually don’t survive ALS. But with some mitigations you can prolong life a little bit, and technology can help. Steve, actually, we quote him saying, “Until there is a cure for ALS, technology is the cure.” This is very inspiring. And he created a foundation, Team Gleason, that really does a wonderful job of securing resources and distributing resources to people with ALS. They really, really make a difference in the community. And he came to us almost five years ago, when we were toying with the idea of creating this hackathon, which is a company-wide effort to create hack projects. And at one of those, actually the first one we did, in 2014, Steve told us, “You know what guys, I want to be able to do more. In particular, I want to be able to argue with my wife and play with my son. So, I need to communicate, and I need to move. My eyes still work, this eye tracking thing might be the way to go. Do you want to do something with that?” The hackathon team really got inspired by the challenge and, within a very short period of time, they created an eye tracking system where you look at the computer, there’s a keyboard on the screen, and you can type the keys just by looking at them. And there is a play button so you can compose sentences and then speak them out with your eyes.

Host: That’s amazing.

Rico Malvar: And they also created an interface where they put buttons, similar to a joystick, on the screen. You look at those, and the wheelchair moves in the direction of what you are selecting. They did a nice overlay between the buttons and the video, so it’s almost like they mounted the computer on the wheelchair; you look through the computer, the camera shows what’s in front of you, and then the wheelchair goes. With lots of safety things like a stop button. And it was very successful, that project. In fact, it won the first prize.

Host: The hackathon prize?

Rico Malvar: On the hackathon prize. And then, a little bit later, Peter and I were thinking about where to go on new projects. And then Peter really suggested, Rico, what about that hackathon thing? That seems to be quite impactful, so maybe we want to develop that technology further. What do you think? I said, well if I had a team… (laughs) we could do that…

Host: (sings) If I only had a team…

Rico Malvar: (sings) If I only had a team… And then Peter said, ehh, how many people do you need? I don’t know, six, seven to start. I said, okay, let’s go do it. It was as easy as that.

Host: Well, let’s talk a little bit more about the hackathon. Like you said, it’s in about its fifth year. And, as I understand it, it’s kind of a ground-up approach. Satya replaced the annual “executive-inspirational-talk-top-down” kind of summer event with, hey, let’s get the whole company involved in invention. I would imagine it’s had a huge impact on the company at large. But how would you describe the role of the hackathon for people in Microsoft Research now? It seems like a lot of really interesting things have come out of that summer event.

Rico Malvar: You know, for us, it was a clear thing, because Microsoft Research was always bottom-up. I mean, we literally don’t tell researchers what to do. People, researchers, engineers, designers, managers, they all have great ideas, right? And they come up with those great ideas. When they click enough, they start developing something and we look from the top and say, that sounds good, keep going, right? So, we try to foster the most promising ones. But the idea of bottom-up was already there.

Host: Yeah.

Rico Malvar: When we look at the hackathon, we say, hey, thanks to Satya and the new leadership of Microsoft, the company’s embracing this concept of moving bottom-up. There’s The Garage. The Garage has been involved with many of those hackathons. Garage has been a driver and supporter of the hackathon. So, to us, it was like, hey, great, that’s how we work! And now we’re going to do more collaboration with the rest of the company.

Host: You have a fantastic and diverse group of researchers working with you, many of whom have been on the podcast already and been delightful. Who and what does it take to tackle big issues, huge ideas like hands-free keyboards and eye tracking and 3-D sound?

Rico Malvar: Right. One important concept, and it’s particularly important for Enable, is that we really need to pay attention to the user. Terms such as “user-centric” – yeah, they sound like a cliché – but especially in accessibility, this is super important. For example, in our Enable team, in the area working with eye tracking, our main intended users were people with ALS, since the motivation came from Steve Gleason. And then, in our team, Ann Paradiso, who is our user experience manager, created what we call the PALS program. PALS means Person with ALS. And we actually brought people with ALS, in their wheelchairs and everything, to our lab and discussed ideas with them. So, they were not just testers, they were brainstorming with us on the design and technologies…

Host: Collaborators.

Rico Malvar: Collaborators. They loved doing it. They really felt, wow, I’m in this condition but I can contribute to something meaningful and we will make it better for the next generation…

Host: Sure.

Rico Malvar: …of people with this. So, this concept of strong user understanding through user design and user research, particularly on accessibility, makes a big difference.

Host: Mmm hmm. Talk a little bit about the technical side of things. What kinds of technical lines of inquiry are you really focusing on right now? I think our listeners are really curious about what they’re studying and how that might translate over here if they wanted to…

Rico Malvar: That’s a great question. Many of the advancements today are associated with artificial intelligence, AI, because of all the applications of AI, including in our projects. AI is typically a bunch of algorithms and data manipulation, finding patterns in data and so forth. But AI, itself, doesn’t talk to the user. You still need the last mile: the interfaces, the new interfaces. Is the AI going to appear to the user as a voice? Or as something on the screen? How is the user going to interact with the AI? So, we need new interfaces. And then, with the evolution of technology, we can develop novel interfaces. Eye tracking being an example. If I tell you that you’re going to control your computer with your eyes, you’re going to say, what? What does that mean? If I tell you, you’re going to control the computer with your voice, you say, oh yeah, I’ve been doing that for a while. With the eye tracking, a person with a disability immediately gets it and says, a-ha! I know what it means, and I want to use that. For everybody else, suppose, for example, that you are on your lunch break and you want to browse the news on the internet, get up to date on a topic of interest. But you’re eating a sandwich. Your hands are busy, your mouth is busy, but your eyes are free. You could actually flip through pages, do a lot of things, just with your eyes, and you don’t need to worry about cleaning your hands and touching the computer because you don’t need to touch the computer. And you can think of the future, where you may not even need your eyes. We may read your thoughts directly. And, at some point, it’s just a matter of time. It’s not that far away. We are going to read your thoughts directly.

Host: That’s both exciting and scary. Ummmm…

Rico Malvar: Yes.

Host: What does it take to say, all right, we’re going to make a machine be able to look at your eyes and tell you back what you are doing?

Rico Malvar: Yeah, you see, it’s a specialized version of computer vision. It’s basically cameras that look at your eyes. In fact, the sensor works by first illuminating your eyes with bright IR lights, infrared, so it doesn’t bother you because you can’t see it. But now you have this bright image that the camera, which can see IR, is looking at, and then models, a little bit of AI and a little bit of graphics and computer vision and signal modeling, make an estimate of the position of your eyes and associate that with elements on the screen. So, it’s almost as if you have a cursor on the screen.
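For readers who want to picture that last step, here is a minimal sketch of mapping an estimated gaze point to an on-screen element, essentially a hit test for an invisible cursor. The smoothing window, element layout and function names are illustrative assumptions, not the actual eye tracker or Windows eye control code.

```python
from collections import deque
from typing import Optional

# Assumed screen elements (keys/buttons) as name -> (x, y, width, height) in pixels.
ELEMENTS = {
    "A": (100, 500, 80, 80),
    "B": (200, 500, 80, 80),
    "play": (900, 500, 120, 80),
}

class GazeMapper:
    """Smooths noisy gaze estimates and hit-tests them against screen elements."""

    def __init__(self, window: int = 5):
        self.samples = deque(maxlen=window)  # keep the last few gaze points

    def update(self, x: float, y: float) -> Optional[str]:
        """Add a raw gaze estimate (pixels) and return the element under the
        smoothed gaze point, or None if the user is looking at empty space."""
        self.samples.append((x, y))
        sx = sum(p[0] for p in self.samples) / len(self.samples)
        sy = sum(p[1] for p in self.samples) / len(self.samples)
        for name, (ex, ey, w, h) in ELEMENTS.items():
            if ex <= sx <= ex + w and ey <= sy <= ey + h:
                return name
        return None

mapper = GazeMapper()
print(mapper.update(130, 520))  # falls inside the "A" key's rectangle
```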

Host: Okay.

Rico Malvar: That is controlled with your eyes, very similar to a mouse, with the difference that the eye control works better if we don’t display the cursor. With the mouse, you actually should display the cursor…

Host: Ooohhh, interesting….

Rico Malvar: …with eye control, the cursor works better if it is invisible. But you see, the idea there is that you do need specialists, you need folks who understand that. And sometimes it’s a combination: some of that understanding is in the group, where we need to be top leaders in that technology, or we partner with companies that have a piece of the technology. For example, for the eye tracking, we put much more emphasis on designing the proper user interfaces and user experiences, because there are companies that do a good job producing eye tracking devices. So, we leverage the eye tracking devices that these companies produce.

Host: And behind that, you are building on machine learning technologies, on computer vision technologies and… um… so…

Rico Malvar: Correct. For example, a typical one is the keyboard driven by your eyes. You still want to have a predictive keyboard.

Host: Sure.

Rico Malvar: So, as you are typing the letters, it guesses. But how you present the guesses in the interface is very interesting, because when you are typically using a keyboard, your eyes are looking at the letters and your fingers are typing on the keys. When you’re doing an eye control keyboard, your eyes have to do everything. So, how you design the interface should be different.

Host: Yeah.

Rico Malvar: And we’ve learned and designed good ways to make that different.

Host: If I’m looking at the screen and I’m moving my eyes, how does it know when I’m done, you know, like that’s the letter I want? Do I just laser beam the…??

Rico Malvar: You said you would be asking deep technical questions and you are. That one, we use the concept that we call “dwelling.” As you look around the keyboard, remember that I told you we don’t display the cursor?

Host: Right.

Rico Malvar: So, as the position where you are looking, the focus of your eyes, lands on a particular letter, we highlight that letter. It can be a different color, it can be a lighter shade of grey…

Host: Gotcha.

Rico Malvar: So, as you move around, you see the letters moving around. If you want to type a particular letter, once you get to that letter, you stop moving for a little bit, let’s say half a second. That’s a dwell. You dwell on that letter a little bit and we measure the dwell. And there’s a little bit of AI to learn what is the proper dwell time based on the user.
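Building on the gaze mapping sketch above, here is a minimal illustration of the dwell idea: fire a selection once gaze has stayed on the same element for a threshold time, and nudge that threshold per user. The half-second default and the simple adaptation rule are assumptions standing in for the “little bit of AI” mentioned here, not the shipped algorithm.

```python
import time
from typing import Optional

class DwellSelector:
    """Turns a stream of (timestamp, element) gaze samples into selections."""

    def __init__(self, dwell_seconds: float = 0.5):
        self.dwell_seconds = dwell_seconds   # assumed starting dwell threshold
        self.current: Optional[str] = None   # element the gaze is resting on
        self.since: float = 0.0              # when the gaze arrived there

    def update(self, now: float, element: Optional[str]) -> Optional[str]:
        """Return the element to select once the dwell threshold is reached."""
        if element != self.current:
            self.current, self.since = element, now   # gaze moved: restart timer
            return None
        if element is not None and now - self.since >= self.dwell_seconds:
            self.since = now                 # avoid repeat-firing on the same dwell
            return element
        return None

    def adapt(self, selection_was_correct: bool) -> None:
        """Crude per-user tuning: speed up for accurate users, slow down after errors."""
        factor = 0.95 if selection_was_correct else 1.1
        self.dwell_seconds = min(1.5, max(0.2, self.dwell_seconds * factor))

# Example: feed gaze samples at roughly 30 Hz and act on selections.
selector = DwellSelector()
for element in ["A"] * 20:                  # the user keeps looking at the "A" key
    typed = selector.update(time.monotonic(), element)
    if typed:
        print("typed:", typed)
    time.sleep(0.03)
```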

(music plays)

Host: One thing I’m fascinated by, not just here, but in scientific ventures everywhere, is the research “success story.” The one that chronicles the path of a blue-sky research thing to instantiation in a product. And, I know, over and over, researchers have told me, research is generally a slow business, so it’s not like, oh, the overnight success story, but there are a lot of hard-won success stories, or stories that sort of blossomed over multiple years of serendipitous discovery. Do you have any stories you could share about things you’ve seen that started out as a harebrained idea and that millions of people are now using?

Rico Malvar: You know, there are so many examples. I particularly like the story of Kinect, which was actually not a product developed by Microsoft Research, but in close collaboration with Microsoft Research. It was the Kinect team, at the time, in Windows. Because at some point, the leader of the team, Alex Kipman, came to us and says, oh, we want to do a new controller. What if you just spoke to the machine, made gestures and we could recognize everything? You say, that sounds like sci-fi. So, naahhh, that doesn’t work. But then Alex was very insistent. And then we said, no, wait a second, but to detect gestures, we need specialized computer vision. We’ve been doing computer vision for 15 years. To identify your voice, we need speech recognition. We’ve also been doing speech recognition for 15 years. Oh, but now there may be other sounds and maybe multiple people… oh, but just a little over 10 years ago, we started these microphone arrays. They are acoustic antennas. They can tune to the sound of whoever is speaking, all of that.

Host: Directional.

Rico Malvar: The directional sound input. And I said, wait a second, we actually have all the core elements, we could actually do this thing. So, after the third or fourth meeting, I said, okay Alex, I think we can do that. And he said, great, you have two years to do it. What??? Yeah, because we need to ship on this particular date. And it all worked. I doubt there’s another institution or company that could have produced that, because we had been doing what was, apparently, “blue-sky” work for many years, but we created all those technologies, and when the need arose, I say, a-ha, we can put them all together.

Host: Where is Kinect today?

Rico Malvar: Kinect used to be a peripheral device for Xbox. We changed it into an IoT device. So, there’s a new Kinect kit that connects to Azure, so people can do Kinect-like things, not just for games but for everything. And all the technology that supports that is now in Azure.

Host: So, Rico, you have a reputation for being an optimist. You’ve actually said as much yourself.

Rico Malvar: (laughs) Yes, I am!

Host: Plus, you work with teams on projects that are actually making the lives of people with disabilities, and others, profoundly better. But I know some of the projects that you worked on fall somewhere in the bounds of medical interventions.

Rico Malvar: Mmm-hmm.

Host: So, is there anything about what you do that keeps you up at night, anything we should be concerned about?

Rico Malvar: Yeah, you know, when you are helping a person with a disability, sometimes what you are doing can be seen as, is that a treatment, is that a medical device? In most cases, they are not. But the answer to those questions can be complicated and there can be regulations. And of course, Microsoft is a super-responsible company, and if anything is regulated, of course, we are going to pay attention to the regulations. But some of those are complex. So, doing right by the regulations can take a significant amount of work, and my team has to spend time, sometimes in collaboration with our legal team, to make sure we do the right things. And I hope also that we will help evolve those regulations, potentially by working with the regulatory bodies, educating them on the evolution of the technology. Because in almost all areas of technology, not just this one, regulations tend to be behind. It’s hard to move, and understandably so. So, the fact that we have to spend significant effort dealing with that does keep me up at night a little bit. But we do our best.

Host: You know, there’s a bit of a Wild West mentality where you have to, like you say, educate. And so, in a sense what I hear you saying is that, as you take responsibility for what you are doing, you are helping to shape and inform the way the culture onboards these things.

Rico Malvar: Exactly right, yes. Exactly right.

Host: So, how would you sort of frame that for people out there? How do we, you, help move the culture into a space that better understands what’s going on and can onboard it responsibly?

Rico Malvar: That is a great question. And you see, for example, in areas such as AI, artificial intelligence, people are naturally afraid of how far AI can go. What are the kinds of things it could do?

Host: Yeah.

Rico Malvar: Can we regulate so that there will be some control in how it’s developed? And Microsoft has taken the stance that we have to be very serious about AI. We have to be ethical, we have to preserve privacy and all of those things. So, instead of waiting for regulation and regulatory aspects to develop, let’s help them. So, we were founders, not just me, but the company and especially the Microsoft Research AI team, of the Partnership on AI, together with other companies, to actually say, no, let’s be proactive about that.

(music plays)

Host: Tell us a bit about Rico Malvar. Let’s go further back than your time here at MSR and tell us how you got interested in technology, technology research. How did you end up here at Microsoft Research?

Rico Malvar: Okay, on the first question, how I got interested in technology? It took me a long time. I think I was 8 years old when my dad gave me an electronics kit, and I started playing with that thing and I said, a-ha! That’s what I want to do when I grow up. So, then I went through high school taking courses in electronics and then I went to college to become an electrical engineer, and I loved the academic environment, I loved doing research. So, I knew I wanted to do grad school. I got lucky enough to be accepted at MIT and when I arrived there, I was like, boy, this place is tough! And it was tough! But then when I finished and went back to my home country, I created the signal processing group at the school there, and I was lucky to get fair amounts of funding, so we did lots of cool things. And then, one day, some colleagues in a company here in the US called me back in Brazil and they say, hey, our director of research decided to do something else. Do you want to apply for the position? And then I told my wife, hey, there’s a job opening in the US, what about that? And she said, well, go talk to them. And I came, talked to them. They made me an offer. And then it took us about a whole month discussing, are we going to move our whole family to another country? Hey, we lived there before, it’s not so bad, because I studied here. And maybe it’s going to be good for the kids. Let’s go. If something doesn’t work, we move back. I say, okay. So, and… here we are. But that was not Microsoft. That was for another company at the time, a company called PictureTel, which was actually the leading company in professional video conferencing systems.

Host: Oh, okay.

Rico Malvar: So, we were pushing the state-of-the-art on how do you compress video and audio and these other things? And I was working happily there for about four years and then one day I see Microsoft and I say, wow, Microsoft Research is growing fast. Then one afternoon, I said, ah, okay, I think about it and I send an email to the CTO of Microsoft saying, you guys are great, you are developing all these groups. You don’t have yet a group on signal processing. And signal processing is important because one day we’re going to be watching video on your computers via the internet and all of that, so you should be investing more on that. And I see you already have Windows Media Player. Anyways, if you want to do research in signal processing, here’s my CV. I could build and lead a group for you doing that. And then I tell my wife and she goes, you did what?? You sent an email to the CTO of Microsoft??

Host: Who was it at the time?

Rico Malvar: It was Nathan Myhrvold.

Host: Nathan.

Rico Malvar: And she said, nah. I say, what do I have to lose? The worst case, they don’t respond, and life is good. I have a good job here. It’s all good. And that was on a Sunday afternoon. Monday morning, I get an email from Microsoft. Hey, my name is Suzanne. I work on recruiting. I’m coordinating your interview trip. I said, alright! And then I show the email to my wife and she was like, what? It worked? Whoa! And then it actually was a great time. The environment here, from day one, since the interviews, the openness of everybody, of management, the possibilities and the desire of Microsoft to, yeah, let’s explore this area, this area. One big word here is diversity. Diversity of people, diversity of areas. It is so broad. And that’s super exciting. So, I was almost saying, whatever offer they make me, I’ll take it! Fortunately, they made a reasonable one, so it wasn’t too hard to make that decision.

Host: Well, two things I take away from what you’ve just told me. You keep using the word lucky and I think that has less to do with it than you are making it out to be. Um, because there’s a lot of really smart people here that say, I was so lucky that they offered me this. It’s like, no, they’re lucky to have you, actually. But also, the idea that if you don’t ask, you are never going to know whether you could have or not. I think that’s a wonderful story of boldness and saying why not?

Rico Malvar: Yeah. And in fact, boldness is very characteristic of Microsoft Research. We’re not afraid. We have an idea, we just go and execute. And we’re fortunate, and I’m not going to say lucky, I’m going to say fortunate, that we’re in a company that sees that and gives us the resources to do so.

Host: Rico, I like to ask all my guests, as we come to the end of our conversation, to offer some parting thoughts to our listeners. I think what you just said is a fantastic parting thought. But maybe there’s more. So, what advice or wisdom would you pass on to what we might call the next generation of technical researchers? What’s important for them to know? What qualities should they be cultivating in their lives and work in order to be successful in this arena?

Rico Malvar: I would go back to boldness and diversity. Boldness, you’ve already highlighted, Gretchen: if you have an idea, and it’s not just too rough an idea, you know a thing or two about why it actually could work, go after it! Give it a try. Especially if you are young. Don’t worry if you fail at many things. I failed at many things in my life. But what matters is not the failures. You learn from the failures and you do it again. And the other one is diversity. Always think diversity in all the dimensions. All kinds of people, everywhere in the world. It doesn’t matter the gender, race, ethnicity, upbringing, rich, poor, wherever they come from, everybody can have cool ideas. The person whom you least expect to invent something might be the one inventing. So, listen to everybody, because that diversity is great. And remember the diversity of users. Don’t assume that all users are the same. Go learn what users really think. If you are not sure whether Idea A or Idea B is better, go talk to them. Try them out, test, get their opinion, test things with them. So, push diversity on both sides, diversity in the creation and diversity in who is going to use your technology. And don’t assume you know. In fact, Satya has been pushing the whole company towards that, putting us in a growth mindset, which basically means keep learning, right? Because then, if you do that, that diversity will expand and then we’ll be able to do more.

Host: Rico Malvar, I’m so glad that I finally got you on the podcast. It’s been delightful. Thanks for joining us today.

Rico Malvar: It has been a pleasure. Thanks for inviting me.

(music plays)

To learn more about Dr. Rico Malvar and how research for people with disabilities is enabling people of all abilities, visit Microsoft.com/research.

post

Podcast with Microsoft Research Cambridge’s Dr. Cecily Morrison: Empowering people with AI

Cecily Morrison

Researcher Cecily Morrison from Microsoft Research Cambridge

Episode 60, January 23, 2019

You never know how an incident in your own life might inspire a breakthrough in science, but Dr. Cecily Morrison, a researcher in the Human Computer Interaction group at Microsoft Research Cambridge, can attest to how even unexpected events can cause us to see things through a different – more inclusive – lens and, ultimately, give rise to innovations in research that impact everyone.

On today’s podcast, Dr. Morrison gives us an overview of what she calls the “pillars” of inclusive design, shares how her research is positively impacting people with health issues and disabilities, and tells us how having a child born with blindness put her in touch with a community of people she would otherwise never have met, and on the path to developing Project Torino, an inclusive physical programming language for children with visual impairments.


Episode Transcript

Cecily Morrison: Working in the health and disability space has been a really interesting space to work with these technologies because you can see, on the one hand, that they can have a profound impact on the lives of the people that you’re working with. And when I say profound, I don’t mean, you know, they had a nicer day. I mean, they can have lives and careers that they couldn’t consider otherwise.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: You never know how an incident in your own life might inspire a breakthrough in science, but Dr. Cecily Morrison, a researcher in the Human Computer Interaction group at Microsoft Research Cambridge, can attest to how even unexpected events can cause us to see things through a different – more inclusive – lens and, ultimately, give rise to innovations in research that impact everyone.

On today’s podcast, Dr. Morrison gives us an overview of what she calls the “pillars” of inclusive design, shares how her research is positively impacting people with health issues and disabilities, and tells us how having a child born with blindness put her in touch with a community of people she would otherwise never have met, and on the path to developing Project Torino, an inclusive physical programming language for children with visual impairments. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: Cecily Morrison, welcome to the podcast.

Cecily Morrison: Thank you.

Host: You’re a researcher under the big umbrella of Human Computer Interaction in the Cambridge, England, lab of Microsoft Research and you are working on technologies that enable human health and well-being in the broadest sense. So, tell us, in the broadest sense, about your research. What gets you up in the morning?

Cecily Morrison: I like technology that helps people live the lives that they want to live, whether that’s because they have a health issue or a disability, or they’re just trying to live better. I want to be part of making those technologies. We have quite an exciting group structure that we work in here. So, at the moment, we sit on a floor of multidisciplinary researchers that mixes human computer interaction, design, engineering, software engineering, hardware engineering. We sort of sit together as a community, and then we work across three strands: the future of work, the future of the cloud, and empowering people with AI. And through those themes of work across our lab, we get to work with people in many different kinds of groups. I specifically work with people in the machine learning team, looking at how the kinds of machine learning opportunities that we have now can underpin experiences that really enable people to do things they couldn’t do before.

Host: I want to drill in on this idea of inclusive design for a second. It speaks to a mindset and assumptions that researchers make even as they approach working on a new technology. How would you define research that incorporates inclusion from the outset, and how might we change the paradigm so that inclusivity would be the default mode for everyone?

Cecily Morrison: So, inclusive design, as it’s been laid out in the inclusive design handbook done by Microsoft, has three important pillars. The first one is to recognize exclusion. So, it used to be that if you had a different physical makeup, you were missing an arm, you couldn’t see, you were considered to have a disability. And the World Health Organization changed that definition some years back now to say that, actually, what disability is, is a mismatch between a person’s physical capabilities and the environment they’re in. So, if you’re a wheelchair user and you don’t have curb cuts, then you immediately feel disabled because it’s really hard for you to get around. You know what? If you’re a buggy user, you feel the same. You know, somehow you have to get that massive buggy across the pavement. And thank goodness we have curb cuts that were pioneered for people who were using wheelchairs.

Host: Right.

Cecily Morrison: I think, in that regard, as technologists, we are people who can recognize and address that exclusion by creating technologies that ensure there isn’t a mismatch between the environment and technology people are using and their particular physical makeup and needs. So, I start from that perspective, that we, as technology designers, have an important role in making the world a more inclusive place. Because it’s not about how people are born, or what happens to their bodies over their lives. It’s about the environments that we create, and technology is an important part of the environments that we create. So, the second part of inclusive design is really about saying that when we design things, we need to design for a set of people. And often, we implicitly do this by designing for ourselves. We just don’t recognize that we’re designing for ourselves. And if we don’t have very inclusive teams, that means we get the same ideas over and over again, and they’re a little bit different, a little bit this way, a little bit that way. But they’re really the same idea. When we start to design for people who have a very different experience of the world, which people with disabilities do, we can start to pull ourselves into a different way of thinking and really start to generate ideas that we wouldn’t have considered before. So, I think people with disabilities can really inspire us to innovate in ways that we hadn’t expected. And the third thing is, then, to extend to many people. So, if we design for a particular group, people say, oh, well there aren’t very many of them, and, you know, where’s my technology? But actually, the exciting thing is that, by designing for a particular group who’s different, we get new ideas that we can potentially extend to many people. So, think about designing for somebody with only one arm; that means, for example, using a computer, a phone, any technology with a single hand. You can think, well, there aren’t that many people who only have one arm. But then you start to think, well, how many people have a broken arm at some time in their lives? Well, that’s a much larger number. So that person has what we might think of as a temporary disability. And then what about those people who have what’s called a situational disability? So, in a particular situation, they only have access to one arm. I know this quite well, as the mother of a small baby. If you have to hold a baby and do something on your phone, you need to do it with one hand. I can guarantee you. So, this inclusive design is a way of helping us really generate new ideas by thinking about and working with people with disabilities and then extending those ideas to help all of us. So, we create more innovative technologies that include more people in our world and help us break down those barriers that create disabilities.

Host: Let’s talk about this idea of human health and well-being being central to the focus of your work. Even Christopher Bishop at your lab has said healthcare is a fruitful field for AI and machine learning and technology research in general, but it’s challenging because that particular area is woefully behind other industries simply in embracing current technologies, let alone emerging ones. So how do you see that landscape given the work you’re doing, and what can we do about it?

Cecily Morrison: Well, I remember when I arrived at Microsoft Research, I was really excited to come here because I had just spent four years working in our National Health Service in the UK, really trying to help them put into practice some of the technologies that already existed. And man, was it hard work! It was incredibly important work, but it was really, really hard work. And I don’t think it’s because people are afraid of technologies or they don’t want to use technologies, but you’re dealing with an incredibly complex organization, and you can’t get it wrong. You can’t get it wrong, because the impact you could have on someone’s life is beyond what I think we would ethically allow ourselves. So, I was excited to come to Microsoft Research, and I said, you know, I really want to work on technologies that impact people, but at the same time, we need a little bit more space to be able to experiment and think about new ideas without being so constrained by having to deliver a service every day. One challenge with healthcare is that the easiest way to think about what a technology might do is to imagine what people do now and ask, well, how would a technology do that? But actually, that’s not really where we see innovation. We see innovation usually coming from making something different, making something new, or making something easier, not from doing the same thing the same way.

(music plays)

Host: Let’s talk about some of your specific research. I want to begin with a really cool project called Assess MS. Tell us how this came about. What was the specific problem you were addressing, and how does this illustrate the goal of collaboration between humans and machines?

Cecily Morrison: Right, so Assess MS was a project to track disease progression in multiple sclerosis using computer vision technology. It was a collaboration between Microsoft Research and Novartis Pharmaceuticals, with a branch based in Basel, Switzerland. And it really came about as healthcare is moving into the technology space and technology is moving into the healthcare space, with these two large companies thinking about, what could we do together? How can we bring our expertise together? We were approached by our partner, Novartis, and they said, we would like to have a “neurologist in a box.” And it took a lot of time and working with them, negotiating with them, doing design work with them, to understand that a neurologist in a box is not really what technology is good at, but we could do something even more powerful. And what that something was, was that we were looking at how do we track disease progression in multiple sclerosis? Now, patients with multiple sclerosis might have very, very different paths with that particular disease. It could progress very quickly, and within two years they lose their lives. They could have it for sixty years and really have minor symptoms such as very numb feet or some cognitive difficulties. These are very, very different experiences, and it can be very difficult for patients to know when or how or which treatments to start if you don’t have any sense of how your disease might progress. And one step in helping patients and clinicians make those decisions is being able to very consistently track when the disease is progressing. Now, that was really difficult when we started, because they were using a range of paper and pencil tools where a neurologist would look at a patient, ask them to do a movement such as extending their arm out to the side and then touching their nose, and then check for a tremor in the hand. Now, in one year, with one neurologist, they might say, oh, well that’s a tremor of one. And the next year, or with the next neurologist, they might say, oh, that’s a tremor of two. Then there’s the question of, has the patient changed, or is it just that it’s a different time and a different neurologist? Because there are no absolute criteria for what is a one and what is a two. And again, if you’re lucky enough to have the same doctor, you might be slightly better off, but again, it’s been a year between the two assessments. But what a machine does really well – it’s not very good at helping a patient make decisions about their care – but it is very good at doing things consistently. So, tracking disease progression was something that we said, well, we can do very consistently with a machine. And we can then supply those results to the patient and the neurologist to really think through what are the best options for that patient that particular year?

Host: So, how is the machine learning technology playing into this? What specific technical aspects of Assess MS have you seen develop over the course of this project?

Cecily Morrison: There are quite a range of things, actually. In the first instance, we were using machine learning to do this categorization. So, at the moment, neurological symptoms in MS are already categorized with a particular tool called the Expanded Disability Status Scale, the EDSS. And we were attempting to replicate those measures, as measures that the clinical field was already comfortable with. And so, in that regard, we were using a set of training data from more than 500 patients that we had collected and labeled, and using that to train algorithms. Really, we were testing out different kinds of algorithms that might be able to discriminate between those patient levels. But what we did on the human-computer interaction side of things was actually making a lot of that machine learning work. So, the first thing that we needed to do was design a device that helped people capture the data in a form that was standardized enough for the machine learning to work well. The first thing that we saw, when we ran a few pilots, was that the cameras were tilted, people were out of the frame, you couldn’t see half their legs because they had sparkly pants on. All kinds of things that you just don’t imagine until you go into the real-world context that we had to design for. And what’s, I think, for me, quite interesting is that people are really willing to work with a machine so that the machine can see well, as long as they understand how the machine is seeing. And it’s not seeing like a person. So, we built a physical interface, as in a physical prototype, which allowed people to position and see and adjust what the vision system was seeing so it could capture really good-quality data for machine learning.
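
To make that capture problem concrete, here is a minimal sketch of the kind of automated framing check such a capture device might run before accepting a recording. The joint names, frame size, and margin are illustrative assumptions, not details of the actual Assess MS hardware or software.

```python
# Illustrative sketch only: a simple pre-capture quality check of the kind a
# standardized-capture device might run. Joint names, frame size, and margins
# are assumptions for illustration, not details of the Assess MS system.
from typing import Dict, Tuple

FRAME_W, FRAME_H = 640, 480   # assumed camera resolution
MARGIN = 20                   # required clearance, in pixels, from every edge

def frame_ok(joints: Dict[str, Tuple[float, float]]) -> bool:
    """Return True only if every tracked joint lies safely inside the frame."""
    return all(
        MARGIN <= x <= FRAME_W - MARGIN and MARGIN <= y <= FRAME_H - MARGIN
        for x, y in joints.values()
    )

# Example: the left foot is cut off at the bottom of the frame, so the capture
# should be rejected and the operator prompted to reposition the camera.
joints = {"head": (320, 60), "left_hand": (200, 250), "left_foot": (300, 475)}
print(frame_ok(joints))  # False
```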

Host: Right.

Cecily Morrison: That was step one. And then step two was like, oh, we need labeled data to train against. And we discovered very quickly that if we’re trying to increase our consistency above clinicians, and we use the current way clinicians label data, we’re going to get the same level of consistency as clinicians. So, we won’t really have achieved our goal. So, we had to come up with a new way to get more precise and consistent labels from clinicians. And again, we did something pretty interesting there. Partially, we used interaction design features, so we went with the idea that clinicians, and people generally, are much better at giving relative labels. So, this person is better than that person, rather than saying this person is a one and that person is a two, which we call a discrete label. So, what we did is a pairwise comparison. We said, okay, tell us which person is more disabled. This worked really well in terms of consistency, although we nearly had all of our clinicians quit because they figured, you know, this is incredibly tedious work. And again, that’s where machine learning and good design can come in. Because we said, well, actually, we have this great algorithm called TrueSkill. This is an algorithm that was originally used for matching players in Xbox games. But actually, what it does is give us a probabilistic distribution of how likely it is that someone is better than someone else. So, it takes pairwise comparison, which is an n-squared problem, and makes it a linear problem. And to interpret that for people who don’t really work in this space, that basically means if you have 100 films to label, it takes you 100 times however long one label takes, which in this case is about a second, rather than taking 100 times 100.

Host: Right.

Cecily Morrison: Which is a much longer time. By using thoughtful interaction design and other kinds of machine learning, we could actually make that process much faster. So, we managed to show that we could get much more consistent and finer-grained labels much faster than with the original approach. So, we went on to build the big system, but in the end, actually, we spent a lot of our time on these challenges that just make computer vision systems work in the real world.
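
For readers curious what that pairwise-labeling idea looks like in code, below is a minimal sketch using the open-source trueskill Python package. It illustrates the general technique of turning “A is more disabled than B” judgments into a consistent ranking; it is not the actual Assess MS labeling pipeline, and the patients and judgments are invented.

```python
# Minimal sketch: ranking items from pairwise comparisons with TrueSkill.
# Illustrative only -- not the Assess MS pipeline; patients and judgments are invented.
import trueskill

# One rating (a Gaussian belief about "level of disability") per patient.
patients = {pid: trueskill.Rating() for pid in ["A", "B", "C", "D"]}

# Each pair means "the first patient was judged more disabled than the second".
# Note that far fewer judgments than all n*(n-1)/2 pairs are needed.
judgments = [("A", "B"), ("A", "C"), ("C", "B"), ("D", "A"), ("D", "C")]

for more, less in judgments:
    # rate_1vs1 returns the updated (winner, loser) ratings.
    patients[more], patients[less] = trueskill.rate_1vs1(patients[more], patients[less])

# Sort by each rating's mean: higher mu means judged more disabled; sigma is the
# remaining uncertainty in that estimate.
ranking = sorted(patients, key=lambda pid: patients[pid].mu, reverse=True)
print(ranking)        # e.g. ['D', 'A', 'C', 'B']
print(patients["D"])  # Rating(mu=..., sigma=...)
```

Because each rating carries an explicit uncertainty, new comparisons can be requested where they would be most informative, which is one way the quadratic labeling burden can be pushed toward something closer to linear.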

Host: Is this working in the real world, or is it still very much at the prototype research stage?

Cecily Morrison: Well, it was a very large project; a lot of data was collected, and the data sets are still there. But what we found is that the machine learning isn’t yet up to discriminating the fine level of detail that we need. But we have a data set, because we expect that, in the next couple of years, it will be. So, it’s on pause.

Host: Let’s talk about one of the most exciting projects you’re working on and it’s launching as we speak, called Project Torino. And you said this was sort of a serendipitous, if not accidental, project for you. Tell us all about Project Torino. This is so cool.

Cecily Morrison: So, Project Torino is a physical programming language for teaching basic programming concepts and computational learning skills to children ages seven to eleven, regardless of their level of vision, whether they’re blind, low vision, partially sighted or sighted. It’s a tool that children can use. And it was, indeed, a serendipitous project. We were exploring technology that blind and low-vision children used, because we have a blind child. And at the time, he was quite young. He was about 18 months. And we really wondered how many blind and low-vision people were involved in the design of this technology. And we thought, what would it look like if these kids, these blind and low-vision kids that were in our community that we now knew through our son – what would it look like if they were designing the technologies of tomorrow, their own technologies, other technologies? So, we decided to run an outreach workshop teaching the children in our community how to do a design process and how to come up with their own ideas. So, we brought them together. We had a number of different design process activities that we did. And, you know, they came up with amazing things. We gave them a base technology based on Arduino that turns light into sound. And we just walked them through a process to create something new with that. And they came up with incredible things that you’d never think of. So, one young girl came up with an idea for this hat – a very fashionable hat, I have to say – which adjusted the light so that she could always see, because she had a condition where, if the light was perfect, she could see almost perfectly, and if the light was just a little bit wrong, she was almost totally blind. So, it was quite difficult for her in school. We had another child who created this, um, you might call it a robot, which ran around his 100-room castle – imaginary, I learned in the end – to find out which rooms had windows and which rooms didn’t, because, at the age of seven, he had told me very confidently that his mom had told him that sighted people like windows, and he should put them in the rooms with windows. So, we were really excited about how engaged the children were, and the ideas they came up with were great. But it was an outreach workshop, so when we were finished with the day, we thought we were finished. And that week, a number of the parents phoned me back or emailed me and said, great, you know, my child has come up with several new ideas. They really want to build them, so, how can they code? And I thought, gosh, I have no idea! Most of the, you know, languages that we would use with children of that age group, between seven and eleven, are not very accessible. They’re block-based languages. So, I asked around, did anybody know? We tried a few things out. We tried putting assistive technologies on existing languages, and we discovered that this was a big failure. The first time I made a child cry, I was a little bit sad, a little bit depressed about that. So that was definitely not the right direction. But I was having lunch one day with a colleague of mine who works in my group as a hardware research engineer. And I said, you know, is there anything out there that we could hack together, just to enable these kids to learn to code, give them the basics before they’re ready to code with a text-based language and an assistive technology when they’re a bit older? And the answer was, well, not really, but actually, I think we can build that.
I think we’ve got a bunch of the base tech there already. So, we got a bunch of interns together and off we went.

Host: And… where is it now?

Cecily Morrison: It’s been a very exciting journey, from that first prototype, which was really a good prototype, tested with ten children, to a second and a third prototype, which were then manufactured to test with a hundred children. And after an incredibly successful beta trial, we are partnering with the American Printing House for the Blind, which will take this technology to market as a product.

Host: Wow. How does it work?

Cecily Morrison: How does it work? It’s a set of physical pods that you connect together with wires. And each of these pods is a statement in your program, and you can connect a number of pods to create a multi-statement program which creates music, stories or poetry. And in the process, with different types of pods, we take children through the different types of control flows that you can have in a programming language.
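
As a rough mental model of the pod-as-statement idea, here is a small conceptual sketch in Python. It is not Torino’s actual software; the pod types and the way a connected chain of pods “runs” are assumptions made purely for illustration.

```python
# Conceptual sketch only: pods as statements, a connected chain of pods as a program.
# Not Torino's real implementation; pod types and execution are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class PlayNote:
    """A 'play' pod: one statement that emits a single note."""
    note: str

@dataclass
class Loop:
    """A 'loop' pod: repeats the pods threaded through it a fixed number of times."""
    times: int
    body: List["Pod"]

Pod = Union[PlayNote, Loop]

def run(program: List[Pod]) -> List[str]:
    """Walk the connected pods in order and collect the notes they would play."""
    notes: List[str] = []
    for pod in program:
        if isinstance(pod, PlayNote):
            notes.append(pod.note)
        else:  # a Loop pod: execute its body repeatedly
            for _ in range(pod.times):
                notes.extend(run(pod.body))
    return notes

# Two note pods threaded through one loop pod yield a four-note tune
# without needing four separate note pods.
tune = [Loop(times=2, body=[PlayNote("C"), PlayNote("G")])]
print(run(tune))  # ['C', 'G', 'C', 'G']
```

That loop pod is also what makes the efficiency lesson in the next answer possible: a repeated phrase costs two note pods plus one loop pod, rather than one pod per note.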

Host: And so, this is not just, you know, the basics of programming languages. It’s computational thinking and, sort of, preparing them, as you say, for what they might want to do when they get older?

Cecily Morrison: Yeah, so I think whether children become, you know, software engineers or computer scientists in some way or not, a lot of the skills that they can learn through coding and through the computational learning aspect of what we were doing are key to many, many careers. So those are things like breaking a problem down. You’re stuck; you can’t solve it. How are you going to break it down into a problem that you can solve? Or, you’ve got a bug; it’s not working. How are you going to figure out where it is? How are you going to fix it? Perhaps my favorite one, and perhaps this is just a beautiful memory I have of a child with one of those a-ha moments, is, how do you make something more efficient? A physical programming language can’t have very many pods. I think, in our current version, we have about twenty-one pods. So, you have to use those really efficiently. That means you have to use loops if you want to do things again, because you don’t have enough pods to spell it out in a serial fashion. And I remember a child trying to create the program for Jingle Bells. It was just before Christmas. We were all ready to go off on holiday, and she was determined to solve this before any of us could go home. She’d mapped it all out, and she said, “But I don’t have enough pods for the last two words!” I said, well, you know, we have solved this, so it must be solvable. So, she’s sitting there and thinking, and her mom looks at her and goes, “Jingle Bells, Jingle Bells…” And all of a sudden, she goes, “Oh, I get it! I get it!” And she reaches for the loop pod and puts it in a loop. But I think those are the kinds of moments that, as a researcher, are just beautiful to see, when your technologies really help someone move forward, but also the kind of thing that we’re trying to get children to grasp, which is to really understand that they can do things in multiple ways.

Host: Who would ever have thought that Jingle Bells would give someone an a-ha moment in technology research?!

(music plays)

Host: So, let’s talk a bit about some rather cutting-edge, ongoing inclusive design research you’re involved in, where the goal is to create a deeply personal visual agent. What can you tell us about the direction of this research and what it might bode for the future?

Cecily Morrison: I think, across all of the major industrial research labs and industrial partners in technology, there’s a lot of focus on agents, and on agents as a way to augment your world with useful information in the moment. We’ve been working on visual agents, so visual agents are ones that incorporate computer vision. And I think one of the interesting challenges that comes from working in this space is that there are many, many things that we can perceive in the world. You know, our computer vision is getting better by the month. Not even by the year, by the month. From when we started to now, the things that we can do are dramatically different. But that’s kind of a problem from a human-experience point of view, because, what’s my agent going to tell me, now that it can recognize everything, recognize relationships between things, and recognize people? Now we have this relevance problem: what am I going to surface and actually tell the person that is relevant to them in their particular context? So, I think one of the exciting things that we’re thinking about is, how do we make things personalized to people without using either a lot of their data or asking them to do things that require a deeper understanding of computer science? So, that’s a real challenge of how we build new kinds of algorithms and new kinds of interfaces to work hand-in-hand with agents to get the experience that people want without having to put too much effort in.

Host: So, I want to talk about a topic I’ve discussed with several guests on the podcast. It’s this trend towards cross- or multi-disciplinary research, and I know that’s important to you. Tell us how you view this trend – even the need – to work across disciplines in the research you’re doing today.

Cecily Morrison: Well, I can’t think of a project I’ve ever worked on in technology that hasn’t required working across disciplines. I think if you really want to impact people, that requires people with lots of different kinds of expertise. When I first started doing research as a PhD student, I started right away working with clinicians, with social scientists, with computer scientists. That was a small team at the time. On the Torino project that I’ve just discussed, we were quite a large team. We had hardware engineers, software engineers, UX designers, user researchers, social scientists involved. Industrial designers as well. Everyone needed to bring their particular perspective to enable that system to be built. And I feel, in some ways, incredibly privileged to work at Microsoft Research, where I sit on a floor with all those people. So, the expertise you need to really think through each aspect of what you’re trying to solve is just a lunch conversation away.

Host: Hmm. You know, there’s some interesting, and even serious, challenges that arise in the area of safety and privacy when we talk about technologies that impact human health. You’ve alluded to that earlier. So, as we extend our reach, we also extend our risk. Is there anything that keeps you up at night about what you’re doing, and how are you addressing those challenges?

Cecily Morrison: No doubt any technology that uses computer vision sets many people worrying. What are you capturing? What are you doing with it? So, I’ve certainly thought quite a lot, and quite deeply, about what we do and why we do it. And I think working in the health and disability space has been a really interesting space to work with these technologies, because you can see, on the one hand, that they can have a profound impact on the lives of the people that you’re working with. And when I say profound, I don’t mean, you know, they had a nicer day. I mean, they can have lives and careers that they couldn’t consider otherwise. That said, we are, no doubt, with vision technology, capturing other people. But for me, that’s one of the most exciting design spaces that we can work in. We can start to think about, how do we build systems in which users and bystanders have enough feedback that they can make choices in the use of that system? It used to be that the users of a system were the ones who controlled the system. But I think we’re moving into an era where we allow people to participate in systems even when they’re not the direct user of those systems. And I think Assess MS was a good example, because there we were also capturing clinical data of people, and we had to be very careful about balancing the need to, for example, look at that data to figure out where our algorithms were going wrong, and respecting the privacy of the individuals, as there’s no way to anonymize the data. So, I can assure you, we thought very hard about how we do that within our team. But it was also a very interesting discussion with some of our colleagues who are working in cloud computing to say, you know, there’s a real open challenge here, which hopefully won’t be open too much longer, about how we deal with clinical data, how we allow machine learning algorithms to work on data without everyone being able to see all of the same data. So, it’s certainly top of mind how we do that ethically and respectfully, and of course, legally, now that we have many legal structures in place.

Host: Cecily, tell us a bit about yourself. Your undergrad is in anthropology, and then you got a diploma and a PhD in computer science. How and why did that happen, and how did you end up working in Microsoft Research?

Cecily Morrison: Well, I suppose life never takes the direction you quite expect. It certainly hasn’t for me. I did a lot of maths and science as a high school student. But I was getting a little bit frustrated, because I really liked understanding people. And what I really liked about anthropology was that it was a very systematic way of looking at human behavior and how different behaviors could adjust the system in different ways. And that, to me, was a little bit like some of the maths that I was doing, but just with people. Sort of solving the same kind of problems, but using people and systems rather than equations. So, I found that very interesting. I went off to do a Fulbright Scholarship in Hungary. I was studying the role of traditional music, in particular bagpipe music, in the changes of political regime in Hungary. And, as part of that, I spent a couple of years there and found some really interesting things working with children. I started teaching kids. I started working with them on robotics, just because, well, it was fun. And having done that, I was then seeing that, actually, there could be a lot of better ways to build technology that supports interaction between children in the classroom. So off I set to find a way to build better technologies. I figured I needed to know something about computing first. So, I thought I’d do a diploma in computer science. But that, again, distracted me, when I was given this opportunity to work in the healthcare space and I realized that really what I wanted to do was create technology that enabled people in ways they wanted to be enabled, whether that be education or health or disability. So, I ended up doing a PhD in computing and then, very quickly, moving into working in technology in the NHS. And soon after that, I came to Microsoft to work on the Assess MS project.

Host: So, you have two boys, currently 11 months and 6 years. Do you feel like kids, in general, and your specific boys are informing your work, and how has that impacted things, as you see them, from a research perspective?

Cecily Morrison: Again, one of the serendipities of life: you can get frustrated with them, or you can take them and run with them. So, I have an older child, who was born just before I started at Microsoft, who is blind, and I have another, 11-month-old baby who… we call him a classic. We have the new-age and the classic version. And it very much has impacted my work. Seeing the world from a different perspective, taking part in communities that I wouldn’t otherwise have seen or taken part in, has definitely driven what we’ve done. So, Torino is certainly an example of that. But a lot of the work I’ve done around inclusive design is driven very much by that. And I think, interestingly enough, in the agent space, we have done some work with people who are blind and low vision because, at the time we started working with agents, typical people were not heavy users of agents. In fact, most people thought they were toys. Whereas people who are blind and low vision were early adopters and heavy users of agent technologies and really could work with us to help push the boundaries of what these technologies can do. If you’re not using technology regularly, you can’t really imagine what the next steps might be. So, it’s a great example of inclusive design, where we can work with this cohort of young, very able, blind people to help us think about what agents of the future are going to look like for all of us.

Host: So, while we’re on the topic of you, you’re a successful young woman doing high-tech research. What was your path to getting interested? Was it just natural, or did you have role models or inspirations? Who were your influences?

Cecily Morrison: (laughs) Well, as you can maybe see from some of the stories I’ve told so far, serendipity has played a substantial role in my life, and I guess I’m grateful to my parents for being very proactive in helping me accept serendipity and run with it wherever it has taken me. I think I’ve been very lucky to have a boss and mentor, Abby Sellen, whom people may know from the HCI community, who’s been amazingly adept at building great technology while navigating the needs we all have as people in our own personal lives. I’m sure there have been many other people. I take inspiration wherever it’s offered.

Host: As we close, Cecily, I’d like you to share some personal advice or words of wisdom. What you’re doing is really inspirational and really interesting. How could academically minded people in any discipline get involved in building technologies that matter to people, like you?

Cecily Morrison: I think knowing about the world helps you build technologies that matter. And to take an example from the blind space, I’ve seen a lot of technology out there where people build technology because they want to do good, but they don’t know how to do good, because they don’t know the people they’re designing and building for. We have lots of techniques for getting to know people. But I think in some ways, the best is to just go out and have a life outside of your academic world that you can draw inspiration from. Go find people. Go talk to people. Go volunteer with people. To me, if we want to build technologies that matter to people, we need to spend a good part of our life with people, understanding what matters to them, and that’s something that drives me as a person. And I think it then comes into the way I think about technology. Another thing to say is, be open to serendipity. Be open to the things that cross your path. And I know, as academic researchers, sometimes we feel that we need to define ourselves. And perhaps that’s important, although it’s never been the way that I’ve worked. But I think there’s also something about how you can be incredibly genuine if you go with things that are really meaningful to you. And being genuine in what you do gives you insights that nobody else will have. I never expected to have a blind child, but I think it’s been incredibly impactful in the way I approach my life and the way I approach the technology I build. And I don’t think I would have innovated in the same way if I had not had that sort of deep experience of living life in a different way.

Host: Cecily Morrison, thanks for joining us today.

Cecily Morrison: Thanks very much.

(music plays)

To learn more about Dr. Cecily Morrison and how researchers are using innovative approaches to empower people to do things they couldn’t do before, visit Microsoft.com/research.

post

Scientists discover how bacteria use noise to survive stress

January 22, 2019 | By Microsoft blog editor

Noisy expression of stress response in microcolony of E. coli.

Mutations in the genome of an organism give rise to variations in its form and function—its phenotype. However, phenotypic variations can also arise in other ways. The random collisions of molecules constituting an organism—including its DNA and the proteins that transcribe the DNA to RNA—result in noisy gene expression that can lead to variations in behavior even in the absence of mutations. In a research paper published in Nature Communications, researchers at Microsoft Research and the University of Cambridge have discovered how bacteria can couple noisy gene expression with noisy growth to survive rapidly changing environments.

“We have taken advantage of advances in microfluidics technology, time-lapse microscopy, and the availability of libraries of genetically modified bacteria that have happened in the past decade or so to provide unprecedented detail of how single cells survive stress,” says Microsoft PhD Scholar Om Patange. “We hope this will help fellow researchers see that studies of bacteria at the single-cell level can reveal important aspects of how these organisms live and contend with their environment.”

Cells stochastically turn on their stress response and slow down growth to survive future stressful times. A montage of E. coli grown in a microfluidics device illustrates this phenomenon.

Using a microfluidic device, Patange—together with colleagues and co-supervisors Andrew Phillips, head of Microsoft Research’s Biological Computation group, and James Locke, research group leader at Cambridge’s Sainsbury Laboratory—observed single Escherichia coli cells grow and divide over many generations. They found that a key regulator of stress response called RpoS pulsed on and off. When these happily growing cells were exposed to a sudden chemical stress, the few cells ready for the stress survived. This is a striking example of a microbial population partitioning into two subpopulations despite sharing the same genetic makeup. The researchers further discovered that the surviving cells were paying a cost to survive: they grew more slowly than their neighbors.

To uncover the mechanism causing the cells to grow slowly and turn on their stress response, the researchers developed a stochastic simulation of biological reactions inside single cells. They found that a simple mutual inhibitory coupling of noisy stress response and noisy growth caused the pulses observed and also captured more subtle observations.
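
For intuition, here is a toy stochastic simulation of that kind of mutual inhibition in Python. The equations, parameters, and noise terms are deliberately simplified assumptions for illustration; they are not the model or parameter values from the Nature Communications paper.

```python
# Toy model only: mutual inhibition between a stress response s and growth g,
# each driven by noise. Equations and parameters are illustrative assumptions,
# not the published single-cell model.
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 4.0, 2        # production strength and Hill coefficient (chosen so two stable states exist)
dt, steps = 0.01, 200_000
noise = 0.6              # strength of the "noisy expression" / "noisy growth" terms

s, g = 0.3, 3.7          # start in the fast-growth / low-stress state
s_trace = np.empty(steps)
for i in range(steps):
    # Growth represses the stress regulator, and the stress regulator slows growth.
    ds = (alpha / (1.0 + g**n) - s) * dt + noise * np.sqrt(dt) * rng.normal()
    dg = (alpha / (1.0 + s**n) - g) * dt + noise * np.sqrt(dt) * rng.normal()
    s, g = max(s + ds, 0.0), max(g + dg, 0.0)
    s_trace[i] = s

# With enough noise, the trajectory flips between the low-stress/fast-growth state
# and the high-stress/slow-growth state, producing pulses without any mutation.
print("fraction of time with the stress response on:", np.mean(s_trace > 2.0))
```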

This study, for which single-cell datasets are available on GitLab, has both pure and applied implications. The stress response phenomenon may be related to persistence, a strategy used by bacteria to evade antibiotics without mutations. Understanding the connection between persistence and stress response may lead to more nuanced approaches to antibiotic treatments. The idea that bacteria have evolved a population-level phenotype governed by single-cell actions is also intriguing. Understanding the benefit gained by the population at the expense of single bacteria may yield insights into the evolution of cooperative strategies.

“The bacteria might teach us about cooperative strategies we haven’t already come up with,” says Patange. “We might also learn how to use and defend against bacteria better if we can see the world from their perspective.”

post

WIRED: Undersea servers stay cool while processing oceans of data

Most electronics suffer a debilitating aquaphobia. At the littlest spillage—heaven forbid Dorothy’s bucket—of water, our wicked widgets shriek and melt.

Microsoft, it would seem, missed the memo. Last June, the company installed a smallish data center on a patch of seabed just off the coast of Scotland’s Orkney Islands; around it, approximately 933,333 bucketfuls of brine circulate every hour. As David Wolpert, who studies the thermodynamics of computing systems, wrote in a recent blog post for Scientific American, “Many people have impugned the rationality.”

The idea to submerge 864 servers in saltwater was, in fact, quite rational, the result of a five-year research project led by future-proofing engineers. Errant liquid might fritz your phone, but the slyer, far deadlier killer of technology is the opposing elemental force, fire. Nearly every system failure in the history of computers has been caused by overheating. As diodes and transistors work harder and get hotter, their susceptibility to degradation intensifies exponentially. Localized, it’s the warm iPhone on your cheek or a wheezing laptop giving you upper-leg sweats. At scale, it’s Outlook rendered inoperable by remote server meltdown for 16 excruciating hours—which happened in 2013.

Servers underlie the networked world, constantly refreshing the cloud with droplets of data, and they’re as valuable as they are vulnerable. Housed by the hundreds, and often the thousands, in millions of data centers across the United States, they cost billions every year to build and protect. The most significant number, however, might be a single-digit one: Running these machines, and therefore cooling them, blows through an estimated 5 percent of total energy use in the country. Without that power, the cloud burns up and you can’t even fact-check these stats on Google (an operation that costs some server, somewhere, a kilojoule of energy).

Savings of even a few degrees Celsius can significantly extend the lifespan of electronic components; Microsoft reports that, on the ocean floor 117 feet down, its racks stay 10 degrees cooler than their land-based counterparts. Half a year after deployment, “the equipment is happy,” says Ben Cutler, the project’s manager. (The only exceptions are some of the facility’s outward-facing cameras, lately blinded by algal muck.)
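
The underlying arithmetic is simple. A common reliability rule of thumb, loosely derived from the Arrhenius equation, is that electronics failure rates roughly double for every 10 degrees Celsius of added heat; the sketch below uses that assumed doubling interval, which is an illustration rather than a figure from Microsoft or from this article.

```python
# Illustrative only: a rough "failure rate doubles per ~10 degC" rule of thumb,
# loosely based on the Arrhenius relationship. The doubling interval is an assumption.
def relative_failure_rate(delta_t_celsius: float, doubling_interval_c: float = 10.0) -> float:
    """Failure rate relative to baseline after a temperature change of delta_t_celsius."""
    return 2.0 ** (delta_t_celsius / doubling_interval_c)

# Racks running 10 degC cooler would, by this rule of thumb, fail at roughly half the rate.
print(relative_failure_rate(-10.0))  # 0.5
```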

Another Microsoft employee refers to the effort as “kind of a far-out idea.” But the truth is, most hyperscalers investing in superpowered cloud server farms, from Amazon to Alibaba, see in nature a reliable defense against ever more sophisticated, heat-spewing circuits. Google’s first data center, built in 2006, sits on the temperate banks of Oregon’s Columbia River. In 2013, Facebook opened a warehouse in northern Sweden, where winters average –20 degrees Celsius. The data company Green Mountain buried its massive DC1-Stavanger center inside a Norwegian mountain; pristine, near-freezing water from a fjord, guided by gravity, flows through the cooling system. What Tim Cook has been calling the “data-industrial complex” will rely, if it’s to sustainably expand to the farthest reaches, on a nonindustrial means of survival.

Underwater centers may represent the next phase, a reverse evolution from land to sea. It’s never been hard, after all, to waterproof large equipment—think of submarines, which get more watertight as they dive deeper and pressure increases. That’s really all Microsoft is doing, swapping out the payloads of people for packets of data and hooking up the trucklong pod to umbilical wiring.

Nonetheless, Cutler says, the concept “catches people’s imagination.” He receives enthusiastic emails about his sunken center all the time, including one from a man who builds residential swimming pools. “He was like, you guys could provide the heating for the pools I install!” Cutler says. When pressed on the feasibility of the business model, Cutler adds: “We have not studied this.”

Others have. IBM maintains a data center outside of Zurich that really does heat a public swimming pool in town, and the Dutch startup Nerdalize will erect a mini green data center in your home with promises of a warm shower and toasty living room. Hyperlocal servers, part of a move toward so-called edge computing, not only provide recyclable energy but also bring the network closer to you, making your connection speeds faster. Microsoft envisions sea-based facilities like the one in Scotland serving population-dense coastal cities all over the world.

“I’m not a philosopher, I’m an engineer,” Cutler says, declining to offer any quasipoetic contemplations on the imminent fusion of nature and machine. Still, he does note the weather on the morning his team hauled the servers out to sea. It was foggy, after a week of clear skies and bright sun—as though the literal cloud, reifying the digital, were peering into the shimmering, unknown depths.


Jason Kehe (@jkehe) wrote about drone swarms in issue 26.08.
