
Minimizing trial and error in the drug discovery process

In 1928, Alexander Fleming accidentally let his petri dishes go moldy, a mistake that would lead to the breakthrough discovery of penicillin and save the lives of countless people. From these haphazard beginnings, the pharmaceutical industry has grown into one of the most technically advanced and valuable sectors, driven by incredible progress in chemistry and molecular biology. Nevertheless, a great deal of trial and error still exists in the drug discovery process. With an estimated space of 10⁶⁰ small organic molecules that could be tried and tested, it is no surprise that finding useful compounds is difficult and that the process is full of costly dead ends and surprises.

The challenge of molecule design also lies at the heart of many applications outside pharmacology, including in the optimization of energy production, electronic displays, and plastics. Each of these fields has developed computational methods to search through molecular space and pinpoint useful leads that are followed up in the lab or in more detailed physical simulations. As a result, there are now vast libraries of molecules tagged with useful properties. The abundance of data has encouraged researchers to turn to data-driven approaches to reduce the degree of trial and error in chemical development, and the aim of our paper being presented at the 2018 Conference on Neural Information Processing Systems (NeurIPS) is to investigate how recent advances, specifically in deep learning techniques, could help harness these libraries for new molecular design tasks.

Deep learning with molecular data

Figure 1: The chemical structure of naturally occurring penicillin (penicillin G) and its representation as a graph in a GGNN. The messages passed in the environment of a single node are shown as curved arrows, and the neural networks that transform the messages are shown as small squares. Repeated rounds of message passing allow each node to learn about its surroundings (gray circles).

Deep learning methods have revolutionized a range of applications requiring understanding or generation of unstructured data such as pictures, audio, and text from large datasets. Applying similar methods to organic molecules poses an interesting challenge because molecules contain a lot of structure that is not easy to concisely capture with flat text strings or images (although some schemes do exist). Instead, organic chemists typically represent molecules as a graph where nodes represent atoms and edges represent covalent bonds between atoms. Recently, a class of methods that have collectively become known as neural message passing has been developed precisely to handle the task of deep learning on graph-structured data. The idea of these methods is to encode the local information, such as which element of the periodic table a node represents, into a low-dimensional vector at each node and then pass these vectors along the edges of the graph to inform each node about its neighbors (see Figure 1). Each message is channeled through small neural networks that are trained to extract and combine information to update the destination node’s vector representation to be informative for the downstream task. The message passing can be iterated to allow each node to learn about its more distant neighbors in the graph. Microsoft Research developed one of the earliest variants of this class of deep learning models—the gated graph neural network (GGNN). Microsoft’s primary application focus for GGNNs is in the Deep Program Understanding project, where they are used to analyze program source code (which can also be represented using graphs). Exactly the same underlying techniques are applicable to molecular graphs.
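To make the message-passing idea concrete, here is a minimal sketch of one gated update round on a toy molecular graph. It is illustrative only and not the GGNN implementation used in the paper; PyTorch, the layer name, and the toy bond encoding are all assumptions.

```python
# Minimal sketch of one round of gated message passing on a molecular graph.
# Illustrative only; not the GGNN code used in the paper. Assumes PyTorch.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, node_dim, num_bond_types):
        super().__init__()
        # One small network per bond type transforms messages sent along that bond.
        self.edge_nets = nn.ModuleList(
            [nn.Linear(node_dim, node_dim) for _ in range(num_bond_types)]
        )
        # A GRU cell updates each atom's vector from the sum of incoming messages,
        # which is the "gated" part of a gated graph neural network.
        self.update = nn.GRUCell(node_dim, node_dim)

    def forward(self, node_states, edges):
        # node_states: (num_atoms, node_dim); edges: list of (src, dst, bond_type)
        messages = torch.zeros_like(node_states)
        for src, dst, bond_type in edges:
            messages[dst] = messages[dst] + self.edge_nets[bond_type](node_states[src])
        return self.update(messages, node_states)

# Toy example: a three-atom fragment with single bonds in both directions.
node_states = torch.randn(3, 16)                       # 3 atoms, 16-dim vectors
edges = [(0, 1, 0), (1, 0, 0), (1, 2, 0), (2, 1, 0)]   # bond type 0 = single bond
layer = MessagePassingLayer(node_dim=16, num_bond_types=3)
for _ in range(4):   # repeated rounds let each atom learn about more distant neighbors
    node_states = layer(node_states, edges)
```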

Generating molecules

Figure 2: Example molecules generated by our system after being trained on organic solar cell molecules (CEP database).

Broadly speaking, there are two types of questions that a machine learning system could try to solve in molecule design tasks. First, there are discriminative questions of the following form: What is the property Y of molecule X? A system trained to answer such questions can be used to compare given molecules by predicting their properties from their graph structure. Second, there are generative questions—what is the structure of molecule X that has the optimum property Y?—that aim to invent structures that are similar to molecules seen during training but that optimize for some property. The new paper concentrates on the latter, generative question; GGNNs have already shown great promise in the discriminative setting (for example, see the code available here).

The basic idea of the generative model is to start with an unconnected set of atoms and some latent “specification” vector for the desired molecule and gradually build up molecules by asking a GGNN to inspect the partial graph at each construction step and decide where to add new bonds to grow a molecule satisfying the specification. The two key challenges in this process are ensuring the output of chemically stable molecules and designing useful low-dimensional specification vectors that can be decoded into molecules by the generative GGNN and are amenable to continuous optimization techniques for finding locally optimal molecules.

For the first challenge, there are many chemical rules that dictate whether a molecular structure is stable. The simplest are the valence rules, which dictate how many bonds an element can make in a molecule. For example, carbon atoms have a valency of four and oxygen a valency of two. Inferring these known rigid rules from data and learning to never violate them in the generative process is a waste of the neural network’s capacity. Instead, in the new work, we simply incorporate known rules into the model, leaving the network free to discover the softer trends and patterns in the data. This approach allows injection of domain expertise and is particularly important in applications where there is not enough data to spend on relearning existing knowledge. We believe that combining this domain knowledge and machine learning will produce the best methods in the future.
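As a rough illustration of what baking in a hard rule can look like, the sketch below masks out bond additions that would violate simple single-bond valence limits during step-by-step generation. The valence table and helper function are illustrative assumptions, not the paper's exact constraint set.

```python
# Sketch of hard valence constraints during step-by-step molecule generation.
# Illustrative only: masks out bond additions that would exceed an atom's valence.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}  # textbook single-bond capacities

def allowed_new_bonds(atoms, bonds, focus):
    """Return atoms that can still accept a single bond from `focus`.

    atoms: list of element symbols, e.g. ["C", "O", "H"]
    bonds: list of (i, j, order) tuples already in the partial graph
    focus: index of the atom the generator is currently expanding
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order

    candidates = []
    for j in range(len(atoms)):
        if j == focus or any({i, k} == {focus, j} for i, k, _ in bonds):
            continue  # no self-bonds, no duplicate bonds
        if used[focus] < MAX_VALENCE[atoms[focus]] and used[j] < MAX_VALENCE[atoms[j]]:
            candidates.append(j)
    return candidates

# Partial fragment: C-C with an O already attached to the second carbon.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]
print(allowed_new_bonds(atoms, bonds, focus=2))  # the O has one bond left -> [0]
```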

Figure 3: Example molecule optimization trajectory when optimizing the quantitative estimate of drug-likeness (QED) of a molecule after training on the ZINC database. The initial molecule has a QED of 0.4, and the final molecule has a QED of 0.9.

For the second challenge, we used an architecture known as a variational autoencoder to discover a space of meaningful specification vectors. In this architecture, a discriminative GGNN is used to predict some property Y of a molecule X, and the internal vector representations in this discriminative GGNN are used as the specification vector for a generative GGNN. Since these internal representations contain information about both the structure of molecule X and the property Y, continuous optimization methods can be used to find the representation that optimizes property Y; the representation is then decoded to find useful molecules. Example molecules generated by the new system are shown in Figures 2 and 3.
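The sketch below illustrates the latent-space optimization step in isolation: given a specification vector produced by an encoder, gradient ascent on a property predictor yields an improved vector that a decoder would then be asked to turn back into a molecule. The network shapes and the simple property head are assumptions; they stand in for the discriminative and generative GGNNs described above.

```python
# Sketch of optimizing a property in the latent space of a molecule autoencoder.
# The property head below is a placeholder for the discriminative GGNN; the
# decoder (not shown) would be the generative GGNN. Assumes PyTorch.
import torch
import torch.nn as nn

latent_dim = 32
property_head = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def optimize_latent(z_init, steps=100, lr=0.05):
    """Gradient ascent on the predicted property Y, starting from an encoded molecule."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -property_head(z).sum()   # maximize the predicted property
        loss.backward()
        opt.step()
    return z.detach()

# z_init would come from encoding a known molecule; decoding the optimized vector
# is then asked to produce a molecule matching the improved specification.
z_init = torch.randn(1, latent_dim)
z_opt = optimize_latent(z_init)
```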

Collaborating with experts

The results in the paper are very promising on simple molecule design tasks. However, deep learning methods for molecule generation are still in their infancy, and real-world molecule design is a very complicated process with many different objectives to consider, such as molecule efficacy, specificity, side effects, and production costs. Making significant further progress will require collaboration between machine learning experts and expert chemists. One of the main aims of this paper is to showcase the basic capabilities of deep learning in this space and thereby act as a starting point for dialogue with chemistry experts to see how these methods could enhance their productivity and have the most impact.


Princeton and Microsoft collaborate to tackle fundamental challenges in microbiology

Princeton University has teamed up with Microsoft to collaborate on the leading edge of microbiology and computational modelling research.   

In this project, Microsoft is helping Princeton to better understand the mechanisms of biofilm formation by providing advanced technology that will greatly extend the type of research analysis possible today. Biofilms — surface-associated communities of bacteria — are the leading cause of microbial infection worldwide and kill as many people as cancer does. They are also a leading cause of antibiotic resistance, a problem highlighted by the World Health Organization as “a global crisis that we cannot ignore.” Understanding how biofilms form could enable new strategies to disrupt them.

Ned Wingreen, the Howard A. Prior Professor in the Life Sciences and professor of molecular biology and the Lewis-Sigler Institute for Integrative Genomics.

To support Princeton, a Microsoft team led by Dr. Andrew Phillips, head of the Biological Computation group at Microsoft Research, will be working closely with Bonnie Bassler, a global pioneer in microbiology who is the Squibb Professor in Molecular Biology and chair of the Department of Molecular Biology at Princeton and a Howard Hughes Medical Institute Investigator, and with Ned Wingreen, the Howard A. Prior Professor in the Life Sciences and professor of molecular biology and the Lewis-Sigler Institute for Integrative Genomics.

Using the power of Microsoft’s cloud and advanced machine learning, Princeton will be able to study different strains of biofilms in new ways to better understand how they work. Microsoft is contributing a cloud-based prototype for biological modelling and experimentation that will be deployed at Princeton. This work combines programming languages and compilers, which generate biological protocols that can be executed using lab automation technology. It allows experimental data to be uploaded to the cloud, where it can be analyzed at scale using advanced machine learning and data analysis methods to generate biological knowledge. This in turn informs the design of subsequent experiments, providing insight into the mechanisms of biofilm formation. Princeton is contributing world-leading expertise in experiments and modelling of microbial biofilms.

“This collaboration enables us to bring together advances in computing and microbiology in powerful new ways,” said Brad Smith, president of Microsoft. “This partnership can help us unlock answers that we hope someday may help save millions of people around the world.”

“By combining our distinctive strengths, Princeton and Microsoft will increase our ability to make the discoveries needed to improve lives and serve society,” said Christopher L. Eisgruber, president of Princeton University. “Technology is creating new possibilities for collaboration, and we hope this venture will inspire other innovative partnerships in the years ahead.”

Pablo Debenedetti, Princeton’s dean for research, said: “We are delighted to be collaborating with Microsoft to advance scientific innovation with this new project, investigating the fundamentals that underlie urgent biomedical problems. Doing cutting-edge research that helps define the boundaries of knowledge and that could ultimately benefit society at large is what we strive for at Princeton.”

Princeton’s relationship with Microsoft is one of the University’s most extensive with industry, spanning collaborations in computer science, cybersecurity and now biomedical research.

As a global research university and leader in innovation, Princeton University cultivates mutually beneficial relationships with companies to support the University’s educational, scientific and scholarly mission. The University is guided by the principle that initiatives to fortify and connect with the innovation ecosystem will advance Princeton’s role as an internationally renowned institution of higher education and accelerate its ability to have greater impact in the world. 


MARLÖ competition challenges researchers to build AI agents that collaborate — and compete to win

With the latest Project Malmo competition, we’re calling on researchers and engineers to test the limits of their thinking as it pertains to artificial intelligence, particularly multi-task, multi-agent reinforcement learning. Last week, a group of attendees at the 14th Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’18) participated in a one-day workshop featuring the competition, exchanging ideas on the unique challenges of the research area with some of the field’s leading minds.

Learning to Play: The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition requires participants to design learning agents capable of collaborating with or competing against other agents to complete tasks of varying difficulty across 3D games. It is the second competition affiliated with Project Malmo, an open-ended platform designed for experimentation in artificial intelligence. Last year’s Malmo Collaborative AI Challenge yielded a diversity and creativity in approach that exceeded expectations, and we look forward to the same this time around.

The competition, co-hosted by Microsoft, Queen Mary University of London, and CrowdAI, is open to participants worldwide through December 31 (submit your entries here).

Sam Devlin, a game AI researcher from the Machine Intelligence and Perception group at Microsoft Research Cambridge, organized the MARLÖ AIIDE 2018 Workshop in collaboration with our academic partners Diego Perez-Liebana of Queen Mary and Sharada Mohanty of École Polytechnique Fédérale de Lausanne, Switzerland.

The workshop included a short tutorial of MARLÖ that allowed attendees to experiment with competition agents, as well as keynote addresses from two distinguished speakers. There was also a series of short contributed talks and a panel session to encourage attendees to share ideas around the application of reinforcement learning in modern commercial video games.

Jesse Cluff, Principal Engineering Lead, The Coalition

The first keynote speaker was Jesse Cluff, Principal Engineering Lead with The Coalition. Jesse has more than 20 years of experience in the industry, working on many exciting game titles, including Jackie Chan Stuntmaster, The Simpsons: Hit & Run, Bully, and Gears of War 4. During the workshop, he explored two aspects of game AI—the hardware side in discussing how we run programs in real time with limited resources and the emotional side in discussing how we maximize the enjoyment of players while controlling difficulty. He also talked about how AI techniques are actually used in commercial game products and the challenges he’s facing that still need further research.

Martin Schmid, Research Scientist, DeepMind

Martin Schmid, a research scientist with DeepMind, was the second keynote speaker. He is the lead author of DeepStack, the first computer program to outplay human professionals at heads-up no-limit Texas Hold’em poker, and he spoke about the program as an example of how successful AI methods used in complex games of perfect information like Go can advance AI application in imperfect-information games like poker. The work has huge practical significance since we regularly have to deal with imperfect information in the real world. These two keynotes were inspiring for faculty, researchers, and graduate students in attendance.

From left: Mobchase, Buildbattle, and Treasurehunt

The workshop also featured the MARLÖ competition’s kickoff tournament. Agents of the participating teams competed in a round robin to achieve the highest scores across three different games—Mobchase, Buildbattle, and Treasurehunt. At the end of the day, we announced the rankings of the enrolled teams. The top three eligible teams will each be presented with the Progress Award, a travel grant worth up to $2,500 for use toward a relevant conference at which they can publish their competition results. The MARLÖ competition is open until December 31, after which the final tournament will be held offline. We hope to see more participants join.


Podcast: Hearing in 3D with Dr. Ivan Tashev

Partner Software Architect, Dr. Ivan Tashev

Episode 50, November 14, 2018

After decades of research in processing audio signals, we’ve reached the point of so-called performance saturation. But recent advances in machine learning and signal processing algorithms have paved the way for a revolution in speech recognition technology and audio signal processing. Dr. Ivan Tashev, a Partner Software Architect in the Audio and Acoustics Group at Microsoft Research, is no small part of the revolution, having both published papers and shipped products at the forefront of the science of sound.

On today’s podcast, Dr. Tashev gives us an overview of the quest for better sound processing and speech enhancement, tells us about the latest innovations in 3D audio, and explains why the research behind audio processing technology is, thanks to variations in human perception, equal parts science, art and craft.


Episode Transcript

Ivan Tashev: You know, humans, they don’t care about mean square error solution or maximum likelihood solution, they just want the sound to sound better. For them. And it’s about human perception. That’s one of the very tricky parts in audio signal processing.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: After decades of research in processing audio signals, we’ve reached the point of so-called performance saturation. But recent advances in machine learning and signal processing algorithms have paved the way for a revolution in speech recognition technology and audio signal processing. Dr. Ivan Tashev, a Partner Software Architect in the Audio and Acoustics Group at Microsoft Research, is no small part of the revolution, having both published papers and shipped products at the forefront of the science of sound.

On today’s podcast, Dr. Tashev gives us an overview of the quest for better sound processing and speech enhancement, tells us about the latest innovations in 3D audio, and explains why the research behind audio processing technology is, thanks to variations in human perception, equal parts science, art and craft. That and much more on this episode of the Microsoft Research Podcast.

Host: Ivan Tashev, welcome to the podcast.

Ivan Tashev: Thank you.

Host: Great to have you here. You’re a Partner Software Architect in the Audio and Acoustics groups at Microsoft Research, so, in broad strokes, tell us about your work. What gets you up in the morning, what questions are you asking, what big problems are you trying to solve?

Ivan Tashev: So, in general, in Audio and Acoustics Research Group, we do audio signal processing. That includes enhancing of a captured sound by our microphones, better sound reproduction using binaural audio, so-called spatial audio. We do a lot of work in audio analytics, recognition of audio objects, recognition of the audio background. We design a lot of interesting audio devices. Our research ranges from applied research related to Microsoft products to blue-sky research far from what is Microsoft business today.

Host: So, what’s the ultimate goal? Perfect sound?

Ivan Tashev: Hhhh… Perfect sound is a very tricky thing, because it is about human perception. And this is very difficult to be modeled using mathematical equations. So, the classic statistical signal processing was established in 1947 with a paper published by Norbert Wiener defining what we call, today, the Wiener Filtering. The approach is simple: you have a process, you make a statistical model, you define optimality criterion, make the first derivative, make it zero, voila! You have the analytical solution of the problem. The problem is that, you either have an approximate model, and find the solution analytically, or you have precise model which you cannot solve analytically. The other thing is the optimality criterion. You know, humans, they don’t care about mean square error solution or maximum likelihood solution, they just want the sound to sound better. For them. And it’s about human perception. That’s one of the very tricky parts in audio signal processing.

Host: So, where are we heading in audio signal processing, in the era of machine learning and neural networks?

Ivan Tashev: The machine learning and neural networks are capable to find the solution from the data without us making an approximate model. And this is the beauty of this whole application of machine learning in signal processing, and the reason why we achieve significantly better results than using statistical signal processing. Even more, we train the neural network using certain cost function and we can make the cost function to be even another neural network, trained on human perception for better audio which allows us to achieve better perception of a higher quality of the speech enhancement we do using neural network. I’m not saying that we should go in every single audio processing block using machine learning and neural networks. We have processing blocks which have a nice and clean analytical solution, and this runs fast and efficient, and they will remain the same. But in many cases, we operate with approximate models with not very natural optimality criteria. And then, this is where the machine learning shines. This is where we can achieve much better results and provide a higher quality of our output signal.

Host: One interesting area of research that you are doing is noise robust speech recognition. And this is where researchers are working to improve automatic speech recognition systems. So, what’s the science behind this and how are algorithms helping to clean up the signal?

Ivan Tashev: We are witnessing a revolution in speech recognition. The classic speech recognizer was based on so-called Hidden Markov Models or HMM’s. And they served us quite well, but the revolution came when neural networks were implemented and trained to do speech recognition. My colleagues in the speech research group were the first to design a neural network-based speech recognition algorithm which instantly showed better results than the existing production HMM-based speech recognizer. The speech recognition engine has one channel input, while in audio processing, we can deal with multiple channels, so-called microphone arrays, and they give us a sense of spatiality. We can detect the direction where the sounds came from. We can enhance that sound. We can suppress sounds coming from other directions. And then provide this cleaner sound to the speech recognition engine. The microphone reprocessing technologies combined together with techniques like sound source localization and tracking and sound source separation allow us to even separate two simultaneously speaking humans in the conference room and feed two separate instances of the speech recognizer for meeting transcription.

Host: Are you serious?

Ivan Tashev: Yes, we can do that. Even more, the audio processing engine has more prior information. For example, the signal we send to the loudspeakers. And the goal of this engine is to remove the sound which is interfering for our sound. And this is also one of the oldest signal processing algorithms and every single speaker phone has it. But, in all instances, it has been implemented as a mono acoustic echo cancellation. In Microsoft, we were the first to design a stereo and surround sound echo canceller despite a paper written by the inventor of the acoustic echo cancellation himself, stating that stereo acoustic cancellation is not possible. And it’s relatively simple to understand: you have two channels between the left and the right speaker coming to one microphone, so you have one equation and two unknowns. And Microsoft released, as part of Kinect for Xbox, a surround sound echo cancellation engine. Not that we solved five unknowns from one equation, but we just found a workaround which was good enough for any practical purposes and allowed us to clean the surround sound coming from the Xbox to provide a cleaner sound to the speech recognition engine.

Host: So, did you write a paper and say, “Yes, it is possible, thank you very much!”?

Ivan Tashev: I did write a paper.

Host: Oh, you did!

Ivan Tashev: And it was rejected with the most crucial feedback from the reviewers I have ever seen in my career. It is the same to go to the French Academy of Sciences and to propose eternal engine. They have decided, since 18th century, not to discuss papers about that. When I received the rejection notice, I went downstairs in my lab, started the demo, listened to the output. Okay, it works! So, we should be fine!

(music plays)

Host: One thing that’s fascinated me about your work is the infamous anechoic chamber – or chambers, as I came to find out – at Microsoft, and one’s right here in Building 99, but there are others. And so, phrases like “the quietest place on earth” and “where sound goes to die” are kind of sensational, but these are really interesting structures and have really specific purposes which I was interested to find out about. So, tell us about these anechoic, or echo-free, chambers. How many are there here, how are they different from one another and what are they used for?

Ivan Tashev: So, the anechoic chamber is just a room insulated from the sounds outside. In our case, it’s a concrete cube which does not touch the building and sits on around half a meter of rubber to prevent vibrations from the street to come into the room. And internally, the walls, the ceiling and the floor are covered with sound absorption panels. This is pretty much it. What happens is that the sound from the source reaches the microphone, or the human ear, only using the direct path. There is no reflection from the walls and there is no other noise in the chamber. Pretty much that anechoic chamber simulates absence of a room. And it’s just an instrument for making acoustical measurements. What we do in the chamber is we measure the directivity patterns of microphones or radiation patterns of loudspeakers as they are installed in the devices we design. Initially, the anechoic chamber here, in Microsoft Building 99, the headquarters of Microsoft Research, was the only one in Microsoft. But with our engagement with product teams, it became overcrowded, and our business partners decided to build their own anechoic chambers. And there are, today, five in Microsoft Corporation. They all can perform the standard set of measurements, but all of them are a little bit different from each other. For example, the “Quietest Place in the Earth,” as recorded in the Guinness Book of Records, is the anechoic chamber in Building 88. And the largest anechoic chamber is in Studio B which allows making measurements with lower frequencies than in the rest of the chambers. In our chamber, in Building 99, it’s the only one in Microsoft which can allow human beings to stay prolonged amount of time in the chamber because we have air-conditioning connected to the chamber. It’s a different story how much effort it cost us to make the rumbling noise from the air conditioner not to enter the anechoic chamber. But this allowed us to do a lot of research on human spatial hearing in that chamber.

Host: So, drill in on that a little bit because, coming from a video production background, the air conditioner in a building is always the annoying part for the sound people. But you’ve got that figured out in the way that you situated the air conditioning unit and so on?

Ivan Tashev: To remove this rumbling sound from the air conditioner, we installed a gigantic filter which is under the floor of the entire equipment room. So, think about six by four meters floor and this is how we were able to reduce the sound from the air conditioning. Still, if you do a very precise acoustical measurement, we have the ability to switch it off.

Host: Okay. So, back to what you had said about having humans in this room for prolonged periods of time. I’ve heard that your brain starts to play tricks on you when you are in that quiet of a place for a prolonged period of time. What’s the deal there?

Ivan Tashev: OK. This is the human perception of the anechoic chamber. Humans, in general, are, I would say two and a half dimensional creatures. When we walk on the ground, we don’t have very good spatial hearing, vertically. We do much better horizontally. But also, we count on the first reflection from the ground to use it as a distance cue. When you enter the anechoic chamber, you subconsciously swallow, and this is a reaction because your brain thinks that there is a difference in the pressure between your inner ear and the atmosphere which presses the ear drums and you cannot hear anything.

Host: So that swallowing reaction is what you do when you’re in an airplane and the pressure actually changes. And you get the same perception in this room, but the pressure didn’t change.

Ivan Tashev: Exactly. But the problem in the room is that you cannot hear anything just because there is no sound in the chamber. And the other thing what happens is you cannot hear that reflection from the floor which is basically very hard-wired in our brains. We can distinguish two separate sounds when the distance between them is a couple of milliseconds. And when the sound source is far away, this difference between the direct path and the reflection from the ground is less than that. We hear this as one sound. We start to perceive those two as separate sounds when the sound source is closer than a couple of meters away… means two jumps. Then subconsciously alarm bells start to ring in our brain that, hey, there is a sound source less than two jumps away, watch out not to become the dinner! Or maybe this is the dinner!

Host: So, the progress, though, of what your brain does and what your hearing does inside the chamber for one minute, for ten minutes, what happens?

Ivan Tashev: So, there is no sound. And, the brain tries to acquire as much information as possible. And the situation when you don’t get information is called information deprival. You, first after a minute or so, start to hear a shhhhhh, which is actually the blood in the vessels of your ear. Then, after a couple of minutes, you start to hear your body sounds, your heartbeat, your breathing. And, under no other senses, eyes closed, no sound coming, literally you reach, after ten, fifteen minutes the stage of audio hallucinations. Our brains are pattern-matching machines, so sooner or later, the brain will start to recognize sounds you have heard somewhere in different places. We – people from my team – we have not reached that stage, simply because when you work there, the door is open, the tools are clanking, we have conversations, etcetera, etcetera. But maybe someday I will have to lay there and close my eyes and see, can I reach the hallucination stage?

(music plays)

Host: Well, let’s talk about the research behind Microsoft Kinect. And that’s been a huge driver of innovations in this field. Tell us how the legacy of research and hardware for Kinect led to progress in other areas of Microsoft.

Ivan Tashev: Kinect introduced us to new modalities in human-machine interfaces: voice and gesture. And it was a wildly successful product. Kinect entered the Guinness Book of Records for the fastest-selling electronic device in the history of mankind. Microsoft sold eight million devices in the first three months of the beginning of the production. Since then, most of the technologies in Kinect have been further developed. But even during the first year of Kinect, Microsoft released Kinect for Windows which allowed researchers from all over the globe to do things we hadn’t even thought of. This is the so-called Kinect Effect. We had more than fifty start-ups building their products using technologies from Microsoft Kinect. Today, most of them are further developed, enhanced, and are part of our products. I’ll give just two examples. The first is HoloLens. The device does not have a mouse or keyboard and the human-machine interface is built on three input modalities: gaze, gesture and voice. In HoloLens, we have a depth camera, quite similar to the one in Kinect, and we do gesture recognition using super-refined and improved algorithms, but they originate from the ones we had in Kinect. The second example is also HoloLens. HoloLens has four microphones, the same number as Kinect, and I would say that the audio enhancement pipeline for getting the voice of the person wearing the device is the granddaughter of the audio pipeline released in Kinect in 2010.

Host: Now let’s talk about one of the coolest projects you are working on. It’s the spatial audio or 3D audio. What’s your team doing to make the 3D audio experience a reality?

Ivan Tashev: In general, spatial audio or 3D audio is a technology that allows us to project audio sources in any desired position to be perceived by the human being wearing headphones. This technology is not something new. Actually, we have instances of it in mid-19th century, when two microphones and two rented telephone lines were used for stereo broadcasting of a theatrical play. Later, in the 20th century, there have been vinyl records marked to be listened with headphones because they were stereo recorded using a dummy head with two microphones in the ears. This technology did not fly because of two major deficiencies. The first is, you move your head left and right and the entire audio scene rotates with you. The second is that your brain may not exactly like the spatial cues coming from the microphones in the ear of the dummy head. And this is where we reach the topic of head-related transfer functions. Literally, if you have a sound source somewhere in the space, the sound from it reaches your left and right ear in a slightly different way. It can be modeled as two filters. And if you filter it through those two filters and play it through headphones, your brain will perceive the sound coming from that direction. If we know those pairs of filters for all directions around you, this is called head-related transfer functions. The problem is that they are highly individual. Head-related transfer functions are formed by the size and the dimensions of the head, the position of the ears, the fine structure of the pinna, the reflections from the shoulders. And we did a lot of research to find the way to quickly generate personalized head-related transfer functions. We put, in our anechoic chamber, more than four hundred subjects. We measured their HRTFs. We did a submillimeter precision scan of their head and torso, and we did measurement of certain anthropometric dimensions of those subjects. Today, we can just measure several dimensions of your head and generate your personalized head-related transfer function. We can do this even from a depth picture. Literally, you can tell how you hear from the way you look. And we polished this technology to the extent that in HoloLens, you have your spatial audio personalized without even knowing it. You put the device on and you hear through your own personalized spatial hearing.

Host: How does that do that automatically?

Ivan Tashev: Silently, we measure certain anthropometrics of your head. Our engineering teams, our partners, decided that there should not be anything visible for generation of those personalized spatial hearing.

Host: So, if I put this on, say the HoloLens headset, it’s going to measure me on the fly?

Ivan Tashev: Mmm hmmm.

Host: And then the 3D audio will happen for me. Both of us could have the headset on and hear a noise in one of our ears that supposedly is coming from behind us, but really isn’t. It’s virtual.

Ivan Tashev: That’s absolutely correct. With the two loudspeakers in HoloLens or in your headphones, we can make you perceive the sound coming from above, from below, from behind. And this is actually the main difference between surround sound and 3D audio for headphones. Surround sound has five or seven loudspeakers, but they are all in one plane. So, surround audio world is actually flat. While with this spatial audio engine, we can actually render audio above and below which opens pretty much a new frontier in expressiveness of the audio, what we can do.

Host: Listen, as you talk, I have a vision of a bat in my head sending out signals and getting signals and echolocations and…

Ivan Tashev: We did that.

Host: What?

Ivan Tashev: We did that!

Host: Okay, tell.

Ivan Tashev: So, one of our projects – this is one of those more blue-sky research projects – is exactly about that. What we wanted to explore is using audio as echolocation in the same way the bats see in complete darkness. And we built a spherical loudspeaker array of eight transducers which sent ultrasound pulses towards given direction, and near it, an eight-element microphone array which, through the technology called beam forming, listens towards the same direction. With this, we utilized the energy of the loudspeakers well, and reduced the amount of sounds coming from other directions and this allows us to measure the energy reflected by the object in that direction. When you do the scanning of the space, you can create an image which is exactly the same as created from a depth camera using infrared light but with a fraction of the energy. The ultimate goal, eventually, will be to get the same gesture recognition with one tenth or one hundredth of the power necessary. This is important for all portable battery-operated devices.

Host: Yeah. Speaking of that, accessibility is a huge area of interest for Microsoft right now, especially here in Microsoft Research with the AI for Accessibility initiative. And it’s really revolutionizing access to technology for people with disabilities. Tell us how the research you’re doing is finding its way into the projects and products in the arena of accessibility.

Ivan Tashev: You know, accessibility finds a resonance among Microsoft employees. The first application of our spatial audio technology was actually not HoloLens. It was a project which was a kind of a grass roots project when Microsoft employees worked with a charity organization called Guide Dogs in the United Kingdom. And from the name you can basically guess that they train guiding dogs for people with blindness. The idea was to use the spatial audio to help the visually impaired. Multiple teams in Microsoft Research, actually, have been involved to overcome a lot of problems, including my team, and this whole story ended up with releasing a product called Soundscape, which is a phone application which allows people with blindness to navigate easier where the spatial audio acts like a finger-pointer. When the system says, “And on the left is the department store,” actually that voice-prompt came from the direction where the department store is, and this is additional spatial cue which helps the orientation of the visually impaired people. Another interesting project we have been involved in is also a grass roots project. It was driven by a girl who was hearing-impaired. She initiated a project during one of the yearly hackathons. And the project was triggered by the fact that she was told by her neighbor that your CO2 alarm is beeping already a week. You have to replace the battery. So, we created a phone application which was able to recognize numerous sounds like CO2 alarm, fire alarm, door knock, phone ring, baby crying, etcetera, etcetera, and to signal the hearing-impaired person using vibration, or the display. And this is to help to navigate and to live a better life in our environment.

(music plays)

Host: You have an interesting personal story. Tell us a bit about your background. Where did you grow up, what got you interested in the work you are doing and how did you end up at Microsoft Research?

Ivan Tashev: I was born in a small country in Southeast Europe called Bulgaria. I took my diploma in electronic engineering, and PhD in computer science, from the Technical University of Sofia, and immediately after my graduation, started to work as a researcher there. In 1998, I was Assistant Professor in the Department of Electronic Engineering when Microsoft hired me, and I moved to Washington State. I spent two full shipping cycles in Microsoft engineering teams before moving to Microsoft Research in 2001. And what I have learned during those two shipping cycles actually helped me a lot to talk better with the engineers during the technology transfers I have done with Microsoft engineering teams.

Host: Yeah, and there’s quite a bit of tech transfer that’s coming out of your group. What are some examples of the things that have been “blue sky research” at the beginning that are now finding their way into millions of users’ desks and homes?

Ivan Tashev: I have been lucky enough to be a part of very strong research groups and to learn from masters like Anoop Gupta or Rico Malvar. My first project in Microsoft Research was called Distributed Meetings and we used that device to record meetings, to store them and to process them. Later, this device became a roundtable device which is part of many conference rooms worldwide. Then, I decided to generalize the microphone array support I designed for round table device and this became the microphone array support in Windows Vista. Next challenge was to make this speech enhancement pipeline to work even in more harsh conditions like the noisy car. And, I designed the algorithms and transferred them to the first speech-driven entertainment system in a mass-production car. And then, the story continues with Kinect, with HoloLens, many other products, and this is another difference between industrial research and academia. The satisfaction from your work is measurable. You know to how many homes your technology has been released, to how many people you changed the way they live, entertain or work.

Host: As we close, Ivan, perhaps you can give some parting advice to those of our listeners that might be interested in the science of sound, so to speak. What are the exciting challenges out there in audio and acoustics research, and what guidance would you offer would-be researchers in this area?

Ivan Tashev: So, audio processing is a very interesting area of research because it is a mixture between art, craft and science. It is science because we work with mathematical models and we have repetitive results. But it is an art because it’s about human perception. Humans have their own preferences and tastes, and this makes it very difficult to model with mathematical models. And it’s also a craft. There are always some small tricks and secret sauce which are not mathematical models but make the algorithms from one lab work much better than the algorithms from another lab. Into the mixture, we have to add the powerful innovation of machine learning technologies, neural networks and artificial intelligence which allow us to solve problems we thought were unsolvable and to produce algorithms which work much better than the classic ones. So, the advice is, learn signal processing and machine learning. This combination is very powerful!

Host: Ivan Tashev, thank you for joining us today.

Ivan Tashev: Thank you.

To learn more about Dr. Ivan Tashev and how Microsoft Research is working to make sound sound better, visit Microsoft.com/research.


Microsoft’s code-mixing project could help computers handle Spanglish

Communication is a large part of who we are as human beings, and today, technology has allowed us to communicate in new ways and to audiences much larger and wider than ever before. That technology has assumed single-language speech, which — quite often — does not reflect the way people naturally speak. India, like many other parts of the world, is multilingual on a societal level with most people speaking two or more languages. I speak Bengali, English, and Hindi, as do a lot of my friends and colleagues. When we talk, we move fluidly between these languages without much thought.

This mixing of words and phrases is referred to as code-mixing or code-switching, and from it, we’ve gained such combinations as Hinglish and Spanglish. More than half of the world’s population speaks two or more languages, so with as many people potentially code-switching, creating technology that can process it is important in not only creating useful translation and speech recognition tools, but also in building engaging user interfaces. Microsoft is progressing on that front in exciting ways.

In Project Mélange, we at Microsoft Research India have been building technologies for processing code-mixed speech and text. Through large-scale computational studies, we are also exploring some fascinating linguistic and behavioral questions around code-mixing, such as why and when people code-mix, that are helping us build technology people can relate to. At the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), my colleagues and I have the opportunity to share some of our recent research with our paper “Word Embeddings for Code-Mixed Language Processing.”

A data shortage in code-mixed language

Word embeddings — multidimensional vector representation where words similar in meaning or used in similar context are closer to each other — are learnt using deep learning from large language corpora and are valuable in solving a variety of natural language processing tasks using neural techniques. For processing code-mixed language — say, Hinglish — one would ideally need an embedding of words from both Hindi and English in the same space. There are standard methods for obtaining multilingual word embeddings; however, these techniques typically try to map translation equivalents from the two languages (e.g., school and vidyalay) close to each other. This helps in cross-lingual transfer of models. For instance, a sentiment analysis system trained for English can be appropriately transferred to work for Hindi using multilingual embeddings. But it’s not ideal for code-mixed language processing. While school and vidyalay are translation equivalents, in Hinglish, school is far more commonly used than vidyalay; also, these words are used in slightly different contexts. Further, there are grammatical constraints on code-mixing that disallow certain types of direct word substitutions, most notably for verbs in Hinglish. For processing code-mixed language, the word embeddings should ideally be learnt from a corpus of code-mixed text.
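A minimal sketch of the alternative argued for here, learning embeddings directly from code-mixed text, might look like the following; the toy Hinglish sentences are invented, and gensim’s Word2Vec is assumed as the training tool rather than the paper’s actual setup.

```python
# Minimal sketch: learn word embeddings directly from code-mixed text rather than
# mapping two monolingual spaces together. Toy Hinglish sentences; gensim assumed.
from gensim.models import Word2Vec

code_mixed_corpus = [
    "mera school bahut door hai".split(),
    "kal school ke baad movie dekhenge".split(),
    "exam ke liye padhna hai".split(),
]

model = Word2Vec(
    sentences=code_mixed_corpus,
    vector_size=100,   # embedding dimension
    window=5,
    min_count=1,
    sg=1,              # skip-gram
)

# Words that co-occur in code-mixed contexts end up close together,
# reflecting how "school" is actually used in Hinglish sentences.
print(model.wv.most_similar("school", topn=3))
```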

It is difficult to estimate the amount of code-mixing that happens in the world. One good proxy is the code-mixing patterns on social media. Approximately 3.5 percent of the tweets on Twitter are code-mixed. The above pie charts show the distribution of monolingual and code-mixed, or code-switched (cs), tweets in seven major European languages: Dutch (nl), English (en), French (fr), German (de), Portuguese (pt), Spanish (es), and Turkish (tr).

The chart above shows the distributions of monolingual and code-mixed tweets for 12 major cities in Europe and the Americas that were found to have very large or very small fractions of code-mixed tweets, represented in the larger pies by the missing white wedge. The smaller pies show the top two code-mixed language pairs, the size being proportionate to their usage. The Microsoft Research India team found that code-mixing is more prevalent in cities where English is not the major language used to tweet.

Even though code-mixing is extremely common in multilingual societies, it happens in casual speech and rarely in text, so we’re limited in the amount of text data available for code-mixed language. What little we do have is from informal speech conversations, such as interactions on social media, where people write almost exactly how they speak. To address this challenge, we developed a technique to generate natural-looking code-mixed data from monolingual text data. Our method is based on a linguistic model known as the equivalence constraint theory of code-mixing, which imposes several syntactic constraints on code-mixing. In building the Spanglish corpus, for example, we used Bing Microsoft Translator to first translate an English sentence into Spanish. Then we aligned the words, identifying which English word corresponded to the Spanish word, and in a process called parsing identified in the sentences the phrases and how they’re related. Then using the equivalence constraint theory, we systematically generated all possible valid Spanglish versions of the input English sentence. A small number of the generated sentences were randomly sampled based on certain criteria that indicated how close they were to natural Spanglish data, and these sentences comprise our artificial Spanglish corpus. Since there is no dearth of monolingual English and Spanish sentences, using this fully automated technique, we can generate as large a Spanglish corpus as we want.

Solving NLP tasks with an artificially generated corpus

Through experiments on parts-of-speech tagging and sentiment classification, we showed that word embeddings learnt from the artificially generated Spanglish corpus were more effective in solving these NLP tasks for code-mixed language than the standard cross-lingual embedding techniques.

The linguistic theory–based generation of code-mixed text has applications beyond word embeddings. For instance, in one of our previous studies published earlier this year, we showed that this technique helps us in learning better language models that can help us build better speech recognition systems for code-mixed speech. We are exploring its application in machine translation to improve the accuracy of mixed-language requests. And imagine a multilingual chatbot that can code-mix depending on who you are, the context of the conversation, and what topic is being discussed, and switch in a natural and appropriate way. That would be true engagement.


Data Science Summer School students take a fresh look at the world’s largest rapid transit system

DS3 2018 at Microsoft Research New York City

This month marked the 5th anniversary of the Microsoft Research Data Science Summer School (DS3). DS3 is an intensive, eight-week hands-on introduction to data science for college students in the New York City area committed to increasing diversity in computer science. The program is taught by leading scientists at Microsoft Research and is held at the Microsoft Research New York City lab.

Each year the program receives upwards of 200 applications, out of which only eight students, demonstrating academic excellence and a passion for using technology to help society, are selected to participate. These students complete four weeks of intensive course work and spend the remaining four weeks of their summer working on an original research problem. Graduates of the program have gone on to a number of exciting careers, ranging from data scientist positions at companies like Microsoft, Bloomberg, and American Express to PhD programs at universities such as Cornell and NYU.

Past projects have looked at how students progress through the New York City public school system, investigated demographic disparities in the city’s policing activities, and formulated improvements for the city’s taxi fleet and bike sharing service.

This year’s students used their newly acquired data science skills to examine another way of getting around New York City—the city’s subway system—and presented some impressive findings at the DS3 banquet to an overflowing room of select members of New York City’s tech community. They examined rider wait times and trip times, compared the subway to above ground travel, and investigated how changes to the system affect rider options.

Below is a summary of their presentation, which you can watch in full. The project is also available on GitHub.

Akbar Mirza, a senior from City College, opened the talk by discussing the history of NYC’s subway system, which is the largest rapid transit system in the world, serving approximately 5.5 million riders each day. He highlighted the growing concern that the system has become unreliable due to aging equipment, some of which dates back to the early 20th century. And while current system-wide metrics provide some insight into the state of the subway system, they fail to capture how riders actually experience the subway.

Akbar Mirza

This motivated the students to investigate the subway system using the data behind the system’s new countdown clocks that record train locations. Specifically, they used a dataset collected and cleaned by local data scientist Todd Schneider that contained the approximate location of every train in the system for every minute of each day from January through May of 2018.

Next, Brian Hernandez, a senior from Hunter College, walked the audience through how this data could be used to understand how long riders spend waiting for trains. He used these calculations to compare his commuting options on the F and 7 trains, showing that while the typical wait time is the same on both lines, the F train has much higher variability than the 7 train, making the 7 the preferred option.
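A small sketch of the underlying calculation: given observed train arrival times at a platform, simulate riders showing up at random and look at the whole distribution of their waits, not just the average. The arrival times below are toy numbers, not the students’ dataset.

```python
# Sketch: compare typical wait time and its variability for two subway lines,
# given observed train arrival times at a platform. Toy numbers only.
import numpy as np

def wait_time_distribution(arrival_minutes, n_riders=10000, seed=0):
    """Simulate riders arriving uniformly at random and waiting for the next train."""
    rng = np.random.default_rng(seed)
    arrivals = np.sort(np.asarray(arrival_minutes, dtype=float))
    rider_times = rng.uniform(arrivals[0], arrivals[-1], size=n_riders)
    next_train = arrivals[np.searchsorted(arrivals, rider_times)]
    return next_train - rider_times

# Same average headway (~5 minutes), very different regularity.
line_7 = np.arange(0, 60, 5)                      # trains every 5 minutes
line_f = [0, 2, 12, 14, 25, 27, 40, 42, 55, 57]   # bunched, irregular service

for name, times in [("7", line_7), ("F", line_f)]:
    waits = wait_time_distribution(times)
    print(f"{name}: median wait {np.median(waits):.1f} min, "
          f"90th percentile {np.percentile(waits, 90):.1f} min")
```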

Brian Hernandez

Amanda Rodriguez, a senior at Lehman College, continued the presentation with a more granular look at subway wait times throughout the city. She presented a comprehensive wait time model that considers station- and line-specific factors as well as day of week, time of day, and weather effects. Her analysis revealed interesting patterns in wait time variability throughout the city and showed that heavy rain can result in as much as a 25% increase in typical wait times at certain locations.
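One simple way to structure such a model is a regression on station, line, day-of-week, time-of-day, and weather features, as in the sketch below; the feature names and toy rows are illustrative, not the students’ actual model or data.

```python
# Sketch of a wait-time model with station, line, time, and weather effects.
# Feature names and the toy rows are illustrative; the real model was fit to
# months of countdown-clock data. Assumes pandas and scikit-learn.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame({
    "station":     ["Bedford Av", "Bedford Av", "Jay St", "Jay St"],
    "line":        ["L", "L", "F", "F"],
    "day_of_week": ["Mon", "Sat", "Mon", "Sat"],
    "hour":        [8, 14, 8, 14],
    "heavy_rain":  [0, 1, 0, 1],
    "wait_min":    [3.1, 5.2, 4.0, 6.3],
})

features = ["station", "line", "day_of_week", "hour", "heavy_rain"]
model = make_pipeline(
    ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["station", "line", "day_of_week"])],
        remainder="passthrough",   # keep hour and heavy_rain as numeric features
    ),
    LinearRegression(),
)
model.fit(data[features], data["wait_min"])

# Predicted wait for one station/line/time combination under heavy rain.
query = pd.DataFrame([{"station": "Jay St", "line": "F", "day_of_week": "Mon",
                       "hour": 8, "heavy_rain": 1}])
print(model.predict(query))
```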

Amanda Rodriguez

Taxi Baerde, a senior from Adelphi University, introduced the next topic—constructing a formal representation of the subway network as a graph that could be used for finding shortest paths between any two stops and computing trip times. Taxi discussed how it’s surprisingly difficult to settle on such a representation because the network itself is so dynamic, with changing schedules, partial routes, and skipped stops. He also presented a method, called k-shortest paths, for identifying different possible itineraries between a pair of stations (for instance, taking the local versus express, or transferring between multiple lines).
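The sketch below shows the flavor of this representation: a small weighted directed graph of stops (including a transfer edge) and a k-shortest-paths query that enumerates alternative itineraries. The station names, travel times, and the use of networkx are all assumptions for illustration.

```python
# Sketch: represent part of the subway as a weighted directed graph and list
# several alternative itineraries between two stops. Stations and travel times
# are invented; networkx's shortest_simple_paths plays the role of k-shortest paths.
from itertools import islice
import networkx as nx

G = nx.DiGraph()
edges = [
    ("74 St (7 local)", "69 St (7 local)", 2),
    ("69 St (7 local)", "Queensboro Plaza", 14),
    ("74 St (7 express)", "Queensboro Plaza", 9),
    ("74 St (7 local)", "74 St (7 express)", 3),   # cross-platform transfer
    ("Queensboro Plaza", "Grand Central", 10),
]
G.add_weighted_edges_from(edges)

def k_shortest_paths(graph, source, target, k):
    # Paths are yielded in order of increasing total travel time.
    return list(islice(nx.shortest_simple_paths(graph, source, target, weight="weight"), k))

for path in k_shortest_paths(G, "74 St (7 local)", "Grand Central", k=3):
    total = sum(G[u][v]["weight"] for u, v in zip(path, path[1:]))
    print(f"{total:>2} min: " + " -> ".join(path))
```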

Taxi Baerde

Next, Phoebe Nguyen, a junior at Baruch College, showed how Taxi’s cleaned subway graph could be used to compare different commuting options between a pair of stations in a two-step process—first, finding a set of candidate paths between the stations; and second, reconstructing how long it actually took trains to make these trips. She used this method to compare different options for various trips, showing once again that variability is often the key for deciding between two different options.
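The second step can be sketched as follows: from minute-by-minute train position snapshots, find the first train that leaves the origin after the rider reaches the platform and follow it to the destination. The data layout and the tiny example rows are invented for illustration.

```python
# Sketch: reconstruct a historical trip time from minute-by-minute train positions.
# `observations` maps (train_id, minute) -> station; the rows below are invented.
def trip_time(observations, origin, destination, rider_arrival_minute):
    """Minutes from the rider reaching the platform to arriving at the destination."""
    best = None
    trains = {train for train, _ in observations}
    for train in trains:
        minutes_at = {m: s for (t, m), s in observations.items() if t == train}
        depart = min((m for m, s in minutes_at.items()
                      if s == origin and m >= rider_arrival_minute), default=None)
        if depart is None:
            continue  # this train never serves the origin after the rider arrives
        arrive = min((m for m, s in minutes_at.items()
                      if s == destination and m > depart), default=None)
        if arrive is not None:
            total = arrive - rider_arrival_minute
            best = total if best is None else min(best, total)
    return best

observations = {
    ("F01", 0): "Jay St", ("F01", 4): "York St", ("F01", 9): "East Broadway",
    ("F02", 6): "Jay St", ("F02", 10): "York St", ("F02", 15): "East Broadway",
}
print(trip_time(observations, "Jay St", "East Broadway", rider_arrival_minute=2))  # 13
```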

Phoebe Nguyen

Peter Farquharson, a junior from Lehman College, extended Phoebe’s results to answer a question on many busy New Yorkers’ minds: when is the subway a better option than a car? He demonstrated how open data from the city’s Taxi and Limousine Commission could be used to estimate how long past car trips between two subway stations would have taken, and compared this with corresponding subway trips. His results highlighted that, once variability is factored in, the subway can be an attractive alternative to driving when trying to get to midtown Manhattan during rush hour or traveling to JFK airport.

Peter Farquharson

Ayliana Teitelbaum, a sophomore from Yeshiva University, looked at trip times from a different angle to tackle a question that New Yorkers face in choosing where to live—how long should you expect your commute to take coming from different parts of the city? She extended Phoebe’s results by showing historical trip times between each of the nearly 500 stations in the system to a fixed workplace destination, and presented the results as a heatmap. By comparing typical and worst case commute times for each station, she showed that accounting for variability can increase commute times in the outer boroughs by up to 50%.

Ayliana Teitelbaum

Sasha Paulovich, a senior at Fordham University, presented the final set of results, considering how changes to the subway system affect riders and how subway experiences differ across demographic groups. She presented a heatmap similar to Ayliana’s that showed how we can expect commute times to change after the L train shuts down in January 2019, and an analogous map that projected commute times to LaGuardia airport if the proposed AirTrain extension to Willets Point is built. Finally, she discussed station options and commute times for riders requiring accessible stations and showed a correlation between median household income and commute times.

Sasha Paulovich

The team and their Microsoft Research mentors closed out the evening by fielding a host of questions from the audience, discussing the additional topics they had considered tackling and the various extensions and future work to be done.

The team’s work has been accepted at the 2018 MIT Conference on Digital Experimentation (CODE) taking place in Cambridge, Massachusetts on October 26.


How Microsoft got into edge computing and real-time video analytics

I vividly remember October 29, 2008. I had invited colleagues from academia and industry to Building 99, home of Microsoft Research, for a daylong meeting to discuss the future of mobile and cloud computing. My friends flew to Redmond, Washington, from different parts of the world, and together in one of the conference rooms, we brainstormed ideas, using the whiteboard to design new cloud architectures, write down problems, and explore challenges. Eventually, we came up with a new computing paradigm that is now popularly known as edge computing. We called our edge nodes cloudlets.

Fast-forward 10 years, and we find ourselves in a world where edge computing is a major technology trend that is being embraced by cloud providers and most major telecommunications companies. Looking back, I am proud that we got many things right. For example, we were spot-on with the fundamentals. We devised an architecture that reduces latency to a compute infrastructure, decreases the need for large amounts of expensive network bandwidth to the cloud, and enables mission-critical operations to continue even when the network to the cloud is down. All this was right on the mark.

Joining me at that meeting were Ramón Cáceres (AT&T Labs), Nigel Davies (Lancaster University, U.K.), Mahadev Satyanarayanan (Carnegie Mellon University), and Roy Want (Intel Research). The five of us had been working in mobile computing, so naturally, we focused on devices such as smartphones, augmented reality/virtual reality headsets, and wearable computers. We did not discuss sensor networking or cyber-physical systems, which have recently emerged as the Internet of Things (IoT).

The case for edge computing

I had the opportunity to make the case for edge computing to the senior leadership team of Microsoft — including our CEO at the time, Steve Ballmer — twice. The first time was in December 2010. At the end of the presentation, Steve asked me which current application I would move to edge computing.

I had been thinking about future applications such as AR/VR and hadn’t deeply thought about existing applications, so I awkwardly answered, “Speaker and command recognition.” An executive vice president whose team was working on this challenge was in attendance, and he disagreed. Although I had built and demonstrated a small prototype of such a system (think Skype Translator) at the 2009 Microsoft Research Faculty Summit, I hadn’t thought about how we would instantiate such an application at scale. Needless to say, my answer could have been better.

My research team and I continued working on edge computing, and in January 2014, I presented to the senior leadership team again. This time, I told them about micro datacenters, a small set of servers placed on premises to do what the cloud did; essentially, today’s equivalent of Microsoft Azure Stack. I demonstrated several scenarios in which the virtues of micro datacenters were irrefutable: real-time vision analytics with associated action, energy saving in mobile devices, and single-shooter interactive cloud gaming. This time, it worked. In a booming voice, Steve — who was still our CEO — said, “Let’s do this.”

The green light was followed by a series of meetings with Microsoft distinguished engineers and technical fellows to discuss the rollout of edge computing, and through these meetings, it became increasingly clear that one question remained unanswered: What compelling real-world applications could not thrive without edge computing? Remember, Microsoft was rapidly building mega-datacenters around the world, on a path to 30-millisecond latency for most people on the planet with wired networking, and IoT had not yet emerged as a top-level scenario. So, which high-demand applications could edge computing take to the next level that cloud computing couldn’t?

The need for a killer app

We had to come up with a killer app. Around the same time as these meetings, I took a sabbatical, with stops in London and Paris. While there, I noticed the proliferation of cameras on city streets. Instinctively, I knew that people were not looking at every livestream from these cameras; there were simply too many. According to some reports, there were tens of millions of cameras in major cities. So how were they being used? I imagined every time there was an incident, authorities would have to go to the stored video stream to find the recording that had captured the event and then analyze it. Instead, why not have computers analyze these streams in real-time and generate a workflow whenever an anomaly was detected? Computers are good at such things.

For this to work, we would need cloud-like compute resources, and they would have to be close to the cameras because the system would have to analyze large quantities of data quickly. Furthermore, the cost of streaming every video stream to the cloud could be prohibitive, plus add to it the expense of renting GPUs in the cloud to process each of these streams. This was the perfect scenario — the killer app for edge computing — and it would solve a compelling real-world, large-scale problem.

In the years that followed, we worked diligently on edge-based real-time video analytics, publishing several papers in top conferences. We even deployed a system in Bellevue, Washington, for traffic analysis, accident prevention, and congestion control as part of the city’s Vision Zero program. This brings me to our paper being presented at the third Association for Computing Machinery/IEEE Symposium on Edge Computing (SEC) October 25–27 in Bellevue. The work represents another step in our journey to nail the live video analytics challenge using edge computing.

Best tradeoff between multiple resources and accuracy

In our paper “VideoEdge: Processing Camera Streams using Hierarchical Clusters,” we describe how a query made to our system is automatically partitioned so some portions of it run on edge computing clusters (think micro datacenter) and some in the cloud. In deciding what to execute where, we recognize and plan for multiple different queries that may be issued to our system concurrently. As they execute on the same infrastructure, we try not to repeat any processing. The objective is to run the maximum number of queries on the available compute resources while guaranteeing expected accuracy. This is a challenging task because we have to consider both the network and compute demands, the constraints in the hierarchical cluster, and the various tunable parameters. This creates an exponentially large search space for plans, placements, and merging.

In VideoEdge, we identify the best tradeoff between multiple resources and accuracy, narrowing the search space by identifying a small band of promising configurations. We also balance the resource benefits and accuracy penalty of merging queries. The results are good. We are able to improve accuracy by as much as 25 times compared to state-of-the-art techniques such as fair allocation. VideoEdge builds on a substantial body of research results we have generated since early 2014 on real-time video analytics.
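
The paper gives the actual formulation; purely to illustrate what narrowing to a promising band means, here is a hypothetical sketch that filters a list of invented configurations (resolution, frame rate, model size) down to the Pareto-efficient ones, that is, those not beaten on both resource demand and accuracy by some other configuration. A planner would then search only within this band when placing and merging queries:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str        # e.g., resolution / sampling rate / model size
    demand: float    # normalized compute plus network demand
    accuracy: float  # expected query accuracy in [0, 1]

def pareto_band(configs):
    """Keep configs that no other config beats on both demand and accuracy."""
    band = []
    for c in configs:
        dominated = any(
            o.demand <= c.demand and o.accuracy >= c.accuracy
            and (o.demand < c.demand or o.accuracy > c.accuracy)
            for o in configs)
        if not dominated:
            band.append(c)
    return sorted(band, key=lambda c: c.demand)

configs = [
    Config("180p / 1 fps / tiny model",    0.1, 0.55),
    Config("360p / 5 fps / small model",   0.3, 0.74),
    Config("360p / 5 fps / large model",   0.6, 0.73),  # dominated
    Config("1080p / 30 fps / large model", 1.0, 0.90),
]

for c in pareto_band(configs):
    print(f"{c.name:28s} demand={c.demand:.1f} accuracy={c.accuracy:.2f}")
```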

IoT embraces edge computing

A few years after we began researching video analytics, IoT emerged, as thought leaders in industries such as manufacturing, health care, automotive, and retail started focusing on using information technology to increase efficiencies in their systems. They understood that automation combined with artificial intelligence, made possible by IoT, could lower operating costs and increase productivity. The key ingredient was sensing, processing, and actuation in real time.

For this to work, the time between sensing and processing and between processing and actuation had to be negligible. While processing could be done in the cloud, the latency to it was relatively high, the network to it was expensive, and IoT systems had to survive disconnections from it. Enter edge computing — it was the perfect solution for such scenarios. Recognizing this, Microsoft has committed more resources to the combined technology, announcing in April a sizable investment in IoT and edge computing.

While we began 10 years ago, I believe the most interesting portion of our journey is just starting. Simply search for the term “edge computing,” and you will see how much has been written about this topic both in industry and academia. And SEC 2018, for which I have the honor of serving as program co-chair, is further proof of the excitement surrounding this emerging computing paradigm. The papers feature many different topics, ranging from data security and integrity to machine learning at the edge, specialized hardware for edge computing, 5G edge, programming models, and deployment on drones, automobiles, the retail space, and factory floors. As we continue to build new products and learn, we uncover new challenges that engineers and researchers love to solve, and as our platform matures, we will see the creation of a new generation of applications.

In my experience, I have found it takes on average seven years for a new technology to go from research lab to real world. In 2013, I made a prediction that edge computing would be everywhere by 2020. I continue to believe this is going to happen. My colleagues and I believe that together we are entering the best part of this journey.

In a keynote address at the 2013 IEEE International Conference on Cloud Networking (IEEE CloudNet), Victor Bahl presented a slide predicting that edge computing would be everywhere by 2020, a statement he stands by today.


Designing the future with the help of the past with Bill Buxton

Principal Researcher Bill Buxton

Episode 46, October 17, 2018

The ancient Chinese philosopher Confucius famously exhorted his pupils to study the past if they would divine the future. In 2018, we get the same advice from a decidedly more modern, but equally philosophical Bill Buxton, Principal Researcher in the HCI group at Microsoft Research. In addition to his pioneering work in computer science and design, Bill Buxton has spent the past several decades amassing a collection of more than a thousand artifacts that chronicle the history of human computer interaction for the very purpose of informing the future of human computer interaction.

Today, in a wide-ranging interview, Bill Buxton explains why Marcel Proust and TS Eliot can be instructive for computer scientists, why the long nose of innovation is essential to success in technology design, why problem-setting is more important than problem-solving, and why we must remember, as we design our technologies, that every technological decision we make is an ethical decision as well.

Transcript

Bill Buxton: If you are going to come and make an argument that something is going to have huge impact in the next five years, if you haven’t got fifteen years of history of that idea and can trace its evolution and history and so on, then you are probably wrong or you haven’t done your homework or you might get your head cut off when you come to this presentation unprepared. Even if you are right, and you don’t have that fifteen years, then that’s gambling, that’s not investment, that’s not research. You are just lucky. Design is a repeatable profession.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: The ancient Chinese philosopher Confucius famously exhorted his pupils to study the past if they would divine the future. In 2018, we get the same advice from a decidedly more modern, but equally philosophical Bill Buxton, Principal Researcher in the HCI group at Microsoft Research. In addition to his pioneering work in computer science and design, Bill Buxton has spent the past several decades amassing a collection of more than a thousand artifacts that chronicle the history of human computer interaction for the very purpose of informing the future of human computer interaction.

Today, in a wide-ranging interview, Bill Buxton explains why Marcel Proust and TS Eliot can be instructive for computer scientists, why the long nose of innovation is essential to success in technology design, why problem-setting is more important than problem-solving, and why we must remember, as we design our technologies, that every technological decision we make is an ethical decision as well. That and much more on this episode of the Microsoft Research Podcast.

Host: Bill Buxton, welcome to the podcast.

Bill Buxton: Glad to be here.

Host: So, I’d like to start by asking my guests what gets you up in the morning, but you’ve already answered that in print, and I quote, “What gets me up in the morning is to realize what I dream about.” So, now you have to tell us what you dream about.

Bill Buxton: It depends which morning it is, I think. I think there’s an embarrassment of riches of things to want to do, and I think that that’s one of the best things because you’re never at a loss to be motivated. But then the other problem is, you have to make choices as to which one you pursue. You can do anything and everything in your life, you just can’t do them all at once. You always want to be falling in love with something that just captured your imagination, but in so doing, you have to retire a previous passion or at least move it to the background because you can’t go full-throttle into more than one or two things. One description of what I do for a living is Experience Design. And I’m prone to say Jimi Hendrix had the greatest wisdom of this, and that’s the most profound question, “Are you experienced?” And if you don’t have a breadth, as well as depth, of experience to draw on, how can you be good at Experience Design? Because it’s building up this repertoire and curating this repertoire of experiences in your life across the board that is the treasure trove that you can mine in whatever you’re trying to do.

Host: Your bio says you are “a relentless advocate for innovation, design and the appropriate consideration of human values, capacity and culture in the conception, implementation and use of new products and technologies.” Which is a mouthful. But let’s unpack that a little bit. I’m really intrigued by your statement of the “appropriate consideration.” Tell us what you mean by that in the context of designing new technologies and products.

Bill Buxton: Well, one of my heroes is a historian of technology named Melvin Kranzberg, and he has some laws. But his first law is, “Technology is not good, it’s not bad, but nor is it neutral.” It will be some combination of the two. As soon as you say words like good and bad, that implies you have a moral compass. And the real question is, is that when you are making technological decisions and launching technologies into society, you are, in fact, making an ethical choice, whether you know it or not. And so, maybe you’ll do a better job of it and weight more heavily on the positive if you actually know what that moral compass is and that you are, in fact, making an ethical decision. I’m not trying to put too heavy a weight on this in that you are playing God, but you are in fact having impact. But you are also human, so how can you just do the best? You will get some stuff wrong. So, take responsibility to clean up the mess without throwing the baby out with the bath water. And so, it basically says that “appropriateness” is appropriate to the moral order of place or where it’s going to be placed. That’s the closest way I can put it.

Host: Let’s talk about your job description at Microsoft Research. When you started at MSR, Rick Rashid hired you to, as you say, “Make design a core part of the Microsoft culture.” So, how did you go about doing what he said? That’s about the vaguest job description I can think of, and yet it… it’s perfect.

Bill Buxton: Well, actually what he really said was, “Make a difference and if you are not the best person to figure out how you should do that, you probably shouldn’t have the job.” Then my response was okay, I’m going to try to help contribute to bringing a greater awareness of design to the company and that meant, actually, not trying to design products, but trying to design a culture and change the attitudes and not elevate design to the point where everything is design-led, but where it’s an equal partner at the table. In the early days, when I would speak to different teams in the company, in large or small groups, it would be kind of like, don’t expect this to come from above, or from management or anything like that, because we are our own culture. We make it. And it’s every individual. And if you can actually start to just feel empowered to, within your own, even if it’s one other person, you can start to make adjustments along the way you want that can go viral because if we’re shifting in a good direction, it will be noticed, and then people will say, well, what’s the secret sauce that you’re using? And nobody can own this. It can’t be about any individual. It’s got to be about empowering individuals to form groups and clusters because that’s what culture is. It’s a mutually agreed upon set of values.

Host: So today, like you, I’m going to use some literary quotes to prompt our discussion on technology research. So, let’s start with Marcel Proust. He once said, “The real voyage of discovery is not in seeking new landscapes, but in having new eyes.” One of the major themes in research is looking for the next big discovery, right?

Bill Buxton: Yes.

Host: How does having new eyes, or different optics, as you’ve said it, inform the quest for innovation or how should it inform the quest for innovation?

Bill Buxton: So, the net result is that, in some sense, I would describe my job description as being an optician and to find the right lenses. I’ll give an example. As you say, the industry is heavily driven by people trying to find the next big thing, whether it’s a new gadget or a new application, killer app or a new service. And if you’re just graduating from university or design school or whatever, that of course, you want to become a millionaire by the time you’re 24 or you’re a failure. And so, there’s all these pressures. And so, my automatic reaction, I just wrote a two-pager that said the next big thing isn’t a thing. And, I said, it’s actually a change in relationship amongst the things that are already there and the things that are going to emerge. And when I say relationship amongst those things, it’s about the social relationships, things like kinship, introduction, negotiation, approach, departure, all of these things, the moral order. These are all terms that we know about the society of people. But, we aren’t used to speaking about in terms of the society of technology. What could you do that would have more impact than if things just worked? If things just worked together seamlessly? And if, in working together, every new thing I added came at a great value in and of itself, but it also added value to everything else I already had, and they to it, and furthermore every new thing I added reduced the complexity, not only of that new thing, but reduced the complexity of everything else in the ecosystem and they it. We realize that hardly anything works well together much less seamlessly. And what we’ve forgotten, when we come back to the human side, is that the better we get at really making desirable, beautiful, affordable, useful, valuable devices, the worse we’re making things. The cumulative complexity of a bunch of desirable, simple, affordable, valuable things is way above the human’s capacity to deal with. And that’s why you must reduce complexity with everything you add. And that takes a very different approach because it forces you into thinking about an ecosystem. Albert Shum who is part of the “Canadian Mafia” trying to change design at Microsoft here is a good friend and a fellow cyclist. And he has a nice way of saying it, that in the industry, we spend a whole bunch of time learning how to design houses. The real challenge is building the communities and the city planning and the urban planning and the flow of things. And I think even the changes we’ve been making over the last year or two have been significant steps on this path. But the challenge in innovation is, how do you go beyond that and say what are the right metrics for our aspirations and where we can be and how soon we should get there? Because only when you find that, can you set appropriate goals that most meet your objectives.

(music plays)

Host: You have become the collector and curator of more than a thousand computer hardware artifacts that chronicle the history of various aspects of human computer interaction. So, tell us about your collection, or collections. How did you get started doing this, what kinds of things have you collected and how hard was it for you to get your hands on some of these things? I’ve seen the collection. It’s crazy!

Bill Buxton: Well, first of all, my name is Bill and I’m not a hoarder. I’m a collector. The one-word answer, it was an accident. Maybe a more informative answer is to say it is a reflection of my process of what I do for a living. I’m always looking for reference material, always scanning, collecting things around, surrounding yourself with them for inspiration for ideas, and to trigger thoughts, and having them sitting there around you, and all of a sudden, some new relationship pops out. When I’m at a loss for a solution to a problem, I go and surround myself with these objects. But over about forty, forty-five years, I’ve never thrown any of them out. I’ve kept them all. And so, when anything came out, whether it was a brochure or an article in a magazine, or something like that, I kept it and documented it for future reference, for teaching, for teaching myself and to go back to say, hey, I think I’ve seen this before. And you can think of them all as prototypes, and really expensive-to-make prototypes, which I could get for practically nothing, sometimes like on eBay, where it’s like a really expensive education where somebody else paid the tuition. And they’re sitting there, if you want to get the benefit of that education, you can. And therefore, when I do start to make something or when anybody in the company does, they can start at a much higher level because they’ve got these reference objects.

Host: Interesting.

Bill Buxton: And so, the base point-of-departure for any problem I’m looking at is, somebody has already solved this problem and there’s something out there that’s already done this. So, I’m going prospecting before I go building.

Host: Tell us about the collection. What’s in it?

Bill Buxton: Well, the collection is sort of a cross-section of all of the input devices through which people have interacted with digital technology pretty much from the beginning. And so that would include mice and joysticks and trackballs and trackpads. It is PDAs. It’s game controllers, it’s foot pedals, head displays. It’s uh, smart watches going back to 1978. It’s the world’s first smart phone. It’s the history of portable music players. It’s the history of AR and VR technologies going back to a reproduction I made of the very first Stereo Viewer from 1838. And it’s also examples to use to serve as the basis for story-telling that illustrate some of the things that are really important about design. I don’t think many people in VR know that it was virtual reality, in an early form, that led to Yellowstone being made the first national park in the world, not just the United States. Or that the very first stereoscope from 1838 was already looking into a virtual space because photography wasn’t invented till the following year. There were no photographs to make stereo images from and they had to be hand-drawn and so when you looked into Wheatstone’s original reflecting stereoscope, you’re looking into hand-drawn lines into a world that never existed.

Host: Wow.

Bill Buxton: I think those things are really interesting because you start to see patterns, if you go through it. But from those patterns, you say, okay, they probably haven’t stopped, and so you can extrapolate. So, it’s really hard to extrapolate from a point. If I have a line, it’s much easier. And so, I have this game, I’ll do it with adults as well as children. I’ll draw all these different lines and say, “Continue these lines.” And then I’ll put a point. And they have no idea what to do with the point, but all those other things, they can continue because they can see the pattern as things were going. And it doesn’t mean the extrapolation is correct, but it gives you your initial bearing for your exploration and usually because there’s other things involved, there’s probably a couple of lines that come and you’ll start, maybe you’ll see there’s intersections from extrapolations. And you have these ways to visualize. And this gives you a different way to think, accompanied by concrete examples that you can experience to get to the assets at the finest granularity.

Host: So, you referred to something you called the long nose of innovation. I think researchers are familiar with the phrase the long tail. But the long nose is an interesting one. And it’s in the context of new technologies and how long it takes them to catch on. And you also said at some point that our slow rate of progress is not necessarily due to a lack of technology but a lack of imagination. How and why do we lack imagination and what can we do? What can researchers do about that?

Bill Buxton: The long nose basically comes back as sort of saying if we look historically at the evolution of technologies, it takes at least twenty years from the first clear articulation of something to the point that it’s mature, where let’s measure maturity as it’s a billion-dollar industry. If you are going to come and make an argument that something is going to have huge impact in the next five years, if you haven’t got fifteen years of history of that idea and can trace its evolution and history and so on, then you are probably wrong or you haven’t done your homework or you might get your head cut off when you come to this presentation unprepared. Even if you are right, and you don’t have that fifteen years, then that’s gambling, that’s not investment, that’s not research. You’re just lucky. Design is a repeatable profession. It’s not, I get lucky once in a while. And so, if you want to study design and innovation, study the repeat offenders, the ones that can do it over and over. You don’t have to wait for the muse to come and drive you. And that’s what you learn. And you can only do that if you have process. And the long nose is a key part of that process. Now, for those who doubt, the mouse: everybody who saw one in 1968 knew it was the right thing, but it wasn’t until Windows 95 that everybody had a mouse at their desk. Now, why did it take so long? I first used a mouse in 1971. Now the thing is, you need a perfect wave of things. You had to perfect Windows icons. You had to train the developers how to write this type of graphic user interface. That was a whole new thing from DOS or UNIX. And you needed the processors. You needed graphics processors. You needed the displays to switch to bitmap displays rather than calligraphic displays which dominated back in the time, basically glorified oscilloscopes. Every technology goes the same route. And so, the long nose is basically this reminder of how long it takes. So, it also says the following things and reinforces what I was saying about the combinations about innovation being the aggregation of existing ideas: that everybody thinks that things are moving really quickly and that is not true. We mistake a whole bunch of things moving really slowly, with things moving quickly. It’s the difference between amperage and voltage. Any single technology is evolving, statistically speaking, really slowly. But, when you have a number of different things moving slowly, at slightly different paces, but simultaneously and at different stages on the nose, if you start to realize that’s what’s going on in the overall technological ecosystem, you can see those patterns and then project forward because you can extrapolate from history, and say, here’s where you hit the inflection point and that’s when things are going to happen. Everything has a perfect storm, and there’s methods by using this technique to actually predict when that perfect storm is going to happen. I’ll give you a really quick example. I spent my early career, after I switched from being a musician, to building digital music synthesizers for live performance. So, I saw the evolution of how digital audio emerged. I went to Silicon Graphics and became Chief Scientist there doing animation systems. But the only act of genius I had, because I wasn’t in computer graphics, I was literate, but I wasn’t, you know, a specialist in computer graphics.
But, I knew that computer graphics was going to follow exactly the same pattern as computer music, but it was multiple orders of magnitude more complex, so it was just shifted further along the timeline. And so, all the planning over the eight and a half years I was there, we kept hitting that right. And the reason we could know exactly what to do and when was because I just was repeating what I had already done in music. And so, all I needed to do was to see that relationship. And I think overall, that type of pattern happens throughout, but you have to know those other areas where you go prospecting. So, the long nose, the notion of history, collecting, sampling and not just going immediately to building. We spend far too much time and go far too quickly into problem-solving and don’t spend enough time problem-setting. And that’s the ultimate skill.

Host: Can you define problem-setting a little more clearly?

Bill Buxton: Problem-setting is basically, it’s not enough to get the design right, you’ve got to design the right thing. And so, if you just leap in and start building something where you’ve got a solution, you have no idea if that’s the best option. There might have been a better way and you didn’t take time because you are already behind schedule. But here’s the crazy thing. At the beginning of the product cycle, you have a small team just getting going. Your burn rate, in terms of what it’s costing you per week in terms of the project and that, is very, very low. So, what you then should be doing is thoroughly exploring a range of different alternatives. Problem-setting, part of that process is this notion of, you cannot give me one idea. You have to learn how to work quickly and give me multiples. That’s a technique for this whole issue of, how do you deal with the problem-setting? And by exploring the space first… oh, that’s the real problem… Put it this way. You have a bunch of people that talk about user-centered design. And they’ll say, you know, go talk to your users and they will tell you what to do. Okay. Would you go to a doctor where you walked in, and the doctor said, okay what’s wrong with you, what operation do you need and what drugs should I give you under what dose, right? And that’s how some people naively interpret user-centered design, is “listen to users.” And, no. I’m going to ask you all kinds of questions. But I’m going to take all of those as part of the information that helps me make a diagnosis. And so, where do we collect the symptoms to find out where the real problems are? You’re telling me this. I understand the situation. Now, I have to know enough about your industry to ask pertinent questions. And for me, that’s what the problem-setting is. The designer’s main equipment is to have that meta-knowledge. And that’s where the diverse interests come in, so how do you get that knowledge? But if you don’t even know that’s the kind of knowledge you need to get, you’re not even going to go looking for it.

Host: So, you look at the product development cycles and, even in research, what you’re talking about is something that people would have to say, “Okay, we need to rethink how we work and what we make time for.”

Bill Buxton: So, I’d throw the argument the other way: you can’t afford not to do it. So, your cost-per-month on a project, if you put an extra month up front, it costs you almost nothing. And if it comes up with a much better solution that’s a fraction of the price and can get it done more quickly and have a much better margin, first of all, you’ve made up for the lost time by having spent that up front. But let’s pretend it still takes the same amount of time. We never have time to do problem-setting and so on sufficiently. We’re getting better at it. But we seem to be able to have time to be three months late where we are fully-loaded with the highest burn rate possible, right? I mean, if you’re going to take an extra month, do you want to play it where it costs you the most or do you want to do it up front and you get a better product? The other part is, it’s not all in one. You don’t make all your decisions up front and then go build. The decisions that you make the earliest are the ones that are hardest to change later. So, that’s your basic architecture. In the software industry, we don’t have architects. What we call an architect, in architectural terms, is actually a structural engineer. And we have no architect that has design perspective at the very beginning. But also, there’s this notion that once you’ve got a representation, like a rendering of what the screens are or some of these other things, that that’s the end of the design. There’s only two places where there’s room for creativity in design. So, the first place for creativity is the heuristics process whereby you enumerate the repertoire of things from which you are going to choose, and then the second is the heuristic you use to eliminate all but one. And it’s that inhale/exhale. You start with nothing, you end with one. But you have to go through that whole thing. You would love, afterwards, you know I say, I could have got here right from the beginning. And you could, but you never would have. And that’s the biggest mistake. The fastest way to a mediocre product is to make a plan and stick to it.

(music plays)

Host: Let’s talk about AI for a minute. Because tech companies are putting a premium on AI talent, uh…

Bill Buxton: Oh, is it important now?

Host: Apparently, people are using the terms gold rush, talent war…

Bill Buxton: Feeding frenzy…

Host: …feeding frenzy. And you’ve suggested that there’s a risk that anyone who’s not doing AI, might be marginalized.

Bill Buxton: So, I have to preface that by saying, I think what we can do today in AI is absolutely unbelievable. It’s beyond my wildest expectations in what we’d be able to do at this point. It’s unbelievably valuable, but it’s nevertheless essential but not sufficient. And as I said, you need a perfect storm of a whole bunch of things to get a sustainable system, or an ecosystem in place. And my fear is that if you focus too much on the AI component, that you distort the other requisite skillsets and disciplines that are needed to ensure that AI is successful. Every discipline represented in our company is essential to our success but not sufficient. And the trick is to find the balance. And one of the important elements here is to make a distinction between literacy and expertise. It is essential that everybody in the company has a level of literacy about AI. But it’s equally important to have literacy about every one of those disciplines. And that means that AI should be working as hard to gain literacy in the disciplines that are core to its success, as those disciplines are to AI. What happens, if we push so hard on the AI front and we don’t make that clear distinction between literacy and expertise, that developers and designers are so focused on AI, that they feel that if they’re not going that direction and chasing that really, really hard, that it’s a career-limiting move. I think that what is clear is that you may end up with the best AI in the world and still be beaten by somebody who’s got only 20% of the AI competence, but they’ve got way better integration of the AI into their larger ecosystem. Because, like any other technology, it’s not good, it’s not bad, but nor is it neutral, and it will be a positive and a negative consequence of that technological change.

Host: Another premise in AI research has its underpinnings in what we’ve referred to as the DIKW pyramid, where you start with data, which supposedly gets refined into information and then into knowledge and culminates in wisdom, which is the ability to make good decisions based on the data you have. And this, of course, has literary roots in T.S. Eliot’s The Rock: “Where is the life we’ve lost in living? Where is the wisdom we’ve lost in knowledge? Where is the knowledge we’ve lost in information?” Talk about this in the context of the idea that if we have enough data, with machine learning, computational power and sophisticated algorithms, we’ll end up with wisdom.

Bill Buxton: Well, first of all, Eliot left off two levels there. So, where’s the wisdom we’ve lost in knowledge, the knowledge we’ve lost in information, information we’ve lost in data and the data we’ve lost in noise. You have to remember noise cancellation. And people talked about a data revolution and so on… No, it’s a data explosion. And information technologies? No, it’s not. It’s only information if it can serve as the basis for informed decision-making. I think it’s very, very healthy to have that hierarchy. I think it’s extremely valuable to be able to fit things into moving up that food chain. But I think that the role that intelligence plays there, and where intelligence lies, is a sticky thing. And we have to base our expectations of the technology, and therefore have our engineering guided, by a sense of what’s possible at any point in time along that path. Now, I know that we were talking in AI about, you know, sensing an ecosystem environment and all this sort of stuff. Well, we have to be realistic about how much of that we can sense at what point in time, and then understand what elements are being neglected and are not simply feasible at this point to deal with and therefore our notion of intelligence is limited. And how do we, at any point in time, make sure we’re back-filling those gaps until it can be proven that we’ve got those other parts reliably taken care of. And again, by looking at the disciplines, doing the analysis, we can look at the timeline and take appropriate action for each thing to make sure that we’ve got the bases covered with the appropriate technologies for that moment in history and not make colossal mistakes and confuse the target with where we are right now. It comes right back to what I said earlier: it’s not just being able to get the vision, it’s how do I get there from here?

Host: What would you say to the people that are moving into this arena right now? What should they be thinking? What could their next steps be?

Bill Buxton: In a way, my advice is less concrete in terms of “learn this, learn that” in terms of some skill. We’ve said already that the problems we face today require depth. You have to be really good at what you are doing if you want to really have influence. And for me, the only way you can get really, really, really good at something is if you’re just so passionately in love with it that it’s not work. Now, people say okay, you got to find your passion. Well, the problem is how do you do that? Get into the traffic, because if it’s not hitting you wherever you are, then move. But the other part is, by trusting my nose, the stuff that caught my fancy in chasing those things that made no sense, but, in retrospect, were the perfect career moves. Like why would anybody go to university and do computer music when nobody even knew what a computer was? And spend four years doing that? But it was the most brilliant career decision that I never made. It wasn’t a career decision. I wanted to be a musician. But I would say, always be bad at something you love. And it doesn’t matter if things make sense. That’s the other part that’s really critical. I purposely rejected any career path for which there was a brochure in the guidance counselor’s office in high school. Because it’s already full. There’s going to be already too many people doing that. And it’s not that I’m not competitive, it’s just that my main competitive advantage is, I’m not trying to compete in the same race. And if you’ve got these interests and you become uniquely qualified, you can have the satisfaction you’re the best in the world at what you do. You’re just the only one. That makes you also the worst. That keeps hubris from taking over. But have the faith that at some point in your life, all that work will be recognized and somebody will need it. There’s somebody in the world who needs it. And the question is now to find it. For me, it took me till I was forty. But the time leading up to that was so full of rich experience that it never occurred to me that I wasn’t making any money. I was the richest person in the world because I was doing what I love doing.

Host: Bill Buxton, thank you for joining us today.

Bill Buxton: Thank you for having me.

(music plays)

To learn more about Bill Buxton and the latest innovations in human computer interaction, visit Microsoft.com/research


Lunar library to include photos, books stored in DNA

September 27, 2018


Playing to the crowd and other social media mandates with Microsoft Research’s Dr. Nancy Baym

Dr. Nancy Baym, Principal Researcher from Microsoft Research

Episode 41, September 12, 2018

Dr. Nancy Baym is a communication scholar, a Principal Researcher in MSR’s Cambridge, Massachusetts, lab, and something of a cyberculture maven. She’s spent nearly three decades studying how people use communication technologies in their everyday relationships and written several books on the subject. The big take away? Communication technologies may have changed drastically over the years, but human communication itself? Not so much.

Today, Dr. Baym shares her insights on a host of topics ranging from the arduous maintenance requirements of social media, to the dialectic tension between connection and privacy, to the funhouse mirror nature of emerging technologies. She also talks about her new book, Playing to the Crowd: Musicians, Audiences and the Intimate Work of Connection, which explores how the internet transformed – for better and worse – the relationship between artists and their fans.

TRANSCRIPT

Nancy Baym: It’s not just that it’s work, it’s that it’s work that never, ever ends. Because your phone is in your pocket, right? So, you’re sitting at home on a Sunday morning, having a cup of coffee and even if you don’t do it, there’s always the possibility of, “Oh, I could Tweet this out to my followers right now. I could turn this into an Instagram story.” So, the possibility of converting even your most private, intimate moments into fodder for your work life is always there, now.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: Dr. Nancy Baym is a communication scholar, a Principal Researcher in MSR’s Cambridge, Massachusetts, lab, and something of a cyberculture maven. She’s spent nearly three decades studying how people use communication technologies in their everyday relationships and written several books on the subject. The big take away? Communication technologies may have changed drastically over the years, but human communication itself? Not so much.

Today, Dr. Baym shares her insights on a host of topics ranging from the arduous maintenance requirements of social media, to the dialectic tension between connection and privacy, to the funhouse mirror nature of emerging technologies. She also talks about her new book, Playing to the Crowd: Musicians, Audiences and the Intimate Work of Connection, which explores how the internet transformed – for better and worse – the relationship between artists and their fans. That and much more on this episode of the Microsoft Research Podcast.

Host: Nancy Baym, welcome to the podcast.

Nancy Baym: Nice to be here.

Host: So, you’re a principal researcher at the MSR lab in Cambridge, Massachusetts, not to be confused with the one in Cambridge, England. Give our listeners an overview of the work that goes on in New England and of your work in particular. What are the big issues you’re looking at? Why is the work important? Basically, what gets you up in the morning?

Nancy Baym: So, the lab in New England is one of Microsoft’s smaller research labs. We’re very interdisciplinary, so, we have people in my basic area which is social media and social issues around technology from humanistic and social scientific perspectives. And we have that alongside people working on machine learning and artificial intelligence, people working on economics, people working on cryptography, people working on math and complexity theory, people doing algorithmic game theory, and then we also have a bioinformatics and medicine component to this program. So, we’re really interested in getting people from very different perspectives together and listening to each other and seeing what kinds of new ideas get sparked when you get people from radically different disciplines together in the same environment and you give them long periods of time to get to know one another and get exposed to the kinds of work that they do. So, that’s the lab as a whole. My group is… we call ourselves the Social Media Collective, which is a, sort of, informal name for it. It’s not an official title but it’s sort of an affectionate one. There are three core people here in our New England lab, which would be me, Mary Gray and Tarleton Gillespie, and then we have a postdoc and we have, in the summer, PhD interns, we have a research assistant, and we’re all interested in questions around how people use technologies, the kinds of work that people do through technologies, the kinds of work that technologies create for people, and the ways that that affects them, their identities, their relationships, their communities, societies as a whole.

Host: You know, as you talk about the types of researchers that you have there, I wonder, is New England unique among the labs at Microsoft?

Nancy Baym: I think we are, in that we are more interdisciplinary than many of them. I mean our Redmond lab, obviously, has got people from a huge range of disciplines, but it’s also got a huge number of people, whereas we’re a much smaller group. We’re on one floor of a building and there are, you know, anywhere from twenty to fifty of us, depending on how many visitors are in the lab and how many interns are around or what not, but that’s still a really small fraction of the Redmond group. So, I think anybody in a particular field finds themselves with many fewer colleagues from their own field relative to their colleagues as a whole in this lab. Whereas, I think most of our labs are dominated much more by people from computer science. Obviously, computer science is well-represented here, but we have a number of other fields as well. So, I think that foregrounding of interdisciplinarity is unique to this lab.

Host: That’s great. So, the social science research in the context of social computing and social media, it’s an interesting take on research in general at Microsoft, which is a high-tech company. How do you think the work that you do informs the broader work of Microsoft Research and Microsoft in general?

Nancy Baym: I would like to think that the kinds of work that I do, and that my colleagues are doing, are helping the company, and technology companies in general, think in more sophisticated ways about the ways that the technologies that we create get taken up and get used and with what consequences. I think that people who build technologies, they really want to help people do things. And they’re focused on that mission. And it can be difficult to think about, what are all the ways that that might get taken up besides the way that I imagine it will get taken up, besides the purpose that I’m designing it for? So, in some sense, I think part of our group is here to say, here’s some unexpected things you might not be thinking about. Here’s some consequences, or in the case of my own work, I’d like to think about the ways that technologies are often pushing people toward more connection and more time with others and more engagement and more sharing and more openness. And yet, people have very strong needs for privacy and for distance and for boundaries and what would it mean, for example, to think about how we could design technologies that helped people draw boundaries more efficiently rather than technologies that were pushing them toward openness all the time?

Host: I love that. And I’m going to circle back, in a bit, to some of those issues of designing for dialectic and some of the issues around unintended consequences. But first, I want to talk about a couple books you wrote. Before we talk about your newest book, I want to spend a little time talking about another book you wrote called Personal Connections in the Digital Age. And in it, you challenge conventional wisdom that tends to blame new technologies for what we might call old problems. Talk a little bit about Personal Connections in the Digital Age.

Nancy Baym: That book came out of a course that I had been teaching for, oh gosh, fifteen, sixteen, seventeen years, something like that, about communication and the internet, and one of the things that tends to come up is just what you’re talking about. This idea that people tend to receive new technologies as though this is the first time these things have ever been disrupted. So, part of what that book tries to do is to show how the way that people think and talk about the internet has these very long histories in how people think and talk about other communication technologies that have come before. So, for example, when the telephone was invented, there was a lot of concern that the telephone was going to lead to social disengagement, particularly among women, who would spend all the time talking on the phone and would stop voting. Um… (laughter) which doesn’t sound all that different from some contemporary ways that people talk about phones! Only now it’s the cell phones that are going to cause all that trouble. It’s that, but it’s also questions around things like, how do we present ourselves online? How do we come to understand who other people are online? How does language change when it’s used online? How do we build relationships with other people? How do we maintain relationships with people who we may have met offline? And also, how do communities and social networks form and get maintained through these communication technologies? So, it’s a really broad sweep. I think of that book as sort of the “one stop shop” for everything you need to know about personal connections in the digital age. If you just want to dive in and have a nice little compact introduction to the topic.

Host: Right. There are other researchers looking into these kinds of things as well. And is your work sort of dovetailing with those findings in that area of personal relationships online?

Nancy Baym: Yeah, yeah. There’s quite a bit of work in that field. And I would say that, for the most part, the body of work which I review pretty comprehensively in Personal Connections in the Digital Age tends to show this much more nuanced, balanced, “for every good thing that happens, something bad happens,” and for all of the sort of mythologies about “it’s destroying children” or “you can’t trust people you meet online,” or “people aren’t their real selves” or even the idea that there’s something called “real life,” which is separate from what happens on the internet, the empirical evidence from research tends to show that, in fact, online interaction is really deeply interwoven with all of our other forms of communication.

Host: I think you used the word “moral panic” which happens when a new technology hits the scene, and we’re all convinced that it’s going to ruin “kids today.” They won’t have manners or boundaries or privacy or self-control, and it’s all technology’s fault. So that’s cool that you have a kind of answer to that in that book. Let’s talk about your new book which is super fascinating: Playing to the Crowd: Musicians, Audiences and the Intimate Work of Connection. Tell us how this book came about and what was your motivation for writing it?

Nancy Baym: So, this book is the result of many years of work, but it came to fruition because I had done some early work about online fan community, particularly soap opera fans, and how they formed community in the early 1990s. And then, at some point, I got really interested in what music fans were doing online and so I started a blog where I was posting about music fans and other kinds of fans and the kinds of audience activities that people were doing online and how that was sort of messing with relationships between cultural producers and audiences. And that led to my being invited to speak at music industry events. And what I was seeing there was a lot of people with expertise saying things like, “The problem is, of course, that people are not buying music anymore, so the solution to this problem is to use social media to connect with your audience because if you can connect with them, and you can engage them, then you can monetize them.” And then I was seeing the musicians ask questions, and the kinds of questions that they were asking seemed very out-of-step with the kind of advice that they were being given. So, they would be asking questions like, do I have to use all of the sites? How do I know which ones to use? So, I got really interested in this question, of sort of, what, from the point of view from these people who were being told that their livelihood depends on creating some kind of new social relationship using these media with audiences, what is this call to connect and engage really about? What does it feel like to live with that? What are the issues it raises? Where did it come from? And then this turned into a much larger-scoped project thinking about musicians as a very specific case, but one with tremendous resonance for the ways that so many workers in a huge variety of fields now, including research, feel compelled to maintain some kind of visible, public persona that engages with and courts an audience so that when our next paper comes out, or our next record drops, or our next film is released or our next podcast comes out, the audience is already there and interested and curious and ready for it.

Host: Well let me interject with a question based on what you said earlier. How does that necessarily translate into monetization? I can see it translating into relationship and, you know, followership, but is there any evidence to support the you know…?

Nancy Baym: It’s magic, Gretchen, magic!

Host: OK. I thought so! I knew it!

Nancy Baym: You know, I work with economists and I keep saying, “Guys, let’s look at this. This is such a great research problem.” Is it true, right? Because you will certainly hear from people who work at labels or work in management who will say, “We see that our artists who engage more do better.” But in terms of any large-scale “what works for which artists, when?” and “does it really work across samples?”, the million-dollar question that you just asked is, does it actually work? And I don’t know that we know the answer to that question. For some individuals, some of the time, yes. For the masses, reliably, we don’t know.

Host: Well and the other thing is, being told that you need to have this social media presence. It’s work, you know?

Nancy Baym: That’s exactly the point of the book, yeah. And it’s not just that it’s work, it’s that it’s work that never, ever ends. Because your phone is in your pocket, right? So, you’re sitting at home on a Sunday morning, having a cup of coffee, and even if you don’t do it, there’s always the possibility of, “Oh, I could tweet this out to my followers right now. I could turn this into an Instagram story.” So, the possibility of converting even your most private, intimate moments into fodder for your work life is always there, now. And the promise is, “Oh, if you get a presence, then magic will happen.” But first of all, it’s a lot of work to even create the presence, and then, to maintain it, you have to sell your personality now. Not just your stuff. You have to be about who you are now and make that identity accessible and engaging and whatnot. And yet it’s not totally clear that that’s, in fact, what audiences want. Or if it is what audiences want, which audiences, and for which kinds of products?

(music plays)

Host: Well, let’s get back to the book a little bit. In one chapter, there’s a subsection called How Music Fans Came to Rule the Internet. So, Nancy, how did music fans come to rule the internet?

Nancy Baym: So, the argument that I make in that chapter is that from the earliest, earliest days of the internet, music fans, and fans in general, were not just using the internet for their fandom, but were people who were also actively involved in creating the internet and creating social computing. So, I don’t want to say that music fans are the only people who were doing this, because they weren’t, but, from the very beginnings of online interaction, in like 1970, you already had the very people who were inventing the concept of a mailing list at the same time saying, “Hey, we could use one of these to exchange Grateful Dead tickets, ‘cause I have some extra ones and I know there’s some other people in this building who might want them.” So, you have people at Stanford’s Artificial Intelligence Laboratory in the very beginning of the 1970s saying, “Hey, we could use this enormous amount of computing power that we’ve got to digitize the Grateful Dead’s lyrics.” You have community computing projects like Community Memory being launched in the Bay Area, putting their first terminal in a record store as a means of bringing together community. And then, from those early, early moments throughout, you see over and over and over again music fans creating different forms of online community that then end up driving the way that the internet develops, peer-to-peer file sharing being one really clear example of a case where music fans helped to develop a technology to serve their needs, and, by virtue of the success of that technology, ended up changing not just the internet, but industries that were organized around distributing cultural materials.

Host: One of the reviewers of Playing to the Crowd, and these reviews tend to be glowing, right? But he said, “It’ll change the way we think about music, technology and people.” So, even if it didn’t change everything about the way we think about music, technology and people, what kinds of sort of “ah-ha findings” might people expect to find in the book?

Nancy Baym: I think one of the big ah-has is the extent to which music is a form of communication which has become co-opted, in so many ways, by commercial markets, and alongside that, the ways in which personal relationships and personal communication have also become co-opted by commercial markets. Think about the ways that communication platforms monetize our everyday, friendly interaction through advertising. And the way that these parallel movements of music and relational communication, from purely social activities to social activities that are permeated by commercial markets, raises dialectic tensions that people then have to deal with as they’re continually navigating between people and events and circumstances and moments in a world that is so infused by technology and where our relationships are infused by technology.

Host: So, you’ve used the word “dialectic” in the context of computer interface design, and talked about the importance of designing for dialectic. Talk about what you mean by that and what kinds of questions arise for a developer or a designer with that mindset?

Nancy Baym: So, “dialectic” is one of the most important theoretical concepts to me when I think about people’s communication and people’s relationships in this project, but, in general, it’s a concept that I come back to over and over and over, and the idea is that we always have competing impulses that are both valid, and which we have to find balance between. So, a very common dialectic in interpersonal relationships is the desire to, on the one hand, be connected to others, and on the other, to be autonomous from others. So, we have that push and pull between “I want us to be part of each other’s lives all the time, and also leave me alone to make my own decisions.” (laughter) So that dialectic tension is not a matter of one side being right and the other wrong. Both are valid, and, as some of the theorists I cite on this argue, there are probably infinite dialectic tensions of the form “I want this, but I also want the opposite,” right? And so, if we think about social interaction, instead of it being some sort of linear model where we start at point A with somebody and we move on to B and then C and then D, if we think of it instead as a tightrope, then even as we’re moving from A to B to C, at any given moment we can be toppling to one side or the other if we’re not balancing carefully. So, if we think about a lot of the communication technologies that are available to us right now, they are founded, often quite explicitly, on a model of openness and connection and sharing. So, those are really, really valuable positions. But they’re also ends of dialectics that have opposite ends that are also very valid. So, all of these ways in which we’re pushed to be more open, more connected, to share more things, they are actually always in conflict within us with desires to be protective of other people or protective of ourselves, to have some distance from other people, to have autonomy, and to be able to have boundaries that separate us from others, as well as boundaries that connect us to one another. So, my question for designers is, how could we design in ways that make it easier for people to adjust those balances? In a way, you could sort of think about it as, what if we made the tightrope, you know, thicker, so that it were easier for people to balance on and you didn’t need to be so good at it to make it work moment-to-moment?

Host: You know, everything you’ve just said makes me think of, you know, say, someone who wants to get involved in entertainment, in some way, and one of the plums of that is being famous, right? And then you find…

Nancy Baym: Until they are.

Host: …Until you are… that you don’t have control over all the attention you get, and so that dialectic of “I want people to notice me/I want people to leave me alone” becomes wildly exacerbated there. But I think, you know, we all see “over-sharers,” as my daughter calls them, on social media. It’s like, “Keep looking at me all the time.” It’s like, too much information. Have some privacy in your life…

Nancy Baym: Well, you know, but that’s a great case, because I would say too much information is not actually a property of information, or of the person sending that information, it’s a property of the person receiving that information. Because, in fact, for some, it’s not going to be too much information. For some, it’s going to be exactly the right amount of information. So, I think of the example of, from my point of view, a number of people who are parents of young children posting much too much information on social networks. In particular, I’m really, really turned off by hearing about the details of their trivial illnesses that they’re going through at any given moment. You know, I mean, if they’ve got a real illness, of course I want to hear about it, but if, you know, they’ve got a fever this week and they’re just feeling a little sick, I don’t really need daily updates on their temperature, for instance. Um… on the other hand, I look at that, and I say, “Oh, too much information.” But then I say, “I’m not the audience for that.” They’ve got 500-600 friends. They probably put that there for grandma and the cousins who actually really do care. And I’m just not the audience. So, it’s not that that’s too much information. It’s that that information wasn’t meant for me. And instead of blaming them for having posted it, maybe I should just look away and move on to the next item in my feed. That’s ok, too. I’m sure that some of the things that I share strike some people as too much information, but then, I’ll tell you what, some of the things that I post that I think of as too much information, those are often the ones that people will later, in other contexts, say, “Oh my gosh, it meant so much to me that you posted about… whatever.” So, you know, we can’t just make these judgements about the content of what other people are producing without understanding the contexts in which it’s being received, and by whom.

Host: That is such a great reminder to us to have grace.

Nancy Baym: Grace for other people, that too, yeah.

Host: You’ve been watching, studying and writing about cyberculture for a long time. Going back a ways, what did you see, or even foresee, when you started doing this research and what if anything has surprised you along the way?

Nancy Baym: Well, it’s a funny thing. I mean, when I started doing this research, it was 1991. And the landscape has changed so much since then, so that the kinds of things that I could get away with being an insightful scholar for saying in 1991 are practically laughable now, because people just didn’t understand, at that time, that these technologies were actually going to be really socially useful. That people were going to use these technologies to present themselves to others, to form relationships, to build communities, that they were going to change the way audiences engaged, that they were going to change politics, that they were going to change so many practices of everyday life. And I think that those of us who were involved in cyberculture early, whether it was as researchers or just participants, could see that what was happening there was going to become something bigger than it was in those early days.

(music plays)

Host: I ask all of the researchers that come on the podcast some version of the question, “Is there anything that keeps you up at night?” To some degree, I think your work addresses that. You know, what ought we to be kept up at night about, and how ought we to address it? Is there anything that keeps you up at night, or anything that should keep us up at night, that we should be thinking about critically as we’re in this landscape now?

Nancy Baym: Oh gosh, do any of us sleep anymore at all? (laughter) I mean, I think what keeps me up nights is thinking, is it still ok to study the personal and the ordinary when it feels like we’re in such extraordinary, tumultuous and frightening times, uh, nationally and globally? And I guess what I keep coming back to, when I’m lying awake at 4 in the morning saying, “Oh, maybe I just need to start studying social movements and give up on this whole interpersonal stuff,” is this: I say to myself, “Wait a minute. The reason that we’re having so much trouble right now, at its heart, is that people are not having grace in their relations with one another,” to go back to your phrase. That what we really, really need right now more than anything is to be reconnected to our capacity for human connection with others. And so, in that sense, then, I kind of put myself to sleep by saying, “OK, there’s nothing more important than actual human connection and respect for one another.” And so that’s what I’m trying to foster in my work. So, I’m just going to call that my part and write a check for some of those other causes I can’t contribute to directly.

Host: I love that answer. And that actually leads beautifully into another question, which is that your social science work at MSR is unique among industrial research labs. And I would call Microsoft, still, an industrial, you know, situation.

Nancy Baym: Definitely.

Host: So, you get to study unique and challenging research problems.

Nancy Baym: I have the best job in the world.

Host: No, I do, but you got a good one. Because I get to talk to people like you. But what do you think compels a company like Microsoft, perhaps somewhat uniquely, to encourage researchers like you to study and publish the things you do? What’s in it for them?

Nancy Baym: My lab director, Jennifer Chayes, talks about it as being like a portfolio which I think is, is a great way to think about it. So, you have this cast of researchers in your portfolio and each of them is following their own path to satisfying their curiosity and by having some of those people in that portfolio who really understand people, who really understand the way that technologies play out in ordinary people’s everyday lives and lived experiences, there may be moments where that’s exactly the stock you need at that moment. That’s the one that’s inflating and that’s the expertise that you need. So, given that we’re such a huge company, and that we have so many researchers studying so many topics, and that computing is completely infused with the social world now… I mean, if we think about the fact that we’ve shifted to so much cloud and that clouds are inherently social in the sense that it’s not on your private device, you have to trust others to store your data, and so many things are now shared that used to be individualized in computing. So, if computing is infused with the social, then it just doesn’t even really make sense for a tech company to not have researchers who understand the social, and who are studying the social, and who are on hand with that kind of expertise.

Host: As we close, Nancy, what advice would you give to aspiring researchers, maybe talking to your 25-year-old self, who might be interested in entering this field now, which is radically different from where it was when you started looking at it? What would you say to people who might be interested in this?

Nancy Baym: I would say, remember that there is well over a hundred years of social theory out there right now, and the fact that we have new communication technologies does not mean that people have started from scratch in their communication, or that we need to start from scratch in making sense of it. I think it’s more important than ever, when we’re thinking about new communication technologies, to understand communication behavior and the way that communication works, because that has not fundamentally transformed. The media through which we communicate have changed, but the way communication works to build identity, community, relationships, that has not fundamentally, magically, become something different. The same kinds of interpersonal dynamics are still at play in many of these things. I think of the internet and communication technologies as being like funhouse mirrors, where some phenomena get made huge and others get made small, so there’s a lot of distortion that goes on. But nothing entirely new is reflected that never existed before. So, it’s really important to understand the precedents for what you’re seeing, both in terms of theory and similar phenomena that might have occurred in earlier incarnations, in order to be able to really understand what you’re seeing in terms of both what is new, but also what’s not new. Because otherwise, what I see a lot in young scholarship is, “Look at this amazing thing people are doing on this platform with this thingy.” And it is really interesting, but it also actually looks a whole lot like what people were doing on this other platform in 1992, which also kind of looks a lot like what people were doing with ‘zines in the 1920s. And if we want to make arguments about what’s new and what’s changing because of these things, it’s so important that we understand what’s not new and what these things are not changing.

(music plays)

Host: Nancy Baym, it’s been an absolute delight talking to you today. I’m so glad you took time to talk to us.

Nancy Baym: Alrighty, bye.

To learn more about Dr. Nancy Baym, and how social science scholars are helping real people understand and navigate the digital world, visit Microsoft.com/research.