
IoT Plug and Play is now available in preview

Today we are announcing that IoT Plug and Play is now available in preview! At Microsoft Build in May 2019, we announced IoT Plug and Play and described how it would work seamlessly with IoT Central. We demonstrated how IoT Plug and Play simplifies device integration by enabling solution developers to connect and interact with IoT devices using device capability models defined with the Digital Twin Definition Language (DTDL). We also announced a set of partners who have launched devices and solutions that are IoT Plug and Play enabled. You can find their IoT Plug and Play certified devices in the Azure Certified for IoT device catalog.

With today’s announcement, solution developers can start using Azure IoT Central or Azure IoT Hub to build solutions that integrate seamlessly with IoT devices enabled for IoT Plug and Play. We have also launched a new Azure Certified for IoT portal for device partners who want to streamline the device certification submission process and get their devices into the Azure IoT device catalog quickly.

This article outlines how solution developers can use IoT Plug and Play devices in their IoT solutions, and how device partners can build and certify their products to be listed in the catalog.

Faster device integration for solution developers

Azure IoT Central is a fully managed IoT Software as a Service (SaaS) offering that makes it easy to connect, monitor, and manage your IoT devices and products. Azure IoT Central simplifies the initial setup of your IoT solution and cuts the management burden, operational costs, and overhead of a typical IoT project. Azure IoT Central’s integration with IoT Plug and Play takes this one step further by allowing solution developers to integrate devices without writing any embedded code.

IoT solution developers can choose from a large set of IoT Plug and Play certified devices to quickly build and customize their IoT solutions end to end. Solution developers can start with a certified device from the device catalog and customize the experience for the device, such as editing display names or units. They can also add dashboards for solution operators to visualize the data; as part of this new release, developers have a broader set of visualizations to choose from, along with the option to auto-generate dashboards and visualizations to get up and running quickly.

Once the dashboards and visualizations are created, solution developers can run simulations based on real models from the device catalog. Developers can also integrate with the commands and properties exposed by IoT Plug and Play capability models to enable operators to manage their device fleets effectively. IoT Central will automatically load the capability model of any certified device, enabling a true Plug and Play experience!
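To make this concrete, here is a minimal sketch of a single interface from a device capability model, written in the Digital Twin Definition Language. The structure follows the public DTDL preview documentation as we understand it, but the identifier, capability names, and context URL below are illustrative only; the DTDL specification remains the authoritative reference:

```json
{
  "@id": "urn:contoso:EnvironmentalSensor:1",
  "@type": "Interface",
  "displayName": "Environmental Sensor",
  "contents": [
    { "@type": "Telemetry", "name": "temperature", "schema": "double" },
    { "@type": "Property", "name": "alarmThreshold", "schema": "double", "writable": true },
    { "@type": "Command", "name": "reset", "commandType": "synchronous" }
  ],
  "@context": "http://azureiot.com/v1/contexts/IoTModel.json"
}
```

A model like this is what IoT Central loads automatically for a certified device, giving it everything it needs to render the telemetry, property, and command to operators.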

Another option, for developers who’d like more customization, is to build IoT solutions with Azure IoT Hub and IoT Plug and Play devices. With today’s release, Azure IoT Hub now supports RESTful digital twin APIs that expose the capabilities of IoT Plug and Play device capability models and interfaces. Developers can set properties to configure settings like alarm thresholds, send commands for operations such as resetting a device, route telemetry, and query which devices support a specific interface. The most convenient way to access these APIs is through the Azure IoT SDK for Node.js (support for other languages is coming soon). All devices enabled for IoT Plug and Play in the Azure Certified for IoT device catalog will work with IoT Hub just as they work with IoT Central.
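As a rough sketch of how a solution back end might call these APIs, the example below uses the preview Node.js service SDK. The package, class, and method names follow the preview azure-iot-digitaltwins-service samples as we understand them and may change after the preview; the device id, interface name, property, and command are hypothetical:

```typescript
// Sketch only: names below follow the preview azure-iot-digitaltwins-service
// samples and may change. 'my-pnp-device', 'environmentalSensor',
// 'alarmThreshold', and 'reset' are hypothetical.
import {
  DigitalTwinServiceClient,
  IoTHubTokenCredentials,
} from 'azure-iot-digitaltwins-service';

async function main(): Promise<void> {
  // Authenticate against IoT Hub using a service connection string.
  const credentials = new IoTHubTokenCredentials(process.env.IOTHUB_CONNECTION_STRING!);
  const client = new DigitalTwinServiceClient(credentials);

  // Read the full digital twin of an IoT Plug and Play device.
  const twin = await client.getDigitalTwin('my-pnp-device');
  console.log(JSON.stringify(twin, null, 2));

  // Configure a setting by writing a property on one of the device's interfaces.
  await client.updateDigitalTwinProperty(
    'my-pnp-device', 'environmentalSensor', 'alarmThreshold', 42);

  // Invoke a command exposed by the device's capability model.
  const result = await client.invokeCommand(
    'my-pnp-device', 'environmentalSensor', 'reset', '');
  console.log(result);
}

main().catch(console.error);
```

Because the underlying digital twin APIs are RESTful, any language that can issue authenticated HTTPS requests to IoT Hub can perform the same operations while the SDKs for other languages are in progress.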

An image of the certified device browsing page.

Streamlined certification process for device partners

The Azure Certified for IoT device catalog allows customers to quickly find the right certified device and start building IoT solutions. To help our device partners certify their products as IoT Plug and Play compatible, we have revamped and streamlined the Azure Certified for IoT program by launching a new portal and submission process. With the Azure Certified for IoT portal, device partners can define new products to be listed in the Azure Certified for IoT device catalog and specify product details such as physical dimensions, description, and geo availability. Device partners can manage their IoT Plug and Play models in their company model repository, which limits access to their own employees and select partners, as well as in the public model repository. The portal also allows device partners to certify their products by submitting them to an automated validation process that verifies correct implementation of the Digital Twin Definition Language and the required interface implementations.

An image of the device page for the MXChip certified device.

Device partners will also benefit from investments in developer tooling to support IoT Plug and Play. The Azure IoT Device Workbench extension for VS Code adds IntelliSense for easy authoring of IoT Plug and Play device models. It also enables code generation to create C device code that implements the IoT Plug and Play model and provides the logic to connect to IoT Central, without customers having to worry about provisioning or integration with the IoT device SDKs.

The new tooling capabilities also integrate with the model repository service for seamless publishing of device models. In addition to the Azure IoT Device Workbench, device developers can use tools like the Azure IoT explorer and the Azure IoT extension for the Azure Command-Line Interface (CLI). Device code can be developed with the Azure IoT SDKs for C and for Node.js.

An image of the Azure IoT explorer.

Connect sensors on Windows and Linux gateways to Azure

If you are using a Windows or Linux gateway device that already has sensors connected to it, you can make these sensors available to Azure by simply editing a JSON configuration. We call this technology the IoT Plug and Play bridge. The bridge allows sensors on Windows and Linux to just work with Azure by bridging them from the IoT gateway to IoT Central or IoT Hub. On the gateway device, the bridge leverages OS APIs and OS plug and play capabilities to connect to downstream sensors, and it uses the IoT Plug and Play APIs to communicate with IoT Central and IoT Hub in Azure. A solution builder can easily select from sensors enumerated on the IoT device and register them in IoT Central or IoT Hub. Once available in Azure, the sensors can be remotely accessed and managed.

We have native support for Modbus and a simple serial protocol for managing and obtaining sensor data from MCUs or embedded devices, and we are continuing to add native support for other protocols such as MQTT. On Windows, we also support cameras and general device health monitoring for any device the OS can recognize (such as USB peripherals). You can extend the bridge with your own adapters to talk to other types of devices (such as I2C/SPI), and we are working on adding support for more sensors and protocols (such as HID).
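The exact configuration schema is defined by the bridge itself, so the snippet below is a purely hypothetical sketch of the idea rather than a working config: each entry maps a locally attached sensor, matched by an adapter-specific filter, to the IoT Plug and Play interface it should expose in IoT Central or IoT Hub. All field names and identifiers here are made up for illustration:

```json
{
  "devices": [
    {
      "adapter": "modbus",
      "matchFilter": { "serialPort": "/dev/ttyS0" },
      "interfaceId": "urn:contoso:EnvironmentalSensor:1"
    },
    {
      "adapter": "camera",
      "matchFilter": { "hardwareId": "USB\\VID_045E" },
      "interfaceId": "urn:contoso:SecurityCamera:1"
    }
  ]
}
```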



Hackathons show teen girls the potential for AI – and themselves

This summer, young women in San Francisco and Seattle spent a weekend taking their creative problem solving to a whole new level through the power of artificial intelligence. The two events were part of a Microsoft-hosted AI boot-camp program that started last year in Athens, then broadened its reach with events in London last fall and New York City in the spring.

“I’ve been so impressed not only with the willingness of these young women to spend an entire weekend learning and embracing this opportunity, but with the quality of the projects,” said Didem Un Ates, one of the program organizers and a senior director for AI within Microsoft. “It’s just two days, but what they come up with always blows our minds.” (Read a LinkedIn post from Un Ates about the events.)

The problems these young women tackled aren’t kid stuff: they chose their weekend projects from among the U.N. Sustainable Development Goals, which represent some of the world’s most difficult and highest-priority challenges.

The result? Dozens of innovative products that could help solve issues as diverse as ocean pollution, dietary needs, mental health, acne and climate change. Not to mention all those young women – 129 attended the U.S. events – who now feel empowered to pursue careers to help solve those problems. They now see themselves as “Alice,” a mascot created by the project team to represent the qualities young women possess that lend themselves to changing the world through AI.

Organizers plan to broaden the reach of these events, so that girls everywhere can learn about the possibility of careers in technology.



Podcast: How machines are learning to ace the reading comprehension exam

Dr. T.J. Hazen

Episode 86, August 21, 2019

The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.

On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world.



Transcript

T.J. Hazen: Most of the questions are fact-based questions like, who did something, or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. How can we make this technology work for real problems that our enterprise customers are bringing in?

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He’s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.

On today’s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik’s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world. That and much more on this episode of the Microsoft Research Podcast.

(music plays)

Host: T.J. Hazen, welcome to the podcast!

T.J. Hazen: Thanks for having me.

Host: Researchers like to situate their research, and I like to situate my researchers, so let’s get you situated. You are a Senior Principal Research Manager in the Engineering and Applied Research group at Microsoft Research in Montreal. Tell us what you do there. What are the big questions you’re asking, what are the big problems you’re trying to solve, what gets you up in the morning?

T.J. Hazen: Well, I’ve spent my whole career working in speech and language understanding, and I think the primary goal of everything I do is to try to be able to answer questions. So, people have questions and we’d like the computer to be able to provide answers. So that’s sort of the high-level goal, how do we go about answering questions? Now, answers can come from many places.

Host: Right.

T.J. Hazen: A lot of the systems that you’re probably aware of like Siri for example, or Cortana or Bing or Google, any of them…

Host: Right.

T.J. Hazen: …the answers typically come from structured places, databases that contain information, and for years these models have been built in a very domain-specific way. If you want to know the weather, somebody built a system to tell you about the weather.

Host: Right.

T.J. Hazen: And somebody else might build a system to tell you about the age of your favorite celebrity and somebody else might have written a system to tell you about the sports scores, and each of them can be built to handle that very specific case. But that limits the range of questions you can ask because you have to curate all this data, you have to put it into structured form. And right now, what we’re worried about is, how can you answer questions more generally, about anything? And the internet is a wealth of information. The internet has got tons and tons of documents on every topic, you know, in addition to the obvious ones like Wikipedia. If you go into any enterprise domain, you’ve got manuals about how their operation works. You’ve got policy documents. You’ve got financial reports. And it’s not typical that all this information is going to be curated by somebody. It’s just sitting there in text. So how can we answer any question about anything that’s sitting in text? We don’t have a million or five million or ten million librarians doing this for us…

Host: Right.

T.J. Hazen: …uhm, but the information is there, and we need a way to get at it.

Host: Is that what you are working on?

T.J. Hazen: Yes, that’s exactly what we’re working on. I think one of the difficulties with today’s systems is, they seem really smart…

Host: Right?

T.J. Hazen: Sometimes. Sometimes they give you fantastically accurate answers. But then you can just ask a slightly different question and it can fall on its face.

Host: Right.

T.J. Hazen: That’s the real gap between what the models currently do, which is, you know, really good pattern matching some of the time, versus something that can actually understand what your question is and know when the answer that it’s giving you is correct.

Host: Let’s talk a bit about your group, which, out of Montreal, is Engineering and Applied Research. And that’s an interesting umbrella at Microsoft Research. You’re technically doing fundamental research, but your focus is a little different from some of your pure research peers. How would you differentiate what you do from others in your field?

T.J. Hazen: Well, I think there’s two aspects to this. The first is that the lab up in Montreal was created as an offshoot of an acquisition. Microsoft bought Maluuba, which was a startup that was doing really incredible deep learning research, but at the same time they were a startup and they needed to make money. So, they also had this very talented engineering team in place to be able to take the research that they were doing in deep learning and apply it to problems where it could go into products for customers.

Host: Right.

T.J. Hazen: When you think about that need that they had to actually build something, you could see why they had a strong engineering team.

Host: Yeah.

T.J. Hazen: Now, when I joined, I wasn’t with them when they were a startup, I actually joined them from Azure where I was working with outside customers in the Azure Data Science Solution team, and I observed lots of problems that our customers have. And when I saw this new team that we had acquired and we had turned into a research lab in Montreal, I said I really want to be involved because they have exactly the type of technology that can solve customer problems and they have this engineering team in place that can actually deliver on turning from a concept into something real.

Host: Right.

T.J. Hazen: So, I joined, and I had this agreement with my manager that we would focus on real problems. They were now part of the research environment at Microsoft, but I said that doesn’t restrict us on thinking about blue sky, far-afield research. We can go and talk to product teams and say what are the real problems that are hindering your products, you know, what are the difficulties you have in actually making something real? And we could focus our research to try to solve those difficult problems. And if we’re successful, then we have an immediate product that could be beneficial.

Host: Well in any case, you’re swimming someplace in a “we could do this immediately” but you have permission to take longer, or is there a mandate, as you live in this engineering and applied research group?

T.J. Hazen: I think there’s a mandate to solve hard problems. I think that’s the mandate of research. If it wasn’t a hard problem, then somebody…

Host: …would already have a product.

T.J. Hazen: …in the product team would already have a solution, right? So, we do want to tackle hard problems. But we also want to tackle real problems. That’s, at least, our focus of our team. And there’s plenty of people doing blue sky research and that’s an absolute need as well. You know, we can’t just be thinking one or two years ahead. Research should be also be thinking five, ten, fifteen years ahead.

Host: So, there’s a whole spectrum there.

T.J. Hazen: So, there’s a spectrum. But there is a real need, I think, to fill that gap between taking an idea that works well in a lab and turning it into something that works well in practice for a real problem. And that’s the key. And many of the problems that have been solved by Microsoft have not just been blue sky ideas, but they’ve come from this problem space where a real product says, ahh, we’re struggling with this. So, it could be anything. It can be, like, how does Bing efficiently rank documents over billions of documents? You don’t just solve that problem by thinking about it, you have to get dirty with the data, you have to understand what the real issues are. So, many of these research problems that we’re focusing on, and we’re focusing on, how do you answer questions out of documents when the questions could be arbitrary, and on any topic? And you’ve probably experienced this, if you are going into a search site for your company, that company typically doesn’t have the advantage of having a big Bing infrastructure behind it that’s collecting all this data and doing sophisticated machine learning. Sometimes it’s really hard to find an answer to your question. And, you know, the tricks that people use can be creative and inventive but oftentimes, trying to figure out what the right keywords are to get you to an answer is not the right thing.

Host: You work closely with engineers on the path from research to product. So how does your daily proximity to the people that reify your ideas as a researcher impact the way you view, and do, your work as a researcher?

T.J. Hazen: Well, I think when you’re working in this applied research and engineering space, as opposed to a pure research space, it really forces you to think about the practical implications of what you’re building. How easy is it going to be for somebody else to use this? Is it efficient? Is it going to run at scale? All of these problems are problems that engineers care a lot about. And sometimes researchers just say, let me solve the problem first and everything else is just engineering. If you say that to an engineer, they’ll be very frustrated because you don’t want to bring something to an engineer that works ten times slower than it needs to, or uses ten times more memory. So, when you’re in close proximity to engineers, you’re thinking about these problems as you are developing your methods.

Host: Interesting, because those two things, I mean, you could come up with a great idea that would do it and you pay a performance penalty in spades, right?

T.J. Hazen: Yeah, yeah. So, sometimes it’s necessary. Sometimes you don’t know how to do it and you just say let me find a solution that works and then you spend ten years actually trying to figure out how to make it work in a real product.

Host: Right.

T.J. Hazen: And I’d rather not spend that time. I’d rather think about, you know, how can I solve something and have it be effective as soon as possible?

(music plays)

Host: Let’s talk about human language technologies. They’ve been referred to by some of your colleagues as “the crown jewel of AI.” Speech and language comprehension is still a really hard problem. Give us a lay of the land, both in the field in general and at Microsoft Research specifically. What’s hope and what’s hype, and what are the common misconceptions that run alongside the remarkable strides you actually are making?

T.J. Hazen: I think that word we mentioned already: understand. That’s really the key of it. Or comprehend is another way to say it. What we’ve developed doesn’t really understand, at least when we’re talking about general purpose AI. So, the deep learning mechanisms that people are working on right now that can learn really sophisticated things from examples. They do an incredible job of learning specific tasks, but they really don’t understand what they’re learning.

Host: Right.

T.J. Hazen: So, they can discover complex patterns that can associate things. So in the vision domain, you know, if you’re trying to identify objects, and then you go in and see what the deep learning algorithm has learned, it might have learned features that are like, uh, you know, if you’re trying to identify a dog, it learns features that would say, oh, this is part of a leg, or this is part of an ear, or this is part of the nose, or this is the tail. It doesn’t know what these things are, but it knows they all go together. And the combination of them will make a dog. And it doesn’t know what a dog is either. But the idea that you could just feed data in and you give it some labels, and it figures everything else out about how to associate that label with that, that’s really impressive learning, okay? But it’s not understanding. It’s just really sophisticated pattern-matching. And the same is true in language. We’ve gotten to the point where we can answer general-purpose questions and it can go and find the answer out of a piece of text, and it can do it really well in some cases, and like, some of the examples we’ll give it, we’ll give it “who” questions and it learns that “who” questions should contain proper names or names of organizations. And “when” questions should express concepts of time. It doesn’t know anything about what time is, but it’s figured out the patterns about, how can I relate a question like “when” to an answer that contains time expression? And that’s all done automatically. There’s no features that somebody sits down and says, oh, this is a month and a month means this, and this is a year, and a year means this. And a month is a part of a year. Expert AI systems of the past would do this. They would create ontologies and they would describe things about how things are related to each other and they would write rules. And within limited domains, they would work really, really well if you stayed within a nice, tightly constrained part of that domain. But as soon as you went out and asked something else, it would fall on its face. And so, we can’t really generalize that way efficiently. If we want computers to be able to learn arbitrarily, we can’t have a human behind the scene creating an ontology for everything. That’s the difference between understanding and crafting relationships and hierarchies versus learning from scratch. We’ve gotten to the point now where the algorithms can learn all these sophisticated things, but they really don’t understand the relationships the way that humans understand it.

Host: Go back to the, sort of, the lay of the land, and how I sharpened that by saying, what’s hope and what’s hype? Could you give us a “TBH” answer?

T.J. Hazen: Well, what’s hope is that we can actually find reasonable answers to an extremely wide range of questions. What’s hype is that the computer will actually understand, at some deep and meaningful level, what this answer actually means. I do think that we’re going to grow our understanding of algorithms and we’re going to figure out ways that we can build algorithms that could learn more about relationships and learn more about reasoning, learn more about common sense, but right now, they’re just not at that level of sophistication yet.

Host: All right. Well let’s do the podcast version of your NERD Lunch and Learn. Tell us what you are working on in machine reading comprehension, or MRC, and what contributions you are making to the field right now.

T.J. Hazen: You know, NERD is short for New England Research and Development Center…

Host: I did not!

T.J. Hazen: …which is where I physically work.

Host: Okay…

T.J. Hazen: Even though I work closely and am affiliated with the Montreal lab, I work out of the lab in Cambridge, Massachusetts, and NERD has a weekly Lunch and Learn where people present the work they’re doing, or the research that they’re working on, and at one of these Lunch and Learns, I gave this talk on machine reading comprehension. Machine reading comprehension, in its simplest version, is being able to take a question and then being able to find the answer anywhere in some collection of text. As we’ve already mentioned, it’s not really “comprehending” at this point, it’s more just very sophisticated pattern-matching. But it works really well in many circumstances. And even on tasks like the Stanford Question Answering Dataset, a common competition that people have competed in, question answering by computer has achieved human parity on that task.

Host: Mm-hmm.

T.J. Hazen: Okay. But that task itself is somewhat simple because most of the questions are fact-based questions like, who did something or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we’re focused right now. Like, how can we make this technology work for real problems that our enterprise customers are bringing in? So, we have customers coming in saying, I want to be able to answer any question in our financial policies, or our auditing guidelines, or our operations manual. And people don’t ask “who” or “when” questions of their operations manual. They ask questions like, how do I do something? Or explain some process to me. And those answers are completely different. They tend to be longer and more complex and you don’t always, necessarily, find a short, simple answer that’s well situated in some context.

Host: Right.

T.J. Hazen: So, our focus at MSR Montreal is to take this machine reading comprehension technology and apply it into these new areas where our customers are really expressing that there’s a need.

Host: Well, let’s go a little deeper, technically, on what it takes to enable or teach machines to answer questions, and this is key, with limited data. That’s part of your equation, right?

T.J. Hazen: Right, right. So, when we go to a new task, uh, so if a company comes to us and says, oh, here’s our operations manual, they often have this expectation, because we’ve achieved human parity on some dataset, that we can answer any question out of that manual. But when we test the general-purpose models that have been trained on these other tasks on these manuals, they don’t generally work well. And these models have been trained on hundreds of thousands, if not millions, of examples, depending on what datasets you’ve been using. And it’s not reasonable to ask a company to collect that level of data in order to be able to answer questions about their operations manual. But we need something. We need some examples of what are the types of questions, because we have to understand what types of questions they ask, we need to understand the vocabulary. We’ll try to learn what we can from the manual itself. But without some examples, we don’t really understand how to answer questions in these new domains. But what we discovered through some of the techniques that are available, transfer learning is what we refer to as sort of our model adaptation, how do you learn from data in some new domain and take an existing model and make it adapt to that domain? We call that transfer learning. We can actually use transfer learning to do really well in a new domain without requiring a ton of data. So, our goal is to have it be examples like hundreds of examples, not tens of thousands of examples.

Host: How’s that working now?

T.J. Hazen: It works surprisingly well. I’m always amazed at how well these machine learning algorithms work with all the techniques that are available now. These models are very complex. When we’re talking about our question answering model, it has hundreds of millions of parameters and what you’re talking about is trying to adjust a model that is hundreds of millions of parameters with only hundreds of examples and, through a variety of different techniques where we can avoid what we call overfitting, we can allow the generalizations that are learned from all this other data to stay in place while still adapting it so it does well in this specific domain. So, yeah, I think we’re doing quite well. We’re still exploring, you know, what are the limits?

Host: Right.

T.J. Hazen: And we’re still trying to figure out how to make it work so that an outside company can easily create the dataset, put the dataset into a system, push a button. The engineering for that and the research for that is still ongoing, but I think we’re pretty close to being able to, you know, provide a solution for this type of problem.

Host: All right. Well I’m going to push in technically because to me, it seems like that would be super hard for a machine. We keep referring to these techniques… Do we have to sign an NDA, as listeners?

T.J. Hazen: No, no. I can explain stuff that’s out…

Host: Yeah, do!

T.J. Hazen: … in the public domain. So, there are two common underlying technical components that make this work. One is called word embeddings and the other is called attention. Word embeddings are a mechanism where it learns how to take words or phrases and express them in what we call vector space.

Host: Okay.

T.J. Hazen: So, it turns them into a collection of numbers. And it does this by figuring out what types of words are similar to each other based on the context that they appear in, and then placing them together in this vector space, so they’re nearby each other. So, we would learn, that let’s say, city names are all similar because they appear in similar contexts. And so, therefore, Boston and New York and Montreal, they should all be close together in this vector space.

Host: Right.

T.J. Hazen: And blue and red and yellow should be close together. And then advances were made to figure this out in context. So that was the next step, because some words have multiple meanings.

Host: Right.

T.J. Hazen: So, you know, if you have a word like apple, sometimes it refers to a fruit and it should be near orange and banana, but sometimes it refers to the company and it should be near Microsoft and Google. So, we’ve developed context dependent ones, so that says, based on the context, I’ll place this word into this vector space so it’s close to the types of things that it really represents in that context.

Host: Right.

T.J. Hazen: That’s the first part. And you can learn these word embeddings from massive amounts of data. So, we start off with a model that’s learned on far more data than we actually have question and answer data for. The second part is called attention and that’s how you associate things together. And it’s the attention mechanisms that learn things like a word like “who” has to attend to words like person names or company names. And a word like “when” has to attend to…

Host: Time.

T.J. Hazen: …time. And those associations are learned through this attention mechanism. And again, we can actually learn on a lot of associations between things just from looking at raw text without actually having it annotated.

Host: Mm-hmm.

T.J. Hazen: Once we’ve learned all that, we have a base, and that base tells us a lot about how language works. And then we just have to have it focus on the task, okay? So, depending on the task, we might have a small amount of data and we feed in examples in that small amount, but it takes advantage of all the stuff that it’s learned about language from all these, you know, rich data that’s out there on the web. And so that’s how it can learn these associations even if you don’t give it examples in your domain, but it’s learned a lot of these associations from all the raw data.

Host: Right.

T.J. Hazen: And so, that’s the base, right? You’ve got this base of all this raw data and then you train a task-specific thing, like a question answering system, but even then, what we find is that, if we train a question answering system on basic facts, it doesn’t always work well when you go to operation manuals or other things. So, then we have to have it adapt.

Host: Sure.

T.J. Hazen: But, like I said, that base is very helpful because it’s already learned a lot of characteristics of language just by observing massive amounts of text.

(music plays)

Host: I’d like you to predict the future. No pressure. What’s on the horizon for machine reading comprehension research? What are the big challenges that lie ahead? I mean, we’ve sort of laid the land out on what we’re doing now. What next?

T.J. Hazen: Yeah. Well certainly, more complex questions. What we’ve been talking about so far is still fairly simple in the sense that you have a question, and we try to find passages of text that answer that question. But sometimes a question actually requires that you get multiple pieces of evidence from multiple places and you somehow synthesize them together. So, a simple example we call the multi-hop example. If I ask a question like, you know, where was Barack Obama’s wife born? I have to figure out first, who is Barack Obama’s wife? And then I have to figure out where she was born. And those pieces of information might be in two different places.

Host: Right.

T.J. Hazen: So that’s what we call a multi-hop question. And then, sometimes, we have to do some operation on the data. So, you could say, you know like, what players, you know, from one Super Bowl team also played on another Super Bowl team? Well there, what you have to do is, you have to get the list of all the players from both teams and then you have to do an intersection between them to figure out which ones are the same on both. So that’s an operation on the data…

Host: Right.

T.J. Hazen: …and you can imagine that there’s lots of questions like that where the information is there, but it’s not enough to just show the person where the information is. You also would like to go a step further and actually do the computation for that. That’s a step that we haven’t done, like, how do you actually go from mapping text to text, and saying these two things are associated, to mapping text to some sequence of operations that will actually give you an exact answer. And, you know, it can be quite difficult. I can give you a very simple example. Like, just answering a question, yes or no, out of text, is not a solved problem. Let’s say I have a question where someone says, I’m going to fly to London next week. Am I allowed to fly business class according to my policies from my company, right? We can have a system that would be really good at finding the section of the policy that says, you know, if you are a VP-level or higher and you are flying overseas, you can fly business class, otherwise, no. Okay? But, you know, if we actually want the system to answer yes or no, we have to actually figure out all the details, like okay, who’s asking the question? Are they a VP? Where are they located? Oh, they’re in New York. What does flying overseas mean??

Host: Right. There are layers.

T.J. Hazen: Right. So that type of comprehension, you know, we’re not quite there yet for all types of questions. Usually these things have to be crafted by hand for specific domains. So, all of these things about how can you answer complex questions, and even simple things like common sense, like, things that we all know… Um. And so, my manager, Andrew McNamara, he was supposed to be here with us, one of his favorite examples is this concept of coffee being black. But if you spill coffee on your shirt, do you have a black stain on your shirt? No, you’ve got a brown stain on your shirt. And that’s just common knowledge. That is, you know, a common-sense thing that computers may not understand.

Host: You’re working on research, and ultimately products or product features, that make people think they can talk to their machines and that their machines can understand and talk back to them. So, is there anything you find disturbing about this? Anything that keeps you up at night? And if so, how are you dealing with it?

T.J. Hazen: Well, I’m certainly not worried about the fact that people can ask questions of the computer and the computer can give them answers. What I’m trying to get at is something that’s helpful and can help you solve tasks. In terms of the work that we do, yeah, there are actually issues that concern me. So, one of the big ones is, even if a computer can say, oh, I found a good answer for you, here’s the answer, it doesn’t know anything about whether that answer is true. If you go and ask your computer, was the Holocaust real? and it finds an article on the web that says no, the Holocaust was a hoax, do I want my computer to show that answer? No, I don’t. But…

Host: Or the moon landing…!

T.J. Hazen: …if all you are doing is teaching the computer about word associations, it might think that’s a perfectly reasonable answer without actually knowing that this is a horrible answer to be showing. So yeah, the moon landing, vaccinations… The easy way that people can defame people on the internet, you know, even if you ask a question that might seem like a fact-based question, you can get vast differences of opinion on this and you can get extremely biased and untrue answers. And how does a computer actually understand that some of these things are not things that we should represent as truth, right? Especially if your goal is to find a truthful answer to a question.

Host: All right. So, then what do we do about that? And by we, I mean you!

T.J. Hazen: Well, I have been working on this problem a little bit with the Bing team. And one of the things that we discovered is that if you can determine that a question is phrased in a derogatory way, that usually means the search results that you’re going to get back are probably going to be phrased in a derogatory way. So, even if we don’t understand the answer, we can just be very careful about what types of questions we actually want to answer.

Host: Well, what does the world look like if you are wildly successful?

T.J. Hazen: I want the systems that we build to just make life easier for people. If you have an information task, the world is successful if you get that piece of information and you don’t have to work too hard to get it. We call it task completion. If you have to struggle to find an answer, then we’re not successful. But if you can ask a question, and we can get you the answer, and you go, yeah, that’s the answer, that’s success to me. And we’ll be wildly successful if the types of things where that happens become more and more complex. You know, where if someone can start asking questions where you are synthesizing data and computing answers from multiple pieces of information, for me, that’s the wildly successful part. And we’re not there yet with what we’re going to deliver into product, but it’s on the research horizon. It will be incremental. It’s not going to happen all at once. But I can see it coming, and hopefully by the time I retire, I can see significant progress in that direction.

Host: Off script a little… will I be talking to my computer, my phone, a HoloLens? Who am I asking? Where am I asking? What device? Is that so “out there” as well?

T.J. Hazen: Uh, yeah, I don’t know how to think about where devices are going. You know, when I was a kid, I watched the original Star Trek, you know, and everything on there, it seemed like a wildly futuristic thing, you know? And then fifteen, twenty years later, everybody’s got their own little “communicator.”

Host: Oh my gosh.

T.J. Hazen: And so, uh, you know, the fact that we’re now beyond where Star Trek predicted we would be, you know, that itself, is impressive to me. So, I don’t want to speculate where the devices are going. But I do think that this ability to answer questions, it’s going to get better and better. We’re going to be more interconnected. We’re going to have more access to data. The range of things that computers will be able to answer is going to continue to expand. And I’m not quite sure exactly what it looks like in the future, to be honest, but, you know, I know it’s going to get better and easier to get information. I’m a little less worried about, you know, what the form factor is going to be. I’m more worried about how I’m going to actually answer questions reliably.

Host: Well it’s story time. Tell us a little bit about yourself, your life, your path to MSR. How did you get interested in computer science research and how did you land where you are now working from Microsoft Research in New England for Montreal?

T.J. Hazen: Right. Well, I’ve never been one to long-term plan for things. I’ve always gone from what I find interesting to the next thing I find interesting. I never had a really serious, long-term goal. I didn’t wake up some morning when I was seven and say, oh, I want to be a Principal Research Manager at Microsoft in my future! I didn’t even know what Microsoft was when I was seven. I went to college and I just knew I wanted to study computers. I didn’t know really what that meant at the time, it just seemed really cool.

Host: Yeah.

T.J. Hazen: I had an Apple II when I was a kid and I learned how to do some basic programming. And then I, you know, was going through my course work. I was, in my junior year, I was taking a course in audio signal processing and in the course of that class, we got into a discussion about speech recognition, which to me was, again, it was Star Trek. It was something I saw on TV. Of course, now it was Next Generation….!

Host: Right!

T.J. Hazen: But you know, you watch the next generation of Star Trek and they’re talking to the computer and the computer is giving them answers and here somebody is telling me you know there’s this guy over in the lab for computer science, Victor Zue, and he’s building systems that recognize speech and give answers to questions! And to me, that was science-fiction. So, I went over and asked the guy, you know, I heard you’re building a system, and can I do my bachelor’s thesis on this? And he gave me a demo of the system – it was called Voyager – and he asked a question, I don’t remember the exact question, but it was probably something like, show me a map of Harvard Square. And the system starts chugging along and it’s showing results on the screen as it’s going. And it literally took about two minutes for it to process the whole thing. It was long enough that he actually explained to me how the entire system worked while it was processing. But then it came back, and it popped up a map of Harvard Square on the screen. And I was like, ohhh my gosh, this is so cool, I have to do this! So, I did my bachelor’s thesis with him and then I stayed on for graduate school. And by seven years later, we had a system that was running in real time. We had a publicly available system in 1997 that you could call up on a toll-free number and you could ask for weather reports and weather information for anywhere in the United States. And so, the idea that it went from something that was “Star Trek” to something that I could pick up my phone, call a number and, you know, show my parents, this is what I’m working on, it was astonishing how fast that developed! I stayed on in that field with that research group. I was at MIT for another fifteen years after I graduated. At some point, a lot of the things that we were doing, they moved from the research lab to actually being real.

Host: Right.

T.J. Hazen: So, like twenty years after I went and asked to do my bachelor’s thesis, Siri comes out, okay? And so that was our goal. They were like, twenty years ago, we should be able to have a device where you can talk to it and it gives you answers and twenty years later there it was. So, that, for me, that was a queue that maybe it’s time to go where the action is, which was in companies that were building these things. Once you have a large company like Microsoft or Google throwing their resources behind these hard problems, then you can’t compete when you’re in academia for that space. You know, you have to move on to something harder and more far out. But I still really enjoyed it. So, I joined Microsoft to work on Cortana…

Host: Okay…

T.J. Hazen: …when we were building the first version of Cortana. And I spent a few years working on that. I’ve worked on some Bing products. I then spent some time in Azure trying to transfer these things so that companies that had the similar types of problems could solve their problems on Azure with our technology.

Host: And then we come full circle to…

T.J. Hazen: Then full circle, yeah. You know, once I realized that some of the stuff that customers were asking for wasn’t quite ready yet, I said, let me go back to research and see if I can improve that. It’s fantastic to see something through all the way to product, but once you’re successful and you have something in a product, it’s nice to then say, okay, what’s the next hard problem? And then start over and work on the next hard problem.

Host: Before we wrap up, tell us one interesting thing about yourself, maybe it’s a trait, a characteristic, a life event, a side quest, whatever… that people might not know, or be able to find on a basic web search, that’s influenced your career as a researcher?

T.J. Hazen: Okay. You know, when I was a kid, maybe about eleven years old, the Rubik’s Cube came out. And I got fascinated with it. And I wanted to learn how to solve it. And a kid down the street from my cousin had taught himself from a book how to solve it. And he taught me. His name was Jonathan Cheyer. And he was actually in the first national speed Rubik’s Cube solving competition. It was on this TV show, That’s Incredible. I don’t know if you remember that TV show.

Host: I do.

T.J. Hazen: It turned out what he did was, he had learned what is now known as the simple solution. And I learned it from him. And I didn’t realize it until many years later, but what I learned was an algorithm. I learned, you know, a sequence of steps to solve a problem. And once I got into computer science, I discovered all that problem-solving I was doing with the Rubik’s Cube and figuring out what are the steps to solve a problem, that’s essentially what things like machine learning are doing. What are the steps to figure out, what are the features of something, what are the steps I have to do to solve the problem? I didn’t realize that at the time, but the idea of being able to break down a hard problem like solving a Rubik’s Cube, and figuring out what are the stages to get you there, is interesting. Now, here’s the interesting fact. So, Jonathan Cheyer, his older brother is Adam Cheyer. Adam Cheyer is one of the co-founders of Siri.

Host: Oh my gosh. Are you kidding me?

T.J. Hazen: So, I met the kid when I was young, and we didn’t really stay in touch. I discovered, you know, many years later that Adam Cheyer was actually the older brother of this kid who taught me the Rubik’s Cube years and years earlier, and Jonathan ended up at Siri also. So, it’s an interesting coincidence that we ended up working in the same field after all those years from this Rubik’s Cube connection!

Host: You see, this is my favorite question now because I’m getting the broadest spectrum of little things that influenced and triggered something…!

Host: At the end of every podcast, I give my guests a chance for the proverbial last word. Here’s your chance to say anything you want to would-be researchers, both applied and otherwise, who might be interested in working on machine reading comprehension for real-world applications.

T.J. Hazen: Well, I could say all the things that you would expect me to say, like you should learn about deep learning algorithms and you should possibly learn Python because that’s what everybody is using these days, but I think the single most important thing that I could tell anybody who wants to get into a field like this is that you need to explore it and you need to figure out how it works and do something in depth. Don’t just get some instruction set or some high-level overview on the internet, run it on your computer and then say, oh, I think I understand this. Like get into the nitty-gritty of it. Become an expert. And the other thing I could say is, of all the people I’ve met who are extremely successful, the thing that sets them apart isn’t so much, you know, what they learned, it’s the initiative that they took. So, if you see a problem, try to fix it. If you see a problem, try to find a solution for it. And I say this to people who work for me. If you really want to have an impact, don’t just do what I tell you to do, but explore, think outside the box. Try different things. OK? I’m not going to have the answer to everything, so therefore, if I don’t have the answer to everything, then if you’re only doing what I’m telling you to do, then we both, together, aren’t going to have the answer. But if you explore things on your own and take the initiative and try to figure out something, that’s the best way to really be successful.

Host: T.J. Hazen, thanks for coming in today, all the way from the east coast to talk to us. It’s been delightful.

T.J. Hazen: Thank you. It’s been a pleasure.

(music plays)

To learn more about Dr. T.J. Hazen and how researchers and engineers are teaching machines to answer complicated questions, visit Microsoft.com/research


Digital transformation helps governments be more inclusive

Governments exist to improve the lives of citizens, and the right technology is key to bringing that mission into a rapidly changing, digital world. This is nowhere truer than in providing accessible services for the citizens who need them. Digital transformation is a pressing issue for most governments, and the imperative to modernize workplaces and services brings with it an opportunity to empower every citizen with technology that is designed with accessibility and inclusiveness in mind.

As most government organizations know, citizen trust is incredibly difficult to build and, in a rapidly changing landscape, even harder to keep. The desire to build that trust is one aspect that is driving digital transformation among governments. There are already inspiring examples of digitally driven government innovation underway, for instance in Riverside County, where employees are using Microsoft Power BI data analytics to make government spending more efficient and transparent. In fact, citizens are 58 percent more likely to trust a government institution that provides great digital experiences. When most people are accustomed to carrying out their day-to-day tasks in an efficient digital environment, from online banking to making purchases, they expect their experience with governments to be the same. When government organizations meet this demand head-on, providing more efficient, positive digital experiences for citizens, it drives that all-important citizen trust.

For Microsoft, AI is at the forefront of a commitment to developing technology that caters to diverse needs. With Microsoft Cognitive Services, AI has the potential to break down barriers, particularly in the government space, where people of all abilities need to stay informed and make the most of civic life. Increasingly, citizens are demanding services that are digitally driven and user-centric, and governments that can meet this demand with intelligent services are well placed to gain citizen trust and create lasting, positive relationships.

The need to meet citizen demand is clear; that means embracing digital transformation as well as optimizing services for all citizens and making accessibility a top priority. Some cities are already making positive changes. With the help of cloud and AI platforms, people with disabilities in Moscow are using an urban mobility app called Moovit to help navigate public transit and gain independence. Moovit, along with Microsoft, is partnering with cities the world over to help create more accessible transit solutions. Azure Maps underpins these mobility-as-a-service solutions for governments, helping produce more accessible transit apps.

Still, acquiring new technology can take a backseat to merely keeping the lights on, especially in the face of tightening budgets and finite resources. Simply put, governments need to do more with less. Rather than acting as a barrier to digital transformation, the need to streamline processes and conserve resources should be seen as one of the most compelling motives to adopt a more modern, cloud-enabled approach, because digital transformation provides an opportunity to use more efficient technologies. The statistics are astonishing: AI and automation in the government space can save up to 96.7 million federal hours a year, amounting to potential savings of 3.3 billion dollars. By embracing cloud computing and data analytics, governments can increase total revenues by 1 to 3 percent. And all of these benefits of digital transformation come with the ability to maximize the accessibility of government offerings.

Creating a robust strategy for digital transformation is one way governments are innovating to meet the unique demands of their industry. These strategies aim to address key issues for governments, such as how to engage and connect with all citizens, how to modernize their workplaces, and how to enhance their services. When these key issues are addressed, a more digitally mature organization emerges, one that is able to provide better, more modern services, boost productivity, and keep citizens of all abilities engaged.

To better understand how digital transformation can help your specific organization achieve more, check out our digital assessment designed for governments.


Microsoft joins partners and The Linux Foundation to create Confidential Computing Consortium

Microsoft has invested in confidential computing for many years, so I’m excited to announce that Microsoft will join industry partners to create the Confidential Computing Consortium, a new organization that will be hosted at The Linux Foundation. The Confidential Computing Consortium will be dedicated to defining and accelerating the adoption of confidential computing.

Confidential computing technologies offer the opportunity for organizations to collaborate on their data sets without giving access to that data, to gain shared insights and to innovate for the common good. The Consortium, which will include other founding members Alibaba, ARM, Baidu, Google Cloud, IBM, Intel, Red Hat, Swisscom and Tencent, is the organization where the industry can come together to collaborate on open source technology and frameworks to support these new confidential computing scenarios.

As computing moves from on-premises to the public cloud and the edge, protecting data becomes more complex. There are three types of possible data exposure to protect against. One is data at rest and another is data in transit. While there’s always room to improve and innovate, the industry has built technologies and standards to address these scenarios. The third possible exposure – or as I like to think of it, the critical ‘third leg of the stool’ – is data in use. Protecting data while in use is called confidential computing.

Protecting data in use means data is provably not visible in unencrypted form during computation except to the code authorized to access it. That can mean that it’s not even accessible to public cloud service providers or edge device vendors. This capability enables new solutions where data is private all the way from the edge to the public cloud. Some of the scenarios confidential computing can unlock include:

  • Training multi-party dataset machine learning models or executing analytics on multi-party datasets, which can allow customers to collaborate to obtain more accurate models or deeper insights without giving other parties access to their data.
  • Enabling confidential query processing in database engines within secure enclaves, which removes the need to trust database operators.
  • Empowering multiple parties to leverage technologies like the Confidential Consortium Framework, which delivers confidentiality and high transaction throughput for distributed databases and ledgers.
  • Protecting sensitive data at the edge, such as proprietary machine learning models and machine learning model execution, customer information, and billing/warranty logs.

Simply put, confidential computing capabilities, like the ability to collaborate on shared data without giving collaborators access to that data, have the power to enable organizations to unlock the full potential of combined data sets. Future applications will generate more powerful understanding of industries’ telemetry, more capable machine learning models, and a new level of protection for all workloads.

However, enabling these new scenarios requires new attestation and key management services, and for applications to take advantage of those services and confidential computing hardware. There are multiple implementations of confidential hardware, but each has its own SDK. This leads to complexity for developers, inhibits application portability, and slows development of confidential applications.

This is where the Confidential Computing Consortium comes in, with its mission of creating technology, taxonomy, and cross-platform development tools for confidential computing. This will allow application and systems developers to create software that can be deployed across different public clouds and Trusted Execution Environment (TEE) architectures. The organization will also anchor industry outreach and education initiatives.

Microsoft will be contributing the Open Enclave SDK to the Confidential Computing Consortium to foster broader industry collaboration and ensure a truly open development approach. Other founding members Intel and Red Hat will be contributing Intel® SGX and Red Hat Enarx, respectively, to the new group.

The Open Enclave SDK is targeted at creating a single unified enclave abstraction for developers to build TEE-based applications. It creates a pluggable, common way to create redistributable trusted applications securing data in use. The SDK originated inside Microsoft and was published on GitHub over a year ago under an open source license.

The Open Enclave SDK, which supports both Linux and Windows hosts and has been used and validated by multiple open source projects, was designed to:

  • Make it easy to write and debug code that runs inside TEEs.
  • Allow the development of code that’s portable between TEEs, starting with Intel® SGX and ARM TrustZone.
  • Provide a flexible plugin model to support different runtimes and cryptographic libraries.
  • Enable the development of auditable enclave code that works on both Linux and Windows.
  • Have a high degree of compatibility with existing code.

We want to thank the Linux Foundation and all our industry partners for coming together to advance confidential computing. These technologies offer the promise to protect data and enable collaboration to make the world more secure and unlock multiparty innovations. Personally, I’m looking forward to seeing what we can all do together.

Let us know what you’d like to see from the Confidential Computing Consortium in the comments.

Additional resources:
CCC Website
Linux Foundation press release
Open Enclave SDK site and on GitHub

post

Hackers get help from Garage interns with new features in program for quickly building apps

Following the initial release of Web Template Studio in May of this year, Web Template Studio 2.0 is now available with additional services, inspired by community feedback and built by a second team of Garage interns. The VS Code extension helps hackers create full stack web apps quickly, now with a broader range of front end and back end service options. Try Web TS 2.0 and share feedback or new feature requests on GitHub, check out the full story on the Windows Developer Blog, or watch the walk-through video on YouTube.

Tag teaming development

This team of Garage interns, based out of Vancouver, BC, pioneered a new approach for the unique internship program, picking up where a first team of interns left off to build on a product and refine its direction. Unlike in a traditional internship, Garage interns hear pitches from sponsoring teams who outline challenging engineering projects the interns will tackle as a team. Typically, interns create new projects from the ground up to be released by Microsoft or partners, or add new capabilities to scaled Microsoft products. In this case, the Web Template Studio team from the Summer 2019 cohort accepted the baton from the Winter cohort and continued iterating on the product’s features and direction.

Web Template Studio screenshot

In addition to managing the open-source feedback provided following the launch at Build 2019, the interns spoke to current users, student developers, hackers, and more to understand where Web TS could be enhanced. In addition to requests to deepen the bench of supported frameworks, the team homed in on the value the solution could provide to novice developers who already have a foundation of experience creating web apps.

Web Template Studio was created with hackers in mind; it’s ideal for rapid prototyping and spinning up web apps quickly at hackathons. In fact, the team made early versions of Web TS 2.0 available to hackers across Microsoft’s global, annual hackathon in July. The intern teams confirmed that Web TS is most useful for developers who have some background in creating web apps with specific frameworks (about 20 hours of coding time, per their research): enough experience to select their preferred front end and back end without the tedium of wading through endless forums to figure out how to stitch them together in a time crunch. Web TS also continues to be a great way to get started with Azure, thanks to its simple wizard.

Adding new services to Web Template Studio

The team listened closely to developer feedback on GitHub and expanded the supported front end and back end frameworks from the React.js and Node.js available at the initial launch to also include Angular, Vue, and Flask. In an update this morning, the team also added App Service support to make it even easier to create web apps powered by Azure for storage and cloud hosting.

Throughout the summer, the team added a number of new features, most notably support for additional frameworks:

  • Angular support
  • Flask support
  • Vue support
  • App Service support

For the full details of new features and how to use them, check out the Windows Developer Blog.

You can see a walk-through of Web TS on YouTube.

Apply to the Garage Internship

The Garage is hiring for the 2020 Winter & Summer seasons! Here you can learn more details about the internship and how to apply.

post

Germany-based startup Breeze is using AI to measure and improve sustainable air quality

As part of our AI for Earth commitment, Microsoft supports five projects from Germany in the areas of environmental protection, biodiversity, and sustainability. Over the next few weeks, we will introduce the project teams and the innovative ideas that earned them a place in our global program and group of AI for Earth grantees.

Measuring and improving air quality in cities sustainably, with transparent results, is Hamburg-based startup Breeze’s mission. Founded in 2015, the award-winning company develops small, low-cost sensors that can be installed in almost any location, measuring pollutants such as soot, nitrogen oxides, ammonia, ozone, and particulate matter, while also identifying their sources.

While vehicles are a common source of pollution, large construction sites, for example, can also greatly increase air pollution in short periods of time. A local portal publishes Breeze’s collected data in real time, so that affected residents can learn about the current situation at any given moment. In addition, Breeze has developed a comprehensive catalog of measures that helps cities and communities specifically improve the situation on the ground. 

From the beginning, Breeze has processed the data from its fully networked sensors in the Azure cloud. Breeze founder Robert Heinecke now wants to take his project to the next level with the help of AI – relying on the support of AI for Earth. Breeze has already received $12,000 in cloud credits, which will be used to set up a new machine learning development environment in Azure. “We have now set up our own AI experimentation lab to test how AI can better support the value we deliver,” explains Heinecke.

So far, Heinecke and his team have identified four areas where AI can help. First, AI should significantly improve the quality of the measurement data and draw an even more accurate picture of conditions on site, both by correcting for the measurement errors of individual devices and by excluding environmental influences from the sensor data. At the same time, AI will be deployed in a predictive maintenance capacity, forecasting when sensors need to be serviced or even replaced.

AI will also help make precise predictions about how air quality will develop, for example by linking weather data to information from Breeze’s own measuring stations. This lets ill or particularly sensitive people, such as those with asthma, prepare for harsher conditions in advance. Lastly, AI will help streamline Breeze’s consulting offer by accurately calculating which of the 3,500 identified measures can best improve the air quality at a particular location.

Currently, pilot projects are already running in Hamburg, Moers and Neckarsulm (Germany), and Heinecke and his team are already in negotiations with numerous other cities, although there can sometimes be friction. In Heinecke’s words, “the mills of the administration grind slowly. Some cities may also prefer not to know exactly how bad the air really is, because then they would have to act.”

AI for Earth
The AI for Earth program helps researchers and organizations use artificial intelligence to develop new approaches to protect water, agriculture, biodiversity, and the climate. Over the next five years, Microsoft will invest $50 million in AI for Earth. To become part of the program, developers, researchers, and organizations can apply with their idea for a grant. Those who convince the jury of Microsoft representatives receive financial and technological support, and also benefit from knowledge transfer and contacts within the global AI for Earth network. As part of Microsoft Berlin’s EarthLab and beyond, five ideas made the cut and will join our AI for Earth program to further promote their environmental innovations.

At #DigitalFuerAlle you can continue to follow the development of the projects and our #AIforEarth initiative. 

Would you also like to apply for a grant from the “AI for Earth” initiative? 

Apply now 

post

One simple action you can take to prevent 99.9 percent of attacks on your accounts

There are over 300 million fraudulent sign-in attempts to our cloud services every day. Cyberattacks aren’t slowing down, and it’s worth noting that many attacks have been successful without the use of advanced technology. All it takes is one compromised credential or one legacy application to cause a data breach. This underscores how critical it is to ensure password security and strong authentication. Read on to learn about common vulnerabilities and the single action you can take to protect your accounts from attacks.

Animated image showing the number of malware attacks and data breaches organizations face every day. 4,000 daily ransomware attacks. 300,000,000 fraudulent sign-in attempts. 167,000,000 daily malware attacks. 81% of breaches are caused by credential theft. 73% of passwords are duplicates. 50% of employees use apps that aren't approved by the enterprise. 99.9% of attacks can be blocked with multi-factor authentication.

Common vulnerabilities

A recent paper from the SANS Software Security Institute identifies the most common vulnerabilities:

  • Business email compromise, where an attacker gains access to a corporate email account, such as through phishing or spoofing, and uses it to exploit the system and steal money. Accounts that are protected with only a password are easy targets.
  • Legacy protocols can create a major vulnerability because applications that use basic protocols, such as SMTP, were not designed to manage Multi-Factor Authentication (MFA). So even if you require MFA for most use cases, attackers will search for opportunities to use outdated browsers or email applications to force the use of less secure protocols.
  • Password reuse, where password spray and credential stuffing attacks come into play. Common passwords and credentials compromised by attackers in public breaches are used against corporate accounts to try to gain access. Considering that up to 73 percent of passwords are duplicates, this has been a successful strategy for many attackers and it’s easy to do.

What you can do to protect your company

You can help prevent some of these attacks by banning the use of bad passwords, blocking legacy authentication, and training employees on phishing. However, one of the best things you can do is to just turn on MFA. By providing an extra barrier and layer of security that makes it incredibly difficult for attackers to get past, MFA can block over 99.9 percent of account compromise attacks. With MFA, knowing or cracking the password won’t be enough to gain access. To learn more, read Your Pa$$word doesn’t matter.
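
If your organization uses Azure AD, one programmatic way to roll MFA out broadly is a conditional access policy. The sketch below is illustrative rather than official guidance: it assumes you already hold a Microsoft Graph access token with the Policy.ReadWrite.ConditionalAccess permission, the field names follow the Graph conditionalAccessPolicy schema, and the policy is created in report-only mode so nothing is enforced until you review its impact.

```python
import requests  # pip install requests

GRAPH_URL = "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies"
TOKEN = "<Graph token with Policy.ReadWrite.ConditionalAccess>"  # placeholder

# Require MFA for all users on all cloud apps, in report-only mode.
policy = {
    "displayName": "Require MFA for all users (report-only)",
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
        "users": {"includeUsers": ["All"]},
        "applications": {"includeApplications": ["All"]},
        "clientAppTypes": ["all"],
    },
    "grantControls": {"operator": "OR", "builtInControls": ["mfa"]},
}

resp = requests.post(
    GRAPH_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=policy,
)
resp.raise_for_status()
print("Created policy:", resp.json().get("id"))
```

Smaller organizations can skip the API entirely and turn on MFA for users directly from the admin portal.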

MFA is easier than you think

According to the SANS Software Security Institute, there are two primary obstacles to adopting MFA today:

  1. Misconception that MFA requires external hardware devices.
  2. Concern about potential user disruption or concern over what may break.

Matt Bromiley, SANS Digital Forensics and Incident Response instructor, says, “It doesn’t have to be an all-or-nothing approach. There are different approaches your organization could use to limit the disruption while moving to a more advanced state of authentication.” These include a role-based or by-application approach, starting with a small group and expanding from there. Bret Arsenault shares his advice on transitioning to a passwordless model in Preparing your enterprise to eliminate passwords.

Take a leap and go passwordless

Industry protocols such as WebAuthn and CTAP2, ratified in 2018, have made it possible to remove passwords from the equation altogether. These standards, collectively known as the FIDO2 standard, ensure that user credentials are protected end-to-end and strengthen the entire security chain. The use of biometrics has become more mainstream, popularized on mobile devices and laptops, so it’s a familiar technology for many users and one that is often preferred to passwords anyway. Passwordless authentication technologies are not only more convenient for people but are extremely difficult and costly for hackers to compromise. Learn more about Microsoft passwordless authentication solutions in a variety of form factors to meet user needs.

Convince your boss

Download the SANS white paper Bye Bye Passwords: New Ways to Authenticate to read more on guidance for companies ready to take the next step to better protect their environments from password risk. Remember, talk is easy, action gets results!

post

Azure Archive Storage expanded capabilities: Faster, simpler, better

Since launching Azure Archive Storage, we have seen unprecedented interest and innovative usage from a variety of industries. Archive Storage is built as a scalable service for cost-effectively storing rarely accessed data for long periods of time. Cold data, such as application backups, healthcare records, and autonomous driving recordings, that might previously have been deleted can instead be stored in Azure Storage’s Archive tier in an offline state, then rehydrated to an online tier when needed. Earlier this month, we made Azure Archive Storage even more affordable by reducing prices by up to 50 percent in some regions, as part of our commitment to provide the most cost-effective data storage offering.

We’ve gathered your feedback regarding Azure Archive Storage, and today, we’re happy to share three archive improvements in public preview that make our service even better.

1. Priority retrieval from Azure Archive

To read data stored in Azure Archive Storage, you must first change the tier of the blob to hot or cool. This process is known as rehydration and takes a matter of hours to complete. Today we’re sharing the public preview release of priority retrieval from archive, allowing for much faster access to offline data. Priority retrieval lets you flag the rehydration of your data from the offline archive tier back into an online hot or cool tier as a high-priority action. By paying a little more for the priority rehydration operation, your archive retrieval request is placed in front of other requests, and your offline data is expected to be returned in less than one hour.

Priority retrieval is recommended for emergency requests for a subset of an archive dataset. For the majority of use cases, our customers plan for and use standard archive retrievals, which complete in less than 15 hours. But on rare occasions, a retrieval time of an hour or less is required. Priority retrieval requests can deliver archive data in a fraction of the time of a standard retrieval operation, allowing our customers to quickly resume business as usual. For more information, please see Blob Storage Rehydration.

The archive retrieval options now provided under the optional parameter are:

  • Standard rehydrate-priority is the new name for what Archive has provided over the past two years and is the default option for archive SetBlobTier and CopyBlob requests, with retrievals taking up to 15 hours.
  • High rehydrate-priority fulfills the need for urgent data access from archive, with retrievals for blobs under 10 GB typically taking less than one hour.

Regional priority retrieval demand at the time of request can affect the speed at which your data rehydration is completed. In most scenarios, a high rehydrate-priority request may return your Archive data in under one hour. In the rare scenario where archive receives an exceptionally large amount of concurrent high rehydrate-priority requests, your request will still be prioritized over standard rehydrate-priority but may take one to five hours to return your archive data. In the extremely rare case that any high rehydrate-priority requests take over five hours to return archive blobs under a few GB, you will not be charged the priority retrieval rates.
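
To make the two priorities concrete, here is a minimal sketch using the Python client library (azure-storage-blob v12, which targets REST version 2019-02-02); the connection string, container, and blob names are placeholders, and the string values mirror the REST header values:

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Hypothetical account, container, and blob names.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="backups", blob="2018/records.tar.gz")

# Standard rehydration (the default): completes in up to 15 hours.
blob.set_standard_blob_tier("Hot", rehydrate_priority="Standard")

# Alternative for urgent access: high-priority rehydration, typically
# under one hour for blobs smaller than 10 GB.
# blob.set_standard_blob_tier("Hot", rehydrate_priority="High")
```

While a blob is rehydrating, its properties report an archive status such as rehydrate-pending-to-hot, so you can poll get_blob_properties to see when the data is back online.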

2. Upload blob direct to access tier of choice (hot, cool, or archive)

Blob-level tiering for general-purpose v2 and blob storage accounts allows you to easily store blobs in the hot, cool, or archive access tiers, all within the same container. Previously, when you uploaded an object to your container, it would inherit the access tier of your account, and the blob’s access tier would show as hot (inferred) or cool (inferred), depending on your account configuration settings. As data usage patterns change, you would change the access tier of the blob manually with the SetBlobTier API or automate the process with blob lifecycle management rules.

Today we’re sharing the public preview release of Upload Blob Direct to Access tier, which allows you to upload your blob using PutBlob or PutBlockList directly to the access tier of your choice using the optional parameter x-ms-access-tier. This allows you to upload your object directly into the hot, cool, or archive tier regardless of your account’s default access tier setting. This new capability makes it simple for customers to upload objects directly to Azure Archive in a single transaction. For more information, please see Blob Storage Access Tiers.
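
As an illustration, the sketch below uses the Python client library (which sends x-ms-access-tier on your behalf) to upload a blob straight into the archive tier; the account, container, and file names are made up:

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="backups", blob="2018/audit.log")

# Upload directly into the archive tier in a single transaction,
# regardless of the account's default access tier setting.
with open("audit.log", "rb") as data:
    blob.upload_blob(data, standard_blob_tier="Archive", overwrite=True)
```

The same keyword works for hot or cool uploads; omit it and the blob inherits the account default as before.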

3. CopyBlob enhanced capabilities

In certain scenarios, you may want to keep your original data untouched but work on a temporary copy of the data. This holds especially true for data in Archive that needs to be read but still kept in Archive. The public preview release of CopyBlob enhanced capabilities builds upon our existing CopyBlob API with added support for the archive access tier, priority retrieval from archive, and direct to access tier of choice.

The CopyBlob API is now able to support the archive access tier, allowing you to copy data into and out of the archive access tier within the same storage account. With the access tier of choice enhancement, you can now set the optional parameter x-ms-access-tier to specify which destination access tier your data copy should inherit. If you are copying a blob from the archive tier, you can also specify x-ms-rehydrate-priority to control how quickly the copy is created in the destination hot or cool tier. Please see Blob Storage Rehydration and the following table for information on the new CopyBlob access tier capabilities.

|                          | Hot tier source | Cool tier source | Archive tier source                                  |
|--------------------------|-----------------|------------------|------------------------------------------------------|
| Hot tier destination     | Supported       | Supported        | Supported within the same account; pending rehydrate |
| Cool tier destination    | Supported       | Supported        | Supported within the same account; pending rehydrate |
| Archive tier destination | Supported       | Supported        | Unsupported                                          |
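
Here is a rough sketch of the enhanced copy path using the Python client library: an archived blob is copied to a hot-tier working copy in the same account with high-priority rehydration, while the original stays untouched in archive. The container and blob names are invented:

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

service = BlobServiceClient.from_connection_string("<connection-string>")

# Source: an offline blob in the archive tier (same storage account).
src = service.get_blob_client(container="backups", blob="2018/records.tar.gz")
# Destination: a temporary working copy that will land in the hot tier.
dst = service.get_blob_client(container="scratch", blob="records-working-copy.tar.gz")

dst.start_copy_from_url(
    src.url,
    standard_blob_tier="Hot",   # maps to x-ms-access-tier
    rehydrate_priority="High",  # maps to x-ms-rehydrate-priority
)
```

Per the table above, the destination shows as pending rehydrate until the copy’s rehydration completes.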

Getting Started

All of the features discussed today (upload blob direct to access tier, priority retrieval from archive, and CopyBlob enhancements) are supported by the most recent releases of the Azure portal and the .NET, Java, and Python client libraries. As always, you can also use the Storage Services REST API directly (version 2019-02-02 and greater). In general, we recommend using the latest version regardless of whether you are using these new features.

Build it, use it, and tell us about it!

We will continue to improve our Archive and Blob Storage services, and we look forward to hearing your feedback about these features by email at ArchiveFeedback@microsoft.com. As a reminder, we love hearing all of your ideas and suggestions about Azure Storage, which you can post on the Azure Storage feedback forum.

Thanks, from the entire Azure Storage Team!

post

Microsoft Edge Insiders: Try out an experimental preview of Collections for Microsoft Edge

Today, we’re releasing an experimental preview of Collections for Microsoft Edge. We initially demoed this feature during the Microsoft Build 2019 conference keynote. Microsoft Edge Insiders can now try out an early version of Collections by enabling the experimental flag on Microsoft Edge preview builds starting in today’s Canary channel build.

We designed Collections based on what you do on the web. It’s a general-purpose tool that adapts to the many roles you fill. If you’re a shopper, it will help you collect and compare items. If you’re organizing an event or trip, Collections will help you pull together all the relevant information, along with ideas to make it a success. If you’re a teacher or student, it will help you organize your web research and create your lesson plans or reports. Whatever your role, Collections can help.

The current version of Collections is an early preview and will change as we continue to hear from you. For that reason, it’s currently behind an experimental flag and is turned off by default. There may be some bugs, but we want to get this early preview into your hands to hear what you think.

To try out Collections, you’ll need to be on the Canary channel, which you can download from the Microsoft Edge Insider website.

Once you’re on the right build, you’ll need to manually enable the experiment. In the address bar, enter edge://flags#edge-collections to open the experimental settings page. Click the dropdown and choose Enabled, then select the Restart button from the bottom banner to close all Microsoft Edge windows and relaunch Microsoft Edge.

Screenshot of the "Experimental Collections feature" flag in edge://flags

Once the Collections experiment is enabled, you can get started by opening the Collections pane from the button next to the address bar.

Animation of adding a page to a sample collection titled "Amy's wishlist" 

When you open the Collections pane, select Start new collection and give it a name. As you browse, you can start to add content related to your collection in three different ways:

  • Add current page: If you have the Collections pane open, you can easily add a webpage to your collection by selecting Add current page at the top of the pane.

Screenshot of a sample collection titled "Amy's wishlist," with the "Add current page" button highlighted

  • Drag/drop: When you have the Collections pane open, you can add specific content from a webpage with drag and drop. Just select the image, text, or hyperlink and drag it into the collection.

Animation showing an image being dragged to the Collections pane

  • Context menu: You can also add content from a webpage from the context menu. Just select the image, text, or hyperlink, right-click it, and select Add to Collections. You can choose an existing collection to add to or start a new one.

Screenshot of the "Add to Collections" entry in the right-click context menu

When you add content to Collections, Microsoft Edge creates a visual card to make it easier to recognize and remember the content. For example, a web page added to a collection will include a representative image from that page, the page title, and the website name. You can easily revisit your content by clicking on the visual card in the Collections pane.

Screenshot of cards in the Collections pane

You’ll see different cards for the different types of content you add to Collections. Images added to a collection will be larger and more visual, while full websites added to a collection will show the most relevant content from the page itself. We’re still developing this, starting with a few shopping websites. Content saved to a collection from those sites will provide more detailed information like the product’s price and customer rating.

  • Add notes: You can add your own notes directly to a collection. Select the add note icon at the top of the Collections pane. Within the note, you can create a list and add basic formatting options like bold, italics, or underline.
  • Rearrange: Move your content around in the Collections pane. Just click an item and drag and drop it in the position you prefer.
  • Remove content: To remove content from your collection, hover over the item, select the box that appears in the upper-right corner, and then select the delete (trash can) icon at the top of the Collections pane.

Once you’ve created a collection, you can easily use that content by exporting it. You can choose to export the whole collection or select a subset of content.

  • Send to Excel: Select the share icon at the top of the Collections pane and then select Send to Excel. Your content will appear in a new tab with pre-populated tables that let you easily search, sort, and filter the data extracted from the sites you added to your collection. This is particularly useful for activities like shopping, when you want to compare items.

Screenshot highlighting the Send to Excel button in the Collections pane

  • Copy/paste: Select items by clicking the box in the upper-right corner. A gray bar will appear at the top of the Collections pane. Select the copy icon to add those items to your clipboard. Then paste them into an HTML handler like Outlook by using the context menu or Ctrl+V on your keyboard.

Sending content to Excel is available on macOS and on Windows devices running Windows 10 and above. We’ll add support for devices running Windows 7 and 8 soon. Additional functionality, like the ability to send to Word, will also come soon.

This is just the first step in our Collections journey, and we want to hear from you. If you think something’s not working right, or if there’s a capability you’d like to see added, please send us feedback using the smiley face icon in the top-right corner of the browser.

Screenshot highlighting the Send Feedback button in Microsoft Edge

Thanks for being a part of this early preview! We look forward to hearing your feedback.

– The Microsoft Edge Team