Microsoft - Introducing new voice styles in Azure Cognitive Services


<div><div><img src="https://www.sickgaming.net/blog/wp-content/uploads/2020/04/introducing-new-voice-styles-in-azure-cognitive-services.gif" class="ff-og-image-inserted"></div>
<p><em>This post was co-authored by <a href="https://techcommunity.microsoft.com/t5/user/viewprofilepage/user-id/175688">@Qinying Liao</a>, <a href="https://techcommunity.microsoft.com/t5/user/viewprofilepage/user-id/23979">@Anny Dow</a>, Yueying Liu, and Peter Pan.</em></p>
<p>Neural TTS enables fluid, natural-sounding speech that matches the patterns and intonation of human voices, helping developers bring their solutions to life.</p>
<p>Today, we’re building upon our Neural Text to Speech (Neural TTS) capabilities in Azure Cognitive Services with new voice styles. With the new styles—newscast, customer service, and digital assistant—developers can tailor the voice of their apps and services to fit their brand or unique scenario.</p>
<p>Built on a powerful base model, our neural TTS voices are very natural, reliable, and expressive. Through transfer learning, the neural TTS model can learn different speaking styles from various speakers, enabling nuanced voices.</p>
<p>In addition to our new voice styles optimized for specific scenarios, we are also releasing new emotion styles. These styles allow you to adjust voices to express different emotions to fit the context, like cheerfulness or empathy. Let’s dive in.</p>
<p><strong>Introducing Newscast, Customer Service, and Digital Assistant styles</strong></p>
<p><strong>Newscast</strong></p>
<p>With neural TTS voices in the newscast style, your users can enjoy listening to news or articles in a professional tone that reflects what you might hear on TV or radio newscasts.</p>
<p>Hear Aria’s (English – Female) and Xiaoxiao’s (Chinese – Female) voices in the <em>newscast</em> style:</p>
<table>
<tbody>
<tr>
<td width="442.727px" height="30px">
<p>Text</p>
</td>
<td width="150.909px" height="30px">
<p>Newscast style</p>
</td>
<td width="155.455px" height="30px">
<p>Default</p>
</td>
</tr>
<tr>
<td width="442.727px" height="139px">
<p><em>Heavy snow and strong winds hammered parts of the central U.S. on Thursday and began moving into the Great Lakes region, knocking out power to tens of thousands of people and creating hazardous travel conditions a day after pummeling Colorado.</em></p>
</td>
<td width="150.909px" height="139px"> </td>
<td width="155.455px" height="139px"> </td>
</tr>
<tr>
<td width="442.727px" height="111px">
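<p>现今,大批企业以数字化转型为战略目标,数字化转型可赋能企业重构竞争环境、满足客户期望、增强服务运营。为了真正实现“ being digital ”, 许多企业将人工智能视作实现数字化转型目标的首选技术工具之一。</p>
<p>(English: Today, many enterprises have made digital transformation a strategic goal. Digital transformation can empower enterprises to reshape the competitive landscape, meet customer expectations, and strengthen service operations. To truly achieve “being digital”, many enterprises regard artificial intelligence as one of the preferred technology tools for reaching their digital transformation goals.)</p>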
</td>
<td width="150.909px" height="111px"> </td>
<td width="155.455px" height="111px"> </td>
</tr>
</tbody>
</table>
<p>Check out the newscast style in the Bing mobile app. When you search for news with the voice search feature, you can hear news briefs read in Aria’s newscast style voice.</p>
<p>You can also check out Xiaoxiao’s newscast style voice, which has been adopted in WeChat through the Microsoft Listening Docs app. In Microsoft Listening Docs, users can hear Xiaoxiao’s voice read out multiple document types, such as Word, PowerPoint, and Excel files, as well as images. Users can easily generate audio content for online training, news podcasts, and more, and share it with their social circles.</p>
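<p>As a rough illustration of how a style such as newscast might be requested, the sketch below builds an SSML document that wraps the text in an <code>mstts:express-as</code> element. The post itself does not give the identifiers, so the voice name <code>en-US-AriaNeural</code>, the style value <code>newscast</code>, the <code>style</code> attribute name, and the <code>mstts</code> namespace URI are all assumptions; the helper <code>build_styled_ssml</code> is hypothetical. Check the linked SSML documentation for the exact values.</p>

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Assumed namespace URI for the Microsoft-specific mstts extension elements.
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_styled_ssml(text, voice, style, lang="en-US"):
    """Return an SSML document that speaks `text` with `voice` in `style`."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so the document stays well-formed
        "</mstts:express-as></voice></speak>"
    )


# Assumed identifiers: neither the voice name nor the style value is given in this post.
ssml = build_styled_ssml(
    "Heavy snow and strong winds hammered parts of the central U.S. on Thursday.",
    voice="en-US-AriaNeural",
    style="newscast",
)
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

<p>Building the markup in one small function keeps the escaping in a single place, so the same helper could be reused for the other styles described below by changing only the <code>voice</code> and <code>style</code> arguments.</p>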
<p><strong>Customer Service</strong></p>
<p>The customer service style features a friendly and engaging tone and is suitable for scenarios involving customer support, such as an individual checking into their flight, making a restaurant reservation, or reporting a claim.</p>
<p>Hear Aria’s and Xiaoxiao’s voices in the <em>customer service</em> style:</p>
<table class=" lia-align-left">
<tbody>
<tr>
<td width="378.182px" height="57px">
<p>Text</p>
</td>
<td width="196.364px" height="57px">
<p>Customer Service style&nbsp;</p>
</td>
<td width="174.545px">
<p>Default</p>
</td>
</tr>
<tr>
<td width="378.182px" height="139px">
<p><em>Alright, it’s going to be right in front of your door, within 30 minutes. Thanks for calling Pizza Loco! Have a great night!</em></p>
</td>
<td width="196.364px" height="139px">
</td>
<td width="174.545px"> </td>
</tr>
<tr>
<td width="378.182px" height="275px">
<p>客服:您好,欢迎致电智慧银行,我是您的智能客服晓晓,请问有什么可以帮您?</p>
<p>客户:你好,我想调整信用卡的额度。</p>
<p>客服:嗯,请稍等,我查询一下状态。请问您要调整到多少额度?</p>
<p>客户:帮我调到三万人民币吧。</p>
<p>客服:好的,已经给您变更成功,稍后您会收到短信提醒。</p>
<p>客户:好的,谢谢。</p>
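<p>客服:感谢您的来电,祝您生活愉快,再见。</p>
<p>(English: Agent: Hello, welcome to Smart Bank. I’m Xiaoxiao, your intelligent customer service agent. How can I help you?<br>
Customer: Hello, I’d like to adjust my credit card limit.<br>
Agent: Sure, one moment please while I check the status. What limit would you like to change it to?<br>
Customer: Please set it to 30,000 RMB.<br>
Agent: OK, the change has been made. You will receive a text message notification shortly.<br>
Customer: OK, thanks.<br>
Agent: Thank you for calling. Have a nice day. Goodbye.)</p>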
</td>
<td width="196.364px" height="275px"> </td>
<td width="174.545px"> </td>
</tr>
</tbody>
</table>
<p><strong>Digital Assistant</strong></p>
<p>Many customers have been using neural TTS voices for their digital assistant solutions. We are introducing two styles in this area: a chat style for more casual, conversational bots, and a more professional style for scenarios such as in-car digital assistants.</p>
<p>The <em>chat</em> style features a conversational tone, simulating casual dialogue.</p>
<p>Hear Aria’s voice in the <em>chat</em> style:</p>
<table>
<tbody>
<tr>
<td width="117.273px">
<p>Style</p>
</td>
<td width="289.091px">
<p>Text</p>
</td>
<td width="110.909px">
<p>Chat style</p>
</td>
<td width="102.727px">
<p>Default</p>
</td>
</tr>
<tr>
<td width="117.273px">
<p>Chat</p>
</td>
<td width="289.091px">
<p><em>Oh, well that’s quite a change from California to Utah.</em></p>
</td>
<td width="110.909px"> </td>
<td width="102.727px"> </td>
</tr>
</tbody>
</table>
<p>The <em>assistant</em> style features a friendly and helpful tone, which is suitable in scenarios such as smart speakers or in-car assistants. Use the digital assistant voice to hear the weather forecast, search for information, navigate directions, set reminders, and more.</p>
<p>Hear Xiaoxiao’s voice in the <em>assistant</em> style:</p>
<table>
<tbody>
<tr>
<td width="454.545px">
<p>Text</p>
</td>
<td width="137.273px">
<p>Assistant style</p>
</td>
<td width="157.273px">
<p>Default</p>
</td>
</tr>
<tr>
<td width="454.545px">
<p>没听到你说话,请再说一次。</p>
<p>(English: I didn’t hear you. Please say that again.)</p>
</td>
<td width="137.273px"> </td>
<td width="157.273px"> </td>
</tr>
<tr>
<td width="454.545px">
<p>现在听的是:FM88.8,江苏音乐台的节目,滴滴叭叭早上好。</p>
<p>(English: Now playing: FM88.8, the Jiangsu Music Radio program 滴滴叭叭早上好.)</p>
</td>
<td width="137.273px"> </td>
<td width="157.273px"> </td>
</tr>
</tbody>
</table>
<p><strong>Bringing new emotions to Neural Text to Speech</strong></p>
<p>To enable you to build nuanced voices for your unique scenario, Neural Text to Speech also offers different emotion styles. You can access the <em>cheerful</em> and <em>empathetic</em> styles for Aria’s voice, the <em>lyrical</em> style for Xiaoxiao’s voice (which sounds heartfelt and is optimized for reading prose or poetry), and the <em>cheerful</em> style for Francisca’s voice (Brazilian Portuguese).</p>
<p>Hear the new styles below:</p>
<table>
<tbody>
<tr>
<td width="107.273px">
<p>Style</p>
</td>
<td width="264.545px">
<p>Text</p>
</td>
<td width="132.727px">
<p>Styled voice</p>
</td>
<td width="109.091px">
<p>Default</p>
</td>
</tr>
<tr>
<td rowspan="2" width="107.273px">
<p>Cheerful</p>
</td>
<td width="264.545px">
<p><em>Great, I hope she will like it!</em></p>
</td>
<td width="132.727px"> </td>
<td width="109.091px"> </td>
</tr>
<tr>
<td width="264.545px">
<p><em>A canadense postou uma música nova no seu perfil oficial do Twitter.</em></p>
<p>(English: The Canadian posted a new song on her official Twitter profile.)</p>
</td>
<td width="132.727px"> </td>
<td width="109.091px"> </td>
</tr>
<tr>
<td width="107.273px">
<p>Empathetic</p>
</td>
<td width="264.545px">
<p><em>I want to let you know that you’re loved. I know things are hard right now and it’s OK. You don’t have to do this alone.</em></p>
</td>
<td width="132.727px"> </td>
<td width="109.091px"> </td>
</tr>
<tr>
<td width="107.273px">
<p>Lyrical</p>
</td>
<td width="264.545px">
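<p>大家晚上好,我是晓晓。在每一个夜晚来临的时候,我都在这里陪你入睡。忙碌的一天又过去了,现在的你是窝在沙发上看着窗外发呆,还是倒了一杯咖啡继续解决白天没有做完的工作呢?时间过得真快呀,在学校里咬着早餐上课,和同学们嬉戏打闹的日子,仿佛就在昨天。但一转眼,我们都穿着西装变成了大人。</p>
<p>(English: Good evening, everyone. I’m Xiaoxiao. Every night as it falls, I’m here to keep you company as you drift off to sleep. Another busy day has passed. Right now, are you curled up on the sofa staring out the window, or have you poured a cup of coffee to keep working on what you didn’t finish during the day? Time flies. The days of munching breakfast in class and playing around with classmates at school feel like only yesterday. But in the blink of an eye, we have all put on suits and become grown-ups.)</p>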
</td>
<td width="132.727px"> </td>
<td width="109.091px"> </td>
</tr>
</tbody>
</table>
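<p>To keep track of which voice is paired with which style, the small lookup below transcribes the combinations that this post describes or demonstrates. It is only a sketch: the keys and values are human-readable labels from the text, not service identifiers, and the helpers <code>voices_for_style</code> and <code>check_style</code> are hypothetical. The service may support combinations that are not listed here.</p>

```python
# Voice -> styles, transcribed from the samples and descriptions in this post.
# These are display labels from the text, not identifiers accepted by the service.
STYLES_BY_VOICE = {
    "Aria": {"newscast", "customer service", "chat", "cheerful", "empathetic"},
    "Xiaoxiao": {"newscast", "customer service", "assistant", "lyrical"},
    "Francisca": {"cheerful"},
}


def voices_for_style(style):
    """Return the sorted list of voices that this post pairs with `style`."""
    return sorted(v for v, styles in STYLES_BY_VOICE.items() if style in styles)


def check_style(voice, style):
    """Return `style`, or raise ValueError if the post does not pair it with `voice`."""
    if style not in STYLES_BY_VOICE.get(voice, set()):
        raise ValueError(f"{voice!r} is not shown with the {style!r} style here")
    return style


print(voices_for_style("cheerful"))  # ['Aria', 'Francisca']
print(voices_for_style("lyrical"))   # ['Xiaoxiao']
```

<p>A check like this can catch an unsupported pairing before any request is made, which is cheaper than discovering it from a failed synthesis call.</p>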
<p>These new voice styles are also available for customized brand voices through our <a href="https://speech.microsoft.com/customvoice" target="_blank" rel="noopener noreferrer">Custom Neural Voice</a> capability, so a unique brand voice can also use the new scenario and emotion styles. As part of Microsoft’s commitment to designing AI responsibly, we have developed guidelines for customers using Custom Neural Voice, in alignment with Microsoft’s <a href="https://www.microsoft.com/AI/our-approach-to-ai" target="_blank" rel="noopener noreferrer">principles for responsible innovation in AI</a>. Learn more about <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/concepts-gating-overview" target="_blank" rel="noopener noreferrer">the process for getting started with Custom Neural Voice</a>.</p>
<p><strong>Get Started</strong></p>
<p>Get started with the new neural TTS voice styles available in Azure Cognitive Services. Check out our <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup?tabs=csharp#adjust-speaking-styles" target="_blank" rel="noopener noreferrer">documentation</a> to learn more.</p>
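<p>For readers who want to try a style from code, the sketch below prepares an HTTP request that would carry an SSML document to the text-to-speech REST endpoint. It only builds the request and does not send it. The endpoint pattern, header names, and output format are written from memory of the Speech service REST API and should be treated as assumptions; the region <code>westus</code>, the key placeholder, the <code>User-Agent</code> value, and the helper <code>build_tts_request</code> are hypothetical.</p>

```python
import urllib.request


def build_tts_request(ssml, region, subscription_key,
                      output_format="riff-24khz-16bit-mono-pcm"):
    """Build (but do not send) a POST request carrying an SSML document."""
    # Assumed endpoint pattern for the Speech service text-to-speech REST API.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "voice-style-demo",  # hypothetical application name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


# Placeholder SSML, region, and key; replace them with real values before sending.
request = build_tts_request("<speak>...</speak>", "westus", "YOUR_KEY")
print(request.get_method(), request.full_url)
```

<p>To synthesize audio, one would pass the request to <code>urllib.request.urlopen</code> and write the response bytes to a file. Keeping construction separate from sending makes the headers easy to inspect first.</p>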
</div>


https://www.sickgaming.net/blog/2020/04/...-services/