Authoring for Pluralsight – Microsoft Azure Cognitive Services: Speech to Text SDK


I am creating a new course for Pluralsight titled – Microsoft Azure Cognitive Services: Speech to Text SDK. If you would like to check out my other courses, they can be found on my author’s profile. Here is the breakdown for the course:

Audience Profile

This course targets software developers who want to get started with the Microsoft Azure Cognitive Services Speech to Text API and build modern AI solutions using its simple REST interface and robust set of device SDKs.

Abstract

With AI becoming more and more ubiquitous, it is important to quickly and easily integrate with AI services. This course will show how to create modern applications using Microsoft Azure Cognitive Services: Speech to Text API and SDKs.

Prerequisites

This course assumes viewers are familiar with C# and understand REST APIs and JSON.

Microsoft Azure Cognitive Services: Text to Speech API – Published!


My new Pluralsight course, Microsoft Azure Cognitive Services: Text to Speech API, has just been published. You can find it here. If you would like to check out my other courses, you can find them on my author’s profile. Here is the course synopsis:

Short description:
In this course, you will gain a foundational knowledge of the Text to Speech API that will help you move forward with your overall understanding of the Microsoft Cognitive Services Suite.
 
Long description:
With AI becoming more and more ubiquitous in application development, it is important to quickly and easily integrate intelligence into your application. In this course, Microsoft Azure Cognitive Services: Text to Speech API, you will learn how to understand, configure, and utilize the Text to Speech API. First, you will discover how to use out-of-the-box voices. Next, you will explore how to use machine-learning-based voices in your app. Finally, you will learn how to create and use custom voices for your application and brand. When you are finished with this course, you will have a foundational knowledge of the Text to Speech API that will help you move forward with your overall understanding of the Microsoft Cognitive Services Suite.
 
Tags for this course:
Audience/Roles: software-development
Topics/Subjects: cloud-platforms
Tools: azure-cognitive-services

Authoring for Pluralsight – Microsoft Azure Cognitive Services: Text to Speech API

I’m excited to announce that I am authoring another course for Pluralsight. This course targets software developers who want to get started with the Microsoft Azure Cognitive Services Text to Speech API and build modern AI solutions using its simple REST interface. It continues the Cognitive Services track, alongside the other Cognitive Services courses already published and in development.

Abstract

With AI becoming more and more ubiquitous, it is important to quickly and easily integrate with AI services. This course will show how to create modern applications using Microsoft Azure Cognitive Services: Text to Speech API with JavaScript, C#, Java, C++, and Python.

Prerequisites

This course assumes viewers are familiar with C#, Java, JavaScript, Python, or C++ and understand REST APIs and JSON.

Description

Contoso is an insurance company that has decided to integrate text to speech into multiple consumer-facing applications. This course will take a look at utilizing the following features of the Cognitive Services – Text to Speech API:

  • Default API interface through multiple SDKs: JavaScript, C#, Java, C++, and Python
  • Creating custom voice fonts
  • Popular scenarios and use cases for Text to Speech

 

 

Hacking Izon Cameras and using Azure IoT Edge

After Izon announced that they were closing down their services (leaving the cameras I already owned useless), I decided to turn them into something useful using Azure. First let me list some resources:

Use the Will it hack link to get access to the mobileye website and verify that the Izon device is still streaming and working. If it is, you are done making changes to the device unless you would like to change the passwords (which you should).

Our goals are as follows:

  • Process the video feed from the Izon camera (we will cheat this early on and only use the image feed)
    • Check for motion
    • Check for faces
    • Check if faces are whitelisted
    • Check for my dog
  • Process the audio feed
    • Check for any noise
    • Check for non-human noises
    • Check for dog barks
    • Check for my and my wife’s voice

These are all stretch goals that will be referred back to as the project moves forward.

Create the Azure IoT Edge module

For the first module, we will use the C Module base image. We are looking for two things from this module:

  • Download the picture feed and pass it to the Edge Hub
  • Download the audio feed and pass it to the Edge Hub

If you don’t know where to get started with the C module of the Azure IoT Edge platform, there is helpful information on the Azure Documentation page. Once the C module is created and ready for editing, we are going to connect to the image feed from the devices. To make this simple, both feeds will be retrieved using HTTP. For the video feed, it’s simple enough to grab images from the Izon camera’s existing image feed.

One thing we need is to be able to connect to each camera on the local network shared with the Edge device. Since we would like to be able to add and remove cameras, we will use the device twin to update and manage the list of IP addresses. The code for updating the list is as follows:
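A minimal sketch of that update logic looks something like the following. It assumes the desired properties carry an array of IP strings under a property named cameraIpAddresses (the property name and MAX_CAMERAS limit are illustrative), and it parses the JSON with parson, which ships with the Azure IoT C SDK:

#include <stdlib.h>
#include <string.h>

#include "iothub_module_client_ll.h"
#include "parson.h"

#define MAX_CAMERAS 16

/* Current list of camera IP addresses, managed via the device twin. */
static char* cameraIps[MAX_CAMERAS];
static size_t cameraCount = 0;

/* Called whenever the module twin changes; rebuilds the camera list
   from the "cameraIpAddresses" desired property. */
static void moduleTwinCallback(DEVICE_TWIN_UPDATE_STATE updateState,
                               const unsigned char* payload, size_t size,
                               void* userContextCallback)
{
    (void)userContextCallback;

    /* The payload is not null terminated, so copy it into a C string first. */
    char* json = malloc(size + 1);
    if (json == NULL) return;
    memcpy(json, payload, size);
    json[size] = '\0';

    JSON_Value* root = json_parse_string(json);
    JSON_Object* rootObject = json_value_get_object(root);

    /* A full twin wraps the properties in "desired"; a patch does not. */
    JSON_Object* desired = (updateState == DEVICE_TWIN_UPDATE_COMPLETE)
        ? json_object_get_object(rootObject, "desired")
        : rootObject;

    JSON_Array* addresses = json_object_get_array(desired, "cameraIpAddresses");
    if (addresses != NULL)
    {
        /* Throw away the old list and rebuild it from the twin. */
        for (size_t i = 0; i < cameraCount; i++) free(cameraIps[i]);
        cameraCount = 0;

        size_t count = json_array_get_count(addresses);
        for (size_t i = 0; i < count && i < MAX_CAMERAS; i++)
        {
            const char* ip = json_array_get_string(addresses, i);
            if (ip != NULL) cameraIps[cameraCount++] = strdup(ip);
        }
    }

    json_value_free(root);
    free(json);
}

The callback gets registered once at startup with IoTHubModuleClient_LL_SetModuleTwinCallback(moduleHandle, moduleTwinCallback, NULL).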

With that code in place, the list of IP addresses can be updated from the Azure UI and the Azure Service SDKs.

Downloading from the Image feed

The Izon cameras make downloading the image feed trivial. There is an existing endpoint where you can grab the latest image directly from the camera’s web server. The latest image is always at /cgi-bin/img-d1.cgi. (NOTE: if you are checking this image from a browser, be sure to have some cache busting!) To download this image into our module, we will use the curl library for its easy HTTP implementation. To add curl to our Edge module, we will add the following lines to Dockerfile.amd64.debug:
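Something like the following should work, assuming the Debian/Ubuntu-based base image used by the C module template (substitute the equivalent libcurl development package if your base image differs):

RUN apt-get update && \
    apt-get install -y --no-install-recommends libcurl4-openssl-dev && \
    rm -rf /var/lib/apt/lists/*

The module also has to link against curl at build time, so (assuming the template’s CMake setup) curl gets added to the target_link_libraries entry in CMakeLists.txt.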

With curl now added to the image, it can be utilized in code by adding it to the method invoked in our main loop. The code will download the latest image for each entry in the IP address list. Once the image is downloaded, the module will send it as a message to the Edge Hub and add the IP address of the camera to the message properties. Here is that code:
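Below is a rough sketch of that download-and-forward step. It assumes the cameraIps list maintained by the twin callback above is passed in along with an open module client handle; the output name imageOutput is illustrative and needs a matching route in the deployment manifest:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <curl/curl.h>

#include "iothub_module_client_ll.h"
#include "iothub_message.h"
#include "azure_c_shared_utility/map.h"

typedef struct IMAGE_BUFFER_TAG
{
    unsigned char* data;
    size_t size;
} IMAGE_BUFFER;

/* libcurl write callback: append each received chunk to an in-memory buffer. */
static size_t writeChunk(void* contents, size_t size, size_t nmemb, void* userp)
{
    size_t chunkSize = size * nmemb;
    IMAGE_BUFFER* buffer = (IMAGE_BUFFER*)userp;

    unsigned char* grown = realloc(buffer->data, buffer->size + chunkSize);
    if (grown == NULL) return 0; /* tells curl to abort the transfer */

    buffer->data = grown;
    memcpy(buffer->data + buffer->size, contents, chunkSize);
    buffer->size += chunkSize;
    return chunkSize;
}

/* Download the latest image from every camera and forward each one
   to the Edge Hub, tagging the message with the camera's IP address. */
static void downloadAndForwardImages(IOTHUB_MODULE_CLIENT_LL_HANDLE moduleHandle,
                                     char* cameraIps[], size_t cameraCount)
{
    for (size_t i = 0; i < cameraCount; i++)
    {
        char url[256];
        snprintf(url, sizeof(url), "http://%s/cgi-bin/img-d1.cgi", cameraIps[i]);

        IMAGE_BUFFER buffer = { NULL, 0 };
        CURL* curl = curl_easy_init();
        if (curl == NULL) continue;

        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeChunk);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
        CURLcode result = curl_easy_perform(curl);
        curl_easy_cleanup(curl);

        if (result == CURLE_OK && buffer.size > 0)
        {
            IOTHUB_MESSAGE_HANDLE message =
                IoTHubMessage_CreateFromByteArray(buffer.data, buffer.size);
            if (message != NULL)
            {
                /* Record which camera the image came from. */
                Map_AddOrUpdate(IoTHubMessage_Properties(message), "cameraIp", cameraIps[i]);

                /* The SDK copies the message, so it can be destroyed right away. */
                IoTHubModuleClient_LL_SendEventToOutputAsync(
                    moduleHandle, message, "imageOutput", NULL, NULL);
                IoTHubMessage_Destroy(message);
            }
        }
        free(buffer.data);
    }
}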

Downloading from the Audio feed

Now that the image feed is being published to the Edge Hub, it’s time to connect the audio feed. The audio feed is trickier since the Izon camera doesn’t have an easy-to-use endpoint (that I know of) for downloading audio samples the way we can with the image feed. In the next entry in this series, an audio feed will be derived from an RTSP stream.

 

CodeMash 2019 – Alternative Device Interfaces and Machine Learning

I was once again accepted to speak at CodeMash. This year I will be presenting – Alternative Device Interfaces and Machine Learning. If you would like to purchase tickets, they are for sale. Here is what is going to be covered:

Alternative Device Interfaces and Machine Learning

In this presentation, we will look at how users interface with machines without the use of touch. These different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions, we will explore voice commands with mobile applications, speech recognition, and computer vision. After this presentation, attendees will have the knowledge to create applications that can utilize voice, video, and machine learning.

Quicken Loans TechCon 2018

On September 20th at Cobo Center in Detroit, I will be presenting:

Alternative Device Interfaces and Machine Learning

Abstract

In this presentation, we will look at how users interface with machines without the use of touch. These different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions, we will explore voice commands with mobile applications, speech recognition, and computer vision. After this presentation, attendees will have the knowledge to create applications that can utilize voice, video, and machine learning.

Description

Users already use voice (Alexa, Cortana, Google Now) or video as a mode of interaction with applications. More than a fad, this is a natural interface for users, and it is becoming more and more common with the ever-decreasing size of hardware.

Different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions we will explore:

  • Voice commands with two app types: UWP and Xamarin Forms (iOS and Android)
  • Speech Recognition with Cognitive Services: verifying the speaker with Speaker Recognition API
  • Computer Vision with Cognitive Services: verifying a user with Face API

By utilizing UWP, Xamarin, and Cognitive Services, we will create a device with the ultimate in customization for user interactions. Come and see how!

Update – TechBash

UPDATE:

Another one of my talks was selected for TechBash: Alternative Device Interfaces and Machine Learning.

In this presentation, we will look at how users interface with machines without the use of touch. These different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions, we will explore voice commands with mobile applications, speech recognition, and computer vision. After this presentation, attendees will have the knowledge to create applications that can utilize voice, video, and machine learning.

Users already use voice (Alexa, Cortana, Google Now) or video as a mode of interaction with applications. More than a fad, this is a natural interface for users, and it is becoming more and more common with the ever-decreasing size of hardware.

Different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions we will explore:

  • Voice commands with two app types: UWP and Xamarin Forms (iOS and Android)
  • Speech Recognition with Cognitive Services: verifying the speaker with Speaker Recognition API
  • Computer Vision with Cognitive Services: verifying a user with Face API

By utilizing UWP, Xamarin, and Cognitive Services, we will create a device with the ultimate in customization for user interactions. Come and see how!

Original:

This year I will be presenting Enable IoT with Edge Computing and Machine Learning at TechBash. Here is the outline:

Being able to run compute cycles on local hardware is a practice predating silicon circuits. Mobile and web technology pushed computation away from local hardware and onto remote servers, and as cloud prices have decreased, more and more of those remote servers have moved into the cloud. This technology cycle is now coming full circle, pushing computation that would otherwise be done in the cloud back down to the client. The catalysts for completing the cycle are latency and cost: running computations on local hardware lightens the load on the cloud and reduces overall cost and architectural complexity.

The difference now is how the computational logic is sent to the device. Today, we rely on app stores and browsers to deliver the logic the client will use. Delivery mechanisms are evolving toward writing code once, running that logic in the cloud, and pushing it to the client through your application so it can run on the device. In this presentation, we will look at how to accomplish this with existing Azure technologies and how to prepare for upcoming technologies to run these workloads.

 

DevNet Create

I’m proud to be presenting Alternative Device Interfaces and Machine Learning at DevNet Create this year. With AI becoming more and more ubiquitous, it is important to consider its effect on the user’s experience. This presentation shows how to create modern applications using machine learning provided by a third party and showcases what some third parties provide.

In this presentation, we will look at how users interface with machines without the use of touch. These different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions, we will explore voice commands with mobile applications, speech recognition, and computer vision. After this presentation, attendees will have the knowledge to create applications that can utilize voice, video, and machine learning.

Users already use voice (Alexa, Cortana, Google Now) or video as a mode of interaction with applications. More than a fad, this is a natural interface for users, and it is becoming more and more common with the ever-decreasing size of hardware.

Different types of interaction have their benefits and pitfalls. To showcase the power of these user interactions we will explore:

  • Voice commands with two app types: UWP and Xamarin Forms (iOS and Android)
  • Speech Recognition with Cognitive Services: verifying the speaker with Speaker Recognition API
  • Computer Vision with Cognitive Services: verifying a user with Face API

By utilizing UWP, Xamarin, and Cognitive Services, we will create a device with the ultimate in customization for user interactions. Come and see how!