Convert Audio to Metadata: Microsoft Cognitive Toolkit & KnowledgeLake

One of the most exciting things about KnowledgeLake cloud capture and Capture Server Professional is their extensibility. Previously, we’ve discussed the platform’s ability to retrieve taxonomy and to route and upload content to different Enterprise Content Management (ECM) systems. We’ve even demonstrated how to create a custom activity that lets users define sensitive keywords they may want to obscure in batches.

(If you’re curious, feel free to contact us. Or, sign up to watch this solution in action.)

Today, we’re using built-in tools and features that let users quickly add custom workflows or add-ons to fit business processes, or even meet a unique demand such as transforming audio into metadata.

In this blog post, I will demonstrate how you can add a custom workflow step to Capture Server Professional by leveraging the Microsoft Cognitive Toolkit. This step opens recorded audio files from a batch, converts them into text, and then adds that text to the documents in the batch as metadata for quick archival and search.

To create a custom activity, all you need to do is define a PowerShell script block that is called whenever the activity is processed. The script for this example looks through all of the documents in the current batch, determines whether each document contains an audio file, and converts the contents of any audio file it finds to text.
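The batch walk described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PowerShell script block the activity actually uses, and it does not call any KnowledgeLake API: the `Document` class, the `AUDIO_EXTENSIONS` set, the `Transcription` metadata field name, and the `transcribe` callback are all hypothetical stand-ins. It also assumes audio files can be recognized by file extension, which may not be how the real script detects them.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Assumption: audio documents are recognized by file extension.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg"}


@dataclass
class Document:
    """Hypothetical stand-in for a document in a capture batch."""
    path: str
    metadata: Dict[str, str] = field(default_factory=dict)


def is_audio(doc: Document) -> bool:
    """Return True when the document's file extension marks it as audio."""
    return Path(doc.path).suffix.lower() in AUDIO_EXTENSIONS


def convert_batch(batch: List[Document], transcribe: Callable[[str], str]) -> int:
    """Transcribe every audio document in the batch and store the text as metadata.

    Returns the number of documents that were transcribed.
    """
    converted = 0
    for doc in batch:
        if not is_audio(doc):
            continue  # leave non-audio documents untouched
        # "Transcription" is a hypothetical metadata field name.
        doc.metadata["Transcription"] = transcribe(doc.path)
        converted += 1
    return converted
```

Passing the transcriber in as a callback keeps the batch loop independent of whichever speech service does the conversion, which also makes the loop easy to exercise with a stub.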

To do this, the script makes an HTTP POST request to Microsoft’s speech recognition engine and passes the bytes of the audio file as the body of the request. The response from the engine contains an array of possible interpretations of the audio as text strings, each with an associated confidence value. The results are sorted from highest to lowest confidence by default, so the script simply uses the first result. That string is then added to the originating document’s metadata.
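The request-and-select step could look something like the sketch below, written in Python for illustration rather than the PowerShell used by the actual activity. The endpoint URL, the `Content-Type` value, the subscription-key header, and the response shape (a `results` array whose items carry `name` and `confidence` fields) are assumptions made for this example, not details confirmed by the post; check the service documentation for the real values.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint; substitute the real speech recognition URL.
ENDPOINT = "https://speech.example.invalid/recognize"


def build_request(audio_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build an HTTP POST whose body is the raw bytes of the audio file."""
    return urllib.request.Request(
        ENDPOINT,
        data=audio_bytes,
        method="POST",
        headers={
            "Content-Type": "audio/wav",           # assumed audio format
            "Ocp-Apim-Subscription-Key": api_key,  # assumed auth header
        },
    )


def best_transcription(response_body: str) -> Optional[str]:
    """Return the text of the first result, or None if there are no results.

    Assumes the service already sorts results from highest to lowest
    confidence, so the first entry is the most likely interpretation.
    """
    results = json.loads(response_body).get("results", [])
    return results[0]["name"] if results else None


def transcribe(audio_bytes: bytes, api_key: str) -> Optional[str]:
    """Send the audio to the service and return the top interpretation."""
    with urllib.request.urlopen(build_request(audio_bytes, api_key)) as resp:
        return best_transcription(resp.read().decode("utf-8"))
```

Because `best_transcription` trusts the service's ordering, a more defensive variant could sort the results by `confidence` before taking the first one.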


After defining this script block in a PowerShell session, we can add a new activity definition that we will call “Convert Speech to Text”. The new Convert Speech to Text activity definition then appears inside the Process Designer app and can be added to any process. The example process simply contains the new Convert Speech to Text activity followed by a Send to Folder activity. After creating and saving the process, we can upload audio files using either the Upload app or a scheduled Import Job, and the transcription of the audio will be appended to the new document’s metadata. This metadata can then easily be found within the content management system.

Thanks to the ease of use of Capture Server Professional and the Microsoft Cognitive Toolkit Speech Recognition API, I was able to create a new workflow step that interprets audio data and adds the converted transcriptions to document metadata using a lean 30 lines of PowerShell. Using these and other tools built into KnowledgeLake cloud capture and Capture Server Professional, anyone can quickly make powerful and flexible customizations to their content management system that meet business needs and create new innovations.

Discover how converting audio to metadata can impact your organization and explore the other possibilities of KnowledgeLake Capture Server Professional!



