Many people I speak to confuse voice biometrics with speech recognition, but the two are very different technologies. Speech recognition, or automatic speech recognition (ASR), takes spoken words and transcribes them into written words, and can also interpret the phrases using natural language understanding (NLU). Speech recognition is used in everything from ordering products with Alexa to transcribing agent conversations for speech analytics like sentiment scoring and topic spotting.
Voice biometrics, by contrast, analyzes the individual characteristics of a person’s speech, which are shaped by the physical structure of their vocal tract; it is used in most cases for authentication and fraud prevention. We see voice biometrics used in many industries, but mostly in the financial sector and in companies that require stricter authentication than PIN codes and one-time passcodes.
Where Did We Get Voice Biometrics?
The idea that speech could be used to recognize individuals began in 1960 with Gunnar Fant, a Swedish professor who published a model describing the physiological components of acoustic speech production, based on the analysis of x-rays of individuals making specified phonic sounds. Dr. Joseph Perkell expanded on this work in 1970. In the late 1970s and 1980s, Texas Instruments and the National Institute of Standards and Technology (NIST), with funding from the NSA, developed speech processing techniques that used analog outputs of speech to recognize individuals.
In the past 20 years, rapid advancements in AI and neural networks, along with increasing computing power, have helped bring about audio-visual speech recognition, isolated word recognition, speaker adaptation, and digital speaker recognition. These advancements have spurred a plethora of speech biometrics applications that can be implemented in mobile devices, IVRs, and agent desktops with no additional hardware required.
Where Are Voice Biometrics Now?
Organizations in finance, documentation, law enforcement, health systems, and insurance that use cloud contact center as a service (CCaaS) platforms have begun embracing speech biometrics. It helps members and customers authenticate to access secured services, and it gives agents faster, more effective ways to confirm a client’s identity. For example, one of our customers has systems in place to enroll thousands of members using their voices for account access and agent verification.
Others are using voice biometrics for fraud prevention and bad-actor identification before a customer even reaches an agent. However, as authentication has improved, fraudsters have become more creative, using recordings of individuals and text-to-speech technologies that let them not only sound like the victims but imitate them so effectively that it is hard to know if it is live or if it is Memorex (sorry all you younger ones – see this commercial).
The latest speech generation tools can use less than 14 seconds of a person’s voice to create realistic conversation snippets that are very hard to distinguish from the actual person. To fight these new threats, current biometric providers are tuning their products to analyze timbre, voice inflection, and cadence, and are adding synthetic speech detection (SSD) with new algorithms that promise 86% detection rates by discriminating the tiniest differences between synthesized and actual speech.
How Do We Use Voice Biometrics?
There are two basic uses of voice biometrics today: active and passive. Active voice biometrics is what you see in the movies when the government agent walks up to a door and says, “my voice is my passport, authenticate me,” and the door opens for him. The first requirement for active authentication is a record of the person’s speech. This is usually captured by having them repeat a phrase three times; an algorithm then analyzes those snippets and creates a model of their speech. The system then discards the recordings, which matters for privacy and for regulations on recording private parties. On subsequent calls, the customer simply repeats the phrase, and the system judges them authentic or counterfeit based on the saved model of their voice, which can account for background noise, sickness, and bad connections and still verify them.
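The enrollment and verification flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: real products extract voice features with trained models, whereas here each utterance is already reduced to a short list of numbers, and the function names (enroll, verify), the cosine-similarity comparison, and the 0.85 threshold are all hypothetical choices, not any vendor’s method.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(utterance_features):
    """Average the feature vectors of the repeated phrase into one voice model.

    Assumption: each utterance has already been turned into a feature vector.
    """
    if len(utterance_features) < 3:
        raise ValueError("expected at least three repetitions of the phrase")
    count = len(utterance_features)
    return [sum(values) / count for values in zip(*utterance_features)]


def verify(voice_model, attempt_features, threshold=0.85):
    """Accept the caller if the new utterance is close enough to the model.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return cosine_similarity(voice_model, attempt_features) >= threshold


# Enrollment: three repetitions of the phrase (made-up feature values).
repetitions = [[0.9, 0.1, 0.4], [1.0, 0.2, 0.5], [1.1, 0.0, 0.3]]
voice_model = enroll(repetitions)
del repetitions  # keep only the model, not the captured utterances

# Later calls: the customer repeats the phrase and is compared to the model.
genuine_ok = verify(voice_model, [0.95, 0.12, 0.41])
impostor_ok = verify(voice_model, [0.1, 0.9, 0.2])
print(genuine_ok, impostor_ok)
```

Averaging several repetitions is one simple way to smooth out variation between utterances; where the threshold sits is a trade-off between rejecting genuine callers and accepting impostors.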
The other use of voice biometrics is passive. A passive system can, for example, listen to a conversation between a customer and an agent and build a similar model of the customer’s voice from just 20-30 seconds of speech. On later calls, it can authenticate the caller from as little as 10 seconds of their voice, without them repeating a given phrase. Each approach has advantages and disadvantages, but both are valid for authenticating customers before they access secured systems or account information. Agents are usually shown a validity score or a stop-light indicator so they can easily assess the authenticity of the caller.
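To make the agent-facing score concrete, here is a minimal sketch of how a passive system might turn a match score into a stop-light color. The 10-second minimum comes from the figure mentioned above; everything else (the PassiveResult type, the stoplight function, the 0.85 and 0.50 cut-offs, and showing yellow while too little speech has been heard) is an assumption made for illustration only.

```python
from dataclasses import dataclass

# From the text above: as little as 10 seconds of speech can be enough.
MIN_VERIFY_SECONDS = 10.0


@dataclass
class PassiveResult:
    score: float           # hypothetical match score between 0.0 and 1.0
    speech_seconds: float  # caller speech heard so far on this call


def stoplight(result, green_at=0.85, red_below=0.50):
    """Map a passive match result to the color shown on the agent desktop.

    The cut-off values are hypothetical and would be tuned per deployment.
    """
    if result.speech_seconds < MIN_VERIFY_SECONDS:
        return "yellow"  # not enough speech yet to decide either way
    if result.score >= green_at:
        return "green"
    if result.score < red_below:
        return "red"
    return "yellow"


for result in (
    PassiveResult(score=0.92, speech_seconds=12.0),
    PassiveResult(score=0.92, speech_seconds=4.0),
    PassiveResult(score=0.30, speech_seconds=15.0),
    PassiveResult(score=0.70, speech_seconds=15.0),
):
    print(result, "->", stoplight(result))
```

Keeping a middle band between the two cut-offs gives agents a clear cue to fall back on other verification steps when the system is unsure.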
Who Is Making Voice Biometric Technology?
There are many companies working on or selling software for voice/speech biometric applications including, but not limited to:
Each vendor provides active authentication, passive authentication, or both, and many have pre-built integrations with contact center solutions like Genesys Cloud, Amazon Connect, Cisco, and others. When looking for a provider, it is important to ask for evidence of successful integrations, the costing model (per customer or per authentication), and the support model, as these can vary widely depending on the method and the overall solution offered.
TTEC Digital has extensive experience and great success with several of the companies in the market and can assist in evaluations and integrations with your contact center solution.