1. The perils of AI-enabled voice cloning

Financial institutions have realized that AI is a double-edged sword: powerful enough to deliver real gains in operational efficiency, and proven enough to harm society and swindle victims out of their funds.

AI now plays a vital role in our day-to-day lives: it reduces manual effort and channels our energies toward higher-value activities. Equally, AI plays a vital role in the fraudster's life, upgrading the techniques used to scam victims and siphon off their funds.

A clear threat to the financial system is the voice cloning scam, which has become highly prominent, with millions of dollars already lost to this abuse of AI technology.

It is the responsibility of every financial institution, in collaboration with enablers such as consulting and technology firms, to invest proactively in identifying mitigating solutions that prevent deepfake voice cloning scams.

Institutions that demonstrate a proactive stance against AI-driven digital scams will not only enhance their brand and their reputation as intelligent, forward-thinking firms, but also expand their customer base and achieve profitable growth.

A few notable metrics and reports:

  1. A McAfee Labs survey found that 53% of adults share their voice online at least once a week.
  2. The Australian Cyber Security Centre (ACSC) has warned that nearly 240,000 Australians reported scams, with a total of $568 million lost to AI-enabled scams.
  3. The risk posed by synthetic voices has convinced 91% of US banks to reconsider their use of voice verification in favor of voice biometrics and identity-centric solutions, according to a BioCatch survey.
  4. New data from Starling Bank shows that 28% of UK adults say they have already been targeted by an AI voice cloning scam at least once in the past year. The same data revealed that nearly half of UK adults (46%) have never heard of AI voice cloning scams and are unaware of the danger.
  5. A recent report from the Federal Trade Commission (FTC) notes that the growth of artificial intelligence has made it easier for criminals to clone voices and create near-perfect requests that sound just like people you know. In fact, according to FTC reports, the U.S. lost $2.7 billion to imposter scams in 2023 alone. These include scammers pretending to be the government, your bank's fraud department, a technical support expert, or a distressed friend or relative.
  6. In 2023, the FBI reported a 14% increase in telephone scam complaints filed by adults over the age of 60, with losses rising from US$3.1 billion in 2022 to US$3.4 billion a year later.
  7. News surfaced that cyber attackers are exploiting AI-based voice technology to deceive individuals. A recent study indicates that India has the highest number of victims, with 83% of Indian victims losing money in such fraudulent activities.

2. Industry collaboration coupled with our AI innovation lab 

Given the darknet markets' capacity to supply criminals with data in any format, at any time, to execute large-scale scams and frauds seamlessly, the best way for financial institutions to counter fraud rings is to form public-private partnerships (PPPs).

Realizing this, we at Wipro have collaborated with industry leaders to exchange ideas, and have independently invested our time and effort to develop sophisticated deep-learning models that combat AI-generated voice scams.

With our mission statement, ‘the business drives the purpose, and the purpose drives the business’, our clear strategy is to partner with our clients, through proof-of-value (PoV) exercises, to develop models that prevent cloned voice scams in real time.

Because fraudsters constantly introduce new ways of scamming victims, we stay relevant by upgrading our AI models, introducing multiple variables to predict changes in customer behavior and transaction patterns.

Our AI models are built to study voice data and behavioral patterns such as changes in voice pattern, changes in breathing (a sense of urgency), difficulty speaking, and unusual shifts or drops in the voice, to mention a few; a sketch of such acoustic features appears below. Our in-depth domain and consulting knowledge, coupled with AI technology acumen, helps us think ahead to address contemporary industry challenges.
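
To make the feature idea concrete, here is a minimal, illustrative sketch (not Wipro's production feature set) of extracting acoustic cues such as pitch variability and pause/breathing behavior from a call recording. It assumes librosa and numpy are available; the file name caller.wav is hypothetical.

```python
import librosa
import numpy as np

def extract_voice_features(path: str, sr: int = 16000) -> dict:
    """Compute simple acoustic cues from one audio clip."""
    y, sr = librosa.load(path, sr=sr)

    # Pitch track: unusual variability can hint at caller stress or at
    # artifacts left behind by a voice synthesizer.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7")
    )
    f0 = f0[~np.isnan(f0)]

    # Short-term energy: long low-energy runs approximate pauses/breaths,
    # a rough proxy for the "sense of urgency" cue described above.
    rms = librosa.feature.rms(y=y)[0]
    pause_ratio = float(np.mean(rms < 0.1 * rms.max()))

    return {
        "pitch_mean_hz": float(np.mean(f0)) if f0.size else 0.0,
        "pitch_std_hz": float(np.std(f0)) if f0.size else 0.0,
        "pause_ratio": pause_ratio,
        "duration_s": float(len(y) / sr),
    }

print(extract_voice_features("caller.wav"))  # hypothetical input file
```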

3. Our approach towards model development and solution build

We have successfully developed optimized and efficient models to identify cloned voices, addressing both human- and machine-generated speech.

We built a multi-attention LSTM model and a fine-tuned Vision Transformer (ViT) using our voice sample data.

In our experimentation, both models demonstrated strong real-time cloned voice identification, achieving up to 93% accuracy on the training data and 90% accuracy on unseen data.
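
As one hedged illustration of the LSTM approach, the sketch below combines an LSTM encoder with multi-head attention pooling to classify a sequence of spectrogram frames as genuine or cloned. It is an assumption of how such a model could look in PyTorch, not Wipro's actual architecture, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class MultiAttentionLSTM(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 128, heads: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, 2)    # genuine vs. cloned

    def forward(self, x):                       # x: (batch, frames, n_mels)
        seq, _ = self.lstm(x)                   # (batch, frames, 2*hidden)
        attended, _ = self.attn(seq, seq, seq)  # self-attention over time
        pooled = attended.mean(dim=1)           # average over frames
        return self.head(pooled)                # class logits

model = MultiAttentionLSTM()
logits = model(torch.randn(8, 300, 80))         # 8 clips, 300 frames each
print(logits.shape)                             # torch.Size([8, 2])
```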

We trained and validated the models using various open-source datasets, which contributed to these results. However, when we applied the models to more realistic, real-world data, accuracy dropped to around 70%.

The efficiency and effectiveness of the models depend on continuous training with improved data quality, that is, diverse data sourced from different practical environments, with the aim of raising accuracy to 95%.

By leveraging external, real-world data (rather than open-source datasets), we aim to make the models more robust and adaptable to real-world scenarios.

4. Model Execution Plan: Development & Deployment

The steps followed so far in the development process are summarized below:

a) Building a robust data pipeline – Original data along with cloned data collection

b) Continue to explore avenues to benchmark our work against industry best standards

c) Continue to explore various models to fine-tune efficiency and accuracy

d) API endpoints created in Azure (a hypothetical invocation sketch follows below)
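
For illustration, a client might call such an endpoint as follows. The URL, key, and payload schema are assumptions made for this sketch, not the actual deployed service.

```python
import base64
import requests

ENDPOINT = "https://voice-clone-detect.example.azure.com/score"  # hypothetical
API_KEY = "..."                                                  # placeholder

# Send one audio clip as base64-encoded bytes and read back a verdict.
with open("caller.wav", "rb") as f:
    payload = {"audio_b64": base64.b64encode(f.read()).decode()}

resp = requests.post(ENDPOINT, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"},
                     timeout=30)
resp.raise_for_status()
print(resp.json())   # e.g. {"label": "cloned", "score": 0.97}
```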

Stage 1: Data Collection & Preparation

  1. Our approach towards data collection:
    • Collect diverse real-world data from various sources (e.g., audio, images, text) to ensure a broad range of training data.
    • Gather data that closely mirrors the target environments for improved model generalization.
  2. Synthetic Data Generation Using GAN Models:
    • Develop and implement Generative Adversarial Networks (GANs) to generate synthetic data that enhances and supplements the real-world dataset.
    • Focus on ensuring the synthetic data closely mimics real user data to maintain model performance (a toy training sketch follows this list).
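
The sketch below shows the adversarial training loop at its smallest: a generator learns to produce flattened spectrogram patches while a discriminator learns to tell them from real ones. It is a toy illustration of the GAN idea, not the production generator, and the random tensor standing in for real data is a placeholder.

```python
import torch
import torch.nn as nn

LATENT, PATCH = 64, 80 * 40   # noise size; flattened 80x40 spectrogram patch

G = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                  nn.Linear(256, PATCH), nn.Tanh())
D = nn.Sequential(nn.Linear(PATCH, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, PATCH) * 2 - 1   # placeholder for real voice patches

for step in range(100):
    # Discriminator step: push real toward 1, generated toward 0.
    fake = G(torch.randn(32, LATENT)).detach()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes real.
    fake = G(torch.randn(32, LATENT))
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(f"final d_loss={d_loss.item():.3f} g_loss={g_loss.item():.3f}")
```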

Stage 2: Model Development & Enhancement

  1. Development of GenAI Algorithm:
    • Design and build a Generative AI algorithm capable of cloning real user data based on reference text inputs.
    • Ensure the algorithm can generate realistic data that reflects the user's unique characteristics.
  2. Enhancement of ViT & LSTM Multi-Attention Models:
    • Integrate new, diverse training datasets to refine the performance of the Vision Transformer (ViT) and Long Short-Term Memory (LSTM) multi-attention models.
    • Use the enhanced dataset to ensure the models can handle a wide range of real-world scenarios (a fine-tuning sketch follows this list).
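
A hedged sketch of the ViT side: fine-tune a pretrained Vision Transformer on spectrograms rendered as images for the two-class task. It assumes torchvision's vit_b_16; the frozen-backbone strategy and the placeholder batch are illustrative choices, not the production recipe.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads = nn.Linear(model.hidden_dim, 2)   # new genuine/cloned head

# Freeze the backbone and train only the new head as a first pass.
for p in model.parameters():
    p.requires_grad = False
for p in model.heads.parameters():
    p.requires_grad = True

opt = torch.optim.AdamW(model.heads.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch: spectrograms resized to 3x224x224 like natural images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

model.train()
loss = loss_fn(model(images), labels)
opt.zero_grad(); loss.backward(); opt.step()
print(f"one fine-tuning step, loss={loss.item():.3f}")
```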

Stage 3: Model Validation

  1. Model Validation on Real-Time Recorded Data:
    • Validate the performance of the enhanced models by testing them on real-time recorded data.
    • Measure accuracy and ensure that the models achieve at least 90% accuracy in practical, real-world scenarios (see the evaluation sketch after this list).
  2. Development of GenAI Algorithm for Cloning User Data
    • Input Type: Text-based reference data (e.g., user-specific descriptions or metadata). 
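
The evaluation step itself is simple to express; the sketch below computes accuracy over a labeled batch and checks the 90% bar. The stand-in classifier and random data are placeholders so the snippet runs on its own.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def evaluate(model: nn.Module, clips: torch.Tensor,
             labels: torch.Tensor) -> float:
    """Fraction of clips whose predicted class matches the label."""
    model.eval()
    preds = model(clips).argmax(dim=1)
    return (preds == labels).float().mean().item()

# Stand-in classifier and data purely for illustration.
model = nn.Sequential(nn.Flatten(), nn.Linear(300 * 80, 2))
clips = torch.randn(100, 300, 80)        # 100 clips, 300 frames x 80 mels
labels = torch.randint(0, 2, (100,))     # 0 = genuine, 1 = cloned

acc = evaluate(model, clips, labels)
print(f"accuracy={acc:.2%}; meets 90% bar: {acc >= 0.90}")
```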

Stage 4: Model Deployment

  1. Deployment of Models & API Creation:
    • Deploy the trained and validated models.
    • Develop an endpoint API that allows external applications to interact with the models.
    • Ensure the API is scalable, secure, and efficient for consumption by different platforms (a serving sketch follows this list).
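
One plausible shape for such an endpoint is sketched below with FastAPI; the route, payload schema, and score_clip helper are hypothetical stand-ins for the deployed service.

```python
import base64

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="voice-clone-detector")

class ScoreRequest(BaseModel):
    audio_b64: str                       # base64-encoded WAV bytes

def score_clip(wav_bytes: bytes) -> float:
    """Placeholder: decode audio, extract features, run the model."""
    return 0.5

@app.post("/score")
def score(req: ScoreRequest) -> dict:
    wav = base64.b64decode(req.audio_b64)
    p = score_clip(wav)
    return {"label": "cloned" if p >= 0.5 else "genuine", "score": p}

# Run locally with: uvicorn app:app --port 8000
```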

Stage 5: Integration & Accessibility

  1. API Integration as SaaS Solution:
    • Design the API to be easily integrated into various Software-as-a-Service (SaaS) platforms.
    • Enable seamless integration with third-party solutions, allowing businesses and developers to embed the models into their systems with minimal effort.
  2. Solution Availability via SDK & UI:
    • Package the solution as a Software Development Kit (SDK) for developers to directly incorporate the model into their applications (a thin client sketch follows this list).
    • Develop a user-friendly interface (UI) for non-technical users to access and interact with the model's capabilities without requiring programming knowledge. 
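
A thin SDK wrapper over the API might look like the sketch below; the class name, endpoint path, and response schema are hypothetical.

```python
import base64
import requests

class VoiceCloneClient:
    """Minimal client wrapper around the (hypothetical) /score endpoint."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def score_file(self, path: str) -> dict:
        with open(path, "rb") as f:
            payload = {"audio_b64": base64.b64encode(f.read()).decode()}
        r = requests.post(f"{self.base_url}/score", json=payload,
                          headers=self.headers, timeout=30)
        r.raise_for_status()
        return r.json()

client = VoiceCloneClient("https://api.example.com", "...")  # placeholders
print(client.score_file("caller.wav"))                       # hypothetical file
```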

Stage 6: Monitoring & Continuous Improvement

  1. Ongoing Monitoring & Model Refinement:
    • Continuously monitor the model's performance and user feedback after deployment.
    • Incorporate additional real-world data and feedback to further enhance the model’s robustness and performance (a monitoring sketch follows this list).
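
As a simple illustration of the monitoring loop, the sketch below tracks rolling accuracy against analyst-confirmed labels and raises a retraining alert when it drifts; the window size and threshold are illustrative assumptions.

```python
from collections import deque

WINDOW, ALERT_BELOW = 500, 0.85
recent = deque(maxlen=WINDOW)   # 1 if prediction matched the confirmed label

def record_outcome(predicted: str, confirmed: str) -> None:
    """Log one scored call once an analyst confirms the true label."""
    recent.append(int(predicted == confirmed))
    if len(recent) == WINDOW:
        acc = sum(recent) / WINDOW
        if acc < ALERT_BELOW:
            print(f"ALERT: rolling accuracy {acc:.2%} below "
                  f"{ALERT_BELOW:.0%}; consider retraining on fresh data")

record_outcome("cloned", "cloned")   # example call
```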

These development stages helped Wipro achieve the following:

a) A unique real-time solution to identify cloned voice scams

b) The ability to address our existing and prospective customers' requirements to identify deepfake voice cloning scams and frauds

We continue to sharpen the solution through our hyperscaler collaboration with Microsoft Azure, and to take it to clients to address their needs in mitigating voice biometrics challenges.

About the Authors

Venkatesh Balasubramaniam

Dr. Gopichand Agnihotram

Joudeep Sarkar

Srilakshmi Subramanian