The use of AI in phishing emails
Sep. 30, 2024
What you'll learn in this article
- AI tools enable threat actors to generate well-constructed and contextually accurate emails.
- Overall, generative AI emails have a polished and professional tone, making them more convincing.
- Analyst investigations should increasingly look for words and phrases associated with generative AI models and not just the sender information and payloads.
AI-Generated Messages
Threat actors are increasingly leveraging generative AI technologies to craft phishing emails. These AI-generated attacks are harder to identify due to improved language fluency and tone mimicry. When we interviewed Mimecast threat researchers for our recent Threat Intelligence report, questions came up about how pervasive AI is in phishing emails, but no metrics were available to quantify it. Our data science team took on the challenge by building a detection engine that determines whether a message is human- or AI-generated, based on a mixture of current and historical emails and synthetic AI-generated emails.
The research identifies the point at which Mimecast began observing an upward trend in AI-generated emails, which correlates with the release of ChatGPT. We also observed malicious AI-generated BEC, fraud, and phishing emails. Analysts are advised to stay vigilant, as these attacks are expected to increase in both volume and complexity.
The full details of our findings can be explored further in this blog post.
Telltale signs of AI-generated emails
One of the most notable characteristics of AI language models is the use of complex words and sentence structures. Researchers found that AI language models favor certain words in scientific writing: “Analyzing 14 million papers from 2010-2024, they noticed a sharp increase in specific ‘style words’ after late 2022, when AI tools became widely available. For example, ‘delves’ appeared 25 times more often in 2024 than before.”
Mimecast’s data science team set out to train a model to distinguish human-written from AI-written emails. In total, over 20,000 emails from Mimecast’s data, coupled with LLM-generated synthetic data, were used. We then sampled 1,000 emails per month from January 2022 to June 2024 to identify how many were AI-written. Of the 30,000 emails analyzed, 2,330 were AI-written, representing 7.8% of all emails in the dataset. The results are shown in figure 1, which also highlights the increase in AI-written emails.
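As a quick sanity check on the sampling figures above, the arithmetic can be reproduced in a few lines. This is only an illustrative sketch: the constants come from the text, while the helper name `months_inclusive` is made up for this example and is not part of Mimecast's tooling.

```python
# Figures taken from the paragraph above.
SAMPLE_PER_MONTH = 1000
START = (2022, 1)   # January 2022
END = (2024, 6)     # June 2024
AI_WRITTEN = 2330   # emails classified as AI-written

def months_inclusive(start, end):
    """Number of calendar months from start to end, counting both ends."""
    (start_year, start_month), (end_year, end_month) = start, end
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

months = months_inclusive(START, END)
total_sampled = months * SAMPLE_PER_MONTH
ai_share = AI_WRITTEN / total_sampled

print(months)              # 30
print(total_sampled)       # 30000
print(f"{ai_share:.1%}")   # 7.8%
```

Note that 7.8% is the share across the whole 30-month sample; the monthly share varies over the sampling period, and the article reports an increase in AI-written emails.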
Targets
Global
Examples of AI-Generated Emails
During the analysis, several malicious examples were found containing distinctive language indicative of AI tools.
Example 1 - AI-generated spam message
Indicators:
- "delves into the intricacies of", "navigating through the complexities of"
- Overuse of bullets
Example 2 - AI-generated BEC message
Indicators:
- ‘I hope this message finds you well’
- Repetition of the words ‘gift cards’ and ‘surprise’
Example 3 - AI-generated BEC message
Indicators:
- ‘Hello!’
Example 4 - AI-generated phishing message
Indicators:
- ‘delve deeper into this’
- ‘stumbled’ or ‘stumbled upon’
- Frequent use of the long dash (em dash), which is common in ChatGPT output
Recommendations
These findings indicate that manual phishing investigations remain a crucial layer of defense, especially for messages flagged by end users. It's vital that threat researchers scrutinize the language for specific markers that align with our findings. By cross-referencing indicators such as “delve deeper into this” or “hello!”, particularly from users who don’t commonly use such language, with known threat patterns, you can identify phishing threats more effectively, reducing remediation time and mitigating organizational risk.
Security teams should also ensure that the indicators used in their investigations evolve alongside large language models, which are continually changing.
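To illustrate the kind of cross-referencing described above, the following is a minimal sketch of a phrase-indicator check. It is not Mimecast's detection engine. The pattern list is drawn from the examples in this article, and the function names, the sample messages, and the idea of comparing against a sender's earlier emails are assumptions made purely for illustration. A hit only means a message is worth a closer look; it does not prove the message is malicious or AI-generated.

```python
import re
from collections import Counter

# Indicator phrases taken from the examples in this article.
# Illustrative only: the list needs regular review as language models change.
INDICATOR_PATTERNS = {
    "delve": r"\bdelv(?:e|es|ed|ing)\b",
    "finds you well": r"\bi hope this message finds you well\b",
    "hello!": r"\bhello!",
    "long dash": "\u2014",  # em dash character
    "navigating complexities": r"\bnavigating through the complexities of\b",
    "stumbled": r"\bstumbled(?:\s+upon)?\b",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in INDICATOR_PATTERNS.items()}

def find_indicators(body: str) -> Counter:
    """Count how often each indicator pattern occurs in an email body."""
    hits = Counter()
    for name, pattern in COMPILED.items():
        count = len(pattern.findall(body))
        if count:
            hits[name] = count
    return hits

def unusual_for_sender(body: str, sender_history: list[str]) -> list[str]:
    """Return indicators found in body that never appear in the sender's earlier emails."""
    baseline = Counter()
    for earlier in sender_history:
        baseline.update(find_indicators(earlier))
    return sorted(name for name in find_indicators(body) if name not in baseline)

# Hypothetical messages, invented for this example.
history = ["Hi team, invoices attached. Thanks, Sam", "Can we move the call to 3pm?"]
suspect = ("Hello! I hope this message finds you well. "
           "Let us delve deeper into this \u2014 I need gift cards today.")

print(unusual_for_sender(suspect, history))
# ['delve', 'finds you well', 'hello!', 'long dash']
```

Keeping the patterns in a single dictionary is a deliberate choice in this sketch: it makes the indicator list easy to revise as the language produced by these models shifts.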