The State of AI Models: Performance, Cost, and Applications
Not all large language models are created equal!
By Martin Holste · March 31, 2025
I still get the newspaper delivered. I don’t know why I get a kick out of an analog news delivery device, but I read it every day. Last week, it suddenly shrank considerably. It went from around 30-40 pages to about 8, feeling more like a restaurant menu than a newspaper. Why? Ransomware. My local paper (or, more precisely, its corporate conglomerate) was taken down by a crime gang. Unable to pay the ransom, they lost access to all of the typical publishing tools they relied upon and had to start laying out the paper by hand. Add them to the ever-growing list of businesses that I, as a consumer, have received breach notifications from, lost service from, or otherwise heard about being hacked.
So how does a shiny new technology like generative AI keep my local newspaper from shrinking? It comes down to knowing when criminals have first entered an environment and booting them out before they can gain the foothold they need to launch a ransomware attack. We measure this ability as mean time to detection (MTTD), and this is where GenAI is changing the game and giving defenders a chance to turn things around.
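To make the metric concrete, here is a minimal sketch of how MTTD could be computed. It assumes each incident is recorded as a pair of timestamps (when the attacker first entered and when the activity was detected); the function name and sample data are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta

def mean_time_to_detection(incidents):
    """Average the gap between initial intrusion and detection.

    `incidents` is a list of (entered_at, detected_at) datetime pairs.
    This record layout is an assumption made for the example.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    gaps = (detected - entered for entered, detected in incidents)
    return sum(gaps, timedelta()) / len(incidents)

# Hypothetical incidents: one detected after 4 hours, one after 8 hours.
incidents = [
    (datetime(2025, 3, 1, 9, 0), datetime(2025, 3, 1, 13, 0)),
    (datetime(2025, 3, 5, 22, 0), datetime(2025, 3, 6, 6, 0)),
]

mttd = mean_time_to_detection(incidents)
print(mttd)  # 6:00:00
```

The lower this number, the less time an intruder has to establish a foothold before being evicted.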
Here's a breakdown: GenAI can automate the investigation process, finding critical alerts and assessing their severity to know the instant an attacker is in the environment. This auto-investigation involves several key steps:
- Determining "good" or "bad" activity: GenAI analyzes whether an activity, like using PowerShell, is malicious or benign.
- Identifying involved parties: GenAI examines user profiles, IP addresses, and standard tool usage to understand who is involved and what their roles are.
- Defining "normal" behavior: GenAI understands what tools and activities are typical for users and can flag deviations.
- Reconstructing the sequence of events: GenAI pieces together the story of what happened based on the available evidence.
- Making decisions: GenAI evaluates all factors and determines the appropriate response.
This auto-investigation capability helps in quickly identifying and prioritizing critical alerts, ensuring that security teams can focus on the most pressing issues, which is how defenders have a chance of finding and evicting ransomware criminals before they can do damage.
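As a rough illustration of the "normal behavior" step above, the sketch below builds a per-user baseline of tools and flags anything outside it. This is a simple set-membership stand-in, not a description of how a GenAI model or Trellix performs the analysis; the user names, tool names, and function names are all hypothetical.

```python
from collections import defaultdict

def build_baseline(history):
    """Map each user to the set of tools they have been seen using."""
    baseline = defaultdict(set)
    for user, tool in history:
        baseline[user].add(tool)
    return baseline

def flag_deviations(baseline, events):
    """Return events where a user runs a tool outside their baseline."""
    return [
        (user, tool)
        for user, tool in events
        if tool not in baseline.get(user, set())
    ]

# Hypothetical activity history: (user, tool) pairs.
history = [
    ("alice", "excel.exe"),
    ("alice", "outlook.exe"),
    ("bob", "powershell.exe"),
    ("bob", "ssh.exe"),
]

# New events to assess. PowerShell is routine for bob but not for alice.
events = [("alice", "powershell.exe"), ("bob", "powershell.exe")]

baseline = build_baseline(history)
flagged = flag_deviations(baseline, events)
print(flagged)  # [('alice', 'powershell.exe')]
```

The same PowerShell event is benign for one user and worth a closer look for another, which is why context about who is involved matters as much as the activity itself.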
But how can you trust that AI is making the right security decision for you? This is where expertise in different generative AI models is key, because the wrong model can make the wrong decision.
Choosing the right large language models (LLMs)
First, let’s cover what we mean by generative AI models. A large language model (LLM) is an artificial intelligence (AI) system that can understand and generate human language. LLMs are trained on large amounts of data using machine learning techniques. The ability to generate new content in the same manner as a human would with cognitive thought is where the “GenAI” term comes from.
Different LLMs have different strengths and weaknesses, and so having an evaluation framework for choosing the right model is the only way to ensure that you’re using the right model for the job. At Trellix, we use a purpose-built system in which we test different prompts and models and have AI evaluate the responses to decide which is best.
Our framework measures the level of detail in each analysis and the correctness of the responses, then weighs both against the price of running the model. Based on these evaluations, we’ve chosen the following models:
Anthropic's Claude Sonnet: When we need the most thorough, knowledgeable, and complex answers, Claude is our model of choice. It performs detailed analysis with incredible skill, and it performs machine-level tasks like decoding and info lookups better than a human.
Amazon Nova Micro: For tasks that require formatting, straightforward decisions, and quick analysis, Nova Micro is the perfect fit. Its price-to-performance ratio lets us use generative AI in situations that would be otherwise cost-prohibitive.
Amazon Nova Lite: When it comes to coding and automating tasks, such as generating new plugins or enhancing existing functions, Nova Lite takes the lead. It lets us create new content at will.
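The trade-off between detail, correctness, and price can be sketched in a few lines. This is not the Trellix evaluation system, which the post does not describe in detail. The scoring formula, the threshold, the weights, and the model names and numbers below are all hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class ModelResult:
    name: str           # placeholder label, not a real model
    detail: float       # 0..1, how thorough the analysis was
    correctness: float  # 0..1, how often the conclusion was right
    cost_usd: float     # assumed cost per 1,000 analyses

def pick_model(results, min_correctness=0.8, detail_weight=0.3):
    """Pick the best quality-per-dollar model above a correctness floor."""
    eligible = [r for r in results if r.correctness >= min_correctness]
    if not eligible:
        return None

    def value(r):
        quality = (1 - detail_weight) * r.correctness + detail_weight * r.detail
        return quality / r.cost_usd

    return max(eligible, key=value)

# Hypothetical evaluation results.
results = [
    ModelResult("model-a", detail=0.9, correctness=0.95, cost_usd=12.0),
    ModelResult("model-b", detail=0.6, correctness=0.85, cost_usd=1.0),
    ModelResult("model-c", detail=0.7, correctness=0.55, cost_usd=0.5),
]

best = pick_model(results)
print(best.name)  # model-b
```

Raising `min_correctness` or `detail_weight` for a demanding task shifts the choice toward the more capable, more expensive model, which mirrors the idea of using different models for different jobs.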
What about the other models out there? How do they compare against our selections – and against each other? Our evaluations find that the quality of analyses can range from superhuman performance to being flat out incorrect. Here’s a recent benchmark rating the model analyses compared with a human:

What does this difference in model performance look like in a real-world security scenario? Here is a comparison of three models' responses when performing a task beyond what a human can do: decoding base64-encoded text.
Claude Sonnet 3.5 V2
The base64 decoded PowerShell script is checking and managing Windows Update settings
- Script appears to be managing the conflict between MDM and GPO update settings
- This appears to be legitimate MDM automation
- The sequence of events shows expected MDM management activities
- Other events (4688, 4662, 4670) are consistent with system management tasks
- The base64 encoding is a standard practice for AirWatch/Workspace ONE
In this example, Claude Sonnet 3.5 V2 has gone beyond what a human can do by decoding the command on the fly, and it arrives at a better-than-human decision.
Gemini 1.5 Pro
The long base64 encoded command line argument is highly suspicious. The decoded PowerShell script (provided in the supporting info) checks a registry value related to updates and then, if configured to allow automatic updates, removes a registry key related to Windows Updates (`HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate`) and restarts the `wuauserv` service. Disabling automatic updates is a common tactic used by attackers to prevent systems from patching vulnerabilities. This is not standard behavior for MDM software.
Here we see Gemini 1.5 Pro perform a detailed decoding and analysis, but then make a critical error by arriving at the wrong conclusion. Disabling automatic updates is in fact standard behavior for MDM software, because the MDM software controls the updates itself.
Nova Pro
- The presence of base64-encoded strings in PowerShell scripts is common for configuration data or encoded commands in legitimate scenarios.
- The data suggests routine system maintenance and configuration activities. The processes and commands align with standard administrative tasks.
- There is no clear indication of anomalous behavior beyond the use of base64 encoding, which is not uncommon in scripting.
Nova Pro arrives at the right conclusion, but it has to assume what was inside the encoded command.
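For readers who want to see what the decoding step involves, here is a minimal sketch. PowerShell's `-EncodedCommand` parameter takes a base64 string of UTF-16LE text, so decoding is a two-step operation. The sample command below is an illustrative stand-in, not the script from the alert discussed above, and the function name is hypothetical.

```python
import base64

def decode_powershell_command(encoded: str) -> str:
    """Decode a PowerShell -EncodedCommand argument.

    The argument is base64 over UTF-16LE text, so decode the base64
    first and then interpret the bytes as UTF-16LE.
    """
    raw = base64.b64decode(encoded, validate=True)
    return raw.decode("utf-16-le")

# Build an illustrative encoded command the same way PowerShell expects it.
sample_script = "Restart-Service wuauserv"
encoded = base64.b64encode(sample_script.encode("utf-16-le")).decode("ascii")

decoded = decode_powershell_command(encoded)
print(decoded)  # Restart-Service wuauserv
```

A model that actually performs this step can reason about what the command does, rather than guessing from the mere presence of encoding.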
How Trellix Wise uses GenAI models
So how does all of this analysis of model efficacy benefit you? In 2024 Trellix introduced Trellix Wise, our capability of leveraging GenAI in the Trellix Security Platform. Built on over a decade of AI modeling and 25 years in threat intelligence, analytics, and machine learning, Trellix Wise capabilities relieve alert fatigue and surface stealthy threats, ensuring no threat is missed.
It enhances Trellix Managed Detection and Response (MDR) capabilities, with pre-training that focuses on valuable detections rather than requiring analysts to figure out effective prompts for chatbots. The platform offers a differentiated approach, with extensive third-party integrations, that leverages GenAI to address high-priority use cases such as ransomware and identity theft, where the speed and efficacy of Trellix Wise provide a crucial advantage.
Generative AI is at the heart of Trellix Wise, enabling automated investigations and decision-making. It can:
- Automatically investigate alerts: Determining the severity and scope of potential threats.
- Understand context: Recognizing normal behavior versus malicious activity.
- Create detailed stories: Providing a complete picture of what happened.
- Make decisions: Escalating or deprioritizing alerts based on comprehensive analysis.
By automating alert triage, investigation, and response, Trellix Wise enables security teams to work more efficiently and effectively. As the threat landscape continues to evolve, AI will become increasingly critical in defending against sophisticated attacks. With its rich history of innovation and commitment to AI-driven security, Trellix Wise is best positioned to lead the way into the future.
To experience Trellix Wise for yourself, take our interactive self-guided product tour.
If you’re interested in a security prompt engineering challenge, we have a capture-the-flag workshop that’s both fun and a great way to understand how LLMs work with security data. Contact us today at ai@trellix.com to schedule a custom one for your organization!
