Like many companies that handle Threat Intelligence data on a large scale, we have developed a number of in-house systems to help us identify and manage threat indicators (colloquially known as “indicators of compromise”, or “IOCs” for short). Some time ago, while reviewing some third-party solutions to potentially enhance our capabilities, I observed (and initially was surprised at finding) a substantial difference in capabilities between competing products. Or at least I thought they were competing products, until I realized that our use case for this technology isn’t the only use case for it, nor likely even the leading use case for it.
I spend a lot of time identifying IOCs related to malware. One of my goals is to “find all the IOCs” related to a given malware family. Typically, my ultimate goal is to identify the “command and control” (“C2”) locations, in order to report those in our feeds. In order to find as many C2s as I can, and to associate them with a given malware family, it is critical for me to identify unique attributes that exist within the given malware family. I’ll show you some reasons why in the example below.
Let me give an older example, using a malware family known as Pony (or Fareit). Pony is designed to search the computer for saved credentials (username/password/resource location), extract them, and send them to a URL (in this case, its C2), typically configured and operated by a criminal. It is also designed to (optionally) receive additional commands from the C2, such as to download and execute software (typically other malware) from other URLs.
Without spending too much time further explaining Pony and all of its iterations over the years (I assume you’re busy!), here are some example IOCs that it has been associated with at some point in the past:
- Performs an HTTP POST to the C2, using a user-agent associated with Internet Explorer 5, running on Windows 98
- Attempts to read the value of the HWID registry key in HKCU\Software\WinRAR
- Attempts to access files where specific FTP clients store saved passwords
You might ask “if all that you need to report on is the C2 location, why is it important to know what registry values it queries, or what files it drops?” And I’d be glad that you asked that, so that I can answer it, else this blog post would end rather awkwardly right here.
While there’s a possibility that some other malware will come along and contact its C2 in the same manner, the chances that it also performs the same filesystem and registry activity are somewhat smaller. (And frankly, if it performs all the same actions, but isn’t actually Pony, does it matter all that much? My thought is if it walks like a Pony, and quacks like a Pony… err, scratch that.) A more realistic false-positive scenario is some different malware that connects to a C2 and receives an instruction to perform a DDoS attack against a target, perhaps using an HTTP POST flood, and, while using spoofed user-agent headers, sends the one often associated with Pony. I certainly don’t want to report the DDoS victim as being a Pony C2! So, identifying as much as I can about a malware family, especially from multiple sources, such as sandbox reports, static analysis and memory artifacts, not only helps me find as many C2s as possible, but also drastically reduces the rate of false positives in our data.
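To make that reasoning concrete, here is a minimal sketch of the idea, not our actual tooling. It only reports a POST destination as a candidate Pony C2 when the same sandbox run also shows at least one of the host-side behaviors listed earlier. The `SandboxReport` structure, the function names, the user-agent substrings and the FTP client file names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Illustrative values only (assumptions); real signatures come from your own analysis.
PONY_UA_MARKERS = ("MSIE 5", "Windows 98")
WINRAR_KEY_PREFIX = "hkcu\\software\\winrar"
FTP_CREDENTIAL_FILES = ("wcx_ftp.ini", "sitemanager.xml")


@dataclass
class SandboxReport:
    """Hypothetical, simplified view of one sandbox run."""
    http_requests: list = field(default_factory=list)   # dicts: method, url, user_agent
    registry_reads: list = field(default_factory=list)  # registry paths that were queried
    file_reads: list = field(default_factory=list)      # file paths that were opened


def host_indicator_count(report: SandboxReport) -> int:
    """Count how many of the two host-side behaviors were observed."""
    reads_hwid = any(
        path.lower().startswith(WINRAR_KEY_PREFIX) and path.lower().endswith("hwid")
        for path in report.registry_reads
    )
    reads_ftp_store = any(
        path.lower().endswith(FTP_CREDENTIAL_FILES) for path in report.file_reads
    )
    return int(reads_hwid) + int(reads_ftp_store)


def candidate_c2_urls(report: SandboxReport, min_host_indicators: int = 1) -> list:
    """Return POST destinations only when host-side evidence backs them up."""
    if host_indicator_count(report) < min_host_indicators:
        return []  # network pattern alone is not enough (e.g. a POST flood)
    return [
        req["url"]
        for req in report.http_requests
        if req["method"].upper() == "POST"
        and all(marker in req["user_agent"] for marker in PONY_UA_MARKERS)
    ]


ua = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
pony_like = SandboxReport(
    http_requests=[{"method": "POST", "url": "http://c2.example/gate.php", "user_agent": ua}],
    registry_reads=[r"HKCU\Software\WinRAR\HWID"],
)
post_flood = SandboxReport(
    http_requests=[{"method": "POST", "url": "http://victim.example/", "user_agent": ua}],
)
print(candidate_c2_urls(pony_like))   # the POST destination is reported
print(candidate_c2_urls(post_flood))  # nothing is reported
```

The second report has the same network fingerprint as the first, but no registry or filesystem evidence, so the DDoS victim never makes it into the candidate list.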
No, my ADHD hasn’t sidetracked me; the Pony example actually does relate to my initial point about various use cases. You see, while we have tools that I can use to get IOCs, times change, and thus, needs change. Modern malware often utilizes various forms of anti-analysis, so when you encounter these and your existing tools don’t handle something well, you need to decide how to handle the new challenge. Do you:
- Upgrade a tool you already have?
- Create a new tool?
- Find a new tool?
- Hire the band TOOL?
So, this one time (not at band camp), I was looking at different sandbox solutions, both open source and commercial, thinking that perhaps one (or more) had already solved the challenge for this specific malware family, and if so, all I need to do is implement (open source) or purchase (commercial) the technology. In particular, I wanted something that would at minimum solve this problem, do it in a cost-effective manner (in money and in time), and do it quickly and efficiently. And that’s where I came across (what I thought were) competing technologies that varied considerably in features and behavior.
Some sandbox solutions are focused on the kind of data that I’m looking for: that is, all of it. I don’t care about scoring algorithms, machine learning (ML), artificial intelligence (AI), or, to some extent, speed. I simply want reliable IOC extraction, scaled to handle at least 500,000 samples daily. The scaling part we have; the IOC extraction is what needs to continuously evolve to handle the latest threats.
Other sandbox solutions are focused more on quickly determining if something is a threat, and how much of a threat. These often utilize scoring, and might also include AI/ML. Solutions like this tend to be geared less towards my need, and more towards perhaps a SOC or an appliance. One example usage handled well by this type of solution: Perhaps an unknown email attachment arrives, and the mail gateway needs to determine whether to deliver it to the end user. If 99% of attachments in this scenario were malicious, businesses would likely tolerate a solution that takes 10-15 minutes to determine that; however, that (99% malicious) is unlikely to be the case. The speed of business often requires much less delay, so a sub-60-second decision is optimal. The mail gateway and the SOC analyst don’t (typically) need to know what files are created upon execution, what registry/mutex updates would be made, etc. They just need to know that they must not let that attachment reach its destination mailbox. (If it did reach the destination, then knowing the IOCs becomes far more important, as then you can likely detect if a successful infection occurs.)
The lesson here is, be well aware of what it is that you are trying to accomplish, down to the finer details. If I were to jump at the cheapest, easiest-to-implement sandbox solution that could handle 500,000 samples a day, I might find myself needing to invest months of my time modifying it to perform memory analysis, extract IOCs from static and memory analysis, and introduce that data into its report. Plus, we’d need it to output the report in a format that our existing processes can work with. It might be far better to keep our process the same, but add a license for a commercial product that allows for 1,000 samples per day, but can give me what I need, without much development time. I can update our process to determine what to send to that commercial product (in this case, the samples that I cannot currently get IOCs for in an automated manner), and then write a script to translate its report into what we need it to look like.
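As a rough sketch of that last idea, assuming a hypothetical `Sample` record and a made-up vendor report layout (neither reflects any real product's format), the routing and translation steps might look something like this:

```python
from dataclasses import dataclass

COMMERCIAL_DAILY_QUOTA = 1_000  # per-day license limit from the example above


@dataclass
class Sample:
    """Hypothetical record for one sample in the existing pipeline."""
    sha256: str
    iocs_extracted: bool  # True if the in-house tools already produced IOCs


def select_for_commercial(samples, already_sent_today=0, quota=COMMERCIAL_DAILY_QUOTA):
    """Pick only samples we could not handle ourselves, capped by the license."""
    remaining = max(quota - already_sent_today, 0)
    unresolved = [s for s in samples if not s.iocs_extracted]
    return unresolved[:remaining]


def translate_report(vendor_report: dict) -> dict:
    """Map an assumed vendor report layout onto an assumed internal format."""
    network = vendor_report.get("network", {})
    return {
        "sha256": vendor_report["sha256"],
        "c2_candidates": sorted(set(network.get("urls", []))),
        "registry_reads": list(vendor_report.get("registry", [])),
    }


batch = [Sample("aaa", True), Sample("bbb", False), Sample("ccc", False)]
to_send = select_for_commercial(batch, already_sent_today=999)
print([s.sha256 for s in to_send])  # only one slot is left in today's quota
```

The point of the design is that the existing process stays untouched: only the samples it fails on are forwarded, and the translation step keeps everything downstream unaware that a second sandbox exists.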
These considerations are helpful when choosing a Team Cymru solution as well. If you find yourself wanting to block access to known botnet controllers on your corporate network, and your network uses a proxy that can deny access, perhaps our Controller Feed would be of benefit to you. Or, if you perform risk assessment for an eCommerce or online banking site, and want to gauge if a login attempt or online purchase needs further assessment before allowing it to continue, our Reputation Feed would be a better fit for you. But if you’re a group of threat hunters or intelligence analysts, tracking activities and movement of threat actors, there is no substitute for accessing Team Cymru’s Pure Signal™ via the Augury™ analysts’ portal.
A note on the DDoS false-positive scenario above: yes, this happens, and it’s not unusual. (At least it wasn’t that unusual, back when infected Windows computers were commonly used as sources of DDoS attacks. But I digress.)