A number of recent botnets and advanced threats use HTTP as their primary communications channel with their control servers. McAfee Labs research during the last couple of years reveals that more than 60 percent of the top botnet families depend on HTTP. These numbers have increased significantly over the last few quarters. The following pie chart shows the predominance of HTTP in botnet communications.
Why is HTTP so popular? One reason is that HTTP is always allowed on the network perimeter. Because the malicious traffic blends well with legitimate HTTP traffic, it’s hard to differentiate and impossible for network security devices to block malicious traffic unless there is a known signature for it.
The limits with the traditional signature-based approach drive security researchers to shift focus to behavioral-based approaches, but the question remains: What network behaviors should the security devices look for?
Botnets typically work in a “pull” fashion; they continuously fetch commands from the control server, either at fixed intervals or at stealth level. Once connected, they usually “phone home” via HTTP GET/POST requests to a specific server resource (URI). Subsequently, a botnet might execute a command sent by the server or sleep for fixed interval before it tries again. Security researchers can perhaps leverage this connection behavior.
An infected machine connecting periodically to a control server is automated traffic. We need to draw a line between automated and human-initiated traffic as well as between control server responses and legitimate server responses. We can rely on a few facts:
- It is abnormal for most users to connect to a specific server resource repeatedly and at periodic intervals. There might be dynamic web pages that periodically refresh content, but these legitimate behaviors can be detected by looking the server responses.
- The first connection to any web server will always have response greater than 1KB because these are web pages. A response size of just 100 or 200 bytes is hard to imagine under usual conditions.
- Browsers will send the full HTTP headers in the request unless it comes from a man-in-the-middle attack
The preceding points give us a way we can look for specific behaviors on the network: Repetitive connections to the same server resource over HTTP.
If we monitor a machine under idle conditions–when the user is not logged on and the host does not generate a high volume of traffic–we can distinguish botnet activity with a high level of accuracy.
Under these conditions, if the machine was infected with the Zeus botnet, for example, the traffic from the infected machine would look like this:
Notice that Zeus connects to one control domain and keeps running HTTP POST every six seconds to a specific server resource. Algorithmically, while idle, we’d deem a host’s activity repetitively suspicious under these conditions:
- The number of unique domains a system connects to is less than a certain threshold
- The number of unique URIs a system connects to is less than a certain threshold
- For each unique domain, the number of times a unique URI is repetitively connected to is greater than a certain threshold
Assuming the volume of traffic from the host is less, If we take the preceding conditions in a window of say two hours, we might come up with following:
- Number of unique domains = 1 (less than the threshold)
- Number of unique URIs connected = 1 (less than the threshold)
- For each unique domain, the number of times a unique URI is repetitively connected to = 13 (greater than threshold)
All of the thresholds can be set as configurable parameters, depending on typical traffic on a network. The following traffic pattern shows the behavior of the SpyEye botnet. The repetitive activity here occurs every 31 seconds as it connects to a specific resource.
However, the solution does not mandate that repetitive activity should be seen at these fixed intervals. If we choose to monitor within a larger window. We could detect more stealthy activities. The following flowchart represents a possible sequence of operations.
The first few checks are important to determine whether the host is talking too much. First, Total URI > Threshold determines that we have enough traffic to look into. Next, Total Domain access >/= Y determines that the number of domains accessed is not too large. The final check is to see if Total unique URIs < Z. The source ends up on the suspicious list if we believe it has generated repetitive connections.
For instance, if the Total URIs = 30, Total Domain access = 3, and Total Unique URI accessed = 5, we guarantee a repetitive URI access from the host. Now if the number of repetitive accesses to any particular URI crosses the threshold (for example, 1 URI accessed 15 times within a window), we can further examine the connection and apply some of heuristics to increase our confidence level and eliminate false positives. Some heuristics we can apply:
- Minimal HTTP headers sent in the request
- Absence of UA/referrer headers
- Small server responses lacking structure of usual web page
Let’s look at an example of SpyEye sending minimal HTTP headers without a referrer header:
We implemented a proof of concept for this approach and detected repetitive activity over the network with relative ease.
We applied this approach to several top botnet families that exhibit this behavior. We found we could detect them with a medium to high level of confidence.
Behavioral detection methods will be key to detecting next-generation threats. Given the complexity and sophistication of the recent advanced attacks, such detection approaches can address threats proactively–without waiting for signature updates–and will prove to be much faster.