Towards a multi-criteria clustering approach as a tool to support attack attribution in cyberspace

Understanding the existing and emerging threats on the Internet should help us to effectively protect the Internet economy, our information systems and the net citizens. This assertion may look blindingly obvious to many people. However, things are less evident when looking more closely at the problem. Among security experts, there is at least one thing on which everybody agrees: combating cyber-crime is becoming harder and harder. Recent threat reports published by major security companies have also acknowledged that the cyber-crime scene is becoming increasingly organized and consolidated.

There is obviously more at stake than just technical challenges, hacking fame, or digital vandalism. Money is at stake. In recent years, it has often been reported that cybercriminals are building and maintaining an underground economy, which offers the commoditization of activities such as the sale of 0-day exploits or new malware, the sale of compromised hosts, spamming and phishing resources, the sale of stolen credentials, etc. [54, 158]. In most cases, these illegal and profitable activities are enabled by gaining control over botnets [11, 34, 126, 10] comprising thousands or even millions of machines, with many of those computers belonging to innocent home users. The worldwide spam problem is also largely due to those groups of compromised computers under the control of cyber-criminal organizations. According to the 2009 Annual Security Report of MessageLabs [94], the annual average spam rate was 87.7% of all intercepted messages (an increase of 6.5% w.r.t. 2008), with 83.4% of this spam volume originating from botnets alone. As analyzed by SecureWorks [151], in 2008 the top botnets were collectively capable of sending over 100 billion spam messages per day. Today, this figure has further increased to 107 billion spam messages per day [94].

Perhaps even more worrying, the analysis of recent “cyber conflicts”, such as the presumed cases related to Estonia and Georgia [3, 37, 41], has led experts to the conclusion that botnets can be easily turned into digital weapons. Botnets can be used by cybercriminals (or dissidents) to attack the network resources of a country by performing Distributed Denial-of-Service (DDoS) attacks against critical web services (e.g., DNS servers, network routers, government or financial websites, etc.), which can lead to substantial economic or financial loss. A deep understanding of the long-term behavior of those new armies, and of their evolution, is thus a vital requirement to effectively combat those latent threats [168].

Recently, we have also observed the monetization of another type of illegal activity through the propagation and distribution of rogue anti-virus software. Using social engineering, but also some highly sophisticated techniques (such as the exploitation of client-side vulnerabilities or the compromise of legitimate web sites), cyber-crooks are able to distribute rogue AV programs, from which they generate a substantial profit [159, 112, 36]. The business model of those miscreants is presumed to be an affiliate-based structure, with per-installation prices for affiliate distributors. The volume of profits generated for those cyber-criminals is impressive: earnings of as much as $332,000 a month were reported in affiliate commissions alone, as observed on the distribution website TrafficConverter.biz.

Since 2003, there seems to have been a shift in the nature of attacks on the Internet, from server-side to client-side attacks and from fast-spreading worms to profit-oriented activities like identity theft, fraud, spam, phishing, online gambling, and extortion. Most of those illegal activities are supported by several large botnets controlled by criminal organizations. All the facts and figures presented in public threat reports are certainly valuable and help to shed some light on those cyber-criminal phenomena, but a lot of unknowns remain. Current analysis techniques do not allow us to automatically discover new relevant knowledge about attack phenomena, certainly not from a strategic viewpoint. Today, there is still a knowledge gap between what we believe to be happening and what we can actually observe and prove scientifically. Even if there are some plausible indicators about the origins, causes, and consequences of these new malicious activities, very few claims can be backed up by scientific evidence. The main reason is that no global threat analysis framework exists to rigorously investigate emerging attacks using different data sources and different viewpoints.

All previously described issues are related to a common security problem often referred to as “attack attribution”. The main contribution of this thesis consists in developing an analytical method in order to systematically address the problem of attack attribution, i.e., how to attribute (apparently) different attacks to a common root cause, based on the combination of all available evidence.

By root cause, we do not refer to the identification of a given machine that has launched one specific, isolated attack. Instead, we are interested in having a better idea of the various individuals, groups or communities (of machines) that are responsible for large-scale attack phenomena. A method for attack attribution must also enable a precise analysis of the modus operandi of the attackers. As a result, it will also help an analyst to gain better insights into how cybercriminals operate in the real world, and into the strategies they are using.

Conceptually speaking, this problem comes down to mining a very specific dataset, made of attack traces that are presumably representative of the various phenomena under scrutiny. To address the problem, we have thus focused on clustering and classification techniques applied to attack events, which are enriched with metadata and contextual information. By grouping them based upon a series of common elements, we hope to be able to derive semantically rich models but also to effectively associate new phenomena with previous ones. We will use all sorts of metadata related to the attacks in order to group, as much as possible and in a meaningful way, the observed phenomena. Examples of information that could possibly help such an analysis are the origins of the attacks, their timing, their spreading strategy, their coding style, their behavior, etc. These various characteristics will also be referred to as attack features.
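
To make the idea of grouping attack events by several attack features more concrete, here is a minimal Python sketch. It is not the method developed in the thesis: the event data, the field names (`origins`, `active_days`), the Jaccard similarity, the weighted average used to combine features, and the threshold are all assumptions chosen for illustration. Events whose combined similarity reaches the threshold are linked in a graph, and each connected component is taken as one candidate phenomenon.

```python
from itertools import combinations

# Hypothetical attack events; field names and values are illustrative only.
events = {
    "e1": {"origins": {"CN", "US", "KR"}, "active_days": {1, 2, 3}},
    "e2": {"origins": {"CN", "US"}, "active_days": {1, 2, 3}},
    "e3": {"origins": {"BR", "RU"}, "active_days": {10, 11}},
    "e4": {"origins": {"BR", "RU", "TR"}, "active_days": {10, 11, 12}},
    "e5": {"origins": {"DE"}, "active_days": {20}},
}


def jaccard(a, b):
    """Jaccard similarity between two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(x, y, weights):
    """Weighted average of per-feature Jaccard similarities (an assumed aggregation)."""
    total = sum(weights.values())
    return sum(w * jaccard(x[f], y[f]) for f, w in weights.items()) / total


def cluster(events, weights, threshold):
    """Link events whose combined similarity reaches the threshold,
    then return the connected components of the resulting graph."""
    neighbours = {name: set() for name in events}
    for a, b in combinations(events, 2):
        if combined_similarity(events[a], events[b], weights) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in events:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = cluster(events, {"origins": 0.5, "active_days": 0.5}, threshold=0.5)
print(clusters)
```

In this toy dataset, `e1` and `e2` share most origins and all active days, `e3` and `e4` overlap on both features, and `e5` shares nothing with the others, so it stays on its own. Changing the weights or the threshold changes which events are linked, which is why the choice of similarity measures and of the aggregation matters.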

Table of contents

1 Introduction
1.1 Problem statement
1.2 Research methodology
1.3 Structure of the thesis
2 Background and Related Work
2.1 Foreword
2.2 On the analysis of malicious activities
2.2.1 Analysis of honeypot traffic
2.2.2 Analysis of darknet traffic
2.2.3 Analysis of IDS and firewall logs
2.2.4 Analysis of large malware repositories
2.2.5 Research on botnet tracking
2.2.6 Cyber SA
2.2.7 Preliminary conclusions
2.3 Investigative data mining
2.3.1 Security data mining
2.3.2 Crime data mining
2.4 Multicriteria decision analysis in security
2.5 Summary
3 Graph-based Knowledge Discovery in Security Datasets
3.1 Introduction
3.2 Attack Feature Selection
3.3 Clustering analysis
3.3.1 Introduction
3.3.2 Similarity measures
3.3.3 Graph-based clustering
3.4 Evaluation of clustering results
3.4.1 Estimating the optimal number of clusters
3.4.2 Cluster validity indices
3.4.3 Objective evaluation and comparison with other approaches
3.5 Summary
4 Conclusion