Large Scale Malware Analysis, Detection and Signature Generation
As the main vehicle for most organized cybercrime, malware has become one of the most severe threats to computer systems and the Internet. The advent of automated malware development toolkits has led to a huge surge in the number of new malware threats in recent years. AV companies typically receive tens of thousands of suspicious samples every day, which human analysts must examine to determine their identities and create anti-virus signatures. However, the overwhelming volume of new malware easily overtaxes the available human resources, making analysts less responsive to emerging threats. To address these issues, this dissertation proposes four novel and scalable systems built around a central theme: automation and scalability. First, it builds a malware database management system called SMIT, designed to efficiently check whether a new sample is a syntactic variation of existing malware using its function-call graph. Evaluation on real-world malware demonstrates SMIT’s effectiveness and its scalability to large numbers of malware samples. Second, the dissertation develops an automatic malware clustering system called MutantX. By quickly grouping similar samples into clusters, MutantX allows analysts to focus on novel and representative samples and automatically generates labels for unknown samples via their association with existing groups. Third, the dissertation presents Hancock, the first automatic malware signature generation system to tackle the challenge of automatically creating high-quality string signatures with extremely low false positive rates. Finally, observing that the two widely used malware analysis approaches, static and dynamic analysis, each have their own pros and cons, this dissertation proposes a novel system called DUET that optimally integrates malware clusterings based on both static features and dynamic behaviors. The goal of DUET is to allow static and dynamic analysis to complement each other and mitigate their respective shortcomings.