Commits

RomanGol committed 6879ad2

ICICS Weekend Modification

almost ready

Files changed (3)

icics2012/sec_body.tex

 \label{sec_state}
 Encryption function detection is a problem of searching for certain algorithms in programs, especially in binary code.
 This work is based on the following assumptions:
-(1) We have the knowledge of the algorithm before detecting;
-(2) We assume the implementation is not aimed at failing our detection deliberately.
+(1) Knowledge of the algorithm is available before detection;
+(2) The implementation of the algorithm is not deliberately designed to defeat the detection.
 These assumptions are reasonable in the real world for the following reasons.
 First, it is always prudent to adopt mature encryption algorithms for the consideration of security,
 and these mature encryption algorithms are generally public and are tested for a long term.
-So we suppose that our approach doesn't detect an encryption algorithm without knowing its details.
-Second, In most cases, the purpose of the encryption algorithms in malware are to protect malicious code and hide secret data.
+So we suppose that the precondition of detecting an encryption algorithm is knowing its details.
+Second, in most cases, the purpose of the encryption algorithms in malware is to protect malicious code and hide sensitive data.
 Thus these encryption algorithms are often implemented without being obfuscated or packed in order to provide accuracy and efficiency.
 
-Previous detection methods generally take advantage of certain properties as the signature of an algorithm.
+Previous detection methods generally take advantage of certain properties of an algorithm as the signature.
 Caballero et al. took advantage of the fact that encryption routines use a high percentage of bitwise arithmetic instructions.
-The approach of Groebert el al. is based on both generic characteristics of cryptographic code and on signatures for specific instances of cryptographic algorithms.
+The approach of Groebert et al. was based on both generic characteristics of cryptographic code and on signatures for specific instances of cryptographic algorithms.
 Zhange et al. proposed an algorithm plagiarism detection approach using critical runtime values.
-And Zhao et al. uses input-output correlation of certain algorithms to detect cryptographic data.
+And Zhao et al. used input-output correlation of certain ciphers to detect cryptographic data.
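The bitwise-instruction heuristic mentioned above can be illustrated with a minimal sketch. The mnemonic set, the minimum length and the threshold below are assumptions chosen for illustration, not values taken from the cited work.

```python
# Hypothetical set of x86 mnemonics counted as "bitwise"; a real tool
# would choose this set from the instruction set reference.
BITWISE_MNEMONICS = {"and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror"}


def bitwise_ratio(mnemonics):
    """Return the fraction of instructions that are bitwise operations."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in BITWISE_MNEMONICS)
    return hits / len(mnemonics)


def looks_like_crypto(mnemonics, min_length=20, threshold=0.55):
    """Flag a code segment whose bitwise ratio reaches the threshold.

    min_length and threshold are illustrative assumptions only.
    """
    return len(mnemonics) >= min_length and bitwise_ratio(mnemonics) >= threshold
```

A segment of 15 `xor` and 5 `mov` instructions has a ratio of 0.75 and would be flagged, while a three-instruction segment is ignored as too short to judge.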
 
 There are several reasons why proposing new detection techniques is necessary to current security analysis.
 \begin{itemize}
     Existing approaches usually use tools such as QEMU and PIN to trace data and instructions.
     And these tools don't have satisfactory performance.
     Actually, Groebert et al. reported that for a malware analysis process the tracing took 14 hours and the analysis phase 8 hours.
-    QEMU is a full-system emulator, and to emulate the target program,
-    an entire operating system environment must be established first, which is very time-consuming.
-    PIN is a program instrumentation tool that performs dynamic binary instrumentation at program run time.
-    However, malicious code is sensitive to even the slightest modification, hence instrumentation may lead to failed detection.
-    Thus a new tool for efficient tracing is necessary.
+%    QEMU is a full-system emulator, and to emulate the target program,
+%    an entire operating system environment must be established first, which is very time-consuming.
+%    PIN is a program instrumentation tool that performs dynamic binary instrumentation at program run time.
+%    However, malicious code is sensitive to even the slightest modification, hence instrumentation may lead to failed detection.
 
     \item
-    Existing approaches are not extensible. 
+    Existing approaches are not extensible.
     That is to say, analysts can't easily adjust these specific approaches either to adapt to different implementations of algorithms or to detect new ones.
-    A simple form of program is able to improve analyzing efficiency and help analyst deploy her own detection.
+    
 
     \item
     Taking traced instructions alone as input is not enough to acquire effective heuristics.
     For dynamic data based detection, the main problem is how to filter out useless data according to heuristics.
-    Because the data feature related to algorithm is very important for heuristics.
-    It is suggested to combined instructions and data together to acquire powerful heuristics.
 \end{itemize}
 
 In contrast to previous work in this area, the goal of our work is to design an extensible, convenient and efficient detection approach.
+We argue that a new approach for efficient tracing is necessary.
+And because the data features related to an algorithm are very important for heuristics,
+we suggest combining instructions and data together to acquire powerful heuristics.
+What's more, a simple form of program is able to improve analysis efficiency and help analysts deploy their own detection.
 We improve the detection approach in two aspects: one is to perform a high speed program tracing using process emulation,
 and the other one is to translate program into IL to simplify construction of heuristics and third-party matching extension design.
-In addition, we verify the matching result with input-output data correlation to reduce the chance of false positive, 
+In addition, we verify the matching result with input-output data correlation to reduce the chance of false positive,
 as well as extract the input and output parameters (e.g., the secret key) at the same time.
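The input-output verification step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: RC4 stands in for the matched algorithm, and `find_io_triple` is a hypothetical helper that searches buffers captured from a trace for a (key, input, output) combination consistent with a reference implementation.

```python
from itertools import permutations


def rc4(key: bytes, data: bytes) -> bytes:
    """Reference RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def find_io_triple(buffers):
    """Return the first (key, input, output) with rc4(key, input) == output.

    `buffers` is a hypothetical list of byte strings recorded at the
    boundary of a code segment that matched an RC4 template.
    """
    for key, data_in, data_out in permutations(buffers, 3):
        if key and len(data_in) == len(data_out) and rc4(key, data_in) == data_out:
            return key, data_in, data_out
    return None
```

A match both confirms the template hit, which reduces false positives, and yields the parameters themselves, including the key. Because RC4 is symmetric, the roles of input and output may be swapped in the returned triple.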
 
 
 
 Because of the nature of instruction emulation, full-system emulators often have poor performance.
 In actual tests, we found that the Bochs emulator runs $10^{2}$ times slower than a non-emulated environment.
-To emulate a single instruction, we often need tens even hundreds of actual
-instructions, which considerably impacts the runtime performance of a full-system emulator.
-
-In program analysis using full-system emulation, we see that the guest(emulated) operating system and the host operating system are usually the same, and the OS specific operations, such as process context switch, are trivial to our analysis. Therefore, we come up a program emulation proposal that directly emulates the target program on host operating system, which we call process emulation.
-
-Being different from full-system emulation, process emulation directly uses the host operating system to provide OS-specific features,  such as handling system API calls.
+To emulate a single instruction, we often need tens or even hundreds of actual instructions,
+which considerably impacts the runtime performance of a full-system emulator.
+
+In program analysis using full-system emulation, 
+we see that the guest (emulated) operating system and the host operating system are usually the same,
+and the OS-specific operations,
+such as process context switches, are irrelevant to our analysis.
+Therefore, we came up with a program emulation proposal that directly emulates the target program on the host operating system,
+which we called process emulation.
+
+Being different from full-system emulation, 
+process emulation directly uses the host operating system to provide OS-specific features, such as handling system API calls.
 This assumption requires the guest OS and the host OS to be the same.
 The process emulator is a user-space application that can emulate other user-space applications,
-where CPU instruction execution, memory management and some OS features are emulated by the process emulator, and system calls(API) are executed by the host operating system.
+where CPU instruction execution, memory management and some OS features are emulated by the process emulator, 
+and system calls/APIs are executed by the host operating system.
 The architecture of process emulation is shown in figure~\ref{fig_full_sys}.
 
 
 
 
 \subsection{Program Partitioning}
-The first step of analysis is program partitioning, where sequential instructions traced from process emulation are partitioned into basic blocks or program segments.
+The first step of analysis is program partitioning, 
+where sequential instructions traced from process emulation are partitioned into basic blocks or program segments.
 The goal of this stage is to make partitioned segments the same scale as an algorithm implementation.
 Modern software is modularized, where algorithms used are usually implemented in a module,
 such as a class or a function. Identifying such a module in dynamic

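The partitioning step described in this hunk can be sketched as below. It is a simplified illustration: the trace format (a list of `(address, mnemonic)` pairs) and the set of control-transfer mnemonics are assumptions, and blocks are closed only after a control-transfer instruction, without also splitting at jump targets.

```python
# Hypothetical control-transfer mnemonics; conditional jumps are
# recognized by their leading "j".
CONTROL_TRANSFER = {"call", "ret", "jmp", "loop"}


def is_control_transfer(mnemonic):
    """Return True if the instruction ends a basic block."""
    m = mnemonic.lower()
    return m in CONTROL_TRANSFER or m.startswith("j")


def partition_trace(trace):
    """Split a sequential instruction trace into basic blocks."""
    blocks, current = [], []
    for address, mnemonic in trace:
        current.append((address, mnemonic))
        if is_control_transfer(mnemonic):
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no terminating branch
        blocks.append(current)
    return blocks
```

Grouping the resulting blocks between a `call` and its matching `ret` would give function-sized segments, which is closer to the scale of an algorithm implementation.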
icics2012/sec_intro.tex

 \section{Introduction}
 Recent years have witnessed a dramatic growth of work on automatically detecting certain algorithms in programs, especially in malware.
-In order to solve the problem of algorithm detection, a number of approaches are proposed, most of which are mainly heuristic.
+In order to solve the problem of algorithm detection, a number of approaches were proposed, most of which are mainly heuristic.
 However, despite an increasing interest in algorithm identification in binary programs, and in particular in detecting cryptographic primitives,
 there is still no systematic and convenient approach that helps researchers perform efficient detection.
 
 We present a generic encryption function detecting approach using \emph{Process Emulation} and \emph{IL(intermediate language)-based Program Analysis}, which is targeted at achieving fast, convenient and extensible detection.
-The basic principle behind our technique is stripping unnecessary runtime information, 
+The basic principle behind our technique is stripping unnecessary runtime information,
 simplifying analysis process and providing interface for new extensions.
-First, we design and implement our own process emulator to reduce the overhead brought by emulating full system environment.
-Then we adopt a self defined simple IL to simplify analyzed program.
+First, we designed and implemented our own process emulator to reduce the overhead brought by emulating a full system environment.
+Then we adopted a simple self-defined IL to simplify the analyzed program.
 Based on this IL, not only we designers but also other analysts could easily write a template and match certain algorithms.
-And finally, we propose a combination of IL-based template matching and dynamic value verification to improve the accuracy and efficiency of encryption routines identification.
+And finally, we combined IL-based template matching and dynamic value verification to improve the accuracy and efficiency of encryption routines identification.
 
 Some of the contributions of this work are listed below.
 \begin{itemize}
     \item \emph{Lightweight process emulation.}
-    Process emulation is a novel emulation technique, which tries to run a program within its host operating system,
+    We designed process emulation, a novel emulation technique, to run a program within its host operating system,
     and only emulate the necessary components of a system for the program to be analyzed.
-    It provides a lightweight emulation environment with fast speed while keeping fine-grained analyzing capability.
+    This technique provides a lightweight emulation environment with fast speed while keeping fine-grained analyzing capability.
 
     \item \emph{IL-based program transformation.}
     We translate the traced program into IL to address the issues of dynamic program pattern matching and analysis,
     increasing efficiency and accuracy while acquiring platform compatibility at the same time.
 
     \item \emph{Flexible template matching.}
-    We provide an open interface for analysts to write template of different algorithms in IL form.
+    We provided an open interface for analysts to write templates of different algorithms in IL form.
     Our emulator dynamically loads templates during the detection phase and uses them to construct heuristics.
 
     \item \emph{Template based data filtering and verification.}
 \end{itemize}
 
 The rest of the paper is structured as follows.
-Section~\ref{sec_state} gives an overview of detection problem and related work.
+Section~\ref{sec_state} gives an overview of the algorithm detection problem and related work.
 Section~\ref{sec_ours} describes our approach in detail.
 Section~\ref{sec_exp} gives a concrete instance of template-based encryption function detection and evaluation results.
 Some countermeasures to our approach are discussed in section~\ref{sec_disc}, where we also give an overview of future work.

icics2012/title.tex

 However, it's a complicated process to automatically detect encryption functions among a huge amount of binary code,
 and the main challenge is to keep high efficiency and accuracy at the same time.
 In this paper we propose an enhanced detection approach.
-First we design a novel process level emulation technique to efficiently analyze binary code, which is less resource-consuming compared with full system emulation.
-Further, we conduct program partitioning and assembly-to-IL(intermediate language) translation on binary code to simplify the analysis.
-We apply our approach to sample programs using cryptographic libraries and custom implemented version of typical encryption algorithms,
-and show that these routines can be detected efficiently, allowing analysts or anti-virus tools to deal with the encrypted data within malware automatically.
-We also show that our approach provides an extensible interface for analysts to add intermediate language templates to detect other forms of functions rather than encryption routines.
+First we designed a novel process-level emulation technique to efficiently analyze binary code, which is less resource-consuming compared with full system emulation.
+Further, we conducted program partitioning and assembly-to-IL (intermediate language) translation on binary code to simplify the analysis.
+We applied our approach to sample programs using cryptographic libraries and custom implementations of typical encryption algorithms,
+and showed that these routines can be detected efficiently.
+Our approach helps analysts deal with the encrypted data within malware automatically.
+Our approach also provides an extensible interface for analysts to add extra templates to detect other forms of functions rather than encryption routines.
+
+
+\keywords {Encryption detection, Process emulation, Intermediate language, Binary code analysis}
 \end{abstract}