Martin Vejnár  committed fb76db0

done

  • Parent commits 0736343
  • Branches default

Files changed (2)

File 00-overview.tex

 First, the source file to be translated is passed to the C language preprocessor.
 An \textsc{ANTLR}-based parser then performs lexical and syntactic analysis of the preprocessed data,
 producing the program's \emph{abstract syntax tree} (AST).~\cite{dragonbook}
-The AST is directly serialized into an XML document~\cite{ref:xml}---each element in the XML document
+The AST is directly serialized into an XML document~\cite{ref:xml}---each element in the document
 corresponds to a node in the AST.
 Finally, for each function, a control-flow graph~\cite{muchnick} is constructed---the nodes
 of the graph are the XML elements corresponding to function statements.
 \textsc{Stanse} currently includes several checkers that consume the internal representation.
 Each checker performs a different kind of analysis and can detect a different class of defects.
 As we will be extending~\textsc{Stanse} with support for a new language,
-it is important to note that not all checkers are completely independent of the source language.
+it is important to note that not all checkers are independent of the source language to the same degree.
 
 The reachability checker, for instance, uses the control-flow graph merely to locate nodes
 (recall that nodes correspond to statements in the original program)
 that are not reachable from the start node.
 Such nodes are reported to the user as they represent dead code that will never be executed,
-which often indicates a bug in the program.
+and often indicate a bug in the program.
 Note that the reachability checker does not interpret the XML data associated with the nodes.% in any way.
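
The reachability check described above can be illustrated with a minimal sketch. It is not the \textsc{Stanse} implementation: the function name, the dictionary-based graph and the node labels are assumptions made for illustration. The sketch assumes that every node appears as a key of the graph, and it uses only the edges, never the data attached to a node.

```python
from collections import deque


def unreachable_nodes(cfg, start):
    """Return the CFG nodes that cannot be reached from `start`.

    `cfg` maps each node label to a list of successor labels.
    Every node is assumed to appear as a key (hypothetical format).
    """
    seen = {start}
    queue = deque([start])
    # Plain breadth-first search over the successor edges.
    while queue:
        node = queue.popleft()
        for succ in cfg.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    # Whatever was never visited is dead code.
    return sorted(set(cfg) - seen)


# Hypothetical graph: the statement after `return` has no incoming edge.
cfg = {
    "entry": ["return"],
    "return": [],
    "after_return": ["exit"],
    "exit": [],
}
dead = unreachable_nodes(cfg, "entry")
```

Here `dead` holds the two nodes that follow the `return` statement, which is what such a checker would report to the user.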
 
 
 Consider, for example, the statement \texttt{f1() \&\& f2()}.
 This statement has non-trivial control-flow---the function \texttt{f1} is called first,
 followed by a call to the function \texttt{f2} only if the former function returned a non-zero value.
-In addition, note that as the statement performs two calls
+In addition, since the statement performs two calls
 and the call graph generator is limited to a single call per statement,
 the constructed function call graph will not be complete.
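
To make the problem concrete, the following sketch splits \texttt{f1() \&\& f2()} into smaller control-flow graph nodes with at most one call each. The node layout, the field names and the edge labels are hypothetical and are not taken from \textsc{Stanse} or SIR; the point is only that a generator limited to one call per node then sees both calls.

```python
def lower_logical_and(lhs_call, rhs_call):
    """Lower `lhs() && rhs()` into small nodes with one call each.

    The node format is a hypothetical illustration, not the SIR format.
    """
    return {
        # The left operand is always evaluated first.
        "n0": {"op": "call", "callee": lhs_call,
               "succ": {"nonzero": "n1", "zero": "n2"}},
        # The right operand runs only if the left one was non-zero.
        "n1": {"op": "call", "callee": rhs_call,
               "succ": {"always": "n2"}},
        # Both paths meet again after the expression.
        "n2": {"op": "join", "callee": None, "succ": {}},
    }


def callees(nodes):
    """Collect callees, reading at most one call per node."""
    return [n["callee"] for n in nodes.values() if n["op"] == "call"]


nodes = lower_logical_and("f1", "f2")
found = callees(nodes)
```

With the statement kept as a single node, a one-call-per-node generator would record only one of the two functions; after the split, `found` contains both.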
 
 We also chose not to use XML as the encoding for the control-flow graph nodes,
 because it is hard for the developer to manipulate,
 and it potentially increases the application's memory footprint.
-XML allows complete AST subtrees to be associated with a node in the control-flow graph;
-by breaking the graph nodes into smaller pieces, however, XML becomes unnecessary.
+XML allows complete AST subtrees to be associated with a node in the control-flow graph,
+but since we broke the graph nodes into smaller pieces, XML is no longer necessary.
 
 %Unfortunately, such a change reduces the context available during pattern matching
 %in thread, lock and automaton checkers (i.e.\ patterns would be matched against smaller pieces of the original code).
 consisting of a name and zero or more operands.
 There is a limited set of instructions and operand types.
 %Instructions are allowed to use as operands the values returned by other instructions.
-We strived to make the set of instructions language-independent---%
-no language-specific constructs should leak to checkers.
-Such a design makes it easier to create new front-ends,
-as no modification has to be done to the existing checkers.
+We strived to make the set of instructions minimal and language-independent.
+%no language-specific constructs should leak to checkers.
+Having such an interface makes it easier to develop both new front-ends and new checkers.
 
 %This allows the association between control-flow graph nodes constructed from a single program statement to remain captured,
 %providing the necessary context for pattern matching.
 
 %of checkers that are able to consume the new representation is necessary.
 
-The fact that the new representation is language-independent also implies that it 
-can replace the existing representation completely.
+We also designed SIR so that it can ultimately replace the existing representation completely.
 For each existing checker, the new representation either already contains
 all the information the checker requires, or can be easily extended to contain it
 (here we refer to the types of variables required by the pointer checker).
 
 As for the translation of C++ programs to SIR,
 we decided not to write our own C++ language parser
-and instead depend on \textsc{Clang} libraries.~\cite{ref:clang}
+and instead depend on the \textsc{Clang} libraries.~\cite{ref:clang}
 \textsc{Clang} serves as a C, C++ and Objective-C front-end for the LLVM~\cite{LLVM:CGO04} compiler suite
 and is able to provide us with the AST of any valid C++ program.
 Most of the translation process then consists of traversing the AST
 and translating it to SIR.
 More advanced features of the C++ language---%
-including exception handling or late binding---%
+including exception handling and late binding---%
 are transformed to simpler constructs,
-so as to ensure that minimal changes have to be mode
-to support the features in checkers.
+so as to ensure that minimal changes have to be made
+to support these features in checkers.
 
 \section{Our contribution}
 

File 04-tools.tex

 Direct checking of a SIR unit is typically performed when multiple SIR units are merged together
 (recall that opening multiple C++ source files in \textsc{Stanse} will not perform late binding across them).
 Note that since SIR units contain the names of the original source files,
-the error traces will not run through SIR files, but through the original source files instead.
+the error traces will not run through the SIR files, but through the original source files instead.
 
 \section{Testing}