Abstract

In our earlier research [1], we described the implementation of the Dynamic Syntax Tree method for enhancing the Static Analysis process. After more than 10 years of experience, we have collected the significant results presented in this paper.
Dynamic Syntax Tree: Implementation Results
Prof. Tim Moses, Department of Software Engineering, BitBrainery University, London UK
David Syman, CTO, Security Review Консультант, Chișinău - MD
Marco Barzanti, Security Auditor, Poste Italiane - IT
[2012- 10th of December]
Keywords - dynamic syntax tree, dynamic analysis, static code analysis, abstract syntax tree, parser, semantic
I. INTRODUCTION
In our earlier research[2], we presented a Dynamic Syntax
Tree (DST)-based implementation. The main differences with
respect to the Abstract Syntax Tree (AST) are:
  • A very compact syntax tree, more than 10 times smaller
    than the AST. All object information belonging to a Class
    is moved to an Object Dictionary, and the DST itself
    contains only pointers to the Object Dictionary.
  • The syntax tree is split into a number of small DSTs and
    paged to small XML files, reducing RAM consumption. In
    this way a fixed peak RAM value can be configured before
    Static Analysis execution.
  • The syntax tree also includes dynamic information, for
    more accurate analysis results.
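The split between a pointer-only tree and an Object Dictionary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ObjectDictionary`, `DstNode`, `intern` and `describe` are hypothetical, and the dictionary entries here hold plain names only to keep the example readable.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectDictionary:
    """Holds all object information; the list index plays the role of the Dict-Id."""
    entries: list = field(default_factory=list)
    index: dict = field(default_factory=dict)

    def intern(self, name, kind):
        # Return the existing id for (name, kind), or allocate a new one.
        key = (name, kind)
        if key not in self.index:
            self.index[key] = len(self.entries)
            self.entries.append({"name": name, "kind": kind})
        return self.index[key]


@dataclass
class DstNode:
    """A tree node that stores only a pointer (id) into the dictionary."""
    dict_id: int
    children: list = field(default_factory=list)


def describe(node, dictionary, depth=0):
    # Resolve each pointer through the dictionary while walking the tree.
    entry = dictionary.entries[node.dict_id]
    lines = ["  " * depth + f"{entry['kind']} {entry['name']}"]
    for child in node.children:
        lines.extend(describe(child, dictionary, depth + 1))
    return lines
```

Because every node carries a single integer, the tree stays small regardless of how much information the dictionary keeps for each object.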
The main contribution of this paper is to present 10 years
of results collected in real, international business cases.
To test the implementation, we used the re-engineered
Security Review Консультант products, named Security Add-on
and Quality Add-on for McCabe®, implemented with the
Dynamic Syntax Tree. Dynamic code is processed by mapping
its dynamic constructs; the usual static vulnerability
detection techniques are then used in combination with
dynamic sandboxing. The semantic analysis also works for
static languages. The implementation is able to gather
useful information about the source code, such as possible
values of variables or possible relations between objects.
II. DYNAMIC SYNTAX TREE IMPLEMENTATION
Pre-processing the source code creates a specialized
Dynamic Syntax Tree for each Class found. This has also been
applied to traditional programming languages such as
COBOL, where Classes are represented by Programs,
Methods by CALLs/PERFORMs, Parameters by USING, etc. In the
Security Review Консультант implementation, this
pre-processing phase generates:
  • A separate Object Dictionary for each Class. Each Class
    object is mapped to a 2-byte Dict-Id, allowing a
    maximum of 65535 objects per Dictionary. Instead of
    storing the object name, a 4-byte and a 1-byte pointer
    to the source (source code line and starting column of
    the name) are used to retrieve it. Each entry also
    stores the Parent Dict-Id (for child Classes) or a
    1-byte Type plus a 1-byte local/global attribute (for
    all other objects), and a 1-byte bitmask Attribute field
    (abstract, serializable, public, private, protected,
    static and final).
  • A Dynamic Syntax Tree for each NameSpace or Package,
    storing a 10-byte NameSpace or Package Name and File
    Name (including web pages and configuration files) in
    compressed format. In addition, 2-byte Dict-Ids for the
    Class, Inner Classes, External Classes, Methods,
    Parameters, Branches and Variables are stored in the
    tree. For Methods and Branches, a hash code is stored
    alongside the Dict-Id for Code Duplication detection.
    For Branches, the conditional statement (as a single
    line) and the nesting level (for calculating Quality
    metrics) are also stored. Fields are compressed using
    Huffman Coding [3].
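The Object Dictionary entry described above adds up to 10 bytes (2 + 4 + 1 + 2 + 1). The following Python sketch shows one way such a record could be packed and how a name could be recovered from the line/column pointers. The byte order, the bit assigned to each attribute, the 1-based line and 0-based column convention, and all function names are assumptions made for illustration; the paper does not specify them.

```python
import re
import struct
from enum import IntFlag


class Attr(IntFlag):
    # Bit positions are an assumption; the paper only lists the seven flags.
    ABSTRACT = 0x01
    SERIALIZABLE = 0x02
    PUBLIC = 0x04
    PRIVATE = 0x08
    PROTECTED = 0x10
    STATIC = 0x20
    FINAL = 0x40


# Little-endian, no padding (assumed).
# Dict-Id (2) | source line (4) | name column (1) | parent Dict-Id (2) | attributes (1)
CHILD_CLASS = struct.Struct("<HIBHB")
# Dict-Id (2) | source line (4) | name column (1) | type (1) | local/global (1) | attributes (1)
MEMBER = struct.Struct("<HIBBBB")


def pack_child_class(dict_id, line, column, parent_id, attrs):
    """Pack an entry for a child Class; struct rejects a Dict-Id above 65535."""
    return CHILD_CLASS.pack(dict_id, line, column, parent_id, int(attrs))


def pack_member(dict_id, line, column, type_code, is_global, attrs):
    """Pack an entry for any other object (method, variable, ...)."""
    return MEMBER.pack(dict_id, line, column, type_code, int(is_global), int(attrs))


def resolve_name(source_lines, line, column):
    """Read the identifier that starts at (line, column) instead of storing it."""
    match = re.match(r"[A-Za-z_]\w*", source_lines[line - 1][column:])
    return match.group(0) if match else ""


source = ["public class Account {", "    private final int balance;"]
record = pack_member(2, 2, 22, 1, False, Attr.PRIVATE | Attr.FINAL)
dict_id, line, column, type_code, is_global, attrs = MEMBER.unpack(record)
name = resolve_name(source, line, column)
```

Both layouts occupy the same 10 bytes, so entries of either kind can share one fixed-width table; the name itself never enters the dictionary.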
This pre-processing enables us to work with the syntax tree
of dynamic source code as if it were static code, with some
limitations that cannot be resolved until runtime in
dynamic languages. For that reason we also provide binary
analysis. Binaries are sandboxed to collect dynamic
information at runtime, using a very fast algorithm that we
discussed in [4]. Combining source code and binary analysis
overcomes the above-mentioned limitations by updating the
Dynamic Syntax Tree with additional information. Multiple
Object Dictionaries and Dynamic Syntax Trees are used,
optimized for low resource consumption and higher
performance.
Unlike AST and CST (Parse) trees, in the case of huge
Classes with more than 65535 objects, the DST Object
Dictionary structure (68,083 nodes) is paged into several
small XML files of about 575KB each. The same is done with
the Dynamic Syntax Tree itself: only 4096 Classes are
processed at a time, max 135KB each. RAM usage never
exceeded 700MB, which means we were able to perform a
Static Analysis on a low-end Windows XP notebook with 1GB of
RAM and a single Pentium processor. Up to 5 simultaneous
static analyses of different applications were achieved
using only 4GB of RAM on a dual-core machine. A Windows
2008 server with 8GB of RAM processed up to 15 analyses at
a time.
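The batching described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: the XML element names, the file naming scheme, and the helpers `batches`, `page_out` and `peak_batch_mb` are hypothetical, and 135KB is read here as the maximum size of one Class's DST.

```python
import xml.etree.ElementTree as ET
from itertools import islice
from pathlib import Path

CLASSES_PER_BATCH = 4096  # from the text
MAX_DST_KB = 135          # from the text, read as a per-Class maximum


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def page_out(class_names, out_dir, size=CLASSES_PER_BATCH):
    """Write one small XML page per batch so only one batch is held in RAM."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(batches(class_names, size)):
        root = ET.Element("dst-page", index=str(number))
        for name in chunk:
            ET.SubElement(root, "class", name=name)
        path = out_dir / f"dst_{number:05d}.xml"
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


def peak_batch_mb(classes=CLASSES_PER_BATCH, kb_each=MAX_DST_KB):
    """Upper bound, in MB, on the DST data held for one batch."""
    return classes * kb_each / 1024
```

Under that reading, one batch holds at most 4096 × 135KB = 540MB of DST data, which is consistent with the reported ceiling of 700MB.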
III. RESULTS
The re-engineered software was used to analyze about
800 million lines of code (800 MLOC) in different business
sectors, located in 5 countries.
Results were collected anonymously; only some technical
information was stored, such as Industry sector, Supplier
(outsourcer) type, target Platforms and Languages.
Each Application was analyzed in 3-4 Versions per year,
for over 10000 analyses in total.
Each Customer has more than 2 Suppliers, 30 in total, some
of them in common.
Many Suppliers are product owners (Commercial Supplier).
The percentage of Internally Developed projects is
decreasing relative to Outsourced ones.
Government Institutions and Software integrators were the
main Customer base of our analysis, although Financial
Institutions (Banks, Insurance) were the largest
contributors in terms of application size (MLOC, Millions
of Lines Of Code).
Portable applications are mostly written in Java,
JavaScript and PHP, but only a few are truly portable as
reported.
Mobile Platforms are emerging.
High Business Criticality does not drive all development
projects "in-house". More than 30% of all applications rated
High or Very High in business criticality were sourced from
Commercial software vendors.
The majority of projects are not compliant with the OWASP
Top 10 or SANS Top 25. Open Source has higher quality than
Internally Developed.
Commercial software has the longest remediation cycles.
[Charts: Supplier (Commercial, Internally Developed, Open Source, Outsourced); Industry (Financial, Software, Government, Other); Language (Java, JavaScript, C/C++, .NET, PHP, Java Android); Internally Developed versus Outsourced, 2002-2012; Platform (Portable, Windows, Linux, Unix, Mobile)]
The following table summarizes our effort, showing the huge
difference between the number of Dynamic Syntax Tree
(DST) nodes obtained with the described implementation
and the number of Abstract Syntax Tree (AST) nodes
calculated with ANTLR[5]. We are working on a further 15%
optimization of DST nodes, with 20% expected.
# is the number of analyzed applications (3007 in total).
MLOC is Millions of Lines Of Code (812 MLOC in total).
IV. CONCLUSION AND FUTURE WORK
This paper described 10 years of analysis results obtained
using an implementation of automatic analysis of dynamic
language source code based on Dynamic Syntax Trees.
The implemented Dynamic Syntax Tree will be used to
re-engineer some products and, after some years of stable
Static Analysis experience, will be compared to other AST-
and CST-based solutions. A separate paper will be made
available at that time.
V. ACKNOWLEDGMENTS
This work was kindly supported by:
Ruth Goldberg, Software Engineer, Security Review
Консультант, Chișinău - MD
REFERENCES
[1] Moses T., Syman D., Barzanti M. Static Analysis: A Dynamic Syntax
Tree Implementation. London, December 2001.
[2] Moses T., Syman D. Static Analysis of Applications written in
modern languages. Moldova, 1999. Translated from Russian and
published by ResearchGate, 2008.
[3] Huffman D.A. A method for the construction of minimum-
redundancy codes. Proceedings of the I.R.E., September 1952.
[4] Moses T., Syman D., Barzanti M. Binary Analysis: A Dynamic
Sandboxing Implementation. London, July 2006.
[5] Parr T.J. (University of Minnesota), Quong R.W. (School of
Electrical Engineering, Purdue University). ANTLR: A Predicated
LL(k) Parser Generator. July 1995.