Automating static analysis with Python and SciTools Understand

I. Introduction

Modern software development demands efficient code maintenance strategies. As code bases grow in size and complexity, keeping code clean, readable, and maintainable becomes increasingly challenging. Refactoring is essential to manage technical debt in legacy systems, but deciding where and how to refactor has traditionally depended solely on developers' judgement, which is tedious and can lead to inconsistent refactoring.

Static analysis tools offer valuable insight into code quality and structure, yet their typical workflows rely on manual interaction through GUIs and on exporting data for offline analysis. This blog discusses a programmatic workflow for static analysis with Python and SciTools Understand that enables developers to automate analysis, gather metrics, and prepare data for visualization or further analysis without having to jump between tools.

II. Problems with the traditional static analysis workflow

A typical static analysis process involves:
- Manually launching the static analysis tool and setting the project's options through the GUI
- Running the analysis and waiting for the results or output information
- Exporting the data to spreadsheets or other file formats
This approach creates unnecessary friction and cannot be easily integrated into CI/CD pipelines or other automated workflows.

III. Python-driven static analysis with Scitools Understand

Understand is a static analysis tool specialized in source code comprehension, metrics, and standards testing. The platform is designed to ease the maintenance and comprehension of legacy or newly developed source code repositories through its cross-platform, multi-language, maintenance-oriented IDE (Interactive Development Environment). The source code analyzed may include C, C++, C#, Objective C/Objective C++, Ada, Assembly, Visual Basic, Fortran, Java, JOVIAL, Pascal/Delphi, Python, VHDL, and Web (PHP, HTML, CSS, JavaScript, TypeScript, and XML). Understand's analysis data is programmatically accessible through the provided language interfaces, which enables integration with external applications. This blog takes advantage of the Python API to:
- Analyze a Java code base directly from a Python script
- Extract full or custom metrics, depending on the project's specific needs
- Transform the data into analysis-ready formats and save the results for visualization or further processing

III.1 Setting up the environment

The working environment is:
- Ubuntu 24.10
- Python 3.12
This implementation uses SciTools Understand Build 1220, installed from the package available for download on the software's official website. The following packages must be present before installing SciTools Understand on Linux distributions: libxcb-util1, libxcb-icccm4, libxcb-image0, libxcb1, libxcb-keysyms1, libxcb-render-util0

%tar -xvzf Understand-7.0.1220-Linux-64bit
%cd scitools/bin/linux64
%./understand

The SciTools Understand Python API has been accessible since Build 1054 through the upython script included in the software package. It is compatible only with Python 3.12 and later versions. Once the environment is configured, the upython script can be run from /opt/scitools/bin/linux64/upython.
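Before going further, it can help to confirm that the interpreter running the script can actually locate the API module. The following is a minimal sketch, assuming the module is named `understand` (the name used by the import later in this post). The helper name `understand_available` is hypothetical and is not part of the SciTools package.

```python
import importlib.util
import sys


def understand_available(module_name="understand"):
    """Return True if this interpreter can locate the given module."""
    # find_spec only locates a top-level module; it does not import it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if understand_available():
        print("Understand Python API found")
    else:
        # Assumption: this usually means the script was not started
        # through upython or the module path is not configured.
        print("Understand Python API not found by", sys.executable)
```

Running this check with upython and with the system interpreter is a quick way to see which of the two is configured for the API.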

III.2 Python-driven code quality analysis using SciTools Understand

From a Python script, we access a local Java code base and conduct the analysis at the file level. The analysis reports multiple metrics, including, for example, cyclomatic complexity, the number of lines, the maximum nesting level, and the number of lines of code. The following key steps are necessary to perform the analysis:
1. Specify the path of the Java source code repository

source_path = "Java/java-design-patterns"
2. Define the name and path of the Understand project, stored with the ".und" extension, and create it with the und command-line tool

import os
import subprocess

project_name = "JavaMetrics_" + ".und"
project_path = os.path.abspath(project_name)
subprocess.run(["/opt/scitools/bin/linux64/und", "create", "-languages", "Java", project_path])

3. Add the source code to the project and analyze it

subprocess.run(["/opt/scitools/bin/linux64/und", "add", source_path, project_path])
subprocess.run(["/opt/scitools/bin/linux64/und", "analyze", project_path])

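4. Open the database, define the metrics, and collect them for each file

import understand
db = understand.open(project_path)
all_metrics = ["Cyclomatic", "CountLine", "CountLineCode", "MaxNesting", "SumCyclomatic", "CountStmtDecl", "CountStmtExe"]
# Each file entity returns a dictionary mapping metric names to values
for entity in db.ents("File"):
    metrics_dict = entity.metric(all_metrics)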
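The und invocations from steps 2 and 3 can be grouped into a small helper so that a failing command stops the script instead of passing silently. This is only a sketch: the function names `build_und_commands` and `run_und_pipeline` are hypothetical, and it assumes that und signals failure through a non-zero exit status.

```python
import subprocess

UND = "/opt/scitools/bin/linux64/und"  # path used in this post


def build_und_commands(source_path, project_path, und=UND, language="Java"):
    """Return the und command lines for create, add and analyze, in order."""
    return [
        [und, "create", "-languages", language, project_path],
        [und, "add", source_path, project_path],
        [und, "analyze", project_path],
    ]


def run_und_pipeline(source_path, project_path, und=UND):
    """Run each und command in turn and stop at the first failure."""
    for cmd in build_und_commands(source_path, project_path, und):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the execution makes the command lines easy to inspect or log before anything is run.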

The collected metrics are then gathered into a CSV file for subsequent analysis.
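The collection and CSV export could look like the sketch below. It assumes that `db` is an open Understand database whose `ents()` method returns entities offering `longname()` and `metric()`. The function names and the "File" column header are hypothetical choices made for this illustration.

```python
import csv


def collect_file_metrics(db, metric_names):
    """Return one row (dict) per file entity: its name plus its metrics."""
    rows = []
    for ent in db.ents("File"):
        row = {"File": ent.longname()}
        # ent.metric(list) returns a dict mapping metric name to value
        row.update(ent.metric(metric_names))
        rows.append(row)
    return rows


def write_metrics_csv(rows, metric_names, csv_path):
    """Write the rows to csv_path, preceded by a header line."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["File"] + list(metric_names))
        writer.writeheader()
        writer.writerows(rows)
```

With the database opened in step 4, calling `collect_file_metrics(db, all_metrics)` and passing the result to `write_metrics_csv` with an output path of your choice would produce one line per file. Metrics that are not available for an entity are written as empty cells.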

III.3 Results

Table 1 presents the collected results.

Table 1. CSV file of collected metrics from the Java project

IV. Conclusion

In this blog, we have discussed how to connect to SciTools Understand automatically and statically analyze Java code from a Python script. The key contribution is a process that limits manual interaction with multiple application interfaces.

References