jQAssistant Language Concept Extractor Architecture

The Language Concept Extractor (LCE) architecture for jQAssistant (jQA) provides a generic framework for building native tools that scan source code in arbitrary programming languages and extract the relevant language concepts. Such a tool consolidates the extracted information into an easy-to-process JSON format that is then consumed by a jQA plugin.

Key Goals

  • extensibility: easily implement the detection and extraction of new language concepts
  • maintainability: the implementation should be easily adaptable to changes in the programming language
  • up-to-date: the APIs and libraries used must closely follow the release cycle of the analyzed programming language, so that new syntax constructs and similar changes can be adopted quickly

Solution

Core Idea:

  • split the scanning of source code into two parts:
    1. AST processing by a tool implemented natively for the programming language, so that relevant information can be extracted and consolidated easily
    2. graph generation from the consolidated information of step 1, using standard jQA scanner mechanisms
  • use JSON as the intermediary format, since it can be processed easily on most platforms
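
The two-part split can be illustrated with a minimal sketch. It assumes Python as the analyzed language and uses the standard-library `ast` module as the native tooling. The function `extract_concepts` and the JSON field names (`path`, `functions`, `classes`, `params`) are hypothetical and not part of any LCE specification. The final reload only stands in for the consuming side, which in the real architecture is a jQA plugin.

```python
import ast
import json


def extract_concepts(source: str, path: str) -> dict:
    """Part 1 (hypothetical LCE tool): walk the native AST and keep only
    the information the graph generation will need."""
    tree = ast.parse(source, filename=path)
    functions = []
    classes = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.append({
                "name": node.name,
                "line": node.lineno,
                "params": [arg.arg for arg in node.args.args],
            })
        elif isinstance(node, ast.ClassDef):
            classes.append({"name": node.name, "line": node.lineno})
    # ast.walk does not guarantee source order, so sort by line number
    functions.sort(key=lambda f: f["line"])
    classes.sort(key=lambda c: c["line"])
    return {"path": path, "functions": functions, "classes": classes}


SAMPLE = (
    "class Greeter:\n"
    "    def greet(self, name):\n"
    "        return 'hi ' + name\n"
    "\n"
    "def main():\n"
    "    pass\n"
)

# Intermediary format: plain JSON that any platform can read
document = json.dumps(extract_concepts(SAMPLE, "sample.py"), indent=2)

# Part 2 would run inside the jQA plugin; reloading the JSON here only
# shows that the consumer needs no knowledge of the Python AST
reloaded = json.loads(document)
```

The consumer sees only names, line numbers and parameters, so changes in the language's AST affect the native tool alone.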

Basic Overall Process:

flowchart LR
	source[(Source Code)]
	json[[JSON Representation]]
	neo4j[(Neo4j Graph)]
	source-->|LCE Tool|json
	json-->|jQA Plugin|neo4j

Language Concept Extraction Process (performed by the LCE Tool):

The Extractor API orchestrates the extraction process to obtain project objects, which are then exported to a JSON file. The orchestration process encompasses the following steps:

  1. native tools and APIs are used to get an enriched, structured view of the source code in the form of ASTs and other data structures
  2. traversers walk the ASTs of all source files and execute different processors to extract information on a file-by-file basis
  3. the project objects with the extracted language concepts are re-processed by post-processors on a project-wide/cross-project basis, allowing for advanced resolution algorithms
  4. the processed project objects are then exported to a JSON file
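
The four steps above can be sketched as follows, again assuming Python as the analyzed language and the standard-library `ast` module as the native tooling. All names (`Project`, `process_function`, `process_call`, `traverse`, `resolve_calls`, `extract`) and the JSON layout are hypothetical illustrations, not the actual Extractor API. The sketch handles a single project and returns the JSON as a string instead of writing a file.

```python
import ast
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Project:
    """Hypothetical project object collecting extracted concepts."""
    root: str
    concepts: dict = field(default_factory=dict)  # concept kind -> list of dicts


def process_function(node, path):
    # Processor: turns a function declaration into a concept
    if isinstance(node, ast.FunctionDef):
        return "functions", {"name": node.name, "file": path, "line": node.lineno}
    return None


def process_call(node, path):
    # Processor: records calls to plain names (no attribute calls)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return "calls", {"callee": node.func.id, "file": path, "line": node.lineno}
    return None


PROCESSORS = [process_function, process_call]


def traverse(tree, path, project):
    # Step 2: visit every node of one file and run all processors on it
    for node in ast.walk(tree):
        for processor in PROCESSORS:
            result = processor(node, path)
            if result is not None:
                kind, concept = result
                project.concepts.setdefault(kind, []).append(concept)


def resolve_calls(project):
    # Step 3: post-processor that links calls to declarations project-wide
    declared = {f["name"]: f["file"] for f in project.concepts.get("functions", [])}
    for call in project.concepts.get("calls", []):
        call["resolvedFile"] = declared.get(call["callee"])  # None if not declared here


def extract(sources: dict, root: str) -> str:
    project = Project(root=root)
    for path, text in sources.items():
        tree = ast.parse(text, filename=path)  # step 1: native AST
        traverse(tree, path, project)          # step 2: per-file extraction
    resolve_calls(project)                     # step 3: project-wide resolution
    return json.dumps(asdict(project), indent=2)  # step 4: JSON export


SOURCES = {
    "util.py": "def helper():\n    return 1\n",
    "app.py": "def run():\n    return helper() + len('x')\n",
}
exported = json.loads(extract(SOURCES, root="demo"))
```

The call to `helper` in `app.py` can only be resolved once `util.py` has been scanned as well, which is why resolution runs as a post-processing step after all files have been traversed.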

Concepts & Mechanisms

Projects using the LCE Architecture