Splitting JavaScript code instructions

As noted in the previous section, JavaScript applications differ from traditional compiled programs: they are distributed as source text with syntactic structure and lack the inherently atomic nature of native machine instructions. The key challenge in virtualizing and protecting JavaScript code is therefore that the target code must first undergo a special splitting transformation, which disrupts its readable syntactic structure and extracts intermediate code with atomic-operation characteristics. Only then can the corresponding virtual instructions and handler functions be designed and the instructions encoded to complete the virtualization process.


The main purpose of instruction splitting is to break down and refine the syntax structure of the target JavaScript code, disrupting its readable syntax in order to obtain an intermediate code sequence that can be reconstructed as a combination of atomic operations. We select a stack-based instruction architecture to design the specific splitting process, simulating the way native instructions are scheduled.

Virtual machines generally fall into two types: stack-based and register-based. The choice of architecture has a significant impact on the design of the virtual instruction set. The diagram below compares the instructions generated for the statement 'a=b+c;' under the two architectures. A register-based virtual machine needs fewer instructions to achieve complex functionality because each instruction can address multiple registers, but this makes the interpretation process more complex. A stack-based virtual machine performs all calculations on the stack, which results in a more uniform and straightforward virtualization process with simpler addressing modes. Therefore, this paper adopts a stack-based virtual machine structure to construct the final virtual instruction set.
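Since the diagram is not reproduced here, the comparison for 'a=b+c;' can be approximated with a minimal sketch. The opcode names (push_var, add, store_var) and the two tiny interpreters are assumptions made for illustration, not the instruction set used by the original implementation.

```javascript
// Stack-based form of `a = b + c;` (hypothetical opcode names).
const stackProgram = [
  ["push_var", "b"],   // push the value of b
  ["push_var", "c"],   // push the value of c
  ["add"],             // pop two values, push their sum
  ["store_var", "a"],  // pop the result into a
];

// Register-based form: one three-address instruction (also hypothetical).
const registerProgram = [["add", "a", "b", "c"]];

// Interpreter for the stack form: every opcode only touches the stack top.
function runStack(program, vars) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "push_var":
        stack.push(vars[arg]);
        break;
      case "add": {
        const right = stack.pop();
        const left = stack.pop();
        stack.push(left + right);
        break;
      }
      case "store_var":
        vars[arg] = stack.pop();
        break;
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return vars;
}

// Interpreter for the register form: each opcode must decode its operands.
function runRegister(program, regs) {
  for (const [op, dst, src1, src2] of program) {
    if (op !== "add") throw new Error("unknown opcode: " + op);
    regs[dst] = regs[src1] + regs[src2];
  }
  return regs;
}

console.log(runStack(stackProgram, { b: 2, c: 3 }).a);       // 5
console.log(runRegister(registerProgram, { b: 2, c: 3 }).a); // 5
```

The stack form needs four instructions where the register form needs one, but each stack instruction takes at most one operand, which is what keeps the addressing simple.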

Statement splitting

Instruction splitting is achieved primarily through operations on the abstract syntax tree (AST). First, the target code is parsed and its abstract syntax tree is extracted. Various tools can generate an abstract syntax tree, such as the parser included in the UglifyJS compression tool and standalone parsers like Esprima, which also offers an online demo. In this implementation, the Parser class from the Rhino engine provided by Mozilla is used to parse the JavaScript code and extract its abstract syntax tree.

Once the abstract syntax tree is obtained, each statement block is divided into multiple subtrees, and instruction splitting is performed by traversing each subtree in post-order. For computational operations, that is, nodes of type 'InfixExpression' in the syntax tree (as shown in the code on the top left of the diagram), the operands and operators are separated and split into intermediate instructions for a stack-based architecture: the operands are emitted first, followed by the operator that consumes them.
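The post-order traversal can be sketched as follows. The AST here is a hand-built object that only borrows Rhino's node type names ('InfixExpression', 'Name', 'NumberLiteral'); its field names and the emitted opcodes (push_var, push_const, binop) are assumptions for illustration, not Rhino's actual Java API.

```javascript
// Post-order split: visit left operand, then right operand, then the operator.
function splitExpression(node, out = []) {
  switch (node.type) {
    case "Name":
      out.push(["push_var", node.name]);
      break;
    case "NumberLiteral":
      out.push(["push_const", node.value]);
      break;
    case "InfixExpression":
      splitExpression(node.left, out);
      splitExpression(node.right, out);
      out.push(["binop", node.operator]);
      break;
    default:
      throw new Error("unsupported node type: " + node.type);
  }
  return out;
}

// Hand-built tree for the expression `b + c * 2`.
const ast = {
  type: "InfixExpression",
  operator: "+",
  left: { type: "Name", name: "b" },
  right: {
    type: "InfixExpression",
    operator: "*",
    left: { type: "Name", name: "c" },
    right: { type: "NumberLiteral", value: 2 },
  },
};

const intermediate = splitExpression(ast);
for (const instruction of intermediate) {
  console.log(instruction.join(" "));
}
```

Because operands are always emitted before their operator, operator precedence is already encoded in the instruction order and the intermediate code needs no parentheses.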

Operations involving objects and their properties or methods, such as 'document.write(str)', require special handling. In the abstract syntax tree, this subtree is composed mainly of 'PropertyGet' and 'FunctionCall' nodes. JavaScript allows property access (PropertyGet) to be rewritten from dot (.) notation to bracket ([]) notation (ElementGet), in which the property name becomes a string. In a browser, global variables and DOM objects are by default members of the 'window' object. This allows us to split the above code into an 'ElementGet' and a 'FunctionCall' in the form shown in the diagram below. At this point, all object and property names are treated as constant parameters, which simplifies further processing.
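A minimal sketch of this rewrite follows. To keep it runnable outside a browser, a stand-in object named fakeWindow replaces the real 'window'; that object, the opcode names, and the decision to fold the final property lookup into a single call_method instruction (so the receiver is preserved as 'this') are all assumptions of this sketch rather than details from the original implementation.

```javascript
// Stand-in for the browser's `window`; `written` records what write() receives.
const written = [];
const fakeWindow = {
  document: { write: (text) => { written.push(text); } },
};
const str = "hello";

// Dot notation rewritten to bracket notation: names are now string constants.
fakeWindow["document"]["write"](str);

// The same call as stack instructions (hypothetical opcodes).
const callProgram = [
  ["push_const", "document"],
  ["get_global"],          // pop name, push fakeWindow[name]
  ["push_const", "write"],
  ["push_var", "str"],
  ["call_method", 1],      // pop 1 argument, the key, and the object; call it
];

function runCall(program, root, vars) {
  const stack = [];
  for (const [op, arg] of program) {
    switch (op) {
      case "push_const":
        stack.push(arg);
        break;
      case "push_var":
        stack.push(vars[arg]);
        break;
      case "get_global":
        stack.push(root[stack.pop()]);
        break;
      case "call_method": {
        const args = stack.splice(stack.length - arg, arg);
        const key = stack.pop();
        const receiver = stack.pop();
        // Calling through receiver[key] keeps `this` bound to the receiver.
        stack.push(receiver[key](...args));
        break;
      }
      default:
        throw new Error("unknown opcode: " + op);
    }
  }
  return stack;
}

runCall(callProgram, fakeWindow, { str });
console.log(written);
```

Once every name is a string constant, the names can be stored and encoded like any other data instead of appearing as identifiers in the code.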

Structural splitting

The splitting process above applies when each subtree represents an independent statement block. However, JavaScript also includes higher-level control structures such as loops and conditional branches, where each loop or branch subtree may contain multiple statements.

Before statement splitting is performed, structures such as branches and loops must be flattened and split. The diagram below illustrates the splitting process for a conditional branch. Through structural splitting, an IF subtree is divided into multiple expression subtrees, each containing only one statement block. The "ifjmp" and "elsepart" markers are inserted so that branch jump instructions and their destination addresses can be filled in during the virtual mapping process, ensuring that the program still executes correctly. Loop structures are split in a similar way, except that in addition to splitting the condition and the loop body, a jump instruction is added at the end of the loop body that returns control to the loop condition, so that the loop logic is preserved.
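The flattening step can be sketched as below. The "ifjmp" and "elsepart" markers come from the text; the "endif" marker, the node field names, the opcode names, and the use of plain functions as stand-ins for already split statement blocks are assumptions of this sketch. It also handles only a single, non-nested branch.

```javascript
// Flatten an IF subtree into a linear list with markers.
function flattenIf(node) {
  return [
    { kind: "expr", fn: node.condition },
    { kind: "ifjmp" },
    ...node.thenPart.map((fn) => ({ kind: "stmt", fn })),
    { kind: "elsepart" },
    ...node.elsePart.map((fn) => ({ kind: "stmt", fn })),
    { kind: "endif" }, // hypothetical marker for the end of the branch
  ];
}

// Virtual mapping: replace markers with jumps to concrete addresses.
function resolveJumps(flat) {
  const elseIdx = flat.findIndex((n) => n.kind === "elsepart");
  const endIdx = flat.findIndex((n) => n.kind === "endif");
  return flat.map((n) => {
    switch (n.kind) {
      case "expr": return { op: "eval", fn: n.fn };
      case "ifjmp": return { op: "jump_if_false", target: elseIdx + 1 };
      case "elsepart": return { op: "jump", target: endIdx };
      case "endif": return { op: "nop" };
      default: return { op: "exec", fn: n.fn };
    }
  });
}

function runFlat(code, env) {
  let pc = 0;
  let flag = false; // result of the most recent condition
  while (pc < code.length) {
    const ins = code[pc];
    switch (ins.op) {
      case "eval": flag = Boolean(ins.fn(env)); pc++; break;
      case "exec": ins.fn(env); pc++; break;
      case "jump_if_false": pc = flag ? pc + 1 : ins.target; break;
      case "jump": pc = ins.target; break;
      default: pc++;
    }
  }
  return env;
}

// if (x > 0) { y = 1; } else { y = 2; }
const ifNode = {
  condition: (env) => env.x > 0,
  thenPart: [(env) => { env.y = 1; }],
  elsePart: [(env) => { env.y = 2; }],
};
const ifCode = resolveJumps(flattenIf(ifNode));
console.log(runFlat(ifCode, { x: 5 }).y); // 1

// while (i < 3) { i++; } written directly in resolved form.
const loopCode = [
  { op: "eval", fn: (env) => env.i < 3 },    // 0: condition
  { op: "jump_if_false", target: 4 },        // 1: leave the loop
  { op: "exec", fn: (env) => { env.i++; } }, // 2: loop body
  { op: "jump", target: 0 },                 // 3: return to the condition
];
console.log(runFlat(loopCode, { i: 0 }).i); // 3
```

In the loop, the only structural difference from the branch is the last instruction: its target is the address of the condition rather than the address after the branch.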

Through the instruction splitting process described above, we ultimately obtain a segment of intermediate code that has lost its syntactic properties. At this stage, the intermediate code is a low-level representation of the script in which each instruction has atomic-operation characteristics but cannot be executed directly. This form is better suited to the subsequent steps, in which virtual instructions are designed and encoded to produce an executable instruction sequence, and it gives us more precise control over how the JavaScript code is transformed.
