Virtual Interpreter Component Design

Published 2023-08-14

At runtime, the virtual interpreter decodes the bytecode program produced by virtual instruction encoding and invokes the corresponding interpretation routines to restore the original semantics and functionality. Its essential components are therefore the Dispatcher, VMdata (the bytecode program), the Handlers (interpretation routines), and VMcontext (the virtual execution environment). As mentioned earlier, the target code is JavaScript, which includes operations on DOM objects and their properties. Virtualization therefore produces, besides the encoded bytecode program VMdata, a string array VMarray that holds the semantic information of those properties. The virtual interpreter module itself is implemented in WebAssembly. Because browsers cannot execute a '.wasm' file directly, bridging code written in JavaScript is required to load the WASM module. The components of the WebAssembly-based virtual interpreter and their relationships are depicted in the diagram below, and each component is described in detail in the following subsections.

A.Virtual Execution Environment VMcontext

A virtual interpreter for binary programs runs directly at the native layer, where the system's real stack and registers take part in the computation, so its virtual environment only needs to allocate a block of memory that mirrors the real registers. In this article, the virtual interpreter for JavaScript code is written in C, so the native execution environment has to be emulated at the source-code level. As shown in the diagram, the virtual execution environment (VMcontext) therefore contains the core structures needed for instruction execution: a Stack that holds the working values of instruction execution, a Register set for temporary variables, and a VarList that acts like memory and is mainly used to pass in external variables and parameters. All three structures are simulated with arrays.
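As a rough illustration of the description above, VMcontext could be laid out as three plain arrays plus a stack pointer. This is a minimal sketch, not the article's actual code: the capacities, the field names, and the helper functions `vm_init`, `vm_push`, and `vm_pop` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed capacities; the article does not state the real sizes. */
#define VM_STACK_SIZE 256
#define VM_REG_COUNT  16
#define VM_VAR_COUNT  64

/* Hypothetical layout of the virtual execution environment. */
typedef struct {
    int32_t stack[VM_STACK_SIZE];   /* Stack: working values            */
    int32_t sp;                     /* number of values on the stack    */
    int32_t reg[VM_REG_COUNT];      /* Register: temporary variables    */
    int32_t varlist[VM_VAR_COUNT];  /* VarList: external vars / params  */
} VMcontext;

/* Reset every structure to zero before interpretation starts. */
static void vm_init(VMcontext *ctx) {
    memset(ctx, 0, sizeof *ctx);
}

/* Push one value; the assert guards against overflowing the array. */
static void vm_push(VMcontext *ctx, int32_t value) {
    assert(ctx->sp < VM_STACK_SIZE);
    ctx->stack[ctx->sp++] = value;
}

/* Pop the most recently pushed value. */
static int32_t vm_pop(VMcontext *ctx) {
    assert(ctx->sp > 0);
    return ctx->stack[--ctx->sp];
}
```

Fixed-size arrays keep the whole environment inside the module's linear memory, which matches the article's statement that these structures are simulated using arrays.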

B.Critical Data VMdata & VMarray

The bytecode program VMdata is the product of virtual instruction encoding and takes the form of a sequence of bytecode. At runtime, the Dispatcher reads and decodes it, then schedules the interpretation routines that restore the functionality of the target code. The bytecode thus carries the semantic logic of the target code. In this work, VMdata is stored as an integer array written in C; after compilation, the array resides in the data segment of the WebAssembly (WASM) module.

VMarray is a string array that stores the string constants and object property names extracted while virtualizing the JavaScript target code. It carries the semantics and operational information of those properties. At runtime, dedicated property-operation Handlers read its elements and, by concatenating them, reconstruct the operations on object properties. VMarray is likewise stored as a C string array and, after compilation, resides in the data segment of the WASM module.
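The two data arrays might be declared along the following lines. The contents are invented for illustration only: the article does not publish its bytecode encoding, so the integer values, the two strings, and the accessor `vmarray_get` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bytecode program; the real encoding is not given. */
static const int32_t VMdata[] = { 1, 0, 1, 1, 2 };
static const size_t VMdata_len = sizeof VMdata / sizeof VMdata[0];

/* Hypothetical string constants and property names. */
static const char *const VMarray[] = { "document", "title" };
static const size_t VMarray_len = sizeof VMarray / sizeof VMarray[0];

/* Bounds-checked access to one VMarray entry. */
static const char *vmarray_get(size_t index) {
    assert(index < VMarray_len);
    return VMarray[index];
}
```

Declaring both arrays as `static const` data with initializers is one way to have the compiler place them in the module's data segment, as the text describes.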


C.Dispatcher

The Dispatcher is the scheduling core of the virtual interpreter; its central role is shown in the diagram above. It first reads bytecode from VMdata, where the read position is controlled by a variable called the VPC (virtual program counter). It then decodes the bytecode and, based on the resulting index, schedules the matching Handler for execution. When the Handler finishes, control returns to the Dispatcher, and the cycle repeats until all bytecode has been decoded, at which point the loop ends. In other words, VMdata drives the order in which Handlers run, and running them in that order restores the original program's functionality. To meet these requirements, the design implements scheduling as a loop combined with a selection structure, as exemplified in the code snippet shown below.
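The loop-plus-selection structure can be sketched as a `while` loop around a `switch`. This is not the article's original snippet: the three opcodes, their numbering, and the function name `dispatch` are hypothetical, and the Handlers are inlined into the `case` branches to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the article's instruction set is not listed. */
enum { OP_PUSH = 0, OP_ADD = 1, OP_MUL = 2 };

#define STACK_SIZE 64

/* Decode vmdata from start to end and return the top of the stack. */
static int32_t dispatch(const int32_t *vmdata, size_t len) {
    int32_t stack[STACK_SIZE];
    size_t sp = 0;   /* number of values on the stack    */
    size_t vpc = 0;  /* virtual program counter (VPC)    */

    while (vpc < len) {                  /* loop until all bytecode is read */
        int32_t opcode = vmdata[vpc++];  /* fetch and advance the VPC       */
        switch (opcode) {                /* select the matching Handler     */
        case OP_PUSH:                    /* parameterized: operand follows  */
            assert(vpc < len && sp < STACK_SIZE);
            stack[sp++] = vmdata[vpc++];
            break;
        case OP_ADD:                     /* non-parameterized: uses stack   */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            sp--;
            break;
        default:
            assert(0 && "unknown opcode");
            break;
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

With this hypothetical encoding, the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL}` evaluates (2 + 3) * 4.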

D.Interpretation Routine Set Handlers

The set of interpretation routines (Handlers) is the component that restores bytecode semantics, as illustrated in the first figure. Once the Dispatcher has decoded a bytecode, it invokes the matching Handler to carry out the computation. Virtual instructions fall into two types: parameterized and non-parameterized. For a parameterized instruction, the Handler reads the parameter from VMdata and uses it to fetch a value from VMcontext. For a non-parameterized instruction, the Handler takes its values directly from the top of the stack. Executing these operations is what restores the behavior of the original code, and the results are written back to VMcontext. Handlers therefore both read from and write to the virtual execution environment.
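The difference between the two instruction types can be shown with one Handler of each kind. The sketch below is an assumption-laden example rather than the article's code: `handler_load_var`, `handler_add`, the reduced `VMcontext` layout, and the array sizes are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SIZE 64
#define VAR_COUNT  16

/* Reduced, hypothetical VMcontext with only the fields used here. */
typedef struct {
    int32_t stack[STACK_SIZE];
    size_t  sp;                  /* number of values on the stack */
    int32_t varlist[VAR_COUNT];
} VMcontext;

/* Parameterized Handler: reads an index from VMdata at the VPC,
   then pushes the VarList entry selected by that index. */
static void handler_load_var(VMcontext *ctx, const int32_t *vmdata, size_t *vpc) {
    int32_t index = vmdata[(*vpc)++];
    assert(index >= 0 && index < VAR_COUNT);
    assert(ctx->sp < STACK_SIZE);
    ctx->stack[ctx->sp++] = ctx->varlist[index];
}

/* Non-parameterized Handler: takes both operands from the top of the
   stack and writes the result back to the stack. */
static void handler_add(VMcontext *ctx) {
    assert(ctx->sp >= 2);
    int32_t rhs = ctx->stack[--ctx->sp];
    int32_t lhs = ctx->stack[--ctx->sp];
    ctx->stack[ctx->sp++] = lhs + rhs;
}
```

Only the parameterized Handler needs access to VMdata and the VPC; the non-parameterized one touches nothing but VMcontext.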

Object operations in the target code frequently require Handlers to join an object with a property name and evaluate the result. As previously mentioned, WebAssembly cannot perform this kind of object access and string handling itself, and C code cannot emulate it. To work around this, the final property operation is written in JavaScript and inlined into the C code using special macros provided by the compiler. The Handler example shown in the middle of the figure is for the parameterized instruction 'lod_a', whose main function is to load a string constant, selected by the parameter index, as the object of the subsequent operation. It uses the macro 'EM_ASM_ARGS' for inline communication, passing the property string to JavaScript for the final operation.
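A Handler in the spirit of 'lod_a' might look like the sketch below. It is a guess at the shape, not a reproduction of the figure: the `VMarray` contents, the `last_loaded` variable, and the JavaScript slot `globalThis.vmCurrent` are hypothetical. The article names 'EM_ASM_ARGS'; the sketch uses Emscripten's `EM_ASM`, which accepts arguments in the same `$0`, `$1` style, and it compiles the inline JavaScript only when built with Emscripten so that the rest can still be built natively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

/* Hypothetical string table; see the VMarray description above. */
static const char *const VMarray[] = { "document", "title" };
#define VMARRAY_LEN (sizeof VMarray / sizeof VMarray[0])

/* Last string loaded; kept so native builds can inspect the result. */
static const char *last_loaded = NULL;

/* Sketch of a 'lod_a'-style Handler: read the parameter index from
   VMdata, look up the string, and hand it to JavaScript. */
static void handler_lod_a(const int32_t *vmdata, size_t *vpc) {
    int32_t index = vmdata[(*vpc)++];
    assert(index >= 0 && (size_t)index < VMARRAY_LEN);
    const char *name = VMarray[index];
    last_loaded = name;
#ifdef __EMSCRIPTEN__
    /* $0 is the string's address in linear memory; UTF8ToString turns
       it into a JavaScript string. vmCurrent is a hypothetical slot. */
    EM_ASM({ globalThis.vmCurrent = UTF8ToString($0); }, name);
#endif
}
```

The C side only ever passes a pointer; the property access itself happens in JavaScript, which is the division of labor the paragraph above describes.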

E.Glue Code

The core of the virtual interpreter is ultimately compiled into a file with the '.wasm' suffix. Browsers, however, do not load '.wasm' files directly at execution time the way they load scripts. Loading is instead handled by glue code written in JavaScript, which fetches the '.wasm' file, reads its binary code into an ArrayBuffer, and then instantiates the resulting WebAssembly module.

The '.wasm' file may also declare additional environment variables that it expects to receive from outside. During instantiation, the memory space and the variable mapping tables must be initialized, and parameters can be passed to the WebAssembly module as it is loaded. The glue code handles all of these operations.

