Virtual Instructions and Handler Design

Published on 2023-07-22

Once the target code has been split into instructions, we have a low-level intermediate representation of the JavaScript program. Close inspection shows that this intermediate code is built from a small number of atomic operations, each with a specific function, much like the assembly instructions of a compiled program. By classifying these operations, we can design a semantically complete set of virtual instructions with which to virtualize the JavaScript code. We start by categorizing the intermediate code. In traditional binary programs, instructions are usually divided into three types: data transfer instructions, control transfer instructions, and arithmetic/logic instructions. The target JavaScript code, after instruction splitting and character conversion, contains the same three types of operation. Because JavaScript also has objects, the intermediate code additionally includes atomic operations on objects and their properties. The split intermediate code is therefore classified into four categories (data transfer, control transfer, arithmetic/logic, and attribute operations), as shown in the table.
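As a rough illustration of the four-way classification, the sketch below maps instruction mnemonics to categories. The mnemonics 'lod', 'stor', 'jmp', 'je', 'set', 'call', and 'fcall' appear in this article; 'add' and 'get' are assumed names, and filing 'fcall' under attribute operations is also an assumption based on where the article discusses it.

```javascript
// Hypothetical mnemonic-to-category table. The category names follow the
// article; the 'add' and 'get' mnemonics are assumed for illustration.
const CATEGORY = new Map([
  ["lod", "data transfer"],
  ["stor", "data transfer"],
  ["jmp", "control transfer"],
  ["je", "control transfer"],
  ["add", "arithmetic/logic"],     // assumed mnemonic
  ["get", "attribute operation"],  // assumed mnemonic
  ["set", "attribute operation"],
  ["call", "attribute operation"],
  ["fcall", "attribute operation"], // assumed placement
]);

// Look up the category of a mnemonic, rejecting unknown instructions.
function categoryOf(mnemonic) {
  if (!CATEGORY.has(mnemonic)) {
    throw new Error(`unknown mnemonic: ${mnemonic}`);
  }
  return CATEGORY.get(mnemonic);
}
```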

The core of each Handler is implemented with Emscripten's 'EM_ASM' macro, which inlines JavaScript code in C. Reading data and scheduling execution happen in WASM, while the actual operation is carried out by JavaScript. Apart from data transfer instructions, most virtual instructions take no explicit parameters. Instead, each one implicitly pops the values it needs from the top of the stack and pushes its result back onto the stack. Data transfer instructions, by contrast, move operands between the stack and specific storage units.
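The fetch-and-dispatch idea can be sketched in plain JavaScript. In the design described here the loop would run in WASM and each handler body would be JavaScript inlined through 'EM_ASM'; the sketch collapses both into one file for readability. The opcode numbers, the 'OP' table, and the 'run' function are all hypothetical.

```javascript
// Hypothetical opcode numbers; the article does not give the real encoding.
const OP = { LOD_IMM: 0x01, ADD: 0x10, HALT: 0xff };

function run(bytecode) {
  const stack = [];
  let vpc = 0; // virtual program counter

  // One handler per opcode. Only the data transfer handler reads an explicit
  // operand from the bytecode; ADD takes its operands from the stack.
  const handlers = {
    [OP.LOD_IMM]: () => {
      stack.push(bytecode[vpc++]);
    },
    [OP.ADD]: () => {
      const right = stack.pop();
      const left = stack.pop();
      stack.push(left + right);
    },
  };

  while (vpc < bytecode.length) {
    const opcode = bytecode[vpc++];
    if (opcode === OP.HALT) break;
    const handler = handlers[opcode];
    if (!handler) throw new Error(`unknown opcode: ${opcode}`);
    handler();
  }
  return stack;
}

// Computes 2 + 3 and leaves the result on the stack.
const result = run([OP.LOD_IMM, 2, OP.LOD_IMM, 3, OP.ADD, OP.HALT]);
```

After this program runs, 'result' holds a single element, 5.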

Because we use a stack-based virtual machine architecture, all data passes through the stack, and registers serve only as temporary storage for intermediate values during computation. We therefore need two kinds of data transfer virtual instructions. The first is the 'lod' (load) instruction, which pushes the value of a specific storage unit, such as a constant, onto the top of the stack. The second is the 'stor' (store) instruction, which pops the computation result from the top of the stack and saves it to a specific storage unit. Table 1 gives examples of data transfer virtual instructions. All of them take an operand, and they fall into four addressing modes: 'Immediate Addressing', 'String Array Addressing', 'Register Addressing', and 'External Variable Addressing'.

Addressing here differs from conventional addressing in binary code. 'String Array Addressing' means that the operand is the array element 'VMA[byte]' (a string constant) selected by the parameter 'byte'. External variables are function parameters, global variables in the code, and similar externally defined values, all of which are stored uniformly in a variable pool, 'Varlist[]'. 'External Variable Addressing' therefore means that the operand is the external variable 'Varlist[byte]' selected by the parameter 'byte'. A store instruction can only target a writable storage location, so the 'stor' instruction has only two addressing modes.
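A minimal sketch of the load and store handlers under the four addressing modes might look like the following. 'VMA' and 'Varlist' are the names used above; the opcode numbers, the 'XFER' table, 'createVM', and 'step' are hypothetical. The article does not say which two modes 'stor' supports, so the sketch assumes register and external variable addressing, since immediates and string constants are not writable.

```javascript
// Hypothetical opcodes, one per (instruction, addressing mode) pair.
const XFER = {
  LOD_IMM: 1, LOD_STR: 2, LOD_REG: 3, LOD_VAR: 4,
  STOR_REG: 5, STOR_VAR: 6, // assumed: the two writable modes
};

function createVM(VMA, Varlist) {
  return { stack: [], regs: [], VMA, Varlist };
}

// Execute one data transfer instruction with its single operand `byte`.
function step(vm, opcode, byte) {
  switch (opcode) {
    case XFER.LOD_IMM: vm.stack.push(byte); break;              // immediate
    case XFER.LOD_STR: vm.stack.push(vm.VMA[byte]); break;      // string array
    case XFER.LOD_REG: vm.stack.push(vm.regs[byte]); break;     // register
    case XFER.LOD_VAR: vm.stack.push(vm.Varlist[byte]); break;  // external variable
    case XFER.STOR_REG: vm.regs[byte] = vm.stack.pop(); break;
    case XFER.STOR_VAR: vm.Varlist[byte] = vm.stack.pop(); break;
    default: throw new Error(`not a data transfer opcode: ${opcode}`);
  }
}

const vm = createVM(["log"], [10]);
step(vm, XFER.LOD_VAR, 0);  // push Varlist[0], which is 10
step(vm, XFER.STOR_REG, 2); // pop it into regs[2]
step(vm, XFER.LOD_STR, 0);  // push the string constant VMA[0]
```

Encoding the addressing mode in the opcode keeps each handler to a single line; an alternative would be one 'lod' opcode followed by a mode byte.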

Attribute operations are essential atomic operations in JavaScript code virtualization. Their main purpose is to recombine the objects, object properties, and method arguments that were split apart, restoring the functionality of the target JavaScript code. In a stack-based virtual machine, this kind of operation takes its operands directly from the top of the stack. In addition to the instructions shown in Table 1, the 'set' instruction also belongs to this category; it assigns a value to an object property. The 'call' instruction is a special case: because the number of arguments in a method call is not fixed, its Handler takes an extra parameter giving the argument count, which keeps the virtualized code semantically complete. There is also a family of 'fcall' instructions that simulate calls to user-defined functions. Their core operation is calling a function object directly, as in 'a(b)', with no object or property lookup. These instructions likewise take a parameter giving the number of arguments.
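The attribute operations can be sketched as handlers that take everything from the stack except the argument count. The stack layout used here (object first, then property name, then arguments in left-to-right order) is an assumption, as are the 'get' mnemonic, the opcode numbers, and the helper names.

```javascript
// Hypothetical opcodes for the attribute operations.
const ATTR = { GET: 0x20, SET: 0x21, CALL: 0x22, FCALL: 0x23 };

// Pop `argc` arguments and restore their left-to-right order.
function popArgs(stack, argc) {
  const args = new Array(argc);
  for (let i = argc - 1; i >= 0; i--) args[i] = stack.pop();
  return args;
}

// `argc` is the explicit parameter that 'call' and 'fcall' need.
function execAttr(stack, opcode, argc = 0) {
  switch (opcode) {
    case ATTR.GET: { // obj.key
      const key = stack.pop();
      const obj = stack.pop();
      stack.push(obj[key]);
      break;
    }
    case ATTR.SET: { // obj.key = value
      const value = stack.pop();
      const key = stack.pop();
      const obj = stack.pop();
      obj[key] = value;
      break;
    }
    case ATTR.CALL: { // obj.key(...args), with `this` bound to obj
      const args = popArgs(stack, argc);
      const key = stack.pop();
      const obj = stack.pop();
      stack.push(obj[key](...args));
      break;
    }
    case ATTR.FCALL: { // fn(...args), no object or property lookup
      const args = popArgs(stack, argc);
      const fn = stack.pop();
      stack.push(fn(...args));
      break;
    }
    default:
      throw new Error(`not an attribute opcode: ${opcode}`);
  }
}

// "a,b".split(",") expressed as a 'call' with one argument.
const callStack = ["a,b", "split", ","];
execAttr(callStack, ATTR.CALL, 1);

// Math.max(3, 7) expressed as an 'fcall' with two arguments.
const fcallStack = [Math.max, 3, 7];
execAttr(fcallStack, ATTR.FCALL, 2);
```

Without the argument count, the handler could not tell where the arguments end and the callee begins, which is why 'call' and 'fcall' are the exceptions to the "no explicit parameters" rule.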

Instruction splitting leaves the intermediate code with a clear branching structure in which every jump destination is known. Control transfer operations are therefore divided by jump condition into direct jumps and conditional jumps. As shown in Table 1, 'jmp' is a direct jump: when it executes, the destination parameter is read from the bytecode and the jump is taken unconditionally. For the conditional jump 'je', the condition is evaluated before the instruction executes, however complex it is, and the result is pushed onto the stack. 'je' therefore only needs to pop that result from the top of the stack, and it jumps if the condition holds. Because virtual instructions are ultimately encoded as bytecode, a jump is implemented by modifying the virtual program counter 'VPC', and the parameter is the offset from the jump instruction to the destination instruction.

Arithmetic and logical operations resemble attribute operations. In a stack-based virtual machine they have no explicit operands, because every input has already been pushed onto the stack by load instructions. Each operation pops the operands it needs from the top of the stack, performs the computation, and pushes the result back onto the stack.
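To make the jump mechanics concrete, here is a small sketch in which 'jmp' and 'je' adjust the virtual program counter by an offset measured from the address of the jump opcode itself, and arithmetic handlers work purely on the stack. The opcode numbers, the 'ge' comparison, and the use of signed offsets in a plain array are assumptions; the article does not describe the real byte encoding.

```javascript
// Hypothetical opcodes for a tiny loop example.
const JOP = {
  LOD_IMM: 0x01, LOD_REG: 0x03, STOR_REG: 0x05,
  ADD: 0x10, GE: 0x11, JMP: 0x30, JE: 0x31, HALT: 0xff,
};

function runJumps(bytecode) {
  const stack = [];
  const regs = [];
  let vpc = 0;
  for (;;) {
    const at = vpc; // address of the current opcode, the base for offsets
    const opcode = bytecode[vpc++];
    switch (opcode) {
      case JOP.LOD_IMM: stack.push(bytecode[vpc++]); break;
      case JOP.LOD_REG: stack.push(regs[bytecode[vpc++]]); break;
      case JOP.STOR_REG: regs[bytecode[vpc++]] = stack.pop(); break;
      case JOP.ADD: { const r = stack.pop(); const l = stack.pop(); stack.push(l + r); break; }
      case JOP.GE: { const r = stack.pop(); const l = stack.pop(); stack.push(l >= r); break; }
      case JOP.JMP: vpc = at + bytecode[vpc]; break; // always taken
      case JOP.JE: {
        const offset = bytecode[vpc++];
        if (stack.pop()) vpc = at + offset; // taken only if the condition held
        break;
      }
      case JOP.HALT: return regs;
      default: throw new Error(`unknown opcode at ${at}: ${opcode}`);
    }
  }
}

// r0 = 0; while (!(r0 >= 3)) r0 = r0 + 1;
const program = [
  JOP.LOD_IMM, 0, JOP.STOR_REG, 0, // addr 0:  r0 = 0
  JOP.LOD_REG, 0, JOP.LOD_IMM, 3,  // addr 4:  push r0, push 3
  JOP.GE,                          // addr 8:  push (r0 >= 3)
  JOP.JE, 11,                      // addr 9:  if true, go to 9 + 11 = 20
  JOP.LOD_REG, 0, JOP.LOD_IMM, 1,  // addr 11: push r0, push 1
  JOP.ADD, JOP.STOR_REG, 0,        // addr 15: r0 = r0 + 1
  JOP.JMP, -14,                    // addr 18: go to 18 - 14 = 4
  JOP.HALT,                        // addr 20
];
const finalRegs = runJumps(program);
```

The loop exits with register 0 holding 3. Note that the comparison is an ordinary stack instruction executed before 'je', so the jump handler never needs to know how complex the condition was.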

