Code Protection (JSJIAMI) Techniques and WebAssembly

Published 2023-03-15 · 136 views

I. Research Background

In its early days, JavaScript was used mainly for simple tasks such as modifying page content and submitting forms. The code structure was simple, and there was little demand for security. As networks and browsers have become faster, front-end JavaScript applications have grown more diverse and feature-rich, and better execution performance has moved more critical logic from the back end to the front end. As a result, JS encryption matters more and more, and the security problems and requirements of JavaScript itself have become more pressing.

Many front-end pages that handle login, registration, secure payment, and transactions rely on JavaScript to enforce security requirements during the interaction. For example, when a user completes an interaction on a payment page, front-end security code collects feature data about the user's behavior, then encrypts and repackages it. The data is then analyzed and compared against the behavioral characteristics typical of human users to detect machine behavior and block the interaction promptly. As shown in Figure 1, JavaScript code plays an important role in human-machine verification. Once a malicious user cracks the encryption used by the front-end code and reverse engineers its execution logic, they can often forge data that imitates a normal user, bypass security verification, and achieve their malicious goals.

With the rapid growth of e-commerce and similar services, a large underground ("black") industry has emerged that profits from malicious registration, fake orders (order brushing), and similar technical means. Building a secure and trustworthy front-end interaction environment first requires ensuring that input data is secure and reliable, and verifying the authenticity of messages and data that come from users. However, the JavaScript code that serves as the main means of collecting this data is directly exposed in the browser, which makes it difficult to stop malicious users from analyzing the code and forging data for profit. JavaScript code security, and with it JS encryption, is therefore becoming increasingly prominent in front-end attack and defense.

Currently, the mainstream JavaScript code protection measures are simplification (minification), encryption, and obfuscation, with core ideas borrowed largely from traditional software protection techniques. However, JavaScript is a scripting language: it is transmitted as source text that retains its syntactic structure, which makes reverse analysis easier than for traditional compiled binaries. In addition, as browsers grow faster and debuggers become more capable, these methods struggle to provide effective protection.

To address the shortcomings of current JavaScript code protection methods, we propose a JavaScript code virtualization protection method based on WebAssembly. The method borrows the idea of code virtualization from binary code protection. Working on the Abstract Syntax Tree (AST), it splits the target code into special instructions and applies custom virtualization to them. A dedicated virtual interpreter, compiled to WebAssembly, then interprets the virtual instructions at run time and restores the semantics and logic of the target code. Virtual machine protection hides the execution logic of the JavaScript code so that the core functionality and logical structure of the target code cannot easily be analyzed and abused, which meets the security needs of critical JavaScript applications in the current front-end attack and defense environment.
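The AST-to-virtual-instruction step described above can be illustrated with a minimal sketch. The opcode table `OP`, the `compile` function, and the hand-built AST are all assumptions made for illustration; the source does not specify its instruction set or how it splits instructions. The node shapes follow the ESTree convention that parsers such as acorn or esprima produce.

```javascript
// Hypothetical opcodes for illustration; the source does not define an instruction set.
const OP = { PUSH: 0, LOAD: 1, ADD: 2, MUL: 3 };

// Lower an ESTree-style expression node into a flat list of stack-machine instructions.
function compile(node, out = []) {
  switch (node.type) {
    case "Literal":
      out.push([OP.PUSH, node.value]);
      break;
    case "Identifier":
      out.push([OP.LOAD, node.name]);
      break;
    case "BinaryExpression": {
      // Post-order walk: operands first, then the operator.
      compile(node.left, out);
      compile(node.right, out);
      const ops = { "+": OP.ADD, "*": OP.MUL };
      if (!(node.operator in ops)) {
        throw new Error("unsupported operator: " + node.operator);
      }
      out.push([ops[node.operator]]);
      break;
    }
    default:
      throw new Error("unsupported node: " + node.type);
  }
  return out;
}

// Hand-built AST for the expression `price * 2 + 1`.
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Identifier", name: "price" },
    right: { type: "Literal", value: 2 },
  },
  right: { type: "Literal", value: 1 },
};

const program = compile(ast);
console.log(JSON.stringify(program)); // [[1,"price"],[0,2],[3],[0,1],[2]]
```

After this step the shipped artifact no longer contains the expression `price * 2 + 1` as source text, only a list of numeric opcodes and operands that is meaningless without the matching interpreter.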

II. Current Status of JS Encryption

  1. JavaScript Code Protection

With the rapid development of web applications and browser performance, JavaScript has become widely used as the main language of the browser. However, because its source code is exposed during transmission, JavaScript code is vulnerable in front-end attack and defense, which has made JavaScript encryption a topic of concern in both academia and industry. Many commercial protection systems and platforms offer comprehensive protection for JavaScript code. For example, JScrambler [1] is a mature commercial protection tool that provides multiple protection options, such as compression, data and string encoding, structure and variable name obfuscation, and simple anti-debugging measures. JavaScript Obfuscator [2] strengthens protection mainly by encoding, encrypting, and relocating strings, obfuscating variable names, and inserting junk code. JShaman [3] adds a polymorphic mutation strategy on top of control-flow flattening and code encryption: the script mutates automatically every time it is called, which hinders dynamic debugging analysis. There are also tools that target specific protection effects. For example, Google analyzes JavaScript code, removes dead code that will never be executed, replaces logic whose result can be computed directly with that result, and compresses script files in a way that approaches refactoring. Yosuke provides two encoding tools that encode JavaScript code into code composed only of common symbol characters or of emoji characters, destroying the readability of the code and hindering malicious reverse analysis.
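Two of the transformations named above, string encoding with renamed identifiers and control-flow flattening, can be sketched by hand as follows. This is only an illustration of the general ideas, not the output or algorithm of any of the tools mentioned; the shift value, the `_0x…` names, and the state numbering are arbitrary assumptions.

```javascript
// --- String encoding plus identifier renaming ---

// Readable original.
function greet(name) {
  return "Hello, " + name + "!";
}

// Encode and decode by shifting character codes. SHIFT = 7 is an arbitrary choice.
const SHIFT = 7;
const encode = (s) =>
  Array.from(s, (c) => String.fromCharCode(c.charCodeAt(0) + SHIFT)).join("");
const decode = (s) =>
  Array.from(s, (c) => String.fromCharCode(c.charCodeAt(0) - SHIFT)).join("");

// String table as it might appear in shipped code (encoded at build time).
const _0x1a = [encode("Hello, "), encode("!")];

// Obfuscated equivalent of greet: no readable literals, meaningless names.
function _0x2b(_0x3c) {
  return decode(_0x1a[0]) + _0x3c + decode(_0x1a[1]);
}

// --- Control-flow flattening ---

// Readable original: sum of 1..n.
function sumTo(n) {
  let s = 0;
  for (let i = 1; i <= n; i++) s += i;
  return s;
}

// Flattened equivalent: the loop becomes a state machine inside one dispatcher,
// so the original block structure is no longer visible in the source.
function sumToFlat(n) {
  let s = 0;
  let i = 1;
  let state = 0;
  while (true) {
    switch (state) {
      case 0: // loop condition
        state = i <= n ? 1 : 2;
        break;
      case 1: // loop body and increment
        s += i;
        i++;
        state = 0;
        break;
      case 2: // exit
        return s;
    }
  }
}

console.log(_0x2b("Ada") === greet("Ada")); // true
console.log(sumToFlat(10) === sumTo(10));   // true
```

Both transformations preserve behavior while changing only the syntactic form, which is why a patient analyst with a debugger can still recover the logic.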

Within research on software code protection, a growing body of work focuses on protecting JavaScript code through JS encryption. Qin et al. [6] proposed "parhelion", a random encryption algorithm that borrows the characteristics of polymorphic viruses to protect script code. Terrace et al. [7] proposed "js.js", a JavaScript interpreter that itself runs in a JavaScript environment and can execute third-party scripts in a fully isolated sandbox, thereby controlling code behavior. Bertholon et al. [8] proposed "JShadObf", an obfuscation framework based on evolutionary heuristics that, for a given JavaScript program, iteratively optimizes and selects a sequence of transformations to apply to the target code in order to strengthen its obfuscation. Chen Xiaojiang et al. [9] studied the debugging principles and features of the mainstream browsers and added mechanisms that detect and respond to debugging behavior, protecting JavaScript code from malicious dynamic debugging. Fang Dingyi et al. [10] introduced the idea of temporal diversity, diversifying the JavaScript code so that the protected code behaves differently over time during execution and thus resists cumulative attacks. Liu et al. [11] proposed "Closure", a unified framework for optimizing program obfuscation. Given an input program and a set of obfuscation transformations, it uses a Markov chain Monte Carlo method to guide a randomized search and outputs the optimal sequence of obfuscation transformations.

  2. Code Virtualization/JS Encryption/JS Obfuscation Protection

Code virtualization protection (also known as virtual machine protection, or VM-based JS encryption) is already very mature in Windows application security and is currently the strongest binary code protection solution. The basic principle is to replace the original program instructions with custom virtual instructions, and then use a matching interpreter at run time to interpret them and restore them to native instructions. Because of its strong protection, it has been widely adopted and commercialized in industry, for example in Themida [12], CodeVirtualizer [13], and VMProtect [14]. In academia, a large number of studies discuss how to use and strengthen code virtualization to protect software from malicious reverse engineering. Fang et al. [15] proposed a multi-stage software virtualization method that uses different interpreters to transform key code regions iteratively, several times over, so that an attacker must crack every intermediate result to recover the structure of the original code. Similarly, Yang et al. [16] proposed nested virtual machine code protection, which forces attackers to fully reverse engineer one layer of interpreter before moving to the next, increasing the cost of an attack. Averbuch et al. [17,18] added encryption and decryption on top of virtual machine protection: virtual instructions are encrypted with the AES algorithm and custom keys, decrypted at run time, and then dispatched to a handler for interpretation. Fang et al. [19,20] proposed a time diversity protection scheme that increases the time diversity of the protected code region to resist dynamic analysis.
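The basic principle above, custom virtual instructions plus an interpreter that dispatches each one to a handler, can be sketched as a tiny stack machine. The opcode values in `VOP`, the handler table, and the `run` function are hypothetical and chosen only for illustration; commercial protectors use far richer and deliberately obscured instruction sets.

```javascript
// Hypothetical virtual instruction set; the numeric values are arbitrary.
const VOP = { PUSH: 0x10, ADD: 0x20, MUL: 0x30, RET: 0xff };

// One handler per virtual instruction. Each handler mutates the VM state.
const handlers = {
  [VOP.PUSH]: (vm) => {
    vm.stack.push(vm.code[vm.pc++]); // operand follows the opcode
  },
  [VOP.ADD]: (vm) => {
    const b = vm.stack.pop();
    const a = vm.stack.pop();
    vm.stack.push(a + b);
  },
  [VOP.MUL]: (vm) => {
    const b = vm.stack.pop();
    const a = vm.stack.pop();
    vm.stack.push(a * b);
  },
};

// Fetch-decode-dispatch loop of the interpreter.
function run(code) {
  const vm = { code, pc: 0, stack: [] };
  while (vm.pc < code.length) {
    const op = code[vm.pc++];
    if (op === VOP.RET) return vm.stack.pop();
    const handler = handlers[op];
    if (!handler) throw new Error("unknown opcode " + op);
    handler(vm);
  }
  throw new Error("program ended without RET");
}

// Virtualized form of the expression (2 + 3) * 4.
const bytecode = [VOP.PUSH, 2, VOP.PUSH, 3, VOP.ADD, VOP.PUSH, 4, VOP.MUL, VOP.RET];
console.log(run(bytecode)); // 20
```

An attacker who sees only `bytecode` must first recover the meaning of every opcode from the interpreter before the protected logic becomes readable, which is the extra cost that virtualization imposes.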

The time diversity scheme of Fang et al. [19,20] achieves its goal by constructing several execution paths that are semantically equivalent but differ in form, and dynamically selecting one of them at run time. Wang et al. [21] strengthened virtual machine protection by adding register rotation and multiple virtual registers to the traditional virtual machine structure. Tang et al. [22] proposed a virtual instruction randomization method: a random encoding is designed on top of the virtual instruction definitions, so that each protected copy of the software differs in its protected code, which hinders malicious reverse engineering. Kuang et al. [23,24] proposed a multi-virtual-machine protection scheme with a dynamic scheduling structure, which uses a dynamic instruction scheduler and several virtual machines to steer program execution randomly along different paths. The protected program shows different execution behavior and paths on every run, making it harder for attackers to reuse knowledge gathered from previous runs or from similar applications.
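The idea of choosing among equivalent execution paths at run time can be sketched as follows. The `variants` array and the `sumDiverse` function are hypothetical names used only for this illustration, and the sketch assumes `n` is a non-negative integer. Real schemes generate the variants automatically and at a much finer granularity.

```javascript
// Three semantically equivalent ways to compute 1 + 2 + ... + n.
const variants = [
  // Path A: count upward.
  (n) => {
    let s = 0;
    for (let i = 1; i <= n; i++) s += i;
    return s;
  },
  // Path B: closed form.
  (n) => (n * (n + 1)) / 2,
  // Path C: count downward.
  (n) => {
    let s = 0;
    for (let i = n; i > 0; i--) s += i;
    return s;
  },
];

// Select one path per call. `pick` defaults to Math.random and is injectable
// so that the selection can be made deterministic when testing.
function sumDiverse(n, pick = Math.random) {
  const index = Math.floor(pick() * variants.length);
  return variants[index](n);
}

console.log(sumDiverse(10)); // 55, whichever path was taken
```

Because the result is identical on every path while the executed instructions differ between runs, a trace recorded in one debugging session is of limited use in the next.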

JavaScript is script code that executes in the browser's JavaScript engine. Although continuous engine optimization has greatly improved its execution speed, its efficiency still cannot match that of native C/C++ programs, so browser vendors began introducing WebAssembly. WebAssembly (abbreviated WASM) is an open standard developed by a W3C community group that includes representatives from all major browser vendors. It is a safe, portable, low-level code format designed for efficient execution and compact representation. Its main goal is to enable high-performance applications on the web, and it is the other language besides JavaScript that can be executed in the browser. In addition, existing compilers for languages such as Java, Rust, and C/C++, as well as languages designed specifically for WebAssembly, support it as a compilation target, so programmers can freely choose the language that best suits their application.
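To make the relationship between the two languages concrete, the following minimal sketch loads a hand-assembled WebAssembly module from JavaScript and calls its exported function. The module and its exported name `add` are illustrative assumptions; in practice the bytes would come from a compiler such as Emscripten or rustc rather than being written by hand. The sketch uses the standard `WebAssembly.Module` and `WebAssembly.Instance` constructors, which compile synchronously and are suitable only for very small modules like this one.

```javascript
// A minimal WebAssembly binary exporting add(a: i32, b: i32) -> i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  // Type section: one function type (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: one function using type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Compile and instantiate synchronously (no imports are required).
const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule, {});

// The export is callable like any JavaScript function.
const add = instance.exports.add;
console.log(add(2, 3)); // 5
```

What reaches the browser here is a compact binary rather than readable source text, which is the property that makes WebAssembly attractive as a host for a protection interpreter.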

As an efficient browser language, WebAssembly has received wide attention from academia since its introduction. Micha Reiser and Luc Bläser presented Speedy.js, a cross-compiler that converts performance-critical JavaScript/TypeScript code to WebAssembly and generates the glue code needed to integrate the result into conventional JavaScript applications. This approach can significantly speed up computationally intensive code. Letz et al. used WebAssembly to improve their earlier work, applying it in the context of the Web Audio API to implement more efficient sample-level DSP algorithms written in C/C++. Ellul et al. argue that WebAssembly can reduce both the size of client-side scripts sent over the network and the inherent execution overhead of JavaScript. Because it is web-based, language-independent, and compactly encoded for efficient execution, they consider it well suited to wireless sensor network (WSN) applications.

This survey of JS encryption shows that current research focuses mainly on code encryption and obfuscation. These methods have mature applications in traditional binary programs, which also means that mature reverse analysis methods exist for them. Although optimized obfuscation frameworks and similar methods can effectively increase the complexity of obfuscation, JavaScript is a scripting language transmitted as source code, so it is easier to reverse analyze than compiled binary applications, and added anti-debugging and anti-tampering modules are easy to find and remove. Traditional code obfuscation works mainly by changing the syntactic structure of the target code, whereas code virtualization maps the target code to virtual instructions and executes them through interpretation, changing the execution process itself. Code virtualization can therefore provide stronger protection. The gradual adoption of WebAssembly also makes it more feasible to apply ideas from traditional binary program protection, such as packing and virtualization, to JavaScript code protection.

