Building MirageVM: A JavaScript Virtual Machine for Code Obfuscation
Modern web applications face a persistent challenge: automated attacks that bypass traditional security measures. Captcha farms and AI-powered tools have made it increasingly difficult to distinguish legitimate users from malicious bots. Captcha-solving services can handle many commercially available captchas, while AI tools let sophisticated scripts interact with web pages in ways that mimic human behavior, or even solve captchas themselves.
The usual response has been to make captcha challenges more complex, but harder challenges often frustrate legitimate users without effectively stopping determined attackers. This escalating arms race between security measures and bypass techniques has led many to seek alternative approaches.
Google’s captcha implementation is interesting: they employ a custom JavaScript virtual machine to obfuscate their captcha logic. Instead of shipping readable JavaScript code, they send bytecode that executes within a custom VM. This technique hides the actual implementation behind a layer of abstraction, making reverse engineering significantly more challenging.
The concept of building a VM excited me enough to try building something similar. What started as a “proof of concept” experiment quickly evolved into a multi-month project.
A JavaScript virtual machine for obfuscation works by replacing standard JavaScript with custom bytecode. Instead of sending readable code to the browser, you ship low-level instructions that only your custom VM can execute. Each high-level operation might require multiple VM instructions, similar to how a single line of C code translates to multiple assembly instructions.
This approach creates several layers of protection. First, attackers must understand how the VM interprets bytecode, and the VM's own code is heavily obfuscated. Second, they need to reverse-engineer the instruction set. Finally, they must map low-level operations back to high-level logic. It’s like trying to understand a program by reading assembly code, except this language and its runtime are entirely custom and undocumented.
GLOBAL R0 // Load globalThis object into register R0
GET R0, R0, "Date" // Access the Date property from globalThis
EXEC R1, R0, "now" // Invoke the now method from Date object
This low-level approach increases the complexity of reverse-engineering efforts, as attackers must decipher multiple instructions to understand even basic functionality.
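To make the three instructions above concrete, here is a minimal sketch of an interpreter that could execute them. The instruction encoding (plain objects with `op`, `dst`, `src`, and `key` fields) and the `run` function are assumptions made for illustration; the text does not show how MirageVM encodes or dispatches instructions, and a real obfuscating VM would use an opaque binary format rather than readable objects.

```javascript
// Hypothetical interpreter for the GLOBAL / GET / EXEC example.
// Field names and the object-based encoding are illustrative assumptions.
function run(program) {
  const registers = {};
  for (const ins of program) {
    switch (ins.op) {
      case "GLOBAL": // load globalThis into the destination register
        registers[ins.dst] = globalThis;
        break;
      case "GET": // dst = src[key]
        registers[ins.dst] = registers[ins.src][ins.key];
        break;
      case "EXEC": // dst = src[key](), called with src as the receiver
        registers[ins.dst] = registers[ins.src][ins.key]();
        break;
      default:
        throw new Error(`Unknown opcode: ${ins.op}`);
    }
  }
  return registers;
}

// The same three instructions, in the assumed decoded form.
const program = [
  { op: "GLOBAL", dst: "R0" },
  { op: "GET", dst: "R0", src: "R0", key: "Date" },
  { op: "EXEC", dst: "R1", src: "R0", key: "now" },
];

const result = run(program);
console.log(typeof result.R1); // "number" (the timestamp from Date.now())
```

Even in this readable form, the single expression `Date.now()` is spread over three steps and two registers, which is the effect the text describes.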
At its core, MirageVM is a state machine that reads and executes bytecode instructions. The implementation follows traditional CPU architecture patterns, featuring registers for data storage and a stack for managing function calls and local variables.
The VM maintains its state through several core components:
: Array of bytecode instructions loaded into the VM
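The state described here (a bytecode array, registers, and a stack) can be sketched as a small class with a fetch-decode-execute loop. Everything below is a hypothetical illustration rather than MirageVM's actual design: the `TinyVM` name, the `OP` table, the four-register file, and the program counter field `pc` are all assumptions.

```javascript
// Hypothetical opcodes; MirageVM's real instruction set is not given in the text.
const OP = Object.freeze({ LOAD: 0, ADD: 1, PUSH: 2, POP: 3, HALT: 4 });

class TinyVM {
  constructor(bytecode) {
    this.bytecode = bytecode;              // array of bytecode instructions loaded into the VM
    this.pc = 0;                           // index of the next instruction (assumed field)
    this.registers = new Array(4).fill(0); // register file for data storage
    this.stack = [];                       // stack for calls and local values
    this.halted = false;
  }

  // Fetch one instruction, decode its opcode, and execute it.
  step() {
    const [op, a, b, c] = this.bytecode[this.pc++];
    switch (op) {
      case OP.LOAD: this.registers[a] = b; break;                              // Ra = constant
      case OP.ADD:  this.registers[a] = this.registers[b] + this.registers[c]; break;
      case OP.PUSH: this.stack.push(this.registers[a]); break;                 // save Ra
      case OP.POP:  this.registers[a] = this.stack.pop(); break;               // restore into Ra
      case OP.HALT: this.halted = true; break;
      default: throw new Error(`Unknown opcode ${op} at index ${this.pc - 1}`);
    }
  }

  run() {
    while (!this.halted && this.pc < this.bytecode.length) this.step();
    return this.registers;
  }
}

const vm = new TinyVM([
  [OP.LOAD, 0, 2],   // R0 = 2
  [OP.LOAD, 1, 40],  // R1 = 40
  [OP.ADD, 2, 0, 1], // R2 = R0 + R1
  [OP.PUSH, 2],      // push R2
  [OP.POP, 3],       // R3 = popped value
  [OP.HALT],
]);
console.log(vm.run()); // registers end as [2, 40, 42, 42]
```

Keeping all mutable state on one object makes each `step` a pure state transition, which matches the state-machine framing used above.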