My L1VM runs bytecode using indirect-threaded dispatch. It has a 64-bit core with about 60 opcodes and 256 registers for integer and double-precision floating-point numbers, plus a stack to store call arguments. On Linux x64 the VM executable is less than 32 KB in size!
The VM can run functions in “modules” (.so or .dll shared libraries) and can be extended that way.
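The threaded dispatch mentioned above can be sketched in a few lines of C. This is only a minimal illustration, not L1VM's actual code: the opcodes, the instruction encoding and the function name `run_bytecode` are made up for this example, and it assumes the "labels as values" extension (`&&label` and `goto *`) of GCC and Clang. Each opcode handler ends by fetching the next opcode and jumping through a table of label addresses, so there is no central `switch` loop.

```c
#include <stdint.h>

/* Hypothetical opcodes for illustration; not L1VM's real instruction set. */
enum { OP_LOADI, OP_ADD, OP_MUL, OP_HALT };

/* Runs the bytecode and returns the value left in register 0. */
int64_t run_bytecode(const uint8_t *code)
{
    /* Jump table: one label address per opcode (GCC/Clang extension). */
    static void *const dispatch[] = {
        &&do_loadi, &&do_add, &&do_mul, &&do_halt
    };
    int64_t regs[256] = {0};    /* 256 integer registers, as in the text */
    const uint8_t *pc = code;   /* points at the current opcode */

/* Fetch the opcode at pc and jump to its handler through the table. */
#define NEXT() goto *dispatch[*pc]

    NEXT();

do_loadi:                       /* LOADI dst, imm8 */
    regs[pc[1]] = pc[2];
    pc += 3;
    NEXT();

do_add:                         /* ADD dst, a, b */
    regs[pc[1]] = regs[pc[2]] + regs[pc[3]];
    pc += 4;
    NEXT();

do_mul:                         /* MUL dst, a, b */
    regs[pc[1]] = regs[pc[2]] * regs[pc[3]];
    pc += 4;
    NEXT();

do_halt:                        /* HALT: stop and return register 0 */
    return regs[0];

#undef NEXT
}
```

For example, the byte sequence `OP_LOADI 1 6, OP_LOADI 2 7, OP_MUL 0 1 2, OP_ADD 0 0 2, OP_HALT` computes 6 * 7 + 7 and leaves 49 in register 0.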
For a quick comparison with C, I ran the fractal program on the VM and as a C program using the same algorithm.
The VM runtime was 4 times longer than that of the C program. IMHO that is not bad! Anton Ertl said in an older paper that a factor of 5 to 10 compared to C is still highly efficient.
The link to the GitHub repository can be found on my links page above.
“The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures” by M. Anton Ertl and David Gregg