Introduction#
“If you can not build it from scratch, you do not understand it”.
If you have ever interacted with Ethereum, whether as a developer or as a user, there is a good chance you came across the EVM.
EVM#
The EVM is a central part of Ethereum. It is Ethereum’s execution engine, responsible for executing Ethereum programs, also called Smart Contracts.
In this tutorial we are going to build the EVM from scratch. However, this is not a reference implementation. I will omit implementation details if I believe they are not necessary for understanding how the EVM works. This is about learning the core concepts of the EVM from first principles.
That is why I want to call it a mini-EVM: a working but simplified version of the real thing, a virtual machine that takes an Ethereum program as input and executes it.
But what is the EVM exactly? It is a Virtual Machine responsible for executing Ethereum bytecode.
Virtual Machine#
A virtual machine is like a make-believe computer that runs on your real computer. Instead of needing a separate physical machine, it’s all done with software on the computer you already have.
Just like a real computer it has its own language. For the EVM this language is called Ethereum bytecode.
Bytecode#
Bytecode is simply a list of valid EVM opcodes (some opcodes, such as PUSH1, are followed by the data they operate on). An opcode is an operation like ADD, SUB or STOP.
Some of them can be seen in the table below:

| OPCODE | NAME | DESCRIPTION |
|---|---|---|
| 0x00 | STOP | Halts execution |
| 0x01 | ADD | Addition operation |
| 0x02 | MUL | Multiplication operation |
| 0x03 | SUB | Subtraction operation |
| 0x04 | DIV | Integer division operation |
Importantly, the EVM does not understand what ADD or SUB means. It only knows the identifier of the opcode. For ADD that would be 0x01.
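To make this concrete, here is a minimal sketch of what "only knowing the identifier" can look like in Python. The helper name `apply_opcode` is hypothetical and not part of the mini-EVM we build later. It only covers the four arithmetic opcodes from the table, and takes the two operands directly instead of reading them from a stack.

```python
UINT256_MOD = 2**256  # EVM words are 256 bits wide, so arithmetic wraps around


def apply_opcode(opcode: int, a: int, b: int) -> int:
    """Hypothetical helper: run one two-operand arithmetic opcode on a and b."""
    # The machine never sees the names, it only compares numbers.
    if opcode == 0x01:  # ADD
        return (a + b) % UINT256_MOD
    if opcode == 0x02:  # MUL
        return (a * b) % UINT256_MOD
    if opcode == 0x03:  # SUB
        return (a - b) % UINT256_MOD
    if opcode == 0x04:  # DIV, where division by zero yields 0 in the EVM
        return 0 if b == 0 else a // b
    raise ValueError(f"unsupported opcode: {opcode:#04x}")


print(apply_opcode(0x01, 2, 3))  # 5
```

The names ADD and SUB only appear in comments. They exist for humans, not for the machine.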
So a valid program would look like this: 604260005260206000F3. This is something the EVM could interpret.
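To see what is inside that hex string, we can split it into opcodes with a few lines of Python. This is only a rough sketch: the function `disassemble` and the tiny `OPCODE_NAMES` table are made up for illustration and know just the opcodes used here (PUSH1 is 0x60, MSTORE is 0x52 and RETURN is 0xF3). PUSH1 is special because the byte right after it is data, not another opcode.

```python
# Only the opcodes needed for this example; the real table is much larger.
OPCODE_NAMES = {
    0x00: "STOP",
    0x01: "ADD",
    0x52: "MSTORE",
    0x60: "PUSH1",
    0xF3: "RETURN",
}


def disassemble(hex_program: str) -> list[str]:
    """Hypothetical helper: turn a hex string into readable instructions."""
    code = bytes.fromhex(hex_program)
    instructions = []
    pc = 0  # program counter: index of the byte we are looking at
    while pc < len(code):
        opcode = code[pc]
        name = OPCODE_NAMES.get(opcode, f"UNKNOWN(0x{opcode:02x})")
        pc += 1
        if opcode == 0x60:
            # PUSH1 is followed by one byte of data, so consume that too
            instructions.append(f"{name} 0x{code[pc:pc + 1].hex()}")
            pc += 1
        else:
            instructions.append(name)
    return instructions


for line in disassemble("604260005260206000F3"):
    print(line)
```

This prints PUSH1 0x42, PUSH1 0x00, MSTORE, PUSH1 0x20, PUSH1 0x00 and RETURN, one per line. In other words, the program writes 0x42 to memory and returns 32 bytes of it.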
At the time of writing there are 144 opcodes. You can see all of them here. This number is not fixed: new opcodes are added, and old ones occasionally deprecated, through network upgrades.
Solidity, Vyper, Huff#
As a developer you really don’t want to write bytecode directly most of the time. This would be very slow and very error-prone. This is where high-level programming languages like Solidity or Vyper come in.
But a Solidity or Vyper file is simply a text file. This is not something the EVM understands. We need a program that takes in the text file and translates it to EVM bytecode. This program is called the compiler.
If a programming language can be translated (compiled) to EVM bytecode, it is said to target the EVM.
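The idea of "text in, bytecode out" can be sketched with a toy assembler. To be clear, this is not how the Solidity or Vyper compilers work; they do far more (parsing, type checking, optimization). The function `assemble` and its one-instruction-per-line input format are assumptions made up for this example.

```python
# Mnemonic -> opcode byte, limited to what this example needs.
MNEMONIC_TO_OPCODE = {
    "STOP": 0x00,
    "ADD": 0x01,
    "MUL": 0x02,
    "SUB": 0x03,
    "DIV": 0x04,
    "PUSH1": 0x60,
}


def assemble(source: str) -> str:
    """Hypothetical helper: translate one mnemonic per line into hex bytecode."""
    bytecode = bytearray()
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        mnemonic, *args = parts
        bytecode.append(MNEMONIC_TO_OPCODE[mnemonic])
        if mnemonic == "PUSH1":
            # PUSH1 carries one byte of data, written in hex in the source
            bytecode.append(int(args[0], 16))
    return bytecode.hex()


source = """
PUSH1 0x02
PUSH1 0x03
ADD
STOP
"""
print(assemble(source))  # 600260030100
```

The input is plain text that a human can read, the output is a hex string the EVM can execute. A real compiler does the same job for a much richer input language.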
Ethereum vs Bitcoin#
What makes Ethereum so special compared to blockchains that came before it, like Bitcoin?
Ethereum is “special” because it is universal, or Turing complete, which means that any arbitrary program can be run on the EVM (we ignore gas and memory restrictions).
Bitcoin also has programming capabilities, called Bitcoin Script. But importantly, Bitcoin Script is NOT Turing complete. There are programs that you simply cannot implement in Bitcoin Script. That is not the case for the EVM.
Outline#
We will build our mini-EVM from the bottom up. We will start with the Stack, then move to Memory and Storage. Then we will implement the opcodes, which will take most of our time.
After having built all these building blocks we will combine them to create our own EVM. In the end we are going to use our EVM to run some programs.
It is important to understand that our mini-EVM lives in total isolation. It has no clue about other contracts or accounts. Functions that need to interact with the “outside world” are mocked. This is a deliberate choice to keep it simple.
Prerequisites#
Python
Hexadecimal Numbers
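As a quick self-check for both prerequisites, here is a small sketch of the hexadecimal handling we will rely on. It only uses Python built-ins, and the variable names are arbitrary.

```python
value = 0x42  # hex literal, the same number as decimal 66
print(value)  # 66
print(hex(66))  # 0x42
print(int("F3", 16))  # 243, parsing a hex string

# A hex string describes raw bytes: two hex digits per byte.
program = bytes.fromhex("604260005260206000F3")
print(len(program))  # 10
print(program[0])  # 96, which is 0x60
```

If these lines make sense to you, you are ready to go.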
Notes#
Every notebook is available here. If you see any mistakes please create an Issue on GitHub, or even better create a Pull Request ;)