CS 200: Python Virtual Machine (PVM)

Video: A Bit about Bytes: Understanding Python Bytecode, James Bennett, PyCon 2018

eBook: Inside The Python Virtual Machine, Obi Ike-Nwosu

GitHub: github.com/python/cpython, the source code for Python. See ceval.c, which contains the bytecode evaluation loop.

Python source code is compiled to Python bytecode, which is then executed by the Python Virtual Machine. The .pyc files contain Python bytecode.

Python source code goes through the following process:

  1. Lexical analysis. Break the code up into tokens. (See flex in UNIX, née lex.)
  2. Parsing. Generate a syntactic parse tree. (Like diagramming sentences using parts of speech.) (See bison in UNIX, née yacc.)
  3. Code generation. Python traverses the tree and generates bytecode, stored in code objects. For imported modules, the bytecode is cached in a .pyc file. (In Python 3, these files reside in a __pycache__ directory. Take a look.)
  4. Execution. The Python Virtual Machine reads the bytecode and executes the instructions.
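The four stages can be observed from Python itself with the standard tokenize and ast modules plus the built-in compile() and exec(). This is a minimal sketch, not a picture of CPython's internals: the real tokenizer and parser are written in C, and the ast module exposes the abstract syntax tree rather than a concrete parse tree. The sample statement and variable names are made up for illustration.

```python
import ast
import io
import tokenize

source = "x = 2 * y + 1\n"

# 1. Lexical analysis: break the source into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens[:4]:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# 2. Parsing: build a tree (the ast module gives the abstract syntax tree).
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# 3. Code generation: compile the tree into a code object holding bytecode.
code = compile(tree, filename="<example>", mode="exec")
print(type(code).__name__, len(code.co_code), "bytes of bytecode")

# 4. Execution: the PVM runs the bytecode.
namespace = {"y": 20}
exec(code, namespace)
print(namespace["x"])  # 41
```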

Some of the modules we use to explore the bytecode include the dis and opcode modules, both described below.

See the local files: bytecode.py and pvm.py

We define a simple function.

We check out the attributes of the function.

Now we drill down on the __code__ attribute.
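The original example function is not reproduced here, so the sketch below uses a hypothetical add() to show a few function attributes and then the co_ attributes of its code object.

```python
def add(a, b=1):
    """Return the sum of a and b."""
    return a + b

# A few attributes every function object carries.
print(add.__name__)      # add
print(add.__doc__)       # Return the sum of a and b.
print(add.__defaults__)  # (1,)

# The compiled body lives in a code object; its attributes start with co_.
code = add.__code__
co_attrs = [name for name in dir(code) if name.startswith("co_")]
print(co_attrs)
print(code.co_argcount, code.co_varnames)  # 2 ('a', 'b')
```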

We define a function, xf(), which prints the values of the attributes of the __code__ object.
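One possible shape for xf(), assuming it simply walks every co_ attribute. This sketch also returns the values as a dict so they can be inspected; that return value is our addition, not part of the original description.

```python
import warnings

def xf(f):
    """Print each co_* attribute of f.__code__ and return them as a dict."""
    code = f.__code__
    values = {}
    with warnings.catch_warnings():
        # co_lnotab is deprecated in Python 3.12+; reading it emits a warning.
        warnings.simplefilter("ignore", DeprecationWarning)
        for name in sorted(dir(code)):
            if not name.startswith("co_"):
                continue
            value = getattr(code, name)
            if callable(value):  # skip methods such as co_lines() and co_positions()
                continue
            values[name] = value
            print(f"{name:18} {value!r}")
    return values

info = xf(lambda x: x + 1)
```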

Every time you define a function, Python creates these attributes. Some are obvious, and others are obscure. We will start with the co_code attribute, which holds the bytecode instructions needed to execute the function.

We define a simple function getbytes(f) to retrieve the bytecodes.

See Hexadecimal.html for a discussion of hexadecimal byte strings.

We next define a function to print out the individual bytes.
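A minimal sketch of getbytes(f) and of a byte printer. The name printbytes() and the sample function square() are hypothetical, since the original listing is not shown.

```python
def getbytes(f):
    """Return the raw bytecode of f as a bytes object."""
    return f.__code__.co_code

def printbytes(f):
    """Print every byte of f's bytecode as offset, decimal, and hex; return the values."""
    values = []
    for offset, byte in enumerate(getbytes(f)):   # iterating bytes yields ints 0-255
        print(f"{offset:4} {byte:4}  0x{byte:02x}")
        values.append(byte)
    return values

def square(x):
    return x * x

raw = getbytes(square)
print(raw.hex())          # the whole byte string in hexadecimal
listing = printbytes(square)
```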

Some of those bytes are opcodes, that is, instructions for the virtual machine, analogous to assembly language instructions. The opcode module provides a mapping between byte values and opcode names.

The opcode.opname list maps opcode numbers to their names. We define a function printopcodes(f) to map these codes for a given function.
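A sketch of printopcodes(f), assuming Python 3.6 or later, where every instruction occupies two bytes (opcode, then argument). It shows EXTENDED_ARG prefixes as separate units rather than combining them. On Python 3.11+ you will also see CACHE units, which are inline cache slots that dis hides, so the cross-check drops them.

```python
import dis
import opcode

def printopcodes(f):
    """Print (offset, opname, raw arg) for each two-byte unit of f's bytecode."""
    code = f.__code__.co_code
    decoded = []
    for offset in range(0, len(code), 2):
        op, arg = code[offset], code[offset + 1]
        name = opcode.opname[op]
        print(f"{offset:4} {name:22} {arg}")
        decoded.append((offset, name, arg))
    return decoded

def sample(a, b):
    return a * b + 1

# Cross-check against dis, which hides the inline CACHE units used by 3.11+.
mine = [name for _, name, _ in printopcodes(sample) if name != "CACHE"]
theirs = [instr.opname for instr in dis.get_instructions(sample)]
print(mine == theirs)
```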

The dis module has a dis() function, which performs this task, and a code_info() function, which provides additional information.

We combine these in the showme(f) function.
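One way showme(f) might combine them is sketched below. Capturing the output in a string via dis.dis(..., file=...) is our choice for testability; the original may simply print.

```python
import dis
import io

def showme(f):
    """Print dis.code_info(f) followed by dis.dis(f); also return the text."""
    buf = io.StringIO()
    buf.write(dis.code_info(f))
    buf.write("\n\n")
    dis.dis(f, file=buf)   # dis.dis writes to sys.stdout unless file= is given
    text = buf.getvalue()
    print(text)
    return text

def sample(a, b):
    return a * b

report = showme(sample)
```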

PVM-lite

See A Python Interpreter Written in Python

Note that this article was written for an earlier version of Python, 3.4 or so. In that version, an instruction without an argument took one byte, and an instruction with an argument took three bytes: one for the opcode and two for the argument. Starting with Python 3.6, every instruction takes exactly two bytes: one for the opcode and one for the argument. That is what we use in this assignment, so you will need to make this adjustment as you read.
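As a sketch of that adjustment: the reading's decoder reads a two-byte little-endian argument only for opcodes at or above HAVE_ARGUMENT, while 3.6+ code always reads one argument byte and handles values above 255 through EXTENDED_ARG prefixes. The helper decode_at() below is hypothetical. On 3.11+ the returned next_pc may land on CACHE units that follow the instruction.

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]

def decode_at(code, pc):
    """Decode the instruction that starts at byte offset pc (Python 3.6+).

    Every unit is two bytes: opcode, then a one-byte argument. Arguments over
    255 are carried by EXTENDED_ARG prefixes, each supplying 8 higher bits.
    Returns (opname, arg, next_pc).
    """
    arg = 0
    while True:
        op, byte_arg = code[pc], code[pc + 1]
        arg = (arg << 8) | byte_arg
        pc += 2
        if op != EXTENDED_ARG:
            return dis.opname[op], arg, pc

# Hand-built bytes: EXTENDED_ARG 1, then LOAD_CONST 2, meaning LOAD_CONST 258.
sample = bytes([EXTENDED_ARG, 1, dis.opmap["LOAD_CONST"], 2])
print(decode_at(sample, 0))  # ('LOAD_CONST', 258, 4)
```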

The Interpreter

We implement two interpreters in one. They both use a stack to execute the instructions.

The latter relies on the getattr(object, name) function, which returns the value of the named attribute of the object.
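getattr() turns a string into a bound method, which is what makes dispatch by opcode name work. The toy machine below uses made-up opcodes PUSH and ADD (not CPython's) purely to show the pattern.

```python
class TinyStackMachine:
    """Toy machine: each opcode is a method, found by name with getattr()."""

    def __init__(self):
        self.stack = []

    def PUSH(self, value):
        self.stack.append(value)

    def ADD(self):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left + right)

    def run(self, program):
        for name, *args in program:
            handler = getattr(self, name)   # e.g. getattr(self, "ADD") -> self.ADD
            handler(*args)
        return self.stack.pop()

result = TinyStackMachine().run([("PUSH", 2), ("PUSH", 3), ("ADD",)])
print(result)  # 5
```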

hw5: Interpreter: PVM-lite

Following the description in the reading, you will implement PVM-lite: A Python Virtual Machine for a subset of Python.

Included in the subset are:

Not included are:

The dis module can disassemble a Python function, printing out the bytecode for the function. For example, given:

Write your own version of dis.dis() that produces a dictionary object containing the following values:

Each instruction list has the following five components:

Note that makeobj is similar to dis.dis(). You may use dis.dis() to help test your implementation of makeobj().

You may also want to use dis.HAVE_ARGUMENT to identify the opcodes that do and do not take an argument, and dis.hascompare to identify comparison operators. dis.get_instructions(f) is very useful.
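A quick look at what dis.get_instructions() provides, using a hypothetical sample(). Each Instruction carries offset, opcode, opname, arg, and argval; for COMPARE_OP, argval is the operator string. This only explores the API. It is not the dictionary format makeobj() must return.

```python
import dis

def sample(x):
    return x < 10

rows = []
for instr in dis.get_instructions(sample):
    takes_arg = instr.opcode >= dis.HAVE_ARGUMENT
    is_compare = instr.opcode in dis.hascompare
    rows.append((instr.offset, instr.opname, instr.arg, instr.argval, takes_arg, is_compare))
    print(f"{instr.offset:4} {instr.opname:20} arg={instr.arg!s:6} "
          f"argval={instr.argval!r:8} has_arg={takes_arg} compare={is_compare}")

compares = [row for row in rows if row[5]]
```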

The Interpreter Class

We shall now implement an Interpreter class along the lines of the reading, which describes a full implementation of the PVM whose source code is available on GitHub. We encourage you to avail yourself of that resource. Our implementation is more modest. Nevertheless, you should try to borrow as much as possible from the byterun implementation. Yes, I am telling you to copy and adapt code from the byterun GitHub repository.

We first define a test function that will execute a Python function using the Interpreter class.
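A sketch of such a test function, assuming the Interpreter exposes an execute(f) method that takes a zero-argument function (as the execute description later suggests). check(), StandIn, and Unfinished are hypothetical placeholders so the sketch runs without your Interpreter.

```python
def check(f, interpreter_cls):
    """Run f natively and through interpreter_cls().execute(f); report agreement."""
    expected = f()
    try:
        actual = interpreter_cls().execute(f)
    except Exception as exc:   # a missing opcode method typically shows up as AttributeError
        print(f"{f.__name__}: FAILED ({type(exc).__name__}: {exc})")
        return False
    ok = actual == expected
    status = "ok" if ok else "MISMATCH"
    print(f"{f.__name__}: {status}  expected={expected!r}  got={actual!r}")
    return ok

class StandIn:
    """Placeholder so the sketch runs; swap in your Interpreter class."""
    def execute(self, f):
        return f()

class Unfinished:
    """Placeholder that fails the way an interpreter missing an opcode might."""
    def execute(self, f):
        raise AttributeError("'Interpreter' object has no attribute 'COMPARE_OP'")

def s_demo():
    return 1 > 2

results = [check(s_demo, StandIn), check(s_demo, Unfinished)]
```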

Problem 3 - binary arithmetic

Next we define a set of test functions for the Interpreter.
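One caveat when writing such tests: CPython folds arithmetic on constants at compile time, so a function like return 2 + 3 never executes a binary-operator opcode. Using arguments or variables keeps the operation in the bytecode. The two functions below are illustrative, not the course's s3 set.

```python
import dis

def folded():
    return 2 + 3          # both operands are constants: folded to 5

def not_folded(a=2, b=3):
    return a + b          # operands are only known at run time

def opnames(f):
    return [instr.opname for instr in dis.get_instructions(f)]

print(opnames(folded))
print(opnames(not_folded))
```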

Problem 4 - comparison operators

def s4(): return 1 > 2
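Unlike arithmetic, CPython does not fold comparisons of constants, so s4() really does execute a COMPARE_OP. Listing its instructions shows the opcode your Interpreter must handle and the operator string dis reports as argval.

```python
import dis

def s4(): return 1 > 2

ops = [(instr.opname, instr.argval) for instr in dis.get_instructions(s4)]
for name, argval in ops:
    print(name, argval)
```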

Problem 5 - jump operators

Problem 6 - lists, tuples, sets, subscript

Problem 7 - slices

Error: should return lists, not tuples.

Problem 8 - unary and inplace operators

Problem 9 - logical binary operators

Extra binary operators

Define the Interpreter Class

Now we start to define the Interpreter itself.

The PVM uses a stack to control execution and pass arguments to functions and operators.

The execute method controls the action. It first decodes the function using makeobj(), which we defined above, and sets the program counter (pc) to 0. Once the pc runs past the last instruction, the program halts.

execute cycles through the instructions, printing them out if debug is True. Each PVM opcode/opname is defined as a method of the Interpreter class. This means that each opname is an attribute of the class and can be accessed using getattr(self, instruction). The opcode method is called with or without an argument, as appropriate.

When the while loop finishes, execute returns the result.
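A minimal sketch of that loop under simplifying assumptions: instructions arrive as (opname, arg) pairs with arg None for argument-less opcodes, and LOAD_CONST receives the constant itself rather than an index into co_consts. MiniInterpreter is hypothetical, not the required Interpreter.

```python
class MiniInterpreter:
    """Hypothetical sketch of an execute() loop over pre-decoded instructions."""

    def __init__(self, debug=False):
        self.debug = debug
        self.stack = []
        self.instructions = []
        self.pc = 0
        self.result = None

    def execute(self, instructions):
        self.instructions = instructions
        self.pc = 0
        while self.pc < len(self.instructions):    # halt once pc runs past the end
            opname, arg = self.instructions[self.pc]
            if self.debug:
                shown = "" if arg is None else repr(arg)
                print(f"{self.pc:3} {opname:14} {shown:8} stack={self.stack}")
            self.pc += 1                            # advance first; jumps may overwrite pc
            method = getattr(self, opname)          # "LOAD_CONST" -> self.LOAD_CONST
            if arg is None:
                method()
            else:
                method(arg)
        return self.result

    # Two toy opcode handlers so the loop has something to run.
    def LOAD_CONST(self, value):
        self.stack.append(value)

    def RETURN_VALUE(self):
        self.result = self.stack.pop()
        self.pc = len(self.instructions)            # jump past the end to stop the loop

answer = MiniInterpreter(debug=True).execute([("LOAD_CONST", 42), ("RETURN_VALUE", None)])
```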

Note that execute calls makeobj() defined above. If your makeobj function is not working correctly, you can still debug your execute function by importing the bytecode version of makeobj from hw5a.pyc:

from hw5a import makeobj

We define a few stack operations that are used in implementing the opcode methods.
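A sketch of such helpers, along the lines of byterun's top(), pop(), push(), and popn(). The class name StackOps is hypothetical; in your Interpreter these would simply be methods.

```python
class StackOps:
    """Stack helpers in the style of byterun's VirtualMachine."""

    def __init__(self):
        self.stack = []

    def top(self):
        """Return the top of the stack without removing it."""
        return self.stack[-1]

    def pop(self, i=0):
        """Remove and return the item i positions below the top (0 = top)."""
        return self.stack.pop(-1 - i)

    def push(self, *vals):
        """Push one or more values, leftmost first."""
        self.stack.extend(vals)

    def popn(self, n):
        """Remove the top n items and return them in push order (oldest first)."""
        if n:
            ret = self.stack[-n:]
            self.stack[-n:] = []
            return ret
        return []

s = StackOps()
s.push(1, 2, 3)
first_top = s.top()       # 3
popped = s.popn(2)        # [2, 3]
```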

We first implement the opcodes needed to execute function s1(). Write the following methods:

Problem 3 (10 points)

Define the following set of binary operators needed to execute the s3 set of functions.
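A sketch in the style of byterun: a table from operator name to a function in the operator module, plus a prefix check in dispatch. The opcode names BINARY_ADD, BINARY_SUBTRACT, and so on are those of Python 3.6 to 3.10; Python 3.11+ replaced them with a single BINARY_OP whose argument selects the operator, so adapt the dispatch to your version.

```python
import operator

# Keyed by the opcode name minus its "BINARY_" prefix.
BINARY_OPERATORS = {
    "ADD": operator.add,
    "SUBTRACT": operator.sub,
    "MULTIPLY": operator.mul,
    "TRUE_DIVIDE": operator.truediv,
    "FLOOR_DIVIDE": operator.floordiv,
    "MODULO": operator.mod,
    "POWER": operator.pow,
    "SUBSCR": operator.getitem,   # BINARY_SUBSCR also matches the prefix
}

class BinaryOps:
    """Hypothetical fragment of an Interpreter: a stack and binary dispatch."""

    def __init__(self):
        self.stack = []

    def binary_operator(self, op):
        right = self.stack.pop()   # the right operand was pushed last
        left = self.stack.pop()
        self.stack.append(BINARY_OPERATORS[op](left, right))

    def dispatch(self, opname, *args):
        if opname.startswith("BINARY_"):
            self.binary_operator(opname[len("BINARY_"):])
        else:
            getattr(self, opname)(*args)

vm = BinaryOps()
vm.stack = [7, 2]
vm.dispatch("BINARY_SUBTRACT")
print(vm.stack)  # [5]
```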

Problem 4 (10 points)

Define the set of comparison operators needed to execute the s4 set of functions.
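A sketch of COMPARE_OP, assuming your makeobj() stores the operator string (what dis.get_instructions() reports as argval). In Python 3.11 and earlier the raw argument is an index into dis.cmp_op; from 3.9 on, in and is are handled by separate CONTAINS_OP and IS_OP opcodes.

```python
import operator

COMPARE_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
}

class CompareOps:
    """Hypothetical fragment: COMPARE_OP receives the operator symbol."""

    def __init__(self):
        self.stack = []

    def COMPARE_OP(self, symbol):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(COMPARE_OPERATORS[symbol](left, right))

vm = CompareOps()
vm.stack = [1, 2]
vm.COMPARE_OP(">")     # the comparison in s4()
print(vm.stack)        # [False]
```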

Problem 5 (10 points)

Define the set of jump operators needed to execute the s5 set of functions. Some are given.
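One subtlety: dis reports a jump's argval as the byte offset of the target, but if your instructions live in a list, pc is a list index. The hypothetical JumpOps below keeps an offset-to-index map (which could be built from [i.offset for i in dis.get_instructions(f)]) and assumes, as in the execute loop, that pc has already been advanced before a handler runs.

```python
class JumpOps:
    """Hypothetical fragment: translate jump target offsets into list indexes."""

    def __init__(self, offsets):
        # Map each instruction's byte offset to its index in the instruction list.
        self.index_of = {offset: i for i, offset in enumerate(offsets)}
        self.stack = []
        self.pc = 0

    def jump(self, target_offset):
        self.pc = self.index_of[target_offset]

    def JUMP_FORWARD(self, target_offset):
        self.jump(target_offset)

    def POP_JUMP_IF_FALSE(self, target_offset):
        if not self.stack.pop():
            self.jump(target_offset)

    def POP_JUMP_IF_TRUE(self, target_offset):
        if self.stack.pop():
            self.jump(target_offset)

# Four two-byte instructions at offsets 0, 2, 4, 6.
vm = JumpOps([0, 2, 4, 6])
vm.stack = [0]                 # a false condition
vm.POP_JUMP_IF_FALSE(6)
print(vm.pc)                   # 3: the index of the instruction at offset 6

vm_true = JumpOps([0, 2, 4, 6])
vm_true.pc = 1
vm_true.stack = [True]         # a true condition: no jump
vm_true.POP_JUMP_IF_FALSE(6)
```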

Problem 6 (10 points)

Define the set of operators needed to build and index lists, tuples, and sets, and execute the s6 set of functions.
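A sketch of the container builders and indexing. Note that in Python 3.9 and later, a literal list of several constants such as [1, 2, 3] compiles to BUILD_LIST 0 followed by LIST_EXTEND with a constant tuple, so your s6 functions may need LIST_EXTEND as well. BuildOps is hypothetical.

```python
class BuildOps:
    """Hypothetical fragment covering BUILD_LIST/TUPLE/SET and BINARY_SUBSCR."""

    def __init__(self):
        self.stack = []

    def popn(self, n):
        """Pop the top n items, returned in the order they were pushed."""
        if n == 0:                 # guard: self.stack[-0:] would be the whole stack
            return []
        items = self.stack[-n:]
        del self.stack[-n:]
        return items

    def BUILD_LIST(self, count):
        self.stack.append(self.popn(count))

    def BUILD_TUPLE(self, count):
        self.stack.append(tuple(self.popn(count)))

    def BUILD_SET(self, count):
        self.stack.append(set(self.popn(count)))

    def BINARY_SUBSCR(self):
        index = self.stack.pop()
        container = self.stack.pop()
        self.stack.append(container[index])

vm = BuildOps()
vm.stack = ["a", "b", "c"]
vm.BUILD_TUPLE(3)
vm.stack.append(1)
vm.BINARY_SUBSCR()
print(vm.stack)   # ['b']

empty = BuildOps()
empty.BUILD_LIST(0)
```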

Problem 7 (10 points)

Define the set of operators needed to implement slicing and dictionaries, and execute the s7 set of functions.
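A sketch of BUILD_SLICE and BUILD_MAP. BUILD_MAP here assumes the key, value, key, value stack layout CPython has used since 3.5. Python 3.12 added a BINARY_SLICE opcode for two-part slices such as a[1:3], so depending on your version you may see that instead of BUILD_SLICE followed by BINARY_SUBSCR. SliceDictOps is hypothetical.

```python
class SliceDictOps:
    """Hypothetical fragment covering BUILD_SLICE, BUILD_MAP, and BINARY_SUBSCR."""

    def __init__(self):
        self.stack = []

    def popn(self, n):
        if n == 0:
            return []
        items = self.stack[-n:]
        del self.stack[-n:]
        return items

    def BUILD_SLICE(self, count):
        # count is 2 (start, stop) or 3 (start, stop, step).
        self.stack.append(slice(*self.popn(count)))

    def BUILD_MAP(self, count):
        # The stack holds key1, value1, key2, value2, ...
        items = self.popn(2 * count)
        self.stack.append(dict(zip(items[0::2], items[1::2])))

    def BINARY_SUBSCR(self):
        key = self.stack.pop()
        container = self.stack.pop()
        self.stack.append(container[key])

vm = SliceDictOps()
vm.stack = [[10, 20, 30, 40], 1, 3]
vm.BUILD_SLICE(2)
vm.BINARY_SUBSCR()
sliced = vm.stack.pop()
print(sliced)            # [20, 30]: slicing a list yields a list

vm.stack = ["a", 1, "b", 2]
vm.BUILD_MAP(2)
print(vm.stack)          # [{'a': 1, 'b': 2}]
```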

Problem 8 (10 points)

Define the set of operators needed to implement unary and in-place operations, and execute the s8 set of functions.
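A byterun-style sketch using prefix dispatch. The INPLACE_* opcodes exist in Python 3.6 to 3.10; from 3.11 on, in-place operations are encoded as BINARY_OP with an in-place argument. operator.iadd and friends mutate mutable objects in place, matching the semantics of +=. UnaryInplaceOps is hypothetical.

```python
import operator

UNARY = {
    "POSITIVE": operator.pos,
    "NEGATIVE": operator.neg,
    "NOT": operator.not_,
    "INVERT": operator.invert,
}

INPLACE = {
    "ADD": operator.iadd,
    "SUBTRACT": operator.isub,
    "MULTIPLY": operator.imul,
    "TRUE_DIVIDE": operator.itruediv,
    "FLOOR_DIVIDE": operator.ifloordiv,
    "MODULO": operator.imod,
    "POWER": operator.ipow,
}

class UnaryInplaceOps:
    def __init__(self):
        self.stack = []

    def dispatch(self, opname):
        if opname.startswith("UNARY_"):
            value = self.stack.pop()
            self.stack.append(UNARY[opname[len("UNARY_"):]](value))
        elif opname.startswith("INPLACE_"):
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(INPLACE[opname[len("INPLACE_"):]](left, right))
        else:
            getattr(self, opname)()

vm = UnaryInplaceOps()
vm.stack = [5]
vm.dispatch("UNARY_NEGATIVE")
negated = vm.stack.pop()          # -5

lst = [1]
vm.stack = [lst, [2]]
vm.dispatch("INPLACE_ADD")        # like lst += [2]
print(negated, vm.stack, vm.stack[0] is lst)  # -5 [[1, 2]] True
```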

Problem 9 (10 points)

Define the set of operators needed to implement the logical binary operators and execute the s9 set of functions.
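The handout does not spell out which operators are meant, so here are both readings. If it means the bitwise operators &, |, and ^ (BINARY_AND, BINARY_OR, and BINARY_XOR in Python 3.6 to 3.10), they fit the same table pattern as arithmetic. The keywords and and or, by contrast, have no opcode of their own: because they short-circuit, the compiler turns them into conditional jumps, which the jump operators from problem 5 handle.

```python
import dis
import operator

# Bitwise reading: keyed by the opcode name minus its "BINARY_" prefix.
LOGICAL_OPERATORS = {
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}

# Keyword reading: `and` compiles to conditional jumps, not a dedicated opcode.
def both(a, b):
    return a and b

jump_ops = [instr.opname for instr in dis.get_instructions(both) if "JUMP" in instr.opname]
print(LOGICAL_OPERATORS["XOR"](6, 3), jump_ops)
```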