Implementation of the Python Bytecode Compiler
Jeremy Hylton, Google

What to expect from this talk
• Intended for developers
• Explains key data structures and control flow
• Lots of code on slides

The New Bytecode Compiler
• Rewrote the compiler from scratch for Python 2.5
  – Emphasizes modularity
  – Work was almost done for Python 2.4
  – Still uses the original parser, pgen
• Traditional compiler abstractions
  – Abstract Syntax Tree (AST)
  – Basic blocks
• Goals
  – Ease maintenance and extensibility
  – Expose the AST to Python programs

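Compiler Architecture
Source Text -> Tokenizer -> Tokens -> Parser -> Parse Tree -> AST Converter -> AST -> Code Generator -> Blocks -> Assembler -> bytecode -> Peephole Optimizer -> bytecode
• The code generator also consults the __future__ analysis and the Symbol Table, both computed from the AST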

Compiler Organization (lines of code)
    compile.c                  4,200
      infrastructure             700
      code generator           2,400
      assembler                  500
      peephole optimizer         600
    asdl.c, .h                  <100
    pyarena.c                    100
    future.c                     100
    ast.c                      3,000
    symtable.c                 1,400
    Python-ast.c, .h           1,900  (generated)
    Total                     10,800

Tokenize, Parse, AST
• Simple, hand-coded tokenizer
  – Synthesizes INDENT and DEDENT tokens
• pgen: parser generator
  – Input in Grammar/Grammar
  – Extended LL(1) grammar
• AST conversion
  – Collapses the parse tree into abstract form
  – Future: extend pgen to generate the AST directly
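INDENT and DEDENT tokens come from comparing each logical line's indentation with a stack of enclosing indentation columns. The sketch below makes several simplifying assumptions. Each line's column has already been measured, with tabs expanded and blank or comment-only lines skipped. The function name `indent_tokens` is hypothetical. The real tokenizer also deals with tabs, continuation lines and error reporting.

```c
#include <assert.h>
#include <stddef.h>

enum tok { TOK_INDENT, TOK_DEDENT };

#define MAX_DEPTH 100

/* Hypothetical helper: given the indentation column of each logical line,
   write the INDENT/DEDENT tokens emitted before each line, plus the DEDENTs
   that close open blocks at end of input. Returns the number of tokens
   written, or -1 when a dedent lands on a column that was never opened. */
int indent_tokens(const int *cols, size_t nlines, enum tok *out, size_t outcap)
{
    int stack[MAX_DEPTH] = {0};   /* stack[0] == 0 is the top-level column */
    int depth = 0;
    size_t n = 0;

    for (size_t i = 0; i < nlines; i++) {
        if (cols[i] > stack[depth]) {
            /* deeper than the current block: push, emit one INDENT */
            assert(depth + 1 < MAX_DEPTH && n < outcap);
            stack[++depth] = cols[i];
            out[n++] = TOK_INDENT;
        } else {
            /* shallower: pop (one DEDENT each) until a level matches */
            while (cols[i] < stack[depth]) {
                assert(n < outcap);
                depth--;
                out[n++] = TOK_DEDENT;
            }
            if (cols[i] != stack[depth])
                return -1;        /* inconsistent dedent */
        }
    }
    while (depth > 0) {           /* close blocks still open at EOF */
        assert(n < outcap);
        depth--;
        out[n++] = TOK_DEDENT;
    }
    return (int)n;
}
```

Consider the example program, whose lines start at columns 0, 0, 4, 8, 4, 8. It yields INDENT, INDENT, DEDENT, INDENT, DEDENT, DEDENT. A single stack explains why a dedent must return to some column that was opened earlier.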

Grammar vs. Abstract Syntax
Grammar:
    compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | …
    if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
    for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
    test: and_test ('or' and_test)* | lambdef
    and_test: not_test ('and' not_test)*
    not_test: 'not' not_test | comparison
    comparison: expr (comp_op expr)*
    comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
Abstract syntax (ASDL):
    stmt = For(expr target, expr iter, stmt* body, stmt* orelse)
         | If(expr test, stmt* body, stmt* orelse)
         | …
    expr = BinOp(expr left, operator op, expr right)
         | Compare(expr left, cmpop* ops, expr* comparators)
         | Call(expr func, expr* args, keyword* keywords, expr? starargs, expr? kwargs)
         | …

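AST node types
• Modules (mod)
• Statements (stmt)
• Expressions (expr)
  – Expressions allowed on the LHS of an assignment have a context slot (Load, Store, …)
• Extras
  – Slices, comprehension, excepthandler, arguments
  – Operator types
• FunctionDef is complex
  – Children live in two namespaces: the name, decorators and default values belong to the enclosing scope, while the arguments and body belong to the function's own scope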
Example Code
    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)

Concrete Syntax Example
    (if_stmt, (1, 'if'), (test, (and_test, (not_test, (comparison, (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (1, 'x'))))), (comp_op, (21, '>')), (expr, (xor_expr, (and_expr, (shift_expr, (arith_expr, (term, (factor, (power, (atom, (2, '5'))))))), (11, ':'), …
Abstract Syntax Example
    For(Name('x', Store), Call(Name('range', Load), [Num(10)]),
        [If(Compare(Name('x', Load), [Gt], [Num(5)]),
            [Call(Attribute(Name('L', Load), Name('append', Load)),
                  [BinOp(Name('x', Load), Mult, Num(2))])],
            [Call(Attribute(Name('L', Load), Name('append', Load)),
                  [BinOp(Name('x', Load), Add, Num(2))])])])

Our Goal: Bytecode
      2           0 BUILD_LIST               0
                  3 STORE_FAST               1 (L)

      3           6 SETUP_LOOP              71 (to 80)
                  9 LOAD_GLOBAL              1 (range)
                 12 LOAD_CONST               1 (10)
                 15 CALL_FUNCTION            1
                 18 GET_ITER
            >>   19 FOR_ITER                57 (to 79)
                 22 STORE_FAST               0 (x)

      4          25 LOAD_FAST                0 (x)
                 28 LOAD_CONST               2 (5)
                 31 COMPARE_OP               4 (>)
                 34 JUMP_IF_FALSE           21 (to 58)
                 37 POP_TOP

      5          38 LOAD_FAST                1 (L)
                 41 LOAD_ATTR                3 (append)
                 44 LOAD_FAST                0 (x)
                 47 LOAD_CONST               3 (2)
                 50 BINARY_MULTIPLY
                 51 CALL_FUNCTION            1
                 54 POP_TOP
                 55 JUMP_ABSOLUTE           19
            >>   58 POP_TOP

      7          59 LOAD_FAST                1 (L)
                 62 LOAD_ATTR                3 (append)
                 65 LOAD_FAST                0 (x)
                 68 LOAD_CONST               3 (2)
                 71 BINARY_ADD
                 72 CALL_FUNCTION            1
                 75 POP_TOP
                 76 JUMP_ABSOLUTE           19
            >>   79 POP_BLOCK
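The "(to N)" annotations on relative jumps follow from the instruction encoding used here. An instruction with an argument occupies 3 bytes (opcode plus a 16-bit argument) and one without occupies 1 byte. A relative jump counts from the instruction after the jump. The helpers below are hypothetical and ignore EXTENDED_ARG. They reproduce the numbers in the listing.

```c
#include <assert.h>

/* Size of one instruction in the 2.x format (EXTENDED_ARG ignored):
   1 opcode byte, plus 2 argument bytes when the opcode takes an argument. */
static int instr_size(int has_arg)
{
    return has_arg ? 3 : 1;
}

/* Hypothetical helper: argument for a relative jump located at offset `at`
   that should land on absolute offset `target`. */
int rel_jump_arg(int at, int target)
{
    int next = at + instr_size(1);   /* jumps always carry an argument */
    assert(target >= next);          /* relative jumps only go forward */
    return target - next;
}

/* Inverse: the "(to N)" value shown by the disassembler. */
int rel_jump_target(int at, int arg)
{
    return at + instr_size(1) + arg;
}
```

For example, `JUMP_IF_FALSE` at offset 34 with argument 21 lands on 58. `FOR_ITER` at 19 with argument 57 lands on 79. Absolute jumps such as `JUMP_ABSOLUTE 19` store the target offset directly and need no arithmetic.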

Strategy for Compilation
• Module-wide analysis
  – Check future statements
  – Build symbol table
    • For each variable: is it local, global, or free?
    • Makes two passes over the block structure
• Compile one function at a time
  – Generate basic blocks
  – Assemble bytecode
  – Optimize generated code (out of order)
  – Code object stored in the parent's constant pool

Symbol Table
• Collect basic facts about symbols and blocks
  – Variables assigned and used; parameters; global statements
  – Check for import *, unqualified exec, yield
  – Other tricky details
• Identify free and cell variables in a second pass
  – Parent passes bound names down
  – Child passes free variables up
  – Implicit vs. explicit global variables
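Here is a toy model of the second pass for one parent and one child scope. A name the child uses without binding is looked up among the parent's bindings. If the parent binds it, the name becomes FREE in the child and CELL in the parent. Otherwise it stays an implicit global. This is a minimal sketch: `struct scope`, `add_name` and `analyze_child` are hypothetical. It ignores explicit `global`, class scopes and deeper nesting, all of which the real symbol table handles.

```c
#include <assert.h>
#include <string.h>

#define MAXNAMES 16

enum kind { LOCAL, CELL, FREE, GLOBAL_IMPLICIT };

/* Hypothetical per-block record: names seen, whether bound, final kind. */
struct scope {
    const char *names[MAXNAMES];
    int bound[MAXNAMES];          /* 1 if assigned or a parameter here */
    enum kind kind[MAXNAMES];
    int n;
};

static int find(const struct scope *s, const char *name)
{
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

/* First pass: record a use (bound == 0) or a binding (bound == 1). */
void add_name(struct scope *s, const char *name, int bound)
{
    int i = find(s, name);
    if (i < 0) {
        assert(s->n < MAXNAMES);
        i = s->n++;
        s->names[i] = name;
        s->bound[i] = 0;
    }
    if (bound)
        s->bound[i] = 1;
    s->kind[i] = s->bound[i] ? LOCAL : GLOBAL_IMPLICIT;
}

/* Second pass: parent's bound names flow down, child's free names flow up. */
void analyze_child(struct scope *parent, struct scope *child)
{
    for (int i = 0; i < child->n; i++) {
        if (child->bound[i])
            continue;                     /* local to the child */
        int j = find(parent, child->names[i]);
        if (j >= 0 && parent->bound[j]) {
            child->kind[i] = FREE;        /* read through a cell */
            parent->kind[j] = CELL;       /* parent allocates the cell */
        }
    }
}
```

For `make_adder`, `x` is bound in the outer function and used in `adder`. The outer `x` becomes CELL and the inner `x` becomes FREE. A name like `len` that neither scope binds stays an implicit global.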

Name operations
• Five different load-name opcodes
  – LOAD_FAST: array access for function locals
  – LOAD_GLOBAL: dict lookups for globals, then builtins
  – LOAD_NAME: dict lookups for locals, then globals, then builtins
  – LOAD_DEREF: load a free or cell variable through its cell
  – LOAD_CLOSURE: load cells to build a closure
• Cells
  – Separate allocation for each mutable variable shared with a nested scope
  – Stored in a flat closure list
  – Separately garbage collected
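Choosing among these opcodes depends on the symbol table's classification of the name and the kind of block being compiled. The sketch below is a simplified assumption of that decision. The enums and `choose_load` are hypothetical, and the real compiler has more cases. "Unoptimized" stands for a function containing `import *` or a bare `exec`.

```c
#include <assert.h>

enum scope_kind { SC_LOCAL, SC_GLOBAL_EXPLICIT, SC_GLOBAL_IMPLICIT,
                  SC_FREE, SC_CELL };
enum block_kind { MODULE_BLOCK, CLASS_BLOCK, FUNCTION_BLOCK };
enum load_op { OP_LOAD_FAST, OP_LOAD_GLOBAL, OP_LOAD_NAME, OP_LOAD_DEREF };

/* Hypothetical: pick the load opcode for a name (LOAD_CLOSURE is not
   chosen here; it is only used when building a closure). */
enum load_op choose_load(enum scope_kind sc, enum block_kind blk,
                         int unoptimized)
{
    assert(blk == FUNCTION_BLOCK || !unoptimized);  /* only functions */
    switch (sc) {
    case SC_FREE:
    case SC_CELL:
        return OP_LOAD_DEREF;        /* goes through a cell object */
    case SC_LOCAL:
        if (blk == FUNCTION_BLOCK)
            return OP_LOAD_FAST;     /* index into the locals array */
        break;
    case SC_GLOBAL_IMPLICIT:
        if (blk == FUNCTION_BLOCK && !unoptimized)
            return OP_LOAD_GLOBAL;   /* globals dict, then builtins */
        break;
    case SC_GLOBAL_EXPLICIT:
        return OP_LOAD_GLOBAL;
    }
    return OP_LOAD_NAME;             /* locals, globals, builtins dicts */
}
```

This explains the class example that follows. `id` is assigned in the class body, so it is local to a class block. The sketch therefore selects LOAD_NAME rather than LOAD_FAST.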

Class namespaces
    class Spam:
        id = id(1)

      1           0 LOAD_GLOBAL              0 (__name__)
                  3 STORE_NAME               1 (__module__)
      2           6 LOAD_NAME                2 (id)
                  9 LOAD_CONST               1 (1)
                 12 CALL_FUNCTION            1
                 15 STORE_NAME               2 (id)
                 18 LOAD_LOCALS
                 19 RETURN_VALUE
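LOAD_NAME is what makes `id = id(1)` work. At run time it searches the class's locals dict, then the globals, then the builtins. Before the STORE_NAME executes, the lookup falls through to the builtin `id`. The toy namespaces below use integer values as stand-ins for objects. `struct ns`, `ns_set` and `load_name` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy namespace: parallel arrays of keys and int values. */
struct ns {
    const char *keys[8];
    int vals[8];
    int n;
};

const int *ns_get(const struct ns *d, const char *key)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->keys[i], key) == 0)
            return &d->vals[i];
    return NULL;
}

void ns_set(struct ns *d, const char *key, int val)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->keys[i], key) == 0) {
            d->vals[i] = val;
            return;
        }
    assert(d->n < 8);
    d->keys[d->n] = key;
    d->vals[d->n++] = val;
}

/* LOAD_NAME-style lookup chain: locals, then globals, then builtins.
   NULL is where the interpreter would raise NameError. */
const int *load_name(const struct ns *locals, const struct ns *globals,
                     const struct ns *builtins, const char *name)
{
    const int *v;
    if ((v = ns_get(locals, name)) != NULL)
        return v;
    if ((v = ns_get(globals, name)) != NULL)
        return v;
    return ns_get(builtins, name);
}
```

In the Python source, the first lookup of `id` finds the builtin. After the class body stores `id`, later lookups in that body see the class attribute.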

Closures
    def make_adder(n):
        x = n
        def adder(y):
            return x + y
        return adder

    def make_adder(n):
      2           0 LOAD_FAST                0 (n)
                  3 STORE_DEREF              0 (x)
      3           6 LOAD_CLOSURE             0 (x)
                  9 LOAD_CONST               1 (<code>)
                 12 MAKE_CLOSURE             0
                 15 STORE_FAST               2 (adder)
      5          18 LOAD_FAST                2 (adder)
                 21 RETURN_VALUE

    def adder(y):
      4           0 LOAD_DEREF               0 (x)
                  3 LOAD_FAST                0 (y)
                  6 BINARY_ADD
                  7 RETURN_VALUE
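A cell is a separately allocated box. `make_adder` writes `x` into the box with STORE_DEREF, and MAKE_CLOSURE stores a pointer to the same box in `adder`. As a result, `adder` sees later assignments to `x` even after `make_adder` returns. The C model below is only an analogy. `struct cell`, `struct closure`, `make_adder` and `call` are hypothetical. Reference counting and garbage collection are replaced by manual `free`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cell: heap box for one variable shared between scopes. */
struct cell { int value; };

/* Hypothetical closure: code plus its flat list of cells. */
struct closure {
    int (*code)(struct cell **cells, int arg);
    struct cell *cells[1];            /* adder has one free variable: x */
};

/* adder(y): LOAD_DEREF 0 (x); LOAD_FAST 0 (y); BINARY_ADD */
static int adder_code(struct cell **cells, int y)
{
    return cells[0]->value + y;
}

/* make_adder(n): STORE_DEREF x, then LOAD_CLOSURE x + MAKE_CLOSURE.
   x_out exposes the cell so a caller can mimic make_adder rebinding x. */
struct closure *make_adder(int n, struct cell **x_out)
{
    struct cell *x = malloc(sizeof *x);
    struct closure *adder = malloc(sizeof *adder);
    if (x == NULL || adder == NULL) {
        free(x);
        free(adder);
        return NULL;
    }
    x->value = n;                     /* STORE_DEREF 0 (x) */
    adder->code = adder_code;
    adder->cells[0] = x;              /* share the cell, not a copy */
    if (x_out != NULL)
        *x_out = x;
    return adder;
}

int call(struct closure *f, int arg)
{
    assert(f != NULL);
    return f->code(f->cells, arg);
}
```

Changing `x->value` after the closure is built changes what the closure returns. A copied value would not behave that way, which is why shared variables get their own allocation.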

Code generation input
• Discriminated unions
  – One for each AST type
  – Struct for each option
  – Constructor functions
• Literals
  – Stored as PyObject*
  – The AST pass parses them
• Identifiers
  – Also PyObject* (a string object)

    typedef struct _stmt *stmt_ty;

    struct _stmt {
        enum { ..., For_kind=8, While_kind=9, If_kind=10, ... } kind;
        union {
            struct {
                expr_ty target;
                expr_ty iter;
                asdl_seq *body;
                asdl_seq *orelse;
            } For;
            struct {
                expr_ty test;
                asdl_seq *body;
                asdl_seq *orelse;
            } If;
            /* ... */
        } v;
        int lineno;
    };
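The same pattern works for expressions: a kind tag, a union of per-kind structs, and constructors that set the tag and fields together. A visitor then switches on the tag. This is a simplified sketch for a two-kind subset. The `Num`/`BinOp` shapes mirror the ASDL on the earlier slide, but the C details are assumptions. Numbers are stored as `long` instead of `PyObject*`, nodes come from `malloc` rather than an arena, and nothing is freed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _expr *expr_ty;

typedef enum { Add, Mult } operator_ty;      /* simplified operator set */

struct _expr {
    enum { BinOp_kind = 1, Num_kind = 2 } kind;  /* which member is live */
    union {
        struct { expr_ty left; operator_ty op; expr_ty right; } BinOp;
        struct { long n; } Num;              /* sketch: long, not PyObject* */
    } v;
    int lineno;
};

/* Constructor: allocate a node and set its tag and fields together. */
expr_ty Num(long n, int lineno)
{
    expr_ty e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->kind = Num_kind;
    e->v.Num.n = n;
    e->lineno = lineno;
    return e;
}

expr_ty BinOp(expr_ty left, operator_ty op, expr_ty right, int lineno)
{
    if (left == NULL || right == NULL)
        return NULL;                         /* required fields missing */
    expr_ty e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->kind = BinOp_kind;
    e->v.BinOp.left = left;
    e->v.BinOp.op = op;
    e->v.BinOp.right = right;
    e->lineno = lineno;
    return e;
}

/* A visitor switches on kind; this one evaluates constant expressions. */
long eval(expr_ty e)
{
    switch (e->kind) {
    case Num_kind:
        return e->v.Num.n;
    case BinOp_kind: {
        long l = eval(e->v.BinOp.left), r = eval(e->v.BinOp.right);
        return e->v.BinOp.op == Add ? l + r : l * r;
    }
    }
    assert(!"unknown kind");
    return 0;
}
```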

Code generation output
• Basic blocks
  – Start with a jump target
  – End if there is a jump
  – A function is a graph of blocks
• Instructions
  – Opcode + argument
  – Jump targets are pointers
• Helper functions
  – Create new blocks
  – Add an instruction to the current block

    struct instr {
        unsigned char i_opcode;
        int i_oparg;
        struct basicblock_ *i_target;
        int i_lineno;
        // plus some one-bit flags
    };

    struct basicblock_ {
        int b_iused;
        int b_ialloc;
        struct instr *b_instr;
        struct basicblock_ *b_next;
        int b_startdepth;
        int b_offset;
        // several details elided
    };
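The two helper functions can be sketched with trimmed versions of these structs. A block owns a growable instruction array. A jump stores a pointer to its target block, not an offset, and offsets are resolved later by the assembler. `new_block`, `add_instr` and `free_block` are hypothetical names with simplified signatures. They return 0 or NULL on failure, following the slides' convention of returning immediately on error.

```c
#include <assert.h>
#include <stdlib.h>

struct basicblock_;

struct instr {
    unsigned char i_opcode;
    int i_oparg;
    struct basicblock_ *i_target;   /* jump target block, or NULL */
    int i_lineno;
};

typedef struct basicblock_ {
    int b_iused;                    /* instructions in use */
    int b_ialloc;                   /* instructions allocated */
    struct instr *b_instr;
    struct basicblock_ *b_next;     /* next block in emission order */
} basicblock;

/* Hypothetical: allocate an empty block; NULL on failure. */
basicblock *new_block(void)
{
    return calloc(1, sizeof(basicblock));
}

/* Hypothetical: append one instruction, doubling the array when full.
   Returns 1 on success, 0 on allocation failure. */
int add_instr(basicblock *b, unsigned char op, int arg,
              basicblock *target, int lineno)
{
    assert(b != NULL);
    if (b->b_iused == b->b_ialloc) {
        int newsize = b->b_ialloc ? 2 * b->b_ialloc : 16;
        struct instr *p = realloc(b->b_instr, newsize * sizeof *p);
        if (p == NULL)
            return 0;
        b->b_instr = p;
        b->b_ialloc = newsize;
    }
    struct instr *i = &b->b_instr[b->b_iused++];
    i->i_opcode = op;
    i->i_oparg = arg;
    i->i_target = target;
    i->i_lineno = lineno;
    return 1;
}

void free_block(basicblock *b)
{
    free(b->b_instr);
    free(b);
}
```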

Code generation
• One visitor function for each AST type
  – Switch on the kind enum
  – Emit bytecodes
  – Return immediately on error
• Heavy use of C macros
  – ADDOP(), ADDOP_JREL(), …
  – VISIT(), VISIT_SEQ(), …
  – The macros hide control flow: each can return from the calling function
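The hidden control flow looks like this. Each macro performs one step and, on failure, executes `return 0` in the function that *uses* the macro. That is how "return immediately on error" propagates without visible `if` statements. The toy `struct compiler`, `emit` and the macro bodies are assumptions for illustration. They mimic the calling convention, not the real definitions.

```c
#include <assert.h>

/* Hypothetical toy compiler: a fixed-size opcode buffer. */
struct compiler {
    int ops[8];
    int n;
};

static int emit(struct compiler *c, int op)
{
    if (c->n == 8)
        return 0;                    /* out of space: report failure */
    c->ops[c->n++] = op;
    return 1;
}

/* Each macro hides a `return 0` that exits the *calling* function. */
#define ADDOP(C, OP) \
    do { if (!emit((C), (OP))) return 0; } while (0)

#define VISIT(C, FUNC, ARG) \
    do { if (!FUNC((C), (ARG))) return 0; } while (0)

enum { OP_LOAD = 1, OP_ADD = 2 };

static int visit_load(struct compiler *c, int unused)
{
    (void)unused;
    ADDOP(c, OP_LOAD);
    return 1;
}

/* Emits LOAD, then (LOAD, ADD) for each further term. */
int compile_sum(struct compiler *c, int nterms)
{
    assert(nterms >= 1);
    VISIT(c, visit_load, 0);
    for (int k = 1; k < nterms; k++) {
        VISIT(c, visit_load, 0);
        ADDOP(c, OP_ADD);
    }
    return 1;
}
```

With 10 terms, the buffer fills after 8 opcodes. The failing `ADDOP` returns 0 straight out of `compile_sum`, even though its body contains no explicit error check.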

Code generation example
    static int
    compiler_if(struct compiler *c, stmt_ty s)
    {
        basicblock *end, *next;

        if (!(end = compiler_new_block(c)))
            return 0;
        if (!(next = compiler_new_block(c)))
            return 0;
        …
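The fragment stops after allocating the `end` and `next` blocks. The toy emitter below sketches the block layout that an `if`/`else` plausibly needs, assuming 2.x semantics. In 2.x, `JUMP_IF_FALSE` leaves the test value on the stack, so each branch begins with `POP_TOP`; this matches offsets 37 and 58 in the bytecode listing. It is not the actual remainder of `compiler_if`. `struct out`, `emit` and `gen_if` are hypothetical, and label lines mark where each new basic block starts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy output: one text line per instruction or block label. */
struct out { char buf[512]; };

static void emit(struct out *o, const char *line)
{
    assert(strlen(o->buf) + strlen(line) + 2 <= sizeof o->buf);
    strcat(o->buf, line);
    strcat(o->buf, "\n");
}

/* Layout for: if <test>: <body> else: <orelse>  (orelse may be NULL). */
void gen_if(struct out *o, const char *test, const char *body,
            const char *orelse)
{
    emit(o, test);                      /* visit the test expression */
    emit(o, "JUMP_IF_FALSE ->next");    /* test value stays on stack */
    emit(o, "POP_TOP");                 /* true branch discards it */
    emit(o, body);                      /* visit the body statements */
    emit(o, "JUMP_FORWARD ->end");
    emit(o, "next:");                   /* start of block `next` */
    emit(o, "POP_TOP");                 /* false branch discards it */
    if (orelse != NULL)
        emit(o, orelse);                /* visit the else statements */
    emit(o, "end:");                    /* start of block `end` */
}
```

In the listing, the jump after the true branch appears as `JUMP_ABSOLUTE 19` rather than a forward jump. That is likely the peephole optimizer retargeting a jump whose destination is itself the loop's jump back to `FOR_ITER`.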

Assembler
• Lots of fiddly details
  – Linearize code
  – Compute stack space needed
  – Compute line number table (lnotab)
  – Compute jump offsets
  – Call PyCode_New()
• Peephole optimizer
  – Integrated at the wrong end of the assembler
  – Constant folding, jump simplification
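In Python 2.x, the line number table is a string of unsigned byte pairs: a bytecode-offset increment followed by a line-number increment, both relative to the previous entry and starting from the code object's first line. An increment larger than 255 is split across several pairs. The encoder below is a minimal sketch under those assumptions. `encode_lnotab` and `struct line_start` are hypothetical; the real assembler builds the table incrementally while emitting instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input: the offset at which each new source line begins.
   Offsets and line numbers must be nondecreasing (unsigned increments). */
struct line_start { int offset; int lineno; };

size_t encode_lnotab(int firstlineno, const struct line_start *starts,
                     size_t n, unsigned char *out, size_t cap)
{
    int last_off = 0, last_line = firstlineno;
    size_t len = 0;

    for (size_t i = 0; i < n; i++) {
        int d_byte = starts[i].offset - last_off;
        int d_line = starts[i].lineno - last_line;
        assert(d_byte >= 0 && d_line >= 0);
        if (d_line == 0)
            continue;                   /* same line: nothing to record */
        while (d_byte > 255) {          /* split large bytecode steps */
            assert(len + 2 <= cap);
            out[len++] = 255;
            out[len++] = 0;
            d_byte -= 255;
        }
        while (d_line > 255) {          /* split large line steps */
            assert(len + 2 <= cap);
            out[len++] = (unsigned char)d_byte;
            out[len++] = 255;
            d_byte = 0;
            d_line -= 255;
        }
        assert(len + 2 <= cap);
        out[len++] = (unsigned char)d_byte;
        out[len++] = (unsigned char)d_line;
        last_off = starts[i].offset;
        last_line = starts[i].lineno;
    }
    return len;
}
```

Take the bytecode listing, and assume the enclosing `def` is on line 1. Lines 2, 3, 4, 5 and 7 begin at offsets 0, 6, 25, 38 and 59, which encode as the pairs (0,1) (6,1) (19,1) (13,1) (21,2).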

AST transformation
• Expose the AST to Python programmers
  – Simplify analysis of programs
  – Generate code from a modified AST
• Example:
  – Implement the with statement as an AST transform
• Ongoing work
  – BOF this afternoon at 3:15, Preston Trail
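The talk's example of an AST transform is the `with` statement. A smaller illustration of the "rewrite the tree, then generate code" idea is folding a BinOp whose operands are both constants into a single Num. This transform was swapped in for brevity and is not the `with` rewrite. `struct node` and its constructors are hypothetical. The fold is a bottom-up, in-place rewrite of the kind a code generator could then consume unchanged.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal AST: a number, a name, or a binary operation. */
enum kind { NUM, NAME, BINOP };
enum op { ADD, MULT };

struct node {
    enum kind kind;
    long num;                    /* NUM */
    const char *id;              /* NAME */
    enum op op;                  /* BINOP */
    struct node *left, *right;   /* BINOP */
};

static struct node *mk(enum kind k)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();                 /* sketch: no error recovery */
    n->kind = k;
    return n;
}

struct node *num(long v) { struct node *n = mk(NUM); n->num = v; return n; }
struct node *name(const char *id) { struct node *n = mk(NAME); n->id = id; return n; }
struct node *binop(struct node *l, enum op op, struct node *r)
{
    struct node *n = mk(BINOP);
    n->left = l;
    n->op = op;
    n->right = r;
    return n;
}

/* Fold children first, then turn this BinOp into a Num if both operands
   are now constants. Rewrites in place and returns the same node. */
struct node *fold(struct node *n)
{
    if (n->kind != BINOP)
        return n;
    assert(n->left != NULL && n->right != NULL);
    fold(n->left);
    fold(n->right);
    if (n->left->kind == NUM && n->right->kind == NUM) {
        long v = n->op == ADD ? n->left->num + n->right->num
                              : n->left->num * n->right->num;
        free(n->left);
        free(n->right);
        n->kind = NUM;
        n->num = v;
        n->left = n->right = NULL;
    }
    return n;
}
```

For `x * (2 + 3)`, only the inner sum folds, leaving `x * 5`. For `4 * (2 + 3)`, the whole tree collapses to `20`.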

Loose ends
• compiler package
  – Should be revised to support the new AST types
  – Tricky compatibility issue
• Revise pgen to generate the AST directly
• Develop a toolkit for AST transforms
• Extend analysis, e.g. PEP 267