11. Working with Bytecode

Roadmap
> The Pharo compiler
> Introduction to Pharo bytecode
> Generating bytecode with IRBuilder
> Parsing and Interpreting bytecode

Original material by Marcus Denker
© Oscar Nierstrasz

The Pharo Compiler
> Default compiler
  - very old design
  - quite hard to understand
  - impossible to modify and extend
> New compiler for Pharo
  - http://www.iam.unibe.ch/~scg/Research/NewCompiler/
  - adds support for true block closures (optional)

The Pharo Compiler
> Fully reified compilation process:
  - Scanner/Parser (built with SmaCC)
    - builds AST (from Refactoring Browser)
  - Semantic Analysis: ASTChecker
    - annotates the AST (e.g., var bindings)
  - Translation to IR: ASTTranslator
    - uses IRBuilder to build IR (Intermediate Representation)
  - Bytecode generation: IRTranslator
    - uses BytecodeBuilder to emit bytecodes

Compiler: Overview
    code -> Scanner / Parser -> AST -> Semantic Analysis -> AST -> Code Generation -> Bytecode
Code generation in detail
    AST -> Build IR (ASTTranslator, IRBuilder) -> IR -> Bytecode Generation (IRTranslator, BytecodeBuilder) -> Bytecode

Compiler: Syntax
> SmaCC: Smalltalk Compiler Compiler
  - Similar to Lex/Yacc
  - SmaCC can build LALR(1) or LR(1) parsers
> Input:
  - Scanner definition: regular expressions
  - Parser: BNF-like grammar
  - Code that builds AST as annotation
> Output:
  - class for Scanner (subclass of SmaCCScanner)
  - class for Parser (subclass of SmaCCParser)

Scanner
[Figure: Scanner]

Parser
[Figure: Parser]

Calling Parser code
[Figure: Calling Parser code]

Compiler: AST
> AST: Abstract Syntax Tree
  - Encodes the Syntax as a Tree
  - No semantics yet!
  - Uses the RB Tree:
    - Visitors
    - Backward pointers in ParseNodes
    - Transformation (replace/add/delete)
    - Pattern-directed TreeRewriter
    - PrettyPrinter

    RBProgramNode
        RBDoItNode
        RBMethodNode
        RBReturnNode
        RBSequenceNode
        RBValueNode
            RBArrayNode
            RBAssignmentNode
            RBBlockNode
            RBCascadeNode
            RBLiteralNode
            RBMessageNode
            RBOptimizedNode
            RBVariableNode

Compiler: Semantics
> We need to analyse the AST
  - Names need to be linked to the variables according to the scoping rules
> ASTChecker implemented as a Visitor
  - Subclass of RBProgramNodeVisitor
  - Visits the nodes
  - Grows and shrinks scope chain
  - Methods/Blocks are linked with the scope
  - Variable definitions and references are linked with objects describing the variables

A Simple Tree
    RBParser parseExpression: '3+4'
NB: explore it
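
One way to explore the tree from a workspace is sketched below. It assumes the RBParser from the AST package is loaded; `isMessage`, `selector`, `receiver` and `arguments` are the usual message-node accessors, and the concrete node class names may differ between Pharo versions.

```smalltalk
| tree |
tree := RBParser parseExpression: '3+4'.
tree isMessage.              "true: the root is a message-send node"
tree selector.               "#+"
tree receiver value.         "3, the literal receiver"
tree arguments first value.  "4, the literal argument"
```

Sending `inspect` or `explore` to `tree` shows the same structure interactively.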

A Simple Visitor
    RBProgramNodeVisitor new visitNode: tree
Does nothing except walk through the tree

TestVisitor
    RBProgramNodeVisitor subclass: #TestVisitor
        instanceVariableNames: 'literals'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Compiler-AST-Visitors'

    TestVisitor>>acceptLiteralNode: aLiteralNode
        literals add: aLiteralNode value.

    TestVisitor>>initialize
        literals := Set new.

    TestVisitor>>literals
        ^literals

    tree := RBParser parseExpression: '3 + 4'.
    (TestVisitor new visitNode: tree) literals
        "a Set(3 4)"

Compiler: Intermediate Representation
> IR: Intermediate Representation
  - Semantics like bytecode, but more abstract
  - Independent of the bytecode set
  - IR is a tree
  - IR nodes allow easy transformation
  - Decompilation to RB AST
> IR is built from AST using ASTTranslator:
  - AST Visitor
  - Uses IRBuilder

Compiler: Bytecode Generation
> IR needs to be converted to Bytecode
  - IRTranslator: Visitor for IR tree
  - Uses BytecodeBuilder to generate Bytecode
  - Builds a CompiledMethod
  - Details to follow in the next section

    testReturn1
        | iRMethod aCompiledMethod |
        iRMethod := IRBuilder new
            numRargs: 1;
            addTemps: #(self);    "receiver and args declarations"
            pushLiteral: 1;
            returnTop;
            ir.
        aCompiledMethod := iRMethod compiledMethod.
        self should: [(aCompiledMethod valueWithReceiver: nil arguments: #()) = 1].

Roadmap: Introduction to Pharo bytecode

Reasons for working with Bytecode
> Generating Bytecode
  - Implementing compilers for other languages
  - Experimentation with new language features
> Parsing and Interpretation:
  - Analysis (e.g., self and super sends)
  - Decompilation (for systems without source)
  - Printing of bytecode
  - Interpretation: Debugger, Profiler

The Pharo Virtual Machine
> Virtual machine provides a virtual processor
  - Bytecode: The "machine-code" of the virtual machine
> Smalltalk (like Java): Stack machine
  - easy to implement interpreters for different processors
  - most hardware processors are register machines
> Pharo VM: Implemented in Slang
  - Slang: Subset of Smalltalk ("C with Smalltalk Syntax")
  - Translated to C

Bytecode in the CompiledMethod
> CompiledMethod format:

    Header      Number of temps, literals...
    Literals    Array of all Literal Objects
    Bytecode
    Trailer     Pointer to Source

    (Number>>#asInteger) inspect
    (Number methodDict at: #asInteger) inspect
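
Besides opening an inspector, the parts of this format can be queried directly. The sketch below uses accessors that CompiledMethod provides in Squeak-derived images; the values answered depend on the image and bytecode set, so none are shown.

```smalltalk
| method |
method := Number>>#asInteger.
method numArgs.       "number of arguments, decoded from the header"
method numTemps.      "number of temporaries, including arguments"
method numLiterals.   "size of the literal frame"
method literals.      "the literal objects themselves"
method initialPC.     "index of the first bytecode"
method endPC.         "index of the last bytecode, before the trailer"
```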

Bytecodes: Single or multibyte
> Different forms of bytecodes:
  - Single bytecodes:
    - Example: 112: push self
  - Groups of similar bytecodes
    - 16: push temp 1
    - 17: push temp 2
    - up to 31
  - Multibytecodes
    - Type | Offset (4 bits)
    - Problem: 4 bit offset may be too small
    - Solution: Use the following byte as offset
    - Example: Jumps need to encode large jump offsets

Example: Number>>asInteger
> Smalltalk code:

    Number>>asInteger
        "Answer an Integer nearest the receiver toward zero."
        ^self truncated

> Symbolic Bytecode

    9 <70> self
    10 <D0> send: truncated
    11 <7C> returnTop
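
A listing like this can usually be obtained from a workspace. The one-liner below assumes `CompiledMethod>>symbolic` is available, as it is in Squeak-derived images; the pcs and byte values printed depend on the bytecode set of the image and may not match the listing above.

```smalltalk
"Answer the symbolic bytecode of the method as a String."
(Number>>#asInteger) symbolic
```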

Example: Step by Step
> 9 <70> self
  - The receiver (self) is pushed on the stack
> 10 <D0> send: truncated
  - Bytecode 208: send literal selector 1
  - Get the selector from the first literal
  - start message lookup in the class of the object that is on top of the stack
  - result is pushed on the stack
> 11 <7C> returnTop
  - return the object on top of the stack to the calling method
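
The three steps can be mimicked by hand with an explicit stack. This is only an illustration of the stack discipline, not how the VM is implemented; the OrderedCollection stands in for the context's stack and `perform:` stands in for message lookup.

```smalltalk
| stack receiver selector result |
receiver := 3.7.
stack := OrderedCollection new.
stack addLast: receiver.                              "<70> self"
selector := #truncated.                               "the literal selector used by <D0>"
stack addLast: (stack removeLast perform: selector).  "<D0> send: truncated"
result := stack removeLast.                           "<7C> returnTop"
result   "3"
```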

Pharo Bytecode
> 256 Bytecodes, four groups:
  - Stack Bytecodes
    - Stack manipulation: push / pop / dup
  - Send Bytecodes
    - Invoke Methods
  - Return Bytecodes
    - Return to caller
  - Jump Bytecodes
    - Control flow inside a method

Stack Bytecodes
> Push values on the stack
  - e.g., temps, instVars, literals
  - e.g., 16 - 31: push temporary variable
> Push Constants
  - False/True/Nil/1/0/2/-1
> Push self, thisContext
> Duplicate top of stack
> Pop

Sends and Returns
> Sends: receiver is on top of stack
  - Normal send
  - Super Sends
  - Hard-coded sends for efficiency, e.g., +
> Returns
  - Return top of stack to the sender
  - Return from a block
  - Special bytecodes for return self, nil, true, false (for efficiency)

Jump Bytecodes
> Control Flow inside one method
  - Used to implement control-flow efficiently
  - Example:

    ^ 1<2 ifTrue: ['true']

    9 <76> pushConstant: 1
    10 <77> pushConstant: 2
    11 <B2> send: <
    12 <99> jumpFalse: 15
    13 <20> pushConstant: 'true'
    14 <90> jumpTo: 16
    15 <73> pushConstant: nil
    16 <7C> returnTop
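
The jump targets in this listing can be recomputed from the byte values. The sketch assumes the classic Squeak short-jump encoding, in which the low three bits of bytes 144-151 (jump) and 152-159 (jump if false) hold a distance of 1 to 8, counted from the bytecode that follows the jump. The block name `shortJumpTarget` is made up for this example.

```smalltalk
| shortJumpTarget |
"Target pc = pc of the following bytecode + encoded distance (1 to 8)."
shortJumpTarget := [:pc :byte | (pc + 1) + ((byte bitAnd: 7) + 1)].
shortJumpTarget value: 12 value: 16r99.   "15, as in jumpFalse: 15"
shortJumpTarget value: 14 value: 16r90.   "16, as in jumpTo: 16"
```

Longer distances do not fit into three bits, which is why the multibyte jump forms take the offset from the following byte.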

Closures
    counterBlock
        | count |
        count := 0.
        ^[ count := count + 1].

Closures
> Break the dependency between the block activation and its enclosing contexts for accessing locals
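
The practical effect can be checked in a workspace. In this sketch `makeCounter` is a made-up name that plays the role of the counterBlock method: the inner block keeps updating `count` after the activation that declared it has returned.

```smalltalk
| makeCounter counter |
makeCounter := [ | count | count := 0. [ count := count + 1 ] ].
counter := makeCounter value.   "the outer activation has now finished"
counter value.   "1"
counter value.   "2"
```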

Contexts
    inject: thisValue into: binaryBlock
        | nextValue |
        nextValue := thisValue.
        self do: [:each |
            nextValue := binaryBlock value: nextValue value: each].
        ^nextValue

Contexts
    inject: thisValue into: binaryBlock
        | indirectTemps |
        indirectTemps := Array new: 1.
        indirectTemps at: 1 put: thisValue.    "was nextValue := thisValue."
        self do: [:each |
            indirectTemps
                at: 1
                put: (binaryBlock value: (indirectTemps at: 1) value: each)].
        ^indirectTemps at: 1

Contexts
    inject: thisValue into: binaryBlock
        | indirectTemps |
        indirectTemps := Array new: 1.
        indirectTemps at: 1 put: thisValue.
        self do: (thisContext
            closureCopy: [:each |
                | binaryBlockCopy indirectTempsCopy |
                indirectTempsCopy
                    at: 1
                    put: (binaryBlockCopy value: (indirectTempsCopy at: 1) value: each)]
            copiedValues: (Array with: binaryBlock with: indirectTemps)).
        ^indirectTemps at: 1

Closure Bytecode
> 138 Push (Array new: k)/Pop k into: (Array new: j)
> 140 Push Temp At k In Temp Vector At: j
> 141 Store Temp At k In Temp Vector At: j
> 142 Pop and Store Temp At k In Temp Vector At: j
> 143 Push Closure Num Copied l Num Args k BlockSize j

Roadmap: Generating bytecode with IRBuilder

Generating Bytecode
> IRBuilder: A tool for generating bytecode
  - Part of the NewCompiler
  - Pharo: Install packages AST, NewParser, NewCompiler
> Like an Assembler for Pharo

IRBuilder: Simple Example
> Number>>asInteger

    iRMethod := IRBuilder new
        numRargs: 1;            "receiver"
        addTemps: #(self);      "receiver and args"
        pushTemp: #self;
        send: #truncated;
        returnTop;
        ir.

    aCompiledMethod := iRMethod compiledMethod.
    aCompiledMethod valueWithReceiver: 3.5 arguments: #()
        "3"

IRBuilder: Stack Manipulation
> popTop
  - remove the top of stack
> pushDup
  - push top of stack on the stack
> pushLiteral:
> pushReceiver
  - push self
> pushThisContext

IRBuilder: Symbolic Jumps
> Jump targets are resolved:
> Example: false ifTrue: ['true'] ifFalse: ['false']

    iRMethod := IRBuilder new
        numRargs: 1;
        addTemps: #(self);              "receiver"
        pushLiteral: false;
        jumpAheadTo: #false if: false;
        pushLiteral: 'true';            "ifTrue: ['true']"
        jumpAheadTo: #end;
        jumpAheadTarget: #false;
        pushLiteral: 'false';           "ifFalse: ['false']"
        jumpAheadTarget: #end;
        returnTop;
        ir.

IRBuilder: Instance Variables
> Access by offset
> Read: pushInstVar:
  - receiver on top of stack
> Write: storeInstVar:
  - value on stack
> Example: set the first instance variable to 2

    iRMethod := IRBuilder new
        numRargs: 1;
        addTemps: #(self);      "receiver and args"
        pushLiteral: 2;
        storeInstVar: 1;
        pushTemp: #self;
        returnTop;
        ir.

    aCompiledMethod := iRMethod compiledMethod.
    aCompiledMethod valueWithReceiver: 1@2 arguments: #()
        "2@2"

IRBuilder: Temporary Variables
> Accessed by name
> Define with addTemp: / addTemps:
> Read with pushTemp:
> Write with storeTemp:
> Example:
  - set variables a and b, return value of a

    iRMethod := IRBuilder new
        numRargs: 1;
        addTemps: #(self);      "receiver"
        addTemps: #(a b);
        pushLiteral: 1;
        storeTemp: #a;
        pushLiteral: 2;
        storeTemp: #b;
        pushTemp: #a;
        returnTop;
        ir.

IRBuilder: Sends
> normal send

    builder pushLiteral: 'hello'.
    builder send: #size.

> super send

    …
    builder send: #selector toSuperOf: aClass.

  - The second parameter specifies the class where the lookup starts.

Roadmap: Parsing and Interpreting bytecode

Parsing and Interpretation
> First step: Parse bytecode
  - enough for easy analysis, pretty printing, decompilation
> Second step: Interpretation
  - needed for simulation, complex analysis (e.g., profiling)
> Pharo provides frameworks for both:
  - InstructionStream/InstructionClient (parsing)
  - ContextPart (Interpretation)

The InstructionStream Hierarchy
[Class diagram: InstructionStream, ContextPart, BlockContext, MethodContext, Decompiler, InstructionPrinter, InstVarRefLocator, BytecodeDecompiler]

InstructionStream
> Parses the byte-encoded instructions
> State:
  - pc: program counter
  - sender: the method (bad name!)

    Object subclass: #InstructionStream
        instanceVariableNames: 'sender pc'
        classVariableNames: 'SpecialConstants'
        poolDictionaries: ''
        category: 'Kernel-Methods'

Usage
> Generate an instance:

    instrStream := InstructionStream on: aMethod

> Now we can step through the bytecode with:

    instrStream interpretNextInstructionFor: client

> Calls methods on a client object for the type of bytecode, e.g.
  - pushReceiver
  - pushConstant: value
  - pushReceiverVariable: offset

InstructionClient
> Abstract superclass
  - Defines empty methods for all methods that InstructionStream calls on a client
> For convenience:
  - Clients don't need to inherit from this class

    Object subclass: #InstructionClient
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Methods'
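
As an illustration of the client protocol, the following sketch collects the selectors sent by a method. The class `SendCollector` and its category are hypothetical, and the code assumes that sends are reported through the callback `send:super:numArgs:`, as in Squeak-derived versions of InstructionClient; check the protocol in your image before relying on it.

```smalltalk
InstructionClient subclass: #SendCollector
    instanceVariableNames: 'selectors'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Bytecode-Examples'

SendCollector>>initialize
    selectors := OrderedCollection new

SendCollector>>selectors
    ^selectors

SendCollector>>send: selector super: supered numArgs: numArgs
    "Assumed callback: invoked by InstructionStream for every send bytecode."
    selectors add: selector

"Workspace usage"
| method client scanner |
method := Number>>#asInteger.
client := SendCollector new.
scanner := InstructionStream on: method.
[scanner pc <= method endPC]
    whileTrue: [scanner interpretNextInstructionFor: client].
client selectors
```

All other callbacks are inherited as empty methods, so the subclass only overrides the one event it cares about. Checking the `supered` flag in the same callback would separate super sends from normal sends.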

Example: A test
    InstructionClientTest>>testInstructions
        "just interpret all methods of Object"
        | methods client scanner |
        methods := Object methodDict values.
        client := InstructionClient new.
        methods do: [:method |
            scanner := InstructionStream on: method.
            [scanner pc <= method endPC] whileTrue: [
                self
                    shouldnt: [scanner interpretNextInstructionFor: client]
                    raise: Error]].

Example: Printing Bytecode
> InstructionPrinter:
  - Print the bytecodes as human readable text
> Example:
  - print the bytecode of Number>>asInteger:

    String streamContents: [:str |
        (InstructionPrinter on: Number>>#asInteger)
            printInstructionsOn: str]

    '9 <70> self
    10 <D0> send: truncated
    11 <7C> returnTop
    '

InstructionPrinter
> Class Definition:

    InstructionClient subclass: #InstructionPrinter
        instanceVariableNames: 'method scanner stream indent'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Methods'

InstructionPrinter
> Main Loop:

    InstructionPrinter>>printInstructionsOn: aStream
        "Append to the stream, aStream, a description of each
        bytecode in the instruction stream."
        | end |
        stream := aStream.
        scanner := InstructionStream on: method.
        end := method endPC.
        [scanner pc <= end]
            whileTrue: [scanner interpretNextInstructionFor: self]

InstructionPrinter
> Overrides methods from InstructionClient to print the bytecodes as text
> e.g., the method for pushReceiver

    InstructionPrinter>>pushReceiver
        "Print the Push Active Context's Receiver on Top Of Stack bytecode."
        self print: 'self'

Example: InstVarRefLocator
    InstructionClient subclass: #InstVarRefLocator
        instanceVariableNames: 'bingo'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Methods'

    InstVarRefLocator>>interpretNextInstructionUsing: aScanner
        bingo := false.
        aScanner interpretNextInstructionFor: self.
        ^bingo

    InstVarRefLocator>>popIntoReceiverVariable: offset
        bingo := true

    InstVarRefLocator>>pushReceiverVariable: offset
        bingo := true

    InstVarRefLocator>>storeIntoReceiverVariable: offset
        bingo := true

InstVarRefLocator
> Analyse a method, answer true if it references an instance variable

    CompiledMethod>>hasInstVarRef
        "Answer whether the receiver references an instance variable."
        | scanner end printer |
        scanner := InstructionStream on: self.
        printer := InstVarRefLocator new.
        end := self endPC.
        [scanner pc <= end] whileTrue: [
            (printer interpretNextInstructionUsing: scanner)
                ifTrue: [^true]].
        ^false

InstVarRefLocator
> Example for a simple bytecode analyzer
> Usage:

    aMethod hasInstVarRef

> (has reference to variable testSelector)

    (TestCase>>#debug) hasInstVarRef
        "true"

> (has no reference to a variable)

    (Integer>>#+) hasInstVarRef
        "false"

ContextPart: Semantics for Execution
> Sometimes we need more than parsing
  - "stepping" in the debugger
  - system simulation for profiling

    InstructionStream subclass: #ContextPart
        instanceVariableNames: 'stackp'
        classVariableNames: 'PrimitiveFailToken QuickStep'
        poolDictionaries: ''
        category: 'Kernel-Methods'

Simulation
> Provides a complete Bytecode interpreter
> Run a block with the simulator:

    (ContextPart runSimulated: [3 factorial])
        "6"

Profiling: MessageTally
> Usage:

    MessageTally tallySends: [3 factorial]

    This simulation took 0.0 seconds.
    **Tree**
    1 SmallInteger(Integer)>>factorial

> Other example:

    MessageTally tallySends: ['3' + 1]

What you should know!
> What are the problems of the old compiler?
> How is the new Pharo compiler organized?
> What does the Pharo semantic analyzer add to the parser-generated AST?
> What is the format of the intermediate representation?
> What kind of virtual machine does the Pharo bytecode address?
> How can you inspect the bytecode of a particular method?

Can you answer these questions?
> What different groups of bytecode are supported?
> Why is the SmaCC grammar only BNF-"like"?
> How can you find out what all the bytecodes are?
> What is the purpose of IRBuilder?
> Why do we not generate bytecode directly?
> What is the responsibility of class InstructionStream?
> How would you implement a statement coverage analyzer?

License
http://creativecommons.org/licenses/by-sa/3.0/
Attribution-ShareAlike 3.0 Unported

You are free:
  - to Share: to copy, distribute and transmit the work
  - to Remix: to adapt the work

Under the following conditions:
  - Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
  - Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.