ST — Working with Bytecode
11. Working with Bytecode
© Oscar Nierstrasz
Roadmap
> The Squeak compiler
> Introduction to Squeak bytecode
> Generating bytecode with IRBuilder
> Parsing and Interpreting bytecode
Original material by Marcus Denker
The Squeak Compiler
> Default compiler
    — very old design
    — quite hard to understand
    — impossible to modify and extend
> New compiler for Squeak 3.9
    — http://www.iam.unibe.ch/~scg/Research/NewCompiler/
    — adds support for true block closures (optional)
The Squeak Compiler
> Fully reified compilation process:
    — Scanner/Parser (built with SmaCC)
        – builds AST (from Refactoring Browser)
    — Semantic Analysis: ASTChecker
        – annotates the AST (e.g., var bindings)
    — Translation to IR: ASTTranslator
        – uses IRBuilder to build IR (Intermediate Representation)
    — Bytecode generation: IRTranslator
        – uses BytecodeBuilder to emit bytecodes
Compiler: Overview
    code -> Scanner / Parser -> AST -> Semantic Analysis -> AST -> Code Generation -> Bytecode
Code generation in detail
    AST -> Build IR (ASTTranslator, IRBuilder) -> IR -> Bytecode Generation (IRTranslator, BytecodeBuilder) -> Bytecode
Compiler: Syntax
> SmaCC: Smalltalk Compiler-Compiler
    — Similar to Lex/Yacc
    — SmaCC can build LALR(1) or LR(1) parsers
> Input:
    — Scanner definition: regular expressions
    — Parser: BNF-like grammar
    — Code that builds the AST, as annotations
> Output:
    — class for Scanner (subclass of SmaCCScanner)
    — class for Parser (subclass of SmaCCParser)
Scanner [figure]
Parser [figure]
Calling Parser code [figure]
Compiler: AST
> AST: Abstract Syntax Tree
    — Encodes the Syntax as a Tree
    — No semantics yet!
    — Uses the RB Tree:
        – Visitors
        – Backward pointers in ParseNodes
        – Transformation (replace/add/delete)
        – Pattern-directed TreeRewriter
        – PrettyPrinter
> Node classes:
    RBProgramNode
        RBDoItNode
        RBMethodNode
        RBReturnNode
        RBSequenceNode
        RBValueNode
            RBArrayNode
            RBAssignmentNode
            RBBlockNode
            RBCascadeNode
            RBLiteralNode
            RBMessageNode
            RBOptimizedNode
            RBVariableNode
Compiler: Semantics
> We need to analyse the AST
    — Names need to be linked to the variables according to the scoping rules
> ASTChecker implemented as a Visitor
    — Subclass of RBProgramNodeVisitor
    — Visits the nodes
    — Grows and shrinks scope chain
    — Methods/Blocks are linked with the scope
    — Variable definitions and references are linked with objects describing the variables
A Simple Tree
    RBParser parseExpression: '3+4'
NB: explore it
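Besides using the explorer, the parse tree can be queried directly in a workspace. This is a minimal sketch, assuming the Refactoring Browser AST package is loaded and that the nodes answer the usual accessors (receiver, selector, arguments, value); the variable name tree is only for illustration.

```smalltalk
| tree |
tree := RBParser parseExpression: '3 + 4'.
tree class.                    "RBMessageNode"
tree selector.                 "#+"
tree receiver value.           "3"
tree arguments first value.    "4"
```

Evaluate each line with "print it" to see the value noted in the comment.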
A Simple Visitor
    RBProgramNodeVisitor new visitNode: tree
Does nothing except walk through the tree
TestVisitor
    RBProgramNodeVisitor subclass: #TestVisitor
        instanceVariableNames: 'literals'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Compiler-AST-Visitors'

    TestVisitor>>acceptLiteralNode: aLiteralNode
        literals add: aLiteralNode value.

    TestVisitor>>initialize
        literals := Set new.

    TestVisitor>>literals
        ^literals

    tree := RBParser parseExpression: '3 + 4'.
    (TestVisitor new visitNode: tree) literals    "a Set(3 4)"
Compiler: Intermediate Representation
> IR: Intermediate Representation
    — Semantics like bytecode, but more abstract
    — Independent of the bytecode set
    — IR is a tree
    — IR nodes allow easy transformation
    — Decompilation to RB AST
> IR is built from AST using ASTTranslator:
    — AST Visitor
    — Uses IRBuilder
Compiler: Bytecode Generation
> IR needs to be converted to Bytecode
    — IRTranslator: Visitor for IR tree
    — Uses BytecodeBuilder to generate Bytecode
    — Builds a compiledMethod
    — Details to follow next section

    testReturn1
        | iRMethod aCompiledMethod |
        iRMethod := IRBuilder new
            numRargs: 1;
            addTemps: #(self);    "receiver and args declarations"
            pushLiteral: 1;
            returnTop;
            ir.
        aCompiledMethod := iRMethod compiledMethod.
        self should: [(aCompiledMethod valueWithReceiver: nil arguments: #()) = 1].
Roadmap
> The Squeak compiler
> Introduction to Squeak bytecode
> Generating bytecode with IRBuilder
> Parsing and Interpreting bytecode
Reasons for working with Bytecode
> Generating Bytecode
    — Implementing compilers for other languages
    — Experimentation with new language features
> Parsing and Interpretation:
    — Analysis (e.g., self and super sends)
    — Decompilation (for systems without source)
    — Printing of bytecode
    — Interpretation: Debugger, Profiler
The Squeak Virtual Machine
> Virtual machine provides a virtual processor
    — Bytecode: The "machine-code" of the virtual machine
> Smalltalk (like Java): Stack machine
    — easy to implement interpreters for different processors
    — most hardware processors are register machines
> Squeak VM: Implemented in Slang
    — Slang: Subset of Smalltalk ("C with Smalltalk Syntax")
    — Translated to C
Bytecode in the CompiledMethod
> CompiledMethod format:
    Header      Number of temps, literals...
    Literals    Array of all Literal Objects
    Bytecode
    Trailer     Pointer to Source

    (Number>>#asInteger) inspect
    (Number methodDict at: #asInteger) inspect
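The parts of the format can also be read back by sending messages to the method itself instead of opening an inspector. This is a small workspace sketch using standard CompiledMethod accessors; the concrete values depend on the image, so none are shown.

```smalltalk
| method |
method := Number >> #asInteger.
method numTemps.      "number of temporaries, from the header"
method numLiterals.   "size of the literal frame, from the header"
method literals.      "the literal objects, including the selector #truncated"
method initialPC.     "index of the first bytecode"
method endPC.         "index of the last bytecode, before the trailer"
method symbolic.      "the bytecodes printed as text"
```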
Bytecodes: Single or multibyte
> Different forms of bytecodes:
    — Single bytecodes:
        – Example: 112: push self
    — Groups of similar bytecodes
        – 16: push temp 1
        – 17: push temp 2
        – up to 31
    — Multibytecodes
        – Layout: Type, Offset (4 bits)
        – Problem: 4 bit offset may be too small
        – Solution: Use the following byte as offset
        – Example: Jumps need to encode large jump offsets
Example: Number>>asInteger
> Smalltalk code:
    Number>>asInteger
        "Answer an Integer nearest the receiver toward zero."
        ^self truncated
> Symbolic Bytecode
    9 <70> self
    10 <D0> send: truncated
    11 <7C> returnTop
Example: Step by Step
> 9 <70> self
    — The receiver (self) is pushed on the stack
> 10 <D0> send: truncated
    — Bytecode 208: send literal selector 1
    — Get the selector from the first literal
    — Start message lookup in the class of the object that is on top of the stack
    — Result is pushed on the stack
> 11 <7C> returnTop
    — Return the object on top of the stack to the calling method
Squeak Bytecode
> 256 Bytecodes, four groups:
    — Stack Bytecodes
        – Stack manipulation: push / pop / dup
    — Send Bytecodes
        – Invoke Methods
    — Return Bytecodes
        – Return to caller
    — Jump Bytecodes
        – Control flow inside a method
Stack Bytecodes
> Push values on the stack
    — e.g., temps, instVars, literals
    — e.g., 0 - 15: push instance variable
> Push Constants
    — False/True/Nil/1/0/2/-1
> Push self, thisContext
> Duplicate top of stack
> Pop
Sends and Returns
> Sends: receiver is on top of stack
    — Normal send
    — Super Sends
    — Hard-coded sends for efficiency, e.g. +
> Returns
    — Return top of stack to the sender
    — Return from a block
    — Special bytecodes for return self, nil, true, false (for efficiency)
Jump Bytecodes
> Control Flow inside one method
    — Used to implement control-flow efficiently
    — Example: ^ 1<2 ifTrue: ['true']

    9 <76> pushConstant: 1
    10 <77> pushConstant: 2
    11 <B2> send: <
    12 <99> jumpFalse: 15
    13 <20> pushConstant: 'true'
    14 <90> jumpTo: 16
    15 <73> pushConstant: nil
    16 <7C> returnTop
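The jump targets in the listing can be recomputed from the raw bytes. In the classic Squeak bytecode set, bytes 144 to 151 are short unconditional jumps and 152 to 159 are short jump-if-false instructions; the low three bits hold the distance minus one, counted from the bytecode that follows the jump. The block name shortJumpTarget is hypothetical and only serves this worked example.

```smalltalk
| shortJumpTarget |
"Answer the target pc of a one-byte jump stored at pc."
shortJumpTarget := [:pc :byte | pc + 1 + (byte bitAnd: 7) + 1].
shortJumpTarget value: 12 value: 16r99.   "15, the jumpFalse: in the listing"
shortJumpTarget value: 14 value: 16r90.   "16, the jumpTo: in the listing"
```

Longer distances do not fit in three bits, which is why the extended jump bytecodes take their offset from the following byte.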
Roadmap
> The Squeak compiler
> Introduction to Squeak bytecode
> Generating bytecode with IRBuilder
> Parsing and Interpreting bytecode
Generating Bytecode
> IRBuilder: A tool for generating bytecode
    — Part of the NewCompiler
    — Squeak 3.9: Install packages AST, NewParser, NewCompiler
> Like an Assembler for Squeak
IRBuilder: Simple Example
> Number>>asInteger
    iRMethod := IRBuilder new
        numRargs: 1;    "receiver"
        addTemps: #(self);    "receiver and args"
        pushTemp: #self;
        send: #truncated;
        returnTop;
        ir.
    aCompiledMethod := iRMethod compiledMethod.
    aCompiledMethod valueWithReceiver: 3.5 arguments: #()    "3"
IRBuilder: Stack Manipulation
> popTop
    — remove the top of stack
> pushDup
    — push top of stack on the stack
> pushLiteral:
> pushReceiver
    — push self
> pushThisContext
IRBuilder: Symbolic Jumps
> Jump targets are resolved:
> Example: false ifTrue: ['true'] ifFalse: ['false']
    iRMethod := IRBuilder new
        numRargs: 1;
        addTemps: #(self);    "receiver"
        pushLiteral: false;
        jumpAheadTo: #false if: false;
        pushLiteral: 'true';    "ifTrue: ['true']"
        jumpAheadTo: #end;
        jumpAheadTarget: #false;
        pushLiteral: 'false';    "ifFalse: ['false']"
        jumpAheadTarget: #end;
        returnTop;
        ir.
IRBuilder: Instance Variables
> Access by offset
> Read: pushInstVar:
    — receiver on top of stack
> Write: storeInstVar:
    — value on stack
> Example: set the first instance variable to 2
    iRMethod := IRBuilder new
        numRargs: 1;
        addTemps: #(self);    "receiver and args"
        pushLiteral: 2;
        storeInstVar: 1;
        pushTemp: #self;
        returnTop;
        ir.
    aCompiledMethod := iRMethod compiledMethod.
    aCompiledMethod valueWithReceiver: 1@2 arguments: #()    "2@2"
IRBuilder: Temporary Variables
> Accessed by name
> Define with addTemp: / addTemps:
> Read with pushTemp:
> Write with storeTemp:
> Example:
    — set variables a and b, return value of a
    iRMethod := IRBuilder new
        numRargs: 1;
        addTemps: #(self);    "receiver"
        addTemps: #(a b);
        pushLiteral: 1;
        storeTemp: #a;
        pushLiteral: 2;
        storeTemp: #b;
        pushTemp: #a;
        returnTop;
        ir.
IRBuilder: Sends
> normal send
    builder pushLiteral: 'hello'.
    builder send: #size.
> super send
    …
    builder send: #selector toSuperOf: aClass.
    — The second parameter specifies the class where the lookup starts.
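A normal send can be wrapped into a complete method in the same way as the earlier IRBuilder examples. This is a sketch that uses only builder messages already shown in this section and assumes the NewCompiler packages are installed; it should answer the size of the literal string.

```smalltalk
| iRMethod aCompiledMethod |
iRMethod := IRBuilder new
    numRargs: 1;
    addTemps: #(self);        "receiver"
    pushLiteral: 'hello';     "receiver of the send"
    send: #size;              "normal send, no arguments"
    returnTop;
    ir.
aCompiledMethod := iRMethod compiledMethod.
aCompiledMethod valueWithReceiver: nil arguments: #()   "5"
```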
Roadmap
> The Squeak compiler
> Introduction to Squeak bytecode
> Generating bytecode with IRBuilder
> Parsing and Interpreting bytecode
Parsing and Interpretation
> First step: Parse bytecode
    — enough for easy analysis, pretty printing, decompilation
> Second step: Interpretation
    — needed for simulation, complex analysis (e.g., profiling)
> Squeak provides frameworks for both:
    — InstructionStream/InstructionClient (parsing)
    — ContextPart (Interpretation)
The InstructionStream Hierarchy
Classes in the diagram: InstructionStream, ContextPart (with BlockContext and MethodContext), Decompiler, InstructionPrinter, InstVarRefLocator, BytecodeDecompiler
InstructionStream
> Parses the byte-encoded instructions
> State:
    — pc: program counter
    — sender: the method (bad name!)

    Object subclass: #InstructionStream
        instanceVariableNames: 'sender pc'
        classVariableNames: 'SpecialConstants'
        poolDictionaries: ''
        category: 'Kernel-Methods'
Usage
> Generate an instance:
    instrStream := InstructionStream on: aMethod
> Now we can step through the bytecode with:
    instrStream interpretNextInstructionFor: client
> Calls methods on a client object for the type of bytecode, e.g.
    — pushReceiver
    — pushConstant: value
    — pushReceiverVariable: offset
InstructionClient
> Abstract superclass
    — Defines empty methods for all methods that InstructionStream calls on a client
> For convenience:
    — Clients don't need to inherit from this class

    Object subclass: #InstructionClient
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Methods'
Example: A test
    InstructionClientTest>>testInstructions
        "just interpret all the methods of Object"
        | methods client scanner |
        methods := Object methodDict values.
        client := InstructionClient new.
        methods do: [:method |
            scanner := InstructionStream on: method.
            [scanner pc <= method endPC] whileTrue: [
                self
                    shouldnt: [scanner interpretNextInstructionFor: client]
                    raise: Error]].
Example: Printing Bytecode
> InstructionPrinter:
    — Print the bytecodes as human readable text
> Example:
    — print the bytecode of Number>>asInteger:
    String streamContents: [:str |
        (InstructionPrinter on: Number>>#asInteger)
            printInstructionsOn: str]

    '9 <70> self
    10 <D0> send: truncated
    11 <7C> returnTop
    '
InstructionPrinter
> Class Definition:
    InstructionClient subclass: #InstructionPrinter
        instanceVariableNames: 'method scanner stream indent'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Methods'
InstructionPrinter
> Main Loop:
    InstructionPrinter>>printInstructionsOn: aStream
        "Append to the stream, aStream, a description of each bytecode in the instruction stream."
        | end |
        stream := aStream.
        scanner := InstructionStream on: method.
        end := method endPC.
        [scanner pc <= end]
            whileTrue: [scanner interpretNextInstructionFor: self]
InstructionPrinter
> Overrides methods from InstructionClient to print the bytecodes as text
> e.g. the method for pushReceiver
    InstructionPrinter>>pushReceiver
        "Print the Push Active Context's Receiver on Top Of Stack bytecode."
        self print: 'self'
Example: InstVarRefLocator
    InstructionClient subclass: #InstVarRefLocator
        instanceVariableNames: 'bingo'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Kernel-Methods'

    InstVarRefLocator>>interpretNextInstructionUsing: aScanner
        bingo := false.
        aScanner interpretNextInstructionFor: self.
        ^bingo

    InstVarRefLocator>>popIntoReceiverVariable: offset
        bingo := true

    InstVarRefLocator>>pushReceiverVariable: offset
        bingo := true

    InstVarRefLocator>>storeIntoReceiverVariable: offset
        bingo := true
InstVarRefLocator
> Analyse a method, answer true if it references an instance variable
    CompiledMethod>>hasInstVarRef
        "Answer whether the receiver references an instance variable."
        | scanner end printer |
        scanner := InstructionStream on: self.
        printer := InstVarRefLocator new.
        end := self endPC.
        [scanner pc <= end] whileTrue: [
            (printer interpretNextInstructionUsing: scanner)
                ifTrue: [^true]].
        ^false
InstVarRefLocator
> Example for a simple bytecode analyzer
> Usage:
    aMethod hasInstVarRef
> (has reference to variable testSelector)
    (TestCase>>#debug) hasInstVarRef    "true"
> (has no reference to a variable)
    (Integer>>#+) hasInstVarRef    "false"
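The same pattern extends to other analyses by overriding a different callback. Below is a sketch of a client that collects the selectors a method sends; the class SentSelectorCollector and its category are hypothetical, and it relies on InstructionStream reporting every send through send:super:numArgs:.

```smalltalk
InstructionClient subclass: #SentSelectorCollector
    instanceVariableNames: 'selectors'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Examples-Bytecode'

SentSelectorCollector>>selectors
    "Lazily create the result set."
    ^selectors ifNil: [selectors := Set new]

SentSelectorCollector>>send: selector super: supered numArgs: numberArguments
    "Called by the InstructionStream for each send bytecode."
    self selectors add: selector

"Workspace usage"
| method scanner client |
method := Number >> #asInteger.
scanner := InstructionStream on: method.
client := SentSelectorCollector new.
[scanner pc <= method endPC]
    whileTrue: [scanner interpretNextInstructionFor: client].
client selectors    "a Set(#truncated)"
```

Only the send callback is overridden; every other bytecode falls through to the empty methods inherited from InstructionClient.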
ContextPart: Semantics for Execution
> Sometimes we need more than parsing
    — "stepping" in the debugger
    — system simulation for profiling

    InstructionStream subclass: #ContextPart
        instanceVariableNames: 'stackp'
        classVariableNames: 'PrimitiveFailToken QuickStep'
        poolDictionaries: ''
        category: 'Kernel-Methods'
Simulation
> Provides a complete Bytecode interpreter
> Run a block with the simulator:
    (ContextPart runSimulated: [3 factorial])    "6"
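The simulator can also hand the current context to a block before each simulated instruction, which is the hook a profiler or coverage tool would build on. This is a sketch assuming the image provides ContextPart class>>runSimulated:contextAtEachStep:; the variable names are illustrative.

```smalltalk
| steps result |
steps := 0.
result := ContextPart
    runSimulated: [3 factorial]
    contextAtEachStep: [:context | steps := steps + 1].
result.   "6"
steps.    "number of simulated steps; depends on the image"
```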
Profiling: MessageTally
> Usage:
    MessageTally tallySends: [3 factorial]

    This simulation took 0.0 seconds.
    **Tree**
    1 SmallInteger(Integer)>>factorial
> Other example:
    MessageTally tallySends: ['3' + 1]
What you should know!
    What are the problems of the old compiler?
    How is the new Squeak compiler organized?
    What does the Squeak semantic analyzer add to the parser-generated AST?
    What is the format of the intermediate representation?
    What kind of virtual machine does the Squeak bytecode address?
    How can you inspect the bytecode of a particular method?
Can you answer these questions?
    What different groups of bytecode are supported?
    Why is the SmaCC grammar only BNF-"like"?
    How can you find out what all the bytecodes are?
    What is the purpose of IRBuilder?
    Why do we not generate bytecode directly?
    What is the responsibility of class InstructionStream?
    How would you implement a statement coverage analyzer?
License
> http://creativecommons.org/licenses/by-sa/3.0/

Attribution-ShareAlike 3.0 Unported

You are free:
    to Share: to copy, distribute and transmit the work
    to Remix: to adapt the work

Under the following conditions:
    Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
    Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.