Scheme in Python

Overview
· We’ll look at how to implement a simple Scheme interpreter in Python
· This is based on the Scheme interpreter we studied before
· We’ll look at pyscheme 1.6, which was implemented by Danny Yoo as an undergraduate at Berkeley
· Since Python doesn’t optimize tail recursion, he uses trampolining, which we’ll introduce

What we need to do
· A representation for Scheme’s native data structures
  • Pairs (aka cons cells), symbols, strings, numbers, Booleans
· A reader that converts a stream of characters into a stream of s-expressions
  • We’ll introduce an intervening step that reads characters and converts them to tokens
· Implementations of the various built-ins
  • e.g., cons, car, +, …

What we won’t need to do
· We can rely on Python for a number of very useful things
  • Representing numbers and strings
  • Garbage collection
  • Low-level I/O

Atoms
· Atoms include strings, numbers, and symbols
· We’ll use Python’s native representation for strings and numbers
· Symbols in Scheme are interned: there is a unique object for each symbol read
· This is how they differ from strings, which are not interned
  • Note: some Lisp implementations intern small integers

Symbols
    # A global dictionary that contains all known symbols
    __INTERNED_SYMBOLS = {}
    class __Symbol(str):
        """A symbol is just a special kind of string"""
        def __eq__(self, other):
            return self is other
    def symbol(s):
        """Returns symbol given string, creating new ones if needed"""
        global __INTERNED_SYMBOLS
        if s not in __INTERNED_SYMBOLS:
            __INTERNED_SYMBOLS[s] = __Symbol(s)
        return __INTERNED_SYMBOLS[s]
    # Here are definitions of symbols that we should know
    scheme_false = symbol("#f")
    scheme_true = symbol("#t")
    __empty_symbol = symbol("")
    def isSymbol(s):
        return type(s) == type(__empty_symbol)
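The interning idea on this slide can be tried out directly. What follows is a minimal Python 3 sketch, not pyscheme's actual code: the names Symbol, symbol and _INTERNED are chosen here for illustration, and the explicit __hash__ line is an assumption added because Python 3 drops the inherited hash when a class defines __eq__.

```python
# Assumption: illustrative names, Python 3; not taken from pyscheme.
_INTERNED = {}  # maps a plain string to its unique Symbol object


class Symbol(str):
    """A symbol is a string that is compared by identity."""

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return self is not other

    # Python 3 sets __hash__ to None when __eq__ is overridden,
    # so restore the string hash explicitly.
    __hash__ = str.__hash__


def symbol(name):
    """Return the unique Symbol for name, creating it on first use."""
    if name not in _INTERNED:
        _INTERNED[name] = Symbol(name)
    return _INTERNED[name]


def is_symbol(obj):
    return isinstance(obj, Symbol)


print(symbol("foo") is symbol("foo"))  # True: same object both times
print(is_symbol("foo"))                # False: a plain string is not a symbol
```

Because every symbol with a given name is the same object, comparing two symbols is a pointer comparison rather than a character-by-character string comparison.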

GCing Unused Symbols
· If the only reference to a symbol is from the global table of interned symbols, it can be garbage collected
· We’ll use Python’s weakrefs for this
· A weak reference is a reference that doesn’t protect an object from garbage collection
· Objects referenced only by weak references are considered unreachable (or "weakly reachable") and may be collected at any time

Using weakrefs
    import weakref
    from UserString import UserString as __UserStr
    …
    __INTERNED_SYMBOLS = weakref.WeakValueDictionary({})
    …
    class __Symbol(__UserStr):
        …
        if s not in __INTERNED_SYMBOLS:
            # make a temp strong reference
            newSymbol = __Symbol(s)
            __INTERNED_SYMBOLS[s] = newSymbol
        return __INTERNED_SYMBOLS[s]

Representing pairs
· The core of Scheme has only one kind of data structure, the list, and it is made up of pairs
· What Python types should we use?
  • A user-defined class, Pair
  • Lists
  • Tuples
  • Dictionaries
  • Closures

Aside: pairs as closures
· Functions are very powerful
· We can use them to represent cons cells or pairs
· We don’t want to do this in practice
· But it shows the power of programming with functions

    (define (mycons theCar theCdr)
      ;; mycons returns a closure that takes a 2-arg function and applies
      ;; it to the two remembered values, i.e., the pair's car and cdr.
      (lambda (f) (f theCar theCdr)))
    (define (mycar cell)
      ;; mycar takes a pair closure and feeds it a 2-arg function that
      ;; just returns the first arg
      (cell (lambda (theCar theCdr) theCar)))
    (define (mycdr cell)
      ;; mycdr takes a pair closure and feeds it a 2-arg function that
      ;; just returns the second arg
      (cell (lambda (theCar theCdr) theCdr)))
    (define myempty
      ;; the empty list is just a function that always returns true.
      (lambda (f) #t))
    (define (mynull? cell)
      ;; a cell is the empty list only if it is the myempty closure
      (eq? cell myempty))

Example
    > (define p1 (mycons 1 (mycons 2 myempty)))
    > p1
    #<procedure:.../scheme/pairs.ss:1:31>
    > (mycar p1)
    1
    > (mycdr p1)
    #<procedure:.../scheme/pairs.ss:1:31>
    > (mycar (mycdr p1))
    2
    > (mycdr (mycdr p1))
    #<procedure:myempty>
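Since the interpreter itself is written in Python, it may help to see the same closure trick there. This is only a sketch for comparison: the snake_case names mirror the Scheme definitions above and are not part of pyscheme.

```python
# Pairs represented purely as closures (illustrative; not pyscheme code).

def mycons(the_car, the_cdr):
    """Return a closure that applies a 2-arg function to the car and cdr."""
    return lambda f: f(the_car, the_cdr)


def mycar(cell):
    """Feed the pair closure a function that returns its first argument."""
    return cell(lambda the_car, the_cdr: the_car)


def mycdr(cell):
    """Feed the pair closure a function that returns its second argument."""
    return cell(lambda the_car, the_cdr: the_cdr)


def myempty(f):
    """The empty list is just a function that always returns True."""
    return True


def mynull(cell):
    """A cell is the empty list only if it is the myempty function itself."""
    return cell is myempty


p1 = mycons(1, mycons(2, myempty))
print(mycar(p1))                  # 1
print(mycar(mycdr(p1)))           # 2
print(mynull(mycdr(mycdr(p1))))   # True
```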

Representing pairs
· We’ll define a subclass of list to represent a pair
    class Pair(list): pass
· The cons function creates a new cons cell with a given car and cdr
    def cons(car, cdr):
        return Pair([car, cdr])
· Defining built-in functions for pairs will be easy
    def car(p): return p[0]
    def cdr(p): return p[1]
    def cadr(p): return car(cdr(p))
    def set_car(p, x): p[0] = x
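To see how these pieces fit together, here is a small Python 3 sketch that builds a Scheme list out of pairs and walks it. The representation of NIL as an empty Pair and the helper to_python_list are assumptions made for this illustration; the slides use NIL and toPythonList without showing their definitions.

```python
class Pair(list):
    """A cons cell: a two-element list holding the car and the cdr."""
    pass


# Assumption: the empty list is a single distinguished object.
NIL = Pair([])


def cons(car, cdr):
    return Pair([car, cdr])


def car(p):
    return p[0]


def cdr(p):
    return p[1]


def cadr(p):
    return car(cdr(p))


def to_python_list(p):
    """Collect the elements of a proper Scheme list into a Python list."""
    items = []
    while p is not NIL:
        items.append(car(p))
        p = cdr(p)
    return items


lst = cons(1, cons(2, cons(3, NIL)))   # the Scheme list (1 2 3)
print(cadr(lst))                       # 2
print(to_python_list(lst))             # [1, 2, 3]
```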

Lexical Analyzer
· Consume a string of characters, identify tokens, throw away comments and whitespace, and return a list of the remaining tokens
· Each token will be a (<type>, <token>) tuple like ('number', '3.145') or ('comment', ';; foo')
· Recognize tokens using regular expressions
· We won’t worry about efficiency

Token regular expressions
    import re
    PATTERNS = [
        ('whitespace', re.compile(r'(\s+)')),
        ('comment', re.compile(r'(;[^\n]*)')),
        ('(', re.compile(r'(\()')),
        (')', re.compile(r'(\))')),
        ('dot', re.compile(r'(\.\s)')),
        ('number', re.compile(r'([+\-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+))')),
        ('symbol', re.compile(r'([a-zA-Z+=?!@#$%^&*\-/.><][\w+=?!@#$%^&*\-/.><]*)')),
        ('string', re.compile(r'"(([^\"]|\\")*)"')),
        ("'", re.compile(r"(')")),
        ('`', re.compile(r'(`)')),
        (',', re.compile(r'(,)'))
    ]

Lex Examples
    >>> from lex import *
    >>> tokenize("")
    [(None, None)]
    >>> tokenize(" 1 2. .3 1.3 -4")
    [('number', '1'), ('number', '2.'), ('number', '.3'), ('number', '1.3'), ('number', '-4'), (None, None)]
    >>> tokenize('foo 12.3 foo +')
    [('symbol', 'foo'), ('number', '12.3'), ('symbol', 'foo'), ('symbol', '+'), (None, None)]
    >>> tokenize('(foo (bar ()))')
    [('(', '('), ('symbol', 'foo'), ('(', '('), ('symbol', 'bar'), ('(', '('), (')', ')'), (')', ')'), (')', ')'), (None, None)]

Raw string notation
    >>> s = '\nfoo\n'
    >>> s
    '\nfoo\n'
    >>> print s

    foo

    >>> s = r'\nfoo\n'
    >>> s
    '\\nfoo\\n'
    >>> print s
    \nfoo\n

tokenize()
    def tokenize(s):
        toks = []
        found = True
        while s and found:
            found = False
            # try each pattern in order at the front of the remaining input
            for type, regex in PATTERNS:
                match_obj = regex.match(s)
                if match_obj:
                    if type not in ('whitespace', 'comment'):
                        toks.append((type, match_obj.group(1)))
                    s = s[match_obj.span()[1]:]
                    found = True
                    break
            if not found:
                print "\nNo match '", s, "' - tokenize"
        toks.append(EOF_TOKEN)
        return toks
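For readers on Python 3, the lexer can be sketched as a self-contained program. This is an adaptation rather than the original lex module: it uses a reduced pattern table (no strings or quote characters), assumes EOF_TOKEN is (None, None) as the examples suggest, and raises an exception on unrecognized input instead of printing a message.

```python
import re

# Assumption: the end-of-input marker seen in the examples.
EOF_TOKEN = (None, None)

# A reduced pattern table; order matters because the first match wins.
PATTERNS = [
    ("whitespace", re.compile(r"(\s+)")),
    ("comment", re.compile(r"(;[^\n]*)")),
    ("(", re.compile(r"(\()")),
    (")", re.compile(r"(\))")),
    ("dot", re.compile(r"(\.\s)")),
    ("number", re.compile(r"([+\-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+))")),
    ("symbol", re.compile(r"([a-zA-Z+=?!@#$%^&*\-/.><][\w+=?!@#$%^&*\-/.><]*)")),
]


def tokenize(s):
    """Split s into (type, text) tokens, dropping whitespace and comments."""
    toks = []
    while s:
        for kind, regex in PATTERNS:
            m = regex.match(s)
            if m:
                if kind not in ("whitespace", "comment"):
                    toks.append((kind, m.group(1)))
                s = s[m.end():]  # consume the matched text
                break
        else:
            # no pattern matched at the front of the remaining input
            raise ValueError("no token matches %r" % s)
    toks.append(EOF_TOKEN)
    return toks


print(tokenize("(a 1.0) ; a comment"))
```

Repeatedly slicing the input string is quadratic in the worst case, which is in keeping with the slide's remark that efficiency is not a concern here.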

tokenize() examples
    >>> from lex import *
    >>> tokenize('(a 1.0)')
    [('(', '('), ('symbol', 'a'), ('number', '1.0'), (')', ')'), (None, None)]
    >>> tokenize('(define (add1 x)(+ x 1))')
    [('(', '('), ('symbol', 'define'), ('(', '('), ('symbol', 'add1'), ('symbol', 'x'), (')', ')'), ('(', '('), ('symbol', '+'), ('symbol', 'x'), ('number', '1'), (')', ')'), (')', ')'), (None, None)]

parse
· Consume a sequence of tokens and produce a sequence of s-expressions
· Use a recursive descent parser
· We’ll handle just a few special cases, namely quote, backquote, and dotted pairs

Peeking and eating
    def peek(tokens):
        """Take a quick glance at the first token in our tokens list."""
        if len(tokens) == 0:
            raise ParserError("While peeking: ran out of tokens.")
        return tokens[0]

Peeking and eating
    def eat(tokens, desired_type):
        """If the type of the next token is desired_type, pop it from the
        list and return it, else return False"""
        if len(tokens) == 0:
            raise ParserError('No tokens left, seeking ' + desired_type)
        return tokens.pop(0) if tokens[0][0] == desired_type else False

Peeking and eating
    def eat_safe(tokens, tokenType):
        """Digest the first token in our tokens list, making sure that
        we're biting on the right tokenType of thing."""
        if len(tokens) == 0:
            raise ParserError("While trying to eat %s: ran out of tokens." % tokenType)
        if tokens[0][0] != tokenType:
            raise ParserError("Seeking %s got %s" % (tokenType, tokens[0]))
        return tokens.pop(0)

parse
    def parseExpression(tokens):
        if eat(tokens, "'"):
            return cons(symbol('quote'), cons(parseExpression(tokens), NIL))
        if eat(tokens, '`'):
            return cons(symbol('quasiquote'), cons(parseExpression(tokens), NIL))
        elif eat(tokens, ','):
            return cons(symbol('unquote'), cons(parseExpression(tokens), NIL))
        elif eat(tokens, '('):
            return parse_list_members(tokens)
        elif peek(tokens)[0] in ('number', 'symbol', 'string'):
            return parse_atom(tokens)
        else:
            raise ParserError("Parsing: no alternatives")

parse_list_members()
    def parse_list_members(tokens):
        if eat(tokens, 'dot'):
            # dotted pair: the expression after the dot is the final cdr
            final = parseExpression(tokens)
            eat_safe(tokens, ')')
            return final
        if peek(tokens)[0] in ("'", '`', ',', '(', 'number', 'symbol', 'string'):
            return cons(parseExpression(tokens), parse_list_members(tokens))
        if eat(tokens, ')'):
            return NIL
        raise ParserError("Can't finish list " + str(tokens))
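The two mutually recursive functions above can be exercised with a small stand-alone Python 3 sketch. Several simplifications here are assumptions for illustration only: pairs are plain 2-tuples, NIL is the empty tuple, symbols are plain strings, numbers become floats, only the quote shorthand is handled, and SyntaxError stands in for ParserError.

```python
NIL = ()  # assumption: the empty tuple stands in for the empty list


def cons(car, cdr):
    return (car, cdr)


def peek(tokens):
    if not tokens:
        raise SyntaxError("ran out of tokens")
    return tokens[0]


def eat(tokens, kind):
    """Pop and return the next token if it has the given type, else False."""
    if peek(tokens)[0] == kind:
        return tokens.pop(0)
    return False


def parse_expression(tokens):
    if eat(tokens, "'"):
        return cons("quote", cons(parse_expression(tokens), NIL))
    if eat(tokens, "("):
        return parse_list_members(tokens)
    kind, text = peek(tokens)
    if kind in ("number", "symbol", "string"):
        tokens.pop(0)
        return float(text) if kind == "number" else text
    raise SyntaxError("no alternatives for %r" % (tokens[0],))


def parse_list_members(tokens):
    if eat(tokens, ")"):
        return NIL
    if eat(tokens, "dot"):
        # dotted pair: the next expression is the final cdr
        final = parse_expression(tokens)
        if not eat(tokens, ")"):
            raise SyntaxError("expected ) after dotted tail")
        return final
    return cons(parse_expression(tokens), parse_list_members(tokens))


# Tokens for (a 1), written out by hand in the lexer's output format
tokens = [("(", "("), ("symbol", "a"), ("number", "1"), (")", ")"), (None, None)]
print(parse_expression(tokens))  # ('a', (1.0, ()))
```

Each nested list costs one level of Python recursion in parse_expression, and each list element costs one in parse_list_members, which is why very long lists run into the recursion limit discussed next.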

Recursive descent parsing
· Remember, one problem with recursive descent parsing is that the grammar has to be right recursive
· Another potential problem is recursing too deeply and exceeding the limit on the stack
· But maybe we can use tail recursion, which an interpreter or compiler can recognize and execute as iteration?
· Not in Python

Python doesn’t optimize tail recursion
    def fact0(n):
        # iterative factorial
        result = 1
        while n > 1:
            result *= n
            n -= 1
        return result
    def fact1(n):
        # simple recursive factorial
        return 1 if n == 1 else n * fact1(n - 1)
    def fact2(n, result=1):
        # tail recursive factorial
        return result if n == 1 else fact2(n - 1, n * result)

Try this
    http://www.csee.umbc.edu/331/fall08/0101/code/python/pyscheme-1.7/src/fact.py

Default limit is 999
· fact1(1000) and fact2(1000) both die
    >>> fact2(1000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "fact.py", line 17, in fact2
        return result if n==1 else fact2(n-1, n*result)
      File "fact.py", line 17, in fact2
      …
      File "fact.py", line 17, in fact2
        return result if n==1 else fact2(n-1, n*result)
    RuntimeError: maximum recursion depth exceeded

How to solve this?
· You can set the maximum recursion depth higher
    >>> import sys
    >>> sys.getrecursionlimit()
    1000
    >>> sys.setrecursionlimit(10000)
    >>> fact2(1100)
    53437084880926377034242155...0000L
· But this is not a general solution
· And Guido is on the record as not wanting to optimize tail recursion
  • http://www.artima.com/forums/flat.jsp?forum=106&thread=147358

Trampoline Style
· A trampoline is a loop that iteratively invokes thunk-returning functions
  • A thunk is just a piece of code to perform a delayed computation (e.g., a closure)
· A single trampoline can express all control transfers of a program
· Converting a program to trampolined style is trampolining
  • This is a kind of continuation passing style of programming
· Trampolined functions can do tail recursive function calls in stack-oriented languages

Trampolining is one answer
· A way to program using CPS, Continuation Passing Style
· CPS is a style of programming where control is passed explicitly as continuations
· Trampolining is a simple way to eliminate recursion
· We’ll use a simple kind of trampolining
· Instead of making a recursive call, a procedure can bounce back up to its caller with a continuation, which can be called to proceed with the computation

Pogo
    from pogo import pogo, land, bounce
    def fact3(n):
        # factorial in a trampolined style
        return pogo(fact_tramp(n))
    def fact_tramp(n, result=1):
        return land(result) if n == 1 else bounce(fact_tramp, n - 1, n * result)

Variable length argument lists
    >>> def foo(*args):
            print "Number of arguments: ", len(args)
            print "Arguments are: ", args
    >>> foo(1, 2, 3, 'd', 5)
    Number of arguments: 5
    Arguments are: (1, 2, 3, 'd', 5)
    >>> def bar(arg1, *rest):
            print …

pogo.py
    def bounce(function, *args):
        """Returns new trampolined value that continues bouncing"""
        return ('bounce', function, args)
    def land(value):
        """Returns new trampolined value that lands off trampoline"""
        return ('land', value)

It works
    >>> sys.setrecursionlimit(10)
    >>> fact3(100)
    93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000L
    >>> fact3(1000)
    4023872600770937735...0000000L

pogo.py
    import traceback
    def pogo(bouncer):
        try:
            while True:
                if bouncer[0] == 'land':
                    return bouncer[1]
                elif bouncer[0] == 'bounce':
                    # call the stored function with its stored arguments
                    bouncer = bouncer[1](*bouncer[2])
                else:
                    traceback.print_exc()
                    raise TypeError("not a bouncer")
        except TypeError:
            traceback.print_exc()
            raise TypeError("not a bouncer")
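Putting the pieces of pogo.py together, the following Python 3 sketch shows both a tail call replaced by a bounce and a non-tail call rewritten with an explicit continuation. It is an adaptation under stated assumptions: the error handling is reduced to a single TypeError, and fact_cps and fact4 are names invented here to illustrate the continuation-passing variant.

```python
def bounce(function, *args):
    """Ask the trampoline to call function(*args) next."""
    return ("bounce", function, args)


def land(value):
    """Tell the trampoline that the computation is finished."""
    return ("land", value)


def pogo(bouncer):
    """The trampoline: keep bouncing in a loop until something lands."""
    while True:
        if bouncer[0] == "land":
            return bouncer[1]
        if bouncer[0] == "bounce":
            bouncer = bouncer[1](*bouncer[2])
        else:
            raise TypeError("not a bouncer")


def fact_tramp(n, result=1):
    # The tail call is replaced by a bounce, so the Python stack never grows.
    return land(result) if n <= 1 else bounce(fact_tramp, n - 1, n * result)


def fact_cps(n, cont):
    # Hypothetical CPS variant: the pending multiplication is moved into
    # a continuation instead of being left on the Python stack.
    if n <= 1:
        return bounce(cont, 1)
    return bounce(fact_cps, n - 1, lambda value: bounce(cont, n * value))


def fact3(n):
    return pogo(fact_tramp(n))


def fact4(n):
    return pogo(fact_cps(n, land))


print(fact3(5))                     # 120
print(fact4(5))                     # 120
print(fact3(5000) == fact4(5000))   # True, well past the default recursion limit
```

In the CPS version the work that a recursive call would leave on the stack lives in a chain of closures on the heap, which is the same move the trampolined evaluator makes with its cont argument.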

See pyscheme 1.6
· Pyscheme 1.6 is written in trampoline style
· This was done by hand, as opposed to using an automatic trampoliner
· And I’ve been undoing it by hand

    def eval(exp, env):
        return pogo.pogo(teval(exp, env, pogo.land))
    def teval(exp, env, cont):
        if expressions.isIf(exp):
            return evalIf(exp, env, cont)
        …
    def evalIf(exp, env, cont):
        # c is the continuation that receives the value of the predicate
        def c(predicate_val):
            if isTrue(predicate_val):
                return teval(ifConsequent(exp), env, cont)
            else:
                return teval(ifAlternative(exp), env, cont)
        return teval(expressions.ifPredicate(exp), env, c)

eval
    # The expression helpers live in the expressions module; the module is
    # named in full here so that it is not shadowed by the exp parameter.
    def eval(exp, env):
        if expressions.isSelfEvaluating(exp):
            return exp
        if expressions.isVariable(exp):
            return lookupVariableValue(exp, env)
        if expressions.isQuoted(exp):
            return evalQuoted(exp, env)
        if expressions.isAssignment(exp):
            return evalAssignment(exp, env)
        if expressions.isDefinition(exp):
            return evalDefinition(exp, env)
        if expressions.isIf(exp):
            return evalIf(exp, env)
        if expressions.isLambda(exp):
            return expressions.makeProcedure(expressions.lambdaParameters(exp),
                                             expressions.lambdaBody(exp), env)
        if expressions.isBegin(exp):
            return evalSequence(expressions.beginActions(exp), env)
        if expressions.isApplication(exp):
            return evalApplication(exp, env)
        raise SchemeError("Unknown expr, eval " + str(exp))

apply
    def apply(procedure, arguments, env):
        if expressions.isPrimitiveProcedure(procedure):
            return applyPrimProc(procedure, arguments, env)
        if expressions.isCompoundProcedure(procedure):
            # bind the parameters to the arguments in a new frame that
            # extends the environment the procedure was defined in
            newEnv = extendEnvironment(
                expressions.procedureParameters(procedure),
                arguments,
                expressions.procedureEnvironment(procedure))
            return evalSequence(expressions.procedureBody(procedure), newEnv)
        raise SchemeError("Unknown proc - apply " + str(procedure))

Environments
· An environment will be a list of frames
· Each frame will be a Python dictionary with the variable names as keys and their values as values

env
    THE_EMPTY_ENVIRONMENT = []
    def enclosingEnvironment(env):
        return env[1:]
    def firstFrame(env):
        return env[0]
    def extendEnvironment(var_pairs, val_pairs, base_env):
        new_frame = {}
        vars = toPythonList(var_pairs)
        vals = toPythonList(val_pairs)
        if len(vars) != len(vals):
            raise SchemeError("Mismatched vals and vars")
        for (var, val) in zip(vars, vals):
            new_frame[var] = val
        # the new frame goes in front of the frames of the base environment
        return [new_frame] + base_env

Lookup a Variable Value
    def lookupVariableValue(var, env):
        while True:
            if env == THE_EMPTY_ENVIRONMENT:
                raise SchemeError("Unbound var " + var)
            frame = firstFrame(env)
            if var in frame:
                return frame[var]
            env = enclosingEnvironment(env)

Define/Set a Variable
    def defineVariable(var, val, env):
        firstFrame(env)[var] = val
    def setVariableValue(var, val, env):
        while True:
            if env == THE_EMPTY_ENVIRONMENT:
                raise SchemeError("Unbound variable -- SET! " + var)
            top = firstFrame(env)
            if var in top:
                top[var] = val
                return
            env = enclosingEnvironment(env)
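The environment operations above can be tried in isolation. The following Python 3 sketch makes a few simplifying assumptions: variable names and values are passed as ordinary Python lists rather than Scheme lists, names are plain strings, the snake_case function names are illustrative, and NameError and ValueError stand in for SchemeError.

```python
THE_EMPTY_ENVIRONMENT = []


def extend_environment(names, values, base_env):
    """Return a new environment with one fresh frame in front of base_env."""
    if len(names) != len(values):
        raise ValueError("Mismatched vals and vars")
    return [dict(zip(names, values))] + base_env


def lookup_variable_value(var, env):
    """Search the frames from innermost to outermost."""
    for frame in env:
        if var in frame:
            return frame[var]
    raise NameError("Unbound var " + var)


def define_variable(var, val, env):
    """Bind var in the innermost frame only."""
    env[0][var] = val


def set_variable_value(var, val, env):
    """Rebind var in the nearest frame that already defines it."""
    for frame in env:
        if var in frame:
            frame[var] = val
            return
    raise NameError("Unbound variable -- SET! " + var)


global_env = extend_environment(["x", "y"], [1, 2], THE_EMPTY_ENVIRONMENT)
inner_env = extend_environment(["x"], [10], global_env)

print(lookup_variable_value("x", inner_env))   # 10: the inner x shadows the outer one
print(lookup_variable_value("y", inner_env))   # 2: found in the enclosing frame
set_variable_value("y", 20, inner_env)
print(lookup_variable_value("y", global_env))  # 20: frames are shared, not copied
```

Extending an environment builds a new list but reuses the existing frame dictionaries, so an assignment made through an inner environment is visible to everything that shares the outer frame.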

Builtins
· We’ll define a Python function to handle each of the primitive Scheme functions
· Many Lisp functions take any number of args:
  • (+ 1 2) => 3
  • (+ 1 2 3 4 5) => 15
  • (+) => 0
· We can use Python’s (new) syntax for functions that take any number of args, e.g.:
  • If the last parameter in a function’s parameter list is preceded by a *, it’s bound to a tuple of the remaining args
  • def add(*args): return sum(args)

Builtins
    import types
    def allNumbers(numbers):
        for n in numbers:
            if type(n) not in (types.IntType, types.LongType, types.FloatType):
                return 0
        return 1
    def schemeAdd(*numbers):
        if not allNumbers(numbers):
            raise SchemeError("prim + - non-numeric arg")
        return sum(numbers)
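The types.IntType family no longer exists in Python 3, so a present-day version of this builtin might look like the sketch below. The snake_case names are illustrative, TypeError stands in for SchemeError, and excluding bool is a design choice made here (bool is a subclass of int in Python), not something the slides specify.

```python
def all_numbers(args):
    """True if every argument is an int or float (bools are excluded)."""
    return all(
        isinstance(a, (int, float)) and not isinstance(a, bool)
        for a in args
    )


def scheme_add(*numbers):
    """Scheme's variadic +: accepts zero or more numeric arguments."""
    if not all_numbers(numbers):
        raise TypeError("prim + - non-numeric arg")
    return sum(numbers)


print(scheme_add(1, 2))           # 3
print(scheme_add(1, 2, 3, 4, 5))  # 15
print(scheme_add())               # 0
```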

Setting up the initial environment
    def setupEnvironment():
        PRIME_PROCEDURES = [
            ["car", pair.car],
            ["cdr", pair.cdr],
            ["+", schemeAdd],
            ...
        ]
        # start with a single empty frame on top of the empty environment
        init_env = env.extendEnvironment(pair.NIL, pair.NIL, env.THE_EMPTY_ENVIRONMENT)
        for name, proc in PRIME_PROCEDURES:
            # a primitive is tagged as the list (primitive <python function>)
            p = cons(symbol("primitive"), cons(proc, NIL))
            defineVariable(symbol(name), p, init_env)
        # #t and #f evaluate to themselves
        defineVariable(symbol("#t"), symbol("#t"), init_env)
        defineVariable(symbol("#f"), symbol("#f"), init_env)
        return init_env