Clojure 4: Concurrency, 29-Sep-20

Concurrency
- Clojure supports concurrency, and most values are immutable
  - Clojure has agents, which are similar to Erlang’s actors
- Clojure also has a form of shared state
  - Concurrency and shared state don’t play well together
  - The Java approach, using locks (synchronization), is very error-prone
  - Clojure uses a new approach, Software Transactional Memory (STM), to manage concurrency
    - This is similar to a database transaction

Atomicity
- An action is atomic if it happens “all at once” from the point of view of other threads
  - That is, the action either has happened or it hasn’t; it’s not in the middle of happening
- Few actions are actually atomic
  - For example, x++ consists of three parts: get the value of x, add 1 to the value, put the new value back in x
    - If another thread accesses x during the operation, the results are unpredictable
  - If a spacecraft moves from point (x, y, z) to point (x', y', z'), these values must be updated “simultaneously”
    - No other thread should ever see the spacecraft at point (x', y, z)
- In Java, the approach is to “lock” the variables being updated, thus denying access to other threads
  - It is extremely difficult, with this model, to write correct programs
  - Efficiency isn’t all that easy, either

Refs and STM
- A ref is a mutable reference to an immutable value
  - That is, the data remains immutable, but you can change what data the ref refers to
  - Basic syntax: (ref initial-state)
  - Typical use: (def ref-variable (ref initial-state))
- To find the value, you have to dereference the variable
  - Syntax: (deref ref-variable)
  - Syntactic sugar: @ref-variable
- A reference variable is like an “ordinary” variable in an “ordinary” language: its value can be changed
  - However, there are restrictions on where it can be changed
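
A minimal REPL-style sketch of creating and reading a ref, and of the “restriction on where it can be changed”; the name `balance` is illustrative only.

```clojure
;; A ref holding an immutable value (here, a number).
(def balance (ref 100))

;; Two equivalent ways to read the current value.
(deref balance) ; => 100
@balance        ; => 100

;; Changing a ref outside a transaction is not allowed.
(def outside-attempt
  (try
    (ref-set balance 200)
    (catch IllegalStateException e
      (.getMessage e))))

outside-attempt ; => "No transaction running"
```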

Updating a reference variable
- Reference variables can only be updated in a transaction
  - Basic syntax: (dosync expression ... expression)
  - Typical use: (dosync (ref-set ref-variable new-value))
  - ref-set is basically an “assignment” to a reference variable
- A transaction is:
  - Atomic: From outside the transaction, the transaction appears instantaneous; it has either happened, or it hasn’t
  - Consistent: With some additional syntax, a reference can specify validation conditions
    - If the conditions aren’t met, the transaction fails
  - Isolated: Transactions cannot see partial results from other transactions
- A transaction is not:
  - Durable: Results are not saved for a future run of the program
- Databases are ACID: Atomic, Consistent, Isolated, and Durable
  - Software transactions are only “ACI”
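
To connect dosync and ref-set with the spacecraft example from the atomicity discussion, here is a sketch that stores the position in three hypothetical refs `x`, `y`, `z` and changes all of them in one transaction. The helper names `move-to!` and `position` are made up for illustration.

```clojure
;; Hypothetical spacecraft position stored in three separate refs.
(def x (ref 0))
(def y (ref 0))
(def z (ref 0))

(defn move-to!
  "Sets all three coordinates in one transaction, so no other
  transaction can observe a partially updated position."
  [nx ny nz]
  (dosync
    (ref-set x nx)
    (ref-set y ny)
    (ref-set z nz)))

(defn position
  "Reads all three coordinates inside a transaction, which gives a
  consistent snapshot even while other threads are moving the craft."
  []
  (dosync [@x @y @z]))

(move-to! 10 20 30)
(position) ; => [10 20 30]
```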

alter
- The alter function provides a somewhat more readable way to update a reference variable
  - Syntax: (alter ref-variable function args-to-function)
  - Typical usage: (dosync (alter ref-variable function args))
  - The return value of alter is the new value of ref-variable
- alter calls (function current-value args), so when you write an updating function, it should have the thing being updated as the first argument
  - (defn function [thing-to-update other-arguments] function-body)
- When you cons something to a list, the list is the second parameter, which is not what you need
  - There is an additional function, conj, which is like cons with the arguments reversed
  - For lists, (conj list item) gives the same result as (cons item list); for vectors, conj adds the item at the end instead
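
A short sketch of alter with conj, plus an updating function written with the “thing being updated” first. The ref `items` and the function `add-copies` are hypothetical names used only for this example.

```clojure
;; A ref holding a list.
(def items (ref '()))

;; alter calls (conj current-value :a) inside the transaction and
;; returns the new value.
(dosync (alter items conj :a)) ; => (:a)
(dosync (alter items conj :b)) ; => (:b :a)

;; For lists, conj and cons both put the new item at the front, so the
;; results compare equal; only the argument order differs.
(= (conj '(2 3) 1) (cons 1 '(2 3))) ; => true

;; An updating function written for alter: the collection comes first.
(defn add-copies [coll n item]
  (into coll (repeat n item)))

(dosync (alter items add-copies 2 :c)) ; => (:c :c :b :a)
```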

How STM works
- A transaction takes a private copy of any reference it needs
  - Since data structures are persistent, this is not expensive
- The transaction works with this private copy
  - If the transaction completes its work, and the original reference has not been changed (by some other transaction), then the new values are copied back atomically to the original reference
  - But if, during the transaction, the original data is changed, the transaction will automatically be retried with the changed data
  - However, if the transaction throws an exception, it will abort without a retry
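
Retries are hard to show deterministically, but the last point (an exception aborts the transaction) is easy to check. In this sketch the transaction changes a hypothetical `counter` ref and then throws; afterwards the ref still holds its old value.

```clojure
(def counter (ref 0))

;; The body increments counter and then throws. The exception aborts
;; the transaction without a retry, so the increment is discarded.
(def outcome
  (try
    (dosync
      (alter counter inc)
      (throw (ex-info "something went wrong" {})))
    (catch Exception e
      (.getMessage e))))

outcome  ; => "something went wrong"
@counter ; => 0
```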

Side effects
- The function given to alter must:
  - Be completely free of side effects
  - Return a purely functional transformation of the ref value
- alter can only be done inside a transaction
  - A transaction may be retried multiple times
  - Therefore: if the transaction has side effects (such as I/O or updating other state), they may happen many times
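
Clojure's io! macro is one way to guard against accidental side effects in a transaction: outside a transaction it simply runs its body, but inside one it throws IllegalStateException. A small sketch; `log-and-return` and `r` are illustrative names.

```clojure
;; Marks the println as I/O, so it refuses to run inside a transaction.
(defn log-and-return [x]
  (io! (println "value:" x))
  x)

(log-and-return 1) ; prints "value: 1" and returns 1

(def r (ref 0))

;; The update function has a side effect, so the transaction is
;; rejected instead of possibly printing several times on retries.
(def outcome
  (try
    (dosync (alter r (fn [v] (log-and-return (inc v)))))
    (catch IllegalStateException e
      (.getMessage e))))

outcome ; => "I/O in transaction"
@r      ; => 0
```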

Pessimistic locking
- Java’s approach is pessimistic: access to shared state is always locked (if correctly done!)
  - Locking is expensive
  - In most scenarios, contention happens only occasionally
  - Therefore, the expense of locking is usually unnecessary
- But locking must be done anyway, because another thread might try to access the shared state
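
For comparison, the pessimistic style can be written in Clojure too. This sketch assumes Clojure 1.7 or later (for volatile! and run!); `locking` plays the role of Java's synchronized block, and the names `lock`, `hits`, and `record-hit!` are hypothetical.

```clojure
;; Pessimistic style: every access to the shared counter takes a lock.
(def lock (Object.))
(def hits (volatile! 0))   ; a mutable box, safe here only because of the lock

(defn record-hit! []
  (locking lock
    (vswap! hits inc)))

;; Ten threads (futures) each record 1000 hits; deref waits for each one.
(let [workers (doall (for [_ (range 10)]
                       (future (dotimes [_ 1000] (record-hit!)))))]
  (run! deref workers))

@hits ; => 10000
```

Every call pays for the lock, even when no other thread is competing for it, which is the cost the slide describes.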

Optimistic evaluation
- Clojure’s approach is optimistic: it assumes contention might happen, but probably won’t
  - Transactions begin immediately, without locking
  - Because data is persistent (immutable), making a private copy is much less expensive than locking mutable data
  - When the transaction completes, the results are copied back atomically
  - Therefore, locking never happens unnecessarily
- Clojure guarantees that every transaction will eventually finish, so deadlock does not occur
- In a high-concurrency, high-contention scenario, a transaction could be tried many times before it finally succeeds or aborts
  - In this situation, locking is a more efficient approach
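
The same workload written optimistically with a ref: no lock is taken, and a transaction that conflicts with another is simply retried. A sketch assuming Clojure 1.7 or later; `hits` and `record-hit!` are illustrative names.

```clojure
(def hits (ref 0))

(defn record-hit! []
  (dosync (alter hits inc)))

;; Ten threads each run 1000 small transactions on the same ref.
(let [workers (doall (for [_ (range 10)]
                       (future (dotimes [_ 1000] (record-hit!)))))]
  (run! deref workers))

@hits ; => 10000
```

Because every thread updates the same ref, this is deliberately a high-contention case where retries are likely; the result is still correct, only the amount of wasted work changes.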

Everybody’s first example
    (def account-1 (ref 1000))
    (def account-2 (ref 1500))

    (defn transfer
      "Transfers amount of money from a to b"
      [a b amount]
      (dosync
        (alter a - amount)
        (alter b + amount)))

    (transfer account-1 account-2 300)
    (transfer account-2 account-1 50)
- Practical Clojure, Luke VanderHart and Stuart Sierra, p. 101
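
To see why the two alter calls belong in one transaction, the transfer function can be run from many threads at once. This sketch repeats the definitions so it stands alone; the transfer amounts (3 and 2) are arbitrary choices.

```clojure
(def account-1 (ref 1000))
(def account-2 (ref 1500))

(defn transfer
  "Transfers amount of money from a to b"
  [a b amount]
  (dosync
    (alter a - amount)
    (alter b + amount)))

;; 100 transfers of 3 one way and 100 transfers of 2 the other way,
;; all running concurrently.
(let [workers (doall
                (concat
                  (for [_ (range 100)] (future (transfer account-1 account-2 3)))
                  (for [_ (range 100)] (future (transfer account-2 account-1 2)))))]
  (run! deref workers))

@account-1 ; => 900   (1000 - 300 + 200)
@account-2 ; => 1600  (1500 + 300 - 200)

;; Reading both balances in one transaction gives a consistent total,
;; even while other transfers are in progress.
(dosync (+ @account-1 @account-2)) ; => 2500
```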

Atoms
- Refs are used to coordinate multiple changes, which must happen all at once or not at all
- Atoms are like refs, but only change one value, and do not need to be updated within a transaction
  - Syntax: (atom initial-value)
  - Typical usage: (def atom-name (atom initial-value))
- Atoms are accessed just like refs
  - (deref atom-name) or @atom-name
- (reset! atom-name new-value) will change the value of an atom, and return the new value
- (swap! atom-name function additional-arguments) calls the function with the current value of the atom and any additional arguments, and returns the new value
  - Like alter, swap! may be retried multiple times, so the function should have no side effects
- Atoms are less powerful than refs, but also less expensive
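
A quick sketch of reset! and swap!. The second part shows that a single atom can still change several related values at once if they live in one immutable map; `hits` and `spacecraft` are hypothetical names.

```clojure
(def hits (atom 0))

(swap! hits inc)  ; => 1    calls (inc 0)
(swap! hits + 10) ; => 11   calls (+ 1 10)
(reset! hits 100) ; => 100
@hits             ; => 100

;; One atom holding a whole map: all three coordinates change together.
(def spacecraft (atom {:x 0 :y 0 :z 0}))
(swap! spacecraft assoc :x 1 :y 2 :z 3)
@spacecraft ; => {:x 1, :y 2, :z 3}
```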

Validation and metadata
- The keyword :validator introduces a validation function, while the keyword :meta introduces a map of metadata
  - Syntax:
    - (ref initial-value :validator validator-fn :meta metadata-map)
    - (atom initial-value :validator validator-fn :meta metadata-map)
- The validator function is called whenever a change is attempted (for a ref, when its transaction commits)
  - If the validator function returns false, the ref or atom throws an IllegalStateException, and the change doesn’t happen
- Metadata is data about the data, for example, the source of the data, or whether it is serializable
  - Metadata is not considered in comparisons such as equality testing
  - There are functions for working with metadata
  - Metadata is outside the scope of this lecture
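
A sketch of a ref with a validator and some metadata. The rule “balance must never go negative” and the metadata map are assumptions chosen for illustration.

```clojure
;; A ref that must never go negative, with some illustrative metadata.
(def balance
  (ref 100
       :validator (fn [v] (>= v 0))
       :meta {:source "example"}))

(dosync (alter balance - 30)) ; => 70

;; This update would make the balance negative, so the validator
;; rejects it when the transaction tries to commit.
(def outcome
  (try
    (dosync (alter balance - 500))
    (catch IllegalStateException e
      (.getMessage e))))

outcome        ; => "Invalid reference state"
@balance       ; => 70, unchanged
(meta balance) ; => {:source "example"}
```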

Agents
- An agent is like an Erlang actor that holds (or is?) a value
- An agent is created with an initial value:
  - Syntax: (agent initial-state)
  - Typical use: (def agent-name (agent initial-state))
- You can send an agent a function to update its state:
  - (send agent-name update-function arguments)
  - The function is called with the agent’s current state followed by the arguments, and its result becomes the new state
  - The value of the send is the agent itself, not the value of the function (except, possibly, in the REPL)
- You can check the current state of an agent with deref or @
- You can wait for agents to complete:
  - (await agent-name-1 ... agent-name-N)
    - This is a blocking function, and it could block forever
  - (await-for timeout-millis agent-name-1 ... agent-name-N)
    - This is a blocking function; it returns logical false if it times out, otherwise logical true
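
A minimal sketch of send, await, and await-for; `tally` is an illustrative name. Actions sent from the same thread run in the order they were sent.

```clojure
(def tally (agent 0))

;; send returns the agent immediately; the actions run later on a
;; pool thread.
(send tally + 5)   ; will compute (+ 0 5)
(send tally * 2)   ; will compute (* 5 2)

(await tally)          ; blocks until both actions have run
@tally                 ; => 10

(await-for 1000 tally) ; => true  (nothing pending, so no timeout)
```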

More about agents
- An agent is not a thread of its own: each action you send to it runs on a thread from a Clojure-managed thread pool, concurrently with your code
- send-off has the same syntax and semantics as send, but is optimized for slow processes (such as I/O); it uses a separate thread pool that can grow as needed
- If you use a send (or send-off) within a transaction, it is held until the transaction completes
  - This prevents the send from occurring multiple times
- I don’t think there is a way to stop an individual agent
- Clojure will not terminate cleanly if there are still agents running
  - The function (shutdown-agents) tells all agents to finish up their current tasks, refuse to accept any more tasks, and stop running
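
A sketch of a send inside a transaction: it is dispatched only if the transaction commits, and dropped if the transaction aborts. The `audit-log` agent, `balance` ref, and `withdraw!` function are hypothetical; the Thread/sleep in the send-off call merely stands in for slow work such as I/O.

```clojure
(def audit-log (agent []))
(def balance (ref 100))

(defn withdraw! [amount]
  (dosync
    (alter balance - amount)
    ;; Held until the transaction commits; discarded if it aborts.
    (send audit-log conj [:withdraw amount])
    (when (neg? @balance)
      (throw (IllegalArgumentException. "insufficient funds")))))

(withdraw! 30)
(try
  (withdraw! 500)
  (catch IllegalArgumentException _ :rejected))

(await audit-log)
@audit-log ; => [[:withdraw 30]]  (the aborted transaction's send never ran)
@balance   ; => 70

;; send-off is called the same way, but is meant for actions that may block.
(send-off audit-log (fn [entries]
                      (Thread/sleep 10)
                      (conj entries [:note "slow"])))
(await audit-log)
@audit-log ; => [[:withdraw 30] [:note "slow"]]
```

In a standalone program, calling (shutdown-agents) before exiting lets the JVM stop promptly.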

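Agents and errors
- Agents can have validating functions and metadata
  - (agent initial-state :validator validator-fn :meta metadata-map)
  - Example: (def counter (agent 0 :validator number?))
- If you send bad data to an agent,
  - The send returns immediately, without error
  - The error is recorded in the agent rather than thrown to the sender
  - In older Clojure versions, the error appeared when you dereferenced the agent; since Clojure 1.2, deref returns the last valid state, and further sends to the failed agent throw an exception until it is restarted
- (agent-errors agent-name) will return a sequence of errors encountered by the agent
- (clear-agent-errors agent-name) will make the agent usable again
  - Since Clojure 1.2, these two are deprecated in favor of (agent-error agent-name) and (restart-agent agent-name new-state)
- (set-validator! agent-name validator-fn) adds a validator to an existing agent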
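
A sketch of a failing agent and its recovery, assuming Clojure 1.2 or later. Because the failure happens later on the agent's thread, the example uses a hypothetical polling helper, `wait-for-error`, rather than await (which can block on a failed agent).

```clojure
(def counter (agent 0 :validator number?))

;; The send returns immediately; the validator rejects the new state
;; later, on the agent's thread, and the agent enters a failed state.
(send counter (fn [_] "not a number"))

;; Hypothetical helper: poll briefly until the agent records an error.
(defn wait-for-error [a]
  (loop [tries 0]
    (if-let [e (agent-error a)]
      e
      (when (< tries 200)
        (Thread/sleep 10)
        (recur (inc tries))))))

(def err (wait-for-error counter))
(class err)            ; => java.lang.IllegalStateException
@counter               ; => 0, the last valid state
(agent-errors counter) ; older API: a sequence containing the same error

;; Restart the failed agent with a new state so it accepts sends again.
(restart-agent counter 0)
(send counter inc)
(await counter)
@counter ; => 1
```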

Calling Java
- Clojure has good support for calling Java
- Our interest here is using Java Threads as an alternative to agents
- The following syntax (notice the dots) creates a new Thread, passes it a function to execute, and starts it:
    user=> (def foo 10)
    user=> (.start (Thread. (fn [] (println foo))))
- def and defn create vars with an initial value, or root binding
  - The root binding is available to all Threads (which is why the above works)
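
A slightly fuller sketch of the same idea that also waits for the thread with .join and captures what it saw; the atom `seen` is an illustrative addition. It works because a Clojure function implements java.lang.Runnable.

```clojure
(def foo 10)

;; Records what the new thread sees.
(def seen (atom nil))

(let [t (Thread. (fn [] (reset! seen foo)))]
  (.start t)
  (.join t))   ; wait for the thread to finish

@seen ; => 10, the root binding of foo
```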

vars
- def and defn create vars with a root binding
  - Usually, this is the only binding
- You can use the binding macro to create thread-local bindings
  - Syntax: (binding [var value ... var value] expression ... expression)
  - The value of the binding form is the value of its last expression
  - Bindings can only be used on vars that have been defined by def at the top level and, since Clojure 1.3, declared dynamic, as in (def ^:dynamic *name* value)
- The bindings have dynamic scope: within the expressions and all code called from those expressions
  - This differs from let, which has lexical scope
- Thread-local bindings can be updated with set!
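
A sketch contrasting dynamic and lexical scope, assuming Clojure 1.3 or later (hence ^:dynamic). `*level*` and `current-level` are illustrative names.

```clojure
(def ^:dynamic *level* :root)

(defn current-level [] *level*)

(current-level) ; => :root

;; Dynamic scope: the binding is visible to code called from inside it.
(binding [*level* :inner]
  (current-level)) ; => :inner

;; set! changes the thread-local binding, not the root.
(binding [*level* :inner]
  (set! *level* :changed)
  (current-level)) ; => :changed

;; let is lexical: it creates a local name that current-level never sees.
(let [*level* :lexical]
  (current-level)) ; => :root

;; Bindings are per thread: a plain Java Thread started inside binding
;; still sees the root value.
(def from-other-thread
  (binding [*level* :inner]
    (let [p (promise)]
      (.start (Thread. (fn [] (deliver p *level*))))
      @p)))

from-other-thread ; => :root
```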

Watches
- A watch is a function that is called whenever a state changes
  - A watch can be set on any identity (atoms, refs, agents, and vars)
- To add a watch: (add-watch identity key watch-function)
  - The key may be any value that is different from the key for any other watcher on this same identity
- To define a watch function: (defn function-name [key identity old-val new-val] expressions)
- When the watch function is called, it is given the old and new values of the identity from the update that triggered it
  - Other updates may have happened in the meantime
  - For vars, the watch is only called when the root binding changes
- To remove a watch function: (remove-watch identity key)
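
A sketch of a watch on an atom that records each change. For atoms, the watch runs synchronously on the thread that made the change, which is why the log can be checked right away. The names `temperature`, `changes`, and `record-change` are hypothetical.

```clojure
(def temperature (atom 20))
(def changes (atom []))

;; Watch function arguments: key, identity, old value, new value.
(defn record-change [k _identity old-val new-val]
  (swap! changes conj [k old-val new-val]))

(add-watch temperature :recorder record-change)

(reset! temperature 25)
(swap! temperature inc)
@changes ; => [[:recorder 20 25] [:recorder 25 26]]

;; After removing the watch, further changes are not recorded.
(remove-watch temperature :recorder)
(reset! temperature 0)
@changes ; => still [[:recorder 20 25] [:recorder 25 26]]
```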

Automatic parallelism
- The function pmap is just like map, except that the function is applied to the sequence values in parallel
  - The number of threads used depends on the number of CPUs on the system
  - pmap is “partially lazy”: it may run a bit ahead
  - pmap only pays off when each function call is expensive enough to outweigh the coordination overhead
- pvalues takes any number of expressions, evaluates them in parallel, and returns a lazy sequence of their values
- pcalls takes any number of zero-argument functions, calls them in parallel, and returns a lazy sequence of their values
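
A sketch of pmap, pvalues, and pcalls. The function `slow-square` is hypothetical; its 100 ms sleep stands in for expensive work, so that parallelism is worth its overhead.

```clojure
(defn slow-square
  "Hypothetical slow function: sleeps 100 ms, then squares n."
  [n]
  (Thread/sleep 100)
  (* n n))

;; Same result as map, but the calls run in parallel.
(pmap slow-square (range 8)) ; => (0 1 4 9 16 25 36 49)

;; pvalues evaluates expressions in parallel;
;; pcalls calls zero-argument functions in parallel.
(pvalues (+ 1 2) (* 3 4))  ; => (3 12)
(pcalls #(+ 1 2) #(* 3 4)) ; => (3 12)
```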

Futures and promises
- A future is a computation, running in a separate thread, whose value will be required sometime in the future
  - Syntax: (future expressions)
  - Typical use: (def future-name (future expressions))
- You can check if a future has completed with (future-done? future-name)
- You can get the value computed by a future with deref or @
  - This will block until the future is completed
- A promise is a result that may not yet exist
  - Threads may ask for the result, and block until it exists
  - To create a promise, use (def promise-name (promise))
  - To deliver a value to a promise, use (deliver promise-name value)
  - The value can be retrieved with deref or @
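
A sketch of a future and a promise. The sleeps are placeholders for real work, and `answer` and `ready` are illustrative names. deref also accepts a timeout in milliseconds and a fallback value, which avoids blocking forever.

```clojure
;; A future whose work runs on another thread.
(def answer (future (Thread/sleep 100) 42))

(future-done? answer) ; may well be false this early
@answer               ; => 42, blocking until the future finishes
(future-done? answer) ; => true

;; A promise is filled in exactly once, here by another thread.
(def ready (promise))
(future (Thread/sleep 50) (deliver ready :ok))
@ready                ; => :ok, blocking until delivered

;; deref with a timeout (milliseconds) and a fallback value.
(deref (promise) 100 :timed-out) ; => :timed-out
```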

An aside: memoization
- Memoization is keeping track of previously computed values of a function, in case they are needed again
  - Example: The Collatz function of 6 gives: 6 3 10 5 16 8 4 2 1
  - The Collatz function of 7 gives: 7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
    - From 10 onward, the two sequences are identical, so that part need not be computed twice
  - Example 2: The factorial of 100 is trivial to compute if you already know the factorial of 99
- In Clojure, functions can be memoized: (def faster-collatz (memoize collatz))
  - This will keep track of all previously computed values of collatz
  - If this requires too much memory, you can write a more sophisticated method to keep track of only some values
  - For obvious reasons, only pure functions can be memoized
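
The slide does not define collatz, so this sketch assumes a hypothetical version that returns the whole sequence. One subtlety: (memoize collatz) caches only top-level calls, so to reuse shared tails such as 10 5 16 8 4 2 1 the recursion itself must go through the memoized function (`collatz-step` and `memo-collatz` are illustrative names).

```clojure
(defn collatz
  "Collatz sequence starting at n and ending at 1 (a pure function)."
  [n]
  (cond
    (= n 1)   '(1)
    (even? n) (cons n (collatz (quot n 2)))
    :else     (cons n (collatz (+ (* 3 n) 1)))))

(collatz 6) ; => (6 3 10 5 16 8 4 2 1)

;; As on the slide: repeated calls with the same argument are cached,
;; but the recursive calls inside collatz are not.
(def faster-collatz (memoize collatz))

;; Memoizing the recursion too, so shared tails are computed once.
(declare memo-collatz)

(defn collatz-step [n]
  (cond
    (= n 1)   '(1)
    (even? n) (cons n (memo-collatz (quot n 2)))
    :else     (cons n (memo-collatz (+ (* 3 n) 1)))))

(def memo-collatz (memoize collatz-step))

(memo-collatz 7) ; => (7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1)
```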

Summary: When to use what
- Refs
  - Synchronous, coordinated updates, using STM
- Atoms
  - Synchronous, independent updates
  - Especially useful for memoized values
- Agents
  - Introduce asynchronous concurrency
- Vars
  - When you absolutely must have mutable state
  - Remember scope is dynamic, not lexical
- Validator functions
  - To maintain data integrity
- Watches
  - To trigger events when an identity’s value changes

Structs I
- A struct is something like an object, more like a map
- To define a struct: (defstruct name key ... key)
  - For example: (defstruct book :title :author)
- A struct may be created by calling struct with the correct arguments in the correct order
  - Example: (def witches (struct book "Witches Abroad" "Pratchett"))
- A struct may be created by calling struct-map with key-value pairs in any order
  - Example: (def witches (struct-map book :author "Pratchett" :title "Witches Abroad"))
- When creating a struct with struct-map,
  - It is not necessary to supply a value for every key
  - Additional keys and values may be included
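
A sketch of both ways to create a struct. The second struct, `partial-book`, and its extra `:note` key are made up to show a missing key and an additional key.

```clojure
(defstruct book :title :author)

;; struct: values in the order the keys were declared.
(def witches (struct book "Witches Abroad" "Pratchett"))

;; A struct is a map, so it equals a plain map with the same entries.
(= witches {:title "Witches Abroad" :author "Pratchett"}) ; => true

;; struct-map: key-value pairs in any order; a declared key may be
;; omitted (it reads as nil) and extra keys may be added.
(def partial-book (struct-map book :author "Pratchett" :note "title unknown"))

(:title partial-book) ; => nil
(:note partial-book)  ; => "title unknown"
```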

Structs II
- Structs are maps, and may be accessed like maps
    user=> (witches :title)
    "Witches Abroad"
    user=> (:author witches)
    "Pratchett"
    user=> (get witches :date "unknown")
    "unknown"
- Maps are, of course, immutable
  - assoc will return a new map based on an existing map, with new or replaced key-value pairs
    - Syntax: (assoc map key value ... key value)
  - dissoc will return a new map with key-value pairs removed
    - Syntax: (dissoc map key ... key)
    - On a struct, dissoc can only remove keys that were not declared by defstruct; removing a declared key throws an exception
- (contains? map key) will test if the key occurs in the map
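
A sketch of reading and “updating” a struct, repeating the definitions so it stands alone. The extra key `:read?` is a hypothetical addition used to show assoc, dissoc, and contains?.

```clojure
(defstruct book :title :author)
(def witches (struct-map book :author "Pratchett" :title "Witches Abroad"))

(witches :title)              ; => "Witches Abroad"
(:author witches)             ; => "Pratchett"
(get witches :date "unknown") ; => "unknown"

;; assoc returns a new map; witches itself is unchanged.
(def marked (assoc witches :read? true))
(:read? marked)            ; => true
(contains? witches :read?) ; => false

;; dissoc can remove the extra key...
(dissoc marked :read?)     ; equal to {:title "Witches Abroad" :author "Pratchett"}

;; ...but not a key declared by defstruct.
(try
  (dissoc marked :title)
  (catch RuntimeException e
    (.getMessage e)))      ; => "Can't remove struct key"
```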

The End