User Authentication and Cryptographic Primitives
Brad Karp
UCL Computer Science
CS GZ03 / 4030
19th November, 2007

Outline
• Authenticating users
  – Local users: hashed passwords
  – Remote users: s/key
  – Unexpected covert channel: the Tenex password-guessing attack
• Symmetric-key cryptography
• Public-key cryptography usage model
• RSA algorithm for public-key cryptography
  – Number theory background
  – Algorithm definition
  – Cryptographic strength of RSA
  – Ease of misusing RSA

Authentication of Local Users
• Goal: only file’s owner can access file
• UNIX authentication policy:
  – Each file has an owner principal: an integer user ID
  – Each file has associated owner permissions (read, write, execute, &c.)
  – Each process runs with an integer user ID; it can access a file as owner only if its user ID matches the file’s owner user ID
  – OS assigns user ID to user’s shell process at login time, authenticated by username and password
  – Shell process creates new child processes with same user ID
• How does UNIX know the correspondence among <username, user ID, password>, for all users?

Straw Man: Plaintext Password Database
• Keep password database in a file, e.g.:
    bkarp: 3715: secretpw
    mjh: 4212: multicast
• Passwords stored in file in plaintext
• Make file readable only by privileged superuser (root)
• /bin/login program prompts for usernames and passwords on console; runs as root, so can read password database
• How well does this scheme meet original goal?

Cryptographic Primitive: Cryptographic Hash Function
• Don’t want someone who sees the password database to learn users’ passwords
• Cryptographic hash function, y = H(x), such that:
  – H() is preimage-resistant: given y, and with knowledge of H(), computationally infeasible to recover x
  – H() is second-preimage-resistant: given y, computationally infeasible to find x’ ≠ x s.t. H(x) = H(x’) = y
• Widely used cryptographic hash functions:
  – MD-5: output is 128 bits; broken
  – SHA-1: output is 160 bits; on verge of being broken
  – SHA-256: output is 256 bits; best current practice

Better Plan: Hashed Password Database
• Keep password database in a file:
    bkarp: 3715: Xc8z.OP0ZHJkp
    mjh: 4212: p6Fs.At.Ql4cwi
• Instead of password plaintext x, store H(x)
• Make file readable by all (!)
• One-wayness of H() means no one can recover x from H(x), right?
  – WRONG! Users choose memorable passwords…

Insight: Counting Possible Passwords
• If users pick random n-character passwords using c possible characters, how many guesses expected to guess one password?
  – $c^n / 2$
  – e.g., 8 characters, each ~90 possibilities: $90^8 \approx 4.3 \times 10^{15}$ possible passwords, so about $2.2 \times 10^{15}$ expected guesses
• Do users pick random passwords?
  – Of course not; very hard to remember
  – Common choice: word in native language
• How many words in common use in modern English?
  – 50,000 to 70,000 (or far fewer, if you read Metro)
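
The gap between random passwords and dictionary words can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the 50,000-word figure is the low end of the estimate on the slide.

```python
# Search-space size for random passwords versus dictionary words.
c = 90   # possible characters per position (figure from the slide)
n = 8    # password length in characters (figure from the slide)

total_passwords = c ** n                 # every possible random password
expected_guesses = total_passwords // 2  # on average, half the space is searched

dictionary_words = 50_000                # low end of the slide's estimate

print(f"possible passwords: {total_passwords:.1e}")
print(f"expected guesses:   {expected_guesses:.1e}")
print(f"work ratio vs. dictionary: {expected_guesses // dictionary_words:.1e}")
```

The ratio printed on the last line is the factor by which a dictionary shrinks the attacker's work when users pick common words.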

Dictionary Attack on Hashed Password Databases
• Suppose hacker obtains copy of password file (until recently, world-readable on UNIX)
• Compute H(x) for 50K common words
• String compare resulting hashed words against passwords in file
• Learn all users’ passwords that are common English words after only 50K computations of H(x)!
• Same hashed dictionary works on all password files in world!
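
The attack can be sketched in a few lines. This is an illustration under stated assumptions, not a real cracking tool: SHA-256 stands in for H(), the password file reuses the two example accounts from the straw-man slide, and the four-word dictionary is hypothetical.

```python
import hashlib

def H(password: str) -> str:
    """Unsalted hash, standing in for the slide's H(x)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical stolen password file: username -> H(password).
password_file = {
    "bkarp": H("secretpw"),
    "mjh": H("multicast"),
}

# Hypothetical word list; a real attack would use ~50K common words.
dictionary = ["password", "multicast", "letmein", "unicast"]

# Hash the dictionary once. Because H() is unsalted, this table can be
# reused against every password file that uses the same H().
hashed_dictionary = {H(word): word for word in dictionary}

# Look up each stored hash in the precomputed table.
cracked = {
    user: hashed_dictionary[stored]
    for user, stored in password_file.items()
    if stored in hashed_dictionary
}
print(cracked)
```

Only the account whose password appears in the word list is recovered; the cost is one hash per dictionary word, independent of the number of users.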

Salted Password Hashes
• Generate a random string of bytes, r
• For user password x, store [H(r, x), r] in password file
• Result: same password produces different result on every machine
  – So must see password file before can hash dictionary
  – …and single hashed dictionary won’t work for multiple hosts
• Modern UNIX: password hashes salted; hashed password database readable only by root
• Dictionary attack still possible after attacker sees password file! Users should pick passwords that aren’t close to dictionary words.
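
A minimal sketch of salting, assuming SHA-256 over the concatenation r || x as the slide's H(r, x). The function names and the 16-byte salt length are illustrative choices, not taken from the slides.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    """H(r, x): hash the salt r together with the password x."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def make_entry(password: str):
    """Return the pair [H(r, x), r] that would be stored in the file."""
    salt = os.urandom(16)  # fresh random r for every entry
    return hash_password(password, salt), salt

def check_password(password: str, entry) -> bool:
    """Recompute H(r, x) with the stored salt and compare."""
    stored_hash, salt = entry
    return hmac.compare_digest(hash_password(password, salt), stored_hash)

# The same password on two hosts yields two different stored hashes.
entry_host_a = make_entry("multicast")
entry_host_b = make_entry("multicast")
print(entry_host_a[0] != entry_host_b[0])
print(check_password("multicast", entry_host_a))
```

Because each entry has its own r, an attacker must rehash the dictionary per entry after seeing the file; a table precomputed once no longer applies.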

Tenex Password Attack: A Covert Channel
• Tenex OS stored directory passwords in plaintext
• OS supported system call:
  – pw_validate(directory, pw)
• Implementation simply compared pw to stored password in directory, char by char
• Clever attack:
  – Make pw span two VM pages, put 1st char of guess in first page, rest of guess in second page
  – See whether get a page fault: if not, try next value for 1st char, &c.; if so, first char correct!
  – Now position 2nd char of guess at end of 1st page, &c.
  – Result: guess password in time linear in length!
• Lessons:
  – Don’t store passwords in cleartext.
  – Covert channels are real, and can be extremely difficult to find and eliminate.
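
The attack can be simulated without any real virtual memory. The sketch below is a toy model, not Tenex code: `pw_validate` here takes a hypothetical `resident` argument saying how many leading characters of the guess sit on the mapped first page, and it reports a "page fault" whenever the comparison touches a later character. The secret and the alphabet are made up for the demo.

```python
import string

SECRET = "tenex"                   # hypothetical stored directory password
ALPHABET = string.ascii_lowercase  # assumed character set for the demo

def pw_validate(guess, resident):
    """Simulated system call: compare guess to SECRET char by char.

    Returns (password_ok, page_fault_occurred). A fault is recorded when
    the comparison reads a guess character at index >= resident.
    """
    fault = False
    for i, secret_char in enumerate(SECRET):
        if i >= len(guess):
            return False, fault
        if i >= resident:
            fault = True  # comparison reached the unmapped second page
        if guess[i] != secret_char:
            return False, fault
    return len(guess) == len(SECRET), fault

def recover_password(max_len=32):
    """Recover SECRET one character at a time using the fault side channel."""
    known, calls = "", 0
    while len(known) < max_len:
        for ch in ALPHABET:
            candidate = known + ch
            calls += 1
            ok, _ = pw_validate(candidate, resident=len(candidate))
            if ok:
                return candidate, calls
            calls += 1
            # The filler character lies just past the page boundary.
            _, fault = pw_validate(candidate + "?", resident=len(candidate))
            if fault:  # comparison moved past ch, so ch is correct
                known = candidate
                break
        else:
            raise ValueError("no character in ALPHABET matched")
    raise ValueError("password longer than max_len")

recovered, calls = recover_password()
print(recovered, calls)
```

The number of calls is bounded by 2 × (alphabet size) × (password length), which is linear in the length, rather than (alphabet size) raised to the length.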

Remote User Authentication
• Consider the case where Alice wants to log in remotely, across LAN or WAN from server
• Suppose network links can be eavesdropped by adversary, Eve
• Want scheme immune to replay: if Eve overhears messages, shouldn’t be able to log in as Alice by repeating them to server
• Clear non-solutions:
  – Alice logs in by sending {alice, password}
  – Alice logs in by sending {alice, H(password)}

Remote User Authentication (2)
• Desirable properties:
  – Message from Alice must change unpredictably at each login
  – Message from Alice must be verifiable at server as matching secret value known only to Alice
• Can we achieve these properties using only a cryptographic hash function?

Remote User Authentication: s/key
• Denote by $H^n(x)$ n successive applications of cryptographic hash function H() to x
  – i.e., $H^3(x) = H(H(H(x)))$
• Store in server’s user database:
    alice: 99: $H^{99}(\text{password})$
• At first login, Alice sends: {alice, $H^{98}(\text{password})$}
• Server then updates its database to contain:
    alice: 98: $H^{98}(\text{password})$
• At next login, Alice sends: {alice, $H^{97}(\text{password})$}
  – and so on…
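
The hash chain can be sketched as follows. This is a simplified model of the scheme on the slide, not the deployed s/key protocol: SHA-256 stands in for H(), and the class name, method names, and password are hypothetical. The server accepts a response only if hashing it once yields the stored value.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def H_n(x: bytes, n: int) -> bytes:
    """Apply H() to x n times in succession."""
    for _ in range(n):
        x = H(x)
    return x

class SKeyServer:
    def __init__(self):
        self.db = {}  # user -> (n, H^n(password))

    def enroll(self, user, n, verifier):
        self.db[user] = (n, verifier)

    def login(self, user, response):
        n, stored = self.db[user]
        if n <= 1:
            return False  # chain exhausted; a new secret must be stored
        if H(response) != stored:
            return False  # response is not the preimage of the stored value
        self.db[user] = (n - 1, response)  # step one link down the chain
        return True

password = b"correct horse"  # hypothetical secret known only to Alice
server = SKeyServer()
server.enroll("alice", 99, H_n(password, 99))

first = H_n(password, 98)
print(server.login("alice", first))                  # fresh response
print(server.login("alice", first))                  # Eve replays it
print(server.login("alice", H_n(password, 97)))      # Alice's next login
```

A replayed response fails because the server has already moved to the next link, and Eve cannot compute the next response without inverting H().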

Properties of s/key
• Just as with any hashed password database, Alice must store her secret on the server securely (best if physically at server’s console)
• Alice must choose total number of logins at time of storing secret
• When logins all “used”, must store new secret on server securely again

Secrecy through Symmetric Encryption
• Two functions: E() encrypts, D() decrypts
• Parties share secret key K
• For message M:
  – E(K, M) → C
  – D(K, C) → M
• M is plaintext; C is ciphertext
• Goal: attacker cannot derive M from C without K

Idealized Symmetric Encryption: One-Time Pad
• Secretly share a truly random bit string P at sender and receiver
• Define ⊕ as bit-wise XOR
• C = E(M) = M ⊕ P
• M = D(C) = C ⊕ P
• Use bits of P only once; never use them again!
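
A minimal sketch of the one-time pad over bytes. The message is hypothetical, and `secrets.token_bytes` is used here as a stand-in source for the shared random pad P.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bit-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))

message = b"attack at dawn"               # hypothetical plaintext M
pad = secrets.token_bytes(len(message))   # shared random pad P, used once

ciphertext = xor_bytes(message, pad)      # C = M xor P
recovered = xor_bytes(ciphertext, pad)    # M = C xor P
print(recovered)
```

Decryption works because XORing with P twice cancels out; the pad must be as long as the message and must never be reused.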

Stream Ciphers: Pseudorandom Pads
• Generate pseudorandom bit sequence (stream) at sender and receiver from short key
• Encrypt and decrypt by XOR’ing message with sequence, as with one-time pad
• Again, never, ever re-use bits from pseudorandom sequence!
• What’s wrong with reusing the stream?
  – Alice → Server: c1 = E(s, “Visa card number”)
  – Server → Alice: c2 = E(s, “Transaction confirmed”)
  – Suppose Eve hears both messages
  – Eve can compute: m = c1 ⊕ c2 ⊕ “Transaction confirmed”
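
The reuse problem can be demonstrated directly. In this sketch a random byte string stands in for the pseudorandom stream derived from the key s, and the card number is a made-up placeholder. Eve never sees the keystream, only the two ciphertexts and a guessable second plaintext.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

keystream = secrets.token_bytes(32)  # stands in for the stream generated from s

m1 = b"4111 1111 1111 1111"          # hypothetical card number (Alice -> Server)
m2 = b"Transaction confirmed"        # predictable reply (Server -> Alice)

c1 = xor_bytes(m1, keystream)        # both messages use the SAME stream bits
c2 = xor_bytes(m2, keystream)

# Eve: c1 xor c2 = m1 xor m2, because the keystream cancels.
# XORing in the known m2 leaves m1.
leaked = xor_bytes(xor_bytes(c1, c2), m2)
print(leaked)
```

The keystream drops out of c1 ⊕ c2 entirely, so knowledge of either plaintext reveals the other up to the length of the shorter message.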

Symmetric Encryption: Block Ciphers
• Divide plaintext into fixed-size blocks (typically 64 or 128 bits)
• Block cipher maps each plaintext block to same-length ciphertext block
• Best today to use AES (others include Blowfish, DES, …)
• Of course, message of arbitrary length; how to encrypt message of more than one block?

Using Block Ciphers: ECB Mode
• Electronic Code Book method
• Divide message M into blocks of cipher’s block size
• Simply encrypt each block individually using the cipher
• Send each encrypted block to receiver
• Presume cipher provides secrecy, so attacker cannot decrypt any block
• Does ECB mode provide secrecy?

Avoid ECB Mode!
• ECB mode does not provide robust secrecy!
• What if there are repeated blocks in the plaintext? Repeated as-is in ciphertext!
• What if sending sparse file, with long runs of zeroes? Non-zero regions obvious!
• WW II U-Boat example (Bob Morris):
  – Each day at same time, when no news, send encrypted message: “Nichts zu melden.” (“Nothing to report.”)
  – When there’s news, send the news at that time.
  – Obvious when there’s news
  – Many, many ciphertexts of same known plaintext made available to adversary for cryptanalysis; a worry even if encryptions of same plaintext produce different ciphertexts!

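Using Block Ciphers: CBC Mode
• Better plan: make encryptions of successive blocks depend on one another, and on an initialization vector (IV) known to receiver
  – $C_0 = \mathrm{IV}$; $C_i = E(K, M_i \oplus C_{i-1})$
  – Receiver recovers $M_i = D(K, C_i) \oplus C_{i-1}$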
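
The difference between ECB and CBC shows up as soon as a message repeats a block. The sketch below uses a toy 4-round Feistel network built from SHA-256 as the block cipher; that cipher is an assumption made purely so the example runs with the standard library, and it is not AES or any cipher named in the slides. Padding is skipped, so the message length must be a multiple of the block size.

```python
import hashlib
import secrets

BLOCK = 16         # block size in bytes (128 bits)
HALF = BLOCK // 2

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, rnd, half):
    """Toy Feistel round function (illustrative only)."""
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]

def encrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def decrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def split_blocks(data):
    if len(data) % BLOCK != 0:
        raise ValueError("demo skips padding")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key, message):
    # Each block is encrypted independently.
    return [encrypt_block(key, b) for b in split_blocks(message)]

def cbc_encrypt(key, iv, message):
    # Each plaintext block is XORed with the previous ciphertext block.
    out, prev = [], iv
    for b in split_blocks(message):
        prev = encrypt_block(key, xor(b, prev))
        out.append(prev)
    return out

def cbc_decrypt(key, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(decrypt_block(key, c), prev))
        prev = c
    return b"".join(out)

key = secrets.token_bytes(16)
iv = secrets.token_bytes(BLOCK)
message = b"Nichts zu melden" * 3  # three identical 16-byte blocks

ecb = ecb_encrypt(key, message)
cbc = cbc_encrypt(key, iv, message)
print("distinct ECB blocks:", len(set(ecb)))
print("distinct CBC blocks:", len(set(cbc)))
```

Under ECB the three identical plaintext blocks produce identical ciphertext blocks, which is exactly the leak the U-Boat example describes; under CBC the chaining hides the repetition.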

Integrity with Symmetric Crypto: Message Authentication Codes
• How does receiver know if message modified en route?
• Message Authentication Code:
  – Sender and receiver share secret key K
  – On message M, v = MAC(K, M)
  – Attacker cannot produce valid {M, v} without K
• Append MAC to message for tamper-resistance:
  – Sender sends {M, MAC(K, M)}
  – M could be ciphertext, M = E(K’, m)
  – Receiver of {M, v} can verify that v = MAC(K, M)
• Beware replay attacks: replay of prior {M, v} by Eve!

HMAC: A MAC Based on Cryptographic Hash Functions
• HMAC(K, M) = H((K ⊕ opad) . H((K ⊕ ipad) . M))
• where:
  – . denotes string concatenation
  – ipad = 64 repetitions of 0x36
  – opad = 64 repetitions of 0x5c
  – H() is a cryptographic hash function, like SHA-256
• Fixed-size output, even for long messages
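
The formula can be written out by hand and checked against Python's standard `hmac` module. This sketch assumes SHA-256 as H(), whose input block is 64 bytes, and follows the usual convention of zero-padding K to that length (hashing it first if it is longer). The function names and the sample key and message are illustrative.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 input block size in bytes

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC(K, M) = H((K xor opad) . H((K xor ipad) . M))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # then zero-padded to 64 bytes
    k_ipad = bytes(k ^ 0x36 for k in key)
    k_opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(k_ipad + message).digest()
    return hashlib.sha256(k_opad + inner).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the MAC and compares in constant time."""
    return hmac.compare_digest(hmac_sha256(key, message), tag)

key = b"shared secret K"         # hypothetical shared key
message = b"transfer 10 to bob"  # hypothetical message M
tag = hmac_sha256(key, message)

print(tag == hmac.new(key, message, hashlib.sha256).digest())
print(verify(key, b"transfer 1000 to bob", tag))
```

The first line printed confirms agreement with the library implementation; the second shows a modified message failing verification under the original tag.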

Public-Key Encryption: Interface
• Two keys:
  – Public key: $K$, published for all to see
  – Private (or secret) key: $K^{-1}$, kept secret
• Encryption: $E(K, M) \rightarrow \{M\}_K$
• Decryption: $D(K^{-1}, \{M\}_K) \rightarrow M$
• Provides secrecy, like symmetric encryption:
  – Can’t derive $M$ from $\{M\}_K$ without knowing $K^{-1}$
• Same public key used by all to encrypt all messages to same recipient
  – Can’t derive $K^{-1}$ from $K$

Number Theory Background: Modular Arithmetic Primer (1)
• Recall the “mod” operator: returns remainder left after dividing one integer by another, the modulus
  – e.g., 15 mod 6 = 3
• That is: a mod n = r, which just means a = kn + r for some integers k and r
• Note that $0 \le r < n$

Modular Arithmetic Primer (2)
• In modular arithmetic, constrain range of integers to be only the residues [0, n-1], for modulus n
  – e.g., (12 + 13) mod 24 = 1
  – We may also write $12 + 13 \equiv 1 \pmod{24}$
• Modular arithmetic retains familiar properties: commutative, associative, distributive
• Same results whether mod taken at each arithmetic operation, or only at end, e.g.:
    (a + b) mod n = ((a mod n) + (b mod n)) mod n
    (ab) mod n = ((a mod n)(b mod n)) mod n

Modular Arithmetic: Advantages
• Limits precision required: working mod n, where n is k bits long, any single arithmetic operation yields at most 2k bits
  – …so results of even seemingly expensive ops, like exponentiation ($a^x$), fit in same number of bits as original operand(s)
  – Lower precision means faster arithmetic
• Some operations in modular arithmetic are computationally very difficult:
  – e.g., computing discrete logarithms: find integer x s.t. $a^x \equiv b \pmod n$
• Cryptography leverages “difficult” operations; want reversing encryption without key to be computationally intractable!
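
Both points can be illustrated together: modular exponentiation stays cheap because every intermediate value is reduced mod n, while the reverse direction (the discrete logarithm) is shown here only by exhaustive search. This is a sketch; the function names are illustrative, the square-and-multiply method is one standard way to exponentiate, and the small modulus 1019 is chosen only so the brute-force search finishes instantly.

```python
def mod_exp(a: int, x: int, n: int) -> int:
    """Compute a**x mod n by square-and-multiply.

    Every intermediate product is reduced mod n, so no value ever
    exceeds roughly twice the bit length of n.
    """
    result = 1 % n
    a %= n
    while x > 0:
        if x & 1:                  # current exponent bit is set
            result = (result * a) % n
        a = (a * a) % n            # square for the next bit
        x >>= 1
    return result

def discrete_log_bruteforce(a: int, b: int, n: int):
    """Find some x with a**x = b (mod n) by trying x = 0, 1, 2, ..."""
    value = 1 % n
    for x in range(n):
        if value == b % n:
            return x
        value = (value * a) % n
    return None

n = 1019                      # small prime modulus for the demo
b = mod_exp(3, 57, n)         # forward direction: fast
x = discrete_log_bruteforce(3, b, n)  # reverse direction: linear search
print(b, x)
```

The forward computation takes a number of steps proportional to the bit length of the exponent; the search takes a number of steps proportional to n itself, which is hopeless when n has hundreds of digits.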

Modular Arithmetic: Inverses (1)
• In real arithmetic, every non-zero integer has a multiplicative inverse (its reciprocal), and their product is 1
  – e.g., $7x = 1 \Rightarrow x = 1/7$
• What does an inverse in modular arithmetic (say, mod 11) look like?
  – $7x \equiv 1 \pmod{11}$
  – that is, 7x = 11k + 1 for some x and k
  – so x = 8 (where k = 5)

Aside: Prime Numbers
• Recall: prime number is integer > 1 that is evenly divisible only by 1 and itself
• Two integers a and b are relatively prime if they share no common factors but 1; i.e., if gcd(a, b) = 1
• There are infinitely many primes
• Large primes (512 bits and longer) figure prominently in public-key cryptography

Modular Arithmetic: Inverses (2)
• In general, finding modular inverse means finding x s.t. $ax \equiv 1 \pmod n$
• Does modular inverse always exist?
  – No! For example, $2x \equiv 1 \pmod 4$ has no solution, because $2x \bmod 4$ is always even
• In general, when a and n are relatively prime, modular inverse x exists and is unique
• When a and n not relatively prime, x doesn’t exist
• When n prime, all of [1…n-1] relatively prime to n, and have an inverse in that range
• Algorithm to find modular inverse: extended Euclidean algorithm. Tractable; requires O(log n) divisions.
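
A minimal sketch of the extended Euclidean algorithm and the modular inverse it yields. The function names are illustrative; returning `None` when no inverse exists is a choice made for this example.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) such that g = gcd(a, b) = s*a + t*b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r                  # one division per iteration
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def mod_inverse(a: int, n: int):
    """Return x in [0, n-1] with a*x = 1 (mod n), or None if none exists."""
    g, s, _ = extended_gcd(a % n, n)
    if g != 1:
        return None                     # a and n share a factor
    return s % n

print(mod_inverse(7, 11))   # the example from the Inverses (1) slide
print(mod_inverse(2, 4))    # 2 and 4 are not relatively prime
```

The coefficient s satisfies s·a + t·n = 1, so reducing both sides mod n gives s·a ≡ 1, which is exactly the defining property of the inverse.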

Euler’s Phi Function: Efficient Modular Inverses on Relative Primes
• $\varphi(n)$ = number of positive integers < n that are relatively prime to n
• If n prime, $\varphi(n) = n - 1$
• If n = pq, where p and q prime: $\varphi(n) = (p-1)(q-1)$
• If a and n relatively prime, Euler’s generalization of Fermat’s little theorem: $a^{\varphi(n)} \bmod n = 1$
• and thus, to find inverse x s.t. $x = a^{-1} \bmod n$: $x = a^{\varphi(n)-1} \bmod n$
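
The inverse formula can be checked numerically. This sketch assumes φ(n) is already known, as it is when n is prime or when the factors p and q are known; the function name and the small moduli are illustrative.

```python
from math import gcd

def inverse_via_euler(a: int, n: int, phi_n: int) -> int:
    """Return a^(phi(n) - 1) mod n, the inverse of a when gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be relatively prime")
    return pow(a, phi_n - 1, n)

# n prime: phi(11) = 10
x_prime = inverse_via_euler(7, 11, 10)

# n = pq: phi(3 * 11) = (3 - 1) * (11 - 1) = 20
x_semiprime = inverse_via_euler(7, 33, 20)

print(x_prime, (7 * x_prime) % 11)
print(x_semiprime, (7 * x_semiprime) % 33)
```

In both cases multiplying a by the result gives 1 modulo n, because a · a^(φ(n)−1) = a^φ(n) ≡ 1.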

RSA Algorithm (1)
• [Rivest, Shamir, Adleman, 1978]
• Recall that public-key cryptosystems use two keys per user:
  – $K$, the public key, made available to all
  – $K^{-1}$, the private key, kept secret by user

RSA Algorithm (2)
• Choose two random, large primes, p and q, of equal length, and compute n = pq
• Randomly choose encryption key e, s.t. e and (p-1)(q-1) are relatively prime
• Use extended Euclidean algorithm to compute d, s.t. $d = e^{-1} \bmod ((p-1)(q-1))$
• Public key: $K = (e, n)$
• Private key: $K^{-1} = d$
• Discard p and q

RSA Algorithm (3)
• Encryption:
  – Divide message M into blocks $m_i$, each shorter than n
  – Compute ciphertext blocks $c_i$ with: $c_i = m_i^e \bmod n$
• Decryption:
  – Recover plaintext blocks $m_i$ with: $m_i = c_i^d \bmod n$
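
Key generation, encryption, and decryption can be traced with toy numbers. This is textbook RSA on deliberately tiny primes, chosen for this example so the values are easy to follow; it has no padding and offers no security. The three-argument `pow` with exponent -1 computes a modular inverse and needs Python 3.8 or later.

```python
from math import gcd

# Key generation with toy primes (real keys use primes of 512+ bits).
p, q = 61, 53
n = p * q                  # modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, relatively prime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # d = e^-1 mod (p-1)(q-1)

public_key = (e, n)
private_key = d

# Encryption and decryption of one block m < n.
m = 65                     # hypothetical plaintext block
c = pow(m, e, n)           # c = m^e mod n
recovered = pow(c, d, n)   # m = c^d mod n

print(public_key, private_key)
print(m, c, recovered)
```

After key generation only (e, n) and d are needed; p and q can be discarded, as the slide says.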

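Why Does RSA Decryption Recover Original Plaintext?
• Observe: $c_i^d = (m_i^e)^d = m_i^{ed} = m_i^{k(p-1)(q-1)+1} \pmod n$
  – because e and d are inverses mod (p-1)(q-1), so $ed = k(p-1)(q-1) + 1$ for some integer k
• $m_i^{k(p-1)(q-1)+1} = m_i \cdot \left(m_i^{(p-1)(q-1)}\right)^k = m_i \cdot 1^k = m_i \pmod n$
  – by Euler’s generalization of Fermat’s little theorem:
    • n = pq with p and q large primes, so $m_i$ and n are relatively prime unless $m_i$ is a multiple of p or q (vanishingly unlikely for a random block; decryption still works in that case, by the Chinese remainder theorem)
    • thus $m_i^{(p-1)(q-1)} = m_i^{\varphi(n)} = 1 \pmod n$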

Misuses of RSA Break Secrecy
• When encrypting, what if plaintext drawn from very small set (e.g., {“yes”, “no”})?
• Employees escrow secret documents, encrypted with company’s public key
  – Upon firing or death of one employee, company releases plaintext to another
  – Employee E takes employee A’s ciphertext $c = m^e \bmod n$, escrows $c \cdot 2^e \bmod n$
  – Employee E fired; co-conspirator F gets 2m!
• Chosen ciphertext attack (CCA): eavesdrop a ciphertext c; submit specially concocted messages for decryption; study resulting plaintexts; learn plaintext, $m = c^d \bmod n$
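
Both misuses can be reproduced with unpadded RSA on a toy key. The key (n = 3233, e = 17, d = 2753), the plaintext, and the numeric encodings of "yes" and "no" are all assumptions made for the demonstration.

```python
# Toy key for illustration only: n = 61 * 53.
n, e, d = 3233, 17, 2753

# 1. Small plaintext set: encryption is deterministic, so an eavesdropper
#    can encrypt every candidate under the public key and compare.
encodings = {"yes": 1001, "no": 1002}       # hypothetical encodings
observed = pow(encodings["no"], e, n)       # ciphertext seen on the wire
guessed = [word for word, code in encodings.items()
           if pow(code, e, n) == observed]
print(guessed)

# 2. Escrow attack: multiply A's ciphertext by 2^e mod n.
m = 1234                                    # A's secret plaintext
c = pow(m, e, n)                            # A's escrowed ciphertext
c_forged = (c * pow(2, e, n)) % n           # what E escrows instead

released = pow(c_forged, d, n)              # company decrypts for F
recovered = (released * pow(2, -1, n)) % n  # F halves the result mod n
print(released, recovered)
```

The forged ciphertext looks unrelated to c, yet it decrypts to 2m mod n because (m · 2)^e = m^e · 2^e; dividing by 2 mod n gives back m.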

RSA: Not Quite Exponentiation
• At first glance, RSA operations appear to be raising a message to a power
• But they’re not, really…the mod n means RSA is in fact a trap-door permutation
  – Map one element, m, of set {0, …, n-1} to another, c
  – Not invertible without knowing d
• Non-invertibility applies to whole of m and c; not to individual bits of m and c, or other properties over m and c, e.g., parity of m
  – In escrow attack, multiplicative relationship among RSA ciphertexts exists, despite non-invertibility
• It’s possible that learning even one bit of m may help recover all of m from c

Adaptive Chosen Ciphertext Attack on RSA in SSL 3.0
• SSL 3.0 encrypted with RSA by padding plaintext into blocks using PKCS #1 standard, as follows:
  – 0x00 | 0x02 | 8 or more non-zero random bytes | 0x00 | plaintext block
• SSL decrypts received ciphertext, checks if result in this format; returns “format error” if not!
• Bleichenbacher’s adaptive CCA attack: with about one million messages to server, attacker can recover m for previously eavesdropped ciphertext $c = m^e \bmod n$
  – When chosen ciphertext accepted by server, attacker knows first two plaintext bytes with certainty!

Making RSA Secure Against Adaptive CCA Attacks
• Intuition: want plaintext input to RSA to be all-or-nothing transform of actual message
  – e.g., so that multiplicative property over ciphertexts doesn’t reveal message, and knowing one bit doesn’t reveal anything about whole message
• Desirable transform properties:
  – Randomness: unique plaintext for repeated identical messages
  – Redundancy: make most strings invalid ciphertexts
  – Entanglement: knowing partial information about input to RSA should reveal nothing about message
  – Invertibility: of course, must be able to recover original message when decrypting

Practical Padding for RSA: OAEP+ [Shoup]
• Transforms message M into RSA input M’
• Adaptive-CCA security is proven only in the random oracle model; with real hash functions the guarantee is heuristic

Digital Signatures with RSA
• RSA trap-door permutation also useful for digital signatures
• Public-key signature operations:
  – Sign: $S(K^{-1}, m) \rightarrow \{m\}_{K^{-1}}$
  – Verify: $V(K, \{m\}_{K^{-1}}, m) \rightarrow \{\text{true}, \text{false}\}$
• Provides integrity, like a MAC:
  – Cannot produce valid $\langle m, \{m\}_{K^{-1}} \rangle$ pair without knowing $K^{-1}$
• With RSA:
  – Sign using private key, using trap-door applied when decrypting
  – Verify using public key, using permutation applied when encrypting

Multiplicative Attack Against RSA Signatures
• As in CCA, attacker may try to exploit multiplicative relationship among RSA permutation inputs and outputs, to decrypt eavesdropped ciphertexts
• Eve stores ciphertext c encrypted for Alice, wants to recover corresponding m
• Using Alice’s public key, {n, e}, Eve:
  – Chooses random number r < n (relatively prime to n, so that $r^{-1} \bmod n$ exists)
  – Computes $y = c \cdot r^e \bmod n$
  – Eve asks Alice to sign y
  – Alice sends Eve $y^d \bmod n = c^d r^{ed} \bmod n = r c^d \bmod n$
  – Eve computes $r^{-1} \bmod n$, then recovers $m = c^d \bmod n = r^{-1} r c^d \bmod n$
• Lesson: Don’t sign whole messages presented to you by others!
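
The steps of the attack can be followed numerically. The toy key (n = 3233, e = 17, d = 2753), the plaintext, and the blinding value r are assumptions made for this sketch; Alice "signs" by applying the raw private-key operation to whatever she is handed, which is exactly the behavior the slide warns against.

```python
from math import gcd

# Alice's toy key: (n, e) is public, d is private.
n, e, d = 3233, 17, 2753

def alice_signs(y: int) -> int:
    """Alice naively signs the whole value presented to her."""
    return pow(y, d, n)

# Eve eavesdropped this ciphertext, encrypted for Alice.
m = 1234                       # unknown to Eve
c = pow(m, e, n)

# Eve blinds c with a number r of her choosing.
r = 7                          # hypothetical choice; must be invertible mod n
assert gcd(r, n) == 1
y = (c * pow(r, e, n)) % n     # y = c * r^e mod n, looks unrelated to c

signature = alice_signs(y)     # = c^d * r^(ed) = r * m (mod n)

# Eve unblinds: multiply by r^-1 mod n.
recovered = (pow(r, -1, n) * signature) % n
print(recovered)
```

Alice never sees c itself, so refusing to decrypt known ciphertexts would not have helped; signing y is equivalent to decrypting c for Eve.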

Only Sign Message Hashes with RSA!
• Again, want all-or-nothing transform over message before signing with trap door
• Full-domain hash:
  – Before signing message, compute hash of message sized to be same number of bits as RSA modulus n
  – Sign the hash, not the message
  – Hash reveals nothing about underlying message, nor messages arithmetically related to it
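
A hash-then-sign sketch under simplifying assumptions. The modulus is built from two known Mersenne primes purely so the example is self-contained; they are far too small and too structured for real use. The digest is a single SHA-256 output reduced mod n, whereas a true full-domain hash would expand the digest to the full width of the modulus. All names are illustrative.

```python
import hashlib
from math import gcd

# Toy key built from two Mersenne primes (illustration only).
p = 2**89 - 1
q = 2**107 - 1
n = p * q
e = 65537
phi = (p - 1) * (q - 1)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent (Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message and map the digest into [0, n-1]."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Apply the private-key operation to the hash, not the message."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Apply the public-key operation and compare with a fresh hash."""
    return pow(signature, e, n) == digest_as_int(message)

message = b"pay Bob 10"        # hypothetical message
signature = sign(message)
print(verify(message, signature))
print(verify(b"pay Bob 1000", signature))
```

Because Alice signs H(m) rather than a value chosen by someone else, a blinded ciphertext handed to her is hashed first, and the multiplicative relationship the attacker relies on is destroyed.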

Costs of Cryptography
• Public-key operations significantly more computationally expensive than symmetric-key ones
• Modern CPU can symmetrically encrypt and MAC faster than 100 Mbps
• Public-key encryption typically 100× slower than symmetric crypto
  – This relationship changes as hardware changes!
• Result: tend to use public-key encryption and signatures only on short messages

Hybrid Cryptography
• Goal: mix speed of symmetric-key with flexibility of public-key cryptography
• Send symmetric key encrypted with public key; message encrypted with symmetric key

Pitfall: Public Key Provenance
• Suppose client wishes to know it’s talking to particular server
• Where does client get server’s public key?
• How does client know it has correct public key for real server, and not attacker?
• Man-in-the-middle attack:
  – Client connects to attacker
  – Attacker gives client attacker’s public key
  – Client believes communicating with real server

Further Reading
• The MIT Guide to Picking Locks
• Schneier, Bruce, Applied Cryptography, 2nd ed.
• Bleichenbacher, Daniel, Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS #1