Security & Trusting Trust
Swarun Kumar
Based on slides courtesy: Jorge Simosa
MIT 6.033, Spring 2013

Reflections on Trusting Trust
• By Ken Thompson (co-author of the UNIX paper; co-creator of UNIX and creator of B, the predecessor of C)
• Key ideas:
  • It is difficult to know what the software you use actually does.
  • So write all of your software yourself!
  • ... but that’s overwhelmingly impractical!
  • So there is no choice but to trust software from certain sources.

What’s a compiler?
• Transforms code written in one language into another, usually from a higher-level language (e.g. C) into machine code
• New compilers provide new features and new optimizations
Input: source code App.c (written in C.1) → Program: a compiler for C.1 (e.g. gcc) → Output: binary App.exe

But where do compilers come from?
• To write a new compiler, say for C.2 (C version 2.0):
  • Write the program source for C.2 in the C.1 language
  • Feed it into the C.1 compiler
  • The resulting binary is a new compiler, C.2
• Chicken-and-egg: where did the first compiler, C.0, come from?
Input: source code of C.2 (written in C.1) → Program: old compiler C.1 → Output: new compiler binary C.2

Example: C.2 has a new feature!
• C.2 accepts vertical tab ‘\v’ as a special character, like ‘\n’ and ‘\t’
• Source of C.2 (written in C.1):
    if (c[0] == '\\' && c[1] == 'n') return '\n';
    if (c[0] == '\\' && c[1] == 'v') return 11; // Note: we use 11 (the ASCII code for vertical tab), since C.1 doesn’t recognize '\v'
• Output: the new compiler (C.2) can compile programs containing ‘\v’

Input: source of C.2, written in C.1:
    if (c[0] == '\\' && c[1] == 'n') return '\n';
    if (c[0] == '\\' && c[1] == 'v') return 11;
Program: C.1 → Output: C.2
Interestingly, C.2 can now compile itself!
Input: source of C.2, written in C.2!
    if (c[0] == '\\' && c[1] == 'n') return '\n';
    if (c[0] == '\\' && c[1] == 'v') return '\v';
Program: C.2 → Output: C.2

So, can we discard the old sources of C.1/C.2? No!
• Because C.2 may contain a hidden backdoor, e.g. a “master” password for all UNIX logins
• So what? Why not patch that up in C.3?
• C.2 can infect C.3, C.4, etc. without leaving any evidence in their source code

How can C.2 infect other compilers?
• Source of C.2 + Trojan:
    if (matches login code) return (login with master password);
    if (matches C compiler) return (C compiler with hidden backdoor);
• Input: login code => Output: login with master password
• Input: C.3 compiler => Output: C.3 compiler with backdoor
• Input: regular C code => Output: regular executable
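
The two-pattern Trojan above can be mimicked with a toy "compiler" that just wraps its input text. This is only a sketch under loud assumptions: the function names, the pattern strings, and the two patch markers are hypothetical, and a "binary" here is simply a tagged string rather than machine code.

```python
# Hypothetical payload markers; a real Trojan would emit machine code instead.
MASTER_PASSWORD_PATCH = "/* accept master password */"
SELF_REPLICATION_PATCH = "/* re-insert both patches */"


def honest_compile(source: str) -> str:
    """Stand-in for code generation: the 'binary' is just the tagged source."""
    return f"BIN<{source}>"


def trojaned_compile(source: str) -> str:
    """Same as honest_compile, except on the two inputs it recognizes."""
    if "int login(" in source:
        # Pattern 1: the login program gets a master-password backdoor.
        source += MASTER_PASSWORD_PATCH
    elif "int compile(" in source:
        # Pattern 2: a compiler gets the Trojan copied into its binary.
        source += SELF_REPLICATION_PATCH
    return honest_compile(source)


login_src = "int login(char *user, char *pw) { /* check pw */ }"
compiler_src = "int compile(char *src) { /* clean C.3 source */ }"
app_src = "int main(void) { return 0; }"

# Regular code is compiled faithfully, so nothing looks wrong in daily use.
print(trojaned_compile(app_src) == honest_compile(app_src))
# The patches appear only in the outputs, never in the source files.
print(MASTER_PASSWORD_PATCH in trojaned_compile(login_src))
print(SELF_REPLICATION_PATCH in trojaned_compile(compiler_src))
```

Note that `login_src` and `compiler_src` stay clean throughout, which is why reading the source of C.3 reveals nothing.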

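How can we detect Trojans?
• Step 1 (Output-1): compile the source of C.3 with two different compilers
    C.2 compiles the C.3 source → C.3
    C.2’ compiles the C.3 source → C.3’
  Do we expect C.3 == C.3’? Not necessarily
    • C.2 and C.2’ may differ in optimizations
    • But C.3 and C.3’ must be functionally identical
• Step 2 (Output-2): compile the source of C.3 again, using each Output-1 binary
    C.3 compiles the C.3 source → C.3’’
    C.3’ compiles the C.3 source → C.3’’’
  Do we expect C.3’’ == C.3’’’? Yes, absolutely!
    • C.3 and C.3’ give the same output on the same input
    • If not, one of C.2, C.2’ has a Trojan/bug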
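
The two-step comparison can be simulated with a toy model. Everything here is an assumption made for illustration: a `Binary` is reduced to three fields, `compile_compiler` and `chains_agree` are hypothetical helpers, and the "bits" are just a hash of how the binary was produced. The model only encodes the slide's reasoning: output bits depend on the source and on the behaviour of the compiler that ran, and a Trojan propagates into any compiler it compiles.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    bits: str      # what is on disk; the only thing we can compare
    codegen: str   # how this compiler emits code (its optimizations)
    trojan: bool   # hidden behaviour, not visible in any source


def compile_compiler(compiler: Binary, source_name: str) -> Binary:
    """Use `compiler` to build the compiler described by `source_name`."""
    trojan = compiler.trojan  # the Trojan copies itself into compilers
    recipe = f"{source_name}|{compiler.codegen}|{trojan}"
    bits = hashlib.sha256(recipe.encode()).hexdigest()
    # The result behaves as its source says, plus any inherited Trojan.
    return Binary(bits=bits, codegen=source_name, trojan=trojan)


def chains_agree(c2: Binary, c2_prime: Binary, source_name: str = "C.3") -> bool:
    """Step 1 builds C.3 and C.3'; step 2 rebuilds with each and compares."""
    c3 = compile_compiler(c2, source_name)
    c3_prime = compile_compiler(c2_prime, source_name)
    c3_pp = compile_compiler(c3, source_name)
    c3_ppp = compile_compiler(c3_prime, source_name)
    return c3_pp.bits == c3_ppp.bits


clean_a = Binary(bits="a", codegen="optimizer-A", trojan=False)
clean_b = Binary(bits="b", codegen="optimizer-B", trojan=False)
infected = Binary(bits="c", codegen="optimizer-A", trojan=True)

print(chains_agree(clean_a, clean_b))    # both clean: step-2 outputs match
print(chains_agree(infected, clean_b))   # one infected: step-2 outputs differ
```

In this model the check says nothing if both starting compilers carry the same Trojan, so it is only useful when at least one of them can be trusted.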

Quiz 3 (2010) – Q8
Answer True/False based on the Trusting Trust paper:
A. Thompson believes that self-reproducing programs shouldn’t be trusted.
Answer: False. He doesn’t say anything about self-reproduction making programs more or less trustworthy; he talks about programs in general.

Quiz 3 (2010) – Q8
Answer True/False based on the Trusting Trust paper:
B. A Trojan horse like the one Thompson describes could not have been hidden in a compiler for a more modern language like Java.
Answer: False. The backdoor is not language-specific.

Quiz 3 (2010) – Q8
Answer True/False based on the Trusting Trust paper:
C. The Trojan horse Thompson embedded in the login program could have been found by looking at the machine instructions being executed by the CPU.
Answer: True, even though it might take a long time to figure out what the binary is doing.

Quiz 3 (2010) – Q8
Answer True/False based on the Trusting Trust paper:
D. A programmer can prevent the type of attack Thompson describes by writing all of his or her programs in assembly code.
Answer: False. Assembly code is still a “higher-level” language in this sense, since it must be translated into machine instructions by an assembler.

Quiz 3 (2012) – Q13
Ben has Ken’s compiler (B) and its “supposed” source (S). He wants to know if it still has the login Trojan. His friend Alyssa has a clean compiler binary (A). The source code for the UNIX login program is L. Give an example of two compilation chains that can be compared to detect a possible Trojan.
Notation: X -> Y is the result of using binary X to compile source Y
• B -> S = A -> S
Answer: NO, the two compilers might make different optimizations, i.e. not produce the same output

Quiz 3 (2012) – Q13
• B -> S -> S = A -> S -> S
Answer: YES, if A and B have no Trojans, the intermediate outputs (the new binaries B -> S and A -> S) should produce the same output when given the same input (S)

Quiz 3 (2012) – Q13
• B -> S = A -> S -> S
Answer: YES, since B should already be a compiled version of S, we can skip the step of B -> S

Quiz 3 (2012) – Q13
• B -> S -> L = A -> S -> L
Answer: YES, similar to the second answer, but we feed in just the login source at the last step

Quiz 3 (2012) – Q13
• B -> L = A -> S -> L
Answer: YES, similar to the fourth answer, but we can skip the step of B -> S

More Past Quizzes (Trusting Trust)
Visit http://web.mit.edu/6.033/www/assignments/quiz-3.shtml
• 2012 Q3 - #13 (Section 6)
• 2010 Q3 - #8
• 2010 Q3 - #13-15 (Section 3)
• 2008 Q3 - #5 (Section 3)
• 2006 Q3 - #2
*There may be more that I have accidentally overlooked.

Security (Part 2)

Secure Channels
• Alice wants to authenticate a message m sent to Bob
• First cut for security: let k be a shared key
• Then Alice, besides m, sends y = H(“m|k”), where | is a delimiter
• Bob verifies that y == H(“m|k”), since he also has k
How can Alice and Bob securely exchange the key k?
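
A minimal sketch of this first-cut scheme, assuming SHA-256 as the hash H and bytes for m and k (the slide fixes neither). The helper names `tag` and `verify` are made up for illustration.

```python
import hashlib
import hmac


def tag(message: bytes, key: bytes) -> bytes:
    """Compute y = H("m|k") with SHA-256 standing in for H."""
    return hashlib.sha256(message + b"|" + key).digest()


def verify(message: bytes, key: bytes, y: bytes) -> bool:
    """Bob recomputes the tag with his copy of k and compares."""
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(tag(message, key), y)


k = b"shared-secret"           # assumed to be known to Alice and Bob only
m = b"pay Bob 10 dollars"
y = tag(m, k)                  # Alice sends (m, y)

print(verify(m, k, y))                        # untampered message
print(verify(b"pay Bob 1000 dollars", k, y))  # message modified in transit
```

This mirrors the slide's construction only. For real use, Python's standard `hmac` module implements the HMAC construction, which is the usual choice for keyed message authentication.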

Diffie-Hellman key exchange
• Alice picks a random a and sends g^a mod p to Bob
• Bob picks a random b and sends g^b mod p to Alice
• Alice computes k = (g^b)^a = g^(ba) mod p
• Bob computes k = (g^a)^b = g^(ab) mod p
Both Alice and Bob have the same key k, without sending it on the network
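
The exchange can be traced with toy numbers. This is a sketch only: p = 23 and g = 5 are textbook-sized values chosen for readability and are far too small to be secure, and the helper names are assumptions.

```python
import secrets

P = 23  # toy prime modulus (assumption; real deployments use very large primes)
G = 5   # generator modulo 23


def public_value(secret: int) -> int:
    """What each side sends over the network: g^secret mod p."""
    return pow(G, secret, P)


def shared_key(their_public: int, my_secret: int) -> int:
    """What each side computes locally: (their public value)^secret mod p."""
    return pow(their_public, my_secret, P)


a = secrets.randbelow(P - 2) + 1   # Alice's random secret in [1, p-2]
b = secrets.randbelow(P - 2) + 1   # Bob's random secret in [1, p-2]

A = public_value(a)                # Alice -> Bob
B = public_value(b)                # Bob -> Alice

k_alice = shared_key(B, a)         # (g^b)^a mod p
k_bob = shared_key(A, b)           # (g^a)^b mod p
print(k_alice == k_bob)
```

Only A and B cross the network; a and b never leave their owners.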

Taking it a step further…
• Use public/secret keys (like many of you in DP2)
• Can use a PK/SK pair to authenticate the shared key exchange
• Can use PK/SK-based signatures
• Many more attacks are possible (DoS, TCP SYN flooding, botnets)
Security is an arms race. So: fewer assumptions in the threat model => stronger security

GOOD LUCK ON QUIZ 2!