Good Nulls Bad Nulls John Maenpaa Health Care
Good Nulls, Bad Nulls John Maenpaa Health Care Service Corporation Session Code: E 2 Mon, November 14, 2016 (12: 00 – 13: 00) | Platform: Cross Platform
Objectives • What does a null mean? We'll cover the reasons for using nulls. We'll also discuss good ways to handle nulls where there might be multiple reasons for the null value. • Discussion of good nulls will cover example cases where we really need to have nullable columns in our databases. • Discussion of bad nulls will cover example cases where the use of nulls may be detrimental. • How does development language selection impact our use of nulls? In the early days, COBOL programmers and their DBAs wanted to avoid nulls. What about now? How do nulls impact application and query performance? 2
Agenda • • • Why Nulls? Good Nulls Bad Nulls in Programs Performance Impacts 3
DB 2 is one of the few relational DBMS products that represents missing information independently of the type of the data that is missing—a requirement of the relation model and a requirement for ease of use. E. F. Codd 4
Missing Data in Business • Forms are incomplete or unreadable • Some data is good, some is bad • Must store work in progress • Errors requiring correction • Invalid date for employee birth date • Item is not required • Some data just doesn’t apply • Apartment number for some addresses • Termination date for active employees • SSN foreign individuals 5
Missing Data in Database • Missing Rows • Applicable but not known • Not inserted yet • Inapplicable • Do not require the data • Missing Columns • Applicable but not known • Not updated yet • Inapplicable • Do not require the data item (for this row) 6
N/A • Not Available • • Missing Unknown, but knowable Temporary (possibly) Invalid • Not Applicable • Omitted • Unknowable • Permanent 7
What does a null mean? • Missing • • • Unknown Applicable, but not available Not applicable (inapplicable) Not required Omitted • Not “Something” • Not “Nothing” • Neither “True” nor “False” 8
Logical Data Models • Required or Not Required • Used to capture business need for a data item • Just because they require it for a process does not mean we will always have it. • Required ≠ Not Null • We may require a selection for a multiple choice test question, but it may not have been entered. • Not Required ≠ Nullable • We may not require an apartment number for an address, but blanks are probably better than using a null. 9
Conceptual Nulls • • Zero Empty Set Vacuum Absence Weightlessness (null gravity) Nil Negative Undone 10
Technical Nulls • • • Null Objects Low Values (x’ 00’) Empty Strings Null Terminated Strings -- delimiter character (x’ 00’) Specific values set aside to indicate absence • These are not DBMS Nulls! • Though language/framework may translate between these structures and nulls. 11
Handling Multi-Purpose Nulls • Applicable but Missing • Switch to record a condition (CONDITION_SW) • Y, N or unknown • Not Applicable or Not Required • Switch to indicate a condition is required • Y, N (CONDITION_REQ_SW) 12
Handling Multi-Purpose Nulls • Missing Gender • M, F, or null • Being more specific • Gender not applicable switch • Y, N (GENDER_INAPPLICABLE_SW) • Gender entry code • blank = Entered • A = Applicable but Missing • I = Inapplicable 13
Null Comparisons • General Comparisons always yield NULL • Any value compared against an unknown • Equal, Not Equal, Greater Than, Less Than, etc • Null-specific comparisons may yield True or False • IS NULL • IS NOT NULL • IS DISTINCT FROM • If we look at the null as “maybe” • Those regular value comparisons give us “maybe” • Because they are not definitively true we get false 14
Three-Valued Logic • Nulls leave boolean logic behind • We now have unknown (or maybe) as an option • In practical SQL terms: • We only get rows where the predicate is TRUE 15
Four-Valued Logic • Relational Model Version 2 • Defines “marks” to replace nulls • A for Applicable but missing • I for Inapplicable and missing 16
A much more serious problem is the fact that using a valid data value to represent missing information requires explicit defensive code to be written into every application that interacts with the database. Don Chamberlin 17
The Lure of Default Values • Max. Mind – Database of IP Address to Geolocation • Country Defaults • USA default is (38. 0000, -97. 0000) • 600 million IP addresses associated • And a family gets terrorized for 14 years • State Default • In Virginia, home has 17 million addresses • Town has a few large data centers • Even the government went to the house looking for criminals • http: //fusion. net/story/287592/internet-mapping-glitch-kansas-farm/ • Null Island – location (0, 0) • http: //www. wsj. com/articles/if-you-cant-follow-directions-youll-end-up -on-null-island-1468422251 18
Using COALESCE to ignore nulls • The COALESCE function gives us a way to convert nulls to a default value as needed SELECT , FROM ORDER COALESCE(FILE_SIZE, 0) FILE_NAME FILES BY 1 • Useful for some situations • Scary when over used 19
Input Forms • Data Types • Date, Timestamp • Decimal, Integer, Float • Reasons • • • Invalid Incomplete Unreadable Not Applicable Accuracy 20
Aggregate Functions • Numeric Columns • • Currency Volumes Quantities Sizes 21
Aggregate Functions and Nulls • What is the average, minimum and maximum File Size? SELECT , , FROM AVG(FILE_SIZE) MIN(FILE_SIZE) MAX(FILE_SIZE) FILES • Not Applicable to all Files • Using nulls yields expected result • Using zero lowers average and invalidates minimum • Using -1 is even worse • Requires all access to table to use proper -1 handling logic. 22
Aggregate Functions and Nulls • What if we disallow nulls? SELECT , , FROM WHERE AVG(FILE_SIZE) MIN(FILE_SIZE) MAX(FILE_SIZE) FILES ( DELETED = ‘Y’ OR FILE_SIZE = -1 ) • Predicate needed for ALL conditions • Deleted Files have DELETED = ‘Y’ • New uploads (in progress) have FILE_SIZE = -1 • etc 23
Dates, Timestamps • Dates where defaults may be problematic • • • Birth Death Retirement Termination Age Calculation • Dates where nulls by be problematic • Ranges • Effective Start Date and Effective End Date • Event Date • Record Creation Date 24
Age Calculations • Who is nearing retirement age? SELECT , , , FROM WHERE EMPNO NAME BIRTH_DATE CURRENT_DATE – BIRTH_DATE AS AGE EMP BIRTH_DATE < CURRENT DATE – 60 YEARS • If we have default values for birth dates • Incorrect positive/negative returns • If we have null values for birth dates • Answer will not include Unknowns • But we can get those explicitly using “OR BIRTH_DATE IS NULL” 25
Foreign Keys • When parent row may not exist • Action: Delete row from People • RI rule uses SET NULL for Directories, Files 26
You can never trust the answers you get from a database with nulls. C. J. Date 27
The Part, Supplier Example SNO SCITY Get (SNO, PNO) pairs where either the supplier and part cities are different or the part city isn’t Paris (or both). S 1 London PNO PCITY P 1 <null> SELECT FROM WHERE OR SNO, PNO S, P SCITY <> PCITY <> ‘Paris’ • Both clauses are “unknown” for P 1 -> maybe • Result yields no rows • But… Did we ask the right question? • . . . “or suppliers with parts where we don’t know the part’s city”? 28
The Part, Supplier Example SNO SCITY Get (SNO, PNO) pairs where either the supplier and part cities are different or the part city isn’t Paris (or both). S 1 London PNO PCITY P 1 <null> SELECT FROM WHERE OR SNO, PNO S, P SCITY <> PCITY ( PCITY <> ‘Paris’ OR PCITY IS NULL ) SELECT FROM WHERE OR SNO, PNO S, P SCITY <> PCITY IS DISTINCT FROM ‘Paris’ • Better? • Desired outcome? • Be careful joining on nullable columns! 29
Character Strings • Blank • All spaces • Empty • No spaces or characters • Usually only VARCHAR or equivalent • Zero length string • Missing? Not Applicable? • Null • Not blank • Not empty • Not filled in 30
Why Use Nullable Varchar • Differentiate between missing and not applicable • If MIDDLE_NAME is null • We don’t know the middle name • If MIDDLE_NAME is empty • The employee has no middle name • If MIDDLE_NAME is blank • We have no idea • But removing trailing spaces yields empty string • Do we really need that differentiation? 31
Date Ranges • Contract table • Contract Start Date • Contract End Date • Easy when the contract term is fixed • Not null • When contract is open-ended, choices: • Use NULL end date • Use default end date of 9999 -12 -31 32
Find Current Contracts • With default end date select * from contracts where current date between contract_start_date and contract_end_date • With nullable end date select * from contracts where ( contract_end_date is null and contract_start_date <= current date ) or ( contract_end_date is not null and current date between contract_start_date and contract_end_date ) 33
Reference Table • Let’s take a look at a (non-temporal) code table example CREATE TABLE CODE 1 ( CODE 1_CD CODE 1_DESC EFF_START_DATE EFF_END_DATE VALID_SW WITH DEFAULT CHAR(1) CHAR(254) DATE, CHAR(1) 'Y' ) NOT NULL, -- nullable NOT NULL CREATE TABLE CODE 2 ( CODE 2_CD CODE 2_DESC EFF_START_DATE EFF_END_DATE WITH DEFAULT VALID_SW WITH DEFAULT CHAR(1) NOT CHAR(254) NOT DATE NOT ‘ 9999 -12 -31’, CHAR(1) NOT 'Y' ) NULL, NULL 34
Reference Table • Validating a code for insert SELECT FROM WHERE AND CODE 1_CD, CODE 1_DESC CODE 1_CD = : input_code 1 EFF_END_DATE IS NULL • Works if the code must be valid right now • When we have to deal with a code that might have been valid at a particular point in the past, gets complicated. 35
Reference Table • More accurate, less efficient, hard to get right SELECT FROM AND CODE 1_CD, CODE 1_DESC CODE 1 WHERE CODE 1_CD = : input_code 1 VALID_SW = 'Y‘ ( ( EFF_END_DATE IS NULL AND EFF_START_DATE <= : input_date ) OR ( EFF_END_DATE IS NOT NULL AND : input_date BETWEEN EFF_START_DATE AND EFF_END_DATE ) ); • If VALID_SW changes, we’ll need a temporal model 36
Reference Table • Still accurate, more efficient, understandable SELECT FROM AND CODE 2_CD, CODE 2_DESC CODE 2 WHERE CODE 2_CD = : input_code 2 VALID_SW = 'Y‘ : input_date BETWEEN EFF_START_DATE AND EFF_END_DATE ; • Was there any business value in using a nullable end date? 37
The Case for Default Values • The argument is that “nulls are bad” • Identify specific value to represent “unknown” • Use common defaults: 0, spaces, 0001 -01 -01 • The reasoning • Three-valued logic is not boolean logic • People think in boolean logic • The result • Multiple tables to store incomplete work in progress • Potential query errors with aggregate functions • Need for data access programs to interpret data 38
Tautological Queries • Where human thinking and three-valued logic break down • List files that are smaller than, equal to, or bigger than 10 K • Getting the predicate right based on human thought • But, didn’t we really want a list of all files SELECT FROM WHERE OR OR FILE_ID, NAME FILES FILE_SIZE < 10000 FILE_SIZE = 10000 FILE_SIZE > 10000 SELECT FROM WHERE OR OR OR FILE_ID, NAME FILES FILE_SIZE < 10000 FILE_SIZE = 10000 FILE_SIZE > 10000 FILE_SIZE IS NULL SELECT FILE_ID, NAME FROM FILES 39
Host languages do not include support specifically aimed at the semantics of the fact that information in databases may be missing. This means there is bound to be an interface problem, whatever approach is taken in the data sublanguage. E. F. Codd 40
Indicator Variables • Primarily used in languages that have no concept of an empty (nil) object • Use null indicator variable • Small integer contains -1 when field is null • Specified along with host variable in SQL statements • Never use a host variable without checking for NULL • Comparisons • Calculations • Text manipulation 41
COBOL • Working Storage 002900 01 003000 01 003100 01 WS-FILE-ID WS-NAME NULL-NAME PIC S 9(08) COMP. PIC X(08). PIC S 9(04) COMP. • Procedure Division 007000 007100 007200 007300 007400 EXEC SQL SELECT INTO FROM WHERE END-EXEC. 008000 008100 008200 008300 008400 IF NAME : WS-NAME : NULL-NAME FILES FILE_ID = : WS-FILE-ID NULL-NAME = -1 DISPLAY "File Name: (unknown)" ELSE DISPLAY "File Name: ", WS-NAME END-IF. 42
Stored Procedure Program • Stored Procedures & User-Defined Functions • With PARAMETER STYLE DB 2 SQL CREATE PROCEDURE TEST 1 ( IN FILE_ID INTEGER, OUT FILE_NAME VARCHAR(255) ) PARAMETER STYLE DB 2 SQL … extern "C” void test 1(long *p_input, char *p_colout, short *p_input_ind, short *p_colout_ind, SQLUDF_TRAIL_ARGS_ALL, SQLUDF_DBINFO p_dbinfo) 43
C/C++ Stored Procedure Returning a Value EXEC SQL INCLUDE SQLCA; extern "C" void udfupd 1(long *p_input, char *p_colout, short *p_input_ind, short *p_colout_ind, SQLUDF_TRAIL_ARGS_ALL, SQLUDF_DBINFO p_dbinfo) { strcpy(sqludf_sqlstate, "38999"); strcpy(sqludf_msgtext, "An unknown error has occurred"); *p_colout_ind = -1; /* Pre-set output indicators to null */ EXEC SQL UPDATE MY_INFO SET LAST_UPD_LOGON_ID = USER; if(SQLCODE == 0) { strcpy(p_colout, "Success"); *p_colout_ind = 0; strcpy(sqludf_sqlstate, "00000"); strcpy(sqludf_msgtext, "Success"); } return; 44
COBOL Linkage Section 003700 LINKAGE SECTION. 003800 003900 01 INPUT-FILE-ID PIC S 9(08) COMP. 003910 01 OUTPUT-FILE-NAME. 003911 49 OUT-FILE-LEN PIC S 9(04) COMP. 003912 49 OUT-FILE-TEXT PIC X(255). 004000 01 NULL-FILE-ID PIC S 9(04) COMP. 004010 01 NULL-FILE-NAME PIC S 9(04) COMP. 004100 01 OUT-SQLSTATE PIC X(05). 004200 01 IN-PROCNAME PIC X(27). 004300 01 IN-SPEC PIC X(18). 004400 01 OUT-DIAG. 004500 49 DIAG-LEN PIC S 9(04) COMP. 004600 49 DIAG-TEXT PIC X(70). 004700 004800*---------------------------* 004900 PROCEDURE DIVISION USING INPUT-DBNAME, 004910 INPUT-TSNAME, 005000 NULL-DBNAME, 005010 NULL-TSNAME, 005100 OUT-SQLSTATE, 005200 IN-PROCNAME, 005300 IN-SPEC, 005400 OUT-DIAG. 45
Nulls in Java • Java has its own null String query = "SELECT FILE_NAME, FILE_SIZE FROM FILES"; Result. Set rs = stmt. execute. Query(query); while (rs. next()) { String file. Name = rs. get. String("FILE_NAME"); int file. Size = rs. get. Int("FILE_SIZE"); boolean file. Size. Null = rs. was. Null(); if (file. Name != null) { if (!file. Size. Null) { System. out. println(file. Name + " // Neither was null } } " + file. Size); } 46
Ruby • In Ruby, everything is an object • nil is used to identify empty objects • nil? Is a class method that returns true for nil objects @article = Article. find(params[: id]) unless params[: article_links]. nil? params[: article_links]. each_with_index do |id, position| @article_links. update(id, : position => position + 1) end unless params[: article_images]. nil? params[: article_images]. each_with_index do |id, position| @article_images. update(id, : position => position + 1) end @article. save 47
Swift – Optionals • Optionals contain a value or nil • Like DB 2, Swift treats nil as a non-value let def. Name : String = "Dilbert" var my. Name : String? // regular string (constant) // optional string (variable) let greeting 1 = "Hi (my. Name ? ? def. Name)" // Swift-way to coalesce // -> "Hi Dilbert“ my. Name = "John“ let greeting 2 = "Hi (my. Name ? ? def. Name)" // Swift-way to coalesce // -> "Hi John“ var my. X my. Y my. X : Int? = 0 my. Y : Int = 0 = nil // // // Nullable Integer w/ value Regular Integer Set to nil Nil? TRUE syntax error 48
Nulls should not be used for everything, but they do have their place. Richard Yevich and Susan Lawson 49
Program Logic for Character Fields • Programs must check indicator value • If character field: if name-nullind == -1. . . handle null. . . elsif name == spaces. . . handle blank field. . . else. . . handle real content. . . endif 50
Query Predicates • In addition to coding the predicate • The DBMS must evaluate it select empno, bonus from emp where bonus is not null; 51
Program Logic for Numeric Fields • Programs must check indicator value • If numeric (currency) field: if bonus-nullind == -1. . . No bonus. . . else. . . Add bonus to paycheck. . . endif • Or you could let the DBMS perform the calculation. 52
Additional Storage • The null indicator in DB 2 uses a full byte • For a 1 billion row table, that’s 1 GB per column • For 1, 000 columns, that’s a Terabyte • Storage space adds up • Larger rows mean more pages • More pages mean more I/Os • More I/Os mean more CPU • Use nulls wisely and with intent 53
John Maenpaa Health Care Service Corporation john_maenpaa@bcbsil. com johnmaenpaa@db 2 solutions. org Please fill out your session evaluation before leaving! E 2 Good Nulls, Bad Nulls Photo by Steve from Austin, TX, USA
- Slides: 54