Software Development Techniques Topic 3: Data Representation

  • Slides: 36
Software Development Techniques Topic 3: Data Representation V1.0 © NCC Education Limited


Topic 3 - 3.2 Scope of Topic

• In this topic we discuss data representation.
  – Both in pseudocode and in traditional computer languages.
• Data representation is one of the most important things to get right when designing an algorithm.
  – A good structure makes the data much easier to manipulate.
• In this topic, we also discuss some of the ways in which you can decide how your data is going to be structured.

Topic 3 - 3.3 Data and Algorithms

• Algorithms represent the ‘moving parts’ of your code.
  – The parts are always fixed, although the order in which they move is not.
• Data represents the raw material that those moving parts are manipulating.
  – In technical terms, these are our variable parts, or just variables.
• You need to make sure the moving parts are suited to the size and shape of the data.

Topic 3 - 3.4 Data Types - 1

• Most programming languages come with a suite of data types.
  – Each holds a different category of information, and this usually affects what can be done with it.
• We have seen thus far:
  – Whole numbers (often known as integers)
  – Real numbers (often known as floats)
  – Strings of text
• Other more specialised kinds of data exist.

Topic 3 - 3.5 Data Types - 2

• The following data types are also available to you in your pseudocode programs:
  – Boolean
    • A data type which contains either true or false. It can hold nothing else.
  – Character
    • A data type which contains a single Unicode character.
• Later in this module, we will also see how to create our own data types.

Topic 3 - 3.6 Computer Memory

• Every piece of data that is used in an algorithm must be stored somewhere.
  – It gets stored in the computer’s memory.
• There are real physical constraints that affect how we design algorithms.
  – We have a finite amount of computer memory.
  – We have a finite number of CPU cycles.
• For very small and simple algorithms, neither of these is very limiting.

Topic 3 - 3.7 Scaling

• Programmers, however, spend a lot of time thinking about how scalable their programs are.
  – If my algorithm works for ten pieces of data, will it work for one hundred? One thousand? One million?
• The seemingly minor decisions you make when designing your algorithm will affect that.
  – We do not design algorithms to work only once on one piece of data. We need to be mindful of scaling.
• Choosing the wrong data type at the start will affect how well the algorithm scales.

Topic 3 - 3.8 Scaling Example

• Consider a program which checks whether a key on a keyboard was pressed in the past sixty seconds.
  – We could store that as a Boolean.
  – We could store that as a whole number.
  – We could store it as a string.
• This program is going to be running for a full year.
  – On one hundred computers.
• Does it make a difference which we use?

Topic 3 - 3.9 Data and Computer Memory - 1

• Pseudocode lets us ignore the implementation details that go along with an algorithm.
• One of those implementation details is how much space is taken up by different kinds of data.
• However, we still need to know what the relative differences are between different kinds of data.
  – Otherwise we cannot make sensible choices when we start designing.

Topic 3 - 3.10 Data and Computer Memory - 2

• A data type is a wrapper around some part of the computer’s memory.
• When we create a piece of data, we say to the computer: ‘Set aside some memory, big enough to hold some data of this type. Let me refer to the address of that memory by the name I give it.’
  – For example, when we declare a piece of data as a whole number.
• The computer handles the rest, such as making sure that one piece of data does not overwrite another.

Topic 3 - 3.11 Data and Computer Memory - 3

• Because the data type is a wrapper, different languages can handle data types differently.
  – Some languages handle a String as a list of characters.
  – Some languages handle a String as a custom data type.
• Real numbers in particular have a very complex representation in a computer’s memory.
• Pseudocode lets us ignore this for the most part.
  – But a bad data choice is bad regardless of language, so we need to bear it in mind.

Topic 3 - 3.12 Sizes of Data Types

• The size is different for each language, but this table shows the size of each data type in Java and C++.

Type           Java                          C++
Whole number   4 bytes                       4 bytes
Real number    8 bytes                       4 bytes
Boolean        1 byte                        1 byte
Character      2 bytes                       1 byte
String         Number of letters + 2 bytes   Number of letters + 1 byte
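The size table can be written down as a small lookup, which makes the string rule (letters plus a fixed overhead) explicit. This is only a sketch: the names `SIZES`, `STRING_OVERHEAD` and `size_in_bytes` are made up for illustration, and it assumes that a row showing a single figure (whole number, Boolean) applies to both languages.

```python
# Sizes in bytes, copied from the Java / C++ table in this topic.
# Assumption: rows that list one figure apply to both languages.
SIZES = {
    "java": {"whole number": 4, "real number": 8, "boolean": 1, "character": 2},
    "c++": {"whole number": 4, "real number": 4, "boolean": 1, "character": 1},
}

# A string costs one byte per letter plus a fixed overhead, per the table.
STRING_OVERHEAD = {"java": 2, "c++": 1}


def size_in_bytes(data_type, language, letters=0):
    """Return the table's size for one piece of data in the given language."""
    if data_type == "string":
        return letters + STRING_OVERHEAD[language]
    return SIZES[language][data_type]


print(size_in_bytes("real number", "java"))        # 8
print(size_in_bytes("real number", "c++"))         # 4
print(size_in_bytes("string", "java", letters=5))  # 7
```

Keeping the sizes in data rather than in the logic means the same pseudocode design can be costed for another language by adding one more entry.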

Topic 3 - 3.13 Memory Requirements

• You should always be able to estimate how much memory your pseudocode representation will take up.
• You can do this with basic arithmetic.
  – At any stage in your desk-check you can tot up the memory cost for a particular language.
  – You already have a column for each piece of data.
• We will talk about how to make that representation truly general a little later.

Topic 3 - 3.14 Scaling Example - 1

• Let us go back to our scaling example. First, we need to work out how much data we are going to be storing.
  – One record every sixty seconds for a year.
    • Sixty an hour
    • Twenty-four hours a day
    • 365 days a year
  – 60 × 24 × 365 = 525,600 units of data per computer.
• We need to store 52,560,000 units of data across our one hundred computers for that year.

Topic 3 - 3.15 Scaling Example - 2

• What does that cost us in Java with different data types? (A string is costed as one letter plus 2 bytes; kilobytes are bytes divided by 1,024, rounded down.)

Data Type      Cost in Bytes   Cost in Kilobytes
Whole number   210,240,000     205,312
Boolean        52,560,000      51,328
String         157,680,000     153,984

• Data types can make a huge difference. Choose the smallest data type you can for your needs.
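The scaling arithmetic can be checked with a few lines of code. The sketch below works from the figures given in the example (sixty records an hour, twenty-four hours a day, 365 days, one hundred computers) and the Java sizes from the size table. It assumes the string holds a single letter (1 + 2 bytes) and that a kilobyte is 1,024 bytes, rounded down; the names are illustrative only.

```python
RECORDS_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
COMPUTERS = 100

units_per_computer = RECORDS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR
total_units = units_per_computer * COMPUTERS

# Java bytes per record. Assumption: the string stores one letter (1 + 2 bytes).
JAVA_BYTES_PER_RECORD = {"whole number": 4, "boolean": 1, "string": 3}


def yearly_cost(bytes_per_record):
    """Return (bytes, whole kilobytes) needed to store every record for the year."""
    total_bytes = total_units * bytes_per_record
    return total_bytes, total_bytes // 1024


print(f"{units_per_computer:,} units per computer, {total_units:,} in total")
for name, size in JAVA_BYTES_PER_RECORD.items():
    total_bytes, kilobytes = yearly_cost(size)
    print(f"{name:<12} {total_bytes:>12,} bytes {kilobytes:>8,} KB")
```

Whatever the record count, the ratio between the rows stays the same: the whole number costs four times as much as the Boolean, and the one-letter string three times as much.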

Topic 3 - 3.16 Data Representation and Your Desk-Check - 1

• When doing your desk-check, you should include a column that gets incremented whenever you set up a new piece of data.
  – To remain pseudocode, it should not make any assumptions about the language or the size of variables.
• Instead, include columns for each type of data present in the algorithm.
  – Data sizes can then be calculated easily by whoever might want to make use of the algorithm.

Topic 3 - 3.17 Data Representation and Your Desk-Check - 2

• When doing your desk-check, you should make a note of the type each time you set up a new piece of data.
  – To remain pseudocode, it should not make any assumptions about the language or the size of variables.
• At the final step, the total memory usage can be summed up from the contents of the columns.
  – Data sizes for particular languages can then be calculated easily by whoever might want to make use of the algorithm.
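One way to picture this separation is a language-neutral tally of types that is only converted to bytes once a target language is chosen. The sketch below uses the Java and C++ sizes quoted in this topic; the tally itself (three whole numbers, one real number, one Boolean) is a hypothetical example, and the names are not part of the module's pseudocode.

```python
from collections import Counter

# Sizes in bytes for the primitive types, using the Java and C++ figures
# quoted in this topic. The dictionary layout is an illustrative assumption.
PRIMITIVE_SIZES = {
    "java": {"whole number": 4, "real number": 8, "boolean": 1, "character": 2},
    "c++": {"whole number": 4, "real number": 4, "boolean": 1, "character": 1},
}

# Hypothetical final row of the type columns in a desk-check:
# three whole numbers, one real number and one Boolean were set up.
type_tally = Counter({"whole number": 3, "real number": 1, "boolean": 1})


def memory_cost(tally, language):
    """Convert a language-neutral tally of types into bytes for one language."""
    sizes = PRIMITIVE_SIZES[language]
    return sum(count * sizes[data_type] for data_type, count in tally.items())


for language in PRIMITIVE_SIZES:
    print(language, memory_cost(type_tally, language), "bytes")
```

The desk-check records only the counts; the reader supplies the sizes for their own language, which is why the same tally gives 21 bytes under the Java figures and 17 under the C++ ones.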

Topic 3 - 3.18 Adapted Desk Check

Code                               myAge (integer)   myNewAge (integer)   Notes
Data myAge as whole number         0                 0
Data myNewAge as whole number      0                 0
Output “please enter your age”     0                 0                    User output
Input myAge                        21                0                    User enters 21
myNewAge = myAge+1                 21                22
Output “In a year you will be”     21                22                   User output
Output myNewAge                    21                22                   User output

Topic 3 - 3.19 Why do we do this?

• One of the common things that causes problems when using algorithms is the memory leak.
• Memory leaks are caused by memory being used up but never freed when the program has finished with it.
• By having columns that indicate how many variables we have in use, we can see if we have flaws in our logic that result in memory leaks.
• It also allows people to decide if your algorithm is appropriate for the language they are using.

Topic 3 - 3.20 Primitive and Complex Data - 1

• There are two kinds of data type in most modern programming languages.
• Primitive data types are the building blocks that are used to build all other data types.
  – Whole numbers, real numbers, characters
• Complex data types are those made up of combinations of primitive data types.
  – Strings

Topic 3 - 3.21 Primitive and Complex Data - 2

• These usually get treated differently in terms of how they are allocated memory space.
  – Primitive data types are also known as value data types.
  – Complex data types are also known as reference data types.
• This is going to become important in the coming weeks.
  – For now, it just has an impact on what happens when we create a new variable.

Topic 3 - 3.22 Default Variable Values

• When we create a number in our pseudocode, the first thing we do in our desk-check is set its value to 0.
  – This is its default value.
  – It is a convention of our pseudocode; it is not necessarily honoured by all programming languages.
• For complex data types, we do not do that.
  – Complex data types have no default value; they start off as null values.

Topic 3 - 3.23 Pseudocode Example

 1  data num1 as whole number
 2  data num2 as whole number
 3  data sum as whole number
 4  data usertext as string
 5  num1 = 10
 6  num2 = 20
 7  sum = num1 + num2
 8  usertext = "The answer is: "
 9  output usertext
10  output sum

Topic 3 - 3.24 Desk-Check with Null Values

Line   Num1 (integer)   Num2 (integer)   Sum (integer)   userText (String)
1      0
2      0                0
3      0                0                0
4      0                0                0               Null
5      10               0                0               Null
6      10               20               0               Null
7      10               20               30              Null
8      10               20               30              “The answer is”
9      10               20               30              “The answer is”
10     10               20               30              “The answer is”
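The desk-check table can be reproduced mechanically by recording a snapshot of every variable after each line. The sketch below is one way to do that, with assumptions stated in the comments: `None` stands in for the pseudocode's Null, `sum_` replaces `sum` to avoid shadowing a Python built-in, the string is written as it appears in the table, and the `step` helper is invented for this illustration.

```python
# Snapshot of every variable after each numbered line of the pseudocode.
# None stands in for Null; a variable absent from a snapshot has not been
# created yet (the blank cells in the table).
state = {}
trace = []


def step(**changes):
    """Apply the changes made by one pseudocode line and record the result."""
    state.update(changes)
    trace.append(dict(state))


step(num1=0)                               # 1: data num1 as whole number
step(num2=0)                               # 2: data num2 as whole number
step(sum_=0)                               # 3: data sum as whole number
step(usertext=None)                        # 4: data usertext as string
step(num1=10)                              # 5: num1 = 10
step(num2=20)                              # 6: num2 = 20
step(sum_=state["num1"] + state["num2"])   # 7: sum = num1 + num2
step(usertext="The answer is")             # 8: usertext is given a value
step()                                     # 9: output usertext (no change)
step()                                     # 10: output sum (no change)

for line, snapshot in enumerate(trace, start=1):
    print(line, snapshot)
```

Numbers start at 0 as soon as they are created, while the string stays null from line 4 until line 8 assigns it, which matches the default-value convention described earlier.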

Topic 3 - 3.25 Null Values

• If you attempt to perform any kind of operation on a null value, the program will usually crash.
  – You cannot use them in calculations.
  – You cannot output them.
• The only thing you can do with null values is input data into them.
  – Either via user input
    • That can be complex, and we will talk about that in a later lecture.
  – Or via explicitly setting the value with =
    • Answer = “The answer is”
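As a rough illustration in a real language, Python's `None` behaves much like the null value described here: using it in an operation fails, and assigning a value with = fixes the problem. This is an analogy rather than an exact match for the pseudocode rules, since Python raises an error that can be caught instead of crashing outright, and it does allow `None` to be printed.

```python
usertext = None  # created, but no value has been put into it yet

failed = False
try:
    message = usertext + "!"   # an operation on a null value
except TypeError as error:
    failed = True
    print("operation on a null value failed:", error)

usertext = "The answer is"     # explicitly setting the value with =
message = usertext + "!"
print(message)
```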

Topic 3 - 3.26 Memory Usage - 1

• Because our desk-check includes the contents of all our data as well as the types, we can work out our memory usage at any particular step.
• At step seven, we have three whole numbers, so our memory usage in Java is 3 × 4 = 12 bytes.
• At step eight, we have those three whole numbers plus the string “The answer is” (thirteen characters including spaces, so 13 + 2 = 15 bytes).
  – 27 bytes in total for Java.
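The two totals can be confirmed with a short calculation. The sketch below uses the Java figures from this topic (4 bytes per whole number, letters + 2 bytes per string) and treats a null string as using no memory yet; the function name and constants are illustrative assumptions.

```python
JAVA_WHOLE_NUMBER_BYTES = 4
JAVA_STRING_OVERHEAD_BYTES = 2


def java_memory(whole_numbers, strings):
    """Bytes used by a count of whole numbers plus a list of string values.

    A string that is still null (None) is counted as zero bytes.
    """
    total = whole_numbers * JAVA_WHOLE_NUMBER_BYTES
    for text in strings:
        if text is not None:
            total += len(text) + JAVA_STRING_OVERHEAD_BYTES
    return total


step_seven = java_memory(3, [None])             # string not yet assigned
step_eight = java_memory(3, ["The answer is"])  # thirteen characters
print(step_seven, step_eight)  # 12 27
```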

Topic 3 - 3.27 Memory Usage - 2

• The memory cost of complex data types depends on what other data types they contain.
  – Thus, while they exist as variables when we use the data keyword to create them, they only take up memory when we put data into them.
  – The memory is allocated, but not used at that point.
• For now, the difference between primitive and complex data is related to what happens when they are created.
  – It will become more important later on.

Topic 3 - 3.28 Choosing the Right Data Type - 1

• Choosing the right data type is important, because it makes everything else easier.
• You need to consider:
  – What kind of information you need to store.
  – What kind of manipulations you are going to do to the data.
  – What kind of format will be used for output.
  – How often you might need to change the representation.

Topic 3 - 3.29 Choosing the Right Data Type - 2

• What is the best data type for...
  – A phone number?
  – An address?
  – The gender of a student?
  – The age of a person?

Topic 3 - 3.30 Choosing the Right Data Type - 3

• It is often dependent on context.
  – A phone number is usually best stored as a string.
  – An address is also best stored as a string.
  – A Boolean or a character might best represent gender.
  – The age can be a whole or real number.

Topic 3 - 3.31 Phone Number - 1

• Why is a phone number best stored as a string?
  – It says number right there in the name!
• It is to do with how the data gets manipulated and output.
  – You hardly ever do arithmetic on a phone number.
  – You often need to structure a phone number in chunks, such as 123-456-7890.
  – Phone numbers often have a leading 0.
    • 0123-456-7890

Topic 3 - 3.32 Phone Number - 2

• Strings are best for phone numbers, because on the whole, we do not treat phone numbers as numbers.
  – They are unique codes that just happen to be made up entirely of digits.
• However, there is always a trade-off.
  – Easier to structure as a string.
  – More difficult to manipulate arithmetically.
• We choose the path of least resistance.
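The leading-zero and chunking points are easy to demonstrate. The sketch below stores the example number both ways; the variable names and the `in_chunks` helper (which assumes an 11-digit number split 4-3-4, as in 0123-456-7890) are invented for illustration.

```python
phone_as_text = "01234567890"
phone_as_number = int(phone_as_text)  # a whole number cannot keep the leading 0


def in_chunks(digits):
    """Format an 11-digit string in 4-3-4 chunks, e.g. 0123-456-7890."""
    return f"{digits[:4]}-{digits[4:7]}-{digits[7:]}"


print(phone_as_number)           # 1234567890
print(in_chunks(phone_as_text))  # 0123-456-7890
```

Slicing the text version into chunks is a one-line job, whereas the numeric version has already lost a digit before any formatting starts.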

Topic 3 - 3.33 Conclusion

• In this lecture, we discussed the following key subjects:
  – Data representation
  – Memory use of data types
  – The scaling of data within algorithms
  – Complex and primitive data types
  – The methods of choosing the right variable

Topic 3 - 3.34 Terminology - 1

• The following new pieces of terminology were introduced in this lecture:
  – Scaling
    • Making an algorithm work for large amounts of data and operations.
  – Variable
    • The formal name for a piece of data stored within an algorithm.

Topic 3 - 3.35 Terminology - 2

• The following new pieces of terminology were introduced in this lecture:
  – Primitive data type
    • The simplest kind of variables, such as whole and real numbers.
  – Complex data type
    • Variables which are made up of other variables.

Topic 3 - 3.36 Topic 3 – Data Representation

Any Questions?