CENG 351 Introduction to Data Management and File Structures

CENG 351 Introduction to Data Management and File Structures
Nihan Kesim Çiçekli
Department of Computer Engineering, METU (Fall 2004)
CENG 351 Fall 2004

CENG 351 - Section 1
• Instructor: Nihan Kesim Çiçekli
• Office: A 308
• Email: nihan@ceng.metu.edu.tr
• Lecture Hours: Tue. 10:40; Thu. 11:40, 12:40 (BMB 2)
• Office Hours: Tue. 14:00-15:30
• Course Web page: http://www.ceng.metu.edu.tr/courses/ceng351
• Teaching Assistants: Mehmet Tan (B z 19, mtan@ceng.metu.edu.tr)

Text Book and References
1. Michael J. Folk, Bill Zoellick and Greg Riccardi, File Structures: An Object-Oriented Approach with C++, Addison-Wesley, 1998. (Current Textbook)
2. Betty Salzberg, File Structures: An Analytic Approach, Prentice Hall, 1988.
3. Raghu Ramakrishnan, Database Management Systems, McGraw-Hill, 1998.
4. R. Elmasri, S. B. Navathe, Fundamentals of Database Systems, 4th edition, Addison-Wesley, 2004.

Course Outline
1. Introduction: storage devices (disk & tape)
2. Fundamental File Structure Concepts: Sequential Files
3. External Sorting
4. Indexed Sequential Files (B-trees)
5. Direct access (Hashing)
6. Introduction to Database Systems: E/R modeling, relational model, query languages, database design

Grading
• HW and programming assignments: 20%
• Quizzes: 10%
• Midterm Exam: 30%
• Final Exam: 40%
Exam Date:
• Midterm Exam: November 22, 2004

Grading Policies
• Policy on missed midterm:
  – No make-up exam.
• Lateness policy:
  – Late assignments are penalized up to 15% per day.
• All assignments, quizzes and programs are to be your own work. No group projects or assignments are allowed.

Introduction to File Management

Motivation
• Most computers are used for data processing (over $80 billion/year). A big growth area in the "information age".
• This course covers data processing from a computer science perspective:
  – Storage of data
  – Organization of data
  – Access to data
  – Processing of data

Data Structures vs File Structures
• Both involve:
  – Representation of data
  – Operations for accessing data
• Difference:
  – Data structures deal with data in main memory.
  – File structures deal with data in secondary storage.

Where do File Structures fit in Computer Science?
[Diagram: stacked layers, listed here in the order given]
• Application
• DBMS
• File system
• Operating System
• Hardware

Computer Architecture
• Main Memory (RAM): data is manipulated here
  – Semiconductors
  – Fast, expensive, volatile, small
• Data transfer between main memory and secondary storage
• Secondary Storage: data is stored here
  – Disks, tape
  – Slow, cheap, stable, large

Advantages
• Main memory is fast.
• Secondary storage is big (because it is cheap).
• Secondary storage is stable (non-volatile), i.e. data is not lost during power failures.
Disadvantages
• Main memory is small. Many databases are too large to fit in MM.
• Main memory is volatile, i.e. data is lost during power failures.
• Secondary storage is slow (10,000 times slower than MM).

How fast is main memory?
• Typical time for getting info from:
  – Main memory: ~120 nanosec = 120 x 10^-9 sec
  – Magnetic disks: ~30 millisec = 30 x 10^-3 sec
• An analogy keeping the same time proportion as above:
  – Looking at the index of a book: 20 sec
  versus
  – Going to the library: 58 days
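The proportion in the analogy can be checked with a few lines of arithmetic. The sketch below assumes the slide's figures of 120 x 10^-9 sec for main memory and 30 x 10^-3 sec for disk; the function and macro names are illustrative and not from the course material.

```c
/* Access times as given on the slide (assumed typical values). */
#define MEMORY_ACCESS_SEC 120e-9 /* ~120 nanoseconds */
#define DISK_ACCESS_SEC   30e-3  /* ~30 milliseconds */

/* How many times slower one disk access is than one memory access. */
double disk_to_memory_ratio(void)
{
    return DISK_ACCESS_SEC / MEMORY_ACCESS_SEC;
}

/* Scale a "fast" task (in seconds) by the same ratio; result in days. */
double scaled_task_days(double fast_task_sec)
{
    const double seconds_per_day = 60.0 * 60.0 * 24.0;
    return fast_task_sec * disk_to_memory_ratio() / seconds_per_day;
}
```

With these figures the ratio is about 250,000, so a 20-second lookup scales to roughly 58 days, which matches the analogy.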

Normal Arrangement
• Secondary storage (SS) provides reliable, long-term storage for large volumes of data.
• At any given time, we are usually interested in only a small portion of the data.
• This data is loaded temporarily into main memory, where it can be rapidly manipulated and processed.
• As our interests shift, data is transferred automatically between MM and SS, so the data we are focused on is always in MM.

Goal of File Structures
• Minimize the number of trips to the disk needed to get the desired information.
• Group related information so that we are likely to get everything we need with only one trip to the disk.

Physical Files and Logical Files
• Physical file: a collection of bytes stored on a disk or tape.
• Logical file: a "channel" (like a telephone line) that connects the program to a physical file.
• The program (application) sends bytes to, or receives bytes from, a file through the logical file. The program knows nothing about where the bytes go to or come from.
• The operating system is responsible for associating a logical file in a program with a physical file on disk or tape. Writing to or reading from a file in a program is done through the operating system.

Files
• The physical file has a name, for instance myfile.txt.
• The logical file has a logical name (a variable) inside the program.
• In C:
    FILE *outfile;
• In C++:
    fstream outfile;

Basic File Processing Operations
• Opening
• Closing
• Reading
• Writing
• Seeking

Opening Files
• Opening a file links a logical file to a physical file.
• In C:
    FILE *outfile;
    outfile = fopen("myfile.txt", "w");
• In C++:
    fstream outfile;
    outfile.open("myfile.txt", ios::out);
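Opening can fail, for example when the directory does not exist or the program lacks permission, so it is usually worth checking the result of fopen. The following is a minimal sketch; the helper name open_for_output is hypothetical and not part of the course material.

```c
#include <stdio.h>

/* Open `path` for writing. Returns the logical file, or NULL if the
 * link to the physical file could not be made. */
FILE *open_for_output(const char *path)
{
    FILE *outfile = fopen(path, "w"); /* "w" creates or truncates the file */
    if (outfile == NULL) {
        perror(path);                 /* print the reason to stderr */
    }
    return outfile;
}
```

A caller would test the returned pointer against NULL before writing anything.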

Closing Files
• Closing cuts the link between the physical and logical files.
• After closing a file, the logical name is free to be associated with another physical file.
• Closing a file used for output guarantees that everything has been written to the physical file. (When the file is closed, whatever is left in the buffer is flushed to the file.)
• In C:
    fclose(outfile);
• In C++:
    outfile.close();
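Because output is buffered, a write error may only become visible when the buffer is flushed at close time. The sketch below, with a hypothetical helper name write_text_file, checks the return value of fclose for that reason.

```c
#include <stdio.h>

/* Write `text` to `path` and close the file.
 * Returns 0 on success, -1 if opening, writing, or closing failed. */
int write_text_file(const char *path, const char *text)
{
    FILE *outfile = fopen(path, "w");
    if (outfile == NULL)
        return -1;

    int write_failed = (fputs(text, outfile) == EOF);

    /* fclose flushes any buffered output to the physical file, so a
     * failure here means the data may not have been stored. */
    int close_failed = (fclose(outfile) == EOF);

    return (write_failed || close_failed) ? -1 : 0;
}
```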

Reading
• Read data from a file and place it in a variable inside the program.
• In C:
    char c;
    FILE *infile;
    infile = fopen("myfile.txt", "r");
    fread(&c, 1, 1, infile);
• In C++:
    char c;
    fstream infile;
    infile.open("myfile.txt", ios::in);
    infile >> c;
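The single fread call above extends naturally to a loop that reads until the end of the file. This is a sketch only; count_bytes is an illustrative name, and reading one byte at a time is chosen for clarity rather than speed.

```c
#include <stdio.h>

/* Count the bytes in `path` by reading one char at a time.
 * Returns -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *infile = fopen(path, "rb");
    if (infile == NULL)
        return -1;

    long count = 0;
    char c;
    /* fread returns the number of items read; anything other than 1
     * here means end of file or a read error. */
    while (fread(&c, 1, 1, infile) == 1)
        count++;

    fclose(infile);
    return count;
}
```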

Writing
• Write data from a variable inside the program into the file.
• In C:
    char c = 'A';  /* value to write */
    FILE *outfile;
    outfile = fopen("mynew.txt", "w");
    fwrite(&c, 1, 1, outfile);
• In C++:
    char c = 'A';  // value to write
    fstream outfile;
    outfile.open("mynew.txt", ios::out);
    outfile << c;
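fwrite can also write several items in one call, and its return value tells how many were actually written. The sketch below assumes a binary file of int values; write_ints is a hypothetical helper name.

```c
#include <stdio.h>

/* Write `n` ints from `values` to `path` in binary form.
 * Returns the number of ints written, or 0 if the file could not be
 * opened or closed cleanly. */
size_t write_ints(const char *path, const int *values, size_t n)
{
    FILE *outfile = fopen(path, "wb");
    if (outfile == NULL)
        return 0;

    /* One call transfers all n items; fewer than n signals an error. */
    size_t written = fwrite(values, sizeof values[0], n, outfile);

    if (fclose(outfile) == EOF)
        return 0;
    return written;
}
```

The file holds the machine's own in-memory representation of each int, so it is meant to be read back on the same kind of machine.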

Seeking
• Used for direct access; an item can be accessed by specifying its position in the file.
• In C:
    fseek(infile, 0, SEEK_SET);    // moves to the beginning
    fseek(infile, 0, SEEK_END);    // moves to the end
    fseek(infile, -10, SEEK_CUR);  // moves back 10 bytes from the current position
• In C++:
    infile.seekg(0, ios::beg);
    infile.seekg(0, ios::end);
    infile.seekg(-10, ios::cur);
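When every item in a file has the same size, the byte position of item i is simply i times the item size, which is what makes direct access possible. The sketch below assumes a binary file of int values; both helper names are illustrative. Seeking to the end of a binary stream works on common systems but is not guaranteed by the C standard.

```c
#include <stdio.h>

/* Read the int at position `index` (0-based) from a binary file of ints.
 * Returns 0 on success, -1 if the seek or the read fails. */
int read_nth_int(FILE *infile, long index, int *out)
{
    /* Byte offset = index * item size, measured from the start. */
    if (fseek(infile, index * (long)sizeof *out, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, infile) == 1 ? 0 : -1;
}

/* File size in bytes: seek to the end and report the position. */
long file_size(FILE *infile)
{
    if (fseek(infile, 0L, SEEK_END) != 0)
        return -1;
    return ftell(infile);
}
```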

File Systems
• Data is not scattered hither and thither on disk.
• Instead, it is organized into files.
• Files are organized into records.
• Records are organized into fields.

Example
• A student file may be a collection of student records, one record for each student.
• Each student record may have several fields, such as:
  – Name
  – Address
  – Student number
  – Gender
  – Age
  – GPA
• Typically, each record in a file has the same fields.
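One simple way to picture such a record in C is a struct with one member per field, stored as a fixed-length record. This is a sketch under stated assumptions: the field sizes and types are invented for illustration, and writing a raw struct includes compiler padding, so the file is only readable by a program built the same way.

```c
#include <stdio.h>

/* One fixed-length student record; field sizes are illustrative. */
struct student {
    char   name[32];
    char   address[64];
    long   student_number;
    char   gender;
    int    age;
    double gpa;
};

/* Write one record to an open binary file; returns 0 on success. */
int write_student(FILE *out, const struct student *s)
{
    return fwrite(s, sizeof *s, 1, out) == 1 ? 0 : -1;
}

/* Read the next record; returns 0 on success, -1 at end of file or error. */
int read_student(FILE *in, struct student *s)
{
    return fread(s, sizeof *s, 1, in) == 1 ? 0 : -1;
}
```

Because every record has the same size, record i starts at byte i * sizeof(struct student), so it can be reached directly with fseek.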

Properties of Files
1) Persistence: Data written into a file persists after the program stops, so the data can be used later.
2) Sharability: Data stored in files can be shared by many programs and users simultaneously.
3) Size: Data files can be very large. Typically, they cannot fit into MM.