The Memory
B. Ramamurthy
Topics for discussion
• On chip memory
• On board memory
• System memory
• Off system/online storage/secondary memory
  – File system abstraction
• Offline/tertiary memory
• RAID: Redundant Array of Inexpensive Disks
• NAS: Network Attached Storage
• SAN: Storage Area Networks
• DB and DBMS: Database and database management systems
• Distributed file system
• Google file system
• Hadoop file system
Data and Computation Continuum
• Compute intensive. Ex: computation of digits of PI
• Data intensive. Ex: analyzing web logs
On chip memory
• Registers
• Cache
• Buffers (instruction pipeline)
• Characteristics: volatile
On board memory
• Cache
  – Instruction cache
  – Data cache
  – Translation lookaside buffers (TLB)
• Characteristics: content addressable, set-associative organization
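To make the set-associative organization concrete, here is a minimal sketch of how a byte address might be split into a tag, a set index, and a block offset. The cache geometry (64-byte blocks, 128 sets) and the name `split_address` are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical cache geometry, chosen only for illustration.
BLOCK_SIZE = 64    # bytes per cache block -> 6 offset bits
NUM_SETS = 128     # number of sets        -> 7 index bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(64)  = 6
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(128) = 7


def split_address(addr):
    """Split a byte address into (tag, set_index, block_offset)."""
    offset = addr & (BLOCK_SIZE - 1)                     # low bits: byte within the block
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # middle bits: which set to search
    tag = addr >> (OFFSET_BITS + INDEX_BITS)             # high bits: compared against every way in the set
    return tag, set_index, offset


print(split_address(0x12345))   # (9, 13, 5)
```

The set index selects one set directly; the tag is then matched against all entries of that set at once, which is the content-addressable part of the lookup.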
System memory
• RAM: random access memory; main memory
• SDRAM: synchronous dynamic RAM (volatile)
• ROM: read-only memory; holds boot programs for operating systems
• Flash memory: erasable/writable non-volatile memory; both read and write possible
• Others: EAROM
Off-system storage (earlier lectures covered these)
• Off system/online storage/secondary memory
  – File system abstraction
• Offline/tertiary memory
• RAID: Redundant Array of Inexpensive Disks
• NAS: Network Attached Storage
• SAN: Storage Area Networks
Database and Database Management System
• Data source
• Transactional database server
• Relational DB or similar foundation
• Tables, rows, result set, SQL
• ODBC: Open Database Connectivity
• Very successful business model: Oracle, DB2, MySQL, and others
• Persistence models: EJB, DAO, ADO (I am not going to expand the abbreviations)
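The terms table, row, result set, and SQL can be seen together in a few lines. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a database server; the `orders` table and its columns are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory SQLite database: an embedded stand-in, not a real database server.
conn = sqlite3.connect(":memory:")

# A table with three columns (hypothetical schema).
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# Each tuple becomes one row of the table.
conn.executemany(
    "INSERT INTO orders (item, qty) VALUES (?, ?)",
    [("book", 2), ("pen", 10), ("book", 1)],
)
conn.commit()

# The rows returned by a query form the result set.
result_set = conn.execute(
    "SELECT item, SUM(qty) FROM orders GROUP BY item ORDER BY item"
).fetchall()
conn.close()

print(result_set)   # [('book', 3), ('pen', 10)]
```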
Distributed file system (DFS)
• A dedicated server manages the files for a compute environment.
• For example, nickelback.cse.buffalo.edu is your file server, which is why we did not want you to run your user applications on this machine.
• DFS addresses various transparencies: location transparency, sharing, performance, etc.
• Examples: NFS, NFS+, AFS (Andrew FS)… (you will study these in the Distributed Systems course)
Issues with ultra-scale data
• How to store the large amount of data?
  – On commodity hardware or special hardware
• Large storage implies a large number of devices to store the data.
  – How to address shortening MTTF (mean time to failure)?
  – How to realize "fault tolerance"?
  – Redundancy/replication is a solution
• How to manage the replication and the health of the large number of devices?
• More importantly, how to partition the large-scale data to store in these storage devices (nodes)?
• How to parallelize processing of the data stored at multiple "nodes"?
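Two of these questions lend themselves to a small sketch: why MTTF shrinks as devices are added, and how data might be partitioned and replicated across nodes. Everything here is an assumption for illustration: the function names, the default replication factor of 3, and the round-robin placement, which is a deliberate simplification and not how any particular file system places replicas. The MTTF estimate assumes independent devices with constant failure rates.

```python
def cluster_mttf_hours(device_mttf_hours, num_devices):
    """Expected time until the first of num_devices fails,
    assuming independent devices with constant failure rates."""
    return device_mttf_hours / num_devices


def partition(data, block_size):
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# 10,000 devices, each with an MTTF of 1,000,000 hours:
print(cluster_mttf_hours(1_000_000, 10_000))        # 100.0

blocks = partition(b"abcdefghij", 4)
print(blocks)                                       # [b'abcd', b'efgh', b'ij']
print(place_replicas(len(blocks), ["n0", "n1", "n2", "n3"], replication=2))
```

With these numbers, some device is expected to fail roughly every 100 hours, which is why each block is stored on more than one node.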
On to Google File
• The Internet introduced a new challenge in the form of web logs and web crawler data: large-scale, "peta-scale" data.
• But observe that this type of data has a characteristic that sets it apart from transactional data or the "order" data on amazon.com: it is "write once". So is HIPAA-protected healthcare and patient information.
• Google exploited this characteristic in its Google File System (S. Ghemawat).
Hadoop File System (HFS)
• The Hadoop file system is a reverse-engineered version of GFS; this is my first opinion on HFS.
• HFS is a distributed file system for large-scale data.
• Data throughput is more important than latency.
• Batch computing rather than interactive, time-shared computing.
MapReduce
[Figure: MapReduce data flow. Labels: input words Cat, Bat, Dog, Other Words (size: TByte); stages split, map, combine, reduce; outputs part 0, part 1, part 2.]
Exercise: Count the number of occurrences of each word in the text
This is a cat. Cat sits on a roof. The roof is a tin roof. There is a tin can on the roof. Cat kicks the can. It rolls on the roof and falls on the next roof. The cat rolls too. It sits on the can.
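One way to approach the exercise is to mimic the split, map, combine, and reduce stages on a single machine. The sketch below is not the Hadoop API: the function names are hypothetical, each sentence is treated as one split (real splits are large blocks of a file), and "Cat" and "cat" are assumed to count as the same word.

```python
import re
from collections import Counter

TEXT = (
    "This is a cat. Cat sits on a roof. The roof is a tin roof. "
    "There is a tin can on the roof. Cat kicks the can. "
    "It rolls on the roof and falls on the next roof. "
    "The cat rolls too. It sits on the can."
)


def split_input(text):
    """Split stage: one sentence per split (an assumption for this toy input)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def map_words(split):
    """Map stage: emit a (word, 1) pair for every word, lowercased."""
    return [(word, 1) for word in re.findall(r"[a-z]+", split.lower())]


def combine(pairs):
    """Combine stage: sum the pairs locally, within one split."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return local


def reduce_counts(partials):
    """Reduce stage: merge the per-split counts into global totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


counts = reduce_counts(combine(map_words(s)) for s in split_input(TEXT))
print(counts["cat"], counts["roof"], counts["can"])   # 4 6 3
```

In a real cluster the map and combine calls would run in parallel on the nodes that hold each split, and only the much smaller combined counts would travel over the network to the reducers.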