
Multiclustered & Multithreaded Architecture By: Sing Liu, Macherla Charishma, Oluwaseun Ogunmoyero

Multithreaded Architecture

Multithreading/Threads
What is a Thread?
● A basic unit of CPU utilization, consisting of a program counter, a stack, and a set of registers.
● Process: an instance of a computer program that is being executed.
● A process can have multiple threads, and each thread executes its own sequence of instructions.
● Multiple threads can exist within the same process and share resources such as memory (code, data, open files); each thread keeps its own program counter, registers, and stack.
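
The sharing described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the names `shared_results` and `worker` are made up for the example.

```python
import threading

# Lives in the process's memory, so every thread in the process can see it.
shared_results = {}

def worker(name, n):
    # 'total' is a local variable: it lives on this thread's own stack.
    total = sum(range(n))
    # Each thread writes its result into the memory shared by all threads.
    shared_results[name] = total

threads = [
    threading.Thread(target=worker, args=(f"t{i}", 10 * (i + 1)))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(shared_results)
```

Each thread computes with its own private local variable but publishes the result through the one dictionary owned by the process.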

Multithreading/Threads cont.
● Multithreading: the ability of a central processing unit (CPU), or of a single core in a multi-core processor, to execute multiple processes or threads concurrently, supported by the operating system.
● It aims to increase utilization of a single core by using thread-level parallelism as well as instruction-level parallelism.
● Differs from multiprocessing, which uses multiple processors or cores.

A Quick Aside: Multithreading vs. Multiprocessing
● Multithreading and multiprocessing are similar in that both lead to an increase in performance (computing power).
● Where they differ is in their application, namely:
   ○ Basic layout
   ○ Execution
   ○ Creation

[Diagrams: Multiprocessing; Multithreading]

Why multiple threads?
● Suppose our computer could run only a single thread at a time while we use an application such as Spotify.
[Diagram labels: T1A, T1B, T2, T3, T1]

Multithreading Advantages
● Resources are more fully utilized: CPU usage is maximized
● Responsiveness: faster program run-times
● Better use of cache storage, since threads share resources
● Improved GUI responsiveness
● Tasks can run simultaneously and in parallel

Multithreading Disadvantages
● Thread synchronization
   ○ Potential for threads to interfere with each other
● More sophisticated software can increase debugging time
● Performance can decrease if not properly implemented
● Programs become more difficult to write
● Results can be unpredictable (e.g. race conditions)
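
To make the synchronization point concrete, here is a minimal Python sketch of several threads updating one shared counter. The names `counter`, `counter_lock` and `deposit` are illustrative assumptions; the lock is one common way to prevent interference, not the only one.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def deposit(times):
    global counter
    for _ in range(times):
        # 'counter += 1' is a read-modify-write sequence. Without the lock,
        # two threads could interleave inside it and lose an update.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 10,000 increments
```

The lock makes the result predictable at the cost of serializing the critical section, which is one way a poorly designed multithreaded program can end up slower.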

Multithreading Types
Main Types of Multithreading
● Interleaved/Temporal Multithreading
   ○ Fine-grained
   ○ Coarse-grained
● Simultaneous Multithreading
The main difference between the two lies in how many threads can execute in any given pipeline stage at the same time.

Coarse-grained
● The main processor pipeline contains only one thread at a time
● The processor must first perform a context switch before executing a different thread
● Context switch: storing the current state of a thread so that it can be restored and resumed later from the point where it left off. Also known as a thread switch.
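
A toy Python model of the coarse-grained policy is sketched below. It assumes a thread is just a list of instruction labels and uses the made-up marker "MISS" for a long-latency event (such as a cache miss) that triggers the context switch; real hardware is far more involved.

```python
from collections import deque

def coarse_grained(threads):
    """Run one thread at a time; switch only when it hits a long stall.

    `threads` maps a thread name to its list of instruction labels.
    Returns the order in which instructions entered the pipeline.
    """
    ready = deque((name, deque(instrs)) for name, instrs in threads.items())
    trace = []
    while ready:
        name, instrs = ready.popleft()
        while instrs:
            op = instrs.popleft()
            trace.append(f"{name}:{op}")
            if op == "MISS":
                break  # long stall: context-switch to the next thread
        if instrs:
            # Saved state (the remaining instructions) is resumed later.
            ready.append((name, instrs))
    return trace

trace = coarse_grained({"T1": ["a", "MISS", "b"], "T2": ["c", "d"]})
print(trace)  # ['T1:a', 'T1:MISS', 'T2:c', 'T2:d', 'T1:b']
```

T1 keeps the pipeline to itself until it stalls, T2 then runs to completion, and T1 resumes where it left off.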

Fine-grained
● The main processor pipeline may contain multiple threads
● Context switches occur between pipe stages or on every clock cycle
● Barrel processor: a type of processor used to execute fine-grained threads. It switches between threads on every cycle.
● Each thread is assigned its own program counter to keep track of its position
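
The barrel-processor behaviour can be sketched as a round-robin loop with one program counter per thread. This is a simplified Python model under the assumption that every instruction takes one cycle; the function name `barrel_schedule` is invented for the example.

```python
def barrel_schedule(threads):
    """Issue one instruction per cycle, rotating through the threads.

    `threads` maps a thread name to its list of instruction labels.
    """
    pc = {name: 0 for name in threads}  # one program counter per thread
    trace = []
    while any(pc[name] < len(instrs) for name, instrs in threads.items()):
        for name, instrs in threads.items():  # switch thread every cycle
            if pc[name] < len(instrs):        # finished threads are skipped
                trace.append(f"{name}:{instrs[pc[name]]}")
                pc[name] += 1
    return trace

trace = barrel_schedule({"T1": ["a", "b", "c"], "T2": ["d", "e"], "T3": ["f"]})
print(trace)  # ['T1:a', 'T2:d', 'T3:f', 'T1:b', 'T2:e', 'T1:c']
```

Because consecutive cycles belong to different threads, a stall in one thread does not idle the pipeline as long as other threads still have work.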

Simultaneous Multithreading
● Implemented on a superscalar processor
   ○ A CPU that can issue and execute multiple instructions in the same clock cycle within a single processor
● Permits multiple independent threads of execution
● Instructions from more than one thread can be executing in any given pipeline stage at a time
● Can be built on the basic superscalar processor architecture
● Needs to be able to fetch instructions from multiple threads per cycle
● Needs a larger set of registers to hold data from multiple threads
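
As a rough illustration, the Python sketch below fills the issue slots of each cycle with instructions from different threads. It is a deliberately simplified model: the issue width of 2, the fixed thread priority, and the limit of one instruction per thread per cycle are all assumptions made for the example, not properties of any real SMT design.

```python
def smt_issue(threads, width=2):
    """Fill up to `width` issue slots per cycle from different threads.

    `threads` maps a thread name to its list of instruction labels.
    Returns one list of issued instructions per cycle.
    """
    pc = {name: 0 for name in threads}  # per-thread program counters
    cycles = []
    while any(pc[name] < len(instrs) for name, instrs in threads.items()):
        slots = []
        for name, instrs in threads.items():
            if len(slots) < width and pc[name] < len(instrs):
                slots.append(f"{name}:{instrs[pc[name]]}")
                pc[name] += 1
        cycles.append(slots)
    return cycles

cycles = smt_issue({"T1": ["a", "b"], "T2": ["c"], "T3": ["d"]})
print(cycles)  # [['T1:a', 'T2:c'], ['T1:b', 'T3:d']]
```

Unlike the interleaved schemes, each cycle here carries instructions from more than one thread at once.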

Multiclustered Architecture

Multiclustering
What is a Cluster?
● A group of independent servers (usually in close proximity to one another) interconnected through a dedicated network to work as one centralized data processing resource.

Computer cluster
A set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. The approach usually (but not always) connects a number of readily available computing nodes (e.g. personal computers used as servers) via a fast local area network.

Computer cluster
The formal basis for using a computer cluster to do any sort of parallel work is given by Amdahl's law, which bounds the achievable speedup by the fraction of the work that must stay serial. Amdahl's law can be formulated in the following way:
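
In its standard textbook form, the law reads as follows. The symbols are assumptions chosen for this sketch: p is the fraction of the work that can be parallelized and s is the speedup of that fraction (for example, the number of cluster nodes working on it).

```latex
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\lim_{s \to \infty} S(s) = \frac{1}{1 - p}
```

For example, with p = 0.9 and s = 10 the overall speedup is 1 / (0.1 + 0.09) ≈ 5.26, and no number of nodes can push it past 1 / 0.1 = 10.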

Cluster Attributes
Computer clusters may be configured for different purposes, ranging from general-purpose business needs such as web-service support to computation-intensive scientific calculations.
Types of Clusters
● High-availability clusters
● Load-balancing clusters
● High-performance clusters

High Availability
● Definition: HA clusters are groups of computers that support server applications that can be reliably utilized with a minimum amount of downtime.
   ○ They operate using redundant computers
● Also called fail-over clusters
   ○ Provide fault tolerance

Load Balancing
● Load-balancing clusters are configurations in which cluster nodes share the computational workload to provide better overall performance.
   ○ We do this with a load balancer
   ○ Even if one node fails, we still get service
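
A load balancer can follow many policies; the Python sketch below assumes the simplest one, round-robin, and skips nodes that have been marked as failed. The class name `RoundRobinBalancer` and the node names are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand requests to nodes in turn, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def route(self, request):
        # This simple policy ignores the request's content entirely.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")

lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
before = [lb.route(i) for i in range(3)]
lb.mark_down("node-b")          # simulate a node failure
after = [lb.route(i) for i in range(4)]
print(before)  # ['node-a', 'node-b', 'node-c']
print(after)   # ['node-a', 'node-c', 'node-a', 'node-c']
```

After the failure, requests keep being served by the remaining nodes, which is the behaviour the second sub-bullet describes.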

High-Performance Clusters
Used for computation-intensive rather than I/O-oriented workloads.
● For example, a computer cluster might support computational simulations of vehicle crashes or weather
   ○ Allows multiple computers to process the raw data
● Tightly coupled clusters can work as supercomputers

Benefits of a Cluster
● Performance
● Fault tolerance
● Low frequency of maintenance routines
● Resource consolidation (e.g. RAID)
● Centralized management
● Data recovery in the event of a disaster
● Parallel data processing
● High processing capacity

Cluster computing is used for Distributed Computing
● A network of computers that communicate to solve a complex problem
● A complex job is split into multiple smaller tasks, and each task is computed by an individual computer.
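
The split-and-combine pattern can be sketched in Python as below. Worker threads stand in for the individual computers, which is an assumption made so the example runs on one machine; the helper names `split` and `node_task` are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Cut `data` into at most `parts` chunks of near-equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def node_task(chunk):
    """The single task one 'computer' performs: sum of squares of its chunk."""
    return sum(x * x for x in chunk)

data = list(range(1, 101))
chunks = split(data, 4)

# Each worker thread plays the role of one node in the cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(node_task, chunks))

total = sum(partials)  # combine the partial results into the final answer
print(total)
```

The combined result matches what a single machine would compute over the whole data set, but each worker only ever touches its own chunk.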

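Distributed Computing
● Can be coupled either loosely or tightly
● Tightly Coupled
   ○ Memory (1), shared by all processors
   ○ Cache, Processor
   ○ Bus (1)
● Loosely Coupled
   ○ Memory (many), one per node
   ○ Cache, Processor
   ○ Network (1)

Quick diagram
[Diagram, tightly coupled: processors (P), each with a cache (C), share one memory (M) over a bus. Diagram, loosely coupled: several nodes, each with its own processor (P), memory (M), and cache (C), joined by a network.]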

Examples of Multi-cluster Architectures
There are several examples of cluster architectures:
● Grid computing
● Global computing
● Supercomputers and HPC
● Hadoop clusters (cloud computing)

Grid Computing
Makes use of heterogeneous CPUs and storage devices located in different domains to solve computational problems too large for any single supercomputer.

Grid Computing
One difference between grid and cluster computing is that grid computing can use resources from different domains around the world.
Limitations:
● Grids are connected via storage area networks, so bottlenecks can form when large amounts of data are transferred
● Compute-heavy tasks are performed quickly, whereas data transfers can cause problems

Global Computing
● A form of grid computing
● Computational resources are provided by volunteers located around the world.

Supercomputers & High-Performance Computing
● High-performance computing (HPC) is the ability to process data and perform complex calculations at high speeds.

Supercomputers & High-Performance Computing
● To build a high-performance computing architecture, compute servers are networked together into a cluster.
● Software programs and algorithms are run simultaneously on the servers in the cluster.
● The cluster is networked to the data storage to capture the output.
Together, these components operate seamlessly to complete a diverse set of tasks.

Supercomputers & High-Performance Computing
● One of the best-known types of HPC solutions is the supercomputer
● Performance is measured in FLOPS (floating-point operations per second) rather than MIPS
● On the June 2018 TOP500 list, the fastest supercomputer was IBM Summit, with a measured LINPACK speed of 122.3 PFLOPS (petaFLOPS, or 10^15 FLOPS).
● Since November 2017, every system on the TOP500 list has run Linux.
● Supercomputers play an important role in computational science and are widely used in fields such as quantum mechanics, weather forecasting, and the simulation of nuclear weapon detonations.

IBM Summit Supercomputer (122.3 PFLOPS)

Sunway TaihuLight Supercomputer (93.01 PFLOPS)

Hadoop Cluster
● Hadoop clusters are built for handling big data.
● A Hadoop cluster is a computational computer cluster for storing and analysing big data (structured, semi-structured and unstructured) in a distributed environment.
● In a Hadoop cluster, data is stored locally on the nodes, so accessing the data is fast.
● Because the data sits on the nodes that process it, Hadoop avoids the data-transfer bottleneck seen in grid architectures.
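
The idea of processing data where it is stored can be sketched in plain Python. This is not Hadoop's API; the `node_blocks` layout and the `local_count` function are hypothetical stand-ins that only illustrate why little data has to cross the network.

```python
from collections import Counter

# Hypothetical cluster: each node holds its own block of the data set locally.
node_blocks = {
    "node-1": ["error disk", "ok"],
    "node-2": ["error net", "ok", "ok"],
}

def local_count(lines):
    """Runs on the node that stores `lines`; only the small Counter leaves it."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Each node summarizes its own block; only the summaries are combined.
partials = [local_count(block) for block in node_blocks.values()]
totals = sum(partials, Counter())
print(totals["ok"], totals["error"])  # 3 2
```

The raw lines never move between nodes; only the per-node word counts are merged into the final totals.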

Advantages of Hadoop Clusters
● Analyzing big data
● Scalability
● Cost-effective
● Resilient to failure

Disadvantages of Hadoop Clusters
● Not suitable for organizations with little data.
● Not suitable for data that cannot be analyzed in a parallel processing environment.
● Significant learning curve associated with building, operating and supporting the cluster.


Any Questions?