Algorithms Lecture 3

Data structures • Algorithms can be presented at various levels of detail – Always a good idea to think about how you would actually implement an algorithm (in your programming language of choice) – Implementing an algorithm is a great way to make sure you actually understand it!

Common data structures • Array – Generally assumes a known upper bound on the number of items n in the array – Reading/writing the ith element in the array (A[i]) takes constant time – Checking whether an element is in an unsorted array requires O(n) time – Checking whether an element is in a sorted array requires O(log n) time • An unsorted array can be sorted in O(n log n) time – Deleting/inserting elements can be cumbersome
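The gap between the O(n) and O(log n) membership checks can be sketched as follows. This is a minimal illustration, assuming the array is a Python list; the function names `contains_unsorted` and `contains_sorted` are made up for this example, and the standard-library `bisect_left` performs the binary search.

```python
from bisect import bisect_left

def contains_unsorted(a, x):
    """Linear scan: examines up to n elements, so O(n) time."""
    for item in a:
        if item == x:
            return True
    return False

def contains_sorted(a, x):
    """Binary search on a sorted list: O(log n) comparisons."""
    i = bisect_left(a, x)  # leftmost position at which x could be inserted
    return i < len(a) and a[i] == x
```

Binary search relies on constant-time access to A[i], which is exactly what an array provides.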

Common data structures • Doubly linked list – Does not require an upper bound on the number of items in the list – Deleting/inserting an element (at a given location, typically the start or end) takes O(1) time – Reading/writing the ith element in the list takes O(i) time – Checking whether an element is in the list takes O(n) time, even if the list is sorted (binary search needs constant-time access to the ith element)
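One possible way to implement such a list is sketched below, assuming each node stores pointers to both neighbors. The class and method names (`Node`, `DoublyLinkedList`, `push_front`, `remove`, `to_list`) are hypothetical choices for this example only.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def push_front(self, value):
        """Insert at the start in O(1) time; returns the new node."""
        node = Node(value)
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        else:
            self.tail = node  # list was empty
        self.head = node
        return node

    def remove(self, node):
        """Delete a node we already hold a reference to, in O(1) time."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def to_list(self):
        """Walk the list from head to tail: O(n) time."""
        out = []
        cur = self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```

Deletion is O(1) only because the caller already holds the node; finding a node by value or by position still requires a walk from the head.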

Review: The GS algorithm • While some employer is free: – Some free employer e makes an offer to their highest-ranked candidate c to whom they have not yet made an offer – If c is free, c becomes (tentatively) bound to e – If c is bound to e’ and prefers e’, do nothing – If c is bound to e’ but prefers e, switch (so e’ becomes free)

Data structures for GS • Number employers/candidates from 1, …, n • Store employer preferences in a 2-D array – EPref[e, i] is the ith candidate on e’s list • Maintain array Next s.t. Next[e] is the position of the next candidate to whom e should make an offer (initialized to 1) • Maintain array Bound s.t. Bound[c] is the employer to whom c is currently bound (initialized to 0, meaning c is free) • Store free employers as a stack (using a linked list) • Use 2-D array Rank s.t. Rank[c, e] indicates the rank of employer e in c’s preference list – Initialize at outset of the algorithm in O(n^2) time

Data structures for GS • While some employer is free (using our stack): – First free employer e makes an offer to their highest-ranked candidate c to whom they have not yet made an offer (c = EPref[e, Next[e]]) – If c is free (Bound[c] == 0), c becomes (tentatively) bound to e – If c is bound to e’ and prefers e’, do nothing – If c is bound to e’ but prefers e, switch (so e’ becomes free) • To decide which employer c prefers, compare Rank[c, e] to Rank[c, e’]

Data structures for GS • Summary – O(n^2) initialization step – O(n^2) iterations, where each iteration can be done in O(1) time – O(n^2) algorithm overall
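The GS data structures can be put together roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the function name `gale_shapley` and the parameter names `epref` and `cpref` are invented here, a Python list stands in for the linked-list stack, and `nxt` starts at 0 rather than 1 because Python lists are 0-indexed. Index 0 of the outer lists is unused so that employers and candidates can be numbered 1, …, n and 0 can mean "free".

```python
def gale_shapley(epref, cpref):
    """epref[e] and cpref[c] are preference lists (best first) over 1..n.

    Returns bound, where bound[c] is the employer matched to candidate c.
    """
    n = len(epref) - 1

    # rank[c][e] = position of employer e in c's list (smaller is better).
    # Built once, in O(n^2) time.
    rank = [[0] * (n + 1) for _ in range(n + 1)]
    for c in range(1, n + 1):
        for pos, e in enumerate(cpref[c]):
            rank[c][e] = pos

    nxt = [0] * (n + 1)           # nxt[e]: index in epref[e] of e's next offer
    bound = [0] * (n + 1)         # bound[c]: employer c is bound to, 0 if free
    free = list(range(n, 0, -1))  # stack of free employers

    while free:
        e = free.pop()            # first free employer
        c = epref[e][nxt[e]]      # best candidate e has not yet made an offer to
        nxt[e] += 1
        if bound[c] == 0:
            bound[c] = e          # c was free
        elif rank[c][e] < rank[c][bound[c]]:
            free.append(bound[c]) # c prefers e: the old employer becomes free
            bound[c] = e
        else:
            free.append(e)        # c prefers their current employer: e stays free
    return bound
```

Each pass through the loop does constant work and advances some `nxt[e]`, so there are at most n^2 iterations, matching the summary above.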

Graphs

Graphs • This should mainly be review • Graphs provide a convenient way to express relationships between pairs of items

Graphs • Undirected graph G = (V, E) – V = set of vertices or nodes – An edge is a set of two vertices – E ⊆ V × V, set of edges in the graph • Directed graph G = (V, E) – V = set of vertices or nodes – An edge is an ordered pair of vertices – E ⊆ V × V, set of edges in the graph • Assume undirected graphs by default

Terminology • (Assuming undirected graph) • Nodes are neighbors if there is an edge between them • An edge e is incident to vertex v if v ∈ e

Examples • Transportation networks • Communication networks • Web links • Social networks • Etc.

Graphs in pictures • [Figure: undirected graph on nodes 1, 2, 3, 4] • V = {1, 2, 3, 4} • E = { {1, 2}, {1, 3}, {1, 4}, {2, 3} }

Graphs in pictures • [Figure: directed graph on nodes 1, 2, 3, 4] • V = {1, 2, 3, 4} • E = { (1, 2), (2, 1), (3, 1), (1, 4), (3, 2) }

Graphs in pictures • Pictures can be useful for getting intuition… • …but are useless for 100+-node graphs!

Representing graphs • Two natural ways to represent n-node graphs – Adjacency matrix – Adjacency list

Adjacency matrix • 2-D array G of dimension |V| × |V| • G[i, j] = 1 iff there is an edge from i to j (G is undirected if G[i, j] = G[j, i] for all i, j) • O(1) time to check if {i, j} is an edge • Θ(|V|) time to find all neighbors of a node • Θ(|V|^2) memory

Adjacency list • Length-n array Adj • Adj[i] is the head of a linked list containing the neighbors of i (in arbitrary order) • O(|V|+|E|) memory – An undirected edge is in two lists – |E| = O(|V|^2), and in sparse graphs |E| is much less than |V|^2 • Can find all neighbors of a node in time linear in its number of neighbors • Can take O(|E|) time to check whether there is an edge between nodes i and j
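The two representations can be built from an edge list roughly as follows. This is a sketch assuming an undirected graph with vertices numbered 1, …, n (index 0 is left unused) and Python lists in place of linked lists; the function names `adjacency_matrix` and `adjacency_list` are invented for the example.

```python
def adjacency_matrix(n, edges):
    """(n+1) x (n+1) matrix with g[i][j] == 1 iff {i, j} is an edge."""
    g = [[0] * (n + 1) for _ in range(n + 1)]
    for i, j in edges:
        g[i][j] = 1
        g[j][i] = 1  # undirected: keep the matrix symmetric
    return g

def adjacency_list(n, edges):
    """adj[i] holds the neighbors of i, in arbitrary order."""
    adj = [[] for _ in range(n + 1)]
    for i, j in edges:
        adj[i].append(j)  # each undirected edge appears in two lists
        adj[j].append(i)
    return adj

# The undirected example graph from the "Graphs in pictures" slide.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
g = adjacency_matrix(4, edges)
adj = adjacency_list(4, edges)
```

Here `g[2][4]` answers an edge query in constant time, while `adj[1]` lists the neighbors of node 1 without scanning a full row.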

Directed graphs • The adjacency-matrix representation already handles directed graphs • For adjacency-list representation, convenient to have two linked lists for each vertex v – Edges from v – Edges to v

Paths and connectivity • A path in a directed/undirected graph G is a sequence of nodes v_1, …, v_k such that for all i there is an edge from v_i to v_{i+1} – This is a path from v_1 to v_k – The path is simple if no vertex repeats – A cycle is a path where the starting point and endpoint are the same • The distance from node u to node v is the length of the shortest path from u to v • An undirected graph is connected if there is a path between every pair of nodes • A directed graph is strongly connected if there is a path from any node to any other

Connected component • A subset V’ of vertices in an undirected graph forms a connected component if there is a path between every pair of nodes in V’ and no path from a node in V’ to any node outside V’ • Any graph can be partitioned into a collection of connected components

Trees • An undirected graph is a tree if it is connected and does not contain a cycle – Can “root” a tree at any node • Once a root r is chosen, a parent/child relationship is formed between all nodes having an edge between them based on their distance from r – The root has no parent – Nodes with no children are called leaves

Spanning tree • A spanning tree of a connected component is a subset of the edges that forms a tree containing every vertex of the component – Can be many spanning trees for a given graph • Often useful to form a spanning tree to answer other questions about the graph

Determining connected component • General algorithmic framework for finding the connected component R containing s: – R = {s} – While there is an edge (u, v) with u ∈ R and v ∉ R • Add v to R • This also defines a spanning tree on the connected component containing s – If v added to R because of u, then u is the parent of v • In what order should edges be visited?
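The general framework might be sketched like this, assuming the graph is given as a dict mapping each node to a list of its neighbors. The function name `component` and the `pending` worklist are choices made for this example; the framework itself leaves the visiting order open, and popping from the end of a list is just one arbitrary order.

```python
def component(adj, s):
    """Return parent, whose keys are the nodes of the component containing s.

    parent[v] is the node u whose edge (u, v) caused v to be added,
    so the parent pointers describe a spanning tree rooted at s.
    """
    parent = {s: None}  # the keys of parent play the role of the set R
    pending = [s]       # nodes in R whose edges have not been examined yet
    while pending:
        u = pending.pop()          # arbitrary choice of u in R
        for v in adj[u]:
            if v not in parent:    # edge (u, v) with u in R and v not in R
                parent[v] = u      # add v to R, with u as its parent
                pending.append(v)
    return parent
```

Each edge is examined at most twice (once from each endpoint), so with an adjacency-list representation this runs in time linear in the size of the component.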

Breadth-first search • Explore nodes based on their distance from s – I.e., for i = 0, 1, …, find the set L_i of all nodes at distance i from s • Conceptual pseudocode: – Set L_0 = {s} – For i = 0, 1, … do: • For all t ∈ L_i: • For all neighbors t’ of t that are not already in L_0, …, L_i, add t’ to L_{i+1}

Breadth-first search • A bit more carefully: – R = {s}; label s with 0 – While there is a u ∈ R and v ∉ R with edge (u, v): • Choose such a vertex u with lowest label i • For all v ∉ R that are neighbors of u: add v to R and label v with i+1
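One common way to realize the "lowest label first" rule is a FIFO queue, since nodes enter the queue in nondecreasing label order. The sketch below assumes a dict-of-lists graph; the name `bfs_labels` is made up for this example, and the queue is a swapped-in implementation detail that the slides do not prescribe.

```python
from collections import deque

def bfs_labels(adj, s):
    """Return label, where label[v] is the distance from s to v.

    Only nodes reachable from s appear in the result.
    """
    label = {s: 0}      # the keys of label play the role of R
    queue = deque([s])
    while queue:
        u = queue.popleft()  # unprocessed node with the lowest label
        for v in adj[u]:
            if v not in label:
                label[v] = label[u] + 1
                queue.append(v)
    return label
```

For a path graph 1 - 2 - 3 - 4 started at node 1, the labels are 0, 1, 2, 3, matching the layers L_0, …, L_3 of the conceptual version.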