Complete Network Analysis. Network Connections: Large-Scale network structure

The basic network hypothesis is that the structure of a network affects the likelihood that goods will flow through the network. While direct measures are fine for smaller networks, we often want to generalize to very large-scale network structure. This section covers large-scale network topography and bridges to generalized images of network structure captured by cohesive groups and blockmodels.
We focus on 3 such factors today:
1) Basic structure of large-scale networks
2) Cohesive peer groups
3) Identifying role positions (blockmodels)

Based on Milgram’s (1967) famous work, the substantive point is that networks are structured such that even when most of our connections are local, any pair of people can be connected by a fairly small number of relational steps.

Watts says there are 4 conditions that make the small-world phenomenon interesting:
1) The network is large: O(billions)
2) The network is sparse: people are connected to a small fraction of the total network
3) The network is decentralized: no single star (or small number of stars)
4) The network is highly clustered: most friendship circles overlap

Formally, we can characterize a graph through 2 statistics.
1) The characteristic path length, L: the average length of the shortest paths connecting any two actors. (Note that this only works for connected graphs.)
2) The clustering coefficient, C:
• Version 1: the average local density. That is, Cv = the ego-network density of node v, and C = (sum of Cv over all nodes) / n.
• Version 2: the transitivity ratio. The number of closed triads divided by the number of closed and open triads.
A small-world graph is any graph with a relatively small L and a relatively large C.
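The two statistics above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the lecture: the adjacency-dictionary input, the function names, and the convention that a node with fewer than two neighbors gets Cv = 0 are all assumptions. The transitivity function counts two-paths (connected triples), which is the common way the ratio is computed and may differ slightly from a literal count of triads.

```python
from collections import deque
from itertools import combinations


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs (connected graphs only)."""
    n = len(adj)
    total = 0
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != n:
            raise ValueError("L is only defined for connected graphs")
        total += sum(dist.values())
    return total / (n * (n - 1))


def local_density_clustering(adj):
    """Version 1: average over nodes of the density among each node's neighbors."""
    values = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # assumed convention for isolates and pendants
            continue
        ties = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(ties / (k * (k - 1) / 2))
    return sum(values) / len(values)


def transitivity_ratio(adj):
    """Version 2: share of two-paths (a - v - b) whose end points are also tied."""
    closed = 0
    paths = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            paths += 1
            if b in adj[a]:
                closed += 1
    return closed / paths if paths else 0.0


# Hypothetical example: a triangle (0, 1, 2) with one pendant node (3).
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(characteristic_path_length(example))
print(local_density_clustering(example))
print(transitivity_ratio(example))
```

The two versions of C generally give different values on the same graph, which is why it matters to say which one a reported figure uses.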

The most clustered graph is Watts’s “Caveman” graph:

[Figure: C and L as functions of k for a Caveman graph of n = 1000. Horizontal axis: degree (k). Vertical axes: clustering coefficient and characteristic path length.]

Compared to random graphs, C is large and L is long. The intuition, then, is that clustered graphs tend to have (relatively) long characteristic path lengths. But the small world phenomenon rests on just the opposite: high clustering and short path distances. How is this so?

A model for pair formation, as a function of mutual contacts. (The equation appeared as an image on the original slide.) Using this equation, the parameter α produces networks that range from completely ordered (caveman-like) to random.

[Figure: the region where C is large and L is small identifies small-world (SW) graphs.]

Why does this work? The key is the fraction of shortcuts in the network. In a highly clustered, ordered network, a single random connection creates a shortcut that lowers L dramatically. Watts demonstrates that small-world graphs occur in graphs with a small number of shortcuts.
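The effect of a shortcut can be checked numerically. The sketch below is an illustration under stated assumptions, not Watts's own model: it uses a ring lattice as a stand-in for an ordered, clustered network, and the size (100 nodes), the neighborhood width, and the placement of the single shortcut are all hypothetical choices.

```python
from collections import deque


def ring_lattice(n, half_k):
    """Each node is tied to its half_k nearest neighbors on each side of a ring."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, half_k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of a connected graph."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


lattice = ring_lattice(100, 2)
before = characteristic_path_length(lattice)

# Add one shortcut across the ring (a hypothetical, deliberately long-range tie).
lattice[0].add(50)
lattice[50].add(0)
after = characteristic_path_length(lattice)

print("L before shortcut:", before)
print("L after shortcut: ", after)
```

Because the added tie touches only two nodes, local clustering around almost every node is unchanged while many shortest paths get shorter.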

Ratios of observed (o) to random (r) values:
1) Movie network (actors through movies): Lo/Lr = 1.22, Co/Cr = 2925
2) Western power grid: Lo/Lr = 1.50, Co/Cr = 16
3) C. elegans: Lo/Lr = 1.17, Co/Cr = 5.6

What are the substantive implications? Return to the initial interest in connectivity: disease diffusion.
1) Diseases move more slowly in highly clustered graphs (fig. 11). This is not a new finding.
2) The dynamics are very non-linear, with no clear pattern based on local connectivity.
Implication: small local changes (shortcuts) can have dramatic global outcomes (disease diffusion).

How do we know if an observed graph fits the SW model?
Random expectations: for basic one-mode networks (such as acquaintance nets), we can get approximate random values for L and C as k and n get large:
Lrandom ~ ln(n) / ln(k)
Crandom ~ k / n
Note that C essentially approaches zero as n increases with k held fixed. This formula uses the density-based measure of C, but the substantive implications are similar for the triad formula.
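As a quick arithmetic check of these approximations, the following sketch evaluates both for a given n and k. The function name and the example values (n = 1000, k = 10) are illustrative assumptions, not figures from the lecture.

```python
import math


def random_graph_expectations(n, k):
    """Approximate L and C for a random graph with n nodes and mean degree k."""
    l_random = math.log(n) / math.log(k)
    c_random = k / n
    return l_random, c_random


# Hypothetical network: 1000 people with 10 ties each on average.
l_random, c_random = random_graph_expectations(1000, 10)
print(l_random, c_random)

# Holding k fixed, expected clustering shrinks as the network grows.
for n in (100, 10_000, 1_000_000):
    print(n, random_graph_expectations(n, 10)[1])
```

An observed graph can then be compared against these baselines by forming the ratios Lo/Lr and Co/Cr.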

How do we know if an observed graph fits the SW model? One problem with using the simple formulas for most extant data on large graphs is that, because the data result from people overlapping in groups/movies/publications, necessary clustering results from the assignment to groups.
[Table: person-by-group membership matrix. Rows are people (Amy, Billy, Charlie, Debbie, Elaine, Frank, George, ..., William, Xavier, Yolanda, Zanfir); columns are groups G1 to G5; cells are 0/1 memberships. Column totals: G1 = 12, G2 = 14, G3 = 9, G4 = 14, G5 = 5.]

How do we know if an observed graph fits the SW model?
Newman, M. E. J., Strogatz, S. H., and Watts, D. J. 2001. “Random Graphs with Arbitrary Degree Distributions and Their Applications.” Physical Review E.
This paper extends the formulas for expected clustering and path length using a generating-functions approach, making it possible to calculate E(C, L) for graphs with any degree distribution. Importantly, this procedure also makes it possible to account for clustering in a two-mode graph caused by the distribution of assignment to groups.

How do we know if an observed graph fits the SW model? (Newman, Strogatz, and Watts 2001, continued.)
In the formulas (shown as an image on the original slide), N is the size of the graph, z1 is the average number of people 1 step away (degree), and z2 is the average number of people 2 steps away. Theoretically, these formulas can be used to calculate many properties of the network, including largest component size, based on degree distributions.
A word of warning: the math in these papers is not simple, so sharpen your calculus pencil before reading the paper.
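For reference, the average path length estimate in Newman, Strogatz, and Watts (2001) is usually written as L = ln(N / z1) / ln(z2 / z1) + 1. The sketch below assumes that this is the formula the slide displayed; the function name and the example values are hypothetical.

```python
import math


def nsw_path_length(n, z1, z2):
    """Expected average path length given N, z1 (mean number of first neighbors)
    and z2 (mean number of second neighbors). Requires z2 > z1."""
    return math.log(n / z1) / math.log(z2 / z1) + 1


# Hypothetical values: 1000 nodes, 10 first neighbors, 100 second neighbors.
print(nsw_path_length(1000, 10, 100))
```

When z2 = z1 squared, as in a simple random graph, the expression reduces to ln(N) / ln(z1), which matches the simpler approximation for Lrandom.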

How do we know if an observed graph fits the SW model? Since C is just the transitivity ratio, there are a number of good formulas for calculating the expected value. Using the ratio of complete to (incomplete + complete) triads, we can use the expected values from the triad distribution in PAJEK for a simple graph, or we can use the expected value conditional on the dyad types (if we have directed data) using the formulas in SPAN and Wasserman and Faust (1994).

Across a large number of substantive settings, Barabási points out that the distribution of network involvement (degree) is highly and characteristically skewed.

Many large networks are characterized by a highly skewed distribution of the number of partners (degree).

The scale-free model focuses on the distance-reducing capacity of high-degree nodes, as ‘hubs’ create shortcuts that carry network flow. The diffusion implications of mathematical models based on the preferential attachment model are dim, because the carrying capacity of the network comes to depend entirely on a vanishingly small number of stars, who are statistically hard to find. Thus, random treatment of the network does no good, but targeted treatment does.

The primary mechanism hypothesized to drive a power-law degree distribution is the “preferential attachment” model. This model suggests that new nodes enter the population and connect to current nodes with probability proportional to the current node’s degree. This implies that “the rich get richer” and the graph takes on a decidedly star-like shape.
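The attachment rule described above can be simulated directly. This is a minimal sketch, assuming each new node adds m ties and the process starts from a small complete graph; those choices, the function name, and the seed are illustrative and not taken from the lecture.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per tie, so a uniform draw from the
    # list selects a node with probability proportional to its degree.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(2000, 2, seed=1)
degree = Counter(v for edge in edges for v in edge)
ranked = sorted(degree.values())
print("median degree:", ranked[len(ranked) // 2])
print("maximum degree:", ranked[-1])
```

The gap between the median and the maximum degree is the "rich get richer" skew: most nodes keep roughly the m ties they entered with, while a few early nodes accumulate many.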

Critiques of the scale-free model:
1) The insights are not particularly new, having been anticipated in the epidemiology of STDs for some time.
2) Many of the empirical claims are overstated.
• The most common ‘test’ for a scale-free network is to plot the degree histogram on a log-log scale and fit a regression line to it. This is poor statistical practice, and better models for fitting distributions show that most of the sexual networks are not, in fact, scale free (see Jones and Handcock, "Sexual contacts and epidemic thresholds," Nature 423(6940): 605-606).
3) Theoretically, any degree-based metric has no necessary relation to the arrangement of ties within the network. That is, there are many graphs with identical degree distributions but very different topologies.
• Preferential attachment implies scale free, but not vice versa.
• Finding a power-law degree distribution is really not that useful if there is any kind of blocking structure (focal aspects) to the network.

Colorado Springs High-Risk (sexual contact only)
• The network is approximately scale-free, with λ = -1.3
• But connectivity does not depend on the hubs.

White, D. R. and F. Harary. 2001. "The Cohesiveness of Blocks in Social Networks: Node Connectivity and Conditional Density." Sociological Methodology 31: 305-59.
Moody, James and Douglas R. White. 2003. "Structural Cohesion and Embeddedness: A Hierarchical Conception of Social Groups." American Sociological Review 68: 103-127.
White, Douglas R., Jason Owen-Smith, James Moody, and Walter W. Powell. 2004. "Networks, Fields, and Organizations: Scale, Topology and Cohesive Embeddings." Computational and Mathematical Organization Theory 10: 95-117.
Moody, James. 2004. "The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999." American Sociological Review 69: 213-238.

Analytically, most work on connectivity has focused on summaries of completely local properties (degree distributions or clustering). We turn the argument around and ask: what features of a network are essential for holding the whole structure together?
Def. 1: “A collectivity is cohesive to the extent that the social relations of its members hold it together.”
What network pattern embodies all the elements of this intuitive definition?

This definition contains 5 essential elements:
1. Focuses on what holds the group together
2. Expressed as a group-level property
3. The conception is continuous
4. Rests on observable social relations
5. Applies to groups of any size

1) Actors must be connected: a collection of isolates is not cohesive.
[Figure: two example graphs, labeled "Not cohesive" and "Minimally cohesive: a single path connects everyone".]

1) Reachability is an essential element of relational cohesion. As more paths re-link actors in the group, the ability to ‘hold together’ increases. The important feature is not the density of relations, but the pattern. Cohesion increases as the number of paths connecting people increases.

Consider the minimally cohesive group (D = .25): moving a line keeps density constant, but changes reachability.

What if density increases, but through a single person? (Density rises from D = .25 to D = .39.) Removal of 1 person destroys the group.

Cohesion increases as the number of independent paths in the network increases. Ties through a single person are minimally cohesive.
[Figure: two graphs with the same density. Left: D = .39, minimal cohesion. Right: D = .39, more cohesive.]

Substantive differences between networks connected through a single actor and those connected through many:
Minimally cohesive                 Strongly cohesive
Power is centralized               Power is decentralized
Information is concentrated        Information is distributed
Expect actor inequality            Actor equality
Vulnerable to unilateral action    Robust to unilateral action
Segmented structure                Even structure
Def. 2: “A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together.”

Def. 2: “A group is structurally cohesive to the extent that multiple independent relational paths among all pairs of members hold it together.”
[Figure: example graphs with node connectivity 0, 1, 2, and 3.]

Formalize the argument: if there is a path between every pair of nodes in a graph, the graph is connected, and is called a component. In every component, the paths linking actors i and j must pass through a set of nodes, S, that if removed would disconnect the graph. The number of nodes in the smallest S is equal to the number of independent paths connecting i and j.

The relation between cut-set size and number of paths (recall our discussion of bicomponents) leads to the two versions of our final definition:
Def. 3a: “A group’s structural cohesion is equal to the minimum number of actors who, if removed from the group, would disconnect the group.”
Def. 3b: “A group’s structural cohesion is equal to the minimum number of independent paths linking each pair of actors in the group.”
These two definitions are equivalent.
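Def. 3a can be applied literally on a small graph by trying every possible set of actors to remove. The sketch below is a brute-force illustration only (it is exponential in group size and is not the algorithm used in the cited papers); the function names and the example graphs are assumptions.

```python
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency dictionary from a list of ties."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(adj, removed=()):
    """True if the graph is still connected after dropping the removed nodes."""
    removed = set(removed)
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def structural_cohesion(adj):
    """Smallest number of actors whose removal disconnects the group (Def. 3a).
    A clique of n actors cannot be disconnected and is scored n - 1."""
    n = len(adj)
    for size in range(0, n - 1):
        for cut in combinations(adj, size):
            if not is_connected(adj, cut):
                return size
    return max(n - 1, 0)


path = from_edges([("a", "b"), ("b", "c")])
cycle = from_edges([(0, 1), (1, 2), (2, 3), (3, 0)])
clique = from_edges([(i, j) for i in range(4) for j in range(i)])
# Two triangles sharing one actor: everyone has at least 2 ties,
# yet removing the shared actor splits the group.
bow_tie = from_edges([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)])

for name, g in [("path", path), ("cycle", cycle), ("clique", clique), ("bow tie", bow_tie)]:
    print(name, structural_cohesion(g))
```

The bow-tie case shows why degree alone is not enough: the minimum degree is 2, but a single actor holds the structure together.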

Some graph-theoretic properties of k-components:
1) Every member of a k-component must have at least k ties. If a person has fewer than k ties, then there are fewer than k paths connecting them to the rest of the network.
2) A graph where every person has k ties is not necessarily a k-component. That is, (1) does not work in reverse: structures can have high degree but low connectivity.
3) Two k-components can overlap by at most k-1 members. If they overlapped by more than k-1 members, there would be at least k paths connecting the two components, and they would form a single k-component.
4) A clique is n-1 connected.
5) k-components can be nested, such that a k+l component is contained within a k-component.

Nested connectivity sets: an operationalization of embeddedness.
[Figure: example graph with nodes numbered 1 to 23.]

“‘Embeddedness’ refers to the fact that economic action and outcomes, like all social action and outcomes, are affected by actors’ dyadic (pairwise) relations and by the structure of the overall network of relations. As a shorthand, I will refer to these as the relational and the structural aspects of embeddedness. The structural aspect is especially crucial to keep in mind because it is easy to slip into ‘dyadic atomization,’ a type of reductionism.” (Granovetter 1992: 33, italics in original)

[Figure: nesting of connectivity sets in graph G]
G
  {7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
    {7, 8, 11, 14}
  {1, 2, 3, 4, 5, 6, 7, 17, 18, 19, 20, 21, 22, 23}
    {1, 2, 3, 4, 5, 6, 7}
    {17, 18, 19, 20, 21, 22, 23}

Empirical examples:
a) Embeddedness and school attachment
b) Political similarity among large American firms

[Figure: School Attachment]

[Figure: Business Political Action]

Theoretical implications:
• Resource and risk flow: structural cohesion increases the probability of diffusion in a network, particularly if flow depends on individual behavior (as opposed to edge capacity).

Structural cohesion also provides a new way of thinking about STD cores.
[Figure: Project 90, sex-only network (n = 695); 3-component (n = 58).]

IV drug sharing: largest BC: 247; k > 4: 318; max k: 12.
Structural cohesion simultaneously gives us a positional and a subgroup analysis.
[Figure: connected bicomponents.]

Development of STD cores in low-degree networks: rapid transition without stars.

Complete Network Analysis. Network Connections: Social Subgroups
A primary interest in social network analysis is the identification of “significant social subgroups”: some smaller collection of nodes in the graph that can be considered, at least in some senses, as a “unit” based on the pattern, strength, or frequency of ties. There are many ways to identify groups. They all insist on a group being in a connected component, but other than that the variation is wide.

A) Graph-theoretical methods: cliques and extensions of cliques
• Cliques
• k-cores
• k-plexes
• Freeman (1992) models
• k-components
B) Algorithmic methods: search through a network trying to maximize a particular pattern. Adjust the assignment of actors to groups until a particular pattern of ties (usually block diagonal) is identified.
• Standard models:
  - Factions (UCINET)
  - NEGOPY (Richards)
  - KliqueFinder (Frank)
  - RNM (Moody)
  - CROWDS (Moody)
  - General distance and clustering methods

Graph-theoretical models. Start with a clique. A clique is defined as a maximal subgraph in which every member is connected to every other member. Cliques are collections of nodes where density = 1.0.
Properties of cliques:
• Density: 1.0
• Everyone is connected to n-1 alters
• The distance between every pair is 1
• The ratio of within-group ties to between-group ties is infinite
• All triads are transitive

Graph-theoretical models. In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in size. Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See Moody and White (2003) for a discussion of these attempts.

Graph-theoretical models. k-cores: every person is connected to at least k other people. Ideally, they would look something like this (here, two 3-cores). However, adding a single tie from A to B would make the whole graph a 3-core.
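The sensitivity of k-cores to a single tie can be reproduced with a short pruning routine. This is a sketch under stated assumptions: the two groups are modeled as two separate 4-person cliques, the node labels (A1, B1, and so on) are hypothetical, and connected pieces of the k-core are treated as the resulting groups.

```python
def clique(members):
    """Adjacency dictionary in which every member is tied to every other."""
    return {v: {u for u in members if u != v} for v in members}


def k_core(adj, k):
    """Repeatedly drop anyone with fewer than k ties to the remaining set."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def connected_groups(adj, nodes):
    """Split a node set into its connected pieces."""
    nodes = set(nodes)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in group:
                continue
            group.add(u)
            stack.extend((adj[u] & nodes) - group)
        seen |= group
        groups.append(group)
    return groups


# Two hypothetical 3-cores with no ties between them.
graph = {}
graph.update(clique(["A1", "A2", "A3", "A4"]))
graph.update(clique(["B1", "B2", "B3", "B4"]))
separate = connected_groups(graph, k_core(graph, 3))
print(len(separate), "groups before the bridging tie")

# One tie from A1 to B1 merges everything into a single 3-core.
graph["A1"].add("B1")
graph["B1"].add("A1")
merged = connected_groups(graph, k_core(graph, 3))
print(len(merged), "group after the bridging tie")
```

The merged result is the weakness noted on the slide: the k-core criterion looks only at each person's number of ties, so one bridging tie is enough to fuse two otherwise distinct groups.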

Extensions of this idea include:
k-core: every person has ties to at least k other people in the set.
k-plex: every member is connected to at least n-k other people in the group (recall that in a clique everyone is connected to n-1, so this relaxes that condition).
n-clique: every pair of members is connected by a path of length n or less (recall that a clique has distance = 1).
n-clan: same as an n-clique, but all paths must be inside the group.
I’ve never had much luck with any of these methods empirically; real data are usually too messy for them to work well. Since many of the graph-theoretic options seem not to work well, authors have used optimization techniques that attempt to identify groups iteratively.

Algorithmic approaches to identifying primary groups:
1) Measures of fit. To identify a primary group, we need some measure of how clustered the network is. Usually, this is a function of the ratio of the number of ties that fall within groups to the number of ties that fall between groups.
2.1) Processes designed to maximize (1). Once we have such an index, we need a method for searching through the network to maximize the fit.
2.2) Generalized cluster analysis. In addition to maximizing a group function such as (1), we can use the relational distance directly and look for clusters in the data.

Segregation Index (Freeman, L. C. 1978. "Segregation in Social Networks." Sociological Methods and Research 6: 411-30.)
Freeman asked how we could identify segregation in a social network. Theoretically, he argues, if a given attribute (group label) does not matter for social relations, then relations should be distributed randomly with respect to the attribute. Thus, the difference between the number of cross-group ties expected by chance and the number observed measures segregation.

Consider the (hypothetical) network below. There are two attributes in this network: people with Blue eyes and Brown eyes and people who are square or not (they must be hip).

Segregation Index. Mixing matrices:
          Blue   Brown
Blue         6      17
Brown       17      16
Seg = -0.25
          Hip   Square
Hip        20        3
Square      3       30
Seg = 0.78

Segregation Index. To calculate the expected number of ties, we use the standard formula for a contingency table: row marginal * column marginal / total. In matrix form: E(X) = R*C/T
Observed:
          Blue   Brown   Total
Blue         6      17      23
Brown       17      16      33
Total       23      33      56
Expected:
          Blue    Brown   Total
Blue      9.45    13.55      23
Brown    13.55    19.45      33
Total       23       33      56

Segregation Index, eye color (Blue / Brown):
Observed cross-group ties: X = 17 + 17 = 34
Expected cross-group ties: E(X) = 13.55 + 13.55 = 27.1
Seg = (27.1 - 34) / 27.1 = -6.9 / 27.1 = -0.25

Segregation Index, Hip / Square:
Observed:
          Hip   Square   Total
Hip        20        3      23
Square      3       30      33
Total      23       33      56
Expected:
          Hip    Square   Total
Hip      9.45     13.55      23
Square  13.55     19.45      33
Total      23        33      56
Observed cross-group ties: X = 3 + 3 = 6
Expected cross-group ties: E(X) = 13.55 + 13.55 = 27.1
Seg = (27.1 - 6) / 27.1 = 21.1 / 27.1 = 0.78
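The two worked examples can be reproduced from the mixing matrices alone. The sketch below follows the steps shown on the slides (expected counts from the marginals, then the relative shortfall of cross-group ties); the function name and the list-of-lists input format are assumptions.

```python
def segregation_index(mixing):
    """(expected cross-group ties - observed cross-group ties) / expected,
    with expected counts computed as row total * column total / grand total."""
    size = len(mixing)
    total = sum(sum(row) for row in mixing)
    row_totals = [sum(row) for row in mixing]
    col_totals = [sum(col) for col in zip(*mixing)]
    expected_cross = sum(
        row_totals[i] * col_totals[j] / total
        for i in range(size) for j in range(size) if i != j
    )
    observed_cross = sum(
        mixing[i][j] for i in range(size) for j in range(size) if i != j
    )
    return (expected_cross - observed_cross) / expected_cross


eye_color = [[6, 17], [17, 16]]    # Blue / Brown mixing matrix from the slides
hip_square = [[20, 3], [3, 30]]    # Hip / Square mixing matrix from the slides

print(round(segregation_index(eye_color), 2))    # -0.25
print(round(segregation_index(hip_square), 2))   # 0.78
```

A negative value means there are more cross-group ties than chance would predict; a value near 1 means almost all ties stay within groups.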

Segregation Index. One problem with the segregation index is that it is not ‘margin free.’ That is, if you were to change the distribution of the category of interest (say race) by a constant but not the core association between race and friendship choice, you can get a different segregation level. One antidote to this problem is to use odds ratios. In this case, an odds ratio tells us the relative likelihood that two people in the same category will choose each other as friends.

Odds Ratios. The odds ratio tells us how much more likely people in the same group are to nominate each other. You calculate the odds ratio based on the number of ties in a group and their relative size, based on the following table:
               Same group   Different group
Friends             A               B
Not friends         C               D
OR = AD / BC

Odds Ratios. Observed ties:
          Hip   Square   Total
Hip        20        3      23
Square      3       30      33
Total      23       33      56
There are 6 hip people and 9 square people in this network. This implies the following number of possible ties in the network:
          Hip   Square
Hip        30       54
Square     54       72
(Diagonal = ni(ni - 1); off-diagonal = ni * nj)
               Same group   Different group
Friends            50               6
Not friends        52             102
OR = (50 * 102) / (52 * 6) = 16.35
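The same calculation can be written as a small function that takes the group sizes and the observed mixing matrix. This is a sketch of the arithmetic on the slide; the function name and argument layout are assumptions.

```python
def same_group_odds_ratio(sizes, mixing):
    """Odds of a tie within a group relative to the odds of a tie across groups."""
    groups = range(len(sizes))
    # Possible directed ties: n_i * (n_i - 1) within a group, n_i * n_j across.
    possible_same = sum(sizes[i] * (sizes[i] - 1) for i in groups)
    possible_diff = sum(sizes[i] * sizes[j] for i in groups for j in groups if i != j)
    a = sum(mixing[i][i] for i in groups)                          # friends, same group
    b = sum(mixing[i][j] for i in groups for j in groups if i != j)  # friends, different group
    c = possible_same - a                                          # not friends, same group
    d = possible_diff - b                                          # not friends, different group
    return (a * d) / (b * c)


# 6 hip and 9 square people, with the observed Hip / Square mixing matrix.
odds = same_group_odds_ratio([6, 9], [[20, 3], [3, 30]])
print(round(odds, 2))   # 16.35
```

Because the possible-tie counts depend on the group sizes, the odds ratio adjusts for unequal margins in a way the segregation index does not.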

Complete Network Analysis Network Connections: Social Subgroups Friendship Segregation Index. Segregation index compared to the odds ratio: r = .95. (Figure: friendship segregation index plotted against Log(Same-Sex Odds Ratio).)

Complete Network Analysis Network Connections: Social Subgroups The segregation index is one metric used to identify groups. Others include:
a) The ratio of in-group to out-group ties (NEGOPY, UCINET Factions)
b) Maximizing the probability of in-group contact (KliqueFinder)
c) The Segregation Matrix Index (SMI)
d) The dyadic factor loadings for overlapping groups (akin to a latent class model)
e) Minimizing the within-group distance
Once a metric has been chosen, some algorithm is needed to search through the graph to identify clusters. These algorithms range from very sophisticated “graph-intelligent” algorithms, such as NEGOPY, to simple cluster analysis of distance matrices. In most cases, you have to pre-set the number of groups to use (the exceptions are NEGOPY and KliqueFinder). Moody’s CROWDS algorithm also has automatic stopping criteria, but you have to give it starting values.

Complete Network Analysis Network Connections: Social Subgroups In practice, the different algorithms will give different results. Here, I compare the NEGOPY results to the Recursive Neighborhood Means (RNM) results. NEGOPY returned one large group, while RNM found many smaller, denser groups. It’s usually a good idea to explore multiple solutions and algorithms.

Complete Network Analysis Network Connections: Social Subgroups Gagnon Prison Network In practice, the different algorithms will give different results. Here, I compare NEGOPY, FACTIONS and RNM. Groups A and B are identical, and C is close. F, E and D differ. It’s usually a good idea to explore multiple solutions and algorithms. (All solutions constrained to 6 groups.)

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis In addition to tools like FACTIONS, we can use the distance information contained in a network to cluster observations that are ‘close’ to each other. In general, cluster analysis is a set of techniques that allows you to identify collections of objects that are similar to each other in some degree. A very good reference is the SAS/STAT manual section called “Introduction to clustering procedures” (http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm). (See also Wasserman and Faust, though the coverage is spotty.) We are going to start with the general problem of hierarchical clustering applied to any set of analytic objects based on similarity, and then transfer that to clustering nodes in a network.

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis Imagine a set of objects (say people) arrayed in a two dimensional space. You want to identify groups of people based on their position in that space. How do you do it? (Figure: scatter plot of people; axes labeled “How Smart you are” and “How Cool you are”.)

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis Start by choosing a pair of people who are very close to each other (such as 15 & 16) and now treat that pair as one point, with a value equal to the mean position of the two nodes.

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis Now repeat that process for as long as possible.

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis This process is captured in the cluster tree (called a dendrogram)
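The merge-and-repeat process described on the last three slides can be sketched as follows. This is an illustrative sketch, not the procedure SAS runs: it follows the slide's description literally (replace the closest pair with its mean position, weighted by cluster size), which corresponds to a centroid rule rather than average distance or Ward's method. The function name and the five example points are made up.

```python
import math


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    # Each cluster is (list of original point indices, centroid)
    clusters = [([i], tuple(p)) for i, p in enumerate(points)]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(clusters[a][1], clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        members_a, centroid_a = clusters[a]
        members_b, centroid_b = clusters[b]
        na, nb = len(members_a), len(members_b)
        # Treat the merged pair as one point at the size-weighted mean position
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(centroid_a, centroid_b))
        history.append((sorted(members_a), sorted(members_b), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((members_a + members_b, merged))
    return history


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # hypothetical positions
for members_a, members_b, d in agglomerate(points):
    print(members_a, "+", members_b, "at distance", round(d, 2))
```

Reading the history from top to bottom gives the dendrogram from the leaves up: early merges join close points, late merges join distant clusters.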

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis As with the network cluster algorithms, there are many options for clustering. The three that I use most are:
• Ward’s Minimum Variance -- the one I use almost 95% of the time
• Average Distance -- the one used in the example above
• Median Distance -- very similar
The SAS manual is the best single place I’ve found for information on each of these techniques.
Some things to keep in mind:
• Units matter. The example above draws together pairs horizontally because the range there is smaller. Get around this by standardizing your data.
• This is an inductive technique. You can find clusters in a purely random distribution of points. Consider the following example.

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis The data in this scatter plot are produced using this code:

    /* 20 points whose x and y are independent standard normal draws */
    data random;
      do i=1 to 20;
        x=rannor(0);
        y=rannor(0);
        output;
      end;
    run;

Complete Network Analysis Network Connections: Social Subgroups Resulting dendrogram

Complete Network Analysis Network Connections: Social Subgroups Resulting cluster solution

Complete Network Analysis Network Connections: Social Subgroups Cluster analysis works by building a distance matrix between each pair of points. In the example above, it used the Euclidean distance, which in two dimensions is simply the physical distance between the points in a plot. The same approach works in any number of dimensions. To use cluster analysis in a network, we base the distance on the path distance between pairs of people in the network. Consider again the blue-eye hip example:

Complete Network Analysis Network Connections: Social Subgroups Distance Matrix (Table: path distances between every pair of nodes in the example network, with 0 on the diagonal and distances from 1 to 4 elsewhere.)
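A path-distance matrix of this kind can be computed with a breadth-first search from every node. The sketch below is a minimal illustration on a made-up four-node network; the function name `path_distances` and the node labels are assumptions, and unreachable pairs are simply left out of the result.

```python
from collections import deque


def path_distances(adj):
    """adj maps each node to its neighbours; returns dist[source][target] in steps."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    # First time we reach this neighbour, so this is the shortest path
                    seen[neighbour] = seen[node] + 1
                    queue.append(neighbour)
        dist[source] = seen
    return dist


# Hypothetical chain a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
dist = path_distances(adj)
for source in adj:
    print(source, [dist[source][target] for target in adj])
```

The resulting matrix can be handed to a clustering routine in place of a Euclidean distance matrix.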

Complete Network Analysis Network Connections: Social Subgroups The distance matrix implies a space that nodes are embedded within. Using something like MDS, we can represent the space implied by the distance matrix in two dimensions. This is the image of the network you would get if you did that.

Complete Network Analysis Network Connections: Social Subgroups When you use variables, the cluster analysis program generates a distance matrix. We can instead use the network distance matrix directly. If we do that with this example network, we get the following:

Complete Network Analysis Network Connections: Social Subgroups

Complete Network Analysis Network Connections: Social Subgroups The CROWDS algorithm combines the density approach above with an initial cluster analysis and a routine for determining how many clusters are in the network. It does so by using the Segregation index and all of the information from the cluster hierarchy, combining two groups only if it improves the segregation fit for both groups.

Complete Network Analysis Network Connections: Social Subgroups The one other program you should know about is NEGOPY. NEGOPY combines elements of the density-based approach and the graph-theoretic approach to find groups and positions. Like CROWDS, NEGOPY assigns people both to groups and to ‘outsider’ or ‘between’ group positions. NEGOPY also determines how many groups are in the network, though in my experience it often finds a single large group.

Complete Network Analysis Network Connections: Social Subgroups The Recursive Neighborhood Means algorithm creates the variables that are then used in the cluster analysis to identify groups based on a simulated peer influence process.
• Start by assigning every node a random value on k variables
• Then calculate the average for each variable for the people each person is tied to
• Repeat this process many times
This results in people who have many ties to each other having similar values on the k random variables. This similarity then gets picked up in a cluster analysis.
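The three steps above can be sketched as follows. This is a simplified illustration, not the original RNM program: the function names, the uniform random start, the choice of three passes, and the two-clique example network are all assumptions.

```python
import random


def random_start(nodes, k, seed=0):
    """Step 1: give every node a random value on each of k variables."""
    rng = random.Random(seed)
    return {v: [rng.random() for _ in range(k)] for v in nodes}


def neighborhood_means(values, adj):
    """Step 2: replace each node's values with the mean over the people it is tied to."""
    new = {}
    for v, neighbours in adj.items():
        neighbours = list(neighbours)
        if not neighbours:
            new[v] = list(values[v])  # isolates keep their current values
            continue
        k = len(values[v])
        new[v] = [sum(values[u][d] for u in neighbours) / len(neighbours) for d in range(k)]
    return new


def rnm(adj, k=2, iterations=3, seed=0):
    """Step 3: repeat the averaging; the result feeds a cluster analysis."""
    values = random_start(adj, k, seed)
    for _ in range(iterations):
        values = neighborhood_means(values, adj)
    return values


# Hypothetical network: two 4-person cliques joined by the single tie 3 - 4
adj = {v: [u for u in range(4) if u != v] for v in range(4)}
adj.update({v: [u for u in range(4, 8) if u != v] for v in range(4, 8)})
adj[3].append(4)
adj[4].append(3)
for node, vals in rnm(adj).items():
    print(node, [round(x, 3) for x in vals])
```

In this simple version, members of the same clique move toward a shared value within a few passes. Values across a connected network also tend to drift toward each other if the averaging is repeated long enough, so the number of passes is a choice worth checking against the clustering result.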

Complete Network Analysis Network Connections: Social Subgroups Example of the RNM procedure (Figure: three panels labeled Time 1, Time 2 and Time 3.)

Complete Network Analysis Network Connections: Social Subgroups As an example, consider the process active on a network known to be clustered, starting with k = 2 random variables. You get something like this, where the nodes are now placed according to their resulting values on the 2 variables.

Complete Network Analysis Network Connections: Social Subgroups All of these techniques are inductive procedures. It is possible to specify a deductive test for group membership if you have a priori reasons to assume that a particular set of people is a group. The simplest way would be to specify a dyadic model on the adjacency matrix, then model the probability (strength) of a tie as a function of dyadic characteristics and your indicator for being in the same group. If this parameter is large and significant, then you have evidence for the group. This is, in fact, what KliqueFinder does inductively.

Complete Network Analysis Network Connections: Role Positions Overview
• Social life can be described (at least in part) through social roles.
• To the extent that roles can be characterized by regular interaction patterns, we can summarize roles through common relational patterns.
• Identifying these sets is the goal of block-model analyses.
Nadel: The Coherence of Role Systems
• Background ideas for White, Boorman and Breiger. Social life as interconnected system of roles
• Important feature: thinking of roles as connected in a role system = social structure
White, Harrison C.; Boorman, Scott A., and Breiger, Ronald L. Social Structure from Multiple Networks I. American Journal of Sociology. 1976; 81: 730-780.
• The key article describing theoretical and technical elements of block-modeling

Complete Network Analysis Network Connections: Role Positions Elements of a Role:
• Rights and obligations with respect to other people or classes of people
• Roles require a ‘role complement’: another person who the role-occupant acts with respect to
Examples: Parent – child; Teacher – student; Lover – lover; Friend – Friend; Husband – Wife
Nadel (following functional anthropologists and sociologists) defines ‘logical’ types of roles, and then examines how they can be linked together.

Complete Network Analysis Network Connections: Role Positions White et al: From logical role systems to empirical social structures Start with some basic ideas of what a role is: An exchange of something (support, ideas, commands, etc.) between actors. Thus, we might represent a family as: (Figure: a family network with nodes H, W, C, C, C and three relations: Romantic Love, Provides food for, Bickers with.) There are, of course, many other relations inside a family.

Complete Network Analysis Network Connections: Role Positions The key idea is that we can express a role through a relation (or set of relations) and thus a social system by the inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position. But what aspect? Structural Equivalence: Two actors are structurally equivalent if they have the same types of ties to the same people.

Complete Network Analysis Network Connections: Role Positions Structural Equivalence A single relation

Complete Network Analysis Network Connections: Role Positions Structural Equivalence Graph reduced to positions

Complete Network Analysis Network Connections: Role Positions Blockmodeling: basic steps In any positional analysis, there are 4 basic steps:
1) Identify a definition of equivalence
2) Measure the degree to which pairs of actors are equivalent
3) Develop a representation of the equivalencies
4) Assess the adequacy of the representation

Complete Network Analysis Network Connections: Role Positions 1) Identify a definition of equivalence Structural Equivalence: Two actors are equivalent if they have the same type of ties to the same people.

Complete Network Analysis Network Connections: Role Positions Automorphic Equivalence: Actors occupy indistinguishable structural locations in the network. That is, they are in isomorphic positions in the network. Automorphically equivalent nodes are equivalent with respect to all graph theoretic properties (i.e., degree, number of people reachable, centrality, etc.), which suggests a simple way of using cluster analyses to find these groups.

Complete Network Analysis Network Connections: Role Positions Automorphic Equivalence:

Complete Network Analysis Network Connections: Role Positions Regular Equivalence: Regular equivalence does not require actors to have identical ties to identical actors or to be structurally indistinguishable. Actors who are regularly equivalent have identical ties to and from equivalent actors. If actors i and j are regularly equivalent, then for all relations and for all actors, if i → k, then there exists some actor l such that j → l and k is regularly equivalent to l.

Complete Network Analysis Network Connections: Role Positions Regular Equivalence: There may be multiple regular equivalence partitions in a network, and thus we tend to want to find the maximal regular equivalence partition, the one with the fewest positions.
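One way to make the definition concrete is to test whether a proposed partition is a regular equivalence: every pair of actors in the same class must send ties to the same set of classes and receive ties from the same set of classes. The sketch below is illustrative only; the function name and the small hierarchy are made up, and it checks a given partition rather than searching for the maximal one.

```python
def is_regular_partition(edges, classes):
    """edges: (sender, receiver) pairs; classes: dict mapping node -> class label."""
    sends_to = {v: set() for v in classes}
    receives_from = {v: set() for v in classes}
    for i, k in edges:
        sends_to[i].add(classes[k])        # class of the actor that i sends to
        receives_from[k].add(classes[i])   # class of the actor that k hears from
    signature_of_class = {}
    for v, label in classes.items():
        signature = (frozenset(sends_to[v]), frozenset(receives_from[v]))
        # Every member of a class must share the first member's signature
        if signature_of_class.setdefault(label, signature) != signature:
            return False
    return True


# Hypothetical hierarchy: 0 supervises 1 and 2; 1 supervises 3 and 4; 2 supervises 5
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]
by_level = {0: "top", 1: "middle", 2: "middle", 3: "bottom", 4: "bottom", 5: "bottom"}
print(is_regular_partition(edges, by_level))  # True
```

Note that actors 1 and 2 are regularly equivalent here even though they supervise different numbers of people, which structural equivalence would not allow.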

Complete Network Analysis Network Connections: Role Positions Role or Local Equivalence: While most equivalence measures focus on position within the full network, some measures focus only on the patterns within the local tie neighborhood. These have been called ‘local role’ equivalence.
Note that:
• Structurally equivalent actors are automorphically equivalent.
• Automorphically equivalent actors are regularly equivalent.
• Structurally equivalent and automorphically equivalent actors are role equivalent.
In practice, we tend to ignore some of these distinctions, as they get blurred quickly once we have to operationalize them in real-world graphs. It turns out that few people are ever exactly equivalent, and thus we approximate the links between the types. In all cases, the procedure can work over multiple relations simultaneously. The process of identifying positions is called blockmodeling, and requires identifying a measure of similarity among nodes.

Complete Network Analysis Network Connections: Role Positions Once you identify equivalent actors, block them in the matrix and reduce it, based on the number of ties in the cell of interest. The key values are a zero block (no ties) and a one-block (all ties present). (Tables: the adjacency matrix permuted into blocks, and the reduced image matrix over positions 1 to 6.) Structural equivalence thus generates 6 positions in the network.

Complete Network Analysis Network Connections: Role Positions Once you partition the matrix, reduce it. (Tables: the blocked adjacency matrix and the reduced image matrix over positions 1 to 3.) Regular equivalence (here I placed a one in the image matrix if there were any ties in the ij block).

Complete Network Analysis Network Connections: Role Positions Operationally, you have to measure the similarity between actors. If two actors are structurally equivalent, then they will have identical ties to other people. Consider the example again: (Table: the blocked adjacency matrix, with the ties of actors C and D compared entry by entry in a Match column; Sum: 12.) C and D match on all 12 other people, and are thus structurally equivalent.
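The matching count described here can be sketched directly. The code below is a minimal illustration on a made-up four-node star network, not the example from the slide; the function names are hypothetical, and ties between the two actors being compared are ignored.

```python
def match_count(adj, i, j):
    """Number of other actors k to whom i and j have identical ties, sent and received."""
    matches = 0
    for k in range(len(adj)):
        if k in (i, j):
            continue
        if adj[i][k] == adj[j][k] and adj[k][i] == adj[k][j]:
            matches += 1
    return matches


def structurally_equivalent(adj, i, j):
    """True when i and j match on every one of the n - 2 other actors."""
    return match_count(adj, i, j) == len(adj) - 2


# Hypothetical star: actor 0 is tied to 1, 2 and 3, who have no ties to each other
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(match_count(star, 1, 2), structurally_equivalent(star, 1, 2))  # 2 True
print(match_count(star, 0, 1), structurally_equivalent(star, 0, 1))  # 0 False
```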

Complete Network Analysis Network Connections: Role Positions If the model is going to be based on asymmetric or multiple relations, you simply stack the various relations, usually including both “directions” of asymmetric relations. (Figure: the family network of H, W, C, C, C with the relations Romantic Love, Provides food for and Bickers with; tables: the Romance, Feeds and Bicker adjacency matrices and the Stacked matrix built from them.)
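Stacking can be sketched as placing each relation's matrix and its transpose on top of one another, so that each column of the result holds one actor's full set of ties. The three family matrices below are made up for illustration (they are a guess at the kind of pattern the figure shows, not a copy of it), and the function names are hypothetical.

```python
def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def stack_relations(relations):
    """Stack every relation and its transpose vertically (both directions of each tie)."""
    stacked = []
    for matrix in relations:
        stacked.extend(list(row) for row in matrix)
        stacked.extend(transpose(matrix))
    return stacked


def profile(stacked, actor):
    """One actor's column of the stacked matrix: all ties sent and received."""
    return [row[actor] for row in stacked]


# Assumed family of five: H, W and three children (indices 0 to 4)
romance = [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [0] * 5, [0] * 5, [0] * 5]
feeds = [[0, 0, 1, 1, 1], [0, 0, 1, 1, 1], [0] * 5, [0] * 5, [0] * 5]
bickers = [[0] * 5, [0] * 5, [0, 0, 0, 1, 1], [0, 0, 1, 0, 1], [0, 0, 1, 1, 0]]

stacked = stack_relations([romance, feeds, bickers])
print(len(stacked), "rows by", len(stacked[0]), "columns")
print(profile(stacked, 2))  # the first child's ties across all three relations
```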

Complete Network Analysis Network Connections: Role Positions The metric used to measure structural equivalence by White, Boorman and Breiger is the correlation between each node’s set of ties. For the example, this would be: (Table: node-by-node correlation matrix, with entries ranging from -1.00 to 1.00.) Another common metric is the Euclidean distance between pairs of actors, which you then use in a standard cluster analysis.
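The correlation measure can be sketched as a Pearson correlation between tie profiles. The sketch below makes several simplifying assumptions that the slide does not state: each profile is the actor's row followed by its column, diagonal cells are left in, and a profile with no variation is given a correlation of 0. The example network is a made-up four-node star.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: report 0 when a profile is constant
    return sxy / math.sqrt(sxx * syy)


def profile_correlations(adj):
    """Correlate each pair of actors on ties sent (row) and ties received (column)."""
    n = len(adj)
    profiles = [list(adj[i]) + [adj[k][i] for k in range(n)] for i in range(n)]
    return [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
for row in profile_correlations(star):
    print([round(value, 2) for value in row])
```

In this toy case the three peripheral actors correlate perfectly with each other and perfectly negatively with the center.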

Complete Network Analysis Network Connections: Role Positions The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations. CONCOR iteration 1: (Table: correlation matrix after one iteration; off-diagonal entries range in magnitude from about .55 to 1.0.)

Complete Network Analysis Network Connections: Role Positions CONCOR iteration 2: (Table: correlation matrix after two iterations; off-diagonal entries range in magnitude from about .94 to 1.0.)

Complete Network Analysis Network Connections: Role Positions CONCOR iteration 3: (Table: correlation matrix after three iterations; every entry is now 1.00 or -1.0.)
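The iteration shown on these slides can be sketched as repeatedly correlating the rows of the previous correlation matrix until every entry is close to +1 or -1, then splitting the actors by sign. This is a minimal illustration of a single two-way split under stated assumptions (column profiles only, a correlation of 0 for constant profiles, a fixed iteration cap); the function names are made up and it is not the original CONCOR program.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: report 0 when a profile is constant
    return sxy / math.sqrt(sxx * syy)


def concor_split(matrix, max_iter=50, tol=1e-6):
    """Iterate correlations of correlations, then split actors by the sign pattern."""
    n = len(matrix)
    columns = [[matrix[r][c] for r in range(n)] for c in range(n)]
    corr = [[pearson(columns[i], columns[j]) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if all(abs(abs(value) - 1.0) < tol for row in corr for value in row):
            break  # converged: every entry is +1 or -1
        corr = [[pearson(corr[i], corr[j]) for j in range(n)] for i in range(n)]
    with_first = [i for i in range(n) if corr[0][i] > 0]
    others = [i for i in range(n) if corr[0][i] <= 0]
    return with_first, others


star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(concor_split(star))  # ([0], [1, 2, 3])
```

Finer partitions come from applying the same split again within each of the two resulting groups.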

Complete Network Analysis Network Connections: Role Positions Automorphic and Regular equivalence are more difficult to find, and require iteratively searching over possible class assignments for sets that have the same graph theoretic patterns. One usually starts with a set of nodes defined as similar on a number of network measures, then looks within these classes for automorphic equivalence classes. A theoretically appealing method for finding structures that are very similar to regular equivalence, role equivalence, uses the triad census. Each node is involved in (n-1)(n-2)/2 triads, and occupies a particular position in each of these triads.

Complete Network Analysis Network Connections: Role Positions Triadic Position Census: 40 Positions within all on two types of mutual ties

Complete Network Analysis Network Connections: Role Positions Moving from a similarity/distance matrix to a blockmodel: number of groups and determining blocks: “An important decision in an analysis using CONCOR is how fine the partition should be; in other words, when should one stop splitting positions? Theory and the interpretability of the solution are the primary consideration in deciding how many positions to produce. ” (W&F, p. 378) “In defining positions of actors, the ‘trick’ is to choose the point along the series that gives a useful and interpretable partition of the actors into equivalence classes. ” (W&F p. 383)

Complete Network Analysis Network Connections: Role Positions Once you have decided on a number of blocks, you need to determine what counts as a ‘one’ block or a ‘zero’ block. Usually this is some function of the density of the resulting block. General rules:
• “Fat Fit”: Only put a one in blocks with all ones in the adjacency matrix
• “Lean Fit”: Put a zero if all the cells are zero, else put a one
• “Density fit”: Put a one if the average value of the cell is above a certain cutoff
White, Boorman and Breiger used a ‘lean fit’ (zeroblock) rule for the examples in their paper.
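The three rules can be sketched as one function that reduces an adjacency matrix to an image matrix. This is an illustrative sketch: the function names, the default cutoff of 0.5, and the small example are assumptions, and diagonal cells are excluded when computing block densities.

```python
def block_density(adj, rows, cols):
    """Share of possible ties present from actors in rows to actors in cols."""
    cells = [adj[i][j] for i in rows for j in cols if i != j]
    return sum(cells) / len(cells) if cells else 0.0


def image_matrix(adj, positions, rule="lean", cutoff=0.5):
    """positions: list of lists of actor indices; returns a 0/1 image matrix."""
    image = []
    for rows in positions:
        image_row = []
        for cols in positions:
            density = block_density(adj, rows, cols)
            if rule == "fat":
                is_one = density == 1.0      # one-block only if every tie is present
            elif rule == "lean":
                is_one = density > 0.0       # zero-block only if no tie is present
            else:
                is_one = density > cutoff    # density fit
            image_row.append(1 if is_one else 0)
        image.append(image_row)
    return image


# Hypothetical network: actor 0 is tied to 1 and 2 but not to 3
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
positions = [[0], [1, 2, 3]]
for rule in ("fat", "lean", "density"):
    print(rule, image_matrix(adj, positions, rule=rule))
```

With one tie missing, the fat fit reports no ties between the two positions, while the lean and density fits both report a tie in each direction.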

Complete Network Analysis Network Connections: Role Positions Most common block structures identified in Add Health. Based on CONCOR, imposing a 5-block fit.

Complete Network Analysis Network Connections: Role Positions An example: Padgett, J. F. and Ansell, C. K. Robust action and the rise of the Medici, 1400-1434. American Journal of Sociology. 1993; 98: 1259-1319. “Political Groups” in the attribute sense do not seem to exist, so P&A turn to the pattern of network relations among families. This is the block reduction of the full 92 family network.

Complete Network Analysis Network Connections: Role Positions An example based on regular equivalence using the Add Health data. Triad positions: 003; 012_S, 012_E, 012_I; 102_D, 102_I; 021D_S, 021D_E; 021U_S, 021U_E; 021C_S, 021C_B, 021C_E; 111D_S, 111D_B, 111D_E; 111U_S, 111U_B, 111U_E; 030T_S, 030T_B, 030T_E; 030C; 201_S, 201_B; 120D_S, 120D_E; 120U_S, 120U_E; 120C_S, 120C_B, 120C_E; 210_S, 210_B, 210_E; 300.

Complete Network Analysis Network Connections: Role Positions Jefferson High School provides a good boundary for social relations. Sunshine High School does not provide a good boundary for social relations.

Complete Network Analysis Network Connections: Role Positions Image networks. (Figure: image networks for Jefferson High School and Sunshine High School, with blocks labeled 4%, 34%, 43%, 32%, 52% and 33%.) Width of tie is proportional to the ratio of cell density to mean cell density.

Complete Network Analysis Network Connections: Role Positions Jefferson High School Sunshine High School


Complete Network Analysis Network Connections: Role Positions Jefferson High School
• Being in the same block significantly increases the likelihood of being in the same behavioral cluster
  • “Locally” defined: OR = 1.13
  • “Globally” defined: OR = 1.12
• The effect is differential across blocks:

Block                       Local       Global
1: Semi P Outsiders         1.03        1.16 ***
2: Semi P Insiders          0.89 ***    0.83 ***
3: Periphery Outsiders      1.76 ***    2.08 ***
4: Periphery Insiders       1.03        1.09 **
5: “Aloof” Core             1.15 ***    0.97
6: Popular Core             1.01        0.79 ***
7: 2nd string Core          1.08        1.04

• Being adjacent in the network has a consistent positive effect:
  • Local: OR = 1.21
  • Global: OR = 1.35
Coefficients based on a dyad-level logistic regression model. Models control for grade, gender and SES.

Complete Network Analysis Network Connections: Role Positions Sunshine High School
• Being in the same block barely increases the likelihood of being in the same behavioral cluster
  • Locally defined: OR = 1.03
  • Globally defined: OR = 1.02
• The effect is differential across blocks:

Block                       Local       Global
1: Receiving Periphery      1.25 ***    1.27 **
2: Sending Periphery        0.99        1.01
3: Semi-Periphery           0.93 **     0.89 **
4: Lieutenants              0.93 **     0.88 **
5: Popular Core             0.87 **     1.08

• Being adjacent in the network has a weaker, but still positive effect:
  • Local: 1.13
  • Global: 1.08
Coefficients based on a dyad-level logistic regression model. Models control for grade, race, gender & SES.

Complete Network Analysis Network Connections: Role Positions Compound Relations One of the most powerful tools in role analysis involves looking at role systems through compound relations. A compound relation is formed by combining relations in single dimensions. The best examples of compound relations come from kinship. (Tables: the adjacency matrices for Sibling (S) and Child of (C), and their product S × C = SC = Nephew/Niece.)
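A compound relation can be sketched as a Boolean matrix product: i is linked to j in the compound if i has the first relation to some k who has the second relation to j. The three-person family below is made up, and the convention that `child_of[i][k] = 1` means "i is a child of k" is an assumption; with a different row and column convention the order of the product flips.

```python
def compose(first, second):
    """Boolean product: 1 where some k has first[i][k] = 1 and second[k][j] = 1."""
    n = len(first)
    return [
        [int(any(first[i][k] and second[k][j] for k in range(n))) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical family: Ann (0) and Bob (1) are siblings; Cal (2) is Bob's child
sibling = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
child_of = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
]

# Child of a sibling = nephew/niece
nephew_or_niece_of = compose(child_of, sibling)
for row in nephew_or_niece_of:
    print(row)
```

The only 1 in the result is in row 2, column 0: Cal is the nephew or niece of Ann.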

Complete Network Analysis Network Connections: Role Positions An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as”

Complete Network Analysis Network Connections: Role Positions The newest work in block modeling comes from Doreian, Batagelj, and Ferligoj, who have proposed a system for ‘generalized block models’. Instead of having blocks composed of zeros or ones, you can specify the type of relation within and between each block as an ideal type. For example, you might specify a block as “row regular”, meaning that every row in the block has at least one tie (i.e., every person in the block sends a tie to at least one person in another block).
Two advances:
a) conceptually, they generalize the meaning of a block-block tie
b) algorithmically, they make it possible to specify the tie pattern in advance, moving us from an inductive to a deductive approach.
• While promising, the routine is still a little sensitive. Many of the models described are not well matched to substantive theory, and many of the examples fit are quite small and don’t always reduce the data much.
• But it is likely where blockmodeling will go in the future.

Complete Network Analysis Stochastic Network Analysis Confidence Intervals: Bootstraps and Jackknifes (Snijders & Borgatti, 1999) Goal: “Useful to have an indication of how precise a given description is, particularly when making comparisons between groups. ” Assumes that “a researcher is interested in some descriptive statistic … and wishes to have a standard error for this descriptive statistic without making implausibly strong assumptions about how the network came about. ”

Complete Network Analysis Stochastic Network Analysis Jackknifes. Given a dataset with N sample elements, N artificial datasets are created by deleting each sample element in turn from the observed dataset. In standard practice, the formula for the standard error is then:
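For reference, the usual textbook form of the jackknife standard error is given below. The symbols are assumptions chosen for this note: \hat{\theta}_{-i} is the statistic computed with element i deleted, and \hat{\theta}_{-\cdot} is the average of those N leave-one-out values.

```latex
SE_J = \sqrt{\frac{N-1}{N} \sum_{i=1}^{N} \left( \hat{\theta}_{-i} - \hat{\theta}_{-\cdot} \right)^{2}},
\qquad
\hat{\theta}_{-\cdot} = \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}_{-i}
```

When the statistic is the sample mean, this expression reduces exactly to the familiar standard error s / \sqrt{N}.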

Complete Network Analysis Stochastic Network Analysis Jackknifes: Example on regular data

Obs      x       Mean of the sample with this observation deleted
1        0.85    s1  = 0.57
2        0.70    s2  = 0.58
3        1.00    s3  = 0.55
4        0.59    s4  = 0.60
5        0.22    s5  = 0.64
6        0.69    s6  = 0.59
7        0.43    s7  = 0.61
8        0.32    s8  = 0.63
9        0.50    s9  = 0.61
10       0.67    s10 = 0.59
MEAN:    0.60

Complete Network Analysis Stochastic Network Analysis Jackknife standard error: SEj = 0.0753. Classical standard error of the mean: SE = 0.0753. For the mean, the two agree.
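The agreement between the two standard errors can be checked directly. The sketch below applies the standard delete-one jackknife to the ten x values from the example; the function names are illustrative, and because the x values are rounded to two decimals the result is only approximately the 0.0753 reported on the slide.

```python
import math

# The ten observations from the example (rounded to two decimals).
x = [0.85, 0.70, 1.00, 0.59, 0.22, 0.69, 0.43, 0.32, 0.50, 0.67]

def mean(values):
    return sum(values) / len(values)

def jackknife_se(data, stat):
    """Delete-one jackknife standard error of stat(data)."""
    n = len(data)
    # Statistic recomputed on each of the N artificial datasets.
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((z - centre) ** 2 for z in leave_one_out))

def classical_se(data):
    """Usual standard error of the mean: s / sqrt(n)."""
    m = mean(data)
    variance = sum((v - m) ** 2 for v in data) / (len(data) - 1)
    return math.sqrt(variance / len(data))

se_j = jackknife_se(x, mean)
se = classical_se(x)
```

For the mean the two quantities are algebraically identical, so any difference between `se_j` and `se` is floating-point noise. For other statistics the jackknife gives an answer where no classical formula is available.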

Complete Network Analysis Stochastic Network Analysis For networks, we need to adjust the scaling parameter of the jackknife formula, because deleting one vertex removes all of that vertex's dyads at once. Here Z-i is the network statistic calculated without vertex i, and Z-· is the average of Z-1 … Z-N. Theoretically, this procedure will work for any network statistic Z. UCINET uses it to test differences in network density.
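A minimal sketch of the vertex-deletion jackknife for density follows. The toy adjacency matrix and function names are hypothetical. The default scale factor (N - 2) / (2N) is my reading of the Snijders & Borgatti adjustment and should be treated as an assumption to check against the paper; it is exposed as a parameter for that reason.

```python
import math

def density(adj):
    """Density of a directed graph stored as a 0/1 adjacency matrix."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def drop_vertex(adj, v):
    """Adjacency matrix with vertex v (its row and column) removed."""
    keep = [i for i in range(len(adj)) if i != v]
    return [[adj[i][j] for j in keep] for i in keep]

def network_jackknife_se(adj, stat, scale=None):
    """Jackknife standard error of stat(adj), deleting one vertex at a time."""
    n = len(adj)
    if scale is None:
        # ASSUMPTION: network scaling factor (N - 2) / (2N); verify before use.
        scale = (n - 2) / (2 * n)
    z = [stat(drop_vertex(adj, v)) for v in range(n)]  # Z_{-i}
    z_bar = sum(z) / n                                 # average of the Z_{-i}
    return math.sqrt(scale * sum((zi - z_bar) ** 2 for zi in z))

# Hypothetical 5-actor directed network.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
complete = [[0 if i == j else 1 for j in range(5)] for i in range(5)]

se_density = network_jackknife_se(adj, density)
se_complete = network_jackknife_se(complete, density)  # no variation across deletions
```

Passing a different function as `stat` gives the jackknife for any other graph-level statistic, at the cost of N recomputations.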

Complete Network Analysis Stochastic Network Analysis An example based on the Trade data. Density, jackknife standard errors, and confidence intervals for each matrix (bounds are density ± 1.96 × SEj).

Matrix   Density     SEj         Upper bound   Lower bound
DIP      0.6684783   0.0636125   0.7931588     0.5437978
CRUDE    0.5561594   0.0676669   0.6887866     0.4235323
FOOD     0.5561594   0.0633776   0.6803794     0.4319394
MAN      0.5615942   0.0724143   0.7035263     0.4196621
MIN      0.2445652   0.0530224   0.3484891     0.1406414

In practice, I think the estimates can be pretty wide for other network statistics.

Complete Network Analysis Stochastic Network Analysis In general, bootstrap techniques effectively treat the given sample as the population, then draw samples, with replacement, from the observed distribution. For networks, we draw random samples of the vertices, creating a new network Y*. If i(k) = i(h), that is, the same vertex has been drawn twice, there is no observed tie between the two copies, so fill in that dyad from the set of all possible dyads (i.e. fill in this cell with a random draw from the population).

Complete Network Analysis Stochastic Network Analysis For each bootstrap sample:
• Draw N random numbers, with replacement, from 1 to N, denoted i(1) … i(N)
• Construct Y* based on i(1) … i(N)
• Calculate the statistic of interest, called Z*m
Repeat this process M times, with M in the thousands. The bootstrap standard error is the standard deviation of the M values Z*m.
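The bootstrap steps can be sketched as follows for a directed 0/1 network. The function names and toy data are hypothetical, and filling each duplicate-vertex cell with an independent draw from the observed dyads is one simple reading of the procedure, not necessarily the exact rule UCINET applies.

```python
import random
import statistics

def density(adj):
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def bootstrap_network(adj, rng):
    """Build Y* by resampling vertices with replacement."""
    n = len(adj)
    draw = [rng.randrange(n) for _ in range(n)]          # i(1) .. i(N)
    observed_dyads = [adj[i][j] for i in range(n) for j in range(n) if i != j]
    y_star = [[0] * n for _ in range(n)]
    for k in range(n):
        for h in range(n):
            if k == h:
                continue
            if draw[k] == draw[h]:
                # Same vertex drawn twice: no observed tie exists, so fill the
                # cell with a random draw from all observed dyads (assumption).
                y_star[k][h] = rng.choice(observed_dyads)
            else:
                y_star[k][h] = adj[draw[k]][draw[h]]
    return y_star

def bootstrap_se(adj, stat, m=2000, seed=0):
    """Standard deviation of the statistic over m bootstrap networks."""
    rng = random.Random(seed)
    return statistics.stdev(stat(bootstrap_network(adj, rng)) for _ in range(m))

# Hypothetical 5-actor directed network.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
complete = [[0 if i == j else 1 for j in range(5)] for i in range(5)]

se_boot = bootstrap_se(adj, density, m=500)
se_boot_complete = bootstrap_se(complete, density, m=50)
```

Fixing the seed makes a run reproducible; in practice one would also check that the estimate is stable as M grows.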

Complete Network Analysis Stochastic Network Analysis Bootstraps: Comparing density

BOOTSTRAP PAIRED SAMPLE T-TEST
----------------------------------------
Density of trade_min is: 0.2446
Density of trade_dip is: 0.6685
Difference in density is: -0.4239
Number of bootstrap samples: 5000
Variance of ties for trade_min: 0.1851
Variance of ties for trade_dip: 0.2220
Classical standard error of difference: 0.0272
Classical t-test (indep samples): -15.6096
Estimated bootstrap standard error for density of trade_min: 0.0458
Estimated bootstrap standard error for density of trade_dip: 0.0553
Bootstrap standard error of the difference (indep samples): 0.0719
95% confidence interval for the difference (indep samples): [-0.5648, -0.2831]
bootstrap t-statistic (indep samples): -5.8994
Bootstrap SE for the difference (paired samples): 0.0430
95% bootstrap CI for the difference (paired samples): [-0.5082, -0.3396]
t-statistic: -9.8547
Average bootstrap difference: -0.3972
Proportion of absolute differences as large as observed: 0.0002
Proportion of differences as large as observed: 1.0000
Proportion of differences as large as observed: 0.0002

Complete Network Analysis Stochastic Network Analysis In general, one can test the sensitivity of a particular network measure by randomly perturbing the original network.
1) Randomly add or delete a small percent of ties and recalculate Z. Do this thousands of times and generate a distribution for your statistic of interest.
2) Treat the ties in the network as realizations of an underlying probability distribution, and then generate networks from this probability distribution:
Simple: Specify probabilities based on the current structure:
Pij = P1 if Xij = Xji = 1
Pij = P2 if Xij = 1 or Xji = 1 (but not both)
Pij = P3 if Xij = Xji = 0
This is a standard sensitivity process, so you specify P1 … Pn as reasonable ranges, perhaps with constraints.
Complex: Use a random graph model to generate the edge probabilities, and simulate from that.
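Option 1 can be sketched in a few lines. This is an illustration under stated assumptions: the perturbation flips a fixed share of cells in the adjacency matrix (so it both adds and deletes ties), the share is taken relative to all possible directed ties rather than observed ties, and the function names and toy network are hypothetical.

```python
import random

def density(adj):
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def perturb(adj, share, rng):
    """Flip a random share of off-diagonal cells (adds some ties, deletes others)."""
    n = len(adj)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    n_flips = max(1, round(share * len(cells)))   # at least one flip
    new = [row[:] for row in adj]
    for i, j in rng.sample(cells, n_flips):
        new[i][j] = 1 - new[i][j]
    return new

def sensitivity_distribution(adj, stat, share=0.05, reps=1000, seed=0):
    """Sorted values of stat over many randomly perturbed copies of adj."""
    rng = random.Random(seed)
    return sorted(stat(perturb(adj, share, rng)) for _ in range(reps))

# Hypothetical 5-actor directed network with 8 ties out of 20 possible.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
]

dist = sensitivity_distribution(adj, density, share=0.05, reps=200)
low, high = dist[0], dist[-1]
```

Density is used only because it is easy to verify; the point of the exercise is to pass in a statistic whose robustness is in doubt, such as a centralization score or a clustering measure.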

Complete Network Analysis Stochastic Network Analysis: Exponential Random Graph Models A long research tradition in statistics and random graph theory has led to parametric models of networks. These are models of the entire graph, though as we will see they often work on the dyads in the graph to be estimated. Substantively, the approach is to ask whether the graph in question is an element of the class of all random graphs with the given known elements (for example, all graphs with 5 nodes and 3 edges) or, put probabilistically, to ask for the probability of observing the current graph given the conditions.

Complete Network Analysis Stochastic Network Analysis The earliest approaches are based on simple random graph theory, but there’s been a flurry of activity in the last 10 years or so.
Key references:
- Holland and Leinhardt (1981) JASA
- Frank and Strauss (1986) JASA
- Wasserman and Faust (1994), Chap 15 & 16
- Wasserman and Pattison (1996)
Thanks to Mark Handcock for sharing some figures/slides about these models.

Complete Network Analysis Stochastic Network Analysis

$$\Pr(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$

Where:
θ is a vector of parameters (like regression coefficients)
z is a vector of network statistics, conditioning the graph
κ is a normalizing constant, to ensure the probabilities sum to 1.

Complete Network Analysis Stochastic Network Analysis The simplest graph is a Bernoulli random graph, where each Xij is independent:

$$\Pr(X = x) = \frac{\exp\{\sum_{i \neq j} \theta_{ij} x_{ij}\}}{\kappa(\theta)}$$

Where:
$\theta_{ij} = \mathrm{logit}[\Pr(X_{ij} = 1)]$
$\kappa(\theta) = \prod_{i \neq j} [1 + \exp(\theta_{ij})]$
Note this is one of the few cases where κ(θ) can be written.

Complete Network Analysis Stochastic Network Analysis Typically, we add a homogeneity condition, so that all isomorphic graphs are equally likely. The homogeneous Bernoulli graph model:

$$\Pr(X = x) = \frac{\exp\{\theta \sum_{i \neq j} x_{ij}\}}{\kappa(\theta)}$$

Where:
$\kappa(\theta) = [1 + \exp(\theta)]^{g}$, with g read here as the number of dyads being modeled (one factor per possible tie).
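For the homogeneous Bernoulli model the normalizing constant has a closed form, which can be checked by brute force on a tiny graph. In this sketch a graph is represented simply as a tuple of 0/1 tie indicators, one per possible tie, and `n_dyads` plays the role of the exponent in the closed form; the names and the chosen tie probability are illustrative assumptions.

```python
import math
from itertools import product

def kappa_by_enumeration(theta, n_dyads):
    """Sum exp(theta * number of ties) over every possible graph."""
    return sum(math.exp(theta * sum(x)) for x in product((0, 1), repeat=n_dyads))

def kappa_closed_form(theta, n_dyads):
    """Closed form for the homogeneous Bernoulli model: one factor per possible tie."""
    return (1 + math.exp(theta)) ** n_dyads

def graph_probability(x, theta):
    """Probability of graph x, given as a tuple of 0/1 tie indicators."""
    return math.exp(theta * sum(x)) / kappa_closed_form(theta, len(x))

tie_probability = 0.3                                         # illustrative value
theta = math.log(tie_probability / (1 - tie_probability))     # logit of the tie probability
n_dyads = 6                                                   # directed graph on 3 nodes

k_enum = kappa_by_enumeration(theta, n_dyads)   # sums over 2**6 = 64 graphs
k_closed = kappa_closed_form(theta, n_dyads)
p_two_ties = graph_probability((1, 1, 0, 0, 0, 0), theta)
```

The enumeration has 2 to the power of `n_dyads` terms, which is why it is only feasible here; once the model conditions on anything beyond density there is no comparable shortcut.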

Complete Network Analysis Stochastic Network Analysis If we want to condition on anything much more complicated than density, the normalizing constant ends up being a problem. We need a way to express the probability of the graph that doesn’t depend on that constant. It turns out we can do this by conditioning on a ‘complement’ graph. First some terms:

Complete Network Analysis Stochastic Network Analysis After some algebra:

$$\log\frac{\Pr(X_{ij} = 1 \mid X_{ij}^{c})}{\Pr(X_{ij} = 0 \mid X_{ij}^{c})} = \theta'\left[z(x_{ij}^{+}) - z(x_{ij}^{-})\right]$$

where $x_{ij}^{+}$ and $x_{ij}^{-}$ are the graph with the tie from i to j forced present and forced absent, and $X_{ij}^{c}$ is the rest of the graph.
Note that we can now model the conditional probability of the graph, as a function of a set of difference statistics, without reference to the normalizing constant. The model, then, simply reduces to a logit model on the dyads. This is a pseudo-likelihood estimate, and it is not optimal under many circumstances. In new work (2005), Wasserman suggests that the statistical inference on the parameters be viewed with caution. New methods based on MCMC are coming out, and they are much better.

Complete Network Analysis Stochastic Network Analysis Fitting p* models I highly recommend working through the p* primer examples, which can be found at: http://kentucky.psych.uiuc.edu/pstar/index.html
Including: A Practical Guide To Fitting p* Social Network Models Via Logistic Regression
The site includes the PREPSTAR program for creating the difference variables of interest.

Complete Network Analysis Stochastic Network Analysis We can model this network based on parameters for overall degree of Choice, Differential Choice Within Positions, Mutuality, Differential Mutuality Within Positions, and Transitivity. The vector of model parameters to be estimated, θ, contains these five parameters.

Complete Network Analysis Stochastic Network Analysis The first step is to calculate the vector of change statistics. This is done by first calculating the value of the statistic if the ij tie is present, then if it is absent, and then taking the difference. The program PREPSTAR does this for you (see also pspar, for large networks: http://www.sfu.ca/~richards/Pages/pspar.html). For example, the simple choice statistic is the sum of the Xij, so if the tie is forced present Xij = 1, if forced absent Xij = 0, and the difference is going to be 1. Since this is true for every dyad, it is a constant, equivalent to the model intercept.

Complete Network Analysis Stochastic Network Analysis The model described above would be written in W&P notation as:
• $z_1(x) = L = \sum_{i,j} X_{ij}$ is the statistic for the Choice parameter,
• $z_2(x) = L_W = \sum_{i,j} X_{ij}\,\delta_{ij}$ is the statistic for the Choice Within Positions parameter,
• $z_3(x) = M = \sum_{i<j} X_{ij} X_{ji}$ is the statistic for the Mutuality parameter,
• $z_4(x) = M_W = \sum_{i<j} X_{ij} X_{ji}\,\delta_{ij}$ is the statistic for the Mutuality Within Positions parameter,
• $z_5(x) = T_T = \sum_{i,j,k} X_{ij} X_{jk} X_{ik}$ is the statistic for the Transitivity parameter.
Note that the indicator variable $\delta_{ij} = 1$ if actors i and j are in the same position, and 0 otherwise.
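The change statistics for Choice, Mutuality, and Transitivity can be computed directly from these definitions. The sketch below is not PREPSTAR; it is a hypothetical illustration on a three-actor network, and it omits the within-position statistics, which would only add the same-position indicator to the sums.

```python
from itertools import permutations

def choice(x):
    """L: total number of ties."""
    n = len(x)
    return sum(x[i][j] for i in range(n) for j in range(n) if i != j)

def mutuality(x):
    """M: number of mutual dyads (sum over i < j of X_ij * X_ji)."""
    n = len(x)
    return sum(x[i][j] * x[j][i] for i in range(n) for j in range(i + 1, n))

def transitivity(x):
    """T_T: sum over distinct i, j, k of X_ij * X_jk * X_ik."""
    return sum(x[i][j] * x[j][k] * x[i][k] for i, j, k in permutations(range(len(x)), 3))

def change_statistic(x, i, j, stat):
    """stat with the i->j tie forced present minus stat with it forced absent."""
    present = [row[:] for row in x]
    absent = [row[:] for row in x]
    present[i][j] = 1
    absent[i][j] = 0
    return stat(present) - stat(absent)

# Hypothetical network: 0 -> 1 and 1 -> 2.
x = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

# Adding 0 -> 2 would close the transitive triple 0 -> 1 -> 2.
delta_02 = [change_statistic(x, 0, 2, s) for s in (choice, mutuality, transitivity)]
# Adding 1 -> 0 would reciprocate the existing 0 -> 1 tie.
delta_10 = [change_statistic(x, 1, 0, s) for s in (choice, mutuality, transitivity)]
```

Stacking one such row per ordered pair, together with the observed value of the tie, gives the data set on which the dyad-level logit model is fit. The Choice column is 1 in every row, which is why it acts as the intercept.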

Complete Network Analysis Stochastic Network Analysis
proc logistic descending;
  model tie = l lw m mw tt / noint;
run;
L = Choice
LW = Within Group
M = Mutuality
MW = Mutual within Group
TT = Transitivity
Substantively, this graph is likely from the random class of graphs with similar mutuality and size.

Complete Network Analysis Stochastic Network Analysis One practical problem is that the resulting values are often quite correlated, making estimation difficult. This is particularly difficult with “star” parameters.

Correlations among the change statistics (p-values in parentheses):

       lw                  m                   mw                  tt
lw     1.00000             0.58333 (0.0007)    0.80178 (<.0001)    0.15830 (0.4034)
m      0.58333 (0.0007)    1.00000             0.80178 (<.0001)    -0.02435 (0.8984)
mw     0.80178 (<.0001)    0.80178 (<.0001)    1.00000             -0.11716 (0.5375)
tt     0.15830 (0.4034)    -0.02435 (0.8984)   -0.11716 (0.5375)   1.00000

Complete Network Analysis Stochastic Network Analysis Parameters that are often fit include:
1) Expansiveness and attractiveness parameters (dummies for each sender/receiver in the network)
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / Intransitivity
6) K-in-stars, k-out-stars
7) Cyclicity

Complete Network Analysis Stochastic Network Analysis A second, perhaps more fundamental, problem is that many of the models themselves are impossible to fit, because they imply graphs that cannot exist in the real world. Mark Handcock (UW, Statistics) has shown that some of the simplest models predict ‘degenerate networks’: networks where everyone is connected to everyone or to no one. Others have recently suggested that this is a problem of model specification, and that if you include higher-order graph statistics, the models do not fail. In either case, the implied link between a probability model of the graph and the statistical estimation of the graph makes it simple to simulate graphs from parameter estimates. This might hold the key for moving from local network data to global network estimates.

Complete Network Analysis Stochastic Network Analysis An example: Network Model Coefficients, In-School Networks
[Figure: coefficient plot, vertical axis from -0.6 to 0.8; legible horizontal-axis labels include Fight, Drinking, College, Smoke, GPA, SES, Same Sex, Same Clubs, Same Grade, Intransitivity, Transitivity, Reciprocity]

Complete Network Analysis Stochastic Network Analysis Other statistical / computational models for social networks:
1) Actor-oriented models (Snijders). These models attempt to get to the same place as the p* models, but by specifying the “parameters” as optimization rules in an actor-oriented micro-simulation. Very effective at dealing with real-world graphs, so long as they are not too big. The SIENA software implements these models.
2) Dynamic network models
• Both the actor-oriented models and the ERGM models can use time, by including past graph features as covariates. This effectively models the change in an arc/edge over time.
• Tom Snijders has developed a set of HLM-like models for dealing with networks over time.

Complete Network Analysis Stochastic Network Analysis A conceptual merge between random graph models and QAP models is to identify a sample of graphs from the universe you are trying to model. So, instead of estimating the probability model for the graph, generate the universe X empirically, then compare z(x) to the distribution of z over X to see how likely a measure on x would be given X. The difficulty, however, is generating X.

Complete Network Analysis Stochastic Network Analysis The first option would be to generate all non-isomorphic graphs within a given constraint. This is possible for small graphs, but the number gets large fast. For a network with 3 nodes, there are 16 possible directed graphs. For a network with 4 nodes, there are 218; for 5 nodes, 9,608; for 6 nodes, 1,540,944; and so on. So, the best approach is to sample from the universe, but, of course, if you had the universe you wouldn’t need to sample from it. How do you sample from a population you haven’t observed? Use a construction algorithm that generates a random graph with known constraints.
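The counts for 3 and 4 nodes can be reproduced by brute force, which also shows why full enumeration stops being practical almost immediately. This is a naive sketch with an illustrative function name: it lists every labeled directed graph and reduces each to a canonical form by trying every relabeling of the nodes.

```python
from itertools import permutations, product

def count_digraph_classes(n):
    """Number of directed graphs on n nodes, counted up to isomorphism."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    relabelings = list(permutations(range(n)))
    seen = set()
    for bits in product((0, 1), repeat=len(pairs)):
        arcs = [pair for pair, present in zip(pairs, bits) if present]
        # Canonical form: the smallest sorted arc list over all node relabelings.
        canonical = min(
            tuple(sorted((relabel[i], relabel[j]) for i, j in arcs))
            for relabel in relabelings
        )
        seen.add(canonical)
    return len(seen)

classes_3 = count_digraph_classes(3)   # 2**6 = 64 labeled graphs to reduce
classes_4 = count_digraph_classes(4)   # 2**12 = 4096 labeled graphs to reduce
```

At 5 nodes there are already 2 to the power 20 labeled graphs to reduce, and at 6 nodes 2 to the power 30, so this approach is only a check on the small cases.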

Complete Network Analysis Stochastic Network Analysis Example: Bearman, Peter S., James Moody and Katherine Stovel (2004) “Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks” American Journal of Sociology 110:44-91
[Figure: Romantic Relations in Jefferson High]

Complete Network Analysis Stochastic Network Analysis Simulate random networks with similar degree distribution:
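One common construction algorithm for holding the degree distribution fixed is repeated edge swapping. The sketch below is a generic illustration for an undirected network, not the procedure used in the Bearman, Moody and Stovel paper, and it enforces only the degree constraint (no duplicate ties, no self-ties); the function names and the toy edge list are hypothetical.

```python
import random

def degree_sequence(edges, n):
    """Degree of each of n nodes in an undirected edge list."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def rewire(edges, n_swaps, seed=0):
    """Randomize an undirected graph by edge swaps that keep every degree fixed."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        a, b = rng.sample(range(len(edge_list)), 2)
        (u, v), (s, t) = edge_list[a], edge_list[b]
        if rng.random() < 0.5:
            s, t = t, s                      # pick one of the two possible swaps
        new_1, new_2 = tuple(sorted((u, t))), tuple(sorted((s, v)))
        # Reject swaps that would create a self-tie or a duplicate tie.
        if len({u, v, s, t}) < 4 or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edge_list[a], edge_list[b]}
        edge_set |= {new_1, new_2}
        edge_list[a], edge_list[b] = new_1, new_2
        done += 1
    return edge_list

# Hypothetical observed network: a chain of 8 actors.
observed = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
simulated = rewire(observed, n_swaps=50, seed=1)
```

Further constraints, such as forbidding particular cycles, can be added as extra rejection conditions inside the loop. Computing the statistic of interest on many such simulated networks gives the reference distribution against which the observed value is compared.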

Complete Network Analysis Stochastic Network Analysis Simulated networks preserve observed degree, isolated dyad distribution, and four-cycle constraint

Complete Network Analysis Stochastic Network Analysis Simulated networks preserve observed degree, isolated dyad distribution, and four-cycle constraint: 4 examples from the simulated set

Social Network Software UCINET
• The standard network analysis program, runs in Windows
• Good for computing measures of network topography for single nets
• Input-Output of data is a special 2-file format, but it is now able to read PAJEK files directly
• Not optimal for large networks
• Available from: Analytic Technologies

Social Network Software PAJEK
• Program for analyzing and plotting very large networks
• Intuitive Windows interface
• Used for most of the real data plots in this presentation
• Started mainly as a graphics program, but has expanded to a wide range of analytic capabilities
• Can link to the R statistical package
• Free
• Available from:

Social Network Software Cyram NetMiner for Windows
• Newest product, not yet widely used
• Price range depends on application
• Limited to smaller networks O(100)
http://www.netminer.com/NetMiner/home_01.jsp

Social Network Software NetDraw
• Also very new, but by one of the best known names in network analysis software
• Free
• Limited to smaller networks O(100)

Social Network Software NEGOPY
• Program designed to identify cohesive sub-groups in a network, based on the relative density of ties
• DOS-based program; data must be in arc-list format
• Moving the results back into an analysis program is difficult
• Available from: William D. Richards http://www.sfu.ca/~richards/Pages/negopy.htm

SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
• A collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
• Allows one to work with multiple, large networks
• Easy to move from creating measures to analyzing data
• Available by sending an email to: Moody.77@sociology.osu.edu