Differentially Private Analysis of Graphs and Social Networks

  • Slides: 27
Sofya Raskhodnikova, Pennsylvania State University

Graphs and networks
Many types of data can be represented as graphs, where nodes represent individuals and edges capture relationships.
Image source: Nykamp DQ, “An introduction to networks.” From Math Insight. http://mathinsight.org/network_introduction.

Potentially sensitive information in graphs
• Social, romantic and sexual relationships
• “Friendships” in an online social network
• Financial transactions
• Phone calls and email communication
• Doctor-patient relationships
Source: Christakis, Fowler. The Spread of Obesity in a Large Social Network over 32 Years. N Engl J Med 2007; 357: 370-379.
Source: B. Aven. The effects of corruption on organizational networks and individual behavior. MIT workshop: Information and Decision in Social Networks, 2011.

Two conflicting goals
• Privacy: protecting information about individuals.
• Utility: drawing accurate conclusions about aggregate information.

“Anonymized” graphs still pose privacy risks
• The split between personally identifying and non-personally identifying information is a false dichotomy.
• Links, and any other information about an individual, can be used for de-anonymization.
• In a typical real-life network, many nodes have unique neighborhoods.
Source: Bearman, Moody, Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American J. Sociology, 2008.

Some published de-anonymization attacks
– Movie ratings [Narayanan, Shmatikov 08]: re-identified Netflix users based on information from the public movie database IMDb.
– Social networks [Backstrom, Dwork, Kleinberg 07; Narayanan, Shmatikov 09; Narayanan, Shi, Rubinstein 12]: re-identified users in an online social network (anonymized Twitter) based on information from a public online social network (Flickr).
– Computer networks [Coull, Wright, Monrose, Collins, Reiter 07; Ribeiro, Chen, Miklau, Townsley 08, …]: individuals can be re-identified based on external sources.

Who’d want to de-anonymize a social network graph?
• A government agency, for surveillance.
• A phisher or spammer, to write a personalized message.
• A health insurance provider, to check for preexisting conditions.
• Marketers, to focus advertising on influential nodes.
• Stalkers, nosy neighbors, colleagues, or employers.
Image sources: Andrew Joyner, http://dukeromkey.com/

What information can be released without violating privacy?

Differential privacy (for graph data)
[Diagram: graph G → data-processing algorithm → data release (algorithm output)]
Image source: http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/
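
For reference, here is the standard definition of ε-differential privacy, stated in its usual textbook form rather than recovered from the slide. "Neighboring" is instantiated by one of the two graph variants, edge or node.

```latex
A randomized algorithm $A$ is $\varepsilon$-differentially private if,
for all pairs of neighboring graphs $G, G'$ and every set $S$ of possible outputs,
\[
  \Pr[A(G) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(G') \in S].
\]
```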

Two variants of differential privacy for graphs
• Edge differential privacy: two graphs are neighbors if they differ in one edge.
• Node differential privacy: two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.
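
The two neighbor relations can be made concrete with a short sketch. It assumes a graph is represented as a set of nodes plus a set of undirected edges stored as frozensets; the function names are hypothetical. Both relations are symmetric, so the sketch lists edge neighbors by toggling one potential edge and node neighbors by deleting one node.

```python
from itertools import combinations


def edge_neighbors(nodes, edges):
    """Yield every graph on the same node set that differs from (nodes, edges) in exactly one edge."""
    for u, v in combinations(sorted(nodes), 2):
        # Toggle one potential edge: delete it if present, add it if absent.
        yield nodes, edges ^ {frozenset((u, v))}


def node_neighbors(nodes, edges):
    """Yield every graph obtained by deleting one node together with its adjacent edges."""
    for v in nodes:
        yield nodes - {v}, frozenset(e for e in edges if v not in e)


nodes = frozenset({1, 2, 3})
edges = frozenset({frozenset({1, 2}), frozenset({2, 3})})
print(len(list(edge_neighbors(nodes, edges))))  # prints 3: three potential edges to toggle
print(len(list(node_neighbors(nodes, edges))))  # prints 3: three nodes to delete
```

Deleting node 2 here removes both edges at once, which previews why node privacy is the more demanding notion.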

Some useful properties of differential privacy

Is differential privacy too strong?
• No weaker notion has been proposed that satisfies all three useful properties.
• We can actually attain it for many useful statistics!

What graph statistics can be computed accurately with differential privacy?

Graph statistics
• Degree distribution
[Plot: fraction of nodes of degree d versus degree d]
The degree of a node is the number of connections it has.
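
The degree distribution in the plot can be computed, non-privately, with a few lines of code. This is a minimal sketch, assuming a graph given as a node set and a list of undirected edges (u, v) with no duplicates or self-loops; the function name is hypothetical. A private release would add noise to such counts, and how much noise is needed depends on whether edge or node privacy is required.

```python
from collections import Counter


def degree_distribution(nodes, edges):
    """Return {d: fraction of nodes with degree d} for a simple undirected graph."""
    degree = {v: 0 for v in nodes}  # isolated nodes keep degree 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    return {d: counts[d] / len(nodes) for d in sorted(counts)}


# A star: node 0 is joined to nodes 1, 2 and 3.
print(degree_distribution({0, 1, 2, 3}, [(0, 1), (0, 2), (0, 3)]))
# prints {1: 0.75, 3: 0.25}
```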

Tools used in differentially private graph algorithms
• Smooth sensitivity
  – A more nuanced notion of sensitivity than the one mentioned in the previous talk
• Sample and aggregate
• Maximum flow
• Linear and convex programming
• Random projections
• Iterative updates
• Postprocessing

Differentially private graph analysis: a taste of techniques

Basic question: how to compute a statistic f
[Diagram: graph G → data-processing algorithm → data release]
Image source: http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/
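
One standard answer, assuming f is numeric with a known global sensitivity Δf (the largest change in f between neighboring graphs), is the Laplace mechanism: release f(G) plus Laplace noise of scale Δf/ε. The slides do not name a mechanism at this point. The sketch below illustrates it for the edge count under edge privacy, where Δf = 1; the function names are hypothetical. It uses Python's random module, which is fine for illustration but is not hardened against known floating-point attacks on naive Laplace sampling.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_edge_count(edges, epsilon, rng=random):
    """epsilon-edge-private estimate of the number of edges (Laplace mechanism).

    Adding or removing one edge changes the count by exactly 1, so the global
    sensitivity under edge privacy is 1 and the noise scale is 1 / epsilon.
    """
    sensitivity = 1
    return len(edges) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the example reproducible
print(private_edge_count([(1, 2), (2, 3), (3, 1)], epsilon=0.5, rng=rng))
```

Smaller ε means more noise; the expected absolute error here is 1/ε.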

Challenge for node privacy: high sensitivity
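
A small example of the challenge, using the edge count as f (our choice of illustration, not taken from the slides): under edge privacy its sensitivity is 1, but under node privacy deleting one node removes all of its edges. The hypothetical helper below computes the largest such drop for a given graph, which is just the maximum degree. On an n-node star it is n − 1, so the global node sensitivity of the edge count grows with the size of the graph.

```python
def max_drop_from_node_deletion(nodes, edges):
    """Largest decrease in the edge count caused by deleting a single node.

    Deleting node v removes exactly its deg(v) incident edges, so this equals
    the maximum degree of the graph (0 for an empty graph).
    """
    return max((sum(1 for e in edges if v in e) for v in nodes), default=0)


n = 6
star_edges = [(0, leaf) for leaf in range(1, n)]
print(max_drop_from_node_deletion(range(n), star_edges))  # prints 5, i.e. n - 1
```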

Idea: project onto graphs with low sensitivity. [Kasiviswanathan, Nissim, Raskhodnikova, Smith 13]
See also [Blocki, Blum, Datta, Sheffet 13; Chen, Zhou 13]

“Projections” on graphs of small degree
[Figure: the set of all graphs, with the graphs of small degree inside it]
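
One simple way to map an arbitrary graph to a graph of maximum degree at most D, often called naive truncation, is to delete every node whose degree exceeds D. The sketch assumes this rule and a hypothetical degree bound D; the slides do not spell out which projection is meant. Truncation by itself is not a private algorithm: adding or deleting one node changes its neighbors' degrees, which can push other nodes across the threshold, so the truncated graph can change by much more than one node.

```python
def truncate(nodes, edges, D):
    """Naive truncation: delete every node whose degree in the input graph exceeds D.

    Degrees only go down when nodes are deleted, so every surviving node has
    degree at most D in the output graph.
    """
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept = {v for v in nodes if degree[v] <= D}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# A star with 5 leaves plus one extra edge between two leaves.
nodes = set(range(6))
edges = [(0, i) for i in range(1, 6)] + [(1, 2)]
print(truncate(nodes, edges, D=2))  # prints ({1, 2, 3, 4, 5}, [(1, 2)])
```

On the projected graph, deleting any single node removes at most D edges, so statistics such as the edge count have node sensitivity at most D there.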

Lipschitz extensions
[Figure: the set of all graphs]
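
As one concrete instance, here is a flow-based Lipschitz extension of the edge count. It follows, as we understand it, the maximum-flow approach of [Kasiviswanathan, Nissim, Raskhodnikova, Smith 13]; treat the details as an assumption rather than a transcription of the slides. Build a network with a source s, a sink t, and a left copy v_L and a right copy v_R of every node. Add arcs s → v_L and v_R → t of capacity D, plus, for every edge {u, v}, the unit-capacity arcs u_L → v_R and v_L → u_R. If every degree is at most D, the maximum flow is exactly 2|E|. In general, adding or deleting one node changes it by at most 2D, because at most D units pass through each of that node's two copies. So half the flow value agrees with the edge count on graphs of degree at most D, has node sensitivity at most D on all graphs, and can be released with Laplace noise of scale D/ε. The function names are hypothetical, and the code uses a plain Edmonds-Karp routine to stay self-contained.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along shortest paths in the residual graph."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, arcs in cap.items():
        for v, c in arcs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse arc is present
    flow = 0
    while True:
        # Breadth-first search for an augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Collect the path, then push its bottleneck capacity along it.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


def flow_edge_count(nodes, edges, D):
    """Half the max flow of the source / left-copy / right-copy / sink network.

    Equals |E| whenever every degree is at most D, and changes by at most D
    when one node (with its edges) is added or deleted.
    """
    s, t = ("source",), ("sink",)
    cap = defaultdict(dict)
    for v in nodes:
        cap[s][("L", v)] = D  # source -> left copy of v
        cap[("R", v)][t] = D  # right copy of v -> sink
    for u, v in edges:
        cap[("L", u)][("R", v)] = 1  # each edge yields one unit-capacity
        cap[("L", v)][("R", u)] = 1  # arc in each direction
    return max_flow(cap, s, t) / 2


star = [(0, i) for i in range(1, 6)]          # star with 5 leaves, max degree 5
print(flow_edge_count(range(6), star, D=5))   # prints 5.0 (= |E|, since D >= max degree)
print(flow_edge_count(range(6), star, D=2))   # prints 2.0
print(flow_edge_count(range(1, 6), [], D=2))  # prints 0.0 (center deleted)
```

Deleting the star's center moves the value from 2.0 to 0.0 with D = 2, a change of D, whereas the raw edge count drops by 5.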

Summary
• Accurate subgraph counts for realistic graphs can be computed by node-private algorithms.
  – These algorithms use Lipschitz extensions and linear programming.
  – Subgraph counts are one example of the many graph statistics on which node-private algorithms do well.

What can’t be computed differentially privately?
• Differential privacy explicitly rules out computing anything that depends strongly on one person’s data:
  – Is there a node in the graph that has atypical connections?
  – Are there “suspicious communication patterns”?

What we are working on
• Node differentially private algorithms for releasing
  – a large number of graph statistics at once
  – synthetic graphs
• Exciting area of research:
  – Edge-private algorithms [Nissim, Raskhodnikova, Smith 07; Hay, Rastogi, Miklau, Suciu 09; Hay, Li, Miklau, Jensen 09; Hardt, Rothblum 10; Karwa, Raskhodnikova, Smith, Yaroslavtsev 11; Karwa, Slavkovic 12; Blocki, Blum, Datta, Sheffet 12; Gupta, Roth, Ullman 12; Mir, Wright 12; Kifer, Lin 13, …]
  – Node-private algorithms [Gehrke, Lui, Pass 12; Blocki, Blum, Datta, Sheffet 13; Kasiviswanathan, Nissim, Raskhodnikova, Smith 13; Chen, Zhou 13; Raskhodnikova, Smith, …]

Conclusions
• We are close to having edge-private and node-private algorithms that work well in practice for many basic graph statistics.
• Accurate node-private algorithms were thought to be impossible only a few years ago.
• Differential privacy is influencing other scientific disciplines.
  – Next talk: reducing the false discovery rate.