Safe Researcher Training Statistical disclosure control minimising risks
- Slides: 31
Safe Researcher Training
Statistical disclosure control: minimising risks in output
In this section:
1. Basic theory, using simple tables as examples
2. Extending this to the research environment
Safe outputs
BASIC PRINCIPLES
What is statistical disclosure control?
• addressing residual risk in results for publication
• being precautionary, but utility is important
  – balance with risk
  – consistent with good research
• Thought is more important than advanced stats
SDC: our example dataset
Which of these variables might
• be sensitive?
• help to identify someone?
SDC example: small counts 1
Potential problems with this table?

              Has fragile X gene?
Gender        no     yes    Total
female        85       6       91
male          58       1       59
Total        143       7      150

(The flagged cell is male, with gene: a count of 1.)
SDC example: small counts 2
Potential problems with this table?

                       Diabetes diagnosed
Has fragile X gene?    no     yes    Total
no                    114      29      143
yes                     2       5        7
Total                 116      34      150

(The slide marks a small cell in the gene "yes" row.)
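A minimal sketch of how the small cells in a table like this could be picked out automatically. The threshold of 3 is an assumption borrowed from the "<3" markers used in the suppression example later in the deck; the function name `small_cells` is hypothetical, and a real research environment would set its own limit.

```python
# Assumed threshold: counts below 3 are treated as "small".
THRESHOLD = 3

# Fragile X gene (rows) by diabetes diagnosed (columns), totals excluded.
table = {
    ("gene: no", "diabetes: no"): 114,
    ("gene: no", "diabetes: yes"): 29,
    ("gene: yes", "diabetes: no"): 2,
    ("gene: yes", "diabetes: yes"): 5,
}


def small_cells(cells, threshold=THRESHOLD):
    """Return the non-empty cells whose count falls below the threshold."""
    return {key: n for key, n in cells.items() if 0 < n < threshold}


flagged = small_cells(table)
print(flagged)
```

A check like this only finds candidates; whether a flagged cell actually matters still depends on what the variables mean.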
Summary: why 1 and 2 aren’t allowed
Average salary of those in the box: £30,000
• Male, with gene: who knows how much he earns?
• Diabetes with gene: who knows how much either of these earns?
• Female, aged 50-59, with gene: who knows how much any of these earn?
SDC example: small counts 3
Potential problems with this table?

              At least one value imputed
Gender        no     yes    Total
female        89       2       91
male          58       1       59
Total        147       3      150
SDC example: class disclosure
Potential problems with this table?

                        Income quartile (lowest 1, highest 4)
Highest qualification     1     2     3     4   Total
postgrad                  1     1     8    18      28
degree                    2     6    14    17      39
college                   8    18    16     3      45
school                   13     9     0     0      22
none                     13     3     0     0      16
Total                    37    37    38    38     150
SDC example: class disclosure
• Which of these statements is disclosive?
  “all of the students aged 14+ said that they had tried cannabis at least once”
  “no nurse in the survey earns over £22.50/hour”
  “no-one in Shetland earns over £50,000/year”
  “no-one in Shetland earns over £5m/year”
  “no-one in Shetland earns over £500m/year”
• empty and full (100%) cells are problematic irrespective of the number of observations
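The point about empty and full cells can be sketched as a simple scan over the qualification-by-income-quartile table from the previous slide. This is an illustrative check only; the function name `class_disclosure_cells` is hypothetical, and it says nothing about whether a flagged cell is actually sensitive (compare the £500m example above).

```python
# Rows: highest qualification; columns: income quartiles 1 to 4.
counts = {
    "postgrad": [1, 1, 8, 18],
    "degree":   [2, 6, 14, 17],
    "college":  [8, 18, 16, 3],
    "school":   [13, 9, 0, 0],
    "none":     [13, 3, 0, 0],
}


def class_disclosure_cells(rows):
    """Return (row label, quartile) pairs that are empty or hold the whole row."""
    flagged = []
    for label, values in rows.items():
        total = sum(values)
        for i, n in enumerate(values):
            if n == 0 or (total > 0 and n == total):
                flagged.append((label, i + 1))  # quartiles are numbered from 1
    return flagged


flagged = class_disclosure_cells(counts)
print(flagged)
```

Here the empty cells say that nobody with only school or no qualifications is in the top two income quartiles, which is a statement about every such person in the data.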
SDC example: structural zeros
Potential problems with this table?

Age and education of young respondents

Highest qualification   16-17   18-19   20-23   24-29   Total
degree                      0       0      51      64     115
college                     0      25      33      57     115
school                     15      18      19      41      93
none                        8       7      12      17      44
Total                      23      50     115     179     367
What can we do with this table?
• Suggest at least four solutions

                        Income quartile (lowest 1, highest 4)
Highest qualification     1     2     3     4   Total
postgrad                  1     1     8    18      28
degree                    2     6    14    17      39
college                   8    18    16     3      45
school                   13     9     0     0      22
none                     13     3     0     0      16
Total                    37    37    38    38     150
Option 1: hide the offending results
• Cell suppression - blanking offending cells

                        Income quartile (lowest 1, highest 4)
Highest qualification     1     2     3     4   Total
postgrad                 <3    <3     8    18      26
degree                   <3     6    14    17      37
college                   8    18    16     3      45
school                   13     9    <3    <3      22
none                     13     3    <3    <3      16
Total                    34    36    38    38     146

Totals before suppression: postgrad 28, degree 39; quartile 1: 37, quartile 2: 37; overall 150.
Calculate totals afterwards!
[Figure: the table before and after]
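A minimal sketch of suppression with totals calculated afterwards, using the "<3" convention from the slide. The function name `suppress_row` is hypothetical. If the original total of 28 were left in place for the postgrad row, a reader could subtract the visible 8 and 18 and learn that the two hidden cells sum to 2.

```python
THRESHOLD = 3  # counts below this are hidden, as in the "<3" markers


def suppress_row(values, threshold=THRESHOLD):
    """Hide counts below the threshold and total only what is still shown."""
    marker = f"<{threshold}"
    shown = [n if n >= threshold else marker for n in values]
    total = sum(n for n in values if n >= threshold)
    return shown, total


postgrad = suppress_row([1, 1, 8, 18])
school = suppress_row([13, 9, 0, 0])
print(postgrad)
print(school)
```

Note that this sketch also hides zero cells, which is what the slide appears to do for the school and none rows.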
Option 2: change the offending results
• Rounding
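One way rounding could look in practice is rounding every count to the nearest multiple of a base. The base of 5 below is an assumption chosen for illustration, not something the slides specify, and `round_to_base` is a hypothetical helper. Integer arithmetic is used so that halves always round up, rather than relying on Python's built-in `round`, which rounds halves to the nearest even number.

```python
BASE = 5  # assumed rounding base, for illustration only


def round_to_base(n, base=BASE):
    """Round a non-negative count to the nearest multiple of base (halves up)."""
    return base * ((n + base // 2) // base)


postgrad = [1, 1, 8, 18]
rounded = [round_to_base(n) for n in postgrad]
print(rounded)
```

After rounding, a published 0 could be a true count of 0, 1 or 2, so the small cells are no longer exact.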
Option 2: change the offending results
• Make the data into something less disclosive
  – ratios
  – growth rates
  – calculating proportions (but limit decimal places)
  – etc
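A short sketch of why the decimal places matter when publishing proportions. With enough decimals and a known total, the original count can be recovered by multiplying back; with fewer decimals the published share is coarser. The choice of two decimal places and the helper name `row_shares` are assumptions for illustration.

```python
def row_shares(values, decimals=2):
    """Convert counts to shares of the row total, limited to a few decimals."""
    total = sum(values)
    return [round(n / total, decimals) for n in values]


college = [8, 18, 16, 3]  # income quartiles 1 to 4, row total 45
shares = row_shares(college)
print(shares)
```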
Option 3: redesign the output
• Why do we recommend this?

                        Income quartile (lowest 1, highest 4)
Highest qualification     1     2     3     4   Total
UG/PG degree              3     7    22    35      67
college                   8    18    16     3      45
school                   13     9     0     0      22
none                     13     3     0     0      16
Total                    37    37    38    38     150

(UG/PG degree combines the postgrad row, 1 1 8 18 28, and the degree row, 2 6 14 17 39.)
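The redesign on this slide amounts to adding two rows together cell by cell. A minimal sketch, with `merge_rows` as a hypothetical helper name:

```python
def merge_rows(*rows):
    """Add several rows of counts together, column by column."""
    return [sum(column) for column in zip(*rows)]


postgrad = [1, 1, 8, 18]
degree = [2, 6, 14, 17]

ug_pg_degree = merge_rows(postgrad, degree)
row_total = sum(ug_pg_degree)
print(ug_pg_degree, row_total)
```

No information is hidden or altered in the merged row, which is one reason to prefer redesign over suppression or rounding when the combined category still answers the research question.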
Your choice…
• You know what’s important ⇒ you decide what SDC methods to use
• User support team can help
SDC example: dominance
Potential problems with this table?

                Hobbiton             Bywater              Overall
                N    Mean income     N    Mean income     Mean income
Degree          9    £62,384         22   £98,836         £88,253
College         11   £42,367         29   £28,323         £32,185
School/none     13   £16,017         30   £12,274         £13,406
Overall              £37,446              £41,531
Dealing with dominance
• Can use same approach as frequencies: redesign, suppress, round etc
• but: best protection is lots of observations
  – also deals with the problem of finding it
• consistent with quality again
• dominance is rare: know your data!
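One simple way to look for dominance is to ask what share of a cell's total comes from its largest single observation. The sketch below is an illustration under stated assumptions: the incomes are made up, the 50% limit is arbitrary, and the helper names are hypothetical. The slides do not give a specific dominance rule.

```python
def largest_share(values):
    """Share of the cell total contributed by the largest observation."""
    total = sum(values)
    return max(values) / total if total else 0.0


def is_dominated(values, max_share=0.5):
    """True if one observation contributes more than max_share of the total."""
    return largest_share(values) > max_share


# Hypothetical incomes for two cells (not taken from the example dataset).
skewed_cell = [20_000, 25_000, 30_000, 400_000]
even_cell = [30_000, 32_000, 35_000, 31_000]

print(is_dominated(skewed_cell), is_dominated(even_cell))
```

In the skewed cell the mean is driven almost entirely by one person, so publishing it says a lot about that person's income.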
SDC example: ranks, maxima, minima
Potential problems with this output?

            Income      Age
Minimum     £8,351      50
Maximum     £385,604    70
Mean        £34,353     60
Median      £11,446     59
N. obs = 150
SDC example: ranks, maxima, minima
• Max and min are not always problematic, but assume they are
• Ranks are another form of class disclosure
SDC example: differencing
Potential problems with these tables?
• No theoretical solution
• Ad-hoc solution: use higher limits
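A sketch of the differencing problem with made-up numbers: two tables that each pass a small-count check, but whose difference reveals a small cell. The counts, age bands and threshold of 3 are all assumptions for illustration and are not the tables shown on the slide.

```python
THRESHOLD = 3  # assumed small-count limit

# Hypothetical counts by qualification for two overlapping age ranges.
ages_16_to_29 = {"degree": 40, "college": 35, "school": 25}
ages_18_to_29 = {"degree": 39, "college": 30, "school": 25}

# Subtracting one published table from the other gives the 16-17 age group.
implied_16_to_17 = {k: ages_16_to_29[k] - ages_18_to_29[k] for k in ages_16_to_29}

risky = {k: n for k, n in implied_16_to_17.items() if 0 < n < THRESHOLD}
print(implied_16_to_17, risky)
```

Neither table contains a count below the limit, yet together they show exactly one 16-17 year old with a degree. Higher limits make it less likely that such differences fall into the small-count range.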
SDC and statistical quality
• Ideally: no conflict between SDC and research
• Bad for SDC:
  – small numbers
  – dominant observations/huge outliers
  – very skewed distributions
  ⇒ Also to be avoided in analysis
• Be wary of analysis on a single unit
Safe outputs
SDC AND APPLIED RESEARCH
Moving beyond tables…
• Consider
  – linear regression coefficients
  – a scatter plot of regression residuals
  – odds ratio
  – box plots
  – etc…
• do the rules described above apply? do we need to check every statistic?
⇒ ‘high review’ and ‘low review’ statistics
‘High review’ and ‘low review’ stats
• ‘low review’ statistics: inherently low disclosure risk
  – example: regression coefficients
• ‘high review’ statistics: inherently high disclosure risk
  – publish once specific values checked
  – example: tables
LRS versus HRS
[Figure: low review versus high review]
Classification and clearance
• Is this output high or low review?
  – low review: administrative checks, then release for publication
  – high review: is this specific output okay to release?
    yes: release for publication
    no: send back to researcher to apply SDC, then try again
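The clearance flow on this slide can be read as a small decision function. The sketch below is one interpretation of the flowchart, not a definitive procedure; the function name `clearance_decision` and its parameters are hypothetical.

```python
def clearance_decision(is_high_review, specific_output_ok=False):
    """Return the next step for an output under the slide's clearance flow.

    is_high_review: True if the statistic type is classed as high review.
    specific_output_ok: for high review outputs, whether this particular
        output passed the check of its specific values.
    """
    if not is_high_review:
        # Low review statistics only need administrative checks.
        return "administrative checks, then release for publication"
    if specific_output_ok:
        return "release for publication"
    return "send back to researcher to apply SDC, then try again"


print(clearance_decision(is_high_review=False))
print(clearance_decision(is_high_review=True, specific_output_ok=False))
```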
Defining LRSs and HRSs
• Are your statistics low or high review?
• LRS means that you don’t need to know about the data
• some LRSs might have conditions
[Figure: scale from low review to high review; research results are more likely to be LRS]
SDC and practical research
• Research outputs have low statistical risk
  – be precautionary, because we can be
  – apply SDC at the point of release/publication
• SDC is aligned with statistical value
  – complex outputs, and simple outputs with many obs, are all low risk
  – be careful with output highlighting, e.g., rare events
• be careful with class disclosure