CQeSS: E-Science and Statistical Modelling in Social Research

E-Science and Statistical Modelling in Social Research
Daniel Grose, Audrienne Cutajar Bezzina
CQeSS, University of Lancaster

Contents
• Some Background on Statistical Methods and Social Research;
• Disentangling Complexity: Educational attainment, truancy and PT work (NCDS);
• ReDReSS;
• RELOAD and CopperCore;
• Demo of SAKAI;
• Questions.

Some Background on Statistical Methods and Social Research

Objectives of Social Science Research
• To develop evidence-based substantive theory. We want to know “what determines what”, e.g. the (wage) returns to education;
• To explore the consequences of policy changes on individual behaviour, e.g. the impact of increasing the staying-on rate at school on educational attainment and wages.

Objectives of Social Science Research
• Randomised experiments offer the most powerful tool to meet these objectives, but outside psychology they are infeasible, unethical or flawed (e.g. we cannot allocate pupils to different levels of education);
• Social scientists must therefore rely on observational data from longitudinal and other surveys, e.g. YCS, NCDS and BHPS. This raises complications.

Complication 1. Cluster Effects (CE)
• Most large-scale surveys use multi-stage sample designs to obtain 'representative' samples; this procedure often creates cluster effects, e.g. BHPS (households), YCS (schools);
• Pupils in the same class are often more behaviourally alike than pupils in different classes (even in the same school). Some non-nested cluster structures can also be present, e.g. siblings (children of the same family) at different schools.

Complication 1. Cluster Effects (CE)
• Procedures have been developed to take cluster effects into account by means of shared random effects in the model: MLwiN, Stata (gllamm, www.gllamm.org/);
• The estimation of non-identity link and non-nested CE models, e.g. probit, can be computationally demanding.
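To make the idea of a shared random effect concrete, here is a minimal simulation sketch in plain Python. It is not taken from MLwiN or gllamm; the function names, cluster counts and variances are illustrative assumptions. Each cluster (say, a class) gets one random effect shared by all its members, and the sketch then estimates the intraclass correlation and the standard design effect 1 + (m - 1) * ICC for clusters of size m.

```python
import random
import statistics


def simulate_clustered(n_clusters=200, cluster_size=20,
                       sd_cluster=1.0, sd_indiv=1.0, seed=1):
    """Simulate responses with one shared random effect per cluster (hypothetical set-up)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, sd_cluster)  # effect common to every member of the cluster
        data.append([shared + rng.gauss(0.0, sd_indiv) for _ in range(cluster_size)])
    return data


def intraclass_correlation(data):
    """One-way ANOVA estimator of the intraclass correlation for equal-sized clusters."""
    k = len(data)
    m = len(data[0])
    means = [statistics.fmean(cluster) for cluster in data]
    grand = statistics.fmean(means)  # equals the overall mean because clusters are balanced
    msb = m * sum((cm - grand) ** 2 for cm in means) / (k - 1)
    msw = sum((y - cm) ** 2
              for cluster, cm in zip(data, means) for y in cluster) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(icc, cluster_size):
    """Variance inflation of a sample mean caused by clustering."""
    return 1.0 + (cluster_size - 1) * icc


data = simulate_clustered()
icc = intraclass_correlation(data)
print("estimated ICC:", round(icc, 3))
print("design effect:", round(design_effect(icc, len(data[0])), 2))
```

With equal cluster-level and individual-level variances the true intraclass correlation is 0.5, so treating the 4,000 simulated responses as independent would badly understate the uncertainty of any estimate.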

Complication 2. Measurement Errors (ME)
• Ignoring ME can seriously mislead the quantification of the link between explanatory and response variables;
• In observational studies, it is rarely possible to measure all relevant covariates accurately, e.g. age, educational attainment;
• ME in one covariate can bias the association between other covariates and the response variable, even if those other covariates are measured without error.
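The simplest version of this problem can be sketched with a short simulation. The set-up below is an illustrative assumption, not an analysis from the talk: a single covariate x with unit variance, a true slope of 2, and an observed covariate w equal to x plus independent noise. In this classical case the least-squares slope on w is attenuated towards zero by the factor var(x) / (var(x) + var(noise)).

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_measurement_error(n=20000, beta=2.0, sd_error=1.0, seed=2):
    """Return (slope using the true covariate, slope using the noisy covariate)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # true covariate
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]      # response
    w = [xi + rng.gauss(0.0, sd_error) for xi in x]        # covariate measured with error
    return ols_slope(x, y), ols_slope(w, y)


slope_true, slope_noisy = simulate_measurement_error()
print("slope with accurate covariate:", round(slope_true, 3))
print("slope with noisy covariate:   ", round(slope_noisy, 3))
```

With the noise variance equal to the variance of x, the attenuation factor is one half, so the slope estimated from the noisy covariate settles near 1 rather than 2.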

Complication 2. Measurement Errors (ME)
• Also, some important determinants of behaviour are either not measured (i.e. omitted) or are unmeasurable (e.g. motivation);
• Repeated measures and longitudinal data provide the opportunity to deal with ME in explanatory variables, but this adds to the computational demands of the analysis.

Complication 3. Missing Data, Dropout and Selection
• All of the major data sets available to the British social science community (e.g. YCS, BHPS and NCDS) contain missing data and dropout;
• This creates bias in the data;
• We need to model, as realistically as possible, the process by which the observed subjects have been retained in the sample, otherwise we will not know how much bias is present in our results;
• Some sample designs create selection effects, e.g. by using a subset of locations, or oversampling the poor;
• These add to the computational demands of the analysis.
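A small simulation sketch shows why the retention process matters. Everything here is an assumption made for illustration (a standard normal outcome and a logistic retention probability that rises with the outcome); it does not model any of the surveys named above. When staying in the sample depends on the outcome itself, the mean among retained subjects drifts away from the mean of the full sample.

```python
import math
import random
import statistics


def simulate_dropout(n=20000, strength=1.0, seed=3):
    """Return (mean of the full sample, mean of the retained subjects)."""
    rng = random.Random(seed)
    full, retained = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        full.append(y)
        # Hypothetical retention rule: higher outcomes are more likely to stay.
        p_stay = 1.0 / (1.0 + math.exp(-strength * y))
        if rng.random() < p_stay:
            retained.append(y)
    return statistics.fmean(full), statistics.fmean(retained)


mean_full, mean_retained = simulate_dropout()
print("mean, full sample:      ", round(mean_full, 3))
print("mean, retained subjects:", round(mean_retained, 3))
```

Setting `strength=0.0` makes retention independent of the outcome, in which case the two means agree up to sampling noise.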

Complication 4. Parametric Assumptions
• Our statistical tools are assumption rich:
  – Parametric linear predictors,
  – Parametric link functions and error structures;
• What if the assumed parametric relationships do not hold (e.g. non-Gaussian errors)?
• We need more robust alternatives;
• BUT nonparametric statistical models are usually computationally intensive.

Complication 5. Endogenous effects
• The curse of endogenous effects: everything seems to depend on everything else;
• We need multiprocess models (simultaneous equations) to disentangle this complexity, which adds to the computation.
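The bias that endogeneity causes can be sketched with a two-equation toy system. The coefficients and the simulation are illustrative assumptions only: y1 = a*y2 + e1 and y2 = b*y1 + e2 with independent unit-variance errors. Regressing y1 on y2 alone does not recover a, because y2 is itself driven by e1.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    return sxy / sxx


def simulate_simultaneous(n=20000, a=0.5, b=0.5, seed=4):
    """Draw (y1, y2) from the reduced form of y1 = a*y2 + e1, y2 = b*y1 + e2."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Solving the two equations jointly gives the reduced form below.
        y1.append((e1 + a * e2) / (1.0 - a * b))
        y2.append((b * e1 + e2) / (1.0 - a * b))
    return y1, y2


y1, y2 = simulate_simultaneous()
print("true effect of y2 on y1: 0.5")
print("single-equation estimate:", round(ols_slope(y2, y1), 3))
```

For these values the single-equation slope converges to (b + a) / (b*b + 1) = 0.8 rather than the structural value 0.5, which is the kind of distortion a simultaneous-equation model is meant to remove.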

Disentangling complexity with existing tools: an example
• These are the kind of examples that got me interested in eScience.
• As we start to acknowledge more fully the stochastic complexity of social processes, our results will change.

Example 1: Allowing for Cluster effects
• Stata, e.g. dprobit with the cluster option (http://www.stata.com/help.cgi?dprobit)
• MLwiN (http://multilevel.ioe.ac.uk/index.html)
• aML, SAS
• What happens if we have more than one response, e.g. training and promotion? Standard software can’t do it.
• What happens if we have previous outcomes in the model? Standard software can’t do it.

Example 2: Allowing for Endogenous effects
• Simultaneous equation systems
• Commands in Stata
• Commands in aML

Some existing web-based tools
• Nesstar allows 66 major datasets to be explored online (http://www.nesstar.com/);
• Only uses one data set at a time;
• Has very limited facilities for sub-setting and none for fusing;
• Restricted statistical facilities, e.g. descriptive analysis, linear regression;
• No facilities for handling missing data.

Joining Up the Analysis Cycle
[Diagram. Labels: Main ESDS Data Sets; Contextual Data (TTWA Data, NOMIS); Select Data Set and Appropriate Variables; Merge Files: Add Variables; Working Data; Results]
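The "Merge Files: Add Variables" step can be sketched as a simple keyed join. The field names (`ttwa`, `unemployment_rate`) and all the values below are made up for illustration; they are not drawn from ESDS or NOMIS, and the sketch says nothing about how the portal itself performs the merge. Each individual record keeps its own variables and gains the contextual variables for its area.

```python
def add_contextual_variables(individuals, contextual, key="ttwa"):
    """Left-join area-level variables onto individual records using a shared key."""
    lookup = {row[key]: row for row in contextual}
    merged = []
    for person in individuals:
        combined = dict(person)                    # copy, so the input is left untouched
        area = lookup.get(person[key], {})         # no match: nothing is added
        for name, value in area.items():
            if name != key:
                combined[name] = value
        merged.append(combined)
    return merged


# Hypothetical survey extract and contextual table.
individuals = [
    {"id": 1, "ttwa": "A", "wage": 310.0},
    {"id": 2, "ttwa": "B", "wage": 275.0},
    {"id": 3, "ttwa": "C", "wage": 290.0},
]
contextual = [
    {"ttwa": "A", "unemployment_rate": 4.1},
    {"ttwa": "B", "unemployment_rate": 6.3},
]

working_data = add_contextual_variables(individuals, contextual)
for row in working_data:
    print(row)
```

Records whose area has no contextual entry (area "C" here) are kept without the extra variable, so the missingness introduced by the merge stays visible in the working data.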

Portals make all our e-tools easier to use
• Portals provide a framework to deploy our e-tools (aka rectangles); they focus on how the user wants to arrange these “rectangles”;
• The portal allows component integration: the goal is for the tools to work together closely and to seem like parts of a larger “tool”.

SAKAI Provides our VRE Portal
Sakai = Collaboration & Research/Learning Environment
[Diagram. Labels: Portal; E-Collaboration Portlets; Quantitative Methods Portlets; Discussion, Video Conf and VOIP; Resource Discovery; Statistical Analysis; DBMS; GE; Res 1 to Res 6]

Sakai
• Sakai is open source; it is the hosting framework of choice for VLE and VRE (OGCE) development in the US;
• Big investment from Mellon Foundation and Ivy League Universities ($6.8M);
• Sakai 2.0 (release 10th June 05) will take WSRP-compliant portlets.
• http://redress.lancs.ac.uk:8080/portal

Using WSRP and to Federate across sites and provide extreme user flexibility in presentation
[Diagram: a portal linked via WSRP and HTTP to Sakai tools, non-Sakai tools and non-Sakai, non-Java tools]

LDCue for Structuring Content
• LDCue integrates content created by most standard authoring systems (incl. video) that is visible on the web;
• A resource discoverer will be able to specify where they are now and where they want to be; the LDCue tool then supplies them with a list of potentially suitable learning object URIs;
• The metadata on these URIs are then used to create learning designs that sequence material (read this first, then this, etc.).
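The idea of turning "where I am now" and "where I want to be" into an ordered reading list can be sketched as a filter and a sort over metadata. This is not LDCue's actual algorithm or data model: the numeric `level` field, the function name and the example.org URIs are all hypothetical, chosen only to illustrate sequencing by metadata.

```python
def sequence_learning_objects(objects, current_level, target_level):
    """Pick objects above the learner's current level, up to the target, ordered by level."""
    suitable = [obj for obj in objects
                if current_level < obj["level"] <= target_level]
    ordered = sorted(suitable, key=lambda obj: obj["level"])
    return [obj["uri"] for obj in ordered]


# Hypothetical learning-object metadata; "level" is an assumed difficulty score.
catalogue = [
    {"uri": "http://example.org/lo/random-effects", "level": 3},
    {"uri": "http://example.org/lo/descriptive-stats", "level": 1},
    {"uri": "http://example.org/lo/multiprocess-models", "level": 5},
    {"uri": "http://example.org/lo/linear-regression", "level": 2},
]

reading_list = sequence_learning_objects(catalogue, current_level=1, target_level=3)
for position, uri in enumerate(reading_list, start=1):
    print(position, uri)
```

A real learning design would carry richer metadata than a single score, but the same two steps apply: select what fits between the start and end points, then order it.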

Reload & CopperCore
• Just as a musician composes a piece, Reload is used to compose the structure of the learning design.
• The learner is the deejay who plays back the learning design created in Reload.

Reload & CopperCore (cont)
• CopperCore is the medium used to play back the learning design created in Reload.
• CopperCore gives a structure to the learning modules, and keeps track of what has been covered by the learner.

Reload Structure
• The IMS Learning Design package within Reload is made up of the following tabs:
  – General
  – Roles
  – Environment
  – Activities
  – Methods
  – Resources

General

General (cont)
• This contains the top-level information for the IMS Learning Design.
• The most important fields to fill in are the objectives, requirements, description and overview.

Roles

Roles (cont)
• This tab allows the user to define learner and staff roles, each with different characteristics. Further information can be added, such as the minimum and maximum size of a group.

Environment
• This describes the environments in which the learning occurs.

Activities

Activities (cont)
• Here, the designer can group activities together from a selection of resources. The activities can be presented as selections or in sequence.

Method

Method (cont)
• Learning designs consist of one or more plays, each with one or more acts following sequentially.
• The roles need to be specified here as well.
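The play/act/role structure described here can be pictured as a small nested data structure. The classes below are an illustrative sketch, not the IMS Learning Design schema or the Reload data model; the class names, the `role_parts` mapping and the example design are assumptions. The helper walks the plays and their acts in order and lists what one role is asked to do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Act:
    title: str
    # Maps a role name (e.g. "learner") to the activity that role performs in this act.
    role_parts: Dict[str, str] = field(default_factory=dict)


@dataclass
class Play:
    title: str
    acts: List[Act] = field(default_factory=list)  # acts follow one another in list order


@dataclass
class LearningDesign:
    title: str
    plays: List[Play] = field(default_factory=list)


def activities_for(design: LearningDesign, role: str) -> List[Tuple[str, str, str]]:
    """List (play, act, activity) for one role, in play order then act order."""
    steps = []
    for play in design.plays:
        for act in play.acts:
            if role in act.role_parts:
                steps.append((play.title, act.title, act.role_parts[role]))
    return steps


# Hypothetical design with one play and two sequential acts.
design = LearningDesign(
    title="Cluster effects primer",
    plays=[Play(
        title="Main play",
        acts=[
            Act("Act 1", {"learner": "Read the introduction", "staff": "Open the forum"}),
            Act("Act 2", {"learner": "Run the worked example"}),
        ],
    )],
)

for step in activities_for(design, "learner"):
    print(step)
```

Keeping the role-to-activity mapping inside each act mirrors the point that roles have to be specified in the Method tab as well as in the Roles tab.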

Resources
• The Resources tab allows the user to manage the resources needed by the Learning Design.

Validation
• When the learning design has been saved, create a zip file and upload this into CopperCore.
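Creating the zip file can be done with any archiving tool; the sketch below uses Python's standard `zipfile` module. The directory layout and file names in the usage note are assumptions (content packages conventionally keep an `imsmanifest.xml` at the top level, but check what Reload actually saved), and the upload to CopperCore itself is not shown.

```python
import zipfile
from pathlib import Path


def zip_learning_design(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to source_dir."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Relative names keep the manifest at the top level of the archive.
                archive.write(path, path.relative_to(source).as_posix())
    return zip_path
```

A call might look like `zip_learning_design("my_design", "my_design.zip")`, where `my_design` is the (hypothetical) folder in which the design was saved. Write the zip file outside that folder so the archive does not try to include itself.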

Uploading
• If there are no errors, this is what you get.

Running CopperCore

Running CopperCore
• After entering the student names, and setting runs and roles, this is what happens.

CopperCore (cont)
• Clicking on the run opens a new web browser window.

RELOAD Project
• Create a learning design in Reload, then use CopperCore (http://coppercore.org/) to play it back.

Advantages of LDCue over a search engine on the web
• Search engines do not sequence material by difficulty/complexity;
• With Learning Design you get semantically coherent content;
• Search engines (e.g. Google) typically give associative learning, which can be inefficient, especially when you get a lot of hits.

Some of the VRE Tools we have written
E-Collaboration
• Distributed Whiteboard;
• Voice and Video over IP;
• Broadcast Display (e.g. Word and PPT).
E-Discovery
• LDCue for Structuring Content.

ReDReSS
• ReDReSS is a joint project between Lancaster University and CCLRC Daresbury.
• It is a training and awareness project in eScience and eSocial Science.
• We are commissioning social scientists to write material for our portal.
• http://redress.lancs.ac.uk

Content Jan-May 2005
[Chart. Categories: ReDReSS; NCeSS; Conference paper; Other]

Content Jan-May 2005 (cont)
[Chart. Categories: Finished; NCeSS; Other; NCeSS/ReDReSS]

Content May–Aug 2005
[Chart. Categories: ReDReSS; NCeSS; Conference Paper; ReDReSS/NCeSS]

Content May–Aug 2005 (cont)
[Chart. Categories: ReDReSS; NCeSS; Other; ReDReSS/NCeSS]

Any Questions?