Introduction to Statistical Computing in Clinical Research
Biostatistics 212, Lecture 1
Today...
• Course overview
  – Course objectives
  – Course details: grading, homework, etc.
  – Schedule, lecture overview
• Where does Stata fit in?
• Basic data analysis with Stata: demos
• Lab
Course Objectives
• Introduce you to using STATA and Excel for
  – Data management
  – Basic statistical and epidemiologic analysis
  – Turning raw data into presentable tables, figures and other research products
• Prepare you for Fall courses
• Start analyzing your own data
Course details
Introduction to Statistical Computing – 1 unit
Schedule
  – 7 lectures, 7 lab sessions, on 7 Tuesdays in a row
  – Dates: August 4 – September 15
  – Lectures 1:15–2:45
  – Labs 3:00–4:00
  – All in China Basin, CBL 6702 (6704 for lab)
Final Project Due 9/22/09
Course details
Introduction to Statistical Computing
Grading: Satisfactory/Unsatisfactory
Requirements:
  – Hand in all six Labs (even if late)
  – Satisfactory Final Project
  – 80% of total points
Reading: Optional
Course details, cont.
Course Director
• Mark Pletcher
Teaching Assistants
• Justin Parekh – Section 1
• Elena Flowers – Section 2 (Mac)
• Tamara Castillo
• Maurice Garcia
Lecturers
• Andy Choi
• Jennifer Cocohoba
Lab Instructor
• Mandana Khalili
Overview of lecture topics
1 - Introduction to STATA
2 - Do files, log files, and workflow in STATA
3 - Generating variables and manipulating data with STATA
4 - Using Excel
5 - Basic epidemiologic analysis with STATA
6 - Making a figure with STATA
7 - Organizing a project, making a table
Overview of labs
• Lab 1 – Load a dataset and analyze it
• Lab 2 – Learn how to use do and log files
• Lab 3* – Import data from Excel, generate new variables and manipulate data, document everything with do and log files
• Lab 4 – Using and creating Excel spreadsheets
• Lab 5* – Epidemiologic analysis using Stata
• Lab 6 – Making a figure with Stata
The last lab session will be dedicated to working on the Final Project.
* Labs 3 and 5 are significantly longer and harder than the others.
Overview of labs, cont.
• Official Lab time is 3:00–4:00, but we will start right after lecture, and you can leave when you are done.
Overview of labs, cont.
• Labs are due the following week, before lecture. Labs turned in late (less than 1 week) will receive only half credit; after that, no points will be awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded).
• Lab 1 is on paper
• Labs 2–6 are electronic files and should be emailed to your section leader's course email address: biostat212_section1@yahoo.com (Justin) or biostat212_section2@yahoo.com (Elena)
Final Project
• Create a Table and a Figure using your own data, and document the analysis using Stata.
• Due 1 week after the last lab session; 20 points docked for each day late.
Course Materials
• Course Overview
• Final Project
• Lectures and Labs (just in time)
• Other handouts
• Books
Getting started with STATA Session 1
Types of software packages used in clinical research
• Statistical analysis packages
• Spreadsheets
• Database programs
• Custom applications
  – Cost-effectiveness analysis (TreeAge, etc.)
  – Survey analysis (SUDAAN, etc.)
Software packages for analyzing data
• STATA
• SAS
• S-plus and R
• SPSS
• SUDAAN
• Epi-Info
• JMP
• MATLAB
• StatXact
Why use STATA?
• Quick start, user friendly
• Immediate results, response
• You can look at the data
• Menu-driven option
• Good graphics
• Log and do files
• Good manuals, help menu
Why NOT use STATA?
• SAS is used more often? SAS does some things STATA does not
• Programming easier with S-plus and R? R is free
• Complicated data structure and manipulation easier with SAS?
• Epi-Info (free) is even easier than STATA?
STATA – Basic functionality
• Holds data for you
  – Stata holds 1 “flat” file dataset only (.dta file)
• Listens to what you want
  – Type a command, press Enter
• Does stuff
  – Statistics, data manipulation, etc.
• Shows you the results
  – Results window
Demo #1
• Open the program
• Load some data
• Look at it
• Run a command
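The four demo steps map onto a handful of commands typed in the Command window. A minimal sketch, assuming Stata's bundled example dataset `auto` (loaded with `sysuse`) as a stand-in for whatever dataset the demo uses:

```stata
* Load some data: an example dataset that ships with Stata
sysuse auto, clear

* Look at it: variable names and types, then the first five observations
describe
list make price mpg in 1/5

* Run a command: summary statistics for one variable
summarize price
```

The Data Browser (toolbar button, or the `browse` command) shows the same data as a spreadsheet.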
STATA - Windows
• Two basic windows
  – Command
  – Results
• Optional windows
  – Variable list
  – History of commands
• Other functions
  – Data browser/editor
  – Do-file editor
  – Viewer (for log, help files, etc.)
STATA - Buttons
• The usual: open, save, print
• Log file open/suspend/close
• Do-file editor
• Browse and Edit
• Break
STATA - Menus
• Almost every command can be accessed via the menus
Demo #2
• Enter in some data
• Look at it
• Run a couple of commands
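Besides typing values into the Data Editor, a small dataset can be entered with the `input` command. A minimal sketch; the variable names (`id`, `age`, `sbp`) and all values are hypothetical, made up only to have something to look at:

```stata
* Start from an empty dataset
clear

* Enter some data (hypothetical values, for illustration only)
input id age sbp
1 45 128
2 52 141
3 38 119
4 61 150
5 49 132
end

* Look at it
list

* Run a couple of commands
summarize age sbp
list id sbp if sbp >= 140
```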
Menu vs. Command line
• Menu advantages
  – Look for commands you don't know about
  – See the options for each command
  – Complex commands are easier
  – Learn the syntax
• Command line advantages
  – Faster (if you know the command!)
  – “Closer” to the program
  – Only way to write “do” files
    • Document and repeat analyses
STATA commands: Describing your data
• describe [varlist]
  – Displays variable names, types, labels
• list [varlist]
  – Displays the values of all observations
• codebook [varlist]
  – Displays labels and codes for all variables
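A minimal sketch of the three commands, assuming Stata's bundled `auto` dataset as example data. Each takes an optional varlist, and adding an `in` range keeps `list` from printing every observation:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Variable names, storage types, and labels for selected variables
describe make price mpg foreign

* Values of selected variables, first 10 observations only
list make price foreign in 1/10

* Codes and value labels (in auto, foreign is 0 = Domestic, 1 = Foreign)
codebook foreign
```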
STATA commands: Descriptive statistics – continuous data
• summarize [varlist] [, detail]
  – # obs, mean, SD, range
  – “, detail” gets you more detail (median, etc.)
• ci [varlist]
  – Mean, standard error of the mean, and confidence intervals
  – Actually works for dichotomous variables, too
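`ci` reports the mean, its standard error (SD/√n), and a t-based confidence interval. A minimal sketch that reproduces the 95% interval by hand from the results `summarize` stores, assuming Stata's bundled `auto` dataset; the scalar names `se_mpg` and `tcrit` are arbitrary. Newer Stata releases spell the command `ci means`, so check `help ci` for your version:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Number of observations, mean, SD, min, max
summarize mpg

* Adds percentiles (including the median), skewness, and kurtosis
summarize mpg, detail

* 95% CI for the mean by hand: mean +/- t(0.975, n-1) * SD / sqrt(n)
quietly summarize mpg
scalar se_mpg = r(sd) / sqrt(r(N))
scalar tcrit = invttail(r(N) - 1, 0.025)
display "mean = " r(mean) "   SE = " se_mpg
display "95% CI: " (r(mean) - tcrit*se_mpg) " to " (r(mean) + tcrit*se_mpg)
```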
STATA commands: Graphical exploration – continuous data
• histogram varname
  – Simple histogram of your variable
• graph box varlist
  – Box plot of your variable(s)
• qnorm varname
  – Quantile-normal plot of your variable, to check normality
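A minimal sketch of the three plots, assuming Stata's bundled `auto` dataset. The `over()` option on the box plot is not on the slide; it splits the plot by a categorical variable. Each new graph replaces the previous one in the Graph window:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Distribution of a continuous variable
histogram mpg

* Box plot, one box per category of foreign
graph box mpg, over(foreign)

* Normal quantile plot: points close to the line suggest approximate normality
qnorm mpg
```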
STATA commands: Descriptive statistics – categorical data
• tabulate [varname]
  – Counts and percentages
  – (see also table, which is a very different command!)
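A minimal sketch, assuming Stata's bundled `auto` dataset. The `missing` option is not on the slide; it shows missing values as their own row instead of silently leaving them out of the counts:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Counts and percentages for one categorical variable
tabulate foreign

* rep78 has missing values; "missing" counts them as a separate category
tabulate rep78, missing
```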
STATA commands: Analytic statistics – 2 categorical variables
• tabulate [var1] [var2]
  – “Cross-tab”
  – Descriptive options: , row (row percentages); , col (column percentages)
  – Statistics options: , chi2 (chi-squared test); , exact (Fisher's exact test)
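The descriptive and statistics options can be combined in a single command. A minimal sketch, assuming Stata's bundled `auto` dataset (repair record `rep78` by origin `foreign`). Several cells in this table are small, so Fisher's exact test is a useful check on the chi-squared p-value:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Cross-tab with column percentages, Pearson chi-squared, and Fisher's exact test
tabulate rep78 foreign, col chi2 exact
```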
Getting help
• Try to find the command on the pull-down menus
• Help menu
  – If you don't know the command: Search...
  – If you know the command: Stata Command...
• Try the manuals
  – More detail, theoretical underpinnings, etc.
STATA commands: Analytic statistics – 1 categorical, 1 continuous
• bysort catvar: summarize [contvar]
  – Mean, SD, range of the continuous variable within each subgroup
• ttest [contvar], by(catvar)
  – Two-sample t-test
• oneway [contvar] [catvar]
  – ANOVA
• table [catvar] [, contents(mean [contvar] …)]
  – Table of statistics
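A minimal sketch of the first three commands, assuming Stata's bundled `auto` dataset (`mpg` as the continuous variable; `foreign` and `rep78` as categorical ones). `table` is left out because its options were overhauled in later Stata releases; check `help table` for your version:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Mean, SD, range of mpg within each category of foreign
bysort foreign: summarize mpg

* Two-sample t-test of mpg by origin (equal variances assumed by default)
ttest mpg, by(foreign)

* One-way ANOVA of mpg across repair-record categories
oneway mpg rep78
```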
STATA commands: Analytic statistics – 2 continuous
• scatter [var1] [var2]
  – Scatterplot of the two variables (var1 on the y axis, var2 on the x axis)
• pwcorr [varlist] [, sig]
  – Pairwise correlations between variables
  – “sig” option gives p-values
• spearman [varlist] [, stats(rho p)]
  – Spearman rank correlations; stats(rho p) shows rho and its p-value
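A minimal sketch, assuming Stata's bundled `auto` dataset (mileage `mpg`, `weight`, `price`). In `scatter`, the first variable goes on the y axis and the second on the x axis:

```stata
* Example data bundled with Stata
sysuse auto, clear

* Scatterplot: mpg (y axis) against weight (x axis)
scatter mpg weight

* Pearson correlations for every pair, with p-values
pwcorr mpg weight price, sig

* Spearman rank correlation with rho and its p-value
spearman mpg weight, stats(rho p)
```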
Demo #3
• Load a STATA dataset
• Explore the data
• Describe the data
• Answer some simple research questions
  – Gender and HTN, age and HTN
In Lab Today...
• Familiarize yourself with Stata
• Load a dataset
• Use Stata commands to analyze data and fill in the blanks
Next week
• Do files, log files, and workflow in Stata
• Find a dataset!
Website addresses
• Course website
  – http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html
• Computing information
  – http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing
• Download RDP for Macs (for Stata 10 Server)
  – http://www.microsoft.com/mac/otherproducts.aspx?pid=remotedesktopclient
• Citrix Web Server
  – http://apps.epi-ucsf.org/