NYC Syndromic Surveillance Transition from SAS to R


Robert Mathes
Director, Syndromic Surveillance
New York City Department of Health and Mental Hygiene
September 20, 2021

- A quick overview of the system
- Why R? (when SAS works just fine)
- Testing
- The packages and why
- Walk-through of dashboard

A few comments on syndromic surveillance in NYC
• Five data streams in system:
  - EMS (1999)
  - ED (2001)
  - Pharmacy (2002) – includes over-the-counter and prescription sales
  - School nurse visits (2008)
  - Urgent Care (2018)
• Data is reported directly to NYC DOHMH
• Home-grown system

A few comments on syndromic surveillance in NYC
• Five staff (MPH and PhD level)
• All processes run in SAS until 2018

Why SAS
- All systems were originally coded in SAS
- We knew SAS
- It worked

Over the years…
- Many analysts made many contributions
- Programs turned into Frankenstein
- Result was code that was difficult to understand and tricky to modify

However…

If it ain’t broke, don’t fix it

A few things gave us a nudge to R
- Seeing the really nice visualization packages developed for R
- Move from desktop SAS to server (to save $$)
- SAS server couldn’t run cluster detection software (SaTScan)
- Buy-in from team

Initial concerns
- Can we improve our current system?
- Can analysts learn R?

SAS Dashboard

- Group took intro R classes offered at DOHMH
- But most helpful was to code (and learn as we went along)
- We used Stack Overflow…a lot

First, let’s see what these graphs look like with our data

Read SAS datasets into R (using the haven package)
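A minimal sketch of this step, assuming the haven package is installed. The file name and the `read_visits()` wrapper are hypothetical and only illustrate the call; they are not taken from the presentation.

```r
library(haven)

# Hypothetical path to a SAS dataset (assumption, not from the slides).
sas_path <- "ed_visits.sas7bdat"

# read_sas() returns a tibble; here it is converted to a plain data frame.
read_visits <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  as.data.frame(read_sas(path))
}

visits <- read_visits(sas_path)
```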

Pick a couple of packages to try (dygraphs, plotly)

Now let’s compare: SAS vs R

R graph is the clear winner

Why?
- Additional data with mouseover
- Adjustable x-axis
- Dynamic legend
- And…it looks better

Now let’s put together a simple R dashboard

…using flexdashboard

This is a great improvement!

Things to consider
- How to process and store data
- What packages to use
- Simplicity vs efficiency
- Integration of several analyses
- Automation

How to write data was one of our first questions

With SAS, we saved data in large SAS data sets

Processing times could be long

We would archive the data ~4 months

With R, could we reduce processing times and use a format that is easily accessible?

Saving and reading data
- Faster to save as individual files
- Tab-delimited text file
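The idea of many small tab-delimited files can be sketched in base R as follows. The one-file-per-day split, the column names, and the file naming scheme are assumptions made for illustration; the slides only say that individual tab-delimited files were faster.

```r
# Hypothetical daily ED extract; column names are illustrative only.
visits <- data.frame(
  visit_date = as.Date(c("2021-09-18", "2021-09-18", "2021-09-19")),
  hospital   = c("A", "B", "A"),
  syndrome   = c("resp", "ili", "resp"),
  stringsAsFactors = FALSE
)

out_dir <- file.path(tempdir(), "ed_daily")
dir.create(out_dir, showWarnings = FALSE)

# Write one tab-delimited file per visit date.
for (d in split(visits, visits$visit_date)) {
  stamp <- format(d$visit_date[1], "%Y%m%d")
  path  <- file.path(out_dir, paste0("ed_", stamp, ".txt"))
  write.table(d, path, sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read back only the files that are needed and stack them.
files <- list.files(out_dir, pattern = "^ed_[0-9]{8}[.]txt$", full.names = TRUE)
restored <- do.call(rbind, lapply(files, read.delim, stringsAsFactors = FALSE))
restored$visit_date <- as.Date(restored$visit_date)
```

Because each day is its own file, a run that needs only recent data reads a handful of small files instead of one large data set.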

Data manipulation

We try to use base R as much as we can

Most often we use dplyr
- concatenate long triage notes
- bind multiple data frames (if not all columns are same)
- change case of text for lots of variables
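The three dplyr uses listed above might look roughly like this. The data frames and column names are invented for illustration, and the assumption that a long triage note arrives split across several rows is ours, not the presenter's.

```r
library(dplyr)

# Hypothetical data: long triage notes split across several rows (assumption).
notes <- data.frame(
  visit_id = c(1, 1, 2),
  part     = c(1, 2, 1),
  note     = c("fever and", "cough x3 days", "ankle injury"),
  stringsAsFactors = FALSE
)

# 1. Concatenate the pieces of each note into a single string per visit.
full_notes <- notes %>%
  arrange(visit_id, part) %>%
  group_by(visit_id) %>%
  summarise(note = paste(note, collapse = " "), .groups = "drop")

# 2. Bind data frames whose columns differ; missing columns are filled with NA.
ed <- data.frame(visit_id = 1, hospital = "A", stringsAsFactors = FALSE)
uc <- data.frame(visit_id = 2, clinic = "X", stringsAsFactors = FALSE)
combined <- bind_rows(ed, uc)

# 3. Change the case of every character column in one step.
notes_upper <- full_notes %>%
  mutate(across(where(is.character), toupper))
```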

lubridate
- makes working with dates easier
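A few lubridate calls of the kind this slide alludes to, as a sketch. The timestamps and their formats are made up; which functions the team actually used is not stated in the slides.

```r
library(lubridate)

# Hypothetical arrival timestamps in two different text formats.
arrival_a <- ymd_hms("2021-09-20 14:35:00", tz = "America/New_York")
arrival_b <- mdy_hm("09/20/2021 14:35", tz = "America/New_York")

visit_day  <- as_date(arrival_a)                             # date only
visit_hour <- hour(arrival_a)                                # hour of day (0-23)
week_start <- floor_date(visit_day, "week", week_start = 7)  # Sunday that starts the week
weekday    <- wday(visit_day, label = TRUE)                  # day of week as a labelled factor
```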

sqldf
- create count tables
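A count table built with sqldf could be sketched like this. The `visits` data frame and its columns are hypothetical; sqldf runs the SQL against data frames found in the calling environment.

```r
library(sqldf)

# Hypothetical line-level data: one row per visit.
visits <- data.frame(
  visit_date = c("2021-09-19", "2021-09-19", "2021-09-19", "2021-09-20"),
  syndrome   = c("resp", "resp", "ili", "resp"),
  stringsAsFactors = FALSE
)

# Count visits per day and syndrome.
counts <- sqldf("
  SELECT visit_date, syndrome, COUNT(*) AS n
  FROM visits
  GROUP BY visit_date, syndrome
  ORDER BY visit_date, syndrome
")
```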

stringr
- extract matching patterns from a text string
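As a small illustration of pattern extraction with stringr, assuming free-text triage notes like the invented ones below. The patterns are examples only and are not the syndrome definitions used in the NYC system.

```r
library(stringr)

# Invented triage notes for illustration.
notes <- c("Fever 102.4, cough x3 days", "fell off bike, ankle pain", "SOB and FEVER")

# Flag notes that mention fever, ignoring case.
has_fever <- str_detect(notes, regex("fever", ignore_case = TRUE))

# Pull out a recorded temperature such as 102.4 where one is present.
temp <- as.numeric(str_extract(notes, "\\d{2,3}\\.\\d"))
```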

Analysis

rsatscan
- temporal and spatio-temporal analysis

Graphing

We settled on highcharts (highcharter package)
- clean interface
- lots of options to customize through API
- requires a license

Also use dygraphs
- clean interface
- easy to add upper/lower bars
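One way to draw a series with upper/lower bars in dygraphs is to pass three column names to `dySeries()`. This is a sketch with invented numbers, assuming the dygraphs and xts packages are installed; it is not the team's actual chart code.

```r
library(dygraphs)
library(xts)

# Invented daily counts with lower and upper bounds.
dates <- seq(as.Date("2021-09-01"), by = "day", length.out = 7)
daily <- cbind(
  lower    = c(80, 82, 81, 79, 83, 85, 84),
  observed = c(95, 90, 88, 97, 102, 99, 93),
  upper    = c(110, 112, 111, 109, 113, 115, 114)
)
series <- xts(daily, order.by = dates)

# A vector of three names (lower, value, upper) draws the value with a shaded band.
chart <- dygraph(series, main = "Daily visits (illustrative data)") %>%
  dySeries(c("lower", "observed", "upper"), label = "Visits") %>%
  dyRangeSelector()   # adjustable x-axis
```

Printing `chart` in an R Markdown document or the RStudio viewer shows the interactive graph.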

Mapping

Leaflet is popular and we liked it
- ability to make choropleths/add hospital markers
- mouseovers (can include additional information)
- nice selection of basemaps
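A minimal leaflet sketch of markers with mouseover labels on a chosen basemap. The site names, coordinates, and counts are made up for illustration and do not represent real hospitals; a choropleth would additionally need polygon boundaries, which are omitted here.

```r
library(leaflet)

# Made-up marker locations for illustration; not real hospital coordinates.
sites <- data.frame(
  name   = c("Hospital A", "Hospital B"),
  lng    = c(-73.98, -73.94),
  lat    = c(40.75, 40.68),
  visits = c(120, 85)
)

site_map <- leaflet(sites) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%   # one of several basemaps
  addMarkers(
    lng = ~lng, lat = ~lat,
    label = ~name,                                   # shown on mouseover
    popup = ~paste0(name, ": ", visits, " visits")   # shown on click
  )
```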

Tables

We liked DT
- can sort column values
- can search for terms/values
- can export table

We also use kableExtra
- simple and clean table format
- easy to code
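The two table styles can be sketched side by side. The `counts` data frame is invented, and the choice of export buttons is an assumption; the slides only say that DT tables can be exported.

```r
library(DT)
library(kableExtra)

# Invented counts for illustration.
counts <- data.frame(
  syndrome = c("resp", "ili", "gi"),
  visits   = c(240, 75, 60)
)

# Interactive table: sortable columns, search box, and export buttons.
interactive_table <- datatable(
  counts,
  rownames   = FALSE,
  extensions = "Buttons",
  options    = list(dom = "Bfrtip", buttons = c("copy", "csv", "excel"))
)

# Static table: simple and clean.
static_table <- kable_styling(
  kbl(counts, format = "html", caption = "Visits by syndrome (illustrative)"),
  full_width = FALSE
)
```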

Pulling it all together

flexdashboard
- in R Markdown
- can include multiple graphs/tables in same report
- seemed simpler than Shiny
- can distribute html via email

Landing page was a challenge

Includes: data summary, signal history, syndrome menus

How to account for varying number of signals?

We need to dynamically create the R Markdown file for each analysis run
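One way this could work, as a sketch only: build the R Markdown source as a character vector with one flexdashboard panel per signal, then write it to disk. The signal names and the `plot_signal()` helper are hypothetical, and the slides do not show how the team actually generated the file.

```r
# Hypothetical signals from one analysis run; the number varies run to run.
signals <- c("Respiratory", "ILI", "Vomiting")

# Build the chunk delimiter without typing three backticks literally.
fence <- strrep("`", 3)

header <- c(
  "---",
  "title: \"Syndromic signals\"",
  "output: flexdashboard::flex_dashboard",
  "---",
  ""
)

# One dashboard panel (a level-3 heading plus a code chunk) per signal.
make_panel <- function(signal) {
  c(
    paste("###", signal),
    "",
    paste0(fence, "{r}"),
    paste0("plot_signal(\"", signal, "\")  # plot_signal() is a hypothetical helper"),
    fence,
    ""
  )
}

rmd_lines <- c(header, unlist(lapply(signals, make_panel)))
rmd_path  <- file.path(tempdir(), "dashboard.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering needs the rmarkdown and flexdashboard packages plus pandoc,
# and a real definition of plot_signal():
# rmarkdown::render(rmd_path)
```

Because the panels are generated from `signals`, a run with two signals and a run with ten produce dashboards of different lengths from the same script.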

- ED is run hourly
- Others are run daily

In summary…

- Goal was to keep code simple
- Use base R as much as possible

Packages
- Data management: dplyr, lubridate, sqldf, stringr
- Analysis: rsatscan
- Visualization: flexdashboard, leaflet, dygraphs, DT, highcharter*, kableExtra
*There is a license fee for Highcharter

Cost
- SAS (desktop): ~$1,000/license
- SAS (server): ~$300/user/year
- RStudio (desktop): no cost (+ $1,000 for highcharter)
- RStudio (server): $50,000 for server, $10,000 yearly license (+ $1,000 for highcharter)

With 250 users:
- RStudio (desktop): $0
- RStudio (server): ~$60,000 yr 1, ~$10,000 yr 2, …
- SAS Server: ~$75,000/yr
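The 250-user figures follow from the unit prices on the slide. The arithmetic is spelled out below; the two-year cumulative comparison assumes prices stay flat and, like the slide's totals, leaves out the highcharter fee.

```r
users <- 250

sas_server_per_user  <- 300     # ~ $/user/year, from the slide
rstudio_server_setup <- 50000   # server cost, from the slide
rstudio_license_year <- 10000   # yearly license, from the slide

sas_server_yearly  <- users * sas_server_per_user                # 250 * 300
rstudio_server_yr1 <- rstudio_server_setup + rstudio_license_year
rstudio_server_yr2 <- rstudio_license_year

# Cumulative cost over the first two years (assumes flat prices).
sas_cumulative     <- cumsum(c(sas_server_yearly, sas_server_yearly))
rstudio_cumulative <- cumsum(c(rstudio_server_yr1, rstudio_server_yr2))
```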

There were security concerns…
- IT security has removed desktop RStudio from our machines
- RStudio is run from a server environment because it is considered more secure

- A lot of time
- A lot of learning
- But a lot of fun
- Lower cost, improved capability
- Revolutionized the way visualizations are done at DOHMH

Questions?

NYC DOHMH Syndromic Group
- Ramona Lall
- Robert Mathes
- Hilary Parton
- Jessie Sell