Simple Thematic Mapping in Stata

3rd German Stata Users Group Meeting
Berlin, 8 April 2005

Maurizio Pisati
University of Milano Bicocca – Italy
maurizio.pisati@unimib.it

Thematic maps

  • Thematic maps represent the spatial distribution of one or more variables of interest within a given geographical unit

Thematic maps

  • Examples:
      – A sociologist could use a choropleth map (a.k.a. shaded map) to show how the percentage of families below the poverty line varies across the states or provinces of a given country
      – A police officer could be interested in analyzing a dot map showing the locations of drug markets within a given city

Software for thematic mapping

  • Usually, to produce state-of-the-art thematic maps one has to resort to specialized software (e.g., ArcView, MapInfo)
  • In some cases, however, it is possible to exploit the graphical engine of a general-purpose statistical package to draw simple but effective thematic maps

Stata’s mapping capabilities

  • Up until version 7, Stata offered very limited mapping capabilities
  • The graphical engine introduced in Stata 8, by contrast, is quite flexible and makes it possible to draw several kinds of maps in a relatively simple manner

The tmap package

  • The tmap package is a suite of Stata programs designed to draw five kinds of thematic maps:
      – Choropleth maps
      – Proportional symbol maps
      – Deviation maps
      – Dot maps
      – Label maps

The tmap package

  • Choropleth, proportional symbol, and deviation maps are intended to depict area data
  • Dot maps are suitable for representing point data
  • Label maps can be used to show data of both types

The tmap package

  • The tmap package exploits the possibility, offered by the new Stata graphical engine, to overlay a large number of different graphs, each of which is used to create a distinct element of the desired map

The tmap package

  • Specifically:
      – graph twoway area is used to draw the outlines of the geographical areas of interest and to fill them with the appropriate colors
      – graph twoway scatter is used to plot the proper symbols or labels when required
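
The overlay idea can be sketched in a few lines of plain Stata. This is an illustration under stated assumptions, not tmap's actual source code: the two triangles, the label positions (cx, cy), and the labels "A" and "B" are made up for the example, and the fill is obtained with the cmissing(n) and nodropbase options of twoway area.

```stata
* Sketch only: two made-up triangles plus hypothetical label positions.
clear
input float(_X _Y cx cy) str1 name
 .  .  .  . ""
10 10  .  . ""
10 30  .  . ""
18 10  .  . ""
10 10  .  . ""
 .  .  .  . ""
22 10  .  . ""
22 30  .  . ""
30 10  .  . ""
22 10  .  . ""
 .  . 13 17 "A"
 .  . 25 17 "B"
end

* Layer 1: area plot draws and fills each closed outline; missing
*          coordinate pairs separate one polygon from the next.
* Layer 2: scatter plot with invisible markers places the labels.
twoway (area _Y _X, cmissing(n) nodropbase fcolor(sand) lcolor(black)) ///
       (scatter cy cx, msymbol(none) mlabel(name) mlabposition(0)),    ///
       legend(off) xscale(off) yscale(off) aspectratio(1)
```

Each additional map element (symbols, further outlines, labels) would be one more layer inside the same twoway call.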

Required datasets

  • To use tmap, one typically needs two Stata datasets:
      – The master dataset
      – The boundary dataset

Master dataset

  • The master dataset is intended to store the spatial data to be represented
  • It is a standard cases-by-variables dataset whose rows represent the geographical areas or locations that are the objects of analysis

Master dataset: example

  id   land                      turnout
   1   Baden-Wuerttemberg           62.6
   2   Bayern                       57.1
   3   Bremen                       61.3
   4   Hamburg                      68.7
   5   Hessen                       64.6
   6   Niedersachsen                67.0
   7   Nordrhein-Westfalen          56.7
   8   Rheinland-Pfalz              62.1
   9   Saarland                     55.5
  10   Schleswig-Holstein           66.5
  11   Brandenburg                  56.4
  12   Mecklenburg-Vorpommern       70.6
  13   Sachsen                      59.6
  14   Sachsen-Anhalt               56.5
  15   Thueringen                   53.8
  16   Berlin                       68.1

Boundary dataset

  • The boundary dataset is intended to store the geographical boundaries of the whole geographical unit of interest R or of its sub-areas Ai (i = 1, …, n)

Boundary dataset

  • The boundary dataset must always include the following three variables:
      – _ID, which contains the numeric identifier of R or of each sub-area Ai
      – _X, which contains the x-coordinates of the polygon or polygons that make up R or each sub-area Ai
      – _Y, which contains the y-coordinates of the polygon or polygons that make up R or each sub-area Ai

Boundary dataset

  • If one or more of the sub-areas Ai are “islands”, i.e., are completely surrounded by the territory of another sub-area, then the boundary dataset must include an additional variable:
      – _ISLAND, which takes value 1 when the corresponding sub-area is an “island”, and value 0 otherwise

Boundary dataset

  • The boundary dataset must always be sorted by variable _ID

Boundary dataset

  • Each polygon included in the boundary dataset must be defined by 1+k+1 records (one start record, k node records, and one closing record), each of which corresponds to a proper pair of (x, y) coordinates

Boundary dataset

  • The first record denotes the beginning of a new polygon and corresponds to a missing coordinate pair (., .)

Boundary dataset

  • The 2nd to (k+1)th records denote the k nodes of the polygon and correspond to the k coordinate pairs (x, y) that define such nodes
  • These records must be arranged so as to correspond to consecutive nodes

Boundary dataset

  • The last record denotes the end of the polygon and corresponds to a coordinate pair which is an exact replica of the first node of the polygon

Boundary dataset: example

  +-------------------------+
  | _ID   _X   _Y   _ISLAND |
  |-------------------------|
  |   1    .    .         0 |   <- Polygon 1: start
  |   1   10   10         0 |
  |   1   10   30         0 |
  |   1   18   10         0 |
  |   1   10   10         0 |   <- Polygon 1: end
  |   1    .    .         0 |   <- Polygon 2: start
  |   1   22   10         0 |
  |   1   22   30         0 |
  |   1   30   10         0 |
  |   1   22   10         0 |   <- Polygon 2: end
  |-------------------------|
  |   2    .    .         0 |   <- Polygon 3: start
  |   2   10   30         0 |
  |   2   10   50         0 |
  |   2   30   30         0 |
  |   2   10   30         0 |   <- Polygon 3: end
  |-------------------------|
  |   3    .    .         1 |   <- Polygon 4: start
  |   3   22   48         1 |
  |   3   28   42         1 |
  |   3   22   48         1 |
  +-------------------------+
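
As a rough sketch, the example records can be typed in by hand and checked against the rules stated on the preceding slides. The helper variables obsno, start, and polygon are names introduced here for illustration; they are not part of the structure tmap requires.

```stata
* Sketch only: enter the example boundary records and verify their layout.
clear
input byte _ID float(_X _Y) byte _ISLAND
1  .  . 0
1 10 10 0
1 10 30 0
1 18 10 0
1 10 10 0
1  .  . 0
1 22 10 0
1 22 30 0
1 30 10 0
1 22 10 0
2  .  . 0
2 10 30 0
2 10 50 0
2 30 30 0
2 10 30 0
3  .  . 1
3 22 48 1
3 28 42 1
3 22 48 1
end

* Remember the original record order, then sort by _ID without
* disturbing the order of the records within each _ID.
generate long obsno = _n
sort _ID, stable

* A polygon starts at every record with a missing coordinate pair.
generate byte start   = missing(_X) & missing(_Y)
generate int  polygon = sum(start)

* Every other record must carry a complete coordinate pair.
assert !missing(_X, _Y) if !start

* The last record of each polygon must replicate its first node,
* which sits in the second record of the polygon.
bysort polygon (obsno): assert _X[_N] == _X[2] & _Y[_N] == _Y[2]

drop obsno start polygon
save "Example-Coordinates.dta", replace
```

The file name Example-Coordinates.dta is arbitrary; it simply mirrors the naming pattern of the boundary datasets used in the examples.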

Boundary dataset: mif2dta

  • mif2dta is a simple Stata program that converts MapInfo Interchange Format boundary files into Stata boundary datasets
  • mif2dta converts any given pair of files rootname.mif and rootname.mid into a new pair of Stata datasets: rootname-Coordinates.dta (the boundary dataset) and rootname-Database.dta (the master dataset)
  • Optionally, mif2dta also computes the coordinates of the centroids of the geographical areas of interest

Boundary dataset: shapefiles

  • To convert a shapefile into a Stata boundary dataset:
      – Convert the shapefile of interest into the proper pair of MIF files (e.g., using the freeware DOS program shp2mif.exe)
      – Use mif2dta to convert the MIF files into the corresponding Stata master and boundary datasets

tmap choropleth

  • tmap choropleth represents the spatial distribution of area data by means of choropleth maps, i.e., maps where each of n sub-areas is colored (or shaded) according to a discrete scale based on the value taken on by a quantitative variable of interest in that sub-area

tmap choropleth

  • The number of classes that make up the discrete scale must be between 2 and 9
  • The corresponding class breaks can be based on four different criteria:
      – Quantiles
      – Equal intervals
      – Standard deviates
      – Custom
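
To make the first three criteria concrete, the sketch below computes candidate class breaks for four classes from the turnout values of the master dataset example. It only illustrates the general idea: the exact rules tmap applies (handling of ties, closed versus open intervals, and where standard-deviate breaks are placed) are not stated in these slides, so the standard-deviate rule in particular is an assumption.

```stata
* Sketch only: candidate class breaks for k = 4 classes.
clear
input float turnout
62.6
57.1
61.3
68.7
64.6
67.0
56.7
62.1
55.5
66.5
56.4
70.6
59.6
56.5
53.8
68.1
end

* k classes need k-1 interior break points
local k  = 4
local nb = `k' - 1

* Quantiles: the k-1 quantile cut points
_pctile turnout, nquantiles(`k')
forvalues i = 1/`nb' {
    display "quantile break `i':       " %5.1f r(r`i')
}

* Equal intervals: split the observed range into k classes of equal width
summarize turnout, meanonly
local lo    = r(min)
local width = (r(max) - r(min)) / `k'
forvalues i = 1/`nb' {
    display "equal-interval break `i': " %5.1f (`lo' + `i' * `width')
}

* Standard deviates (assumed rule): breaks at mean - sd, mean, mean + sd
summarize turnout
display "standard-deviate breaks:  " %5.1f (r(mean) - r(sd)) "  " ///
        %5.1f (r(mean)) "  " %5.1f (r(mean) + r(sd))
```

The fourth criterion, custom breaks, needs no computation: the user supplies the break points directly.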

tmap choropleth: example

    tmap choropleth spd2, id(id) map(Germany-Coordinates.dta)    ///
        clmethod(quantile) clnumber(4) palette(Reds)             ///
        title("Pct. votes for SPD at latest election")           ///
        subtitle("(Two-party share)") legpos(5)

tmap propsymbol

  • tmap propsymbol represents the spatial distribution of area data by means of proportional symbol maps, i.e., maps where the value taken on by a quantitative variable of interest in each of n sub-areas is represented by a symbol whose size is proportional to the value itself
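
The principle can be tried out in plain Stata with weighted markers, where the weight controls marker size. This is only an approximation of the idea and not necessarily how tmap propsymbol scales its symbols; the three coordinate pairs and values below are made up for the illustration.

```stata
* Sketch only: markers sized by a variable via analytic weights.
* The coordinates and values are hypothetical.
clear
input float(x_coord y_coord value)
13 17 20
25 17 45
17 37 80
end

twoway scatter y_coord x_coord [aweight = value],   ///
    msymbol(O) mcolor(red) legend(off)              ///
    xscale(range(0 40)) yscale(range(0 50))
```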

tmap propsymbol: example

    tmap propsymbol spd2, x(x_coord) y(y_coord)                   ///
        map(Germany-Coordinates.dta)                              ///
        scolor(red) sshape(O) ssize(1.2)                          ///
        ocolor(white) fcolor(sand)                                ///
        title("Pct. votes for SPD at latest election")            ///
        subtitle("(Two-party share)")                             ///
        note("Symbol size proportional to variable value", span)

tmap deviation

  • tmap deviation represents the spatial distribution of area data by means of deviation maps, a particular kind of proportional symbol map where:
      – symbol size expresses the absolute deviation of the quantitative variable of interest from its mean or median
      – symbol fill expresses the sign of the deviation (positive or negative)
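
The two quantities a deviation map encodes can be computed directly. The sketch below uses the turnout values from the master dataset example and the mean as the reference point; the variable names dev, absdev, and positive are introduced here for illustration only.

```stata
* Sketch only: the quantities behind a deviation map.
clear
input float turnout
62.6
57.1
61.3
68.7
64.6
67.0
56.7
62.1
55.5
66.5
56.4
70.6
59.6
56.5
53.8
68.1
end

* Reference point: the mean. For the median, run
* "summarize turnout, detail" and use r(p50) instead.
summarize turnout, meanonly
generate float dev      = turnout - r(mean)

* Symbol size is driven by the absolute deviation ...
generate float absdev   = abs(dev)

* ... and symbol fill by its sign (1 = positive, 0 = otherwise)
generate byte  positive = dev > 0

list turnout dev absdev positive, clean
```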

tmap deviation: example

    tmap deviation spd2, x(x_coord) y(y_coord)                                ///
        map(Germany-Coordinates.dta)                                          ///
        scolor(blue) sshape(O) ssize(1.2)                                     ///
        ocolor(white) fcolor(bluishgray)                                      ///
        title("Pct. votes for SPD at latest election")                        ///
        subtitle("(Two-party share)")                                         ///
        note(`"`"Solid circles denote positive deviations from the mean"'"'   ///
             `"`"Hollow circles denote negative deviations from the mean"'"'  ///
             `"`"Circle size proportional to absolute value of deviation"'"', span)

tmap label

  • tmap label is an auxiliary program that allows the user to superimpose onto a base map the values taken on by a numeric or string variable at different locations
  • This program can be used, for example, to plot sub-area names or to represent the spatial distribution of a given quantitative variable of interest in numeric form

tmap label: example

    tmap label state, x(x_coord) y(y_coord)          ///
        map(Germany-Coordinates.dta)                 ///
        lcolor(red) llength(30) lsize(0.8)           ///
        ocolor(sienna) fcolor(eggshell)              ///
        title("Länder")

tmap dot

  • tmap dot represents the spatial distribution of point data by means of dot maps, i.e., maps where the locations at which some “events” of interest have occurred are indicated by symbols whose color and/or shape can vary according to the type of “event”

tmap dot: example

    tmap dot, x(x) y(y) map("MilanoOutline-Coordinates.dta")      ///
        by(type) fcolor(stone)                                    ///
        title("Location of police stations")                      ///
        subtitle(`"`"Milano, 2004"'"')                            ///
        legtitle("Police force", size(*0.7)) legbox(lc(black))

tmap choropleth: more examples

    tmap choropleth murder if conterminous, id(id)                                             ///
        map(Us48-Coordinates.dta) palette(Blues) ocolor(white) bcolor(navy)                    ///
        title(`"`"Murders per 100,000 population & Pct. pop. with high school diploma"'"',     ///
              color(white) span)                                                               ///
        subtitle("United States 1994", color(white))                                           ///
        legbox(lc(white) fc(navy) margin(medsmall)) legpos(5) legcol(white)                    ///
        legtitle("Murder rate", color(white) size(*0.8))                                       ///
        addplot(deviation hsdip if conterminous, x(x) y(y) sc(red) ssi(0.8))                   ///
        note(`"`"Circles represent pct. pop. with high school diploma"'"'                      ///
             `"`"Solid circles denote positive deviations from the mean"'"'                    ///
             `"`"Hollow circles denote negative deviations from the mean"'"'                   ///
             `"`"Circle size proportional to absolute value of deviation"'"',                  ///
             color(white) span)

tmap choropleth: more examples

    tmap choropleth winner if conterminous, id(id)                    ///
        map(Us48-Coordinates.dta)                                     ///
        clmethod(unique) palette(Custom)                              ///
        colors(`"`"203 24 29"'"' navy)                                ///
        title(US Presidential Elections 2004)                         ///
        subtitle(Pct. votes for Bush)                                 ///
        legpos(5) legsize(1.2)                                        ///
        legtitle("Winner", size(*0.8)) legcount                       ///
        addplot(label votebushpct if conterminous, x(x) y(y)          ///
                lc(gs14) ls(0.9))

tmap and spatial data analysis

  • tmap can be used to display results produced by other Stata programs, e.g., spatial data analysis programs
  • What follows is an example of a choropleth map + LISA cluster map (Anselin 1995) created using a combination of tmap and a modified version of spatlsa

[Figure: European Parliament Elections 2004 - Lombardia. Pct. votes for Northern League]

Acknowledgments

  • The color schemes used in tmap choropleth were designed by Dr. Cynthia A. Brewer, Department of Geography, The Pennsylvania State University, University Park, Pennsylvania, USA. The color schemes are used with Dr. Brewer’s permission and are from the ColorBrewer map design tool available at ColorBrewer.org

Acknowledgments

  • I wish to thank Nick Cox and Ian S. Evans for helping improve the first release of tmap
  • The second release owes much to ideas and suggestions by Nick Cox and Vince Wiggins
  • Any remaining errors and limitations are mine