STOP /** ******************************************************************************** * * * * * * * Graphs and Graphical Representation in Stata * * (or things I wished I had known and been able to do 15 years ago) * * * * * * * * * * © Vernon Gayle, University of Edinburgh. * * Professor Vernon Gayle (vernon.gayle@ed.ac.uk) * * * ******************************************************************************** Update: 26th November 2018 Introduction A picture paints a thousand words! In my experience, with the exception of weighting, graphing data and results is one of the most troublesome aspects of data analysis. Here is an introduction to producing graphs in Stata. It might not be possible to answer your exact graphing query in this session. Please be patient! Computers often go wrong. ******************************************************************************** Preamble 'The reexpression of data in pictorial form capitalizes upon one of the most highly developed human information processing capabilities - the ability to recognize, classify, and remember visual patterns' (Lewandowsky & Spence 1989 p.200). The four pillars of wisdom Accuracy: minimising information loss and errors in analyses and output Efficiency: automation, maximising features in software Transparency: showing what you did, why, when and how Reproducibility: producing the same results every time whoever or wherever when editing, rewriting a dissertation or re-submitting papers ******************************************************************************** SOME GUIDING THOUGHTS Graphing should be part of the workflow. Never resort to using Excel. Always use syntax. You can never have too many comments. Graphs should be 'standalone'. ABC Always Be Clear. KIS Keep It Simple. Black and white is often required for journal articles. Minimise grey. Colors can be helpful for presentations (but be thoughtful). Be smart! Figure 4 might be Figure 22 in two years time. Think carefully about what is included in your graph and what can best be handled by your document management system. Graphing might even become fun. ******************************************************************************** Citing this .do file Gayle, V. (2016). Graphs and Graphical Representation in Stata, University of Edinburgh. © Vernon Gayle, University of Edinburgh. Professor Vernon Gayle (vernon.gayle@ed.ac.uk) **/ ******************************************************************************** /** Contents of this .do file Getting Started Installing Packages Part 1 Graphs and Graphical Outputs A little theory Getting help Simple graphs Scatter plots #delimit Markers Labelling Points Adding titles Legends Axis Line plots Saving graphs Exporting graphs Exporting graphs to Word Exporting graphs to LaTex Schemes Aspect ratios Jitter A normal density plot Bar charts Graphs in panels Heat maps Difficult data sources Part 2 Graphing Results Statsby Overlays Kaplan-Meir plots Graphing regression results Graphing coefficients - coefplot Graphing coefficients - parmest Graphing coefficients - a general approach Plotting hazards QV plots Constructing tables **/ ******************************************************************************** ********************************************** * IT IS IMPORTANT THAT YOU READ THE COMMENTS * * AND FOLLOW THE STATA.DO FILE LINE BY LINE! * ********************************************** * some preliminary settings to help the session run smoothly * clear all macro drop _all graph drop _all set more off set scheme s1mono * change the directory to one that you can write to * cd c:\temp pwd ******************************************************************************** /** Installing Packages in Stata A benefit of Stata is that new commands and functions are developed which can be incorporated into your current version of Stata. It is possible to acquire and manage downloads from the internet using the command net. The findit command can be used to search the Stata site and other sites for information. For example imagine that you heard about a program to draw graphs using quasi-variance estimates, then using the syntax findit qvgraph would lead you to the module written by Aspen Chen of the University of Connecticut. Many new packages are deposited at the Statistical Software Components (SSC) archive which is sometime called the Boston College Archive and is administered by http://repec.org . The SSC archive has become the premier Stata download site for user-written software and also archives proceedings of Stata Users Group meetings and conferences. Programmes can be downloaded from the SSC archive using the syntax ssc install followed by the new programme’s name. People who do not have administrative access to Stata (for example when working on their university network), may first require permission to download packages. An alternative approach may be to set up an area locally where you have write access (e.g. c:\temp ) and then use the following Stata syntax global path10 "c:\temp\" capture mkdir $path10\stata capture mkdir $path10\stata\ado adopath + $path10\stata\ado net set ado $path10\stata\ado You can test this by installing a package from SSC for example the estout package ssc install estout Help on this new package should then be available help estout In some settings(e.g. using OneDrive) paths can get awkward for example using a file at global path1 "C:\Users\Vernon\OneDrive - University of Edinburgh\Documents\" will require use "${path1}/file.dta" , clear **/ ******************************************************************************** /** PART 1: Graphs and Graphical Outputs **/ ******************************************************************************** ******************************************************************************** ******************************************************************************** * * * A Little Theory * * * ******************************************************************************** /** Understanding a little bit of the theory that guides the construction of graphs in Stata will help you. The appearance of graphs is defined by a series of elements. 1. Elements that control the display of the data, including the shape, color, and size of the 'marker symbols', as well as lines, bars and other ways to display data. 2. Elements that control the size and shape of the graph. 3. Elements which convey additional information within the graph region e.g. marker symbol labels. 4. Information outside the plot region e.g. axis labels. see Kohler and Kreuter (2009 pp.107-108). **/ ******************************************************************************** ******************************************************************************** * * * Getting Help * * * ******************************************************************************** help graph ******************************************************************************** ******************************************************************************** * * * Some Simple Graphics * * * ******************************************************************************** * the famous auto dataset * sysuse auto.dta, clear * add number labels * numlabel _all, add codebook, compact summarize * A simple box graph * graph box price * A simple pie chart * graph pie, over(rep78) graph pie, over(rep78) pie(2, explode) * A simple bar chart * graph bar (mean)mpg, over(foreign) * An alternative version of the bar chart * graph hbar (mean)mpg, over(foreign) * a simple histogramm * histogram mpg * a simple dot chart * graph dot mpg, over(foreign) * simple plots of functions * twoway function y=(x^2), range(1 20) twoway function y=cos(x), range(1 20) * naming graphs * * graph g1 and graph g2 * graph bar (mean)mpg, over(foreign) name(g1, replace) graph hbar (mean)mpg, over(foreign) name(g2, replace) * combining graphs in outputs * graph combine g1 g2 ******************************************************************************** * * * Scatter Plots * * * ******************************************************************************** /** A gentle reminder... Have you run the following commands at the start of the file clear all macro drop _all graph drop _all set more off set scheme s1mono cd c:\temp pwd ******************************************************************************** The Program Effort Data Here are the famous program effort data from Mauldin and Berelson. This extract consists of observations on an index of social setting, an index of family planning effort, and the percent decline in the crude birth rate (CBR) between 1965 and 1975, for 20 countries in Latin America setting effort change Bolivia 46 0 1 Brazil 74 0 10 Chile 89 16 29 Colombia 77 16 25 CostaRica 84 21 29 Cuba 89 15 40 DominicanRep 68 14 21 Ecuador 70 6 0 ElSalvador 60 13 13 Guatemala 55 9 4 Haiti 35 3 0 Honduras 51 7 7 Jamaica 87 23 21 Mexico 83 4 9 Nicaragua 68 0 7 Panama 84 19 22 Paraguay 74 3 6 Peru 73 0 2 TrinidadTobago 84 15 29 Venezuela 91 7 11 Reference: P.W. Mauldin and B. Berelson (1978) Conditions of fertility decline in developing countries, 1965-75 Studies in Family Planning,9:89-147 JSTOR: http://www.jstor.org/stable/1965523 **/ * Input the data * clear input country setting effort change 1 46 0 1 2 74 0 10 3 89 16 29 4 77 16 25 5 84 21 29 6 89 15 40 7 68 14 21 8 70 6 0 9 60 13 13 10 55 9 4 11 35 3 0 12 51 7 7 13 87 23 21 14 83 4 9 15 68 0 7 16 84 19 22 17 74 3 6 18 73 0 2 19 84 15 29 20 91 7 11 end * Adding labels to the dataset * label variable country "country" label variable setting "index of social setting" label variable effort "index of family planning effort" label variable change "percent decline in the crude birth rate" label define country1 /// 1 "Bolivia" 2 "Brazil" 3 "Chile" 4 "Colombia" 5 "CostaRica" 6 "Cuba" /// 7 "DominicanRep" 8 "Ecuador" 9 "ElSalvador" 10 "Guatemala" 11 "Haiti" /// 12 "Honduras" 13 "Jamaica" 14 "Mexico" 15 "Nicaragua" 16 "Panama" /// 17 "Paraguay" 18 "Peru" 19 "TrinidadTobago" 20 "Venezuela" label values country country1 numlabel _all, add summarize * save this file on your c:\ drive or your memory stick * save "c:\temp\effort.dta", replace ******************************************************************************** * a very very simple graph * graph twoway scatter change setting ******************************************************************************** ******************************************************************************** * * * #delimit * * * ******************************************************************************** /** Many graphing commands end up being very long thereforfe use #delimit ; The #delimit command resets the character that marks the end of a command. Commands in a do-file may be delimited with a carriage return or a semicolon. When a do-file begins, the delimiter is a carriage return. The command " #delimit ; " changes the delimiter to a semicolon. To restore the carriage return delimiter inside a file, use #delimit cr **/ #delimit ; graph twoway (scatter change setting) (lfit change setting); #delimit cr ******************************************************************************** ******************************************************************************** * * * Markers * * * ******************************************************************************** #delimit ; graph twoway (scatter change setting, msymbol(circle)); #delimit cr #delimit ; graph twoway (scatter change setting, msymbol(circle_hollow)); #delimit cr #delimit ; graph twoway (scatter change setting, msymbol(smcircle_hollow)); #delimit cr * a scatter plot with weighted markers * #delimit ; graph twoway (scatter change setting [w=effort], msymbol(circle_hollow)); #delimit cr * this will remind you of the main marker symbols that are available * palette symbolpalette * here is another way of getting the same information * graph query symbol * alternatively you can use O Oh oh rather than circle etc * #delimit ; graph twoway (scatter change setting, msymbol(oh)); #delimit cr * adding color * #delimit ; graph twoway (scatter change setting, msymbol(O)mcolor(green)); #delimit cr * changing size * #delimit ; graph twoway (scatter change setting, msymbol(O)mcolor(green) msize(large)); #delimit cr ******************************************************************************** ******************************************************************************** * * * Labelling Points * * * ******************************************************************************** * using the variable country as a label for the points * graph twoway (scatter change setting, mlabel(country) ) * this graph is too cluttered * * the position of the marker labels can be rotated using mlabposition * graph twoway (scatter change setting, mlabel(country)mlabposition(6) ) * positioning labels at nine o'clock on the watch face * graph twoway (scatter change setting, mlabel(country)mlabposition(9) ) /** There are better ways of labelling and positioning points. We can generate a variable "pos" which has specific values for where we want the labels on the points to be positioned (the hours on a watch face). **/ gen pos=3 label variable pos "position for graph" * here are the labels at position 3 using the variable "pos" * graph twoway (scatter change setting, mlabel(country) mlabv(pos) ) /** The graph is still a wee bit cluttered however. There are some labels that could be in better poisitions e.g. TrinidadTobago CostaRica Panama Nicaragua. **/ replace pos = 9 if country==19 * TrinidadTobago * replace pos = 11 if country==5 * CostaRica * replace pos = 2 if country==16 * Panama * replace pos = 2 if country==15 * Nicaragua* * positioning labels on the watch face according to the variable "pos" * graph twoway (scatter change setting, mlabel(country) mlabv(pos) ) /** I am not convinced the numbers alongside the country labels are that useful. Let us remove them from the graph. **/ label dir numlabel country1, remove * re-draw the graph * graph twoway (scatter change setting, mlabel(country) mlabv(pos) ) ******************************************************************************** ******************************************************************************** * * * Titles * * * ******************************************************************************** graph twoway (scatter change setting) /** The stuff after the comma "," This is where much of the real work in getting your graph to look publication ready will take place. A very common mistake is to forget the "," . Try to put it on a seperate line! **/ #delimit ; graph twoway (scatter change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Legends * * * ******************************************************************************** * turning legends on and off * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(on) ; #delimit cr #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * changing the shape of the legend (rows and columns) * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(row(1)) ; #delimit cr #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(col(1)) ; #delimit cr * adding a subtitle to the legend * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(col(1) subtitle("This is my legend")) ; #delimit cr * altering the position of the legend * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(col(1) pos(12) subtitle("This is my legend")) ; #delimit cr * position 12 refers to twelve o'clock on a watch face * * changing the order within the legend * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(subtitle("This is my legend") order(2 "Line" 1 "Dot")) ; #delimit cr #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(subtitle("This is my legend") order(1 "Dot" 2 "Line")) ; #delimit cr * using ring(0) to move the legend inside the plotting area * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(ring(0)subtitle("This is my legend") order(1 "Dot" 2 "Line")) ; #delimit cr * using ring(0) to move the legend inside the plotting area and position * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(ring(0) pos(5)subtitle("This is my legend") order(1 "Dot" 2 "Line")) ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Axis (ticks and grids) * * * ******************************************************************************** * turning the scales off * #delimit ; graph twoway (scatter change setting) (lfit change setting) , xscale(off) yscale(off) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr /** Removing the axis line. This is a little less obvious and can be fiddly at first. **/ #delimit ; graph twoway (scatter change setting) (lfit change setting) , xscale(noline) yscale(noline) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * notice that the axis lines look like they are still on the graph * * it is worth learning something about the plot region * #delimit ; graph twoway (scatter change setting, plotregion(style(none))); #delimit cr /** The axes and the border around the plot region were right on top of each other. Specifying plotregion(style(none)) will do away with the border and reveal only the axes to us. **/ #delimit ; graph twoway (scatter change setting, plotregion(style(none))) , xscale(noline) yscale(noline); #delimit cr * changing scales * #delimit ; graph twoway (scatter change setting) (lfit change setting) , xscale(r(40 110)) yscale(r(-10 50)) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * ticks * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * by default about five or six values are labeled and ticked on each axis * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(#10) xlabel(#10) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * specifying the values to be labeled and ticked * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(10 20 30 40 50 60 70 80 90) xlabel(10 20 30 40 50 60 70 80 90) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr /** In the label options we specify the rule label 0 to 50 in steps of 5 on the y axis and 0 to 100 in steps of 5 on the x axis **/ #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(0(5)50) xlabel(0(5)100) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * adding ticks 10 ticks * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ytick(#10) xtick(#10) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * adding minor ticks 5 minor ticks * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ymtick(##5) xmtick(##5) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr /** 10 minor labels between major ticks on the x axis and 5 minor label between the major ticks on the y axis **/ #delimit ; graph twoway (scatter change setting) (lfit change setting) , ymlabel(##5) xmlabel(##10) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off); #delimit cr * grids * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(, grid) xlabel(, grid) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * grids colors * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(, grid glcolor(pink)) xlabel(, grid) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * grids lines patterns * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(, grid glpatter(shortdash) ) xlabel(, grid) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Line plots * * * ******************************************************************************** * US life expectancy data * sysuse uslifeexp, clear graph twoway line le_wmale le_bmale year /** If you are puzzled by the dip prior to 1920 just search "US life expectancy 1918" using Duck Duck Go. **/ /** Altering line color. These are sometimes called fcline options which is short for fitted and connected lines. **/ #delimit graph twoway (line le_wmale le_bmale year , clcolor(green red) ) , title("U.S. Life Expectancy") subtitle("Males") legend( order(1 "White men" 2 "Black men")) ; #delimit cr * altering the line pattern * #delimit graph twoway (line le_wmale le_bmale year , clpatter(dash dot )) , title("U.S. Life Expectancy") subtitle("Males") legend( order(1 "White men" 2 "Black men")) ; #delimit cr * altering the line width * #delimit graph twoway (line le_wmale le_bmale year , clwidth(thin thick)) , title("U.S. Life Expectancy") subtitle("Males") legend( order(1 "White men" 2 "Black men")) ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Saving Graphs * * * ******************************************************************************** * use the saved effort.dta file on your c:\ drive or your memory stick * use "c:\temp\effort.dta", clear * naming a graph * #delimit ; graph twoway (scatter change setting) , name(g1, replace); #delimit cr * now close your graph window * * displaying a previously named graph * graph display g1 * saving a graph * #delimit ; graph twoway (scatter change setting) , name(g1, replace); #delimit cr graph save "c:\temp\g1", replace * recalling a saved graph * graph use "c:\temp\g1" * draw another graph * #delimit ; graph twoway (scatter change setting[w=effort], msymbol(circle_hollow)) , name(g2, replace); #delimit cr graph save "c:\temp\g2", replace * combining graphs * graph combine g1 g2 * list names of graphs in memory * graph dir * the graph that is currently in memory is called "Graph" * * describe graph stored as "Graph" * graph describe Graph * discards a graph stored in memory * graph drop Graph graph dir * discards all graphs in memory * graph drop _all graph dir ******************************************************************************** ******************************************************************************** * * * Exporting Graphs * * * ******************************************************************************** /** Let us change dataset for the next section. **/ sysuse census, clear gen drate = divorce / pop18p label var drate "Divorce rate" drop if state=="Nevada" /** Exporting a graph to your document... Here are some potential formats .ps PostScript .eps Encapsulated Postscript .tif Tagged Image Format .png Portable Network Graphic .wmf Windows Metafile .emf Windows Enhanced Metafile .pdf Portable Document Format **/ * an emf file works well with word * #delimit ; graph twoway (scatter drate medage, msymbol(circle)) , name(g3, replace); #delimit cr * these formats work well with Word * * windows enhanced metafile * graph export "c:\temp\g3.emf", replace * portable network graphic * graph export "c:\temp\g3.png", replace * portable document format * graph export "c:\temp\g3.ps", replace * this is a pdf format * graph export "c:\temp\g3.pdf", replace /** here is a useful wee page on file types etc https://www.ssc.wisc.edu/sscc/pubs/4-23.htm **/ ******************************************************************************** ******************************************************************************** * * * Exporting Graphs to Word * * * ******************************************************************************** /** 1. Example of writing a graph straight into a Word file 2. Example of appending (adding) a graph into an existing Word file The rtfutil package is a suite of file handling utilities for producing Rich Text Format (RTF) files in Stata, possibly containing plots and tables. These RTF files can then be opened by Microsoft Word, and possibly by alternative free word processors. The plots can be included by inserting, as linked objects, graphics files that might be produced by the graph export command in Stata. Newson, R. B. 2012. From resultssets to resultstables in Stata. The Stata Journal 12(2): 191-213. Download from The Stata Journal website. http://www.stata-journal.com/sjpdf.html?articlenum=st0254 **/ * you may have to install these packages * ssc install rtfutil net install rtfutil.pkg help rtfutil /** In your own time go through the help as the syntax for rtfutil is not very intuitive! **/ #delimit ; twoway (scatter drate medage) (lfit drate medage) , title("US States Divorce Rates and Median Age") subtitle("(Number of divorces by Population age 18+)",size(medsmall)) ytitle("Divorce Rate") xtitle(" " "Median Age") note("note: State data excluding Nevada") legend(off) scheme(s1mono); #delimit cr * export the graph to a file * graph export c:\temp\myplot2.emf, replace * beware you might have closed the graph window! * * now use rtfutil - run all of these lines together* tempname handle1 rtfopen `handle1' using c:\temp\mydoc2.rtf, template(fnmono1) replace capture noisily { file write `handle1' "{\pard\b Publication ready graph in Word \par}" _n rtflink `handle1' using "myplot2.emf" file write `handle1' _n "{\line}" _n "{\pard text can be added here {\ul " file write `handle1' "}\par}" file write `handle1' "\line" _n "{\pard more text can be added here too " file write `handle1' ".\par}" _n "\line" _n rtfclose `handle1' } * ******************************************************************************** /** The graph will be in Word in an .rtf format file in c:\temp . **/ * draw a new graph * #delimit ; twoway (scatter divorce marriage) , title("Number of Divorces & Marriages") subtitle("(US States)") ytitle("Number of Divorces" " ") xtitle(" " "Number of Marriages") note("note:State data excluding Nevada") legend(off) scheme(s1mono); #delimit cr * export the graph to disk * graph export "c:\temp\myplot3.emf", replace * use the rtfappend to add the append the new graph myplot3 * tempname handle4 rtfappend `handle4' using c:\temp\mydoc2.rtf, replace capture noisily { rtflink `handle4' using myplot3.emf file write `handle4' "\line" rtfclose `handle4' } * /** The new graph will be at the bottom of the Word .rtf format file in c:\temp . **/ ******************************************************************************** ******************************************************************************** * * * Exporting Graphs to Latex * * * ******************************************************************************** /** I am not a Latex user but here is the perceived wisdom. The package graph2tex is required graph2tex does two things 1. It takes the most recently created graph and exports it as a .eps file. 2. It displays LaTeX code you could insert for displaying the figure in your LaTeX document. **/ * you may have to install graph2tex first * findit graph2tex net install http://www.ats.ucla.edu/stat/stata/ado/analysis/graph2tex.pkg #delimit ; twoway (scatter divorce marriage) , title("Number of Divorces & Marriages") subtitle("(US States)") ytitle("Number of Divorces" " ") xtitle(" " "Number of Marriages") note("note:State data excluding Nevada") legend(off) scheme(s1mono); #delimit cr graph2tex, epsfile(c:\temp\myplot2) /** Stata has saved the file and also given you the Latex code to work with it in your document . graph2tex, epsfile(c:\temp\myplot2) % exported graph to c:\temp\myplot2.eps \begin{figure}[h] \begin{centering} \includegraphics[height=3in]{c:\temp\myplot2} \end{centering} \end{figure} Here is a useful wee page http://www.ats.ucla.edu/stat/stata/latex/graph_stata_latex.htm **/ ******************************************************************************** ******************************************************************************** * * * Graph Schemes * * * ******************************************************************************** /** A scheme specifies the overall look of the graph. **/ sysuse census, clear gen drate = divorce / pop18p label var drate "Divorce rate" drop if state=="Nevada" graph twoway (scatter drate medage, msymbol(circle)) graph query, schemes /** Available schemes are s2color see help scheme_s2color s2mono see help scheme_s2mono s2manual see help scheme_s2manual s2gmanual see help scheme_s2gmanual s2gcolor see help scheme_s2gcolor s1color see help scheme_s1color s1mono see help scheme_s1mono s1rcolor see help scheme_s1rcolor s1manual see help scheme_s1manual sj see help scheme_sj economist see help scheme_economist s2color8 see help scheme_s2color8 **/ * use the scheme for the Economist * graph twoway (scatter drate medage, msymbol(circle)), scheme(economist) * use the scheme for the Stata Journal * graph twoway (scatter drate medage, msymbol(circle)), scheme(sj) * a crazy color sheme * graph twoway (scatter drate medage, msymbol(circle)), scheme(s1rcolor) * setting the scheme for all graphs * set scheme s1mono ******************************************************************************** ******************************************************************************** * * * Aspect Ratio * * * ******************************************************************************** sysuse uslifeexp, clear * an aspect ratio greater than 1 creates a tall skinny graph * #delimit ; graph twoway (line le_wmale le_bmale year , clwidth(thin thick)) , title("U.S. Life Expectancy") subtitle("Males") legend( order(1 "White men" 2 "Black men")) aspectratio(1.3); #delimit cr * an aspect ratio less than 1 creates a shorter graph * #delimit ; graph twoway (line le_wmale le_bmale year , clwidth(thin thick) ) /// , title("U.S. Life Expectancy") subtitle("Males") /// legend( order(1 "white" 2 "black")) /// aspectratio(0.5); #delimit cr ******************************************************************************** ******************************************************************************** * * * Jitter * * * ******************************************************************************** /** Scatter will add spherical random noise to your data before plotting if you specify jitter(#), where # represents the size of the noise as a percentage of the graphical area. This can be useful for creating graphs of categorical data when, if the data are not jittered, many of the points would be on top of each other, making it impossible to tell whether the plotted point represented one or 1,000 observations. For instance, in a variation on auto.dta used below, mpg is recorded in units of 5 mpg, and weight is recorded in units of 500 pounds. A standard scatter has considerable overprinting. **/ sysuse autornd, clear tab mpg scatter mpg weight, name(nonjitter) /** There are 74 points in the graph, even though it appears because of overprinting as if there are only 19. Jittering solves this problem. **/ scatter mpg weight, jitter(7) name(jitter) graph combine nonjitter jitter ******************************************************************************** ******************************************************************************** * * * A Normal Density Example * * * ******************************************************************************** /** Normal density example from Tim Collier http://repec.org/usug2005/Collier_SUG2005_Stata_graphics.pdf (Similar example in [G] manual) Note a change in syntax since Version 9 graph twoway (function y=normden(x) , range(-4 4) ) **/ graph twoway (function y=normalden(x) , range(-4 4) ) #delimit ; graph twoway (function y=normalden(x) , range(-4 -1.96) bcolor(gs12) recast(area) ) (function y=normalden(x) , range(1.96 4) bcolor(gs12) recast(area) ) (function y=normalden(x) , range(-1.96 1.96) bcolor(gs14) recast(area) ) (function y=normalden(x) , range(-4 4) clstyle(foreground) dropline(-1.96 0 1.96) ) , plotregion( style(none) icolor(white) ) yscale(off) xscale(noline) ylabel( , nogrid ) legend(off) xlabel(-4 "-4 sd" -3 "-3 sd" -2 "-2 sd" -1 "-1sd" 0 "mean" 1 "1 sd" 2 "2 sd" 3 "3 sd" 4 "4 sd" ) xtitle("") ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Bar Charts * * * ******************************************************************************** sysuse citytemp, clear #delimit ; graph bar (mean) tempjuly tempjan, over(division, label(labsize(*.75))) over(region) bargap(-30) nofill ytitle("Degrees Fahrenheit") legend( label(1 "July") label(2 "January") ) title("Average July and January temperatures") subtitle("by region and division of the United States") note("Source: U.S. Census Bureau, U.S. Dept. of Commerce") name(city1, replace) ; #delimit cr * using bar labels * #delimit ; graph bar tempjuly tempjan, over(region) bargap(-30) legend( label(1 "July") label(2 "January") ) ytitle("Degrees Fahrenheit") title("Average July and January temperatures") subtitle("by regions of the United States") note("Source: U.S. Census Bureau, U.S. Dept. of Commerce") blabel(bar, position(inside) format(%9.1f) color(white)) name(city2, replace) ; #delimit cr * the same graph as a hbar graph * #delimit ; graph hbar tempjuly tempjan, over(region) bargap(-30) legend( label(1 "July") label(2 "January") ) ytitle("Degrees Fahrenheit") title("Average July and January temperatures") subtitle("by regions of the United States") note("Source: U.S. Census Bureau, U.S. Dept. of Commerce") blabel(bar, position(inside) format(%9.1f) color(white)) name(city2, replace) ; #delimit cr * a stacked bar graph * sysuse educ99gdp, clear generate total = private + public #delimit ; graph hbar (asis) public private, over(country, sort(total) descending) stack title("Spending on tertiary education as % of GDP, 1999", span pos(11)) subtitle(" ") note("Source: OECD, Education at a Glance 2002", span) ; #delimit cr * bar graphs by another variable (e.g. union membership) * sysuse nlsw88, clear #delimit ; graph hbar wage, over( occ, axis(off) sort(1) ) blabel( group, pos(base) color(bg) ) ytitle( "" ) by( union, title("Average Hourly Wage, 1988, Women Aged 34-46") note ("Source: 1988 data from NLS,U.S. Dept. of Labor, Bureau of Labor Statistics") ) ; #delimit cr * a more convoluted example * clear input /// str20 htype_string poverty povshare popshare exitrate entryrate /// str35 htype str20 period "All persons" 21 100 100 34 9 "All persons" "1991-1997" "1A,0K" 24 7 6 38 10 "Single adult" "1991-1997" "1A,1+K" 60 17 6 22 27 "Single adult + child(ren)" "1991-1997" "MC=2A,0K" 8 5 14 48 4 "Couple, no children" "1991-1997" "MC=2A,1+K" 18 27 33 38 9 "Couple + child(ren)" "1991-1997" "MC+A,1+K" 13 4 7 53 7 "Couple + adult(s) + child(ren)" "1991-1997" "Pen:1A,0K" 46 19 9 24 22 "Single pensioner" "1991-1997" "Pen:MC=2A,0K" 23 10 10 31 9 "Pensioner couple" "1991-1997" "Other HH" 13 10 16 40 6 "Other" "1991-1997" "All children" 27 29 23 31 8 "All children" "1991-1997" "All persons" 19 100 100 40 8 "All persons" "1998-2004" "1A,0K" 19 7 4 45 8 "Single adult" "1998-2004" "1A,1+K" 48 16 6 34 16 "Single adult + child(ren)" "1998-2004" "MC=2A,0K" 9 7 14 45 4 "Couple, no children" "1998-2004" "MC=2A,1+K" 14 23 31 45 8 "Couple + child(ren)" "1998-2004" "MC+A,1+K" 13 4 6 46 6 "Couple + adult(s) + child(ren)" "1998-2004" "Pen:1A,0K" 38 19 9 31 15 "Single pensioner" "1998-2004" "Pen:MC=2A,0K" 25 14 11 27 9 "Pensioner couple" "1998-2004" "Other HH" 12 11 17 55 7 "Other" "1998-2004" "All children" 22 26 22 40 8 "All children" "1998-2004" end summarize codebook, compact graph hbar (asis) poverty , over(period) over(htype) #delimit ; graph hbar (asis) poverty , over(period) over(htype) asyvars bar(1, color(blue)) bar(2, color(red)) legend( region(lstyle(none)) ) saving(poverty.gph, replace) ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Graphs in Panels * * * ******************************************************************************** /** Graphs in panels by Maarten L. Buis. http://statacodeilike.blogspot.com/2009/10/graphs-in-panels-by-maarten-l-buis.html **/ sysuse auto, clear recode rep78 1/2=3 egen byvar = group(foreign rep78) label define byvar 1 "Average" 2 "Good" 3 "Excelent" /// 4 " " 5 " " 6 " " label value byvar byvar scatter price mpg , /// by(byvar, /// r2title(Car type, orientation(rvertical)) /// t1title(Repair Record 1978, size(medsmall))) /// name(col, replace) .col.plotregion1.r1title[3]._set_orientation rvertical .col.plotregion1.r1title[3].text = {} .col.plotregion1.r1title[3].text.Arrpush Domestic .col.plotregion1.r1title[6].text = {} .col.plotregion1.r1title[6].text.Arrpush Foreign graph display ******************************************************************************** ******************************************************************************** * * * A Heat Map * * * ******************************************************************************** /** An example of a heat graph from Survey Design and Analysis Services Pty Ltd. http://www.survey-design.com.au/Stata%20Graphs.html **/ sysuse nlsw88, clear mean wage if married==1, over(grade) mean wage if married==0, over(grade) generate g=max(grade,8) label var g "Years of education" rename married m collapse wage, by(g m) tab wage generate w=round(wage,.01) tab w summarize wage, meanonly generate c=round((wage-r(min))/(r(max)-r(min))*255) * run this block of commands together * quiet levelsof c, loc(cs) local g foreach c of loc cs { local c1=120+round(`c'/2) local c2=255-round(`c'/2) local m mc("`c2' `c2' `c1'") local g `g'||sc m g if c==`c', ms(S) msize(ehuge) `m' } local g `g'||sc m g, ms(i) mlab(w) mlabp(0) local g `g' leg(off) yla(-.75 " " 0 "N" 1 "Y" 1.75 " ", notick) local g `g' xla(7 " " 8/18 19 " ", notick) twoway `g' scheme(s1mono) title(Wage heat map) xsize(8) ysize(5) ******************************************************************************** ******************************************************************************** * * * An Nasty and Difficult Example * * * ******************************************************************************** /** If the original data are in a spreadsheet, it is often daunting but Stephen Jenkins shows us that it relatively straightforward to get the data into Stata and to manipulate and graph them! **/ insheet using http://www.staff.stir.ac.uk/vernon.gayle/ONSclaimantcount.csv, clear browse * remember to close your browser, especially in earlier versions of Stata * * descriptive text at start of file * list in 1/30 * end of file contains notes * list in 685/691 * series with different frequency * list in 530/550 * drop unwanted freq series and notes * keep in 233/536 list in 1/10 keep v1 v9 v13 rename v1 date rename v9 dpac label variable dpac "Claimant count rate (%), men" rename v13 dpad label variable dpad "Claimant count rate (%), women" destring dpac, replace destring dpad, replace tab date split date , destring list date* in 1/10 gen mydate = ym(date1, date2) tab mydate label variable mydate "Year, month" format mydate %tm tab mydate * basic graph with few frills * twoway line dpac dpad mydate /// , legend( region(lstyle(none)) ) /// name(ccount01, replace) * same graph using time series options * tsset mydate tsline dpac dpad ******************************************************************************** This is a handy reminder of some elements of the graph. **/ use "c:\temp\effort.dta", clear #delimit ; graph twoway (scatter change setting) , title("Title Here" " ") ytitle("y title here") xtitle("x title here") subtitle("sub title here") note("note here") caption("caption goes here") ; #delimit cr ******************************************************************************** ******************************************************************************** /** PART 2: Graphing Results **/ ******************************************************************************** ******************************************************************************** ******************************************************************************** * * * Statsby * * * ******************************************************************************** /** statsby- collects statistics for a command across a by list. **/ sysuse auto, clear summarize statsby mean=r(mean) sd=r(sd) size=r(N), /// by(rep78) : /// summarize mpg list * get statistics for whole sample as well * sysuse auto, clear statsby mean=r(mean) sd=r(sd) size=r(N), /// by(rep78) total : /// summarize mpg list * alternatively save the by-group statistics in separate file * sysuse auto, clear statsby mean=r(mean) sd=r(sd) size=r(N), /// by(rep78) saving(statsby01, replace): /// summarize mpg use statsby01, clear describe list * now draw a graph * graph twoway connect mean rep78 graph twoway scatter mean rep78 graph bar mean, over(rep78) graph hbar mean, over(rep78) * collapse as a different way of getting data into the same shape * sysuse auto, clear collapse (mean) mean = mpg (sd) sd = mpg (count) size = mpg, /// by(rep78) list * contract makes a dataset containing frequencies * sysuse auto, clear tab rep78, missing contract rep78 list sysuse auto, clear * drop any missing * contract rep78, nomiss list sysuse auto, clear tab foreign rep78 contract foreign rep78, freq(freq) cfreq(cumfreq) /// percent(pc) cpercent(cumpc) nomiss list * now include the combinations with zero frequencies sysuse auto, clear tab foreign rep78, nokey contract foreign rep78, freq(freq) cfreq(cumfreq) /// percent(pc) cpercent(cumpc) nomiss zero list ******************************************************************************** ******************************************************************************** * * * Overlays * * * ******************************************************************************** use statsby01, clear graph twoway connect mean rep78, saving(olay01.gph, replace) graph twoway connect sd rep78, saving(olay02.gph, replace) graph combine olay01.gph olay02.gph graph twoway (connect mean rep78)(connect sd rep78) /// , saving(olay03.gph, replace) * graph sd on a second axis RHS * graph twoway (connect mean rep78)(connect sd rep78, yaxis(2) ) /** - separate - creates separate variables by subgroup. This is useful for subgroup graphs. **/ sysuse auto, clear separate mpg, by(foreign) twoway scatter mpg0 mpg1 price, msymbol(Oh Th) /** Lowess a built-in command using overlays. Lowess carries out a locally weighted regression of yvar on xvar, displays the graph, and optionally saves the smoothed variable. Warning: Lowess is computationally intensive and may therefore take a long time to run on a slow computer. Lowess calculations on 1,000 observations, for instance, require performing 1,000 regressions. **/ lowess mpg price, saving(olay04.gph, replace) ******************************************************************************** ******************************************************************************** * * * Kaplan-Meier Plots * * * ******************************************************************************** /** Heart Transplant Data This dataset (Crowley and Hu 1977) consists of 103 patients admitted to the Stanford Heart Transplantation Program. Patients were admitted into the program after review by a committee and then waited for an available donor heart. While waiting, some patients died or were transferred out of the program, but 67% received a transplant. The dataset includes the year the patient was accepted into the program along with the patient's age, whether the patient had other heart surgery previously, and whether the patient received a transplant. In the data, posttran becomes 1 when a patient receives a new heart, so it is a time-varying covariate. **/ * graphing the survival curve * webuse stan3, clear stset t1, failure(died) id(id) summarize tab surgery stcox age posttran surg year sts graph, by(surgery) * displaying the number lost due to censoring as small numbers on the plots * sts graph, by(surgery) lost * displayed on separate graphs * sts graph, separate by (surgery) * plot of the hazard function * sts graph, hazard by(surgery) * Nelson-Aalen cummulative hazard estimate sts graph, na sts graph, cumhaz * Nelson-Aalen cummulative hazard estimate with numbers at-risk displayed * sts graph, na atrisk sts graph, na by(surgery) sts graph, na atrisk by(surgery) ******************************************************************************** ******************************************************************************** * * * Graphing Regression Model Results * * * ******************************************************************************** /** In this next section we cover graphing results. The focus is graphing results from statistical models. Rather than graphing "data" we are now graphing "resultssets" (i.e. sets of results). Roger Newson uses the term "resultssets" but he also states that he would like to thank Nicholas J. Cox, of Durham University, UK, for coining the term resultsset to describe a Stata dataset of results. **/ /** A graph from one of my old papers Gayle, Vernon, and Paul S. Lambert. "Using quasi-variance to communicate sociological results from statistical models." Sociology 41.6 (2007): 1191-1208. **/ clear input region beta lower upper source 0 0 0 0 1 1 0.0943551 0.1143471 0.0743631 1 2 0.1209922 0.1419642 0.1000202 1 3 0.148519 0.170275 0.126763 1 4 0.1302653 0.1510413 0.1094893 1 5 0.3158952 0.3368672 0.2949232 1 6 0.3567166 0.3765126 0.3369206 1 7 0.2606339 0.2819979 0.2392699 1 8 0.1741903 0.1981023 0.1502783 1 9 0.2696731 0.2914291 0.2479171 1 0.2 0 0.0170324 -0.0170324 2 1.2 0.0943551 0.1049783 0.0837319 2 2.2 0.1209922 0.133399 0.1085854 2 3.2 0.148519 0.1620626 0.1349754 2 4.2 0.1302653 0.1423389 0.1181917 2 5.2 0.3158952 0.3281844 0.303606 2 6.2 0.3567166 0.3669282 0.346505 2 7.2 0.2606339 0.2734131 0.2478547 2 8.2 0.1741903 0.1910855 0.1572951 2 9.2 0.2696731 0.2832559 0.2560903 2 end summarize label variable region "GOR region" label variable beta "Parameter estimates" label variable upper "Upper bound" label variable lower "Lower bound" label define regl 0 "North East" 1 "North West" 2 "Yorks" 3 "E Mids" /// 4 "W Mids" 5 "East" 6 "South East" /// 7 "South West" 8 "Inner London" 9 "Outer London" label values region regl tab region summarize tab region #delimit ; graph twoway (scatter beta region if source==1, msymbol(circle_hollow) mlcolor(gs0) msize(medium)) (scatter beta region if source==2, msymbol(diamond_hollow) mlcolor(gs0) msize(medium)) (rspike upper lower region if source==1, blwidth(medium) ) (rspike upper lower region if source==2, blwidth(medthick) ) , ytitle("") xtitle("") yscale(range(-0.02 0.39)) xscale(range(-0.3,9.5)) xlabel(0 1 2 3 4 5 6 7 8 9, valuelabel alternate ) title("Predictions of Good Health, by Government Office Region", size(large) justification(center) ) subtitle("Confidence intervals of regression coefficients, by estimation method", size(medsmall) justification(center) ) note("Source: UK Census 2001 SARS for England, n=1099294." "Model 1: Logistic regression predicting 'Good Health'. Other controls for education and gender" "Gayle and Lambert (2007 p.1195)", justification(left) ) legend( order(1 2) label(1 "Conventional regression") label(2 "Quasi-Variance") ); #delimit cr ******************************************************************************** ******************************************************************************** * * * Plotting Coefficients with coefplot * * * ******************************************************************************** /** Here is a very recent development in extracting and plotting coefficients from the great Ben Jann. ftp://repec.sowi.unibe.ch/files/wp1/jann-2013-coefplot.pdf **/ * you might have to find and install the package first * * see the section at the start Installing Packages in Stata * findit coefplot ssc install coefplot webuse womenwk, clear tab educ, gen(ed) label variable age "Age in years" rename ed1 no_ed rename ed2 low_education rename ed3 medium_education rename ed4 high_education label variable no_ed "No education" label variable low_education "Low education" label variable medium_education "Medium education" label variable high_education"High education" regress wage low_education medium_education high_education age * plotting the coefficients * coefplot coefplot, vertical baselevels drop(_cons age) yline(0) * a more publication ready graph * #delimit ; coefplot, vertical baselevels drop(_cons age) yline(0) ytitle("Regression Coefficient" " ") xtitle(" " "Education Level") title("Regression Model of Women's Hourly Wage", size(medium) justification(right) ) subtitle("Educational Levels", size(medsmall) justification(right)) scheme(s1mono) ; #delimit cr * estimate two models * regress wage low_education medium_education high_education age estimates store model1 regress wage low_education medium_education high_education age married estimates store model2 #delimit ; coefplot model1, vertical baselevels drop(_cons age married) xline(0) ytitle(" ""Regression Coefficient") xtitle(" " "Education Level") xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate ) title("Regression Model" "of Women's Hourly Wage", size(medium) justification(center) ) subtitle("Educational Levels", size(medsmall) justification(center)) scheme(s1mono) name(gmodel1, replace) ; #delimit cr #delimit ; coefplot model2, vertical baselevels drop(_cons age married) yline(0) ytitle(" ""Regression Coefficient") xtitle(" " "Education Level") xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate ) title("Regression Model" "of Women's Hourly Wage", size(medium) justification(center) ) subtitle("Educational Levels", size(medsmall) justification(center)) scheme(s1mono) name(gmodel2, replace) ; #delimit cr * plotting the coefficients from both models * #delimit ; coefplot (model1, label(model 1)) (model2, label(model 2)) , vertical baselevels drop(_cons age married) yline(0) ytitle(" ""Regression Coefficient") xtitle(" " "Education Level") xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate ) title("Regression Model of Women's Hourly Wage", size(medium) justification(center) ) subtitle("Educational Levels", size(medsmall) justification(center)) scheme(s1mono) ; #delimit cr * an example of the versitility of this package * * plotting coefficients with Harrell style confidence intervals * regress wage low_education medium_education high_education age #delimit ; coefplot, vertical drop(_cons age) yline(0) msymbol(circle_hollow) mcolor(white) levels(99 95 90 80 70) ciopts(lwidth(3 ..) lcolor(*.2 *.4 *.6 *.8 *1)) legend(order(1 "99" 2 "95" 3 "90" 4 "80" 5 "70") row(1) subtitle("Confidence Intervals")) ytitle(" ""Regression Coefficient") xtitle(" " "Education Level") xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate ) title("Regression Model of Women's Hourly Wage", size(medium) justification(center)) subtitle("Educational Levels", size(medsmall) justification(center)); #delimit cr /** Further information and examples are available here http://www.stata.com/meeting/germany14/abstracts/materials/de14_jann.pdf . **/ ******************************************************************************** ******************************************************************************** * * * Plotting Coefficients with parmest * * * ******************************************************************************** /** Another approach is to use Roger Newson's parmest program. Parmest takes, as input, the most recently calculated set of estimation results, created by the most recently executed estimation command. It creates, as output, a new dataset, with one observation per estimated parameter, and variables containing parameter names, estimates, standard errors, z-test or t-test statistics, p-values, confidence limits, and other estimation results if requested by the user. **/ * you may have to install this package first * ssc install parmest ssc d parmest /** The hsb2 dataset is taken from a national survey of high school seniors two hundred observation were randomly sampled from the High School and Beyond survey. **/ use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear numlabel _all, add summarize tab race, missing tabulate race, gen(ethnic) rename ethnic1 hispanic rename ethnic2 asian rename ethnic3 africam rename ethnic4 white reg read female hispanic asian africam * save the regression results * #delimit ; parmest,format(estimate min95 max95 %8.2f p) list(,) saving(c:\temp\parmest1,replace) ; #delimit cr * use the new file with the regression results * use c:\temp\parmest1.dta, clear browse * take a look at the data * summarize browse gen id=_n * it can be handy to give each row an id number * * a simple graph of the estimates for ethnicity in the model * #delimit ; twoway (scatter estimate id if id>1 & id<5, msymbol(circle_hollow)) (rspike min95 max95 id if id>1 & id<5); #delimit cr * a more publication-ready graph of the estimates for ethnicity in the model * #delimit ; twoway (scatter estimate id if id>1 & id<5, msymbol(circle_hollow)) (rspike min95 max95 id if id>1 & id<5) , ytitle("") xtitle("") xscale(range(2, 4.5)) xlabel(2 "Hispanic" 3 "Asian" 4 "African-American" , valuelabel alternate ) title("Standardised Reading Score, Ethnicity Effects") subtitle("Regression coefficients and confidence intervals") note("Source: High School and Beyond, n=200." "Model 1: Regression model 'Reading Score' ethnicity and gender.") legend( order(1 2)label(1 "Parameter estimate") label(2 "95% C.I.") ); #delimit cr ******************************************************************************** ******************************************************************************** * * * A More General Approach to Plotting Coefficients * * * ******************************************************************************** /** Here is another approach to extracting and plotting coefficients. This is more general and can be adapted for other purposes. **/ use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear numlabel _all, add summarize tab race, missing tabulate race, gen(ethnic) rename ethnic1 hispanic rename ethnic2 asian rename ethnic3 africam rename ethnic4 white reg read female hispanic asian africam * take a look at the matrix of estimation results * matrix list e(b) * make your own matrix * matrix b = (e(b)) * convert matrix info to variables * svmat b keep b1 b2 b3 b4 b5 summarize gen id=_n * take a look at the data * summarize id browse * all you want are the values in row 1 keep if id==1 browse * drop the estimate for female and the constant * drop b1 b5 * calculate a value for whites repsondents * gen b0=0 * calculate some values for the X axis * gen id0=1 gen id2=1.5 gen id3=2 gen id4=2.5 #delimit ; twoway (scatter b0 id0, msymbol(square_hollow) mlcolor(gs0) msize(medium)) (scatter b2 id2, msymbol(circle_hollow) mlcolor(gs0) msize(medium)) (scatter b3 id3, msymbol(triangle_hollow) mlcolor(gs0) msize(medium)) (scatter b4 id4, msymbol(diamond_hollow) mlcolor(gs0) msize(medium)) , ytitle("") xtitle("Ethnicity") yscale(range() titlegap(1) ) xscale(range(1, 3)) xlabel(1 2 3 4 5 6 7 8 9, valuelabel alternate ) xlabel(1 "White" 1.5 "Hispanic" 2 "Asian" 2.5 "African-American", valuelabel alternate ) title("Standardised Reading Score and Ethnicity", size(large) justification(center) ) subtitle("Regression coefficients and confidence intervals", size(medsmall) justification(center) ) note("Source: High School and Beyond, n=200." "Model 1: Regression model 'Reading Score' gender and ethnicity.", justification(left) ) legend(off) ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Plotting a Hazards * * * ******************************************************************************** /** This example is from Jenkins, S. and Garcia-Serrano, C.(2004) 'The Relationship between Unemployment Benefits and Re-employment Probabilities: Evidence from Spain', Oxford Bulletin of Economics and Statistics, 66(2), pp. 239-259. **/ clear input /// month thazard thlb thub bhazard f 1 .030065 .025568 .035323 .030065 1 2 .043142 .03694 .050331 .042387 1 3 .04445 .03817 .051707 .042908 1 4 .055762 .047999 .064695 .052917 1 5 .048781 .041955 .056653 .045468 1 6 .043398 .037494 .050184 .039725 1 7 .039824 .034598 .045802 .036558 1 8 .044353 .038333 .051269 .040862 1 9 .035381 .030757 .04067 .032676 1 10 .032756 .028455 .037683 .030343 1 11 .031901 .027653 .036777 .029643 1 12 .032016 .027841 .036793 .029846 1 13 .033998 .029644 .038965 .031337 1 14 .027006 .023505 .031013 .024594 1 15 .025501 .022242 .029225 .022954 1 16 .024788 .02146 .028616 .022053 1 17 .024007 .020741 .027773 .021111 1 18 .019184 .016616 .02214 .016665 1 19 .015757 .013637 .0182 .014551 1 20 .020071 .017391 .023155 .019715 1 21 .009206 .008032 .01055 .012149 1 22 .00796 .006827 .009279 .014113 1 23 .009482 .008198 .010964 .014721 1 24 .009883 .008529 .01145 .013697 1 1 .032621 .028583 .037207 .032621 2 2 .046461 .040935 .052692 .045941 2 3 .047563 .042011 .053808 .046504 2 4 .059247 .052308 .067042 .057302 2 5 .05153 .045441 .058384 .049267 2 6 .045569 .04041 .05135 .043066 2 7 .039513 .035428 .044048 .039644 2 8 .041583 .037208 .046448 .044294 2 9 .024887 .022694 .027286 .035446 2 10 .017238 .015706 .018916 .032922 2 11 .019197 .017469 .021091 .032165 2 12 .021634 .019664 .023796 .032384 2 1 .032513 .029105 .036304 .032513 3 2 .043134 .039004 .04768 .04579 3 3 .032763 .029997 .035776 .046352 3 4 .030368 .027774 .033197 .057116 3 5 .029615 .027127 .032323 .049107 3 6 .028876 .026478 .031484 .042925 3 end twoway /// (connected thazard month if month >= 1 & month <= 24 & f==1, sort ) /// (connected thazard month if month >= 1 & month <= 12 & f==2, sort ) /// (connected thazard month if month >= 1 & month <= 6 & f==3, sort ) twoway /// (connected thazard month if month >= 1 & month <= 24 & f==1, /// sort msymbol(Oh) mcolor(black) /// clcolor(black) clpat(dash) t1title() ) /// (rspike thlb thub month if month >=1 & month <= 24 & f==1, blcolor(black) sort ) /// (connected thazard month if month >= 1 & month <= 12 & f==2, /// sort msymbol(th) mcolor(black) /// clcolor(black) clpat(solid) ) /// (rspike thlb thub month if month >=1 & month <= 12 & f==2, blcolor(black) sort ) /// (connected thazard month if month >= 1 & month <= 6 & f==3, /// sort msymbol(dh) mcolor(black) /// clcolor(black) clpat(shortdash) ) /// (rspike thlb thub month if month >=1 & month <= 6 & f==3, blcolor(black) sort ) /// , ytitle(Hazard rate, margin(medium)) yscale(range(0 0.07) /// lcolor(black)) ylabel(0(0.01)0.07, angle(0) ) /// xtitle(Month of UI receipt, alignment(bottom) margin(medium)) /// xscale(range(1 24)) xlabel(1(1)24) legend(off) scheme(s1mono) /// graphregion(fcolor(none) lcolor(none) ifcolor(none) ilcolor(none)) /// plotregion(style(none)) saving(OBES-Figure1, replace) /// text(.015 23 "E = 24" , place(c)) /// text(.019 13 "E = 12" , place(s)) /// text(.025 5 "E = 6" , place(e)) /** Predicted hazard rates, by initial UI entitlement (E, in months). Figure 1 shows the predicted hazard rates for three men sharing a common set of characteristics except that their initial UI entitlements (E) differ: E equals 24 months, 12 months and 6 months. Differences in characteristics besides t and s simply shift the hazard function proportionately: see section II.) In all three cases, the hazard initially increases but then declines thereafter, broadly speaking. The increase in the hazard associated with imminent exhaustion of entitlement is reflected in the upward slope at spell months 22-24 in the E=24 case, at months 10-12 in the E=12 case, and in the levelling off of the decline in the hazard for the E=6 case (p.255) **/ ******************************************************************************** ******************************************************************************** * * * QV-Plot * * * ******************************************************************************** /** The qv command is a post-estimation command. It uses e(V) from the most recently fit model to compute quasi variances for all categories in a multi-category variable. Aspen Chen's (aspen.chen@uconn.edu) qv package is available from http://econpapers.repec.org/software/bocbocode/s457831.htm The package used David Firth's methodology (see Firth 2003). Gayle and Lambert (2007) provide an explication of this method with a variety of sociology examples. Further resources are available at http://www.restore.ac.uk/Longitudinal/qv/ **/ ************************************************************************************* ssc install qv ssc install coefplot /** The coefplot is a module to plot regression coefficients and other results(see Jann 2014). **/ * a simple regression example * sysuse census, clear codebook, compact gen dpop=death/pop*1000 label variable dpop "deaths per 1000" * regression model with region and median age * reg dpop i.region medage coefplot, vertical baselevels drop(_cons medage) xline(0)name(census1) * a basic qv plot * qv i.region, level(95) plot * The qv graph function * qvgraph i.region, name(qv1) * Combining the two graphs * graph combine census1 qv1 * regression with a block of dummy variables * gen NC=region==2 gen SO=region==3 gen WS=region==4 reg dpop NC SO WS medage qv NC SO WS, ref(NE) level(95) plot ************************************************************************************* * An example with research data * clear * Synthetic data based on the Youth Cohort Study of England and Wales (YCS) * input count_y y region 3981 0 1 2737 1 1 8259 0 2 7962 1 2 6475 0 3 4851 1 3 4926 0 4 4224 1 4 6782 0 5 5378 1 5 2407 0 6 2248 1 6 5719 0 7 5653 1 7 11326 0 8 12957 1 8 4747 0 9 5326 1 9 3730 0 10 3319 1 10 end label variable count_y "count" label variable y "five + GCSEs" label variable region "Government Office Region" expand count_y sort region y list in 1/10 summarize tab region y, missing logit y i.region set scheme s1mono coefplot, vertical baselevels drop(_cons medage) xline(0) /// xlabel(1 2 3 4 5 6 7 8 9 10) name(coef1) qvgraph i.region, xlabel(1 2 3 4 5 6 7 8 9 10) name(qv2) graph combine qv2 coef1 , col(1) iscale(1) capture label drop regl label define reg1 /// 1 "North" 2 "Yorkshire" 3 "NW" /// 4 "E. Mids" 5 "W. Mids" 6 "East Ang." /// 7 "London" 8 "SE" 9 "SW" /// 10 "Wales" label values region reg1 numlabel _all, add qvgraph i.region, /// ytitle("") xtitle("") /// xlabel(1 2 3 4 5 6 7 8 9 10 11 " ", valuelabel alternate ) /// title("GCSE Attainment by Government Office Region", /// size(large) justification(center) ) /// subtitle("Logistic regression coefficients and 95% comparison intervals", /// size(medsmall) justification(right) ) /// note(" " "Source: Palimpsest of Youth Cohort Study (YCS) data, n=113,007." /// "Model 1: Logistic regression predicting 5+ GCSEs at Grades A*-C", /// justification(left) ) /// legend(off) ************************************************************************************* * Create graph from the stored matrices * mat lb=e(qvlb) mat ub=e(qvub) svmat lb, names(lb) svmat ub, names(ub) gen b=(lb1+ub1)/2 summarize gen group=_n in 1/10 label variable group "Region" label values group reg1 graph twoway (scatter b group) /// (rspike ub1 lb1 group, vert) , /// ytitle("") xtitle("") /// xlabel(1 2 3 4 5 6 7 8 9 10 11 " ", valuelabel alternate ) /// title("GCSE Attainment by Government Office Region", /// size(large) justification(center) ) /// subtitle("Logistic regression coefficients and 95% comparison intervals", /// size(medsmall) justification(center) ) /// note(" " "Source: Palimpsest of Youth Cohort Study (YCS) data, n=113,007." /// "Model 1: Logistic regression predicting 5+ GCSEs at Grades A*-C", /// justification(left) ) /// legend(off) ******************************************************************************** * graphic qv comparison intervals with regular standard errors * xi:logit y i.region capture drop lb2 ub2 gen lb2=. gen ub2=. replace ub2=0 if _n==1 replace lb2=0 if _n==1 replace ub2=b+1.96*_se[_Iregion_2] if _n==2 replace lb2=b-1.96*_se[_Iregion_2] if _n==2 replace ub2=b+1.96*_se[_Iregion_2] if _n==2 replace lb2=b-1.96*_se[_Iregion_3] if _n==3 replace ub2=b+1.96*_se[_Iregion_3] if _n==3 replace lb2=b-1.96*_se[_Iregion_4] if _n==4 replace ub2=b+1.96*_se[_Iregion_4] if _n==4 replace lb2=b-1.96*_se[_Iregion_5] if _n==5 replace ub2=b+1.96*_se[_Iregion_5] if _n==5 replace lb2=b-1.96*_se[_Iregion_6] if _n==6 replace ub2=b+1.96*_se[_Iregion_6] if _n==6 replace lb2=b-1.96*_se[_Iregion_7] if _n==7 replace ub2=b+1.96*_se[_Iregion_7] if _n==7 replace lb2=b-1.96*_se[_Iregion_8] if _n==8 replace ub2=b+1.96*_se[_Iregion_8] if _n==8 replace lb2=b-1.96*_se[_Iregion_9] if _n==9 replace ub2=b+1.96*_se[_Iregion_9] if _n==9 replace lb2=b-1.96*_se[_Iregion_10] if _n==10 replace ub2=b+1.96*_se[_Iregion_10] if _n==10 capture drop _Iregion_* capture drop group2 gen group2=group replace group=group+0.3 label values group2 reg1 graph twoway (scatter b group2, msymbol(diamond_hollow)) /// (rspike ub2 lb2 group2, vert) /// (scatter b group, msymbol(circle)) /// (rspike ub1 lb1 group, vert) /// , /// ytitle("") xtitle("") /// xlabel(1 2 3 4 5 6 7 8 9 10 11 " ", valuelabel alternate ) /// title("GCSE Attainment by Government Office Region", /// size(large) justification(center) ) /// subtitle("Logistic regression coefficients and confidence intervals", /// size(medsmall) justification(center) ) /// note(" " "Source: Palimpsest of Youth Cohort Study (YCS) data, n=113,007." /// "Model 1: Logistic regression predicting 5+ GCSEs at Grades A*-C", /// justification(left) ) /// legend( order(1 3) /// label(1 "Conventional 95% C.I.") label(3 "Quasi-Variance comparison interval") ) ******************************************************************************** ******************************************************************************** * * * Constructing Tables * * * ******************************************************************************** /** This is a graphs workshop but many participants asks for my table construction syntax. **/ webuse womenwk, clear tab educ, gen(ed) rename ed1 no_ed rename ed2 low_education rename ed3 medium_education rename ed4 high_education label variable no_ed "no education" label variable low_education "low education" label variable medium_education "medium education" label variable high_education"high education" regress wage low_education medium_education high_education age estimates store reg1 esttab reg1 using c:\temp\regress1.rtf, /// cells(b(star fmt(%9.3f)) se(par)) /// stats(r2 r2_a N, fmt(%9.3f %9.3f) labels(R-Squared AdjR-Squared n)) /// starlevels(* .10 ** .05 *** .01) stardetach /// label mtitles("Regression Model") /// nogaps replace * generate a variable to identify selection generate wageseen = (wage < .) heckman wage low_education medium_education high_education age, /// select(wageseen = married children low_education medium_education high_education age) estimates store heck1 esttab reg1 heck1 using c:\temp\heckman1.rtf, /// cells(b(star fmt(%9.3f)) se(par)) /// stats(r2 r2_a N, fmt(%9.3f %9.3f) labels(R-Squared AdjR-Squared n)) /// starlevels(* .10 ** .05 *** .01) stardetach /// label mtitles("Regression Model" "Heckman Model") /// nogaps replace * here are some other sneaky wee tricks! * estimates table reg1 heck1, /// star(.10 .05 .01) b(%9.3f) stats(N ll aic bic) esttab reg1 heck1 using c:\temp\heckman2.rtf, /// cells(b(star fmt(%9.3f)) t(par)) /// stats(r2 r2_a N, fmt(%9.3f %9.3f) labels(R-Squared AdjR-Squared n)) /// starlevels(* .10 ** .05 *** .01) stardetach /// label mtitles("Regression Model" "Heckman Model") /// nogaps replace esttab reg1 heck1 using c:\temp\heckman3.rtf, /// cells(b(star fmt(%9.3f)) p(par)) /// stats(r2 r2_a N, fmt(%9.3f %9.3f) labels(R-Squared AdjR-Squared n)) /// starlevels(*** .01) stardetach /// label mtitles("Regression Model" "Heckman Model") /// replace ******************************************************************************** /** Useful References Cox, Nicholas J. Speaking Stata Graphics: A Collection from the Stata Journal. Stata Press, 2014 Gayle, Vernon, and Paul S. Lambert. "Using quasi-variance to communicate sociological results from statistical models." Sociology 41.6 (2007): 1191-1208. Kohler, Ulrich, and Frauke Kreuter. Data analysis using Stata. Stata press, 2012. Jann, Ben. "Plotting regression coefficients and other estimates." Stata Journal 14.4 (2014): 708-737. Lewandowsky, Stephan, and Ian Spence. "The perception of statistical graphs." Sociological Methods & Research 18.2-3 (1989): 200-242. Long, J. Scott. The workflow of data analysis using Stata. Stata Press, 2009 Mitchell, Michael N. A visual guide to Stata graphics. Stata Press, 2008. Newson, Roger B. "From resultssets to resultstables in Stata." Stata Journal 12.2 (2012): 191. http://www.ats.ucla.edu/stat/stata/library/GraphExamples/ http://www.stata.com/features/example-graphs/ ******************************************************************************** /** © Vernon Gayle, University of Edinburgh. Professor Vernon Gayle (vernon.gayle@ed.ac.uk) This file has been produced by Vernon Gayle. Any material in this file must not be reproduced, published or used for teaching without permission from Professor Gayle. The original idea for teaching graphing in this way came from my colleague Professor Stephen Jenkins (s.jenkins@lse.ac.uk) who is a Stata genius. Johannes Langer a graduate student at the University of Edinburgh provided very useful comments and helped expunge some typos and errors. Over the last decade much of the Stata materials that Professor Gayle has developed have been in close collaboration with Professor Paul Lambert, Stirling University. However, Professor Gayle is responsible for any errors in this file. Citing this .do file Gayle, V. (2016). Graphs and Graphical Representation in Stata, University of Edinburgh. © Vernon Gayle, University of Edinburgh. Professor Vernon Gayle (vernon.gayle@ed.ac.uk) ******************************************************************************** * End of file *