STOP /** ******************************************************************************** * * * * * * * Producing Publication Ready Graphs in Stata * * (or things I wished I had known and been able to do 15 years ago) * * * * * * * * * * © Vernon Gayle, University of Edinburgh. * * Professor Vernon Gayle (vernon.gayle@ed.ac.uk) * * * ******************************************************************************** Introduction A picture paints a thousand words! In my experience, with the exception of weighting, graphing data and results is one of the most troublesome aspects of data analysis. Here is a brief introduction to producing publication ready graphs in Stata. This is adapted from a two-day practical workshop. It is not feasible to fully understand every graphing topic in two hours. It might not be possible to answer your exact graphing query in this session. Please be patient! Computers often go wrong. Preamble 'The reexpression of data in pictorial form capitalizes upon one of the most highly developed human information processing capabilities - the ability to recognize, classify, and remember visual patterns' (Lewandowsky & Spence 1989 p.200). The four pillars of wisdom Effectiveness: minimising information loss and errors in analyses and output Efficiency: automation, maximising features in software Transparency: showing what you did, why, when and how Reproducibility: producing the same results every time whoever or wherever when editing, rewriting a dissertation or re-submitting papers ******************************************************************************** SOME GUIDING THOUGHTS Graphing should be part of the workflow. Never resort to using Excel. Always use syntax. You can never have too many comments. Graphs should be 'standalone'. ABC Always Be Clear. KIS Keep It Simple. Black and white is often required for journal articles. Minimise grey. Colors can be helpful for presentations (but be thoughtful). Be smart! 
Figure 4 might be Figure 22 in two years' time.

Think carefully about what is included in your graph and what can best be
handled by your document management system.

Graphing might even become fun.

********************************************************************************

Citing this .do file

Gayle, V. (2016). Producing Publication Ready Graphs in Stata,
University of Edinburgh.

© Vernon Gayle, University of Edinburgh.

Professor Vernon Gayle (vernon.gayle@ed.ac.uk)

**/

********************************************************************************

**********************************************
* IT IS IMPORTANT THAT YOU READ THE COMMENTS *
* AND FOLLOW THE STATA.DO FILE LINE BY LINE! *
**********************************************

* some preliminary settings to help the session run smoothly *

clear all
macro drop _all
graph drop _all
set more off
set scheme s1mono
cd c:\temp
pwd

********************************************************************************
********************************************************************************
*                                                                              *
*                              A Little Role Play                              *
*                                                                              *
********************************************************************************

* this is a file on women and employment *

webuse womenwk, clear

tab educ, gen(ed)

label variable age "Age in years"

rename ed1 no_ed
rename ed2 low_education
rename ed3 medium_education
rename ed4 high_education

label variable no_ed "No education"
label variable low_education "Low education"
label variable medium_education "Medium education"
label variable high_education "High education"

* a simple regression model *

regress wage low_education medium_education high_education age
estimates store reg1

* using esttab to get the results in a publication ready format in Word *

#delimit ;
esttab reg1 using c:\temp\regress1.rtf,
    cells(b(star fmt(%9.3f)) se(par))
    stats(r2 r2_a N, fmt(%9.3f %9.3f %9.0f) labels(R-Squared AdjR-Squared n))
    starlevels(* .10 ** .05 *** .01)
    stardetach label mtitles("Regression Model")
    nogaps replace ;
#delimit cr

* this is a coefplot - we will return to these later *

#delimit ;
coefplot, vertical baselevels drop(_cons age) yline(0)
    ytitle("Regression Coefficient" " ")
    xtitle(" " "Education Level")
    title("Regression Model of Women's Hourly Wage",
        size(medium) justification(right) )
    subtitle("(educational level)",
        size(medsmall) justification(right))
    scheme(s1mono)
    note(" " "Source: womenwk dataset; n = 1,343; Adjusted R-Squared = .26")
    name(myplot, replace)
    ;
#delimit cr

* keep the graph window open *

* export the graph to a file *

graph export c:\temp\myplot.png, replace

tempname handle0
rtfappend `handle0' using c:\temp\regress1.rtf, replace
capture noisily {
file write `handle0' "\line"
file write `handle0' "\line"
rtflink `handle0' using myplot.png
file write `handle0' "\line"
file write `handle0' _n "{\line}" _n "{\pard text can be added here {\ul "
file write `handle0' "}\par}"
file write `handle0' "\line" _n "{\pard more text can be added here too "
file write `handle0' ".\par}" _n "\line" _n
rtfclose `handle0'
}
*

/**
the graph is now in the Word document regress1.rtf in folder c:\temp\

it has been appended under the regression table
**/

********************************************************************************
********************************************************************************
*                                                                              *
*                               A Little Theory                                *
*                                                                              *
********************************************************************************

/**
Understanding a little bit of the theory that guides the construction of graphs
in Stata will help you.

The appearance of graphs is defined by a series of elements.

1. Elements that control the display of the data, including the shape, color,
   and size of the 'marker symbols', as well as lines, bars and other ways
   to display data.

2. Elements that control the size and shape of the graph.

3. Elements which convey additional information within the graph region
   e.g. marker symbol labels.

4. Information outside the plot region e.g. axis labels.
see Kohler and Kreuter (2009 pp.107-108).

**/

********************************************************************************
********************************************************************************
*                                                                              *
*                          The Program Effort Dataset                          *
*                                                                              *
********************************************************************************

/**
A gentle reminder...

Have you run the following commands at the start of the file?

clear all
macro drop _all
graph drop _all
set more off
set scheme s1mono
cd c:\temp
pwd

********************************************************************************

The Program Effort Data

Here are the famous program effort data from Mauldin and Berelson.

This extract consists of observations on an index of social setting, an index
of family planning effort, and the percent decline in the crude birth rate
(CBR) between 1965 and 1975, for 20 countries in Latin America.

                  setting  effort  change
Bolivia                46       0       1
Brazil                 74       0      10
Chile                  89      16      29
Colombia               77      16      25
CostaRica              84      21      29
Cuba                   89      15      40
DominicanRep           68      14      21
Ecuador                70       6       0
ElSalvador             60      13      13
Guatemala              55       9       4
Haiti                  35       3       0
Honduras               51       7       7
Jamaica                87      23      21
Mexico                 83       4       9
Nicaragua              68       0       7
Panama                 84      19      22
Paraguay               74       3       6
Peru                   73       0       2
TrinidadTobago         84      15      29
Venezuela              91       7      11

Reference:
P.W. Mauldin and B.
Berelson (1978) Conditions of fertility decline in developing countries, 1965-75
Studies in Family Planning, 9:89-147
JSTOR: http://www.jstor.org/stable/1965523

**/

* Input the data *

clear
input country setting effort change
1 46 0 1
2 74 0 10
3 89 16 29
4 77 16 25
5 84 21 29
6 89 15 40
7 68 14 21
8 70 6 0
9 60 13 13
10 55 9 4
11 35 3 0
12 51 7 7
13 87 23 21
14 83 4 9
15 68 0 7
16 84 19 22
17 74 3 6
18 73 0 2
19 84 15 29
20 91 7 11
end

* Adding labels to the dataset *

label variable country "country"
label variable setting "index of social setting"
label variable effort "index of family planning effort"
label variable change "percent decline in the crude birth rate"

label define country1 ///
    1 "Bolivia" 2 "Brazil" 3 "Chile" 4 "Colombia" 5 "CostaRica" 6 "Cuba" ///
    7 "DominicanRep" 8 "Ecuador" 9 "ElSalvador" 10 "Guatemala" 11 "Haiti" ///
    12 "Honduras" 13 "Jamaica" 14 "Mexico" 15 "Nicaragua" 16 "Panama" ///
    17 "Paraguay" 18 "Peru" 19 "TrinidadTobago" 20 "Venezuela"

label values country country1

numlabel _all, add

summarize

* save this file on your c:\ drive or your memory stick *

save "c:\temp\effort.dta", replace

********************************************************************************

* a very very simple graph *

graph twoway scatter change setting

********************************************************************************
********************************************************************************
*                                                                              *
*                                   #delimit                                   *
*                                                                              *
********************************************************************************

/**
Many graphing commands end up being very long, therefore use #delimit ;

The #delimit command resets the character that marks the end of a command.

Commands in a do-file may be delimited with a carriage return or a semicolon.

When a do-file begins, the delimiter is a carriage return.

The command " #delimit ; " changes the delimiter to a semicolon.
To restore the carriage return delimiter inside a file, use #delimit cr **/ #delimit ; graph twoway (scatter change setting) (lfit change setting); #delimit cr ******************************************************************************** ******************************************************************************** * * * Markers * * * ******************************************************************************** #delimit ; graph twoway (scatter change setting, msymbol(circle)); #delimit cr #delimit ; graph twoway (scatter change setting, msymbol(circle_hollow)); #delimit cr #delimit ; graph twoway (scatter change setting, msymbol(smcircle_hollow)); #delimit cr * a scatter plot with weighted markers * #delimit ; graph twoway (scatter change setting [w=effort], msymbol(circle_hollow)); #delimit cr * this will remind you of the main marker symbols that are available * palette symbolpalette * here is another way of getting the same information * graph query symbol * alternatively you can use O Oh oh rather than circle etc * #delimit ; graph twoway (scatter change setting, msymbol(oh)); #delimit cr * adding color * #delimit ; graph twoway (scatter change setting, msymbol(O)mcolor(green)); #delimit cr * changing size * #delimit ; graph twoway (scatter change setting, msymbol(O)mcolor(green) msize(large)); #delimit cr ******************************************************************************** ******************************************************************************** * * * Labelling Points * * * ******************************************************************************** * using the variable country as a label for the points * graph twoway (scatter change setting, mlabel(country) ) * this graph is too cluttered * * the position of the marker labels can be rotated using mlabposition * graph twoway (scatter change setting, mlabel(country)mlabposition(6) ) * positioning labels at nine o'clock on the watch face * graph twoway (scatter change setting, 
mlabel(country) mlabposition(9) )

/**
There are better ways of labelling and positioning points.

We can generate a variable "pos" which has specific values for where we want
the labels on the points to be positioned (the hours on a watch face).
**/

gen pos=3
label variable pos "position for graph"

* here are the labels at position 3 using the variable "pos" *

graph twoway (scatter change setting, mlabel(country) mlabv(pos) )

* the graph is still a wee bit cluttered however *

* TrinidadTobago *
replace pos = 9 if country==19

* CostaRica *
replace pos = 11 if country==5

* Panama *
replace pos = 2 if country==16

* Nicaragua *
replace pos = 2 if country==15

* positioning labels on the watch face according to the variable "pos" *

graph twoway (scatter change setting, mlabel(country) mlabv(pos) )

/**
I am not convinced the numbers alongside the country labels are that useful.

Let us remove them from the graph.
**/

label dir
numlabel country1, remove

graph twoway (scatter change setting, mlabel(country) mlabv(pos) )

********************************************************************************
********************************************************************************
*                                                                              *
*                                Adding Titles                                 *
*                                                                              *
********************************************************************************

graph twoway (scatter change setting)

/**
The stuff after the comma ","

This is where much of the real work in getting your graph to look
publication ready will take place.

A very common mistake is to forget the ",".

Try to put it on a separate line!
**/

#delimit ;
graph twoway (scatter change setting)
    ,
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    ;
#delimit cr

/**
This is a handy reminder of some elements of the graph.
**/ #delimit ; graph twoway (scatter change setting) , title("Title Here" " ") ytitle("y title here") xtitle("x title here") subtitle("sub title here") note("note here") caption("caption goes here") ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Legends * * * ******************************************************************************** * turning legends on and off * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(on) ; #delimit cr #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * changing the shape of the legend (rows and columns) * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(row(1)) ; #delimit cr #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(col(1)) ; #delimit cr * adding a subtitle to the legend * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(col(1) subtitle("This is my legend")) ; #delimit cr * altering the position of the legend * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(col(1) pos(12) subtitle("This is my legend")) ; #delimit cr * position 12 refers 
to twelve o'clock on a watch face * * changing the order within the legend * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(subtitle("This is my legend") order(2 "Line" 1 "Dot")) ; #delimit cr #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(subtitle("This is my legend") order(1 "Dot" 2 "Line")) ; #delimit cr * using ring(0) to move the legend inside the plotting area * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(ring(0)subtitle("This is my legend") order(1 "Dot" 2 "Line")) ; #delimit cr * using ring(0) to move the legend inside the plotting area and position * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(ring(0) pos(5)subtitle("This is my legend") order(1 "Dot" 2 "Line")) ; #delimit cr /** Using ring(0) to move the legend inside the plotting area and put it at the five o'clock position **/ ******************************************************************************** ******************************************************************************** * * * Axis * * * ******************************************************************************** * turning the scales off * #delimit ; graph twoway (scatter change setting) (lfit change setting) , xscale(off) yscale(off) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr /** Removing the axis line. This is a little less obvious and can be fiddly at first. 
**/ #delimit ; graph twoway (scatter change setting) (lfit change setting) , xscale(noline) yscale(noline) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * notice that the axis lines look like they are still on the graph * * it is worth learning something about the plot region * #delimit ; graph twoway (scatter change setting, plotregion(style(none))); #delimit cr #delimit ; graph twoway (scatter change setting, plotregion(style(none))) , xscale(noline) yscale(noline); #delimit cr /** The axes and the border around the plot region were right on top of each other. Specifying plotregion(style(none)) will do away with the border and reveal only the axes to us. **/ * changing scales * #delimit ; graph twoway (scatter change setting) (lfit change setting) , xscale(r(40 110)) yscale(r(-10 50)) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * ticks * #delimit ; graph twoway (scatter change setting) (lfit change setting) , title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * by default about five or six values are labeled and ticked on each axis * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(#10) xlabel(#10) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr * specifying the values to be labeled and ticked * #delimit ; graph twoway (scatter change setting) (lfit change setting) , ylabel(10 20 30 40 50 60 70 80 90) xlabel(10 20 30 40 50 60 70 80 90) title("Fertility Decline by Social Setting" " ") ytitle("Fertility Decline") xtitle("Index of Social Setting") legend(off) ; #delimit cr /** in the label options we specify the rule label 0 to 50 in steps of 5 on the y axis and 0 to 100 in steps of 
5 on the x axis
**/

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ylabel(0(5)50) xlabel(0(5)100)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr

* adding approximately 10 ticks on each axis (the #10 rule) *

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ytick(#10) xtick(#10)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr

* adding minor ticks between the major ticks (the ##5 rule) *

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ymtick(##5) xmtick(##5)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr

/**
10 minor labels between major ticks on the x axis
and 5 minor labels between the major ticks on the y axis
**/

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ymlabel(##5) xmlabel(##10)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr

* grids *

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ylabel(, grid) xlabel(, grid)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr

* grid colors *

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ylabel(, grid glcolor(pink)) xlabel(, grid)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr

* grid line patterns *

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ylabel(, grid glpattern(shortdash) ) xlabel(, grid)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr
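
/**
A short aside on the axis rules (a minimal sketch, not part of the original
workshop material).

The rule #10 asks Stata for approximately 10 "nice" values, so it does not
guarantee a tick at every 10 units. To fix the spacing exactly, a rule of the
form first(step)last can be used instead. The ranges below (0 to 50 on the
y axis and 30 to 100 on the x axis) are assumptions chosen to suit the
effort data.
**/

#delimit ;
graph twoway (scatter change setting)
    (lfit change setting)
    ,
    ylabel(0(10)50) xlabel(30(10)100)
    title("Fertility Decline by Social Setting" " ")
    ytitle("Fertility Decline")
    xtitle("Index of Social Setting")
    legend(off)
    ;
#delimit cr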
********************************************************************************
********************************************************************************
*                                                                              *
*                                  Line plots                                  *
*                                                                              *
********************************************************************************

* US life expectancy data *

sysuse uslifeexp, clear

graph twoway line le_wmale le_bmale year

/**
If you are puzzled by the dip prior to 1920 just search
"US life expectancy 1918" using Duck Duck Go.
**/

/**
altering line color

these are sometimes called fcline options
which is short for fitted and connected lines
**/

#delimit ;
graph twoway (line le_wmale le_bmale year , clcolor(green red) )
    ,
    title("U.S. Life Expectancy")
    subtitle("Males")
    legend( order(1 "White men" 2 "Black men"))
    ;
#delimit cr

* altering the line pattern *

#delimit ;
graph twoway (line le_wmale le_bmale year , clpattern(dash dot))
    ,
    title("U.S. Life Expectancy")
    subtitle("Males")
    legend( order(1 "White men" 2 "Black men"))
    ;
#delimit cr

* altering the line width *

#delimit ;
graph twoway (line le_wmale le_bmale year , clwidth(thin thick))
    ,
    title("U.S. 
Life Expectancy") subtitle("Males") legend( order(1 "White men" 2 "Black men")) ; #delimit cr ******************************************************************************** ******************************************************************************** * * * Saving Graphs * * * ******************************************************************************** * use the saved effort.dta file on your c:\ drive or your memory stick * use "c:\temp\effort.dta", clear * naming a graph * #delimit ; graph twoway (scatter change setting) , name(g1, replace); #delimit cr * now close your graph window * * displaying a previously named graph * graph display g1 * saving a graph * #delimit ; graph twoway (scatter change setting) , name(g1, replace); #delimit cr graph save "c:\temp\g1", replace * recalling a saved graph * graph use "c:\temp\g1" * draw another graph * #delimit ; graph twoway (scatter change setting[w=effort], msymbol(circle_hollow)) , name(g2, replace); #delimit cr graph save "c:\temp\g2", replace * combining graphs * graph combine g1 g2 * list names of graphs in memory * graph dir * the graph that is currently in memory is called "Graph" * * describe graph stored as "Graph" * graph describe Graph graph describe myplot * discards a graph stored in memory * graph drop Graph graph dir * discards all graphs in memory * graph drop _all graph dir ******************************************************************************** ******************************************************************************** * * * Exporting Graphs * * * ******************************************************************************** /** Let us change dataset for the next section. 
**/

sysuse census, clear

gen drate = divorce / pop18p
label var drate "Divorce rate"

drop if state=="Nevada"

/**
exporting a graph to your document

here are some potential formats

.ps     PostScript
.eps    Encapsulated PostScript
.tif    Tagged Image File Format
.png    Portable Network Graphics
.wmf    Windows Metafile
.emf    Windows Enhanced Metafile
.pdf    Portable Document Format
**/

* an emf file works well with Word *

#delimit ;
graph twoway (scatter drate medage, msymbol(circle))
    ,
    name(g3, replace);
#delimit cr

* these formats work well with Word *

* windows enhanced metafile *
graph export "c:\temp\g3.emf", replace

* portable network graphic *
graph export "c:\temp\g3.png", replace

* postscript *
graph export "c:\temp\g3.ps", replace

* this is a pdf format *
graph export "c:\temp\g3.pdf", replace

/**
here is a useful wee page on file types etc

https://www.ssc.wisc.edu/sscc/pubs/4-23.htm
**/

********************************************************************************
********************************************************************************
*                                                                              *
*                           Exporting Graphs to Word                           *
*                                                                              *
********************************************************************************
********************************************************************************

/**
1. Example of writing a graph straight into a Word file

2. Example of appending (adding) a graph into an existing Word file

The rtfutil package is a suite of file handling utilities for producing
Rich Text Format (RTF) files in Stata, possibly containing plots and tables.

These RTF files can then be opened by Microsoft Word, and possibly by
alternative free word processors.

The plots can be included by inserting, as linked objects, graphics files
that might be produced by the graph export command in Stata.

Newson, R. B. 2012. From resultssets to resultstables in Stata.
The Stata Journal 12(2): 191-213.

Download from The Stata Journal website.
http://www.stata-journal.com/sjpdf.html?articlenum=st0254 **/ * you may have to install these packages * ssc install rtfutil net install rtfutil.pkg help rtfutil /** In your own time go through the help as the syntax for rtfutil is not very intuitive! **/ #delimit ; twoway (scatter drate medage) (lfit drate medage) , title("US States Divorce Rates and Median Age") subtitle("(Number of divorces by Population age 18+)",size(medsmall)) ytitle("Divorce Rate") xtitle(" " "Median Age") note("note: State data excluding Nevada") legend(off) scheme(s1mono); #delimit cr * export the graph to a file * graph export c:\temp\myplot2.emf, replace * beware you might have closed the graph window! * * now use rtfutil - run all of these lines together* tempname handle1 rtfopen `handle1' using c:\temp\mydoc2.rtf, template(fnmono1) replace capture noisily { file write `handle1' "{\pard\b Publication ready graph in Word \par}" _n rtflink `handle1' using "myplot2.emf" file write `handle1' _n "{\line}" _n "{\pard text can be added here {\ul " file write `handle1' "}\par}" file write `handle1' "\line" _n "{\pard more text can be added here too " file write `handle1' ".\par}" _n "\line" _n rtfclose `handle1' } * ******************************************************************************** /** The graph will be in Word in an .rtf format file in c:\temp . 
**/

* draw a new graph *

#delimit ;
twoway (scatter divorce marriage)
    ,
    title("Number of Divorces & Marriages")
    subtitle("(US States)")
    ytitle("Number of Divorces" " ")
    xtitle(" " "Number of Marriages")
    note("note: State data excluding Nevada")
    legend(off)
    scheme(s1mono);
#delimit cr

* export the graph to disk *

graph export "c:\temp\myplot3.emf", replace

* use rtfappend to append the new graph myplot3 *

tempname handle4
rtfappend `handle4' using c:\temp\mydoc2.rtf, replace
capture noisily {
rtflink `handle4' using myplot3.emf
file write `handle4' "\line"
rtfclose `handle4'
}
*

/**
The new graph will be at the bottom of the Word .rtf format file in c:\temp .
**/

********************************************************************************
********************************************************************************
*                                                                              *
*                          Exporting Graphs to LaTeX                           *
*                                                                              *
********************************************************************************

/**
I am not a LaTeX user but here is the received wisdom.

The package graph2tex is required.

graph2tex does two things

1. It takes the most recently created graph and exports it as a .eps file.

2. It displays LaTeX code you could insert for displaying the figure in your
   LaTeX document.
**/

* you may have to install graph2tex first *

findit graph2tex
net install http://www.ats.ucla.edu/stat/stata/ado/analysis/graph2tex.pkg

#delimit ;
twoway (scatter divorce marriage)
    ,
    title("Number of Divorces & Marriages")
    subtitle("(US States)")
    ytitle("Number of Divorces" " ")
    xtitle(" " "Number of Marriages")
    note("note: State data excluding Nevada")
    legend(off)
    scheme(s1mono);
#delimit cr

graph2tex, epsfile(c:\temp\myplot2)

/**
Stata has saved the file and also given you the LaTeX code to work with it in
your document.
graph2tex, epsfile(c:\temp\myplot2)
% exported graph to c:\temp\myplot2.eps
\begin{figure}[h]
\begin{centering}
  \includegraphics[height=3in]{c:\temp\myplot2}
\end{centering}
\end{figure}

Here is a useful wee page

http://www.ats.ucla.edu/stat/stata/latex/graph_stata_latex.htm

**/

********************************************************************************
********************************************************************************
*                                                                              *
*                                Graph Schemes                                 *
*                                                                              *
********************************************************************************

/**
A scheme specifies the overall look of the graph.
**/

sysuse census, clear

gen drate = divorce / pop18p
label var drate "Divorce rate"

drop if state=="Nevada"

graph twoway (scatter drate medage, msymbol(circle))

graph query, schemes

/**
Available schemes are

s2color      see help scheme_s2color
s2mono       see help scheme_s2mono
s2manual     see help scheme_s2manual
s2gmanual    see help scheme_s2gmanual
s2gcolor     see help scheme_s2gcolor
s1color      see help scheme_s1color
s1mono       see help scheme_s1mono
s1rcolor     see help scheme_s1rcolor
s1manual     see help scheme_s1manual
sj           see help scheme_sj
economist    see help scheme_economist
s2color8     see help scheme_s2color8
**/

* use the scheme for the Economist *

graph twoway (scatter drate medage, msymbol(circle)), scheme(economist)

* use the scheme for the Stata Journal *

graph twoway (scatter drate medage, msymbol(circle)), scheme(sj)

* a crazy color scheme *

graph twoway (scatter drate medage, msymbol(circle)), scheme(s1rcolor)

* setting the scheme for all graphs *

set scheme s1mono

********************************************************************************
********************************************************************************
*                                                                              *
*                               Graphing Results                               *
*                                                                              *
********************************************************************************

/**
In this section we cover graphing results.

The focus is graphing results from statistical models.
Rather than graphing "data" we are now graphing "resultssets"
(i.e. sets of results).

Roger Newson uses the term "resultssets" but he also states that he would like
to thank Nicholas J. Cox, of Durham University, UK, for coining the term
resultsset to describe a Stata dataset of results.
**/

/**
Here is a very recent development in extracting and plotting coefficients
from the great Ben Jann.

ftp://repec.sowi.unibe.ch/files/wp1/jann-2013-coefplot.pdf
**/

* you might have to find and install the package first *

findit coefplot
ssc install coefplot

webuse womenwk, clear

tab educ, gen(ed)

label variable age "Age in years"

rename ed1 no_ed
rename ed2 low_education
rename ed3 medium_education
rename ed4 high_education

label variable no_ed "No education"
label variable low_education "Low education"
label variable medium_education "Medium education"
label variable high_education "High education"

regress wage low_education medium_education high_education age

* plotting the coefficients *

coefplot

coefplot, vertical baselevels drop(_cons age) yline(0)

* a more publication ready graph *

#delimit ;
coefplot, vertical baselevels drop(_cons age) yline(0)
    ytitle("Regression Coefficient" " ")
    xtitle(" " "Education Level")
    title("Regression Model of Women's Hourly Wage",
        size(medium) justification(right) )
    subtitle("Educational Levels",
        size(medsmall) justification(right))
    scheme(s1mono)
    ;
#delimit cr

* estimate two models *

regress wage low_education medium_education high_education age
estimates store model1

regress wage low_education medium_education high_education age married
estimates store model2

#delimit ;
coefplot model1, vertical baselevels drop(_cons age married) yline(0)
    ytitle(" " "Regression Coefficient")
    xtitle(" " "Education Level")
    xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate )
    title("Regression Model" "of Women's Hourly Wage",
        size(medium) justification(center) )
    subtitle("Educational Levels",
        size(medsmall) justification(center))
    scheme(s1mono)
    name(gmodel1, replace)
    ;
#delimit cr

#delimit ;
coefplot model2, vertical baselevels drop(_cons age married) yline(0)
    ytitle(" ""Regression Coefficient")
    xtitle(" " "Education Level")
    xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate )
    title("Regression Model" "of Women's Hourly Wage",
        size(medium) justification(center) )
    subtitle("Educational Levels",
        size(medsmall) justification(center))
    scheme(s1mono)
    name(gmodel2, replace)
    ;
#delimit cr

* plotting the coefficients from both models *
#delimit ;
coefplot (model1, label(model 1)) (model2, label(model 2))
    , vertical baselevels drop(_cons age married) yline(0)
    ytitle(" ""Regression Coefficient")
    xtitle(" " "Education Level")
    xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate )
    title("Regression Model of Women's Hourly Wage",
        size(medium) justification(center) )
    subtitle("Educational Levels",
        size(medsmall) justification(center))
    scheme(s1mono)
    ;
#delimit cr

* an example of the versatility of this package *
* plotting coefficients with Harrell style confidence intervals *
regress wage low_education medium_education high_education age

#delimit ;
coefplot, vertical drop(_cons age) yline(0)
    msymbol(circle_hollow) mcolor(white)
    levels(99 95 90 80 70)
    ciopts(lwidth(3 ..) lcolor(*.2 *.4 *.6 *.8 *1))
    legend(order(1 "99" 2 "95" 3 "90" 4 "80" 5 "70") row(1)
        subtitle("Confidence Intervals"))
    ytitle(" ""Regression Coefficient")
    xtitle(" " "Education Level")
    xlabel(1 "Low" 2 "Medium" 3 "High" , valuelabel alternate )
    title("Regression Model of Women's Hourly Wage",
        size(medium) justification(center))
    subtitle("Educational Levels",
        size(medsmall) justification(center));
#delimit cr

/**
Further information and examples are available here
http://www.stata.com/meeting/germany14/abstracts/materials/de14_jann.pdf
**/

********************************************************************************

/**
Finally, the graph from one of my old papers

Gayle, Vernon, and Paul S. Lambert.
"Using quasi-variance to communicate sociological results from statistical models." Sociology 41.6 (2007): 1191-1208. **/ clear input region beta lower upper source 0 0 0 0 1 1 0.0943551 0.1143471 0.0743631 1 2 0.1209922 0.1419642 0.1000202 1 3 0.148519 0.170275 0.126763 1 4 0.1302653 0.1510413 0.1094893 1 5 0.3158952 0.3368672 0.2949232 1 6 0.3567166 0.3765126 0.3369206 1 7 0.2606339 0.2819979 0.2392699 1 8 0.1741903 0.1981023 0.1502783 1 9 0.2696731 0.2914291 0.2479171 1 0.2 0 0.0170324 -0.0170324 2 1.2 0.0943551 0.1049783 0.0837319 2 2.2 0.1209922 0.133399 0.1085854 2 3.2 0.148519 0.1620626 0.1349754 2 4.2 0.1302653 0.1423389 0.1181917 2 5.2 0.3158952 0.3281844 0.303606 2 6.2 0.3567166 0.3669282 0.346505 2 7.2 0.2606339 0.2734131 0.2478547 2 8.2 0.1741903 0.1910855 0.1572951 2 9.2 0.2696731 0.2832559 0.2560903 2 end summarize label variable region "GOR region" label variable beta "Parameter estimates" label variable upper "Upper bound" label variable lower "Lower bound" label define regl 0 "North East" 1 "North West" 2 "Yorks" 3 "E Mids" /// 4 "W Mids" 5 "East" 6 "South East" /// 7 "South West" 8 "Inner London" 9 "Outer London" label values region regl tab region summarize tab region #delimit ; graph twoway (scatter beta region if source==1, msymbol(circle_hollow) mlcolor(gs0) msize(medium)) (scatter beta region if source==2, msymbol(diamond_hollow) mlcolor(gs0) msize(medium)) (rspike upper lower region if source==1, blwidth(medium) ) (rspike upper lower region if source==2, blwidth(medthick) ) , ytitle("") xtitle("") yscale(range(-0.02 0.39)) xscale(range(-0.3,9.5)) xlabel(0 1 2 3 4 5 6 7 8 9, valuelabel alternate ) title("Predictions of Good Health, by Government Office Region", size(large) justification(center) ) subtitle("Confidence intervals of regression coefficients, by estimation method", size(medsmall) justification(center) ) note("Source: UK Census 2001 SARS for England, n=1099294." "Model 1: Logistic regression predicting 'Good Health'. 
Other controls for education and gender"
        "Gayle and Lambert (2007 p.1195)",
        justification(left) )
    legend( order(1 2) label(1 "Conventional regression")
        label(2 "Quasi-Variance") );
#delimit cr

********************************************************************************

/**
Useful References

Cox, Nicholas J. Speaking Stata Graphics: A Collection from the
Stata Journal. Stata Press, 2014.

Gayle, Vernon, and Paul S. Lambert. "Using quasi-variance to communicate
sociological results from statistical models." Sociology 41.6 (2007): 1191-1208.

Kohler, Ulrich, and Frauke Kreuter. Data analysis using Stata.
Stata Press, 2012.

Jann, Ben. "Plotting regression coefficients and other estimates."
Stata Journal 14.4 (2014): 708-737.

Lewandowsky, Stephan, and Ian Spence. "The perception of statistical graphs."
Sociological Methods & Research 18.2-3 (1989): 200-242.

Long, J. Scott. The workflow of data analysis using Stata. Stata Press, 2009.

Mitchell, Michael N. A visual guide to Stata graphics. Stata Press, 2008.

Newson, Roger B. "From resultssets to resultstables in Stata."
Stata Journal 12.2 (2012): 191.

http://www.ats.ucla.edu/stat/stata/library/GraphExamples/

http://www.stata.com/features/example-graphs/
**/

********************************************************************************
********************************************************************************
********************************************************************************
*                                                                              *
*                             Additional Material                              *
*                                                                              *
********************************************************************************
********************************************************************************
********************************************************************************

********************************************************************************
*                                                                              *
*                                 Aspect Ratio                                 *
*                                                                              *
********************************************************************************

sysuse uslifeexp, clear

* an aspect ratio greater than 1 creates a tall skinny graph *
#delimit ;
graph twoway (line le_wmale le_bmale year , clwidth(thin thick))
    , title("U.S. Life Expectancy") subtitle("Males")
    legend( order(1 "White men" 2 "Black men"))
    aspectratio(1.3);
#delimit cr

* an aspect ratio less than 1 creates a shorter graph *
#delimit ;
graph twoway (line le_wmale le_bmale year , clwidth(thin thick) )
    , title("U.S. Life Expectancy") subtitle("Males")
    legend( order(1 "white" 2 "black"))
    aspectratio(0.5);
#delimit cr

********************************************************************************

********************************************************************************
*                                                                              *
*                                    Jitter                                    *
*                                                                              *
********************************************************************************

/**
Scatter will add spherical random noise to your data before plotting if you
specify jitter(#), where # represents the size of the noise as a percentage
of the graphical area.
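
A minimal sketch, not part of the original workshop file: because the noise
is random, the jittered points can land in different places each time the
graph is drawn. Setting jitterseed() should make the graph reproducible.
It uses auto.dta purely for illustration; the seed value and the graph name
are arbitrary choices.
**/

sysuse auto, clear

* jitter(5) adds noise worth 5% of the graphical area; the seed fixes the draw *
scatter mpg weight, jitter(5) jitterseed(12345) name(jitter_sketch, replace)

/**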
This can be useful for graphing categorical data. Without jitter, many of
the points would sit on top of each other, making it impossible to tell
whether a plotted point represents one or 1,000 observations.

For instance, in a variation on auto.dta used below, mpg is recorded in units
of 5 mpg, and weight is recorded in units of 500 pounds.

A standard scatter has considerable overprinting.
**/

sysuse autornd, clear
tab mpg

scatter mpg weight, name(nonjitter)

/**
There are 74 points in the graph, although overprinting makes it appear
as if there are only 19.

Jittering solves this problem.
**/

scatter mpg weight, jitter(7) name(jitter)

graph combine nonjitter jitter

********************************************************************************

********************************************************************************
*                                                                              *
*                               Graphing Results                               *
*                                                                              *
********************************************************************************

/**
Another approach is to use Roger Newson's parmest program.

Parmest takes, as input, the most recently calculated set of estimation
results, created by the most recently executed estimation command.

It creates, as output, a new dataset, with one observation per estimated
parameter, and variables containing parameter names, estimates,
standard errors, z-test or t-test statistics, p-values, confidence limits,
and other estimation results if requested by the user.
**/

* you may have to install this package first *
ssc install parmest
ssc d parmest

/**
The hsb2 dataset is taken from a national survey of high school seniors.
Two hundred observations were randomly sampled from the
High School and Beyond survey.
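
Before turning to hsb2, here is a minimal sketch, not part of the original
workshop file, of the dataset that parmest returns. It assumes parmest is
installed, uses auto.dta purely for illustration, and relies on the
norestore option to leave the results dataset in memory.
**/

sysuse auto, clear
regress mpg weight foreign

* replace the data in memory with one row per estimated parameter *
parmest, norestore

* each row holds a parameter name, its estimate and its 95% confidence limits *
list parm estimate min95 max95, clean noobs

/**
The hsb2 example follows.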
**/

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
numlabel _all, add
summarize

tab race, missing
tabulate race, gen(ethnic)
rename ethnic1 hispanic
rename ethnic2 asian
rename ethnic3 africam
rename ethnic4 white

reg read female hispanic asian africam

* save the regression results *
#delimit ;
parmest, format(estimate min95 max95 %8.2f p)
    list(,) saving(c:\temp\parmest1, replace)
    ;
#delimit cr

* use the new file with the regression results *
use c:\temp\parmest1.dta, clear
browse

* take a look at the data *
summarize
browse

* it can be handy to give each row an id number *
gen id=_n

* a simple graph of the estimates for ethnicity in the model *
#delimit ;
twoway
    (scatter estimate id if id>1 & id<5, msymbol(circle_hollow))
    (rspike min95 max95 id if id>1 & id<5);
#delimit cr

* a more publication-ready graph of the estimates for ethnicity in the model *
#delimit ;
twoway
    (scatter estimate id if id>1 & id<5, msymbol(circle_hollow))
    (rspike min95 max95 id if id>1 & id<5)
    ,
    ytitle("") xtitle("")
    xscale(range(2, 4.5))
    xlabel(2 "Hispanic" 3 "Asian" 4 "African-American" , valuelabel alternate )
    title("Standardised Reading Score, Ethnicity Effects")
    subtitle("Regression coefficients and confidence intervals")
    note("Source: High School and Beyond, n=200."
        "Model 1: Regression model 'Reading Score' ethnicity and gender.")
    legend( order(1 2) label(1 "Parameter estimate") label(2 "95% C.I.") );
#delimit cr

********************************************************************************

/**
Here is another approach to extracting and plotting coefficients.
This is more general and can be adapted for other purposes.
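
A minimal sketch, not part of the original workshop file: the stored
matrices e(b) and e(V) are enough to compute confidence limits by hand.
It uses auto.dta purely for illustration; the matrix and scalar names
are arbitrary.
**/

sysuse auto, clear
regress mpg weight foreign

matrix b = e(b)
matrix V = e(V)

* pick out the coefficient and its standard error by name *
scalar b_weight  = b[1, colnumb(b, "weight")]
scalar se_weight = sqrt(V[rownumb(V, "weight"), colnumb(V, "weight")])

* 95% confidence limits from the t distribution on the residual df *
scalar tcrit     = invttail(e(df_r), 0.025)
scalar lo_weight = b_weight - tcrit*se_weight
scalar hi_weight = b_weight + tcrit*se_weight

display "weight: " b_weight " [" lo_weight ", " hi_weight "]"

/**
The hsb2 example follows.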
**/

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
numlabel _all, add
summarize

tab race, missing
tabulate race, gen(ethnic)
rename ethnic1 hispanic
rename ethnic2 asian
rename ethnic3 africam
rename ethnic4 white

reg read female hispanic asian africam

* take a look at the matrix of estimation results *
matrix list e(b)

* make your own matrix *
matrix b = (e(b))

* convert matrix info to variables *
svmat b
keep b1 b2 b3 b4 b5
summarize
gen id=_n

* take a look at the data *
summarize id
browse

* all you want are the values in row 1 *
keep if id==1
browse

* drop the estimate for female and the constant *
drop b1 b5

* set the value for white respondents (the reference category) to zero *
gen b0=0

* calculate some values for the X axis *
gen id0=1
gen id2=1.5
gen id3=2
gen id4=2.5

#delimit ;
twoway
    (scatter b0 id0, msymbol(square_hollow) mlcolor(gs0) msize(medium))
    (scatter b2 id2, msymbol(circle_hollow) mlcolor(gs0) msize(medium))
    (scatter b3 id3, msymbol(triangle_hollow) mlcolor(gs0) msize(medium))
    (scatter b4 id4, msymbol(diamond_hollow) mlcolor(gs0) msize(medium))
    ,
    ytitle("") xtitle("Ethnicity")
    yscale(range() titlegap(1) )
    xscale(range(1, 3))
    xlabel(1 2 3 4 5 6 7 8 9, valuelabel alternate )
    xlabel(1 "White" 1.5 "Hispanic" 2 "Asian" 2.5 "African-American",
        valuelabel alternate )
    title("Standardised Reading Score and Ethnicity",
        size(large) justification(center) )
    subtitle("Regression coefficients and confidence intervals",
        size(medsmall) justification(center) )
    note("Source: High School and Beyond, n=200."
        "Model 1: Regression model 'Reading Score' gender and ethnicity.",
        justification(left) )
    legend(off)
    ;
#delimit cr

********************************************************************************

/**
© Vernon Gayle, University of Edinburgh.

Professor Vernon Gayle (vernon.gayle@ed.ac.uk)

This file has been produced by Vernon Gayle.

Any material in this file must not be reproduced, published or used
for teaching without permission from Professor Gayle.

The original idea for teaching graphing in this way came from my colleague
Professor Stephen Jenkins (s.jenkins@lse.ac.uk) who is a Stata genius.

Johannes Langer, a graduate student at the University of Edinburgh, provided
very useful comments and helped expunge some typos and errors.

Over the last decade many of the Stata materials that Professor Gayle has
developed have been produced in close collaboration with
Professor Paul Lambert, Stirling University.
However, Professor Gayle is responsible for any errors in this file.

Citing this .do file

Gayle, V. (2016). Producing Publication Ready Graphs in Stata,
University of Edinburgh.

© Vernon Gayle, University of Edinburgh.

Professor Vernon Gayle (vernon.gayle@ed.ac.uk)

********************************************************************************
* End of file *