
Stata

Stata is an econometric software package used worldwide by researchers and students. It is a powerful package that offers data manipulation, visualization, statistics, and reproducible reporting. Of course, there are several other options that users can adopt:

Matlab: Very powerful package allowing you to do virtually anything;

 

EViews: Quite popular in the financial sector, but static (only a few add-ins are available, and substantial improvements arrive only with a new release);

R: Open source, flexible and powerful.

 

Then, why Stata? My choice is based on several reasons. The first is that the software is quite user-friendly and can easily be used by beginners too (it has a very comprehensive menu bar that helps users master it). Moreover, it can be fully programmed (a useful option for advanced users). Finally, because of its popularity, there are numerous additional routines that can be downloaded and used free of charge (below I suggest some that I find particularly useful).

Useful Resources

There is a vast number of resources about Stata that can be retrieved online. In my view, some of them are more useful than others, no matter the level of proficiency:

 

Oscar Torres-Reyna's webpage at Princeton University: It contains a set of notes covering material for beginners (but useful for researchers as well);

 

The IDRE website at UCLA: It offers numerous explanations and tutorials that can promptly solve several issues;

 

Statalist forum (free registration required): StataCorp technicians, researchers and beginners can share their thoughts, post questions and find solutions even to very advanced problems;

 

Official Stata channel on YouTube: StataCorp posts several useful videos here, in which its technicians explain the most popular routines and procedures;

Stata news webpage: Useful if you want to stay updated on the most recent developments among Stata users (I find the section on the Stata Conference and Users Group Meetings particularly useful).

Routines

A huge number of additional routines can be downloaded free of charge. From the command window in Stata, simply type: ssc install <name of the routine>. There are more than 3,000 available routines (just click here to see how many they are).
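As a minimal sketch of this workflow, the commands below install one routine from the SSC archive, confirm that Stata can find it, and call it once. The choice of xtabond2 and of the abdata example dataset (downloaded with webuse, so an internet connection is assumed) is mine and purely illustrative; the model specification is not a recommendation.

```stata
* Install a routine from SSC (xtabond2 is only an example);
* replace overwrites an older copy if one is already installed
ssc install xtabond2, replace

* Show the package description and confirm where the ado-file is stored
ssc describe xtabond2
which xtabond2

* Once installed, the routine is called like any built-in command.
* Assumption: the Arellano-Bond example data are fetched with webuse.
webuse abdata, clear
xtset id year
xtabond2 n L.n w k, gmm(L.n) iv(w k) robust

* Browse what else is available: most downloaded and newest SSC packages
ssc hot
ssc new
```

After installation, help xtabond2 opens the help file of the routine, exactly as for official commands.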

 

While it is impossible to provide information about all of them, there are some that I find particularly useful:

xtabond2: Written by David Roodman, this routine is now a must for those who want to estimate a dynamic panel data model using the Arellano-Bond or the Blundell-Bond estimators. It "replaces" the official routines in Stata, since it is quite flexible and provides much more information.

xtlsdvc: Based on the work by Kiviet (1995), this routine, written by Giovanni Bruno, computes bias-corrected LSDV estimators for standard dynamic panel data models.

xtbcfe: This routine performs the iterative bootstrap-based bias correction for the fixed effects (FE) estimator in dynamic panel data models, developed in Everaert and Pozzi (2007). Monte Carlo simulations indicate that this estimator performs better than the standard GMM techniques and the Kiviet bias-corrected fixed effects estimator.

xtdpdgmm: This routine implements GMM estimators for linear dynamic panel data models. It can reproduce the results obtained with xtabond2 or with the official Stata routines for the estimation of such models. Additionally, it allows the nonlinear moment conditions suggested by Ahn and Schmidt (1995), which may be particularly useful for highly persistent data. Additional routines are available from the webpage of the author.

getsymbols and mvport: These two routines are related to each other. They were written by Alberto Dorantes and are particularly useful for those interested in finance. The first one allows the researcher to collect and integrate time series of stock tickers, indexes and economic series from Quandl, Yahoo Finance, Google Finance and Alpha Vantage (an API key is required for some of those sources, as explained in the help file). Simply use the symbol that identifies a financial series in the chosen source to download data for that asset at different frequencies. In some cases, even 1-minute data are available. Alpha Vantage provides stock quotes, market indexes and cryptocurrency quotes. The second routine, mvport, calculates the minimum variance portfolio for a given required return and a set of financial instrument returns specified in a variable list. It can be used in combination with getsymbols. Both routines are explained well in a presentation delivered by Alberto Dorantes at the 2018 Mexico Stata Users Meeting, available from here. Additionally, some useful information about how to use Stata to carry out research in finance can be found here.

 

kountry: This is one of my favourite routines. Collecting data can be stressful, particularly when assembling a macro dataset from different sources. Most sources spell country names differently, or identify countries with their own abbreviations or numeric codes. Hence, the researcher needs to harmonize the names, abbreviations and numeric codes before merging the datasets. kountry is a "universal" converter. It standardizes country names from various sources, making it much easier to merge datasets that report different spellings, abbreviations and numeric codes for the same country. In addition, it converts country names from one coding scheme to another and, finally, it generates a "geographical region" variable.
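A minimal sketch of the standardization step follows. The four country spellings are made up for illustration, and the option names (from(other), marker) and the names of the generated variables (NAMES_STD, MARKER) are written as I recall them from the help file, so please check help kountry before relying on them.

```stata
* Assumption: kountry is installed from SSC
ssc install kountry, replace

* A toy dataset with country names spelled as different sources might report them
clear
input str30 country
"United States of America"
"Korea, Rep."
"Russian Federation"
"UK"
end

* Standardize the spellings: the standardized name is stored in NAMES_STD,
* and MARKER flags the names that kountry was able to recognize
kountry country, from(other) marker

list country NAMES_STD MARKER
```

Names that are not flagged in MARKER need to be fixed by hand before merging on the standardized variable.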

 

sdmxuse: Using this routine, a researcher can import data from statistical agencies using the SDMX standard. Data can be downloaded from the European Central Bank (ECB), Eurostat (ESTAT), the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), the United Nations Statistics Division (UNSD) and the World Bank (WB). 

utest: This routine tests for the presence of a U-shaped (or inverse U-shaped) relationship on an interval. It implements the procedure explained in Lind and Mehlum (2010).

inteff: This routine estimates the marginal effect of a change in two interacted variables in nonlinear models. Specifically, it correctly computes the interaction effect, and the associated standard errors, for a logit or probit model. A detailed explanation of the routine can be found in the associated Stata Journal paper.

nwcommands: Thomas Grund wrote a set of routines that can be used to carry out network analysis in Stata. The installation procedure is slightly different from that stated above. On the companion website, nwcommands, you can find a clear explanation of how to install the set of routines and how to exploit all its capabilities.

ivreg2: This routine performs IV estimation in a similar way to the official ivregress command. However, it reports more detailed statistics for evaluating the validity of the instruments.

ivreg2h: This routine is based on the paper by Lewbel (2012). It estimates an IV regression model by generating instruments from the heteroskedastic errors in the absence of traditional identifying information, such as external instruments or repeated measurements. In this way, instruments may be constructed as simple functions of the model's data. This approach is useful when no external instruments are available; alternatively, it can be used to supplement external instruments and improve the efficiency of the IV estimator. Arthur Lewbel's website contains a useful description of the methodology, which can be downloaded from here.

grstyle: This is a Stata command, written by Ben Jann, that allows you to customize the overall look of graphs from within a do-file. The advantage of this routine is that you do not need to change the graph command itself: before plotting, you simply set the style you prefer with a few short commands. In addition, grstyle provides a number of useful features, such as assigning color palettes or setting absolute sizes. Ben Jann's website contains additional useful routines that are worth a look.

esttab: A very useful routine for exporting Stata results to LaTeX. The companion website provides information about other routines that can be helpful in research.

tfools: A tool developed by Dicle and Levendis that implements various technical analysis tools for finance, including moving averages, Bollinger bands, MACD and RSI.
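To give an idea of how some of the estimation routines above are called, here is a minimal sketch that uses the auto dataset shipped with Stata. The variables, the model and, above all, the instruments are chosen only to show the syntax and are assumptions of mine, not a sensible empirical specification. Note that esttab is distributed within the estout package and that ivreg2 also needs the ranktest package.

```stata
* Install the user-written routines once
ssc install ranktest, replace
ssc install ivreg2, replace
ssc install utest, replace
ssc install estout, replace

sysuse auto, clear

* utest: after a regression with a linear and a squared term,
* test for a U-shaped relationship between price and mpg
generate mpg2 = mpg^2
regress price mpg mpg2
utest mpg mpg2

* ivreg2: IV regression with first-stage results and instrument diagnostics
* (displacement and gear_ratio are illustrative instruments only)
eststo clear
eststo ols: regress price mpg weight
eststo iv: ivreg2 price weight (mpg = displacement gear_ratio), robust first

* esttab: export both sets of estimates to a LaTeX table with standard errors
esttab ols iv using results.tex, se booktabs replace
```

The file results.tex is written to the current working directory and can be included in a LaTeX document with \input{results.tex} (the booktabs package must be loaded).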

Just for fun

Sport and Stata

If you are interested in US sports, you will find the next two routines very interesting:

nba2stata: It allows you to download statistics directly from the NBA statistics page. Information about individual teams, players or seasons can be downloaded and formatted into Stata in a few seconds.

nfl2stata: Similarly to the previous one, this routine allows you to obtain statistics from the NFL statistics page. Also in this case, you can obtain information about teams, players, matches and seasons.

Graph animation

Stata can be used to create an animated graph (gif). This is not a built-in feature of the software. However, by combining its capabilities with those of an external program (called from inside Stata), it is still possible to animate a graph. (The example below is based on the one reported in the Stata Blog, with some minor changes.)

Consider the following gif. 

The graph above is based on a regression model that tries to uncover the relationship between the homicide rate and the unemployment rate in the counties of Texas. However, the homicide rate in a county is likely to affect the one in neighbouring counties. Hence, if the unemployment rate increases in county i, it will affect the homicide rate in the same county and, in turn, the homicide rate in county i will affect the rate in neighbouring counties. Said differently, there are spillover effects that one should take into account.

To take such spillover effects into consideration, a spatial model should be estimated. In particular, the graph above is based on the estimation of a spatial autoregressive (SAR) model (more details about spatial econometrics can be found here).

But how is the graph above generated? The following steps are required:

1) Download and install FFmpeg, free software designed to record, convert and edit video and audio files. Note that installation requires some steps that are clearly explained here.

2) The gif above is generated from an mpeg file. However, Windows 10 cannot play such a file, as it does not include the necessary codec. Instead of making several attempts to install the appropriate codec, it is preferable to download a player that can read such files, such as VLC, available from here.

The trick to generating the above gif consists of creating multiple images that are later assembled into a single file.

The do-file to create the above gif can be downloaded from here.
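The frame-by-frame idea can be sketched as follows. This is not the do-file used for the gif on this page: it is a minimal illustration on the auto dataset shipped with Stata, and it assumes that ffmpeg is installed and reachable from the system path. File names, frame rate and image width are arbitrary choices. Unlike the route described above, this sketch asks ffmpeg to write the gif directly, without an intermediate mpeg file.

```stata
sysuse auto, clear

* Draw one frame per step: each frame adds heavier cars to the scatter plot.
* Axis ranges are fixed so that the frames line up when played in sequence.
local frame = 0
forvalues cutoff = 2000(250)5000 {
    local ++frame
    local tag = string(`frame', "%03.0f")   // 001, 002, ...
    twoway scatter mpg weight if weight <= `cutoff',      ///
        xscale(range(1500 5000)) xlabel(2000(1000)5000)   ///
        yscale(range(10 45)) ylabel(10(10)40)             ///
        title("Cars weighing up to `cutoff' lbs")
    graph export "frame_`tag'.png", width(800) replace
}

* Assemble the numbered frames into a single gif by calling ffmpeg from Stata
shell ffmpeg -y -framerate 4 -i frame_%03d.png animation.gif
```

The numbered file names matter: the pattern frame_%03d.png tells ffmpeg to read frame_001.png, frame_002.png and so on, in that order.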

If you like animation in Stata and you want to have additional information, the page created by Robert Grant is particularly useful.
