How to Create and Save STATA Databases

By Roberto Pedace

Before you can begin any exploratory data analysis or econometric work, you need a dataset in a format that specialized econometric software can open, such as STATA format (*.dta). (STATA is one of the most popular econometrics software programs and makes the application of econometric techniques possible for anyone who’s not a computer programming genius.)

If you’re downloading data from an online source, you may be able to obtain the data in STATA format. Many econometrics textbooks also give you access to data files in STATA format. In addition, the STATA program is preloaded with examples that you can use to familiarize yourself with the basic commands.

After opening STATA, you can access the sample datasets by selecting File→Example Datasets… If you want to open any other dataset that’s already in STATA format, select File→Open and then choose the file you want to work with. On the command line, you can open a STATA dataset by typing “use filename” and hitting return.
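The menu steps above have command-line counterparts. The following is a minimal sketch, assuming a standard installation in which the bundled example dataset named auto is available; the file name mydata.dta is hypothetical and stands in for whatever file you want to open.

```stata
* Show the example datasets installed with the program
sysuse dir

* Load the "auto" example dataset; the clear option discards any data in memory
sysuse auto, clear

* Inspect variable names, types, and labels, then basic summary statistics
describe
summarize price mpg weight

* Opening your own file works the same way with the use command.
* "mydata.dta" is a hypothetical file name, so this line is commented out:
* use "mydata.dta", clear
```

Adding the clear option matters when a dataset with unsaved changes is already in memory, because the program otherwise refuses to load a new one.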

If you’re inputting data manually or downloading it in a non-STATA format, then you can use one of two methods to read it into STATA:

  • Select File→Import: This option can be used if the data is in Excel, SAS XPORT, or Text format. You select the appropriate format of your raw data, and then you’re prompted to select the file you’d like to import into STATA.

  • Select Data→Data Editor: This option opens an editor that resembles a spreadsheet. You can paste columns of data into the editor or input data manually.
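Both methods can also be driven from the command line. The sketch below is one possible illustration, not the only route: it first writes a small practice text file (the name auto_practice.csv is an assumption made for this example) so that there is something to import, and then types in three rows of made-up values for hypothetical variables id, wage, and educ.

```stata
* Create a small comma-separated text file to practice on
sysuse auto, clear
export delimited make price mpg using "auto_practice.csv", replace

* Command-line counterpart of File→Import for text data
import delimited using "auto_practice.csv", clear
describe

* Command-line counterpart of typing values into the Data Editor
* (the variable names and numbers below are invented for illustration)
clear
input id wage educ
1 12.50 12
2 18.00 16
3  9.75 10
end
list
```

Excel files have an analogous import excel command. Typing data with input is practical only for very small datasets; for anything larger, importing a file is less error-prone.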

If you import a dataset that wasn’t originally in STATA format, save it in STATA format so that you can reopen it later without repeating the import. This is especially important if you entered data through the editor and want to avoid redoing all that work. Also, if you made any changes to an existing STATA dataset and want to retain those changes, you need to save the revised dataset.

Select File→Save As (or type “save” followed by the new file name on the command line) and choose a new name for the modified file. That way, if you accidentally delete a variable or drop observations, you can always go back to the original data file.
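The save-under-a-new-name habit can be sketched as follows. The file names auto_original.dta and auto_modified.dta are assumptions chosen for this example, and the bundled auto dataset stands in for your own data.

```stata
* Start from the bundled example data and keep an untouched copy
sysuse auto, clear
save "auto_original.dta", replace

* Make changes you might later regret
drop if missing(rep78)
drop headroom

* Save the revised data under a new name so the original file is preserved
save "auto_modified.dta", replace

* If something went wrong, reload the untouched copy
use "auto_original.dta", clear
describe
```

Without the replace option, save refuses to overwrite a file that already exists, which is a useful safeguard when you have not yet settled on a file name.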