Data analysis 4 (Loading Data)

4. Loading data

command [action]
4.1. use data [ loads a Stata format data set (.dta) ]
4.2. import delim filename [ loads a text delimited data file ]
4.3. import excel filename, variable name row [ loads an Excel data file and sets the row to be used as variable names (often the first) ]

Now that you know how to use Stata, the next step is loading data. We will focus on the three most common types of data file you will encounter.

  • Stata formatted data (.dta files)
  • Text delimited files (commonly .csv files)
  • Microsoft Excel files (.xlsx files)

Of course, you can open many other types of files, but these three are the ones you are most likely to encounter in the beginning.

The most difficult part of loading data is understanding your working directory and the path to the data. For your do-file to work on different machines, it is important to open files with relative paths, that is, paths written relative to the working directory rather than to a folder that exists only on your computer. Let's download some data files into your working directory. Type:

    do "https://eddie-hearn.github.io/teaching/ZEM/do-files/loading_data"

This will run the do-file "loading_data", which will place some data files on your computer in a new working directory named "data". The directory will open automatically in a window so you can verify that the files are on your computer. You can also check your working directory in Stata with the command pwd and view the available files with the command ls.
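
As a small sketch of how the working directory and a relative path fit together, the commands below show where Stata is looking and then inspect a file by name only. It assumes the download step has already made the "data" directory your working directory; describe using reads a file's contents from disk without loading it.

    * print the current working directory
    pwd
    * list only the Stata data files found there
    ls *.dta
    * a bare filename is a relative path, resolved from the working directory,
    * so this line works on any machine where births-per.dta sits in that directory
    describe using "births-per.dta"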

The easiest file to open is a Stata formatted .dta file. You simply use the use command and the filename. You do not need to type .dta as that is the default extension.

    use births-per

*** If there are spaces in your filename, you need to put it inside " " ***
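
For illustration, the quoted form might look like the sketch below. The file "births per.dta" is hypothetical (it is not among the downloaded files), so that line is left as a comment; the quoted form of the real filename runs as is, since quotes are also allowed when a name has no spaces.

    * hypothetical file "births per.dta": the quotes keep the two words together
    * use "births per"
    * quoting also works for names without spaces, such as the downloaded file
    use "births-per"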

If we want to open the comma-delimited file (.csv), we use the import delim command and the full filename, including the extension.

*** To open a new dataset, you first have to clear any dataset that is already loaded; use the clear command ***

    clear
    import delim births-per.csv
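
The same import can also be written as one command, because import delimited accepts clear as an option. The sketch below assumes the first row of births-per.csv holds the variable names, which varnames(1) states explicitly; describe then lists what was read so you can check the result.

    * clear memory and load the .csv file in a single command
    import delimited "births-per.csv", varnames(1) clear
    * list the variables and the number of observations that were read
    describe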

We can open the Excel file (.xlsx) with the import excel command. When working with an Excel file, we also need to specify that the first row contains the variable names.

    clear
    import excel births-per.xlsx, first
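
Here first is short for the option firstrow. A slightly fuller sketch, assuming the data sit on the first sheet of the workbook, looks at the file before loading it and folds clear into the import command:

    * list the sheets and cell ranges in the workbook without loading it
    import excel "births-per.xlsx", describe
    * load the first sheet, treating its first row as variable names
    * (add sheet("name") if the data are on a different sheet)
    import excel "births-per.xlsx", firstrow clear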