Introduction to Stata

1. Loading data

command [action]
1.1. use file_name [loads Stata formatted data]

We will use the current data from the Human Mortality Database. To load the data, type:

    use "https://eddie-hearn.github.io/teaching/FYS/covid.dta"
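If a dataset with unsaved changes is already open, Stata will refuse to load a new one. A minimal sketch, assuming you do not need to keep whatever is currently in memory, adds the clear option:

    * clear drops the data in memory before loading the new file
    use "https://eddie-hearn.github.io/teaching/FYS/covid.dta", clear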

2. Viewing data

command [action]
2.1. describe [describes loaded data]
2.2. sum variable [summarizes variable]
2.3. tab variable [produces 1-way table of variable]

To see the countries in the data, type:

    tab countrycode
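The other viewing commands work the same way. As a short sketch, using the variables dtotal and year that appear later in this handout:

    * list every variable with its type and label
    describe
    * show observations, mean, standard deviation, min, and max
    sum dtotal
    * count the observations in each year
    tab year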

3. Visualizing data

command [action]
3.1. line y-variable x-variable [creates line graph; line is just one of many graph types]
3.2. twoway (graph_1) (graph_n) [combine multiple graphs]
3.3. if [specifies a conditional statement]

Using what we have learned, we can summarize total deaths in the US in 2019, the year before COVID, draw a basic line graph, and then overlay 2019 and 2020. Type:

    sum dtotal if year == 2019 & countrycode=="USA"
    set scheme s1color
    line dtotal week if year == 2019 & countrycode=="USA"
    twoway (line dtotal week if year == 2019 & countrycode=="USA")(line dtotal week if year == 2020 & countrycode=="USA")
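Before graphing, it can help to check how many observations a condition selects. As a small sketch, the count command accepts the same if condition used above:

    * number of US observations in 2019
    count if year == 2019 & countrycode=="USA"

Note that == (two equals signs) tests equality, & means "and", and text values such as "USA" must be in double quotes.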

Let's add the five years before COVID (2015-2019) and the COVID years (2020-2022) to our graph. Type:

    twoway (line dtotal week if year == 2015 & countrycode=="USA")(line dtotal week if year == 2016 & countrycode=="USA")(line dtotal week if year == 2017 & countrycode=="USA")(line dtotal week if year == 2018 & countrycode=="USA") (line dtotal week if year == 2019 & countrycode=="USA") (line dtotal week if year == 2020 & countrycode=="USA")(line dtotal week if year == 2021 & countrycode=="USA") (line dtotal week if year == 2022 & countrycode=="USA")

This graph is difficult to read. Let's draw the pre-COVID years in gray and add a title, a y-axis title, and a legend that labels only the COVID years. Type:

    twoway (line dtotal week if year == 2015 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2016 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2017 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2018 & countrycode=="USA", lcolor(gray)) (line dtotal week if year == 2019 & countrycode=="USA", lcolor(gray)) (line dtotal week if year == 2020 & countrycode=="USA")(line dtotal week if year == 2021 & countrycode=="USA") (line dtotal week if year == 2022 & countrycode=="USA"), title("Total Deaths") ytitle("Deaths") legend( order(6 "2020" 7 "2021" 8 "2022"))
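Typing one plot per year by hand is error-prone. The sketch below is one possible alternative, not part of the original handout: it builds the same list of plots in a local macro (the name plots is an arbitrary choice) with two forvalues loops. If you run it from a do-file, run all of the lines together, because local macros disappear when a do-file run ends.

    * start with an empty list of plots
    local plots ""
    * pre-COVID years in gray
    forvalues y = 2015/2019 {
        local plots `plots' (line dtotal week if year == `y' & countrycode=="USA", lcolor(gray))
    }
    * COVID years in the default colors
    forvalues y = 2020/2022 {
        local plots `plots' (line dtotal week if year == `y' & countrycode=="USA")
    }
    * draw all eight plots with the same options as above
    twoway `plots', title("Total Deaths") ytitle("Deaths") legend(order(6 "2020" 7 "2021" 8 "2022"))

The plots are added in the same order as in the typed command, so the legend numbers 6, 7, and 8 still refer to 2020, 2021, and 2022.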

We can polish the graph further by adding a single legend entry for 2015-2019 and labeling the x-axis with months instead of week numbers. Type:

    twoway (line dtotal week if year == 2015 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2016 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2017 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2018 & countrycode=="USA", lcolor(gray)) (line dtotal week if year == 2019 & countrycode=="USA", lcolor(gray)) (line dtotal week if year == 2020 & countrycode=="USA")(line dtotal week if year == 2021 & countrycode=="USA") (line dtotal week if year == 2022 & countrycode=="USA"), title("Weekly Deaths USA (2015-2022)") ytitle("Deaths") legend(order(5 "2015-2019" 6 "2020" 7 "2021" 8 "2022") col(4)) xlabel(.3 "Jan" 4.3 "Feb" 8.6 "Mar" 12.9 "Apr" 17.3 "May" 21.6 "Jun" 25.9 "Jul" 30.2 "Aug" 34.5 "Sep" 38.8 "Oct" 43.1 "Nov" 47.4 "Dec", ang(45))

We can also check whether things improved in 2023. Type:

    twoway (line dtotal week if year == 2015 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2016 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2017 & countrycode=="USA", lcolor(gray))(line dtotal week if year == 2018 & countrycode=="USA", lcolor(gray)) (line dtotal week if year == 2019 & countrycode=="USA", lcolor(gray)) (line dtotal week if year == 2020 & countrycode=="USA")(line dtotal week if year == 2021 & countrycode=="USA") (line dtotal week if year == 2022 & countrycode=="USA") (line dtotal week if year == 2023 & countrycode=="USA"), title("Weekly Deaths USA (2015-2023)") ytitle("Deaths" " ") legend(order(5 "2015-19" 6 "2020" 7 "2021" 8 "2022" 9 "2023") col(5)) xlabel(.3 "Jan" 4.3 "Feb" 8.6 "Mar" 12.9 "Apr" 17.3 "May" 21.6 "Jun" 25.9 "Jul" 30.2 "Aug" 34.5 "Sep" 38.8 "Oct" 43.1 "Nov" 47.4 "Dec", ang(45))

4. Comparing data

command [action]
4.1. display [displays output - can be used as a calculator]
4.2. ttesti obs_1 mean_1 sd_1 obs_2 mean_2 sd_2 [conducts two-sample t test from summary statistics]

Let's compare the death rate in the US in 2020 (observed) to the average death rate from 2015-2019 (expected) and test whether they are statistically different. Type:

    sum rtotal if countrycode == "USA" & year > 2014 & year < 2020
    sum rtotal if countrycode == "USA" & year == 2020
    ttesti 53 .010187 .0012105 260 .0085809 .0005351
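The numbers passed to ttesti are the observations, mean, and standard deviation reported by the two sum commands. As a hedged sketch that avoids retyping them, the stored results r(N), r(mean), and r(sd) left behind by sum can be saved in local macros (the names n1, m1, s1, n0, m0, and s0 are arbitrary) and passed along. Run the lines together if you use a do-file.

    * observed: 2020
    quietly sum rtotal if countrycode == "USA" & year == 2020
    local n1 = r(N)
    local m1 = r(mean)
    local s1 = r(sd)
    * expected: 2015-2019
    quietly sum rtotal if countrycode == "USA" & year > 2014 & year < 2020
    local n0 = r(N)
    local m0 = r(mean)
    local s0 = r(sd)
    * same test as above, without copying numbers by hand
    ttesti `n1' `m1' `s1' `n0' `m0' `s0'
    * difference between the observed and expected death rates
    display `m1' - `m0'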

Now we can calculate excess deaths. We just need to multiply the difference between the observed and expected death rates (.0016061) by the US population (331,002,651). The lower and upper bounds of the 95% confidence interval for that difference give low and high estimates. Type:

    display .0016061 * 331002651
    display .0013999 * 331002651
    display .0018123 * 331002651
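The results are easier to read with a display format. As a small sketch, the %12.0fc format rounds to whole numbers and adds commas:

    * central estimate, roughly 531,623 excess deaths
    display %12.0fc .0016061 * 331002651
    * low and high estimates from the 95% confidence interval
    display %12.0fc .0013999 * 331002651
    display %12.0fc .0018123 * 331002651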

***The US reported 375,546 total COVID deaths as of Jan. 1, 2021***

5. Try it

Choose a country from the dataset other than the US. Create a graph of your country's weekly deaths from 2015-2022. Answer the questions on the handout. After you finish, type the following into Stata:

    do "https://eddie-hearn.github.io/teaching/FYS/loop"

Find your country's name.log and name.png files in the "reports" folder, which will open automatically on your computer. Check your answers.
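A possible starting point for your own graph is sketched below. XXX is a placeholder, not a real country code: replace it with a code listed by tab countrycode. The local macro name cc is an arbitrary choice; if you use a do-file, run both commands together.

    * replace XXX with your country's code
    local cc "XXX"
    * one year of weekly deaths for the chosen country
    line dtotal week if year == 2019 & countrycode=="`cc'"

From there, add one plot per year inside twoway, as in section 3.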