A Spatial Investigation into Heart Disease Mortality Rates and Youth Tobacco Rates

Among the 16,700 data sets on data.gov, two caught my interest: heart disease mortality rates and youth tobacco rates.

My goal: to see whether heart disease mortality rates among 35-year-olds correlate with tobacco use rates among middle and high school students.

The main programming language I used in this research was R, and the packages I installed were ggmap, mapproj, leaflet (an R interface to the Leaflet JavaScript library), ggplot2, and dplyr.

Marking each State

The first step was to mark each state correctly on the map.
- CSV file taken from inkplant
- Markers from RStudio
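As a rough illustration of this step, the sketch below places one marker per state with the leaflet package. The data frame, its column names (state, latitude, longitude), and the approximate coordinates are assumptions made for the example; they stand in for the coordinates CSV and are not the original file.

```r
library(leaflet)

# Hypothetical stand-in for the state coordinates CSV (approximate values).
states <- data.frame(
  state     = c("California", "Texas", "New York"),
  latitude  = c(36.8, 31.0, 43.0),
  longitude = c(-119.4, -99.0, -75.0)
)

# One marker per state; hovering over a marker shows the state name.
state_map <- leaflet(states) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude, lat = ~latitude, label = ~state)

state_map  # prints the interactive map in RStudio's viewer
```

With a real file, the data frame would come from something like read.csv() on the downloaded CSV, with the column names adjusted to match.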

Semester

Fall 2017

Researcher

Mee Kyoung Seo

Data Cleaning

I cleaned the CSV files by removing all NA values and grouping the values by state for both data sets. To use the Leaflet package, I then merged the two data sets, the heart mortality data and the youth tobacco rate data, with the longitude and latitude table so that the result was in a format Leaflet could use.
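The cleaning and merging steps could look roughly like the following dplyr sketch. The toy data frames, the column names, and the choice of averaging within each state are all assumptions for illustration; the write-up only says that values were grouped by state.

```r
library(dplyr)

# Hypothetical toy data standing in for the two data.gov files.
heart <- data.frame(
  state     = c("CA", "CA", "TX", "TX", "NY"),
  mortality = c(300, NA, 350, 370, 280)
)
tobacco <- data.frame(
  state        = c("CA", "TX", "TX", "NY", "NY"),
  tobacco_rate = c(5.0, 9.0, NA, 6.0, 8.0)
)
coords <- data.frame(
  state     = c("CA", "TX", "NY"),
  latitude  = c(36.8, 31.0, 43.0),
  longitude = c(-119.4, -99.0, -75.0)
)

# Drop NA rows, then collapse to one row per state (mean is an assumption).
heart_by_state <- heart %>%
  filter(!is.na(mortality)) %>%
  group_by(state) %>%
  summarise(mortality = mean(mortality), .groups = "drop")

tobacco_by_state <- tobacco %>%
  filter(!is.na(tobacco_rate)) %>%
  group_by(state) %>%
  summarise(tobacco_rate = mean(tobacco_rate), .groups = "drop")

# Join both summaries with the coordinates so each row can be drawn on a map.
merged <- heart_by_state %>%
  inner_join(tobacco_by_state, by = "state") %>%
  inner_join(coords, by = "state")
```

An inner join keeps only states present in all three tables, which avoids passing rows with missing coordinates to the mapping step.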

Mapping Data on Leaflet

The two maps show:

Regions that show high heart mortality rates and high teenage tobacco rates
Different colors and sizes of circles that show high vs low rates

(Reference: R Graph Gallery.)
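A minimal sketch of one such map is shown here, with circle color and size both driven by the rate. The data values, column names, color palette, and radius scaling are assumptions chosen for the example rather than the settings used in the original maps.

```r
library(leaflet)

# Hypothetical merged table: one row per state with coordinates and a rate.
rates <- data.frame(
  state     = c("California", "Texas", "New York"),
  latitude  = c(36.8, 31.0, 43.0),
  longitude = c(-119.4, -99.0, -75.0),
  mortality = c(300, 360, 280)
)

# Map low rates to light colors and high rates to dark colors.
pal <- colorNumeric(palette = "YlOrRd", domain = rates$mortality)

mortality_map <- leaflet(rates) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~longitude, lat = ~latitude,
    # Rescale the rate to a radius between 5 and 20 pixels.
    radius = ~5 + 15 * (mortality - min(mortality)) /
      (max(mortality) - min(mortality)),
    color = ~pal(mortality),
    stroke = FALSE, fillOpacity = 0.7,
    label = ~paste0(state, ": ", mortality)
  ) %>%
  addLegend(position = "bottomright", pal = pal,
            values = ~mortality, title = "Heart mortality rate")

mortality_map
```

The second map would follow the same pattern with the tobacco rate column in place of mortality.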

Correlation between the Variables

I first visualized the relationship by grouping the data by state. I then computed Pearson’s product-moment correlation coefficient, a standard measure of the linear relationship between two variables, and obtained a correlation of 0.48 (rounded to two decimal places). I chose this method because it measures the strength and direction of the linear relationship between the two columns.
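In R, this calculation can be done with cor.test() from the built-in stats package. The sketch below uses made-up numbers and hypothetical column names, so it will not reproduce the 0.48 reported above; it only shows the shape of the call.

```r
# Hypothetical per-state values; the real analysis used the merged data sets.
by_state <- data.frame(
  tobacco_rate = c(4.2, 6.1, 7.5, 8.3, 9.0, 11.4),
  mortality    = c(290, 340, 310, 365, 330, 400)
)

# Pearson's product-moment correlation with a significance test.
result <- cor.test(by_state$tobacco_rate, by_state$mortality,
                   method = "pearson")

# Correlation coefficient rounded to two decimal places.
r <- round(unname(result$estimate), 2)
print(r)
print(result$p.value)
```

Here cor.test() is used instead of plain cor() because it also reports a p-value and confidence interval, which help judge whether a moderate coefficient could be due to chance.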

Finishing Thoughts

It was interesting to see a moderately positive linear relationship between youth tobacco rates and heart mortality rates. There are several possible reasons for this, the most obvious being that states with higher youth tobacco rates likely have a greater proportion of people who smoke. Because smoking increases the risk of cardiovascular disease, this may lead to higher heart mortality rates. In addition, states with higher youth tobacco rates may have fewer smoking regulations, leading to greater exposure to smoke (air pollution). Some states have stricter bans on tobacco in public spaces (notably 25 of them, including California, Hawaii, Massachusetts, Michigan, and New York), which may also explain why states differ so much in smoking rates and heart mortality rates.
