EPPS 6356 Assignment 4

A designer is trained through repetitive practice, creating multiple works on the same object. This assignment follows that method and asks for charts built programmatically. Let’s do a hackathon and generate charts in the next 24 hours. Write a program to create a chart using R Graphics, ggplot2, or any other R package (no other methods or software are allowed).

The data set records the net domestic immigration rate (NDIR) of each state: the net movement of people into and out of the state over the period 1990-1994, divided by the population of the state. Alaska and Hawaii are excluded from the analysis because the environments of these states are significantly different from the other 48, and their locations present certain barriers to immigration. Eleven predictor variables thought to influence NDIR are defined in Table 1.4 below.

Independent Variable (IV): Regional Tax Brackets

Dependent Variable (DV): NDIR (Net Domestic Immigration Rate)

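Legend of Dummy DV (NDIR), coded relative to the sample mean of NDIR (10.88854):

Immigration into the Region (NDIR at or above the mean) - “1”

Emigration out of the Region (NDIR below the mean) - “0”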
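
As a small illustration of this coding, the sketch below applies the same threshold to the six states printed by head(HW4) in the R output of this report. The names ndir, ndir_mean, and ndir_dummy are illustrative only, and the six values are a subset of the data, not the full 48 states.

```r
# NDIR values for the six states printed by head(HW4)
ndir <- c(Alabama = 17.47, Arizona = 49.60, Arkansas = 23.62,
          California = -37.21, Colorado = 53.17, Connecticut = -38.41)

# Sample mean of NDIR over all 48 states, as reported by mean(HW4$NDIR)
ndir_mean <- 10.88854

# 1 = NDIR at or above the mean, 0 = NDIR below the mean
ndir_dummy <- ifelse(ndir >= ndir_mean, 1, 0)
print(ndir_dummy)
```

Note that a threshold at the mean is not the same as a threshold at zero: a state with a positive NDIR that is still below 10.88854 would be coded “0” under this scheme.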

Below is our analysis of our generated charts:

Figure 1: The Effect of Tax Brackets on NDIR.

The first and second tax brackets each contain more states coded as immigration than as emigration (7 versus 5, and 8 versus 4). The third and fourth tax brackets show the reverse, with fewer states coded as immigration than as emigration (5 versus 7, and 2 versus 10). Because the counts do not move in one direction across all four tax brackets, Figure 1 shows no single general trend of NDIR in relation to tax bracket.
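
To make the comparison across brackets easier to read, the counts can be converted to within-bracket shares. The sketch below is one way to do that, assuming the counts from the cross tabulation in the R output of this report are re-entered by hand; the object names counts and shares are illustrative.

```r
# Counts re-entered from the cross tabulation of NDIR_dummy by tax_ord
# (rows: dummy value 0 or 1; columns: the four tax brackets)
counts <- matrix(c(5, 7, 4, 8, 7, 5, 10, 2), nrow = 2,
                 dimnames = list(NDIR_dummy = c("0", "1"),
                                 tax_ord = c("[1535,1853)", "[1853,2126)",
                                             "[2126,2371)", "[2371,3655]")))

# Share of states coded 0 and 1 within each tax bracket (columns sum to 1)
shares <- prop.table(counts, margin = 2)
print(round(shares, 2))
```

With the full data loaded, the same call could be applied directly to table(HW4$NDIR_dummy, HW4$tax_ord).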

In general, the lower the value of NDIR, the more people are emigrating out of the state; the higher the value of NDIR, the more people are immigrating into the state.

Figure 2: The Effect of Region on NDIR.

Figure 3: The Effect of Region on Tax Brackets.

The immigration/emigration chart organized by region suggests that states in the Northeast region are almost exclusively coded as emigration, as seen in Figures 2 and 3 above. The region-by-tax-bracket chart (Figure 3) shows that none of the Northeast states fall in either of the first two tax brackets. This could also be the reason that the Northeast is not noticeably represented in the aggregated chart, Figure 4.

Figure 4: The Effect of Tax Brackets on NDIR by Region.

# Setting up the new R environment, starting fresh, click run! 
rm(list=ls())
# Setting up the working directory, click run!
setwd("/Users/sami_manuel/Documents/Fall 2022/EPPS 6356/samantha-manuel.github.io")

# Reading the file, click run!
HW4 <- read.delim("HW4data.txt") 
head(HW4)
        State   NDIR Unemp  Wage Crime Income Metrop Poor Taxes Educ BusFail
1     Alabama  17.47   6.0 10.75   780  27196   67.4 16.4  1553 66.9    0.20
2     Arizona  49.60   6.4 11.17   715  31293   84.7 15.9  2122 78.7    0.51
3    Arkansas  23.62   5.3  9.65   593  25565   44.7 15.3  1590 66.3    0.08
4  California -37.21   8.6 12.44  1078  35331   96.7 17.9  2396 76.2    0.63
5    Colorado  53.17   4.2 12.27   567  37833   81.8  9.0  2092 84.4    0.42
6 Connecticut -38.41   5.6 13.53   456  41097   95.7 10.8  3334 79.2    0.33
   Temp    Region
1 62.77     South
2 61.09      West
3 59.57     South
4 59.25      West
5 43.43      West
6 48.63 Northeast
#Turning on the packages required for HW4, click run!
library("Hmisc") 
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'
The following objects are masked from 'package:base':

    format.pval, units
library("tidyverse")
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.6      ✔ dplyr   1.0.10
✔ tidyr   1.2.0      ✔ stringr 1.4.0 
✔ readr   2.1.1      ✔ forcats 0.5.1 
✔ purrr   0.3.4      
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()    masks stats::filter()
✖ dplyr::lag()       masks stats::lag()
✖ dplyr::src()       masks Hmisc::src()
✖ dplyr::summarize() masks Hmisc::summarize()
#Computing the mean of NDIR for later use: mean = 10.88854, click run!
mean(HW4$NDIR)
[1] 10.88854
#Creating a new dummy variable: 1 if NDIR is at or above the mean, 0 if below, click run!
HW4$NDIR_dummy <- ifelse(HW4$NDIR>=10.88854, 1, 0)

#Converting Taxes into an ordinal variable with 4 bins of about 12 states each (m = minimum bin size), click run!
HW4$tax_ord <- cut2(HW4$Taxes, m=12)

#Creating cross tabulation, click run!
table(HW4$NDIR_dummy,HW4$tax_ord)
   
    [1535,1853) [1853,2126) [2126,2371) [2371,3655]
  0           5           4           7          10
  1           7           8           5           2
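
The bracket boundaries printed in the cross tabulation above can also be applied with base R. The sketch below is an illustration only: it swaps Hmisc's cut2() for base R's cut(), reuses the printed boundaries as fixed breaks, and assigns brackets to the six states shown by head(HW4). The object names states and breaks are illustrative.

```r
# Taxes for the six states printed by head(HW4)
states <- data.frame(
  State = c("Alabama", "Arizona", "Arkansas",
            "California", "Colorado", "Connecticut"),
  Taxes = c(1553, 2122, 1590, 2396, 2092, 3334)
)

# Bracket boundaries copied from the cross tabulation column headers
breaks <- c(1535, 1853, 2126, 2371, 3655)

# Left-closed intervals, with the top boundary included in the last bracket
states$tax_ord <- cut(states$Taxes, breaks = breaks, right = FALSE,
                      include.lowest = TRUE, dig.lab = 4)
print(states)
```

Fixed breaks like these only reproduce the brackets for this data set; cut2() derives its boundaries from the data, so a different sample would give different brackets.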
#Creating bar plots (bar height = number of states), click run!
barplot(table(HW4$NDIR_dummy,HW4$tax_ord), beside=TRUE, main= "The Effect of Tax Brackets on NDIR", xlab= "Tax Brackets", ylab= "Number of States", legend = TRUE)

barplot(table(HW4$NDIR_dummy,HW4$Region), beside=TRUE, main= "The Effect of Region on NDIR", xlab= "Region", ylab= "Number of States", legend = TRUE)

barplot(table(HW4$tax_ord,HW4$Region), beside=TRUE, main= "The Effect of Region on Tax Brackets", xlab= "Region", ylab= "Tax Bracket Prevalence (Number of States)", legend = TRUE)

#Code for a Table with Embedded Charts
#ggplot(df,aes(z,x,fill=as.factor(y)),angle=45,size=16)+ geom_bar(position="dodge",stat="identity") +facet_wrap(~z,nrow=3)

#Creating Table with Embedded Charts, click run!
df <- data.frame(HW4) 
p <- ggplot(df, aes(tax_ord, NDIR, fill=as.factor(Region))) + geom_bar(position="dodge", stat="identity") + facet_wrap(~tax_ord, nrow=3) 
p + ggtitle("The Effect of Tax Brackets on NDIR by Region") + xlab("Tax Brackets") + ylab("NDIR") + guides(fill=guide_legend(title="Region")) + theme(axis.text.x = element_text(size = 6))
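
Several states can share the same tax bracket and region, so one option before plotting is to collapse the data to a single value per bracket and region. The sketch below shows this with base R's aggregate() on the six states printed by head(HW4). It is an alternative under stated assumptions, not the method used for Figure 4: the bracket labels are entered by hand from the cross tabulation, and the names six and agg are illustrative.

```r
# NDIR, Region, and tax bracket for the six states printed by head(HW4);
# bracket labels are entered by hand from the cross tabulation headers
six <- data.frame(
  State   = c("Alabama", "Arizona", "Arkansas",
              "California", "Colorado", "Connecticut"),
  NDIR    = c(17.47, 49.60, 23.62, -37.21, 53.17, -38.41),
  Region  = c("South", "West", "South", "West", "West", "Northeast"),
  tax_ord = c("[1535,1853)", "[1853,2126)", "[1535,1853)",
              "[2371,3655]", "[1853,2126)", "[2371,3655]")
)

# Mean NDIR for each combination of region and tax bracket that occurs
agg <- aggregate(NDIR ~ Region + tax_ord, data = six, FUN = mean)
print(agg)
```

With the full data, the same aggregate() call on df would give one row per region and bracket, which could then be passed to ggplot() in place of df.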