As seen in Figure 1, a regression plot is appropriate for data visualization and analysis. The plot points follow the regression line, with no obvious curvature or outliers. However, because the plot points are scattered fairly widely around the line, the R^2 value will be noticeably below 1, as the regression line does not pass close to every point.
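As a quick check on how well the line in Figure 1 fits, R^2 can be read from the fitted model. This is a minimal sketch, assuming the built-in anscombe data set from the datasets package; the object names fit1 and r2_1 are illustrative and are not taken from the original code.

```r
# Fit the simple linear regression for the first data set
fit1 <- lm(y1 ~ x1, data = anscombe)

# Extract R^2 (share of variance in y1 explained by x1) from the summary
r2_1 <- summary(fit1)$r.squared
print(round(r2_1, 3))
```

For this data set the value comes out at roughly 0.67, which matches the visual impression of a clear trend with noticeable scatter.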
Figure 2: Regression plot of X2 and Y2.
As seen in Figure 2, a linear regression plot is most likely not the best tool for data analysis here, because the plot points follow a parabolic curve rather than a straight line. For this reason, different or supplementary data visualization/analysis tools may be required.
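One possible supplementary tool is a quadratic fit, which can be compared against the straight line. The sketch below assumes the built-in anscombe data set; the names fit2_lin, fit2_quad, r2_lin and r2_quad are illustrative only.

```r
# Straight-line fit, as drawn in Figure 2
fit2_lin <- lm(y2 ~ x2, data = anscombe)

# Quadratic fit: I() protects x2^2 so it is treated as arithmetic
fit2_quad <- lm(y2 ~ x2 + I(x2^2), data = anscombe)

# Compare the share of variance explained by each model
r2_lin  <- summary(fit2_lin)$r.squared
r2_quad <- summary(fit2_quad)$r.squared
print(c(linear = r2_lin, quadratic = r2_quad))
```

If the points really do lie on a parabola, the quadratic R^2 should be very close to 1, well above the linear value.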
Figure 3: Regression plot of X3 and Y3.
Figure 3 is similar to Figure 1, in that a regression plot is appropriate for data visualization and analysis. The plot points generally fit the regression line, except for one outlier point at x=13. It may be beneficial to remove this outlier and refit the line, to see how much better the regression line then fits the remaining data.
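The effect of removing the outlier can be sketched as follows. This assumes the built-in anscombe data set and identifies the outlier simply by its x value of 13; the names no_outlier, fit3_all and fit3_trim are illustrative.

```r
# Fit using all eleven points
fit3_all <- lm(y3 ~ x3, data = anscombe)

# Drop the single observation at x3 = 13 and refit
no_outlier <- subset(anscombe, x3 != 13)
fit3_trim  <- lm(y3 ~ x3, data = no_outlier)

# Compare slope and R^2 before and after removing the outlier
print(rbind(
  all_points = c(slope = unname(coef(fit3_all)["x3"]),
                 r2 = summary(fit3_all)$r.squared),
  trimmed    = c(slope = unname(coef(fit3_trim)["x3"]),
                 r2 = summary(fit3_trim)$r.squared)
))
```

Dropping a point only because it fits badly is a judgment call, so a comparison like this is best reported alongside the full-data fit rather than in place of it.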
Figure 4: Regression plot of X4 and Y4.
Like Figure 2, a regression plot is most likely not the best tool for data analysis for Figure 4. This is because all of the plot points except one are at x=8, so the slope of the regression line is determined entirely by that single remaining point. For this reason, different or supplementary data visualization/analysis tools are required.
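The dependence on a single point can be checked numerically. The sketch below assumes the built-in anscombe data set and uses the leverage (hat value) of each observation as one possible diagnostic; the names x4_counts, fit4 and lev4 are illustrative.

```r
# Count how many observations share each x value
x4_counts <- table(anscombe$x4)
print(x4_counts)

# Leverage of each observation in the fitted line;
# a value near 1 means the fit is forced through that point
fit4 <- lm(y4 ~ x4, data = anscombe)
lev4 <- hatvalues(fit4)
print(round(lev4, 3))
```

A leverage close to 1 for the lone point away from x=8 would confirm that the regression line in Figure 4 says little about the other ten observations.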
b. Compare different ways to create the plots (e.g. changing colors, line types, plot characters).
Figure 5: Aggregate regression plots from X1-X4 and Y1-Y4.
As seen in Figure 5, all four regression plots were generated as a composite chart by using a for loop to draw each plot. The code was also able to add a title describing the plots. It was also able to add further design components such as regression line color, as well as plot point size, color, and shape.
2. Can you finetune the charts without using other packages (consult RGraphics by Murrell)?
Figure 6: Finetuned aggregate regression plots for X1-X4 and Y1-Y4.
Above is Figure 6, which is the “finetuned” version of the regression plots for X1-X4 and Y1-Y4, made with base graphics only and following RGraphics by Murrell. I made the plot points smaller, so that the reader can read the charts more easily and distinguish between individual plot points. This is especially helpful for Plot 4, where many of the plot points are stacked on top of each other. Smaller plot points also make the distance between plot points easier to see. I also changed the color of the plot points and regression line to colors that are easier on the eye and less distracting.
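Fine-tuning of this kind can be sketched with base graphics alone. The specific settings below (solid points via pch = 16, cex = 0.7, the grey and blue colors, and horizontal axis labels via las = 1) are illustrative assumptions, not the exact values used for Figure 6; the list name fits is also hypothetical.

```r
# 2 x 2 layout with tight inner margins and room for an outer title
op <- par(mfrow = c(2, 2), mar = 0.1 + c(4, 4, 1, 1), oma = c(0, 0, 2, 0))

fits <- vector("list", 4)
for (i in 1:4) {
  # Build the formula y_i ~ x_i for the i-th data set
  f <- reformulate(paste0("x", i), response = paste0("y", i))
  fits[[i]] <- lm(f, data = anscombe)

  # Small, solid, muted points on common axes so panels are comparable
  plot(f, data = anscombe, pch = 16, cex = 0.7, col = "grey30",
       xlim = c(3, 19), ylim = c(3, 13), las = 1)

  # Slightly thicker regression line in a calm color
  abline(fits[[i]], col = "steelblue", lwd = 1.5)
}
mtext("Anscombe's 4 Regression Data Sets", outer = TRUE, cex = 1.2)

# Restore the previous graphics settings
par(op)
```

Keeping xlim and ylim identical across panels is a deliberate choice here, since it lets the reader compare the four data sets directly.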
3. How about with ggplot2? (use tidyverse package)
See the code below to see how the ggplot2 package (part of the tidyverse) can be used to efficiently generate the four regression plots for Anscombe’s (1973) Quartet. The generated chart is shown below in Figure 7. This aggregate/composite plot was produced using only two lines of code, instead of the 13+ lines of code needed the traditional way without ggplot2.
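One possible way to produce a faceted plot like Figure 7 is sketched here. It is not necessarily the two-line version referred to above: it assumes the ggplot2 and tidyr packages are installed, reshapes the built-in anscombe data to long form first, and uses the illustrative names long and p.

```r
library(ggplot2)
library(tidyr)

# Reshape x1..x4 / y1..y4 into columns set, x, y:
# the first character of each name becomes the column (".value"),
# the second becomes the data set label
long <- pivot_longer(anscombe, cols = everything(),
                     names_to = c(".value", "set"),
                     names_pattern = "(.)(.)")

# One panel per data set, each with points and a least-squares line
p <- ggplot(long, aes(x = x, y = y)) +
  geom_point(size = 1.5, colour = "grey30") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              colour = "steelblue") +
  facet_wrap(~ set, ncol = 2) +
  labs(title = "Anscombe's 4 Regression Data Sets")
print(p)
```

Most of the length here comes from the reshaping step; once the data are in long form, the plot itself is a single ggplot() expression.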
Figure 7: Aggregate regression plots from X1-X4 and Y1-Y4 generated using the ggplot2 function.
$lm1
             Estimate Std. Error  t value    Pr(>|t|)
(Intercept) 3.0000909  1.1247468 2.667348 0.025734051
x1          0.5000909  0.1179055 4.241455 0.002169629

$lm2
            Estimate Std. Error  t value    Pr(>|t|)
(Intercept) 3.000909  1.1253024 2.666758 0.025758941
x2          0.500000  0.1179637 4.238590 0.002178816

$lm3
             Estimate Std. Error  t value    Pr(>|t|)
(Intercept) 3.0024545  1.1244812 2.670080 0.025619109
x3          0.4997273  0.1178777 4.239372 0.002176305

$lm4
             Estimate Std. Error  t value    Pr(>|t|)
(Intercept) 3.0017273  1.1239211 2.670763 0.025590425
x4          0.4999091  0.1178189 4.243028 0.002164602
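Coefficient tables like the four above can be produced by fitting one model per data set and collecting the summaries. The sketch below assumes the built-in anscombe data set; the list name mods matches the name used in the plotting code in this document, while building each formula with reformulate() and the name coef_tables are assumptions made for illustration.

```r
# One named slot per model: lm1 ... lm4
mods <- setNames(vector("list", 4), paste0("lm", 1:4))

for (i in 1:4) {
  # Formula y_i ~ x_i for the i-th data set
  f <- reformulate(paste0("x", i), response = paste0("y", i))
  mods[[i]] <- lm(f, data = anscombe)
}

# Estimate, standard error, t value and p value for each model
coef_tables <- lapply(mods, function(fm) coef(summary(fm)))
print(coef_tables)
```

The near-identical intercepts and slopes across the four tables are the point of the quartet: the summary statistics agree even though the plots look very different.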
# Preparing for the plots
op <- par(mfrow = c(2, 2), mar = 0.1 + c(4, 4, 1, 1), oma = c(0, 0, 2, 0))
# Plot charts using for loop
for (i in 1:4) {
  ff[2:3] <- lapply(paste0(c("y", "x"), i), as.name)
  plot(ff, data = anscombe, col = "darkorchid4", pch = 23, bg = "mediumpurple", cex = 1,
       xlim = c(3, 19), ylim = c(3, 13))
  abline(mods[[i]], col = "steelblue")
}
mtext("Anscombe's 4 Regression Data Sets", outer = TRUE, cex = 1.5)
par(op)

## 3. How about with ggplot2? (use tidyverse package)
library(tidyverse)