EPPS 6302 Assignment 5: Event Data and Data Analytics Colloquium Review
1. Acquire data using the sample program UTDEventData1.R.
a. Collect data based on two to three countries of own choice (China, USA, and Canada)
b. Set the time period to six months (e.g. start_date <- "20220101"; end_date <- "20220630")
Unfortunately, this portion of the assignment could not be completed because of issues with the Event Data programming, which will likely not be fixed until after the semester ends. I left the code on this page to show what would have been used to complete the assignment.
2. Register and attend the Data Analytics Colloquium on 11/17/2022.
a. Write a review of the time series data and methods used in the presentation by the speaker and post it on own GitHub website.
3. Note the data and modeling methods used and do a Google Scholar search on related studies.
Here is my review of the Colloquium:
I am going to be completely candid: this colloquium was very challenging for me to understand. This is my first semester using RStudio intensively and coding statistical analyses, let alone interpreting the statistical visualizations that I produce! Furthermore, my other courses normally require only generic regressions; we rarely need other data visualization methods to display our statistical findings. The fanciest thing I had done in RStudio prior to this course was an interaction effects plot, which, honestly, I still have trouble interpreting! I say all this to give you an idea of my background in statistical methods, visualization, and interpretation: it is very minimal. Here is what I understood from the colloquium.
Dr. Patrick Brandt discussed multiple categories, or “types,” of models. These models were specific to “Statistics & Time Series for Policy Intervention & Change Identification”, which was the topic of the colloquium presentation. Dr. Brandt described four types of models: Type 0 (Basic Time Series), Type 1 (Binary Segmentation), Type 2 (L1/L0 Regularization and Fused Lasso), and Type 3 (Bayesian Methods). Below are further descriptions of each type:
Model Type 0: Basic Time Series Model
A simple model in which the moment the intervention occurs is identifiable.
Easy to estimate.
Models allow for many differences; Type 0 does not necessarily show a distinct difference between multiple factors.
Example used: Gas prices and seat belts; Salmonella in chicken breeding
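To make the Type 0 idea concrete for myself, here is a minimal sketch in base R of a regression with a known intervention date. This is not Dr. Brandt's code or data: the series is simulated, and the names `intervention_start`, `step`, and the chosen effect size of -2 are all my own assumptions for illustration.

```r
# Simulated series with a KNOWN intervention point (all values are hypothetical)
set.seed(42)
n <- 120
t_index <- 1:n
intervention_start <- 61                           # assumed known in advance

# Step dummy: 0 before the intervention, 1 from the intervention onward
step <- as.integer(t_index >= intervention_start)

# Outcome = baseline + slow trend + level shift of -2 after the intervention + noise
y <- 10 + 0.02 * t_index - 2 * step + rnorm(n, sd = 0.5)

# Because the intervention date is known, an ordinary regression is enough
fit <- lm(y ~ t_index + step)
effect <- coef(fit)[["step"]]                      # estimated level shift
print(round(effect, 2))
```

The coefficient on `step` is the estimated change in level at the intervention, which should land near the simulated shift of -2. The point of the sketch is only that estimation is easy when the timing of the change is given.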
Model Type 1: Binary Segmentation
This model is commonly used for trends that are more complex, deterministic, or stochastic.
What is stochastic? According to the Oxford Dictionary, “stochastic” means that the data/values are “randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely”.
Example used: US Industrial Revolution and COVID.
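As I understand it, binary segmentation finds the single best place to split a series, then repeats the search inside each resulting piece. Below is a rough base R sketch of that idea for changes in the mean. It is my own simplified illustration, not the method or data from the talk: the function names, the `min_len` setting, and the `penalty` threshold are all hypothetical choices.

```r
# Cost of a segment: sum of squared deviations from the segment mean
seg_cost <- function(x) sum((x - mean(x))^2)

# Best single split of x: returns the split position and the cost reduction
best_split <- function(x, min_len = 5) {
  n <- length(x)
  if (n < 2 * min_len) return(NULL)                # too short to split
  candidates <- min_len:(n - min_len)
  gains <- sapply(candidates, function(k) {
    seg_cost(x) - seg_cost(x[1:k]) - seg_cost(x[(k + 1):n])
  })
  list(pos = candidates[which.max(gains)], gain = max(gains))
}

# Recursive binary segmentation. `penalty` (a hypothetical tuning value) is the
# minimum cost reduction required to accept a change-point. Each returned value
# is the index of the last observation before a change.
binary_segmentation <- function(x, penalty, offset = 0, min_len = 5) {
  split <- best_split(x, min_len)
  if (is.null(split) || split$gain < penalty) return(integer(0))
  k <- split$pos
  c(binary_segmentation(x[1:k], penalty, offset, min_len),
    offset + k,
    binary_segmentation(x[(k + 1):length(x)], penalty, offset + k, min_len))
}

# Simulated series whose mean changes after observations 50 and 100
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 1))
change_points <- binary_segmentation(x, penalty = 3 * log(length(x)))
print(change_points)
```

Unlike Type 0, nothing here tells the model when the changes happened; the split locations are estimated from the data. Dedicated R packages exist for this, but I kept the sketch to base R so that every step is visible.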
Model Type 2: L1/L0 Regularization and Fused Lasso
This type of model treats the problem as one of identifying the correct set of covariates from the possible sets of interventions presented by the data.
There are different ways to identify these covariates:
Searching for each of the change-points individually.
Assessing the probability of a change-point.
Fused Lasso Model: univariate vs. multivariate analyses (Safikhani et al. 2022).
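For reference, this is the textbook form of the one-dimensional fused lasso as I understand it. It is an assumption on my part that this matches what was shown in the talk; the presentation and Safikhani et al. (2022) may use a different or more general formulation. Here $y_t$ is the observed series, $\theta_t$ is the fitted level at time $t$, and $\lambda$ is the penalty weight.

```latex
% L1 (fused lasso) version: penalize the size of every jump between neighbors
\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^{n}}
  \frac{1}{2} \sum_{t=1}^{n} (y_t - \theta_t)^2
  + \lambda \sum_{t=2}^{n} \lvert \theta_t - \theta_{t-1} \rvert

% L0 version: penalize the number of jumps instead of their size
\hat{\theta} = \arg\min_{\theta \in \mathbb{R}^{n}}
  \frac{1}{2} \sum_{t=1}^{n} (y_t - \theta_t)^2
  + \lambda \sum_{t=2}^{n} \mathbf{1}\{\theta_t \neq \theta_{t-1}\}
```

In both versions the penalty pushes neighboring fitted values to be equal, so the fitted series becomes piecewise constant and the remaining jumps are the estimated change-points. A larger $\lambda$ leaves fewer jumps.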
Model Type 3: Bayesian Methods
This method was mentioned briefly, but not discussed in detail.
Here is a definition of Bayesian models from a quick Google search: “A Bayesian model is a statistical model where you use probability to represent all uncertainty within the model, both the uncertainty regarding the output but also the uncertainty regarding the input (aka parameters) to the model.”
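To see what "using probability to represent uncertainty about the parameters" means in practice, I worked through the simplest Bayesian example I know, a Beta-Binomial update in base R. This is not a change-point model and was not part of the colloquium; the prior and the data (7 successes in 10 trials) are made up for illustration.

```r
# Prior belief about an unknown proportion p: Beta(1, 1), i.e. uniform on [0, 1]
prior_a <- 1
prior_b <- 1

# Hypothetical data: 7 successes in 10 trials
successes <- 7
trials <- 10

# Conjugate update: the posterior is Beta(a + successes, b + failures)
post_a <- prior_a + successes
post_b <- prior_b + (trials - successes)

# The parameter itself now has a distribution, summarized here by its mean
# and a 95% credible interval
post_mean <- post_a / (post_a + post_b)
cred_int <- qbeta(c(0.025, 0.975), post_a, post_b)
print(round(post_mean, 3))
print(round(cred_int, 3))
```

The output is a whole distribution for the parameter rather than a single estimate. My understanding is that Bayesian change-point methods apply the same logic to the location and size of the changes.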
# Event Data Replication
# Clearing the environment
rm(list=ls())
# Install the packages and load the libraries.
library(devtools)
Loading required package: usethis
library(remotes)
Attaching package: 'remotes'
The following objects are masked from 'package:devtools':
dev_package_deps, install_bioc, install_bitbucket, install_cran,
install_deps, install_dev, install_git, install_github,
install_gitlab, install_local, install_svn, install_url,
install_version, update_packages
The following object is masked from 'package:usethis':
git_credentials
Skipping install of 'UTDEventData' from a github remote, the SHA1 (2ddb9364) has not changed since last install.
Use `force = TRUE` to force installation
library(UTDEventData)
# Creating the variables to pull from the server
k <- "La7UsSUjrPmIZ3a7qqgbgVXp1wsLiWgN"
countries <- c("CHN", "USA", "CAN")
start_date <- "20220101"
end_date <- "20220630"
table <- "phoenix_rt"
# Below is the code that currently cannot be run due to outside circumstances.
# EventData <- pullData(k, table, countries, start_date, end_date, citation = FALSE)
# View(EventData)
# Data Analytics Colloquium Replication of Model Type 0: Time Series Model
lapply(c("quantmod", "tidyverse", "TSstudio"), require, character.only = TRUE)
Loading required package: quantmod
Loading required package: xts
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading required package: TTR
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo