Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield

Introduction to Data Science


 

What is Data Science

Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results. (Definition by Amazon)

 



Ethical problems in data science

Introduction

Before we start

Data Exploration / Statistical Summary / Statistical Tools (t-test)

Data integration / Normalization / Feature Abstraction / Feature selection / Dimension reduction

Anomaly detection / Outlier detection

How to model the data (HMM)

Linear Regression

Data Splitting / Overfitting

Pattern discovery / Association & correlation (Distance Function) / Classification / Clustering / Outlier analysis

Pattern evaluation / Pattern selection / Pattern interpretation / Pattern visualization

Trend, time-series, and deviation analysis / Sequential pattern mining / Periodicity analysis / Motifs and biological sequence analysis / Similarity-based analysis

Deep Learning / Reinforcement Learning

Graph mining / Information network analysis / Web mining

Result Presentation / Visualization

Tableau
Plotly
WordClouds.com
Dialogflow (Google Cloud)
KNIME
SeaTable
Power BI (Microsoft)
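The topic list above includes Normalization and Anomaly detection / Outlier detection. As a minimal illustration (a common textbook approach, not necessarily the method used in this course), the sketch below standardizes values to z-scores and flags any value whose absolute z-score exceeds a threshold. The function names `z_scores` and `flag_outliers`, the threshold of 2.0, and the sample data are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical; treat every z-score as 0 in this sketch.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=2.0):
    """Return the values whose |z-score| exceeds the (assumed) threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 10, 12, 50]  # hypothetical sample
    print([round(z, 2) for z in z_scores(data)])
    print(flag_outliers(data))  # the single extreme value, 50, is flagged
```

One design caveat: an extreme value also inflates the standard deviation it is measured against, so in small samples a z-score rule can miss outliers. Robust alternatives based on the median and the median absolute deviation are often preferred for that reason.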



Data Collection
 

Understand Data / Summarize Data


Computer Science
 

Statistics
 

Visualization
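The "Understand Data / Summarize Data" step usually begins with descriptive statistics. The sketch below computes a basic numeric summary using only Python's standard `statistics` module (`quantiles` requires Python 3.8 or later). The function name `summarize` and the sample data are hypothetical and serve only as an illustration.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Return a basic statistical summary of a numeric sample."""
    # quantiles(..., n=4) returns the three quartile cut points
    # (default "exclusive" method).
    q1, _, q3 = quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
    for name, value in summarize(sample).items():
        print(f"{name:>6}: {value}")
```

The same summary can then feed the statistics and visualization work named above, for example by drawing a box plot from `min`, `q1`, `median`, `q3`, and `max`.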