Teaching

Data Science

What is Data Science

Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results. (Definition by Amazon)

Under the big umbrella of the "Science of Intelligence", Data Science usually refers to Data Mining and Machine Learning.

What is the difference between Machine Learning and Data Mining

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy.
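The contrast above can be illustrated with a minimal sketch in plain Python. It is not taken from the course material: the toy data, the function names `predict_1nn` and `kmeans_1d`, and the choice of a 1-nearest-neighbour classifier and a simple k-means loop are all assumptions made for illustration. The first function predicts a label from known, labeled training data (the machine learning view); the second looks for previously unknown groups in unlabeled values (the data mining view).

```python
# Toy 1-D data (hypothetical). Labels are used only by the supervised example.
train = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
         (5.0, "high"), (5.3, "high"), (4.9, "high")]


def predict_1nn(labeled, x):
    """Prediction: return the label of the training point closest to x."""
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label


def kmeans_1d(values, k=2, iterations=10):
    """Discovery: group unlabeled values into k clusters (requires k >= 2).

    Centers start evenly spaced between the minimum and maximum value,
    so the result is deterministic for a given input.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assign every value to its nearest center.
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


if __name__ == "__main__":
    print(predict_1nn(train, 4.5))              # label predicted from known examples
    print(kmeans_1d([v for v, _ in train]))     # groups found without any labels
```

Note that `kmeans_1d` never sees the labels: it recovers the two groups from the values alone, which is the sense in which clustering can serve both as a data mining method and as an "unsupervised learning" step.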

Ethical problems in data science

Before we start

3 Lines Programming

Introduction

Data Collection

Data Preprocessing

Data Cleaning
Data Integration
Data Reduction

Intro to python

Python Crawler

Intro to R

Data Exploration / Understand Data

Data Summaries

Data Central Tendency
Standard Deviation / Variance

Data Plots / Visualized Summarization

Basic Plots
Boxplot
Histogram
Quantile plots
Heatmap / Mesh

Data Visualization

Simpson’s Paradox

WordClouds

Computer Science

Math/Statistics

Visualization

Optimization Problems

Classic Optimization

Brute Force
Greedy Algorithm
Dynamic Programming

Stochastic Search

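Genetic Algorithm (GA)
Evolution Strategies (ES)
Differential Evolution (DE)
Particle Swarm Optimization (PSO)
Ant Colony Optimization (ACO)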

Learning Types

Supervised
Unsupervised
Semi-supervised
Reinforcement Learning

Training and Testing
Data Splitting
Overfitting

Classification

Decision trees
Logistic regression
Naive Bayes
K-Nearest Neighbours
Random Forest
SVM
Artificial neural networks

Clustering

Hierarchical Clustering
K-means Clustering
Mean Shift Clustering
DBSCAN
Agglomerative Clustering
Affinity Propagation

Soft Clustering

Network Analysis

Graph Mining
Random Walk
PageRank
Web mining
Information network analysis

Trend Analysis

Time-series Prediction
Deviation Analysis
Sequential Pattern Mining
Periodicity Analysis

Biological Data Analysis

Motif Finding
Bio Sequence Analysis
Bio Network Analysis
Pathway Analysis

Basic Tools

Measurements

Distance Measure
Similarity/Correlation

Statistical Analysis

Z-test
t-test
U-test
Statistical Dependence
p-value
Confidence Interval
ANOVA table

Data Preprocessing

Normalization
Data Sampling
Data Cleaning

Evaluations

Regression Problems
Classification Problems
Clustering Problems

Regression

Simple Linear Regression
Polynomial Regression
Curve Fitting
Logistic Regression

Statistical Modeling

Bayesian Network
Hidden Markov Model

Anomaly Analysis

Anomaly detection
Anomaly/Outlier analysis

Find Features

Feature Selection
Dimension Reduction
Feature Abstraction
Similarity-based Analysis

Pattern Discovery

Pattern evaluation
Pattern selection
Pattern interpretation
Pattern visualization

Python Data Visualization Cookbook

Result Presentation / Result Visualization

Popular Tools

Pyplot / Plotly

KNIME

Tableau
SeaTable
Power BI
Dialogflow

Introduction to Data Science

Data Mining

Machine Learning

Data Visualization

Useful Math Skills

Mathematics for Machine Learning Local download

Statistical Thinking for the 21st Century github

Data Visualization

Python Data Visualization Cookbook

Chengwei LEI, Ph.D. Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield
