DATA SCIENCE AND ML INTERMEDIATE WORKSHOP: COURSE OUTLINE
Duration: 3 Days
Introductions, Statistics, Probability, Python, Data Manipulation, Visualization (Day 1)
- Introduction to Data Science and Understanding the Problem Statement
- Basic Statistics – Measures of Central Tendency and Variance
- Building Blocks – Probability Distributions – Normal Distribution – Central Limit Theorem
- Inferential Statistics – Sampling – Concept of Hypothesis Testing – Statistical Methods: Z/t-tests (one-sample, independent, paired), Analysis of Variance, Correlation and Chi-square
- Important modules for statistical methods: NumPy, SciPy, Pandas
- Using statistical methods with visualization to understand concepts
- Treatment of Data
- Data Manipulation
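
As an illustration of the Day 1 inferential statistics topics, the following is a minimal sketch of an independent two-sample t-test using NumPy and SciPy. The data is synthetic, and the group names, means, sample sizes and the 0.05 significance level are all assumptions chosen for the example, not workshop material.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumed for illustration): two groups drawn from
# normal distributions with different means and the same spread.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=200)
group_b = rng.normal(loc=55.0, scale=5.0, size=200)

# Measures of central tendency and variance (ddof=1 gives the sample variance).
mean_a, var_a = group_a.mean(), group_a.var(ddof=1)
mean_b, var_b = group_b.mean(), group_b.var(ddof=1)

# Independent two-sample t-test. Null hypothesis: both groups share the same mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Reject the null hypothesis at an assumed 5% significance level.
reject_null = p_value < 0.05

print(f"mean_a={mean_a:.2f}, mean_b={mean_b:.2f}")
print(f"t={t_stat:.2f}, p={p_value:.3g}, reject_null={reject_null}")
```

For paired or one-sample designs, SciPy also provides `stats.ttest_rel` and `stats.ttest_1samp`.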
Supervised Learning (Day 2)
- What is Linear & Non-Linear
- Different types of Data – Numerical & Categorical
- Vector Space, Mathematical Functions, Dimensions and Their Graphical/Vector Representation
- Introduction to Machine Learning & What is a Model
- Types of ML Problems
- Model and Curve
- Linear Regression & Equation
- LR Solvers – OLS method
- LR Solvers – Gradient Descent
- Assumptions of LR
- Evaluation metrics of LR
- Advanced LR concepts – Non-linear, L1 & L2 regularization
- Case-study
- Basics of Probability and Odds
- Classification using Linear Regression
- Logit Equation and Logit Function to solve Classification problems
- Classification Evaluation – Accuracy, Confusion Matrix, Precision, Recall & F1
- ROC curve and AUC
- Feature Engineering & Feature Selection in ML Algorithms
- Model Interpretability using SHAP
- Use-cases
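
To illustrate the two LR solvers listed for Day 2, here is a minimal sketch that fits the same linear regression with ordinary least squares (OLS) and with batch gradient descent, then computes two common evaluation metrics. The synthetic data, the true coefficients (slope 3, intercept 2), the learning rate and the iteration count are assumptions chosen for the example.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)

# Design matrix with a column of ones so the first weight is the intercept.
X = np.column_stack([np.ones_like(x), x])

# Solver 1: OLS, solved as a least-squares problem.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Solver 2: batch gradient descent on half the mean squared error.
w_gd = np.zeros(2)
learning_rate = 0.5
for _ in range(5000):
    gradient = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= learning_rate * gradient

# Evaluation metrics for the OLS fit: MSE and R-squared.
residuals = y - X @ w_ols
mse = np.mean(residuals ** 2)
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"OLS weights: intercept={w_ols[0]:.3f}, slope={w_ols[1]:.3f}")
print(f"GD  weights: intercept={w_gd[0]:.3f}, slope={w_gd[1]:.3f}")
print(f"MSE={mse:.4f}, R2={r2:.4f}")
```

Both solvers minimize the same squared-error objective, so on a well-conditioned problem like this one they should arrive at essentially the same weights; gradient descent simply gets there iteratively.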
Unsupervised Learning (Day 3)
- Introduction to Unsupervised Learning
- Concepts behind unsupervised techniques and how they apply to business use-cases
- Clustering & Segmentation in ML
- K-means Clustering technique
- Spectral Clustering, DBSCAN & OPTICS algorithms for clustering
- Multi-Cluster Algorithm Analysis of Unsupervised Problems
- Evaluation Metrics for Clustering
- Use-cases
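
As an illustration of the K-means topic from Day 3, the following is a minimal NumPy sketch of the standard assign-and-update loop (Lloyd's algorithm). The helper names `farthest_first_init` and `kmeans` are hypothetical, the three synthetic blobs are assumed data, and the deterministic farthest-first initialization is a simplification swapped in for the usual random or k-means++ initialization so that the example is reproducible.

```python
import numpy as np

def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (an assumed,
    deterministic stand-in for random or k-means++ initialization)."""
    centroids = [points[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        dists = np.linalg.norm(points[:, None, :] - chosen[None, :, :], axis=2)
        centroids.append(points[np.argmax(dists.min(axis=1))])
    return np.array(centroids)

def kmeans(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic data (assumed for illustration): three well-separated 2-D blobs.
rng = np.random.default_rng(seed=0)
true_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([rng.normal(c, 1.0, size=(50, 2)) for c in true_centers])

labels, centroids = kmeans(points, k=3)

# Inertia (within-cluster sum of squared distances), a common clustering metric.
inertia = np.sum((points - centroids[labels]) ** 2)
print("cluster sizes:", np.bincount(labels, minlength=3))
print(f"inertia={inertia:.2f}")
```

Inertia always decreases as k grows, so in practice it is usually compared across several values of k rather than read in isolation.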