# DATA SCIENCE AND ML INTERMEDIATE WORKSHOP: COURSE OUTLINE

Duration: 3 Days

Introduction, Statistics, Probability, Python, Data Manipulation, and Visualization (Day 1)

• Introduction to Data Science and Understanding the Problem Statement
• Basic Statistics – Measures of Central Tendency and Variance
• Building Blocks – Probability Distributions, the Normal Distribution, and the Central Limit Theorem
• Inferential Statistics – Sampling and the Concept of Hypothesis Testing; Statistical Methods – z/t-tests (one-sample, independent, paired), Analysis of Variance (ANOVA), Correlation, and Chi-square
• Important modules for statistical methods: NumPy, SciPy, Pandas
• Applying Statistical Methods through Visualization to Understand the Concepts
• Treatment of Data
• Data Manipulation
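
To give a flavour of the Day 1 statistics topics, here is a minimal sketch (not workshop material) that computes measures of central tendency, the sample variance, and a one-sample t statistic with NumPy. The sample values, the hypothesized mean of 5.0, and the helper names `describe` and `one_sample_t` are all assumptions made up for illustration.

```python
import numpy as np

# Hypothetical sample: eight measurements, made up for illustration.
sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.7, 5.4])
hypothesized_mean = 5.0  # assumed null-hypothesis value


def describe(values):
    """Return the mean, median, and sample variance (ddof=1)."""
    return (
        float(np.mean(values)),
        float(np.median(values)),
        float(np.var(values, ddof=1)),
    )


def one_sample_t(values, popmean):
    """t = (sample mean - popmean) / (sample std / sqrt(n))."""
    n = len(values)
    standard_error = np.std(values, ddof=1) / np.sqrt(n)
    return float((np.mean(values) - popmean) / standard_error)


mean, median, variance = describe(sample)
t_stat = one_sample_t(sample, hypothesized_mean)
print(f"mean={mean:.3f} median={median:.3f} variance={variance:.3f} t={t_stat:.3f}")
```

In practice the same statistic, together with a p-value, is available from `scipy.stats.ttest_1samp`; the manual version above only makes the formula visible.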

Supervised Learning (Day 2)

• What Is Linear vs. Non-Linear
• Different types of Data – Numerical & Categorical
• Vector Spaces, Mathematical Functions, Dimensions, and Their Graphical/Vector Representation
• Introduction to Machine Learning & What Is a Model
• Types of ML Problems
• Model and Curve
• Linear Regression & Equation
• LR Solvers – OLS method
• LR Solvers – Gradient Descent
• Assumptions of LR
• Evaluation metrics of LR
• Advanced LR Concepts – Non-Linearity, L1 & L2 Regularization
• Case Study
• Basics of Probability and Odds
• Classification Using Linear Regression
• Logit Equation and Logit Function for Solving Classification Problems
• Classification Evaluation – Accuracy, Confusion Matrix, Precision, Recall & F1
• ROC Curve and AUC
• Feature Engineering & Feature Selection in ML Algorithms
• Model Interpretability using SHAP
• Use Cases
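
As an illustration of the two linear regression solvers listed for Day 2, the sketch below fits the same line with ordinary least squares and with batch gradient descent, then compares the coefficients. It is a minimal example under stated assumptions, not workshop material: the synthetic data (y = 2x + 1 plus noise), the learning rate, the iteration count, and the function names `fit_ols` and `fit_gradient_descent` are all made up for illustration.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.05, size=200)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])


def fit_ols(X, y):
    """Least-squares solution computed directly with np.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def fit_gradient_descent(X, y, learning_rate=0.1, n_iter=5000):
    """Batch gradient descent on the mean squared error."""
    coef = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iter):
        residual = X @ coef - y
        gradient = (2.0 / n) * (X.T @ residual)
        coef -= learning_rate * gradient
    return coef


ols_coef = fit_ols(X, y)
gd_coef = fit_gradient_descent(X, y)
print("OLS [intercept, slope]:", ols_coef)
print("GD  [intercept, slope]:", gd_coef)
```

Both solvers minimize the same squared-error objective, so on a small, well-conditioned problem like this one they should agree closely; gradient descent becomes the practical choice when the data are too large for a direct solve.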

Unsupervised Learning (Day 3)

• Introduction to Unsupervised Learning
• Concepts Behind Unsupervised Techniques and How They Apply to Business Use Cases
• Clustering & Segmentation in ML
• K-means Clustering technique
• Spectral Clustering, DBSCAN & OPTICS Algorithms for Clustering
• Multi-Cluster Algorithm Analysis of Unsupervised Problems
• Evaluation Metrics for Clustering
• Use Cases
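
To illustrate the K-means topic and one common clustering evaluation metric from Day 3, here is a minimal NumPy sketch of Lloyd's algorithm together with inertia (the within-cluster sum of squared distances). It is not workshop material: the two synthetic blobs, the choice of initial centroids, and the function names `kmeans` and `inertia` are assumptions made up for illustration.

```python
import numpy as np


def kmeans(points, initial_centroids, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = np.array(initial_centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = np.argmin(distances, axis=1)
        # Update step: mean of the assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return float(np.sum((points - centroids[labels]) ** 2))


# Two well-separated synthetic blobs (made up for illustration).
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Assumed initialization: one starting centroid taken from each blob.
labels, centroids = kmeans(points, initial_centroids=[points[0], points[-1]])
print("centroids:\n", centroids)
print("inertia:", inertia(points, labels, centroids))
```

K-means assumes roughly spherical clusters and a known number of clusters; density-based methods such as DBSCAN and OPTICS relax both assumptions. Library implementations such as those in scikit-learn would normally be used instead of a hand-written loop.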