DATA SCIENCE AND ML INTERMEDIATE WORKSHOP: COURSE OUTLINE

Duration: 3 Days

Introduction, Statistics, Probability, Python, Data Manipulation, Visualization (Day 1)

  • Introduction to Data Science and Understanding the Problem Statement
  • Basic Statistics – Measures of Central Tendency and Variance
  • Building Blocks – Probability Distributions – Normal Distribution – Central Limit Theorem
  • Inferential Statistics – Sampling – Concept of Hypothesis Testing – Statistical Methods: Z/t-tests (one-sample, independent, paired), Analysis of Variance, Correlation and Chi-square
  • Important modules for statistical methods: NumPy, SciPy, Pandas
  • Using statistical methods with visualization to understand concepts
  • Treatment of Data
  • Data Manipulation

Supervised Learning (Day 2)

  • What is Linear & Non-Linear
  • Different types of Data – Numerical & Categorical
  • Vector Space, Mathematical Functions, Dimensions and Their Graphical/Vector Representation
  • Introduction to Machine Learning & What is a Model
  • Types of ML Problems
  • Model and Curve
  • Linear Regression & Equation
  • LR Solvers – OLS method
  • LR Solvers – Gradient Descent
  • Assumptions of LR
  • Evaluation metrics of LR
  • Advanced LR concepts – Non-linear, L1 & L2 regularization
  • Case study
  • Basics of probability and Odds
  • Classification using Linear Regression
  • Logit Equation and Logit Function to Solve Classification Problems
  • Classification Evaluation – Accuracy, Confusion Matrix, Precision, Recall & F1
  • ROC Curve and AUC
  • Feature Engineering & Feature Selection in ML Algorithms
  • Model Interpretability using SHAP
  • Use-cases

Unsupervised Learning (Day 3)

  • Introduction to Unsupervised Learning
  • Concepts behind Unsupervised techniques and understanding according to business use-cases
  • Clustering & Segmentation in ML
  • K-means Clustering technique
  • Spectral Clustering, DBSCAN & OPTICS algorithms for clustering
  • Multi-Cluster Algorithm Analysis of Unsupervised Problems
  • Evaluation Metrics for Clustering
  • Use-cases