My Notes

Predictive Analytics Problem Statements

Table of Contents

Section 1: Data Preparation & EDA
Section 2: Regression
Section 3: Classification
Section 4: Clustering & Association
Section 5: PCA & Neural Networks
Section 6: Model Evaluation & Ensemble

🔹 Section 1: Data Preparation & EDA

  • Explain the difference between supervised and unsupervised learning.
  • How do you handle missing values in a dataset?
  • What is normalization and why is it important?
  • Explain correlation and its interpretation.
  • What is exploratory data analysis (EDA)?
  • How do you detect outliers?
  • Difference between mean, median and mode.
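
A minimal pure-Python sketch of several of the ideas above: median imputation for missing values, min-max normalization, Pearson correlation, IQR-based outlier detection (Tukey's fences with the usual k = 1.5), and mean, median and mode. It assumes missing values are stored as None; the helper names and the sample data are hypothetical, and only the standard library is used.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly into [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads, in [-1, 1]."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


raw = [12.0, None, 15.0, 14.0, 13.0, 100.0, 14.0]  # made-up sample with one missing value
filled = impute_median(raw)
print("filled:", filled)
print("mean / median / mode:", statistics.mean(filled), statistics.median(filled), statistics.mode(filled))
print("normalized:", min_max_normalize(filled))
print("outliers:", iqr_outliers(filled))
print("r:", pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

On the made-up sample, the single extreme value 100.0 pulls the mean up to 26.0 while the median and mode stay at 14.0, and it is the only value flagged by the IQR rule.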

🔹 Section 2: Regression

  • Explain simple linear regression and its equation.
  • Difference between simple and multiple regression.
  • What is polynomial regression?
  • Explain the assumptions of linear regression.
  • What is Ordinary Least Squares (OLS)?
  • Difference between MAE, MSE and RMSE.
  • What does R² score indicate?
  • Difference between correlation and regression.
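
A small sketch of simple linear regression fitted with the closed-form OLS estimates, slope b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b0 = ȳ − b1·x̄, followed by MAE, MSE, RMSE and R² = 1 − SS_res / SS_tot. The helper names and the data are illustrative assumptions; it only shows the arithmetic, not a production workflow.

```python
import math
from typing import Dict, List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form OLS for y = b0 + b1 * x (minimizes the sum of squared residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # intercept: the fitted line passes through (mean x, mean y)
    return b0, b1


def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and R^2 for a set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    rmse = math.sqrt(mse)  # back in the same units as y
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # share of the variance in y explained by the model
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


x = [1, 2, 3, 4, 5]
y = [3, 4, 8, 9, 11]  # made-up data
b0, b1 = fit_simple_ols(x, y)
preds = [b0 + b1 * xi for xi in x]
print(f"y = {b0:.2f} + {b1:.2f}x")
print(regression_metrics(y, preds))
```

For this data the fit is roughly y = 0.70 + 2.10x, giving MAE 0.48, MSE 0.38, RMSE about 0.62 and R² about 0.96. MSE and RMSE penalize the single largest error (1.0) more heavily than MAE does.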

🔹 Section 3: Classification

  • Explain the K-Nearest Neighbors (KNN) algorithm.
  • How do you choose the value of K in KNN?
  • Explain Naive Bayes and why it is called “Naive”.
  • What is a decision tree and how does it split data?
  • Explain Support Vector Machines (SVM) and the concept of the margin.
  • What is a confusion matrix?
  • Difference between precision and recall.
  • When is accuracy misleading?
  • Explain the ROC curve and AUC.
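
A minimal sketch of KNN classification (Euclidean distance, majority vote) together with confusion-matrix counts, precision and recall. The toy points, the `positive=1` convention and the function names are assumptions made for illustration. The second half reproduces the classic case where accuracy misleads: on data with 95% negatives, a model that always predicts the negative class scores 0.95 accuracy but has zero recall.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                query: Sequence[float], k: int = 3) -> int:
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip((math.dist(x, query) for x in train_X), train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def confusion_counts(y_true: List[int], y_pred: List[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy 2-D data: class 0 clusters near (1, 1), class 1 near (7, 7)
X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7), (6, 7)]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, (2, 2)), knn_predict(X, y, (6.5, 6)))

# Accuracy can mislead: 95 negatives, 5 positives, and a model that always predicts 0
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(accuracy, precision_recall(tp, fp, fn))
```

An odd k avoids tied votes in a two-class problem; beyond that, k is typically chosen by comparing validation or cross-validation scores for several candidate values.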

🔹 Section 4: Clustering & Association

  • Explain the K-Means clustering algorithm.
  • What is the Elbow Method?
  • Difference between K-Means and Hierarchical Clustering.
  • What is a dendrogram?
  • Explain Support, Confidence and Lift.
  • What is Market Basket Analysis?
  • Explain the Apriori algorithm.
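
A pure-Python sketch of K-Means (Lloyd's algorithm), the within-cluster sum of squares (WCSS) that the Elbow Method plots against k, and the support, confidence and lift of an association rule. Initializing with the first k points is a simplification made here (k-means++ or several random restarts are more common), and the points and baskets are made up.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid (mean) update."""
    centroids = [tuple(points[i]) for i in range(k)]  # simplistic init: the first k points (assumption)
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids: List[Point], clusters: List[List[Point]]) -> float:
    """Within-cluster sum of squares; the Elbow Method plots this against k."""
    return sum(math.dist(p, c) ** 2 for c, members in zip(centroids, clusters) for p in members)


def rule_metrics(transactions: List[Set[str]], antecedent: Set[str],
                 consequent: Set[str]) -> Tuple[float, float, float]:
    """Support, confidence and lift of the association rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)                  # baskets containing A
    n_c = sum(consequent <= t for t in transactions)                  # baskets containing C
    n_ac = sum((antecedent | consequent) <= t for t in transactions)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a        # estimate of P(C | A)
    lift = confidence / (n_c / n)  # > 1: A and C co-occur more often than if independent
    return support, confidence, lift


pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
cents, groups = kmeans(pts, k=2)
print(cents, wcss(cents, groups))

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]
print(rule_metrics(baskets, {"butter"}, {"bread"}))
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

Here the two well-separated groups are recovered with centroids near (1.33, 1.33) and (8.33, 8.33). For the rules, {butter} → {bread} has support 0.4, confidence 1.0 and lift 1.25, while {bread} → {milk} has lift just below 1 despite a confidence of 0.75, because milk is already in 80% of the baskets.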

🔹 Section 5: PCA & Neural Networks

  • What is dimensionality reduction?
  • Explain Principal Component Analysis (PCA).
  • What is explained variance?
  • Explain the structure of a neural network.
  • What is an activation function?
  • Difference between ReLU and sigmoid.
  • What is a CNN (convolutional neural network) and where is it used?
  • What is an RNN (recurrent neural network) and why is it used for sequential data?
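
For 2-D data the PCA eigenvalues have a closed form, so explained variance can be shown without a linear-algebra library: each component's explained-variance ratio is its eigenvalue of the covariance matrix divided by the sum of all eigenvalues. The sketch also defines ReLU and sigmoid plus a single fully connected layer, then chains two layers into a tiny network. The function names, sample points and weights are hypothetical, and the PCA part computes only the variance ratios, not the component directions or the projection.

```python
import math
from typing import Callable, List, Sequence, Tuple


def pca_2d_explained_variance(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Explained-variance ratios of the two principal components of 2-D data.

    The component variances are the eigenvalues of the covariance matrix
    [[a, b], [b, c]], which for a symmetric 2x2 matrix have a closed form.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)            # var(x)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)            # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)   # cov(x, y)
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = (a + c) / 2 + half_gap, (a + c) / 2 - half_gap    # lam1 >= lam2
    total = lam1 + lam2  # the n - 1 divisor cancels in the ratios
    return lam1 / total, lam2 / total


def relu(z: float) -> float:
    """max(0, z): passes positive inputs, zeroes negative ones."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """1 / (1 + e^-z), written in two branches to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x: Sequence[float], W: List[List[float]], b: List[float],
          act: Callable[[float], float]) -> List[float]:
    """One fully connected layer: each neuron (row of W) computes act(w . x + bias)."""
    return [act(sum(w_i * x_i for w_i, x_i in zip(row, x)) + bias) for row, bias in zip(W, b)]


print(pca_2d_explained_variance([(1, 2), (2, 1), (3, 4), (4, 3)]))

# 2 inputs -> 2 hidden ReLU units -> 1 sigmoid output; weights are arbitrary illustrative values
hidden = dense([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0], relu)
output = dense(hidden, [[0.0, 1.0]], [-3.0], sigmoid)
print(hidden, output)
```

For these four points the first component explains about 80% of the variance. ReLU keeps positive inputs unchanged and zeroes negatives, whereas sigmoid squashes any input into the range 0 to 1, which is why it is a common choice for a binary-classification output.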

🔹 Section 6: Model Evaluation & Ensemble

  • Explain the bias-variance tradeoff.
  • Difference between underfitting and overfitting.
  • What is cross-validation?
  • Explain K-fold cross-validation.
  • What is bagging?
  • What is boosting?
  • Explain the Random Forest algorithm.
  • Difference between bagging and boosting.
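
A pure-Python sketch of K-fold cross-validation and bagging. The fold splitter keeps the data in its original order (real pipelines often shuffle first), the cross-validated model is a deliberately trivial mean predictor, and the bagged base learner is a simple least-squares line rather than the decision trees a Random Forest would use. All names and data are illustrative assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for K contiguous folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]  # first n % k folds get one extra item
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cv_mse_of_mean_model(y: List[float], k: int) -> float:
    """Average K-fold MSE of a baseline that always predicts the training-fold mean."""
    fold_errors = []
    for train, test in k_fold_indices(len(y), k):
        pred = sum(y[i] for i in train) / len(train)
        fold_errors.append(sum((y[i] - pred) ** 2 for i in test) / len(test))
    return sum(fold_errors) / k


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Simple OLS fit returning (intercept, slope); slope falls back to 0 if x has no spread."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return my - slope * mx, slope


def bagged_predict(x: List[float], y: List[float], x_new: float,
                   n_estimators: int = 25, seed: int = 0) -> float:
    """Bagging: fit one model per bootstrap sample, then average their predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(x)) for _ in x]  # draw n indices with replacement
        b0, b1 = fit_line([x[i] for i in idx], [y[i] for i in idx])
        preds.append(b0 + b1 * x_new)
    return sum(preds) / len(preds)


print(list(k_fold_indices(5, 2)))
print(cv_mse_of_mean_model([1, 2, 3, 4, 5, 6], k=3))  # 6.25
xs = list(range(10))
ys = [2 * v + 1 for v in xs]
print(bagged_predict(xs, ys, 10.0))
```

A Random Forest follows the same bagging recipe with decision trees as base learners and additionally considers only a random subset of features at each split. Boosting differs in that its models are trained sequentially, each one focusing on the errors of the previous ones, instead of independently on bootstrap samples.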

Mini Project using Next.js and Tailwind CSS