Best Practices & Common Pitfalls in Machine Learning

Data Biases & Sampling Issues

Data Drift & Distribution Shift

Statistical Pitfalls

Imbalanced Datasets

Why it matters

Accuracy becomes misleading when classes are imbalanced, because a model can score highly by predicting only the majority class.
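A minimal sketch of this effect, assuming a hypothetical 99:1 label split and a trivial baseline that always predicts the majority class (both are illustrative, not taken from a real dataset):

```python
# Hypothetical labels: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Number of minority-class examples the baseline actually detects.
positives_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy = {accuracy:.2f}")            # 0.99
print(f"positives found = {positives_found}")  # 0
```

The baseline looks excellent by accuracy yet never identifies a single positive example.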

Solutions

Choose better metrics
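The notes do not say which metrics are meant; precision, recall, and F1 on the minority class are common choices. The sketch below computes them from scratch, assuming binary labels with `1` as the minority (positive) class. The function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Illustrative data: 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]   # one false positive, one hit, one miss

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)      # 0.5 0.5 0.5
```

Accuracy on this toy data is 0.8, while all three minority-class metrics are 0.5, which gives a more honest picture of how the positive class is handled.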

Data-level methods: Resampling
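One simple form of resampling is random oversampling: minority-class rows are duplicated at random until every class matches the majority count. The following is a sketch under that assumption, not necessarily the variant the notes intend; the helper name `random_oversample` and its `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())

    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original rows belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample with replacement to fill the gap to the majority size.
        for i in rng.choices(idx, k=target - count):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Illustrative data: 8 rows of class 0, 2 rows of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Resampling is normally applied to the training split only; resampling before the train/test split lets duplicated rows leak into the evaluation set.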

Use a metric to measure imbalance
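The notes do not name a specific measure. Two common options are the imbalance ratio (majority count divided by minority count) and the Shannon entropy of the class distribution normalized to the range [0, 1]. Both are sketched below as assumptions; the function names are hypothetical.

```python
import math
from collections import Counter


def imbalance_ratio(y):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def normalized_entropy(y):
    """Shannon entropy of the class distribution, scaled so 1.0 means balanced."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    if k < 2:
        return 0.0  # a single class carries no balance information
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


# Illustrative labels: 90 of class 0, 10 of class 1.
y = [0] * 90 + [1] * 10
print(imbalance_ratio(y))      # 9.0
print(normalized_entropy(y))   # well below 1.0 for this skewed split
```

The imbalance ratio is easy to read for binary problems, while normalized entropy summarizes multi-class distributions in a single number.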

Data Labeling & Label Quality

Feature Generalization

Always consider two aspects with regard to generalization:

Model Selection

Always consider: