Machine Learning · March 2024

Decision Trees & Ensemble Methods

From-scratch implementation of decision trees with pruning, random forests, and AdaBoost. Comprehensive analysis of overfitting, feature selection, and ensemble performance on real datasets.

Python · NumPy · Scikit-learn · Machine Learning

Complete decision tree algorithms from first principles — tree construction, pruning, Random Forest, and AdaBoost. Matches scikit-learn performance with full algorithmic transparency.
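The heart of tree construction is the greedy split search: at each node, try every feature and threshold and keep the split that most reduces impurity. A minimal NumPy sketch of that step (function names are illustrative, not taken from the project's code):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum of squared class proportions."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustive search over features and thresholds for the split
    minimizing the weighted Gini impurity of the two children."""
    n, d = X.shape
    best = (None, None, np.inf)  # (feature index, threshold, impurity)
    for j in range(d):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue  # degenerate split, skip
            imp = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if imp < best[2]:
                best = (j, t, imp)
    return best
```

Recursing on the two halves until a depth limit or pure leaves yields the full tree; pruning then works backwards from that overgrown tree.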

Algorithms: Decision Tree, Random Forest, and AdaBoost from scratch
Performance: Random Forest 94.2% on the wine dataset; AdaBoost 89.7% on spam classification
Analysis: bias-variance decomposition, cost-complexity pruning

Overfitting Analysis

Depth vs Accuracy: Training accuracy increases monotonically with depth, while validation accuracy peaks at depth 9. Cost-complexity pruning reduces tree size by 40% while improving generalization.
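The same depth/pruning tradeoff can be reproduced with scikit-learn's cost-complexity pruning path: grow a full tree, refit one pruned tree per alpha, and pick the one that validates best. This sketch uses scikit-learn's built-in wine dataset as a stand-in for the project's data:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Grow an unpruned tree, then enumerate its cost-complexity pruning path.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)

# Refit one pruned tree per alpha and keep the best on held-out data.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas[:-1]),  # last alpha collapses to a single node
    key=lambda t: t.score(X_val, y_val),
)
print(best.get_n_leaves(), "leaves, validation accuracy", best.score(X_val, y_val))
```

The pruned tree is never worse on validation than the full tree, since alpha = 0 (no pruning) is among the candidates.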

Ensemble Methods

Bootstrap Aggregating: Out-of-bag error estimation provides unbiased performance validation without a separate test set. Feature subsampling at each split creates diverse trees.
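Both ideas are one flag away in scikit-learn: each bootstrap sample leaves out roughly 37% of the rows, and those held-out rows score the tree that never saw them. A quick sketch on the built-in wine dataset (again a stand-in for the project's data):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# oob_score=True scores each tree on the bootstrap rows it never saw;
# max_features="sqrt" subsamples features at every split for diversity.
rf = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", oob_score=True, random_state=0
).fit(X, y)
print(f"OOB accuracy: {rf.oob_score_:.3f}")
```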

AdaBoost Evolution: Sample weights amplify hard examples. Sequential weak learners focus on previously misclassified data points.
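That re-weighting loop is short enough to write out. A minimal discrete-AdaBoost sketch with threshold stumps (labels in {-1, +1}; names are illustrative, not the project's own code):

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=10):
    """Discrete AdaBoost over one-feature threshold stumps; y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        # Pick the stump (feature, threshold, sign) with lowest weighted error.
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = np.where(X[:, j] <= t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = max(err, 1e-10)                  # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)  # learner weight
        pred = np.where(X[:, j] <= t, s, -s)
        # Misclassified points gain weight, correct ones lose it.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] <= t, s, -s) for a, j, t, s in ensemble)
    return np.sign(score)
```

Each round the up-weighted hard examples pull the next stump toward the mistakes of the current ensemble, which is exactly the "focus on previously misclassified points" behavior described above.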

Feature Importance: Gini importance reveals that alcohol content and volatile acidity dominate wine quality prediction. The information-gain distribution shows the typical exponential decay across features.
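Gini importance is the total impurity decrease each feature contributes across all splits, averaged over trees and normalized to sum to 1. A sketch of ranking features this way; note scikit-learn's built-in wine dataset (cultivar classification) is only a stand-in here, since the project's wine-quality data has different features:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    data.data, data.target
)

# feature_importances_ holds the normalized mean impurity decrease per feature.
order = np.argsort(rf.feature_importances_)[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]:<30s} {rf.feature_importances_[i]:.3f}")
```

The sorted importances typically show the exponential decay mentioned above: a few features carry most of the impurity reduction, with a long tail of near-zero contributors.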

#decision-trees #ensemble-methods #random-forest #adaboost
