Boosting

Boosting combines multiple weak learners (models) into a strong model by training each learner to focus on the mistakes of the previous ones. In gradient boosting, this process minimizes a loss function (e.g., mean squared error) through a form of gradient descent: each new model is fit to the negative gradient of the loss function, which points in the direction that reduces the error. This is where the name gradient boosting comes from.
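
As a worked illustration of why fitting the negative gradient amounts to fitting the errors, consider the squared-error loss. The symbols below are notation assumed for this sketch and do not come from the text above: F_m is the model after m rounds, h_m is the weak learner fitted in round m, and \nu is the learning rate.

```latex
\begin{aligned}
L(y_i, F(x_i)) &= \tfrac{1}{2}\,\bigl(y_i - F(x_i)\bigr)^2 \\
r_{i,m} &= -\left.\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right|_{F = F_{m-1}} = y_i - F_{m-1}(x_i) \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
\end{aligned}
```

For this loss the negative gradient r_{i,m} is exactly the residual of the current model, so each weak learner h_m is trained on what the previous rounds still get wrong.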

How Boosting Works

  1. Start with an initial prediction, such as the mean of the target variable for regression tasks.
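  2. Compute the residuals (errors) of the current prediction. In general these are the negative gradient of the loss function; for squared error they are simply the target minus the prediction.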
  3. Fit a weak learner to the residuals.
  4. Update the model by adding the predictions of the weak learner, scaled by a learning rate.
  5. Repeat the process until the loss is minimized or a specified number of iterations is reached.
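
The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: it assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps) as weak learners. The names `fit_stump`, `predict_stump`, `gradient_boost`, and `predict` are hypothetical helpers introduced only for this example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature by least squares.

    Returns (threshold, left_value, right_value). Falls back to a constant
    prediction if no split improves on it.
    """
    constant = mean(residuals)
    best_sse = sum((r - constant) ** 2 for r in residuals)
    best = (float("inf"), constant, constant)
    for threshold in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if sse < best_sse:
            best_sse, best = sse, (threshold, left_value, right_value)
    return best


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return (initial_prediction, learning_rate, stumps)."""
    initial = mean(ys)                                         # step 1
    predictions = [initial] * len(ys)
    stumps = []
    for _ in range(n_rounds):                                  # step 5
        residuals = [y - p for y, p in zip(ys, predictions)]   # step 2
        stump = fit_stump(xs, residuals)                       # step 3
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]       # step 4
    return initial, learning_rate, stumps


def predict(model, x):
    initial, learning_rate, stumps = model
    return initial + learning_rate * sum(predict_stump(s, x) for s in stumps)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [x ** 2 for x in xs]  # a non-linear target
    model = gradient_boost(xs, ys)
    print([round(predict(model, x), 1) for x in xs])
```

A smaller learning rate makes each round's correction more cautious, so more rounds are usually needed to reach the same training error.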

Types of Models Used in Boosting

Gradient boosting is most commonly used with shallow decision trees (depth 1 to 3; trees of depth 1 are called stumps), because they can capture non-linear relationships and, at depth 2 or more, interactions between features. However, when the data has a strong linear structure, boosting can be used with linear regression as the weak learner. Support vector machines can also serve as weak learners, but this is less common.

Bagging

Bagging (Bootstrap Aggregating) is an ensemble technique that combines the predictions of multiple models trained independently to improve robustness and accuracy. Unlike boosting, bagging trains models in parallel rather than sequentially.

Each model is trained on a bootstrap sample, that is, a random sample of the training data drawn with replacement. The individual predictions are then combined:

  • For classification tasks, predictions are combined through majority voting.
  • For regression tasks, predictions are averaged.
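
The procedure can be sketched as follows. This is a minimal illustration, assuming a single numeric feature and a nearest-neighbour base learner chosen only to keep the example short; bagging itself works with any base model. All function names (`fit_nearest_neighbor`, `bagging_fit`, `predict_regression`, `predict_classification`) are hypothetical helpers introduced for this example.

```python
import random
from collections import Counter
from statistics import mean


def fit_nearest_neighbor(xs, ys):
    """Hypothetical base learner: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]

    return predict


def bagging_fit(xs, ys, fit_base, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement.
        indices = [rng.randrange(n) for _ in range(n)]
        models.append(fit_base([xs[i] for i in indices],
                               [ys[i] for i in indices]))
    return models


def predict_regression(models, x):
    """Regression: average the individual predictions."""
    return mean(model(x) for model in models)


def predict_classification(models, x):
    """Classification: majority vote over the individual predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
    labels = ["a"] * 4 + ["b"] * 4
    classifiers = bagging_fit(xs, labels, fit_nearest_neighbor)
    print(predict_classification(classifiers, 1.5))

    targets = [2.0 * x for x in xs]
    regressors = bagging_fit(xs, targets, fit_nearest_neighbor)
    print(predict_regression(regressors, 1.5))
```

Because no model depends on another, the loop in `bagging_fit` could be distributed across workers without changing the result, which is what training in parallel means here.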