Categorical Features in XGBoost Without Manual Encoding

Originally published at:

XGBoost is a decision-tree–based ensemble machine learning algorithm that uses gradient boosting. Until recently, however, it didn’t natively support categorical data: categorical features had to be manually encoded before they could be used for training or inference. For ordinal categories such as school grades, this is often done using label encoding where…
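As a rough illustration of the manual step described above, here is a minimal sketch of label encoding for an ordinal feature. The grade scale, its ordering, and the `label_encode` helper are assumptions made for this example, not something taken from the original post or from the XGBoost API.

```python
# Hypothetical ordered grade scale (lowest to highest); an assumption for illustration.
GRADE_ORDER = ["F", "D", "C", "B", "A"]


def label_encode(values, ordered_categories):
    """Map each category to its integer rank in the ordered category list."""
    code_of = {category: rank for rank, category in enumerate(ordered_categories)}
    return [code_of[value] for value in values]


grades = ["B", "A", "F", "C", "A"]
encoded = label_encode(grades, GRADE_ORDER)
print(encoded)  # [3, 4, 0, 2, 4]
```

Because the integer codes follow the order of the categories, a tree split such as "code < 3" still has a sensible meaning (grades below B). For categories with no natural order, the same integer codes would imply a ranking that does not exist.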

It was exciting to explore how XGBoost’s experimental categorical support can save time and improve performance when working with categorical data. If you have any questions or comments, let us know!

It was a great post, and I'm happy to find out that XGBoost supports categorical data. Do you know if this also extends to arrays of categorical data, and how we should deal with them?