Abstract A novel genetic algorithm-based feature selection approach was applied, and four different machine learning (ML) methods were investigated using the selected features. The findings indicate that ML models can reliably predict bio-oil yield. Random forest (RF) performed best for bio-oil yield prediction (R² ≈ 0.98) and is highly recommended when the relationships between the input variables and the target are complex. The multi-linear regression model showed relatively poor generalization performance (R² ≈ 0.75). Partial dependence analysis was performed on the ML models to show the influence of each input variable on the target variable. Lastly, an easy-to-use software package based on the RF model was developed for predicting bio-oil yield. The study offers new insights into biomass pyrolysis and into ways of improving bio-oil yield, and it aims to reduce the time-consuming and expensive experimental work needed to estimate the bio-oil yield of biomass during pyrolysis.
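
To make the idea of genetic algorithm-based feature selection concrete, the following is a minimal sketch and not the implementation used in the study. It evolves binary feature masks with tournament selection, one-point crossover, bit-flip mutation, and elitism. Everything here is an assumption made for illustration: the function names (`pearson_r2`, `fitness`, `ga_select`), the hyperparameters, the synthetic data, and in particular the fitness function, which is a cheap stand-in (summed squared Pearson correlation minus a size penalty). A real study would more plausibly score each mask by the cross-validated performance of the ML model.

```python
import random


def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no linear information
    return sxy * sxy / (sxx * syy)


def fitness(mask, columns, y, penalty=0.02):
    """Stand-in score (assumption): summed r^2 of the selected features,
    minus a fixed penalty per selected feature."""
    return sum(pearson_r2(col, y) - penalty
               for col, keep in zip(columns, mask) if keep)


def ga_select(columns, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve a boolean mask over the feature columns (needs >= 2 features)."""
    rng = random.Random(seed)
    n = len(columns)

    def score(mask):
        return fitness(mask, columns, y)

    # Random initial population of feature masks.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random masks, twice.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per feature.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)


# Synthetic illustration (not data from the study): five candidate
# features, of which only features 0 and 2 drive the target.
data_rng = random.Random(42)
columns = [[data_rng.gauss(0, 1) for _ in range(200)] for _ in range(5)]
y = [3 * a + 2 * c + data_rng.gauss(0, 0.1)
     for a, c in zip(columns[0], columns[2])]

best = ga_select(columns, y)
print("selected feature indices:", [i for i, keep in enumerate(best) if keep])
```

The per-feature penalty discourages masks that keep uninformative variables. Swapping `fitness` for a model-based score, such as the validation R² of a random forest trained on the selected columns, would bring the sketch closer to the workflow the abstract describes, at a much higher computational cost per evaluation.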