Mitigating challenges: Handling missing values and imbalanced data in bankruptcy prediction using machine learning

Abstract: Research on financial distress has become essential because its predictions can serve as an early warning for managers, investors, and banks. Financial ratios calculated from financial reports can serve as indicators of a company’s condition. One approach to bankruptcy prediction is to use machine learning methods. Building an accurate bankruptcy prediction model requires training data with balanced classes and complete parameters/features. In this study, we balanced the data by downsampling and filled missing feature values with the average of the nearest neighbors during data preprocessing, before training the prediction model. Our experiments showed that handling missing values and balancing the data improved the F1 score of the Random Forest (RF) prediction model by 30% compared with leaving missing data and class imbalance unaddressed. Although our tests used a dataset of Polish companies, whose characteristics may differ from those of companies in other countries, the proposed strategies can serve as an initial approach for training machine learning models on datasets of companies elsewhere.
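The preprocessing steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: rows are lists of floats with `None` marking a missing ratio, `k` defaults to 3, distances follow the nan-Euclidean idea used by scikit-learn's `KNNImputer`, and imputation is applied before random downsampling. The function names (`knn_impute`, `random_downsample`, `f1_score`) are hypothetical, and the Random Forest training step is omitted.

```python
import math
import random

# Hypothetical sketch: names, defaults, and step order are assumptions,
# not the authors' code. None marks a missing financial ratio.


def nan_distance(a, b):
    """Euclidean distance over features observed in both rows, rescaled by
    total/shared feature count (nan-Euclidean). Returns inf if the rows
    share no observed feature."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Fill each missing value with the mean of that feature over the k
    nearest rows that observe it. Distances use the original rows."""
    imputed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: every other row that has feature j, sorted by distance.
            donors = sorted(
                ((nan_distance(row, other), other[j])
                 for m, other in enumerate(rows)
                 if m != i and other[j] is not None),
                key=lambda d: d[0],
            )
            nearest = [v for _, v in donors[:k]]
            # If no other row observes feature j, the value stays None.
            new_row[j] = sum(nearest) / len(nearest) if nearest else None
        imputed.append(new_row)
    return imputed


def random_downsample(rows, labels, seed=0):
    """Random undersampling: keep as many rows per class as the smallest
    class has, sampled without replacement."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        for row in rng.sample(by_class[label], n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (bankrupt) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: 2 features, label 1 = bankrupt (minority class).
    X = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [0.9, 2.2], [4.8, 5.9]]
    y = [0, 1, 0, 0, 1]
    X_filled = knn_impute(X, k=2)
    print(X_filled[1])  # missing feature filled from the two nearest rows (0 and 3)
    X_bal, y_bal = random_downsample(X_filled, y)
    print(y_bal)  # two rows per class
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a real experiment, both steps would normally be fitted on the training split only, so that test rows neither donate imputed values nor get discarded by downsampling; the F1 score is then computed on the untouched test set.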

Authors: Ednawati Rainarli, Amine Sabek
Keywords: Bankruptcy prediction, financial distress, imbalanced data, machine learning, missing value
Volume: 16
Issue: 2
