Imputation methods for filling missing data in urban air pollution data for Malaysia

Nur Afiqah Zakaria, Norazian Mohamed Noor

Rezumat/Abstract. Air quality measurement data obtained from continuous ambient air quality monitoring (CAAQM) stations usually contain missing values. Missing observations typically arise from machine failure, routine maintenance and human error. In this study, hourly monitoring data for CO, O3, PM10, SO2, NOx, NO2, ambient temperature and humidity were used to evaluate four imputation methods: Mean Top Bottom, Linear Regression, Multiple Imputation and Nearest Neighbour. Missing data were simulated in the air pollutant observations at four percentages: 5%, 10%, 15% and 20%. Four performance measures, namely the Mean Absolute Error, Root Mean Squared Error, Coefficient of Determination and Index of Agreement, were used to describe the goodness of fit of the imputation methods. Based on these performance measures, the Mean Top Bottom method was selected as the most appropriate imputation method for filling in missing values in air pollutant data.

Cuvinte cheie/Key words: air pollution, missing data, imputation methods, multiple imputation
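
The evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not define the methods or the measures, so everything below is an assumption: Mean Top Bottom is taken to mean replacing each gap with the mean of the nearest observed values above and below it, the Coefficient of Determination is taken as the squared Pearson correlation, and the Index of Agreement is taken in Willmott's form. The function names `mean_top_bottom` and `performance` and the sample values are hypothetical and are not the study's data or code.

```python
import math

def mean_top_bottom(series):
    """Fill each gap (None) with the mean of the nearest observed
    value above and below it (assumed reading of 'Mean Top Bottom')."""
    filled = list(series)
    n = len(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed value before the gap, if any.
        top = next((series[j] for j in range(i - 1, -1, -1)
                    if series[j] is not None), None)
        # Nearest observed value after the gap, if any.
        bottom = next((series[j] for j in range(i + 1, n)
                       if series[j] is not None), None)
        neighbours = [x for x in (top, bottom) if x is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled

def performance(observed, predicted):
    """Return MAE, RMSE, R2 and Index of Agreement for paired values.
    Assumes both inputs vary, so no denominator is zero."""
    n = len(observed)
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # R2 as squared Pearson correlation (assumption, see lead-in).
    cov = sum((p - p_bar) * (o - o_bar) for o, p in zip(observed, predicted))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)
    # Willmott's Index of Agreement (assumption, see lead-in).
    denom = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                for o, p in zip(observed, predicted))
    ioa = 1 - sum(e * e for e in errors) / denom
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "IA": ioa}

# Hypothetical hourly values with three simulated gaps.
truth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
gappy = [10.0, None, 14.0, 16.0, None, None, 22.0]
filled = mean_top_bottom(gappy)

# Score only the positions that were simulated as missing.
gaps = [i for i, v in enumerate(gappy) if v is None]
scores = performance([truth[i] for i in gaps], [filled[i] for i in gaps])
print(filled)
print(scores)
```

Scoring only the simulated gaps mirrors the design the abstract describes: known values are removed at a chosen percentage, imputed, and then compared against the values that were withheld.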
