A Mixture of Solid Particles
In recent years, the industrialization and urbanization of Indian society have led to an increase in the concentration of pollutants in the atmosphere.
Air pollution is defined as a mixture of solid particles and gases in the air that has harmful and poisonous effects. Various experiments and studies have shown that long-term exposure to such air pollution can lead to serious health issues such as aggravated cardiovascular and respiratory illness, accelerated aging of the lungs, diseases like asthma, bronchitis and cancer, and a shortened life span.
According to the World Health Organization (WHO), over 12 million people die from environmental health risks annually. Air pollution has become the fourth-highest risk factor for premature deaths. Such degradation in air quality has made air pollution a serious global threat to the sustainability of mankind, especially in developing countries.
This has drawn the attention of the public as well as government agencies. An air quality index (AQI) is a parameter used by government agencies to communicate to the public how polluted the air currently is and how polluted it is forecast to become. As the AQI of a region increases, an increasingly large percentage of that region's population is likely to experience adverse health effects.
Several projects have been launched to combat air pollution in major countries worldwide. For example:
Hebei Air Pollution Prevention and Control Program (HAP- 2016:18) project in China to reduce the emissions of specific pollutants in Hebei;
The Odd-Even Scheme implemented by the Indian Government in the national capital, Delhi (2016).
Efforts to reduce air pollution continue all around the world. As a contribution to machine-learning-based air quality forecasting, this report presents the algorithmic details of several statistical models applied to this challenging problem.
The machine learning models used in this paper to facilitate the prediction of pollutant concentrations include:
Random Forest Classification
Decision Tree Regression
Decision Tree Classification
Support Vector Regression
Support Vector Classification
KNN Classification
We target our air pollution forecast at the city of Delhi, India, as it is at the forefront of the battle against air pollution.
We focus on predicting the Air Quality Index (AQI) level of Delhi, as it provides a quantitative measure of the air pollution level. To help reduce pollution levels in Delhi, we analyze 5 pollutants and 5 other environmental parameters responsible for increases in AQI levels.
The fixed-station data is taken from 3 stations, namely NSIT (Dwarka), RK Puram and Shadipur.
The objectives of this study are to:
Compare the Air Quality Index (AQI) values obtained by different regression models and then propose the best model.
Classify the dataset into 5 different AQI categories, and then use classification models to forecast the pollution category for the next month.
Identify the most prominent pollutant responsible for air pollution, using back propagation, and suggest methods to control it.
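As a minimal sketch of the comparison and categorization steps listed above, the Python fragment below bins a numeric AQI value into one of 5 categories and picks the regression model with the lowest error. The category bounds, the category labels, the use of RMSE as the comparison metric, and the names `aqi_category`, `rmse` and `best_model` are all illustrative assumptions; they are not taken from the dataset or method described in this paper.

```python
from bisect import bisect_left
from math import sqrt

# ASSUMPTION: hypothetical upper bounds and labels for 5 AQI categories.
# The actual breakpoints used in the study are not stated in this section.
AQI_BOUNDS = [50, 100, 200, 300]
AQI_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor"]


def aqi_category(aqi):
    """Map a numeric AQI value to one of the 5 illustrative category labels.

    A value equal to a bound falls in the lower category; anything above
    the last bound falls in the final category.
    """
    return AQI_LABELS[bisect_left(AQI_BOUNDS, aqi)]


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def best_model(actual, predictions_by_model):
    """Return the name of the model whose predictions have the lowest RMSE.

    ASSUMPTION: RMSE is used here only as an example metric.
    """
    return min(
        predictions_by_model,
        key=lambda name: rmse(actual, predictions_by_model[name]),
    )


if __name__ == "__main__":
    # Made-up AQI values and model outputs, for illustration only.
    observed = [100, 200]
    outputs = {"svr": [110, 190], "dtr": [100, 260]}
    print(aqi_category(180), best_model(observed, outputs))
```

In practice the bounds would be replaced by the official breakpoints of the AQI scheme in use, and the metric by whichever error measure the evaluation calls for.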
The rest of this paper is organized as follows: Section II describes related work, and Section III provides background on data sources and participatory sensing systems and details the 5 regression and 5 classification models used in this study. Section IV describes the steps in our model, while model implementation and estimation accuracy are studied in Section V.
The paper concludes in Section VI.
RELATED WORK
Over the years, several approaches have been used to predict air pollution. These can be classified into the following categories:
Numerical Methods: Many numerical models are used to forecast pollution levels, an approach often referred to as atmospheric dispersion modeling. Commonly used models include the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), the Community Multi-scale Air Quality Model (CMAQ), the Comprehensive Air Quality Model with Extensions (CAMx), and NAQPMS.
Machine Learning Methods: These methods are data-driven: a statistical model is trained on a dataset containing several pollutants responsible for an increase in the AQI level. The model learns patterns from the training data and later uses them to predict the AQI level for the next month. Commonly used ML models include Support Vector Regression (SVR), Decision Tree Regression (DTR), and Random Forest Regression (RFR). Some non-linear models, such as Artificial Neural Networks, have also been used to forecast pollutant concentrations.
Hybrid Methods: Hybrid methods have been applied extensively to air pollution forecasting in recent years. Achieving an appropriate forecast often requires combining more than one method.
For example, to predict ozone concentrations, multiple linear regression and artificial neural networks have been used together, based on principal components.
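To make the data-driven idea behind the machine learning methods above concrete, the following is a minimal Python sketch of a k-nearest-neighbours classifier that predicts an AQI category from pollutant readings. It is not the implementation used in this paper: the function name `knn_predict`, the two-feature layout (two pollutant concentrations), the sample values and the labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows nearest to query.

    Distance is plain Euclidean distance on the raw feature values.
    """
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # ASSUMPTION: made-up readings for two pollutants, not real station data.
    train_X = [(30, 20), (35, 25), (40, 22), (150, 80), (160, 90), (170, 85)]
    train_y = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
    print(knn_predict(train_X, train_y, (38, 24)))
    print(knn_predict(train_X, train_y, (155, 88)))
```

With real data the features would normally be scaled to comparable ranges before computing distances, since otherwise the pollutant with the largest numeric range dominates the neighbour search.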