OPEN ACCESS
Intl. Journal on ICT Vol. No. Dec 2024. doi: 10.21108/ijoict.
ISSN 2356-5462
http://socj. id/ijoict/

Prediction of Classification of Air Quality Distribution in Java Island using ANN with Time-Based Feature Expansion and Spatial Analysis

Soni Andika Gutama 1*, Sri Suryani Prasetiyowati 2, Yuliant Sibaroni 3
1,2,3 School of Informatics, Informatics, Telkom University, Bandung, Indonesia
*soniandika@student.
Abstract Air pollution has a significant impact on human health and the environment, especially in densely populated areas such as Java Island in Indonesia. High air quality index values reflect elevated concentrations of hazardous pollutants such as sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2), hydrocarbons (HC), and particulate matter (PM10 and PM2.5). This study uses an Artificial Neural Network (ANN) with time-based feature expansion to predict the classification of air quality indexes on Java Island for the next few months, with Long Short-Term Memory (LSTM) used as a baseline for performance comparison. The results show that the ANN model with time-based feature expansion can match the performance of LSTM, with an accuracy of 92.30% and an F1 score of approximately 92%. This indicates that the time-based feature expansion scenario enables the ANN to capture the spatiotemporal dynamics of air quality distribution on Java Island. The contribution of this study is to support the creation of effective policies and strategies for preventing and mitigating the impacts of air pollution as early as possible.
Keywords: Air quality index, Artificial Neural Network, Prediction, Time-based feature expansion, Spatial analysis, Java Island
I. INTRODUCTION
Air quality is one of the main environmental issues requiring serious attention in both urban and rural areas. Increased transportation mobility and uncontrolled industrial activities have become major sources of air pollution, leading to a significant increase in pollutant emissions. The decline in air quality harms human health, including an increased risk of respiratory illnesses, and also damages the environment, contributing to problems such as global warming, acid rain, and climate change.
As the most populous and economically active region in Indonesia, the island of Java faces a high risk of air pollution.
Transportation accounts for about 70% of the region's total air pollution, with emissions such as carbon monoxide (CO), nitrogen oxides (NOx), and dust particles (SPM) frequently exceeding safe thresholds. In addition to transportation, industrial activities also worsen air quality in the region.
Received on 01 Jan 2025. Revised on 8 Jan 2020. Accepted and Published on 11 Mar 2025.
INTL.
JOURNAL ON ICT VOL.
NO.
DECEMBER 2024
Natural factors such as wind direction, seasonality, and topography also affect the dispersion of pollutants, making the distribution pattern of air quality highly dynamic.
Various efforts have been made to reduce the impact of air pollution, such as planting trees in urban areas. Trees play an important role in absorbing pollutants such as carbon dioxide (CO2) through photosynthesis, while also capturing dust particles from the atmosphere.
However, this step alone is not enough to significantly address the complexity of the air pollution problem.
To understand and manage pollution more effectively, a technology-based approach that is able to accurately predict the air quality index is needed.
Traditionally, statistical methods such as linear regression have been widely used for air quality analysis.
These methods are valued for their interpretability and effectiveness in modeling simple relationships between variables. However, they often struggle to capture the complex, nonlinear interactions inherent in environmental data, especially in large and dynamic datasets. In contrast, machine learning approaches such as Artificial Neural Networks (ANNs) excel at identifying intricate patterns in such data, providing more robust and adaptive solutions for air quality prediction.
Air quality prediction plays an important role in supporting data-driven decision-making, environmental policy planning, evaluating the effectiveness of interventions, and mitigating pollution impacts.
In this context, machine learning methods such as ANNs have become an attractive option. ANNs trained with the backpropagation algorithm can identify patterns in complex historical data, even in limited and poorly structured datasets. The model offers a balance between simplicity, flexibility, and effectiveness in a variety of studies, including air quality prediction.
This study employs the Artificial Neural Network model to predict the classification of air quality across Java Island.
The model incorporates time-based features and spatial analysis to offer a more comprehensive understanding of air pollution distribution patterns.
With this approach, the research aims to support more effective and strategic environmental management on the island of Java.
II. LITERATURE REVIEW
Prediction
Predictive modeling is crucial in many sectors, including healthcare, business, and climate change.
In air quality research, the ability to predict future conditions is essential for understanding the dynamics of air pollution in a specific area.
Previously, predictive approaches often relied on statistical methods that utilized temporal and spatial patterns.
However, with advancements in technology, machine learning has increasingly become a widely used method.
These techniques not only handle structured and unstructured data but can also be applied to various tasks, including regression, classification, clustering, and prediction.
Artificial Neural Network
An Artificial Neural Network (ANN) is a computational framework inspired by the structure of the human brain, consisting of layers of interconnected neurons.
Each connection between neurons is assigned a weight that signifies the strength of the relationship.
The perceptron, a basic ANN model, is a single-layer network that can only solve linearly separable problems. To overcome this limitation, the Multi-Layer Perceptron (MLP) was developed, comprising an input layer, one or more hidden layers, and an output layer. Hidden layers with non-linear activation functions allow MLPs to solve non-linear problems and model more complex patterns in the data.
In the MLP architecture, every neuron in one layer is fully connected to the neurons in the subsequent layer.
At each hidden layer, the output y_j of neuron j is calculated using Equation 1, where x_i is the input value, w_ij is the weight, b_j is the bias, and φ_h is the hidden-layer activation function, usually ReLU.
SONI ANDIKA GUTAMA ET AL.
PREDICTION OF CLASSIFICATION OF AIR QUALITY DISTRIBUTION IN JAVA ISLAND USING ANN A
$$y_j = \varphi_h\left(\sum_{i=1}^{n} w_{ij}\,x_i + b_j\right) \qquad (1)$$

At the output layer, the output y_k of neuron k is calculated using Equation 2, where w_jk is the weight from hidden neuron j to output neuron k, b_k is the output-layer bias, and φ_o is the output-layer activation function, usually Softmax, which is used to generate class probabilities. Here n is the number of inputs and m the number of hidden neurons.

$$y_k = \varphi_o\left(\sum_{j=1}^{m} w_{jk}\,y_j + b_k\right) \qquad (2)$$

The ReLU (Rectified Linear Unit) activation function sets negative values to zero while preserving positive values, as in Equation 3. The Softmax function, applied at the output layer, transforms the output values into probabilities using Equation 4, where z_k is the output of neuron k and K is the number of classes. Softmax guarantees that the probabilities across all classes sum to 1, making it well suited to multi-class classification.

$$\mathrm{ReLU}(x) = \max(0, x) \qquad (3)$$

$$\mathrm{Softmax}(z_k) = \frac{e^{z_k}}{\sum_{i=1}^{K} e^{z_i}} \qquad (4)$$

The MLP model is trained with the backpropagation algorithm, which adjusts the weights by minimizing the prediction error.
The sparse categorical cross-entropy loss function is used for multi-class classification. To improve convergence, the Adam optimizer is commonly used because of its efficiency on large and complex datasets.
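To make Equations 1 to 4 concrete, the following is a minimal NumPy sketch of a forward pass through a fully connected MLP with ReLU hidden layers and a Softmax output. The layer sizes (13 inputs, three hidden layers of 16 units, 3 output classes) and the random weights are illustrative assumptions, not the configuration trained in this study.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """Equation 3: negative values become zero, positive values pass through."""
    return np.maximum(0.0, x)


def softmax(z: np.ndarray) -> np.ndarray:
    """Equation 4, applied row-wise.

    Subtracting the row maximum avoids overflow in exp() and does not change
    the result, because the common factor cancels in the ratio.
    """
    shifted = z - z.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mlp_forward(x, hidden_layers, output_layer):
    """Forward pass of a fully connected MLP.

    hidden_layers: list of (W, b) pairs; W has shape (n_in, n_out).
    output_layer: (W, b) pair mapping the last hidden layer to K classes.
    """
    h = x
    for W, b in hidden_layers:
        h = relu(h @ W + b)  # Equation 1: weighted sum plus bias, then ReLU
    W_out, b_out = output_layer
    return softmax(h @ W_out + b_out)  # Equation 2 with Softmax as phi_o


# Illustrative configuration (assumed): 13 inputs, three hidden layers, 3 classes
rng = np.random.default_rng(0)
sizes = [13, 16, 16, 16]
hidden = [(rng.normal(size=(a, b)), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]
output = (rng.normal(size=(16, 3)), np.zeros(3))
probs = mlp_forward(rng.normal(size=(5, 13)), hidden, output)
print(probs.shape)  # (5, 3): one probability vector per sample
```

In a framework such as Keras the same structure would be built from Dense layers; the NumPy version only exposes the arithmetic behind the equations.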
In air quality prediction, ANNs and MLPs are particularly effective because they can capture complex interactions between environmental factors and air quality. By accounting for the intricate relationships among multiple input variables, these models can deliver more accurate, efficient, and reliable predictions.
Feature Expansion
Feature engineering methods, such as feature selection, feature expansion, and oversampling, are crucial for building classification models with Artificial Neural Networks (ANNs). Feature selection aims to identify the most important features in the dataset, while feature expansion creates new features from the existing data to reveal additional patterns and relationships. This methodical process seeks the combination of features that most effectively captures the patterns in the data. The selected features are then used to train the ANN model, enabling it to learn intricate relationships between input variables by assessing different feature combinations. The process also involves dividing the data into training and testing sets and applying SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance, thereby improving the model's ability to generalize.
These feature engineering techniques greatly enhance prediction accuracy and provide more reliable results, especially in cases with underrepresented data or complex non-linear relationships between features.
Data Collection
This study uses air quality data provided by the Ministry of Environment and Forestry, covering the period from June 2019 to April 2022. The data, available in CSV format on the Ministry's official website and the open data platform for the Java Island region, consist of 11,114 entries, each containing the air quality indicators detailed in Table I.
These indicators provide a detailed overview of pollution trends across different times and locations.
The dataset spans several regions on Java Island, reflecting diverse environmental conditions, allowing the model to learn from a variety of scenarios, thus improving the accuracy and reliability of predictions.
Data preprocessing involves handling missing values and standardizing features; the dataset is then divided into training and testing sets so that model performance can be assessed thoroughly.
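The preprocessing steps listed above can be sketched as follows. This minimal NumPy version assumes column-mean imputation for missing values, a shuffled 80:20 split (the ratio the method section reports), and min-max scaling fitted on the training portion only, in line with the MinMaxScaler mentioned in the method section. The study does not state its imputation rule, so these choices and the helper names are assumptions.

```python
import numpy as np


def impute_column_means(X):
    """Replace NaN entries with the mean of their column (NaNs ignored)."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def split_80_20(X, y, seed=42):
    """Shuffle the rows, then assign 80% to training and 20% to testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(round(0.8 * len(X)))
    train, test = idx[:cut], idx[cut:]
    return X[train], X[test], y[train], y[test]


def min_max_scale(X_train, X_test):
    """Fit min/max on the training split only and apply them to both splits."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns would otherwise divide by zero
    return (X_train - lo) / span, (X_test - lo) / span


# Toy data: 5 records, 2 attributes, one missing value
X = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0], [7.0, 6.0], [9.0, 2.0]])
y = np.array([0, 1, 2, 1, 0])
X = impute_column_means(X)  # the NaN becomes mean(4, 8, 6, 2) = 5.0
X_tr, X_te, y_tr, y_te = split_80_20(X, y)
X_tr, X_te = min_max_scale(X_tr, X_te)
```

Fitting the scaler on the training split alone keeps test-set statistics from leaking into training.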
TABLE I
DATA ATTRIBUTES DESCRIPTION
Attribute   Description
X1          Minimum Temperature
X2          Maximum Temperature
X3          Average Temperature
X4          Average Humidity
X5          Rainfall
X6          Length of Sunshine
X7          Wind Speed
X8          Wind Speed Direction
X9          Most Wind Speed
X10         Total Population
X11         Number of Trees
X12         Number of Vehicles
X13         Altitude
Target      Air quality index

Air Pollution Standard Index
Based on the Regulation of the Minister of Environment and Forestry Number 14 of 2020, the Air Pollutant Standard Index (ISPU) is a dimensionless value that indicates the air quality in a particular area, considering its impact on human health and other living organisms.
This study concentrated on three ISPU categories: good, moderate, and unhealthy.
TABLE II
APSI PARAMETER VALUE CONVERSION
Columns: APSI; 24-hour PM10 (µg/m³); 24-hour PM2.5 (µg/m³); 24-hour SO2 (µg/m³); 24-hour CO (µg/m³); 24-hour O3 (µg/m³); 24-hour NO2 (µg/m³); 24-hour HC (µg/m³). The highest APSI band is >300.
Description:
Data from continuous measurements conducted over a 24-hour period.
Hourly APSI calculation results for particulates (PM2.5) recorded over 24 hours.
The maximum and minimum APSI values for each hour are chosen for particulates (PM), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2), and hydrocarbons (HC) to be used as the calculation outcomes.
The ISPU calculation relies on several key parameters: the upper and lower ISPU limits, the upper and lower ambient concentration limits, and the measured ambient concentration. The calculation is given by Equation 5, where I is the calculated ISPU, I_a and I_b are the upper and lower ISPU limits, X_a and X_b are the upper and lower ambient concentration limits, and X_x is the measured ambient concentration.

$$I = \frac{I_a - I_b}{X_a - X_b}\,(X_x - X_b) + I_b \qquad (5)$$
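As a worked illustration of Equation 5, the function below interpolates an ISPU value inside one concentration band. The band limits in the example (50 to 150 µg/m³ mapped to ISPU 50 to 100) are hypothetical numbers chosen for easy arithmetic, not the breakpoints of Regulation Number 14 of 2020.

```python
def ispu(x_x: float, x_b: float, x_a: float, i_b: float, i_a: float) -> float:
    """Equation 5: linear interpolation within a single ISPU band.

    x_x      measured ambient concentration
    x_b, x_a lower and upper ambient limits of the band containing x_x
    i_b, i_a lower and upper ISPU limits of that band
    """
    if x_a == x_b:
        raise ValueError("band must have distinct ambient limits")
    if not x_b <= x_x <= x_a:
        raise ValueError("concentration lies outside the given band")
    return (i_a - i_b) / (x_a - x_b) * (x_x - x_b) + i_b


# Hypothetical band: 50-150 ug/m3 corresponds to ISPU 50-100
print(ispu(100.0, x_b=50.0, x_a=150.0, i_b=50.0, i_a=100.0))  # 75.0
```

Selecting the correct band for a measurement is a separate lookup against the regulation's breakpoint table.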
In this study, the ISPU classification is divided into three levels, good, moderate, and unhealthy, adopting the AQI color standards to facilitate visualization of air quality status: the good category is marked green, moderate yellow, and unhealthy red. This categorization allows a more structured analysis of air quality trends in Java over a 15-month study period, and the three color-coded categories provide a clear framework for analyzing patterns of air quality change and designing effective mitigation strategies.
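The three-level categorization can be expressed as a small lookup that also produces the integer labels expected by a sparse categorical cross-entropy loss. The cut-offs below (good up to 50, moderate up to 100, unhealthy above 100) and the label order are assumptions for illustration; the study's own ranges are those given in Table III.

```python
# Integer labels suitable for sparse categorical cross-entropy (assumed order)
LABELS = {"good": 0, "moderate": 1, "unhealthy": 2}
COLORS = {"good": "green", "moderate": "yellow", "unhealthy": "red"}


def ispu_category(value: float, good_max: float = 50.0, moderate_max: float = 100.0):
    """Map an ISPU value to (category, color, integer label).

    The default cut-offs are illustrative assumptions, not the study's ranges.
    """
    if value < 0:
        raise ValueError("ISPU values are non-negative")
    if value <= good_max:
        category = "good"
    elif value <= moderate_max:
        category = "moderate"
    else:
        category = "unhealthy"
    return category, COLORS[category], LABELS[category]


print(ispu_category(75.0))  # ('moderate', 'yellow', 1)
```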
TABLE III
AIR QUALITY INDEX CATEGORY
Category    Color Status
Good        Green
Medium      Yellow
Unhealthy   Red

III. RESEARCH METHOD
The research process for predicting air quality distribution classifications on Java Island using a time-based Artificial Neural Network (ANN) began with designing a feature expansion based on historical time data. This was followed by developing an ANN model that incorporates the time-based feature expansion. Finally, an optimization process was conducted to identify the most effective combination of features for each dataset.
Data Matrix Design with Feature Expansion
Feature expansion is used to select and iterate through various combinations of relevant features to improve model performance. This approach supports the development of a classification model that predicts the Air Pollutant Standard Index (ISPU) in Java. The process randomly selects feature combinations from the dataset, allowing the model to explore a broader range of variables and patterns. By evaluating each combination according to the model's performance, the procedure identifies the subset of features that best improves prediction accuracy. This yields more precise and dependable air pollution predictions, which are crucial for environmental monitoring and effective decision-making in Java.
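The lagged training windows summarized in Table IV can be generated mechanically. The sketch below assumes one monitoring location with one feature row and one class label per month; for a t-k scenario, each sample concatenates the k preceding months and targets the following month. With 35 months (June 2019 to April 2022), t-1 yields 34 samples and t-34 yields a single sample (June 2019 to March 2022 predicting April 2022), matching the first and last rows of Table IV. The function name, array layout, and toy values are hypothetical.

```python
import numpy as np


def make_lag_windows(monthly_features: np.ndarray, monthly_labels: np.ndarray, k: int):
    """Build (X, y) for a t-k scenario.

    monthly_features: shape (T, F), one row of attributes per month
    monthly_labels:   shape (T,), class label per month
    Sample m uses months m-k ... m-1 (flattened) to predict the label of month m.
    """
    T = len(monthly_features)
    if not 1 <= k < T:
        raise ValueError("k must satisfy 1 <= k < T")
    X = np.stack([monthly_features[m - k:m].reshape(-1) for m in range(k, T)])
    y = monthly_labels[k:]
    return X, y


# 35 months (June 2019 .. April 2022) with 13 attributes each (toy values)
T, F = 35, 13
features = np.arange(T * F, dtype=float).reshape(T, F)
labels = np.arange(T) % 3
X1, y1 = make_lag_windows(features, labels, k=1)
X34, y34 = make_lag_windows(features, labels, k=34)
print(X1.shape, X34.shape)  # (34, 13) (1, 442)
```

Longer lags give richer inputs per sample but fewer samples, which is one plausible reason performance varies across scenarios.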
TABLE IV
DATA CLASS LABEL
Models   Combination Training Data Attributes   Target
t-1      June 2019                              July 2019
         July 2019                              August 2019
         ...                                    ...
         March 2022                             April 2022
t-2      June 2019 – July 2019                  August 2019
         July 2019 – August 2019                September 2019
         ...                                    ...
         February 2022 – March 2022             April 2022
...      ...                                    ...
t-33     June 2019 – February 2022              March 2022
         July 2019 – March 2022                 April 2022
t-34     June 2019 – March 2022                 April 2022

The features selected for expansion include critical air pollution indicators such as PM10, CO, and NO2, which are directly related to human health risks and environmental impacts. Studies have shown that PM10 is a significant contributor to respiratory diseases, while CO and NO2 play a major role in air quality-related health problems such as asthma and cardiovascular conditions.
These pollutants are key components of the air quality index, and their inclusion in the feature expansion process enhances the model's ability to predict pollution levels accurately.
In this study, feature combinations were explored through repeated random sampling until the best combination was found, based on the F1 score produced by the classification model. This follows the iterative exploratory methods often used to identify relevant features in data-driven models. The process helps ensure that the chosen features are relevant for improving the model's accuracy and efficiency while preserving key information from the dataset. By focusing on the best combination of features, the model is better equipped to generate accurate and reliable predictions on air pollution data.
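The random exploration of feature combinations described above can be sketched as a simple random search. In this hypothetical helper, `score` stands in for training the ANN on a candidate subset and returning its weighted F1-score on held-out data; the trial count, seed, and toy scoring function are assumptions, not the study's settings.

```python
import random
from typing import Callable, List, Sequence, Tuple


def random_feature_search(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    n_trials: int = 200,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Sample random feature subsets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_subset: List[str] = []
    best_score = float("-inf")
    for _ in range(n_trials):
        size = rng.randint(1, len(features))       # subset size, 1..len(features)
        subset = rng.sample(list(features), size)  # distinct features, random order
        s = score(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score


# Toy score (assumed): reward PM10 and NO2, lightly penalize larger subsets
def toy_score(subset: List[str]) -> float:
    return ("PM10" in subset) + ("NO2" in subset) - 0.01 * len(subset)


best, value = random_feature_search(["PM10", "SO2", "CO", "O3", "NO2"], toy_score)
print(sorted(best), round(value, 2))
```

Random search makes no guarantee of finding the global optimum; its appeal is that the cost is fixed by `n_trials` rather than by the exponential number of subsets.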
Design of Feature Selection and Expansion for Artificial Neural Network-based Classification
An Artificial Neural Network (ANN) is used to build a classification model in the t-k phase and to predict the classification in the t+k phase. The ANN is implemented in two main stages: identifying the best feature combination in the t-k phase and applying the ANN to prediction in the t+k phase.
Implementation of Artificial Neural Network in Phase t-k
In the t-k phase, ANNs are used to evaluate different combinations of features to determine the subset that provides the highest classification performance.
This process begins with data preprocessing, where SMOTE is applied to address class imbalance. SMOTE creates synthetic samples by finding the k nearest neighbors of each minority sample and generating a new sample using Equation 6, where x_minority is a minority sample, x_neighbor is one of its k nearest minority-class neighbors, and rand(0, 1) is a uniform random number between 0 and 1.

$$x_{new} = x_{minority} + \mathrm{rand}(0, 1) \times (x_{neighbor} - x_{minority}) \qquad (6)$$
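Equation 6 can be illustrated with a minimal NumPy sketch of SMOTE's interpolation step: pick a random minority sample, one of its k nearest minority neighbors, and a uniform random gap. The function name and defaults are assumptions. In practice, the imbalanced-learn package provides a ready-made `SMOTE` class with a `fit_resample` method; the text does not say which implementation the study used.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples following Equation 6."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                         # random minority sample
        nb = X_min[neighbors[i, rng.integers(k)]]   # one of its k nearest neighbors
        gap = rng.random()                          # rand(0, 1) in Equation 6
        synthetic[j] = X_min[i] + gap * (nb - X_min[i])
    return synthetic


minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synthetic = smote(minority, n_new=10, k=2)
print(synthetic.shape)  # (10, 2)
```

Because each new point lies on a segment between two real minority samples, SMOTE fills in the minority region rather than duplicating existing records.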
After preprocessing, feature combinations are evaluated with an ANN, using an 80:20 split of the data into training and test sets. The ANN model is trained on the training data and assessed with the weighted F1-score, calculated using Equation 7, where F1_k is the F1-score of class k and w_k is the proportion of samples belonging to class k.

$$F1_{weighted} = \sum_{k=1}^{K} w_k \times F1_k \qquad (7)$$

The ANN architecture consists of three hidden layers, with ReLU activation in the hidden layers and Softmax activation in the output layer. The output of each hidden layer is computed by a linear operation followed by the corresponding activation function.
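Equation 7 weights each class's F1-score by its share of the samples. The NumPy sketch below computes it directly from label arrays and follows the same definition as scikit-learn's `f1_score` with `average="weighted"`; the function name and toy labels are illustrative.

```python
import numpy as np


def weighted_f1(y_true, y_pred) -> float:
    """Equation 7: per-class F1 averaged with weights w_k = n_k / N."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        w_k = np.mean(y_true == c)  # share of samples whose true class is c
        total += w_k * f1
    return float(total)


# Classes 0 = good, 1 = moderate, 2 = unhealthy (toy labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8222 (= 37/45)
```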
Implementation of Artificial Neural Network in Phase t+k
In the t+k phase, the ANN model predicts the classification using the features selected in the t-k phase. The ANN model in this phase has a similar architecture to the previous model, taking the identified features as inputs and computing the class prediction with a Softmax output layer.
The output of the model is calculated using Equation 8, where ŷ_i is the predicted probability for the i-th class, z_i is the output of the previous layer for class i, and K is the number of classes.

$$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad (8)$$

The model is trained with the Adam optimization algorithm and the sparse_categorical_crossentropy loss function defined in Equation 10, where N is the number of samples, K is the number of classes, y_{i,k} is the actual label of the i-th sample for class k, and ŷ_{i,k} is the predicted probability of class k for the i-th sample. The training process used SMOTE oversampling to handle class imbalance and MinMaxScaler to normalize the features.

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{i,k} \log(\hat{y}_{i,k}) \qquad (10)$$
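For integer class labels, the one-hot term y_{i,k} in Equation 10 keeps only the predicted probability of the true class, which is what the "sparse" variant exploits. A minimal NumPy sketch follows; the clipping constant `eps` is an assumption added to avoid log(0). Keras exposes the same loss as `SparseCategoricalCrossentropy` or through the string "sparse_categorical_crossentropy".

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs, eps: float = 1e-12) -> float:
    """Equation 10 with integer labels instead of one-hot vectors.

    labels: shape (N,), integer class indices in [0, K)
    probs:  shape (N, K), predicted class probabilities (rows sum to 1)
    """
    labels = np.asarray(labels, dtype=int)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    # Only the true-class term of the inner sum over k is non-zero
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))


probs = np.array([[0.50, 0.25, 0.25],
                  [0.10, 0.80, 0.10]])
labels = np.array([0, 1])
print(round(sparse_categorical_crossentropy(labels, probs), 4))  # 0.4581
```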
cCycn,yco ) ycA ycn=1 yco=1 Interpolation kriging ordinary Interpolation is a technique of structuring values in areas without data to describe the distribution of values in an area.
Among the various interpolation methods, kriging is a geostatistical approach that analyzes spatial relationships based on the distances and orientations between sample points. It applies a series of mathematical functions through systematic stages, including statistical analysis, variogram construction, and surface generation. Specifically, ordinary kriging uses the sample values and a variogram to estimate values at points that have not been measured. The accuracy of the prediction depends largely on how close the target location is to locations that already have measurement data.
$$\hat{Z}_{t+k}(s_0) = \sum_{i=1}^{N} \lambda_i \, Z_{t+k}(s_i) \qquad (11)$$

Equation 11 shows how the value Ẑ_{t+k}(s_0) at time t+k is estimated for a location s_0. The estimate combines the values Z_{t+k}(s_i) from N nearby locations s_i, weighted by the ordinary kriging weights λ_i. These weights are optimized from the spatial structure of the data, subject to the constraint that they sum to 1, which keeps the estimate at s_0 unbiased.
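The ordinary kriging estimate in Equation 11 can be sketched in NumPy by solving the standard kriging system, in which a Lagrange multiplier forces the weights to sum to one. The exponential variogram and its parameters are hypothetical, distances are plain Euclidean (longitude and latitude would need projecting first), and the station coordinates and values are toy data; libraries such as PyKrige offer fuller implementations.

```python
import numpy as np


def exponential_variogram(h, nugget=0.0, sill=1.0, range_=10.0):
    """Hypothetical exponential variogram; gamma(0) is defined as 0.

    In practice nugget, sill and range would be fitted to the empirical
    variogram of the monitoring-station data.
    """
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-h / range_))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(coords, values, target, variogram=exponential_variogram):
    """Estimate Z(s_0) at `target` from N sampled locations (Equation 11).

    Solves  [Gamma  1] [lambda]   [gamma_0]
            [1^T    0] [  mu  ] = [   1   ]
    so that the weights lambda_i sum to one (unbiasedness constraint).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(coords)
    pairwise = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(pairwise)
    A[n, n] = 0.0
    to_target = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    b = np.ones(n + 1)
    b[:n] = variogram(to_target)
    solution = np.linalg.solve(A, b)
    weights = solution[:n]  # lambda_i; solution[n] is the Lagrange multiplier
    return float(weights @ values), weights


# Four stations on a 10 x 10 grid (toy coordinates and values)
stations = [[0, 0], [10, 0], [0, 10], [10, 10]]
observed = [1.0, 2.0, 3.0, 4.0]
estimate, w = ordinary_kriging(stations, observed, target=[5, 5])
print(round(estimate, 6), np.round(w, 6))  # 2.5 [0.25 0.25 0.25 0.25]
```

With a zero nugget, ordinary kriging is an exact interpolator: evaluating it at a station returns that station's value, while points between stations receive distance-weighted blends.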
Evaluation
The confusion matrix, consisting of True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN), is used to evaluate the performance of the Artificial Neural Network (ANN) classification method. From this matrix, accuracy, precision, recall, and F1-score are derived. Accuracy is the proportion of correct predictions, precision measures how many positive predictions are correct, recall measures the model's ability to identify positive cases, and the F1-score combines precision and recall into a single measure, which is particularly useful for imbalanced datasets. The confusion matrix in Table V is used to calculate the evaluation metrics in Equations 12–15.
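For the three-class setting of Table V, precision and recall are applied one class at a time (one-vs-rest), and accuracy becomes the share of samples on the diagonal. The sketch below assumes rows hold actual classes and columns predicted classes; the toy matrix and function name are illustrative only.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy plus per-class precision, recall and F1 from a K x K matrix.

    cm[i, j] counts samples whose actual class is i and predicted class is j.
    For class c: TP = cm[c, c], FP = rest of column c, FN = rest of row c.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    accuracy = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return accuracy, precision, recall, f1


# Rows/columns: positive, negative, neutral (toy counts)
cm = [[2, 0, 0],
      [0, 1, 1],
      [0, 0, 2]]
acc, prec, rec, f1 = metrics_from_confusion(cm)
print(round(acc * 100, 2))  # 83.33, i.e. accuracy expressed in percent
```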
TABLE V
CONFUSION MATRIX
Rows (actual class): Actually Positive, Actually Negative, Actually Neutral. Columns (predicted class): Positive Prediction, Negative Prediction, Neutral Prediction.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (12)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (13)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (14)$$

$$F1\text{-}Score = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \times 100 \qquad (15)$$

Experiments
Experiments were conducted to assess and compare the performance of the Artificial Neural Network (ANN) and Long Short-Term Memory (LSTM) classification methods on an identical dataset.
In the ANN method, classification applies feature expansion to explore feature combinations that improve prediction accuracy. In contrast, the LSTM experiment uses only the target class, without additional features, allowing the LSTM to learn the feature representation automatically.
IV. RESULTS AND DISCUSSION
Results
This section presents the results of implementing and evaluating the ANN classification model with the time-based feature expansion approach, which incorporates time-based features t-k, across the 34 available lag scenarios (t-1 to t-34).
Best Performance of the t-k Artificial Neural Network
The t-k Artificial Neural Network (ANN) model performed strongly, as shown by the evaluation metrics in Table VI. These values highlight the model's effectiveness in classifying air quality levels.
TABLE VI
BEST PERFORMANCE T-K ANN TIME-BASED MODEL
Scenario   Accuracy   Precision   Recall   F1-Score
t-1        90.00%     91.67%      90.00%   89.90%
t-2        80.00%     85.71%      80.00%   79.17%
t-3        80.00%     85.71%      80.00%   79.17%
t-4        84.61%     87.44%      84.62%   82.83%
t-5        85.71%     86.90%      85.71%   85.53%
t-6        92.30%     93.59%      92.31%   92.11%
t-33       63.49%     78.57%      63.54%   62.37%
t-34       62.38%     63.39%      62.59%   61.93%
The analysis reveals that the highest performance was achieved in scenarios t-6, t-8, t-14, t-17, t-21, t-24, and t-25, with accuracy ranging from 92.30% to 92.85% and F1-scores between 92.11% and 92.67%, indicating high stability. Models in scenarios t-5, t-7, t-9, and t-10 also performed consistently, with accuracy above 85% despite minor fluctuations. Conversely, performance declined significantly in scenarios t-33 and t-34, with accuracies of 63.49% and 62.38% and F1-scores of 62.37% and 61.93%, likely due to suboptimal feature combinations or model configurations. Overall, models with high accuracy tend to be more stable, while those with lower performance show sharper declines, which offers useful guidance for selecting models in similar applications.
Comparison of ANN and LSTM Model Performance
The experimental results presented in Table VII indicate that the ANN outperforms the LSTM in terms of accuracy, precision, recall, and F1-score. Based on these outcomes, this study chooses ANN as the main classification method, owing to its more accurate predictions and higher training efficiency. Unlike LSTM, which requires sequential data processing, ANN is faster and more effective on non-sequential datasets, making it more suitable for air quality monitoring applications in Java.
TABLE VII
COMPARISON OF ANN AND LSTM MODEL PERFORMANCE BASED ON EVALUATION METRICS
(Accuracy, precision, recall, and F1-score of the Artificial Neural Network and Long Short Term Memory models for each scenario.)
Optimal Feature Combination
Table VIII presents the optimal feature combinations for the prediction models. These combinations were selected to improve the models' predictive accuracy and the overall reliability of the predictions. The results indicate that the best-performing combinations do not always use all features. In scenario t-34, for example, excluding certain features improved performance, emphasizing the importance of automated feature selection in optimizing model accuracy and efficiency.
TABLE VIII
OPTIMAL FEATURE COMBINATION FOR PREDICTION MODELS
Scenario   Optimal Features
t-1        x113,x17,x16,x12,x15
t-2        x210,x28,x23,x212,x211,x25,x16,x26,x18,x24,x111,x14
t-3        x11,x110,x313,x35,x23,x15,x38,x14,x113,x212,x24,x311,x310,x36,x18,x34,x27,x29,x12,x31,x21,x22,x32,x210,x211,x37
t-4        x13,x12,x33,x22,x27,x49,x110,x410,x34,x24,x311,x112,x310,x38,x17,x42,x26,x43,x212,x36,x113,x32,x39,x15,x211,x11,x46,x14,x21,x210,x16,x312,x35
t-5        x42,x24,x34,x39,x57,x28,x12,x210,x33,x112,x410,x29,x212,x411,x38,x311,x512,x510,x51,x13,x36,x49,x48,x513,x19,x15,x113,x310,x37,x213,x313,x31,x110,x47,x35,x11,x211,x16,x412,x22,x58,x26,x43,x55,x41,x312,x21,x45,x27
t-6        x21,x612,x13,x511,x49,x63,x611,x54,x112,x48,x27,x312,x610,x613,x46,x38,x44,x412,x410,x18,x41,x17,x67,x11,x28,x110,x210,x29,x51,x211,x12,x513,x69,x52,x26,x25,x313,x61,x45,x212,x42,x310,x58,x55
TABLE IX
CLASS PREDICTION OF AIR POLLUTION DISTRIBUTION FROM MAY 2022 TO APRIL 2023
Locations (rows): Jend. Sudirman Tangerang, Sudimara Ciledug, Yani Semarang, Balai Kota Depok, Bandung, Wonorejo Surabaya
Months (columns): May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Jan, Feb, Mar, Apr
TABLE X
CLASS PREDICTION OF AIR POLLUTION DISTRIBUTION FROM MAY 2023 TO APRIL 2024
Locations (rows): Jend. Sudirman Tangerang, Sudimara Ciledug, Yani Semarang, Balai Kota Depok, Bandung, Wonorejo Surabaya
Months (columns): May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Jan, Feb, Mar, Apr

Best t+k Artificial Neural Network Time-Based Model
The results from t-k, processed with the same algorithm to generate predictions for t+k, show that the algorithm can be applied consistently across scenarios. These predictions suggest that the model can generalize across different scenarios, indicating its potential for forecasting in similar contexts.
TABLE XI
CLASS PREDICTION OF AIR POLLUTION DISTRIBUTION FROM MAY 2024 TO FEBRUARY 2025
Locations (rows): Jend. Sudirman Tangerang, Sudimara Ciledug, Yani Semarang, Balai Kota Depok, Bandung, Wonorejo Surabaya
Months (columns): May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Jan, Feb
Visualization of Air Pollution Distribution Classification Map
The air quality index predictions generated by the Artificial Neural Network model can be used to create maps of the air pollution distribution across Java. These maps visually represent the variation in air quality across regions.
Fig. 1 Prediction Map of Air Pollution Distribution May 2022 to June 2022
Fig. 2 Prediction Map of Air Pollution Distribution July 2022 to August 2022
Fig. 3 Prediction Map of Air Pollution Distribution September 2022 to October 2022
Fig. 4 Prediction Map of Air Pollution Distribution November 2022 to December 2022
Fig. 5 Prediction Map of Air Pollution Distribution January 2023 to February 2023
Fig. 6 Prediction Map of Air Pollution Distribution March 2023 to April 2023
Fig. 7 Prediction Map of Air Pollution Distribution May 2023 to June 2023
Fig. 8 Prediction Map of Air Pollution Distribution July 2023 to August 2023
Fig. 9 Prediction Map of Air Pollution Distribution September 2023 to October 2023
Fig. 10 Prediction Map of Air Pollution Distribution November 2023 to December 2023
Fig. 11 Prediction Map of Air Pollution Distribution January 2024 to February 2024
Fig. 12 Prediction Map of Air Pollution Distribution March 2024 to April 2024
Fig. 13 Prediction Map of Air Pollution Distribution May 2024 to June 2024
Fig. 14 Prediction Map of Air Pollution Distribution July 2024 to August 2024
Fig. 15 Prediction Map of Air Pollution Distribution September 2024 to October 2024
Fig. 16 Prediction Map of Air Pollution Distribution November 2024 to December 2024
Fig. 17 Prediction Map of Air Pollution Distribution January 2025 to February 2025
Discussion
This research demonstrates that the Artificial Neural Network (ANN) model, combined with a time-based feature expansion approach, performs effectively in predicting air quality across Java. Experimental results indicate that the ANN achieves up to 92.30% accuracy and a 92.19% F1-score in the t-6 scenario, highlighting its capability to handle non-linear relationships between variables. The steady improvement in accuracy and F1-score up to this scenario underscores the ANN's ability to deliver reliable short- to medium-term predictions. However, its performance begins to decline beyond the t-10 scenario, likely because of the increased complexity of longer-term predictions.
The identification of dominant features such as PM10, SO2, CO, O3, and NO2 is a key factor in the model's success in generating accurate predictions. These features help the ANN recognize air quality distribution patterns that conventional methods, or models such as Long Short-Term Memory (LSTM), often struggle to handle.
To provide additional insight, correlation analysis showed that PM10 demonstrated the strongest relationship with the air quality index, followed by SO2 and NO2.
In contrast, features such as wind speed showed a much weaker relationship, indicating their limited impact on predictions.
This reinforces the value of selecting relevant features to improve model performance. In addition, time-based feature expansion allows the exploration of various combinations of input variables, enabling the model to capture complex temporal patterns.
The iterative approach to feature selection also revealed that certain combinations of features, such as PM10, NO2, and population density, consistently improved prediction accuracy. This highlights the importance of targeted feature engineering for specific scenarios.
Spatial visualization through kriging interpolation adds to the understanding of air quality distribution, especially in urban areas such as DKI Jakarta and Surabaya, which have higher concentrations of air pollution due to high vehicle and industrial activity.
Jakarta's dense urbanization and limited airflow exacerbate pollutant concentration, while seasonal factors, such as lower rainfall during dry months, contribute to temporarily elevated PM10 and CO levels.
This underscores the importance of seasonal adjustments in predictive models.
This pattern emphasizes the need for mitigation policies that focus on areas with high population density and economic activity.
In terms of policy applications, the results suggest several actionable strategies.
For example, predictive data can be used to enforce vehicle restrictions in high-pollution zones during peak times or optimize the placement of air quality sensors in areas predicted to exhibit high pollution variability.
Such measures can enhance the effectiveness of pollution control efforts while prioritizing resource allocation.
However, this study has several limitations that need to be considered.
Reliance on the quality and availability of sensor data is a major challenge, especially in regions with inadequate data coverage.
In addition, the Artificial Neural Network (ANN) model is still unable to capture sudden changes in air quality caused by external factors, such as natural disasters or extreme weather conditions.
These limitations indicate the need for additional data integration and exploration of hybrid methods to strengthen model performance, especially in long-term predictions.
This study offers significant insights for environmental management efforts on the island of Java.
The ANN models have the potential to facilitate data-informed decision-making, including the strategic placement of air quality sensors and the formulation of more effective air pollution control policies.
Future research could investigate the integration of real-time data and spatial modeling enhancements, such as dynamic interpolation methods, to improve prediction accuracy in rapidly changing environments.
Additionally, exploring hybrid models combining ANN with deep learning architectures like RNN-LSTM or transformers may further optimize long-term forecasting capabilities.
These approaches are expected to contribute to sustained improvements in air quality management practices.
V. CONCLUSION
This study demonstrates that the ANN model with time-based feature expansion significantly enhances the accuracy of air quality predictions across Java, achieving a maximum accuracy of 92.30% and an F1-score of 92.19% in the t-6 scenario, which shows its effectiveness for short- to medium-term predictions. The identification of dominant features, including PM10, SO2, CO, O3, and NO2, supports the model's capacity to capture air pollution distribution patterns accurately.
These findings provide valuable insights for air quality management, such as optimized sensor placement strategies and evidence-based mitigation policy development.
Future research should focus on enriching the dataset with additional attributes, integrating real-time data streams, and exploring advanced machine learning models, such as hybrid or transformer-based architectures, to further enhance prediction accuracy and adaptability to sudden environmental changes.
ACKNOWLEDGMENT
The authors thank Telkom University for its funding and infrastructure support, which made this research possible.
REFERENCES