OPEN ACCESS
ISSN 2356-5462
http://socj.
id/ijoict/ Intl.
Journal on ICT Vol.
No.
Dec 2024.
doi: 10.
21108/ijoict.
Class Prediction of Dengue Fever Spread in Bandung Using Decision Tree with Time-Based Feature Expansion
Iqlima Putri Hawa 1,*, Sri Suryani Prasetiyowati 2, Yuliant Sibaroni 3
1,2,3 School of Computing, Telkom University, Bandung, Indonesia
iqlima@student.
Abstract
In Indonesia, dengue hemorrhagic fever (DHF) has become a serious public health concern because its incidence rate fluctuates under the influence of several factors. Comprehensive control strategies are required to prevent a rise in incidence. This study classifies the future spread of DHF in Bandung City and identifies the factors that most influence an increase in its spread. It proposes a Decision Tree with spatial time-based feature expansion to predict the class of DHF spread. In the developed scenario, the target class is built with a class prediction model based on previous time periods, and the model with the best performance is selected to form the class prediction model for future years. The classes are formed by dividing the incidence rate (IR) into low, medium, and high ranges. The data include spatio-temporal information such as population, education level, rainfall, temperature, and blood type from 2017 to 2021. The results show that the Decision Tree with time-based feature expansion achieves an accuracy above 90%, and the visualized predictions help identify high-risk areas. This study contributes information on the future spread of DHF and its influential factors to the public and health institutions, so that the government can set prevention policies as early as possible.
Keywords: Incidence Rate, DHF, Prediction, Decision Tree, Feature Expansion, Classification

I. INTRODUCTION
Dengue Hemorrhagic Fever (DHF) is a disease of concern because it recurs constantly and its case numbers fluctuate.
Based on the Ministry of Health's annual data, West Java was the province with the highest number of DHF cases in 2022, with 36,608 of the 143,266 cases recorded in Indonesia. In 2023 West Java again had a large share, with 19,328 of the 114,720 cases in Indonesia. In Bandung, the capital city of West Java, DHF cases fluctuate from year to year. Based on health center data from Bandung between 2017 and 2022, the lowest count was 1,786 cases in 2017 and the highest was 5,205 cases in 2022. These data show how fluctuating and concerning DHF cases are in West Java and in Bandung as its capital city.
Therefore, this research predicts and classifies DHF incidence in the sub districts of Bandung city for 2022, 2023, 2024, and 2025 as a contribution to preventing the spread of DHF.
To achieve the best prediction with a Decision Tree, this research uses feature selection and feature expansion to find the features that give the highest accuracy. The dataset consists of features from 2017 to 2021, which are separated into several models.
The DHF incidence rate classes for 2022, 2023, 2024, and 2025 are then predicted using the model with the highest accuracy in order to give an accurate prediction.
Received on 10 Dec 2024. Revised on 20 Dec 2024. Accepted and Published on 23 Jan 2025.
IQLIMA PUTRI HAWA ET AL.
CLASSIFICATION PREDICTION OF DENGUE FEVER SPREAD USING DECISION TREE WITH TIME-BASED FEATURE EXPANSION
This research benefits the health sector and other stakeholders by offering a preemptive way to minimize DHF incidence: it reports the features that have a major influence on the DHF rate and the predicted class that indicates the severity of DHF cases in each sub district.
Although there are many studies on DHF incidence prediction, few of them predict the DHF incidence rate for years ahead using a Decision Tree based on spatio-temporal data for a specific location.
Regarding health issue prediction, several studies have made predictions using various methods, including Decision Tree. One study examines DHF case prediction using Support Vector Machines (SVM) with an accuracy of 84%, K-Nearest Neighbor (KNN) with an accuracy of 87%, and Decision Tree (DT) with an accuracy of 79%. By combining the three classifiers in a hybrid approach with hard voting, the accuracy could be improved further. Another study examines breast cancer prediction using Decision Tree (DT) and Adaptive Boosting (AdaBoost) on a highly imbalanced dataset. Adaptive Boosting is employed to enhance the Decision Tree's performance in identifying malignant observations, obtaining an accuracy of 92.53% for Adaptive Boosting and 88.80% for Decision Tree. A third study examines stroke probability prediction with Decision Tree and Naive Bayes. It builds the decision tree by calculating the Gini coefficient of each feature to select the split, and the Decision Tree model reaches an accuracy of 88%. These studies show that Decision Trees can give high accuracy with the right improvements, and that choosing how the tree is built and handling imbalanced data are also crucial. In this research, feature selection and feature expansion are implemented so that the Decision Tree produces an accurate prediction.
Regarding spatial-temporal analysis with time-based feature expansion, several studies predict climate and disease spread based on features that influence them. One study uses Naive Bayes with feature expansion to predict classes for Bandung City's dengue fever cases and Java Island's monthly rainfall distribution, obtaining an accuracy of more than 97% for both predictions. Another study predicts the classes of COVID-19 transmission and dengue fever using SVM with time-based feature expansion, obtaining accuracies of 90% for DHF and 93% for COVID-19 transmission. A third study predicts the class of DHF spread with Random Forest and feature expansion, resulting in three optimum models with accuracies of 97%, 93%, and 93%. These studies show that feature expansion improves model accuracy, with overall accuracy above 90%, which is better than the studies above whose models did not use time-based feature expansion.
Despite the growth of DHF cases over time in West Java, especially in Bandung, class prediction offers a chance to prevent DHF spread. As shown above, previous studies have classified health issues with satisfying results. They also show that Decision Tree is often used for classification with good results, and that time-based feature expansion can enhance model accuracy. In this research, Decision Tree is chosen because it is frequently used for classification tasks in health-related prediction and gives reliable results. Spatial-temporal analysis with time-based feature expansion is chosen because it captures the interaction between space and time, allowing a more comprehensive understanding of DHF spread patterns in a specific location and time period while also improving the accuracy of the Decision Tree model. Although Decision Tree and time-based feature expansion have each been used for class prediction in several studies, few studies combine the two methods to give an elaborate analysis of class prediction together with a visualization by location and time. This combination also opens the possibility of more accurate and precise results than the other studies.
Hence, this research carries out a class prediction of DHF spread using Decision Tree in 30 sub districts of Bandung city for 2022, 2023, 2024, and 2025. Using a dataset from the previous 4 years, time-based feature expansion is implemented to enhance accuracy.
The outcomes of this research are the features that most influence DHF spread, the accuracy of each model as a basis for choosing the best model, the class prediction for each sub district, and a visualization of the class prediction to aid understanding. With these outcomes, this research is expected to inform the public and health institutions about the future spread of DHF and its influential factors, so that the government can set prevention policies as early as possible.
II. LITERATURE REVIEW
Dengue Hemorrhagic Fever is a severe form of dengue fever with symptoms of very high fever, muscle aches, rash, hemorrhagic episodes, and circulatory shock. The dengue virus that causes the fever is mainly spread by the bites of infected Aedes aegypti mosquitoes. Many factors can influence a rise in DHF cases, starting with the environment. DHF spreads in tropical and subtropical areas. Because climate indicators such as rainfall, humidity, and temperature affect the breeding of Aedes aegypti mosquitoes, the spread of the virus fluctuates with the climate. By 2020, the number of DHF cases had increased to 953,476, mainly in tropical countries.
Population characteristics such as blood type, gender, age, and education form demographics that help explain differences in DHF cases between areas. Some studies look into the relation between blood type and dengue fever. Blood type B patients were more likely to contract dengue virus infections than blood type AB patients, whereas people with blood type O have the worst outcomes in dengue hemorrhagic fever. Data from several areas in Indonesia, such as Kediri, Blitar, and Ternate, show that DHF mostly occurs in men and in people aged 5-14 years.
One of the most commonly used classifiers in supervised learning is the Decision Tree. A Decision Tree supports decision-making by testing a particular attribute at each node. The test at each node is chosen with a splitting function such as the Gini index or entropy. The Gini index is an impurity measure: it reflects the chance of misclassifying a sample at the node, and splits are chosen to reduce it. Entropy, which is the basis of information gain, measures the degree of disorder in a set. Zero entropy means that all points belong to a single target class, while entropy is highest when the points are spread equally across the classes. One study that examines the splitting criteria of Decision Tree concludes that the Gini index is preferable to entropy because it is less biased. A study of breast cancer detection uses both the Gini index and entropy in its Decision Tree. It shows that a Decision Tree with feature selection obtains an accuracy of 87.83% using the Gini index, whereas the accuracy using entropy is in the 86% range.
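The difference between the two impurity measures can be illustrated with a minimal Python sketch. The function names and the toy label lists are assumptions made for illustration and are not taken from the cited studies. It shows that both measures are zero for a pure node and largest when the classes are evenly mixed.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the relative frequency of each class in a list of labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the classes."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


# Hypothetical nodes: one pure, one with three evenly mixed classes.
pure_node = ["low"] * 6
mixed_node = ["low", "medium", "high"] * 2

print("pure node :", gini_impurity(pure_node), entropy(pure_node))
print("mixed node:", gini_impurity(mixed_node), entropy(mixed_node))
```

For three evenly mixed classes the Gini impurity is 2/3 and the entropy is log2(3), which are the largest values either measure can take with three classes.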
Time-based feature expansion, including model selection and combination, is an effective approach for forecasting large sets of time series, with performance varying according to the nature of the series. Feature expansion can improve classification accuracy compared with applying traditional machine learning algorithms to the original features alone, because the expanded features let the classifier take multiple dimensions into account, which is not possible with low-dimensional data. One study compares two methods for building a classification system that predicts the number of Social Welfare Service Recipients (SWSR). In that study, SVM with time-based feature expansion obtains accuracy values of 70% and 80%, outperforming LSTM, whose accuracy values were 34.28% and about 48%.

III. RESEARCH METHOD
In this study, DHF incidence rate classes are predicted using a Decision Tree. The Decision Tree predicts the incidence rate class of 30 sub districts in Bandung for 4 years ahead, and the predictions are visualized on a map of Bandung's sub districts based on the class. Fig. 1 shows the steps of the research, ending with the prediction and the spatial-temporal map drawn according to the predicted class.
Dataset
The datasets used in this research contain DHF cases, climate, population, educational history, and blood type data, obtained from several sources: the Public Health Office, the Meteorological, Climatological, and Geophysical Agency (BMKG), and the Central Bureau of Statistics (BPS) of Bandung. Table I describes the dataset, which has 13 features that influence DHF cases, together with the incidence rate of the sub districts in Bandung between 2017 and 2021.
Fig. 1. Research Method
TABLE I
DATASET
Notation    Description
x1          Population
x2          Male Population Proportion
x3          Rainfall
x4          Temperature
x5          Humidity
x6          Blood Type A
x7          Blood Type B
x8          Blood Type AB
x9          Blood Type O
x10         Elementary School Graduates
x11         Junior High School Graduates
x12         High School Graduates
x13         College Graduates
Target      Incidence Rate (per 100,000 population)
Preprocessing Data
Before modeling, the data need to be preprocessed through class labeling, scaling, and handling of class imbalance. The target in the datasets is the incidence rate (IR). The IR is the frequency of new cases of a health condition, in this case DHF, in a specified population during a specific time period, and it indicates how fast the disease spreads in that population. Equation 1 shows how the IR is obtained from the data as the number of DHF cases per 100,000 population. Since this research classifies the DHF incidence rate, the rates are grouped into ranges and labeled as classes, as shown in Table II.
$$\mathrm{IR} = \frac{\text{Case}}{\text{Population}} \times 100{,}000 \tag{1}$$
TABLE II
CLASS LABELING
Class     IR Range
Low       IR < 55
Medium    55 ≤ IR ≤ 100
High      IR > 100
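The incidence rate formula and the class ranges of Table II can be combined in a short Python sketch. The function names and the example case counts are assumptions made for illustration; only the formula and the thresholds come from the text.

```python
def incidence_rate(cases, population):
    """Incidence rate per 100,000 population (Equation 1)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def ir_class(ir):
    """Map an incidence rate to the class ranges of Table II."""
    if ir < 55:
        return "low"
    if ir <= 100:
        return "medium"
    return "high"


# Hypothetical sub district: 120 cases among 150,000 residents.
rate = incidence_rate(120, 150_000)
print(round(rate, 2), ir_class(rate))
```

The boundaries follow Table II, so a rate of exactly 55 or exactly 100 falls in the medium class.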
In this research, all feature values in the dataset are normalized to the range 0 to 1 using a min-max scaler. As shown in Equation 2, min-max scaling depends on the minimum and maximum values of each feature. Here $x$ and $x'$ denote the original and scaled values, while $i$ and $j$ denote the index of the sample in the dataset and the index of the feature.
$$x'_{i,j} = \frac{x_{i,j} - \min(x_j)}{\max(x_j) - \min(x_j)} \tag{2}$$
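A minimal Python sketch of Equation 2 is given below. The function name, the row-major table layout, and the sample values are assumptions for illustration, and the handling of constant features is a design choice of this sketch rather than something stated in the text.

```python
def min_max_scale(rows):
    """Scale each feature (column) of a row-major table to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant feature has no spread; map it to 0.0 to avoid dividing by zero.
            scaled_row.append((value - low) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Hypothetical rows: [population, rainfall] for three sub districts.
table = [[90_000, 180.0], [120_000, 210.0], [150_000, 240.0]]
for row in min_max_scale(table):
    print(row)
```

Each column is scaled independently, so the smallest value of every feature becomes 0 and the largest becomes 1.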
Examination of the data shows that some classes have many more samples than others. The dataset is therefore imbalanced, which can harm classification performance by biasing the model toward the majority class and reducing overall accuracy. This research uses random oversampling (ROS) to handle the imbalance by increasing the sample size of the minority classes. Fig. 2 illustrates how oversampling works.
Fig. 2. Oversampling.
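Random oversampling can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the function name, the fixed seed, and the choice to grow every class to the size of the largest one are all assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced labels: 5 low, 2 medium, 1 high.
sample_ids = list(range(8))
sample_labels = ["low"] * 5 + ["medium"] * 2 + ["high"]
new_ids, new_labels = random_oversample(sample_ids, sample_labels)
print(Counter(new_labels))
```

A common precaution is to apply oversampling only to the training portion of each split, so that duplicated samples do not appear in both training and evaluation data.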
Feature Expansion
The foundation of feature expansion is the modeling of time-based class predictions. Feature expansion creates lagged features that capture temporal patterns, which enhances predictive accuracy and provides context for each case. In this research, the lagged features follow the time interval of the dataset, which uses yearly time steps. Since this research predicts for 2022, 2023, 2024, and 2025, the models learn from the previous years. Hence, the selected time steps are 1 year before ($t - 1$), 2 years before ($t - 2$), 3 years before ($t - 3$), and 4 years before ($t - 4$). The goal is to use the best-performing combination of earlier $t - k$ models to build a predictive classification model for the future $t + k$. In this scheme, a classification model is trained on features from the prior $t - k$ periods with the class of period $t$ as the target. Table III shows how feature expansion works in this research: features from previous years are used to predict the class, and the model learns from the target class.
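The construction of lagged features can be sketched in Python as follows. The data layout (dictionaries keyed by year and sub district), the function name, and the feature values are assumptions made for illustration. The sketch only shows the general idea of joining the features of the k years before a target year with that year's class.

```python
def expand_features(yearly_features, yearly_classes, target_year, k):
    """Build one training row per sub district from the k years before target_year.

    yearly_features: {year: {district: [feature values]}}
    yearly_classes:  {year: {district: class label}}
    """
    feature_years = [target_year - lag for lag in range(k, 0, -1)]
    rows, targets = [], []
    for district in sorted(yearly_classes[target_year]):
        row = []
        for year in feature_years:
            # Concatenate the features of each earlier year, oldest first.
            row.extend(yearly_features[year][district])
        rows.append(row)
        targets.append(yearly_classes[target_year][district])
    return feature_years, rows, targets


# Hypothetical values: two scaled features per sub district per year.
features = {
    2017: {"Andir": [0.40, 0.55], "Cibiru": [0.10, 0.60]},
    2018: {"Andir": [0.42, 0.70], "Cibiru": [0.11, 0.65]},
}
classes = {2019: {"Andir": "medium", "Cibiru": "low"}}

years, X, y = expand_features(features, classes, target_year=2019, k=2)
print(years)
print(X)
print(y)
```

With k = 2 and a 2019 target, each row holds the 2017 features followed by the 2018 features, which corresponds to the two-year time step described above.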
TABLE III
IMPLEMENTATION OF FEATURE EXPANSION
Time Step         Model    Data Feature (years)
1 year before     1A-1D
2 years before    2A       2017, 2018
                  2B       2018, 2019
                  2C       2019, 2020
3 years before    3A       2017, 2018, 2019
                  3B       2018, 2019, 2020
4 years before    4A       2017, 2018, 2019, 2020

Feature Selection
To choose the best model, the best feature combination must be selected, since there are many possible combinations, as the examples in Table IV show. This research uses SelectKBest with the f_classif score function for this purpose. SelectKBest is a feature selection algorithm that identifies the most relevant features in a given dataset. The selected combinations are tested in the Decision Tree so that the accuracy of each feature combination can be examined. The models whose feature combinations obtain the highest accuracy, which indicates optimal performance, are chosen to predict the future classes.
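The idea behind selecting the k best features by an ANOVA F-score can be sketched without any external library. The code below is a simplified stand-in written for illustration, not scikit-learn's implementation: the function names, the column-wise data layout, and the sample values are assumptions, and it expects at least two classes and more samples than classes.

```python
def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature against the class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the class means around the overall mean.
    between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())
    # Variation of the samples around their own class mean.
    within = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (g - 1)) / (within / (n - g))


def select_k_best(columns, labels, k):
    """Return the names of the k features with the highest F-scores, plus all scores."""
    scores = {name: anova_f_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical data: xa1 separates the two classes, xa3 does not.
labels = ["low", "low", "low", "high", "high", "high"]
columns = {"xa1": [1, 2, 3, 11, 12, 13], "xa3": [5, 1, 9, 5, 1, 9]}

best, scores = select_k_best(columns, labels, k=1)
print(best, scores)
```

A feature whose class means differ strongly relative to the spread inside each class receives a high score, which is why xa1 is ranked first in this toy example.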
TABLE IV
EXAMPLE OF MODEL 3B POSSIBLE FEATURE COMBINATIONS
Data Feature (years)    Feature
2018, 2019, 2020        xa2, xa3, xb10
2018, 2019, 2020        xa4, xb8, xc13
2018, 2019, 2020        xa1, xc6, xc9
2018, 2019, 2020        xa1, xa4, xb3, xc10
2018, 2019, 2020        xa1, xb9, xb13, xc2
2018, 2019, 2020        xa1, xa2, xa3, ..., xc13

Decision Tree Modeling
In classification, a Decision Tree applies distinct decision variables to represent the path that every observation follows through the tree, with the aim of improving classification accuracy.
Nodes and branches form the structure of each tree. Every node represents a feature of the category being classified, while each subset specifies the possible values the node can assume. This research uses the Gini index as the indicator of a node's "impurity", which lets the algorithm select the most suitable feature to split on at every node, with a minimum of one sample per leaf and a minimum of two samples per split. Fig. 3 illustrates a decision tree.
Fig. 3. Illustration of Decision Tree.
The Gini index is a splitting criterion that measures the impurity of a dataset or node and is used to determine the best feature for splitting the data at each node. Its value starts at zero, which means the data are pure (all instances in the dataset belong to a single class). The value is highest when the classes are evenly distributed across the dataset; for $j$ classes the maximum is $1 - 1/j$, which approaches one only as the number of classes grows. By measuring impurity, the Gini index guides the tree toward splits that produce more homogeneous subsets, thereby improving classification accuracy. The feature with the minimum Gini value is chosen for the root node, and the same evaluation over all features is repeated for every subsequent decision.
Equations 3 and 4 show how the Gini index works in a decision tree. $L$ denotes a dataset containing $j$ distinct class labels, and $p_i$ is the relative frequency of class $i$ in $L$. When the dataset is divided on attribute $A$ into two subsets $L_1$ and $L_2$ with sizes $N_1$ and $N_2$, where $N = N_1 + N_2$, the Gini index of each subset is computed using Equation 3 and the weighted Gini index of the split is computed using Equation 4.
$$\mathrm{GINI}(L) = 1 - \sum_{i=1}^{j} p_i^2 \tag{3}$$
$$\mathrm{GINI}_A(L) = \frac{N_1}{N}\,\mathrm{GINI}(L_1) + \frac{N_2}{N}\,\mathrm{GINI}(L_2) \tag{4}$$
The Decision Tree starts at the root node, where the model evaluates all possible splits, calculates the Gini index of each resulting subset using Equation 3, and then uses Equation 4 to calculate the weighted Gini index that measures the quality of each split. The feature and threshold that produce the lowest weighted Gini index are used for the split. The tree repeats this splitting procedure at each node as recursive partitioning until the nodes are pure or a stopping criterion is satisfied, such as maximum tree depth or minimum sample size in leaf nodes. Prediction is then performed by traversing the tree from the root to a leaf, and the predicted class is the majority class of the training samples in that leaf.
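The split search described above can be sketched for a single numeric feature. The function names and the sample values are assumptions for illustration, and candidate thresholds are taken as midpoints between sorted distinct values, which is one common convention rather than a detail given in the text.

```python
from collections import Counter


def gini(labels):
    """Equation 3: 1 minus the sum of squared class frequencies."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Equation 4: size-weighted Gini index of a two-way split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values and keep the lowest weighted Gini."""
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values and classes for six sub districts.
feature_values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
feature_labels = ["low", "low", "low", "high", "high", "high"]
print(best_threshold(feature_values, feature_labels))
```

A full tree would repeat this search over every feature at every node and recurse into the two resulting subsets until a stopping criterion is met.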
Evaluation
A confusion matrix evaluates the accuracy of a machine learning model against a set of previously known target data. In addition, several other metrics, including sensitivity, specificity, precision, and F1 score, are derived from this matrix. The three-class classification model in this research is evaluated on accuracy, F1 score, recall, and precision, calculated from the values in the confusion matrix shown in Fig. 4 using Equations 5, 6, 7, and 8. These equations give the accuracy over all $n$ classes and the precision, recall, and F1 score for class $c$, where $TP_c$, $FP_c$, $FN_c$, and $TN_c$ are the numbers of true positive, false positive, false negative, and true negative predictions of the classifier for class $c$. The denominator of Equation 5 is the total number of samples and is the same for every class. This research uses standard k-fold cross validation with 5 folds to evaluate the performance of the model.
Fig. 4. Confusion Matrix for 3 Class.
$$\mathrm{Accuracy} = \frac{\sum_{c=1}^{n} TP_c}{TP_c + FP_c + FN_c + TN_c} \tag{5}$$
$$\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c} \tag{6}$$
$$\mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c} \tag{7}$$
$$\mathrm{F1\ score}_c = \frac{2 \times TP_c}{2 \times TP_c + FN_c + FP_c} \tag{8}$$
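The metrics above can be computed from a three-class confusion matrix with a short Python sketch. The function name, the matrix orientation (rows are true classes, columns are predicted classes), and the counts are assumptions made for illustration and are not results from this study.

```python
def per_class_metrics(confusion, class_names):
    """Derive accuracy plus per-class precision, recall, and F1 from a confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    size = len(class_names)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(size))
    report = {}
    for c, name in enumerate(class_names):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(size)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp  # truly c, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Hypothetical confusion matrix for the low, medium, and high classes.
matrix = [
    [8, 1, 1],
    [0, 9, 1],
    [0, 0, 10],
]
accuracy, report = per_class_metrics(matrix, ["low", "medium", "high"])
print(round(accuracy, 3))
for name, values in report.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
```

Averaging the per-class values over the three classes gives macro-averaged precision, recall, and F1, which is one common way to summarize a multi-class model in a single number per metric.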
IV. RESULTS AND DISCUSSION
Result
After selecting the features and checking the accuracy, this research obtained the average confusion matrix values of each model, from Model 1A with a time step of one year to Model 4A with a time step of four years. These averages illustrate the Decision Tree performance of each model, which differ in the target class and the data features they learn from. The results are visualized in Fig. 5, Fig. 6, Fig. 7, and Fig. 8. Fig. 5, Fig. 6, and Fig. 7 show the average accuracy, F1 score, precision, and recall of Model 1, Model 2, and Model 3, respectively.
Fig. 5. Confusion matrix values in average of Model 1.
Fig. 6. Confusion matrix values in average of Model 2.
Fig. 7. Confusion matrix values in average of Model 3.
Fig. 8 shows the average accuracy, F1 score, precision, and recall of Model 4. The selected models, listed in Table V, were determined after examining the accuracy of each model with its selected features. Table VI shows the $t + k$ class prediction of DHF spread in the Bandung sub districts.
Fig. 8. Confusion matrix values in average of Model 4.
TABLE V
SELECTED T-K MODEL AND FEATURE COMBINATION
Selected Model    Target Class    Selected Feature
1B                2019            xa1, xa2, xa3, xa4, xa6, xa8, xa9, xa10, xa11, xa12, xa13
2A                2019            xa1, xa10, xa11, xb1, xb3, xb4, xb10, xb11
3B                2021            xa1, xa10, xa11, xb1, xb10, xb11, xc1, xc10, xc11, xc13
4A                2021            xa1, xa10, xa11, xa13, xb1, xb10, xb11, xb13, xc1, xc10, xc11, xc13, xd1, xd6, xd10, xd11, xd13

TABLE VI
T+K CLASS PREDICTION OF BANDUNG SUB DISTRICT
Sub District
Andir
Antapani
Arcamanik
Astana Anyar
Babakan Ciparay
Bandung Kidul
Bandung Kulon
Bandung Wetan
Batununggal
Bojongloa Kaler
Bojongloa Kidul
Buahbatu
Cibeunying Kaler
Cibeunying Kidul
Cibiru
Cicendo
Cidadap
Cinambo
Coblong
Gedebage
Kiaracondong
Lengkong
Mandalajati
Panyileukan
Rancasari
Regol
Sukajadi
Sukasari
Sumur Bandung
Ujung Berung
Fig. 9. Visualization of DHF Incidence Rate Class Prediction in 4 years, 2022 to 2025.
To give a more comprehensive understanding of the class prediction of DHF spread in Bandung, the $t + k$ class predictions are visualized. Fig. 9 and Fig. 10 show the class prediction of DHF spread for each year from 2022 to 2025 in the form of maps and subplots.
Fig. 10. Subplots of DHF Incidence Rate Class Prediction in four years: 2022 and 2023, 2024 and 2025.
Discussion
Analyzing the Decision Tree performance of each model is necessary to obtain a model that produces accurate predictions. The average confusion matrix values in this research describe the performance of each model, which differ in the features and the target class they learn from. Accuracy reflects the model's overall performance across all classes, expressed as the proportion of correct predictions out of all predictions made. The F1 score is the harmonic mean of precision and recall and indicates how well the model balances the two. Precision is the proportion of correctly predicted positive cases (true positives) out of all cases that the model predicted as positive (the sum of true positives and false positives); it reflects how accurate the model's positive predictions are. Recall is the proportion of actual positive cases that the model correctly predicted as positive, that is, true positives out of the sum of true positives and false negatives; it reflects how well the model captures all actual positive cases.
The average values of Model 1 are shown in Fig. 5, where Model 1D has the best averages: 0.97 accuracy, 0.97 precision, and 0.96 recall. The average values of Model 2 are shown in Fig. 6, where Model 2A and Model 2C have equally satisfactory averages: 0.96 accuracy, 0.96 F1 score, 0.97 precision, and 0.96 recall. The average values of Model 3 are shown in Fig. 7, where Model 3B has the best averages: 0.97 accuracy, 0.97 F1 score, 0.97 precision, and 0.97 recall. The average values of Model 4 are shown in Fig. 8; it has only one model, Model 4A, with averages of 0.97 accuracy, 0.96 F1 score, 0.97 precision, and 0.97 recall.
These models perform very well, since all of the average values are higher than 0.90. In terms of accuracy, the models predict most cases correctly for every class. In terms of precision, they keep false positives low while identifying the positive class. In terms of recall, they keep false negatives low by capturing the majority of actual positive cases. Lastly, the F1 score shows a good balance between precision and recall, which reflects the models' ability to limit both false positives and false negatives.
The selected models, shown in Table V, were determined after examining the accuracy of each model with its selected features. Model 1B is suitable for predicting the classes for 2022, with the 2019 target class and 11 features in total, resulting in an accuracy of 0.90. It uses population, male population proportion, rainfall, temperature, blood type A, blood type AB, blood type O, elementary school graduates, junior high school graduates, high school graduates, and college graduates from a single year. Model 2A is suitable for predicting the classes for 2023, with the 2019 target class and 8 features in total, resulting in an accuracy of 0.98. It uses population, elementary school graduates, and junior high school graduates from 2017, together with population, rainfall, temperature, elementary school graduates, and junior high school graduates from 2018. Model 3B is suitable for predicting the classes for 2024, with the 2021 target class and 10 features in total, resulting in an accuracy of 0.96. It uses population, elementary school graduates, and junior high school graduates from 2018, 2019, and 2020, and college graduates from 2020. Model 4A is suitable for predicting the classes for 2025, with the 2021 target class and 17 features in total, resulting in an accuracy of 0.96. It uses population, elementary school graduates, junior high school graduates, and college graduates from 2017, 2018, 2019, and 2020, and blood type A from 2020.
Among the selected models, Model 2A has the highest accuracy at 0.98, which is 2 percentage points above Model 3B and Model 4A and 8 percentage points above Model 1B. Model 3B and Model 4A share the same pattern of features (mainly population, elementary school graduates, junior high school graduates, and college graduates) and the same 2021 target class, which seems to explain why both reach the same accuracy of 0.96.
Finally, the class prediction of DHF spread using the Decision Tree was obtained as in Table VI, after the features and models were selected to produce an accurate prediction. The class predictions for the sub districts of Bandung were then visualized. Fig. 9 and Fig. 10 show that the predictions for 2022 and 2023 have the same visualization, because both selected models learned the 2019 target class. The same applies to the predictions for 2024 and 2025, whose selected models both learned the 2021 target class. For 2022 and 2023, the sub districts predicted to have medium-intensity DHF spread are Andir, Babakan Ciparay, and Bandung Kulon. For 2024 and 2025, they are Babakan Ciparay and Bandung Kulon.
To evaluate the effect of different methods and preprocessing, this research is compared with a study that uses a Decision Tree without time-based feature expansion and with a study that uses time-based feature expansion with a different method, Random Forest. The first study uses a Decision Tree with entropy as the measure for potential splits at each node and 10-fold cross validation, obtaining an optimum accuracy of about 87%. With time-based feature expansion, the model in this research reaches an accuracy of 0.98 with Model 2A.
The second study uses Random Forest to predict the DHF incidence rate. There are technical differences between the two works: this research uses Random Oversampling (ROS) to handle imbalanced data and standard 5-fold cross validation, while the Random Forest study uses stratified 10-fold cross validation. This research obtained Model 2A as the best model with an accuracy of 0.98, using features from 2017 and 2018, while the Random Forest study obtained Model 2C as the best model with an accuracy of 96.67%, also using features from 2017 and 2018. Although the methods and results differ, this comparison suggests that data from 2017 and 2018 are the most informative in terms of accuracy.
Another pattern shared by this research and the Random Forest study appears when the future predictions are visualized on the map: the target class that a model learns affects its prediction. In the Random Forest study, the visualizations of the predictions for 2023 and 2024 are the same, while the visualization for 2022 differs, because the models for 2023 and 2024 learned the same target class, that of 2021. In this research, the visualizations for 2022 and 2023 are the same because both models learned the 2019 target class, and the visualizations for 2024 and 2025 are the same because both models learned the 2021 target class.
V. CONCLUSION
Class prediction using time-based feature expansion succeeds in predicting the future class of DHF spread in the Bandung sub districts. The models are developed by expanding features from the previous $t - k$ periods for the $t + k$ target class. Using a dataset of DHF spread in the Bandung sub districts from 2017 to 2021, the Decision Tree with time-based feature expansion achieves a satisfying accuracy of more than 90%. The models with the highest accuracy are achieved with feature expansion from previous time periods.
ACKNOWLEDGMENT
The author wishes to thank Telkom University and both lecturers for the opportunity, support, and funding provided, which made it possible to complete this research.
REFERENCES