Ojugo, et al.
ARRUS Journal of Mathematics and Applied Science.
Vol.
No.
https://doi.org/10.35877/mathscience614
ISSN: 2776-7922 (Print) / 2807-3037 (Online)
RESEARCH ARTICLE
Empirical Evaluation Using Intelligent Modeling in Prediction of Potential Cancer Problematics Cases in Nigeria
Arnold Adimabua Ojugo1*, Chris Obaro Obruche2, & Andrew Okonji Eboka3
Department of Computer Science, College of Science, Federal University of Petroleum Resources Effurun, Delta State, Nigeria
Research Assistant, Department of Computer Science, College of Science, Federal University of Petroleum Resources Effurun, Delta State, Nigeria
Department of Network Computing, Coventry University, Priory Street, Coventry CV1 5FB, United Kingdom
*Corresponding author: Arnold Adimabua Ojugo, Department of Computer Science, Federal University of Petroleum Resources Effurun, Delta State, Nigeria
E-mail: ojugo.arnold@fupre.
Abstract: The rate and volume of data generated daily have made data mining a necessity.
With machine learning now an established paradigm in data science, it has become imperative to construct models that can effectively help domain experts and practitioners make comprehensive decisions regarding potential cases.
The study uses deep learning prognosis to respond effectively to problematic cases of cancer in Nigeria.
We use a fuzzy rule-based memetic model to predict potential problematic cases of cancer, with data samples collected from the Epidemiology Laboratory at the Federal Medical Center Asaba, Nigeria.
Dataset is split into training .
%) and testing .
%) to aid validation.
Results indicate that age, obesity, environmental conditions and family relations (to the first and second degree) are critical factors to be watched for benign and malignant cancer types.
The constructed model shows high predictive capability compared to other models presented in similar studies.
Keywords: Cancer, memetic algorithm, cluster, epidemiology, reinforcement learning, Nigeria
Introduction
Cancer is viewed as a group of diseases characterized by unregulated division and spread of cells. The cancerous cells may occur in liquids, as in leukemia.
Most, however, occur in solid tumors that originally appear in various tissues in various parts of the body.
By their original locations they are classified into various types of cancer, such as lung, colon, breast, prostate cancer, etc (Dawane and Pandit, 2.
Localized tumors can be removed by surgery or irradiation with high survival rates.
As cancer progresses, however, it metastasizes, invading the surrounding tissues, entering the blood stream, spreading and establishing colonies in distant parts of the body (Ferlay et al, 2.
Only a third of patients with metastasized cancer survive more than five years.
Invasive distensions spreading crab-like from a tumor in the breast were described by Hippocrates.
From the crab, karkinos in Greek and cancer in Latin, came the name of the disease and the name of its inducing agents, carcinogens (Ukah and Nwafor, 2.
The causes of serious ill-health in the world are changing.
Infection as a major cause is giving way to non-communicable diseases such as cardiovascular disease and cancer.
There were over 10 million new cancer cases worldwide and over 6 million deaths attributed to cancer in 1996 alone.
This open access article is distributed under a Creative Commons Attribution (CC-BY-NC) 4.
In 2020 there are predicted to be 20 million new cases and 12 million deaths.
Part of the reason for this is that life expectancy is steadily rising and most cancers are more common in an ageing population.
More significantly, the globalization of unhealthy lifestyles, with reference to cigarette smoking and the adoption of many other features of the modern Western diet (high fat, low fibre content), will increase cancer incidence (Malcolm, 2001; Ferlay et al, 2017; GLOBOCAN, 2.
Cancer Around The World
Cancer is a potentially fatal disease caused by environmental factors that mutate genes encoding critical cell-regulatory proteins.
The resultant aberrant cell behaviour leads to expansive masses of abnormal cells that destroy surrounding normal tissue and can spread to vital organs, resulting in disseminated disease, commonly a harbinger of imminent death (Ikwo, 2.
Cancer is the second leading cause of death in the United States.
About one-half of all men and one-third of all women in the US will develop cancer during their lifetimes. Today, millions of people are living with cancer or have had cancer (Sylla and Wild, 2.
Over the next 15 years, there will be a dramatic increase in the number of people developing cancer. Globally, over 10 million new cancer cases are diagnosed yearly, a figure expected to reach about 30 million by the year 2025.
Cancer is now the disease the public fears most, with billions of dollars spent annually on research for drugs, cancer charities and governments.
Even so, a cure for cancer appears elusive (Jedy-Agba et al, 2.
Cancer is a complex genetic disease that is caused primarily by environmental factors. Its causative agents are present in food, water, chemicals, air, and even in the sunlight that people are exposed to.
Since epithelial cells cover the skin, line the respiratory and alimentary tracts, and metabolize ingested carcinogens, it is not surprising that over 90% of cancers occur in epithelia (Fritz et al, 2.
Cancer can also be caused by certain polyunsaturated fatty acids that generate damaging free radicals.
The intake of antioxidants that can scavenge these harmful radicals is also a confounding factor.
Reducing infection, particularly in the poorer countries, will lead to reductions in cancer. Other infectious agents associated with increased cancer risk include hepatitis B, certain subtypes of human papilloma virus, the bacterium Helicobacter pylori, and human immunodeficiency virus (many sites) (Ikwo, 2013; Doll et al, 1.
The management of patients with cancer is often very costly.
However, huge steps in improving the prognosis of patients with cancer are almost immediately achievable with present-day technology and sufficient financial resources, and all essentially relate to early detection.
Cancer does not develop overnight. It evolves over time, with detectable premalignant lesions presaging the development of full-blown malignancy.
Malignant tumors not only invade surrounding tissue but are also able to colonize other vital organs, a process known as metastasis (Bray and Parkin, 2.
Widespread metastatic disease is a harbinger of imminent death.
Thus, immediate referral to the oncologist after detection of any suspicious lump or symptom is paramount.
Yet in many parts of the world with poor health education, patients present with very advanced disease.
In the same vein, cancer screening programmes are designed to detect not only early asymptomatic malignant tumors but also premalignant lesions (Curado et al, 2.
Even in the richer countries, such programmes are a significant financial burden, and the more cost-effective programmes target the higher-risk groups denoted by age (screening, mammography, and colonoscopy) or occupation (blood in the urine of dye workers for bladder cancer) (Mills et al, 2.
Classifying Cancerous Symptoms and Tumors
Cancer often presents as tumors that are either benign or malignant.
Benign tumors are slow-growing expansive masses that compress rather than invade surrounding tissue.
As such, they generally pose little threat, except when growing in a confined space like the skull, and can usually be readily excised.
Many so-called benign tumors have malignant potential,
especially those occurring in the large intestine.
They should be removed before malignancy develops. Conversely, malignant tumors usually grow rapidly, invading surrounding tissue and colonizing distant organs.
The ability of tumor cells to detach from the original mass (the primary tumor) and set up a metastasis (secondary tumor) discontinuous with the primary is unequivocal proof of malignancy.
Tumors are also classified according to their tissue of origin. Recognition of the parent tissue in a lymph node metastasis could establish the location of a hitherto undiagnosed primary tumor (Malcolm, 2001; Ikwo, 2.
Cancers may be classified by their primary site of origin or by their histological or tissue types, namely: (a) by site of origin, which yields breast, lungs, prostate, liver, carcinoma (kidney cancer), oral, brain etc, (b) by tissue types into sarcoma, carcinoma, myeloma, leukemia, lymphoma etc, (c) by grade, the abnormality of the cells with respect to surrounding normal tissues, and (d) by stage, depending on the tumor size and its spread.
Cancer is caused by damage to DNA.
Such damage can be inherited from parents, or may arise spontaneously during the lifetime of a person or patient.
This process is usually referred to as mutation (Ananya, 2.
Motivation of The Study
The study is motivated as follows (Stolfo et al, 2015; Ojugo and Eboka, 2021; Ojugo and Oyemade, 2.
Unavailability of datasets and censored results make cancer prediction studies difficult to assess.
In some instances, the datasets have also been found to contain ambiguities, partial truths and noise that must be resolved via robust search in order to classify observations and expected values effectively.
The unreliable performance associated with selecting network parameters, mismatched feats and anomalies has been attributed to, and triggered by, non-optimized data and lack of … These have resulted in various incorrect predictions and allowed some cases to evade detection at diagnosis.
Eliminating noisy feats via an accurately optimized classifier will thus foster a more efficient model and algorithm to aid cancer prediction studies.
Cancer cases persist despite the adoption of the several classifiers available.
Thus, we need to explore parameter selection.
The significance of a unified model capable of addressing both optimization problems and machine learning has not been explored, hence the need to explore such model unification.
To overcome these pitfalls, we use a memetic algorithm to detect problematic cancer cases using a KDD dataset generated from the Epidemiology Department of the Federal Medical Centre, Asaba, in Delta State of Nigeria.
Materials and Methods
Cancer Around The World
A machine learning model requires the necessary instances as a dataset to train it effectively. These instances must be appropriately labeled so as to minimize error rates.
The errors indicate how far a model has progressed in its training and how well it is able to mine the data features of interest.
Errors in classification can result from how the data points are grouped when used to predict new objects in a class (Ojugo and Eboka, 2.
Our dataset is retrieved from the Federal Medical Centre Epidemiology Laboratory in Asaba, Delta State.
It contains 4687 cancer cases with 34 attributes, including patient bio-data, symptoms the patient suffers from, HIV and other tests, history of the disease, diagnostic tools used, treatment (regimens for the type of disease and doses given, with drug reactions), follow-up results for the whole treatment period, and the costs and hospitalization paid.
However, the attributes that are likely to affect patient behavior towards treatment include: chemotherapy completed, treatment failed, treatment discontinued, death, and transferred.
We use the fuzzy (rule-based) universe of discourse, which was used to generate the rules for the proposed fuzzy discourse equation as Eq. 1 (Ojugo and Otakore, 2.
Fuzzy Discourse = …    (1)
where A, B, C, D, E = picked question options, each with an assigned fuzzy range value, while X = unpicked option.

Table 1. Fuzzy Encoded Universe Discourse For Cancer Dataset
Code    Fuzzy Set (Parameter)
P01     Tumor/Lump
P02     Bleeding / hormone imbalances
P03     Mouth Ulcer and skin sores
P04     Pain and difficulty swallowing
P05     Weight loss and poor appetite
P06     Fatigue and depression
P07     Sweats, burps and tastelessness
P08     Nausea and vomiting
P09     Diarrhea
P10     Constipation
P11     Abdominal bloating
P12     Neurological symptoms
P13     Cardio-vascular symptoms
P14     Respiratory Symptoms
Membership Function (Degree of Cancer): Malignant (Type-…), Benign (Type-II)

Hybrid Memetic Algorithm (Genetic Algorithm Trained Neural Network)
Our Modular Neural Network (MNN) as detailed in (Ghazale, 2.
is an improved deep learning neural network whose learning features an independent series of intermediary components, each forming a module operating under a certain architecture.
These intermediaries act as bridges that receive the individual network module outputs as input and help compute the final output, which is resolved via a tangent activation function.
MNN seeks to reduce a large network into potentially smaller, more manageable networks (Aleskerov et al, 1.
It enhances efficiency via connected units that increase exponentially as independent networks are added.
While this complicates the network structure, it improves computational efficiency through reduced computational time on the individual tasks assigned to segmented modules, and tasks are executed in parallel, with module re-organization to improve flexibility and network adaptability (Bolton and Hand, 2.
The network enhances intelligence and increases time efficiency by reducing the network's learning time, achieved via independent training algorithms applied at each module, with the training dataset implemented independently and more quickly.
This makes the model more flexible, adaptable and robust, as rules can be re-used independently at various … Re-usability of rules has been a tedious experience with such large and complex networks. With appropriate data encoding and carefully selected feats, the network experiences improved performance, compartmentalization via removal of partitions of interfaces, greater flexibility and elimination of redundancy (Brause et al, 1.
Thus, our MNN architecture is composed of smaller networks whose modularization allows for easy learning and understanding of data feats, grants the model greater flexibility through parallel task execution and compartmentalization, and eases code reuse and adaptability (Burge and Shawe-Taylor, 2.
MNN passes data via task decomposition and training modules via a multi-objective, multi-agent and multi-region support module that aids effective classification.
MNN can be implemented using the multi-layered perceptron, adaptive resonance theory and self-organizing maps.
The network is trained via supervised, unsupervised or reinforcement learning (Chiu and Tsai, 2.
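The modular idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-neuron modules, the three-way split of the 14 inputs, the random weights, and the names `make_module` and `modular_forward` are all assumptions made for illustration. Each module sees only its own slice of the features, and an intermediary unit combines the module outputs through a tangent (tanh) activation.

```python
import math
import random


def make_module(n_inputs, rng):
    """Return a tiny one-neuron module: weighted sum passed through tanh."""
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]

    def module(x):
        return math.tanh(sum(w * xi for w, xi in zip(weights, x)))

    return module


def modular_forward(modules, slices, integrator_weights, features):
    """Feed each module its own slice of the features, then combine the
    module outputs in an intermediary unit with a tanh activation."""
    outputs = [m(features[s]) for m, s in zip(modules, slices)]
    combined = sum(w * o for w, o in zip(integrator_weights, outputs))
    return math.tanh(combined)


rng = random.Random(0)
# Hypothetical task decomposition: 14 symptom inputs split over 3 modules.
slices = [slice(0, 5), slice(5, 10), slice(10, 14)]
modules = [make_module(s.stop - s.start, rng) for s in slices]
integrator_weights = [rng.uniform(-1.0, 1.0) for _ in modules]
features = [rng.random() for _ in range(14)]
score = modular_forward(modules, slices, integrator_weights, features)
print(score)
```

Because the modules share no weights, each could in principle be trained separately on its own slice, which is the source of the parallelism the text attributes to the MNN.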
Our hybrid is divided into 3 units: (a) a supervised genetic algorithm, (b) an unsupervised Kohonen neural network, and (c) a knowledgebase, as seen in figure 1 and described below.
The modular design in figure 1 shows that input is received and passed via the GA-block (consisting of encoder, selector, swapper recombiner, swapper mutator, and lastly, belief terminator for CGA).
Each phase performs an integral GA fundamental operator process to have the dataset trained.
Upon completion of the optimization, the dataset feats are held within the knowledgebase, a special holding place for operational data during the machine learning process (Ojugo and Yoro, 2021; Ojugo and Eboka, 2.
The MNN block receives the optimized rule dataset, grouped as successive labeled/unlabeled transaction instances as in fig 1 (Ojugo and Otakore, 2021; Aleksey and Alexander, 2016; Phua et al, 2004; Stolfo et al, 1997).
With this, our classifier propagates IF-THEN transaction rule values of selected, predefined variables into the varying classes for detection.
Rules are modeled as a production system with 4 components: (a) a rule set containing, in each rule, the pattern of how rules and operations are applied, (b) a knowledgebase of the transaction rule-set of (genuine and cancer classes) IF-THEN rules as selected data feats, (c) a control strategy that specifies the order in which the rules are compared to those in the knowledgebase to find a match, and that resolves the conflicts arising when several rules are matched at the same time, and (d) a rule applier.
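The four components listed above can be sketched as a very small production system. This is a hedged illustration only: the two rules, the "most specific rule wins" conflict resolution, and the names `Rule`, `control_strategy`, `apply_rule` and `classify_case` are assumptions, not the rules or strategy used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # symptom codes that must all be present
    label: str             # class assigned when the rule fires


# (a) Rule set: two hypothetical rules over the Table 1 symptom codes.
RULE_SET = [
    Rule("tumor_only", frozenset({"P01"}), "C1"),
    Rule("tumor_plus_five",
         frozenset({"P01", "P02", "P03", "P04", "P05", "P06"}), "C2"),
]

# (b) Knowledgebase: stores the class assigned to each case seen so far.
knowledgebase = {}


def control_strategy(rules, facts):
    """(c) Match rules in list order; on conflict, prefer the rule with
    the most conditions (an assumed resolution policy)."""
    matched = [r for r in rules if r.conditions <= facts]
    return max(matched, key=lambda r: len(r.conditions), default=None)


def apply_rule(rule, case_id, kb):
    """(d) Rule applier: record the outcome in the knowledgebase."""
    label = rule.label if rule is not None else "unclassified"
    kb[case_id] = label
    return label


def classify_case(case_id, facts):
    rule = control_strategy(RULE_SET, frozenset(facts))
    return apply_rule(rule, case_id, knowledgebase)


print(classify_case("case-1", {"P01", "P02"}))
print(classify_case("case-2", {"P01", "P02", "P03", "P04", "P05", "P06"}))
```

In the second call both rules match, and the control strategy resolves the conflict in favour of the more specific one.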
The MNN provides a self-learning ability and acts as the principal component analyzer, with rules optimized by the CGA's crossover and mutation so that the trained network can effectively and autonomously classify transactions into both class types.
The last stage of the network acts as a decision support and recognition system, with predicted values and the automatic update of the rules knowledgebase as transactions with new data are encountered and classified.
Figure 1. Hybrid Learning Ensemble (block labels: Historic Dataset, Pre-processing with Feat Selection, Training, Testing, GA-Block, Encoder, Assigner, Selector, Operator, Swapper, Changer, Terminator, Modular NN, MLP/BPGD, Activation Function, KnowledgeBase (Optimized Dataset), Decision Support System, Input, Output, Stored Result Dataset)
The model is initialized with IF-THEN rules, whose fitness is computed.
30 rules are selected via the tournament method.
The model uses a 2-point crossover that helps it learn the dynamic and non-linear feats in the dataset.
Rules 1 to 30 are randomly generated via a Gaussian distribution, and the rules that correspond to these crossover points are selected (all genes of a single … As new parents contribute to yield a new pool of rules with genes of various parents (applied via mutation), the model selects 3 random genes.
These are then allocated new random values (between 0 and …), which still conform to the model's belief space.
The random values yield a score, generated for both cancer (benign and malignant) and normal rules from the universe discourse equation.
Selection via MNN ensures that the first 3 beliefs are met; mutation ensures the fourth belief is met.
Its influence function determines the number of mutations, how close a solution is, and its impact on how the algorithm is …
The model stops when the best rule has a fitness that equals the suspicion score or is higher than the computed fitness function of transactions by each cardholder (Stolfo and Prodromidis, 1999; Syeda et al, 2002; Vasta et al, 2005; Wheeler and Aitken, 2019; Minahan; Xu et al, 2.
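The genetic operators named above (tournament selection, 2-point crossover, and mutation of 3 random genes) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' cultural GA: the 14-gene encoding, the placeholder `fitness` function, the tournament size of 3, the uniform (rather than Gaussian) sampling, and the gene range of 0 to 1 are all assumed, and the belief space and influence function are omitted.

```python
import random

N_GENES = 14     # assumed: one gene per symptom code P01..P14
POOL_SIZE = 30   # the text selects 30 rules


def fitness(rule):
    """Placeholder fitness (hypothetical): mean gene value."""
    return sum(rule) / len(rule)


def tournament(pop, k, rng):
    """Pick k rules at random and keep the fittest."""
    return max(rng.sample(pop, k), key=fitness)


def two_point_crossover(a, b, rng):
    """Swap the segment between two distinct cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(rule, rng, n=3):
    """Give n randomly chosen genes new random values (assumed range [0, 1))."""
    child = list(rule)
    for idx in rng.sample(range(len(child)), n):
        child[idx] = rng.random()
    return child


def next_generation(pop, rng):
    new_pop = []
    while len(new_pop) < len(pop):
        p1, p2 = tournament(pop, 3, rng), tournament(pop, 3, rng)
        c1, c2 = two_point_crossover(p1, p2, rng)
        new_pop.extend([mutate(c1, rng), mutate(c2, rng)])
    return new_pop[:len(pop)]


rng = random.Random(42)
population = [[rng.random() for _ in range(N_GENES)] for _ in range(POOL_SIZE)]
population = next_generation(population, rng)
print(len(population), max(fitness(r) for r in population))
```

A stopping test such as the one described in the text would wrap `next_generation` in a loop that ends once the best fitness reaches a chosen threshold.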
Result Findings and Discussion
Result Findings
The dataset is divided into a ratio of 40% for training and 60% for testing.
The predictive capability of the model is identified via fifteen-sign abnormalities labelled among GA-optimized (benign/malignant and normal).
The training phase uses a feedforward training algorithm with an epoch training cycle for each phase, until a finite epoch count is reached or an equilibrium is attained.
We obtained an equilibrium at 40 epochs, as shown by the dots in the training phase interface.
Figure 2 shows the training phase interface.
Figure 3 shows the test phase.
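The 40/60 division described above can be sketched as a shuffled split. The function name `split_dataset`, the fixed seed, and the use of integer indices as stand-ins for the 4687 patient records are assumptions for illustration; the text does not say how the records were assigned to each portion.

```python
import random


def split_dataset(records, train_fraction=0.4, seed=0):
    """Shuffle a copy of the records and cut it at the training fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(4687))  # stand-in indices for the 4687 cancer cases
train, test = split_dataset(records)
print(len(train), len(test))
```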
Figure 2. Training Phase Result
Figure 3. Testing Phase Result
From our confusion matrix, we compute: (a) sensitivity, the measure of how likely the model is to predict the presence of cancerous instances (benign and malignant cases) when they are present, (b) specificity, which measures how likely the model is to detect the absence of cancerous instances (benign and malignant cases) when they are not present and/or not exhibited in the dataset, and (c) accuracy, which measures the proportion of true results, seen as the degree of truth of a prediction.
These are given as follows.
$\text{Sensitivity} = [TP / (TP + FN)] \times 100$, where TP = 43 and FN = 5. Thus, we have $[43 / (43 + 5)] \times 100 = 0.895 \times 100 \approx 90\%$.
$\text{Specificity} = [TN / (TN + FP)] \times 100$, for which we have 19%.
$\text{Accuracy} = [(TP + TN) / (TP + TN + FP + FN)] \times 100$, for which we also have 74%.
The model is found to have a sensitivity of 90%, a specificity of 19% and a prediction accuracy of 74%, with a rate of improvement of 12 percent for the inclusion of data that were not originally used to train the model, as in figure 4.
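The three measures can be checked with a short sketch. TP = 43 and FN = 5 come from the text. TN = 3 and FP = 11 are assumptions, chosen only because they reproduce the reported 74% accuracy; with them the specificity comes out at about 21% rather than the reported 19%, so they should not be read as the study's actual counts.

```python
def rates(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


TP, FN = 43, 5   # stated in the text
TN, FP = 3, 11   # ASSUMED values, not confirmed by the text

sensitivity, specificity, accuracy = rates(TP, FN, TN, FP)
print(f"sensitivity={sensitivity:.1f}% "
      f"specificity={specificity:.1f}% accuracy={accuracy:.1f}%")
```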
Figure 4. Statistical Analyses for the Model Performance (bars: sensitivity, specificity, accuracy)
Discussion of Findings
If a patient exhibits at most three symptoms of a class of cancer alongside the tumor or lump, THEN (C1) it is benign.
Else, if a patient exhibits four or more symptoms of a class, it implies (C2) that though it may be benign for now, it is likely to become malignant; whereas, if the patient exhibits more than six of the symptoms of a class of cancer, THEN (C3) it is malignant.
The IF-THEN rules generated from the fuzzy partitions for classifying the benign and malignant cases are as follows:
R1: IF patient exhibits tumor or lump and it is serious THEN class C1.
R2: IF patient exhibits tumor or lump and shows bleeding signs and/or hormonal imbalance and both symptoms are serious THEN class C1.
R3: IF patient exhibits tumor/lump, with bleeding and/or hormonal imbalance and mouth ulcer and all symptoms are serious THEN class C1.
R4: IF patient exhibits tumor/lump, with bleeding and/or hormonal imbalance, ulcer and difficulty swallowing and all symptoms are serious THEN class C1.
R5-R6: IF patient exhibits tumor/lump, bleeding/hormonal imbalance, mouth ulcer, difficulty swallowing with weight loss and loss of appetite, as well as fatigue/depression and all symptoms are serious THEN class C2.
R7-R14: IF patient exhibits tumor or lump, hormonal imbalance and bleeding, ulcer, difficulty swallowing, weight loss and loss of appetite, fatigue/depression, sweats/burps, nausea, diarrhea, constipation, abdominal bloating, cardio-vascular symptoms, neurological symptoms, respiratory symptoms and all symptoms are serious THEN class C3.
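Read together, rules R1 to R14 amount to thresholds on how many serious symptoms accompany the tumor or lump. The sketch below encodes that reading and is a simplification made for illustration: it counts serious symptoms rather than checking the exact cumulative combinations each rule lists, and the function name `classify_by_count` and the `None` result for cases without a tumor or lump are assumptions.

```python
def classify_by_count(serious_symptoms):
    """Map a set of serious symptom codes (P01..P14) to a class label.

    Counting P01 (tumor/lump) itself:
      1 to 4 serious symptoms -> C1 (rules R1 to R4)
      5 to 6 serious symptoms -> C2 (rules R5 to R6)
      7 or more               -> C3 (rules R7 to R14)
    """
    symptoms = set(serious_symptoms)
    if "P01" not in symptoms:
        return None  # every rule requires the tumor or lump
    n = len(symptoms)
    if n <= 4:
        return "C1"
    if n <= 6:
        return "C2"
    return "C3"


print(classify_by_count({"P01", "P02", "P03"}))
print(classify_by_count({"P01", "P02", "P03", "P04", "P05"}))
```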
Summary and Conclusion
Our hybrid memetic algorithm employed the fuzzy universe discourse linguistics and a fuzzy system as a preprocessor.
In designing, building and implementing such a hybrid, we took cognizance that the genetic algorithm helps speed up the ANN and keeps it from being trapped at local maxima as well as in regions of multi-modal local maxima.
This enables the model to yield robust optima in the shortest amount of time.
The fuzzy system helps to better represent variables and data values in the model.
Hybrids are quite difficult to implement and explore, even though they always yield optimal and better solutions.
However, care must be taken during parameter selection to avoid over-fitting, over-parameterization and over-training.
Also, the correctly formatted historic dataset must be encoded through the underlying algorithm's structured learning for robustness and code reuse, as well as to allow for the model's adaptability and … This will in turn help to address the inherent issues of statistical dependencies imposed on the model by the various models fused for hybridization.
However, proper encoding schemes must be selected to help resolve the conflicts in the data feats of interest, as most systems may not adequately highlight the implications of such in a multi-agent
and multi-modal populated model.
This is because the agents, as they traverse the network or system, can often create their own behavioural rules on the dataset used, so that in most cases they display results of complex chaos, non-linearity and dynamism (as expected) of the underlying probabilities of the data feats of interest.
To curb this, we employed CulturalGA, which ensures via its belief functions that all conditions needed to yield a better generation are met, with the processes of crossover and mutation applied.
References