APTISI Transactions on Technopreneurship (ATT), Vol. …, No. …, July 2025, pp. 371-386
E-ISSN: 2656-8888 | P-ISSN: 2655-8807
DOI: 10.…
Leveraging a Hybrid Machine Learning Model for Enhanced Cyberbullying Detection
Fenny Syafariani1*, Muhamad Safiih Lola2, Sharifah Sakinah Syed Abd Mutalib3, Wan Nuraini Fahana Wan Nasir4, Abdul Aziz K. Abdul Hamid5, Nurul Hila Zainuddin6
1,2,3,4,5 Faculty of Computer Science and Mathematics, Universiti Malaysia Terengganu, Malaysia
6 Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Malaysia
1 r.syafariani@email.…id, 2 safiihmd@umt.…my, 3 s.sakinah@umt.…my, 4 wannuraini.fahana@umt.…, 5 abdulazizkah@umt.…my, 6 nurulhila@fsmt.…
*Corresponding Author
ABSTRACT
Cyberbullying is a form of bullying that occurs through digital technology on social media platforms. The issue has become critical, particularly when it involves racial statements that can threaten community harmony. Many researchers worldwide are developing solutions for automatic hate speech and cyberaggression detection using different machine learning models. This study introduces a novel hybrid method for detecting cyberbullying that combines Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA), collectively referred to as SVM-LDA. The methodology integrates the SVM and LDA techniques into a single classifier. The model's efficiency was assessed using various metrics, offering a comparative analysis of the hybrid model against individual machine learning models. The results show that the proposed hybrid model achieved 96.1% accuracy and outperformed single machine learning models on the Twitter dataset. The hybrid model also demonstrated robustness in handling imbalanced classes. The proposed SVM-LDA hybrid approach therefore shows significant potential for detecting cyberbullying, even under class imbalance.
This model offers a more robust solution than traditional single machine learning models in detecting cyberbullying.

Article history: Submitted November 12, 2024; Revised April 9, 2025; Accepted April 25, 2025; Published April 30, 2025
Keywords: Machine Learning; Cyberbullying Detection; Hybrid SVM-LDA; Digital Technology
This is an open access article under the CC BY 4.0 license.
DOI: https://doi.org/10.34306/att.…
© Authors retain all copyrights.

INTRODUCTION
Recently, cyberbullying has become a major issue worldwide, especially among young people who frequently use the internet and social media platforms such as Facebook, Twitter, and Instagram.
This phenomenon involves using electronic communication to insult, intimidate, or threaten others, causing significant psychological harm. Cyberbullying attacks take various forms, such as spreading false information, sharing private content without permission, posting offensive comments, sending threats, or impersonating others. The internet's anonymity and wide reach help cyberbullies harass their victims, which can cause severe emotional distress and, in extreme cases, lead to suicide.
To address this growing problem, researchers have explored methods to detect and prevent cyberbullying. These efforts are strongly relevant to the Sustainable Development Goals (SDGs), especially in creating a safer and more inclusive digital environment, which supports Goal 4 (Quality Education) and Goal 16 (Peace, Justice and Strong Institutions). A powerful approach is to leverage machine learning algorithms capable of analyzing vast datasets to uncover relationships and patterns. However, detecting cyberbullying remains a challenging task, and relying on a single machine learning algorithm is not enough.

Journal homepage: https://att.…id/index.php/att
In this research, we present an innovative hybrid model that combines SVM with LDA for detecting cyberbullying in online text.
LDA is a popular statistical tool that, in several applications such as remote sensing and violence detection, often outperforms more sophisticated modern machine learning techniques. Discriminative classifiers aim to create a decision boundary that best separates different classes. LDA is attractive because it has low model complexity and can capture key data characteristics (mean and covariance) from limited training data, which it then uses to estimate the decision boundary. It is also often used as a feature reduction technique in the preprocessing step of classification and machine learning applications. SVMs, in turn, function as discriminative classifiers that rely on a local separation index known as the margin.
LITERATURE REVIEW
Cyberbullying on social media platforms, especially Twitter (now rebranded as X), poses a significant problem because of its detrimental effects on individuals, particularly young people who frequently use these platforms. These platforms facilitate harassment, threats, and humiliation, which can cause considerable emotional and psychological damage to victims. Detecting cyberbullying is challenging because it involves various considerations, including the language of online interactions, the identities of the message sender and recipient, and other factors. Relying on a single machine learning algorithm may not be adequate for detecting all forms of cyberbullying: some algorithms may excel at identifying specific types of cyberbullying, while others are more effective with different kinds. The literature presents a variety of machine learning methods, including Naïve Bayes (NB), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM).
The NB classifier is a probabilistic supervised learning method that uses statistics from the training data to estimate how likely an item is to belong to a given class. It is commonly applied in text classification, sentiment analysis, spam filtering, and recommendation systems. It assumes that, conditioned on the target class, the features (or attributes) are independent; in other words, given the class variable, the value of one feature does not depend on the value of any other feature. Table 1 provides a comprehensive comparison of the techniques and evaluation metrics used in previous studies in this domain.
Table 1. Literature review
Reference | Proposed Models | Dataset | The Best Accuracy
[…] | Random Forest (RF), SVM, Decision Tree (DT), and NB | Obtained from Twitter | RF with Term Frequency-Inverse Document Frequency (TF-IDF)
[…] | NB, SVM, Logistic Regression (LR), and XGBoost | Hate Speech Dataset obtained from an Association for Computational Linguistics GitHub |
[…] | Gated Recurrent Units (GRU), Convolutional Neural Network (CNN), and Hybrid CNN-GRU | Obtained from Twitter |
[…] | NB, Multilayer Perceptron (MLP), SVM, and AdaBoost (AB) | Obtained from Twitter |
[…] | SVM, RF, and Recurrent Neural Network (RNN) | Obtained from Twitter |
[…] | XGBoost, SVM, LR, NB, and Feed Forward Neural Network (FFNN) | Obtained from Reddit, YouTube, Twitter, and Wikipedia |
[…] | NB, RF, DT, SVM, and Deep Neural Network (DNN) | Obtained from Twitter |
[…] | Long Short-Term Memory (LSTM), NB, DT, LR, SVM, RF, and Hybrid LSTM-CNN | Obtained from Twitter and Facebook |
[…] | RF, SVM, NB, RNN, CNN, and Hybrid RF-CNN | Obtained from Twitter and Instagram |
Reference […] presents an advanced machine learning system designed to automatically identify hate speech on Arabic social media platforms. The system captures various emotion types and employs diverse feature sets for analysis. Four machine learning algorithms (SVM, NB, RF, and DT) are applied using emotion-related, profile-related, and TF-IDF features. Among the tested models, RF with profile-related and TF-IDF features achieves the highest accuracy, 91.3%.
Similarly, Reference […] focuses on classifying both fake news and hate speech by extracting features from content labeled as real or fake news. The study employs XGBoost, Naive Bayes (NB), and Logistic Regression (LR) with TF-IDF features. XGBoost achieves an accuracy of 83.0%, meaning that about 17% of all samples are misclassified. The model also attains a precision of 82%, meaning that about 18% of the samples it flags as hateful are not actually hateful.
In Reference […], hate speech in the Saudi Twitter sphere is explored through various deep learning methods. Experiments are conducted on two datasets using GRU, CNN, a hybrid CNN-GRU, and BERT.
Reference […] explores the automatic detection of racism and hate speech in Indonesian tweets using several machine learning models: Naive Bayes (NB), Support Vector Machine (SVM), AdaBoost (AB), and Multi-Layer Perceptron (MLP). To mitigate class imbalance, the study applies the Synthetic Minority Oversampling Technique (SMOTE) and runs experiments with and without SMOTE. The MLP model using SMOTE features achieves an accuracy of …4%, while the AdaBoost and Naive Bayes models, using non-SMOTE data, attain an accuracy of roughly 71%.
Reference […] focuses on identifying hate speech in social media content. In this study, audio was extracted from various videos and then converted into text using a speech-to-text converter. The experiments use Recurrent Neural Network (RNN), Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB) models. Two experimental settings are used: the first classifies the movies as normal or hostile, while the second divides them into normal, racist, and sexist categories.
Reference […] introduces a system designed to identify hate speech across various social media platforms, including Twitter, YouTube, Wikipedia, and Reddit. The system uses a comprehensive dataset in which only 20% of the data is labeled as hateful and 80% as non-hateful. The study evaluates multiple machine learning algorithms, such as FFNN, LR, XGBoost, NB, and SVM, and finds that XGBoost achieves the highest accuracy, roughly 92%.
Similarly, Reference […] examines hate speech related to Islam on social media. This research develops an automated tool that classifies content into strong Islamophobic, weak Islamophobic, and non-Islamophobic categories. Several machine learning algorithms, namely RF, NB, DT, LR, DNN, and SVM, are tested, with SVM achieving 74.6% accuracy under 10-fold cross-validation.
A recent study, Reference […], applies a hybrid approach combining LSTM and CNN for text classification tasks. It compares several machine learning algorithms, including NB, DT, LR, SVM, RF, and LSTM, and finds that the LSTM-CNN hybrid model surpasses all others in accuracy. Reference […] proposes another hybrid model for detecting hate speech on social media, exploring machine learning algorithms such as RF, SVM, NB, RNN, CNN, and a hybrid RF-CNN model. The hybrid RF-CNN model achieves the highest accuracy, roughly 98%.
Building on the success of these hybrid models, this research uses the SVM-LDA hybrid approach to improve the detection of racist comments on Twitter. In contrast to hybrid approaches such as CNN-LSTM and RF-CNN, the SVM-LDA model combines two classical, statistically grounded techniques. SVM is an effective machine learning method for classification problems, especially with imbalanced data, because it separates the classes with a maximum margin. LDA is used to reduce the dimensionality of the data while retaining the features that distinguish the classes.
RESEARCH METHOD
In this research, three baseline models, namely NB, SVM, and LDA, were investigated for detecting cyberbullying. We then introduce a hybrid SVM-LDA model. The details are given in the following subsections.
Naïve Bayes (NB)
The NB algorithm is a classification technique grounded in statistical and probabilistic principles and built on the theorem introduced by the British scientist Thomas Bayes. As a machine learning model, it applies Bayes' theorem to predict outcomes from past data. A key characteristic of the NB classifier is its strong yet simplistic assumption that each feature is independent of the others given the class. The dataset has a label, class, or target as a reference. In an NB classifier, learning is the process of estimating these probabilities from the training data. The NB algorithm is based on the following equation:
$$P(H \mid x) = \frac{P(x \mid H)\, P(H)}{P(x)}$$
To explain the equation, the data point x belongs to an unknown class, and P(x) is the probability of x. P(H) denotes the prior probability of hypothesis H, while P(x | H) is the likelihood of x given H. Finally, P(H | x) is the posterior probability of hypothesis H given x.
For classification, the following rule determines the appropriate class:
$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
Here, the posterior is the probability of the class given the observed sample, the prior is the probability of the class before the sample is observed, the likelihood is the probability of observing the sample's features within the class, and the evidence is the overall probability of observing those features. The NB algorithm consists of several stages: first, the relative frequency of each class, P(H), is computed; next, the conditional probability of each feature given each class, P(x | H), is calculated; then these probabilities are multiplied for each class; and finally, the results are compared across classes. The NB classifier assigns each test instance to the class with the highest posterior probability.
Each document is represented by a set of attributes x_1, x_2, …, x_n, where x_1 corresponds to the first word, x_2 to the second, and so on, and each document belongs to one of the tweet categories V_j. During classification, the algorithm seeks the category with the highest posterior probability (V_MAP) for the document being tested:
$$V_{MAP} = \arg\max_{V_j \in V} \frac{P(x_1, x_2, \ldots, x_n \mid V_j)\, P(V_j)}{P(x_1, x_2, \ldots, x_n)}$$
Because P(x_1, x_2, …, x_n) is the same for all categories V_j, it can be dropped:
$$V_{MAP} = \arg\max_{V_j \in V} P(x_1, x_2, \ldots, x_n \mid V_j)\, P(V_j)$$
Under the conditional-independence assumption, the equation simplifies to:
$$V_{MAP} = \arg\max_{V_j \in V} P(V_j) \prod_{i=1}^{n} P(x_i \mid V_j)$$
Here, V_j is the tweet category, with j = 1, 2, …, n. In this research, V_1 is the category of tweets with negative sentiment, V_2 the category with positive sentiment, V_3 the neutral category, and V_4 the question category. P(x_i | V_j) is the probability of x_i in category V_j, and P(V_j) is the probability of V_j. Both are calculated from the training data as follows:
$$P(V_j) = \frac{|docs_j|}{|Examples|}, \qquad P(x_i \mid V_j) = \frac{n_k + 1}{n + |Vocabulary|}$$
Here, |Examples| is the total number of documents in all categories, and |docs_j| is the number of documents in category j. Additionally, n is the total number of word occurrences in category j, n_k is the number of times word x_i occurs in category j, and |Vocabulary| is the number of distinct words across all categories.
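The training and classification rules above can be sketched in plain Python as a multinomial Naïve Bayes classifier. This is a minimal illustration, not the authors' implementation: it assumes already-tokenized documents and the add-one (Laplace) smoothing of the P(x_i | V_j) estimate, and the function names `train_nb` and `predict_nb` and the toy corpus are hypothetical. Log-probabilities replace the raw product in V_MAP to avoid numerical underflow on long documents.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate P(V_j) and P(x_i | V_j) with add-one (Laplace) smoothing."""
    n_docs = len(docs)                          # |Examples|
    vocab = {w for doc in docs for w in doc}    # Vocabulary
    priors, likelihoods = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / n_docs    # |docs_j| / |Examples|
        counts = Counter(w for d in class_docs for w in d)
        n = sum(counts.values())                # word occurrences in class c
        # (n_k + 1) / (n + |Vocabulary|) for every vocabulary word
        likelihoods[c] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return priors, likelihoods, vocab

def predict_nb(doc, priors, likelihoods, vocab):
    """V_MAP = argmax_j [log P(V_j) + sum_i log P(x_i | V_j)]."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in doc:
            if w in vocab:                      # skip words unseen in training
                score += math.log(likelihoods[c][w])
        scores[c] = score
    return max(scores, key=scores.get)

# Toy, hypothetical corpus of preprocessed tokens.
docs = [["muslim", "mob", "violence"], ["steal", "property", "mob"],
        ["nice", "weather", "today"], ["great", "match", "today"]]
labels = ["racism", "racism", "non-racism", "non-racism"]
priors, likelihoods, vocab = train_nb(docs, labels)
print(predict_nb(["mob", "violence"], priors, likelihoods, vocab))  # racism
```

Here "mob" and "violence" occur only in the racism documents, so their smoothed likelihoods (3/16 and 2/16) exceed those in the non-racism class (1/16 each), and the equal priors leave the decision to the likelihoods.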
Support Vector Machine (SVM)
The Support Vector Machine (SVM) is a deterministic binary classifier based on linear functions in a high-dimensional feature space. It separates the data with a decision boundary defined by a subset of the training vectors, the support vectors. The SVM framework relies on optimization algorithms and follows the principle of Structural Risk Minimization, which seeks the optimal hyperplane separating the two classes in the input data:
$$\min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \quad \forall i = 1, \ldots, n \tag{8}$$
The optimization problem in Equation 8 can be solved using Quadratic Programming with Lagrange multipliers α_i, which gives $w = \sum_{i=1}^{n} \alpha_i y_i x_i$; only the α_i of data points that satisfy the constraint in Equation 8 with equality are non-zero. These data points are known as support vectors.

Linear Discriminant Analysis (LDA)
The Linear Discriminant Analysis (LDA) classifier, frequently employed in supervised classification tasks, also serves as a dimensionality reduction technique.
This method, widely applied in statistics and various other fields, identifies a linear combination of features that distinguishes or separates objects or events across two or more classes. It is most commonly used for feature extraction in pattern classification problems. Simply put, dimensionality reduction techniques are important in machine learning applications because they reduce the number of dimensions (that is, variables) of a dataset while retaining most of its information. LDA has been used effectively in numerous domains, including face recognition, text categorization, and gene microarray analysis. The classical LDA method seeks an optimal transformation that minimizes the spread within each class while maximizing the separation between classes, leading to effective discrimination.
Mathematically, this involves solving an optimization problem for the direction $w^* \in \mathbb{R}^d$:
$$w^* = \arg\max_{w} \frac{w^T S_b w}{w^T S_w w}$$
where the between-class covariance S_b and the within-class covariance S_w are defined as follows:
$$S_b = (m_1 - m_2)(m_1 - m_2)^T, \qquad S_w = \sum_{i \in \{1, 2\}} \sum_{x \in C_i} (x - m_i)(x - m_i)^T \tag{10}$$
Here, m_i is the empirical mean of class C_i. Eigendecomposition of the matrix $S_w^{-1} S_b$ yields the discriminant direction: the eigenvector associated with the largest eigenvalue determines w*. Ignoring the scaling factor, w* can be expressed as follows:
$$w^* = S_w^{-1}(m_1 - m_2)$$
A common issue is that S_w may be a singular matrix. To address this weakness, one approach is to add a diagonal matrix (a small scalar λ multiplied by the identity matrix) to S_w. The discriminant direction then becomes:
$$w^* = (S_w + \lambda I)^{-1}(m_1 - m_2)$$
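The regularized LDA direction can be sketched in a few lines of NumPy. This is a minimal illustration assuming labels in {+1, -1}, a small hypothetical ridge value `lam=1e-3`, and synthetic two-dimensional data; `lda_direction` and `fisher_ratio` are hypothetical helper names. The second function evaluates the Fisher criterion, so one can check that w* scores at least as well as arbitrary directions such as the coordinate axes.

```python
import numpy as np

def lda_direction(X, y, lam=1e-3):
    """w* = (S_w + lam * I)^{-1} (m1 - m2) for labels y in {+1, -1}."""
    X1, X2 = X[y == 1], X[y == -1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum over classes of (x - m_i)(x - m_i)^T
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # The ridge term lam * I keeps the system solvable if S_w is singular
    return np.linalg.solve(S_w + lam * np.eye(X.shape[1]), m1 - m2)

def fisher_ratio(w, X, y):
    """Fisher criterion (w^T S_b w) / (w^T S_w w)."""
    X1, X2 = X[y == 1], X[y == -1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_b = np.outer(m1 - m2, m1 - m2)
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return (w @ S_b @ w) / (w @ S_w @ w)

# Synthetic two-class data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 0.0], [1.0, 0.3], size=(50, 2)),
               rng.normal([0.0, 0.0], [1.0, 0.3], size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

w_star = lda_direction(X, y)
print(w_star, fisher_ratio(w_star, X, y))
```

With `lam` close to zero the direction is essentially S_w^{-1}(m_1 - m_2); a larger `lam` shrinks it toward the plain mean difference m_1 - m_2.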
Proposed SVM-LDA Classifier (SVM-LDA)
In this section, we explain the SVM-LDA algorithm. We first discuss the case in which the data are assumed to be linearly separable and then address the case in which they cannot be separated linearly.

Linearly Separable Data Cases
The goal of SVM is to find a hyperplane $f(x) = w^T x + b$ that divides the data into two classes (cyberbullying and non-cyberbullying). The hyperplane is defined by the weights w and bias b, and it should separate the classes so as to maximize the margin (the distance between the hyperplane and the nearest data points). The objective function of this model is as follows:
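$$\min_{w \neq 0,\, b} \ \frac{1}{2} w^T (S_w + \lambda I) w \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \quad \forall i = 1, \ldots, n \tag{13}$$
This equation is the objective function of the SVM-LDA model: instead of the plain norm $\|w\|^2$, it minimizes a quadratic form in w weighted by the within-class covariance matrix plus a scaled identity matrix, so that directions with a large within-class spread are penalized, as in LDA, while λI keeps the problem well posed. Here, S_w is the within-class covariance matrix from Equation 10, and I is the p × p identity matrix.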
From the equation above, we can derive:
$$w^T S_w w = w^T \Big( \sum_{i=1,2} \sum_{x \in C_i} (x - m_i)(x - m_i)^T \Big) w = \sum_{i=1,2} \sum_{x \in C_i} \big( w^T (x - m_i) \big)^2$$
This expression measures the spread of the projected data points within each class. Here, m_i is the mean of class i, and x is a data point; the term $w^T(x - m_i)$ measures how far each projected data point lies from its class mean.
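As in SVM, Equation 13 can be solved using Quadratic Programming with Lagrange multipliers α_i, where $\Sigma = S_w + \lambda I$:
$$L(w, b, \alpha) = \frac{1}{2} w^T \Sigma w - \sum_{i=1}^{n} \alpha_i \big[ y_i (w^T x_i + b) - 1 \big]$$
Setting the derivatives with respect to w and b to zero gives $w = \Sigma^{-1} \sum_{i=1}^{n} \alpha_i y_i x_i$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$. Substituting these back yields the corresponding dual form, which integrates both the SVM and LDA approaches:
$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j \, x_i^T \Sigma^{-1} x_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, \ldots, n$$
The hyperplane function for the SVM-LDA combination is given by:
$$f(x) = \sum_{i=1}^{n} y_i \alpha_i \, x_i^T \Sigma^{-1} x + b = 0$$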
The SVM-LDA formulation in Equation 13 is equivalent to the following formulation and, as discussed above, can therefore be solved efficiently with the general SVM approach. This alignment with the standard SVM method simplifies its application.
$$\min_{\tilde{w} \neq 0,\, b} \ \frac{1}{2}\|\tilde{w}\|^2 \quad \text{s.t.} \quad y_i(\tilde{w}^T \tilde{x}_i + b) \ge 1, \quad \forall i = 1, \ldots, n \tag{17}$$
where
$$\tilde{w} = \Sigma^{1/2} w \tag{18}$$
$$\tilde{x}_i = \Sigma^{-1/2} x_i, \quad i = 1, \ldots, n \tag{19}$$
$$\Sigma = S_w + \lambda I \tag{20}$$
Substituting Equations 18, 19, and 20 into Equation 17 yields Equation 13, since $\|\tilde{w}\|^2 = w^T \Sigma w$ and $\tilde{w}^T \tilde{x}_i = w^T x_i$.
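The substitution can be checked numerically. The sketch below, assuming NumPy and a random symmetric positive-definite stand-in for S_w (hypothetical data, not from the study), builds Σ^{1/2} and Σ^{-1/2} by eigendecomposition and confirms that the transformed problem reproduces both the objective and the decision values of Equation 13. The helper name `sym_power` is hypothetical.

```python
import numpy as np

def sym_power(S, p):
    """S^p for a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**p) @ vecs.T     # V diag(vals^p) V^T

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
S_w = A @ A.T                            # hypothetical stand-in for the scatter matrix
lam = 0.1
Sigma = S_w + lam * np.eye(4)            # Equation 20
w = rng.normal(size=4)
x = rng.normal(size=4)

w_tilde = sym_power(Sigma, 0.5) @ w      # Equation 18
x_tilde = sym_power(Sigma, -0.5) @ x     # Equation 19

# Decision values agree: w~^T x~ = w^T x
print(np.isclose(w_tilde @ x_tilde, w @ x))          # True
# Objectives agree: ||w~||^2 = w^T Sigma w
print(np.isclose(w_tilde @ w_tilde, w @ Sigma @ w))  # True
```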
Cases Where the Data Are Not Linearly Separable
In practice, perfectly separable data are rare, and classification problems usually involve non-separable cases. For these, the goal is to maximize the margin while minimizing classification errors, which are represented by slack variables ξ_i; the result is commonly referred to as the soft-margin hyperplane.
The optimization problem can be written as follows:
$$\min_{w \neq 0,\, b,\, \xi} \ \frac{1}{2} w^T (S_w + \lambda I) w + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n$$
In this context, C > 0 is a regularization parameter, and ξ_i is the slack variable associated with data point i, corresponding to its training error. Because this problem is a quadratic program, available SVM software can be used to solve it, as in the linearly separable case.
The goal of this formulation is to minimize the cumulative training error while maximizing the margin. The coefficient C sets the penalty for classification errors, so minimizing $C \sum_{i=1}^{n} \xi_i$ reduces the error during training. As with standard SVM models, the optimization problem can be solved using Quadratic Programming with Lagrange multipliers.
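Because Equations 17-20 reduce SVM-LDA to a standard SVM on the transformed inputs x~ = Σ^{-1/2} x, and the slack variables carry over unchanged, the soft-margin model can be sketched with an off-the-shelf linear SVM. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `SVC` as the quadratic-programming solver, labels in {-1, +1}, illustrative values `lam=1.0` and `C=1.0`, and synthetic data; `fit_svm_lda` and the other helper names are hypothetical. The returned w maps the solution back to the input space, where it should coincide with Σ^{-1} Σ α_i y_i x_i from the dual form.

```python
import numpy as np
from sklearn.svm import SVC

def within_class_scatter(X, y):
    """S_w = sum over classes of sum_x (x - m_i)(x - m_i)^T (Equation 10)."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        D = X[y == c] - X[y == c].mean(axis=0)
        S += D.T @ D
    return S

def inv_sqrt(S):
    """Sigma^{-1/2} for a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T

def fit_svm_lda(X, y, lam=1.0, C=1.0):
    """Soft-margin SVM-LDA sketch: whiten with Sigma^{-1/2}, fit a linear SVM."""
    Sigma = within_class_scatter(X, y) + lam * np.eye(X.shape[1])  # Equation 20
    T = inv_sqrt(Sigma)
    X_tilde = X @ T              # rows are x~_i = Sigma^{-1/2} x_i (T is symmetric)
    clf = SVC(kernel="linear", C=C).fit(X_tilde, y)
    w = T @ clf.coef_.ravel()    # back to input space: w = Sigma^{-1/2} w~
    b = clf.intercept_[0]
    return w, b, clf, T

# Synthetic two-class data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.5, 0.0], [1.0, 0.3], size=(40, 2)),
               rng.normal([-1.5, 0.0], [1.0, 0.3], size=(40, 2))])
y = np.array([1] * 40 + [-1] * 40)

w, b, clf, T = fit_svm_lda(X, y, lam=1.0, C=1.0)
scores = X @ w + b               # f(x) = w^T x + b in the original input space
pred = np.where(scores >= 0, 1, -1)
print("training accuracy:", (pred == y).mean())
```

The design choice here is to whiten once and reuse a mature SVM solver rather than solving the Σ^{-1}-weighted dual directly; both routes describe the same optimum.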
The Data Set
We used a dataset from […], curated by Fatma Elsafoury, that focuses on online bullying and toxicity. It aggregates data from several cyberbullying detection efforts gathered from diverse sources, including YouTube, Kaggle, Wikipedia Talk pages, and Twitter, and consists of texts labeled as either cyberbullying or non-cyberbullying. The dataset contains 13,471 instances, with a focus on racism-related content. These instances span several types of cyberbullying, including hate speech, aggression, insults, and toxicity: 45% of the data relate to hate speech, 30% to insults, 15% to aggression, and the remaining 10% to other forms of toxicity. Table 2 displays a sample of the data.
Table 2. Sample text from the dataset
Text | User | Category
Muslim mob violence against Hindus in Bangladesh continues in 2014. #Islam http://t.co/C1JBWJwuRc | @AAlwuhaib1977 | Hate Speech
@aymannathem As soon as ISIS chased all the minorities out of Mosul, the Sunni Arabs were happy to steal their property. So fuck them. | | Hate Speech/Insult

Preprocessing Data
The collected data are still unstructured, and many sentences are written in a non-standard form. This stage cleans the data by removing extraneous characters, converting all text to lowercase, tokenizing, removing stop words and punctuation, and applying lemmatization and stemming.
Proper preprocessing and cleaning of the documents are essential for effective model training. Of the 13,471 instances, 1,970 are labeled "Racism" and 11,501 "Non-Racism". Table 3 shows two examples of data before and after preprocessing. The raw text contains punctuation marks such as periods (.) and commas (,), as well as slang words and other noise, so data cleaning is performed to obtain noise-free data.
Figure 1. Calculation Result
At the preprocessing stage shown in Figure 1, several steps are undertaken to clean the data. Tokenization is the first step: natural text is split at white space into tokens, breaking sentences down into individual words. Although this process appears simple, determining the appropriate tokens is quite complex.
Table 3. Sample text before and after preprocessing
User | Preprocessing Text | Postprocessing Text
@AAlwuhaib1977 | Muslim mob violence against Hindus in Bangladesh continues in 2014. #Islam http://t.co/C1JBWJwuRc | Muslim mob violence hindu bangladesh continues islam
@Alfonso AraujoG | @ardiem1m @MaxBlumenthal It has nothing to do with their grandpa. It is inherited with their religion. | Nothing grandpa inherited religion
Lemmatization takes the context of a word into account and reduces it to its base form (lemma). This minimizes the number of unique word forms and ensures that related words are processed in their canonical form. Next, stop words are removed because they contribute little to model training and merely increase complexity by expanding the feature space; words such as "a", "am", and "an" are deleted to improve learning efficiency. Case normalization is then applied so that words with different cases, such as "Racism" and "racism", are treated identically. Lemmatization is more careful than stemming because it preserves the meaning of the word in the context of the sentence (Table 3).
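The cleaning steps described above, followed by the TF-IDF weighting reported in Table 4, can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the paper does not state which toolkit or TF-IDF variant was used, so the small stop-word list, the regular expressions, and the TF-IDF formula (term count divided by document length, times log(N/df)) are assumptions, and `preprocess` and `tf_idf` are hypothetical names. Lemmatization and stemming are omitted because they require an external library such as NLTK, which is why "Hindus" stays "hindus" here rather than "hindu" as in Table 3.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real pipeline would use a
# full list plus a lemmatizer and stemmer, which this sketch omits.
STOP_WORDS = {"a", "am", "an", "the", "in", "against", "it", "is", "has",
              "to", "do", "with", "their", "of", "and"}

def preprocess(text):
    """Lowercase, strip URLs, mentions, digits and punctuation, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # remove digits, '#', punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def tf_idf(docs):
    """TF = count / document length; IDF = log(N / df) (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors

tweets = ["Muslim mob violence against Hindus in Bangladesh continues in 2014. "
          "#Islam http://t.co/C1JBWJwuRc",
          "Lovely weather in Bangladesh today!"]     # second tweet is hypothetical
tokens = [preprocess(t) for t in tweets]
print(tokens[0])
# ['muslim', 'mob', 'violence', 'hindus', 'bangladesh', 'continues', 'islam']
vecs = tf_idf(tokens)
print(vecs[0]["bangladesh"], vecs[0]["mob"])  # shared term gets weight 0.0
```

A term that occurs in every document ("bangladesh" here) receives an IDF of log(1) = 0, which is the intended effect of TF-IDF: down-weighting words that do not discriminate between documents.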
Table 4. Results of TF-IDF feature extraction on sample tweets
Terms (columns): Bangladesh, Continues, Grandpa, Hindu, Inherited, Islam, Mob, Muslim, Nothing, Religion, Violence

Evaluation
To evaluate the models' performance, accuracy, precision, recall, specificity, and F1 score were used:
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Accuracy measures how well a classifier labels the instances in a dataset as the ratio of correct predictions to the total number of predictions. Precision is the ratio of true positive cases to all instances the classifier predicts as positive, recall (sensitivity) is the ratio of true positives to all actual positive instances, and specificity is the ratio of true negatives to all actual negative instances.
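The formulas above translate directly into code. The sketch below computes all five metrics from confusion-matrix counts; the counts in the usage lines are hypothetical and are not results from this study, and returning 0.0 when a denominator is zero is a convention assumed here. The second call shows why accuracy alone can mislead on imbalanced data: a classifier that never predicts the positive class still scores a high accuracy with zero recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0     # 0.0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Hypothetical counts, for illustration only.
m = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Hypothetical classifier that always predicts the majority (negative) class.
base = classification_metrics(tp=0, fp=0, tn=854, fn=146)
print(base["accuracy"], base["recall"])
```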
RESULT AND DISCUSSION
Figure 2 presents the distribution of the dataset with respect to each class. The pie chart in Figure 2a shows how the tweets are split between the two categories, non-racism and racism, each represented by one segment. Most tweets belong to the non-racism class (the larger blue segment, approximately 85.4%), while 14.6% are racism tweets (the smaller orange segment). The chart provides a quick snapshot of the distribution, highlighting the prevalence of non-racism tweets.
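As a quick arithmetic check, the label counts reported in the preprocessing subsection (1,970 Racism and 11,501 Non-Racism out of 13,471 tweets) reproduce the Figure 2a percentages. The same ratio is also the accuracy of a trivial classifier that always predicts non-racism, which is a useful reference point for the accuracies reported below, assuming the evaluation split preserves the overall class ratio:

$$\frac{11501}{13471} \approx 0.854 \;(85.4\%), \qquad \frac{1970}{13471} \approx 0.146 \;(14.6\%)$$
$$\text{Accuracy}_{\text{majority}} = \frac{TN}{TN + FN} \approx 0.854, \qquad \text{Recall}_{\text{majority}} = \frac{0}{0 + FN} = 0$$

This is why per-class metrics such as sensitivity and specificity are reported alongside accuracy.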
Figure 2. Distribution of tweet sentiments across classes: (a) Class Percentages, (b) Polarity, (c) Review Length, (d) Word Counts Distribution
Figure 2b shows the polarity distribution of the tweets for the two classes. The x-axis represents tweet polarity, ranging from -1 to 1, and the y-axis represents frequency, ranging from 0 to 5000. For non-racism tweets, most data points cluster around the center (near zero polarity); the pronounced spike in the blue bars at this point indicates a high frequency of non-racist content with neutral polarity.
The histogram in Figure 2c shows the two categories of reviews, non-racism (in blue) and racism (in orange). The vertical axis shows the frequency (number of reviews), while the horizontal axis represents review length. Both categories exhibit a roughly bell-shaped distribution, similar to a normal distribution, and most reviews fall within a length range of approximately 10 to 80 units.
Figure 3. Word clouds for (a) the non-racist and (b) the racist class
Figure 2d presents the word count distribution. Both categories, non-racism and racism, exhibit a roughly bell-shaped distribution, with peak frequency at a word count of around 7-8. Non-racism reviews are most frequent (higher bars) in the range of 5 to 15 words, with the most common word count around 7-8. The distribution for racism reviews has a lower frequency overall but also peaks around 7-8 words, with significantly fewer reviews. Additionally, Figure 3 shows word frequencies in the dataset as word clouds.
Figure 4. Confusion matrices of the (a) Support Vector Machine (SVM), (b) Naïve Bayes, (c) Linear Discriminant Analysis, and (d) Hybrid SVM-LDA models
Figure 4 presents the confusion matrices, with 0 denoting non-racism and 1 denoting racism, used to evaluate the four models; the metrics are derived from the four entries of each matrix. A true positive (TP) is a tweet that contains offensive text and that the model correctly identifies as such. A false positive (FP) is a tweet that does not contain offensive text but that the model incorrectly predicts as offensive. Likewise, a true negative (TN) is a non-offensive tweet correctly predicted as non-offensive, and a false negative (FN) is an offensive tweet that the model misses. FPs and FNs in cyberbullying detection can result from factors such as ambiguous language, inadequate feature extraction, and data imbalance.
Table 5. Example of classification results (columns: Text, True Class, SVM, LDA, SVM-LDA)
1. @MaxBlumenthal Yeap, there is only so much bandwidths for self genocidal Jews, and it's Blumenthal's turn to be the center of attention.
2. @TRobinsonNewEra: http://t.co/SCPKHxreTP BREAKING NEWS: 25 muslim men charged with sexual offences against two children in Calderdal #ha…
3. @obsurfer84 The story about her age came from both Aisha and Ursa. It can be found in both Bukhari and Muslim.
4. @dankmtl Are you now going to play the ignorant argumentative asshole and pretend there is no Arabian peninsula?
5. @pNibbler @AlterNet @MaxBlumenthal They want their own Islamic schools to prevent that kind of education.
6. @halalflaws @biebervalue @greenlinerzjm Because what you think is Islam has no resemblance to the real Islam.
7. @harmslesstree2 To suggest that Jews of Israel should subject their lives to the same barbarity that the Copts of Egypt live under is insane
8. @ibnHlophe @eeviewonders @anjemchoudary Murdering Muslims every day is the only way ISIS can keep control.
Of the four models evaluated, Figure 4b shows that the NB model cannot cope with the imbalanced dataset: its FN value is very low at 25, but its FP value is very high. This indicates that the NB model is not reliable for detecting cyberbullying, especially with imbalanced classes. In contrast, the proposed hybrid model (Figure 4d) yields small FN and FP values of 69 and 25, respectively, compared with individual models such as SVM, with an FN of 154 and an FP of 70 (Figure 4a), and LDA, with an FN of 110 and an FP of 85 (Figure 4c). This indicates that the proposed hybrid model is considerably better at predicting cyberbullying despite the class imbalance in the data. Table 5 lists several examples that the hybrid SVM-LDA model classifies correctly but the other models do not.
Figure 5. Results of (a) overall accuracy and (b) class-wise accuracy values for each model
In Figure 5, we visualize the performance of our models. The hybrid SVM-LDA model stands out with superior results: with an accuracy of 0.961 in detecting cyberbullying, it surpasses both the SVM and LDA models in accuracy, precision, specificity, and F1-score.
Table 6 further supports these findings by comparing the baseline and proposed hybrid models across several metrics, including accuracy, precision, sensitivity/recall, specificity, and F1-score, on the Twitter dataset.
Although the hybrid SVM-LDA model slightly underperforms the NB model in sensitivity, with a score of 0.834 (the NB score is given in Table 6), it excels in the other indices.
Overall, based on Table 6, the hybrid SVM-LDA model is the most effective of the four models.
Table 6. Comparison of SVM, NB, LDA, and SVM-LDA
Criteria: Accuracy, Precision, Sensitivity/Recall, Specificity, F1 Score, MAE, MSE, RMSE, MAPE, AUC
Models compared: SVM, NB, LDA, SVM-LDA
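The classification criteria in Table 6 follow standard definitions, sketched below from the four confusion-matrix counts. The paper does not state how MAE, MSE, and RMSE were computed. The sketch assumes hard 0/1 predictions against 0/1 labels, in which case every error has magnitude 1 and all three reduce to functions of the error rate. MAPE is omitted because it divides by the true value, which is 0 for every non-racism tweet. The example counts are hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # guard against empty classes

    total = tp + fp + tn + fn
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # sensitivity: racism tweets caught
    specificity = ratio(tn, tn + fp)     # non-racism tweets correctly passed
    f1 = ratio(2 * precision * recall, precision + recall)
    # With 0/1 labels and hard 0/1 predictions each error contributes |1| = 1,
    # so MAE and MSE both equal the error rate and RMSE is its square root.
    error_rate = ratio(fp + fn, total)
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mae": error_rate,
        "mse": error_rate,
        "rmse": math.sqrt(error_rate),
    }


# Hypothetical counts for a small, imbalanced test set (13 positives, 87 negatives)
for name, value in binary_metrics(tp=8, fp=2, tn=85, fn=5).items():
    print(f"{name:12s}{value:.3f}")
```

In this example accuracy is 0.93 even though recall is only about 0.62, which illustrates how a majority class can inflate accuracy when the classes are imbalanced.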
The NB model's overall accuracy (Table 6) is poor, which demonstrates its inadequacy in predicting racism and non-racism behavior. Its relatively low specificity indicates that the NB model predicts the non-racism category poorly, whereas its high sensitivity shows that it predicts the racism category very well, consistent with its few false negatives and many false positives.
The SVM model's overall accuracy is likewise reported in Table 6. This model predicts the non-racism category very well, as evidenced by its relatively high specificity of 0.968, and its sensitivity of 0.629 indicates that it is also reasonably good at predicting the racism category.
The LDA model's overall accuracy is also given in Table 6. Its ability to predict the non-racism category is very good, as evidenced by its high specificity of 0.962, while its sensitivity for the racism category is reported in Table 6.
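The rates quoted above can be cross-checked against the FN and FP counts reported for Figure 4, since FN = P(1 - sensitivity) and FP = N(1 - specificity), where P and N are the numbers of racism and non-racism tweets in the test set. The back-of-envelope calculation below assumes that the counts and the rates come from the same test split and that the rates are rounded to three decimals. The class sizes it recovers are therefore estimates, not figures reported by the authors, and the helper name is illustrative.

```python
def implied_class_size(errors: int, rate: float) -> float:
    """Invert errors = size * (1 - rate) to estimate a class size.

    Use FN with sensitivity for the racism class,
    or FP with specificity for the non-racism class.
    """
    return errors / (1.0 - rate)


# SVM figures quoted in the text: FN = 154, FP = 70, sensitivity 0.629, specificity 0.968
racism = implied_class_size(154, 0.629)      # roughly 415 tweets
non_racism = implied_class_size(70, 0.968)   # roughly 2,190 tweets
share = racism / (racism + non_racism)

# Cross-check: SVM-LDA reports FN = 69 on the same (assumed) racism class
p = round(racism)
hybrid_sensitivity = (p - 69) / p

print(f"estimated racism tweets:     {racism:.0f}")
print(f"estimated non-racism tweets: {non_racism:.0f}")
print(f"racism share of test set:    {share:.1%}")
print(f"implied SVM-LDA sensitivity: {hybrid_sensitivity:.3f}")
```

Under these assumptions the racism class makes up roughly 16% of the test tweets, which is the imbalance the section refers to, and applying the hybrid model's FN = 69 to the same class size reproduces its reported sensitivity of 0.834.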
As shown in Figure 6, we also incorporate the Area Under the Curve (AUC) score into our evaluation.
The AUC score is widely used for assessing binary classification tasks, such as the detection of cyberbullying on social media.
It assesses a classifier's overall performance by considering how well it balances the False Positive Rate (FPR) and the True Positive Rate (TPR) across various threshold values.
In cyberbullying detection, the FPR is the proportion of non-bullying instances incorrectly labeled as bullying, while the TPR is the proportion of genuine bullying cases correctly detected.
Because it measures how well the model distinguishes between positive and negative classes independently of any single threshold, AUC provides greater insight than metrics such as accuracy, which the majority class can inflate when classes are imbalanced, and it enables a fairer assessment of model performance in cyberbullying detection.
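The threshold sweep described above can be sketched directly: sort tweets by the classifier's score, lower the threshold one distinct score at a time, record (FPR, TPR) after each step, and integrate the resulting curve with the trapezoidal rule. The scores and labels in the example are hypothetical, and the function names are illustrative rather than those of any library.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs as the threshold is lowered through each distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Flag all examples tied at this score together, so ties form one diagonal step
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (points sorted by FPR)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical bullying scores for six tweets (1 = bullying)
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(round(auc_trapezoid(curve), 4))  # 0.8889
```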
Figure 6. Evaluation metrics for each model
The AUC value ranges from 0 to 1. An AUC of 1 denotes perfect classification, where all genuine bullying instances are accurately detected and no non-bullying examples are misclassified as bullying, whereas an AUC of 0.5 indicates a classifier that performs at the level of random chance.
A higher AUC value indicates a more effective model in distinguishing between positive and negative samples.
Table 6 displays the AUC values for all models in our investigation, showing that our hybrid SVM-LDA model outperforms the others in detecting cyberbullying on the Twitter platform.
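The two reference values for AUC, 1 for a perfect classifier and 0.5 for chance, follow from an equivalent reading of the metric: the probability that a randomly chosen bullying tweet receives a higher score than a randomly chosen non-bullying tweet, with ties counted as one half. The quadratic-time sketch below computes that probability directly on hypothetical scores. It is meant for illustration on small inputs, not as an efficient implementation.

```python
from typing import List


def auc_pairwise(y_true: List[int], scores: List[float]) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
print(auc_pairwise(labels, [0.9, 0.8, 0.3, 0.1]))  # 1.0: every bullying tweet ranked first
print(auc_pairwise(labels, [0.5, 0.5, 0.5, 0.5]))  # 0.5: constant scores, chance level
print(auc_pairwise(labels, [0.1, 0.3, 0.8, 0.9]))  # 0.0: ranking fully inverted
```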
MANAGERIAL IMPLICATION
To summarize, this research compares machine learning classification algorithms for detecting racist and non-racist tweets.
The hybrid SVM-LDA model outperforms the NB, SVM, and LDA models in terms of accuracy, precision, specificity, F1 score, and AUC, particularly when the datasets are imbalanced.
The results also indicate the need for an NER system that can generate training data automatically and use machine-labeled data to reduce labeling costs and address class imbalance in an online context.
This would, in turn, improve the efficiency of the NER system.
We hope this study contributes to machine learning classification for NER by providing insights into the performance of different algorithms.
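The paper does not specify how machine-labeled training data would be generated. One common pattern, shown here purely as an illustrative assumption, is confidence-thresholded pseudo-labeling: an already-trained classifier scores unlabeled tweets, only predictions above a confidence threshold are kept as labels, and the larger class is down-sampled so that the new training data is balanced. The function name, the 0.9 threshold, and the example probabilities are hypothetical.

```python
from typing import List, Tuple


def select_pseudo_labels(
    texts: List[str], prob_positive: List[float], threshold: float = 0.9
) -> List[Tuple[str, int]]:
    """Turn confident machine predictions into a class-balanced set of (text, label) pairs.

    prob_positive holds an existing classifier's probability that each text is racist (label 1).
    """
    scored = list(zip(texts, prob_positive))
    # Most confident positives first; most confident negatives (lowest probability) first
    positives = sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)
    negatives = sorted((s for s in scored if s[1] <= 1.0 - threshold), key=lambda s: s[1])
    # Down-sample the larger class so both contribute the same number of examples
    k = min(len(positives), len(negatives))
    return [(t, 1) for t, _ in positives[:k]] + [(t, 0) for t, _ in negatives[:k]]


unlabeled = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
probs = [0.97, 0.95, 0.50, 0.03, 0.02, 0.01, 0.08]
print(select_pseudo_labels(unlabeled, probs))  # [('t1', 1), ('t2', 1), ('t6', 0), ('t5', 0)]
```

Down-sampling discards confident examples; weighting the classes instead would keep them, at the cost of a classifier that must support sample weights.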
This research introduces a hybrid SVM-LDA approach that presents distinct advantages compared to conventional cyberbullying detection methods.
Although the model proposed in this research focuses on cyberbullying detection, its concepts and architecture can be extended to various other applications in Natural Language Processing (NLP).
CONCLUSION
Cyberbullying is becoming more prevalent on social media platforms like Twitter, making it crucial to automatically detect and stop it to prevent further spread.
This research focuses on using sentiment analysis to identify both racist and non-racist content.
To address this, we present an innovative hybrid method that combines Support Vector Machine (SVM) with Linear Discriminant Analysis (LDA) for detecting cyberbullying on Twitter.
Our method harnesses the strengths of both SVM and LDA to extract pertinent features from text data.
Extensive testing and assessment have shown that our approach effectively identifies cyberbullying content.
By integrating SVM with LDA, our model proficiently analyzes and classifies textual data, offering better cyberbullying detection performance than the single NB, SVM, and LDA models.
Our innovative SVM-LDA hybrid approach shows considerable potential for detecting cyberbullying even in the case of imbalanced datasets.
By combining these techniques, we have created a robust tool for identifying and addressing this pressing social issue.
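The conclusion does not spell out how SVM and LDA are combined. One plausible reading of "harnesses the strengths of both SVM and LDA to extract pertinent features" is a two-stage pipeline in which Fisher LDA projects the features onto a single discriminant axis and a linear SVM then places the decision boundary on that axis. The sketch below implements that assumed pipeline on hypothetical 2-D toy features. Real tweets would first need a vectorization step, such as TF-IDF, which is omitted. The SVM is trained by full-batch subgradient descent on the primal hinge loss, and the class and function names are illustrative; this should not be read as the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fisher_direction(neg: List[Point], pos: List[Point]) -> Point:
    """Two-class Fisher LDA direction w = S_w^{-1} (m_pos - m_neg) for 2-D features."""

    def mean(pts: List[Point]) -> Point:
        return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

    m_neg, m_pos = mean(neg), mean(pos)
    # Pooled within-class scatter [[sxx, sxy], [sxy, syy]]; a tiny ridge keeps it invertible
    sxx, sxy, syy = 1e-6, 0.0, 1e-6
    for pts, m in ((neg, m_neg), (pos, m_pos)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    gx, gy = m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]
    # Closed-form 2x2 inverse applied to the difference of class means
    return ((syy * gx - sxy * gy) / det, (sxx * gy - sxy * gx) / det)


def train_svm_1d(z: List[float], y: List[int], lam: float = 0.01,
                 lr: float = 0.1, epochs: int = 500) -> Tuple[float, float]:
    """Linear soft-margin SVM on scalar inputs, labels in {-1, +1}.

    Full-batch subgradient descent on lam/2 * a^2 + mean(max(0, 1 - y * (a*z + b))).
    """
    a = b = 0.0
    n = len(z)
    for _ in range(epochs):
        grad_a, grad_b = lam * a, 0.0
        for zi, yi in zip(z, y):
            if yi * (a * zi + b) < 1.0:  # point violates the margin
                grad_a -= yi * zi / n
                grad_b -= yi / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


class LdaThenSvm:
    """Hypothetical hybrid: LDA projects features onto one axis, a linear SVM splits that axis."""

    def fit(self, neg: List[Point], pos: List[Point]) -> "LdaThenSvm":
        self.w = fisher_direction(neg, pos)
        z = [self._project(p) for p in neg + pos]
        self.mu = sum(z) / len(z)
        self.sd = (sum((v - self.mu) ** 2 for v in z) / len(z)) ** 0.5 or 1.0
        labels = [-1] * len(neg) + [1] * len(pos)
        self.a, self.b = train_svm_1d([(v - self.mu) / self.sd for v in z], labels)
        return self

    def _project(self, p: Point) -> float:
        return self.w[0] * p[0] + self.w[1] * p[1]

    def predict(self, p: Point) -> int:
        z = (self._project(p) - self.mu) / self.sd
        return 1 if self.a * z + self.b >= 0.0 else -1


# Hypothetical 2-D features: non-racism tweets near the origin, racism tweets near (4.5, 4.5)
neg = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
pos = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)]
model = LdaThenSvm().fit(neg, pos)
print([model.predict(p) for p in neg + pos])
print(model.predict((0.5, 0.5)), model.predict((4.5, 4.5)))
```

Because two-class LDA reduces the problem to a single discriminant axis, the SVM stage here only has to choose the widest-margin threshold on that axis. Other combinations, such as stacking or soft voting over the two classifiers' outputs, are equally plausible readings of "hybrid".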
While our hybrid SVM-LDA model shows promising results, there are potential limitations to consider, such as false positives, which could lead to non-racist content being incorrectly classified as racist. It is crucial to continuously refine these models to minimize errors and ensure fairness, transparency, and accountability in their application.
Additionally, future research could adapt this model to multilingual datasets or explore its applicability in detecting other forms of online harassment, such as hate speech or gender-based harassment.
DECLARATIONS
About Authors
Fenny Syafariani (FS) https://orcid.org/0009-0006-0905-623X
Muhamad Safiih Lola (MS) https://orcid.org/0000-0001-9287-7317
Sharifah Sakinah Syed Abd Mutalib (SS)
Wan Nuraini Fahana Wan Nasir (WN)
Abdul Aziz K. Abdul Hamid (AA)
Nurul Hila Zainuddin (NH)
https://orcid.org/0000-0002-3803-4578
https://orcid.org/0000-0003-1564-5111
https://orcid.org/0000-0002-3075-7536
https://orcid.org/0000-0001-9972-7573
Author Contributions
Conceptualization: FS, MS, SS, WN, AA and ID.
Methodology: FS, MS and WN.
Software: FS, AA and ID.
Validation: AA, ID and WN.
Formal Analysis: FS, MS and WN.
Investigation: FS, MS and WN.
Resources: SS, WN, AA and ID.
Data Curation: WN, AA and ID.
Writing Original Draft Preparation: FS.
Writing Review and Editing: MS, SS, WN and ID.
Visualization: FS, MS, SS and WN.
All authors, FS, MS, SS, WN, AA and ID, have read and agreed to the published version of the manuscript.
Data Availability Statement
The datasets used to support the findings of this study are available from the direct link in the dataset.
Funding
This work was supported in part by Universiti Malaysia Terengganu (UMT) under Private Partnership Research Grant (PPRG) Vot No. We thank Universiti Malaysia Terengganu for providing funding support for this project (UMT/PPRG/2022/55).
Declaration of Conflicting Interest
The authors declare that they have no conflicts of interest, known competing financial interests, or personal relationships that could have influenced the work reported in this paper.
REFERENCES