TELKOMNIKA Telecommunication Computing Electronics and Control
Vol. No. April 2026, pp.
ISSN: 1693-6930. DOI: 10.12928/TELKOMNIKA.
Identification of paleographic curvature using skeletonization and key point detection

Fadhilatul Fitriyah1, Dian Andriana2, Muhammad Zulhaj Aliansyah3, Lukman Hakim1, Muhammad Faishol Amrulloh1

1 Department of Informatics Engineering, Faculty of Engineering, Universitas Yudharta Pasuruan, Pasuruan, Indonesia
2 Research Center for Artificial Intelligence and Cyber-Security, National Research and Innovation Agency (BRIN), Bandung, Indonesia
3 Department of Data Science, Faculty of Computer Science, Universitas Pembangunan Nasional Veteran Jawa Timur, Surabaya, Indonesia
Article Info
Article history: Received Aug 25, 2025; Revised Dec 27, 2025; Accepted Jan 30, 2026
Keywords: Aksara Jawi; Connected component; Key points; Manuscript; Paleography; Segmentation; Skeletonization

ABSTRACT
Jawi script represents a vital component of the Islamic intellectual heritage of the Nusantara, preserved across numerous classical manuscripts.
A primary challenge in digitizing these documents is character segmentation, particularly where handwritten characters connect without distinct boundaries.
This research proposes a skeletonization-based segmentation method to address this issue, utilizing a dataset from 17 pages of the "Kitab Syair Perahu" manuscript containing 269 test characters.
The pre-processing stage involves grayscale conversion, binarization, and noise removal through connected component analysis (CCA).
The segmentation process then integrates skeleton structures, centroid positioning, intersection points, and loop detection.
Evaluation results show the system successfully identified 187 out of 269 characters, achieving an accuracy of 0.801, a precision of 895, a recall of 86.38%, and an F1-score of 88.
While these results demonstrate the method's effectiveness, the small dataset from a single manuscript limits its generalizability.
Nevertheless, this study establishes a foundational step toward an automated Jawi image-processing system and the digital preservation of Islamic Nusantara literacy, contributing a tailored skeletonization-based approach for Jawi script.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Muhammad Faishol Amrulloh
Department of Informatics Engineering, Faculty of Engineering, Universitas Yudharta Pasuruan
Jl. Yudharta No. Kembang Kuning, Sengonagung, Purwosari, Pasuruan, East Java 67162, Indonesia
Email: faishol@yudharta.
INTRODUCTION
Turats Nusantara refers to the Islamic intellectual heritage preserved in manuscripts written in adapted Arabic scripts, notably Pegon and Jawi.
Etymologically, turats (from the root wa-ra-ts) denotes inherited knowledge transmitted across generations.
Beyond mere adoption, Jawi script represents a sophisticated phonological adjustment of Arabic letterforms to suit regional languages while maintaining complex cursive characteristics.
Despite its historical significance, the digitization of Jawi manuscripts remains challenging.
A major hurdle lies in character segmentation within optical character recognition (OCR) pipelines, as Jawi is inherently cursive and context-sensitive.
Unlike Latin text, Jawi characters change shape based on their position (initial, medial, or final) and connect through shared strokes, often lacking clear boundaries.
These traits, coupled with non-uniform diacritics, handwriting variability, and the scarcity of Jawi-specific datasets, make segmentation a complex yet vital task for the digital preservation of Malay-Islamic manuscripts.
Journal homepage: http://journal. id/index. php/TELKOMNIKA
Previous segmentation studies have primarily focused on Arabic script.
Seam carving approaches can handle touching characters but are highly sensitive to noise and image quality.
Morphological and connected component-based methods perform well on structured text but struggle with handwritten documents due to their complexity and input dependency.
Hybrid deep learning approaches combining convolutional neural networks (CNNs) with thinning or segmentation hypothesis graphs achieve high accuracy but require large annotated datasets and involve significant computational complexity.
A summary of representative segmentation and recognition approaches, their datasets, strengths, limitations, and relevance to Jawi script is presented in Table 1.
Research explicitly targeting Jawi script remains limited.
Existing works largely focus on isolated character recognition using Freeman chain code combined with rule-based classifiers or support vector machines (SVMs), reporting high accuracy under controlled conditions.
However, these approaches do not address the segmentation of connected handwritten characters in authentic manuscript settings.
Table 1. Related research
Columns: Reference | Method | Dataset | Strengths | Limitations | Relevance to Jawi
Method: Seam carving; Morphological CC analysis; CNN SHG; FCC regular; Decision tree; FCC SVM decision rules; Skeletonization (this study)
Dataset: Islamic Educational, Scientific and Cultural Organization Arabic Database (IESK-ArDB); Institut für Nachrichtentechnik / École Nationale d'Ingénieurs de Tunis Arabic Handwritten Word Database (IFN/ENIT); Arabic handwritten docs; Arabic handwritten; Isolated Jawi (five writing, 10 users); Isolated handwritten Jawi; Jawi manuscript "Syair Perahu"
Strengths: Handles; Effective for structured text; High accuracy; Real-time; High accuracy; Lightweight
Limitations: Noise-sensitive, energy-dependent; Quality-sensitive, multi-stage; Complex; Isolated only, Android-only; Similar-shape; Errors on complex
Relevance to Jawi: Arabic-only; Jawi-specific; Jawi manuscript
However, most of these studies remain limited to the standard Arabic script and have not been specifically applied to Jawi, which exhibits distinct structural complexities and character variations.
One of the primary challenges in Jawi script segmentation is separating connected characters, whose ligature structures are often irregular and highly variable.
Therefore, there is a need for an approach capable of accurately identifying character boundaries even under complex connection patterns.
Compared with existing approaches, the proposed method offers a lightweight and interpretable alternative for Jawi character segmentation.
Morphology-based methods perform well on printed text but struggle with irregular handwritten ligatures, while seam carving approaches are highly sensitive to noise and image degradation.
CNN-based methods require large annotated datasets and provide limited interpretability, which is impractical for Jawi manuscripts.
In contrast, this study employs skeletonization and keypoint detection using intersection, loop, and centroid cues to robustly segment characters in small and noisy manuscript datasets.
Therefore, an alternative segmentation strategy is required that is lightweight, interpretable, and effective on small, noisy datasets.
This study proposes a skeletonization-based segmentation method that integrates connected component analysis (CCA) with keypoint detection, including centroids, intersection points, and loops, to identify character boundaries within complex ligatures.
Beyond improving segmentation accuracy, the method emphasizes palaeographic curvature characteristics of Jawi script as structural cues for character isolation, supporting manuscript retrieval and digital preservation of Islamic Nusantara heritage.
METHOD
In this research, a Jawi character segmentation method is proposed that integrates skeletonization to extract the basic structure of character strokes, CCA to identify connected areas in an image, and bounding boxes as the spatial boundaries of each analysed component.
The method comprises several main steps: pre-processing, skeletonization, CCA, bounding box definition, and character segmentation based on key point identification.
The overall flow of the character segmentation process in this research is represented in Figure 1.
Figure 1. Flow of research
Dataset
The dataset consists of scanned pages of the "Kitab Syair Perahu" manuscript written in Jawi script.
The scanning process was conducted by the Research Center for Artificial Intelligence and Cybersecurity, National Research and Innovation Agency (BRIN).
The manuscript comprises 17 pages in PNG format.
Ground truth labeling was performed independently by four santri proficient in Jawi reading.
Disagreements among annotations were resolved through consensus discussions facilitated by the Turats Division of Universitas Yudharta Pasuruan, which also conducted final validation.
This process ensured both technical accuracy and consistency with turats scholarship.
A sample line image is shown in Figure 2.
Figure 2. Line excerpt from "Kitab Syair Perahu" page 04 line 15
Pre-processing
Pre-processing begins with RGB-to-grayscale conversion to reduce image complexity, followed by binarization using Otsu's thresholding method.
Otsu's method automatically determines the optimal threshold by maximizing between-class variance, enabling clear separation between foreground text and background. The binarization process is defined as:
\[ g(x, y) = \begin{cases} 1, & \text{if } f(x, y) \ge T \\ 0, & \text{if } f(x, y) < T \end{cases} \]
where $g(x, y)$ is the binarization result at pixel coordinates $(x, y)$, $f(x, y)$ is the intensity value at pixel position $(x, y)$ in the grayscale image, and $T$ is the threshold computed using Otsu's criterion.
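The thresholding rule above can be sketched in plain Python. This is a minimal illustration, assuming an 8-bit grayscale image stored as a list of rows; the function names `otsu_threshold` and `binarize` are hypothetical and not taken from the original implementation.

```python
def otsu_threshold(gray):
    """Return the threshold T that maximizes between-class variance.

    `gray` is assumed to be a list of rows of integers in 0..255.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity < t
    sum0 = 0  # sum of their intensities
    for t in range(1, 256):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold):
    """Apply g(x, y) = 1 if f(x, y) >= T, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]
```

Under this rule, dark ink on light paper maps to 0, so the result may need to be inverted before steps that treat 1 as foreground.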
CCA
After binarization, CCA is applied to remove noise and identify relevant character components based on 8-neighborhood connectivity.
Components are classified according to area thresholds: components larger than 50 pixels are considered main strokes, while components between 5 and 50 pixels are retained as potential diacritics if located near the main stroke.
Smaller or distant components are discarded as noise.
This filtering step preserves essential character structures for further analysis.
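The area-based filtering described above can be sketched as follows. This is an illustrative pure-Python version, assuming text pixels are stored as 1 in a list-of-rows binary image; the function names and the returned dictionary layout are hypothetical. The proximity test for diacritic candidates is not included here.

```python
from collections import deque


def connected_components(binary):
    """Group foreground pixels (value 1) using 8-neighbourhood connectivity."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def classify_by_area(components, main_min=50, diacritic_min=5):
    """Split components into main strokes, diacritic candidates, and noise.

    Thresholds follow the text: area > 50 is a main stroke, 5..50 is a
    diacritic candidate, anything smaller is treated as noise.
    """
    groups = {"main": [], "diacritic": [], "noise": []}
    for pixels in components:
        area = len(pixels)
        if area > main_min:
            groups["main"].append(pixels)
        elif area >= diacritic_min:
            groups["diacritic"].append(pixels)
        else:
            groups["noise"].append(pixels)
    return groups
```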
The component labeling rule can be expressed as:
\[ \mathrm{Label}_c = \begin{cases} \text{Main stroke}, & \text{if } A \ge T_a \\ \text{Diacritic}, & \text{if } A < T_a \end{cases} \]
where $\mathrm{Label}_c$ is the label category for component $c$, $A$ represents the component area, and $T_a$ is the area threshold (50 pixels, per the rule above).
A component is considered a main stroke if it is assumed to be part of the character, while diacritics refer to small components such as dots or harakat.
TELKOMNIKA Telecommun Comput El Control, Vol. No. April 2026: 620-634
Bounding box
In this step, the system identifies character components using bounding box analysis derived from the CCA results, which determines the boundaries of each object in the image.
Large components or those with a vertical aspect ratio are assumed to be main strokes, while smaller components that do not meet these criteria are considered diacritic candidates.
To associate diacritics with their corresponding main characters, the Euclidean distance between the centroid of a small component and the centroid of the nearest main stroke is calculated.
The distance is defined as:
\[ d = \sqrt{(x_d - x_m)^2 + (y_d - y_m)^2} \]
where $d$ is the Euclidean distance between the diacritic component and the main stroke, $(x_d, y_d)$ are the centroid coordinates of the diacritic component, and $(x_m, y_m)$ are the centroid coordinates of the main stroke component.
A small component is associated with a main stroke as a diacritic if its centroid lies within 60 pixels of the main stroke's centroid.
Furthermore, the vertical position of the diacritic centroid relative to the main stroke centroid is used to determine whether it is located above or below the main stroke.
This classification can be expressed as:
\[ \mathrm{Label}_s = \begin{cases} \text{Above}, & \text{if } y_d < y_m \\ \text{Below}, & \text{if } y_d > y_m \end{cases} \]
where $\mathrm{Label}_s$ is the label category for small component $s$, $y_d$ is the vertical coordinate of the diacritic centroid, and $y_m$ is the vertical coordinate of the main stroke centroid.
If $y_d < y_m$, the diacritic is located above the main stroke, and if $y_d > y_m$, the diacritic is located below the main stroke.
It is important to note that in the digital image coordinate system, the origin $(0, 0)$ is located at the top-left corner, so smaller $y$ values indicate higher positions in the image.
This makes centroid comparison a reliable method for distinguishing diacritics positioned above or below the main stroke.
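The association and above/below rules can be illustrated with a short sketch. It assumes centroids are given as `(x, y)` tuples in image coordinates; the function name `associate_diacritic` is hypothetical. The case $y_d = y_m$ is not covered by the rule in the text, so it is treated as "below" here purely as an assumption.

```python
import math


def associate_diacritic(diacritic_centroid, main_centroids, max_dist=60.0):
    """Attach a small component to its nearest main stroke.

    Returns (index of the main stroke, "above" or "below"), or None when no
    main stroke centroid lies within `max_dist` pixels.
    """
    xd, yd = diacritic_centroid
    best = None  # (index, distance, y of main stroke centroid)
    for index, (xm, ym) in enumerate(main_centroids):
        d = math.hypot(xd - xm, yd - ym)  # Euclidean distance
        if d <= max_dist and (best is None or d < best[1]):
            best = (index, d, ym)
    if best is None:
        return None
    index, _, ym = best
    # Image origin is top-left, so a smaller y means a higher position.
    # ASSUMPTION: equal y values fall into "below".
    position = "above" if yd < ym else "below"
    return index, position
```

For example, `associate_diacritic((100, 40), [(90, 60), (300, 60)])` attaches the dot to the first stroke and reports it as lying above that stroke.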
Skeletonization
Skeletonization is applied to the main stroke components to reduce stroke thickness to a one-pixel-wide representation while preserving topological structure.
This step simplifies character shapes and facilitates keypoint detection without losing essential palaeographic features.
Two skeletonization algorithms, Zhang-Suen and Lee, were evaluated.
The Zhang-Suen algorithm, a two-subiteration thinning method, was selected due to its efficiency in preserving connectivity and minimizing spurious branches.
The skeletonization process can be mathematically expressed as:
\[ S(A) = \{\, p \in A \mid p \in \text{medial axis}(A) \wedge (A \setminus \{p\}) \text{ remains connected} \,\} \]
where $A$ is the binary object, $p$ represents a pixel in object $A$, and $\text{medial axis}(A)$ denotes the medial axis of object $A$.
$S(A)$ is the set of pixels in object $A$ that lie on the medial axis and can be deleted without breaking object connectivity.
This definition forms the foundation of the skeletonization process, ensuring that the topological structure of characters is preserved.
In the Zhang-Suen algorithm, a pixel $p$ is deleted if it satisfies the following conditions.
These conditions ensure that the skeleton remains one pixel wide while preserving connectivity.
Delete pixel $p$ if:
- $2 \le N(p) \le 6$, where $N(p)$ is the number of foreground 8-neighbors of pixel $p$
- $Z(p) = 1$, where $Z(p)$ is the number of transitions from 0 to 1 in the clockwise order of the 8 neighbors
- $p_2 \cdot p_4 \cdot p_6 = 0$ and $p_4 \cdot p_6 \cdot p_8 = 0$ (step 1)
- $p_2 \cdot p_4 \cdot p_8 = 0$ and $p_2 \cdot p_6 \cdot p_8 = 0$ (step 2)
where $p_2, \ldots, p_9$ are the pixel's neighbors in the clockwise direction, starting from the pixel directly above $p$.
To strengthen the justification for the chosen method, a comparative evaluation was also conducted using the Lee algorithm.
The performance of Zhang-Suen and Lee skeletonization was quantitatively assessed using three metrics: execution time, number of spurious branches, and pixel reduction ratio (PRR).
This comparison provides a comprehensive basis for selecting the most suitable algorithm for Jawi character segmentation in this research.
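The Zhang-Suen deletion conditions described earlier can be sketched in pure Python. This follows the standard published form of the algorithm, with two conditions per sub-iteration, and is not the authors' code. It assumes a list-of-rows binary image with foreground 1 and a background border at least one pixel wide; the function name `zhang_suen` is illustrative.

```python
def zhang_suen(image):
    """Thin a binary image (lists of 0/1) to a one-pixel-wide skeleton."""
    img = [row[:] for row in image]
    height, width = len(img), len(img[0])

    def neighbours(y, x):
        # p2..p9, clockwise, starting from the pixel directly above
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (1, 2):
            to_delete = []
            for y in range(1, height - 1):
                for x in range(1, width - 1):
                    if img[y][x] != 1:
                        continue
                    n = neighbours(y, x)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    count = sum(n)  # N(p)
                    # Z(p): 0 -> 1 transitions around the ring p2..p9, p2
                    transitions = sum(
                        a == 0 and b == 1 for a, b in zip(n, n[1:] + n[:1])
                    )
                    if not (2 <= count <= 6 and transitions == 1):
                        continue
                    if step == 1:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((y, x))
            # Pixels flagged in a sub-iteration are removed together
            for y, x in to_delete:
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img
```

Flagged pixels are removed only after the whole image has been scanned, so each sub-iteration judges every pixel against the same snapshot.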
Keypoint detection
After skeletonization of the main strokes, structural key points are extracted to support detailed shape analysis and writing direction detection.
The detected key points include start points, end points, intersection points, turn points, centroid points, and loops.
Their definitions are as follows:
Start point
The skeleton point with one active neighbor in the 3×3 window and located at the top-right position (maximum column value and minimum row value) is defined as the start point.
This point is selected because it represents the natural writing direction of Arabic or Jawi script, which begins from the right.
Mathematically, the definition can be formulated as:
\[ \text{Start point} = \arg\max_{p \,:\, \mathrm{degree}(p) = 1} (x - y) \]
where $p$ is a skeleton pixel, $\mathrm{degree}(p)$ represents the number of connected neighbors of pixel $p$, and $x$ and $y$ are the coordinate values of pixel $p$ in the image coordinate system.
A pixel $p$ is defined as a starting point if it is an endpoint, i.e., connected to only one neighbor in the skeleton graph ($\mathrm{degree}(p) = 1$).
Among all such pixels, the one with the largest $(x - y)$ value is chosen as the top-right position for traversal.
End point
A skeleton pixel with one active neighbor, located at the far left (minimum column value) and bottom (maximum row value), is defined as the end point.
This point indicates the geometric termination of the character shape.
Mathematically, the definition can be formulated as:
\[ \text{End point} = \arg\min_{p \,:\, \mathrm{degree}(p) = 1} (x - y) \]
As with the start point, the end point is also a pixel $p$ that is connected to only one neighbor ($\mathrm{degree}(p) = 1$).
The difference from the start point is that the end point is located at the bottom-left of the skeleton, determined by a small $x$ and a large $y$.
Intersection point
A skeleton pixel with three or more active neighbors is defined as an intersection point.
This point represents branching in the character structure, which is typical in complex or curved shapes.
Mathematically, the definition can be expressed as:
\[ \mathrm{degree}(p) \ge 3 \]
An intersection point has high connectivity, as it is connected to three or more neighboring pixels in the skeleton graph.
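The three degree-based definitions above (start, end, and intersection points) can be combined in one sketch. It assumes a list-of-rows skeleton with foreground 1 and returns points as `(x, y)` tuples; the names `degree` and `degree_keypoints` are hypothetical.

```python
def degree(skeleton, y, x):
    """Number of foreground pixels among the 8 neighbours of (y, x)."""
    height, width = len(skeleton), len(skeleton[0])
    return sum(
        skeleton[y + dy][x + dx]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy or dx) and 0 <= y + dy < height and 0 <= x + dx < width
    )


def degree_keypoints(skeleton):
    """Collect endpoints and intersections, then pick start and end points."""
    endpoints, intersections = [], []
    for y, row in enumerate(skeleton):
        for x, value in enumerate(row):
            if value != 1:
                continue
            d = degree(skeleton, y, x)
            if d == 1:
                endpoints.append((x, y))
            elif d >= 3:
                intersections.append((x, y))
    # Start: endpoint with the largest x - y (top-right).
    # End: endpoint with the smallest x - y (bottom-left).
    start = max(endpoints, key=lambda p: p[0] - p[1]) if endpoints else None
    end = min(endpoints, key=lambda p: p[0] - p[1]) if endpoints else None
    return {"start": start, "end": end,
            "endpoints": endpoints, "intersections": intersections}
```

With 8-connectivity, several adjacent pixels around a single junction can each reach degree 3 or more, so intersection pixels usually appear as small clusters rather than single points.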
Turn point
A skeleton pixel with one horizontal neighbor and one vertical neighbor is defined as a turn point.
This point corresponds to a sharp change in direction, typically close to a 90° angle, and is often found in curved or angular characters.
Mathematically, a turn point can be defined as:
\[ \theta = \cos^{-1}\!\left( \frac{(p_i - p_{i-1}) \cdot (p_{i+1} - p_i)}{\lVert p_i - p_{i-1} \rVert \, \lVert p_{i+1} - p_i \rVert} \right) \]
If $p_{i-1}$, $p_i$, $p_{i+1}$ are three consecutive pixels along the skeleton path, then pixel $p_i$ is considered a turn point if:
\[ \theta > \theta_{\text{threshold}} \]
where $\theta_{\text{threshold}}$ is a predefined threshold angle (commonly between 30° and 60°) used to detect significant curvature changes.
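The angle test can be sketched as below. It assumes the skeleton has already been traced into an ordered list of `(x, y)` points. The default threshold of 45° is an illustrative value inside the stated 30° to 60° range, not a value reported in the study, and the function names are hypothetical.

```python
import math


def turn_angle(p_prev, p, p_next):
    """Angle in degrees between the incoming and outgoing directions at p."""
    ux, uy = p[0] - p_prev[0], p[1] - p_prev[1]
    vx, vy = p_next[0] - p[0], p_next[1] - p[1]
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("consecutive path points must be distinct")
    # Clamp to [-1, 1] to guard against floating-point drift
    cos_theta = max(-1.0, min(1.0, (ux * vx + uy * vy) / norm))
    return math.degrees(math.acos(cos_theta))


def turn_points(path, threshold_deg=45.0):
    """Return interior path points whose turning angle exceeds the threshold."""
    return [
        path[i]
        for i in range(1, len(path) - 1)
        if turn_angle(path[i - 1], path[i], path[i + 1]) > threshold_deg
    ]
```

Measuring the angle over three immediately adjacent pixels only yields multiples of 45°, so in practice the three points are often sampled a few pixels apart along the path.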
Centroid point
The centroid point represents the average position of all skeleton pixels within a character's bounding box.
It serves as a global descriptor of the character's position and is also used to establish the relationship between the main stroke and its diacritics.
Mathematically, if there are $n$ skeleton points, the centroid coordinates can be computed as:
\[ C_x = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad C_y = \frac{1}{n} \sum_{k=1}^{n} y_k \]
where $C_x$ and $C_y$ are the centroid coordinates representing the average $x$ and $y$ positions of all skeleton pixels, and $x_k$ and $y_k$ are the $x$ and $y$ coordinate values of the $k$-th skeleton pixel in the image coordinate system.
Thus, the centroid is defined as $C = (C_x, C_y)$.
This point can be used to normalize the position of each character and as a feature in determining relative positions or for character classification.
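As a small worked illustration of the centroid formula, the sketch below averages the coordinates of all foreground pixels in a list-of-rows skeleton. The function name `skeleton_centroid` is hypothetical.

```python
def skeleton_centroid(skeleton):
    """Return (C_x, C_y), the mean x and y of all skeleton pixels."""
    points = [
        (x, y)
        for y, row in enumerate(skeleton)
        for x, value in enumerate(row)
        if value == 1
    ]
    if not points:
        raise ValueError("skeleton contains no foreground pixels")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
```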
Loop
A skeleton loop is defined as a closed cycle detected in the character skeleton graph.
To ensure accuracy, the system applies a filtering process that distinguishes true loops from noise or small turns.
Each cycle is evaluated using the convex hull method, and only those with a sufficiently large area are classified as valid loops.
Mathematically, loop identification can be expressed as:
\[ \mathrm{Loops} = \{\, c \in \mathrm{cycle\_basis}(G) \mid \mathrm{area}(\mathrm{ConvexHull}(c)) > \theta \,\} \]
where $G$ is the skeleton graph representing the connectivity of pixels, $c$ is a cycle (closed path) extracted from the cycle basis of graph $G$, $\mathrm{cycle\_basis}(G)$ denotes the set of all fundamental cycles in the skeleton, $\mathrm{ConvexHull}(c)$ represents the convex hull formed by the pixels belonging to cycle $c$, and $\mathrm{area}(\mathrm{ConvexHull}(c))$ is the area enclosed by that hull.
$\theta$ is the minimum area threshold used to filter out small or noisy cycles (set to 1.5 in this study), and $\mathrm{Loops}$ is the set of valid loops whose areas exceed it.
Cycles with an area greater than $\theta$ are retained as valid loops, while smaller ones are discarded as noise.
This approach ensures that only loops reflecting true circular character structures are preserved in the analysis.
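The area filter can be sketched in pure Python. It assumes the cycles have already been extracted, for instance with a graph library routine such as `networkx.cycle_basis`, and are passed in as lists of `(x, y)` pixel coordinates. The hull is built with Andrew's monotone chain and its area with the shoelace formula; both are stand-ins chosen for this illustration, and all function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in boundary order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def valid_loops(cycles, min_area=1.5):
    """Keep only cycles whose convex hull area exceeds the threshold."""
    return [c for c in cycles if polygon_area(convex_hull(c)) > min_area]
```

With the threshold of 1.5 stated in the text, a three-pixel triangular cycle (hull area 0.5) and a 2×2 pixel square (hull area 1) are both rejected.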
Character segmentation
Character segmentation is conducted after skeleton-based keypoint extraction using a three-stage strategy.
First, connected component labeling (CCL) is applied to directly segment isolated characters, with centroids used to guide cropping and associate diacritics with the main stroke.
Second, for connected components, skeleton intersection points are employed as structural cut cues, with cuts applied one pixel before the intersection to preserve stroke continuity.
Third, loop (closed-path) analysis is integrated to resolve ambiguities in multi-character connections: cropping is adjusted relative to the intersection position depending on loop location, double cuts are applied for three-character connections, and components without loops and with only two endpoints are retained as single characters.
This combined strategy enables accurate and context-aware segmentation of complex handwritten Jawi characters.
While these segmentation methods (i.e., connected component analysis, skeletonization, and keypoint detection) are not entirely new, the novelty of this work lies in their adaptation to the Jawi script.
Previous studies have provided important insights into Arabic script segmentation, but they remain largely limited to standard Arabic and have not been specifically applied to Jawi.
The Jawi script introduces distinct structural complexities: its characters exhibit more intricate structures, diacritic placement differs, handwriting thickness varies considerably, and ligatures are often irregular.
These characteristics make the segmentation of connected characters particularly challenging, thereby requiring a more robust and adaptive approach to accurately identify character boundaries under complex conditions.
Although this research focuses on character segmentation, the proposed method can be integrated into a complete OCR workflow.
Segmented characters serve as input for later stages such as feature extraction, classification, and post-processing, enabling the recognition of Pegon script in historical manuscripts. This integration highlights the practical application of segmentation methods in broader OCR systems.
Evaluation
This evaluation stage was conducted to measure how well the segmentation method separated Jawi characters correctly.
The evaluation was carried out by comparing the automatic segmentation results generated by the system with the manually defined ground truth data.
This process used several standard evaluation metrics in the field of pattern recognition, namely accuracy, precision, recall, and F1-score.
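The metrics named above can be computed from true positive, false positive, and false negative counts as in the sketch below. Precision, recall, and F1 follow their standard definitions. The text does not state how accuracy is defined, so the formula TP / (TP + FP + FN) used here is an assumption, and the function name is hypothetical.

```python
def segmentation_metrics(tp, fp, fn):
    """Compute precision, recall, F1, and an assumed accuracy from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # ASSUMPTION: segmentation has no true negatives, so accuracy is taken
    # as TP / (TP + FP + FN); the paper does not give its definition.
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Pooled counts reported in the evaluation: TP = 187, FP = 17, FN = 31
metrics = segmentation_metrics(187, 17, 31)
```

Values computed from pooled counts need not match figures that are averaged line by line.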
RESULTS AND DISCUSSION
Pre-processing
Figure 3 illustrates the pre-processing stages, including grayscale conversion, thresholding, and noise removal using CCA.
These steps effectively enhance foreground-background separation and provide a clean basis for segmentation.
However, some diacritics are removed during CCA due to their small size and spatial separation resembling noise, which affects characters that rely on dot patterns.
This limitation indicates that fixed threshold settings are not fully aligned with Jawi paleographic characteristics, suggesting the need for adaptive diacritic-preservation strategies to improve segmentation consistency.
Figure 3. Steps of pre-processing: (a) grayscale image result, (b) thresholding image result, and (c) cleaned thresholding image result
Bounding box
Bounding box analysis successfully distinguishes main strokes and diacritics, as visualized in Figure 4, preserving their spatial relationships, which are critical for Jawi recognition.
Nevertheless, diacritics located far from the main stroke are sometimes misclassified as noise or incorrectly associated due to overlapping strokes.
This limitation reduces segmentation accuracy for dot-sensitive characters.
Adaptive distance thresholds normalized to character size and the incorporation of Jawi-specific linguistic rules are recommended to improve robustness across handwriting styles and manuscript quality variations.
Figure 4. Visualization of diacritics and main strokes
Skeletonization
The skeletonization results are presented in Figure 5, comparing the Zhang-Suen and Lee algorithms.
Zhang-Suen (Figure 5(a)) produces thin skeletons and preserves main stroke structures efficiently, but introduces excessive branches that increase visual roughness.
In contrast, the Lee algorithm (Figure 5(b)) generates smoother skeletons with fewer branches, although some fine structural details are lost.
Figure 5. Steps of skeletonization: (a) Zhang-Suen skeletonization algorithm and (b) Lee 94 skeletonization
Quantitative results in Table 2 show that Zhang-Suen achieves a faster processing time ([…].28 ms) than Lee ([…].70 ms), but produces more excessive branches.
Both methods yield identical PRR values of 31%.
This indicates a clear trade-off between structural detail preservation and skeleton smoothness.
Considering its efficiency and comparable PRR, Zhang-Suen was selected as the preferred method, with the acknowledgment that branch-pruning post-processing is required to mitigate noise and prevent segmentation errors. Future work should investigate adaptive or hybrid skeletonization strategies to balance efficiency and structural fidelity for Jawi character recognition.
Table 2. Comparison of skeletonization algorithms
Algorithm | Processing time | Excessive branches | PRR (%) | Notes
Zhang-Suen | | | | Faster, but generates more noise/branches
Lee | | | | Cleaner skeleton, but slower and loses details
Key point detection
Key point detection results shown in Figure 6 demonstrate reliable identification of start points, end points, centroids, and intersections, effectively supporting character segmentation.
However, turn points are not successfully detected, as curved strokes are often misinterpreted as intersections.
In practice, this limitation has minimal impact on segmentation reliability because segmentation decisions primarily rely on intersection and centroid points rather than turn points.
No explicit correction strategy was applied at this stage; instead, segmentation robustness was maintained by prioritizing stable keypoints and excluding ambiguous turn-point detections.
This limitation indicates that the current rule-based approach is insufficient to capture curvature variations.
Future improvements should incorporate curvature-sensitive analysis or data-driven methods to enhance the detection of directional changes in Jawi character strokes.
Figure 6. Key point detection results for a sample word
Character segmentation
In the character segmentation section, the characters in the figures are arranged according to the natural Jawi reading direction, from right to left.
However, the subfigure labels follow the standard journal convention of left-to-right ordering.
Based on centroid of CCL
This section presents the segmentation results obtained using CCL.
The method is designed to separate individual Jawi characters by identifying components without structural connections and extracting them as single units.
To improve segmentation consistency, the centroid of each connected component, representing its spatial center of mass, was used as a reference.
This allows the system to separate the main strokes from diacritical elements such as dots or harakat more precisely.
Figure 7(a) shows ya', Figure 7(b) shows ya' and ra', and Figure 7(c) shows dal.
These examples demonstrate that CCL performs effectively on isolated components with no structural overlap.
As shown in Figure 7, single characters were segmented correctly when their structures were not connected, demonstrating the effectiveness of CCL in handling isolated strokes.
However, segmentation failures were observed in tightly connected characters, such as ya' and ra' (Figure 7(b)), where adjacent centroids were too close, causing the system to interpret multiple characters as a single unit.
Figure 7. Steps of segmentation: (a) results of ya' character segmentation, (b) results of ya' and ra' character segmentation, and (c) results of dal character segmentation
This highlights a fundamental limitation of centroid-based segmentation: its lack of discriminative power in dealing with the irregular spacing and complex ligatures of Jawi script.
To address this, more context-aware strategies are required.
Potential improvements include normalizing centroid distances relative to character size or integrating additional heuristics, such as skeleton-based intersection points, to guide more accurate segmentation.
Based on centroid from CCL and intersection points
Despite the observed improvements, segmentation errors persist in complex character structures.
Figure 8 shows the results of character segmentation using centroid information from CCL combined with skeleton intersection points for different Jawi characters.
In Figures 8(a) and 8(b), characters with serrated or highly curved strokes (e.g., ya' and ra') exhibit over-segmentation, as multiple centroid and intersection points are generated along irregular skeleton paths.
Conversely, Figures 8(c) and 8(d) illustrate cases where smoother ligatures (e.g., ya' and dal) lead to under-segmentation due to weak or missing intersection points in the skeleton representation.
These results indicate that the effectiveness of the proposed approach is highly sensitive to skeleton quality and residual noise, which directly influence the accuracy of centroid detection and intersection point extraction.
Figure 8. Steps of segmentation: (a) results of ya' character segmentation, (b) results of ra' character segmentation, (c) results of ya' character segmentation, and (d) results of dal character segmentation
Based on centroid from CCL, intersection points, and loop
To further address connected characters, loop features were integrated with centroid and intersection points. As shown in Figures 9 and 10, this approach improves segmentation accuracy for characters with loop structures (e.g., ba', ha'), outperforming the previous methods in handling complex shapes.
Nevertheless, limitations remain for characters with multiple loops or closely located intersections, as well as for sequences of three connected characters (e.g., mim-nun-waw), where incorrect cut placement may cause structural loss.
These findings indicate that fixed cutting rules are still insufficient for highly complex Jawi ligatures. Future improvements should incorporate adaptive intersection filtering, branch pruning, and Jawi-specific linguistic constraints to enhance segmentation robustness.
The segmentation results for characters with loop structures are shown in Figures 10.
Figure 10.
shows haAo, and Figure 10.
shows baAo.
These examples demonstrate that the proposed method performs effectively for both independent and connected characters with loop structures.
Compared to the two previous approaches, this method provides improved segmentation accuracy, particularly in distinguishing characters with complex shapes.
Despite its advancements, the current segmentation method faces structural challenges, particularly with characters featuring double loops or intersections located very close to a loop, such as ha’ or mim.
In these instances, cutting rules often damage the loop area, leading to structural degradation.
Similarly, in complex sequences like mim-nun-waw, the system occasionally fails to position cut points accurately, causing the middle character (nun) to be lost or incorrectly merged.
To overcome these limitations and enhance robustness against the high variability of Jawi script, future refinements should focus on adaptive intersection filtering, skeleton branch pruning, and the integration of linguistic rules to better preserve character integrity.
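One possible reading of the skeleton branch pruning suggested above is sketched below in Python. It removes short spur branches that run from an endpoint to the first junction, which is a common way to suppress spurious intersections before cut points are chosen. This is an assumed illustration rather than a method described in this study: the function names, the `min_length` threshold, and the toy grid are hypothetical.

```python
# Offsets for the 8-connected neighbourhood used on the skeleton.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbours(skel, r, c):
    """Return the 8-connected skeleton neighbours of pixel (r, c)."""
    rows, cols = len(skel), len(skel[0])
    return [
        (r + dr, c + dc)
        for dr, dc in NEIGHBOURS_8
        if 0 <= r + dr < rows and 0 <= c + dc < cols and skel[r + dr][c + dc]
    ]


def prune_short_branches(skel, min_length):
    """Return a copy of skel without endpoint-to-junction branches shorter than min_length."""
    pruned = [row[:] for row in skel]
    # Endpoints are skeleton pixels with exactly one skeleton neighbour.
    endpoints = [
        (r, c)
        for r, row in enumerate(skel)
        for c, value in enumerate(row)
        if value and len(neighbours(skel, r, c)) == 1
    ]
    for start in endpoints:
        path = [start]
        current = start
        reached_junction = False
        while True:
            step = [p for p in neighbours(skel, *current) if p not in path]
            if len(step) != 1:
                break  # dead end: the stroke ends without meeting a junction
            current = step[0]
            if len(neighbours(skel, *current)) >= 3:
                reached_junction = True  # the junction pixel itself is kept
                break
            path.append(current)
        # Only spurs attached to a junction are removed; isolated strokes are kept.
        if reached_junction and len(path) < min_length:
            for r, c in path:
                pruned[r][c] = 0
    return pruned


# Hypothetical toy skeleton: a closed ring with a two-pixel diagonal spur.
ROWS = [
    ".........",
    "..###....",
    ".#...#...",
    ".#...#...",
    "..###.#..",
    ".......#.",
    ".........",
]
SKELETON = [[1 if ch == "#" else 0 for ch in row] for row in ROWS]

if __name__ == "__main__":
    cleaned = prune_short_branches(SKELETON, min_length=3)
    for row in cleaned:
        print("".join("#" if v else "." for v in row))
```

With `min_length=3` the two-pixel spur is removed and the ring is left intact. A fixed threshold like this would also delete short but genuine strokes, so in practice it would likely need to scale with stroke width or character height.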
Figure 9. Visualization of character segmentation results for characters with a loop part: ba’ and ha’
Figure 10. Steps of segmentation: (a) results of ha’ character segmentation and (b) results of ba’ character segmentation
Evaluation
The evaluation stage focused on comparing the system’s character segmentation results against a manually verified ground truth (GT) from the “Kitab Syair Perahu” manuscript.
Using a sample of 10 manuscript lines, the analysis was conducted on a per-line basis to ensure accuracy.
From a total of 269 ground-truth characters, the system generated 236 segments, of which 187 were true positives (TP).
The remaining data recorded 17 false positives (FP) and 31 false negatives (FN).
While the full evaluation utilized all 10 lines, the segmentation process is visually demonstrated through four selected examples in Figures 11 to 14.
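The relationship between the TP, FP, and FN counts and the reported metrics can be sketched in Python as follows. Precision, recall, and F1-score use their standard definitions. The accuracy formula TP / (TP + FP + FN) is an assumption, chosen because segmentation has no natural true-negative count, and the source does not state which definition was used. The function names are illustrative.

```python
def segmentation_metrics(tp, fp, fn):
    """Compute precision, recall, F1-score and accuracy from segment counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    # Assumed definition: no true negatives exist for segmentation cuts.
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_average(per_line_counts):
    """Average each metric over lines; per_line_counts holds (tp, fp, fn) tuples."""
    per_line = [segmentation_metrics(tp, fp, fn) for tp, fp, fn in per_line_counts]
    return {
        name: sum(m[name] for m in per_line) / len(per_line)
        for name in ("accuracy", "precision", "recall", "f1")
    }


if __name__ == "__main__":
    # Pooled counts quoted in the text: 187 TP, 17 FP, 31 FN.
    pooled = segmentation_metrics(187, 17, 31)
    for name, value in pooled.items():
        print(f"{name}: {value:.3f}")
```

Metrics computed from pooled counts need not coincide with averages of per-line metrics, which is what `macro_average` computes, so small differences between the two kinds of summary are expected.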
Figure 11. Page p01-lineimg12
Figure 12. Page p06-lineimg0
Figure 13. Page p06-lineimg5
Figure 14. Page p08-lineimg0
To provide a more comprehensive view of the evaluation, Table 3 presents the detailed segmentation results for each manuscript line, including the number of GT characters, detected (DT) characters, correctly segmented characters (TP), FP, and FN, along with their corresponding accuracy, precision, recall, and F1-score.
Table 3. Detailed evaluation results of segmentation system
Page-line: P01-12, P06-0, P06-5, P08-0, P09-15, P12-4, P13-0, P13-12, P14-4, P14-7, Average
Columns: Accuracy, Precision, Recall, F1-score
The evaluation results in Table 3 demonstrate robust performance, with an accuracy of 0.801 and a high precision of 0.895.
A recall of 86.38% and an F1-score of 88.91% further confirm the system’s reliability in balancing detection and accuracy.
However, performance inconsistencies persist, with per-line accuracy as low as 0.75; these are primarily caused by under-segmentation in looped characters and over-segmentation in dense connections.
Given the small, homogeneous dataset and the absence of cross-validation, these findings should be considered preliminary.
Future work should focus on adaptive strategies such as skeleton pruning and curvature analysis, integrated with hybrid machine learning approaches and expanded datasets, to enhance generalizability across diverse manuscripts.
The results highlight both the system’s strengths and its limitations, particularly its inconsistent performance with dense ligatures and looped structures.
Furthermore, a direct comparison with other segmentation systems was not feasible due to the restricted sample size and the absence of publicly available Jawi script benchmark datasets.
Consequently, these findings serve as an initial baseline, with future efforts aimed at incorporating broader datasets to enable formal benchmarking against alternative segmentation systems.
CONCLUSION
This study developed a Jawi character segmentation system for the “Syair Perahu” manuscript, integrating CCA, skeletonization, and keypoint detection.
The system achieved an accuracy of 0.801, a precision of 0.895, a recall of 86.38%, and an F1-score of 88.91%, indicating its effectiveness in handling complex handwritten forms.
However, limitations remain: the small, single-manuscript dataset restricts generalizability, while segmentation errors persist in dense ligatures and looped structures, highlighting the need for further refinements to capture the full variability of Jawi handwriting.
Future research should expand the dataset to encompass diverse handwriting styles and manuscripts to enhance system robustness.
Leveraging deep learning architectures, such as lightweight CNNs or transformers, could significantly improve accuracy for complex or degraded texts.
Furthermore, a hybrid framework incorporating keypoint detection would bolster adaptability.
Ultimately, integrating this method into a complete OCR pipeline supported by mobile or hardware-based digitization tools is a vital step toward scalable transcription and the digital preservation of Jawi manuscripts.
FUNDING INFORMATION
The authors are grateful for the funding support from Universitas Yudharta Pasuruan and the Research Center for Artificial Intelligence and Cybersecurity, National Research and Innovation Agency (BRIN), Bandung, which have contributed to the implementation of this research.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: Fadhilatul Fitriyah, Dian Andriana, Muhammad Zulhaj Aliansyah, Lukman Hakim, Muhammad Faishol Amrulloh
C : Conceptualization
M : Methodology
So : Software
Va : Validation
Fo : Formal analysis
I : Investigation
R : Resources
D : Data Curation
O : Writing - Original Draft
E : Writing - Review & Editing
Vi : Visualization
Su : Supervision
P : Project administration
Fu : Funding acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
ETHICAL APPROVAL
Not applicable.
This study did not involve human or animal subjects.
DATA AVAILABILITY
The data that support the findings of this study consist of scanned images of “Syair Perahu” manuscripts and their labeled datasets.
This data is openly available on the BRIN RIN Dataverse https://data.
id/dataset.
xhtml?persistentId=hdl:20.
12690/RIN/1ESGFP.
REFERENCES