Jurnal Sains Teknologi Transportasi Maritim, Volume 7 No. 2, November 2025, pp.
DOI: 10.51578/j.
e-ISSN 2722-1679 p-ISSN 2684-9135 https://jurnal.
id/index.
php/akmi

Predicting Container Delivery Dates Using Machine Learning Techniques: A Regression Approach

Godwin Nwachukwu NKEM 1*, Adedeji Daniel GBADEBO 2
1 Operation Department, Lekki Freeport Terminal, Lagos, Nigeria
2 Department of Accounting Science, Walter Sisulu University, Mthatha, South Africa
*e-mail: nkem.godwinnwachukwu@gmail.
e-mail: agbadebo@wsu.
Article Info
Keywords: Seaport, Container Terminal, Inland Delivery, Machine Learning, Random Forest Model
Received: 2025-06-08. Reviewed: 2025-09-15. Revised: 2025-10-28. Accepted: 2025-10-29.
Published: 2025-11-29

Abstract
Inland container delivery constitutes a critical component of the global maritime logistics chain, acting as the final phase that connects international ports to inland destinations.
Accurate prediction of inland container delivery times is crucial for enhancing operational efficiency, minimizing demurrage and detention costs, and improving customer satisfaction across global supply chains.
Purpose: This study leverages historical container movement data across key international ports to develop a robust machine learning model for predicting inland container delivery timelines.
Methodology: Using a Random Forest Regressor, the model was trained to forecast the total inland delivery time based on features such as container size, type, shipping line, dispatch weekday, and temporal patterns.
Findings: The findings have practical implications for shipping lines, freight forwarders, port authorities, and inland terminal operators seeking to optimize logistics planning, reduce uncertainty, and improve the supply chain.
Evaluation of the model's performance yielded a Mean Absolute Error of 4.59 days, a Root Mean Squared Error of 10.55 days, and a coefficient of determination of 0.68, indicating moderate predictive accuracy.
Supporting visualizations, including learning curves, gain curves, feature importance plots, residual distributions, and prediction bands, illustrate the model's strengths and areas for further refinement.
Originality: The study contributes to the growing field of intelligent logistics and maritime informatics by providing a data-driven framework for improving inland delivery predictability.
INTRODUCTION
Globalization is not just a trend but a force that extends local supply chains across boundaries, catering to customers' demands worldwide and changing the landscape of trade and commerce.
With increasing globalization, many products and operations are outsourced and moved across countries (Yu & Qi).
The impact of transportation on the supply chain has always been crucial and is ever-expanding (Viellechner & Spinler).
*Corresponding Author. This is an open access article under the CC BY-SA license.
In international logistics, the timely delivery of containers is essential for maintaining supply chain efficiency.
Delays in shipment can lead to significant financial losses and operational disruptions.
Inland container delivery represents a critical segment in the end-to-end supply chain, acting as the final leg in maritime container transportation after discharge at international ports (Khiari & Olaverri-Monreal).
Once containers arrive at major global maritime gateways such as the Port of Singapore, Port Klang (Malaysia), Lekki Deep Seaport (Nigeria), or the Port of Rotterdam (Netherlands), they are typically moved inland to final destinations like bonded terminals, inland container depots, warehouses, or customer premises (Giallombardo et al.).
These movements occur via various transport modes, including road, rail, and barge, depending on the regional infrastructure and logistics planning (Kavirathna et al.).
This inland leg is often subject to significant variability due to port congestion, customs clearance delays, road conditions, and terminal handling inefficiencies.
As such, timely and reliable inland container delivery is essential to ensure predictability in supply chain operations, minimize additional costs (e.g., demurrage and detention), and maintain customer satisfaction (Hanedar et al.).
Effective inland container tracking hinges on robust data integration from multiple touchpoints, including port systems, shipping lines, inland depots, and transportation providers.
This data integration is not just a technical requirement but a crucial aspect of supply chain management (Padi & Setty).
However, in many cases, real-time visibility is limited, and processes remain fragmented.
This lack of synchronization creates uncertainty and operational inefficiencies, highlighting the need for comprehensive data integration (Padi & Setty).
Traditional methods for estimating delivery times rely on historical averages and manual adjustments, which lack precision.
However, recent advancements in Machine Learning (ML) offer the potential to improve this process substantially.
By building dynamic, data-driven predictive models, ML can outperform conventional approaches and produce more accurate and reliable delivery time estimates (Notteboom & Rodrigue, 2008; Darendeli et al.).
This paper explores the application of regression techniques to predict the actual delivery dates of containers, demonstrating the potential impact of ML on the supply chain.
The dataset used in this research captures detailed information about the inland movement process, recording the progression of each container from the Place of Dispatch through the Port of Loading, Port of Discharge, and the Post-Port of Discharge phase until it reaches its final Delivered Date.
Time intervals such as "Dispatch to Loading," "Loading to Discharge," and "Post-Discharge to Delivery" provide crucial insight into operational bottlenecks, regional performance disparities, and transit time.
While applying ML in predicting inland delivery times offers significant benefits, it also comes with challenges.
These challenges can range from data quality issues to the need for specialized skills and resources.
ML can offer a robust solution by addressing these challenges (Jonquais & Krempl).
By leveraging historical data and incorporating factors such as container size, type, shipping line, and dispatch timing, these models can improve decision-making, enable proactive scheduling, and reduce the overall cycle time from port to consignee (de Araujo & Etemad).
With major ports such as Singapore, Klang, Lekki Deep Seaport, and Rotterdam functioning as key transshipment and entry points for global cargo flows, the relevance of efficient inland logistics becomes even more pronounced (Yu & Qi).
Enhancing container delivery prediction and tracking in such corridors contributes to the performance of individual supply chains and the competitiveness of broader trade facilitation and logistics (Hathikal et al.).
In the literature, extensive research addresses the efficiency of inland delivery from the port of discharge, approaching it from various perspectives with ML techniques that draw on large datasets.
Accordingly, ML is considered one of the most essential elements of Big Data Analytics (Jonquais & Krempl).
The foundational work of Notteboom & Rodrigue (2008) examines the functional and spatial development of inland terminals and container ports.
The effectiveness of container flow, according to the authors, depends on inland logistics networks, especially dry ports and intermodal links.
They created a conceptual model that highlights elements such as port congestion, transportation bottlenecks, and regulatory impediments to represent the increasing complexity of hinterland logistics.
Their results lend credence to the notion that inland variables, such as train schedules, terminal dwell durations, and the quality of regional infrastructure, must be included in prediction models for inland container delivery, in addition to sea data.
This study establishes the foundation for determining important operational variables that ought to be fed into ML models, even if it does not employ ML itself.
Kavirathna et al. introduce a multi-criteria decision-making framework for evaluating container terminal performance based on operational KPIs such as berth occupancy, crane productivity, truck turnaround time, and gate throughput.
While the authors used quantitative analysis and simulation techniques rather than ML, the amount of operational data they examined is extremely useful for delivery prediction.
Their analysis demonstrates that internal port processes, such as poor stacking and delayed customs procedures, have a significant impact on final delivery timeframes.
Integrating such insights into ML models, particularly Random Forests or Gradient Boosting, which handle structured tabular data well, can significantly improve forecast accuracy.
Khiari & Olaverri-Monreal investigated various ML models for predicting delivery times in postal services in their paper "Boosting Algorithms for Delivery Time Prediction in Transportation Logistics".
The study compared linear regression, bagging, and boosting algorithms, including LightGBM and CatBoost.
The results showed that boosting algorithms beat the others in terms of accuracy and runtime.
LightGBM and CatBoost, in particular, demonstrated improved performance, suggesting that they are well-suited for predicting real-time delivery times in logistics operations.
Theofanis & Boile analysed inland container logistics from a policy and infrastructural perspective.
The evaluation of performance measures for inland terminals highlighted the necessity for public-private collaboration to enhance container flow efficiency.
Their studies indicated that insufficient data sharing among stakeholders (terminal operators, transporters, and customs) generates considerable difficulties in delivery estimation.
Although they did not employ ML methods, their policy analysis guides the development of intelligent transport systems (ITS) and digital logistics platforms that may ultimately provide the requisite real-time data inputs for ML models.
They contend that predictive systems should be engineered to interact with human decision-makers, including logistics planners and port officials.
Giallombardo et al. examine predictive analytics in maritime logistics, with a focus on terminal operations and the integration of hinterland transportation.
The authors demonstrated, through simulation and predictive modeling, utilizing neural networks and decision trees, that dwell time at the port is a critical predictor of inland delivery delays.
They also evaluated models that included external variables, such as vessel schedule reliability and carrier-specific behaviors.
A significant addition is the stratified approach to modeling - initially forecasting port performance and subsequently integrating that output into inland prediction models.
This complex structure reflects the principles of Random Forests or deep learning ensembles, highlighting the sequential nature of delay propagation.
Their findings support the use of modular predictive systems, each trained on a distinct transport layer.
The effectiveness of ML models, particularly ensemble techniques such as Random Forest and boosting algorithms, in forecasting delivery times and maritime logistics expenses is demonstrated by these studies collectively.
Prediction accuracy is further enhanced by incorporating real-time contextual data, such as traffic and weather conditions.
Other models, such as LightGBM and deep learning architectures, may perform better based on the application and data properties, even if Random Forest offers resilience and interpretability.
RESEARCH METHOD
The paper implements ML methods to predict inland delivery times for container shipments bound for inland destinations in the United States.
We use real-world operational records for maritime container shipments to model inland delivery times.
One of the primary challenges encountered during the project was the limited availability and transparency of data within the maritime shipping industry.
Despite the global importance of container transport, public datasets remain scarce, outdated, or incomplete.
As documented in the Review of Maritime Transport 2016 by UNCTAD, both port authorities and private shipping lines often withhold data due to commercial confidentiality and strategic interests (Romeo).
Nonetheless, for this project, relevant shipment records covering the years 2021 and 2022 were obtained.
The data tracks containers from major Asian ports, including Yantian (China), Shanghai (China), and Vung Tau (Vietnam), to inland destinations in the United States, such as Dallas and Gainesville, with Long Beach, California, as the primary U.S. port of discharge.
Each container's journey was captured through key timestamps: dispatch, port loading, port discharge, post-discharge, predicted delivery, and actual delivery dates.
From these timestamps, intermediate durations and total transit time were derived.
This aggregated duration served as the target variable in the regression.
Figure 1 describes the data handling processes.
The dataset was pre-processed to clean missing values and transform raw timestamps into time intervals representing each container journey segment.
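The conversion from timestamps to journey-segment durations can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names and dates are hypothetical, and only the Python standard library is used.

```python
from datetime import date

# Hypothetical record for one container; field names and dates are illustrative only.
record = {
    "dispatch": date(2021, 3, 1),
    "port_loading": date(2021, 3, 6),
    "port_discharge": date(2021, 3, 27),
    "post_discharge": date(2021, 4, 2),
    "delivered": date(2021, 4, 9),
}

STAGES = ["dispatch", "port_loading", "port_discharge", "post_discharge", "delivered"]


def segment_days(rec):
    """Return the length in days of each consecutive segment and of the whole trip."""
    durations = {
        f"{start}_to_{end}": (rec[end] - rec[start]).days
        for start, end in zip(STAGES, STAGES[1:])
    }
    # Total transit time: the quantity used as the regression target.
    durations["dispatch_to_delivered"] = (rec["delivered"] - rec["dispatch"]).days
    return durations


durations = segment_days(record)
for name, days in durations.items():
    print(f"{name}: {days} days")
```

Because the segments are consecutive, their durations add up to the total transit time, which is a useful sanity check on cleaned timestamp data.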
Feature engineering, transforming raw data into a more valuable and efficient format for ML, was employed to capture temporal dynamics and geographic transitions.
This allowed the model to account for complex relationships such as port congestion, inland travel variability, and multimodal handling.
The dataset was kept intact by handling missing values (NaN, blank values, nulls, or continuous zeroes) by substituting them with the corresponding feature mean.
To enhance model performance and lower computational complexity, redundant or unnecessary features were eliminated from the dataset.
A Standard Scaler was used in data cleaning to achieve a standardized distribution with a mean of zero and a standard deviation of one.
It standardizes features by subtracting the mean value from the feature and dividing the result by the feature's standard deviation.
Alternatively, missing values can be handled using techniques such as imputation (replacing missing values with the mean, median, or mode of the feature), deleting rows or columns with missing values, or advanced imputation methods, such as K-nearest neighbours (KNN) imputation.
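The mean imputation and standardization steps described above can be illustrated with a small, self-contained sketch. The sample values and function names are assumptions made for illustration; the population standard deviation is used because that is what scikit-learn's StandardScaler applies.

```python
import math
import statistics


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_with_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if not is_missing(v)]
    mean = statistics.fmean(observed)
    return [mean if is_missing(v) else v for v in values]


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical transit times in days, with two missing entries.
transit_days = [20.0, None, 24.0, float("nan"), 22.0]
imputed = impute_with_mean(transit_days)
scaled = standardize(imputed)

print(imputed)
print([round(z, 3) for z in scaled])
```

One design point worth noting: mean imputation leaves the feature mean unchanged but shrinks its variance, which is one reason median or KNN imputation is sometimes preferred when many values are missing.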
Moreover, a pre-processing procedure called "categorical encoding" was used to translate categorical data into a numerical representation that ML algorithms can use.
Standard encoding techniques include one-hot encoding, label encoding, and target encoding.
This step is essential because most ML algorithms require numerical input data.
We use one-hot encoding to encode categorical values.
The method creates a binary dummy variable for every category of a categorical variable.
Each category is represented by a binary column in which a value of 1 denotes the presence of the category and a value of 0 denotes its absence.
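As a minimal sketch of one-hot encoding, the example below encodes a hypothetical container-type column. The category values and the `type_` column prefix are illustrative assumptions, not taken from the study's dataset.

```python
def one_hot_encode(values, prefix="type"):
    """Return (column_names, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    columns = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


# Hypothetical categorical feature for four containers.
container_types = ["dry", "reefer", "dry", "open_top"]
columns, rows = one_hot_encode(container_types)

print(columns)
for value, row in zip(container_types, rows):
    print(value, row)
```

Each row contains exactly one 1, so the encoding adds no ordering between categories, unlike label encoding.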
Figure 1: Data Handling Process

Prediction Models
Predictive modelling was conducted using several ML algorithms.
Logistic regression, decision trees, support vector machines, Naïve Bayes, and k-nearest neighbours were initially evaluated for suitability (Hathikal et al.).
Logistic Regression, although widely used for classification, was not ideal for this regression-based task.
It assumes a linear relationship between the log-odds of the outcome and the independent variables.
While logistic regression performs well on categorical outputs, it is limited when dealing with non-linear and continuous target variables such as delivery time.
The logistic function is mathematically represented as:
$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
$$\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}$$
$$\ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$$
To better model complex, non-linear interactions present in the data, Random Forest Regression was selected for the final implementation.
Random Forest is an ensemble learning method that constructs multiple decision trees and averages their outputs to improve accuracy and reduce overfitting.
It efficiently handles high-dimensional, noisy, and heterogeneous data, and can model intricate feature interactions without requiring extensive preprocessing.
The mathematical representation of the Random Forest regression model is:
$$\hat{y} = \frac{1}{T} \sum_{t=1}^{T} h_t(x)$$
Where: $\hat{y}$ is the predicted value, $T$ is the total number of decision trees in the forest, and $h_t(x)$ is the prediction from the $t$-th decision tree for input $x$.
Each tree $h_t$ is trained using a different bootstrap sample and a random subset of features, allowing the forest to learn diverse patterns and reducing the likelihood of overfitting.
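The averaging step can be checked directly with scikit-learn. The sketch below assumes scikit-learn and NumPy are available and uses synthetic data; the hyperparameters are illustrative and are not the study's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for encoded shipment features (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 5 + 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, size=200)

# Each tree sees a bootstrap sample and a random feature subset at every split.
forest = RandomForestRegressor(
    n_estimators=50, max_features="sqrt", bootstrap=True, random_state=0
).fit(X, y)

x_new = X[:5]
# Predictions of the individual trees, shape (n_trees, n_samples).
per_tree = np.stack([tree.predict(x_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest prediction is the mean of the per-tree predictions.
print(np.allclose(manual_average, forest.predict(x_new)))
```

The spread of `per_tree` across trees also gives a rough view of how much the ensemble members disagree for a given input.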
Evaluation Metrics
Three statistical measures (Mean Absolute Error, Root Mean Square Error, and Coefficient of Determination) were used to evaluate prediction accuracy, assuming actual values $a_1, a_2, a_3, \ldots, a_n$ and predicted values $p_1, p_2, p_3, \ldots, p_n$.
These metrics are:
Mean Absolute Error (MAE): Measures the average magnitude of errors in a set of predictions, without considering their direction.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - a_i|$$
Root Mean Square Error (RMSE): Penalizes larger errors more than MAE and gives a better sense of prediction reliability.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - a_i)^2}$$
Coefficient of Determination ($R^2$): Represents the proportion of variance in the dependent variable that is predictable from the independent variables.
$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$
Where: SSE is the sum of squared errors, SST is the total sum of squares, and $R^2$ is the coefficient of determination.
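A short worked example of the three metrics, using four hypothetical shipments (the numbers are invented for illustration), shows how a single large miss pushes RMSE well above MAE:

```python
import math


def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R^2) for paired actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)                   # sum of squared errors
    rmse = math.sqrt(sse / n)
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - sse / sst
    return mae, rmse, r2


# Hypothetical delivery times in days; the last shipment is badly underestimated.
actual = [30, 35, 40, 60]
predicted = [32, 34, 41, 50]

mae, rmse, r2 = regression_metrics(actual, predicted)
print(f"MAE={mae:.2f} days, RMSE={rmse:.2f} days, R2={r2:.3f}")
```

Three of the four errors are at most 2 days, yet the one 10-day underestimate dominates the squared-error terms. A gap between MAE and RMSE of the kind reported in this study is consistent with a small number of large errors.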
RESULTS AND DISCUSSIONS
The evaluation of the Random Forest regression model's performance yielded a Mean Absolute Error (MAE) of 4.59 days, a Root Mean Squared Error (RMSE) of 10.55 days, and a coefficient of determination of 0.68, indicating moderate predictive accuracy.
Figure 2: The residual of the model performance. Source: Authors.
Figure 3: The model performance. Source: Authors.
Figure 4: The residuals distribution of the model performance. Source: Authors.
The Random Forest regression model is employed to forecast the duration of inland container delivery.
The performance and limitations of the model are illustrated in Figures 2, 3, and 4.
In general, the residual of the model performance plot, which compares predicted values to residuals, demonstrates that many predictions are tightly clustered around zero, suggesting that the model is performing well.
In certain instances, residuals are substantially positive, indicating that the model may be underestimating delivery times.
These outliers likely reflect exceptional delays due to rare or unmodeled operational disruptions, highlighting the need for caution when interpreting predictions that fall near the model's performance boundaries.
The model performance graphs underscore the model's constraints.
Despite high and steady training scores, the cross-validation scores are low and relatively flat, suggesting that the model is overfitting: it excels on the training data but fails to generalize to unseen data.
With more training examples the fit time increases as expected, but the gain in model performance plateaus early, showing limited benefit from additional training data under the current feature setup.
This raises concerns about the model's robustness in operational settings, especially under variable or evolving conditions.
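A learning curve of this kind can be produced with scikit-learn's `learning_curve`. The sketch below is an assumption-laden illustration on synthetic noisy data, not a reproduction of the study's figure; it shows how the gap between training and cross-validation scores is measured.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic data: one informative feature plus substantial noise (assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] + rng.normal(scale=2.0, size=300)

train_sizes, train_scores, cv_scores = learning_curve(
    RandomForestRegressor(n_estimators=30, random_state=0),
    X,
    y,
    train_sizes=[0.2, 0.5, 1.0],  # fractions of the available training folds
    cv=5,
    scoring="r2",
)

# Mean score per training size; a large, persistent gap signals overfitting.
gap = train_scores.mean(axis=1) - cv_scores.mean(axis=1)
for size, g in zip(train_sizes, gap):
    print(f"n={size}: train-CV gap in R2 = {g:.2f}")
```

If the gap stays wide as the training size grows, adding rows alone is unlikely to help. Constraining tree depth or leaf size, or adding more informative features, are the usual next steps.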
Lastly, the residuals distribution of the model performance plot shows a steep peak near zero with a long right tail, suggesting that, although most predictions agree closely with the actual values, a small but important subset of cases involves substantial underestimation.
This skewness suggests a potential operational risk, as delays may be systematically underestimated in specific scenarios.
The Random Forest model is reasonably practical for general delivery duration estimation, as indicated by these plots.
However, its operational deployment should be supplemented with cautionary mechanisms, such as the integration of additional context-specific features to improve generalization or the handling of exceptions for high-risk predictions.
Figure 5 compares the observed delivery times and the values predicted by a Random Forest regression model.
Each point on the plot corresponds to a container shipment, with the x-axis representing the actual number of days taken for delivery and the y-axis representing the model's predicted delivery days.
A dashed diagonal line indicates perfect predictions, where the predicted values match the actual ones.
Visually, the actual vs predicted delivery days plot demonstrates that most points are closely clustered around the diagonal, suggesting that the model performs well in estimating delivery durations across most shipments.
This alignment indicates a strong predictive ability of the Random Forest model, particularly in capturing the core dynamics influencing container delivery time.
However, there are a few noticeable deviations from the diagonal, representing instances where the model either overestimated or underestimated delivery duration.
These outliers may be attributed to unanticipated real-world disruptions such as port congestion, customs clearance delays, and weather-related issues, all familiar sources of variability in maritime logistics.
Figure 5: Actual vs Predicted Delivery Days (Random Forest). Source: Authors.
In practical terms, this plot provides strong evidence that the model is suitable for generating estimated times of arrival (ETAs) that stakeholders in the maritime supply chain, such as shipping lines, port authorities, and freight forwarders, can rely on for planning purposes.
By improving the accuracy of ETA predictions, organizations can better allocate port resources, manage container yard operations, and provide more accurate delivery forecasts to customers, thereby improving overall supply chain visibility and efficiency.
The Cumulative Gain Curve for Long Delivery Prediction in Figure 6 provides insight into the model's ability to identify containers that are likely to experience high delivery delays.
On the x-axis, the graph plots the proportion of containers sorted by predicted delivery delay.
The y-axis reflects the cumulative proportion of actual high-delay containers captured by the model.
The blue curve represents the performance of the trained model, whereas the diagonal grey line represents a baseline random model, which assumes no learning or predictive capability.
This visualization shows that the model significantly outperforms random selection.
For example, by examining just 20% of the containers predicted to have the highest delivery times, the model successfully captures around 70 to 80% of all containers that experienced long delivery delays.
This steep early rise in the gain curve indicates that the model is highly effective in prioritizing the riskiest shipments, those most likely to be delayed.
From a maritime logistics perspective, this is especially valuable.
Port operators, logistics planners, or freight forwarders can focus their monitoring and contingency planning on a smaller subset of containers the model flags as high risk.
This proactive focus enables better resource allocation, enhances customer communication, and facilitates more reliable scheduling.
The ability to preemptively identify problematic deliveries before they occur can significantly reduce bottlenecks and improve supply chain resilience.
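The gain calculation behind such a curve can be sketched in a few lines. The predicted durations and long-delay flags below are invented for illustration, and the function name is hypothetical.

```python
def cumulative_gain(predicted_days, is_long_delay, fraction):
    """Share of all long-delay containers found in the top `fraction` by predicted days."""
    # Rank containers from longest to shortest predicted delivery time.
    order = sorted(
        range(len(predicted_days)), key=lambda i: predicted_days[i], reverse=True
    )
    top_k = max(1, round(fraction * len(order)))
    captured = sum(is_long_delay[i] for i in order[:top_k])
    return captured / sum(is_long_delay)


# Ten hypothetical containers: model predictions and whether each was actually long-delayed.
predicted_days = [12, 45, 30, 60, 15, 20, 55, 18, 25, 14]
is_long_delay = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]

for fraction in (0.2, 0.3, 1.0):
    gain = cumulative_gain(predicted_days, is_long_delay, fraction)
    print(f"top {fraction:.0%} of containers -> {gain:.0%} of long delays captured")
```

A random ranking would capture roughly the same share as the fraction inspected (20% of containers, 20% of long delays), which is the diagonal baseline in the plot.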
Figure 6: Cumulative gain curve for long delivery prediction. Source: Authors.
Figure 7: Delivery correlation heatmap. Source: Authors.
Figure 7 is the delivery correlation heatmap.
It provides an intuitive overview of how different stages and attributes in the maritime logistics process relate, particularly in terms of total delivery time (Dispatch to Delivery).
Each cell in the heatmap represents the Pearson correlation coefficient between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with color coding making it easier to spot strong relationships.
One notable insight from the heatmap is the strong positive correlation between Loading to Discharge and Dispatch to Delivery, indicating that delays in the main sea leg of the journey (from port of loading to port of discharge) significantly contribute to the total delivery delay.
This finding is particularly relevant for maritime operations, underscoring the importance of maintaining reliable ocean transit and adhering to schedule integrity.
As expected, Dispatch_Month and Dispatch_Week exhibit a strong correlation with each other, since they are derived from the same timestamp.
However, these time-based features have weak correlations with delivery delays, suggesting that seasonality alone is not a dominant driver of delivery performance in this dataset.
Interestingly, there is a perfect negative correlation between Discharge_to_PostDischarge and PostDischarge_to_PredictedDelivery, which may indicate a data or modeling artifact, possibly due to how predicted and actual durations are computed or imputed.
This warrants a deeper review to ensure no inverse relationships were introduced through feature engineering.
The delivery correlation heatmap helps stakeholders identify which stages of the delivery pipeline most influence overall performance, allowing targeted interventions such as reducing discharge-port dwell time or improving vessel scheduling accuracy to improve overall supply chain efficiency.
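One way a perfect negative correlation can arise as an artifact is when two durations always sum to a constant. The sketch below illustrates that mechanism with invented numbers. It assumes, purely hypothetically, that the predicted delivery date is set a fixed number of days after discharge; the source does not state how the predicted dates were produced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical days between discharge and the post-discharge event.
discharge_to_post = [2, 5, 3, 8, 4]

# Assumption: predicted delivery is always exactly 10 days after discharge,
# so the remaining interval is whatever is left of that fixed window.
FIXED_WINDOW_DAYS = 10
post_to_predicted = [FIXED_WINDOW_DAYS - d for d in discharge_to_post]

r = pearson(discharge_to_post, post_to_predicted)
print(round(r, 6))
```

If a check like this reproduces the pattern in the real data, one of the two features is redundant and could be dropped before modelling.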
Figure 8: Interval bands for the delivery duration. Source: Authors.
The prediction interval bands for the delivery duration in Figure 8 provide a view into the uncertainty associated with the model's predictions across all delivery duration estimates.
In this plot, the black line represents the actual delivery durations, sorted in ascending order, while the blue line denotes the model's predicted durations.
The shaded area represents the 95% confidence interval, indicating the range within which the model predicts the actual delivery duration will fall with high confidence.
This visualization is especially valuable in maritime logistics, where unpredictable delays due to port congestion, weather events, labour issues, or customs procedures can significantly affect delivery times.
The wider bands seen toward the higher end of the sorted deliveries reflect the increased uncertainty for extreme delays, which is both expected and practically useful.
It suggests that while the model performs reliably for most typical deliveries, it appropriately acknowledges and accounts for variability in long-tail delivery times.
In operations, such prediction intervals enable risk-aware planning and decision-making.
For example, supply chain managers can use these intervals to proactively buffer inventory levels or adjust customer commitments when the upper bounds of the prediction suggest potential severe delays.
The reasonably tight band around most of the delivery range also reflects confidence in the model's generalization, reinforcing its practical value in real-time forecasting systems for container delivery.
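The source does not state how the interval bands were computed. One common, approximate approach with a Random Forest is to take percentiles of the individual tree predictions, sketched below on synthetic data. This is an assumption for illustration only, and the resulting band reflects disagreement among trees rather than a calibrated prediction interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data in which noise grows with the first feature (assumption),
# mimicking longer deliveries being harder to predict.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 2))
y = 20 + 40 * X[:, 0] + rng.normal(0, 1 + 8 * X[:, 0])

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
forest.fit(X[:300], y[:300])

X_test = X[300:]
# Per-tree predictions, shape (n_trees, n_test_samples).
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])

# Empirical 2.5th and 97.5th percentiles across trees form the band.
lower, upper = np.percentile(per_tree, [2.5, 97.5], axis=0)
point = forest.predict(X_test)
width = upper - lower

print(f"median band width: {np.median(width):.1f} days")
```

For intervals with a coverage guarantee, quantile regression forests or conformal prediction are the usual alternatives. Whichever method is used, the fraction of held-out actual values falling inside the band should be checked against the nominal 95%.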
Figure 9: Feature importance (linear regression coefficients). Source: Authors.
The feature importance (linear regression coefficients) chart, depicted in Figure 9, helps in understanding the relative influence of key delivery stages on the overall duration of container delivery.
The regression coefficients, which are normalized for interpretability, indicate the degree to which each feature contributes to predicting total delivery time.
Longer bars indicate a higher impact, providing valuable insights for maritime logistics planners.
The most influential segment is Dispatch to Loading, which exhibits the highest coefficient, approaching 1.
This underscores the need to optimize pre-port processes, as delays at the very start of the logistics chain, often linked to inland haulage, documentation, or congestion at origin terminals, can significantly extend the overall delivery time.
It is a call to action to reduce end-to-end delays.
The Loading to Discharge feature encapsulates the maritime transit phase and is the next most influential.
This aligns with expectations in the marine context, where variations in voyage times due to vessel scheduling, weather disruptions, or port omissions can directly and often substantially affect delivery accuracy.
It underscores the importance of stability in the maritime transit phase.
Post-discharge to Predicted Delivery and Discharge to Post-Discharge show slightly lower but notable impacts.
These stages include container handling, customs clearance, and last-mile coordination at the destination port.
Though these delays may be shorter in absolute terms, they can become critical bottlenecks, especially in ports with high throughput or inefficiencies.
Overall, this visualization provides actionable insight for maritime logistics planners: early-stage process efficiency and maritime leg stability are key levers for improving delivery time predictions and reliability.
Targeted interventions at these stages can yield outsized benefits in reducing delivery variability and enhancing customer satisfaction.
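A normalized-coefficient chart of this kind can be produced by fitting a linear regression on standardized features and scaling the absolute coefficients so that the largest equals 1. The sketch below uses synthetic segment durations; the distributions, the noise level, and the normalization scheme are all assumptions, since the source does not describe how the coefficients were normalized.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Synthetic segment durations in days; shapes and scales are arbitrary assumptions.
segments = {
    "Dispatch_to_Loading": rng.gamma(shape=4.0, scale=3.0, size=n),
    "Loading_to_Discharge": rng.gamma(shape=9.0, scale=1.5, size=n),
    "Discharge_to_PostDischarge": rng.gamma(shape=4.0, scale=1.0, size=n),
    "PostDischarge_to_Delivery": rng.gamma(shape=4.0, scale=0.5, size=n),
}
names = list(segments)
X = np.column_stack([segments[k] for k in names])
y = X.sum(axis=1) + rng.normal(0.0, 1.0, size=n)  # total time plus noise

# Standardize features so coefficients are comparable across segments.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
A = np.column_stack([np.ones(n), Z])  # intercept column plus features
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Scale absolute coefficients so the most influential feature has importance 1.
importance = np.abs(coef[1:]) / np.abs(coef[1:]).max()
for name, value in sorted(zip(names, importance), key=lambda item: -item[1]):
    print(f"{name}: {value:.2f}")
```

In this construction the total is essentially the sum of the segments, so the standardized coefficients mostly reflect how variable each segment is. A segment ranks highly because its duration varies the most, not necessarily because it is the longest on average.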
CONCLUSIONS
This study evaluated the predictive ability of a Random Forest regression model for forecasting inland container delivery times.
The model had a Mean Absolute Error (MAE) of 4.59 days, a Root Mean Square Error (RMSE) of 10.55 days, and an R² score of 0.68, indicating a moderate fit and some ability to identify patterns in the data.
Despite these results, residual analysis revealed heteroskedasticity and a significant underestimation of longer delivery durations, implying model bias and considerable operational risk.
The learning curve also suggested overfitting, with high training scores but significantly lower cross-validation scores, indicating a generalization gap caused by data sparsity or feature restrictions.
To enhance the model's reliability and operational applicability, future studies should consider incorporating more detailed and diverse features, such as real-time traffic data, weather conditions, and port congestion rates.
Furthermore, balancing the dataset and exploring other ensemble or deep learning methodologies may increase predictive accuracy, particularly in outlier cases.
These enhancements may help ensure the model's practical utility for logistical decision-making.
ACKNOWLEDGEMENTS
The authors would like to use this opportunity to acknowledge the following people for their support, suggestions and feedback during the period of this research: Augustine Nkem and Nkem Stephen.
REFERENCES