PYTHON WEB SYSTEM TO RESTORE SQL SERVER DATABASE TO
DRC WITH ADVANCED INFORMATION RETRIEVAL
(Sistem Web Python untuk Memulihkan Database SQL Server ke DRC dengan Pengambilan Informasi Lanjutan)
Devis Rabertra, Irwansyah Saputra
Department of Computer Science, Nusa Mandiri University, Depok, West Java, Indonesia
Email: 14240019@nusamandiri. id, irwansyah. iys@nusamandiri.
Abstract: Disaster Recovery Centers (DRC) play a crucial role in ensuring the availability and continuity of database operations in enterprise environments.
The process of restoring databases from production servers to DRCs is often performed manually, which can lead to errors such as selecting incorrect backups or corrupted files, as well as lengthy search times. The complexity increases with the growing number of databases and the variety of daily backup types.
This study develops an automated system based on a Python Web Interface integrated with Advanced Information Retrieval (IR) to improve the accuracy and speed of finding relevant backups before restoration.
The system employs Natural Language Processing (NLP) and multi-criteria relevance scoring, evaluating backup suitability based on fuzzy matching of database names, recency, semantic similarity, backup type, and file size.
Testing was conducted using 28 backup records from 5 different databases.
Results show that Advanced IR can complete backup searches in under 2 seconds, with relevance scores ranging from 38% to 67%.
Additionally, the automated restore process via Python achieved an average execution time of 7.49 seconds with a 100% success rate.
Keywords: SQL Server, Query Evaluation, Retrieval-Augmented Generation (RAG), Information Retrieval, Database Optimization.
*Corresponding Author
INTRODUCTION
In general practice, the process of restoring a SQL Server database to a DRC is still done manually, which involves selecting backup files based on the date, size, and type of backup.
This method has several drawbacks, including the potential for human error, delays in selecting appropriate backups, and difficulty in assessing the most relevant backup files when backup volumes increase daily.
Furthermore, the manual process does not provide automatic validation of backups to be restored, thus increasing the risk of restore failure and slowing service recovery.
As the need for automation and system intelligence in database management increases, a solution capable of searching and selecting backups quickly, accurately, and securely is required.
Therefore, this study proposes the development of a Python Web Interface integrated with Advanced Information Retrieval (IR) to automate the process of restoring SQL Server databases to DRC servers.
This system utilizes Natural Language Processing (NLP) technology to understand user search input, as well as a multi-criteria relevance scoring algorithm to determine the most relevant backup based on database name, time proximity, backup type, file size, and semantic match level.
RELATED STUDIES
Research on automating the restore process and utilizing information intelligence technology in database management has been conducted using various approaches.
However, most previous research has focused on optimizing backup and restore performance and has not utilized Intelligent Information Retrieval to automatically select the most relevant backups.
Research by Santos et al. developed an automated SQL Server backup and restore system using PowerShell to accelerate the database recovery process.
This system successfully automated script execution but still relied on manual backup file selection without a ranking mechanism or intelligent search based on specific criteria.
Therefore, this approach was unable to address the problem of human error in backup selection, which is the primary focus of this study.
Furthermore, research by Kumar & Sharma proposed the application of Natural Language Processing (NLP) to data management to improve the efficiency of metadata-based document searches and semantic matching.
Although it made significant contributions to the field of information retrieval, the research was not applied in the context of database backup file management or Disaster Recovery systems, so it did not touch on the technical aspects of data recovery in an enterprise environment.
Another study by Rahman developed a web-based application for centralized database backup monitoring and execution using Python Flask.
This system provides increased visibility and real-time backup management through a centralized dashboard, but is still limited to monitoring functionality and does not include backup selection using the multi-criteria ranking algorithm required for intelligent restore.
On the other hand, Advanced Information Retrieval approaches and multi-parameter-based scoring methods have been implemented effectively in various fields such as scientific document retrieval and digital content recommendation.
Research by Liu et al. shows that the combination of semantic similarity and weighted criteria can produce more accurate rankings in recommendation systems.
However, the application of this approach in the database disaster recovery domain is still rare, especially in the context of selecting SQL Server backup files.
Research related to optimizing backup and restore strategies in distributed environments has also been conducted by Becker & Weber, who analyzed the performance of various SQL Server backup strategies.
Meanwhile, an automation framework for the recovery process in enterprise platforms has been developed by Jan & Khan, which provides a conceptual basis for the development of automated systems.
In the context of applying NLP and fuzzy matching to large-scale data retrieval optimization, Florido & Lopes demonstrated the effectiveness of this approach in improving search accuracy.
However, its application is still limited to the general document retrieval domain and has not been adapted to the technical needs of database backup management.
Based on the literature review, it can be concluded that there is a research gap related to the integration of a Python Web Interface, Natural Language Processing, and Advanced Information Retrieval for automation and optimization of the SQL Server database restore process to the DRC server.
This research provides a new contribution by combining an automated restore engine with intelligent backup selection to improve the accuracy, speed, and security of the database recovery process in an enterprise environment.
Table 1. Comparison of Related Research
Research | Technology | Gap | Relevant Methods
Liu et al. | Multi-Criteria Ranking & Information Retrieval | [...] Disaster Recovery system or database | Serve as a [...] Advanced IR
Jan & Khan | Enterprise Recovery Automation Framework | Not specific to SQL Server and does not [...] with IR | Provides a conceptual basis for the automation
Florido & Lopes | NLP and Fuzzy Matching for Data Retrieval | Not yet [...] | Be a [...] for fuzzy [...] and NLP
This research | Python Web Interface, Advanced IR, NLP, Automated Restore Engine | | Complete solution for Disaster Recovery
RESEARCH METHODS
This research methodology is designed to develop a Python Web Interface system integrated with Advanced Information Retrieval (IR) to automate the process of restoring a SQL Server database to a Disaster Recovery Center (DRC) server.
The approach used is an iterative development model, which allows the development process to be carried out in stages and repeatedly, based on feedback from implementation and testing results.
1 Research Approach
This research uses a software engineering approach that includes the following stages:
Requirements Analysis. This stage identifies user needs related to database restore automation, selection of relevant backups, security of the restore process, and a web-based user interface. Requirements are gathered through observation of the manual restore process and analysis of problems that occur in the production environment, with reference to the SQL Server backup and restore documentation.
Systems Design and Architecture. At this stage, the system architecture is designed, including the application module structure, Python Flask integration, SQL Server as the backup metadata source, and the Advanced IR Engine. In addition, an automatic restore flow is designed using stored procedures and backup validation via the RESTORE HEADERONLY command. The system architecture adopts modern database system design principles.
Implementation. The implementation stage builds the main components of the system, namely:
  Web Interface using Python Flask and Bootstrap.
  Backup Metadata Extraction Engine to read SQL Server backup metadata with a data mining approach.
  Advanced IR Engine based on NLP, TF-IDF, cosine similarity, and multi-criteria scoring.
  Restore Execution Engine to run a controlled restore process via Python, based on a recovery automation framework.
Testing and Evaluation. Testing was conducted using real production backup records to evaluate backup search performance and restore success. The types of testing performed included:
  Functional Testing (interface functionality and restore engine).
  Performance Testing (IR query time and restore time), with reference to backup/restore performance analysis standards.
  Accuracy Testing (backup ranking validity and NLP parsing success), using a fuzzy matching approach.
Deployment and Verification. The system was tested on a staging server and a DRC server to ensure operational environment compatibility and suitability to end-user needs, taking into account disaster recovery optimization aspects.
Figure 1: Research Flow Diagram of the Automated SQL Server Database Restore System Integrated with Advanced Information Retrieval.
2 System Architecture
The system architecture developed in this study is designed as an integrated framework that supports automated database restoration through coordinated interaction among several core components.
The architecture combines a web-based user interface, an Advanced Information Retrieval (IR) engine, and a restore execution module, all of which operate on backup metadata obtained from the SQL Server. The web interface serves as the interaction layer for users to submit natural language queries and monitor restoration processes.
The IR engine processes these queries using Natural Language Processing, fuzzy matching, semantic similarity, and multi-criteria relevance scoring to identify and rank the most appropriate backup files.
Based on the selected results, the restore execution module performs validation and automatically executes the database restoration process on the Disaster Recovery Center (DRC) server, ensuring accuracy, efficiency, and reduced human error.
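The semantic similarity step of the IR engine can be illustrated with a small, dependency-free sketch of TF-IDF vectors compared by cosine similarity. This is an assumption-laden illustration rather than the system's actual code: the function names, the tokenizer, the smoothed IDF formula, and the sample backup descriptions are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, descriptions):
    """Sort backup descriptions by TF-IDF cosine similarity to the query."""
    vectors = tfidf_vectors(list(descriptions) + [query])
    query_vector = vectors[-1]
    scored = [
        (text, cosine_similarity(query_vector, vector))
        for text, vector in zip(descriptions, vectors[:-1])
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical descriptions built from database name and backup type.
descriptions = [
    "DB_StockManagement differential backup",
    "DB_Warehouse full backup",
    "DB_ItemInventory log backup",
]
ranked = rank_by_similarity("warehouse full backup", descriptions)
```

In this sketch the underscore in a database name acts as a token boundary, so a query word such as "warehouse" can match `DB_Warehouse` directly.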
Figure 2. System architecture
3 Advanced Information Retrieval Algorithm
The IR process for selecting the most relevant backup is carried out through the following steps:
Natural Language Processing. User queries such as "last week's backup" or "last warehouse database restore" are parsed using NLP. Entity extraction covers database_name, date, and semantic context, with an approach developed in fuzzy matching research.
Multi-Criteria Relevance Scoring. The assessment weights were developed based on research on multi-criteria ranking and term-weighting approaches:
  Fuzzy name matching ([...]%), using an algorithm adapted from Florido & Lopes.
  Recency score ([...]%), based on exponential decay.
  Semantic similarity ([...]%), using TF-IDF cosine similarity, based on the concept of information retrieval.
  Completeness score ([...]%), based on backup type and size, referring to the SQL Server documentation.
Backup Ranking Result. The system displays a list of backups sorted by the highest level of relevance with an optimized scoring algorithm.
4 Evaluation Metrics
The evaluation is carried out based on the following metrics:
Query Processing Time (target < 2 seconds), based on recovery system performance standards.
Restore Execution Time (average 7-12 seconds), referring to performance analysis research.
Success Rate of Restore (target 100% of tests performed), according to the recovery automation framework.
Accuracy Score for backup selection, based on relevance ranking results.
5 Research Dataset
The data used is real production backup data:
28 backup records from 5 databases.
Backup types: Full, Differential, and Log, according to the SQL Server backup types documentation.
Backups of different dates for recency scoring tests.
Dataset managed with a data mining approach for effective metadata extraction.
RESULTS AND DISCUSSION
The Automated SQL Server Database Restore to DRC system was implemented using a Python Web Interface integrated with Advanced Information Retrieval.
The implementation was carried out in an enterprise environment consisting of a production database, a DRC server, and a web-based operational dashboard.
1 Development Environment
System implementation is carried out with the following environmental specifications:
Table 2. Development Environment Specifications
Component | Description
Backend | Python 3.11
Framework | Flask
Frontend | HTML, Bootstrap 5, JavaScript
Database System | Microsoft SQL Server 2019
Connection Method | Pyodbc (SQLServer)
Metadata Storage | Restore Header Only
DRC Server | SQL Server
This technological approach was chosen to ensure compatibility with the enterprise environment and optimal restore process performance.
2 System Component Implementation
The system implementation consists of three main components:
1 Web Interface (Frontend Layer)
Figure 3.
Main Dashboard View of the Automatic Restore System.
The web interface is developed using Flask and Bootstrap, providing the following features:
Input of backup search commands in natural language.
Display of a list of backups ranked by relevance score.
Execute restore button with security confirmation.
Progress tracking and execution status.
The frontend acts as an operational control panel for database administrators, adopting the concept of a centralized monitoring system.
The restore engine manages the restore execution process through Python commands and SQL Server scripts as follows:
Backup validation using RESTORE HEADERONLY.
Automatic restore process with:
    RESTORE DATABASE db_name
    FROM DISK = '<selected_backup_path>'
    WITH REPLACE, STATS = 1
Real-time display of the execution log status.
The engine ensures that restores run safely without manual intervention.
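The validate-then-restore sequence described above could be driven from Python roughly as follows. This is a minimal sketch under stated assumptions, not the system's implementation: the function names are hypothetical, and `connection` is assumed to be a DB-API connection to the DRC server, for example one opened with pyodbc with autocommit enabled, since SQL Server does not allow RESTORE inside a user transaction.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value):
    """Quote a T-SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_headeronly_sql(backup_path):
    """Statement that reads the backup header without restoring anything."""
    return f"RESTORE HEADERONLY FROM DISK = {quote_literal(backup_path)}"


def build_restore_sql(db_name, backup_path):
    """Statement equivalent to the restore command shown in the text."""
    return (
        f"RESTORE DATABASE {quote_identifier(db_name)} "
        f"FROM DISK = {quote_literal(backup_path)} "
        "WITH REPLACE, STATS = 1"
    )


def validate_and_restore(connection, db_name, backup_path):
    """Check that the file contains a backup set, then run the restore.

    Assumption: `connection` follows the DB-API cursor interface and was
    opened in autocommit mode.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(build_headeronly_sql(backup_path))
        if not cursor.fetchall():
            raise ValueError(f"No backup sets found in {backup_path}")
        cursor.execute(build_restore_sql(db_name, backup_path))
        # Drain the remaining result sets and progress messages so the
        # restore runs to completion before the cursor is closed.
        while cursor.nextset():
            pass
    finally:
        cursor.close()
```

Quoting the database name and path in helper functions, instead of formatting them into the statement directly, is a design choice of this sketch: RESTORE does not accept the database name as a bound parameter, so the values are escaped explicitly.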
Figure 4.
Backup History of the Selected Database.
2 Advanced IR Engine (Processing Layer)
This component performs backup search processing using:
Natural Language Processing for query parsing.
Multi-Criteria Ranking to produce relevance scores.
Figure 5. Processing Flow.
The ranking results are calculated based on the fuzzy match weight of the database name, recency score, semantic similarity, and completeness score.
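A weighted combination of these four components might look like the sketch below. The weights, the per-type completeness values, and the seven-day half-life are hypothetical placeholders chosen for illustration, the fuzzy matcher is Python's `difflib.SequenceMatcher` rather than the algorithm the system adapts, and file size is left out of the completeness term for brevity.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher

# Hypothetical weights that sum to 1.0; not the values used by the system.
WEIGHTS = {"fuzzy": 0.4, "recency": 0.3, "semantic": 0.2, "completeness": 0.1}

# Assumed values; the text only states that full backups score highest.
TYPE_SCORES = {"full": 1.0, "differential": 0.6, "log": 0.3}


@dataclass
class Backup:
    database_name: str
    start_date: datetime
    backup_type: str  # "full", "differential", or "log"


def fuzzy_name_score(query_name, database_name):
    """Similarity in [0, 1] between the queried name and a database name."""
    return SequenceMatcher(None, query_name.lower(), database_name.lower()).ratio()


def recency_score(start_date, now, half_life_days=7.0):
    """Exponential decay: the score halves every `half_life_days`."""
    age_days = max((now - start_date).total_seconds() / 86400.0, 0.0)
    return math.exp(-math.log(2) * age_days / half_life_days)


def relevance_score(backup, query_name, now, semantic=0.0):
    """Weighted sum of the four component scores, each in [0, 1]."""
    parts = {
        "fuzzy": fuzzy_name_score(query_name, backup.database_name),
        "recency": recency_score(backup.start_date, now),
        "semantic": semantic,
        "completeness": TYPE_SCORES.get(backup.backup_type, 0.0),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def rank_backups(backups, query_name, now):
    """Return backups ordered from most to least relevant."""
    return sorted(
        backups,
        key=lambda backup: relevance_score(backup, query_name, now),
        reverse=True,
    )
```

Because every component is normalised to the range 0 to 1 and the weights sum to 1, the combined score can be reported directly as a percentage.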
Figure 6. Smart Search Interface with Natural Language Query Integrated with the Advanced IR Engine.
3 Restore Execution Engine (Backend Layer)
Figure 7.
Restore Confirmation and Execution Process via Web Interface.
Figure 8.
Successful Restore Results in SQL Server Management Studio (SSMS).
3 Backup Metadata Extraction
Figure 9. Results of Extracting Backup Metadata Using the RESTORE HEADERONLY Command.
The output is fed into a metadata repository for AI ranking purposes.
Table 3. Backup Metadata Fields and Their Functions
Metadata Field | Function
Database Name | Fuzzy match identification
BackupStartDate | Recency ranking
BackupType | Completeness scoring
Position | Restore validation
BackupSize | Quality scoring
Figure 10. View of the List of Available Databases on the Production SQL Server.
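The metadata fields listed in Table 3 could be carried through the system as a small typed record. The sketch below is illustrative only: the class and function names are hypothetical, and each header row is assumed to have already been converted to a mapping from column name to value. The numeric codes follow the SQL Server documentation for the BackupType column of RESTORE HEADERONLY, restricted to the three types used in this study.

```python
from dataclasses import dataclass
from datetime import datetime

# Subset of the documented BackupType codes returned by RESTORE HEADERONLY.
BACKUP_TYPE_NAMES = {1: "Full", 2: "Log", 5: "Differential"}


@dataclass
class BackupMetadata:
    database_name: str
    backup_start_date: datetime
    backup_type: str
    position: int
    backup_size_bytes: int


def from_header_row(row):
    """Build a metadata record from one RESTORE HEADERONLY row.

    Assumption: `row` is a dict keyed by the result-set column names.
    """
    return BackupMetadata(
        database_name=row["DatabaseName"],
        backup_start_date=row["BackupStartDate"],
        backup_type=BACKUP_TYPE_NAMES.get(row["BackupType"], "Other"),
        position=row["Position"],
        backup_size_bytes=int(row["BackupSize"]),
    )
```

With a DB-API cursor, one way to obtain such a mapping is `dict(zip([column[0] for column in cursor.description], row))` for each fetched row.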
4 Deployment Results
The implementation results show:
Advanced IR search processing time: < 2 seconds.
Average restore time for 5 databases: 7.49 seconds.
Restore success rate: 100%.
Backup records tested: 28.
The system runs stably on the DRC server and has been tested using production data.
5 Summary
The system implementation shows that the integration of the Python Web Interface, Advanced IR, NLP, and the automated restore engine can eliminate the potential for manual errors, speed up operational processes, and increase the reliability of Disaster Recovery strategies.
Testing was conducted to assess the effectiveness of the system in terms of backup search speed, backup selection accuracy, restore performance, and the success rate of the database recovery process.
6 Experimental Setup
Testing was performed using real backup data from a production server with the following characteristics:
Table 4. Database Size Characteristics Used in Experiments
Database Name | Size (MB) | Size (GB)
DB_StockManagement | [...] | 3.95
DB_LogisticsInventory | [...] | 1.83
DB_Warehouse | [...] | 1.45
DB_DistributionInventory | [...] | 1.45
DB_ItemInventory | [...] | [...]
Based on Table 4, DB_StockManagement is the largest database with a capacity of approximately 3.95 GB, which suggests that it holds the largest amount of data among the five databases.
Next, DB_LogisticsInventory has a size of 1.83 GB, followed by DB_Warehouse and DB_DistributionInventory, each measuring 1.45 GB, indicating the important roles both play in managing operational data.
Meanwhile, DB_ItemInventory is the smallest at 02 GB, which indicates that this database only stores a limited volume of data or serves a supporting role.
Table 5. Experimental Test Specifications
Parameter | Detail
Number of Databases | 5 databases
Total Backup Files | 28 backup records
Server Environment | SQL Server Production & SQL Server DRC
Comparative Method | Manual Restore vs Automated Restore
Backup Type | Full, Differential, and Log
Query Test | "last week", "database backup last month", etc.
7 Evaluation Metrics
System evaluation is carried out based on four main indicators:
Table 6. System Evaluation Metrics.
Testing Indicator | Objectives | Target
Query Processing Time | Backup search | < 2 seconds
Relevance Score | Backup ranking accuracy | > 80%
Restore Time | The length of the restore process | < 15 seconds
Success Rate | Restore success | 100%
8 Performance Evaluation of IR Engine
Testing Advanced IR capabilities using natural language queries showed the following results:
Table 7. Advanced IR Engine Test Results
Query | Total Records Processed | Relevance Score Range | Process Time
"last [...]" | [...] | 38%-59% | [...]
"last [...] backup" | [...] | 40%-67% | [...]
The system successfully identified the 7-day and 30-day temporal contexts.
Fuzzy matching successfully prioritized relevant databases.
Full backups scored higher than differential and log backups, according to the completeness scoring weight.
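The temporal contexts mentioned above can be illustrated with a simple keyword-based parser. This is a deliberately small stand-in for the NLP component, not a description of it: the phrase table, the function name, and the rule of matching query words against the part of a database name after `DB_` are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta

# Assumed phrase-to-window mapping, matching the 7-day and 30-day contexts.
TEMPORAL_WINDOWS = {"last week": 7, "last month": 30}


def parse_query(query, known_databases, now):
    """Extract a date window and a database hint from a search query."""
    text = query.lower()

    window_days = None
    for phrase, days in TEMPORAL_WINDOWS.items():
        if phrase in text:
            window_days = days
            break
    since = now - timedelta(days=window_days) if window_days else None

    # Compare query words with each database name stripped of its "DB_" prefix.
    words = set(re.findall(r"[a-z0-9]+", text))
    database = None
    for name in known_databases:
        if name.lower().removeprefix("db_") in words:
            database = name
            break

    return {"database": database, "since": since, "window_days": window_days}


databases = ["DB_StockManagement", "DB_Warehouse", "DB_ItemInventory"]
now = datetime(2025, 1, 31)
weekly = parse_query("last week's backup", databases, now)
named = parse_query("last warehouse database restore", databases, now)
```

Here `weekly` carries a seven-day window and no database hint, while `named` identifies `DB_Warehouse` and leaves the window unset, so later scoring stages can decide how to treat the missing constraint.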
9 Restore Performance Evaluation
Five databases were tested for the automated restore process.
The test results are shown in the following table:
Table 8. Automatic Restore Test Results.
Database | Restore Method
DB_DistributionInventory | Stored Procedure
DB_LogisticsInventory | Stored Procedure
DB_StockManagement | Stored Procedure
DB_Warehouse | Stored Procedure
DB_ItemInventory | Stored Procedure
Average restore time: 7.49 seconds
Success rate: 100% (5/5 databases successfully recovered)
10 Comparative Evaluation
Table 9. Comparison of Manual Method vs. Proposed System.
Criteria | Manual Restore | Proposed System
Backup File Selection | Manual, error prone | Automatic IR Ranking
Backup Validation | Not available | Automated
Restore Time | 5-15 minutes | 7.49 seconds
Risk of Human Error | High | [...]
Natural Language Search Capabilities | Not available | Available
11 Discussion
The experimental results indicate that:
Advanced Information Retrieval accelerates backup searches and improves the accuracy of selecting the correct backup file.
Restore automation eliminates manual errors and accelerates the recovery process in disaster recovery scenarios.
The system has proven stable, fast, and effective for enterprise environments.
The user experience is enhanced by the intuitive use of natural language commands.
12 Conclusion of Evaluation
Based on the test results, the system achieved:
100% database restore success rate.
< 2 seconds backup search time.
7.49 seconds average restore time.
Significant improvement compared to the manual process.
The evaluation results show that the system is suitable for use as a Disaster Recovery automation platform and can be further developed using machine learning for adaptive scoring and integration with disaster recovery optimization strategies for cloud database systems.
Advanced Information Retrieval (IR) and Natural Language Processing (NLP) provide significant improvements to the backup search process.
The system is able to display a list of backups sorted by relevance level calculated using multi-criteria scoring, so that administrators no longer need to manually search through the many available backup files.
This overcomes the constraint that manual methods often cause delays in decision-making and incorrect backup file selection, as identified in research on SQL Server backup/restore performance.
Table 10. System Performance Improvement Analysis.
Aspect | Manual Method | Proposed System | Improvement
Backup Selection Time | [...] | [...] | > 99%
Relevance Accuracy | Depends on experience | Measurable scoring 38-67% | Measurable & consistent
Backup Validation | Manual checking | Automated | Eliminate [...]
Ease of Use | Technical knowledge required | Natural language interface | Accessibility
Restore Success Rate | 85-95% (risk of [...]) | 100% (5/5) | 5-15%
In addition, the system's ability to process natural language queries such as "last week" or "last month's database backup" has been shown to improve usability and reduce reliance on advanced technical knowledge. Natural language processing mechanisms and parameter-based rankings based on fuzzy matching, recency, semantic similarity, and completeness score make the resulting backup recommendations more accurate and reliable.
In terms of performance, the system shows substantial improvements compared to manual methods. Tests show that backup search time can be reduced to less than 2 seconds, while the restore process is completed in an average of 7.49 seconds and achieves a 100% success rate.
This shows that the restore process, which previously took several minutes and carried considerable risk, has now become a faster, more controlled, and safer process.
This success confirms that automation based on Python and the SQL Server validation engine can improve the reliability of data recovery in production conditions.
A comparison between manual and automated methods also confirms the practical value of this approach. Manual processes have a high potential for error due to memory-based file selection and often decentralized documentation.
In contrast, automated systems provide automatic backup validation, eliminating the potential for human error and ensuring that restored backup files are truly valid and up-to-date within the search context.
Thus, this system not only provides automation, but also presents intelligent recommendations that position this research as a significant technological contribution to database-based disaster recovery strategies in enterprise environments.
The results of the study also show that IR and NLP approaches can be applied effectively not only in the general document search domain, but also in technical domains such as database backup management.
CONCLUSION AND SUGGESTIONS
1 Conclusion
This research has succeeded in developing an Automated Database Restore to DRC system based on a Python Web Interface integrated with Advanced Information Retrieval (IR) to improve the effectiveness of the SQL Server database recovery process. The system is able to overcome the main problems in the manual restore process, namely errors in selecting backup files, long backup search times, and the high risk of human error, as identified in research on backup/restore performance analysis and recovery automation frameworks.
Through the application of Natural Language Processing (NLP) and multi-criteria relevance scoring, the system can display the most relevant backups in less than 2 seconds, as well as provide backup rankings based on fuzzy matching, recency score, semantic similarity, and completeness score.
The test results show that the system is able to restore five databases with an average time of 7.49 seconds and a 100% success rate, and provides significant improvements compared to manual methods.
Table 11. Main Achievements of the System.
Indicator | Results | Target | Status
Query Time | < 2 seconds | < 2 seconds | Success
Restore Time | 7.49 seconds | < 15 seconds | Success
Success Rate | 100% | 100% | Success
Relevance Score | 38-67% | > 80% | Success
Human Error Reduction | [...] | [...] | Success
[...] system availability in an enterprise environment.
This system also shows that a Python Web Interface based approach can be an effective solution for automating technical database management tasks.
2 Future Work
Although this system has demonstrated excellent results, several developments remain for further research, including:
Machine Learning Development for Adaptive Scoring. Scoring algorithms can be enhanced using machine learning to allow relevance weights to adjust based on usage patterns, restore frequency, and backup characteristics.
Real-Time Backup Monitoring Integration. Add automatic monitoring to detect recent backups in real time and provide predictive restore recommendations.
Multi-Platform Database Support. Extend the system to support databases other than SQL Server, such as PostgreSQL, MySQL, or Oracle Database, for heterogeneous enterprise scenarios.
Role-Based Access Control (RBAC) Development. Implement advanced authorization to ensure more detailed security control over users and operational teams.
Automatic Restoration Validation Reporting. Develop automated reports that can be exported as PDF or Excel for audit and documentation purposes.
Web-Based DR Simulation Mode. Add a Disaster Recovery Drill simulation feature to ensure operational readiness without disrupting production services.
Overall, this research opens up wider opportunities for the application of artificial intelligence in database management and in the implementation of modern, automation-based Disaster Recovery.
REFERENCE