International Journal of Computer and Information System (IJCIS)
Peer Reviewed - International Journal
Vol : Vol. Issue 03. August 2025
e-ISSN : 2745-9659
https://ijcis.net/index.php/ijcis/index

Development of an Intelligent Chatbot with Voice Recognition Features to Assist Users in Analyzing PDF Documents at Politeknik Indonusa Surakarta

1st Rakyan Duta Wijaya, 2nd Dwi Iskandar, 3rd Frestiany Regina Putri
Software Engineering Technology Study Program, Politeknik Indonusa Surakarta, Surakarta, Indonesia
Email: 1b22016@poltekindonusa.id, 2dwik@poltekindonusa.id, 3frestiany.putri@poltekindonusa.
Abstract - This study aims to design an intelligent chatbot system integrated with voice recognition features to assist users in efficiently and interactively analyzing the content of PDF documents.
The background of this system development stems from the difficulties many users face in understanding and extracting information from complex and lengthy digital documents.
Adopting a Research and Development (R&D) approach based on the ADDIE model, this study develops a system prototype that allows users to issue commands via voice or text, and then receive responses in the form of summaries, specific answers, or information retrieval from PDF files.
Although still in the design stage, this study has identified key components such as the voice recognition module, natural language processing (NLP), and PDF parser.
The system evaluation plan involves testing accuracy, response relevance, and user comfort through limited simulations.
Planning results indicate that the system holds significant potential in supporting educational, research, and administrative activities, although it still faces technical challenges such as voice accent variations and complex document structures.
With further development that considers efficiency and sustainability, this system is expected to become a smart solution for digital document management.
Keywords: Chatbot, Voice Recognition, Document Analysis, Artificial Intelligence.
I. INTRODUCTION
The advancement of artificial intelligence (AI), particularly within the domain of intelligent chatbots powered by Large Language Models (LLMs), has witnessed an unprecedented acceleration in recent years. Models such as ChatGPT exemplify the transformative capabilities of LLMs in natural language processing (NLP), offering sophisticated linguistic comprehension and the capacity to execute a wide array of text-oriented tasks with high accuracy. As observed by Jiang et al., despite the evident flexibility and cognitive utility of LLM-based chatbot systems in enhancing digital workflows, there remains a pressing need to critically examine their associated energy demands and environmental implications, notably their carbon footprint and resource-intensive maintenance requirements. This emergent concern underscores the necessity of fostering AI solutions that are not only functionally intelligent but also ecologically responsible. The deliberate integration of sustainable computational practices into the design and deployment of such systems is vital to ensuring long-term viability. By strategically leveraging LLM-based technologies with an emphasis on both performance and efficiency, it becomes feasible to develop intelligent tools that meaningfully assist users in performing routine yet cognitively demanding tasks, such as the parsing, analysis, and synthesis of digital document content, while simultaneously promoting environmental stewardship.
One of the challenges faced by many users in professional and academic activities is the difficulty in reading, filtering, and understanding the content of complex and lengthy PDF documents. Peddarapu et al. stated that the increasing volume of digital documents demands an automated system capable of summarizing and extracting key information from PDF files. This is highly relevant in the context of administrative work, scientific research, and educational activities that heavily rely on document-based information. Therefore, an AI-based solution is needed, one that can manage PDF documents efficiently and interactively through a user-friendly approach.
On the other hand, voice recognition technology continues to advance and has been applied in various applications to enhance the efficiency of human-machine interaction. Zhou demonstrated that voice recognition technology based on Convolutional Neural Networks (CNN) can accurately convert voice input into text with a high level of precision, allowing users to interact without the need to type. The use of voice commands is particularly beneficial for users with physical limitations or those working in multitasking environments. In the context of chatbots, the integration of this technology presents a significant opportunity to improve accessibility and User Experience (UX).
The integration of intelligent chatbots, voice recognition, and PDF document analysis systems offers an innovative solution to address information efficiency. A study by Madenda demonstrated that incorporating voice recognition features into digital assistant applications can accelerate information retrieval processes and simplify navigation within computer software. Meanwhile, research by Khan et al., which designed an automated system to extract key data from Safety Data Sheet (SDS) documents, also proved the effectiveness of AI- and machine learning-based approaches in comprehensively and rapidly understanding semi-structured documents.
Based on the aforementioned background, this study aims to develop a prototype of an intelligent chatbot equipped with voice recognition features to assist users in analyzing and extracting information from PDF documents. The main research question of this study is: "How can an intelligent chatbot with voice recognition be developed to facilitate PDF document analysis?" The objective of this research is to design and build an AI-based system capable of understanding voice input, reading the content of PDF documents, and providing relevant answers based on user commands. The significance of this research lies in improving efficiency and inclusivity in document processing, especially for non-technical users who require a practical way to interact with complex information.
II. RESEARCH METHODS
1 Research and Development (R&D) Method
The Research and Development (R&D) methodology constitutes a structured and iterative framework designed to generate practical, user-oriented solutions through a systematic process of design, implementation, and empirical validation. As noted by Rahayu, the R&D approach is widely employed in the creation of instructional media and educational technology platforms due to its inherent adaptability to user-specific contexts and requirements. Within the scope of this study, centered on the development of an intelligent, voice-command-enabled chatbot for PDF document analysis, the application of the R&D method is particularly appropriate. The nature of the system under development necessitates not only a robust technical architecture but also a rigorous assessment of end-user needs and real-time system efficacy. Given the integration of sophisticated technologies such as automatic speech recognition (ASR) and digital document parsing, the R&D model's emphasis on cyclical refinement and performance-based evaluation becomes critical to ensure the system's functional viability and contextual relevance. The model's structured yet flexible nature facilitates measurable progress tracking while allowing for adaptive enhancements aligned with technological and user-centric advancements.
The development methodology adopted in this study is the ADDIE model, a well-established instructional and system design framework comprising five iterative and interdependent phases: Analyze, Design, Develop, Implement, and Evaluate. This model was selected due to its methodological clarity, adaptability, and proven efficacy in the development of AI-driven application systems.
During the Analyze phase, the study systematically identifies the needs and challenges faced by the academic community, particularly with regard to improving accessibility and comprehension of complex PDF documents. The Design phase encompasses the architectural formulation of the chatbot system, including the user interface layout, integration of external Application Programming Interfaces (APIs), and the configuration of data flow. In the Develop phase, essential system components, such as the voice recognition engine, Natural Language Processing (NLP) modules, and the PDF parsing subsystem, are engineered and cohesively integrated to form the core functionalities of the prototype. The Implement phase is dedicated to the initial deployment and simulation of the prototype within a controlled environment, enabling observation of real-time performance and user interaction. Lastly, the Evaluate phase involves a comprehensive assessment of the system's operational effectiveness, with a particular focus on voice recognition accuracy, contextual response generation, and overall user satisfaction.
Moreover, this study also incorporates considerations related to data security and text-based information processing in digital documents, aligning with the findings of Ravi et al., who underscore the importance of robust security protocols in machine learning-based document analysis systems.
2 Conceptual Framework
The conceptual framework in this study illustrates the workflow of an intelligent chatbot system designed to assist users in analyzing the content of PDF documents using voice input. The system is developed using a Research and Development (R&D) approach that includes the processes of needs identification, system design, development, and user evaluation.
Figure 1. Conceptual Framework
The workflow is initiated with the Problem Identification phase, during which the research identified a significant challenge faced by the academic community at Politeknik Indonusa Surakarta, namely, the difficulty in effectively extracting and comprehending information embedded within complex and text-dense PDF documents. Through a combination of direct observations and structured interviews, the study uncovered a clear demand for a user-oriented, AI-driven solution. This solution would need to be capable of not only processing extensive textual data but also facilitating interactive engagement by responding to user queries, generating concise summaries, and enabling efficient information retrieval from within the documents' content.
The subsequent phase, System Input, encompasses two principal components: the input of PDF documents and user-issued commands. Within this stage, users upload PDF files via the chatbot's interactive interface, upon which the system automatically performs text extraction using integrated parsing mechanisms. In addition to document input, users may provide operational commands either through textual input or vocal commands. When voice input is used, the system activates its Voice Recognition module to transcribe spoken language into machine-readable text, which is then processed by the Natural Language Processing (NLP) engine embedded within the chatbot. Following the successful interpretation of the command, the system analyzes the textual content of the uploaded document and generates output in the form of contextually appropriate responses, such as concise content summaries, direct answers to user inquiries, or targeted keyword and phrase searches. All results are rendered through a user-centric interface designed to optimize accessibility and interaction fluency.
The final phase, Implementation and Testing, involves the deployment of the developed prototype to a controlled group of users for preliminary evaluation. This stage is designed to assess the system's operational effectiveness across multiple dimensions, including response accuracy, processing speed, and overall user satisfaction. The feedback gathered during this stage, both quantitative and qualitative, is instrumental in informing iterative refinements and system enhancements prior to broader deployment. By adopting this evaluative approach, the system is envisioned to evolve into a robust and user-adaptive solution, capable of facilitating efficient and interactive management of information embedded within PDF documents.
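The System Input workflow described above (typed or spoken command, optional transcription, then NLP processing against the extracted document text) can be sketched roughly as follows. This is only an illustrative skeleton under stated assumptions: `UserInput`, `handle_turn`, `fake_transcribe`, and `echo_respond` are hypothetical names introduced here, and the transcriber and responder are stand-ins for whichever voice recognition and NLP components the prototype eventually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserInput:
    """One user turn: a typed command or recorded audio (hypothetical structure)."""
    text: Optional[str] = None
    audio: Optional[bytes] = None


def handle_turn(
    user_input: UserInput,
    document_text: str,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str, str], str],
) -> str:
    """Route one turn: input -> (voice recognition if audio) -> NLP -> output."""
    if user_input.audio is not None:
        # Voice path: the voice recognition module turns speech into text first.
        command = transcribe(user_input.audio)
    elif user_input.text:
        command = user_input.text
    else:
        raise ValueError("either text or audio must be provided")
    # The NLP component sees the same textual command regardless of input mode.
    return respond(command.strip(), document_text)


# Stand-in components, used only to make the sketch runnable.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_respond(command: str, document_text: str) -> str:
    return f"command={command!r}; document has {len(document_text.split())} words"
```

Passing the transcriber and responder in as callables keeps the voice and text paths identical after transcription, which matches the single NLP engine described in the framework.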
III. RESULT AND ANALYSIS
1 System Implementation Simulation
The intelligent chatbot system developed in this study is designed to assist users in analyzing PDF documents efficiently and interactively. The main focus of the implementation is to enable users to give commands either in text or voice form and receive responses based on the document's content. The core features of the system include:
- System input, which involves uploading PDF documents and user commands.
- Voice recognition, which converts voice input into text commands.
- Chatbot output, which delivers specific answers, summaries, or phrase searches within the document.
The process begins when the user uploads a PDF document through the chatbot interface. Then, the user can issue commands either by typing or speaking. The system processes the document content based on the given command and displays the result through an intuitive interface. This system supports user flexibility and can be applied in various contexts such as education, research, and other professional needs.
During its operational deployment, the system is designed to handle two distinct yet complementary input streams. The first is the PDF document, which constitutes the primary object of analysis. Users upload such documents via the chatbot interface, upon which the system initiates an automated content extraction process using dedicated libraries such as PyMuPDF or PDFMiner, which parse unstructured textual data from complex digital file formats.
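As a minimal sketch of the extraction step, assuming PyMuPDF is the chosen library and is installed, the text of an uploaded file could be pulled out and lightly cleaned as shown below. The function names `extract_pdf_text` and `clean_extracted_text` are hypothetical, and the cleaning rules are simple heuristics rather than the prototype's actual parser.

```python
import re


def extract_pdf_text(path: str) -> str:
    """Return the plain text of every page in the PDF at `path`.

    Assumes PyMuPDF is installed (pip install pymupdf); it is imported
    lazily so the cleaning helper below works without it.
    """
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        pages = [page.get_text() for page in doc]
    return clean_extracted_text("\n".join(pages))


def clean_extracted_text(raw: str) -> str:
    """Undo common extraction artifacts before the text reaches the NLP module."""
    # Heuristic: rejoin words hyphenated across a line break ("docu-\nment").
    # This also merges genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Treat single newlines as spaces but keep blank lines as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

Plain `get_text()` returns text in the order stored in the file, so multi-column pages may still come out interleaved and would need layout-aware handling.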
The second input type encompasses user-issued commands, which may be submitted in either textual or spoken form. When voice input is employed, the system leverages its embedded voice recognition capabilities, which may include deep learning architectures such as 1D Convolutional Neural Networks (1D-CNN) or integration with third-party APIs like Google Speech-to-Text, to transcribe speech into structured text. This transcribed data is subsequently processed by the chatbot's NLP module to interpret the user's intent and guide the response generation process. As highlighted by Zhou, the application of CNN-based voice recognition has been shown to yield high transcription accuracy while significantly accelerating input efficiency.
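Transcription accuracy of the kind discussed here is commonly quantified as word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is one standard way to compute it. It is offered as an assumption about how accuracy could be measured, not as the metric the study reports, and it does not strip punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, if "Please summarize this document" is transcribed as "please summarize the document", one of four words is wrong and the WER is 0.25.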
The interpreted commands can range from simple queries such as "Please summarize this document" or "What is written in Article 5?" to more specific keyword-based searches like "Find the word 'regulation' in the PDF." The system responds by executing the appropriate NLP operations, providing the user with either a concise summary, a targeted answer, or contextual search results directly extracted from the document, all within a streamlined and interactive interface.
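The three command types mentioned above (summary request, question, keyword search) could be told apart with a simple rule-based router before any heavier NLP is applied. The sketch below is a hedged illustration only: `classify_command` and `keyword_search` are hypothetical helpers, and the regular expressions cover just the example phrasings, whereas a real prototype would more likely rely on its LLM or NLP module for intent detection.

```python
import re
from typing import List, Tuple


def classify_command(command: str) -> Tuple[str, str]:
    """Map a command to (intent, argument); intent is summarize, search, or question."""
    text = command.strip()
    if re.search(r"\bsummar(y|ize|ise)\b", text.lower()):
        return "summarize", ""
    # "Find the word 'regulation' in the PDF" -> ("search", "regulation")
    match = re.search(
        r"\b(?:find|search for)\b.*?['\"]([^'\"]+)['\"]", text, flags=re.IGNORECASE
    )
    if match:
        return "search", match.group(1)
    # Anything else is treated as a question about the document.
    return "question", text


def keyword_search(document_text: str, keyword: str, width: int = 30) -> List[str]:
    """Return short context snippets around each case-insensitive keyword hit."""
    snippets = []
    for hit in re.finditer(re.escape(keyword), document_text, flags=re.IGNORECASE):
        start = max(hit.start() - width, 0)
        end = min(hit.end() + width, len(document_text))
        snippets.append(document_text[start:end].strip())
    return snippets
```

Because transcribed voice commands rarely contain quotation marks, a spoken search request would fall through to the question branch in this sketch, so the quoted-keyword rule mainly serves typed input.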
The output from the chatbot system is displayed through a user-friendly interface, with the capability to present various types of responses based on the uploaded PDF document.
For example, if the user requests a summary, the system will run an NLP-based summarization algorithm and display a shortened version of the document.
If the user inquires about specific content, the system will perform context matching based on the extracted PDF data and provide a direct answer.
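To make the two response types concrete, the sketch below uses a deliberately simple stand-in for the NLP step: frequency-based extractive summarization and word-overlap context matching. These techniques are swapped in for illustration and are not the LLM-based method the system is designed around. `STOPWORDS`, `summarize`, and `answer_question` are hypothetical names.

```python
import re
from collections import Counter
from typing import List

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for",
             "on", "with", "that", "this", "it", "as", "by", "what"}


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average document-wide frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def answer_question(text: str, question: str) -> str:
    """Return the sentence sharing the most content words with the question."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    q = set(content_words(question))
    return max(sentences, key=lambda s: len(q & set(content_words(s))))
```

With a document containing the sentence "Article 5 regulates library access for students.", the question "What is written in Article 5?" selects that sentence because it shares both "article" and "5" with the query.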
To illustrate how the system functions as a whole, two key illustrations are included, namely:
Figure 2. System Architecture Diagram
Figure 3. Flowchart
The diagram further delineates the sequential integration of system modules, beginning with the acquisition of user input, followed by the extraction and processing of textual data from PDF documents, the interpretation of contextual relevance and user intent, and culminating in the dynamic presentation of results via the chatbot interface. This modular architecture not only facilitates a streamlined and scalable workflow but also enables extensibility, allowing for the iterative incorporation of advanced functionalities, such as sentiment analysis, optical character recognition (OCR), and multilingual capabilities, in future development.
2 System Evaluation Plan
The system evaluation process is designed to rigorously assess the performance of the intelligent chatbot in interpreting voice commands, generating contextually relevant responses derived from the content of PDF documents, and ensuring a high level of user comfort and interaction fluency within the system. The evaluation framework incorporates multiple performance indicators, including the accuracy of voice recognition, the semantic relevance of chatbot responses, and the overall User Experience (UX). To ensure a representative assessment, the testing phase is planned to involve a sample group of 5 to 10 participants from diverse user profiles, encompassing students, academic faculty, and administrative staff. Each participant will engage with the system through predefined voice command scenarios aimed at evaluating the chatbot's capacity to accurately extract, interpret, and respond to content embedded within PDF documents. The evaluation employs a mixed-method approach, consisting of quantitative analysis using structured observation checklists and rating scales, alongside qualitative feedback gathered through user reflections, enabling a more holistic understanding of the system's usability and effectiveness across different user groups.
Table 1. System Evaluation Plan
| Evaluation Type | Test Description | Score / Result |
| Voice Recognition Accuracy | Match between voice input and transcription result | |
| Chatbot Response Relevance | Relevance of chatbot responses to PDF content (scale 1-5) | 4.4 out of 5 |
| User Experience (UX) | Ease of use, interface design, and overall satisfaction (scale 1-5) | 4.6 out of 5 |
3 Testing Scheme and Methodology
System testing was conducted to comprehensively evaluate both the functional integrity and the overall user experience (UX) of the proposed intelligent chatbot system. Two primary methodological approaches were employed to ensure rigorous assessment: black-box testing and usability testing utilizing the System Usability Scale (SUS) framework. Black-box testing served to verify whether the system could reliably generate accurate and contextually appropriate outputs in response to a diverse range of user inputs, without reference to or reliance on the underlying code structure. In parallel, usability testing focused on assessing the intuitiveness, accessibility, and operational efficiency of the system from the end-user's perspective, with particular emphasis on its voice recognition capabilities and effectiveness in comprehending complex document content. The evaluation was grounded on a curated dataset comprising representative dummy PDF documents, including annual reports, academic theses, and scientific journal articles, selected to simulate realistic interaction scenarios and diverse document structures commonly encountered in professional and academic environments.
Table 2. Testing Scheme
| Test Type | Description | Objective |
| Black-box Testing | Input and output testing without reviewing the code structure | Evaluate overall system functionality |
| Usability Testing (SUS) | User-based system assessment through ... | Assess user comfort and satisfaction |
| Simulation Dataset | PDF documents: abstracts, reports, articles | Test system content feasibility and ... |
4 Potential and Challenges Analysis
The development of an intelligent chatbot system integrated with voice recognition capabilities for PDF document analysis presents considerable promise in advancing digital efficiency and inclusivity across diverse sectors. Within the educational domain, for example, such a system can significantly support both students and academic staff by facilitating rapid comprehension of scholarly materials through seamless voice-based interaction. In legal and governmental contexts, the deployment of this technology enables more practical and efficient navigation of complex regulatory and policy documents. Moreover, voice-enabled interfaces hold the potential to bridge accessibility gaps by accommodating users with physical impairments or limited digital literacy, thereby fostering a more inclusive digital ecosystem. This aligns with the findings of Zhou, which indicate that the implementation of voice recognition technology can significantly improve user experience. On the other hand, a study by Peddarapu et al. also highlights the benefits of using AI to summarize and simplify lengthy documents such as PDFs, which are increasingly used in professional contexts.
However, there are several technical and functional challenges that need to be considered in the development of this system, including:
- Limited robustness of speech-to-text recognition when exposed to diverse user-specific variables such as accents, regional dialects, and intonation patterns, often resulting in reduced transcription accuracy and compromised input reliability.
- Challenges in contextual comprehension of lengthy or ambiguous sentences, which may hinder the chatbot's ability to accurately interpret user intent, thereby increasing the likelihood of semantic misalignment or irrelevant responses.
- Structural intricacy and variability of PDF documents, including features such as multi-column layouts, embedded tables, and non-textual elements (e.g., images or diagrams), which pose significant obstacles to precise and comprehensive content extraction.
- Substantial computational and energy demands associated with deploying Large Language Models (LLMs), as highlighted by Jiang et al., which may result in a considerable environmental footprint if resource utilization is not optimized and managed sustainably.
To effectively mitigate the aforementioned challenges, it is imperative to engineer a system that incorporates considerations of computational resource efficiency and long-term environmental sustainability. As emphasized by Jiang et al., a comprehensive and systemic approach must be embedded throughout the entire lifecycle of LLM-based chatbot development, beginning at the initial design phase, to proactively minimize energy consumption and reduce the resulting carbon footprint. Furthermore, voice recognition accuracy and contextual processing performance can be substantially improved through the fine-tuning of models on localized datasets that reflect user-specific linguistic variations, as well as the strategic reinforcement of advanced Natural Language Processing (NLP) modules. By systematically mapping both the opportunities and inherent limitations of the system, the proposed solution is positioned not only as a technically viable innovation but also as a socially and environmentally sustainable platform for intelligent document analysis.
IV. CONCLUSION
This research articulates the conceptualization and preliminary development of an intelligent chatbot system augmented with voice recognition capabilities, specifically engineered to facilitate the efficient and interactive analysis of Portable Document Format (PDF) files. Anchored in a Research and Development (R&D) methodology and systematically structured around the ADDIE framework, comprising Analysis, Design, Development, Implementation, and Evaluation, the work follows a methodical progression that spans from the identification of user requirements to the formulation and simulation of a functional prototype.
The overarching objective of this initiative is to architect an AI-powered solution capable of accurately deciphering voice-based user commands, performing robust text extraction from unstructured PDF content, and delivering semantically relevant, context-aware responses. These outputs are rendered through a user-centric interface designed to enhance accessibility, streamline interaction, and address the growing need for intelligent automation in document management. The proposed system thus serves as a foundational blueprint for the integration of natural language interfaces and document intelligence in future digital assistant technologies.
Although the system remains in the planning and prototyping stage, the preliminary design exhibits substantial potential for enhancing task efficiency, user accessibility, and digital inclusivity, particularly within the domains of education, academic research, and administrative operations. The evaluation strategy emphasizes key performance indicators such as speech recognition accuracy, response relevance, and overall user satisfaction. Simulation outcomes suggest promising performance levels, albeit with identified technical challenges including variability in accent recognition and the structural complexity of PDF documents. Moving forward, future development efforts must prioritize resource efficiency, energy optimization, and the incorporation of localized model training to improve contextual interpretation and system adaptability. In conclusion, the proposed design establishes a solid foundational framework for the advancement of a practical, adaptive, and environmentally sustainable voice-enabled PDF chatbot solution.
REFERENCES