JOIV : Int. J. Inform. Visualization, 8 : IT for Global Goals: Building a Sustainable Tomorrow - November 2024, 1651-1661
INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION
journal homepage: w.org/index.php/joiv
A Review on Classifying and Prioritizing User Review-Based Software Requirements
Amran Salleh a, Mar Yah Said a,*, Mohd Hafeez Osman a, Sa'adah Hassan a
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
Corresponding author: *maryah@upm.
Abstract— User reviews are a valuable source of feedback for software developers, as they contain user requirements, opinions, and expectations regarding app usage, including dislikes, feature requests, and bug reports. However, extracting and analyzing user requirements from user reviews is challenging due to the large volume, unstructured nature, and varying quality of the reviews. Further research is therefore crucial to explore methods for gathering informative and meaningful user feedback. This study aims to investigate, analyze, and summarize the methods of requirement classification and prioritization techniques derived from user reviews. This review revealed that leveraging opinion mining, sentiment analysis, natural language processing, or any stacking technique can significantly enhance the extraction and classification processes. Additionally, an updated matrix taxonomy has been developed, based on a combination of definitions from various studies, to classify user reviews into four main categories: information seeking, feature request, problem discovery, and information giving. Furthermore, we identified the Naive Bayes, SVM, and Neural Network algorithms as dependable and suitable for requirement classification and prioritization tasks.
The study also introduced a new 4-tuple pattern for efficient requirement prioritization, comprising the elicitation technique, requirement classification, additional factors, and a higher range priority value. This study highlights the need for better tools to handle complex user reviews. Investigating the potential of emerging machine learning models and algorithms to improve classification and prioritization accuracy is crucial. Additionally, further research should explore automated classification to enhance efficiency.
Keywords— User reviews; requirements prioritization; requirements classification; mobile apps; user requirements.
Manuscript received 4 Jun.; revised 9 Aug.; accepted 12 Oct. Date of publication 30 Nov. International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.
I. INTRODUCTION
In today's swiftly changing technology landscape, software evolution and maintenance have emerged as integral components of software engineering activities, ensuring that software remains functional and cost-effective. Software evolution involves periodic updates to enhance features or eliminate unused functions, while software maintenance involves transforming, modifying, and updating software to meet customer requirements. Furthermore, user feedback, mainly through mobile app reviews, is a rich source of information, as it contains user requirements, opinions, and expectations regarding app usage, including dislikes, requests for new features, and bug reports. It therefore plays a crucial role in shaping the future of software evolution and maintenance.
However, extracting and analyzing user requirements from user reviews is challenging due to the large volume, unstructured nature, and varying quality of the reviews. Prior studies have found that user reviews generate large volumes of data with noisy characteristics. This data grows exponentially and rapidly, leading to difficulties in classifying and prioritizing the software requirements derived from the reviews. Furthermore, Fereidouni et al. state that identifying critical user feedback from many reviews remains an ongoing challenge. Therefore, further research is necessary to effectively explore methods to gather informative and meaningful user feedback. Moreover, one of the problems faced is the unstructured nature of user reviews: the lack of structure hinders the recognition of precise software requirements, as reviews often contain unclear and incomplete requirements. Another problem is noise from internet slang, shortcut grammar, and bad formatting. Additionally, del Sagrado et al. noted the difficulty of selecting appropriate requirements for inclusion in the next app release. These factors often raise doubts about the relevance and validity of the information extracted from the reviews. A further challenge is the varying quality of user reviews, which can make the identification of helpful user feedback very difficult, while the lack of uniform terminology can obstruct the development of practical software requirements.
Therefore, this study addresses the research problem of how to classify and prioritize software requirements from user reviews. Table I below shows the aspects in which our study differs from the existing literature. Hence, this study aims to investigate, analyze, and summarize the methods of requirement classification and prioritization techniques derived from user reviews. To achieve this aim, we have crafted five Research Questions (RQs) to delve into the critical analysis of recently conducted research on classification and prioritization tasks based on eliciting user requirements from user reviews:
• RQ1: What are the key challenges of incorporating user reviews for requirement classification and prioritization in software development?
• RQ2: What techniques are commonly used in research studies to enhance the understanding of user needs with short text inputs, aiming to improve user requirement classification and prioritization?
• RQ3: What are the predominant classifications of user reviews recommended in preceding studies?
• RQ4: How did previous scholars classify user reviews?
• RQ5: Which machine learning techniques can be leveraged to enhance the accuracy of requirement classification and improve the requirement prioritization process in software development?
This study's method followed Kitchenham et al. By providing a comprehensive review, the study contributes to the field of software engineering and to the research and practice of user review processing in software requirements engineering. This study also proposes a matrix taxonomy of user reviews based on definitions and topics. Besides, this study proposes a novel pattern involving a 4-tuple structure that consists of an elicitation technique, requirement classification, additional factors, and a priority value for each requirement. The structure of this paper is as follows: Section II introduces the materials and methods, Section III reports the results and discussion, and Section IV presents conclusions and potential future work.
II. MATERIALS AND METHODS
For our review, we followed the guidelines of Kitchenham et al. The selection of Kitchenham et al.'s method is primarily due to its alignment with the specific needs of software engineering research, as discussed in the SEGRESS guidelines, while PRISMA was initially developed for the medical and healthcare fields. Besides that, Kitchenham's approach offers tailored guidelines that better accommodate the diverse nature of software engineering studies, including mapping studies. This makes it more suitable for ensuring comprehensive and contextually relevant systematic literature reviews in this field.
Eligibility Criteria
Initially, we selected the studies based on their titles. After removing duplicate papers, we applied the following inclusion criteria (IC) and exclusion criteria (EC).
Inclusion criteria:
• IC1: Articles published between 2015 and 2023.
• IC2: Articles written in English.
• IC3: Articles published in computer science.
• IC4: Articles in conference or journal format.
• IC5: Articles related to user review-based requirement engineering.
• IC6: Articles available in digital format.
Exclusion criteria:
• EC1: The paper is not peer-reviewed, or is a book, review, or in press.
• EC2: Articles that are not relevant to user review-based requirement engineering.
• EC3: Articles that are duplicates or have substantial overlaps with other articles.
• EC4: The website has no author and cannot be verified.
Information Sources
This study's data sources are gathered from four online digital libraries: ScienceDirect, IEEE Xplore, Scopus, and Web of Science.
Search Strategy
The search strategy was meticulously designed following the SEGRESS guidelines to thoroughly review the literature on the classification and prioritization of user review-based software requirements. The strategy aimed to ensure an exhaustive collection of relevant studies, minimizing the risk of missing pertinent literature, and the process was structured according to best practices emphasizing transparency and replicability. The search was executed across the four major academic databases listed above, selected for their extensive coverage of software engineering and related disciplines, ensuring access to a wide range of peer-reviewed articles, conference papers, and other relevant publications.
A comprehensive search string was constructed to identify studies on the elicitation, prioritization, and classification of software requirements derived from user reviews. The search string was carefully crafted using the Boolean operators "AND" and "OR" to combine key terms and their synonyms, allowing for flexibility and inclusiveness. For instance, a representative search string used in the databases is:
TITLE-ABS-KEY(requirement AND (elicitation OR gathering) AND (classification OR classify OR grouping) AND (prioritization OR ranking OR weighting OR sorting))
This search string reflects the topic's multifaceted nature, ensuring that studies covering various requirement elicitation and prioritization aspects were captured. The search terms were derived from an extensive preliminary literature review and iteratively refined to optimize search results. Additionally, alternative search terms were considered and included to capture studies that may use different terminology; for instance, synonyms for "prioritization," such as "ranking" or "weighting," were included to ensure comprehensive coverage. This approach aligns with published guidance on enhancing the breadth and depth of search processes in systematic reviews. Following the search process guidelines, the full electronic search strategy for at least one database is presented (see Table 5), allowing replication of the search process and enhancing the transparency of the review. Regarding search limitations, the search was restricted to English-language publications due to the reviewers' linguistic constraints and the predominance of English in academic software engineering publications; publication dates were restricted as specified in the inclusion criteria to capture both foundational and recent studies.
TABLE I
OUR STUDY VS.
EXISTING LITERATURE
Our study addresses the following aspects:
• Challenges in incorporating user reviews: identifies specific challenges in using user reviews for requirement classification and prioritization.
• Techniques to enhance understanding of user needs: investigates techniques that improve understanding of user needs through short text inputs.
• Predominant classifications of user reviews: seeks to identify the major classifications used in previous studies.
• Classification methods in previous studies: aims to understand the classification methods used by previous scholars.
• Machine learning techniques for enhancing accuracy: investigates suitable machine learning techniques to enhance the accuracy of requirement classification and prioritization.
Existing studies, by contrast:
• Discuss challenges in opinion mining from mobile app store reviews, but not specifically for requirement classification and prioritization.
• Highlight challenges developers face with app store feedback, but focus more on overall software engineering practices.
• Examine various automated tools and technologies for requirements elicitation, but do not focus on short text inputs.
• Provide a comprehensive review of deep learning models for text classification, including sentiment analysis, but not specifically tied to requirements classification.
• Classify non-functional requirements, but do not specifically address user reviews.
• Use zero-shot learning for requirements classification, but focus on functional vs. non-functional and security requirements.
• Review requirement prioritization techniques and their empirical evaluations, but do not focus on classification methods.
• Analyze various requirements prioritization techniques, but mainly discuss prioritization rather than classification methods.
• Propose a semi-automated requirements prioritization method using RankBoost and weighted PageRank.
• Review test case prioritization using genetic algorithms, highlighting the potential of machine learning, but in the context of testing rather than requirements classification.
Screening: In this step, the researcher removed duplicate articles using the duplicate function in the Mendeley desktop reference manager. Initially, 4731 publications were excluded, leaving 531 papers for further review based on the inclusion and exclusion criteria. In total, 22 publications were rejected due to duplication.
Eligibility: In this step, 509 articles were prepared for assessment. During this stage, the researcher used Python to implement a high-level screening of keywords and article titles, together with manual screening, to ensure that articles met the inclusion criteria and aligned with the current research objectives. Consequently, 376 papers were excluded because they were out of field, their titles were not significantly related, their abstracts were unrelated to the study's objective, or no full text was available. A total of 133 articles were selected (see Fig. 1).
Inclusion: In this step, the researcher included the final set of studies in the systematic review and meta-analysis. The researcher extracted data from the studies, synthesized the results, and reported the findings.
Selection Process
The researcher carefully read the titles and abstracts of each article to check their relevance, then read the entire article to confirm its validity and locate the needed information. This process ensured that the results contributed to evidence-based research. Most recent studies employed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method, and the detailed explanation of how the PRISMA technique was used is as follows:
Identification: In this step, the researcher searched for relevant studies in different databases, such as IEEE Xplore, Scopus, ScienceDirect, and Web of Science.
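The Python-based title and keyword screening mentioned above could look roughly like the following sketch (the keyword list, record format, and helper names are our own assumptions; the authors' actual script is not published in the text):

```python
# Sketch of an assumed screening step: deduplicate records by normalized
# title, then keep those whose title or abstract matches review keywords.
import re

KEYWORDS = ("user review", "requirement", "classification", "prioritization")

def normalize(title):
    """Lowercase and collapse punctuation so near-identical titles match."""
    return re.sub(r"\W+", " ", title.lower()).strip()

def screen(records):
    seen, kept = set(), []
    for rec in records:
        key = normalize(rec["title"])
        if key in seen:
            continue  # duplicate title, skip
        seen.add(key)
        text = (rec["title"] + " " + rec.get("abstract", "")).lower()
        if any(kw in text for kw in KEYWORDS):
            kept.append(rec)  # passes the keyword filter
    return kept

papers = [
    {"title": "Classifying App Store User Reviews", "abstract": "requirement classification"},
    {"title": "Classifying App-Store User Reviews!", "abstract": "duplicate copy"},
    {"title": "Deep Learning for Protein Folding", "abstract": "unrelated"},
]
print([p["title"] for p in screen(papers)])
```

Running this keeps only the first record: the second is a near-duplicate title and the third matches no keyword, mirroring the duplicate-removal and relevance filtering described in the text.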
The researcher used specific and synonym keywords to define the search terms. This initial phase of the systematic review resulted in 5262 publications related to the study topic from the four databases.
Fig. 1 Flow diagram of the search study
III. RESULTS AND DISCUSSION
This section presents our findings for the research questions (RQs) defined in Section I.
Challenges in Incorporating User Reviews (RQ1)
The analysis revealed several challenges and limitations when dealing with user reviews, such as informal forms, unstructured formats, noise, short text, morphological complexities, and dialects, as shown in Table II. It is important to note that the sixth element, dialects, is only present in user reviews that are not in English.
Unstructured format: This irregularity complicates the analysis process, especially when identifying valuable information, as unstructured text is generally more complicated to process and analyze. To address this problem, we need innovative techniques to understand and extract meaningful information from unstructured feedback. This will help make the analysis and interpretation of feedback data more efficient.
TABLE II
CHALLENGES IN USER REVIEW
Challenges reported for both non-English and English reviews include informal form, unstructured text, noisy words, short text, and morphological complexity; dialect-related challenges are reported only for non-English reviews.
Informal form: Traditional Natural Language Processing (NLP) models have difficulty dealing with the informal nature of the language users use, which requires a deep understanding of the true meaning behind their words. Informal expressions such as slang and abbreviations complicate the analysis process, and this makes it difficult to obtain meaningful information. Understanding and addressing these issues in user feedback analysis demands the development of sophisticated natural language processing models.
Noise: According to prior work,
short texts usually have fewer characters but more noise, which affects classification performance. To address this issue, integrating external knowledge with the model helps it understand short texts better and learn additional information. Another approach involves reducing noise by removing unnecessary characters such as hashtags, URLs, numbers, punctuation marks, and other symbols. This method improves data clarity, similar to Yang et al., who removed noise before applying collocation and part-of-speech (POS) techniques to filter out meaningless phrases. Both approaches highlight the importance of noise reduction in data preprocessing.
Techniques to Enhance Understanding of User Needs (RQ2)
User reviews represent an important, valuable source of end-user feedback, aiding developers in identifying, categorizing, and prioritizing their software development efforts. Yet, user reviews typically form short, informal texts with a noisy tone, presenting challenges in analysis. Consequently, previous researchers have suggested and implemented diverse techniques to improve the comprehension of user needs from such concise texts. These techniques include:
Short text: Users often write short and incomplete reviews, making analysis difficult. Short texts frequently do not follow proper syntax and typically contain colloquial terms (e.g., LOL, pwd), phonetic spellings, and other new words. As a result, traditional NLP methods cannot be easily applied because of the short text length, leading to the problem of sparse features. Short texts can also be ambiguous because they frequently contain words or phrases with multiple meanings: some words have several distinct meanings, and different words can mean the same thing. This ambiguity makes it difficult to determine the intended meaning of a short text.
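A minimal sketch of the noise-removal preprocessing discussed above, stripping URLs, hashtags, numbers, and punctuation before analysis (the exact filters are our assumption; the cited studies may use different ones):

```python
# Assumed noise-removal pipeline for raw review text: remove URLs,
# hashtags, digits, and punctuation, then collapse leftover whitespace.
import re

def clean_review(text):
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"#\w+", " ", text)          # hashtags
    text = re.sub(r"\d+", " ", text)           # numbers
    text = re.sub(r"[^\w\s]", " ", text)       # punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()   # normalize whitespace

print(clean_review("App crashes!!! see https://example.com #bug v2.0"))
```

The cleaned string retains only the word tokens, which is the kind of clarity gain the noise-reduction approaches above aim for.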
Therefore, special techniques such as classification and association rule mining are needed for analyzing short texts.
Opinion mining: This technique aims to extract relevant central information such as opinions, sentiments, emotions, or attitudes from comments and map them to software requirements. For example, some studies employed opinion mining to identify relevant parts of comments and associate them with both functional and non-functional requirements. Other work emphasizes applying opinion-mining techniques to extract valuable insights from user reviews, which can inform software evolution and future research.
Morphological complexity: Morphology is the study of the internal structure of words, including how words are formed and their relationship to other words in the same language. The form of a word can change depending on the context, which makes it difficult for NLP systems to process; this involves analyzing root words, prefixes, and suffixes. The complexity of a language's morphology can cause problems for part-of-speech tagging because root words can transform into thousands of different forms, leading to data-scattering issues, as noted in the context of the Arabic language. Additionally, a language's morphological complexities and various dialects make semantic analysis particularly challenging.
Sentiment analysis: This technique calculates the polarity and subjectivity of user reviews and identifies whether the corpus expresses positive, negative, or neutral sentiment. For instance, some studies used sentiment analysis to assign quantitative values to user reviews based on their polarity and subjectivity scores.
Natural Language Processing (NLP): NLP is the most widely utilized technique across the studies; it involves processing and understanding natural language texts using various methods, such as syntactic parsing, semantic analysis, word embeddings, or transformer models. For example, Panichella et al.
used the Stanford Typed Dependencies parser to represent the grammatical relations between words in sentences and extract features for classification, and Hua et al. used knowledge-intensive approaches based on lexical-semantic analysis to improve the accuracy of short-text understanding.
Dialects: Users often use dialects and slang rather than formal language in their social media communications, which can be challenging for NLP systems when analyzing user feedback. For example, one study implemented a processing unit specifically for Arabic posts. This unit was designed to improve data quality by removing redundant content, repeated posts, and irrelevant information such as timestamps and 'likes'. This approach to data preprocessing is essential to ensure the accuracy of the subsequent analyses.
TABLE III
NOVEL MATRIX TAXONOMY FOR USER REVIEW
Information seeking. Definition: efforts to acquire knowledge or assistance from other developers, users, or the software provider. Topic: question.
Feature request. Definition: expressions of ideas, suggestions, or needs for enhancements, whether they pertain to the product, services, or their functionalities, or specifically to the software or product functionality; these are essentially requests for improvement. Topics: feature request, content request, promise, improvement request, idea, suggestion, and shortcoming.
Problem discovery. Definition: statements that define issues and unexpected behaviors, or those expressing dissatisfaction or describing anomalies with the software or product, indicating potential improvement areas. Topics: bug report, issues, dissatisfaction, and emotion.
Information giving. Definition: communications that update other developers on planned updates, or that express satisfaction or inform other users or sellers about product functionality; these play a crucial role in the collaborative development process. Topics: feature information, satisfaction, and emotion.
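To make the four-category taxonomy concrete, a minimal rule-based labeler might look like this (the keyword cues and the fallback to "information giving" are purely our own illustrative assumptions, not a method from the reviewed studies):

```python
# Toy rule-based classifier over the four taxonomy categories.
# Categories are checked in order; anything unmatched is treated as a
# statement about the product, i.e. "information giving".
RULES = {
    "information seeking": ("how do i", "can i", "where is", "?"),
    "feature request": ("add", "please support", "would be nice", "wish"),
    "problem discovery": ("crash", "bug", "error", "doesn't work"),
}

def categorize(review):
    text = review.lower()
    for category, cues in RULES.items():
        if any(cue in text for cue in cues):
            return category
    return "information giving"  # default: informative statements

print(categorize("How do I restore a backup?"))
print(categorize("Please add a dark mode"))
print(categorize("The app crashes on startup"))
print(categorize("Great app, works as described"))
```

Real classifiers in the reviewed studies learn these distinctions from data rather than hand-written cues, but the mapping from review text to the four categories is the same.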
This study created a new matrix taxonomy by combining definitions and topics from different studies. Its purpose is to categorize user reviews into four main types: information seeking, feature request, problem discovery, and information giving, as detailed in Table III.
Predominant Classifications of User Reviews (RQ3)
User reviews provide valuable feedback from end-users, helping developers enhance their software products. Previous studies have suggested classifying user reviews in various ways, including bug reports, functional and non-functional aspects, usability, user experience, and ratings.
Classification Methods in Previous Studies (RQ4)
User reviews represent a precious source of end-user feedback, aiding developers in enhancing their software products. However, user reviews are not homogeneous: depending on the review and why it was written, they can differ in type and aspect. Therefore, previous studies have proposed different ways to classify user reviews, such as the following.
Novel approaches: Prior researchers have studied and proposed novel approaches to user review classification, such as assigning a unique identifier to each requirement, using a hierarchical structure to organize requirements, or using clustering techniques to group similar requirements (Garcia-Lopez et al.). For example, Cai et al. proposed a unique identifier approach, while other studies explored hierarchical structure and clustering approaches.
Manual classification: Some researchers have manually labeled user reviews based on predefined categories, such as bug reports, non-functional requirements, usability, user experience, or ratings. Most researchers used manual classification in their studies.
Similarity values: Some researchers have used similarity values as extracted features to classify user reviews based on their relevance or importance to the software development process. Raharjana et al.
explored and utilized similarity values to classify user reviews based on their polarity and subjectivity. Moreover, using a stacking classifier ensemble strategy can also improve classification accuracy by up to 89 percent (Sai et al.). Similarly, experiments have shown that combining BERT with CNN, BERT with RNN, and BERT with BiLSTM results in good performance, especially regarding accuracy. Combining different methods within one model can enhance classification accuracy (see Table IV).
Pre-labeled datasets: Meanwhile, some researchers have used existing datasets labeled by other sources, such as app developers or third-party platforms, to train and evaluate their classification models.
Specific aspects: Prior researchers have focused on specific aspects or attention mechanisms of user reviews, such as user ideas or text-mining techniques, and used them as the basis for their classification methods. For instance, Wouters et al. used user ideas in their study for classification, while Asadabadi et al. used text-mining techniques to extract features for their classification.
TABLE IV
OVERVIEW OF METHOD(S) USED BY VARIOUS AUTHORS
Classification methods reported across the studies include CNN, LSTM, Word2Vec, FastText, BERT, ANN, RNN, SVM, Naive Bayes, and BDT, as used by Elhassan et al., Sharma et al., Sai et al., Al-Buraihy et al., Agathangelou & Katakis, Yucel et al., Mandhasiya et al., Qureshi et al., and Alturayeif et al. Note: LR: Logistic Regression; RF: Random Forest; BDT: boosted decision trees.
Machine Learning Techniques for Enhancing Accuracy (RQ5)
Machine learning techniques use data and algorithms to find patterns and make predictions or decisions based on the data. These methods can be applied for various purposes, such as classification and prioritization, for example to: automatically analyze user reviews and extract central information (user needs, preferences, sentiments, ratings, bug reports, or feature requests); categorize user reviews into several types of requirement classification (functional or non-functional, usability or user experience, enhancement, or new features); assign priority values or ranks to user reviews based on their importance, urgency, feasibility, or customer value; and evaluate the performance and accuracy of different classification and prioritization models. Some of the standard machine learning techniques that have been proposed and used for requirement classification and prioritization are:
Naive Bayes: This technique is based on applying Bayes' theorem, which calculates the probability of a class given a set of features. Naive Bayes assumes that the features are independent given the class. It is simple, fast, and effective for text classification tasks. For example, Maalej and Nabil used Naive Bayes to classify user reviews into bug reports, non-functional requirements, usability, user experience, and ratings.
Support Vector Machine (SVM): This technique is based on finding a hyperplane that separates the data points into different classes with the maximum margin. SVM can handle linear and non-linear classification problems using different kernel functions, and it is robust, accurate, and efficient for text classification tasks. For example, Binkhonain and Zhao used SVM to classify user reviews based on non-functional requirements.
Discussion: Requirement prioritization assigns importance or urgency values to user requirements and ranks them according to their relative significance for software development. For instance, Hujainah et al. used a Binary Search Tree (BST) to rank requirements, while Aziz et al. utilized the Kano model to identify which requirements satisfy customers the most. Similarly, other work utilized the Kano model to gather requirements and prioritize service improvements. Bhatia and Sharma proposed using the ANOVA F-value to rank features, and Asadabadi et al. suggested weighting importance according to the time of review posting.
Neural Networks: This technique simulates the structure and function of biological neurons in a network. Neural networks can learn complex and non-linear relationships between inputs and outputs using multiple layers of neurons with different activation functions. They are robust, flexible, and scalable for text classification tasks. For example, Bhatia and Sharma used neural networks to select the top k features for model training and evaluation.
Fig. 2 Essential elements required for effective requirement prioritization
Fig. 3 Example tuple implemented in research
TABLE V
ESSENTIAL ELEMENTS REQUIRED FOR REQUIREMENT PRIORITIZATION
Elicitation technique: Requirements elicitation involves obtaining information from stakeholders to understand their needs and expectations. Standard techniques include stakeholder analysis (identifying impacted parties) and brainstorming (generating ideas); other methods include interviews, surveys, and user reviews. Each technique has advantages and disadvantages, depending on the project context.
Requirement classification: This involves grouping user requirements into different categories based on their characteristics, such as functional, non-functional, usability, user experience, bugs, enhancement, or new features. Developers can analyze and manage requirements better using this approach.
Additional factor: Factors like ratings, sentiment, and attention mechanisms can influence requirement prioritization. These help measure importance and assign priority values to user requirements.
Higher range priority value: Higher range priority values represent the urgency of user requirements. Various methods, such as Best Worst Scaling, Kendall's W, or Okapi BM25, can be used. These values help rank requirements and select the most valuable ones for the next release.
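A hypothetical sketch of how such a 4-tuple could be represented and used to rank requirements (the class weights, factor names, and scoring formula are illustrative assumptions, not a method proposed by the study):

```python
# Each requirement carries the four tuple elements: elicitation source,
# classification, additional factors, and a computed priority value.
from dataclasses import dataclass, field

CLASS_WEIGHTS = {"bug": 2.0, "feature request": 1.0}  # illustrative only

@dataclass
class Requirement:
    text: str
    elicitation: str                              # element 1, e.g. "user review"
    classification: str                           # element 2
    factors: dict = field(default_factory=dict)   # element 3: rating, sentiment, ...
    priority: float = 0.0                         # element 4, filled in below

def score(req):
    """Toy priority: class weight scaled by star rating and sentiment."""
    base = CLASS_WEIGHTS.get(req.classification, 0.5)
    urgency = (6 - req.factors.get("rating", 3)) / 5  # low stars -> more urgent
    sentiment = req.factors.get("sentiment", 0.0)     # -1 (angry) .. 1 (happy)
    return base * (1 - sentiment) * urgency           # unhappier -> higher priority

reqs = [
    Requirement("crashes on login", "user review", "bug",
                {"rating": 1, "sentiment": -0.8}),
    Requirement("add dark mode", "user review", "feature request",
                {"rating": 4, "sentiment": 0.4}),
]
for r in reqs:
    r.priority = score(r)
ranked = sorted(reqs, key=lambda r: r.priority, reverse=True)
print([r.text for r in ranked])
```

Here the one-star, strongly negative bug report outranks the positively phrased feature request, which is the kind of ordering the tuple elements in Table V are meant to produce.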
Through the literature search, the researcher identified a few common elements that must be considered to develop effective requirement prioritization or to conduct research in this area. These multifaceted elements have been collectively defined as a tuple within the context of this study: (1) elicitation technique, (2) requirement classification, (3) additional factor, and (4) higher range priority value. These four tuple elements are derived from a synthesis of the reviewed studies (see Table VI) and reveal a novel tuple pattern for performing effective requirement prioritization. Table V explains each tuple element that needs to be considered, as presented in Fig. 2 and Fig. 3. This requirement prioritization can help developers select the most valuable and feasible requirements for the next release and allocate resources and time accordingly.
TABLE VI
EXTRACTION AND MATCH TUPLE PATTERN
Across the reviewed studies, the extracted tuple patterns combine elicitation sources (e.g., social networks, focus groups, mobile app stores, online reviews, use cases, and user stories), requirement classifications (e.g., functional and non-functional requirements, feature clusters, and requirement sentences), additional factors (e.g., number of likes and shares, emotions expressed, time of reviews, review usefulness, non-urgency, and criticality), and prioritization methods (e.g., the WSM method, Kendall's Correlation Coefficient, IFS with a weighted PageRank algorithm, prediction of the number of votes a negative review receives, a weighted method, a requirement priority matrix, the AHP method, and Triangular Fuzzy Numbers with the Alpha-Cut approach and Weighted Average (WA)).
IV. CONCLUSION
Requirement prioritization is essential in software development. It enables developers to effectively identify and rank user requirements based on their significance, facilitating informed decision-making regarding resource allocation and project planning.
The study highlights four critical elements (elicitation technique, requirement classification, additional factors, and a higher range priority value) that collectively enhance the prioritization process. The study also underscores the critical importance of extracting software requirements from user reviews, highlighting the challenges posed by such feedback's informal and unstructured nature. Proposing a new taxonomy for categorizing user reviews and advocating for advanced machine learning techniques paves the way for future research to enhance the adaptability and efficiency of requirements engineering processes.
ACKNOWLEDGMENT
We thank the Faculty of Computer Science and Information Technology (FCSIT) of Universiti Putra Malaysia (UPM) for financial assistance.
REFERENCES