1133 | Studies in English Language and Education, 12. , 1133-1152, 2025

Comparing the Effectiveness of Multimodal vs Monomodal Digital Flashcards for L2 Vocabulary Learning

P-ISSN 2355-2794
E-ISSN 2461-0275

Joshua Hicks*
Rina Marnita
Oktavianus Oktavianus

Department of Linguistics, Faculty of Humanities, Universitas Andalas, Padang 25163, INDONESIA

Abstract
This applied psycholinguistics study explores whether multimodal flashcards (containing text, audio, and a picture) are more effective than monomodal flashcards (containing text only) as a tool for learning the meanings of novel second-language (L2) concrete nouns. The research instrument was Anki, a flashcard application that utilises active recall and spaced repetition. The study used a within-subject design, where each participant (n = 25) studied a total of 30 L2-L1 (Esperanto-Indonesian) word pairs over the course of seven study sessions utilising an assortment of 15 multimodal and 15 monomodal flashcards, with each word pair being presented multimodally to approximately half of the participants and monomodally to the other half. When viewing the answer side of a card, participants were instructed to tap 'Good' if they recalled the answer correctly or 'Again' if not. Recall accuracy data for the two card types were collected and then analysed using a Wilcoxon signed-rank test, which indicated that the number of user-initiated reviews ('Again' count, which is indicative of the number of memory lapses) was significantly higher for monomodal flashcards (Mdn = 61, n = 25) than for multimodal flashcards (Mdn = 50, n = 25), Z = -3.4, p < 0.001, r = -0. These results support the hypothesis that multimodal flashcards are more effective than monomodal flashcards as a tool for learning the meanings of L2 concrete nouns. By implication, language learners can enhance their recall accuracy of L2 concrete nouns by creating and using flashcards that utilise multiple semantically congruent modes.
Keywords: Dual-encoding, multimodal, multisensory, recall, vocabulary.

* Corresponding author, email: joshua.academia@icloud.

Citation in APA style: Hicks, Marnita, & Oktavianus. Comparing the effectiveness of multimodal vs monomodal digital flashcards for L2 vocabulary learning. Studies in English Language and Education, 12. , 1133-1152.

Received June 26, 2024; Revised October 30, 2024; Accepted August 6, 2025; Published Online September 30, 2025

https://doi.org/10.24815/siele.

Copyright © 2025 by Authors, published by Studies in English Language and Education. This is an open-access article under the Creative Commons Attribution International License (https://creativecommons.org/licenses/by/4.)

INTRODUCTION

Digital flashcards have been shown to be an effective tool for L2 vocabulary learning in a range of studies (Bakla & Çekiç, 2017; Mujahidah et al., 2024; Nguyen, 2.). However, these studies do not test the effectiveness of different types of flashcard design, and while some studies have demonstrated that multimodal (or multimedia or multisensory) learning can be more effective than monomodal learning (Chun & Plass, 1996; Mayer et al., 2015; Mayer; Moran et al., 2013; Okray et al., 2023; Shams & Seitz, 2008; Thelen & Murray, 2.), there are no studies that specifically test the effectiveness of using multimodal versus monomodal digital flashcards for L2 vocabulary learning (Google Scholar, as of 24-10-2.).

Flashcard applications such as Anki offer a myriad of design possibilities: users are not limited to text only but may create multimodal flashcards that include text, images, audio, and video. The inclusion of pictures and audio in addition to text is often recommended and practised by those who use Anki for L2 vocabulary learning (Refold, n.d.).
This practice can be supported by cognitive theories such as Dual Coding Theory (Clark & Paivio, 1991) and the Cognitive Theory of Multimedia Learning (R. Mayer, 2.), and by applying insights gleaned from studies in multisensory learning to L2 vocabulary learning (see Moran et al., 2013; Okray et al., 2023; Shams & Seitz, 2008). However, it should be acknowledged that neither the application of theory to practice nor the application of pedagogical principles gleaned from one kind of education (monolingual instruction) to another kind of education (second language learning) will necessarily result in the intended enhancement of learning. Therefore, this study aims to empirically test one particular application of theory to practice, namely, to compare the effectiveness of multimodal and monomodal Anki flashcards as a tool for learning the meanings of L2 concrete nouns. The use of the free flashcard application Anki as a research instrument means that the research design reflects a real-world L2 vocabulary learning method that may be freely replicated by learners, teachers, and researchers alike.

The research questions of this study are as follows:
1. Are multimodal flashcards (containing text, audio, and a picture) significantly more effective than monomodal flashcards (containing text only) as a tool for learning the meanings of L2 concrete nouns (i.e., resulting in significantly higher recall accuracy)?
2. Does learning L2 vocabulary multimodally result in better recall accuracy, even in response to monomodal (text-only) test cues?
3. If multimodal learning is shown to be more effective than monomodal learning in this study, why is this the case?

Based on Dual Coding Theory and supporting evidence, it was predicted that multimodal flashcards would result in significantly higher recall accuracy of L2 vocabulary compared to monomodal flashcards. To test this prediction, the present study was designed to test the null hypothesis (H0) and alternative hypothesis (H1).
Empirical data were collected, and the statistical hypothesis was tested by performing a one-tailed Wilcoxon signed-rank test on the raw data.

Null hypothesis (H0): Multimodal flashcards are not significantly more effective than monomodal flashcards as a tool for learning the meanings of L2 concrete nouns.
Alternative hypothesis (H1): Multimodal flashcards are significantly more effective than monomodal flashcards as a tool for learning the meanings of L2 concrete nouns.

LITERATURE REVIEW

Multimodal vs. Monomodal L2 Vocabulary Learning

Lin and Yu (2017, p. ) compared the effectiveness of monomodal (text only) and multimodal (text + audio + picture) presentation types for English vocabulary learning via multimedia message (MMS). An analysis of recall accuracy data from an immediate post-test showed no significant effect of presentation type. However, an analysis of recall accuracy data from a delayed post-test (two weeks after vocabulary learning) indicated that recall accuracy was significantly higher for vocabulary that had been presented multimodally than for vocabulary that had been presented monomodally. Similarly, in a study by K. Mayer et al., an analysis of immediate post-test data showed no significant difference in the recall accuracy of vocabulary that had been learned monomodally (audio-only) and multimodally (audio + picture; audio + gesture). However, analyses of results from delayed post-tests (two months and six months after learning) indicated that recall accuracy was significantly higher for multimodally learned words than for monomodally learned words.

The results of these studies suggest that the benefits of multimodal vocabulary learning are best observed in a delayed post-test, not in an immediate post-test. One possible explanation is that the advantage of multimodal learning over monomodal learning may only become apparent once learning has been sufficiently consolidated, e.g.
, through repeated spaced retrieval. The consolidation of multimodal learning would mean the formation and strengthening of an interconnected network of mental representations corresponding to the multiple modes used, enabling retrieval to operate on a richer, more informative network of representations, thus improving recall accuracy (see Moran et al., 2013). In the current study, Anki study sessions provide learners with a built-in opportunity for repeated spaced retrieval (active recall testing), which consolidates learning. In addition, rather than using an immediate post-test, recall accuracy data from the entire study phase (seven study sessions) were collected and analysed, followed by data from monomodal (text-only) delayed post-tests.

Learning L2 Vocabulary from Pictures vs. L1 Translations

Carpenter and Olson (2012) explored whether novel L2 concrete nouns are learned better by being paired with pictures or with L1 translations. Carpenter and Olson's (2012, p. ) first experiment replicated the pattern of results reported by Lotto and de Groot in that there was no advantage in cued recall of L2 words from pictures compared with L1 translations. However, when participants were asked to verbally free recall, in L1, the pictures and the L1 translations that had been presented, the picture superiority effect was present: participants were able to recall more pictures than L1 translations. Therefore, Carpenter and Olson (2012, p. ) concluded that the picture itself had been sufficiently encoded but that participants had failed to establish a sufficient association between the picture and the L2 word.

It stands to reason that once a sufficient association between the picture and the L2 word has been established, recall accuracy of L2 words learnt from pictures may be greater than recall accuracy of L2 words learnt from L1 translations. This is evident in Carpenter and Olson's (2012) second experiment, which involved three tests with immediate feedback;
these tests would have served as additional opportunities for spaced retrieval, strengthening the association between the L2 vocabulary item and the picture or L1 translation. As in Experiment 1, no significant advantage emerged for picture-L2 pairs over L1 translation-L2 pairs in Test 1; however, this advantage was apparent in Tests 2 and 3 (ts > 3.23, ps < .), and a repeated-measures ANOVA revealed that this interaction was significant by participants as well as by items.

In the current study, pictures are included on multimodal cards in addition to L1 translations since these two modes can enhance and clarify each other. Additionally, repeated spaced retrieval is integrated into Anki study sessions to help participants establish a sufficient association between the L2 word and the other information on the card (i.e., picture, L2 audio, and L1 word). This consolidation of learning through repeated spaced retrieval is an important part of the study, since the results of other studies suggest that the advantage of multimodal L2 vocabulary learning can only be observed once learning has been sufficiently consolidated.

The More Modes the Merrier?

In a study by Li et al., participants presented with two verbal modes (text + audio) performed better than those presented with four modes (text + audio + picture + video) in the two post-tests. This suggests that presenting more modes does not necessarily improve learning. Li et al. noted that a possible explanation for these results is that the four-mode presentation slides forced participants to handle additional visual information within a limited time, which increased their cognitive load, negatively impacting learning outcomes.
This explanation draws on the Limited Capacity Assumption of Mayer's Cognitive Theory of Multimedia Learning (CTML), which states that each channel in the human cognitive system has limited processing capacity; as a consequence, presenting too much visual information at once can overload the visual-pictorial channel, and presenting too much auditory information at once can overload the auditory-verbal channel (R. Mayer, 2.).

A major contribution of CTML is that Mayer takes into account John Sweller's Cognitive Load Theory when investigating the optimal conditions in which dual-coding can occur, and he developed principles to guide educators in the most effective use of multimedia for learning (R. Mayer, 2.). On the one hand, presenting in multiple modes has the potential to facilitate dual-encoding, which benefits recall. On the other hand, presenting in multiple modes can result in cognitive overload, hindering encoding and negatively impacting recall. Multimodal teaching and learning should therefore be done in a way that maximises the chances of dual-encoding while managing the risk of cognitive overload by reducing unnecessary cognitive load. To this end, Mayer developed several principles of multimedia design. These principles were designed for and applied to the use of explanatory animations (words + moving pictures), but they are also relevant for multimedia L2 vocabulary teaching and learning. Mayer's principles of multimedia design have informed the design of the multimodal flashcards used in this study.

Dual Coding Theory

According to Paivio's Dual Coding Theory (DCT), the mind uses two distinct types of mental representation or "code": verbal representations in the Verbal System (V) correspond to linguistic stimuli, and non-verbal representations in the Image System (I) correspond to non-linguistic stimuli (Clark & Paivio, 1991, p. ; Paivio & Csapo, 1973, p. ).
While monolinguals have one verbal system (V) and one image system (I), bilinguals or language learners have two verbal systems (V1 and V2) corresponding to two languages (L1 and L2), plus one shared image system (I) (Paivio & Desrochers, 1980, pp. ). The independence and partial interconnectedness of each system means that one code can be transformed into another, so that, for example, pictures can be named and words can evoke nonverbal images (Paivio & Csapo, 1973, p. ). In the context of second-language learning, this means that L2 words can be pictured (using V2→I) or translated into L1 (using V2→V1). In addition, if one representation or connection within a representational network decays (i.e., becomes unviable), the independence and partial interconnectedness of each symbolic system means that the rest of the network remains functional and may even be able to retrieve the required information by means of other representations and connections in the network. For example, if a V2-V1 connection is unviable, the image system can provide a means of indirect access from one language to another, enabling a person to translate from L2 to L1 by means of the image system (V2 → I → V1) (Paivio & Desrochers, 1980, p. ).

According to DCT, using both verbal stimuli (e.g., auditory words) and non-verbal stimuli (e.g., pictures) in teaching and learning facilitates the building of connections between the verbal and nonverbal systems (i.e., dual-encoding), resulting in a larger number of possible retrieval routes, which can have an additive effect on recall (Paivio & Csapo, 1973, p. ). A 'Verbal Only' monomodal learning method (L2 + L1) facilitates the formation of V2-V1 connections only (Figure 1, left), whereas a 'Three System' multimodal learning method (L2 + Picture + L1) facilitates the formation, activation, and consolidation of connections between all three systems, resulting in a larger number of possible retrieval routes, which can have an additive effect on recall (Figure 1, right). The key implication for L2 vocabulary teaching and learning is summed up in what Nation (2013, p. ) calls the dual-encoding principle: having both linguistic and non-linguistic (e.g., pictorial) associations for a word aids word retention.

Figure 1. Between-system connections resulting from 'Verbal only' monomodal language learning (left) and 'Three System' multimodal language learning (right) (based on Paivio & Desrochers, 1980, p. ).

In addition to between-system connections, connections also exist between subsystems that correspond to different sensory modalities (e.g., visual, auditory, motor) (Clark & Paivio, 1991). These subsystems are capable of functioning more or less independently of one another, as evidenced by the selective effects of focal brain injuries, which might impair one subsystem while leaving others functionally intact (Paivio, 1986, p. ). For example, visual memory of the shapes of words (stored in the verbal-visual subsystem) may be impaired in an adult with brain injury, while motor memory of the shapes of words (the verbal-motor subsystem) remains unimpaired, allowing the patient to decode the meaning of words by tracing letters with a finger (see Carreker & Birsh, 2018, p. ). Since each subsystem is more or less independent, Paivio and Csapo's (1973, p. ) claim that "the two codes can have additive effects on recall" may be expanded to suggest that interconnections between subsystems can also have an additive effect on recall. The implication of this for the current study is that the inclusion of both visual and auditory words (facilitating the formation, activation, and consolidation of connections between verbal-visual and verbal-auditory subsystems) may have an additive effect on recall.

METHODS

The present study employed a quantitative, within-subjects design to investigate the effectiveness of multimodal flashcards (containing text, audio, and a picture) compared to monomodal flashcards (containing text only) for learning second-language (L2) vocabulary. Esperanto was selected as the target language to ensure that all participants began with no prior knowledge of the target vocabulary. Recall accuracy data were collected during two distinct phases: the Study Phase (across seven sessions) and the Test Phase (which includes three delayed post-tests). Data collection was performed remotely via the Anki application, which tracked participants' recall accuracy. The subsequent subsections detail the research design, including the participant recruitment process, the technique of data collection and analysis, and the research instrument.

Participants

A total of 38 participants were initially recruited to participate in the study. Of these, 25 participants completed the study by completing seven study sessions with built-in active recall testing (the study phase) followed by three delayed post-tests (the test phase). The remaining 13 recruited participants failed to complete the study, and their data were therefore excluded from the analysis.

The eligibility criteria for participation in the study were as follows:
Nationality: Indonesian
Language: Can speak Indonesian
Experience: Has never studied the Esperanto language
Age: ≥17 years old
Device: Owns an Android phone that can install AnkiDroid

These criteria also describe the target population of the study. The accessible population was, however, much more specific. The researcher had access to two participant pools, namely:
First-semester English Literature students at Andalas University, Padang, Indonesia.
Members of Sunset English Club and their friends (a free weekly club at 1 Nusantara Cafe where people can informally learn English and practise speaking English with others).
The sampling method used can be described as convenience sampling, a type of nonprobability sampling in which participants are recruited based on their availability and willingness to participate (Suen et al., 2.). Of the final group of 25 participants, 10 were from the first participant pool and 15 from the second participant pool.

The following variation was present in the sample:
First language (or mother tongue): Bahasa Minangkabau, Bahasa Indonesia
Occupation: students, working
Age range: 17-41 years old

Since many participants are bilingual, in this paper the abbreviation L1 (first language) is used in a non-technical sense to refer to a known language to which participants were exposed from childhood and in which participants are already fully communicatively competent (Indonesian), regardless of whether the language was acquired 'first' or acquired simultaneously with another language. The variation present in the sample, including variation not measured, such as language learning ability and working memory capacity, was not expected to affect the outcome of the study since this study uses a within-subject design, i.e., each participant's performance on L2 words learned using monomodal flashcards was compared with his or her own performance on L2 words learned using multimodal flashcards.

Technique of Data Collection and Analysis

To answer research question one, participants (described in section 3.1) were asked to learn a total of 30 Esperanto-Indonesian word pairs (Appendix B) over the course of seven study sessions, presented as an assortment of 15 multimodal and 15 monomodal Anki cards (card outlines are shown in Figures 2 and 3, example cards are shown in Figures 4 and 5, the Anki application is described in section 3.4, and the Anki settings used are listed in Appendix A). During the Study Phase, participants' receptive retrieval (recalling L1 in response to an L2 test cue)
for multimodal and monomodal cards was repeatedly tested using multimodal test cues (L2 text and L2 audio) and monomodal test cues (L2 text only), respectively. Participants were instructed to attempt to recall the answer side of the card (L1) when presented with the question side of a card (L2 test cue), and then to tap 'Reveal Answer' (Tampilkan Jawaban) to check their answer, build an association between the different elements of the card, and provide feedback regarding their recall. Participants were instructed to tap 'Good' (Baik) if they had recalled the answer correctly or 'Again' (Ulang) if not, which provided the recall accuracy data for analysis.

Participants' 'Again' count for multimodal and monomodal cards was collected and analysed using a one-tailed Wilcoxon signed-rank test to determine whether multimodal cards (containing text, audio, and a picture) are significantly more effective than monomodal flashcards (containing text only) as a tool for learning the meanings of L2 concrete nouns (i.e., resulting in significantly higher recall accuracy). The Wilcoxon signed-rank test, a non-parametric paired test, was selected because a Shapiro-Wilk test indicated that the assumption of normality for the differences between the two dependent samples was violated in three of the four data sets analysed (Scheff.). With a sample size of n = 25, this violation makes a non-parametric paired test the appropriate choice.

Figure 2. Multimodal flashcard outline: recall testing, building associations.

Figure 3. Monomodal flashcard outline: recall testing, building associations.

Figure 4. A multimodal Anki card for tondilo - gunting 'scissors': question side for recall testing, answer side for building associations.

Figure 5.
A monomodal Anki card for kapro - kambing 'goat': question side for recall testing, answer side for building associations.

During the Study Phase, cards marked 'Again' (did not recall correctly) were shown again after a short interval within the same study session (waiting in the 'learning queue' for that study session), whereas cards marked 'Good' (recalled correctly) were scheduled for the next day unless the card was only on the first learning step (see Figures 6 and 7). This repeated recall testing is key to Anki's effectiveness as a learning tool. Research has shown that it is primarily the number of test episodes (spaced retrievals), not the number of study episodes, that determines retention, as demonstrated by Karpicke and Roediger, who found that increasing the number of study episodes for learning foreign vocabulary words had little effect on retention (→ 36%), whereas increasing the number of test episodes increased retention significantly (→ 81%).

Figure 6. A New or Learning Step 1 Card flow chart.

Figure 7. A Due Review Card flow chart.

The Study Phase consisted of seven study sessions. Sessions one to five introduced six new items per session. Each session also presented items for review that had been learned in previous sessions. As a result of this design, sessions six and seven allowed the participants to consolidate their learning of all 30 items without any new words being introduced. The participants were instructed to complete one study session per day, meaning that the study phase would be completed over the course of seven days. However, some participants failed to complete one study session each day, so the participants completed the study sessions over the course of 7-14 days.

To answer research question two, during a subsequent Test Phase, participants' recall accuracy in response to only monomodal test cues (L2 text)
was tested for all 30 word pairs in each of three delayed post-tests. In other words, monomodal Anki cards (Figure ) were used to test participants' recall of all word pairs regardless of whether the word pair was initially learnt using a monomodal or multimodal card in the Study Phase. Post-tests were carried out one, three, and seven days after the Study Phase. The use of increasingly larger intervals between post-tests in the Test Phase was intended to progressively increase the chances that participants would forget what they had learnt (Boros, n.d.). This was intended to enable us to compare participants' retention of vocabulary learnt using monomodal and multimodal flashcards over time. As in the Study Phase, participants were asked to grade their recall with 'Good' or 'Again' depending on whether they recalled the meaning correctly or not. Unlike during the Study Phase, in each post-test each card was seen only once, even when the participant selected 'Again'. Recall accuracy data (the 'Good' count) for cards that had been learnt in a multimodal and monomodal mode during the Study Phase were collected, and the data were analysed using a one-tailed Wilcoxon signed-rank test to determine whether learning L2 vocabulary multimodally can result in better recall accuracy even in response to monomodal (text-only) test cues.

Lastly, to answer research question three, the question of why multimodal learning can be more effective than monomodal learning was addressed by discussing the results in light of Paivio's Dual Coding Theory and with reference to insights gleaned from studies in multisensory research.

Research Design

This study uses a quantitative, within-subjects design (cf. Carpenter & Olson, 2012). For each participant, half of the word pairs were presented multimodally and half monomodally.
This within-subjects design controls for a possible difference in ability between participants (Jhangiani et al., 2.). The insertion order of new cards was random (see Appendix A) to avoid order effects. In addition, the combinations of word pairs and card types were counterbalanced across participants according to a Latin square design to control for possible variation in word pair difficulty (Jhangiani et al., 2.). In a Latin square design, each treatment condition occurs in every column and row (Rayner & Livingston, 2023; Richardson, 2.). As shown in Table 1, each word pair, assigned to either sublist 1 or sublist 2 (see Appendix B), was presented multimodally to approximately half the participants and monomodally to the other half. Latin square counterbalancing means that any overall difference in recall accuracy between the two conditions (multimodal/monomodal cards) cannot have been caused by a difference in the difficulty of vocabulary between sublists (Jhangiani et al., 2.).

Table 1. Latin Square.

            Deck A (used by Group A participants, n = )    Deck B (used by Group B participants, n = )
Sublist 1   Multimodal                                      Monomodal
Sublist 2   Monomodal                                       Multimodal

An important part of the research design was to choose a target language that would be completely new to all research participants. The target language chosen for this study was Esperanto, an artificial language constructed by L. Zamenhof in 1887 (The Editors of Encyclopaedia Britannica, n.d.). Esperanto was chosen because, unlike English, it is very rare to find someone in Padang, Indonesia, who has studied or been exposed to it. It would therefore be easy to find participants with no lexical knowledge of Esperanto, which would eliminate the bias of certain participants having pre-existing knowledge of the target language, eliminate the need for a pre-test, and make it easy for the researcher to find novel (i.e., previously not encountered) concrete nouns for participants to learn.
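The counterbalancing in Table 1 can be illustrated with a minimal sketch. It assumes a cyclic 2 x 2 Latin square and placeholder word labels; the function names `build_decks` and `assign_decks` are hypothetical, the alternating allocation of participants to decks is an assumption (the study does not state how groups were formed), and the real word list is the one in Appendix B.

```python
CONDITIONS = ("multimodal", "monomodal")


def build_decks(sublists, conditions=CONDITIONS):
    """Build one deck per row of a cyclic Latin square.

    Deck k presents sublist i in condition (i + k) mod n, so every sublist
    appears in every condition exactly once across the n decks.
    """
    n = len(conditions)
    if len(sublists) != n:
        raise ValueError("need exactly one sublist per condition")
    decks = []
    for k in range(n):
        deck = {}
        for i, words in enumerate(sublists):
            for word in words:
                deck[word] = conditions[(i + k) % n]
        decks.append(deck)
    return decks


def assign_decks(participant_ids, n_decks):
    """Alternate participants across decks so group sizes differ by at most one.

    Assumption for illustration only: the study does not describe this step.
    """
    return {pid: index % n_decks for index, pid in enumerate(participant_ids)}


# Hypothetical placeholder items, not the study's vocabulary.
sublist_1 = ["word_01", "word_02", "word_03"]
sublist_2 = ["word_04", "word_05", "word_06"]
deck_a, deck_b = build_decks([sublist_1, sublist_2])
groups = assign_decks(["P01", "P02", "P03", "P04", "P05"], n_decks=2)
```

Because each word appears once in each condition across the two decks, any difference in item difficulty between the sublists contributes equally to both conditions when group sizes are balanced.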
Research Instrument

The Anki application, a free, open-source application for creating and studying digital flashcards within a spaced repetition system, was used as a research instrument. The first author ('the researcher') used Anki for macOS, participants used AnkiDroid for Android, and data were synced between the participants' and the researcher's devices via AnkiWeb (Anki's syncing service), enabling him to upload Anki decks to participants' accounts and collect recall accuracy data remotely. The researcher created an AnkiWeb account for each participant, to be used exclusively for the experiment, and provided each participant with login details for their participant account.

A list of 30 Esperanto-Indonesian word pairs was compiled (see Appendix B). All Esperanto words were concrete nouns of 2-3 syllables in length, denoting objects with which the participants were likely to be familiar (e.g., a chair). The researcher endeavoured to balance the difficulty of L2 words in each sublist based on an analysis of the phonological complexity of each word. The Anki deck options used in the current study can be found in Appendix A.

Audio for each Esperanto word was recorded by the researcher, added to multimodal flashcards, and set to play automatically. Pictures for each vocabulary item were sourced online, primarily from https://publicdomainvectors.org/en/. The researcher selected pictures that were easily recognisable, with minimal visual noise and enough visual context to help participants recognise the pictured object.

Each Anki card was tagged according to card type (multimodal/monomodal). These tags were not visible to participants but were used by the researcher to filter each participant's Anki statistics according to card type, so that recall accuracy data for all multimodal cards and all monomodal cards could be viewed separately and manually input into a spreadsheet.
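As a rough illustration of the tallying step described above, the sketch below counts 'Again' presses per participant and card type. The record layout (participant, card type, button) and the names `again_counts` and `paired_counts` are assumptions made for this example; they do not correspond to Anki's export format or API, and in the study the figures were read from Anki's statistics and entered by hand.

```python
from collections import Counter


def again_counts(review_log):
    """Count 'Again' presses for each (participant, card_type) pair."""
    counts = Counter()
    for participant, card_type, button in review_log:
        if button == "Again":
            counts[(participant, card_type)] += 1
    return counts


def paired_counts(counts, participants):
    """Return two aligned lists (multimodal, monomodal), one value per participant."""
    multimodal = [counts[(p, "multimodal")] for p in participants]
    monomodal = [counts[(p, "monomodal")] for p in participants]
    return multimodal, monomodal


# Invented records for illustration only; not data from the study.
review_log = [
    ("P01", "multimodal", "Good"),
    ("P01", "monomodal", "Again"),
    ("P01", "monomodal", "Again"),
    ("P02", "multimodal", "Again"),
    ("P02", "monomodal", "Good"),
]
counts = again_counts(review_log)
multimodal, monomodal = paired_counts(counts, ["P01", "P02"])
```

Keeping the two lists aligned by participant matters because the later analysis is a paired test: each multimodal count is compared with the same participant's monomodal count.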
In the Anki deck used for the post-tests (Test Phase), card tags (multimodal/monomodal) indicated whether the word pair was learnt initially in a monomodal or a multimodal way during the Study Phase.

RESULTS

The Study Phase (Research Question One)

The independent variable in this study is the flashcard type used (multimodal or monomodal). The dependent variable measured in the Study Phase is the 'Again' count (i.e., the number of user-initiated reviews) for monomodal and multimodal cards. Since participants were instructed to tap the 'Again' button if they failed to correctly recall the answer for a card, the 'Again' count is indicative of the number of memory lapses. Thus, a higher 'Again' count (number of user-initiated reviews) is indicative of lower recall accuracy, and a lower 'Again' count is indicative of higher recall accuracy.

Median recall accuracy was higher for multimodally learnt items than for monomodally learnt items (see Figure 8). Results of the Wilcoxon signed-rank test (one-tailed) indicated that there were significantly more user-initiated reviews ('Again' count) for monomodal flashcards (Mdn = 61, n = 25) than for multimodal flashcards (Mdn = 50, n = 25), Z = -3.4, p < 0.001, r = -0. Since the number of user-initiated reviews is indicative of the number of memory lapses, the results indicate that significantly more memory lapses occurred for L2 words that were learnt monomodally than for L2 words that were learnt multimodally. Therefore, the null hypothesis (that multimodal flashcards are no more effective than monomodal flashcards as a tool for learning L2 concrete nouns) can be rejected.

This finding answers research question one. For this sample, multimodal flashcards are significantly more effective than monomodal flashcards as a tool for learning the meanings (L1 translations) of L2 concrete nouns. The p-value is very small (0.035%), meaning that a difference at least this large would be very unlikely if H0 were true; the results therefore strongly support H1.

Figure 8. Median number of user-initiated reviews for multimodal and monomodal cards during the Study Phase.

The Test Phase (Research Question Two)

The dependent variable measured in the Test Phase is the 'Good' count (i.e., the number of correct recalls). Unlike during the Study Phase, each card was presented only once during each post-test, and all test cues were monomodal (text only). Participants were instructed to tap 'Good' if they successfully recalled the answer, and therefore a higher 'Good' count is indicative of higher recall accuracy.

Median recall accuracy was higher for multimodally learnt items than for monomodally learnt items (see Figure ). A one-tailed Wilcoxon signed-rank test analysis of the results from post-test 1 showed that there were significantly more correct recalls of multimodally learned items (Mdn = 14, n = 25) than of monomodally learned items (Mdn = 13, n = 25), Z = -2.6, p = 0.005, r = -0. Significantly more items that had been learned in a multimodal way were recalled correctly than items that had been learned in a monomodal way, even in response to monomodal test cues, and even though monomodal cards had been reviewed significantly more times on average than multimodal cards during the study phase. Therefore, in answer to research question two, the null hypothesis can be rejected. For this sample, multimodal flashcards are significantly more effective than monomodal flashcards as a tool for learning the meanings of L2 concrete nouns.

As mentioned in section 3.
2, the use of increasingly larger intervals between post-tests in the Test Phase was intended to progressively increase the chances that participants would forget what they had learnt, enabling us to compare participantsAo retention of vocabulary learnt using monomodal and multimodal flashcards over time. However, recall accuracy did not significantly reduce between post-tests as expected. On the contrary, recall accuracy for multimodally learnt cards remained relatively stable between post-tests, with median recall remaining at 14 for each post-test, while recall accuracy for monomodally learned cards actually improved between post tests . Ie 13 Ie . As a result, the effect size of the difference between multimodally and monomodally learned cards reduced between post-tests, and a Wilcoxon signed-rank test analysis of the results from post-test 3 found no significant difference between the number of correct recalls (AoGoodAo coun. of multimodally learned items (Mdn = 14, n = . and monomodally learned items (Mdn = 14, n = . Z = -1. 2, p = 0. 119, r = -0. The relative stability and progressive improvement of recall accuracy for multimodally and monomodally learnt items between post-tests can be attributed to a weakness in the design of the Test Phase. Firstly, the length of time between post-tests was not sufficient for participants to forget what they had learnt during the Study Phase. Secondly and more significantly, participants were able to learn from the post-tests because they were self-marked, enabling many participants 1145 | Studies in English Language and Education, 12. , 1133-1152, 2025 to improve their scores for monomodally learnt items in post-tests 2 and 3. Progressive improvement in median recall accuracy can be observed for monomodally learnt cards, but not for multimodally learnt cards, because there was much more room . for improvement in the recall accuracy scores for monomodally learnt items. 
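The signed-rank comparisons reported in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: the software they used is not stated, the function name `signed_rank_test` and the six-participant data are hypothetical, and two conventions are assumed. First, Z is computed from the rank sum of pairs that go against H1, so that it is negative when H1 is supported, matching the sign of the reported values. Second, the effect size is taken as r = Z / sqrt(N) with N equal to the number of non-zero pairs; other conventions use the total number of observations.

```python
import math
from statistics import NormalDist


def signed_rank_test(first, second):
    """One-tailed Wilcoxon signed-rank test (H1: first tends to exceed second).

    Normal approximation with tie correction, no continuity correction.
    Returns (w_minus, z, p, r); z is negative when H1 is supported.
    """
    # Paired differences; pairs with no difference are discarded.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank
        size = end - start + 1
        tie_term += size**3 - size
        start = end + 1

    # Sum of ranks for pairs that go against H1 (first < second).
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_minus - mean) / math.sqrt(variance)
    p = NormalDist().cdf(z)  # lower tail: small when w_minus is small
    r = z / math.sqrt(n)  # assumed convention: N = number of non-zero pairs
    return w_minus, z, p, r


# Hypothetical 'Again' counts for six participants (NOT the study's data).
mono_again = [10, 12, 9, 15, 11, 14]
multi_again = [8, 9, 10, 10, 7, 8]
w_minus, z, p, r = signed_rank_test(mono_again, multi_again)
print(f"W- = {w_minus}, Z = {z:.2f}, p = {p:.3f}, r = {r:.2f}")
```

With small samples an exact test is usually preferable to the normal approximation used here; the approximation is shown only because it makes the relationship between the rank sum, Z, p, and r explicit.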
Scores for multimodally learnt cards, by contrast, were already very high in post-test 1 (with 10 out of 25 participants scoring full marks), leaving little room for improvement. This practice effect (see Jhangiani et al., 2…) could have been avoided by using a test in which participants received no feedback.

Figure 9. Median number of correct recalls for multimodal and monomodal cards during the Test Phase.

Interpretation (Research Question Three)

Having shown that multimodal flashcards are significantly more effective than monomodal flashcards as a tool for learning the meanings of L2 concrete nouns (research question one), even in response to monomodal test cues (research question two), we now explore why this is the case in light of Dual Coding Theory, and with reference to research in multisensory learning (research question three).

The multimodal cards used in this study facilitated the building of connections between the Verbal systems and the Image system, which can have an additive effect on recall due to the interconnected yet independent nature of each symbolic system (Paivio & Csapo, 1…). This additive effect on recall was due to a larger number of viable retrieval routes for word pairs learnt multimodally. Using multimodal cards will have resulted in the availability of twice as many viable retrieval routes compared to using monomodal cards (V2 → I → V1 and V2 → V1, compared to V2 → V1 only; see Figure 10, middle row), meaning that if one retrieval route decayed, an alternative retrieval route could be used to successfully recall the meaning of an L2 word in L1 (see Figure 10, bottom row).

Figure 10. Symbolic system activation mapping.

In addition to facilitating connections between the verbal systems (V1 and V2) and the image system (I), the multimodal cards also facilitated interconnections between the Verbal-visual (V2vis) and Verbal-auditory (V2aud) subsystems by providing both L2 text and audio, which may also have had an additive effect on recall. Although Paivio (1986, p. …) did not directly conclude this, interconnections between subsystems can be expected to have an additive effect on recall. This expectation stems from evidence, such as the selective effects of focal brain injuries, showing that subsystems corresponding to different sensory modalities can function more or less independently of one another. Providing visual and auditory verbal input simultaneously will have resulted in the formation of visual and auditory representations in V2vis and V2aud, along with strong interconnections between them, plus, when viewing the answer side of the card, strong connections between these V2 subsystems and the other symbolic systems (V1 and I).

In contrast, monomodal cards presented only the visual form of the L2 word on the question side of the card, leaving the participant to guess at its proper pronunciation (its phonemic form). The participant's mind may have attempted to construct an auditory representation of the word by mapping graphemes to phonemes (see Magrassi et al., 2…), but this representation (?V2aud) may have deviated from correct or standard pronunciation. Upon viewing the answer side of the card, connections between V2vis and V1 will have formed, and existing semantic knowledge of L1 may have been retrieved from the Image system, since the availability of imagens is an essential part of semantic memory (Paivio, 1986, p. …). However, cross-system connections from (?V2aud) to other systems would presumably be much weaker than interconnections formed through exposure to multimodal (and multisensory) input (see Figure 11, middle row).
If, for example, the connection between V2vis and V1 were to decay, the tentative and weak connections formed through monomodal learning may not have been sufficient for the participant to recall the L1 meaning in response to an L2 test cue, resulting in lower recall accuracy for monomodally learnt items (see Figure 11, bottom row, left column). A participant who has learnt the same word multimodally would be able to rely on other retrieval routes that had been established and strengthened through multimodal learning, enabling successful recall (see Figure 11, bottom row, right column). Furthermore, it is reasonable to assume that imagined representations, constructed by the mind without the aid of external input in the representation's corresponding sensory modality, form only a kind of tentative knowledge that is less resilient and less likely to be activated than representations formed as the result of real-world experience or input.

Figure 11. Symbolic system and subsystem activation mapping.

The concept of redintegration as a mechanism of memory retrieval can also be used to explain why learning L2 vocabulary multimodally results in better recall accuracy even in response to monomodal (text-only) test cues. Redintegration is "the capacity of a portion of a consolidated memory to re-activate the entire extended original network" (Thelen & Murray, 2013, p. …). In the context of the present study, this means that if a word is learnt multimodally, resulting in the formation of a cognitive network of representations corresponding to multiple modes (e.g., text, audio, and picture), then a subsequent monomodal test cue that stimulates just one part of this network (e.g., text that stimulates a Verbal-visual representation) can (re-)activate the whole network (e.g., including Verbal-auditory and Image representations).
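The retrieval-route argument above can be made concrete with a toy graph. The sketch below is only an illustration of the counting logic, not a cognitive model: the node labels V2, I, and V1 and the two sets of connections come from the text, while the function names `retrieval_routes` and `decay` and the dictionary representation are hypothetical choices made for this example.

```python
def retrieval_routes(edges, start="V2", goal="V1"):
    """Return every simple path from start to goal in a directed graph."""
    routes = []

    def walk(node, path):
        if node == goal:
            routes.append(path)
            return
        for nxt in edges.get(node, ()):
            if nxt not in path:  # never revisit a node
                walk(nxt, path + [nxt])

    walk(start, [start])
    return routes


def decay(edges, src, dst):
    """Return a copy of the graph with the connection src -> dst removed."""
    return {
        node: [m for m in out if not (node == src and m == dst)]
        for node, out in edges.items()
    }


# Connections named in the text: V2 = L2 verbal system,
# V1 = L1 verbal system, I = image system.
MONOMODAL = {"V2": ["V1"]}
MULTIMODAL = {"V2": ["V1", "I"], "I": ["V1"]}

for name, graph in (("monomodal", MONOMODAL), ("multimodal", MULTIMODAL)):
    intact = retrieval_routes(graph)
    after_decay = retrieval_routes(decay(graph, "V2", "V1"))
    print(name, "routes:", intact, "| after V2 -> V1 decays:", after_decay)
```

In this toy graph the multimodal condition keeps the V2 → I → V1 route after the direct V2 → V1 connection is removed, whereas the monomodal condition is left with no route at all, which is the asymmetry the text describes.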
Through redintegration, retrieval of a multimodally learnt word operates on a richer and more informative network of interconnected representations, resulting in higher recall accuracy (see Moran et al., 2013, p. …). From a Dual Coding Theory perspective, Clark and Paivio (…7, p. 9; 1991, p. …) used the term 'spreading activation' to describe an equivalent (or at least similar) cognitive process by which a stimulus that directly stimulates a representation in one symbolic (sub)system can also indirectly stimulate representations in other symbolic (sub)systems by means of established connections between them.

In the current study, the mnemonic advantage of multimodal cards was observed not only in response to multisensory test cues (i.e., during the Study Phase, in which the front side of the card included both L2 text and audio), but also in response to unisensory test cues (i.e., during the Test Phase, in which the front side of the card included only L2 text). This finding is consistent with Thelen and Murray's (2013, p. …) conclusion that semantically congruent multisensory experience at one point in time improves subsequent unisensory visual (and auditory) object recognition, when compared to objects encountered exclusively in a unisensory context. The visual objects in this study were L2 words, and recognition of these objects was a cognitive prerequisite to recalling the object's paired associate, i.e., its L1 translation equivalent. The results of the current study suggest that exposure to semantically congruent multisensory stimuli enhances not only subsequent unisensory object recognition, but also subsequent unisensory cued recall of an associated object (in this case, the L2 words' L1 translation equivalents).

The present study contributes to a growing body of literature demonstrating that multimodal (or multimedia or multisensory) learning can be more effective than monomodal learning, and it addresses a gap in the literature by comparing the effectiveness of multimodal and monomodal flashcards for L2 vocabulary learning. The main implication of this study for L2 teaching and learning is that creating and using multimodal flashcards of the kind used in this study is worthwhile, because multimodal flashcards are significantly more effective than monomodal flashcards as a tool for learning the meanings of L2 concrete nouns. While this study used Esperanto as the target language, the mnemonic advantage of using multimodal flashcards for vocabulary learning may be generalised to the learning of any second language, including English. Indeed, flashcards that include both text and audio can be expected to especially benefit learners of languages with complex grapheme-phoneme correspondences, such as English.

CONCLUSION

The results of this study show that multimodal flashcards of the kind used in this study are significantly more effective than text-only monomodal flashcards as a tool for learning the meanings of L2 concrete nouns, even in response to monomodal test cues. The mnemonic advantage of multimodal learning over monomodal learning is due to its greater effectiveness at facilitating the formation of interconnections between different symbolic (sub)systems, enabling retrieval to operate on a richer, more informative network of representations, and resulting in a larger number of possible retrieval routes for word pairs learnt multimodally, thus improving recall accuracy.

Since participants provided feedback about their own recall accuracy, the main limitation of this study is that its validity depends upon participants' ability and willingness to follow the researcher's instructions.
Participants could tap 'Good' even if they failed to recall the answer; however, it is difficult to imagine a motive for doing so, since participants were not aware of the research objectives, they knew that their performance data would be anonymised, and they had agreed to follow the researcher's instructions. Future research could replicate this study but carry out delayed post-tests under controlled laboratory conditions in which answers are marked by the researcher and participants are given no feedback, and with larger intervals between post-tests. This would strengthen the study's validity, avoid the practice effects we observed in post-tests 2 and 3, and allow the researchers to observe the effect of card type on long-term vocabulary retention.

ACKNOWLEDGMENT

The first author would like to express his gratitude to Universitas Andalas, Indonesia, for granting the scholarship for his postgraduate study in Linguistics and for supporting the completion of this study.

REFERENCES