Jurnal Informatika Universitas Pamulang
Publisher: Program Studi Teknik Informatika, Universitas Pamulang
Vol. No. September 2020. ISSN: 2541-1004, e-ISSN: 2622-4615. 32493/informatika.

Application of C4.5 Algorithm in Improving English Skills in Students

Kristin D R Sianipar1, Septri Wanti Siahaan2, W Fikrul Ilmi R. H Zer3, Dedy Hartama4
1,2,3,4 Program Studi Teknik Informatika, STIKOM Tunas Bangsa Pematangsiantar, Jl. Jend. Sudirman Blok A, No. 1, 2 dan 3, Kota Pematangsiantar, Sumatera Utara, Indonesia, 21143
e-mail: 1kristinsianipar7@gmail.com, 2septriwanti26@gmail.com, 3fikrulilmizer@gmail. dedyhartama@amitunasbangsa.

Submitted Date: May 31st, 2020. Revised Date: September 22nd, 2020. Reviewed Date: June 17th, 2020. Accepted Date: September 30th, 2020.

Abstract
Many languages from around the world can be used as a means of communication, and English is one of them. Qualified students should recognize that learning English is essential, because nobody knows what will happen in the next few years, and proficiency in English may well be one of the factors in obtaining a position. English is a global language that people use to communicate with one another. In this study, the researchers set out to determine which factors can improve students' ability to speak English. To do so, they applied an existing data mining algorithm, namely the C4.5 algorithm. The study concludes that the factor that most influences the improvement of students' English ability is hearing from the environment.

Keywords: C4.5 algorithm, English, data mining.

Introduction
Every country has a language that its people use to communicate with one another. In Indonesia, the unifying language is Indonesian. However, if you wanted to communicate with foreigners from other countries, it would be difficult to understand what they were saying.
This is because people use their own language and so do foreigners, so there is no common ground for communication. Therefore, English is used as a global language so that people around the world can communicate and understand what is being said. One technology that can be used to improve learning is the computer (Irnanda & Windarto).

In Indonesia, everyone learns English from elementary school (SD), and some from kindergarten. Even in higher education, English courses are provided to students whose field of study is not related to English. From this it can be concluded that proficiency in foreign languages, especially English, is a factor in academic success and in building a good career (Megawati, 2.). However, even though they have been studying English for a long time, many students still find it difficult to learn. The difficulties include reading, writing, listening, and speaking. Some students still feel insecure about using the language in everyday life, because they are afraid of being ridiculed or of being seen as showing off their ability to speak English. Students also often have difficulty reading vocabulary and applying English grammar. The role of teaching is therefore essential to improving students' English skills. Given these problems, the authors took the initiative to determine which factors could improve students' skills in English.

http://openjournal. id/index. php/informatika

Many branches of computer science can be used to solve complex problems. One such branch is artificial intelligence, which includes data mining (Branch, Widyastuti, Gracella, Simanjuntak, & Hartama, 2019; Katrina, Damanik, & Parhusip; Rofiqo, Windarto, & Hartama, 2018; Sadewo, Windarto, & Wanto, 2018; Sari, Wanto,
& Windarto, 2018; Series, 2019; Swarm, Analysis, Problem, & Prediction, 2.). Based on the problems above, the authors use data mining, specifically classification, to address the question of how to improve students' English skills. The data mining algorithm used is the C4.5 algorithm (Haryati, Sudarsono, & Suryana, 2.). The purpose of this research is to determine the factors that can improve students' English skills.

Research Methods
The steps taken to complete this research are as follows:
1. Create the required dataset.
2. Apply the required algorithm to the existing dataset.
3. Perform a manual count.
4. Validate with RapidMiner software.

1 Create a Dataset
The dataset was obtained through a questionnaire created with Google Form. The questionnaire was distributed to students to collect the data needed for this study.

Table 1. Respondent Data
Ability in English:
Capable, Capable, Capable, Incapable, Capable, Incapable, Capable, Capable, Incapable, Capable,
Capable, Incapable, Incapable, Capable, Capable, Incapable, Incapable, Incapable, Incapable, Capable,
Capable, Capable, Incapable, Capable, Capable, Incapable, Incapable, Incapable, Incapable, Capable,
Incapable, Capable, Incapable, Capable, Capable, Incapable, Incapable, Incapable, Incapable, Capable,
Capable, Incapable, Capable, Capable, Capable, Incapable, Incapable, Incapable, Capable, Capable,
Incapable, Capable, Capable, Incapable, Incapable, Capable, Incapable, Capable, Incapable, Capable,
Capable, Capable, Incapable, Capable, Incapable, Incapable, Incapable, Incapable, Capable, Capable

This study collected data from 70 respondents using 4 criteria, namely:
1. Reading References
2. Hearing from the Environment
3. Practicing in the Environment
4. Utilizing Technology
The sub-criteria are as follows:
5 = Strongly Agree
4 = Agree
3 = Quite Agree
2 = Disagree
1 = Strongly Disagree

2 Applying the Algorithm Used
This study uses the C4.5 algorithm to determine the factors that influence the improvement of students' English skills. The C4.5 algorithm is a classification technique used to solve cases or problems. The decision tree is the basis of the C4.5 algorithm: C4.5 is a decision tree induction algorithm that extends ID3 (Iterative Dichotomiser 3) (Febriarini & Astuti, n.d.). The formulas used in the C4.5 algorithm are:

Calculate entropy:
$$\mathrm{Entropy}(S) = \sum_{j=1}^{k} -p_j \log_2 p_j$$
Explanation:
S = dataset
k = number of partitions of S
p_j = proportion of cases in S that fall in partition j

Calculate the gain:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, \mathrm{Entropy}(S_i)$$
Explanation:
S = dataset
A = attribute
n = number of partitions of attribute A
|S_i| = number of cases in the i-th partition
|S| = number of cases in S

The advantage of the C4.5 algorithm is that it builds a decision tree efficiently. The tree handles both discrete and numeric attributes, is easy to interpret, and has an acceptable level of accuracy (Kamber, n.d.). Its weakness is scalability: the training data must be held in memory in its entirety at the same time (Luvia, Windarto, Solikhun, & Hartama, 2.).

3 Perform Manual Counts
In this study, the manual calculations are performed in Microsoft Excel using the formulas above. The manual count must be done carefully so that the results are correct and agree with the output of the RapidMiner software.

4 Validation
The validation process is carried out with the RapidMiner software. The dataset of 70 respondents, obtained from the questionnaire given to students, is tested using a decision tree (C4.5).
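The entropy and gain formulas from the "Applying the Algorithm Used" subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Excel or RapidMiner workflow: the function names `entropy` and `information_gain`, the column names, and the four-row `toy_rows` sample are hypothetical. Only the class counts (36 Capable, 34 Incapable) are taken from Table 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = sum over classes of -p_j * log2(p_j)."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i)."""
    base = entropy([row[target] for row in rows])
    # Partition the target labels by the value of the chosen attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return base - remainder


# Class distribution of the 70 respondents in Table 1.
table1_labels = ["Capable"] * 36 + ["Incapable"] * 34
print(f"{entropy(table1_labels):.6f}")  # 0.999411, the Node 1 entropy in Table 2

# Hypothetical toy sample (NOT the questionnaire data): answers on the 1-5 scale.
toy_rows = [
    {"hearing": 5, "reading": 4, "ability": "Capable"},
    {"hearing": 5, "reading": 2, "ability": "Capable"},
    {"hearing": 1, "reading": 4, "ability": "Incapable"},
    {"hearing": 1, "reading": 2, "ability": "Incapable"},
]

# The attribute with the highest gain is chosen as the split for the node.
best = max(["hearing", "reading"],
           key=lambda a: information_gain(toy_rows, a, "ability"))
print(best)
```

With the real questionnaire answers loaded as rows, the attribute with the largest gain would be chosen as the root, which is the selection step the manual count performs in each iteration.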
Figure 1. Decision tree from a dataset of 70 respondents

A decision tree takes the form of a tree structure in which each node represents an attribute, each branch represents a value of that attribute, and each leaf represents a class (Yulia & Putri, 2.). The node at the top of the decision tree is known as the root. The decision tree is the best-known data classification technique in data mining. Decision trees are relatively fast to build, and the output of this type of model is easy to understand.

Result and Discussion
Classification is performed on the respondents' data. The calculation is done in Microsoft Excel using the formulas of the C4.5 algorithm. The results of the 1st iteration can be seen in Table 2.

Table 2. Results of the 1st Iteration Calculation
Node 1. Columns: Number of Cases (S), Capable, Incapable, Entropy, Gain. Each criterion is split into the sub-criteria Strongly Agree, Agree, Quite Agree, Disagree, and Strongly Disagree.
Total: entropy 0.999411
Reading References: entropy 0.936667, 0.942683, 0.764205, 0.94566, 0.975119; gain 0.188403
Hearing from the Environment: entropy 0.863121; gain 0.279838
Practicing in the Environment: entropy 0.755375, 0.918296, 0.749595, 0.970951; gain 0.151526
Utilizing Technology: entropy 0.863121, 0.985228, 0.811278, 0.899744, 0.721928; gain 0.149121

Table 3. Results of the 2nd Iteration Calculation
Node 2. Columns: Number of Cases (S), Capable, Incapable, Entropy, Gain. Criteria: Reading References, Practicing in the Environment, and Utilizing Technology, each split into the sub-criteria Strongly Agree, Agree, Quite Agree, Disagree, and Strongly Disagree.
Entropy: 0.9940302
Gain: 0.3925371

Table 2 shows the result of the calculation in the 1st iteration. The criterion Hearing from the Environment obtains the highest gain, 0.27984, and the highest entropy value belongs to "Strongly Agree". Because Hearing from the Environment has the highest gain, it becomes the root, and the calculation proceeds to the 2nd iteration. The results of the 2nd iteration can be seen in Table 3.

Table 3 shows the calculation in the 2nd iteration. The criteria Reading References and Utilizing Technology obtain the same gain, with a value of 1, and the criterion Practicing in the Environment obtains the lowest gain, with a value of 0. If the entropy of a subset is 0, that branch has already reached a leaf.

In this study, we validated the results with the RapidMiner software. We used a decision tree, which produced the following tree:

Figure 2. Decision tree in RapidMiner

The decision tree above for improving students' English ability can be written in text form as follows (Mendengar dari Lingkungan = Hearing from the Environment, Membaca Referensi = Reading References, Mempraktekkan dalam Lingkungan = Practicing in the Environment, Memanfaatkan Teknnologi = Utilizing Technology, Mampu = Capable, Tidak Mampu = Incapable):

Mendengar dari Lingkungan = 1.0: Tidak Mampu {Mampu=0, Tidak Mampu=...}
Mendengar dari Lingkungan = 2.0
|   Membaca Referensi = 2.0: Tidak Mampu {Mampu=0, Tidak Mampu=...}
|   Membaca Referensi = 4.0: Mampu {Mampu=2, Tidak Mampu=...}
|   Membaca Referensi = 5.0: Tidak Mampu {Mampu=0, Tidak Mampu=...}
Mendengar dari Lingkungan = 3.0: Mampu {Mampu=9, Tidak Mampu=...}
Mendengar dari Lingkungan = 4.0
|   Mempraktekkan dalam Lingkungan = ...: Tidak Mampu {Mampu=0, Tidak Mampu=...}
|   Mempraktekkan dalam Lingkungan = ...: Tidak Mampu {Mampu=0, Tidak Mampu=...}
|   Mempraktekkan dalam Lingkungan = ...: Mampu {Mampu=4, Tidak Mampu=...}
|   Mempraktekkan dalam Lingkungan = ...: Mampu {Mampu=12, Tidak Mampu=...}
Mendengar dari Lingkungan = 5.0
|   Memanfaatkan Teknnologi = 1.0: Tidak Mampu {Mampu=0, Tidak Mampu=...}
|   Memanfaatkan Teknnologi = 2.0: Tidak Mampu {Mampu=0, Tidak Mampu=...}
|   Memanfaatkan Teknnologi = 3.0: Mampu {Mampu=6, Tidak Mampu=...}
|   Memanfaatkan Teknnologi = 5.0: Mampu {Mampu=3, Tidak Mampu=...}

Conclusion
The results of this study show that the C4.5 algorithm makes it easier to determine which factors can improve English skills in students. It can be concluded that the factor that most influences the improvement of students' English ability is the criterion "Hearing from the Environment", with the iteration process stopping at the 2nd iteration. This research is intended to motivate others, especially students, to improve their English language skills so that those skills can be put to use in the coming years and help them find employment.

References