JOIV : Int. Inform. Visualization, 6 - March 2022 138-144
INTERNATIONAL JOURNAL
ON INFORMATICS VISUALIZATION
journal homepage : w. org/index. php/joiv
An Android Malware Detection System using a Knowledge-based Permission Counting Method
Sun-A Lee a, A-Reum Yoon a, Ji-Won Lee a, Kwangjae Lee a,*
a Department of Information Security Engineering, 31, Sangmyeongdae-gil, Dongnam-gu, Cheonan-si, Chungcheongnam-do, 31066, Republic of Korea
Corresponding author: *begleam@smu.
Abstract: As the number of damage cases caused by malicious apps increases, accurate detection requires multiple detection conditions rather than simple techniques alone.
This paper proposes a knowledge-based machine learning method that uses permission information and adds permission usage counts as features.
The method classifies normal apps and malicious apps through machine learning using the permission features in the AndroidManifest.xml file of Android apps.
In the experiment, accuracy, recall, precision, and F1 score were 99.01%, 97.70%, 100.0%, and 99.01%, respectively.
Because precision is the highest of the four indicators, the apps that the system predicts as malicious are indeed malicious. In other words, the proposed system effectively prevents the distribution of malicious apps.
As the number of harmful apps grows daily, this study determined that it is critical to detect malicious apps effectively using a machine learning model.
However, using permissions alone as the criterion for distinguishing between legitimate and malicious apps is insufficient to detect all harmful apps that emerge from new attack techniques.
Combining feature information that is effective in detecting malicious apps, such as APIs that access and control sensitive user data, or adding other detection criteria will likely improve the detection model's accuracy.
Recent attackers have used obfuscation to disguise harmful code and hinder static analysis, so future work should consider how to detect harmful apps that are obfuscated in this way.
Keywords: Machine learning; android malware detection; permission counting; knowledge-based analysis.
Manuscript received 22 Oct.; revised 17 Nov.; accepted 21 Dec. Date of publication 31 Mar.
International Journal on Informatics Visualization is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.
I. INTRODUCTION
As the use of smartphones has increased, damage from malicious apps has also been increasing.
A malicious app is malicious software (malware) that performs malicious functions while disguised as a normal app on a smartphone.
According to the Korea Internet & Security Agency (KISA), malicious apps increased 5.5 times from 1,635 in 2016 to 9,051 in 2019 and are being used for intelligent crimes.
Additionally, according to the EST-Security report, which analyzed the sources of apps installed on over 12 million Android devices, 67% of malicious apps were installed through the Google Play Store.
Despite the tricky upload key acquisition and signature procedures required when uploading apps, the Google Play Store has not escaped the reputation of being a malicious app installation platform.
The installed malicious apps are cleverly distributed to smartphone users, causing financial damage. For example, there were financial crimes such as stealing personal information from smartphones or transferring money from accounts to defraud 264 million won.
The representative approach to preventing damage caused by malicious apps is signature analysis, which consists of code-based static analysis and sandbox-based dynamic analysis.
Static analysis can detect malicious apps of the same type by analyzing their source code.
However, manual analysis is time-consuming, and new types of malicious apps are difficult to detect because their signatures change.
Dynamic analysis examines the actions and packets produced by a suspected app to determine whether it is malicious.
Thus, it can detect malicious apps that are difficult to analyze statically, such as obfuscated ones.
However, malicious apps can avoid detection by detecting the execution environment or by operating only under certain conditions, in which case they may be difficult to detect.
In addition, like static analysis, dynamic analysis has difficulty detecting new types of malicious apps.
In recent research, machine-learning (ML) based detection methods have been proposed to address the detection of new variants and the false detection rate.
These methods distinguish between a normal app and a malicious app using a feature that is a form or change of an input value.
If information from thousands or millions of malicious apps in a dataset is reprocessed and used for training, the resulting ML-based model learns specific features and achieves a high detection rate.
ML-based studies performed detection by reading the information in the Android Application Package (APK) file and using permission features or malicious behaviors that mainly appear in malicious apps.
These studies use datasets that include the permission features of the Android system and, to obtain high accuracy, add features such as API calls, Dex header, Broadcast Receiver, and Service.
In addition, one study increased the weight of the top 20 most important inputs by organizing feature information on permissions, APIs, and other items that are frequently used in malicious app detection.
However, if an input value changes because of obfuscation, the probability of incorrect detection increases significantly because the trained model can no longer judge it reliably.
Permission information is an input value that is not obfuscated, so using it can reduce the false detection rate, but this single feature extraction method has the limitation of a low detection rate.
In this paper, we propose a machine learning method that lowers the false detection rate by extracting and using permission information, which is resistant to obfuscation.
In more detail, to improve the low detection rate of single feature extraction, two features are additionally created through frequency analysis: the number of permissions an app uses, and the number of those that belong to the top 20 permissions analyzed as important in distinguishing malicious and normal apps.
The proposed method is not only robust against false detection caused by obfuscation but also has a higher detection rate than the existing single feature extraction method.
Among the Android app components, the AndroidManifest file holds the permission information needed to operate the app.
Permissions are used to protect the user's personal information; depending on the permission, it is either granted automatically by the system or requires user approval.
In addition, malicious apps tend to request an excessive number of permissions.
Therefore, a series of permissions that may be used in malicious behavior, such as accessing important information on a smartphone or exchanging data over the Internet, may be used as features for detecting malicious apps.
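As an illustration of how the permission information can be read, the following minimal Python sketch lists the permissions declared in a manifest. It assumes the AndroidManifest.xml has already been decoded from the binary form stored inside an APK into plain-text XML; the function name `extract_permissions` and the sample manifest are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for the android:name attribute.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the permission names declared via <uses-permission>, without duplicates."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{" + ANDROID_NS + "}name"
    permissions = []
    for element in root.iter("uses-permission"):
        name = element.get(name_attr)
        if name and name not in permissions:
            permissions.append(name)
    return permissions


# Hypothetical, already-decoded manifest used only for demonstration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application android:label="Demo"/>
</manifest>"""

PERMISSIONS = extract_permissions(SAMPLE_MANIFEST)
```

The length of the returned list corresponds to the number of permissions an app requests, which is the quantity the counting features in this paper build on.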
Malicious App Dataset
This paper uses the Android malicious app dataset (CICAndMal2017) provided by the Canadian Institute for Cybersecurity (CIC).
The dataset consists of 429 malicious apps and 5,065 normal apps from the Google Play Store that were installed on smartphones to collect the traffic generated through various scenarios such as Internet searches and phone calls. Based on that information, the malicious apps were classified into 42 malware families in four categories (Adware, Ransomware, Scareware, and SMS-malware).
The categories and malware families of the dataset are shown in Table II.
TABLE II
CATEGORIES AND MALWARE FAMILIES IN ANDMAL2017
Name | Description | Malware Group (Family)
Adware | Software that arbitrarily displays advertising to users. | Dowgin, Ewind, Feiwo, Gooligan, Kemoge, koodous, Mobidash, Selfmite, Shuanet, Youmi
Ransomware | Malware that infiltrates under the guise of email or updates, encrypts data on the user's device, and demands payment in return. | Charger, Jisut, Koler, LockerPin, Simplocker, Pletor, PornDroid, RansomBO, Svpeng, WannaLocker family
Scareware | A copycat version of ransomware that claims to have control of the device and demands money. | AndroidDefender, AndroidSpy, AV for Android, AVpass, FakeApp, FakeApp.AL, FakeAV, FakeJobOffer, FakeTaoBao, Penetho, VirusShield family
SMS-malware | SMS phishing that induces malicious app installation by impersonating normal apps that are generally installed on mobile phones. | BeanBot, Biige, FakeInst, FakeMart, FakeNotify, Jifake, Mazarbot, Nandrobox, Plankton, SMSsniffer, Zsone family
APK Configuration
APK is a package file used to distribute Android software and middleware.
This file contains the elements needed to run the app, such as AndroidManifest, Classes, Res, and Lib.
The components are described in Table I.
TABLE I
COMPONENTS OF AN APK FILE
Name | Description
AndroidManifest.xml | XML file that manages the app: application permissions, intents, services, activities, and SDK version information.
Classes.dex | Class files collected and converted into bytecode that the Android Dalvik virtual machine can recognize.
resources.arsc | A file containing resource file information; it stores the type and id information for the various resource files.
/res | A folder containing uncompiled images and xml resource files.
/lib | A folder containing libraries, composed of .so files compiled for each processor with the NDK (Native Development Kit).
/assets | A folder containing app information that can be managed by Assets-Manager.
/META-INF | A folder related to the signature; it stores SHA-1 and Base-64 signature values.
Machine-learning Algorithm
Machine learning algorithms are largely divided into supervised learning, in which the computer learns from training data with correct labels, and unsupervised learning, in which it learns from training data without correct labels.
Supervised learning classifies new data by learning within predetermined labels using classification or regression, whereas unsupervised learning, such as clustering or pattern recognition, is used to obtain meaningful knowledge from data when there is no prior knowledge of specific results.
This paper extracts the app's permission information using Android static analysis and uses a supervised ML algorithm to classify normal and malicious apps.
To select the right algorithm for the proposed system, representative classification algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Ada Boost, Extra Tree, and Random Forest were compared and analyzed, as summarized in Table III.
TABLE III
MACHINE LEARNING CLASSIFICATION USING THE SUPERVISED LEARNING
Algorithm | Description
K-NN | A method of classifying new input data into the category of its nearest neighboring data.
SVM | Defines the classification baseline, the decision boundary, as a model, and categorizes new data by which side of the boundary it belongs to.
Ada-Boost | A type of ensemble learning that builds a strong classifier by combining the results of weak classifiers. Weights are applied to the samples misclassified by the weak classifiers.
Extra-Tree | Ensemble learning method that randomly generates N weak trees for the existing dataset and selects a classifier with good performance by combining classification results.
RandomForest | Creates N weak trees randomly while allowing duplication in the dataset. Ensemble learning method of selecting a classifier with good performance by combining classification results.
Performance Evaluation Index
In this paper, a confusion matrix and a receiver operating characteristic (ROC) curve are used as performance evaluation indicators for the machine learning models.
The confusion matrix, shown in Table IV, consists of True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).
To minimize false detections and missed detections, the smaller the number of FPs and FNs, the better the classification model.
In addition, four evaluation indicators, Accuracy, Precision, Recall, and F1-score, are derived from the TP, FN, FP, and TN values of the confusion matrix.
TABLE IV
CONFUSION MATRIX
 | Predicted: Malicious App | Predicted: Benign App
Actual: Malicious App | True Positive | False Negative
Actual: Benign App | False Positive | True Negative
Accuracy is the ratio of correct detections to all classified data.
Precision is the proportion of apps predicted as malicious that are actually malicious.
Recall, on the other hand, is the proportion of actual malicious apps that are predicted as malicious.
Since Precision and Recall are difficult to maximize at the same time, the F1-score, which is the harmonic mean of precision and recall, is also required.
In addition, an ROC curve and the area under the ROC curve (AUC) were added as performance evaluation indicators; the former shows the performance of the classification model as a curve, and the latter gives the area under that curve.
The ROC curve represents the relationship between sensitivity and specificity on a two-dimensional plane.
How well malicious apps are found is expressed as sensitivity (Y-axis), and how well normal apps are classified is expressed as specificity (X-axis).
The closer the curve approaches the upper left of the coordinates, the higher the classification accuracy of the model.
Finally, K-fold cross-validation was applied to verify the reliability of the detection performance evaluation.
In K-fold cross-validation, as shown in Fig. 1, the dataset is arbitrarily divided into K parts of the same size, one of which is used as the validation dataset and the other K-1 as the training dataset.
If this process is repeated K times in sequence, the entire dataset can be verified.
In this paper, K was set to 8, so the data was divided into 8 subsets for the performance evaluation of the proposed model.
Fig. 1 An example of the K-fold Cross Validation.
II. MATERIAL AND METHOD
Fig. 2 shows the conceptual diagram of the Android malicious app detection system proposed in this paper.
Various files exist inside the packages of normal and malicious apps. The APK file consists of several files required to run the app, of which the AndroidManifest.xml file has the permission information necessary to run the app.
Fig. 2 A Conceptual Diagram of a Proposed Android Malware Detection System.
In Android systems, permissions used in apps must be granted in order to protect users' personal information.
Depending on the type of permission, either the user approves it directly or the system grants it automatically.
A typical feature of malicious apps is that they request an excessive number of permissions. Therefore, we propose an ML-based detection method that classifies normal and malicious apps using permission information.
First, the permission information of the app is extracted in a one-hot encoding format from the AndroidManifest.xml file.
The input data for training the ML model is converted into a dataset in the form of a csv file.
To increase detection performance, the frequency of the data is analyzed, and the total number of permissions (TPC) and the number of frequently used permissions, the Main Permission Count (MPC), are added as features. The learning algorithm is then designed by comparing binary classification algorithms and selecting the one with good performance.
Finally, the previously created ML model determines whether a new app under test is malicious or normal.
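The one-hot encoding and CSV conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' program: the column layout (order, APK name, one column per permission, class) follows the description in this paper, while the function names, column labels, and the two demo apps are hypothetical.

```python
import csv
import io


def build_onehot_rows(apps):
    """apps: list of (apk_name, permissions, is_malicious) tuples."""
    # Every permission seen in any app becomes one feature column (first-seen order).
    columns = []
    for _, perms, _ in apps:
        for perm in perms:
            if perm not in columns:
                columns.append(perm)
    header = ["no", "apk"] + columns + ["class"]
    rows = []
    for order, (name, perms, is_malicious) in enumerate(apps, start=1):
        present = set(perms)
        onehot = [1 if col in present else 0 for col in columns]
        # Class label: 1 for malicious, 0 for normal.
        rows.append([order, name] + onehot + [1 if is_malicious else 0])
    return header, rows


def to_csv(header, rows):
    """Serialize the dataset as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical demo apps, not taken from the dataset used in the paper.
APPS = [
    ("benign_demo.apk", ["INTERNET"], False),
    ("malicious_demo.apk", ["INTERNET", "READ_PHONE_STATE", "SEND_SMS"], True),
]
HEADER, ROWS = build_onehot_rows(APPS)
CSV_TEXT = to_csv(HEADER, ROWS)
```

Collecting the columns in a first pass keeps every row the same width, which is the same effect the paper obtains by inserting a new column whenever an unseen permission appears.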
Dataset Creation
Preprocessing inevitably involves repetitive tasks because the dataset necessary for training must be built from numerous apps. Therefore, in this paper, the automation program described in Algorithm 1 was used to extract permission information from multiple apps automatically.
First, an existing permission list file is read, and a label variable is created. Additionally, the value variable is created with the label size plus 3 as its number of columns.
When the column size of the generated value is n, column 1 holds the order, column 2 the APK name, columns 3 to n-1 the permission list, and column n the class.
Next, the following steps are repeated for each of the APKs prepared for the dataset.
A new row of value is added, and the class value is stored as 1 for malicious and 0 for normal depending on whether the APK file is malicious.
Then the value item is set to 1 for every permission in the AndroidManifest.xml file.
If a permission that is not in the permission list is found, a new column is added at column n-1 of value and set to 1, and the permission name is inserted into the n-1 index of label.
When the loop over the APK files ends, a dataset is generated by combining label and value and is stored as a CSV (Comma-Separated Values) file.
Finally, the label updated in this task is saved to the permission list file.
Algorithm 1 Auto Extract Algorithm
Input: apk, APK files
Input: permList, permission list file
Output: dataset, dataset file in csv format
Output: permList, permission list file
Method:
label ← permList
n ← #permList + 3
for j = 1 to #apk do
    if value does not exist then create value as a 1×n array
    else attach a new row to value
    decompress apk[j]
    value[j][1] ← j
    value[j][2] ← name of apk[j]
    for each perm in apk[j] do
        if perm does not exist in label then
            insert new column at n-1 index of value
            insert perm's name at n-1 index of label
            n ← n + 1
    end for
    for k = 3 to n-1 do
        if label[k] exists among all perms in apk[j] then value[j][k] ← 1
    end for
    value[j][n] ← (apk[j] == malicious app) ? 1 : 0
end for
dataset ← (label, value)
permList ← label
Adding Frequency Features to Datasets
Since the single feature extraction method has a low detection rate because it detects using only permission information, the performance is further improved by using the frequency analysis results.
Two features are defined: TPC, the total number of permissions used in the app, and MPC, the number of used permissions that belong to the top 20 permissions analyzed as important (Table V).
Adding these features is also repetitive, so the automation program described in Algorithm 2 was used.
First, the dataset file is read and stored in the label and value variables, respectively.
The top 20 permission names of high importance are defined as permList.
Two additional columns are inserted at column n-1 of value, and the names TPC and MPC are inserted at the n-1 index of label.
To obtain the frequencies, the TPC and MPC variables are created and initialized to zero.
Next, since one line of the dataset holds the features of one APK file, the procedure is repeated for each line of value.
TPC checks all the features of each APK and accumulates the permissions' true (1)/false (0) values.
TABLE V
TOP 20 PERMISSIONS
Permission | Description
READ_PHONE_STATE | Read phone status such as device phone number, network information, and call status in progress
INSTALL_SHORTCUT | Install icons on the home screen
SYSTEM_ALERT_WINDOW | Open windows using TYPE_SYSTEM_ALERT on top of other applications
GET_TASKS | Access current or recently executed task information
ACCESS_WIFI_STATE | Access to information about Wi-Fi networks
MOUNT_UNMOUNT_FILESYSTEMS | File system format for removable storage
RECEIVE | Receive messages from c2dm server
WRITE_EXTERNAL_STORAGE | Write a file to external storage
ACCESS_COARSE_LOCATION | Access to a wide range of locations (Cell-ID, WiFi)
CHANGE_WIFI_STATE | Change Wi-Fi connection status
VIBRATE | Vibration control
BILLING | Access to payment data
ACCESS_FINE_LOCATION | Access to GPS
WAKE_LOCK | Keep the process running when the screen is dark or on standby
READ_EXTERNAL_STORAGE | Read a file from external storage
RECEIVE_BOOT_COMPLETED | Boot complete execution
GET_ACCOUNTS | Access the account list from within the account service
CAMERA | Access to camera equipment
ACCESS_NETWORK_STATE | Access to information about network access
READ_GSERVICES | Read data about ma.
Options for the models compared in Table VI:
*1 K-NN options: n_neighbors=10
*2 SVM options: C= 0.
*3 Ada-Boost options: n_estimators=100
*4 Extra-Tree options: n_estimators=100
*5 Random Forest options: n_estimators=50
The binary classification ML models were trained on the previously generated dataset, and the performance of the five classification algorithms was compared.
Because each classification model showed near-optimal performance even with its default hyper-parameter values, only a few values were set, such as the n_estimators values of the three models Random Forest, Extra Tree, and Ada Boost. For K-NN, the n_neighbors value was set to 10, the optimum found in the experiment, and the SVM was given a C value of 0.
Ada Boost was set to 100, and Random Forest and Extra Tree were set to 50, the values found to be optimal for the experiment.
Comparing accuracy with a training ratio of 80% and a verification ratio of 20% on the experimental data confirmed that the Random Forest model showed the highest performance, with 95.07% accuracy in determining normal and malicious apps.
As a result, Random Forest was selected as the algorithm suitable for this study.
Random Forest can prevent overfitting through the law of large numbers arising from its randomness.
It is also robust to noise and reduces prediction variance.
Algorithm 2 Auto Add Feature Algorithm
Input: dataset, dataset file in csv format
Input: permList, top 20 permission list to detect malware
Output: dataset, dataset file in csv format
Method:
(label, value) ← dataset
insert 2 new columns at n-1 index of value
insert (tpc's name, mpc's name) at n-1 index of label
tpc ← 0, mpc ← 0
for j = 1 to #value's row do
    for k = 3 to n-3 do
        tpc ← tpc + value[j][k]
        if label[k] exists in permList then mpc ← mpc + value[j][k]
    end for
    value[j][n-2] ← tpc, value[j][n-1] ← mpc
    tpc ← 0, mpc ← 0
end for
dataset ← (label, value)
MPC checks the permission name of each feature of the APK.
If the name is in permList, the permission's true (1)/false (0) value is accumulated into the sum.
Then, the TPC and MPC values are stored in the n-2 and n-1 indexes of value, and both counters are reset to zero.
After this has been repeated for every line of value, label and value are combined to create a dataset, which is saved as a CSV file.
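The two counting features can be sketched in a few lines of Python. The sketch assumes the dataset layout described in this paper (order, APK name, permission columns, class) and uses 0-based list indices instead of the 1-based indices of Algorithm 2. The function name `add_count_features`, the demo rows, and the two-entry `TOP` list standing in for the top 20 permissions are hypothetical.

```python
def add_count_features(header, rows, top_permissions):
    """Return a new (header, rows) with TPC and MPC inserted before the class column."""
    # Permission columns sit between the 'apk' column and the final class column.
    perm_cols = range(2, len(header) - 1)
    top = set(top_permissions)
    new_header = header[:-1] + ["TPC", "MPC", header[-1]]
    new_rows = []
    for row in rows:
        # TPC: total number of permissions the app uses.
        tpc = sum(row[k] for k in perm_cols)
        # MPC: number of used permissions that are in the important-permission list.
        mpc = sum(row[k] for k in perm_cols if header[k] in top)
        new_rows.append(row[:-1] + [tpc, mpc, row[-1]])
    return new_header, new_rows


# Hypothetical demo data; TOP is a stand-in for the top 20 permission list.
HEADER = ["no", "apk", "INTERNET", "READ_PHONE_STATE", "SEND_SMS", "class"]
ROWS = [
    [1, "benign_demo.apk", 1, 0, 0, 0],
    [2, "malicious_demo.apk", 1, 1, 1, 1],
]
TOP = ["READ_PHONE_STATE", "CAMERA"]
NEW_HEADER, NEW_ROWS = add_count_features(HEADER, ROWS, TOP)
```

Because the counters are computed per row with `sum`, no explicit reset to zero is needed between APKs, unlike the accumulator variables in the pseudocode.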
A total of 1013 APK files and 1031 permissions were organized.
If a permission exists in an APK file, it is output as 1, and if not, it is output as 0. In addition, the total permission frequency of each APK file can be checked in the last cell.
The top 20 permission frequencies, based on the importance of feature information, were added to the existing csv file.
The top 20 permissions present in each APK file are accumulated and vectorized so that their frequency can be checked.
III. RESULT AND DISCUSSION
In this experiment, 600 normal apps and 413 malicious apps were used.
From the permission information extracted from a total of 1031 APK files, duplicated or meaningless data were deleted, leaving a total of 1013 feature data for model training.
The training and verification datasets were split 8:2, giving 810 training samples and 203 test samples.
The experiments were conducted with the Random Forest algorithm by adding frequency features to the previously created dataset.
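The partition sizes can be checked with a short Python sketch. The function names and the fixed seed are hypothetical, and rounding the test share up is an assumption that happens to reproduce the reported 810/203 split; the second function shows how the same 1013 samples would be divided for the 8-fold cross-validation used in this paper.

```python
import math
import random


def holdout_split(n_samples, test_ratio, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Assumption: the test share is rounded up to a whole sample.
    n_test = math.ceil(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


def kfold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


TRAIN_IDX, TEST_IDX = holdout_split(1013, 0.2)
FOLDS = list(kfold_indices(1013, 8))
```

With 1013 samples, eight equal folds are not possible, so five folds hold 127 samples and three hold 126; each sample still appears in exactly one validation fold.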
Frequency Feature for Permission (EXP #1)
This experiment examines whether normal and malicious apps can be distinguished with higher accuracy by adding a feature for the frequency of permissions in APK files.
It counts the permissions in each APK file and uses that count as a feature of the ML model to increase classification accuracy.
The experiment was conducted with a total of 1032 features, obtained by adding a field for the permission frequency of each APK to the preprocessed experimental data.
The 600 downloaded normal apps were collected in the Benign folder and the 413 malicious apps in the Malicious folder, and the 1031 permission features used in normal and malicious apps were extracted. The permission frequency contained in each APK file was converted into a csv file.
For higher accuracy in classifying normal and malicious apps, the top 20 permission frequencies of each APK file, based on the importance of feature information, were also converted into csv files.
Creation of Classification Algorithm Models
The classification algorithm selection process was performed on the previously extracted permission information to determine normal and malicious apps.
K-NN, SVM, Ada Boost, Extra Tree, and Random Forest were considered as classification algorithms, as shown in Table VI.
TABLE VI
MACHINE LEARNING MODELS ACCURACY COMPARE
Model
K-NN*1
SVM*2
AdaBoost*3
Extra Tree*4
RF*5
Fig. 3 Comparison of Permission Frequency Performance Indicators (Accuracy, Precision, Recall, F1 score).
Because the algorithm selection process had confirmed that the Random Forest model performed best, the experiment was conducted using only the Random Forest model.
The experimental data was divided with a verification ratio of 20%, and the hyper-parameter values optimal for the experiment were set.
The max_depth value was set to 50, the min_samples_split value was set to 5, and the n_estimators value was set to 50.
The algorithm selection process had confirmed an accuracy of 95.07% for the ML model trained without the permission frequency. When the model was trained with the added feature for the permission frequency of each APK file, the accuracy was 97.04%.
Compared with the previous result that did not use the permission frequency as a feature, the accuracy improved. Four model performance evaluation indicators were used: accuracy, precision, recall, and F1-score.
In EXP #1, the accuracy of the ML model was 97.04% and the precision was 97.65%; recall and the F1-score, the harmonic mean of precision and recall, were also evaluated. For a more reliable evaluation, accuracy was also calculated with 8-fold cross-validation, and the verification result was in the 93% range.
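The four indicators follow directly from the confusion matrix counts, as the minimal Python sketch below shows. The counts passed in the example are made-up illustration values, not the confusion matrix of EXP #1 or EXP #2, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Precision: share of apps predicted malicious that really are malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual malicious apps that were predicted malicious.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 203-sample test set (illustration only).
EXAMPLE = classification_metrics(tp=80, fp=0, tn=120, fn=3)
```

With no false positives the precision is exactly 1.0, while every false negative lowers recall and, through the harmonic mean, the F1-score.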
Frequency Feature for the Top 20 Permissions (EXP #2)
EXP #1 confirmed that the accuracy of determining normal and malicious apps improved when the permission frequency was used.
In this experiment, the frequency of the permissions frequently used in apps is added as a feature to further increase the classification accuracy of normal and malicious apps.
Even though malicious apps request many permissions, there may be cases where normal apps request many permissions as well. To prevent normal apps from being classified as malicious, this study improves the accuracy by adding a frequency feature for the top 20 permissions rather than their mere presence or absence. In EXP #2, an experiment was conducted with a total of 1033 features by adding the frequency of use of the top 20 permissions, selected by the importance of feature information, to the dataset used in EXP #1.
The experimental data were divided with a verification ratio of 20%, and the hyper-parameters were set to the values optimal for the experiment: a max_depth of 50, a min_samples_split of 5, and an n_estimators of 50.
With the same algorithm and the same parameters as EXP #1, the accuracy was 99.01%, higher than in EXP #1.
The recall was 97.70%, and the precision was 100.0%. Since the False Positive ratio is 0, no normal app was falsely detected as malicious. The F1-score was also high, at 99.01%.
To compare under the same conditions as EXP #1, 8-fold cross-validation was also performed, in which the EXP #2 model achieved an accuracy of 94.18%.
Comparison Between Experiments
When the two experiments are evaluated with the same performance evaluation indices, the performance of EXP #2 is relatively superior to that of EXP #1.
Table VII summarizes the two experimental results.
In terms of accuracy and F1-score, EXP #2 was 1.97% higher than EXP #1, and there was only a 0.05% difference in terms of precision.
However, recall showed the biggest difference.
This is because the two models differed in the False Positive ratio of the confusion matrix.
Fig. 4 compares the two experimental results as ROC curves.
Although the difference is slight, the corner of the EXP #2 ROC curve is closer to the upper left.
Classification accuracy can also be judged by the AUC value, the area under the ROC curve: the AUC of the EXP #1 model was 0.9975, and that of the EXP #2 model was higher.
The larger the area under the ROC curve, the better the model, so the EXP #2 model, with its relatively high AUC value, is the model better optimized for classification.
In addition, the EXP #2 model was 0.7% more accurate in the 8-fold cross-validation conducted to further increase the reliability of the experimental results.
TABLE VII
EXP 1, EXP 2 COMPARISON OF RESULTS
Model | Accuracy | Precision | Recall | F1 score
Exp #2* |
Exp #1 |
*Options: n_neighbors=100, max_depth=50, min_samples_splits=5
Fig. 4 The ROC Curves of EXP #1 and EXP #2.
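For reference, the AUC compared above can be computed without plotting the curve. The sketch below uses the rank interpretation of AUC, that is, the probability that a randomly chosen malicious app receives a higher score than a randomly chosen normal app. This is a swapped-in, equivalent formulation rather than the procedure used in the paper, and the labels and scores are made-up illustration values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malicious, normal) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malicious and one normal sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = malicious, 0 = normal) and model scores.
LABELS = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
AUC = roc_auc(LABELS, SCORES)
```

In this example, eight of the nine malicious/normal pairs are ordered correctly, so the AUC is 8/9; a model that separates the two classes perfectly reaches 1.0.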
IV.
CONCLUSION
This study classified normal and malicious apps through ML-based detection techniques based on the frequency of permissions in Android apps.
Considering that malicious apps, unlike normal apps, request excessive permissions, we examined whether malicious apps could be detected more accurately by using permission frequencies as feature data.
Among the various ML classification algorithms, Random Forest showed the highest accuracy.
The simple permission-based detection model using Random Forest showed an accuracy of 95.07%, and the detection model that included the frequency with which the app used permissions obtained an accuracy of 97.04%.
The model trained with the additional frequency of the top 20 permissions that affect the distinction between normal and malicious apps showed 99.01% accuracy.
It was confirmed that the accuracy improved by about 2% when meaningful feature data, such as the frequency of the top 20 permissions selected by the importance of feature information, was added to the permission usage frequency.
Since it is important to minimize false detections and missed detections and to make accurate judgments when detecting normal and malicious apps, an ML model optimized for classification was implemented.
As the number of malicious apps grows day by day, this study judged that it is important to detect malicious apps accurately through the ML model.
However, simply using permissions as the criterion for distinguishing between normal and malicious apps is not enough to detect all malicious apps that appear with new attack techniques.
In addition, higher accuracy of the detection model can be expected by combining feature information that is effective in detecting malicious apps, such as APIs that access and control sensitive user data, or by adding other detection criteria. Recent attackers bypass malicious app detection by using obfuscation, which hides malicious code and prevents static analysis of malicious apps.
Future research needs to consider how to detect malicious apps with such obfuscation.
REFERENCES