Ransomware Detection System Based on Machine Learning

The Internet and the number of smart devices connected to it grow every day, and so does the amount of malware that attacks networks, devices, systems, and applications. One of the biggest and newest threats in cybersecurity is ransom software (ransomware). Although there is a lot of research on detecting malware using machine learning (ML), only a few studies focus on ML-based ransomware detection, especially for attacks targeting smartphone operating systems (e.g., Android) and applications. In this research, a new system is proposed to protect smartphones from malicious applications by monitoring network traffic. Six ML methods (Random Forest (RF), k-Nearest Neighbors (k-NN), Multi-Layer Perceptron (MLP), Decision Tree (DT), Logistic Regression (LR), and eXtreme Gradient Boosting (XGB)) are applied to the CICAndMal2017 dataset, which consists of benign samples and various kinds of Android malware samples. A total of 603288 benign and ransomware samples were extracted from this collection. The ransomware samples were collected from 10 different families. Several feature selection techniques were applied to the dataset. Finally, seven performance metrics were used to determine the best feature selection technique and ML classifiers for ransomware detection. The experimental results show that DT and XGB outperform the other classifiers, with best detection accuracies of more than 99.30% and 99.20% for DT and XGB, respectively.


Introduction
Android holds 85% of the smart device and phone market, and its share continues to grow [1]. In recent years, reliance on smart devices (especially smartphones) has increased in daily activities ranging from studying, shopping, and entertainment to financial transactions [2]. This is due to recent technological developments, the widespread adoption of smartphones, and contemporary conditions (such as the COVID-19 quarantines). These factors have made the Android OS the main target for attackers [3]. Unfortunately, Android differs from other mobile operating systems in that it maintains openness and does not impose many restrictions on users for uploading and downloading applications. It leaves the safety of phones and devices in the user's hands by letting them decide whether or not to install an application; thus, smartphones have become more vulnerable to cyber-attacks [1].
Cyber criminals are developing malicious applications to target individuals, companies, and even governments. Ransomware takes over the victim's device and blocks or encrypts the data, thereby preventing the victim from using the device. The victim can regain access to the device or its data only if a ransom is paid [4]. Ransomware made history in 2020 when it contributed to the first reported death related to a cyber-attack: a German hospital was attacked by ransomware, which locked its systems and prevented the treatment of patients. Consequently, a woman in need of urgent help died [5].
According to the Cisco Annual Internet Report [6], more than 299 billion mobile applications will be downloaded and used in 2023. With the growth and increasing use of applications, network interaction and utilization through these applications have increased substantially [7]. In addition, network traffic has increased dramatically because of the many permanently connected applications, such as social networking applications. With the fast evolution of the Internet, fifth-generation (5G) networks will make AI systems, the Internet of Things, and self-driving cars some of the most important tools in people's lives. By 2023, a 5G connection will generate three times more traffic than a fourth-generation (4G) connection [6]. To detect Android malware, antivirus software uses standard code analysis and signature detection techniques, which are known to attackers. Applying ML methods to network traffic is one of the best solutions for effective ransomware detection [3].

Malware
Malicious software, known as malware, is one of the most dangerous and most common cyber threats. These programs are built to collect sensitive data and information, or to disrupt, damage, or gain unauthorized access to applications, systems, or networks [8]. In addition to losses of money, information, time, and infrastructure for individuals, companies, and institutions, victims may suffer other harms, including loss of life. According to its function and the way it spreads, malware can be divided into several (not mutually exclusive) types, such as ransomware, adware, viruses, worms, trojans, bots, and scareware [9].

Static & dynamic analysis
Various solutions based on feature analysis have been proposed to protect phones and smart devices from threats that target the Android OS. Static analysis is the technique adopted by antivirus companies. It is a passive approach based on checking signatures and source code: features are extracted from the source code of the application or from its binary strings. This means that files or applications are tested without running their code. It is a fast and safe approach that generates rich information about malware samples, but it has several flaws. Cyber criminals use polymorphism and obfuscation techniques, such as packing and encryption, to evade detection systems [9].
Another solution is dynamic analysis, which involves executing the malicious file in an isolated and safe environment (e.g., a sandbox) in order to observe its real behavior, that is, the way it interacts with the underlying OS, and to analyze its execution logs. However, this technique needs more processing capacity and battery power. It is also difficult because the behavior of malware analyzed in a safe environment sometimes differs from its behavior on a real device, and such environments are not available to all researchers [12]. Both analysis techniques have their benefits and flaws. Static analysis is safer and faster than dynamic analysis, but malicious programs can avoid detection by using code obfuscation. In contrast, polymorphism and code obfuscation can hardly evade dynamic analysis, because the malware is monitored and analyzed at execution time. Therefore, to use dynamic analysis at the lowest cost, one of the best solutions is to use the CICAndMal2017 dataset, which is publicly available and based on real-time network traffic features.
The contributions of this research are:
1. Applying data preprocessing techniques to the network traffic dataset and explaining their benefits for the performance of Android ransomware detection.
2. Examining feature selection techniques and comparing them to determine the best option.
3. Applying ML methods to the dataset, based on the best features, to determine the best methods for ransomware detection.
4. Seeking the best method for detecting ransomware by monitoring network flow, by comparing this research with related works that have used the same dataset.
The remaining parts of this paper are organized as follows: Section 2 reviews related works on detecting malware with ML methods. The details of the dataset and network traffic are explained in Section 3. The proposed model is discussed in Section 4. The data preprocessing techniques are shown in Section 5. The experiments with feature selection techniques and ML classifiers, and their results, are explained in Section 6.

Related Works
Much research has been conducted in the area of Android malware detection, relying on static, dynamic, or hybrid feature analysis. In 2017, Chen et al. designed a novel system that combines data mining techniques with dynamic analysis to monitor Application Programming Interface (API) calls for ransomware detection; an API is the set of interaction procedures and protocols between applications. To create the API call flow, the authors monitored the behavior of software and then mapped the API calls into a feature space. They applied data normalization and feature selection methods to select the best features for discriminating between ransomware and benign software. Then, four data mining algorithms were used to build the detection model. The experimental results show that the SL algorithm can achieve 98.2% accuracy and a 97.6% detection rate [13].
In 2018, Al-rimy et al. conducted research that provides an important and detailed review of ransomware. This survey provides a comprehensive study of the latest technology for detecting and preventing ransomware. The authors proposed a new classification of ransomware from various viewpoints. They explain the factors and circumstances that helped make these attacks successful, and discuss the related research into combating ransomware, with various solutions for analyzing these attacks as well as detecting, preventing, and predicting them [10].
In 2018, Cusack et al. used programmable forwarding engines (PFEs), which collect per-packet network monitoring data. This data was used to monitor the network flow between the command and control (C&C) server and an infected computer. After extracting features from the flow, the authors used them to classify ransomware. The classification model achieved an 86% detection rate with an 11% false negative rate [12].
In 2018, Zhang et al. utilized static feature analysis for ransomware classification. First, from ransomware samples, the authors transformed opcode sequences into N-gram sequences. Then, they treat N-gram vectors as feature vectors. Next, they introduce these vectors into five ML methods to classify ransomware. The models are validated with six metrics. The proposed approach achieves 91% accuracy, and 99% accuracy of binary classification [14].
In 2018, Alhawi et al. proposed NetConverse, an ML-based method for ransomware detection. They used TShark to create a dataset from network traffic conversations. Features were extracted and fed into ML classifiers, which achieved 97% accuracy for DT and 96% for Logistic Model Tree (LMT) [15].
In 2019, another work used network features to detect ransomware with ML classifiers. Kaiiali et al. focused on the network activities of crypto ransomware. Network features were extracted and fed into two classifiers that operate in parallel at the packet and flow levels. The detection accuracy of the two levels was 97.92% and 97.08%, respectively [16].
In 2019, Noorbehbahani et al. performed two experiments on the CICAndMal2017 dataset to analyze six ML techniques (DT, RF, Random Tree (RT), k-NN, Naive Bayes (NB), and Support Vector Machine (SVM)) for ransomware detection. First, the classifiers were applied to a dataset containing different forms and classes of ransomware. Then, they were applied to each of the 10 ransomware families separately. The results show that RF was the best in both experiments: the highest detection accuracy was 83% for RF, compared with 79% for DT [7].
In 2020, Moussaileb et al. presented an analysis of different ransomware families based on logs collected from the device system and the network. For packet-level detection, the authors examined the malicious network traffic created by these samples. This work shows that using DT to detect zero-day attacks provides high detection rates compared with other ML algorithms [17].
In 2020, Sangal et al. used ML methods for new android malware detection. They applied many techniques (RF, KNN, SVM, and NB) on a dataset of android applications with permissions and intent features. First, they performed data pre-processing to handle missing values. Next, they used feature selection to minimize the dataset dimensions. They used AndMal2019 dataset which was provided by CIC. The best detection result was 96% using RF classifier [18].
In 2020, during the COVID-19 lockdown, people stayed at home and interacted with others mostly through smartphones. This presented an opportunity for cyber criminals to develop malware-infected applications. In response, Sangal et al. proposed a new system that combines machine learning and signature-based methods to detect known Android malware. In this work, 11,000 distinct Android applications belonging to twelve different application categories were collected. Ten feature selection methods were used to reduce the dimensions of the dataset. A Deep Neural Network (DNN) was used for Android malware detection, and it achieved a 97% malware detection rate on real-world applications [19].
In 2020, Mateu et al. reviewed 67 research papers on malware detection and classification in an in-depth survey. This study aimed at providing a detailed and systematic overview of ML methods (especially deep learning (DL) methods) for malware detection. It also described the features and methods of a traditional ML process, from the feature extraction and selection steps to detection and classification, and explained the feature analysis methods and their branches [20].
In 2021, Lee et al. surveyed the risks of cyber-attacks and Advanced Persistent Threat (APT) attacks. They discussed the rapid development of APTs and their use of AI techniques to design a new kind of ransomware that spreads quickly among users of IoT devices and smartphones in order to infect as many of them as possible at the same time. The authors proposed using detection and response tools that can quickly extract the features of a ransomware attack and respond to the threat. They built an open-source framework that enables ransomware detection at the system and network levels [21].

The Network Traffic and Dataset used
Network traffic or data traffic refers to the data that moves across the network at any time. It consists of a sequence of packets (packet is the smallest unit of data that is passed over a network). Each packet includes Payloads (raw data) and Headers (metadata) that contain basic flow information [15]. One of the best methods to detect malware is to monitor malicious network traffic which can uniquely offer a clear view of the behavior of malware applications. When a malicious program infects a victim's device, it may establish a connection to an external server to perform a malicious operation like download updates or other malware, to obtain new commands, or to steal sensitive and private information [17]. Therefore, monitoring network traffic that enters the network and leaves it, intra-network traffic and device activity, provides important and useful information to disclose malicious behavior.
The dataset used in this research was obtained from the Canadian Institute for Cybersecurity (CIC) [22]. It is a collection of benign samples and several malware types. The CICAndMal2017 dataset includes network traffic, API/SYS calls, memory dumps, logs, and phone statistics, with 42 malware families. Previous works showed that network traffic can be used to detect and classify Android malware. Therefore, this research focuses on network traffic features for detecting ransomware applications. A total of 603288 ransomware and benign samples were extracted with network flow features, which consist of six identifying columns for each flow (FlowID, SourceIP, DestinationIP, SourcePort, DestinationPort, and Protocol) and 79 network traffic features. This dataset was created with CICFlowMeter [23], a network traffic flow generator and analyzer.

Ransomware Dataset
In this research, 353288 ransomware samples were used with 85 features which were collected from 10 popular ransomware families. Table I lists the behavior and characteristics of ransomware and the number of samples used for each one of the families.

Benign Dataset
The benign applications used in this research were published on the Google Play market in 2015, 2016, and 2017. There are more than six thousand of these applications, grouped by popularity (best new free and most popular free) for each category available in the market. These applications were checked with the VirusTotal web service using two antivirus (AV) products [24]. A total of 250000 benign samples with 85 network traffic features were extracted and used in this research. These features can be grouped into classes such as Flow-ID, Packet-based, Byte-based, Flow-based, and Time-based.

The Proposed System
The proposed system in this research consists of five steps as shown in Figure I.

Figure I. The methodology of the proposed system
The five steps can be briefly explained as follows:
A. Gathering a network feature dataset, either with a traffic capture tool such as CICFlowMeter [23] or from a reputable source such as [22].
B. Data preprocessing: removing missing values from the columns and removing features with low variance. Then, data normalization is used to scale the data to the range [0, 1] with the Max-Min method. Notably, the dataset used in this research does not contain any missing values.
C. Feature selection: several techniques were used to analyze and select the best features from the dataset.
D. Training: six ML algorithms were applied to the features selected in the previous step. The dataset used in this work was divided into 80% for training the algorithms and 20% for testing.
E. Testing and evaluation: the remaining 20% of the data, held out from the full dataset, was used to test the ML classifiers.

The Data preprocessing
The analysis shows that the CICAndMal2017 dataset includes 85 features of network traffic. The names of the features are shown in the Table II, along with their values that are an instance in the dataset. After analyzing the dataset, it was noted that its features have different value ranges, in addition to features (columns) that have complete zero values. Therefore, the dataset needs data preprocessing operations before using ML algorithms on it, as follows.

Deleting the low variance columns (features)
When a dataset includes features whose values have very little variance, or that have the same value in every row, these features do not add any informative power to the model [25]. Using them also adds an unnecessary computational burden, so they should be removed from the dataset. To improve model performance, features with low variance were therefore deleted using the VarianceThreshold technique provided by sklearn [26]. VarianceThreshold is a simple baseline feature selector that deletes low-variance columns. It looks only at the input columns (X), not at the target column (y), and can therefore also be used for unsupervised learning. After applying this technique to the dataset, twelve low-variance features were removed from the feature set, as shown in Table III. In addition, four columns were removed from the data passed to the ML algorithms because their values are strings and cannot be used for training. After this step, 69 of the original 85 features remain as input.
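The low-variance filtering step can be illustrated with a minimal sketch. The toy matrix and the threshold of 0.0 are assumptions made for illustration; the threshold actually used in this research is not stated.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical flow-feature matrix: the middle column is constant,
# so it carries no information for any classifier.
X = np.array([
    [120.0, 0.0, 1.0],
    [4500.0, 0.0, 0.0],
    [87.0, 0.0, 1.0],
    [990.0, 0.0, 0.0],
])

# threshold=0.0 (assumed) removes only columns with zero variance.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)  # fitted on X only, no target needed

kept = selector.get_support(indices=True)
print("kept column indices:", kept)
print("shape after filtering:", X_reduced.shape)
```

A higher threshold would also drop columns that vary only slightly, which is closer to the "very little variance" case described above.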

Normalizing the dataset values
Researchers always aspire to get the best performance from the designed system. Data normalization is a very important technique for improving the performance of an ML system [13], because some datasets (e.g., the CICAndMal2017 dataset) include features with very different values, ranges, and scales. For example, the feature "FIN Flag Count" (the number of packets with FIN) takes the values 0 or 1, whereas the values of the "Flow Duration" feature (the duration of the flow in microseconds) range from tens to millions of microseconds. When a feature selection technique is applied to such features, it tends to be biased toward the features with larger values. To solve this problem, data normalization is used. In this research, the Max-Min technique was used to normalize the attributes of the dataset to the range [0, 1] using equation 1 [25]:
$$ z' = \frac{Z - \min}{\max - \min} \qquad (1) $$
where Z ϵ [min, max] is the original value, min and max are the smallest and largest values of the feature, and z' ϵ [0, 1] is the normalized value.
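The Max-Min scaling described above can be sketched in a few lines of plain Python. The sample values are hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0 (assumed)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(z - lo) / (hi - lo) for z in column]

# Hypothetical values for two features with very different scales.
flow_duration = [35, 1_200, 250_000, 4_000_000]  # microseconds
fin_flag_count = [0, 1, 0, 1]

scaled_duration = min_max_normalize(flow_duration)
scaled_fin = min_max_normalize(fin_flag_count)

print(scaled_duration)
print(scaled_fin)
```

After scaling, both features lie in the same range, so neither dominates a score-based feature selector merely because of its units.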

Experiments
Several experiments were performed on the proposed system in order to obtain a perfect result. These experiments included:

The Techniques of Feature Selection
The efficiency of a detection and classification system usually depends on the nature and quality of the dataset. Data with higher dimensions also increases noise and may lead to a complex detection model; hence the need for feature selection. Feature selection (or feature reduction) is the operation of identifying and selecting the features that are most related to the desired output variable, thereby reducing the mathematical and statistical operations required [27,28]. Its benefits are reduced classification time, reduced overfitting, and increased accuracy. Feature selection is particularly important when the dataset is very large. To measure the benefit of this technique, the full feature set (69 features) was first used to train and test the six ML classifiers. After this, the following two feature selection techniques were applied before training and testing the ML classifiers.

Univariate Feature Selection Technique
Univariate feature selection uses univariate statistical tests to determine which features have the strongest relationship with the target. When it analyzes the relationship between one feature and the target, it ignores the other features. That is why it is called 'univariate': each feature receives its own score. In the end, all the scores are compared, and the features with the top scores are selected. In this research, the F-test (F-statistic) was used as the scoring method; it is suitable when the input data is numerical and the output is categorical. The sklearn Python library provides the f_classif() function, which is an implementation of the F-test [29]. Finally, the SelectKBest() function [30] was used to choose the best 10, 15, 20, and 25 features out of the 69 total features.
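As a minimal sketch of this step, the following applies SelectKBest with f_classif to synthetic data. The data generated by make_classification is a stand-in for the 69 preprocessed features, not the CICAndMal2017 dataset, and k=10 is one of the four values mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 69 numerical features, binary target (assumed data).
X, y = make_classification(
    n_samples=500, n_features=69, n_informative=10, n_redundant=5, random_state=0
)

# Score every feature independently with the ANOVA F-test and keep the top 10.
selector = SelectKBest(score_func=f_classif, k=10)
X_top = selector.fit_transform(X, y)

top_idx = selector.get_support(indices=True)
print("selected feature indices:", top_idx)
print("reduced shape:", X_top.shape)
```

Repeating the call with k set to 15, 20, and 25 would give the other feature subsets described in the text.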

SelectFromModel technique
SelectFromModel is a technique used with a model (estimator) that exposes feature weights after fitting, through a feature_importances_ or coef_ attribute. The best (most important) features are chosen according to these weights [31]. Unlike univariate feature selection, SelectFromModel handles all features at the same time, so it can capture interactions between them. Three models (Random_Forest, Extra_Trees, and Logistic_Regression) were used to fit the data and select features. Applying SelectFromModel to the dataset selected 21 features with the RF model, 20 features with the Extra Trees (ET) model, and 17 features with the LR model out of the full feature set. The effect of using these features to train and test the ML classifiers is shown in Table V.
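A minimal sketch of model-based selection with a Random Forest estimator follows. The synthetic data and the estimator settings are assumptions; the selection threshold is left at the sklearn default, since the threshold used in this research is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the 69 preprocessed features (assumed data).
X, y = make_classification(
    n_samples=500, n_features=69, n_informative=10, random_state=0
)

# The forest is fitted on all features at once; features whose importance
# is at or above the default threshold (the mean importance) are kept.
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_sel = selector.fit_transform(X, y)

n_selected = X_sel.shape[1]
importances = selector.estimator_.feature_importances_
print("features kept:", n_selected, "of", X.shape[1])
```

Swapping in ExtraTreesClassifier or LogisticRegression as the estimator would follow the same pattern, with coef_ taking the place of feature_importances_ for the linear model.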

Machine Learning Classifiers
Six ML classifiers which are commonly used in the field of cybersecurity [3,9,15] have been used to detect Android ransomware in this research. The focus on these classifiers was to measure their efficacy in detecting ransomware when used with data normalization and feature selection techniques that were not used in previous works. Another focus is also to design a simple and perfect security system based on machine learning techniques and methods for early detection of ransomware in network traffic before it hits the target. The ML classifiers used are:

Decision Tree (DT):
DT is a simple regression and classification method. It is a supervised ML model in which the data is repeatedly split according to a certain parameter through a series of tests, similar to a flow chart structure: each inner node denotes a test on a feature, each branch represents a result of the test, and each leaf node holds a class label. The DT flow starts when the features extracted from a new sample are introduced to the tree, which then asks a sequence of questions about the sample's features. The max_depth parameter (maximum depth of the tree) is very important for controlling the efficacy of the tree. A deeper tree can fit the training data more closely, but it requires more time and computation to process [3,32]. In this research, the inputs of the decision tree algorithm are the maximum depth of the tree, which is 40, and the strategy (best or random) used to select the split at each node, which is 'best'. The most important feature becomes the root of the DT; the other features are then distributed from the top of the tree to the bottom depending on several sequential questions (decisions) and the resulting information gain. The output of the DT is given by the final nodes, which represent the results of the questions (the target feature values); in this research, the output is the class of the sample (benign or ransomware).
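The configuration described above can be sketched with sklearn as follows. The synthetic data is a stand-in for the selected network traffic features, and criterion="entropy" is an assumption chosen to match the mention of information gain; the text states only max_depth and the split strategy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (assumed): label 1 plays the role of ransomware.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)

# 80% of the samples for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

dt = DecisionTreeClassifier(
    max_depth=40,         # maximum depth stated in the text
    splitter="best",      # split strategy stated in the text
    criterion="entropy",  # assumed, to reflect information gain
    random_state=0,
)
dt.fit(X_train, y_train)

dt_accuracy = dt.score(X_test, y_test)
print("tree depth:", dt.get_depth(), "test accuracy:", round(dt_accuracy, 3))
```

On small data the fitted tree usually stops well short of depth 40, because max_depth is only an upper bound.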

Random Forest (RF):
RF is an ensemble method that combines many DTs with the bagging technique. Bagging trains each DT on a part of the whole dataset. Each tree produces its own classification, and the final classification is made by majority voting over the DT results. The most important parameters are max_depth, which defines the maximum depth of each tree, and n_estimators, which defines the number of trees in the forest [9,32]. In this research, the number of trees in the forest was 50, and the maximum depth of each tree was 25. Since several decision trees produce several classification results, the output of the random forest for classification problems is the majority vote of the trees, which represents the classification result (ransomware or benign).
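A minimal sketch of this configuration is given below, again on synthetic stand-in data. Note that sklearn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by counting hard votes, so it is an approximation of the majority vote described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumed), split 80/20 as in the proposed system.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 50 trees, each at most 25 levels deep, as stated in the text.
rf = RandomForestClassifier(n_estimators=50, max_depth=25, random_state=0)
rf.fit(X_train, y_train)

rf_accuracy = rf.score(X_test, y_test)
print("trees:", len(rf.estimators_), "test accuracy:", round(rf_accuracy, 3))
```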

Logistic Regression (LR):
LR is an algorithm based on statistical learning. It is mainly used for classification tasks. LR is a probability-based prediction algorithm that uses the sigmoid function to transform its output into a probability value. It separates the samples with a boundary (a line or hyperplane). For a new sample, LR determines on which side of the hyperplane it is located and then makes the decision. The most important parameter is max_iter, which defines the maximum number of iterations [3,9]. In this research, input data with N samples and M features was used to train the algorithm with a maximum of 150 iterations. The multi_class parameter was set to 'ovr', which handles binary classification problems. The solver parameter (the algorithm used for the optimization problem) was set to 'lbfgs', which processes complex datasets faster. These parameters were set to produce a predictive model for the output variable (benign or ransomware) using the sigmoid function, and this is the output of LR.
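The role of the sigmoid function can be made concrete with a short sketch on synthetic stand-in data. The multi_class argument is omitted here as an assumption for portability: it is deprecated in recent sklearn releases, and for a binary problem the default behaves like the one-vs-rest setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumed), split 80/20.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Solver and iteration limit as stated in the text.
lr = LogisticRegression(solver="lbfgs", max_iter=150)
lr.fit(X_train, y_train)

# Signed distance-like score relative to the separating hyperplane.
scores = lr.decision_function(X_test)

# Applying the sigmoid by hand reproduces the library's class-1 probability.
manual_proba = 1.0 / (1.0 + np.exp(-scores))
library_proba = lr.predict_proba(X_test)[:, 1]

lr_accuracy = lr.score(X_test, y_test)
print("test accuracy:", round(lr_accuracy, 3))
```

A sample is assigned to class 1 when its score is positive, which is the same as its sigmoid probability exceeding 0.5.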

k-Nearest Neighbor (k-NN):
k-NN is a supervised ML algorithm used for regression and classification tasks. It assumes that similar things exist in close proximity to each other. It stores the training data and does not make a prediction until it receives test data. When it receives a test instance, it scans the training data for the k most similar neighbors. For this reason, it is computationally expensive. The most important parameter is n_neighbors, which defines the number of neighbors to use [3,9]. In this research, the number of nearest training samples used (k) is 5. All points in the neighborhood have the same weight through the 'uniform' weight function. The distance between each test sample and the training samples is calculated using the Euclidean metric. The output of k-NN for classification problems is a class membership, which predicts whether the sample is ransomware or benign by a majority vote of the sample's neighbors.
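The prediction procedure described above is simple enough to sketch without a library. The two-dimensional points and their labels are hypothetical; the function name knn_predict is introduced only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by a uniform majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training sample.
    distances = [
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Each of the k nearest neighbors contributes one equally weighted vote.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical normalized samples with two features each.
train_X = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10), (0.05, 0.15),
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.95), (0.75, 0.85),
]
train_y = ["benign"] * 4 + ["ransomware"] * 4

label_a = knn_predict(train_X, train_y, (0.12, 0.18), k=5)
label_b = knn_predict(train_X, train_y, (0.82, 0.88), k=5)
print(label_a, label_b)
```

Because every query scans all stored samples, the cost of a prediction grows with the size of the training set, which is why the method is described as computationally expensive.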

XGBoost (XGB):
XGBoost is an open-source library that has recently been used in many ML applications. It provides a high-performance implementation of gradient boosted DTs to solve many data science problems accurately and quickly. Boosting involves training multiple weak DTs in successive steps to improve the prediction. A weak DT model can only perform well on part of the training data, so multiple weak learners are combined selectively to produce a much more powerful model. The learning_rate parameter is a hyperparameter that controls how much the model is updated in response to the estimated error [3]. In this research, the inputs of the XGBoost algorithm are the number of gradient boosted trees (set to 50 with the n_estimators parameter) and the maximum depth of each tree (set to 25 with the max_depth parameter). Iteratively, each tree makes its prediction and the error for the output variable is computed; these errors are then used to build the next tree. The output of XGBoost is computed by adding the prediction of each new tree to the predictions of the previous trees, which gives the final prediction result (ransomware or benign).
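The additive idea behind boosting can be illustrated with a simplified sketch. This is not XGBoost itself: it is a plain gradient boosting loop for the logistic loss built from sklearn regression trees, without the regularization and second-order updates that XGBoost adds. The shallow depth of 3, the learning rate of 0.3, and the synthetic data are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Synthetic binary data (assumed): label 1 plays the role of ransomware.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

n_trees, learning_rate = 50, 0.3
scores = np.zeros(len(y))  # running sum of tree outputs (log-odds), starts at 0
trees = []

for _ in range(n_trees):
    # Error of the current ensemble: true label minus predicted probability.
    residual = y - sigmoid(scores)
    # The next weak tree is fitted to that error.
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
    # Its scaled prediction is added to the predictions of the previous trees.
    scores += learning_rate * tree.predict(X)
    trees.append(tree)

train_accuracy = np.mean((sigmoid(scores) > 0.5) == y)
print("trees:", len(trees), "training accuracy:", round(float(train_accuracy), 3))
```

The learning rate scales each tree's contribution, so smaller values need more trees to reach the same fit.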

Multi-Layer Perceptron (MLP):
The MLP is a class of feedforward artificial neural network (ANN) used in deep learning. It is trained with the supervised backpropagation technique. It consists of at least three layers of nodes (input, hidden, and output) with linear or nonlinear activation functions. Each node in a layer is connected to all the nodes in the next layer (fully connected layers). The most important parameters are hidden_layer_sizes, which defines the number of hidden layers and the number of nodes in each, and activation, which defines the activation function for the hidden layers [3,32]. In this research, the input layer has one node per feature, and there are three hidden layers with different numbers of nodes, set via the parameter 'hidden_layer_sizes = (64, 128, 64)'. The activation function is 'relu' and the maximum number of iterations is 20. After repeatedly calculating the weighted sums, applying the activation function, and adjusting the weights, the MLP produces its output at a single node in the output layer, whose result is ransomware or benign.
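The network described above can be sketched with sklearn's MLPClassifier. The synthetic data is an assumption; with only 20 iterations the optimizer is likely to stop before converging, so a convergence warning is expected.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary data (assumed), split 80/20.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Three hidden layers of 64, 128, and 64 nodes, ReLU activation,
# and at most 20 training iterations, as stated in the text.
mlp = MLPClassifier(
    hidden_layer_sizes=(64, 128, 64),
    activation="relu",
    max_iter=20,
    random_state=0,
)
mlp.fit(X_train, y_train)

mlp_accuracy = mlp.score(X_test, y_test)
print("layers:", mlp.n_layers_, "output nodes:", mlp.n_outputs_)
print("test accuracy:", round(mlp_accuracy, 3))
```

The fitted model reports five layers in total: the input layer, the three hidden layers, and a single-node output layer for the binary decision.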

Performance Metrics
To evaluate the ML classifiers, seven metrics based on the confusion matrix were used in this experiment. The Confusion Matrix (CM) is an error matrix used in ML, specifically in statistical classification problems. It is a table layout that shows the performance of an algorithm. Each column of the CM refers to the instances in a predicted class, and each row refers to the instances in an actual class. The CM is explained in Table IV, where:
• TP: the number of malware samples that are correctly classified.
• F-score (F): a measure of a test's accuracy that evaluates the system performance by combining precision and recall in one value.
• Classification Time (T): the time required to train the model.
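As an illustration of how such metrics follow from the confusion matrix, the sketch below computes accuracy, precision, recall, FPR, and F-score from hypothetical predictions. The standard textbook definitions are assumed, with ransomware as the positive class; the function names and sample labels are introduced only for this example.

```python
def confusion_counts(y_true, y_pred, positive="ransomware"):
    """Return (TP, TN, FP, FN) for a binary problem with the given positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual != positive and predicted != positive:
            tn += 1
        elif actual != positive and predicted == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumed definitions)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),
        "f_score": 2 * precision * recall / (precision + recall),
    }

# Hypothetical ground truth and predictions for ten flows.
y_true = ["ransomware"] * 4 + ["benign"] * 6
y_pred = ["ransomware"] * 3 + ["benign"] + ["ransomware"] + ["benign"] * 5

counts = confusion_counts(y_true, y_pred)
results = classification_metrics(*counts)
print(counts)
print(results)
```

Training time, the remaining metric mentioned above, would be measured separately by timing the call that fits the model.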

Results
The Table V presents the results of using ML methods for android ransomware detection. From Table V, the benefits of using feature selection techniques can be noted, especially where it selects the best number of the most important features, which contributed to improving the performance of the system in terms of high detection accuracy, precision, recall and f-score, and reducing the FPR rate. In addition, feature selection technique has reduced training time to a quarter of the required time to train all features, and the reason for this was the decrease of computations and processing operations.
Also, from Table V, it can be concluded that the DT and XGB classifiers are the best methods for ransomware detection, with detection accuracy exceeding 99%. DT is faster than XGB in terms of the "classification time" metric because DT is a single tree that can process a large dataset in a short time. The XGB classifier provides very high accuracy and helps prevent overfitting, but it is more complex than DT and, like RF, requires more arithmetic operations and more training time because it consists of several trees. The k-NN classifier also performed well, with a detection accuracy of more than 98%. Although the efficiency of k-NN depends on the n_neighbors parameter, it requires more time and storage space because it is a lazy algorithm: it does not generalize from the data during the training phase and defers that work to the testing phase. The detection accuracy of the MLP classifier exceeded 91%, and the experiments showed that increasing the number of iterations in MLP leads to better results at the cost of more training time. The detection accuracy of LR was low (≈ 85%). The reason is that LR has a linear decision surface, so it is best suited to linear problems and simple datasets with separable data (uncorrelated features).
In comparison with previous works, Table VI presents the detection accuracy for the proposed system and the works [7] and [16] which used the same dataset (the CICAndMal2017 dataset) for android ransomware detection. Also, the table shows a comparison with the work [15] which used the dataset with the same features that were extracted from network traffic for android ransomware detection. From Table VI, it can be concluded that the proposed system gives a higher accuracy rate for ransomware detection (in bold font) than other works. According to the experiments, the benefits of dynamic analysis for ransomware detection from its early phases of entering the device over the network become clear. Also, it can be concluded that using a number of network traffic features (e.g., 21 features by RF selector, 20 by ET selector ... etc.) is considered effective in distinguishing ransomware from benign. The proposed system gives high accuracy (more than 99%) for both DT and XGB classifiers, and low FPR (0.010) and (0.011) for DT and XGB respectively. So, it can be stated that the proposed ML methods are effective for Android ransomware detection when applied to network traffic features. Thus, it can be adopted as a method for detecting ransomware applications on smartphone.

Conclusion and Future works
Some attackers can recognize the patterns of malicious and benign applications and then attempt to imitate a specific class of traffic (for example, the duration of the flow) or use a fake IP address, but it is impossible to change the variance of some or all features in order to evade a detection system such as the proposed one. For this reason, it is important to highlight that this research differs from other works that use only one category of network traffic features: it focused on selecting the best features from the total feature set. The results of the experiments show that network traffic features are very suitable for detecting ransomware, as the dataset used in this analysis was extracted from online network traffic. Several techniques were used to select the best features, and six ML algorithms were applied for Android ransomware detection. Finally, seven performance metrics were used to evaluate the ML classifiers. The results showed that the average detection accuracy was more than 99% for DT and XGB, and the FPR was 0.016% and 0.029% for DT and XGB, respectively. Future work could use network traffic features to detect and classify other Android malware types. Another aim is to use other types of features (e.g., memory dumps, permissions, logs, or API calls) to develop a comprehensive Android ransomware detection framework.