doi:10.21833/ijaas.2020.07.007

IJAAS
International Journal of Advanced and Applied Sciences
EISSN: 2313-3724, Print ISSN: 2313-626X
Frequency: 12





Volume 7, Issue 7 (July 2020), Pages: 56-67

Original Research Paper

Title: Detecting phishing attacks using a combined model of LSTM and CNN

Author(s): Subhash Ariyadasa¹,²,*, Subha Fernando¹, Shantha Fernando³

Affiliation(s):
¹Department of Computational Mathematics, University of Moratuwa, Moratuwa, Sri Lanka
²Department of Computer Science and Informatics, Uva Wellassa University, Badulla, Sri Lanka
³Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka

* Corresponding Author.
Corresponding author's ORCID profile: https://orcid.org/0000-0002-7937-128X

Digital Object Identifier: https://doi.org/10.21833/ijaas.2020.07.007

Abstract: Phishing, a social engineering crime that has existed for more than two decades, has attracted significant research attention aimed at countering its highly dynamic strategies. The financial sector is the primary target of phishing, and there are many different approaches to combating phishing attacks. Software-based detection approaches are the most prominent; however, there is still no robust solution that remains stable over a long period. The primary purpose of this paper is to propose a novel solution that detects phishing attacks using a combined model of LSTM and CNN deep networks, drawing on both URLs and HTML pages. The URLs are learned using an LSTM network with 1D convolution, and another 1D convolutional network is used to learn the HTML features. These two networks were trained separately and then combined through a sigmoid layer, after dropping the last layer of each model, to form the proposed model. The proposed model reached an accuracy of 98.34%, which is above the previously recorded highest accuracy of 97.3% among the detection models that used both URL and HTML features in the explored literature.
The solution requires feature extraction only for HTML pages; URLs are fed directly to the model with minimal pre-processing. Although the proposed solution uses extracted HTML features, these do not depend on third-party services. Therefore, an efficient real-time application can be implemented using the proposed model to detect phishing attacks and safeguard Internet users.

© 2020 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Phishing, LSTM, CNN, Cybersecurity

Article History: Received 10 December 2019, Received in revised form 30 March 2020, Accepted 1 April 2020

Acknowledgment: No Acknowledgment.

Compliance with ethical standards

Conflict of interest: The authors declare that they have no conflict of interest.

Citation: Ariyadasa S, Fernando S, and Fernando S (2020). Detecting phishing attacks using a combined model of LSTM and CNN. International Journal of Advanced and Applied Sciences, 7(7): 56-67
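The abstract describes two separately trained networks whose last layers are dropped before their outputs are combined through a sigmoid layer. The following is a minimal sketch of that fusion step only, not the authors' implementation. The feature sizes, weights, bias, the 0.5 decision threshold, and the convention that a high score means phishing are all assumptions made for illustration; `fuse`, `url_branch`, and `html_branch` are hypothetical names.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse(url_features: Sequence[float],
         html_features: Sequence[float],
         weights: Sequence[float],
         bias: float) -> float:
    """Concatenate the two branch outputs and apply one sigmoid unit.

    url_features and html_features stand in for the activations that
    each separately trained network produces once its own final layer
    has been dropped (hypothetical values; sizes are assumptions).
    """
    merged = list(url_features) + list(html_features)
    if len(merged) != len(weights):
        raise ValueError("need exactly one weight per merged feature")
    z = sum(w * x for w, x in zip(weights, merged)) + bias
    return sigmoid(z)


# Toy numbers chosen only for illustration; none come from the paper.
url_branch = [0.9, 0.1]        # hypothetical URL-branch activations
html_branch = [0.4, 0.7, 0.2]  # hypothetical HTML-branch activations
w = [1.5, -0.5, 0.8, 1.2, -0.3]
score = fuse(url_branch, html_branch, w, bias=-1.0)

# Assumed convention: scores at or above 0.5 are flagged as phishing.
label = "phishing" if score >= 0.5 else "legitimate"
print(f"score={score:.3f} -> {label}")
```

In a trained model the weights and bias of this final unit would be learned from data rather than set by hand; the sketch only shows how two feature vectors can feed a single sigmoid output.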