Abstract
The Web provides a wealth of data and applications that are readily available to explore, making it a powerful tool for humans. Copyright violation occurs in web documents when information or text is copied from an original document on the web without authorization; this violation is known as plagiarism. Plagiarism Detection (PD) can be defined as the procedure that finds similarities between a document and other documents based on lexical, semantic, and syntactic textual features. Approaches for the numeric representation (vectorization) of text, such as the Vector Space Model (VSM) and word embeddings, together with text similarity measures such as cosine and Jaccard similarity, are essential for plagiarism detection. This paper covers the concept of plagiarism, kinds of plagiarism, textual features, text similarity measures, and plagiarism detection methods based on traditional or intelligent techniques. Furthermore, both traditional algorithms and deep learning algorithms, for instance the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), are discussed as plagiarism detectors. In addition, this work reviews many other papers that address the topic of plagiarism and its detection.
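As a minimal illustration (not part of this paper's contribution), the cosine and Jaccard measures mentioned above can be sketched over simple bag-of-words representations; the example documents and helper functions below are hypothetical:

```python
import math

def cosine_similarity(a, b):
    # a, b: term-frequency dicts (bag-of-words vectors)
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(a, b):
    # a, b: sets of tokens; ratio of shared to total distinct tokens
    union = a | b
    return len(a & b) / len(union) if union else 0.0

doc1 = "the web provides data and applications".split()
doc2 = "the web provides many applications".split()

# Vectorize each document as raw term frequencies (a simple VSM)
tf1 = {t: doc1.count(t) for t in set(doc1)}
tf2 = {t: doc2.count(t) for t in set(doc2)}

print(round(cosine_similarity(tf1, tf2), 3))        # ≈ 0.730
print(round(jaccard_similarity(set(doc1), set(doc2)), 3))  # ≈ 0.571
```

Scores near 1.0 indicate near-duplicate text; a detector would flag document pairs whose similarity exceeds some threshold.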