Adopting Text Similarity Methods and Cloud Computing to Build a College Chatbot Model

A chatbot is a computer program designed to interact with users and answer their questions. Chatbots are now among the most common systems, used in many fields and by different companies to achieve different tasks. Cloud computing is also gaining increasing interest, and a myriad of applications have been developed based on it. In this paper, a college chatbot was developed and implemented to help students interact with their college and ask questions related to faculty, activities, exams, and admission, among other topics. Text similarity algorithms were adopted to build the proposed system. More specifically, the cosine similarity and Jaccard similarity algorithms were used to find the closest question in the dataset. The Firebase real-time database, which is one of the Google cloud services, was used as a connector channel between users and the chatbot server. Experiments were conducted to evaluate the performance of the cosine similarity and Jaccard similarity methods and to compare their results. In addition, the real-time database was evaluated as a chatbot connector channel.


INTRODUCTION
Chatbots can be defined as computer programs designed to automatically interact with users and answer their questions [1]. Chatbots are commonly used by companies, banks, and educational institutions, among others. Different techniques have been developed and adopted to build different kinds of chatbot systems. Basically, natural language processing (NLP) techniques and machine learning algorithms are used to develop chatbot systems [2][3]. Developing a chatbot is an arduous task; therefore, different platforms, such as IBM Watson and Microsoft Bot Framework, are available to help developers build chatbots. The main drawback of these platforms is that not all languages are supported. In addition, these platforms do not produce stand-alone chatbots: accounts on social media platforms such as Facebook, Slack, and WeChat are needed to chat with the bot. In general, chatbots can be divided into different types based on the approach used to develop them [4]. The two main types are the retrieval-based approach and the generative approach [5] [6]. In the retrieval-based approach, a dataset of predefined responses is used to answer questions. The main function of this kind of chatbot is to find the closest question in the dataset and retrieve the corresponding answer. To find the best answer, the similarity score between the user question and each predefined question in the dataset needs to be calculated [7]. In contrast, in the generative approach, machine learning algorithms and deep learning models are used to generate answers from scratch. This approach needs a large dataset with millions of examples to train the model [1]. From another point of view, chatbots may be closed domain or open domain. Closed domain means that the chatbot concentrates on one specific field or topic. Open domain, however, means that there is no specific topic and the chatbot may answer any question.
In this work, the retrieval-based approach was used to build a closed-domain college chatbot to answer students' queries. Using the introduced system, students can ask questions regarding the College of Computer Science of the University of Mosul. The college chatbot helps reduce the response time and the effort needed to answer students' queries.

RELATED WORKS
Much research has been conducted on developing chatbots, and different methods and techniques have been introduced to build them. In [8], a web application was developed to interact with students, and bigrams were used to find the similarity between sentences. In [9], WordNet, which is a lexical database, was used to find similarities. In [10], AIML (Artificial Intelligence Markup Language) was used alongside the open-source project "Program-O" to develop the chatbot. In [11], cloud-based cognitive services were used, and IBM OpenWhisk, which is a serverless platform, was used to implement the chatbot. Microsoft Bot Builder was used in [4] to build an English chatbot. Clearly, using public cloud services has limitations. Cloud services are not free, and chatbot platforms do not support all languages. For example, many chatbot platforms do not support Arabic.
In this work, a chatbot that can support any language was developed from scratch. In addition, this work introduces a stand-alone mobile-based chatbot, since it does not depend on any chatbot platform or social media platform.

TEXT SIMILARITY
Measuring the similarity between texts is one of the most common techniques used in tasks related to data mining and information retrieval. In general, text similarity algorithms aim to determine how similar two sentences or documents are, relying on mathematical concepts and equations [12]. Text similarity methods have been used to build different kinds of systems, such as translation systems, plagiarism detection systems, text clustering, and short-answer grading. Some examples of text similarity algorithms are cosine similarity, Levenshtein distance, Jaccard distance, and Euclidean distance.

Cosine Similarity
The cosine similarity method measures the similarity between two non-zero vectors as the cosine of the angle between them [12] [13]. For two vectors A and B with n components, cosine similarity is calculated as:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$

Before applying cosine similarity to measure the similarity between two sentences, a text vectorization technique should be used. Text vectorization methods aim to convert text to numerical representations [14]. Bag of words, TF-IDF, and word2vec are examples of text vectorization techniques. In this work, the TF-IDF method was used to convert queries to numerical vectors.
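As an illustration of the two steps described above, the following Python sketch builds TF-IDF vectors and compares them with cosine similarity. It is not the Kotlin implementation used in this work: the function names, the whitespace tokenizer, the smoothed IDF formula, and the sample questions are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would also strip punctuation.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed IDF (the smoothing is an assumption).
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Sample questions invented for illustration only.
questions = ["when is the final exam", "how do i apply for admission"]
query = "when is the exam"
vectors = tfidf_vectors(questions + [query])
scores = [cosine_similarity(vectors[-1], v) for v in vectors[:-1]]
```

Here the query shares four terms with the first stored question and none with the second, so the first score is the higher one and the second is zero.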

Jaccard Distance
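The Jaccard index is defined as the number of terms shared by two sets divided by the number of unique terms across both sets [15] [13]. For two sets A and B, the Jaccard similarity index and the Jaccard distance are calculated as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad d_J(A, B) = 1 - J(A, B)$$

To use the Jaccard distance, strings/statements must first be converted to sets.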
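A minimal Python sketch of these two quantities is given below. Treating each sentence as a set of lowercase, whitespace-separated words is an assumption of the example, as is the convention of returning 1.0 when both sets are empty; the function names are illustrative.

```python
def jaccard_index(a, b):
    """Jaccard similarity between the word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def jaccard_distance(a, b):
    """Jaccard distance, defined as one minus the Jaccard index."""
    return 1.0 - jaccard_index(a, b)


# Four shared words out of five unique words gives an index of 4/5.
index = jaccard_index("when is the final exam", "when is the exam")
distance = jaccard_distance("when is the final exam", "when is the exam")
```

Because the inputs are reduced to sets, repeated words and word order have no effect on the score.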

The Proposed System
The proposed chatbot system consists of three parts, which are shown in Figure 1 and explained below.
1. Client application, through which students ask questions and obtain answers.
2. Android-based server app which acts as a chatbot. The server app first analyses the students' queries and then provides an appropriate answer. The cosine similarity and Jaccard distance methods are used to find the similarity among questions. The answer to the question that has the highest similarity score is retrieved. Figure 2 and Figure 3 illustrate the steps of this part.
3. Real-time cloud database which acts as a chatbot connector. Each user (client) is represented in Firebase as a node identified by the user's phone number, and his/her questions are stored as sub-nodes. Figure 4 shows the structure of the Firebase database.

Figure 4: Firebase database structure
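The node layout described above can be pictured as a nested dictionary, which is how a Firebase real-time database tree looks when exported as JSON. The Python sketch below only mimics that layout in memory and does not call Firebase; the "questions" key, the question identifiers, and the phone number are hypothetical, since the exact keys of Figure 4 are not reproduced here.

```python
import json


def add_question(db, phone_number, question_id, text):
    """Store a question under the node of the user identified by phone number."""
    user_node = db.setdefault(phone_number, {})          # one node per user
    questions = user_node.setdefault("questions", {})    # hypothetical sub-node name
    questions[question_id] = text
    return db


database = {}
# Placeholder phone number and identifiers, invented for illustration.
add_question(database, "07700000001", "q1", "when is the final exam")
add_question(database, "07700000001", "q2", "how do i apply for admission")
print(json.dumps(database, indent=2))
```

Keying each user node by phone number keeps every student's conversation in its own subtree, so the server can listen for changes per user.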
In addition to the above parts, the proposed system has a local database to store questions and answers. The database may be updated and new questions can be added by the administrator of the database.
The main steps of the proposed system can be summarized as follows:
- Obtain the student's question.
- Send the question to Firebase.
- Firebase sends a notification to the chatbot server.
- The server listens for the Firebase notification and reads the student's question.
- Calculate the TF-IDF for the user question.
- Calculate the cosine similarity/Jaccard distance between the user question and the questions in the database.
- Retrieve the answer to the closest question.
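The last steps of this list, scoring the stored questions and retrieving the answer of the closest one, can be sketched in Python as follows. This is an illustration rather than the system's Kotlin code: the function names and the sample question-answer pairs are invented, and a token-set Jaccard index stands in for whichever similarity measure is configured.

```python
def jaccard(a, b):
    """Token-set Jaccard index, used here only as a stand-in similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def closest_answer(user_question, qa_pairs, similarity=jaccard):
    """Return (answer, score) for the stored question most similar to the query."""
    scored = [(similarity(user_question, question), answer)
              for question, answer in qa_pairs]
    best_score, best_answer = max(scored, key=lambda item: item[0])
    return best_answer, best_score


# Invented sample data for illustration only.
qa_pairs = [
    ("when is the final exam", "Please check the exam schedule."),
    ("how do i apply for admission", "Please contact the admission office."),
]
answer, score = closest_answer("when is the exam", qa_pairs)
```

Passing the similarity measure as a parameter makes it straightforward to swap in a TF-IDF cosine function and compare the two methods on the same dataset.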

Implementation
The proposed system was implemented using the Kotlin programming language with Android Studio. The system comprises two applications, which were developed to act as a chatbot client and a chatbot server. Students interact with the client-side application to ask questions and obtain answers. Figure 5 provides an example of the client screen.

Figure 5: Sample of Conversation Flow
The server application has additional features that allow the administrator to add more questions and modify answers. Figure 6 shows the main screen of the server. Conversations of the users (students) are stored on the cloud (Firebase) with a separate node for each student. Students are identified by their phone numbers.

Experiments and Results
Experiments were conducted to evaluate the performance of the proposed system. In general, cosine similarity and Jaccard distance give acceptable and reasonable results. Table 1 shows the results of using the cosine and Jaccard similarity methods with some sample data. Note that in both methods, a score of 1 means that the sentences match completely, while 0 means that the sentences are completely dissimilar. The threshold value was set to 0.5. Results have shown that the Jaccard similarity method is more efficient than the cosine similarity method, especially when sentiment analysis is not considered. In addition, the results of the performance evaluation indicate the effective use of real-time database notification as a chatbot channel. Phone numbers, questions, and all other information are successfully stored on the cloud database. Figure 7 shows a sample of the users' conversations that are stored on the cloud.
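One way the 0.5 threshold could be applied is sketched below in Python. The text does not state what happens when no stored question reaches the threshold, so the fallback reply, the use of "greater than or equal to", and the function name are all assumptions of this example.

```python
THRESHOLD = 0.5  # threshold value reported in the experiments


def select_answer(scored_answers, threshold=THRESHOLD,
                  fallback="Sorry, I could not find an answer to that question."):
    """Pick the best-scoring answer, or a fallback if no score reaches the threshold.

    scored_answers is a list of (similarity_score, answer) pairs.
    The fallback behaviour is an assumption; it is not described in the paper.
    """
    if not scored_answers:
        return fallback
    best_score, best_answer = max(scored_answers, key=lambda item: item[0])
    return best_answer if best_score >= threshold else fallback


# Scores invented for illustration only.
accepted = select_answer([(0.8, "Please check the exam schedule."),
                          (0.3, "Please contact the admission office.")])
rejected = select_answer([(0.4, "Please check the exam schedule.")])
```

With such a gate, a weak best match is not returned as if it were a confident answer.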

Conclusion and Future Work
Chatbots have become one of the most common systems used to achieve different tasks, and different techniques can be used to develop different types of chatbots. In this work, text similarity methods were adopted to develop a college chatbot. Firebase real-time notification was also used as a channel between users and the chatbot. Experiments have shown acceptable and reasonable results. In addition, the results have shown the efficiency of using real-time notification as a way to connect clients and the server.