Building a Covid-19 Fact-Check Tele Bot Using Machine Learning

Background

As part of the MLDA Deep Learning Week Hackathon 2021, we built a Telegram Bot linked to a machine learning model trained to identify fake news about Covid-19.

Our team, which included three other first-year students (Zihan, Jaryl and Wey Zhih), was among the finalists out of more than 80 participating groups.

Demo Video for Telegram Bot ‘My Ah Ma Knows Better’

Problem Set

Our problem set was the epidemic of fake news about Covid-19 that has plagued Singapore since the beginning of vaccine roll-outs. This fake news has convinced many people (especially the older generation) that vaccines are ineffective or even harmful. At worst, this misinformation potentially leads to the unnecessary deaths of unvaccinated individuals infected by Covid-19.

Besides Facebook, such misinformation is rampant in local group chats, the biggest of which is “Sg Covid La Kopi” with over 15k members on Telegram. Notably, we used the group as a data-set for fake news!

Gathering Datasets

For our data-set for fake news, we used the chat history for the above-mentioned group chat, “Sg Covid La Kopi”.

The moderators of the group chat are quick to remove any messages suggesting that “vaccines are effective”, and have inadvertently created a carefully curated source of fake news.

Screenshot from Telegram group chat ‘Covid La Kopi’

Telegram allows users to export chat history, but as it was a public group, the export format was limited to HTML only. The exported chat history was also split into 131 HTML files due to the size limit.

After importing the files into Google Colab, we used the Beautiful Soup library to create a function that converts HTML into a list of strings.
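The conversion step can be sketched roughly as follows. This is a minimal illustration rather than the exact code from the hackathon: the function name `html_to_messages` is hypothetical, and the sketch assumes that each message body in the exported page sits inside a `<div class="text">` element, which should be checked against the actual export.

```python
from bs4 import BeautifulSoup


def html_to_messages(html):
    """Return the text of every message found in one exported HTML page."""
    soup = BeautifulSoup(html, "html.parser")
    messages = []
    # Assumption: message bodies are stored in <div class="text"> elements.
    for div in soup.find_all("div", class_="text"):
        text = div.get_text(" ", strip=True)
        if text:  # skip empty bodies, e.g. media-only messages
            messages.append(text)
    return messages
```

Calling `html_to_messages(page)` on the contents of one exported file would give one string per message, ready to be appended to the fake news list.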

All the HTML files were named in the format “messagesX”, where X was a number. Thus, we made a function that iterates through the file names to extract each HTML file. Ultimately, we only used 10 of the 131 files, as we could not generate a “real news” dataset of an equivalent size.
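A sketch of the file loop is shown below. The helper name `read_export_files`, the `.html` extension, and the choice of the first ten numbers are assumptions made for illustration; only the “messagesX” naming pattern comes from the export itself.

```python
from pathlib import Path


def read_export_files(folder, first=1, last=10):
    """Read messagesX.html for X in first..last and return their contents."""
    pages = []
    for number in range(first, last + 1):
        # Assumption: files are named "messages<number>.html".
        path = Path(folder) / f"messages{number}.html"
        if path.exists():  # tolerate gaps in the numbering
            pages.append(path.read_text(encoding="utf-8"))
    return pages
```

Each returned page can then be passed to the HTML-to-strings function to build up the full list of messages.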

For our data-set on “real news”, we used Beautiful Soup to web-scrape pages from the World Health Organization. In particular, we looked at advisories and Q&A pages, which would be most similar in sentence syntax to the Telegram messages.

Firstly, we had to build the web-scraping function. Here we referenced an article by Martin Breuss.

Next, we applied the function to our selected web-pages from WHO (as well as our local government).
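A scraping function along these lines might look like the sketch below. It is an assumption-laden illustration, not the hackathon code: the names `extract_paragraphs` and `scrape_page` are hypothetical, the standard library's `urllib` is used for fetching to keep the example small, and the sketch assumes the useful text on each page lives in `<p>` tags.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the non-empty text of every <p> element in a page."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    # Assumption: advisory and Q&A text is held in <p> tags.
    for p in soup.find_all("p"):
        text = p.get_text(" ", strip=True)
        if text:
            paragraphs.append(text)
    return paragraphs


def scrape_page(url):
    """Download one page and return its paragraphs as a list of strings."""
    # Some sites reject requests that carry no User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_paragraphs(html)
```

Running `scrape_page` over a list of chosen page addresses and concatenating the results would produce the “real news” list of strings.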

Classification Model

Now that we had our data-sets of fake and real news, we could create our machine learning classification model.

Before that, we had to convert our two data-sets from lists (of strings) to a CSV file.
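One way to do this conversion with Python's built-in `csv` module is sketched below. The function name `write_dataset`, the column names `text` and `is_fake`, and the 1/0 label encoding are assumptions chosen for illustration.

```python
import csv


def write_dataset(path, fake_messages, real_messages):
    """Write both lists into one labelled CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Assumed layout: one row per message, with 1 = fake and 0 = real.
        writer.writerow(["text", "is_fake"])
        for text in fake_messages:
            writer.writerow([text, 1])
        for text in real_messages:
            writer.writerow([text, 0])
```

Using `csv.writer` rather than joining strings by hand means that commas and quotes inside the messages are escaped correctly.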

Next, we imported the Multinomial Naive Bayes algorithm from the scikit-learn (sklearn) machine learning library. We used this to train our classification model.

Specifically, we used the CountVectorizer tool, which “transforms a given text into a vector on the basis of the frequency (count) of each word that occurs in the entire text” (Geeks For Geeks, 2020). For this portion, we referenced an article by Aman Kharwal.

We then tested the model with user inputs. Each user input (text) is converted into a word-count array by the vectorizer, and the model scores it using the word frequencies it learned from our data-sets of real and fake news. The model then predicts whether the user input is fake news, returning True or False.
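The training and prediction steps can be sketched with scikit-learn as follows. This is a toy illustration under stated assumptions: the four sentences are made up for the example and are not taken from our data-sets, and the helper name `predict_fake` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy sentences for illustration only, not real training data.
texts = [
    "vaccines are dangerous and harmful",
    "the vaccine will poison you",
    "vaccines are safe and effective",
    "wash your hands and wear a mask",
]
is_fake = [True, True, False, False]

# CountVectorizer turns text into word counts; MultinomialNB learns from them.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, is_fake)


def predict_fake(message):
    """Return True if the model classifies the message as fake news."""
    return bool(model.predict([message])[0])
```

Wrapping the vectorizer and classifier in one pipeline ensures that a user's message is vectorized with the same vocabulary that was built during training.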

Telegram Bot

After training our classification model, we built a Telegram Bot. The bot takes in user inputs (in the form of text messages) and feeds them into our classification model. The model then uses what it learned from our data-sets to determine whether each input is ‘fake news’. For this portion, we referenced an article by Gareth Dwyer.

In order to appeal to the local audience, we changed the bot’s reply to “This is fake news! My Ah Ma Knows Better” instead of “False”. We believe this will make the user experience more friendly, and better connect with the older generation.
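The reply logic can be sketched as a small function that turns one incoming Telegram update into the parameters for the Bot API's `sendMessage` method. This is a simplified illustration, not the bot's actual code: `build_reply` is a hypothetical name, `predict_fake` stands in for whatever function wraps the classifier, and the wording of the reply for messages that are not flagged is an assumption.

```python
FAKE_REPLY = "This is fake news! My Ah Ma Knows Better"
# Assumption: the wording for messages not flagged as fake is made up here.
NOT_FAKE_REPLY = "This does not look like fake news."


def build_reply(update, predict_fake):
    """Turn one Telegram update into sendMessage parameters, or None."""
    message = update.get("message", {})
    text = message.get("text")
    if not text:  # ignore stickers, photos and other non-text updates
        return None
    reply = FAKE_REPLY if predict_fake(text) else NOT_FAKE_REPLY
    return {"chat_id": message["chat"]["id"], "text": reply}
```

Keeping this step free of network calls makes it easy to try out with hand-written updates before connecting it to the live bot.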

Screenshot of Telegram Bot (Full Demo Video above)

Impacts

The purpose of this Telegram Bot is to provide a quick and convenient fact-check for users. We’re targeting the older demographic in Singapore, specifically those who are tech-savvy enough to be exposed to fake news in online chat groups (and thus capable of using chatbots).

We also hope it plays into the competitive nature of users, and everyone’s innate desire to “know better”. Through conversing with the bot, we hope that users will become better at identifying ‘fake news’, and become more vigilant of misinformation.

Future Improvement

We think that the bot can be improved on two fronts:

Firstly, the size and scope of our data-sets can be increased. We used a Telegram group chat as a data-set, and it inevitably includes irrelevant information. Any non-Covid-related messages will confuse our machine learning model and reduce its accuracy. If given more time, we could further curate and filter this data-set.

In addition, another data source we were considering was Tweets, which are similar in syntax to the ‘fake news’ chain messages spread through local chat groups.

Secondly, we can train the bot to provide more nuanced replies. For instance, in response to “Everyone should take the vaccine”, the bot could explain that people with certain medical issues should not do so. The bot could then ask whether the user has any of those medical issues, or refer the user to a doctor.

Reflections

The Cover of our Pitch Deck to the Judges

Being first-year students, this was definitely an exhilarating project for us. We had never done web-scraping or built a Telegram bot before, and everything was a first-time experience. We’re thankful for what we’ve learnt, and look forward to the next hackathon.

The Team

Our group consisted of four first-year Electrical and Electronic Engineering/Data Analytics students:

  • Li Zihan
  • Tay Jing Rui Jaryl
  • Yee Wey Zhih
  • Zhang Jing Wen
