How I Create “Sembako Bot” using Google Cloud – Telegram Group


Trust me, it’s not that hard.

Official “sembako” logo from the Indonesian government.

Disclaimer: everything that I write in this post represents my opinions and my opinions only, not my employers’.

About “Sembako Bot”

So I actually wrote about “sembako” in the repository’s README, but I can explain briefly here.

sembako-bot/README.md at main · iamdejan/sembako-bot

“Sembako” is an Indonesian abbreviation for “sembilan bahan pokok”, which translates to “nine main commodities”, although it can also be translated as “groceries”. What are those nine commodities? They include raw food materials (such as meat), fruits and vegetables, and even liquefied petroleum gas (LPG), which is widely used for cooking in Indonesia.

Now, the real reason I made this bot is that the prices of “sembako” in Indonesia keep rising, especially the price of palm oil*. Palm oil prices started climbing in December 2021, but the crisis wasn’t really felt until February 2022. The inflation was caused by a combination of factors: supply disruptions from Malaysia, supply “games” played by distributors (in Indonesian), and the Russia-Ukraine war, which disrupted the supply of sunflower oil (to be fair, the war didn’t start the inflation; it merely accelerated it). The basic premise of the bot is that users can monitor prices closely, so they can decide whether to buy “sembako” today or wait until tomorrow**.

* The most popular cooking oil in Indonesia is palm oil, even though there are many other types of cooking oil, like canola, olive, etc.

** We’re not at extreme inflation yet, but I think it’s still a good thing to be aware of the prices.

Why Telegram

Before I made the bot, I had to decide on a platform. I considered several candidates: LINE, WhatsApp, Discord, and Telegram.

For WhatsApp, I didn’t think about it too much because I know that it’s quite difficult to create a bot there. As far as I know, you have to be a business owner to create one.

As for LINE, I remember making a simple bot for the LINE chat app a long time ago. However, LINE is not available on all platforms (on Linux, it’s only available as a plugin). Plus, LINE is not a lightweight chat app (it’s heavier than Telegram), which makes me reluctant to log in even on my laptop.

As for Discord, I rarely use it. Also, correct me if I’m wrong (CMIIW), but the bots I’ve seen on Discord live in Discord servers (groups), whereas my initial design was a bot for personal use, not group use.

As for Telegram, the main reason I chose it is that I found a tutorial on YK Dojo’s channel about Telegram chat bot development. When I followed along, I found the SDK quite easy to use. That’s why I chose Telegram.

How to Use The Bot

NOTE: currently the bot is only available in Indonesian language.

To use the bot, search for the ID @tele_sembako_bot in Telegram, and Sembako Bot will appear.

When we search for the bot in Telegram.

After you add it, type /start. The bot will reply with a list of commands you can use.

“Sembako Bot” when we type “/start”.

I will explain briefly each command here:

  1. /update: you can use this command to get “sembako” prices without having to subscribe to daily updates. After waiting for 5–10 seconds, you will get the prices (in Indonesian Rupiah), along with the stock.
  2. /subscribe: if you want to get daily price updates, use this command.
  3. /unsubscribe: if you want to opt out of daily updates, use this command.
  4. /help: if you want to see the list of all commands, use this command.
  5. /donate: if you want to contribute financially to this project, use this command. For more information, see the note at the end of this post.

If you type an invalid command, the bot will tell you that the command is invalid and you need to check available commands with /help.

When an invalid command is typed, the bot tells us to use the /help command to see the list of available commands.
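To make the command list above more concrete, here’s a minimal sketch of how a bot could route the text of a message to a handler and fall back to the invalid-command reply. The handler names and reply strings are hypothetical placeholders (the real bot replies in Indonesian, and /update actually fetches prices); only the command names come from the bot itself.

```python
from typing import Callable, Dict

# Every handler receives the chat_id and returns the reply text.
Handler = Callable[[int], str]

HELP_TEXT = "Commands: /update, /subscribe, /unsubscribe, /help, /donate"
INVALID_TEXT = "Invalid command. Use /help to see the available commands."


def show_help(chat_id: int) -> str:
    # /start and /help both list the available commands.
    return HELP_TEXT


def update_prices(chat_id: int) -> str:
    # Placeholder: the real handler fetches prices from the three providers.
    return "Fetching today's prices..."


def subscribe(chat_id: int) -> str:
    # Placeholder: the real handler stores chat_id in the database.
    return f"Chat {chat_id} subscribed to daily updates."


def unsubscribe(chat_id: int) -> str:
    # Placeholder: the real handler removes chat_id from the database.
    return f"Chat {chat_id} unsubscribed from daily updates."


def donate(chat_id: int) -> str:
    return "Donation methods: (placeholder)"


COMMANDS: Dict[str, Handler] = {
    "/start": show_help,
    "/help": show_help,
    "/update": update_prices,
    "/subscribe": subscribe,
    "/unsubscribe": unsubscribe,
    "/donate": donate,
}


def dispatch(chat_id: int, text: str) -> str:
    """Route a message to its command handler, or return the invalid-command reply."""
    words = text.split()
    # Only the first word is the command; in groups Telegram may append "@bot_username".
    command = words[0].split("@", 1)[0] if words else ""
    handler = COMMANDS.get(command)
    return handler(chat_id) if handler else INVALID_TEXT
```

A dictionary lookup keeps adding a new command down to one line, and anything not in the dictionary automatically gets the “use /help” reply.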

High-Level Architecture

So here’s the high-level architecture for the Sembako Bot project. The components inside Google Cloud and the database are explained later.

To clarify the role of the Telegram server in the architecture above, I made a sequence diagram. Here’s the sequence diagram for the /update command:

The sequence diagram above depicts how a Telegram bot actually works*. When users send messages to the chat bot, the messages are actually sent to the Telegram server first. Telegram then forwards each message to the webhook URL that we set (I’ll explain how to set the webhook in the Terraform deployment section). In this case, the server is hosted on Cloud Run, which routes the request to a container. The container reads the incoming data, fetches the prices from external sources (Segari, Tokopedia, and Shopee), and then sends the response message back using the chat_id it received.

* The diagram is correct if you set up a webhook. However, you can also create a Telegram chat bot without a webhook by using a polling mechanism, as demonstrated in YK Dojo’s video. Polling is not suited to a serverless approach, though: the service can scale down to zero containers (a.k.a. shut down), and then nothing is left running to poll Telegram. So I opted for a webhook instead.
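The last step of the sequence, sending the reply back with chat_id, boils down to one Bot API method, sendMessage. The project uses a Telegram SDK for this; the sketch below instead builds the raw HTTPS request with Python’s standard library, just to show what goes over the wire. The token is a placeholder, and the function only builds the request without sending it.

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def build_send_message(token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage call as a JSON POST request."""
    url = f"{TELEGRAM_API}/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: reply to chat 42 (the token is a placeholder, not a real bot token).
request = build_send_message("123456:PLACEHOLDER", 42, "Today's prices: ...")
# urllib.request.urlopen(request, timeout=10) would actually send it.
```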

The Server

Each part of the tech stack is explained in its own subsection.

Language

I chose Python. Why? Not because I’m a Python expert; I built my career on 3 years of coding in Java. But for Telegram bots, most of the chat bot tutorials I found use Python, including the tutorial from YK Dojo’s channel (presented by Jacob from ClarityCoders).

Framework

I use FastAPI because of its simplicity. Unlike Django, you can start a FastAPI project with literally one Python file. That makes FastAPI very tempting for a small project like this.

TechEmpower Web Framework Benchmarks, Round 20 (2021-02-08), filtered for Python frameworks only.

The second reason I use FastAPI is that it’s one of the fastest web frameworks for Python (see the screenshot above). Combined with the Uvicorn server, it’s about as fast as you can get in Python. Sure, there are faster languages out there, but since I had decided on Python, FastAPI was the best choice for performance. Sorry, Flask.

One thing that may be difficult to find is the structure of the payload from the Telegram server. After looking through several posts, I found this post from Kendrew C and a blog post from dicoffeean (in Indonesian) that describe the structure of the payload from Telegram. From those two articles, I learned where chat_id and text are located.
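For reference, here’s a minimal sketch of pulling chat_id and text out of a webhook update, assuming the update has already been parsed into a dict (in FastAPI, for example, via `await request.json()`). The function name is hypothetical; the nesting (`message` → `chat` → `id`, plus `message` → `text`) follows the Update and Message objects of the Telegram Bot API.

```python
from typing import Optional, Tuple


def extract_chat_and_text(update: dict) -> Optional[Tuple[int, str]]:
    """Return (chat_id, text) from a Telegram webhook update, or None if either is missing."""
    # New messages arrive under "message"; edited ones under "edited_message".
    message = update.get("message") or update.get("edited_message")
    if not message:
        return None
    chat_id = message.get("chat", {}).get("id")
    text = message.get("text")
    if chat_id is None or text is None:
        # e.g. a sticker or photo, which carries no "text" field
        return None
    return chat_id, text


# A trimmed-down update, shaped like what Telegram POSTs to the webhook.
sample_update = {
    "update_id": 1,
    "message": {
        "message_id": 7,
        "chat": {"id": 42, "type": "private"},
        "text": "/update",
    },
}
print(extract_chat_and_text(sample_update))  # (42, '/update')
```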

Data Source

The most important part of this chat bot is, of course, the prices. Where do I get them?

Many people think that prices have to be extracted from a web page by crawling. However, I took another approach: using the Network tab in Developer Tools. My approach is inspired by Hussein Nasser’s Dev Tools playlist, where he judges whether a website is efficient based on its network calls. We won’t go that far here, though.

Debugging network calls in Segari website.

What we’re interested in is the prices. In Segari, the endpoint is easy to find. In the screenshot, you can see that I went through each request until I found a JSON response with the price data. Once I found the request, I copied it as a cURL command into Postman and checked whether I could reduce the headers or parameters sent. It turns out that some headers don’t need to be sent.
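Trimming headers in Postman is trial and error: remove one header, resend, and check whether the price data still comes back. Below is a minimal sketch of automating that loop. `still_works` is a hypothetical callback you would implement to replay the request and inspect the response, and the header names in the usage example are made up.

```python
from typing import Callable, Dict


def minimize_headers(
    headers: Dict[str, str],
    still_works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Greedily drop headers one at a time, keeping only those the endpoint needs.

    still_works(candidate) should replay the request with `candidate` as its
    headers and return True if the response still contains the price data.
    """
    kept = dict(headers)
    for name in list(headers):
        candidate = {k: v for k, v in kept.items() if k != name}
        if still_works(candidate):
            kept = candidate  # the endpoint did not need this header
    return kept


# Usage with a fake check that only needs two (made-up) headers.
copied_from_devtools = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "session=abc",
    "Referer": "https://example.com/search",
    "X-Api-Source": "web",
}
required = {"User-Agent", "X-Api-Source"}
minimal = minimize_headers(copied_from_devtools, lambda h: required <= set(h))
```

Because headers are removed one at a time, every header left in the result is needed given the others, although it is not guaranteed to be the smallest possible set.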

For Tokopedia, the method is similar. However, in Tokopedia you have to debug the product details page, whereas in Segari you debug the search page. Why? Because Segari doesn’t have a product details page; every detail is on that search page (a PWA approach).

“PDPGetLayoutQuery” endpoint is where price and stock are returned to the frontend.

It seems that I found the right request. The problem is that it returns too much data. How do I reduce the response?

“PDPGetLayoutQuery” is actually a GraphQL endpoint.

Upon debugging in Postman, I found out that it’s actually a GraphQL query! Wow, a blessing in disguise! So I only needed to modify the GraphQL query (trimming the fields requested in the response), and we were good.
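Trimming works because a GraphQL server returns only the fields listed in the query’s selection set. Here’s a minimal sketch of a request body with a reduced selection set, using the common GraphQL-over-HTTP JSON shape (`operationName`, `variables`, `query`). Only the operation name PDPGetLayoutQuery comes from the post; the variables and fields are hypothetical stand-ins, not Tokopedia’s real schema.

```python
import json

# Only the operation name comes from the post; the variables and fields are
# hypothetical placeholders, not Tokopedia's actual GraphQL schema.
TRIMMED_QUERY = """
query PDPGetLayoutQuery($productKey: String!, $shopDomain: String!) {
  product(productKey: $productKey, shopDomain: $shopDomain) {
    price
    stock
  }
}
"""


def build_graphql_body(product_key: str, shop_domain: str) -> bytes:
    """Build a GraphQL-over-HTTP JSON body that asks only for price and stock."""
    payload = {
        "operationName": "PDPGetLayoutQuery",
        "variables": {"productKey": product_key, "shopDomain": shop_domain},
        "query": TRIMMED_QUERY,
    }
    return json.dumps(payload).encode("utf-8")
```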

The most challenging one is Shopee. I almost didn’t include Shopee in the list of price providers, because it wasn’t clear which network call actually carries the data the chat bot needs.

Extra step in Shopee website debugging by searching the price.

So I did something “slightly” different. I searched for the price (e.g. 112800, according to the product details page) in the Network tab, hoping to find the network call. It turned out that there were several results. I decided to use the get_purchase_quantities_for_selected_model request, and look what I found! The price and stock are there.

Response breakdown from Shopee’s “get_purchase_quantities_for_selected_model” endpoint.
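The Network tab’s search is essentially a text search over every response. Once you’ve saved a candidate response as JSON, a small helper like the one below (a hypothetical sketch, not part of the project) can show exactly where a known value such as 112800 sits, which makes it easier to decide which fields to read. The sample response is made up, and note that it matches by value, so the string "112800" would not match the number 112800.

```python
from typing import Any, List


def find_value_paths(node: Any, target: Any, path: str = "$") -> List[str]:
    """Return JSONPath-like paths to every place `target` appears in parsed JSON."""
    matches: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            matches += find_value_paths(value, target, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            matches += find_value_paths(value, target, f"{path}[{index}]")
    elif node == target:
        matches.append(path)
    return matches


# Made-up response shape, just to show the output format.
sample = {"data": {"models": [{"price": 112800, "stock": 35}, {"price": 99000, "stock": 0}]}}
print(find_value_paths(sample, 112800))  # ['$.data.models[0].price']
```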

So there we have it. Three providers, three different API endpoints, and therefore three data sources.
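Since the three providers return data in different shapes, it helps to normalize each result into one common record before building the reply. The sketch below assumes a hypothetical `Quote` record (provider, product, price in Rupiah, stock) and formats prices the Indonesian way, with `.` as the thousands separator. The message layout is illustrative, not the bot’s actual (Indonesian) reply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quote:
    provider: str         # e.g. "Segari", "Tokopedia", "Shopee"
    product: str
    price: Optional[int]  # in Rupiah; None if the provider call failed
    stock: Optional[int]


def format_rupiah(amount: int) -> str:
    """Format an integer amount Indonesian-style, with '.' as the thousands separator."""
    return "Rp" + f"{amount:,}".replace(",", ".")


def format_update(quotes: List[Quote]) -> str:
    """Merge quotes from every provider into one reply, tolerating failed providers."""
    lines = []
    for q in quotes:
        if q.price is None:
            lines.append(f"{q.provider} - {q.product}: unavailable")
        else:
            lines.append(f"{q.provider} - {q.product}: {format_rupiah(q.price)} (stock: {q.stock})")
    return "\n".join(lines)
```

Keeping a failed provider as an “unavailable” line, rather than failing the whole reply, means one broken endpoint doesn’t hide the prices from the other two.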

Database

For the database, I opted for something new to me: CockroachDB. Why? Simply because CockroachDB’s free plan gives me 5 GB / month.

“But doesn’t MongoDB Atlas also have a free plan?” Yes, it does. But here’s the thing: in my previous project, I used TypeScript, and I already had experience running MongoDB with TypeScript before that project, so it was easy to set up. But for Python? My experience with databases in Python is with the peewee library, so I thought I’d better stick with that. The problem with peewee is that it limits my options for free database plans (peewee is an ORM for SQL databases, so MongoDB is out), so I opted for CockroachDB. I’m aware that CockroachDB is NewSQL, but that’s not why I chose it, although, to be fair, if the project goes viral, I won’t need to take care of sharding myself.
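The /subscribe and /unsubscribe commands essentially need a table of subscribed chat IDs. The project uses peewee with CockroachDB; to keep the sketch runnable anywhere, the code below uses Python’s built-in sqlite3 as a stand-in, and the table and function names are hypothetical. Note that `INSERT OR IGNORE` is SQLite syntax; on CockroachDB (or PostgreSQL) the equivalent is `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a single subscribers table keyed by Telegram chat_id."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS subscribers (chat_id INTEGER PRIMARY KEY)")
    return conn


def subscribe(conn: sqlite3.Connection, chat_id: int) -> bool:
    """Add a subscriber; return False if the chat was already subscribed."""
    with conn:  # commits on success
        cur = conn.execute("INSERT OR IGNORE INTO subscribers (chat_id) VALUES (?)", (chat_id,))
    return cur.rowcount == 1


def unsubscribe(conn: sqlite3.Connection, chat_id: int) -> bool:
    """Remove a subscriber; return False if the chat was not subscribed."""
    with conn:
        cur = conn.execute("DELETE FROM subscribers WHERE chat_id = ?", (chat_id,))
    return cur.rowcount == 1


def all_subscribers(conn: sqlite3.Connection) -> List[int]:
    """Chat IDs that should receive the daily update."""
    return [row[0] for row in conn.execute("SELECT chat_id FROM subscribers ORDER BY chat_id")]
```

Making subscribe and unsubscribe idempotent means a user sending /subscribe twice doesn’t get two daily messages.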

The Infrastructure

Google Cloud Run

To quote Google Cloud documentation:

Cloud Run is a managed compute platform that enables you to run containers that are invocable via requests or events.

In other words, Google Cloud Run is a serverless solution from Google where we can run our containers in Google Cloud, and Google takes care of the rest (auto-scaling, etc.).

I use Google Cloud Run to serve requests from the Telegram server. Cloud Run differs from AWS Lambda in one key way: it is designed to run containers, whereas AWS Lambda (and Google Cloud Functions) is designed to run a single “function”.

One thing I like about Google Cloud Run is that it already sets up HTTPS for us. That’s important for chat bot development, as Telegram only allows HTTPS webhooks to be registered (it doesn’t accept an HTTP URL as a webhook).

To set up Google Cloud Run, I use Terraform. I only configure the container image name (the image is uploaded to Google Container Registry), the exposed port, and the environment variables.

You can see the configuration here:

sembako-bot/02-cloud-run.tf at main · iamdejan/sembako-bot
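On the application side, the container has to pick up what the Terraform config sets: the port and the environment variables. Cloud Run passes the port to listen on through the `PORT` environment variable (8080 by default). Here’s a minimal sketch of loading those settings at startup; `TELEGRAM_BOT_TOKEN` and `DATABASE_URL` are hypothetical variable names, not necessarily the ones the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    port: int
    bot_token: str
    database_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read runtime settings from the environment variables set in the Cloud Run config."""
    token = env.get("TELEGRAM_BOT_TOKEN")  # hypothetical variable name
    db_url = env.get("DATABASE_URL")       # hypothetical variable name
    missing = [
        name
        for name, value in (("TELEGRAM_BOT_TOKEN", token), ("DATABASE_URL", db_url))
        if not value
    ]
    if missing:
        # Fail fast at startup instead of on the first webhook call.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Cloud Run tells the container which port to listen on via PORT.
    return Settings(port=int(env.get("PORT", "8080")), bot_token=token, database_url=db_url)
```

With FastAPI, the port could then be handed to Uvicorn, e.g. `uvicorn.run(app, host="0.0.0.0", port=settings.port)`, so the server listens where Cloud Run expects it.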

Google Cloud Scheduler

Google Cloud Scheduler is a managed scheduling service from Google that can call an HTTP endpoint periodically.

It’s easy to set up (especially with Terraform): I just need to set the schedule in cron notation, as well as the timezone. In this case, I use Jakarta’s timezone (Asia/Jakarta).

You can see the configuration here:

sembako-bot/03-cloud-scheduler.tf at main · iamdejan/sembako-bot
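To see what the cron expression and the timezone mean together, here’s a minimal sketch that computes the next firing time of a simple daily schedule in Jakarta time. The schedule "0 7 * * *" (07:00 every day) is a hypothetical example, not the bot’s actual schedule, and the function only handles the daily `M H * * *` form rather than full cron syntax. Since Asia/Jakarta (WIB) is UTC+7 with no daylight saving time, a fixed offset is enough here.

```python
from datetime import datetime, timedelta, timezone

# Asia/Jakarta (WIB) is UTC+7 with no daylight saving time.
WIB = timezone(timedelta(hours=7), "WIB")


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Next firing time of a daily cron of the form "M H * * *", in Jakarta time.

    `now` must be timezone-aware. This is not a full cron parser.
    """
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' schedules are supported")
    local_now = now.astimezone(WIB)
    candidate = local_now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot has passed; fire tomorrow
    return candidate
```

The point of setting the timezone explicitly is visible here: 07:00 in Jakarta is 00:00 UTC, so without the timezone the same cron string would fire seven hours later than intended.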

Deployment with Terraform

I always provision the infrastructure with Terraform. That’s a no-brainer for me. Without Terraform, one mis-click can lead to misconfiguration, while using Terraform ensures that my configuration is well-documented and can be replicated (if needed).

As for the deployment, I had to make at least two minor adjustments:

  1. I always replace Cloud Run every time I apply the configuration. Replacing here means deleting the existing Cloud Run resource and then creating a new one. Why did I do this? Because during testing, I found that Terraform sometimes got a 409 error when applying: it tried to create a new version of the Cloud Run deployment (with the same name), but that name already existed. So my workaround is to replace the resource, which results in 2–3 minutes of downtime. I guess that’s okay, since Telegram will retry the webhook API call*.
  2. I set the webhook after each deployment. You can actually see the Terraform configuration here. I shouldn’t have to do this, but related to point 1, if the Cloud Run URL changes (for whatever reason), I would have to set the webhook manually. I thought it would be better to set the webhook automatically whenever the Cloud Run URL changes.

* I actually don’t know how long Telegram will retry. During testing, when I had a bug that resulted in Internal Server Error, Telegram still retried after the hotfix was deployed (5 minutes after the bug was pushed to production).
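Setting the webhook is a single Bot API call, setWebhook, with the service URL as its `url` parameter. The project does this from Terraform; here’s a minimal Python sketch of the same request, which also enforces Telegram’s HTTPS requirement. The `/webhook` path and the example service URL are hypothetical, and the function only builds the request.

```python
import urllib.parse
import urllib.request


def build_set_webhook(token: str, service_url: str, path: str = "/webhook") -> urllib.request.Request:
    """Build (but do not send) a Bot API setWebhook call pointing Telegram at the service URL."""
    webhook_url = service_url.rstrip("/") + path
    if urllib.parse.urlsplit(webhook_url).scheme != "https":
        # Telegram only accepts HTTPS webhooks; Cloud Run URLs are HTTPS out of the box.
        raise ValueError("Telegram webhooks must use HTTPS")
    query = urllib.parse.urlencode({"url": webhook_url})
    return urllib.request.Request(f"https://api.telegram.org/bot{token}/setWebhook?{query}")


# Hypothetical Cloud Run URL and placeholder token.
req = build_set_webhook("123:ABC", "https://sembako-bot-abc123-as.a.run.app/")
# urllib.request.urlopen(req, timeout=10) would register the webhook.
```

Running this after every deployment (as the Terraform setup does) makes a changed Cloud Run URL harmless, since the webhook is simply re-pointed each time.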

Conclusion

So, the high-level architecture is simple: we just need Cloud Run and Cloud Scheduler. As for the Telegram SDK, it was quite easy as well, although the webhook was slightly more complicated to figure out.

As for future improvements, I have two ideas:

  1. I will compare Google Cloud with Amazon Web Services by migrating the resources I use, and see how the equivalent resources stack up against one another.
  2. Set up a dashboard with Grafana and Prometheus (or InfluxDB if that’s too complicated). The dashboard could compare prices over a given time range. This is more complex and certainly requires more resources (time, money, etc.) to set up.

Here is the final source code if you want to read all of it.

GitHub – iamdejan/sembako-bot: A chat bot to send sembako price periodically.

NOTE: If you’re Indonesian and you want to keep this bot running, you can find instructions by typing the /donate command in the bot. The bot will list all the available donation methods. Every donation, however small, helps me maintain this project, especially paying for the cloud services.
