Introduction to the Telegram API

Analyse your conversation history on Telegram programmatically

Telegram is an instant messaging service just like WhatsApp, Facebook Messenger and WeChat. It has gained popularity in recent years for various reasons: its non-profit nature, cross-platform support, promises of security¹, and its open APIs.

In this post, we’ll use Telethon, a Python client library for the Telegram API, to count the number of messages in each of our Telegram chats.

Telegram APIs

“Telegram has an open API and protocol free for everyone.” (Telegram homepage)

The better known of Telegram’s APIs is its Bot API, an HTTP-based API for developers to interact with the bot platform. The Bot API allows developers to control Telegram bots, for example receiving messages and replying to other users.

Besides the Bot API, there’s also the Telegram API itself. This is the API used by Telegram apps for all your actions on Telegram. To name a few: viewing your chats, sending and receiving messages, changing your display picture or creating new groups. Through the Telegram API, you can programmatically do anything you can do in a Telegram app.

The Telegram API is a lot more complicated than the Bot API. You can access the Bot API through HTTP requests with standard JSON, form or query string payloads, while the Telegram API uses its own custom payload format and encryption protocol.
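To make the contrast concrete, here is a minimal sketch of how simple a Bot API call is to put together: the method name and the bot token go into the URL, and the parameters can travel as an ordinary query string. The helper name build_bot_api_url and the token shown are made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

BOT_API_BASE = 'https://api.telegram.org'


def build_bot_api_url(token, method, **params):
    """Build the URL for a Bot API call with query string parameters."""
    url = '{}/bot{}/{}'.format(BOT_API_BASE, token, method)
    if params:
        url += '?' + urlencode(params)
    return url


# Placeholder token, not a real bot
url = build_bot_api_url('123456:EXAMPLE-TOKEN', 'sendMessage', chat_id=42, text='hello')
print(url)
```

Any HTTP client could then fetch that URL. The Telegram API proper has no equivalent shortcut, which is why we rely on a client library in the rest of this post.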

The Telegram API

Diagram for MTProto server-client flow

MTProto is the custom encryption scheme which backs Telegram’s promises of security. It is an application-layer protocol that can run over several underlying transports, such as TCP, UDP or HTTP. Fortunately, we don’t need to concern ourselves with it directly when using a client library. On the other hand, we do need to understand the payload format in order to make API calls.

Type Language

The Telegram API is RPC-based, so interacting with the API involves sending a payload representing a function invocation and receiving a result. For example, reading the contents of a conversation involves calling the messages.getHistory function with the necessary parameters and receiving a messages.Messages in return.

Type Language, or TL, is used to represent types and functions used by the API. A TL-Schema is a collection of available types and functions. In MTProto, TL constructs are serialised into binary form before being embedded as the payload of MTProto messages. However, we can leave this to the client library we will be using.

An example of a TL-Schema (types are declared first, followed by functions with a separator in between):

auth.sentCode#efed51d9 phone_registered:Bool phone_code_hash:string send_call_timeout:int is_password:Bool = auth.SentCode;
auth.sentAppCode#e325edcf phone_registered:Bool phone_code_hash:string send_call_timeout:int is_password:Bool = auth.SentCode;

---functions---

auth.sendCode#768d5f4d phone_number:string sms_type:int api_id:int api_hash:string lang_code:string = auth.SentCode;
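Each definition in the schema follows the same shape: a name, a hexadecimal identifier after the #, a list of name:type parameters, and the result type after the equals sign. As a rough illustration, the sketch below pulls one line apart with plain string operations. The function parse_tl_definition is a hypothetical helper, not part of Telethon, and it only covers simple one-line definitions like the ones above.

```python
def parse_tl_definition(line):
    """Split one TL-Schema line into its name, id, parameters and result type."""
    definition, result_type = line.strip().rstrip(';').split(' = ')
    head, *fields = definition.split()
    name, _, hex_id = head.partition('#')
    # Each remaining field has the form name:type
    params = [tuple(field.split(':', 1)) for field in fields]
    return {
        'name': name,
        'id': int(hex_id, 16),
        'params': params,
        'result': result_type,
    }


send_code = parse_tl_definition(
    'auth.sendCode#768d5f4d phone_number:string sms_type:int '
    'api_id:int api_hash:string lang_code:string = auth.SentCode;'
)
print(send_code['name'], hex(send_code['id']), send_code['result'])
for param_name, param_type in send_code['params']:
    print('  {}: {}'.format(param_name, param_type))
```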

A TL function invocation and result using functions and types from the above TL-Schema, and equivalent binary representation (from the official documentation):

(auth.sendCode "79991234567" 1 32 "test-hash" "en")
=
(auth.sentCode
phone_registered:(boolFalse)
phone_code_hash:"2dc02d2cda9e615c84"
)

d16ff372 3939370b 33323139 37363534 00000001 00000020 73657409 61682d74 00006873 e77e812d
=
2215bcbd bc799737 63643212 32643230 39616463 35313665 00343863 e12b7901
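Part of the binary request above can be reproduced by hand, which helps show what the client library does for us. In TL serialisation, an int is written as four little-endian bytes, and a short string is written as a one-byte length followed by its bytes, padded with zeros to a multiple of four. The sketch below applies those two rules to the first four arguments of the call and prints them as 32-bit little-endian words, the way the hex dump is displayed. The helper names are made up, and the leading constructor word and final word of the dump are deliberately left out.

```python
import struct


def tl_int(value):
    """Serialise a TL int as four little-endian bytes."""
    return struct.pack('<i', value)


def tl_string(text):
    """Serialise a TL string: length prefix, data, zero padding to 4 bytes."""
    data = text.encode('utf-8')
    if len(data) <= 253:
        header = bytes([len(data)])
    else:
        # Longer strings use a 0xfe marker and a three-byte length
        header = b'\xfe' + len(data).to_bytes(3, 'little')
    body = header + data
    return body + b'\x00' * (-len(body) % 4)


def as_words(payload):
    """Format bytes as space-separated 32-bit little-endian hex words."""
    words = struct.unpack('<{}I'.format(len(payload) // 4), payload)
    return ' '.join('{:08x}'.format(word) for word in words)


params = tl_string('79991234567') + tl_int(1) + tl_int(32) + tl_string('test-hash')
print(as_words(params))
```

The printed words match the second to ninth words of the request shown above.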

TL-Schema layers

The Telegram API is versioned using TL-Schema layers; each layer has a unique TL-Schema. The Telegram website contains the current TL-Schema and previous layers at https://core.telegram.org/schema.

Or so it seems. Although the latest TL-Schema layer listed on the Telegram website is Layer 23, at the time of writing the latest layer is actually already Layer 71. You can find the latest TL-Schema here instead.

Getting started

Creating a Telegram application

You will need to obtain an api_id and api_hash to interact with the Telegram API. Follow the directions from the official documentation here: https://core.telegram.org/api/obtaining_api_id.

You will have to visit https://my.telegram.org/, log in with your phone number and the confirmation code which will be sent to you on Telegram, and fill in the form under “API Development Tools” with an app title and short name. Afterwards, you can find your api_id and api_hash at the same place.

Alternatively, the same instructions mention that, for testing, you can use the sample credentials found in the source code of Telegram apps. For convenience, I’ll be using the credentials I found in the Telegram Desktop source code on GitHub in the sample code here.
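If you do use your own credentials, you may prefer not to hard-code them in a script. One possible approach, sketched below, is to read them from environment variables. The variable names TELEGRAM_API_ID and TELEGRAM_API_HASH and the helper load_credentials are my own choices for illustration, not something Telegram or Telethon prescribes.

```python
import os


def load_credentials(env=os.environ):
    """Read api_id and api_hash from an environment-style mapping."""
    try:
        api_id = int(env['TELEGRAM_API_ID'])
        api_hash = env['TELEGRAM_API_HASH']
    except KeyError as missing:
        raise RuntimeError(
            'missing environment variable: {}'.format(missing.args[0])
        ) from None
    return api_id, api_hash


# Demonstration with an explicit mapping and dummy values
api_id, api_hash = load_credentials(
    {'TELEGRAM_API_ID': '12345', 'TELEGRAM_API_HASH': 'abcdef'}
)
print(api_id, api_hash)
```

Calling load_credentials() with no arguments reads from the real environment instead.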

Installing Telethon

We’ll be using Telethon to communicate with the Telegram API. Telethon is a client library for the Telegram API written for Python 3 (which means you will have to use Python 3). It handles all the protocol-specific tasks for us, so we only need to know which types to use and which functions to call.

You can install Telethon with pip:

pip install telethon

Use the pip corresponding to your Python 3 interpreter; this may be pip3 instead. (Random: Recently Ubuntu 17.10 was released, and it uses Python 3 as its default Python installation.)

Creating a client

Before you can start interacting with the Telegram API, you need to create a client object with your api_id and api_hash and authenticate it with your phone number. This is similar to logging in to Telegram on a new device; you can imagine this client as just another Telegram app.

Below is some code to create and authenticate a client object, modified from the Telethon documentation:

from telethon import TelegramClient
from telethon.errors.rpc_errors_401 import SessionPasswordNeededError

# (1) Use your own values here
api_id = 17349
api_hash = '344583e45741c457fe1862106095a5eb'

phone = 'YOUR_NUMBER_HERE'
username = 'username'

# (2) Create the client and connect
client = TelegramClient(username, api_id, api_hash)
client.connect()

# Ensure you're authorized
if not client.is_user_authorized():
    client.send_code_request(phone)
    try:
        client.sign_in(phone, input('Enter the code: '))
    except SessionPasswordNeededError:
        client.sign_in(password=input('Password: '))

me = client.get_me()
print(me)

As mentioned earlier, the api_id and api_hash above are from the Telegram Desktop source code. Put your own phone number into the phone variable.

Telethon will create a .session file in its working directory to persist the session details, just like how you don’t have to re-authenticate to your Telegram apps every time you close and reopen them. The file name will start with the username variable. It is up to you if you want to change it, in case you want to work with multiple sessions.

If there was no previous session, running this code will cause an authorisation code to be sent to you via Telegram. If you have enabled Two-Step Verification on your Telegram account, you will also need to enter your Telegram password. After you have authenticated once and the .session file is saved, you won’t have to authenticate again until your session expires, even if you run the script again.
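If you want to know in advance whether a run is likely to prompt for a code, you can check for the session file yourself. The sketch below assumes the file is named after the session name with a .session suffix and lives in the directory you pass in. The helper has_saved_session is hypothetical, and the demonstration uses a temporary directory rather than a real session.

```python
import os
import tempfile


def has_saved_session(name, directory='.'):
    """Return True if a '<name>.session' file exists in the directory."""
    return os.path.isfile(os.path.join(directory, name + '.session'))


# Demonstration in a throwaway directory, with an empty stand-in file
with tempfile.TemporaryDirectory() as workdir:
    before = has_saved_session('username', workdir)
    open(os.path.join(workdir, 'username.session'), 'w').close()
    after = has_saved_session('username', workdir)

print(before, after)
```

A file being present does not guarantee the session is still valid, so the is_user_authorized() check in the script is still needed.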

If the client was created and authenticated successfully, an object representing yourself should be printed to the console. It will look similar to (the ellipses mean that some content was skipped):

User(is_self=True, …, first_name='Jiayu', last_name=None, username='USERNAME', phone='PHONE_NUMBER', …)

Now you can use this client object to start making requests to the Telegram API.

Working with the Telegram API

Inspecting the TL-Schema

As mentioned earlier, using the Telegram API involves calling the available functions in the TL-Schema. In this case, we’re interested in the messages.getDialogs function. We’ll also need to take note of the relevant types in the function arguments. Here is a subset of the TL-Schema we’ll be using to make this request:

messages.dialogs#15ba6c40 dialogs:Vector<Dialog> messages:Vector<Message> chats:Vector<Chat> users:Vector<User> = messages.Dialogs;
messages.dialogsSlice#71e094f3 count:int dialogs:Vector<Dialog> messages:Vector<Message> chats:Vector<Chat> users:Vector<User> = messages.Dialogs;

---functions---

messages.getDialogs#191ba9c5 flags:# exclude_pinned:flags.0?true offset_date:int offset_id:int offset_peer:InputPeer limit:int = messages.Dialogs;

It’s not easy to read, but note that the messages.getDialogs function will return a messages.Dialogs, which is an abstract type for either a messages.dialogs or a messages.dialogsSlice object which both contain vectors of Dialog, Message, Chat and User.

Using the Telethon documentation

Fortunately, the Telethon documentation gives more details on how we can invoke this function. From https://lonamiwebs.github.io/Telethon/index.html, if you type getdialogs into the search box, you will see a result for a method called GetDialogsRequest (TL-Schema functions are represented by *Request objects in Telethon).

The documentation for GetDialogsRequest states the return type for the method as well as slightly more details about the parameters. The “Copy import to the clipboard” button is particularly useful for when we want to use this object, like right now.

https://lonamiwebs.github.io/Telethon/methods/messages/get_dialogs.html

The messages.getDialogs function as well as the constructor for GetDialogsRequest takes an offset_peer argument of type InputPeer. From the documentation for GetDialogsRequest, click through the InputPeer link to see a page describing the constructors for and methods taking and returning this type.

https://lonamiwebs.github.io/Telethon/types/input_peer.html

Since we want to create an InputPeer object to use as an argument for our GetDialogsRequest, we’re interested in the constructors for InputPeer. In this case, we’ll use the InputPeerEmpty constructor. Click through once again to the page for InputPeerEmpty and copy its import path to use it. The InputPeerEmpty constructor takes no arguments.

Making a request

Here is our finished GetDialogsRequest and how to get its result by passing it to our authorised client object:

from telethon.tl.functions.messages import GetDialogsRequest
from telethon.tl.types import InputPeerEmpty

get_dialogs = GetDialogsRequest(
    offset_date=None,
    offset_id=0,
    offset_peer=InputPeerEmpty(),
    limit=30,
)

dialogs = client(get_dialogs)
print(dialogs)

In my case, I got back a DialogsSlice object containing a list of dialogs, messages, chats and users, as we expected based on the TL-Schema:

DialogsSlice(count=204, dialogs=[…], messages=[…], chats=[…], users=[…])

Receiving a DialogsSlice instead of Dialogs means that not all my dialogs were returned, but the count attribute tells me how many dialogs I have in total. If you have fewer than a certain number of conversations, you may receive a Dialogs object instead, in which case all your dialogs were returned and the number of dialogs you have is just the length of the dialogs vector.
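The two cases can be handled in one small function. The sketch below uses namedtuples as stand-ins for the two result types so that it runs without a Telegram connection. With a real response you would check against the classes Telethon provides, and total_dialogs is a name made up for this example.

```python
from collections import namedtuple

# Stand-ins for the two possible result types, for illustration only
Dialogs = namedtuple('Dialogs', 'dialogs messages chats users')
DialogsSlice = namedtuple('DialogsSlice', 'count dialogs messages chats users')


def total_dialogs(result):
    """Return the total number of dialogs described by a getDialogs result."""
    if isinstance(result, DialogsSlice):
        # Only part of the list was returned; count holds the real total
        return result.count
    # Everything was returned, so the list length is the total
    return len(result.dialogs)


partial = DialogsSlice(count=204, dialogs=['d'] * 30, messages=[], chats=[], users=[])
complete = Dialogs(dialogs=['d'] * 12, messages=[], chats=[], users=[])
print(total_dialogs(partial), total_dialogs(complete))
```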

Terminology

The terminology used by the Telegram API may be a little confusing sometimes, especially with the lack of information other than the type definitions. What are “dialogs”, “messages”, “chats” and “users”?

  • dialogs represents the conversations from your conversation history
  • chats represents the groups and channels corresponding to the conversations in your conversation history
  • messages contains the last message sent to each conversation like you see in your list of conversations in your Telegram app
  • users contains the individual users with whom you have one-on-one chats, or who sent the last message to one of your groups

For example, suppose my chat history looked like this screenshot of the Telegram app, which I took from the Play Store:

dialogs would contain the conversations in the screenshot: Old Pirates, Press Room, Monika, Jaina…

chats would contain entries for Old Pirates, Press Room and Meme Factory.

messages would contain the messages “All aboard!” from Old Pirates, “Wow, nice mention!” from Press Room, a message representing a sent photo to Monika, a message representing Jaina’s reply and so on.

users would contain an entry for Ashley, since she sent the last message to Press Room; entries for Monika, Jaina and Kate; and an entry for Winston, since he sent the last message to Meme Factory.

(I haven’t worked with secret chats through the Telegram API yet so I’m not sure how they are handled.)

Counting messages

Our objective is to count the number of messages in each conversation. To get the number of messages in a conversation, we can use the messages.getHistory function from the TL-Schema:

messages.getHistory#afa92846 peer:InputPeer offset_id:int offset_date:date add_offset:int limit:int max_id:int min_id:int = messages.Messages

Following a similar process as previously with messages.getDialogs, we can work out how to call this with Telethon using a GetHistoryRequest. This will return either a MessagesSlice object, whose count attribute tells us how many messages there are in the conversation, or a Messages object containing all the messages in the conversation, in which case we can just count the messages it contains.

However, we will first have to construct the right InputPeer for our GetHistoryRequest. This time, we can’t use InputPeerEmpty since we want to retrieve the message history for a specific conversation. Instead, we have to use either the InputPeerUser, InputPeerChat or InputPeerChannel constructor depending on the nature of the conversation.

Manipulating the response data

In order to count the number of messages in each of our conversations, we will have to make a GetHistoryRequest for each conversation with the appropriate InputPeer.

The relevant InputPeer constructors take an id and, except for InputPeerChat, an access_hash. Depending on whether the conversation is a one-on-one chat, group or channel, these values are found in different places in the GetDialogsRequest response:

  • dialogs: a list of the conversations we want to count the messages in. Each one contains a peer value with the type and id of the peer corresponding to that conversation, but not the access_hash.
  • chats: contains the id, access_hash and titles for our groups and channels.
  • users: contains the id, access_hash and first name for our individual chats.

In pseudocode, we have:

let counts be a mapping from conversations to message counts

for each dialog in dialogs:
    if dialog.peer is a channel:
        channel = corresponding object in chats
        name = channel.title
        id = channel.id
        access_hash = channel.access_hash
        peer = InputPeerChannel(id, access_hash)
    else if dialog.peer is a group:
        group = corresponding object in chats
        name = group.title
        id = group.id
        peer = InputPeerChat(id)
    else if dialog.peer is a user:
        user = corresponding object in users
        name = user.first_name
        id = user.id
        access_hash = user.access_hash
        peer = InputPeerUser(id, access_hash)

    history = message history for peer
    count = number of messages in history

    counts[name] = count

Converting to Python code (note that dialogs, chats and users above are members of the result of our GetDialogsRequest which is also called dialogs):

from telethon.tl.functions.messages import GetHistoryRequest
from telethon.tl.types import (
    InputPeerChannel, InputPeerChat, InputPeerUser,
    PeerChannel, PeerChat, PeerUser,
)
from telethon.tl.types.messages import Messages

counts = {}

# create dictionary of ids to users and chats
users = {}
chats = {}

for u in dialogs.users:
    users[u.id] = u

for c in dialogs.chats:
    chats[c.id] = c

for d in dialogs.dialogs:
    peer = d.peer
    if isinstance(peer, PeerChannel):
        id = peer.channel_id
        channel = chats[id]
        access_hash = channel.access_hash
        name = channel.title

        input_peer = InputPeerChannel(id, access_hash)
    elif isinstance(peer, PeerChat):
        id = peer.chat_id
        group = chats[id]
        name = group.title

        input_peer = InputPeerChat(id)
    elif isinstance(peer, PeerUser):
        id = peer.user_id
        user = users[id]
        access_hash = user.access_hash
        name = user.first_name

        input_peer = InputPeerUser(id, access_hash)
    else:
        # skip any dialog whose peer type we don't recognise
        continue

    # limit=1 because we only need the count, not the messages themselves
    get_history = GetHistoryRequest(
        peer=input_peer,
        offset_id=0,
        offset_date=None,
        add_offset=0,
        limit=1,
        max_id=0,
        min_id=0,
    )

    history = client(get_history)
    if isinstance(history, Messages):
        # all messages were returned, so count them directly
        count = len(history.messages)
    else:
        # only a slice was returned; count holds the total
        count = history.count

    counts[name] = count

print(counts)

Our counts object is a dictionary of chat names to message counts. We can sort and pretty print it to see our top conversations:

sorted_counts = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for name, count in sorted_counts:
    print('{}: {}'.format(name, count))

Example output:

Group chat 1: 10000
Group chat 2: 3003
Channel 1: 2000
Chat 1: 1500
Chat 2: 300
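If you only care about your top few conversations, collections.Counter from the standard library offers a shortcut. This is a small sketch rather than part of the original script, and the counts below are just the example figures above typed in as a plain dictionary.

```python
from collections import Counter

# Example figures only, in no particular order
counts = {
    'Chat 2': 300,
    'Group chat 1': 10000,
    'Channel 1': 2000,
    'Chat 1': 1500,
    'Group chat 2': 3003,
}

# most_common(n) returns the n entries with the highest counts
top_three = Counter(counts).most_common(3)
for name, count in top_three:
    print('{}: {}'.format(name, count))
```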

Library magic

Telethon has some helper functions to simplify common operations. We could actually have done the above with two of these helper methods, client.get_dialogs() and client.get_message_history(), instead:

from telethon.tl.types import User

_, entities = client.get_dialogs(limit=30)

counts = []
for e in entities:
    if isinstance(e, User):
        name = e.first_name
    else:
        name = e.title

    count, _, _ = client.get_message_history(e, limit=1)
    counts.append((name, count))

counts.sort(key=lambda x: x[1], reverse=True)
for name, count in counts:
    print('{}: {}'.format(name, count))

However, I felt that it was a better learning experience to call the Telegram API methods directly first, especially since there isn’t a helper method for everything. Nevertheless, there are some things which are much simpler with the helper methods, such as how we authenticated our client in the beginning, or actions such as uploading files which would otherwise be tedious.

Wrapping up

The full code for this example can be found as a Gist here: https://gist.github.com/yi-jiayu/7b34260cfbfa6cbc2b4464edd41def42

There’s a lot more you can do with the Telegram API, especially from an analytics standpoint. I started looking into it after thinking about one of my older projects to try to create data visualisations out of exported WhatsApp chat histories: https://github.com/yi-jiayu/chat-analytics.

Using regex to parse the plain text emailed chat history, I could generate a chart similar to the GitHub punch card repository graph showing at what times of the week a chat was most active:

However, using the “Email chat” function to export was quite hacky: you needed to manually export the conversation history for each chat, and it would be out of date once you received a new message. I didn’t pursue the project much further, but I always wondered what other insights could be pulled from chat histories.

With programmatic access to chat histories, there’s lots more that can be done with Telegram chats. Methods such as messages.search could be exceptionally useful. Perhaps dynamically generating statistics on conversations which peak and die down, or which are consistently active, or finding your favourite emojis or most common n-grams? The sky’s the limit (or the API rate limit, whichever is lower).
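As a taste of the n-gram idea, here is a minimal sketch that counts the most common word sequences in a list of message texts. It works on plain strings, so fetching the messages from Telegram is left out, and the sample messages and the function name most_common_ngrams are invented for the example.

```python
from collections import Counter


def most_common_ngrams(messages, n=2, top=3):
    """Count runs of n consecutive words across a list of message texts."""
    counter = Counter()
    for text in messages:
        words = text.lower().split()
        # Zipping n shifted copies of the word list yields each n-gram
        counter.update(zip(*(words[i:] for i in range(n))))
    return counter.most_common(top)


sample = ['see you later', 'See you soon', 'talk to you later']
top_bigrams = most_common_ngrams(sample, n=2, top=2)
print(top_bigrams)
```

Splitting on whitespace is a crude tokeniser, so punctuation and emojis would need more care in practice.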

Updates

(2017-10-25 09:45 SGT) Modified message counting to skip unexpected dialogs

  1. ^ Personally, I can’t comment about Telegram’s security other than point out that Telegram conversations are not end-to-end encrypted by default, as well as bring up the common refrain about Telegram’s encryption protocol being self-developed and less-scrutinised compared to more-established protocols such as the Signal Protocol.
