During projects, developers often have to make the impossible come true: they dig up non-standard solutions, work a little magic over the code, and delight everyone with the result.
Nikita Obukhov, Technical Director of YuSMP Group, shares one story from the company's practice. The team had to build a streaming service with different access levels while maintaining the best possible communication quality. Read on to find out how it turned out.
How the system works
We developed a project for video streamers: speakers could run broadcasts for a large audience, webinars for a few people, and personal online meetings with individual users. Each type of broadcast also had its own chat: public, group, or private, respectively.
In total, there were three roles in the project: Streamer, User, and Administrator. We implemented both the client part and the admin panel.
For streaming, a third-party program was used: OBS Studio or anything compatible with it. In the application, each streamer received the address of the server their stream would be broadcast to and a secret authorization key, and entered both into OBS before their first broadcast. As soon as the streamer pressed Start in OBS, the backend detected that the broadcast had begun, and the streamer appeared on the main page as Online.
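The article does not show what these credentials looked like, so here is a small hypothetical TypeScript sketch of how the backend might issue them; the RTMP host, field names, and key format are assumptions for illustration only.

```typescript
import { randomBytes } from "crypto";

// Hypothetical per-streamer ingest credentials. The streamer pastes these two
// values into OBS under Settings -> Stream before the first broadcast.
interface IngestCredentials {
  serverUrl: string; // OBS "Server" field
  streamKey: string; // OBS "Stream Key" field (the secret authorization key)
}

function issueIngestCredentials(streamerId: string): IngestCredentials {
  return {
    serverUrl: "rtmp://ingest.example.com/live",                   // placeholder host
    streamKey: `${streamerId}-${randomBytes(16).toString("hex")}`, // random secret
  };
}
```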
Problems and solutions
Broadcast quality and speed
For watching the broadcasts, we chose the HLS format. Video is delivered in real time in three quality options: high, medium, and low; the live stream from OBS was transcoded into these three variants. In the player, viewers can switch between qualities, and there is also an Auto mode in which the player itself picks the appropriate resolution based on the viewer's internet speed. After transcoding, all video fragments were pushed to a CDN to give the best speed to users all over the world.
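To illustrate this player behavior, here is a minimal sketch using the open-source hls.js library (the article does not say which player was actually used); the manifest URL and element id are placeholders.

```typescript
import Hls from "hls.js";

// Minimal sketch: attach an HLS stream to a <video> element and expose
// quality switching between the transcoded renditions.
const video = document.getElementById("player") as HTMLVideoElement;

if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource("https://cdn.example.com/streams/123/master.m3u8");
  hls.attachMedia(video);

  hls.on(Hls.Events.MANIFEST_PARSED, () => {
    // One entry per rendition (high / medium / low).
    console.log(hls.levels.map((level) => `${level.height}p`));
  });

  // -1 means "Auto": hls.js picks a rendition based on measured bandwidth.
  hls.currentLevel = -1;
  // Or pin a specific rendition chosen by the viewer:
  // hls.currentLevel = 0;
}
```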
However, HLS has one significant drawback: serious latency, about 10-15 seconds. And although we implemented the latest extension of the protocol, Low-Latency HLS, the lag remained noticeable, as if the streamer were on the Moon. Such a delay is acceptable for public broadcasts, but it is a real problem for broadcasts where the streamer communicates directly with viewers in group and personal chats.
For those types of broadcasts, another technology was needed: WebRTC, which allows streaming directly from the browser. This protocol was designed specifically for live communication: it has very low latency and is used, for example, in Google Meet. The price of that speed is picture quality, which is noticeably worse than with HLS.
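To show what streaming directly from the browser looks like, here is a bare-bones WebRTC publishing sketch; the signaling function is assumed, since the actual signaling mechanism is project-specific and not described in the article.

```typescript
// Bare-bones WebRTC publish sketch: capture the camera and offer the tracks
// to a peer (or a media server). How the SDP reaches the other side is up to
// the project's signaling layer, represented here by the `signal` callback.
async function startWebRtcBroadcast(
  signal: (offer: RTCSessionDescriptionInit) => Promise<RTCSessionDescriptionInit>
): Promise<RTCPeerConnection> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Send the offer through the signaling channel and apply the answer.
  const answer = await signal(offer);
  await pc.setRemoteDescription(answer);

  return pc;
}
```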
Protocol conflict
Then we faced a new problem. When a personal chat starts, the streamer still has OBS running, and the camera is already occupied by that program. To change the type of broadcast, the streamer would have had to shut down OBS and start it again for every personal chat.
The OBS Virtual Camera plugin solved this. Each streamer had to install it before their first broadcast. The plugin creates several virtual cameras that duplicate the OBS stream; if any filters are applied in OBS, they are reproduced in the virtual cameras as well. When starting a broadcast in the browser, the streamer simply selected a free virtual camera and streamed from it. That solved the problem of the device being occupied by another program.
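Here is a rough sketch of how the browser side might pick that virtual camera; matching the device by its "OBS Virtual Camera" label is an assumption, since the exact label depends on the OS and plugin version.

```typescript
// Sketch: list video inputs and prefer the OBS virtual camera if present.
async function getVirtualCameraStream(): Promise<MediaStream> {
  // Device labels are only populated after the user grants camera permission.
  const probe = await navigator.mediaDevices.getUserMedia({ video: true });
  probe.getTracks().forEach((track) => track.stop());

  const devices = await navigator.mediaDevices.enumerateDevices();
  const virtualCam = devices.find(
    (d) => d.kind === "videoinput" && /OBS Virtual Camera/i.test(d.label)
  );

  return navigator.mediaDevices.getUserMedia({
    video: virtualCam ? { deviceId: { exact: virtualCam.deviceId } } : true,
    audio: true,
  });
}
```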
High CPU load
It is worth noting that the CPU load during streaming was quite high, especially at 4K, so streamers need a powerful PC.
The WebRTC protocol lets participants call each other without a central server (mesh architecture). Despite the obvious appeal of this architecture, it has significant drawbacks: a very high CPU load, since transcoding happens on the user's device (if a 4K image comes from the camera, it has to be compressed into 720p and 1080p variants to keep latency low). In addition, this option can be unreliable for some users because of aggressive firewalls.
Therefore, we used the SFU architecture, with a central server handling transcoding and delivery to end users. One of its disadvantages is the price, which grows quickly with the number of participants in a WebRTC broadcast.
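To make the mesh-versus-SFU difference concrete, here is a tiny illustrative calculation (not from the article): in a full mesh every participant connects to every other participant and uploads its stream to each of them, while an SFU needs only one connection per client.

```typescript
// Illustrative connection math behind the mesh-vs-SFU trade-off.
const participants = 6;
const meshUplinksPerClient = participants - 1;                    // each client encodes and uploads 5 copies
const meshConnections = (participants * (participants - 1)) / 2;  // 15 peer-to-peer connections in total
const sfuConnections = participants;                              // 6 client-to-server connections
console.log({ meshUplinksPerClient, meshConnections, sfuConnections });
```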
Partly because of this, it was decided to make personal broadcasts paid. And since OBS kept running during personal or group chats, we had to close access to the HLS stream; otherwise someone could decide to save money and watch the stream in any third-party player. So at the start of a personal chat, we simply disabled HLS transcoding, as sketched below.
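A heavily simplified, hypothetical sketch of that switch might look like this; the media-server endpoint, backend route, and field names are all invented for illustration, since the article does not describe the actual API.

```typescript
// Hypothetical access-control step: when a paid personal chat starts, stop
// publishing HLS so the stream cannot be watched for free in a third-party
// player, and mark the broadcast as private so viewers switch to WebRTC.
async function startPersonalChat(streamId: string): Promise<void> {
  // 1. Disable HLS transcoding for this stream (no more public playlist output).
  await fetch(`https://media.example.com/api/streams/${streamId}/hls`, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${process.env.MEDIA_API_TOKEN}` },
  });

  // 2. Flag the broadcast as a personal (paid) session in the backend.
  await fetch(`https://backend.example.com/api/broadcasts/${streamId}`, {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ mode: "personal", hlsEnabled: false }),
  });
}
```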
Lots of real-time events
There were quite a lot of events in this system for users of different roles.
The WebSocket protocol is often used to build such real-time systems: each client (a browser or mobile application) establishes a WebSocket connection and sends or receives messages over it. The downside is that the number of connections to the server doubles (one HTTPS connection plus one WebSocket connection per client). In addition, WebSockets are much harder to scale (to spread the load across servers or by the user's region) and are vulnerable to DoS attacks.
The alternative, oddly enough, is HTTP itself. The protocol has supported so-called Server-Sent Events (SSE) since time immemorial, but for some reason this part of the web, laid down by its wise founding fathers, is rarely used. The Mercure project breathed new life into the technology. Thanks to multiplexing in HTTP/2, a single connection between client and server is enough: push events from the server travel over the same socket as regular HTTP requests!
In addition to halving the number of connections to the server, SSE scales much more easily, just like regular HTTP, and the client-side implementation takes exactly two lines.
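Here is what those two lines look like with the standard EventSource API, subscribing to a Mercure hub (the hub URL and topic are placeholders).

```typescript
// The "two lines" on the client: subscribe to a Mercure hub over SSE.
const url = new URL("https://example.com/.well-known/mercure");
url.searchParams.append("topic", "https://example.com/users/42");

const events = new EventSource(url, { withCredentials: true }); // sends the authorization cookie
events.onmessage = (e) => console.log("push from server:", JSON.parse(e.data));
```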
Mercure.rocks lets the client subscribe to one or more channels (for example, private or public ones). Authorization uses a cookie that carries a JWT with the list of channels available to the user; the backend server generates this JWT. Mercure.rocks also provides a REST API and plenty of logs for debugging connections. Through the API, you can always see which users are subscribed to which channels, which in particular makes it easy to count who is online and who is in which section. And all of this in real time!
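On the backend side, that JWT might be produced roughly like this; the sketch uses the jsonwebtoken npm package, and the signing key and topic URIs are placeholders.

```typescript
import jwt from "jsonwebtoken";

// Backend-side sketch: issue a JWT whose "mercure.subscribe" claim lists the
// channels (topics) this user may listen to. The hub reads it from the
// authorization cookie and enforces the list.
function buildMercureSubscriberToken(userId: string): string {
  const claims = {
    mercure: {
      subscribe: [
        "https://example.com/public",           // public channel
        `https://example.com/users/${userId}`,  // private channel for this user
      ],
    },
  };
  return jwt.sign(claims, process.env.MERCURE_SUBSCRIBER_JWT_KEY!, { algorithm: "HS256" });
}
```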
To be continued
By the way, speaking of real time: as I mentioned, the project also had regular text chats. We used the Firebase Realtime Database for them, but that is a completely different story, which we will tell in the following articles. Stay tuned!