The recent pandemic has significantly changed users’ behavior on the Internet and their expectations of web services. There is now a strong demand to integrate media technologies even into well-established businesses; yet despite their heavy use in telecom, streaming, and e-learning platforms, these technologies remain too complicated for wider adoption.
This article introduces an architecture design for a post-COVID e-commerce platform. The aim is to cover the fundamentals of media streaming and share personal experience in e-commerce solution design.
There are several parts in the series:
- Introduce the fundamentals of streaming technologies and the media pipeline
- Drill down into the architecture design of media streaming technologies
- Walk through a content management platform and video generation pipeline
A global e-commerce platform is a complex IT ecosystem: front and back offices provide a vendor portal, catalog management, multi-tenant data privacy, order management, tax calculation, drop-shipping control, prediction models built on users’ behavior, and many other services. Integrating media streaming capabilities cuts through the entire e-commerce stack, so it is not only a matter of adopting multiple multimedia technologies but also of meeting a variety of demanding technical and functional requirements, which makes it a sophisticated task.
A streaming platform designer must be deliberate when selecting target use cases and the technologies to meet them, because besides the technical complexity, streaming also carries a high infrastructure cost. Low-latency playback, effective scaling, and a video content management system are all difficult to implement efficiently. Here is the list of core use cases we picked for the post-COVID e-commerce platform:
- A user reviews a set of products in a live stream;
- Users jointly discuss a set of products in a small private group of friends;
Both use cases belong to the real-time conferencing scenario and look functionally similar at first glance, but they rely on entirely different technologies and requirements. They mix several media technologies, so it is worth starting with an overview of the possible streaming modes, introducing fundamental definitions, and explaining a general media pipeline.
The noted use cases correspond to the live and group call streaming modes of the real-time conferencing scenario. A live streaming session has a single event host who streams video/audio to everyone else. There is always a single media stream per event and a potentially unlimited number of clients spread across the world; furthermore, they might not even be authenticated. Moreover, the stream from the host should be delivered to clients with a constant, predictable delay that does not exceed 8–12 seconds. Live streaming has to be ready for massive scale, so it always works in tandem with another streaming scenario: video on demand.
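As an illustration, live streams are commonly delivered over HTTP using a segmented protocol such as HLS, where the host’s stream is cut into short segments listed in a continuously updated playlist. A minimal live media playlist might look like this (segment names and durations are hypothetical):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:1042
#EXTINF:4.0,
segment1042.ts
#EXTINF:4.0,
segment1043.ts
#EXTINF:4.0,
segment1044.ts
```

The playlist has no end marker, so players keep polling it for new segments; with ~4-second segments and a player that buffers 2–3 of them before starting, the end-to-end delay lands roughly in the 8–12 second range mentioned above.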
A group call streaming mode is a session with a number of duplex connections: each user receives the others’ streams and shares their own at the same time. The number of connected users is limited to a small group; they are usually authenticated and join from different platforms, which means every mobile platform and web browser should be supported.
Media Playback Scenarios
There are two main media playback scenarios: video on demand and real-time conferencing. Video on demand (VOD) is a scenario where a user requests a pre-generated video and plays it on the client side. The video is always packed into a media container along with metadata that describes the content inside. The media stream itself might additionally be protected with digital rights management (DRM) technologies. This flow is what happens every time you watch another episode of your favorite series on Netflix.
The most popular media containers are AVI, MOV, and MP4. A media container is a file format that allows multiple data streams (audio/video) to be embedded in a single file. However, not every media container is suitable for web streaming; for the web, other containers should be considered: fMP4, MPEG-TS, WebM.
There are two possible implementations of the VOD scenario:
Progressive playback: the most common method for transferring files over the Internet; all static content is delivered this way. The web app downloads the media as a single regular file, and once it is downloaded it can be played from any point; the quality, however, is fixed and cannot adapt to the user’s bandwidth.
Adaptive streaming: in contrast to progressive playback, the media is never stored on the local drive as a single file. The technology is designed to deliver video in the most efficient way for the user’s bandwidth, so users watch the content at the highest quality their connection allows at any moment. The media is delivered as independent fragments of information: segments, or chunks. The user may start playing the content as soon as the initial 2–3 chunks have been received (pre-buffering). The player automatically adjusts the quality as its buffer fills up or drains.
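To make the idea concrete, here is a minimal sketch (not a production ABR algorithm) of how a player might pick a rendition from a bitrate ladder based on measured bandwidth, keeping some headroom for fluctuations; the ladder values and function names are hypothetical:

```python
# Hypothetical bitrate ladder: (label, bitrate in kbit/s), lowest first.
LADDER = [("240p", 400), ("480p", 1200), ("720p", 2500), ("1080p", 5000)]

def pick_rendition(measured_kbps: float, headroom: float = 0.8) -> str:
    """Choose the highest rendition whose bitrate fits within the
    measured bandwidth, scaled down by a safety headroom factor."""
    budget = measured_kbps * headroom
    chosen = LADDER[0][0]  # always fall back to the lowest quality
    for label, bitrate in LADDER:
        if bitrate <= budget:
            chosen = label
    return chosen

# The player would re-evaluate this after every downloaded chunk:
print(pick_rendition(600))   # budget 480 kbit/s -> "240p"
print(pick_rendition(3500))  # budget 2800 kbit/s -> "720p"
```

Real players combine such bandwidth estimates with buffer occupancy, but the core decision is this kind of lookup against the available renditions.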
Another media scenario is real-time conferencing, where a group of people gathers in a virtual room that can additionally be secured by an authorization policy. Different real-time conferencing modes use different underlying technologies.
In the group call mode users receive audio/video streams from each other, whereas at a live event they receive a stream from the host only. Media containers are optional for real-time conferencing scenarios; whether one is used depends on the transport protocol. For example, real-time conferencing over WebRTC uses a containerless approach and mandatory transport encryption to protect content from intermediaries. The two devices need to agree upon a mutually understood codec for each track so they can communicate successfully.
While compression is always a necessity when dealing with media on the web, it’s of additional importance when videoconferencing in order to ensure that the participants are able to communicate without lag or interruptions. Of secondary importance is the need to keep the video and audio synchronized. — Mozilla (MDN)
A Streaming Pipeline
Any stream starts with raw media content captured directly from a camera or another digital recorder. The raw input cannot be transferred over the Internet as is, so an encoder, a multiplexer, and a packager post-process the raw byte array to algorithmically reduce its size and optionally pack it into a media container.
Extra functions might be applied during the encoding and decoding phases to incorporate additional media data or even apply machine learning.
There is an emerging W3C standard for integrating custom processors into the WebRTC pipeline: WebRTC Insertable Streams.
Media codecs such as H.264 (AVC), H.265 (HEVC), VP8, VP9, Opus, and AAC implement mathematical compression algorithms that make it possible to transfer content efficiently over the Internet, even to low-powered devices.
I’ve decided to stop here and split the material into multiple posts for simplicity. This article introduced the fundamentals of media streaming technology, so it should help architects understand terms such as codecs, media containers, and the streaming pipeline, and how they all work together.
In the next several posts I will drill down into the high-level architecture of the different streaming modes, as well as the content management platform for the post-COVID e-commerce platform.
Special thanks to Ivan Burtsev for technical review.