WebRTC Architecture Types: A Detailed Description
Peer-to-Peer
WebRTC was designed to send media directly between browsers, an
arrangement known as peer-to-peer (P2P). In the peer-to-peer architecture, the
clients first establish a signaling connection to the application server
(sometimes referred to as a signaling server). The signaling method or protocol
is not specified in the WebRTC specifications, allowing the adoption of an
existing method (SIP, WebSockets, XMPP, etc.) or the implementation of a
proprietary signaling process. The application server holds the business logic
and acts as the intermediary for the Session Description Protocol (SDP)
exchange. Once the SDP exchange completes, direct media communication between
the two clients can begin.
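As a concrete illustration, here is a minimal sketch of the caller's side of that flow, assuming a WebSocket-based signaling channel (the endpoint URLs, STUN server, and message format are illustrative; as noted above, WebRTC leaves the signaling protocol to the application):

```typescript
// Minimal P2P call setup sketch; wss://app.example.com/signal is a
// hypothetical application (signaling) server.
const signaling = new WebSocket("wss://app.example.com/signal");
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.example.com:3478" }],
});

// Trickle local ICE candidates to the remote peer via the signaling server.
pc.onicecandidate = ({ candidate }) => {
  if (candidate) signaling.send(JSON.stringify({ type: "candidate", candidate }));
};

// Caller side: capture media, create the SDP offer, and send it.
async function call(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ type: "offer", sdp: pc.localDescription }));
}

// Handle the answer (and remote candidates) relayed back by the application server.
signaling.onmessage = async ({ data }) => {
  const msg = JSON.parse(data);
  if (msg.type === "answer") await pc.setRemoteDescription(msg.sdp);
  else if (msg.type === "candidate") await pc.addIceCandidate(msg.candidate);
};
```

Once the answer is applied, media flows directly between the two browsers; the application server touches only signaling traffic.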
Peer-to-Server
While WebRTC is designed primarily for browser-to-browser
communication, a growing number of use cases benefit significantly when media
is anchored in the network with a server acting as a media peer, an arrangement
known as peer-to-server (P2S). As in peer-to-peer, the clients establish a
signaling connection to the application server. The application server
continues to manage the business logic but also uses a media control connection
to the media server to relay the SDP exchange between the client and the media
server. Once the SDP exchange completes, media communication between the client
and the media server can begin.
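From the client's point of view, a peer-to-server session can look as simple as the sketch below, which assumes a hypothetical REST endpoint (/sessions) on the application server that relays the SDP offer over its media control connection and returns the media server's answer:

```typescript
// Sketch of a peer-to-server connection; the /sessions endpoint and response
// shape are illustrative assumptions, not a standardized API.
async function connectToMediaServer(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  const media = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  media.getTracks().forEach((track) => pc.addTrack(track, media));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // The application server owns the business logic and relays the SDP
  // to the media server over its media control connection.
  const res = await fetch("https://app.example.com/sessions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sdp: pc.localDescription }),
  });
  const { answer } = await res.json();
  await pc.setRemoteDescription(answer); // media now flows client <-> media server
  return pc;
}
```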
Server-side processing enables advanced functionality such as
centralized recording for compliance purposes, audio/video playback, media
analysis for speech-to-text, transcoding for connecting disparate networks, and
media mixing for multiparty conferencing. Depending on the architecture,
server-side processing can optimize bandwidth and minimize client compute,
benefiting mobile clients by increasing battery life, and can provide a
flexible user interface to clients.
Peer-to-Peer Mesh
The peer-to-peer mesh topology operates without a centralized
media server, requiring each client to send its encoded media to every other
participant in the conference. At the same time, each client must also receive
and decode each participant's media stream. Peer-to-peer mesh is often referred
to as "peer-to-peer mess" given the number of streams required. For instance,
in a four-person videoconference, each client browser encodes and transmits
three media streams while also receiving and decoding three media streams.
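The fan-out can be sketched as one RTCPeerConnection per remote participant; the sendOffer and attachToVideoElement helpers below are hypothetical stand-ins for application signaling and UI code:

```typescript
// Hypothetical application helpers (signaling relay and UI attachment).
declare function sendOffer(peerId: string, sdp: RTCSessionDescription | null): Promise<void>;
declare function attachToVideoElement(peerId: string, stream: MediaStream): void;

const peers = new Map<string, RTCPeerConnection>();

async function joinMesh(participantIds: string[], local: MediaStream): Promise<void> {
  for (const id of participantIds) {
    const pc = new RTCPeerConnection();
    // Each connection encodes and sends the local tracks to this participant...
    local.getTracks().forEach((track) => pc.addTrack(track, local));
    // ...and receives/decodes that participant's media in return.
    pc.ontrack = ({ streams }) => attachToVideoElement(id, streams[0]);
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    await sendOffer(id, pc.localDescription);
    peers.set(id, pc);
  }
}
```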
The WebRTC client has full control over the video layout, and
media latency is typically not an issue in a peer-to-peer mesh since media most
often travels directly to the receiving client. Peer-to-peer mesh appeals to
front-end developers because of its low implementation cost. However, this
topology is extremely limited in functionality and scale. For instance, without
a centralized media server, the client is left to perform advanced features
such as recording. Putting this type of functionality at the WebRTC client
level not only adds processing load on the client but can also jeopardize
compliance requirements.
Regarding scalability, encoding and decoding media streams is
compute-intensive, with encoding roughly four times more intensive than
decoding. If every client in the peer-to-peer mesh used the same codec, frame
rate, and resolution, the transmitting client could encode just once. In
practice, however, clients within the conference often do not share the same
codec, frame rate, and resolution, so additional encoding is required.
Furthermore, bandwidth can limit scale, most notably the constrained uplink
bandwidth of most devices. For example, the bitrate for a single 720p video
stream starts at ~1.5 Mbps, so in a four-client conference each client requires
~4.5 Mbps of uplink bandwidth (three outgoing 1.5 Mbps streams). For these
reasons, WebRTC clients, especially (but not limited to) mobile and tablet
devices where compute and bandwidth are limited, will see reduced scale.
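The uplink arithmetic generalizes directly from these figures:

```typescript
// Back-of-the-envelope mesh uplink, using the ~1.5 Mbps figure for one 720p stream.
function meshUplinkMbps(participants: number, perStreamMbps = 1.5): number {
  return (participants - 1) * perStreamMbps; // one outgoing stream per remote peer
}

console.log(meshUplinkMbps(4)); // 4.5 (Mbps) for a four-person conference
console.log(meshUplinkMbps(8)); // 10.5 (Mbps): beyond many consumer uplinks
```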
In summary, while peer-to-peer mesh topologies offer a
flexible user interface and low-latency media connections, the disadvantages of
limited scalability, no transcoding, and a lack of advanced functionality are a
significant deterrent to launching many applications.
Multipoint Control Unit (MCU)
Multipoint Control Unit (MCU) topologies were popular for real-time communications well before the inception of WebRTC. MCU topologies operate with a centralized media server to which WebRTC clients send their encoded media streams. The MCU media server receives each video stream, decodes it, tiles the decoded frames with the streams from the other participants, and then encodes the tiled video to send back to each participant. The MCU topology therefore simplifies what each WebRTC client must send and receive, reducing it to a single stream in each direction.
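Conceptually, the server-side pipeline can be sketched as follows; the Frame, Decoder, Encoder, and composeTiles primitives are hypothetical stand-ins for a native media stack:

```typescript
// Hypothetical media-stack primitives for illustration only.
interface Frame { width: number; height: number; data: Uint8Array }
declare class Decoder { decode(packet: Uint8Array): Frame; }
declare class Encoder { encode(frame: Frame): Uint8Array; }
declare function composeTiles(frames: Frame[]): Frame;

function mixAndFanOut(
  latestPacket: Map<string, Uint8Array>, // newest media packet per participant
  decoders: Map<string, Decoder>,
  encoders: Map<string, Encoder>,        // one encoder per participant profile
): Map<string, Uint8Array> {
  // 1. Decode every participant's incoming stream.
  const frames = [...latestPacket].map(([id, pkt]) => decoders.get(id)!.decode(pkt));
  // 2. Tile the decoded frames into a single composite; the server owns the layout.
  const composite = composeTiles(frames);
  // 3. Re-encode the composite once per participant, in that client's profile.
  const out = new Map<string, Uint8Array>();
  for (const [id, enc] of encoders) out.set(id, enc.encode(composite));
  return out;
}
```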
Reducing the required encode/decode to a single stream
decreases both client compute and bandwidth consumption, benefiting mobile
devices in particular. Furthermore, since each stream is decoded and transcoded
at the MCU media server, there is no requirement that every WebRTC client share
the same codec, frame rate, and resolution profile. The incoming media streams
from the various clients can be transcoded to another codec, trans-sized to a
different resolution, and trans-rated to a different frame rate, allowing each
client to be served its preferred profile. For instance, a mobile client
preferring H.264 at VGA resolution and 15 frames per second can be connected to
a laptop WebRTC client using the VP8 codec at 720p resolution and 30 frames per
second. And since the MCU media server acts as a peer, quality of service (QoS)
can be applied on a per-stream basis, preventing one poor connection from
dictating the quality for all users of a multiparty application.
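The two client profiles from that example might be described as follows; the MediaProfile shape is an illustrative assumption, since in practice this information is negotiated via SDP:

```typescript
// Illustrative per-client profiles an MCU could transcode between,
// taken from the mobile/laptop example above.
type MediaProfile = { codec: "H264" | "VP8"; width: number; height: number; fps: number };

const profiles: Record<string, MediaProfile> = {
  mobile: { codec: "H264", width: 640,  height: 480, fps: 15 }, // VGA @ 15 fps
  laptop: { codec: "VP8",  width: 1280, height: 720, fps: 30 }, // 720p @ 30 fps
};
```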
A key benefit of the MCU topology is that it shifts the
encode/decode processing from the client into the server, often as part of a
cloud compute service, where processing resources are less expensive. This
performance question is becoming more important as newer video coding systems
are very computationally intensive and conference users expect high-quality,
high-resolution video such as 720p and 1080p at 30 frames per second or higher.
As with the peer-to-peer topology, sharing encoders across multiple client
streams can address the performance drawback and result in greater server
scalability. Lastly, the MCU uses the same tiled video composition for each
connecting client, so the video layout in a multiparty conference is dictated
by the server for all client participants.
In summary, the multipoint control unit topology results in
significantly lower compute and bandwidth requirements for the clients and can
interconnect disparate networks through transcoding/trans-rating/trans-sizing,
with the disadvantages of high server load and an inflexible client user
interface. The centralized computational resource of an MCU can also become a
limiting factor in cost-sensitive, large-scale, one-to-many applications.
Selective Forwarding Unit (SFU)
Selective Forwarding Unit (SFU), also known as video
routing, is a topology in which WebRTC clients send their encoded video streams
to a centralized media server, which forwards/routes them to the other WebRTC
clients. The SFU topology is an attractive way to address the server
performance issue, as it avoids the compute expense of video decoding and
encoding. Without encoding/decoding, the latency added by the SFU media server
is also minimal. Lastly, because each client negotiates directly with the SFU
media server, it has complete control over which streams it receives, and
therefore full flexibility over its user interface.
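The core of an SFU can be sketched as a routing loop that never touches the media payload; the types and sendRtp helper here are hypothetical:

```typescript
// Conceptual SFU forwarding: packets are routed, never decoded or re-encoded.
type ClientId = string;
declare function sendRtp(to: ClientId, packet: Uint8Array): void;

// Subscription state: each receiver declares which senders it wants.
const subscriptions = new Map<ClientId, Set<ClientId>>();

function onRtpPacket(from: ClientId, packet: Uint8Array): void {
  for (const [receiver, wanted] of subscriptions) {
    // Forward only to clients subscribed to this sender; no transcoding occurs.
    if (receiver !== from && wanted.has(from)) sendRtp(receiver, packet);
  }
}
```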
While the SFU topology has become a popular choice among
WebRTC communities, perhaps its most commonly overlooked shortcoming is the
default to the "least common codec", meaning every participant in the
conference needs to use the same codec. For multiparty conferences where all
participants are on PCs/laptops, the issue is negligible, but a mobile device
with hardware-accelerated H.264 is forced into software encoding whenever the
conference settles on a different codec. The inability to transcode video
streams can limit the types of clients that can be connected together.
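On the client side, the standard setCodecPreferences API can at least steer negotiation toward a shared codec. The sketch below reorders the browser's advertised video codecs to favor one MIME type; supported codec lists vary by browser, so this is best-effort:

```typescript
// Reorder advertised video codecs so negotiation favors the given MIME type.
function preferCodec(pc: RTCPeerConnection, mimeType: string): void {
  const capabilities = RTCRtpReceiver.getCapabilities("video");
  if (!capabilities) return;
  const preferred = capabilities.codecs.filter((c) => c.mimeType === mimeType);
  const rest = capabilities.codecs.filter((c) => c.mimeType !== mimeType);
  if (preferred.length === 0) return; // codec not supported locally
  for (const transceiver of pc.getTransceivers()) {
    if (transceiver.receiver.track.kind === "video") {
      transceiver.setCodecPreferences([...preferred, ...rest]);
    }
  }
}

// e.g. preferCodec(pc, "video/H264") before calling createOffer()
```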
Furthermore, the multiple streams being routed/forwarded by
the SFU to the WebRTC client increase downlink bandwidth consumption and, in
turn, decode processing at the client. The issue can be mitigated by limiting
the streams forwarded to the client to the active talker, a subset of the
streams, or a combination of both. Additionally, a method called simulcast
allows the WebRTC client to encode multiple versions of its stream, typically
two: one at high resolution and one at lower resolution. This way, the SFU can
forward/route the high-definition stream of the active talker while sending the
lower-definition streams of the listeners.
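Simulcast is requested on the client through the standard sendEncodings option; the two encodings below, with hypothetical rid names and bitrates, mirror the high/low split described above:

```typescript
// Publish two simulcast encodings of one camera track; an SFU can then
// pick the appropriate layer per receiver.
async function publishSimulcast(pc: RTCPeerConnection): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: "sendonly",
    sendEncodings: [
      { rid: "hi", maxBitrate: 1_500_000 },                         // full resolution, active talker
      { rid: "lo", maxBitrate: 300_000, scaleResolutionDownBy: 4 }, // thumbnail for listeners
    ],
  });
}
```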
Lastly, traditional SIP-based platforms cannot handle the
multiple streams produced by the SFU. For this reason, without a gateway
function in the middle, SFU topologies are restricted to WebRTC-only
deployments.
In summary, the selective forwarding unit topology produces a
flexible client user interface and improved server-side performance; the
disadvantages of requiring a least common codec and increased client
compute/bandwidth are concerns to be weighed.
Converged Architectures - Applications for the Real World
As discussed, no topology is perfect, and each comes with
distinct advantages and disadvantages. Multipoint control unit architectures
are ideal when compute and bandwidth are limited and there is a need for
interoperability with disparate networks, but they come at the cost of high
server load and a fixed video layout. Selective forwarding unit topologies, on
the other hand, are ideal for high server performance and maximum client UI
flexibility, but they come at the cost of requiring all connecting clients to
share the same codec, frame-rate, and resolution profile. The tough decision is
which to use for your application. Launching an application into real-world
scenarios requires on-demand access to both the capabilities of an MCU and the
capabilities of an SFU. This complementary feature set has paved the way for a
new, next-generation hybrid-SFU/MCU architecture.
Next Generation Hybrid-SFU/MCU
The hybrid-SFU/MCU topology allows media to be delivered in
the form best suited to each individual client. For example, where the client
is a mobile or SIP device, the media server can deliver a single MCU-style
mixed stream. For WebRTC clients capable of handling multiple streams, with no
restrictions on bandwidth or compute, the media server can deliver
forwarded/routed streams. Additionally, the ability to transcode individual
streams while leaving all others forwarded/routed eliminates the
least-common-codec issue of the SFU. Making these capabilities available within
the same media server gives the application server's business logic the most
flexibility in delivering a scalable, reliable, feature-rich service.
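A sketch of the per-client delivery decision such a hybrid server might make; the ClientInfo fields and the bandwidth threshold are illustrative assumptions:

```typescript
// Per-client delivery decision for a hypothetical hybrid SFU/MCU.
interface ClientInfo {
  kind: "webrtc" | "sip";
  isMobile: boolean;
  downlinkMbps: number;
}

type DeliveryMode = "mixed" | "routed"; // MCU-style single stream vs. SFU-style streams

function chooseDelivery(client: ClientInfo, participantCount: number): DeliveryMode {
  // SIP endpoints and constrained mobile clients get one mixed stream.
  if (client.kind === "sip" || client.isMobile) return "mixed";
  // Clients with headroom for one ~1.5 Mbps stream per remote peer get routed streams.
  if (client.downlinkMbps >= 1.5 * (participantCount - 1)) return "routed";
  return "mixed";
}
```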