Watch the video.
Update 3 (05/16/2020): Wrote an updated guide to use VMAF through FFmpeg.
Update 2 (01/06/2016): Fixed reference video bitrate unit from Kbps to KBps
When working with videos, you should be focusing all your efforts on best quality of streaming, less bandwidth usage, and low latency in order to deliver the best experience for the users.
This is not an easy task. You often need to test different bitrates, encoder parameters, fine tune your CDN and even try new codecs. You usually run a process of testing a combination of configurations and codecs and check the final renditions with your naked eyes. This process doesn’t scale, can’t we just trust computers to check that?
bit rate (bitrate): is a measure often used in digital video, usually it is assumed the rate of bits per seconds, it is one of the many terms used in video streaming.
We were about to start a new hack day session here at Globo.com and since some of us learned how to measure the noise introduced when encoding and compressing images, we thought we could play with the stuff we learned by applying the methods to measure video quality.
PSNR: is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise.
First, you calculate the MSE which is the average of the squares of the errors and then you normalize it to decibels.
For 3D signals (colored image), your MSE needs to sum all the means for each plane (ie: RGB, YUV and etc) and then divide by 3 (or 3 * MAX ^ 2).
To validate our idea, we downloaded videos (720p, h264) with the bitrate of 3400 kbps from distinct groups like News, Soap Opera and Sports. We called this group of videos the pivots or reference videos. After that, we generated some transrated versions of them with lower bitrates. We created 700 kbps, 900 kbps, 1300 kbps, 1900 kbps and 2800 kbps renditions for each reference video.
Heads Up! Typically the pivot video (most commonly referred to as reference video), uses a truly lossless compression, the bitrate for a YUV420p raw video should be 1280x720x1.5(given the YUV420 format)x24fps /1000 = 33177.6KBps, far more than what we used as reference (3400KBps).
We extracted 25 images for each video and calculate the PSNR comparing the pivot image with the modified ones. Finally, we calculate the mean. Just to help you understand the numbers below, a higher PSNR means that the image is more similar to the pivot.
|700 kbps||900 kbps||1300 kbps||1900 kbps||2800 kbps||3400 kbps|
We defined a PSNR of 38 (from our observations) as the ideal but then we noticed that the News group didn’t meet the goal. When we plotted the News data in the graph we could see what happened.
The issue with the video from the News group is that they’re a combination of different sources: External traffic camera with poor resolution, talking heads in a studio camera with good resolution and quality, some scenes with computer graphics (like the weather report) and others. We suspected that the News average was affected by those outliers but this kind of video is part of our reality.
We needed a better way to measure the quality perception so we searched for alternatives and we reached one of the Netflix’s posts: an approach toward a practical perceptual video quality metric (VMAF). At first, we learned that PSNR does not consistently reflect human perception and that Netflix is creating ways to approach this with the VMAF model.
They created a dataset with several videos including videos that are not part of the Netflix library and put real people to grade it. They called this score of DMOS. Now they could compare how each algorithm scores against DMOS.
They realized that none of them were perfect even though they have some strength in certain situations. They adopted a machine-learning based model to design a metric that seeks to reflect human perception of video quality (a Support Vector Machine (SVM) regressor).
The Netflix approach is much wider than using PSNR alone. They take into account more features like motion, different resolutions and screens and they even allow you train the model with your own video dataset.
“We developed Video Multimethod Assessment Fusion, or VMAF, that predicts subjective quality by combining multiple elementary quality metrics. The basic rationale is that each elementary metric may have its own strengths and weaknesses with respect to the source content characteristics, type of artifacts, and degree of distortion. By ‘fusing’ elementary metrics into a final metric using a machine-learning algorithm – in our case, a Support Vector Machine (SVM) regressor”
The best news (pun intended) is that the VMAF is FOSS by Netflix and you can use it now. The following commands can be executed in the terminal. Basically, with Docker installed, it installs the VMAF, downloads a video, transcodes it (using docker image of FFmpeg) to generate a comparable video and finally checks the VMAF score.
You saved around 1.89 MB (37%) and still got the VMAF score 94.
Using a composed solution like VMAF or VQM-VFD proved to be better than using a single metric, there are still issues to be solved but I think it’s reasonable to use such algorithms plus A/B tests given the impractical scenario of hiring people to check video impairments.
A/B tests: For instance, you could use X% of your user base for Y days offering them the newest changes and see how much they would reject it.
Motivated by a friend, we’ll share bits of our experience during the Olympic Games Rio 2016. Before starting, I would like to clarify that Globo.com only had rights for streaming the content to Brazil.
We used around 5.5 TB of memory with 1056 CPU’s across two PoP’s located on the southeast of the country.
Not so long; I’ll read it
The live streaming infrastructure for the Olympics was an enhancement iteration over the previous architecture for FIFA 2014 World Cup.
The ingest point receives an RTMP input using nginx-rtmp and then forwards the RTMP to the segmenter. This extra layer provides mostly scheduling, resource sharing and security.
Now let’s move to the user point of view. When the player wants to play a video, it needs to get a video chunk, requesting a file from our front-end, which provides caching, security, load balancing using nginx.
Modern network cards offers multiple-queues: pin each queue, XPS, RPS to a specific cpu.
When this front-end does not have the requested chunk it goes to the backend which uses nginx with lua to generate the playlist and serve the video chunks from cassandra.
This is just a macro view, for sure we also had to provide and scale many micro services to offer things like live thumb, electronic program guide, better usage of the ISP bandwidth, geofencing and others. We deployed them either on bare metal or tsuru.
In the near future we might investigate other adaptive stream format like dash, explore other kinds of input (not only RTMP), increase the number of bitrates, promote a better usage of our farm and distribute the content near of the final user.
Thanks @paulasmuth for pointing out some errors.
Attention: this post provides a very quick and simplistic (but functional) vision of the promised title.
In the beginning
Linux is a fantastic OS, it has more than we imagine and it still manages to get better. There is a feature called cgroups:
which provides a mechanism for easily managing and monitoring system resources, by partitioning things like cpu time, system memory, disk and network bandwidth, into groups, then assigning tasks to those groups
Let’s say we created a cgroup with: 50% of cpu, 20% memory, 2% of disk and a virtual network with 100% of bandwidth, now we can run our application under that cgroups restrictions.
Another cool feature of Linux is LXC (linux-containers):
which combines kernel’s cgroups and support for isolated namespaces to provide an isolated environment for applications
Now we’re able to provide a Linux machine capable of running multiple applications that run in isolation (like if there was an isolated OS for each application). This sounds like something we achieved with virtualization (app-level, os-level, cpu-level and so on) but faster and cheaper and without the overhead of running multiple kernels.
an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux. This is what Docker is but remember, it is not perfect.
The highlighted part is very interesting, docker will provide you a layer of abstraction that allows you to create and deploy your application within a container (an isolated, resource managed place to run processes) in a standardized way.
Docker machine, compose and so on
Life almost always get easier with abstractions, we (developers) don’t worry about how disks works (drivers) or even how a package left your pc and hit another one (we should know how this works :P). Our productivity had increased a lot since we relied on these abstractions.
And this is the same for the docker ecosystem, as we start to use it more often. We create best practices, solve issues with workarounds and etc, some of these will become part of the docker solution.
- docker-machine: An application needs a machine to run regardless if it’s local, physical, virtual or in the cloud.
- docker-compose: An application needs a way to declare its dependencies, either packages or distinct services like datastore.
Step 0: get ready
- If you’re on MacOS/Windows you’ll need to install VirtualBox or VMWare
- If you’re on MacOS/Windows install docker toolbox otherwise apt-get them all
Step 1: create the app
Let’s say we’ll create a rails 4 application with mongo.
Step 2: declare the app and its dependencies
We declare our dependencies by using two files: docker-compose.yaml and Dockerfile. In the Dockerfile we’ll describe how our machine should be (aka: all need packages and stuffs).
Then we can move to its broad services dependencies, like database or even web server. We’ll use mongo as datastore and nginx as the web server.
Step 3: deploy it locally
We need to create a machine for it and then we need to run it.
Step 4: deploy in the cloud
The same way we created a machine to run our app locally ,we can create any number of machines to run this application, even in cloud environment such as digitalocean, aws, azure, google and etc.
// TODO: some things
Let’s suppose we just created a staging environment and another developer come to help us, it seems that there is no an official way to share our created machine (amazon, google app engine, azure, digital ocean…) with team members. There are some workarounds but it’ll be nice to see this becoming a feature.
- Useful commands to troubleshooting, exploration and debug:
- To enter on a machine: $ docker-machine ssh staging (either local or cloud)
- To enter on a container: $ docker-compose run db bash (either local or cloud)
- To list files within a container: $ docker-compose run db ls -lah data/db
- To edit/add/remove data on mongo: $ mongo –host DOCKER_IP
- If you face any error like E: Failed to fetch … during the docker-compose build try it again
- If you face any error like “Error creating machine: Error running provisioning: Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded” during any deployment, try to download docker-toolbox again and install it.
Google is your friend.
In this talk, we will describe globo.com’s live video stream architecture, which was used to broadcast events such as the FIFA World Cup (with peak of 500K concurrent users), Brazilian election debates (27 simultaneous streams) and BBB (10 cameras streaming 24/7 for 3 months) .
NGINX is one of the main components of our platform, as we use it for content distribution, caching, authentication, and dynamic content. Besides our architecture, we will also discuss the Nginx and Operational System tuning that was required for a 19Gbps throughput in each node, the open source Cassandra driver for Nginx that we developed, and our recent efforts to migrate to nginx-rtmp.