Use URL.createObjectURL to make your videos start faster

[Animated GIF: faster start-up]

During our last hackathon, we wanted to make our playback start faster. Before the player shows anything to the final user, we issue around 5 to 6 requests (counting the manifests), and the goal was to cut as many of them as we could.


The first step was very easy: we just moved the request logic from the client side to the server side and injected the prepared player into the page.

Pseudo Ruby server side code:

some_api = get("http://some.api/v/#{@id}/playlist")
other_api = get("http://other.api/v/#{some_api.id}/playlist")
# ...
@final_uri = "#{protocol}://#{domain}/#{path}/#{manifest}"

Pseudo JS client side code:

new Our.Player({source: "{{ @final_uri }}"});


Okay, that’s nice, but can we go further? Yes: how about embedding our manifests into the page?! It turns out we can do that with the power of URL.createObjectURL, an API that gives us a URL for a JS blob/object/file.

// URL.createObjectURL is pretty trivial
// to use and powerful as well
var blob = new Blob(["#M3U8...."], {type: "application/x-mpegurl"});
var url = URL.createObjectURL(blob);
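One caveat worth noting: a blob: URL keeps its Blob alive until the document unloads or the URL is explicitly revoked, so long-lived pages should clean up once the player no longer needs it. A minimal sketch (the playlist content is made up):

```javascript
// A blob: URL pins its Blob in memory until the document unloads or the
// URL is revoked, so it's good hygiene to revoke it after use.
var blob = new Blob(["#EXTM3U\n"], { type: "application/x-mpegurl" });
var url = URL.createObjectURL(blob); // looks like "blob:<origin>/<uuid>"
// ...hand `url` to the player; once the manifest has been consumed, release it:
URL.revokeObjectURL(url);
```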

Pseudo Ruby server side code:

some_api = get("http://some.api/v/#{@id}/playlist")
other_api = get("http://other.api/v/#{some_api.id}/playlist")
# ...
@final_uri = "#{protocol}://#{domain}/#{path}/#{manifest}"
@main_manifest = get(@final_uri)
@sub_manifests = @main_manifest
                 .split_by_uri
                 .map {|uri| get(uri)}

Pseudo JS client side code:

  var mime = "application/x-mpegurl";
  var manifest = {{ @main_manifest }};
  var subManifests = {{ @sub_manifests }};
  // pseudo: build a blob URL for each embedded sub-manifest, keyed by its URI
  var subManifestsBlobURL = subManifests
                            .mapByURI(function(content) {
                              return objectURLFor(content, mime);
                            });
  // pseudo: swap each sub-manifest URI in the main manifest for its blob URL
  var finalMainManifest = manifest
                          .splitByLine()
                          .map(function(line) {
                            return subManifestsBlobURL[line] || line;
                          })
                          .joinWithLines();

  function objectURLFor(content, mime) {
    var blob = new Blob([content], {type: mime});
    return URL.createObjectURL(blob);
  }

  new Our.Player({
    src: objectURLFor(finalMainManifest, mime)
  })
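Outside the templating, the rewriting step can be sketched with plain strings. This is a toy, self-contained example (the playlist content and file names are made up), assuming the master playlist references each sub-manifest on a bare URI line:

```javascript
// Toy version of the rewriting step: swap each sub-manifest URI in the master
// playlist for a blob: URL whose content was embedded in the page by the server.
var mime = "application/x-mpegurl";

function objectURLFor(content, mime) {
  var blob = new Blob([content], { type: mime });
  return URL.createObjectURL(blob);
}

var masterManifest = [
  "#EXTM3U",
  "#EXT-X-STREAM-INF:BANDWIDTH=800000",
  "low.m3u8",
  "#EXT-X-STREAM-INF:BANDWIDTH=2800000",
  "high.m3u8"
].join("\n");

// sub-manifest bodies the server would have fetched and embedded
var subManifests = {
  "low.m3u8": "#EXTM3U\n#EXTINF:4.0,\nlow_0.ts",
  "high.m3u8": "#EXTM3U\n#EXTINF:4.0,\nhigh_0.ts"
};

var rewritten = masterManifest
  .split("\n")
  .map(function (line) {
    // lines that match a known sub-manifest URI become blob: URLs
    return subManifests[line] ? objectURLFor(subManifests[line], mime) : line;
  })
  .join("\n");
```

The rewritten master playlist is then itself turned into a blob: URL and handed to the player.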


We thought we were done, but then we came up with the idea of doing the same for the first video segment. The page now weighs more, but the player starts almost instantaneously.

// for regular text manifest we can use regular Blob objects
// but for binary data we can rely on Uint8Array
var segment = new Uint8Array({{ segments.first }});

By the way, our player is based on Clappr, and this particular test was done with the hls.js playback, which uses the fetch API to get the video segments; fetching the created blob URL works just fine.

The animated GIF you see at the start of the post was recorded without the segment-on-the-page optimization. We also just ignored the possible side effects on the player's ABR algorithm (which could assume a high bandwidth due to the fast manifest fetch).

Finally, we could make it even faster by using MPEG-DASH with its template/timeline format, using shorter segment sizes, and tuning the ABR algorithm to be more aggressive initially.
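As a sketch of that last idea, hls.js exposes tuning knobs for the initial quality level and buffering behavior. The values below are illustrative guesses, not measured recommendations (check the option names against your hls.js version):

```javascript
// Illustrative hls.js tuning for a faster start (option names from hls.js's
// config documentation; the values here are assumptions, not benchmarks)
var fastStartConfig = {
  startLevel: 0,                  // start at the lowest rendition, let ABR climb
  abrEwmaDefaultEstimate: 500000, // initial bandwidth estimate, in bits/s
  maxBufferLength: 10             // seconds of forward buffer to aim for
};
// var hls = new Hls(fastStartConfig); // then attach media and load the source
```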

How to measure video quality perception

Update 3 (05/16/2020): Wrote an updated guide to use VMAF through FFmpeg.

Update 2 (01/06/2016): Fixed reference video bitrate unit from Kbps to KBps

Update 1 (10/16/2016): Anne Aaron presented VMAF at Demuxed 2016.

When working with video, you should focus your efforts on the best streaming quality, lower bandwidth usage, and low latency in order to deliver the best experience to your users.

This is not an easy task. You often need to test different bitrates, encoder parameters, fine-tune your CDN, and even try new codecs. You usually run a process of testing combinations of configurations and codecs and check the final renditions with the naked eye. This process doesn’t scale; can’t we just trust computers to check that?

bitrate: a measure often used in digital video, usually assumed to be the number of bits per second; it is one of the many terms used in video streaming.

[Image: same resolution, different bitrates.]

codec: an electronic circuit or software that compresses or decompresses digital content (e.g. H.264 (AVC), VP9, AV1, AAC (HE-AAC)).

We were about to start a new hack day session here at Globo.com and since some of us learned how to measure the noise introduced when encoding and compressing images, we thought we could play with the stuff we learned by applying the methods to measure video quality.

We started by using the PSNR (peak signal-to-noise ratio) algorithm which can be defined in terms of the mean squared error (MSE) in decibel scale.

PSNR: an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise.

First, you calculate the MSE which is the average of the squares of the errors and then you normalize it to decibels.

MSE = (1 / (m * n)) * Σᵢ Σⱼ (n1[i][j] - n2[i][j])²
  * n1 is the original image, n2 the comparable image, m and n are the image dimensions
PSNR = 10 * log₁₀(MAX² / MSE)
  * MAX is the maximum possible pixel value of the image


For 3D signals (color images), the MSE needs to sum the means of each plane (e.g. RGB, YUV) and then divide by 3 (or by 3 * MAX² in the PSNR).
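The two formulas can be sketched in a few lines of JavaScript for a single (grayscale) plane, where `n1` and `n2` are flat arrays of pixel values:

```javascript
// MSE: the mean of the squared pixel differences between the two images
function mse(n1, n2) {
  var sum = 0;
  for (var i = 0; i < n1.length; i++) {
    sum += Math.pow(n1[i] - n2[i], 2);
  }
  return sum / n1.length;
}

// PSNR: normalize the MSE to decibels against the maximum pixel value
function psnr(n1, n2, max) {
  max = max || 255; // maximum possible value for 8-bit pixels
  return 10 * Math.log10(Math.pow(max, 2) / mse(n1, n2));
}
```

For example, `psnr([52, 55], [52, 56])` gives roughly 51.14 dB, while comparing an all-black patch against an all-white one gives 0 dB.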

To validate our idea, we downloaded videos (720p, H.264) at a bitrate of 3400 kbps from distinct groups like News, Soap Opera and Sports. We called this group of videos the pivots or reference videos. After that, we generated transrated versions of them at lower bitrates: 700 kbps, 900 kbps, 1300 kbps, 1900 kbps and 2800 kbps renditions for each reference video.

Heads Up! Typically the pivot video (more commonly called the reference video) uses truly lossless compression; the bitrate of a raw YUV420p 720p video would be 1280 x 720 x 1.5 (bytes per pixel, given the YUV420 format) x 24 fps / 1000 = 33177.6 KBps, far more than what we used as the reference (3400 kbps).

We extracted 25 images from each video and calculated the PSNR, comparing each pivot image with the modified ones. Finally, we calculated the mean. Just to help you understand the numbers below: a higher PSNR means the image is more similar to the pivot.

PSNR (dB) against the 3400 kbps reference:

           700 kbps   900 kbps   1300 kbps   1900 kbps   2800 kbps
Soap Op.   35.0124    36.5159    38.6041     40.3441     41.9447
News       28.6414    30.0076    32.6577     35.1601     37.0301
Sports     32.5675    34.5158    37.2104     39.4079     41.4540
[Image: a visual sample.]

We defined a PSNR of 38 (from our observations) as the ideal, but then we noticed that the News group didn’t meet that goal. When we plotted the News data on a graph, we could see what had happened.

The issue with the videos from the News group is that they are a combination of different sources: an external traffic camera with poor resolution, talking heads shot with a good studio camera, some scenes with computer graphics (like the weather report), and others. We suspected that the News average was dragged down by those outliers, but this kind of video is part of our reality.

[Graph: PSNR per frame; the different video sources are visible as clusters.]

We needed a better way to measure quality perception, so we searched for alternatives and found one of Netflix’s posts: Toward a Practical Perceptual Video Quality Metric (VMAF). There we learned that PSNR does not consistently reflect human perception and that Netflix was building VMAF to approach this.

They created a dataset with several videos, including videos that are not part of the Netflix library, and had real people grade them. They called this score DMOS. Now they could compare how each algorithm scores against DMOS.

[Graph: FastSSIM, PSNRHVS, PSNR and SSIM (y) vs DMOS (x)]

They realized that none of them was perfect, even though each has strengths in certain situations. So they adopted a machine-learning based model (a Support Vector Machine (SVM) regressor) to design a metric that seeks to reflect human perception of video quality.

The Netflix approach is much broader than using PSNR alone. They take into account more features like motion, different resolutions and screens, and they even allow you to train the model with your own video dataset.

“We developed Video Multimethod Assessment Fusion, or VMAF, that predicts subjective quality by combining multiple elementary quality metrics. The basic rationale is that each elementary metric may have its own strengths and weaknesses with respect to the source content characteristics, type of artifacts, and degree of distortion. By ‘fusing’ elementary metrics into a final metric using a machine-learning algorithm – in our case, a Support Vector Machine (SVM) regressor”

Netflix about VMAF
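As a toy illustration of what "fusing" means (this is NOT the real VMAF model, which uses a trained SVM regressor over carefully chosen features), a linear fusion would combine elementary per-frame scores with learned weights; the feature names and weights below are hypothetical:

```javascript
// Toy "metric fusion": a weighted sum of elementary per-frame metric scores.
// The real VMAF replaces this linear combination with a trained SVM regressor.
function fuse(features, weights) {
  return Object.keys(features).reduce(function (score, name) {
    return score + features[name] * (weights[name] || 0);
  }, 0);
}

// hypothetical per-frame feature scores and hand-picked weights
var frameFeatures = { vif: 0.97, adm: 0.98, motion: 2.64 };
var weights = { vif: 40, adm: 45, motion: 2 };
var predicted = fuse(frameFeatures, weights);
```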

The best news (pun intended) is that VMAF is FOSS by Netflix and you can use it now. The following commands can be executed in a terminal. Basically, with Docker installed, they build the VMAF image, download a video, transcode it (using an FFmpeg Docker image) to generate a comparable video, and finally compute the VMAF score.

# clone the project (later they'll push a docker image to dockerhub)
git clone --depth 1 https://github.com/Netflix/vmaf.git vmaf
cd vmaf
# build the image
docker build -t vmaf .
# get the pivot video (reference video)
wget http://www.sample-videos.com/video/mp4/360/big_buck_bunny_360p_5mb.mp4
# generate a new transcoded video (vp9, vcodec:500kbps)
docker run --rm -v $(pwd):/files jrottenberg/ffmpeg -i /files/big_buck_bunny_360p_5mb.mp4 -c:v libvpx-vp9 -b:v 500K -c:a libvorbis /files/big_buck_bunny_360p.webm
# extract the yuv (yuv420p) color space from them
docker run --rm -v $(pwd):/files jrottenberg/ffmpeg -i /files/big_buck_bunny_360p_5mb.mp4 -c:v rawvideo -pix_fmt yuv420p /files/360p_mpeg4-v_1000.yuv
docker run --rm -v $(pwd):/files jrottenberg/ffmpeg -i /files/big_buck_bunny_360p.webm -c:v rawvideo -pix_fmt yuv420p /files/360p_vp9_700.yuv
# check the VMAF score
docker run --rm -v $(pwd):/files vmaf run_vmaf yuv420p 640 368 /files/360p_mpeg4-v_1000.yuv /files/360p_vp9_700.yuv --out-fmt json
# and you can even check the VMAF score using an existent trained model
docker run --rm -v $(pwd):/files vmaf run_vmaf yuv420p 640 368 /files/360p_mpeg4-v_1000.yuv /files/360p_vp9_700.yuv --out-fmt json --model /files/resource/model/nflxall_vmafv4.pkl


You saved around 1.89 MB (37%) and still got a VMAF score of 94.

{
  "aggregate": {
    "VMAF_feature_adm2_score": 0.9865012294519826,
    "VMAF_feature_motion_score": 2.6486005151515153,
    "VMAF_feature_vif_scale0_score": 0.85336751265595612,
    "VMAF_feature_vif_scale1_score": 0.97274233143291644,
    "VMAF_feature_vif_scale2_score": 0.98624814558455487,
    "VMAF_feature_vif_scale3_score": 0.99218556024841664,
    "VMAF_score": 94.143067486687571,
    "method": "mean"
  }
}


Using a composed solution like VMAF or VQM-VFD proved to be better than using a single metric. There are still issues to be solved, but I think it’s reasonable to use such algorithms plus A/B tests, given the impractical scenario of hiring people to check video impairments.

A/B tests: For instance, you could use X% of your user base for Y days offering them the newest changes and see how much they would reject it.

Functor, Pointed Functor, Monad and Applicative Functor in JS


// This post will briefly explain (omitting, skipping some parts) in code what
// Functor, Pointed Functor, Monad and Applicative Functor are. Maybe by reading the
// code you will easily grasp these functional concepts.
// if you only want to run this code go to:
// https://jsfiddle.net/leandromoreira/buq5mnyk/
// or https://gist.github.com/leandromoreira/9504733c7f8c6361c46270ea953d8409
// This code requires you to have require.js loaded (or you can load ramda instead :P)
requirejs.config({
  paths: {
    ramda: 'https://cdnjs.cloudflare.com/ajax/libs/ramda/0.13.0/ramda.min'
  },
});
require(['ramda'], function(_) {
  // First let's create a Container: a type that holds (wraps) a value, a useful abstraction to handle state.
  var Container = function(x) {
    this.__value = x;
  }
  // of is a method to create a Container of x
  Container.of = function(x) {
    return new Container(x);
  };
  console.log("should be 3", Container.of(3))
  // We can improve this building block (Container) by providing a way to handle the wrapped value;
  // this is basically a Functor, which is a type that implements map (it is mappable) and obeys some laws.
  // By the way, a Pointed Functor is a functor with an of method.
  Container.prototype.map = function(f) {
    return Container.of(f(this.__value));
  }
  var c4 = Container.of(4)
  var inc = function(x) {
    return x + 1
  }
  var c5 = c4.map(inc)
  // We first created a container of 4, then we mapped an increment over it, resulting in a container of 5
  console.log("should be 5", c5)
  // Maybe is a functor that checks whether the value is null/undefined;
  // it is useful to avoid errors like "Cannot read property x of null"
  Container.prototype.isNothing = function() {
    return (this.__value === null || this.__value === undefined);
  };
  // Now our map will also check whether the value is valid or not.
  Container.prototype.map = function(f) {
    return this.isNothing() ? Container.of(null) : Container.of(f(this.__value));
  };
  var address = function(person) {
    return person.address;
  };
  var upperCase = function(t) {
    return t.toUpperCase()
  }
  // Although we're passing an invalid value to the container, it won't break
  console.log("should be null without errors", Container.of(null).map(address).map(upperCase))
  // but when we do pass the right parameter it produces the expected output
  console.log("should be HERE", Container.of({
    name: "Diddy",
    address: "here"
  }).map(address).map(upperCase))
  // this is good, but a failure with no error message can make things worse 😦
  // This function maps an ordinary function over a functor
  var map = _.curry(function(ordinaryFn, functor) {
    return functor.map(ordinaryFn);
  });
  var aFunctor = Container.of(2)
  var sum6 = function(x) {
    return x + 6
  }
  // given an ordinary function and a functor it produces another functor
  var plus6 = map(sum6)
  var y = plus6(aFunctor)
  console.log("should be a Functor of 8", y)
  // Either is a functor that can return two types: either Right (normal flow) or Left (some error occurred).
  // What is great here is that we can say what the error was.
  var Left = function(x) {
    this.__value = x;
  };
  Left.of = function(x) {
    return new Left(x);
  };
  Left.prototype.map = function(f) {
    return this;
  };
  var Right = function(x) {
    this.__value = x;
  };
  Right.of = function(x) {
    return new Right(x);
  };
  Right.prototype.map = function(f) {
    return Right.of(f(this.__value));
  }
  console.log("should be 10", Right.of(8).map(inc).map(inc))
  console.log("should be unchanged 8", Left.of(8).map(inc).map(inc))
  var nonNegative = function(x) {
    if (x < 0) {
      return Left.of("you must pass a positive number")
    } else {
      return Right.of(x)
    }
  }
  console.log("should be 10", nonNegative(9).map(inc))
  console.log("should be an error message", nonNegative(-4).map(inc))
  // IO is a functor that holds functions as values; instead of mapping the value,
  // it maps functions and composes them like an array of functions.
  var IO = function(f) {
    this.__value = f;
  };
  IO.of = function(x) {
    return new IO(function() {
      return x;
    });
  };
  IO.prototype.map = function(f) {
    return new IO(_.compose(f, this.__value));
  };
  var composedLazyFunctions = IO.of(3).map(inc).map(inc).map(inc)
  console.log("this is a lazy composed function", composedLazyFunctions)
  console.log("this is the execution of that composed function", composedLazyFunctions.__value())
  var readFile = function(filename) {
    return new IO(function() {
      return "read file from " + filename
    });
  };
  var print = function(x) {
    return new IO(function() {
      return x
    });
  };
  // cat will be a composed function that produces an IO of an IO :X
  var cat = _.compose(map(print), readFile)
  var catGit = cat('.git/config')
  console.log("it should be an IO of an IO, IO(IO())", catGit)
  // This creates an awkward situation where, if we want the real value, we need
  // catGit.__value().__value(); how about creating a join that unwraps the value?
  IO.prototype.join = function() {
    return this.__value()
  };
  console.log("should be 'read file from .git/config'", catGit.join().join())
  // Notice that we still need to call join twice; what if we join every time we map?
  // this is what is known as chain
  var chain = _.curry(function(ordinaryFn, functor) {
    return functor.map(ordinaryFn).join();
  });
  var complexSum = function(initialNumber) {
    return new IO(function() {
      var x = initialNumber * 4
      var y = x * 4
      return (y + 42) - y // always 42, whatever the input
    });
  };
  var incIO = function(x) {
    return new IO(function() {
      return x + 1
    });
  };
  var doubleIO = function(x) {
    return new IO(function() {
      return x * 2
    });
  };
  var cleverMath = _.compose(
    chain(doubleIO),
    chain(incIO),
    chain(incIO),
    complexSum
  );
  var multiplier = Math.floor((Math.random() * 552) + 7)
  var ordinaryValue = Math.floor((Math.random() * 98134123) + 12)
  var cleverMathResult = cleverMath(ordinaryValue * multiplier)
  console.log("should be 88", cleverMathResult.join())
  // Monads are pointed functors that can flatten 🙂
  // Now let's finish with an Applicative Functor, which is a pointed functor with an ap(ply) method
  Container.prototype.ap = function(other_container) {
    return other_container.map(this.__value)
  }
  console.log("should be Container(4)", Container.of(inc).ap(Container.of(3)))
})
// Please consider reading the links below
// http://www.leonardoborges.com/writings/2012/11/30/monads-in-small-bites-part-i-functors/
// https://drboolean.gitbooks.io/mostly-adequate-guide/content/ch8.html