Building an open-source OTT platform

Attention: I released resty-dynacode, an OpenResty library that lets users add Lua code to Nginx dynamically.

Creating software from “scratch” might not be a good idea at first, but it’s often a great way to study a specific technology or to deepen your knowledge in a particular field of computer science.

In this three-post series, we’re going to build a simple video platform using open-source software, add features to it so that it handles the computational work on the front end (edge computing), and conclude by designing a platform that lets us add code and features to the servers dynamically.


An over-the-top (OTT) service is a streaming media service offered directly to viewers via the Internet. OTT bypasses cable, broadcast, and satellite television platforms, so you no longer need to spend as much money on them.

Edge computing is the ability to put computation and storage closer to the place where it is demanded. In simpler terms, it is the code running within your front end servers.

We’re going to design two distinct services: a simple video streaming solution and an edge computing platform for this video streaming service.

  1. Building NOTT – an open source OTT video platform
  2. Add edge computation to NOTT – empower nginx with lua code
    • token authentication code
    • IP acl
    • forbid other sites
    • add a custom HLS tag on the fly
    • expose metrics in HTTP headers
    • count user request per IP
  3. Platformize the edge computing – using lua + redis
  4. $profit$


NOTT, the New OTT, is a VERY SIMPLE open-source video platform that expects an input signal and produces an output stream. It was made mostly as an excuse to discuss and design an edge computing platform around it.

NOTT is built using a simple html5 app
NOTT architecture

The UI app is a simple static html5 file served from nginx. We’re using Clappr (backed by hls.js and shaka) as the selected player. The front end works as a caching layer for the video streaming and it also hosts the NOTT app.

The live stream reaches the platform through FFmpeg, the broadcaster, which is also used to transcode the input into multiple renditions. nginx-rtmp acts as a packager, converting the RTMP input into the adaptive streaming output format known as HLS.

The main selling point of our OTT platform is that it has the popular TV channel color bar (60fps) and the legendary TV show big buck bunny (partner’s licensed content). :)

Compatibility: I didn’t test on all platforms (browsers, iOS, Android, CTVs). Video is hard and NOTT won’t cover 100% of the devices, but it should work in most places.

How does it work?

To broadcast the color bar TV show into the platform, we’ll use FFmpeg. It has filters capable of creating synthetic color bar frames at a given rate. It also offers an audio source filter known as sine, which can be used to create artificial sound.

This command creates color bar pictures at 60 frames per second and a sine wave sound at 48000 hertz. It encodes them to the video codec h264 using the libx264 and to the audio codec aac. Finally, we send them to the transcoder/packager using RTMP.

ffmpeg -f lavfi -i 'testsrc2=size=1280x720:rate=60,format=yuv420p' \
-f lavfi -i 'sine=frequency=440:sample_rate=48000:beep_factor=4' \
-c:v libx264 -preset ultrafast -tune zerolatency -profile:v high \
-b:v 1400k -bufsize 2800k -x264opts keyint=120:min-keyint=120:scenecut=-1 \
-c:a aac -b:a 32k -f flv rtmp://transcoder/encoder/colorbar

The ingest server runs nginx-rtmp and it acts as input service, receiving the FFmpeg synthetic stream. It also transcodes (spawning FFmpeg processes for that) and creates the HLS format in a given folder.

The front end servers will consume the streaming via HTTP backed by this ingest server.

rtmp {
  server {
    listen 1935;

    # receives the broadcast and spawns FFmpeg to transcode it into three renditions
    application encoder {
      live on;

      exec ffmpeg -i rtmp://localhost:1935/encoder/$name
        -c:v libx264 -b:v 750k -f flv -s 640x360 rtmp://localhost:1935/hls/$name_high
        -c:v libx264 -b:v 400k -f flv -s 426x240 rtmp://localhost:1935/hls/$name_mid
        -c:v libx264 -b:v 200k -f flv -s 426x240 rtmp://localhost:1935/hls/$name_low;
    }

    # packages the renditions as HLS
    application hls {
      live on;
      hls on;
      hls_variant _high BANDWIDTH=878000,RESOLUTION=640x360;
      hls_variant _mid BANDWIDTH=528000,RESOLUTION=426x240;
      hls_variant _low BANDWIDTH=264000,RESOLUTION=426x240;
    }
  }
}
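To make the role of the three hls_variant entries more concrete, here is a minimal Python sketch of the kind of master playlist a player receives for a stream named colorbar. The helper name master_playlist and the exact tag layout are assumptions made for illustration; the playlist that nginx-rtmp actually writes may carry different or additional attributes.

```python
# Hypothetical helper, not part of NOTT or nginx-rtmp.
# The values mirror the hls_variant lines of the config above.
VARIANTS = [
    ("_high", 878000, "640x360"),
    ("_mid", 528000, "426x240"),
    ("_low", 264000, "426x240"),
]


def master_playlist(name, variants=VARIANTS):
    """Render a master playlist that points to one media playlist per rendition."""
    lines = ["#EXTM3U"]
    for suffix, bandwidth, resolution in variants:
        # each rendition is announced by a tag line followed by its URI line
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}{suffix}.m3u8")
    return "\n".join(lines) + "\n"


print(master_playlist("colorbar"))
```

The player reads this list, picks a rendition that fits the measured bandwidth, and can switch between renditions while playing.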

The front end server we chose was nginx, a scalable web server and reverse proxy. This will be the endpoint where the final users can access the html5 application to watch the stream. It will also work as a caching layer for scalability.

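http {
  upstream backend {
    server ingest;
  }

  server {
    listen 8080;

    # proxies and caches the HLS stream coming from the ingest server
    # (my_cache must be declared with proxy_cache_path, not shown here)
    location / {
      proxy_cache my_cache;
      proxy_cache_lock on;
      proxy_pass http://backend;
    }

    # serves the static html5 app
    location /app {
      alias /usr/local/openresty/nginx/;
    }
  }
}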

Finally, the app is a simple HTML static file that instantiates the player.

<!DOCTYPE html>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<title>NOTT – The New OTT</title>
<script type="text/javascript" src=""></script>
<body class="notstealenfromnetflix">
  <ul class="flex-container">
    <li class="flex-item">
      <div id="player"></div>
    </li>
  </ul>
  <script>
    var player = new Clappr.Player({
      source: "http://localhost:8080/hls/colorbar.m3u8",
      parentId: "#player",
      poster: "",
      mute: true,
      height: 360,
      width: 640,
    });
  </script>
</body>

How to use it

The entire platform was conceived with Linux containers in mind so you just need to run make run and this is going to start it all. You also need to start the color bar in a different tab by running make broadcast_tvshow and point your browser to http://localhost:8080/app.

# make sure you have docker
git clone
cd nott
git checkout 0.0.3
# in a tab
make run
# wait until the platform is up and running
# and in another tab run
make broadcast_tvshow
# ^ for linux users you use --network=host and your
# IP instead of this
# for windows user I dunno =(
# but you can use OBS and point to your own machine
# open your browser and point it to http://localhost:8080/app


The real reason we created this simplistic video platform is to have software where we can explore computation at the edge. The next post will empower the Nginx front end with Lua code to add features to NOTT, such as authentication and an IP ACL.

Good Code Design From Linux/Kernel

Learn how parts of the Linux and FFmpeg C codebases are organized to be extensible and to act as if they had “polymorphism”. Specifically, we’re going to briefly explore how the Linux concept of everything is a file works at the source code level, as well as how FFmpeg can quickly and easily add support for new formats and codecs.



Good software design – Introduction

To write useful software that stays maintainable in the long term, we tend to look for patterns and group them into abstractions, and that seems to be the case for the developers behind Linux and FFmpeg too.

Software design

When we’re creating software, we’re building data structures and defining their behaviors and dependencies. The way we create and link them can be seen as the design/architecture of the software.

Let’s say we’re building a media framework that encodes/decodes video and audio. The codecs AV1, H264, HEVC, and AAC all do some common operations and if we can provide a generic abstraction that holds these common operations and data we can use this concept instead of relying on the concrete idea of what a specific codec does.

Through the years, many developers have noticed that good design pays off as software grows in complexity.

This is one of the ideas behind good software design: rely on components that are weakly linked and that have clear boundaries around what each one should do.


Maybe it’s easier to see all these concepts in practice. Let’s code a quick pseudo media stream framework that provides encoding and decoding for several codecs.

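# every codec exposes the same two operations: encode and decode
class AV1
  def encode(bytes)
    # ... AV1 specific code
  end

  def decode(bytes)
    # ... AV1 specific code
  end
end

class H264
  def encode(bytes)
    # ... H264 specific code
  end

  def decode(bytes)
    # ... H264 specific code
  end
end
# …

# the entries of this list were lost in the source; codec instances are assumed
SUPPORTED_CODECS = [AV1.new, H264.new]

class MediaFramework
  def encode(type, bytes)
    # the comparison was also lost; matching on the class name is an assumption
    codec = SUPPORTED_CODECS.find {|c| c.class.name.downcase == type}
    codec.encode(bytes)
  end
end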

This Ruby pseudo-code tries to recreate what we discussed above. There is an implicit concept here of what operations a codec must have; in this case, the operations are encode and decode. Since Ruby is a dynamically typed language, any class can provide these two operations and act as a codec for us.

Developers sometimes may use the words: contract, API, interface, behavior and operations as synonyms.

This design might be considered good because, to add a new codec, we just need to provide an implementation and add it to the list; the list could even be built dynamically. The code seems easy to extend and maintain because it keeps the link between the components weak (low coupling) and each component does only what it should do (high cohesion).

The Rails framework even enforces a way to organize the code: it adopts the model-view-controller (MVC) architecture.


When we go (no pun intended) to a statically typed language like golang we need to be more formal, describing the required types but it’s still doable.

type Codec interface {
	Encode(data []int) ([]int, error)
	Decode(data []int) ([]int, error)
}

type H264 struct{}

func (H264) Encode(data []int) ([]int, error) {
	// … lots of code
	return data, nil
}

func (H264) Decode(data []int) ([]int, error) {
	// … lots of code
	return data, nil
}

// AV1 is declared the same way as H264 (omitted here)
var supportedCodecs = []Codec{H264{}, AV1{}}

func Encode(codec string, data []int) ([]int, error) {
	// here we can choose which codec to use, for instance:
	return supportedCodecs[0].Encode(data)
}

The interface type in golang is much more powerful than Java’s similar construct because its definition is totally disconnected from the implementation and vice versa: a type satisfies an interface implicitly, just by having the required methods. We could even make each codec an io.ReadWriter and use it all around.


In the C language we still can create the same behavior but it’s a little bit different.

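struct Codec {
    int *(*encode)(int *);
    int *(*decode)(int *);
};

int *h264_encode(int *bytes) { /* ... */ return bytes; }
int *h264_decode(int *bytes) { /* ... */ return bytes; }
/* av1_encode and av1_decode are defined the same way */

struct Codec av1 = {
    .encode = av1_encode,
    .decode = av1_decode
};

struct Codec h264 = {
    .encode = h264_encode,
    .decode = h264_decode
};

int main(int argc, char *argv[]) {
    /* ... pick a struct Codec and call its encode/decode ... */
    return 0;
}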

Code inspired by

We first define the abstract operations (functions in this case) in a generic struct and then we fill it with the concrete code, like the av1 decoder and encoder real code.

Many other languages have somewhat similar mechanisms to dispatch methods or functions as if they were part of an agreed protocol, so the system integration code can deal only with these high-level abstractions.

Linux Kernel – Everything is a file

Have you ever heard the expression everything is a file in Linux? The idea is to have a common interface for all kinds of resources. For instance, Linux handles network sockets, special files (like /proc/cpuinfo) and even USB devices as files.

This is a powerful idea that makes it easy to write or use programs for Linux, since we can rely on a set of well-known operations from this abstraction called file. Let’s see this in action:

# the first case is the easiest, we're just reading a plain text file
$ cat /etc/passwd
# now here, we think we're reading a file but we are not! (technically yes.. anyway)
$ cat /proc/meminfo
MemTotal: 2046844 kB
MemFree: 546984 kB
MemAvailable: 1535688 kB
Buffers: 162676 kB
Cached: 892000 kB
# and finally we open a file (using fd=3) for read/write
# the "file" being a socket, we then send a request to this file >&3
# and we read from this same "file"
$ exec 3<> /dev/tcp/
$ printf 'HEAD / HTTP/1.1\nHost:\nConnection: close\n\n' >&3
$ cat <&3
HTTP/1.1 200 OK
Date: Wed, 21 Aug 2019 12:48:40 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2019-08-21-12; expires=Fri, 20-Sep-2019 12:48:40 GMT; path=/;
Set-Cookie: NID=188=K69nLKjqge87Ymv4h-gAW_lRfLCo7-KrTf01ULtY278lUUcaNxlEqXExDtVB104pdA8CLUZI8LMvJv26P_D8RMF3qCDzLTpjji96B9v_miGlZOIBro6pDreHP0yW7dz-9myBfOgdQjroAc0wWvOAkBu-zgFW_Of9VpK3IfIaBok; expires=Thu, 20-Feb-2020 12:48:40 GMT; path=/;; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close


This is only possible because the concept of a file (data structure and operations) was designed to be one of the main ways to communicate among sub-systems. Here’s a gist of the file_operations API.

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    /* ... many other operations ... */
};

The struct file_operations defines what one should expect a file to be able to do.

const struct file_operations ext4_dir_operations = {
    .llseek = ext4_dir_llseek,
    .read = generic_read_dir,
    /* ... */
};

Here we can see the directory implementation of these operations for the ext4 file system.

static const struct file_operations proc_cpuinfo_operations = {
    .open = cpuinfo_open,
    .read = seq_read,
    .llseek = seq_lseek,
    .release = seq_release,
};

Even the cpuinfo proc file is built over this abstraction. When you operate on files under Linux you’re actually dealing with the VFS layer, which delegates to the proper file implementation.
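The delegation can be pictured with a very small Python model. This is only a sketch of the idea, not kernel code: the table FILE_OPERATIONS, the function vfs_read and the strings returned are all made up for illustration.

```python
# Hypothetical, simplified model of how a generic read is dispatched
# to the implementation registered by each kind of "file".

def read_regular_file(name):
    # stands in for a file system that reads blocks from disk
    return f"bytes stored on disk for {name}"


def read_cpuinfo(name):
    # stands in for a proc file whose content is generated when it is read
    return "processor : 0 (generated on the fly)"


# each implementation fills the same table of operations
FILE_OPERATIONS = {
    "ext4": {"read": read_regular_file},
    "proc": {"read": read_cpuinfo},
}


def vfs_read(fs_type, name):
    """The caller only knows the abstraction; the table picks the concrete code."""
    operations = FILE_OPERATIONS[fs_type]
    return operations["read"](name)


print(vfs_read("ext4", "passwd"))
print(vfs_read("proc", "cpuinfo"))
```

Adding a new kind of file in this model means registering one more table entry; vfs_read itself does not change.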



FFmpeg – Formats

Here’s an overview of the FFmpeg flow/architecture. It shows that the internal components are linked mostly to abstract concepts like AVCodec and not directly to their implementations, such as H264 or AV1.

FFmpeg architecture view from transmuxing flow


For the input files, FFmpeg defines a struct called AVInputFormat that is implemented by any format (video container) that wants to be used as an input. The MKV format fills this structure with its implementation, and so does the MP4 format.


typedef struct AVInputFormat {
    const char *name;
    const char *long_name;
    const char *extensions;
    const char *mime_type;
    ff_const59 struct AVInputFormat *next;
    int raw_codec_id;
    int priv_data_size;
    int (*read_probe)(const AVProbeData *);
    int (*read_header)(struct AVFormatContext *);
    /* ... */
} AVInputFormat;

// matroska
AVInputFormat ff_matroska_demuxer = {
    .name = "matroska,webm",
    .long_name = NULL_IF_CONFIG_SMALL("Matroska / WebM"),
    .extensions = "mkv,mk3d,mka,mks",
    .priv_data_size = sizeof(MatroskaDemuxContext),
    .read_probe = matroska_probe,
    .read_header = matroska_read_header,
    .read_packet = matroska_read_packet,
    .read_close = matroska_read_close,
    .read_seek = matroska_read_seek,
    .mime_type = "audio/webm,audio/x-matroska,video/webm,video/x-matroska"
};

// mov (mp4)
AVInputFormat ff_mov_demuxer = {
    .name = "mov,mp4,m4a,3gp,3g2,mj2",
    .long_name = NULL_IF_CONFIG_SMALL("QuickTime / MOV"),
    .priv_class = &mov_class,
    .priv_data_size = sizeof(MOVContext),
    .extensions = "mov,mp4,m4a,3gp,3g2,mj2",
    .read_probe = mov_probe,
    .read_header = mov_read_header,
    .read_packet = mov_read_packet,
    .read_close = mov_read_close,
    .read_seek = mov_read_seek,
    /* ... */
};

This design makes it easier to integrate and release new codecs, formats, and protocols. dav1d (an open-source AV1 decoder) was integrated into FFmpeg in May this year, and you can follow along the commit diff to see how easy it was. In the end, it needs to register itself as an available codec and follow the expected operations.

+AVCodec ff_libdav1d_decoder = {
+ .name = "libdav1d",
+ .long_name = NULL_IF_CONFIG_SMALL("dav1d AV1 decoder by VideoLAN"),
+ .id = AV_CODEC_ID_AV1,
+ .priv_data_size = sizeof(Libdav1dContext),
+ .init = libdav1d_init,
+ .close = libdav1d_close,
+ .flush = libdav1d_flush,
+ .receive_frame = libdav1d_receive_frame,
+ .priv_class = &libdav1d_class,
+ .wrapper_name = "libdav1d",
+};

No matter the language we use, we can (or at least try to) build software with low coupling and high cohesion in mind. These two basic properties make software easier to maintain and extend.

How to build a distributed throttling system with Nginx + Lua + Redis


At our last hackathon, Lucas Costa and I built a simple Lua library that provides a distributed rate measurement system; it depends on Redis and runs embedded in Nginx. Before we explain what we did, let’s start by understanding the problem that a throttling system tries to solve and some possible solutions.

Suppose we just built an API but some users are making too many requests, abusing their request quota. How can we deal with them? Nginx has a rate limiting feature that is easy to use:

events {
  worker_connections 1024;
}

error_log stderr;

http {
  # one counter per client IP, allowing 1 request per minute
  limit_req_zone $binary_remote_addr zone=mylimit:10m rate=1r/m;

  server {
    listen 8080;

    location /api0 {
      default_type 'text/plain';
      limit_req zone=mylimit;

      content_by_lua_block {
        ngx.say("hello world")
      }
    }
  }
}

This nginx configuration creates a zone called mylimit that limits each user, identified by IP, to a single request per minute. To test this, save this config file as nginx.conf and run the command:

docker run --rm -p 8080:8080 \
-v $(pwd)/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf \


We can use curl to test its effectiveness:


As you can see, our first request, right at the start of minute 50, was just fine, but our next two requests failed because they were restricted by the nginx limit_req directive that we set up to accept only 1 request per minute. In the next minute we received a successful response.
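The behaviour observed in this test can be approximated with a few lines of Python. This is a deliberately simplified model, assuming no burst is configured; the class SimpleRateLimiter is hypothetical, and nginx's real limit_req implementation (a leaky bucket kept in shared memory) is more elaborate.

```python
class SimpleRateLimiter:
    """Accepts a request for a key only if enough time passed since the last accepted one."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self.last_accepted = {}  # key -> timestamp of the last accepted request

    def allow(self, key, now):
        last = self.last_accepted.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: rejected requests do not move the timestamp
        self.last_accepted[key] = now
        return True


limiter = SimpleRateLimiter(min_interval_seconds=60)  # models rate=1r/m
print(limiter.allow("10.0.0.1", now=0))    # first request of this IP
print(limiter.allow("10.0.0.1", now=5))    # same minute
print(limiter.allow("10.0.0.1", now=20))   # same minute
print(limiter.allow("10.0.0.1", now=61))   # a minute later
```

The key is whatever the zone is declared on, so the same model applies whether the limit is tracked per IP or per any other request attribute.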

This approach has a problem, though: a user could use multiple cloud VMs to bypass the per-IP limit. Let’s use the user token argument instead:

events {
  worker_connections 1024;
}

error_log stderr;

http {
  # one counter per token query argument instead of per IP
  limit_req_zone $arg_token zone=mylimit:10m rate=1r/m;

  server {
    listen 8080;

    location /api0 {
      default_type 'text/plain';
      limit_req zone=mylimit;

      content_by_lua_block {
        ngx.say("hello world")
      }
    }
  }
}

There is another good reason to avoid limiting by IP: many of your users can be behind a single IP, and by rate limiting them based on it you might be blocking legitimate users.

Now a user can’t bypass the limit by using multiple IPs, because the token is used as the key for the rate limit counter.


You can even notice that when a new user, the one with token=0xCAFEE, requests the same API, the server replies with success.

Since our API is so useful, more and more users are becoming paid members and now we need to scale it out. What we can do is to put a load balancer in front of two instances of our API. To act as LB we can still use nginx, here’s a simple (workable) version of the required config.

events {
  worker_connections 1024;
}

error_log stderr;

http {
  # the two API instances behind the load balancer
  upstream app {
    server nginx1:8080;
    server nginx2:8080;
  }

  server {
    listen 8080;

    location /api0 {
      proxy_pass http://app;
    }
  }
}

Now, to simulate our scenario, we need to use multiple containers. Let’s use docker-compose for this task; the config file just declares three services, two acting as our API and one as the LB.

version: '3'
services:
  nginx1:
    image: openresty/openresty:alpine
  nginx2:
    image: openresty/openresty:alpine
  lb:
    image: openresty/openresty:alpine

Run the command docker-compose up and then in another terminal tab simulate multiple requests.

When we request http://localhost:8080 we’re hitting the lb instance.


Weird?! Now our limit system is not working, or at least not properly. The first request was a 200, as expected, but the next one was also a 200.

It turns out that the LB needs a way to forward the requests to one of the two API instances. The default algorithm our LB uses is round-robin, which hands each new request to the next server in the list, cycling through the servers like the hand of a clock.

Nginx limit_req stores its counters in each node’s own memory, so each instance was seeing the token for the first time; that’s why our first two requests were successful.
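A small Python simulation makes the effect easier to see. It is only a sketch under simplifying assumptions: the Node class is hypothetical, every request happens within the same minute, and a rejected request gets 503, which is nginx's default limit_req status.

```python
from itertools import cycle


class Node:
    """An API instance that keeps its rate limit state in its own memory."""

    def __init__(self, name):
        self.name = name
        self.seen_tokens = set()  # tokens already served during this minute

    def handle(self, token):
        if token in self.seen_tokens:
            return 503  # over the 1 request per minute limit on this node
        self.seen_tokens.add(token)
        return 200


# the load balancer walks the list of nodes in a circle (round-robin)
nodes = cycle([Node("nginx1"), Node("nginx2")])


def request(token):
    node = next(nodes)
    return node.name, node.handle(token)


results = [request("0xCAFE") for _ in range(4)]
for name, status in results:
    print(name, status)
```

With two nodes the same token gets two successful responses per minute instead of one; in this model the effective limit grows with the number of nodes.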

What if we saved our counters in a data store? We could use Redis: it’s in-memory and pretty fast.


But how are we going to build this counting/rating system? This can be solved using a histogram to get the average, a leaky bucket algorithm or a simplified sliding window proposed by Cloudflare.

Implementing the sliding window algorithm is actually very easy: you keep two counters, one for the last minute and one for the current minute, and then you calculate the current rate by weighting the last minute’s counter as if its requests had arrived at a perfectly constant rate.

To make things easier, let’s debug an example of this algorithm in action. Let’s say our throttling system allows 10 requests per minute and that our past minute counter for a token is 6 and the current minute counter is 1 and we are at the second 10.

last_counter * ((60 - current_second) / 60) + current_counter
6 * ((60 - 10) / 60) + 1 = 6 # the current rate is 6 which is under 10 req/m
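The same arithmetic, written as a small Python function so it can be tried with other values. The function name and its parameters are chosen here for illustration only.

```python
def sliding_window_rate(last_counter, current_counter, current_second, window=60):
    """Estimate the request rate over the last `window` seconds.

    The previous minute only counts for the part of it that still
    overlaps the sliding window.
    """
    weight = (window - current_second) / window
    return last_counter * weight + current_counter


# the example from the text: 6 requests in the past minute,
# 1 in the current minute, and we are at second 10
rate = sliding_window_rate(last_counter=6, current_counter=1, current_second=10)
print(rate)        # about 6
print(rate > 10)   # False: the request is under the 10 req/m limit
```

At second 0 the previous minute counts fully, and by second 59 it has almost no influence, which is what makes the window slide.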

local redis_rate = {}

-- redis_client is an instance of a redis_client
-- key is the limit parameter, in this case ngx.var.arg_token
redis_rate.measure = function(redis_client, key)
  -- the argument of math.floor was lost in the source; ngx.now() is assumed
  local current_time = math.floor(ngx.now())
  local current_minute = math.floor(current_time / 60) % 60
  -- wrap around so that minute 0 looks back at minute 59
  local past_minute = (current_minute - 1) % 60
  local current_key = key .. current_minute
  local past_key = key .. past_minute

  local resp, err = redis_client:get(past_key)
  -- a missing key means no requests in the past minute
  local last_counter = tonumber(resp) or 0

  resp, err = redis_client:incr(current_key)
  -- INCR already counted this request, so leave it out of the rate
  local current_counter = tonumber(resp) - 1

  resp, err = redis_client:expire(current_key, 2 * 60)

  local current_rate = last_counter * ((60 - (current_time % 60)) / 60) + current_counter
  return current_rate, nil
end

return redis_rate

To store the counters we used three simple (O(1)) redis operations:

  • GET to retrieve the last counter
  • INCR to count the current counter and retrieve its current value.
  • EXPIRE to set an expiration for the current counter, since it won’t be useful after two minutes.
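To see how these three operations work together without a Redis server, here is a minimal Python sketch. FakeRedis is a hypothetical in-memory stand-in written for this example (it records the TTL but never evicts keys), and measure follows the steps listed above rather than reproducing the library's code.

```python
class FakeRedis:
    """In-memory stand-in for the three commands used; not a real Redis client."""

    def __init__(self):
        self.values = {}
        self.ttl = {}

    def get(self, key):
        return self.values.get(key)

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self.ttl[key] = seconds  # recorded only; this sketch never evicts keys


def measure(store, key, now):
    """Return the estimated rate for `key` at time `now` (in seconds)."""
    current_minute = (now // 60) % 60
    past_minute = (current_minute - 1) % 60
    last_counter = store.get(f"{key}{past_minute}") or 0         # GET
    current_counter = store.incr(f"{key}{current_minute}") - 1   # INCR
    store.expire(f"{key}{current_minute}", 2 * 60)               # EXPIRE
    return last_counter * ((60 - now % 60) / 60) + current_counter


store = FakeRedis()
for t in range(6):                 # six requests during minute 0
    measure(store, "0xCAFE", now=t)

first_rate = measure(store, "0xCAFE", now=70)   # minute 1, second 10
second_rate = measure(store, "0xCAFE", now=70)
print(first_rate, second_rate)
```

Because every instance talks to the same store, the counters are shared no matter which node the load balancer picks.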

We decided not to use the MULTI operation, so in theory a really small percentage of users can be wrongly allowed. One of the reasons to dismiss MULTI was that the Lua Redis cluster driver we use doesn’t support it, but we use pipelining and hash tags to save 2 extra round trips.

Now it’s the time to integrate the lua rate sliding window algorithm into nginx.

http {
  server {
    listen 8080;

    location /lua_content {
      default_type 'text/plain';

      content_by_lua_block {
        local redis_client = redis_cluster:new(config)
        local rate, err = redis_rate.measure(redis_client, ngx.var.arg_token)
        if err then
          ngx.log(ngx.ERR, "err: ", err)
        end

        if rate > 10 then
          -- over the limit: reply with 403
          ngx.exit(ngx.HTTP_FORBIDDEN)
        end
        -- ... (the rest of the handler is elided in the source)
      }
    }
  }
}

You probably want to use the access_by_lua phase of the nginx request cycle instead of content_by_lua.

The nginx configuration is easy to understand: it uses the token argument as the key, and if the rate is above 10 req/m we just reply with 403. Simple solutions are usually elegant and can be scalable and good enough.

The Lua library and this complete example are on GitHub, and you can run and test them locally without great effort.

Use URL.createObjectURL to make your videos start faster


During our last hackathon, we wanted to make our playback start faster. Before our playback starts to show something to the final users, we issue around 5 to 6 requests (counting some manifests), and the goal was to cut as many as we could.


The first step was very easy: we moved the code logic from the client side to the server side, and then we injected the prepared player into the page.

Pseudo Ruby server side code:

some_api = get("http://some.api/v/#{@id}/playlist")
other_api = get("http://other.api/v/#{}/playlist")
# ...
@final_uri = "#{protocol}://#{domain}/#{path}/#{manifest}"

Pseudo JS client side code:

new Our.Player({source: {{ @final_uri }} });


Okay, that’s nice, but can we go further? Yes: how about embedding our manifests into the page?! It turns out that we can do that with the power of URL.createObjectURL; this API gives us a URL for a JS blob/object/file.

// URL.createObjectURL is pretty trivial
// to use and powerful as well
 var blob = new Blob(["#M3U8...."]
            , {type: "application/x-mpegurl"});
 var url = URL.createObjectURL(blob);

Pseudo Ruby server side code:

some_api = get("http://some.api/v/#{@id}/playlist")
other_api = get("http://other.api/v/#{}/playlist")
# ...
@final_uri = "#{protocol}://#{domain}/#{path}/#{manifest}"
@main_manifest = get(@final_uri)
@sub_manifests = @main_manifest
                 .map {|uri| get(uri)}
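The manifest handling in this step boils down to two text operations: finding the sub manifest URIs inside the main manifest, and later swapping each of them for another URL. Here is a rough Python sketch of both, assuming an HLS main manifest where every non-empty line that does not start with # is a URI. The function names, the sample manifest and the blob:placeholder URLs are made up for illustration; in the post the swap happens in the browser with real blob URLs.

```python
def variant_uris(master_manifest):
    """Return the URI lines of a main manifest, in order of appearance."""
    uris = []
    for line in master_manifest.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def rewrite_uris(master_manifest, replacements):
    """Return the manifest with every known URI swapped for its replacement."""
    rewritten = []
    for line in master_manifest.splitlines():
        rewritten.append(replacements.get(line.strip(), line))
    return "\n".join(rewritten) + "\n"


MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=878000,RESOLUTION=640x360
colorbar_high.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=264000,RESOLUTION=426x240
colorbar_low.m3u8
"""

uris = variant_uris(MASTER)
# placeholder values standing in for the URLs created on the client
blob_urls = {uri: f"blob:placeholder/{i}" for i, uri in enumerate(uris)}
print(rewrite_uris(MASTER, blob_urls))
```

Tag lines are left untouched, so the player still sees the bandwidth and resolution of each rendition.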

Pseudo JS client side code:

  var mime = "application/x-mpegurl";
  var manifest = {{ @main_manifest }};
  var subManifests = {{ @sub_manifests }};
  // one blob URL per sub manifest
  var subManifestsBlobURL = subManifests
                           .map(function (content) { return objectURLFor(content, mime); });
  // pseudo: swap each sub manifest URI in the main manifest for its blob URL
  var finalMainManifest = manifest
                          .map(content.replace(id, subManifestsBlobURL[id]))

  function objectURLFor(content, mime) {
    var blob = new Blob([content], {type: mime});
    return URL.createObjectURL(blob);
  }

  new Our.Player({
    source: objectURLFor(finalMainManifest, mime)
  });


We thought we were done, but then we came up with the idea of doing the same for the first video segment. The page now weighs more, but the player starts playing almost instantaneously.

// for regular text manifest we can use regular Blob objects
// but for binary data we can rely on Uint8Array
var segment = new Uint8Array({{ segments.first }});
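On the server side, the template value has to be rendered as something Uint8Array accepts, such as an array of byte values. A minimal Python sketch of that step follows; the helper name uint8_literal is hypothetical, and the three sample bytes are not a real video segment.

```python
import json


def uint8_literal(segment_bytes):
    """Render raw bytes as a JS array literal usable in new Uint8Array([...])."""
    return json.dumps(list(segment_bytes), separators=(",", ":"))


sample = bytes([0x47, 0x40, 0x00])  # made-up bytes, for illustration only
print(uint8_literal(sample))        # [71,64,0]
```

Each byte becomes up to three digits plus a comma, so the page grows by more than the size of the segment itself.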

By the way, our player is based on Clappr, and this particular test was done with the hls.js playback, which uses the fetch API to get the video segments; fetching this created URL works just fine.

The animated gif you see at the start of the post was made without the segment-on-the-page optimization. We also just ignored the possible side effects on the player’s ABR algorithm (which could think it has high bandwidth due to the fast manifest fetch).

Finally, we can make it even faster using MPEG-DASH and its template timeline format, shorter segment sizes, and an ABR algorithm tuned to start faster.