Building the New Netflix Experience for TV

by Joubert Nel

We just launched a new Netflix experience for TV and game consoles. The new design is based on our premise that each show or movie has a tone and a narrative that should be conveyed by the UI. To tell a richer story we provide relevant evidence and cinematic art that better explain why we think you should watch a show or movie.




The new user interface required us to question our paradigms about what can be delivered on TV – not only is this UI more demanding of game consoles than any of our previous UIs, but we also wanted budget devices to deliver a richer experience than what was previously possible.

For the first time we needed a single UI that could accept navigation using a TV remote or game controller, as well as voice commands and remotes that direct a mouse cursor on screen.

Before we get into how we developed for performance and built for different input methods, let’s take a look at our UI stack.


UI Stack

My team builds Netflix UIs for the devices in your living room: PlayStation 3, PlayStation 4, Xbox 360, Roku 3, and recent Smart TVs and Blu-ray players.

We deploy UI updates with new A/B tests, support for new locales like the Netherlands, and new features like Profiles. While remaining flexible, we also want to take advantage of as much of the underlying hardware as possible in a cross-platform way.

So, a few years ago we broke our device client code into two parts: an SDK that runs on the metal, and a UI written in JavaScript. The SDK provides a rendering engine, JavaScript runtime, networking, security, video playback, and other platform hooks. Depending on the device, SDK updates range from quarterly to annually to never. The UI, in contrast, can be updated at any time and is downloaded (or retrieved from disk cache) when the user fires up Netflix.
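To make the division of labor concrete, here is a rough sketch of the layering. All of the names below are invented for illustration; this is not our actual SDK API:

```javascript
// Hypothetical sketch of the SDK/UI split. The native SDK hands the
// JavaScript UI a bridge object at startup; everything above this line
// can be re-downloaded every time the user fires up Netflix.
function startUI(sdk) {
  // sdk.gfx, sdk.net, and sdk.player stand in for the rendering engine,
  // networking stack, and video pipeline the real SDK provides.
  var home = sdk.gfx.createScene('home');

  sdk.net.request('/titles', function (rows) {
    home.render(rows); // build the view hierarchy from server data
  });

  sdk.player.on('ready', function () {
    home.show();
  });
}
```

Because the UI layer is plain JavaScript that rides on top of this bridge, we can ship it as often as we like without waiting for a firmware-style SDK release.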

Key, Voice, Pointer

The traditional way for users to control our UI on a game console or TV is via an LRUD input (left/right/up/down) such as a TV remote control or game controller. Additionally, Xbox 360 users should be able to navigate with voice commands, and folks with an LG Smart TV and its Magic Remote must be able to navigate by pointing the remote at elements on screen. Our new UI is our first to incorporate all three input methods in a single design.

We wanted to build our view components in such a way that their interaction behaviors are encapsulated. This code proximity makes the code more maintainable and reusable, and the class hierarchy more robust. To get there, we needed a consistent way to dispatch the three kinds of user input events to the view hierarchy.

We created a new JavaScript event dispatcher that routes key, pointer, and voice input in a uniform way to views. We needed an incremental solution that didn’t require refactoring the whole codebase, so we designed it to coexist with our legacy key handling and provide a migration path.
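As a simplified sketch (the names and shapes here are illustrative, not our production code), such a dispatcher normalizes raw platform input into one event shape and bubbles it through the view hierarchy:

```javascript
// Hypothetical sketch of a unified input dispatcher; none of these
// names come from the actual Netflix codebase.
function Dispatcher(rootView) {
  this.rootView = rootView;
}

// Normalize raw platform input into one event shape and start
// bubbling it at the currently focused view.
Dispatcher.prototype.dispatch = function (type, payload) {
  var event = { type: type, payload: payload, handled: false };
  this.bubble(this.rootView.focusedView(), event);
  return event.handled; // unhandled events can fall through to legacy handling
};

// Walk from the focused view up to the root, letting each view's
// encapsulated handler (onKey, onPointer, onVoice) claim the event.
Dispatcher.prototype.bubble = function (view, event) {
  while (view && !event.handled) {
    var handler = view['on' + event.type];
    if (typeof handler === 'function') handler.call(view, event);
    view = view.parent;
  }
};
```

With this shape, `dispatcher.dispatch('Key', { code: 'LEFT' })`, a pointer move, and a voice command all travel the same path, and views that haven't migrated yet simply leave the event unhandled so the legacy key-handling code can take over.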

We must produce JavaScript builds that contain code only for the input methods the target device supports, because smaller code size yields faster parsing, which in turn reduces startup time.

To produce lean builds, we use a text preprocessor to strip out input-handling code that is irrelevant to a target platform. The advantage of a text preprocessor over, for example, mixins that layer in additional appearances and interactions is that we get much higher levels of code proximity and simplicity.
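The directive syntax below is invented for illustration (as is `BoxArtView`), but it shows the idea: the preprocessor deletes the guarded blocks from builds for devices that lack the corresponding input method, so the handler lives right next to the view it belongs to.

```javascript
// Illustrative only: the directive syntax and BoxArtView are invented.
function BoxArtView() {}
BoxArtView.prototype.focus = function () { /* move focus here */ };
BoxArtView.prototype.activate = function () { /* start playback */ };

// #if POINTER
BoxArtView.prototype.onPointer = function (event) {
  this.focus();            // hovering the cursor moves focus directly
  event.handled = true;
};
// #endif

// #if VOICE
BoxArtView.prototype.onVoice = function (event) {
  if (event.payload.command === 'play') this.activate();
  event.handled = true;
};
// #endif
```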

Performance

Devices in the living room use DirectFB, OpenGL, or something OpenGL-like for graphics, and can use hardware acceleration for animating elements of the UI. Leveraging the GPU is key to creating a smooth experience that is responsive to user input – we’ve done it on WebKit using accelerated compositing (see WebKit in Your Living Room and Building the Netflix UI for Wii U).

The typical implementation of hardware-accelerated animation of a rectangle requires width × height × bytes-per-pixel of memory. In our UI we animate entire scenes when transitioning between them; animating one scene at 1080p requires close to 8MB of memory (1920 × 1080 × 4), while 720p requires 3.5MB (1280 × 720 × 4). We see devices with as little as 20MB of memory allocated to the hardware-accelerated rendering cache. Moreover, other system resources such as main memory, disk cache, and CPU may also be severely constrained compared to a mobile phone, laptop, or game console.
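The arithmetic is worth spelling out; a quick sanity check of those figures:

```javascript
// Back-of-the-envelope check of the numbers above: an animated layer
// needs width * height * bytesPerPixel of texture memory.
function layerBytes(width, height, bytesPerPixel) {
  return width * height * bytesPerPixel;
}

var MB = 1024 * 1024;
console.log(layerBytes(1920, 1080, 4) / MB); // ~7.9 MB at 1080p
console.log(layerBytes(1280, 720, 4) / MB);  // ~3.5 MB at 720p
// A 20 MB rendering cache holds only a couple of full-screen layers.
```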

How can we squeeze as much performance as possible out of budget devices and add more cinematic animations on game consoles?

We think JavaScript, HTML, and CSS are great technologies for building compelling experiences, such as our HTML5 player UI. But we wanted finer-grained control of the graphics layer, along with optimizations for apps that do not need reflowable content. Our SDK team built a new rendering engine with which we can deliver animations on very resource-constrained devices, making it possible to give those customers our best UI. We can also enrich the experience with cinematic animations and effects on game consoles.

The new rendering engine is our first strategy. The second is to group devices into performance classes that give us entry points to turn different knobs such as pool sizes, prefetch ranges, effects, animations, and caching, taking advantage of fewer or more resources while maintaining the integrity of the UI design and interaction.
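Concretely, one can imagine a table of knob settings per performance class. The knob names below echo the prose above, but every value and helper here is invented for illustration:

```javascript
// Illustrative performance-class table; all values are invented.
var PERF_CLASSES = {
  budget: {
    imagePoolSize: 24,       // fewer cached box-art bitmaps
    prefetchRows: 1,         // fetch only the next row of titles
    transitionEffects: false,
    animationFps: 30
  },
  console: {
    imagePoolSize: 128,
    prefetchRows: 4,
    transitionEffects: true, // cinematic scene transitions enabled
    animationFps: 60
  }
};

// Hypothetical stub: a real implementation would inspect capabilities
// reported by the SDK (memory, GPU, CPU) to pick a class.
function detectPerformanceClass() {
  return 'budget';
}

var knobs = PERF_CLASSES[detectPerformanceClass()];
```

The point of the indirection is that the UI code reads its knobs from one place, so the same design scales down to budget devices and up to consoles without forking the codebase.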


Delivering great experiences

In the coming weeks we will be diving into more details of our JavaScript codebase on this blog.

Building the new Netflix experience for TV was a lot of work, but it gave us a chance to be a PlayStation 4 launch partner, productize our biggest A/B test successes of 2013, and delight tens of millions of Netflix customers.

If this excites you and you want to help build the future of UIs for discovering and watching shows and movies, join our team!
