Scryer: Netflix's Predictive Auto Scaling Engine - Part 2

by Danny Yuan, Neeraj Joshi, Daniel Jacobson, and Puneet Oberai


In part 1 of this series, we introduced Scryer, Netflix’s predictive autoscaling engine, and discussed its use cases and how it runs at Netflix. In this second installment, we will discuss the design of Scryer, ranging from the technical implementation to the algorithms that drive its predictions.

Design of Scryer


Scryer has a simple data flow architecture. At a high level, historical data flows into Scryer, and predicted actions flow out. The diagram below shows the architecture.



The API layer provides a RESTful interface for a web UI, as well as for automation scripts, to interact with Scryer.

The Data Collector module pulls metrics from a pluggable list of data sources, cleans the data, and transforms it into a format suitable for the Predictor. The data retrieval is currently done incrementally within a sliding time window to minimize the load on the data source. The data is also stored in a secondary persistent store for resiliency purposes.


The Predictor generates predictions based on a pluggable list of prediction algorithms. We implemented two prediction algorithms for production: one is an augmented linear-regression-based algorithm, and the other is based on the Fast Fourier Transform (FFT). The Predictor module also provides life cycle hooks for pre- and post-processing of predictions. A pluggable prediction combiner is then used to combine multiple predictions into a single final prediction.


The Action Plan Generator module uses the prediction and other control parameters (e.g., server throughput, server start time, etc.) to compute an auto scaling plan. The auto scaling plan is optimized to minimize the number of scale-up events while maintaining an optimal scale-up batch size for each scale-up event. Pre- and post-action hooks are available to apply additional padding to instance counts if required. For example, we may need to add extra instances for holidays.
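As a rough sketch of this step (the helper and parameter names below are illustrative, not Scryer's actual code), an action plan generator could walk over the predicted instance counts, schedule each scale-up one server start time in advance, and coalesce small increments into batches:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

@dataclass
class ScaleUpAction:
    execute_at: datetime      # when to trigger the scale-up
    target_capacity: int      # desired instance count after this step

def build_scale_up_plan(prediction: List[Tuple[datetime, int]],
                        current_capacity: int,
                        server_start_time: timedelta,
                        min_batch_size: int = 5) -> List[ScaleUpAction]:
    """Turn a predicted capacity timeline into a small number of scale-up steps.

    Each step is scheduled `server_start_time` ahead of when the capacity is
    needed, and small increments are rounded up to `min_batch_size` so that
    many tiny scale-ups collapse into fewer, larger ones.
    """
    plan: List[ScaleUpAction] = []
    planned_capacity = current_capacity
    for needed_at, required in prediction:
        if required <= planned_capacity:
            continue  # this slot is already covered by an earlier step
        increment = max(required - planned_capacity, min_batch_size)
        planned_capacity += increment
        plan.append(ScaleUpAction(execute_at=needed_at - server_start_time,
                                  target_capacity=planned_capacity))
    return plan
```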


The Scaler module carries out the action plan generated by the Action Plan Generator module. It allows different implementations of actions, as sketched after this list. Currently, we have implemented three different actions:
  • Emitting predictions and action steps to our monitoring dashboard at scheduled times. This is great for simulating the behavior of Scryer. We can easily visualize the predictions and actions, and compare the predictions with the actual workload in the same graph.
  • Scheduling each step using the AWS API for Scheduled Actions.
  • Scheduling actions that will scale a cluster using the EC2 API.
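The pluggable action design can be pictured as a small interface with one implementation per action; the class and method names below are purely illustrative:

```python
from abc import ABC, abstractmethod

class ScalerAction(ABC):
    """One pluggable way of carrying out an auto scaling plan."""

    @abstractmethod
    def execute(self, plan):
        """Carry out a list of scheduled scaling steps."""

class DashboardEmitter(ScalerAction):
    """Only emits predictions and action steps to the monitoring dashboard,
    which is useful for simulating Scryer against the actual workload."""
    def execute(self, plan):
        for step in plan:
            print(f"[simulation] {step.execute_at}: scale to {step.target_capacity}")

class ScheduledActionScaler(ScalerAction):
    """Registers each step through the AWS Scheduled Actions API (call omitted)."""
    def execute(self, plan):
        for step in plan:
            self._put_scheduled_action(step)  # would call the AWS API here

    def _put_scheduled_action(self, step):
        raise NotImplementedError("AWS call left out of this sketch")
```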

Metrics for Prediction Algorithms


The first order of business for building the prediction algorithm is to determine which metrics are to be used for prediction and autoscaling actions. When using Amazon Auto Scaling, we normally settle on load average. Load average fits because it is a good indicator of the capacity of a cluster, and it is independent of traffic pattern. Our goal is simply to keep load average within a certain range by adjusting cluster size. However, load average is a misfit for prediction because it is a result of auto scaling. It is too complicated, if not impossible, to predict something that is itself changed by the prediction. A metric has to satisfy two conditions to be easily predictable:
  • It has a clear, relatively stable, and preferably recurring pattern. We can predict reliably only what has repeatedly happened in the past.
  • It is independent of cluster performance. We deploy our code frequently, and performance may vary from deployment to deployment. If the metric depends on cluster performance, predictions may deviate widely from the actual values of the metric.


Therefore, we decided to use user traffic for prediction. In particular, we use requests per second by default because most of our services are request-based. User traffic satisfies the two conditions above.


Once we determined which metric to predict on, we also needed to figure out how to calculate scaling actions. Since the goal of auto scaling is to ensure a cluster has a sufficient number of machines to serve all the user traffic, all we need to do is predict the size of the cluster, which depends on the average throughput of a server:
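In its simplest form, ignoring any extra padding, the required cluster size is the predicted request rate divided by the per-server throughput, rounded up: \( \text{cluster size}(t) = \lceil \text{predicted requests per second}(t) \,/\, \text{throughput per server} \rceil \).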

We can get throughput metrics from our monitoring system or from stress testing. Scryer also allows users to override the throughput value manually via the web UI or by calling a RESTful API.

Prediction Algorithms

The key to effective prediction algorithms is making use of as many signals as possible while ignoring noise in the input metrics. We observed that our input metrics had the following characteristics:
  • They have clear weekly periodicity for the same day of the week. That is, the traffic of two adjacent Tuesdays is more similar than that of an adjacent Tuesday and Wednesday.
  • Their daily patterns are similar, albeit different in shape and scale.
  • They have some small spikes and drops that we can treat as noise.
  • The change in traffic is relatively constant week over week. In other words, the traffic at the same time on the same day of the week moves approximately linearly from week to week.
  • There can be occasional large spikes or large drops due to system outages.
Based on our observations, we took two different approaches: FFT-based smoothing, and linear regression with clustered data points.


FFT-Based Prediction

The idea of this algorithm is to treat incoming traffic as a combination of multiple sine curves. Noise is of high frequency and low amplitude. Therefore, we can use an FFT filter that filters out noise based on given thresholds of frequency and amplitude. The filtered result is a smoothed curve. To predict a future value, we shift the curve to find the past value that is exactly one period away. Mathematically speaking, if the filtered result is a function of time \(f(t)\), and the future value is another function of time \(g(t)\), then \(g(t) = f(t - \omega)\), where \(\omega\) is the periodicity of the function \(f(t)\). The figure below illustrates the idea. The black curve is the input, and the blue curve is the smoothed result. We can see the sharp spikes are filtered out because they have much higher frequency and much smaller amplitude than the blue curve.
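A minimal sketch of this idea using NumPy is shown below; the frequency and amplitude thresholds are illustrative placeholders rather than Scryer's tuned values:

```python
import numpy as np

def fft_smooth(signal: np.ndarray, max_freq_bins: int, min_amplitude: float) -> np.ndarray:
    """Remove high-frequency, low-amplitude components (the noise) from a signal."""
    spectrum = np.fft.rfft(signal)
    amplitudes = np.abs(spectrum) / len(signal)
    freqs = np.arange(len(spectrum))
    noise = (freqs > max_freq_bins) & (amplitudes < min_amplitude)
    spectrum[noise] = 0
    return np.fft.irfft(spectrum, n=len(signal))

def predict_fft(signal: np.ndarray, period: int, horizon: int,
                max_freq_bins: int = 64, min_amplitude: float = 50.0) -> np.ndarray:
    """Predict the next `horizon` samples as g(t) = f(t - period), where f is
    the smoothed history and `period` is one full period (e.g., one week)."""
    f = fft_smooth(signal, max_freq_bins, min_amplitude)
    last_period = f[-period:]
    repeats = int(np.ceil(horizon / period))
    return np.tile(last_period, repeats)[:horizon]
```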



The FFT-based algorithm is also capable of ignoring outages. It detects outages by applying standard statistical methods. Once an outage is detected, the algorithm iteratively applies the FFT on adjusted data until the outage is ignored. The following figure shows that a simulated big drop is reasonably ignored:


The prediction algorithm undergoes multiple iterations to gradually remove the effect of such a drop, as shown by the figure below. The first iteration is red, and the last iteration is yellow. We can see that the prediction becomes better with each iteration.
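One way to realize such an iterative correction, sketched here with the `fft_smooth` helper from the previous example and an assumed n-sigma threshold, is to flag points that deviate too far from the current smoothed curve, replace them with the smoothed values, and smooth again:

```python
def smooth_ignoring_outages(signal: np.ndarray, max_freq_bins: int, min_amplitude: float,
                            n_sigma: float = 3.0, max_iterations: int = 5) -> np.ndarray:
    """Iteratively re-apply the FFT filter, replacing suspected outage points
    (large deviations from the current smooth curve) with smoothed values."""
    data = np.asarray(signal, dtype=float).copy()
    smooth = fft_smooth(data, max_freq_bins, min_amplitude)
    for _ in range(max_iterations):
        residual = data - smooth
        outage = np.abs(residual) > n_sigma * residual.std()
        if not outage.any():
            break
        data[outage] = smooth[outage]   # fill the outage with the smoothed estimate
        smooth = fft_smooth(data, max_freq_bins, min_amplitude)
    return smooth
```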



Linear Regression on Clustered Data Points

We can’t apply linear regression directly to the input metrics, as the shape of the input is sinusoidal. However, given that each day has a similar pattern with a linear trend, we can pick data points at the same time on different days and then apply linear regression. This approach would require data from many days. However, if we zoom in on the data, we see that within a smaller time window, say 10 minutes, the data points have relatively identical values. Therefore, we can pick a cluster of data points from each time window and then apply linear regression. This turns out to produce very accurate predictions. The following series of figures illustrates how the linear regression works. This method also complements the FFT-based method. Some traffic patterns may contain regular but short-lived spikes. Such spikes are not noise, yet the FFT-based method unfortunately filters them out, whereas this method will predict them. The diagram below illustrates such regular patterns that get filtered by the FFT-based method.



The diagram below shows the workload of one of Netflix's clusters within a 30-minute window. The workload does not fluctuate by more than 4%.


Therefore, we can pick data points around the same time of day from different days within a specified time window. The number of chosen data points is progressively reduced as we move back in time. This naturally gives newer data more influence than older data. We also choose a larger set of data points from the days that are highly similar to each other, based on a weight matrix. For example, Saturday's traffic is similar to that of other Saturdays and of the adjacent Sunday, so for a Saturday we choose more data points from weekends than from weekdays.



Once we have clusters of data points, we can apply linear regression. The blue dots are the selected clustered points, and the red line is the result of the linear regression. Once we obtain the line, we can predict by simply extrapolating the line to a future time.
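A rough sketch of the clustering-and-regression step, assuming one request-rate sample per minute and leaving out the day-similarity weight matrix described above, could look like this:

```python
import numpy as np

SAMPLES_PER_DAY = 24 * 60   # one request-rate sample per minute

def predict_by_regression(history: np.ndarray, minutes_ahead: int,
                          days_back: int = 21, base_window: int = 10) -> float:
    """Predict the request rate `minutes_ahead` minutes from now.

    Points are clustered around the same time of day on previous days; the
    window shrinks for older days so that newer data carries more influence.
    """
    n = len(history)
    target_index = n + minutes_ahead          # index of the moment to predict
    xs, ys = [], []
    for d in range(1, days_back + 1):
        center = target_index - d * SAMPLES_PER_DAY
        half = max(1, base_window - d // 3)   # progressively smaller clusters
        lo, hi = max(0, center - half), min(n, center + half + 1)
        for i in range(lo, hi):
            xs.append(i)
            ys.append(history[i])
    slope, intercept = np.polyfit(xs, ys, deg=1)   # ordinary least squares line
    return slope * target_index + intercept
```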

One potential problem with this approach is that a single outage may invalidate a lot of points, thereby skewing the regression results. This is why we still combine the FFT-based method with this algorithm. In addition, we also apply outlier detection algorithms to remove invalid points. We implemented both a distribution-based algorithm and a deviation-based algorithm, and both turned out to work well.
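As a simple illustration of this kind of filtering (not necessarily the exact distribution-based or deviation-based algorithms we use), points that sit far from the rest of a cluster can be dropped before the line is fit:

```python
import numpy as np

def remove_outliers(xs, ys, n_sigma: float = 3.0):
    """Drop points whose value lies more than `n_sigma` standard deviations
    away from the mean of the cluster before running linear regression."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    keep = np.abs(ys - ys.mean()) <= n_sigma * ys.std()
    return xs[keep], ys[keep]
```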


Future Work

While Scryer has dramatically improved our system scaling, there are still many things we can do to make it better. We plan to improve Scryer in three areas in the near future:
  • Making Scryer distributed. The current implementation of Scryer runs on a single server. It is capable of handling hundreds of clusters, and it tolerates temporary system crashes because it checkpoints important state in Cassandra. That said, making it distributed would reduce Scryer’s bootstrap time, thereby reducing its potential downtime. A distributed Scryer could also be scaled up to handle many more clusters. Distributing Scryer also helps its resilience: today, if the single instance fails, so does Scryer. Of course, we still have AAS, but that is not optimal. Plus, because it runs on a single instance, we have to opt it out of Chaos Monkey. Opting it in would give us more data on how the system fares if it does drop out, so being distributed has that benefit as well.
  • Implementing an automatic feedback loop so Scryer can auto-tune. We record and monitor the accuracy of Scryer’s predictions, the effect of scaling actions, as well as instance start times. We then use such data to tune the parameters of our algorithms. This work, however, can be largely automated. We plan to implement a trend detector. If the prediction starts to consistently deviate from the actual workload, the detector will capture such a deviation and feed it to an auto-correcting module. The auto-correcting module will adjust the auto scaling accordingly, and will also tune the prediction algorithm if needed.
  • Improving our prediction algorithms. For example, we ran experiments to find out how to choose the clusters of data points for linear regression. We plan to automate this process so we always get accurate parameters for choosing data points. We also plan to improve the heuristics on how to filter out noise in our FFT-based algorithm.


Conclusion

Scryer adopts a simple yet flexible design that allows users to configure its behavior with ease. It has built-in fault-tolerance features to cope with temporary data unavailability, occasional data irregularities such as outages, and system downtime. The algorithms employed by Scryer take advantage of Netflix’s traffic patterns and achieve accurate results. Although the approaches and algorithms described above are already yielding excellent results, we are constantly reviewing them in an effort to improve Scryer.

Finally, we work on these kinds of exciting challenges all the time at Netflix.  If you would like to join us in tackling such problems, check out our Jobs site.
