Tuesday, March 12, 2013

Netflix and Python

There's a nice post on their technical blog about how Netflix uses Python.

I found this section quite interesting:

Data Science and Engineering

Our Data Science and Engineering teams rely heavily on Python to help surface insights from the vast quantities of data produced by the organization. Python is used in tools for monitoring data quality, managing data movement and syncing, expressing business logic inside our ETL workflows, and running various web applications to visualize data.

One such application is Sting, a lightweight RESTful web service that slices, dices, and produces visualizations of large in-memory datasets. Our data science teams use Sting to analyze and iterate against the results of Hive queries on our big data platform. While a Hive query may take hours to complete, once the initial dataset is loaded in Sting, additional iterations using OLAP style operations enjoy sub-second response times. Datasets can be set to periodically refresh, so results are kept fresh and up to date. Sting is written entirely in Python, making heavy use of libraries such as pandas and numpy to perform fast filtering and aggregation operations.

Here's the video from PyCon 2013: http://pyvideo.org/video/1743/python-at-netflix

No comments:

Post a Comment