Thiago P.

Scaling Up

Posted in System Admin



From what I can gather, Twitter had an image storage problem: it was handling around 200 GB of images per second.

Twitter used to upload tweets together with their images, passing the full image data, along with the tweet, through the entire pipeline.

Once saved, images would live forever in the database.

To resolve this, Twitter made media uploads (images, videos) a separate step. When a tweet is created that uses an image, a handle for that image is sent along with the tweet instead of the entire image data.
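The handle idea can be sketched in a few lines. This is a toy model, not Twitter's actual API: the in-memory stores and function names are made up for illustration.

```python
import uuid

# Hypothetical in-memory stores standing in for the media service and
# the tweet pipeline.
media_store = {}
tweet_store = []

def upload_media(image_bytes):
    """Store the raw image once and return a small handle (media_id)."""
    media_id = str(uuid.uuid4())
    media_store[media_id] = image_bytes
    return media_id

def post_tweet(text, media_id=None):
    """The tweet carries only the handle, never the image bytes."""
    tweet = {"text": text, "media_id": media_id}
    tweet_store.append(tweet)
    return tweet

# Step 1: upload the image and get a handle; step 2: reference it by handle.
media_id = upload_media(b"\x89PNG...")
tweet = post_tweet("hello", media_id=media_id)
```

The tweet record stays tiny no matter how large the image is, so the rest of the pipeline never has to move image bytes around.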

They also gave images a 20-day TTL (time to live). After 20 days, images are removed from the database.
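A TTL like this amounts to stamping each image with its upload time and periodically evicting anything older than the cutoff. A minimal sketch, with made-up store and function names:

```python
import time

TTL_SECONDS = 20 * 24 * 60 * 60  # the 20-day time to live

media_store = {}  # media_id -> (uploaded_at, image_bytes)

def save_media(media_id, image_bytes, now=None):
    """Record the upload timestamp alongside the bytes."""
    media_store[media_id] = (now if now is not None else time.time(), image_bytes)

def evict_expired(now=None):
    """Remove any media older than the TTL; returns the evicted ids."""
    now = now if now is not None else time.time()
    expired = [mid for mid, (ts, _) in media_store.items() if now - ts > TTL_SECONDS]
    for mid in expired:
        del media_store[mid]
    return expired
```

The `now` parameter is only there so the clock can be controlled in examples; a real cleanup job would just run on the current time.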

In order to improve usage in areas with slow internet, they created segmented, resumable uploads. When a user uploads an image, the upload is broken into segments and each segment is uploaded separately. If any one segment fails, only that segment is retried. If you walk into a subway and lose connectivity, the upload resumes when you come out.
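The resumable part is what makes this work on flaky connections: the client tracks which segments have landed and only re-sends the rest. A rough sketch, with invented names and a deliberately tiny segment size:

```python
SEGMENT_SIZE = 4  # tiny for illustration; real segments would be far larger

def split_segments(data, size=SEGMENT_SIZE):
    """Break the upload into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class SegmentedUpload:
    def __init__(self, data):
        self.segments = split_segments(data)
        self.uploaded = {}  # index -> bytes the server acknowledged

    def upload_segment(self, index, send):
        """Try to send one segment; a failure leaves earlier segments intact."""
        self.uploaded[index] = send(self.segments[index])

    def resume(self, send):
        """Re-send only the segments that have not landed yet."""
        for i in range(len(self.segments)):
            if i not in self.uploaded:
                self.upload_segment(i, send)

    def assembled(self):
        """What the server can reconstruct once every segment arrived."""
        return b"".join(self.uploaded[i] for i in range(len(self.segments)))
```

If the connection drops after segment 3 of 10, `resume` picks up at segment 4 instead of starting over.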



Facebook had to deal with massive, short-lived viewership spikes during live video streams.

The Basics

  1. When a video stream is started, it is streamed to a Live Stream Server
  2. The live stream server transcodes the video stream into multiple bit rates
  3. For each bit rate, a set of 1 second segments is produced
  4. These 1 second segments are then sent to a data center cache
  5. The data center then sends the segments to Points of Presence (PoP) around the globe
  6. Users receive the video stream from the PoPs
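The steps above can be modeled as a simple fan-out: one stream becomes several bitrates, each bitrate becomes a series of 1-second segments, and the segments spread from the data-center cache out to the PoPs. All names here are illustrative:

```python
# Toy model of the live-stream pipeline; not Facebook's real segment format.
BITRATES = ["360p", "720p", "1080p"]

def transcode(stream_seconds):
    """Produce one segment key per (bitrate, second of video)."""
    return [(rate, sec) for rate in BITRATES for sec in range(stream_seconds)]

def distribute(segments, pops):
    """Fill the data-center cache, then copy segments out to each PoP."""
    datacenter_cache = set(segments)
    pop_caches = {pop: set(datacenter_cache) for pop in pops}
    return datacenter_cache, pop_caches
```

Three seconds of video at three bitrates already means nine distinct segments, which is why caching at the PoP matters so much.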

The PoP

The Thundering Herd

If many users at the same PoP request the same segment at the same time, and the segment does not exist in the cache and must be retrieved from the data center, you get a thundering herd.

To solve this, only the first request for a segment is sent to the data center; all the other users who want that segment wait on that request.

The HTTP proxies also have a caching layer. Only the first request to the HTTP proxy will make a call to the PoP for the video segment. The others will wait on that request.
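This wait-on-the-first-request pattern is often called request coalescing. A minimal sketch of one way to implement it, assuming a `fetch` function that pulls a segment from the origin (the class and its names are my own, not Facebook's code):

```python
import threading

class CoalescingCache:
    """Only the first request for a missing key hits the origin;
    concurrent requests for the same key wait on that one fetch."""

    def __init__(self, fetch):
        self.fetch = fetch      # function that retrieves a segment from origin
        self.cache = {}
        self.inflight = {}      # key -> Event the leader sets when done
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]
            event = self.inflight.get(key)
            if event is None:   # we are the first requester: we do the fetch
                event = threading.Event()
                self.inflight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            value = self.fetch(key)
            with self.lock:
                self.cache[key] = value
                del self.inflight[key]
            event.set()         # wake everyone who was waiting on this key
            return value
        event.wait()            # follower: block until the leader fills the cache
        with self.lock:
            return self.cache[key]
```

However many requests arrive at once, the origin sees exactly one fetch per missing segment; everyone else is served from the cache the leader fills.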



Over the course of a year and a half, Uber went from 200 to 2,000 engineers, producing more than 7,000 git repos and more than 1,000 microservices.

Here are some aspects of Uber's massive growth and the tradeoffs that came with it.

Source: Scaling Uber to 1000 Services • Matt Ranney

1000+ Microservices

Git Repos... Everywhere

Uber has skyrocketed to 7,000+ git repos.

One repo is both good and bad; many repos are both good and bad. Either way you trade something away.

RPC is slow, but it works

How should communication between all those services be handled?
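One common answer, whatever its speed cost, is RPC over a language-neutral wire format, so a Python service can call one written in Go or Node. A bare-bones sketch of the idea, with a made-up envelope format and a hypothetical `rides.eta` handler (not Uber's actual protocol):

```python
import json

def encode_request(method, params):
    """What the calling service puts on the wire."""
    return json.dumps({"method": method, "params": params}).encode()

def handle(raw, handlers):
    """What the receiving service does with the bytes off the wire."""
    req = json.loads(raw.decode())
    result = handlers[req["method"]](*req["params"])
    return json.dumps({"result": result}).encode()

# Hypothetical handler table on the receiving service.
handlers = {"rides.eta": lambda lat, lon: 7}

raw = encode_request("rides.eta", [37.77, -122.42])
response = json.loads(handle(raw, handlers).decode())
```

Because both sides only agree on the JSON envelope, each team is free to pick its own language behind the interface, which is exactly what enables the "lots of languages" situation below.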

Lots of Languages