# Design a system that scales to millions of users on AWS
2017-03-05 13:07:31 +08:00

*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*

## Step 1: Outline use cases and constraints

> Gather requirements and scope the problem.
> Ask questions to clarify use cases and constraints.
> Discuss assumptions.

Without an interviewer to address clarifying questions, we'll define some use cases and constraints.

### Use cases

Solving this problem takes an iterative approach: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. This is a good pattern for evolving basic designs into scalable designs.

Unless you have a background in AWS or are applying for a position that requires AWS knowledge, AWS-specific details are not a requirement. However, **many of the principles discussed in this exercise can apply more generally outside of the AWS ecosystem.**

#### We'll scope the problem to handle only the following use cases

* **User** makes a read or write request
    * **Service** does processing, stores user data, then returns the results
* **Service** needs to evolve from serving a small number of users to millions of users
    * Discuss general scaling patterns as we evolve an architecture to handle a large number of users and requests
* **Service** has high availability

### Constraints and assumptions

#### State assumptions

* Traffic is not evenly distributed
* Need for relational data
* Scale from 1 user to tens of millions of users
    * Denote increase of users as:
        * Users+
        * Users++
        * Users+++
        * ...
    * 10 million users
    * 1 billion writes per month
    * 100 billion reads per month
    * 100:1 read to write ratio
    * 1 KB content per write

#### Calculate usage

**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**

* 1 TB of new content per month
    * 1 KB per write * 1 billion writes per month
    * 36 TB of new content in 3 years
    * Assume most writes are from new content instead of updates to existing ones
* 400 writes per second on average
* 40,000 reads per second on average

Handy conversion guide:

* 2.5 million seconds per month
* 1 request per second = 2.5 million requests per month
* 40 requests per second = 100 million requests per month
* 400 requests per second = 1 billion requests per month
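The usage figures above are plain arithmetic, and a short sketch makes each step explicit. It assumes the guide's rounded 2.5 million seconds per month (30 days is closer to 2.6 million) and decimal units (1 TB = 10^9 KB), which is why the results come out as round numbers.

```python
# Back-of-the-envelope usage estimates for the stated constraints.
# Assumes the conversion guide's rounded month of 2.5 million seconds
# and decimal units (1 TB = 10**9 KB).

SECONDS_PER_MONTH = 2_500_000
KB_PER_TB = 10**9

WRITES_PER_MONTH = 1_000_000_000
READS_PER_MONTH = 100_000_000_000
KB_PER_WRITE = 1


def estimate_usage(months: int = 36) -> dict:
    """Return storage growth and average request rates."""
    tb_per_month = WRITES_PER_MONTH * KB_PER_WRITE / KB_PER_TB
    return {
        "tb_per_month": tb_per_month,
        "tb_total": tb_per_month * months,  # 36 months = 3 years
        "writes_per_second": WRITES_PER_MONTH / SECONDS_PER_MONTH,
        "reads_per_second": READS_PER_MONTH / SECONDS_PER_MONTH,
        "read_write_ratio": READS_PER_MONTH / WRITES_PER_MONTH,
    }


if __name__ == "__main__":
    for name, value in estimate_usage().items():
        print(f"{name}: {value:,.0f}")
```

Running it prints 1 TB per month, 36 TB over three years, 400 writes and 40,000 reads per second, and a 100:1 ratio, matching the bullets above. Keep in mind these are averages; peaks will be higher because traffic is not evenly distributed.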
## Step 2: Create a high level design

> Outline a high level design with all important components.

![Imgur](http://i.imgur.com/B8LDKD7.png)

## Step 3: Design core components

> Dive into details for each core component.

### Use case: User makes a read or write request

#### Goals

* With only 1-2 users, you only need a basic setup
    * Single box for simplicity
    * Vertical scaling when needed
    * Monitor to determine bottlenecks

#### Start with a single box

* **Web server** on EC2
* Storage for user data
    * [**MySQL Database**](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)

Use **Vertical Scaling**:

* Simply choose a bigger box
* Keep an eye on metrics to determine how to scale up
    * Use basic monitoring to determine bottlenecks: CPU, memory, IO, network, etc.
    * CloudWatch, top, Nagios, StatsD, Graphite, etc.
* Scaling vertically can get very expensive
* No redundancy/failover

*Trade-offs, alternatives, and additional details:*

* The alternative to **Vertical Scaling** is [**Horizontal scaling**](https://github.com/donnemartin/system-design-primer#horizontal-scaling)

#### Start with SQL, consider NoSQL

The constraints assume there is a need for relational data. We can start off using a **MySQL Database** on the single box.

*Trade-offs, alternatives, and additional details:*

* See the [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) section
* Discuss reasons to use [SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)

#### Assign a public static IP

* Elastic IPs provide a public endpoint whose IP doesn't change when the instance is stopped and restarted
* Helps with failover: point the domain to a new IP, or remap the Elastic IP to a standby instance

#### Use a DNS

Add a **DNS** such as Route 53 to map the domain to the instance's public IP.

*Trade-offs, alternatives, and additional details:*

* See the [Domain name system](https://github.com/donnemartin/system-design-primer#domain-name-system) section
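Mapping the domain to the instance's public IP in Route 53 comes down to an `UPSERT` of an `A` record. The sketch below only builds the `ChangeBatch` payload that boto3's Route 53 client method `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)` accepts, so it runs without AWS access. The helper name `build_a_record_upsert`, the domain, the IP (a documentation-range address), and the default TTL are hypothetical.

```python
import ipaddress


def build_a_record_upsert(domain: str, public_ip: str, ttl: int = 300) -> dict:
    """Build a Route 53 ChangeBatch that points `domain` at `public_ip`.

    Hypothetical helper: pass the result as the ChangeBatch argument of
    boto3's route53 client change_resource_record_sets().
    """
    ipaddress.IPv4Address(public_ip)  # raises ValueError for a malformed IPv4 address
    return {
        "Comment": f"Point {domain} at {public_ip}",
        "Changes": [
            {
                # UPSERT creates the record, or overwrites it (e.g. on failover).
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": public_ip}],
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = build_a_record_upsert("www.example.com", "203.0.113.10")
    print(batch["Changes"][0]["ResourceRecordSet"])
    # With boto3 installed and credentials configured, roughly:
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="<your-hosted-zone-id>", ChangeBatch=batch)
```

A lower TTL lets a repointed record take effect sooner during failover, at the cost of more DNS lookups.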
#### Secure the web server

* Open up only necessary ports
    * Allow the web server to respond to incoming requests on:
        * 80 for HTTP
        * 443 for HTTPS
        * 22 for SSH, only from whitelisted IPs
    * Prevent the web server from initiating outbound connections

*Trade-offs, alternatives, and additional details:*

* See the [Security](https://github.com/donnemartin/system-design-primer#security) section
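The port allowlist for the web server can be modeled as default-deny ingress rules, which is how EC2 security groups treat inbound traffic. The sketch below is a plain-Python model, not the AWS API; the rule set and the SSH range `198.51.100.0/24` are hypothetical placeholders for your own whitelisted IPs.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    port: int
    allowed_sources: tuple  # CIDR strings permitted to connect on this port


# Hypothetical allowlist: HTTP/HTTPS from anywhere, SSH only from one office range.
WEB_SERVER_RULES = (
    IngressRule(80, ("0.0.0.0/0",)),
    IngressRule(443, ("0.0.0.0/0",)),
    IngressRule(22, ("198.51.100.0/24",)),
)


def is_inbound_allowed(port: int, source_ip: str, rules=WEB_SERVER_RULES) -> bool:
    """Default-deny: allow only if some rule matches both the port and the source."""
    ip = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.port != port:
            continue
        if any(ip in ipaddress.ip_network(cidr) for cidr in rule.allowed_sources):
            return True
    return False


if __name__ == "__main__":
    print(is_inbound_allowed(443, "8.8.8.8"))      # public HTTPS
    print(is_inbound_allowed(22, "203.0.113.7"))   # SSH from outside the allowlist
```

Anything not listed, such as the MySQL port, is rejected without needing an explicit rule.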
## Step 4: Scale the design

> Identify and address bottlenecks, given the constraints.

### Users+

![Imgur](http://i.imgur.com/rrfjMXB.png)

#### Assumptions

Our user count is starting to pick up and the load is increasing on our single box. Our **Benchmarks/Load Tests** and **Profiling** are pointing to the **MySQL Database** taking up more and more memory and CPU resources, while the user content is filling up disk space.

We've been able to address these issues with **Vertical Scaling** so far. Unfortunately, this has become quite expensive, and it doesn't allow for independent scaling of the **MySQL Database** and **Web Server**.

#### Goals

* Lighten load on the single box and allow for independent scaling
    * Store static content separately in an **Object Store**
    * Move the **MySQL Database** to a separate box
* Disadvantages
    * These changes would increase complexity and would require changes to the **Web Server** to point to the **Object Store** and the **MySQL Database**
    * Additional security measures must be taken to secure the new components
    * AWS costs could also increase, but should be weighed against the costs of managing similar systems on your own

#### Store static content separately

* Consider using a managed **Object Store** like S3 to store static content
    * Highly scalable and reliable
    * Server-side encryption
* Move static content to S3
    * User files
    * JS
    * CSS
    * Images
    * Videos
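Moving static content to S3 mostly means uploading each file with a sensible key, the right `Content-Type`, and server-side encryption. The sketch below only builds the keyword arguments for boto3's S3 client `put_object()` (all except `Body`) so it runs offline; the helper name `s3_put_params`, the bucket name, and the `static/` key prefix are hypothetical. Newer buckets already apply SSE-S3 by default, so setting `ServerSideEncryption` explicitly mostly documents intent.

```python
import mimetypes


def s3_put_params(relative_path: str, bucket: str, prefix: str = "static") -> dict:
    """Build keyword arguments for boto3's S3 client put_object(), minus Body.

    Hypothetical helper: bucket, prefix, and key layout are placeholders.
    """
    relative_path = relative_path.lstrip("/")
    content_type, _encoding = mimetypes.guess_type(relative_path)
    return {
        "Bucket": bucket,
        "Key": f"{prefix}/{relative_path}",
        # Browsers rely on Content-Type, so set it explicitly rather than relying on a default.
        "ContentType": content_type or "application/octet-stream",
        # Request encryption at rest with S3-managed keys (SSE-S3).
        "ServerSideEncryption": "AES256",
    }


if __name__ == "__main__":
    params = s3_put_params("css/site.css", "my-static-assets")
    print(params)
    # With boto3 installed and credentials configured, roughly:
    # with open("css/site.css", "rb") as f:
    #     boto3.client("s3").put_object(Body=f, **params)
```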
#### Move the MySQL database to a separate box

* Consider using a service like RDS to manage the **MySQL Database**
    * Simple to administer and scale
    * Multiple availability zones
    * Encryption at rest

#### Secure the system

* Encrypt data in transit and at rest
* Use a Virtual Private Cloud
    * Create a public subnet for the single **Web Server** so it can send and receive traffic from the internet
    * Create a private subnet for everything else, preventing outside access
    * Only open ports from whitelisted IPs for each component
* These same patterns should be applied to new components in the remainder of the exercise

*Trade-offs, alternatives, and additional details:*

* See the [Security](https://github.com/donnemartin/system-design-primer#security) section
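A VPC is an IP range that you carve into subnets; what makes a subnet "public" in AWS is a route to an internet gateway, which the private one lacks. As a rough sketch, Python's `ipaddress` module can show the carving step. The `10.0.0.0/16` range, the `/24` subnet size, and the helper name `plan_subnets` are hypothetical choices, and routing itself is not modeled.

```python
import ipaddress


def plan_subnets(vpc_cidr: str = "10.0.0.0/16", new_prefix: int = 24) -> dict:
    """Carve a VPC range into one public and one private subnet (hypothetical layout)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = vpc.subnets(new_prefix=new_prefix)  # generator of equal-sized subnets
    return {
        # Web Server: this subnet would get a route to an internet gateway.
        "public": next(subnets),
        # MySQL Database and other internal components: no route to the internet gateway.
        "private": next(subnets),
    }


if __name__ == "__main__":
    for role, subnet in plan_subnets().items():
        print(role, subnet, f"{subnet.num_addresses} addresses")
```

The remaining `/24` blocks stay free for the components added later, such as additional availability zones or caches.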
### Users++

![Imgur](http://i.imgur.com/raoFTXM.png)

#### Assumptions

Our **Benchmarks/Load Tests** and **Profiling** show that our single **Web Server** bottlenecks during peak hours, resulting in slow responses and, in some cases, downtime. As the service matures, we'd also like to move towards higher availability and redundancy.

#### Goals

* The following goals attempt to address the scaling issues with the **Web Server**
    * Based on the **Benchmarks/Load Tests** and **Profiling**, you might only need to implement one or two of these techniques
* Use [**Horizontal Scaling**](https://github.com/donnemartin/system-design-primer#horizontal-scaling) to handle increasing loads and to address single points of failure
    * Add a [**Load Balancer**](https://github.com/donnemartin/system-design-primer#load-balancer) such as Amazon's ELB or HAProxy
        * ELB is highly available
        * If you are configuring your own **Load Balancer**, setting up multiple servers in [active-active](https://github.com/donnemartin/system-design-primer#active-active) or [active-passive](https://github.com/donnemartin/system-design-primer#active-passive) mode in multiple availability zones will improve availability
        * Terminate SSL on the **Load Balancer** to reduce computational load on backend servers and to simplify certificate administration
    * Use multiple **Web Servers** spread out over multiple availability zones
    * Use multiple **MySQL** instances in [**Master-Slave Failover**](https://github.com/donnemartin/system-design-primer#master-slave-replication) mode across multiple availability zones to improve redundancy
* Separate out the **Web Servers** from the [**Application Servers**](https://github.com/donnemartin/system-design-primer#application-layer)
    * Scale and configure both layers independently
    * **Web Servers** can run as a [**Reverse Proxy**](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
    * For example, you can add **Application Servers** handling **Read APIs** while others handle **Write APIs**
* Move static (and some dynamic) content to a [**Content Delivery Network (CDN)**](https://github.com/donnemartin/system-design-primer#content-delivery-network) such as CloudFront to reduce load and latency

*Trade-offs, alternatives, and additional details:*

* See the linked content above for details
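The core idea behind the **Load Balancer** bullet is to spread requests across several **Web Servers** and stop sending traffic to any that fail health checks. The toy round-robin balancer below illustrates that idea only; it is not how ELB or HAProxy are implemented, and the server and availability zone names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class WebServer:
    name: str
    availability_zone: str
    healthy: bool = True  # in practice, set by periodic health checks


class RoundRobinBalancer:
    """Toy round-robin load balancer that only routes to healthy servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = count()

    def pick(self) -> WebServer:
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy web servers available")
        # Rotate evenly through whichever servers are currently healthy.
        return healthy[next(self._counter) % len(healthy)]


if __name__ == "__main__":
    fleet = [
        WebServer("web-1", "us-east-1a"),
        WebServer("web-2", "us-east-1b"),
        WebServer("web-3", "us-east-1c"),
    ]
    lb = RoundRobinBalancer(fleet)
    fleet[1].healthy = False  # e.g. its availability zone has an outage
    print([lb.pick().name for _ in range(4)])
```

Because the servers sit in different availability zones, losing one zone removes only part of the capacity rather than taking the whole service down.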
### Users+++

![Imgur](http://i.imgur.com/OZCxJr0.png)

**Note:** **Internal Load Balancers** are not shown to reduce clutter.

#### Assumptions

Our **Benchmarks/Load Tests** and **Profiling** show that we are read-heavy (100:1 reads to writes) and our database is suffering from poor performance due to the high volume of read requests.

#### Goals

* The following goals attempt to address the scaling issues with the **MySQL Database**
    * Based on the **Benchmarks/Load Tests** and **Profiling**, you might only need to implement one or two of these techniques
* Move the following data to a [**Memory Cache**](https://github.com/donnemartin/system-design-primer#cache) such as ElastiCache to reduce load and latency:
    * Frequently accessed content from **MySQL**
        * First, try to configure the **MySQL Database** cache to see if that is sufficient to relieve the bottleneck before implementing a **Memory Cache**
    * Session data from the **Web Servers**
        * The **Web Servers** become stateless, allowing for **Autoscaling**
    * Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
* Add [**MySQL Read Replicas**](https://github.com/donnemartin/system-design-primer#master-slave-replication) to reduce load on the write master
* Add more **Web Servers** and **Application Servers** to improve responsiveness

*Trade-offs, alternatives, and additional details:*

* See the linked content above for details
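A common way to put frequently accessed **MySQL** content behind a **Memory Cache** is the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache with a TTL. In the sketch below a dict stands in for ElastiCache and `load_from_db` stands in for a MySQL query; both are assumptions for illustration, and the injectable `clock` exists only to make expiry easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Minimal cache-aside (lazy loading) wrapper with a per-entry TTL.

    The dict stands in for a Memory Cache such as ElastiCache; `load_from_db`
    stands in for a MySQL query. Both are hypothetical.
    """

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> str:
        entry: Optional[Tuple[float, str]] = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                      # cache hit
        value = self._load(key)                  # cache miss: read from the database
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing `key` to the database so readers don't see stale data.
        self._store.pop(key, None)
```

With a 100:1 read to write ratio, most reads of popular content become cache hits that never reach the database, and the TTL bounds how stale a cached value can get if an invalidation is missed.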
#### Add MySQL read replicas

* In addition to adding and scaling a **Memory Cache**, **MySQL Read Replicas** can also help relieve load on the **MySQL Write Master**
* Add logic to the **Web Server** to separate out writes and reads
* Add **Load Balancers** in front of **MySQL Read Replicas** (not pictured to reduce clutter)
* Most services are read-heavy rather than write-heavy

*Trade-offs, alternatives, and additional details:*

* See the [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) section
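The "separate out writes and reads" logic can be as simple as sending data-modifying statements to the write master and rotating reads across replicas. The sketch below classifies SQL by its leading keyword, which is a deliberate simplification (it ignores transactions and `SELECT ... FOR UPDATE`); connection targets are plain hostname strings, and those hostnames are hypothetical.

```python
import itertools
import re

# Statements that modify data or schema must go to the write master.
_WRITE_PATTERN = re.compile(
    r"^\s*(INSERT|UPDATE|DELETE|REPLACE|CREATE|ALTER|DROP)\b", re.IGNORECASE
)


class ReadWriteRouter:
    """Send writes to the master and spread reads across replicas (round-robin).

    Targets are plain strings here; in practice they would be connection pools.
    """

    def __init__(self, master: str, replicas: list):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        if _WRITE_PATTERN.match(sql) or self._replicas is None:
            return self.master
        return next(self._replicas)


if __name__ == "__main__":
    router = ReadWriteRouter("mysql-master", ["replica-1", "replica-2"])
    print(router.route("INSERT INTO photos VALUES (1, 'a.jpg')"))
    print(router.route("SELECT * FROM photos WHERE id = 1"))
```

Replication is asynchronous, so a read routed to a replica right after a write may briefly return stale data; reads that must see the latest write can be sent to the master instead.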
### Users++++

![Imgur](http://i.imgur.com/3X8nmdL.png)

#### Assumptions

Our **Benchmarks/Load Tests** and **Profiling** show that our traffic spikes during regular business hours in the U.S. and drops significantly when users leave the office. We think we can cut costs by automatically spinning servers up and down based on actual load. We're a small shop, so we'd like to automate as much of the DevOps as possible for **Autoscaling** and for general operations.

#### Goals

* Add **Autoscaling** to provision capacity as needed
    * Keep up with traffic spikes
    * Reduce costs by powering down unused instances
* Automate DevOps
    * Chef, Puppet, Ansible, etc.
* Continue monitoring metrics to address bottlenecks
    * **Host level** - Review a single EC2 instance
    * **Aggregate level** - Review load balancer stats
    * **Log analysis** - CloudWatch, CloudTrail, Loggly, Splunk, Sumo Logic
    * **External site performance** - Pingdom or New Relic
    * **Handle notifications and incidents** - PagerDuty
    * **Error reporting** - Sentry

#### Add autoscaling

* Consider a managed service such as AWS **Autoscaling**
    * Create one group for each **Web Server** and one for each **Application Server** type, and place each group in multiple availability zones
    * Set a min and max number of instances
    * Trigger scaling up and down through CloudWatch
        * A simple time-of-day metric for predictable loads, or
        * Metrics over a time period:
            * CPU load
            * Latency
            * Network traffic
            * Custom metric
    * Disadvantages
        * Autoscaling can introduce complexity
        * It could take some time before a system appropriately scales up to meet increased demand, or scales down when demand drops
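To make the min/max and CPU-metric bullets concrete, here is a proportional scaling rule: size the group so average CPU would land near a target, then clamp to the group's limits. It is similar in spirit to a target-tracking policy but is a hypothetical sketch, not the AWS Auto Scaling algorithm; the 50% target and the 2-20 instance range are placeholder values, and cooldowns and warm-up time are ignored.

```python
import math


def desired_capacity(current_instances: int, avg_cpu_percent: float,
                     target_cpu_percent: float = 50.0,
                     min_instances: int = 2, max_instances: int = 20) -> int:
    """Proportional scaling rule, clamped to the group's min/max.

    Assumes load spreads evenly across instances, so total work is roughly
    current_instances * avg_cpu_percent. Defaults are hypothetical.
    """
    if current_instances <= 0:
        return min_instances
    needed = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Business-hours spike: 4 instances at 90% CPU.
    print(desired_capacity(4, 90.0))
    # Evening lull: 10 instances at 10% CPU, scaled back down to the minimum.
    print(desired_capacity(10, 10.0))
```

Keeping a minimum above one instance preserves redundancy during quiet periods, while the maximum caps cost if a metric misbehaves.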
### Users+++++

![Imgur](http://i.imgur.com/jj3A5N8.png)

**Note:** **Autoscaling** groups are not shown to reduce clutter.

#### Assumptions

As the service continues to grow towards the figures outlined in the constraints, we iteratively run **Benchmarks/Load Tests** and **Profiling** to uncover and address new bottlenecks.

#### Goals

We'll continue to address scaling issues due to the problem's constraints:

* If our **MySQL Database** starts to grow too large, we might consider storing only a limited time period of data in the database, while storing the rest in a data warehouse such as Redshift
    * A data warehouse such as Redshift can comfortably handle the constraint of 1 TB of new content per month
* With 40,000 average read requests per second, read traffic for popular content can be addressed by scaling the **Memory Cache**, which is also useful for handling the unevenly distributed traffic and traffic spikes
    * The **SQL Read Replicas** might have trouble handling the cache misses; we'll probably need to employ additional SQL scaling patterns
* 400 average writes per second (with presumably significantly higher peaks) might be tough for a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques

SQL scaling patterns include:

* [Federation](https://github.com/donnemartin/system-design-primer#federation)
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)

To further address the high volume of read and write requests, we should also consider moving appropriate data to a [**NoSQL Database**](https://github.com/donnemartin/system-design-primer#nosql) such as DynamoDB.

We can further separate out our [**Application Servers**](https://github.com/donnemartin/system-design-primer#application-layer) to allow for independent scaling. Batch processes or computations that do not need to be done in real time can be done [**Asynchronously**](https://github.com/donnemartin/system-design-primer#asynchronism) with **Queues** and **Workers**:

* For example, in a photo service, the photo upload and the thumbnail creation can be separated:
    * **Client** uploads photo
    * **Application Server** puts a job in a **Queue** such as SQS
    * The **Worker Service** on EC2 or Lambda pulls work off the **Queue**, then:
        * Creates a thumbnail
        * Updates a **Database**
        * Stores the thumbnail in the **Object Store**

*Trade-offs, alternatives, and additional details:*

* See the linked content above for details
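The photo-service flow can be sketched in-process to show the shape of the decoupling: the **Application Server** only enqueues a job and returns, while a **Worker** drains the queue. Here `queue.Queue` stands in for SQS, a thread stands in for the EC2/Lambda worker, and dicts stand in for the **Database** and **Object Store**; the names `ThumbnailJob`, `handle_upload`, and the key layout are hypothetical, and no real image resizing happens.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ThumbnailJob:
    photo_id: str


def worker(jobs: "queue.Queue[ThumbnailJob]", database: dict, object_store: dict) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            thumbnail_key = f"thumbnails/{job.photo_id}.jpg"
            # Create the thumbnail and store it (placeholder bytes instead of real resizing).
            object_store[thumbnail_key] = b"<thumbnail bytes>"
            # Update the database to point at the stored thumbnail.
            database[job.photo_id] = {"thumbnail": thumbnail_key}
        finally:
            jobs.task_done()


def handle_upload(jobs, photo_id: str) -> str:
    """Application Server: accept the upload and enqueue work instead of doing it inline."""
    jobs.put(ThumbnailJob(photo_id))
    return "202 Accepted"


if __name__ == "__main__":
    jobs, database, object_store = queue.Queue(), {}, {}
    t = threading.Thread(target=worker, args=(jobs, database, object_store))
    t.start()
    for pid in ("p1", "p2"):
        handle_upload(jobs, pid)
    jobs.put(None)  # tell the worker to stop
    t.join()
    print(database)
```

The upload request returns as soon as the job is queued, so a slow thumbnail step no longer adds latency for the **Client**, and workers can be scaled independently of the **Application Servers**.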
## Additional talking points

> Additional topics to dive into, depending on the problem scope and time remaining.

### SQL scaling patterns

* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
* [Federation](https://github.com/donnemartin/system-design-primer#federation)
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
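Of the patterns above, sharding is the one that most directly spreads the 400+ writes per second across several write masters. One simple scheme, sketched here under assumptions, hashes a user ID to pick a shard. `hashlib` is used instead of Python's built-in `hash()`, which is randomized per process for strings, so every application server computes the same mapping. The shard hostnames and the helper names are hypothetical.

```python
import hashlib

# Hypothetical shard hosts, each a MySQL write master with its own replicas.
SHARD_HOSTS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user to a shard index with a stable hash (modulo hashing)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for(user_id: str) -> str:
    """Return the database host that owns this user's rows."""
    return SHARD_HOSTS[shard_for(user_id, len(SHARD_HOSTS))]


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", host_for(uid))
```

With plain modulo hashing, changing the number of shards remaps most keys, which forces a large data migration; consistent hashing is a common way to limit how many keys move.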
#### NoSQL

* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)

### Caching

* Where to cache
    * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
    * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
    * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
    * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
    * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
* What to cache
    * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
    * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
* When to update the cache
    * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
    * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
    * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
    * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)

### Asynchronism and microservices

* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)

### Communications

* Discuss tradeoffs:
    * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
    * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)

### Security

Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).

### Latency numbers

See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).

### Ongoing

* Continue benchmarking and monitoring your system to address bottlenecks as they come up
* Scaling is an iterative process