Monday 4 December 2023

Quick Overview of Scalability

 

Scalability is the term we use to describe a system's ability to cope with increased load. In simpler terms, it's about how well a system can handle more requests as demand increases.

 

For instance, you could have constructed a system with the expectation that it would handle a maximum of 1 million users. However, as time passes, your application gains popularity, and the user base unexpectedly surges from 1 million to 10 million. If you haven't taken scalability into account when designing the system, its performance could decline or even crash when faced with an unforeseen increase in demand.

 

There are two main types of scalability:

 

Vertical Scaling

Involves increasing the capacity of a single machine or resource within a system. For example, adding more memory, upgrading the CPU, or expanding storage on a single server.

 

Horizontal Scaling

Involves adding more machines or nodes to a network or system.

For example, instead of upgrading a single server, you add more servers to distribute the workload.
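The idea can be sketched in a few lines of Python. This is a toy illustration, not a real load balancer: the server names are made up, and each incoming request is simply handed to the next server in rotation (round-robin), so adding another server to the pool immediately spreads the load further.

```python
from itertools import cycle

# Hypothetical server pool; in practice these would be real hosts.
servers = ["server-a", "server-b", "server-c"]
next_server = cycle(servers)  # endless round-robin iterator over the pool

def route(request_id):
    """Assign a request to the next server in rotation."""
    return next(next_server)

# Ten requests spread evenly across the three servers.
assignments = [route(i) for i in range(10)]
```

With three servers, ten requests land roughly evenly: four on the first server and three on each of the others.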

 

The following questions can guide you in assessing and planning for scalability:

 

1. User Load

How many users do we expect to use the system concurrently?

What is the anticipated growth rate in user numbers over time?

What happens to the application when the load doubles or triples?

What different types of users will access the system, and what are their usage patterns?

 

2. Transaction or Request Rate

How many transactions or operations does the system need to handle per unit of time?

Are there specific operations that are more resource-intensive?

Can the system handle sudden spikes in traffic or data influx?
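A quick back-of-envelope estimate helps answer these questions with concrete numbers. All figures below are made-up assumptions for illustration: daily active users and requests per user yield an average request rate, and a peak factor accounts for traffic not being spread evenly across the day.

```python
# Illustrative capacity estimate; every number here is an assumption.
daily_active_users = 1_000_000
requests_per_user_per_day = 50
peak_factor = 3  # assume peak traffic is 3x the daily average

seconds_per_day = 24 * 60 * 60  # 86,400
avg_rps = daily_active_users * requests_per_user_per_day / seconds_per_day
peak_rps = avg_rps * peak_factor

print(round(avg_rps))   # 579 requests/second on average
print(round(peak_rps))  # 1736 requests/second at assumed peak
```

Even this rough arithmetic tells you whether you are designing for hundreds or tens of thousands of requests per second, which changes the architecture conversation entirely.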

 

3. Resource Utilization

How efficiently are the system's resources (CPU, memory, storage) utilized under current load?

Is there room for optimization in resource usage?

 

4. Response Time Expectations

What are the acceptable response time thresholds for different types of requests?

How quickly should the system respond under varying loads?

What are the acceptable latency levels under different load conditions?
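Response-time targets are usually stated as percentiles (p50, p95, p99) rather than averages, because a few very slow requests can hide behind a healthy mean. A small sketch of a nearest-rank percentile over measured samples; the latency data here is invented for illustration:

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical latency measurements in milliseconds.
latencies = [12, 15, 11, 14, 250, 13, 16, 12, 14, 13]

p50 = percentile(latencies, 50)  # typical request: 13 ms
p99 = percentile(latencies, 99)  # worst 1%: 250 ms
```

Here the median is 13 ms, but the p99 is 250 ms; a target like "p99 under 100 ms" would already be violated, even though the average looks fine.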

 

5. Load Balancing

How is the workload distributed across multiple servers or nodes?

Is there an effective load balancing strategy in place?
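Besides round-robin, a common strategy is "least connections": send each request to the server currently handling the fewest in-flight requests. A toy sketch of the selection logic, with hypothetical server names and no real networking:

```python
# Track in-flight requests per server; names are illustrative.
active = {"server-a": 0, "server-b": 0, "server-c": 0}

def pick_server():
    """Choose the server with the fewest in-flight requests."""
    server = min(active, key=active.get)
    active[server] += 1
    return server

def finish(server):
    """Mark a request on `server` as completed."""
    active[server] -= 1

first = pick_server()   # all idle, so the first server wins the tie
second = pick_server()  # "server-a" is now busier, so "server-b" is chosen
```

Least connections adapts better than round-robin when requests vary widely in duration, since slow requests keep a server's count high and divert new traffic elsewhere.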

 

6. Disaster Recovery and Failure Handling

How does the system handle failures or unexpected events under heavy load?

Is there a strategy for graceful degradation in case of overload?

How will the system handle failures or outages at different scales?

What mechanisms are in place for replication, backups, and disaster recovery?

Can the system automatically recover from failures without significant downtime?

How will scalability impact disaster recovery plans and procedures?
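One common building block for handling transient failures is retrying with exponential backoff, so a struggling dependency is not hammered even harder while it is already overloaded. A minimal sketch; the flaky dependency below is simulated, and the delays are deliberately tiny:

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.01):
    """Retry fn() on exception, doubling the delay between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 10 ms, 20 ms, 40 ms...

# Simulated flaky dependency that succeeds on the third call.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = call_with_retries(flaky)  # succeeds after two retries
```

Production implementations typically add jitter (randomized delays) so many clients retrying at once do not synchronize into waves of load.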

 

7. Caching Strategies

What data can be cached to reduce the need for repeated processing?

Are there opportunities to implement effective caching mechanisms?
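As one concrete example, Python's standard library can memoize a pure function's results with functools.lru_cache, so repeated requests for the same input skip the expensive work. The "expensive lookup" below is simulated; real code might be querying a database here:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def product_details(product_id):
    """Simulated expensive lookup; imagine a database query here."""
    return {"id": product_id, "name": f"product-{product_id}"}

product_details(42)  # computed and stored in the cache
product_details(42)  # served straight from the cache
hits = product_details.cache_info().hits  # one cache hit so far
```

The same idea scales up to shared caches like Redis or memcached, where the hard parts become invalidation and deciding how stale a cached value is allowed to be.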

 

8. Database Scaling

How does the database handle increased data volume and transaction rates?

Is there a strategy for database scaling, such as sharding or replication?
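The simplest form of sharding routes each record to a shard by hashing its key, so each database node holds only a fraction of the data. A sketch with an illustrative shard count and key format; production systems usually prefer consistent hashing so that changing the shard count moves less data:

```python
import hashlib

NUM_SHARDS = 4  # illustrative; real deployments choose this carefully

def shard_for(key):
    """Map a key deterministically to one of NUM_SHARDS shards."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always lands on the same shard.
assert shard_for("user:1001") == shard_for("user:1001")

# Different keys spread across the shards.
shards = {shard_for(f"user:{i}") for i in range(100)}
```

Determinism is the key property: any application server can compute which shard owns a key without consulting a central directory.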

 

9. Data Handling

How much data does the system need to manage, and how is it distributed?

What is the anticipated growth rate in data volume?

 

10. Traffic Patterns

What are the peak hours or periods of high user activity?

Are there specific events or conditions that may cause a sudden increase in traffic?

 

11. Infrastructure Architecture

Is the system designed for vertical scaling, horizontal scaling, or a combination of both?

What cloud or server infrastructure is being used, and how easily can it scale?

How can we add computing resources to handle the additional load?

What options do we have to cope with traffic growth?

What are the limitations of the current architecture?

Are there any dependencies on external systems that impact scalability?

 

12. Code Optimization

Are there areas in the codebase that can be optimized for better performance?

How well does the code handle concurrent requests?
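How well code handles concurrent requests is easy to probe with a small experiment. In Python, for instance, I/O-bound work can be served concurrently with a thread pool; the "request" below just sleeps to simulate I/O, and the numbers are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Simulated I/O-bound request taking ~50 ms."""
    time.sleep(0.05)
    return f"done-{request_id}"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(handle_request, range(10)))
elapsed = time.perf_counter() - start
# Ten 50 ms requests overlap instead of taking ~0.5 s sequentially.
```

If the requests were CPU-bound rather than I/O-bound, threads would not help in CPython, and you would reach for processes or a different runtime instead; that distinction is exactly what this kind of measurement surfaces.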

 

By addressing these questions in depth, you can build a thorough understanding of your scalability requirements and make well-informed decisions on how to achieve them successfully. Keep in mind that scalability is an ongoing process, so consistently monitoring, assessing, and adjusting your approach is crucial for long-term success.

 

How to describe the load on a system?

How you describe load varies from application to application. For a microservice, the load might be described by the number of requests per second; for a data processing engine, by how long it takes to run a job on a dataset of a given size.

 

The following are the key aspects to consider when describing the load on a system:

 

1. Number of Requests

Number of requests per second

Number of reads and writes to a database
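A toy sliding-window counter shows how "requests per second" can be measured at runtime. The timestamps below are fed in by hand for clarity; a real service would use the current clock time instead:

```python
from collections import deque

class RequestCounter:
    """Counts requests observed in the last `window` seconds."""

    def __init__(self, window=1.0):
        self.window = window
        self.timestamps = deque()

    def record(self, now):
        self.timestamps.append(now)

    def rate(self, now):
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps)

counter = RequestCounter(window=1.0)
for t in (0.1, 0.2, 0.9, 1.5):  # four requests arriving at these times
    counter.record(t)
current_rps = counter.rate(now=1.6)  # only the 0.9 and 1.5 requests remain
```

The same shape of counter, applied per database table or per endpoint, also gives you the read/write breakdown mentioned above.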

 

2. Concurrency

Number of requests being processed simultaneously.

Number of users supported simultaneously

Number of active sessions

 

3. Type of Load

User-initiated: Arising from users engaging directly with the system, such as logins, transactions, or API calls.

Data-driven: Arising from the processing or transfer of data, including batch jobs, ETL pipelines, or file uploads.

 

4. Resource Utilization

CPU, memory, disk, network bandwidth usage.

 

5. Error Rate

Frequency of errors or failures under load.

 

By addressing these elements, you can build a clear understanding of the load and its consequences, supporting well-informed decisions about system optimization, capacity planning, and managing diverse demands. Keep in mind that the emphasis on specific details should align with your context and audience.

 
