Programming for beginners: Don't Sweat the Spreadsheet: Quick Answers with Back-of-the-Envelope Math

"Back-of-the-envelope" means doing a quick and rough calculation on any piece of paper you have handy, like an envelope. It's not exact, but it gives you a much better idea of the answer than just guessing.

This way of estimating is popular in science, engineering, and finance because sometimes you need a quick answer before you can do a more detailed calculation. It's like making a rough sketch before you paint a picture.

In short, back-of-the-envelope calculations are:

1. Quick and Simplified: These calculations prioritize speed over precision. You use simple numbers and reasonable guesses to get a rough idea.

2. More than Guesswork: It's not just random guessing. You break down the problem into smaller parts and use basic math or common sense.

3. Approximate, Not Exact: You aim to get close to the real answer, even if it's not exact. Landing in the right range (say, within a factor of 5 to 10 of the real value) is usually sufficient.

These calculations are useful in the early stages of projects because they help you make decisions without spending a lot of time on complex calculations. They can also highlight issues early on. If your quick estimate suggests something might be too slow or expensive, it's a signal to investigate further.

To do back-of-the-envelope calculations, start by identifying the key factors, then make reasonable assumptions, perform simple math, and see what you come up with. It's a quick way to assess whether an idea has potential or needs further consideration.

Example: How a back-of-the-envelope calculation helps when starting a restaurant.

Suppose I am looking to launch a restaurant with a maximum capacity of 60 people, offering both South and North Indian cuisine for lunch and dinner. I want it to be located near an IT tech park in Bangalore (India).

Let's start with some basic math.

1. Location (Bangalore, India):

Monthly rent: 1,00,000 INR

Advance payment: 6,00,000 INR

2. Workers:

3 chefs and 5 regular workers:

Weekly wages for 3 chefs: 10,000 INR each * 3 = 30,000 INR

Weekly wages for 5 regular workers: 8,000 INR each * 5 = 40,000 INR

Monthly wages for chefs: 30,000 INR * 4 = 1,20,000 INR

Monthly wages for regular workers: 40,000 INR * 4 = 1,60,000 INR

3. Groceries:

Weekly expenses: 15,000 INR to 20,000 INR

Monthly expenses: 15,000 INR * 4 = 60,000 INR to 20,000 INR * 4 = 80,000 INR (let's take 70,000 INR as an average)

4. Utilities:

Monthly electricity bill: 7,000 INR to 11,000 INR

Let's take an average of 9,000 INR.

5. Marketing:

Monthly budget: 10,000 INR to 20,000 INR (let's take 15,000 INR as an average)

6. Other Expenses:

Monthly miscellaneous expenses: 10,000 INR to 20,000 INR (let's take 15,000 INR as an average)

Now, let's sum these up:

1. Rent and Advance: 1,00,000 INR (monthly rent) + 6,00,000 INR (one-time advance) = 7,00,000 INR (due in the first month)

2. Workers: 1,20,000 INR (chefs) + 1,60,000 INR (regular workers) = 2,80,000 INR (monthly)

3. Groceries: 70,000 INR (monthly)

4. Utilities: 9,000 INR (monthly)

5. Marketing: 15,000 INR (monthly)

6. Other Expenses: 15,000 INR (monthly)

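So, the approximate cash needed for the first month of the restaurant in Bangalore, India, including the one-time advance, would be:

7,00,000 INR (first month's rent + one-time advance) + 2,80,000 INR + 70,000 INR + 9,000 INR + 15,000 INR + 15,000 INR = 10,89,000 INR

From the second month onwards, only the recurring expenses remain:

1,00,000 INR (rent) + 2,80,000 INR + 70,000 INR + 9,000 INR + 15,000 INR + 15,000 INR = 4,89,000 INR per month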

These figures are rough estimates and may vary based on specific circumstances and market conditions. Even so, they give a useful financial overview for launching a restaurant in Bangalore, India: we now know roughly what staffing, groceries, utilities, marketing, and miscellaneous costs add up to each month. That is enough to start making informed decisions about budgeting, pricing, and operational strategy before working out a detailed plan.
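The tally above can be reproduced in a few lines of Python. This is only a sketch: the dictionary keys and variable names are my own, a month is approximated as 4 weeks as in the figures above, and each range is replaced by its midpoint. It keeps the one-time advance separate from the recurring costs so that both the first-month outlay and the steady monthly cost are visible.

```python
# Rough restaurant cost tally in INR, using the article's figures.
WEEKS_PER_MONTH = 4  # same approximation as in the estimates above

# Recurring monthly costs; ranges are replaced by their midpoints.
monthly_costs = {
    "rent": 100_000,
    "chefs": 3 * 10_000 * WEEKS_PER_MONTH,
    "regular_workers": 5 * 8_000 * WEEKS_PER_MONTH,
    "groceries": (15_000 + 20_000) // 2 * WEEKS_PER_MONTH,
    "utilities": (7_000 + 11_000) // 2,
    "marketing": (10_000 + 20_000) // 2,
    "other": (10_000 + 20_000) // 2,
}

# Costs paid only once, at the start.
one_time_costs = {"advance": 600_000}

recurring = sum(monthly_costs.values())
first_month = recurring + sum(one_time_costs.values())

# Note: Python groups digits Western-style (489,000), not as 4,89,000.
print(f"Recurring monthly cost: {recurring:,} INR")
print(f"First-month outlay:     {first_month:,} INR")
```

Changing one assumption (say, the rent) and re-running gives a new answer immediately, which is the whole point of keeping the estimate this simple.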

Let's see how this kind of estimation helps with capacity planning in system design, by mapping the restaurant example above onto a software system.

1. Number of Workers: The number of workers in the restaurant can be analogous to the number of servers or hardware components needed in a tech setup. Each worker in the restaurant serves a specific function, just as each server or hardware device serves a specific role in the tech infrastructure.

2. Salary/Investment: The salary of workers in the restaurant can be likened to the investment needed for hardware, software, and their maintenance in the tech world. Just as you need to allocate funds for paying employees in the restaurant, you also need to allocate funds for acquiring and maintaining hardware and software in a tech setup.

3. Location: The location of the restaurant, especially in proximity to an IT tech park, can be similar to the choice of a data center location in the tech world. Both decisions are strategic and can significantly impact the success and efficiency of operations.

By drawing these parallels, we can gain a deeper understanding of the financial and logistical considerations involved in both ventures. This can help you make more informed decisions and allocate resources effectively, whether you're starting a restaurant or setting up a tech infrastructure.

Apart from this, we need to know some basic quantities that help in estimating a system.

The common data size units are summarized in the table below.

Name	Short name	Approximate value in bytes
Byte	B	1 byte = 8 bits
Kilobyte	KB	1,000 bytes
Megabyte	MB	1,000,000 bytes (1 million)
Gigabyte	GB	1,000,000,000 bytes (1 billion)
Terabyte	TB	1,000,000,000,000 bytes (1 trillion)
Petabyte	PB	1,000,000,000,000,000 bytes (1 quadrillion)
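To keep the zeros straight during an estimate, the units in the table can be written down as constants. The snippet below is a minimal sketch that uses the decimal (power-of-1,000) values from the table; the helper name `human_readable` is my own and not part of any library.

```python
# Decimal data-size units, matching the table above.
KB = 10**3
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15


def human_readable(num_bytes: float) -> str:
    """Format a byte count using the largest unit that fits."""
    for unit, size in (("PB", PB), ("TB", TB), ("GB", GB), ("MB", MB), ("KB", KB)):
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {unit}"
    return f"{num_bytes:.0f} B"


print(human_readable(250_000 * MB))  # 250.0 GB
print(human_readable(512))           # 512 B
```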

The table below gives a quick glance at common latency numbers.

These approximate numbers are taken from https://gist.github.com/jboner/2841832#file-latency-txt.

Operation Name	Time
L1 cache reference	0.5 ns
Branch mispredict	5 ns
L2 cache reference	7 ns (14x L1 cache)
Mutex lock/unlock	25 ns
Main memory reference	100 ns
Compress 1K bytes with Zippy	3,000 ns = 3 us
Send 1K bytes over 1 Gbps network	10,000 ns = 10 us
Read 4K randomly from SSD*	150,000 ns = 150 us
Read 1 MB sequentially from memory	250,000 ns = 250 us
Round trip within same datacenter	500,000 ns = 500 us
Read 1 MB sequentially from SSD	1,000,000 ns = 1,000 us = 1 ms
Disk seek	10,000,000 ns = 10,000 us = 10 ms
Read 1 MB sequentially from disk	20,000,000 ns = 20,000 us = 20 ms

Notes

1 ns = 10^-9 seconds

1 us = 10^-6 seconds = 1,000 ns

1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

You can keep https://colin-scott.github.io/personal_website/research/interactive_latency.html handy for a quick glance at latency numbers.
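As a quick illustration of how these numbers get used, the sketch below scales the three "read 1 MB sequentially" rows from the table to a larger read. It assumes the time grows linearly with size, which is a simplification; the names `NS_PER_MB_SEQUENTIAL` and `sequential_read_seconds` are made up for this example.

```python
# Nanoseconds to read 1 MB sequentially, from the latency table above.
NS_PER_MB_SEQUENTIAL = {
    "memory": 250_000,
    "SSD": 1_000_000,
    "disk": 20_000_000,
}


def sequential_read_seconds(megabytes: float, medium: str) -> float:
    """Estimate a sequential read time, assuming it scales linearly with size."""
    return megabytes * NS_PER_MB_SEQUENTIAL[medium] / 1e9


# How long does it take to read 1 GB (1,000 MB) from each medium?
for medium in NS_PER_MB_SEQUENTIAL:
    seconds = sequential_read_seconds(1_000, medium)
    print(f"1 GB from {medium}: about {seconds:g} s")
```

With the table's values this works out to roughly 0.25 s from memory, 1 s from SSD, and 20 s from disk, which is the kind of gap that tells you where a cache is worth adding.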

Let's estimate the capacity of an application using the points above.

Suppose I'm tasked with developing a video streaming platform similar to YouTube for a client. According to their specifications, the platform will cater to 10,000 daily active users. Among these, roughly 10%, or 1,000 users, are expected to upload 2 to 3 videos daily, while the average user is anticipated to watch 4 videos per day. Now, I need to devise a capacity plan to accommodate these requirements.

To plan the capacity for a video streaming application, we need to consider various factors including server resources, bandwidth requirements, and storage needs. Let's break down the requirements based on the information provided:

a. Daily Active Users (DAU): There are 10,000 daily active users.

b. User Uploads:

Approximately 10% of users upload 2 to 3 videos each day.

This means around 1000 users upload videos daily.

Let's assume an average video size of 100 MB.

c. User Views:

On average, a user watches 4 videos per day.

So, there are potentially 40,000 video views daily.
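The daily counts above can also be turned into rough per-second rates, which is usually what servers are sized against. The sketch below spreads the traffic evenly over the day and then applies a peak multiplier. The multiplier of 3 is purely my assumption for illustration and is not part of the client's specification; all variable names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 10_000
daily_uploaders = daily_active_users * 10 // 100  # 10% of users upload
uploads_per_uploader = 2.5                        # midpoint of "2 to 3"
views_per_user = 4

daily_uploads = daily_uploaders * uploads_per_uploader  # 2,500 uploads
daily_views = daily_active_users * views_per_user       # 40,000 views

# Average rates if traffic were spread evenly over 24 hours.
avg_views_per_sec = daily_views / SECONDS_PER_DAY
avg_uploads_per_sec = daily_uploads / SECONDS_PER_DAY

# ASSUMPTION: busy hours see about 3x the average load.
PEAK_FACTOR = 3
peak_views_per_sec = avg_views_per_sec * PEAK_FACTOR

print(f"Average views/sec:   {avg_views_per_sec:.2f}")
print(f"Average uploads/sec: {avg_uploads_per_sec:.3f}")
print(f"Peak views/sec:      {peak_views_per_sec:.2f}")
```

At this scale the request rate is well under one view per second on average, so the number of requests is not the hard part; the size of each video is, which is what the storage and bandwidth estimates look at.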

Now, let's calculate the capacity needs.

Server Resources:

a. Storage

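1. Total daily uploads: 1,000 users * 2.5 videos (average) * 100 MB = 250,000 MB = 250 GB per day

2. Allowing for redundancy and future growth, let's allocate 500 GB of storage for each day of uploads. Since uploaded videos accumulate, that adds up to roughly 15 TB per month.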
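Because uploads pile up day after day, it helps to project the storage figure over longer periods. The following is a minimal sketch that treats the jump from 250 GB to 500 GB as a 2x headroom factor and assumes nothing is ever deleted; the 30-day month and the variable names are my own simplifications.

```python
MB = 10**6
GB = 10**9
TB = 10**12

uploaders_per_day = 1_000
videos_per_uploader = 2.5  # midpoint of "2 to 3"
video_size_bytes = 100 * MB

# Raw upload volume per day: 250 GB.
daily_raw_bytes = uploaders_per_day * videos_per_uploader * video_size_bytes

# ASSUMPTION: 2x headroom for redundancy and growth (250 GB -> 500 GB).
HEADROOM = 2
daily_provisioned_bytes = daily_raw_bytes * HEADROOM

monthly_bytes = daily_provisioned_bytes * 30   # rough 30-day month
yearly_bytes = daily_provisioned_bytes * 365

print(f"Raw uploads per day:   {daily_raw_bytes / GB:.0f} GB")
print(f"Provisioned per day:   {daily_provisioned_bytes / GB:.0f} GB")
print(f"Provisioned per month: {monthly_bytes / TB:.1f} TB")
print(f"Provisioned per year:  {yearly_bytes / TB:.1f} TB")
```

Under these assumptions a year of uploads needs on the order of 180 TB, so storage grows into a much larger concern than the single-day figure suggests.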

b. Bandwidth

1. Outbound data transfer required per day: Number of views * Average video size

2. Total outbound transfer = 40,000 views * 100 MB = 4,000,000 MB = 4,000 GB = 4 TB per day

3. Considering other data transfer (e.g., page requests, metadata), let's round it up to 5 TB per day.
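The 4 TB and 5 TB figures are volumes per day, while network links are rated in bits per second. The sketch below converts the daily volume into an average rate, assuming views are spread evenly over 24 hours; real traffic will peak well above this average. The function name `average_mbps` is hypothetical.

```python
MB = 10**6
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_mbps(bytes_per_day: float) -> float:
    """Convert a daily transfer volume into an average rate in megabits/sec."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 10**6


daily_views = 40_000
video_size_bytes = 100 * MB

video_egress_bytes = daily_views * video_size_bytes  # 4 TB per day
rounded_egress_bytes = 5 * TB                        # with page requests, metadata

print(f"Video only:   {average_mbps(video_egress_bytes):.0f} Mbps on average")
print(f"With padding: {average_mbps(rounded_egress_bytes):.0f} Mbps on average")
```

This comes out to roughly 370 Mbps for the videos alone and about 460 Mbps with the padding, so a single 1 Gbps link would already be more than a third full on average, before accounting for busy hours.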

c. Compute Resources

We need to ensure that the servers can handle concurrent uploads, video transcoding (if necessary), and serving video streams to users. This would depend on the specific technology stack you choose and may require load balancing and scalability measures.

These are rough estimates. To improve efficiency, we can keep frequently watched videos in a cache, or place the content in a CDN so that it is served faster.


Saturday, 11 May 2024
