(Rough Draft / Work in progress)
“Always bet on the web”, Tomster
Making an HTTP request is straightforward, but you probably haven’t considered the cost of making such requests. My hope is to visualize that cost and give strategies for avoiding or reducing it.
Note: the prices below are representational. They have not been explicitly calculated, but they give a good indication of the relative total cost of an HTTP request.
Here is the visual for a typical HTTP request:
[browser] <----> [webserver] <----> [database]
Let’s break it down into stages.
Basic HTTP Request
[browser] <----> [webserver]
Now add the cost of making such a request:
[browser] <--$$--> [webserver]
A basic HTTP request is not expensive, but it is not free either. The cost grows with the complexity of the request, for example when it is data intensive or long running, such as sending an email. While the request is being handled, server resources are allocated to it, and that cost is multiplied by the number of concurrent users.
Is there a way to avoid this cost? Here are two solutions.
First, use cache headers where possible to reduce the payload cost and the overhead of fetching data that changes infrequently.
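As a sketch of what cache headers can look like on the server, assume a hypothetical handler that receives the request’s If-None-Match header and returns a plain response object; respondWithCaching and CachedResponse are illustrative names, not a framework API. Cache-Control: max-age lets the browser reuse the response without asking again, and once it expires the browser sends the ETag back in If-None-Match, so an unchanged resource costs a 304 Not Modified with no body instead of the full payload. The FNV-1a hash is used only to keep the example dependency-free.

```typescript
// Hypothetical response shape; adapt it to your framework's response object.
interface CachedResponse {
  status: number;
  headers: Record<string, string>;
  body: string | null;
}

// FNV-1a 32-bit hash, used here only to derive a short ETag from the body.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

function respondWithCaching(
  body: string,
  ifNoneMatch: string | undefined,
  maxAgeSeconds: number,
): CachedResponse {
  const etag = `"${fnv1a(body)}"`;
  const headers: Record<string, string> = {
    // The browser may reuse the response for maxAgeSeconds without a request.
    "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    // Afterwards it revalidates by sending this value in If-None-Match.
    ETag: etag,
  };
  // Simplified: a real If-None-Match header may list several ETags or use
  // weak validators (W/"..."); this sketch only handles an exact match.
  if (ifNoneMatch === etag) {
    return { status: 304, headers, body: null }; // unchanged: no payload sent
  }
  return { status: 200, headers, body };
}
```

The first response carries the full body; a later revalidation with the same ETag gets a bodyless 304.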
Second, cache on the client: use localStorage or similar to avoid making the request in the first place. There is no cost because there is no request.
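A minimal client-side cache might look like the following sketch. It assumes a JSON endpoint and a time-to-live you choose per resource; readCache, writeCache and cachedFetchJson are hypothetical names. StorageLike is the subset of the Web Storage API used here, which window.localStorage satisfies, and because localStorage only stores strings, each entry is serialized as JSON together with its expiry time.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Returns the cached value if present and not expired, otherwise null.
// Note: a cached value of null is indistinguishable from a miss.
function readCache<T>(storage: StorageLike, key: string, now: number = Date.now()): T | null {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as CacheEntry<T>;
    if (entry.expiresAt > now) return entry.value;
  } catch {
    // Corrupt entry: fall through and treat it as a miss.
  }
  storage.removeItem(key); // drop expired or corrupt entries
  return null;
}

function writeCache<T>(
  storage: StorageLike,
  key: string,
  value: T,
  ttlMs: number,
  now: number = Date.now(),
): void {
  const entry: CacheEntry<T> = { value, expiresAt: now + ttlMs };
  storage.setItem(key, JSON.stringify(entry));
}

// Only goes to the network on a miss; a hit costs no HTTP request at all.
async function cachedFetchJson<T>(storage: StorageLike, url: string, ttlMs: number): Promise<T> {
  const hit = readCache<T>(storage, url);
  if (hit !== null) return hit;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const data = (await res.json()) as T;
  writeCache(storage, url, data, ttlMs);
  return data;
}
```

In the browser you would pass window.localStorage as the storage, for example cachedFetchJson(window.localStorage, "/api/settings", 60000), where "/api/settings" is an illustrative URL. Passing the clock in as a parameter keeps the expiry logic easy to test.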
Bonus: introduce a queuing system and push long-running processes to the background, where you can manage the resources needed to fulfill the request without bogging down the webserver and eating up its limited resources.
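The queuing idea can be sketched with a minimal in-memory queue. EmailQueue, handleSignup and the job shape are hypothetical names; a real deployment would use a durable queue (for example one backed by Redis or a message broker) so jobs survive a restart, and the worker would typically run as a separate process. The request handler only enqueues the job and answers with 202 Accepted, while the worker drains jobs in bounded batches.

```typescript
// Hypothetical job shape for a slow task such as sending an email.
interface EmailJob {
  id: number;
  to: string;
  subject: string;
}

// Minimal in-memory queue; jobs are lost if the process restarts.
class EmailQueue {
  private jobs: EmailJob[] = [];
  private nextId = 1;

  enqueue(to: string, subject: string): number {
    const id = this.nextId++;
    this.jobs.push({ id, to, subject });
    return id;
  }

  get size(): number {
    return this.jobs.length;
  }

  // Called by a background worker. Taking at most batchSize jobs per call
  // caps how much of the server's resources the background work can use.
  async runBatch(batchSize: number, send: (job: EmailJob) => Promise<void>): Promise<number> {
    const batch = this.jobs.splice(0, batchSize);
    for (const job of batch) {
      await send(job);
    }
    return batch.length;
  }
}

// Request handler: enqueue and answer immediately with 202 Accepted instead
// of holding the request open while the email is sent.
function handleSignup(queue: EmailQueue, email: string): { status: number; jobId: number } {
  const jobId = queue.enqueue(email, "Welcome!");
  return { status: 202, jobId };
}
```

A worker could call queue.runBatch(10, sendEmail) on a timer, where sendEmail is whatever actually delivers the message; the web request itself never waits on it.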
HTTP Request with Simple Database Query
We just saw the cost of a basic HTTP request. Now let’s look at how much more it costs when we make a query to the database.
[webserver] <--$$$$--> [database]
As you can see, the database hop costs 4x$, double the 2x$ of the basic request. Making a database query clearly adds cost, so we should avoid it when possible. But how?
We can introduce caching on the webserver. You could store data on disk and read it back instead of querying the database, but a caching server like Redis or Memcached will retrieve data faster and make your application more responsive.
One thing to look out for is thrashing (often called a cache stampede): contention that occurs when the cache is being refreshed while other requests keep coming in. One strategy is to let the first requester take a lock that tells other requests the data is being added or refreshed, and then let that first requester retrieve the data and update the cache. Depending on how time-sensitive the data is, you could serve the old cached data while it is being updated.
Here is what it looks like, along with the cost:
[webserver] <--$--> [cache] <--$$$$--> [database]
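The locking strategy for avoiding thrashing can be sketched in-process, a pattern often called single-flight or request coalescing. Assumptions: a single webserver process, a Map standing in for Redis or Memcached, and a loadFromDatabase callback standing in for the real query; StampedeSafeCache is a hypothetical name. Across several servers the lock would have to live in the shared cache instead, for example via Redis’s SET command with the NX and PX options.

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// entries stands in for Redis/Memcached; inFlight acts as the lock that tells
// later requests a refresh is already under way.
class StampedeSafeCache<T> {
  private entries = new Map<string, Entry<T>>();
  private inFlight = new Map<string, Promise<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string, loadFromDatabase: () => Promise<T>): Promise<T> {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) {
      return entry.value; // fresh cache hit: no database query
    }

    let pending = this.inFlight.get(key);
    if (!pending) {
      // The first requester takes the lock and does the retrieval and cache update.
      pending = loadFromDatabase().then(
        (value) => {
          this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
          this.inFlight.delete(key);
          return value;
        },
        (error: unknown) => {
          this.inFlight.delete(key); // release the lock so a later request can retry
          throw error;
        },
      );
      // Avoid an unhandled rejection when nobody awaits a background refresh.
      pending.catch(() => undefined);
      this.inFlight.set(key, pending);
    }

    // Serve the old cached data while the refresh runs; with no old data, wait.
    return entry ? entry.value : pending;
  }
}
```

Two simultaneous requests for the same key trigger one database query, and once an entry expires, callers keep getting the previous value until the single refresh completes.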
HTTP Request with Complex Database Query
[database] <--2x$--> [join] <----> [table(s)]
Joins add another layer on the database side, and the more joins you have, the more costly the query becomes. In this model, a join doubles the cost of the request.
HTTP Request with Complex Database Query with No Index
What’s worse than joins? Queries that don’t use any index (or tables that don’t have one at all, for that matter).
[database] <--2x$--> [join] <----> [table(s)] <--10x$--> [no index]
When there is no index, the database engine is forced to do a full table scan, reading every row to find what you are searching for. This gets worse when you join one or more tables that don’t have indexes.
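To make the cost of a missing index concrete, here is a sketch that counts how many rows a lookup has to examine, using hypothetical in-memory customers and orders tables. The indexed version uses a Map where a real database would typically use a B-tree, and the join is a naive nested-loop join (real query planners may pick other strategies, such as hash joins), so treat the counts as an illustration rather than a model of any particular engine.

```typescript
// Hypothetical tables: each order references a customer through customerId.
interface Customer {
  id: number;
}
interface Order {
  id: number;
  customerId: number;
}

interface Lookup {
  matches: Order[];
  examined: number; // rows the "engine" had to look at
}

// No index on orders.customerId: every row must be examined (full table scan).
function findByScan(orders: Order[], customerId: number): Lookup {
  const matches = orders.filter((o) => o.customerId === customerId);
  return { matches, examined: orders.length };
}

// Index on orders.customerId (a Map instead of a B-tree, to keep it short).
function buildIndex(orders: Order[]): Map<number, Order[]> {
  const index = new Map<number, Order[]>();
  for (const order of orders) {
    const bucket = index.get(order.customerId);
    if (bucket) bucket.push(order);
    else index.set(order.customerId, [order]);
  }
  return index;
}

// With the index, only the matching rows are touched.
function findByIndex(index: Map<number, Order[]>, customerId: number): Lookup {
  const matches = index.get(customerId) ?? [];
  return { matches, examined: matches.length };
}

// Nested-loop join of customers to orders: without an index, every customer
// row triggers another full scan of orders, so the work multiplies.
function joinRowsExamined(customers: Customer[], orders: Order[], index?: Map<number, Order[]>): number {
  let examined = 0;
  for (const customer of customers) {
    const lookup = index ? findByIndex(index, customer.id) : findByScan(orders, customer.id);
    examined += lookup.examined;
  }
  return examined;
}
```

With 100 customers and 10,000 orders (100 per customer), the unindexed join examines 1,000,000 rows while the indexed one examines 10,000.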
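How you create the index matters. For a composite (multi-column) index, the order of the columns in the index determines which queries can use it efficiently: an index on (column_a, column_b) serves queries that filter on column_a, or on both column_a and column_b, but not efficiently a query that filters only on column_b. The order of the conditions in your WHERE clause, on the other hand, does not matter, because the query planner matches conditions to index columns regardless of how they are written. For example, an index on (column_a, column_b) can be used for
SELECT * FROM table_name WHERE column_b = 1 AND column_a = 2
just as well as for
SELECT * FROM table_name WHERE column_a = 2 AND column_b = 1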
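How a composite B-tree index is matched against equality conditions can be modeled in a few lines. This is a simplified sketch, not how any particular database’s planner is implemented: it ignores range conditions, sorting and features such as skip scans, and usableIndexPrefix is a hypothetical name. It counts how many leading index columns the WHERE clause constrains (the leftmost-prefix rule), and because the conditions are collected into a set, the order they are written in makes no difference.

```typescript
// Returns how many leading index columns can narrow the search, given the
// columns that have equality conditions in the WHERE clause.
function usableIndexPrefix(indexColumns: string[], whereEqualityColumns: string[]): number {
  // A Set makes the result independent of the order of the WHERE conditions.
  const constrained = new Set(whereEqualityColumns);
  let usable = 0;
  for (const column of indexColumns) {
    if (!constrained.has(column)) break; // the first unconstrained column ends the prefix
    usable++;
  }
  return usable;
}

const index = ["column_a", "column_b"];
console.log(usableIndexPrefix(index, ["column_b", "column_a"])); // 2: both columns used
console.log(usableIndexPrefix(index, ["column_a"])); // 1: leftmost column only
console.log(usableIndexPrefix(index, ["column_b"])); // 0: no leftmost column, no efficient lookup
```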
The HTTP Request Financial Model
Here is the complete financial model of an HTTP request:
[browser] <--$$--> [webserver] <--$--> [cache] <--$$$$--> [database] <--2x$--> [join] <----> [table(s)] <--10x$--> [no index]
At every step of the way, you can see where you incur additional costs.
Here is a table showing the overall cost of different scenarios:
HTTP request with database query: 6x$
HTTP request with database query and JOIN: 12x$
HTTP request with database query and NO INDEX: 120x$
HTTP request with database query and cache hit: 3x$
HTTP request with caching headers: 1x$
No HTTP request, using client-side caching (localStorage): 0x$
((Draw bar chart with cost on y axis and HTTP requests on x axis))
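The arithmetic behind the table can be reproduced in a few lines. This sketch assumes the representational unit costs from the diagrams (2 for browser to webserver, 1 for the cache, 4 for the database) and, to match the table’s numbers, applies the join (2x) and missing-index (10x) multipliers to the whole request cost (6 × 2 = 12, 12 × 10 = 120). The caching-headers cost of 1 is taken directly from the table, and requestCost, totalCost and the Scenario type are hypothetical names.

```typescript
// Representational unit costs from the diagrams; these are not measured values.
const BROWSER_TO_WEBSERVER = 2; // $$
const WEBSERVER_TO_CACHE = 1; // $
const WEBSERVER_TO_DATABASE = 4; // $$$$
const JOIN_MULTIPLIER = 2; // 2x$
const NO_INDEX_MULTIPLIER = 10; // 10x$
const CACHING_HEADERS_COST = 1; // taken directly from the table

type Scenario =
  | { kind: "clientCache" } // served from localStorage: no HTTP request at all
  | { kind: "cachingHeaders" } // request answered cheaply thanks to caching headers
  | { kind: "cacheHit" } // webserver answers from Redis/Memcached
  | { kind: "databaseQuery"; join: boolean; indexed: boolean };

function requestCost(s: Scenario): number {
  switch (s.kind) {
    case "clientCache":
      return 0;
    case "cachingHeaders":
      return CACHING_HEADERS_COST;
    case "cacheHit":
      return BROWSER_TO_WEBSERVER + WEBSERVER_TO_CACHE;
    case "databaseQuery": {
      let cost = BROWSER_TO_WEBSERVER + WEBSERVER_TO_DATABASE;
      // Multipliers apply to the whole request to match the table (6 -> 12 -> 120).
      if (s.join) cost *= JOIN_MULTIPLIER;
      if (!s.indexed) cost *= NO_INDEX_MULTIPLIER;
      return cost;
    }
  }
}

// The cost is multiplied by the number of concurrent requests.
function totalCost(s: Scenario, concurrentRequests: number): number {
  return requestCost(s) * concurrentRequests;
}
```

For example, 1,000 concurrent requests cost 120,000 units with a join and no index, but 3,000 units when they are served from the cache.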
I hope this gives you a mental model, so that the next time you are about to make a request, you think about the costs associated with it.