Precomputation | James’ Coffee Blog


I have noticed a pattern across my work in search and on static site generators: in both cases, the key to offering good performance for users is to precompute as much information as you can.

Text search engines rely on “reverse indices,” which map words to the documents they are in. Reverse indices are structured in such a way that allows efficient querying at scale. These indices are built up at index time, which refers to when a document is processed and put into the index structure required for querying.

Keeping a reverse index takes a bit of upfront computation for each record you add, but once you have the index query times can be fast. For example, my blog search engine, which indexes on the order of 500,000+ words across 1,000+ documents, can return results in < 10ms because the indices can be queried incredibly quickly.

If my search engine were to manually search through every file for each query, it is likely it would take longer to return results. And, as more documents are added, search would take longer. Reverse indices, on the other hand, can scale at O(1) because they commonly use dictionaries.

Static site generators similarly rely on precomputation. The idea is that you generate full HTML documents from templates and content files. This happens at a “build” step. Then, the files can be served statically from a folder using a web server like nginx. Serving static files is incredibly fast. The alternative is to build a dynamic site, where templates are computed on every page load. This adds a bit of latency to each request, as the server needs to assemble the page according to any logic you have.

As I was thinking about what to name this post, I saw that Wikipedia has an article on precomputation. They point out another example of this pattern: materialized views in databases. The premise is that you create a query that is executed and saved as a table. The results are then incrementally changed as new records are added or changed.

For example, if you have a query that counts new customers in a database, the materialized view would add 1 to the count every time a new customer signs up. This means that next time the query is run, the result can be retrieved by querying the count rather than having to manually count every record.

Materialized views are ideal for computationally expensive operations.

This all makes me wonder: in what other places does precomputation pop up?



Source link

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top