Rich content presentation on personal search engines

Search engines help us find information. Over the years, web search engines such as Google have experimented extensively with how that value proposition translates into a user experience.

At first, search engines helped you find web pages that addressed queries. Google famously showed “10 blue links.” Then, search engines started to render more rich results. For example, Google introduced the “knowledge panel,” which displayed information about a concept in its knowledge graph. This could include a description of a singer, the location of a sports venue, or something else that the graph and its sources knew about.

Google also introduced “featured snippets,” which pull a sequence of sentences directly from an article that the search engine thinks will answer, or help you answer, your query, and show them at the top of search results.

For some queries, like “when was the empire state building built”, Google can return the exact answer in big text.

This is all to say: a lot of experimentation has been done in the realm of presenting answers to questions in a search engine.

This all has me thinking: how does this apply to the concept of a personal website search engine?

Rethinking content presentation on my blog search engine

The foundation of my blog search is to return blog posts that are relevant to a query. Relevance is determined by the TF-IDF keyword ranking algorithm. A boost is then added depending on what fields are most relevant to a query, the number of links that point to a page, and more.
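To make the ranking idea concrete, here is a minimal sketch of TF-IDF scoring with a field boost and a link boost. It is an illustration under assumptions, not the code behind the search engine: the toy `POSTS` data, the `title_boost` and `link_boost` weights, and the smoothed IDF formula are all hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each post has a title, a body and an inbound link count.
POSTS = [
    {"title": "Brewing coffee at home",
     "body": "I brew coffee every morning with a filter", "inlinks": 4},
    {"title": "Search engine notes",
     "body": "Notes on ranking and a short coffee break", "inlinks": 1},
    {"title": "Calendar widgets",
     "body": "Building a calendar in HTML", "inlinks": 0},
]


def tokenize(text):
    return text.lower().split()


def idf(term, posts):
    # Smoothed inverse document frequency: rarer terms get a higher weight.
    containing = sum(
        1 for p in posts if term in tokenize(p["title"] + " " + p["body"])
    )
    return math.log((1 + len(posts)) / (1 + containing)) + 1


def score(query, post, posts, title_boost=2.0, link_boost=0.1):
    body_counts = Counter(tokenize(post["body"]))
    title_counts = Counter(tokenize(post["title"]))
    total = 0.0
    for term in tokenize(query):
        # Field boost (assumed weight): a match in the title counts extra.
        tf = body_counts[term] + title_boost * title_counts[term]
        total += tf * idf(term, posts)
    # Link boost (assumed weight): only applied to posts that match at all.
    if total > 0:
        total *= 1 + link_boost * post["inlinks"]
    return total


def search(query, posts=POSTS):
    scored = [(score(query, p, posts), p["title"]) for p in posts]
    return [title for s, title in sorted(scored, reverse=True) if s > 0]
```

Calling `search("coffee")` on this toy data ranks the post with "coffee" in its title and more inbound links first.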

The result is a search page like this:

A search results page showing results for the query “coffee”

I have been thinking: given the search input, how can I introduce custom results that are relevant to me?

I had a few ideas in the back of my mind that came from personal itches and discussions with friends. These included:

Native ability to translate a Taylor Swift song title acronym into its full song;
An HTML representation of a calendar for a given month and year;
The ability to query how much I have used a term over the years my blog has been active, and more.

In this way, my blog search engine becomes more than a means by which to search blog posts: it has features that let me do things that I want to be easier. And because I am building the search engine, I can determine how I want things to work for me and others based on my experience with each problem I have in mind.

My search engine now has two components:

The text search, and;
A handler that returns special snippets for the three ideas listed above.

If a query is a Taylor Swift acronym, a box will appear at the top of the search page that translates the acronym. If a query is a month or a month and a year, an HTML calendar will appear. If a query contains wu (word use), the search engine shows how often I have used the queried term over the years.

Here is an example of these results.

Acronyms

Translation for the acronym ATW

Calendar

An HTML calendar for a year

Word Use

Word use of the term “coffee” over the years

Design principles

In designing each widget, I ask myself: what do I want my experience to be? This gives me a wide field of view from which I can start to consider a functional design.

Then, I work to make the result as fast as possible, which is a goal for the whole search experience. It takes ~10ms for the search engine to calculate results for a text query. Any additional processing is going to be overhead on top of that, so it is important for me to think about speed in implementation.

A realization: Search engine query classification

Rich content presentation is contingent on the search engine understanding that a query meets certain criteria. I realised yesterday that I have been overthinking what it means for such “understanding” to take place in the context of my use.

To trigger the above queries, I have hard-coded rules that look out for keywords:

If a search query is equal to a song acronym, the acronym database is queried.
If a search query is “[month] [year]”, a calendar is generated using the Python calendar package and returned.
If a search query contains “wu”, word use is calculated for the term. The results are then cached.
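The three rules above can be sketched as a small pattern-matching function. This is a hedged illustration, not the actual implementation: the `ACRONYMS` table, the `classify` and `render_calendar` names, the regular expression, and the choice to treat "wu" as a separate word are all assumptions. The only library call relied on is `calendar.HTMLCalendar().formatmonth`, which is part of the standard library.

```python
import calendar
import re

# Hypothetical acronym table; the real acronym database is not shown here.
ACRONYMS = {"ATW": "All Too Well"}

# Month names are hard-coded so matching does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches queries of the form "[month] [year]", e.g. "march 2023".
CALENDAR_RE = re.compile(r"^([a-z]+) (\d{4})$")


def classify(query):
    """Return (kind, payload) for a rich result, or None for plain text search."""
    q = query.strip()

    # Rule 1: the whole query is a known acronym.
    if q.upper() in ACRONYMS:
        return ("acronym", ACRONYMS[q.upper()])

    # Rule 2: the query is "[month] [year]".
    match = CALENDAR_RE.match(q.lower())
    if match and match.group(1) in MONTHS:
        return ("calendar", (int(match.group(2)), MONTHS[match.group(1)]))

    # Rule 3: the query contains "wu" as its own word (an assumption).
    words = q.lower().split()
    if "wu" in words:
        return ("word_use", " ".join(w for w in words if w != "wu"))

    return None


def render_calendar(year, month):
    # formatmonth returns the month as an HTML <table> string.
    return calendar.HTMLCalendar().formatmonth(year, month)
```

Because each rule is a dictionary lookup, a single regular expression match, or a list membership check, the cost added on top of the text search stays small.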

With this framing, I am starting to build a better mental model around how these additional features should flow.

First, a search query should run. Then, the search query should be matched against all possible “rich” options. If there is a rich option available, the results can be computed or retrieved, then returned as a special snippet on the page.
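As a rough sketch, that flow might look like the following. Everything here is a stand-in: `POSTS_BY_YEAR`, `text_search`, `match_word_use`, `RICH_MATCHERS` and `handle` are hypothetical names, the placeholder text search only does exact word matching, and `functools.lru_cache` is one possible way to cache word-use results, not necessarily the one in use.

```python
from functools import lru_cache

# Hypothetical stand-in for the post archive, keyed by year.
POSTS_BY_YEAR = {
    2021: ["coffee in edinburgh", "a note on search"],
    2022: ["coffee and more coffee", "calendar ideas"],
}


def text_search(query):
    """Placeholder text search: posts that contain every query word."""
    words = query.lower().split()
    return [
        post
        for posts in POSTS_BY_YEAR.values()
        for post in posts
        if all(w in post.split() for w in words)
    ]


@lru_cache(maxsize=256)
def word_use(term):
    """Count a single-word term per year; cached so repeat queries are cheap."""
    return tuple(
        (year, sum(post.split().count(term) for post in posts))
        for year, posts in sorted(POSTS_BY_YEAR.items())
    )


def match_word_use(query):
    """Rich option: return a word-use snippet if the query contains 'wu'."""
    words = query.lower().split()
    if "wu" not in words:
        return None
    term = " ".join(w for w in words if w != "wu")
    return {"kind": "word_use", "data": word_use(term)}


# Every available "rich" option; more matchers could be appended here.
RICH_MATCHERS = [match_word_use]


def handle(query):
    # Step 1: the text search always runs.
    results = text_search(query)
    # Step 2: match the query against all rich options, stopping at the first hit.
    snippet = None
    for matcher in RICH_MATCHERS:
        snippet = matcher(query)
        if snippet is not None:
            break
    # Step 3: return the results plus the special snippet, if any.
    return {"results": results, "snippet": snippet}
```

Keeping the matchers in a list means a new rich result type is one more function in `RICH_MATCHERS`, without touching the text search.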

While this may seem obvious, the realm of ascertaining query intent is large and varied, and gets more complicated depending on how much you want to “understand” a query. I used to think I would need really advanced technology to make something that works; then I realised that pattern matching would do. (Of course, pattern matching optimized for speed!)

Conclusion

My mind is presently occupied with the question “what should a search engine be?” There is plenty of prior art that represents years of research into what a search engine could be. Google, Bing, DuckDuckGo, etc. have experimented extensively with text search and presenting direct answers to questions. Wolfram Alpha has developed an engine, built on a complex knowledge base and query understanding system, that handles tasks like solving math formulas and retrieving demographic information.

With that said, I wonder: what’s next? I am certainly excited by the idea of solving use cases that existing search engines don’t solve in the way that I want. I am motivated by the idea of building a tool to retrieve information as fast as possible, with as few barriers as possible. With the key parts in place – a search engine to retrieve relevant documents, a user interface – I am excited to explore making custom results for different query types.
