Ask LukeW: Custom Re-ranker

by February 10, 2025

Since launching the Ask Luke feature on this website nearly two years ago, people have asked the system over 25,000 questions. But not all were getting answered even when they could have been. Enter... a custom re-ranker.

At a high-level, Ask Luke makes use of the thousands or articles, hundreds of presentations, and more I've authored over the years to answer people's questions about digital product design. To do so, we first process and clean-up all these files so we can retrieve the relevant parts of them when someone asks a question. After retrieval, those results are packaged up for Large Language Models to utilize when generating a reply.

Ask Luke architecture diagram

To find the parts of all these documents that can best answer any given question, we do both an embedding search (in vector space) and a keyword search. This combination of retrieval techniques ensures we're finding content that talks about related topics and specifically matches unique terms. Keyword search was a later addition after we saw that embeddings, which are great at semantic search, could miss needles in the haystack. For example, a concept like PID.

The results of both these searches get diversified to make sure we're not just repeating the same content. For example, I've given the same talk at different events so no need to use two versions. What's left of our search results is then filtered by a relevance score. If it meets the threshold, we include it in our instructions for whatever Large Language Model is being used for generation. Usually we fill up an LLM's context window with about ten results.

Ask Luke custom re-ranker exmaple

While these retrieval techniques work to answer most people's questions, they sometimes miss out on useful but not directly relevant content. So why not just lower the threshold to make use of more content when responding? We tried but irrelevant content would regularly pollute answers. After some experimentation, a custom re-ranker helped the most to expand coverage while maintaining quality. Questions that were not answered before now had useful replies as the images above and below illustrate.

Ask Luke custom re-ranker exmaple

What does the re-ranker do? If we don't have ten results that meet our relevance threshold. We take any results that meet a lower threshold and send them (in parallel) to a fast AI model (like Gemini Flash 2.0) that evaluates how well each could answer the question. Any results deemed useful are then used to backfill the instructions for content generation resulting in a wider set of questions we can answer well.

Further Reading

Additional articles about what I've tried and learned by rethinking the design and development of my Website using large-scale AI models.

Acknowledgments

Big thanks to Kian Sutarwala and Alex Peysakhovich for the development and AI research help.