Video: Embedded Experiences in Conversational UI

December 6, 2023

In this 2.5 minute video from my How AI Ate My Website talk, I walk through how a conversational (chat) interface powered by generative AI can cite the materials it uses to answer people's questions, presenting different document types like videos, audio, Web pages, and more through a unified embedded experience.


Now, as I mentioned, answers stem from finding the most relevant parts of documents, stitching them together, and citing those sources in the reply. You can see one of these citations in this example.
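The retrieve-stitch-cite flow above can be sketched in a few lines. Everything here is a hypothetical illustration, not the actual Ask Luke implementation: the `Chunk` type, the toy word-overlap `relevance` score (a real system would use embeddings), and the function names are all assumptions.

```python
# Minimal sketch of retrieval-augmented answering with citations.
# All names are illustrative assumptions, not production code.
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    doc_type: str  # e.g. "video", "audio", "webpage", "pdf"
    text: str


def relevance(question: str, text: str) -> float:
    # Toy word-overlap score; a real system would compare embeddings.
    q, t = set(question.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)


def answer_with_citations(question: str, index: list[Chunk], top_k: int = 2):
    # Rank chunks by relevance, stitch the best ones into the answer
    # context, and record which documents they came from as citations.
    ranked = sorted(index, key=lambda c: relevance(question, c.text), reverse=True)
    best = ranked[:top_k]
    context = "\n".join(c.text for c in best)
    citations = [{"doc_id": c.doc_id, "type": c.doc_type} for c in best]
    return context, citations
```

Each citation carries the source document's id and type, which is what lets the interface render a type-specific embedded experience when you tap one.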

This also serves as an entry point into a deeper, object-specific experience. What does that mean? Well, when you see these cited sources, you can tap into any one of them to access the content. But instead of just linking out to a separate window or page, which is pretty common, we've tried to create a unified way of exploring each one.

Not only do you get an expanded view into the document, but you also get document-specific interactions, and the ability to ask additional questions scoped just to that open document.
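Scoping follow-up questions to the open document can be as simple as filtering the retrieval index down to that document's chunks before answering. A hypothetical sketch, where the chunk fields are assumptions:

```python
# Hypothetical sketch: restrict retrieval to the currently open document
# so follow-up questions are answered only from its content.
def scope_to_document(chunks: list[dict], doc_id: str) -> list[dict]:
    return [c for c in chunks if c["doc_id"] == doc_id]


chunks = [
    {"doc_id": "web1", "text": "evolving e-commerce checkout"},
    {"doc_id": "vid1", "text": "conversational interface demo"},
]
```

With this in place, the same question-answering pipeline runs unchanged; it just sees a smaller index.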

Here's how that looks in this case for an article. You can select a citation to get the full experience, which includes a summary, the topics in the article, and, again, the ability to ask questions just of that document, in this case about evolving e-commerce checkout.

There are more document types than just webpages, though: videos, podcasts, PDFs, images, and more. On Ask Luke, you ask a question and get an answer with citations to videos, audio files, webpages, PDFs, and so on. Each one has a unified but document-type-specific interface.

The video experience, for example, has an inline player, a scrubber with a real-time transcript, the ability to search that transcript, auto-generated topics and summaries, and the ability to ask questions just of what's in the video.

When you search within the transcript, you can also jump directly to that part of the video in the inline player. Audio works the same way, just with an audio player instead of a video screen. Here you can see the diarization and cleanup work at play: the conversation is broken down by speaker, with names and timestamps throughout the transcript.
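Jumping from a transcript match to the right moment in the player falls out naturally if each diarized segment keeps its speaker and start time. A hypothetical sketch, with the segment fields as assumptions:

```python
# Hypothetical sketch: search a diarized transcript and return the
# timestamp to seek the inline player to. Segment fields are assumptions.
transcript = [
    {"speaker": "Luke", "start": 0.0, "text": "Welcome to the talk."},
    {"speaker": "Luke", "start": 42.5, "text": "Conversational interfaces can cite sources."},
    {"speaker": "Host", "start": 90.0, "text": "How does the video player work?"},
]


def find_in_transcript(query: str, segments: list[dict]):
    """Return (start_time, segment) for the first segment matching the query."""
    q = query.lower()
    for seg in segments:
        if q in seg["text"].lower():
            return seg["start"], seg
    return None
```

The returned start time is what the UI would hand to the player's seek call; audio works identically since the transcript structure is the same.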

Webpages have a reader view, just like videos and audio files. We show a summary, key topics, give people the ability to ask questions scoped to that article, and by now you get the pattern.