How I built AI chat back in 2023
No model training; just off-the-shelf GPT 3.5 and some clever retrieval-augmented generation.
Kevin Wang
This is a very belated throwback to how I built an AI chat feature for HashiCorp, a company I used to work at and hold dear to my heart. None of the technical details I'll share are sensitive or proprietary. HashiCorp was not, and is not, in the AI space, so nothing here would give competitors any meaningful advantage.
I’ll mostly be doing some self-reflecting, but I’ll also share the technical implementation and a simple architecture overview.
HashiCorp has since removed the feature, but you can see it below, or in action on YouTube.
*sips coffee*
Getting on my high horse for a second...
Reflection
So, see that screenshot up there? Maybe you peeped the video too? Or maybe you even tuned in to HashiConf yourself. ...That was nearly two years ago now!
On multiple occasions, I have basically forgotten that I even built this, and considering all the impressive AI things that are being built today, it’s easy to quickly dismiss an older feature like AI docs chat.
But reflecting on it, it's something I'm pretty proud to have shipped.
- It provided value: The chat interface essentially circumvented old and confusing information architecture, and gave you a quick answer with cited sources. And it felt slick!
- It was new: Very few people were doing this at the time. The most notable party was Supabase, which was the main source of inspiration.
- The tactics were novel: Very few people were building this way at the time, and everyone was scrambling to figure out how to even approach building with AI.
- It was simple: Simplicity is often overlooked, but so valuable.
- I just shipped a thing: Shipping features quickly, and overcoming multiple large-org hurdles, is no small feat. If you're in tech you've probably seen the recent catchphrase, "You can just ship things" 1
*steps off figurative horse*
Anyways.
AI today
I am not an AI/ML engineer, nor a diehard AI builder. I simply like to build things that deliver value. Aside from being a daily GitHub Copilot and ChatGPT user, I would not consider myself the most up-to-date with AI. There are countless developers out there who are better versed and more knowledgeable about AI than me.
*scrambles to pull up OpenAI docs*
The 2025 docs don't look too different from the 2023 ones.
Quickly glancing back through OpenAI's key concepts, not a whole lot has changed. Sure, there are new image-, video-, and voice-related functionalities, but the core feature of enabling a human to converse with, or instruct, an LLM 2 is still the same.
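For what it's worth, that core interaction is still just a single API call. Here's a minimal sketch using today's OpenAI Node SDK; the model name and prompts are placeholders, not what the HashiCorp feature actually used:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One prompt, one response: the core "converse with / instruct an LLM" loop.
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "How do I create a Terraform module?" },
  ],
});

console.log(completion.choices[0].message.content);
```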
The things that have changed, that would’ve been convenient when I was building the AI chat feature:
- Text to speech
- Integrated datastore (for vector embeddings)
- Integrated caching (for text-generation responses)
If you have any software engineering knowledge, you can see how two of those are not unique to AI. They are simply persistence layers. Nothing revolutionary.
Ok, I’m done level setting. On to the feature!
How I built domain-specific AI chat
ChatGPT was already well established, but its knowledge base was too general for direct use by an org like HashiCorp to be valuable. It needed to be tailored towards HashiCorp’s domain — Terraform, Vault, Nomad, etc. — in order to have a fighting chance at producing relevant and accurate answers.
Furthermore, I wanted to ship something in incremental steps — in increasing order of leverage and value — that I had high confidence would be useful at each step.
Context retrieval and augmentation was ultimately what I landed on due to its simplicity.
Model training and fine-tuning were ruled out because they were expensive, and felt too difficult to get right. And mind you, I was a solo engineer with zero machine learning experience building this.
Here’s a high-level overview of the system:
The two main parts of the system are a knowledge store, and an API server for text-generation. The nuance and self-proclaimed cleverness that I want to call out are in how the knowledge store is populated and leveraged to augment the prompt passed to a vanilla GPT 3.5 3 model.
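To make the "populated" half concrete, here's a rough sketch of an ingestion pass, assuming the docs have already been split into chunks. The `DocChunk` shape and `VectorStore` interface are hypothetical stand-ins for whatever datastore you choose (pgvector, Pinecone, etc.), and none of this is the exact code that shipped:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Hypothetical shape for one chunk of documentation.
interface DocChunk {
  id: string;
  url: string;     // source page, later surfaced as a citation
  content: string; // a few hundred tokens of docs prose
}

// Hypothetical vector store interface; swap in pgvector, Pinecone, etc.
interface VectorStore {
  upsert(row: DocChunk & { embedding: number[] }): Promise<void>;
  similaritySearch(embedding: number[], k: number): Promise<DocChunk[]>;
}

// Ingestion: embed every chunk so it can be retrieved by similarity later.
async function ingest(chunks: DocChunk[], store: VectorStore) {
  for (const chunk of chunks) {
    const res = await openai.embeddings.create({
      model: "text-embedding-ada-002",
      input: chunk.content,
    });
    await store.upsert({ ...chunk, embedding: res.data[0].embedding });
  }
}
```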
This was a single prompt-and-response chat system. While the data model was built to support a continually growing conversation, that simply wasn’t yet leveraged and the initially released feature was kept simple.
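And the "leveraged" half at query time looked roughly like this shape: embed the question, pull the most similar chunks, stuff them into the prompt, and make one text-generation call. This continues the previous sketch (same `openai` client and hypothetical `VectorStore`), and the prompt wording is illustrative, not the one that shipped:

```typescript
// Query time: retrieve relevant chunks, augment the prompt, generate once.
async function answer(question: string, store: VectorStore): Promise<string> {
  // 1. Embed the user's question.
  const q = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });

  // 2. Pull the most similar documentation chunks from the knowledge store.
  const context = await store.similaritySearch(q.data[0].embedding, 5);

  // 3. Augment the prompt with that context and ask GPT 3.5 once.
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "system",
        content:
          "Answer using only the provided HashiCorp documentation. " +
          "Cite the source URLs you used. If the answer isn't in the context, say so.",
      },
      {
        role: "user",
        content:
          "Context:\n" +
          context.map((c) => `${c.url}\n${c.content}`).join("\n---\n") +
          `\n\nQuestion: ${question}`,
      },
    ],
  });

  return completion.choices[0].message.content ?? "";
}
```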
The rest was basically placing a ton of trust in GPT 3.5 to just work. In the event of an inaccurate response, there was a downvote option that would flag the response accordingly, giving a human the visibility to do some follow-up assessment.
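The feedback path was about as simple as it sounds; something like this hypothetical handler, where the names and the storage layer are made up for illustration:

```typescript
// Hypothetical feedback handler: a downvote just persists a flag so a human
// can review the exchange later; nothing automated happens to the model.
interface FeedbackStore {
  insert(row: Record<string, unknown>): Promise<void>;
}

async function recordFeedback(
  db: FeedbackStore,
  exchangeId: string,
  vote: "up" | "down"
) {
  await db.insert({
    exchangeId,
    vote,
    flaggedForReview: vote === "down",
    at: new Date().toISOString(),
  });
}
```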
Simple! That’s the end of it.
I’ll skip code-level details because, while there are some potentially interesting nuggets there, that zoomed-in view is largely trivial in comparison with the overall system.
The revelations
In the beginning, I remember distinct feelings of excitement and uncertainty leading up to figuring out how to build this.
To avoid getting remotely close to spinning my tires, I went straight to an expert. I grabbed lunch with a good friend and MLOps engineer to get some firsthand guidance.
There were two revelational takeaways from that lunch:
- The prescribed direction was: “You can just instruct the model exactly like how you would instruct a human.”
- You can do clever things... like using an LLM to summarize an entire conversation and feeding that result back into the LLM. LLMs on their own do not have memory, so you have to provide it. This can look like continually feeding a growing conversation back to the model (see the sketch after this list).
- I only later came across this short post by Malte Ubl, who sums up AI programming nicely as “...just orchestration of AI” 4, to which I’ll layer on that it’s just orchestration of APIs.
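To illustrate that second takeaway, here's a rough sketch of the summarize-and-feed-back idea. The prompts and function are purely illustrative, not anything from that lunch or from what shipped:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

type Turn = { role: "user" | "assistant"; content: string };

// Give the model "memory": compress the conversation so far, then feed that
// summary back in alongside the next user message.
async function replyWithMemory(history: Turn[], userMessage: string) {
  // 1. Ask the model to summarize the conversation to date.
  const summary = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Summarize this conversation in a few sentences." },
      ...history,
    ],
  });

  // 2. Use that summary as context for the next turn instead of the full transcript.
  const next = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "system",
        content: `Summary of the conversation so far: ${summary.choices[0].message.content}`,
      },
      { role: "user", content: userMessage },
    ],
  });

  return next.choices[0].message.content ?? "";
}
```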
Hearing these tactics seriously dissolved the mental barriers I had, and gave me a clear line of sight to the finish line.
*sips coffee*; *stares down empty cup*
Thanks for tuning in!
Footnotes
- LLM stands for “Large Language Model”
- GPT 3.5 was the latest OpenAI LLM at the time.
- Malte’s LinkedIn post