
Over-Engineered Chapter 1: Building AI Website Search
Building the Ultimate Interactive Search
When building a personal website, there's a fine line between adding useful features and overengineering solutions. This is the story of how I built an AI chatbot for my website that explores some unconventional architectural decisions—decisions that prioritize simplicity, cost-effectiveness, and developer experience over traditional "best practices."
The goal was straightforward: create an AI assistant that could answer questions about my work in film production and creative technology. But the implementation took some interesting turns, exploring RAG (Retrieval-Augmented Generation) without vector databases, leveraging browser storage instead of databases for chat history, and reimagining the traditional search bar as an AI-powered conversation interface.
Rethinking Search as an AI Chatbot
Most websites have a search bar. You type a query, get results, click through. It's functional but limited. What if instead of searching for keywords, you could have a conversation? What if the search bar itself became an entry point to an AI assistant?
That's exactly what I built. The traditional search functionality was replaced with an expandable AI chat button that lives in the header. Click it, type your question, and you're instantly in a conversation with an AI that knows everything about my work—blog posts, film projects, and professional background.
// The search bar becomes an AI chat interface
const handleSubmit = (e?: React.FormEvent) => {
  e?.preventDefault()
  if (query.trim()) {
    router.push(`/ai?prompt=${encodeURIComponent(query.trim())}`)
    setQuery("")
    setIsExpanded(false)
  }
}
This approach transforms the user experience from transactional (search → results → click) to conversational (ask → understand → follow up). Users can ask natural language questions like "What films has JP worked on?" or "Tell me about the Squid Game AI project" instead of trying to guess the right keywords.
RAG Without a Vector Database
When building a RAG system, the conventional wisdom is to use a vector database like Pinecone, Weaviate, or Qdrant. These are powerful tools designed specifically for similarity search across high-dimensional embeddings. But for a personal website with a relatively small corpus of content, do you really need a separate database service?
I decided to explore a simpler approach: storing embeddings directly in a JSON file. The entire embedding index lives in data/embeddings-index.json, which gets loaded into memory on each API request. For a website with dozens of blog posts and a film catalog, this works perfectly.
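For context, the index is just an array of documents with their embeddings attached. The shape below is illustrative; the field names are my assumptions, not the exact schema:

// Illustrative shape of data/embeddings-index.json (field names are assumptions)
interface EmbeddedDocument {
  id: string          // e.g. the source file path
  contentHash: string // hash of the source file, used for change detection later
  text: string        // the content chunk that was embedded
  embedding: number[] // 1536-dimensional vector from text-embedding-3-small
}

interface EmbeddingIndex {
  documents: EmbeddedDocument[]
}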
// Load embedding index from JSON file
const index = loadEmbeddingIndex()

// Generate embedding for the query
const queryEmbedding = await generateQueryEmbedding(message, apiKey)

// Search for relevant content using cosine similarity
// (the arguments appear to be top-k = 5 and a minimum similarity of 0.3)
const results = searchIndex(queryEmbedding, index, 5, 0.3)
The search implementation uses cosine similarity directly in memory. No network calls to external services, no additional infrastructure to manage, no extra costs. The embeddings are generated using OpenAI's text-embedding-3-small model (1536 dimensions), and the entire index is small enough to fit comfortably in memory.
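Under the hood, searchIndex is little more than cosine similarity over an in-memory array. Here's a minimal sketch of how that can work, assuming the index shape outlined earlier (the real implementation may differ in its details):

// Cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every document, filter by threshold, return the top-k matches
function searchIndex(
  query: number[],
  index: EmbeddingIndex,
  topK: number,
  minScore: number
) {
  return index.documents
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}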
This approach has several advantages:
- Zero infrastructure overhead: No need to provision, maintain, or pay for a vector database
- Fast queries: In-memory search is extremely fast for small to medium datasets
- Simple deployment: The index is just a JSON file that gets committed to the repository
- Cost effective: No per-query charges or monthly fees for vector database services
Chat History Without a Database
Another unconventional decision: storing chat history in the browser's sessionStorage instead of a database. When users have conversations with the AI, those messages persist across page navigations but automatically clear when they close the tab.
const STORAGE_KEY = "jp-castel-chat-session"

// Load messages from sessionStorage on mount
useEffect(() => {
  try {
    const stored = sessionStorage.getItem(STORAGE_KEY)
    if (stored) {
      const parsed = JSON.parse(stored) as Message[]
      setMessages(parsed)
    }
  } catch {
    // sessionStorage not available
  }
}, [])

// Save messages to sessionStorage whenever they change
useEffect(() => {
  if (!isHydrated) return // skip the initial render, before stored messages load
  try {
    sessionStorage.setItem(STORAGE_KEY, JSON.stringify(messages))
  } catch {
    // sessionStorage not available or quota exceeded
  }
}, [messages, isHydrated])
This leverages what I call the "Chrome tab cache": sessionStorage, the browser's built-in storage that persists for the lifetime of a tab (and works the same in any modern browser, not just Chrome). Users can navigate around the site, come back to the chat, and their conversation is still there. But when they close the tab, the history is gone, which is actually a privacy feature.
Why not use a database? For a personal website chatbot, there's no real need to persist conversations across sessions. Users aren't logging in, and there's no requirement to maintain long-term conversation history. The sessionStorage approach provides exactly the right level of persistence: enough to maintain context during a browsing session, but not so much that it requires backend infrastructure.
The conversation history is still sent to the API for context (the last 10 messages), allowing the AI to maintain coherent multi-turn conversations. But that history lives in the browser, not on a server.
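As a rough sketch of what that client-side call can look like (the /api/chat endpoint name and payload shape are my assumptions, not the actual API):

// Hypothetical client-side call: only the last 10 messages are sent
const HISTORY_LIMIT = 10

async function sendMessage(messages: Message[], newMessage: string) {
  // Trim to the most recent messages; the full history never leaves the browser
  const history = messages.slice(-HISTORY_LIMIT)

  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message: newMessage, history }),
  })
  return res.json()
}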
Automated Embedding Pipeline Architecture
One of the key challenges with RAG systems is keeping the embedding index up to date. When you publish a new blog post or update content, you need to regenerate embeddings. But you don't want to regenerate everything every time—that's wasteful and expensive.
The solution is intelligent change detection. The system computes MD5 hashes of source files and compares them against stored hashes in the index. Only changed or new documents get their embeddings regenerated.
// Change detection logic
const changes = detectChanges(index, sources)

if (changes.toAdd.length > 0 || changes.toUpdate.length > 0) {
  // Only process changed/new documents
  await generateEmbeddings(changes.toAdd, changes.toUpdate)
} else {
  console.log("No changes detected. Index is up to date.")
}
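For the curious, here's a minimal sketch of what detectChanges can look like using Node's built-in crypto module. The Source type and field names are assumptions based on the snippet above, not the actual code:

import { createHash } from "node:crypto"
import { readFileSync } from "node:fs"

// Hypothetical source descriptor; the real pipeline's types may differ
interface Source {
  id: string
  filePath: string
}

function md5(filePath: string): string {
  return createHash("md5").update(readFileSync(filePath)).digest("hex")
}

function detectChanges(index: EmbeddingIndex, sources: Source[]) {
  // Map of document id -> hash recorded at the last embedding run
  const known = new Map(
    index.documents.map((d) => [d.id, d.contentHash] as const)
  )
  const toAdd: Source[] = []
  const toUpdate: Source[] = []

  for (const source of sources) {
    const hash = md5(source.filePath)
    const stored = known.get(source.id)
    if (stored === undefined) {
      toAdd.push(source) // never embedded before
    } else if (stored !== hash) {
      toUpdate.push(source) // content changed since the last run
    }
    // identical hash: skip entirely, no API call needed
  }

  return { toAdd, toUpdate }
}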
The embedding pipeline is integrated directly into the build process via a prebuild script:
{
  "scripts": {
    "prebuild": "npm run fetch-metadata && npm run generate-embeddings",
    "generate-embeddings": "tsx scripts/generate-embeddings.ts"
  }
}
This means every time the site is built (locally or in CI/CD), the embedding index is automatically updated. New blog posts get indexed, updated content gets re-embedded, and the chatbot always has access to the latest information.
Keeping the Chatbot Index Always Up to Date
The automated embedding pipeline ensures the chatbot index stays current, but there's more to the story. The system is designed to be part of a larger CI/CD pipeline that handles content updates seamlessly.
When content changes are detected:
- New documents are automatically added to the index
- Updated documents have their embeddings regenerated
- Unchanged documents are skipped entirely (no API calls, no cost)
The pipeline covers three content sources:
- Blog posts in content/posts/*.md
- Film catalog entries in lib/films.ts
- About page biography content
For Docker deployments, there's even a cron job that runs daily to check for content changes and update embeddings without requiring a full rebuild. This is especially useful when content is mounted as volumes that might change independently of the container.
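I won't cover the full Docker setup here, but conceptually the cron entry is just a scheduled call to the same script (the schedule, paths, and log location below are illustrative, not the actual configuration):

# Illustrative crontab entry: check for content changes daily at 3 AM
0 3 * * * cd /app && npm run generate-embeddings >> /var/log/embeddings-cron.log 2>&1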
The Bigger Picture: CI/CD Integration
This chatbot implementation is just one piece of a larger automated pipeline. The embedding generation, change detection, and index updates are all designed to work seamlessly within a continuous integration and deployment workflow.
The goal is zero-maintenance content updates: write a blog post, push to the repository, and the entire system—from metadata fetching to embedding generation to deployment—happens automatically. The chatbot index stays current without manual intervention.
This is part of a broader philosophy of building systems that are self-maintaining. The infrastructure should handle the routine work of keeping data current, leaving developers free to focus on creating content rather than managing pipelines.
We'll explore the full CI/CD architecture in a future blog post, but the foundation is already here: automated change detection, intelligent regeneration, and seamless integration with build processes.
Conclusion
Building an AI chatbot for a personal website doesn't require vector databases, persistent chat history databases, or complex infrastructure. Sometimes the best solution is the simplest one that meets your actual requirements rather than the "industry standard" approach designed for scale you'll never need.
By exploring RAG without vector databases, leveraging browser storage for chat history, and automating the embedding pipeline, I've created a system that's:
- Simple: Fewer moving parts, easier to understand and maintain
- Cost-effective: No external services, minimal API usage
- Fast: In-memory search, no network latency
- Self-maintaining: Automated updates, intelligent change detection
And this is just the beginning. The chat client is one component of a larger system that automates content management, deployment, and monitoring. But that's a story for another blog post.