AI Development
TODO
- https://www.youtube.com/watch?v=0Zr3NwcvpA0
- similarity search as an alternative to vector embeddings
Fine-tuning is taking a pre-existing model and training it further on your own data set.
You might want to fine-tune instead of using RAG when:
- You want consistent tone/format (e.g., legal summaries, internal policies)
- You have lots of repetitive tasks and want more consistency
- Prompt engineering isn't enough
- You want to reduce token usage (less prompt length)
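If the target is an OpenAI-style fine-tune, the training data is typically a JSONL file of chat examples. A minimal sketch of building one (the example content and the `train.jsonl` filename are made up for illustration):

```python
import json

# Each line of the JSONL file is one full example conversation demonstrating
# the tone/format we want the fine-tuned model to reproduce.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You summarize contracts in plain English."},
            {"role": "user", "content": "Summarize: The lessee shall remit payment no later than the first day of each month."},
            {"role": "assistant", "content": "The tenant must pay rent by the 1st of each month."},
        ]
    },
]

# Write one JSON object per line (JSONL), the format fine-tuning APIs expect.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

With enough examples like this, the model learns the consistent tone/format directly, so the prompt no longer has to carry it (which is also where the token savings come from).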
resources
- Andrej Karpathy Youtube and following him in general
- Ex-director of AI @ Tesla. His presentation on computer vision helped me a lot with my thesis
- Deep Dive into LLMs like ChatGPT (Andrej Karpathy)
AI Tools concepts
Top-level goals and challenges
- reduce hallucinations
Governance & Guardrails
Prompts are reviewed for bias, hallucination risk, safety, and regulatory compliance.
Enterprises often apply structured fallback prompts if the primary one fails.
Prompt Chaining
Multi-step workflows where outputs from one prompt are fed into the next (e.g., extract → classify → summarize).
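The extract → classify → summarize flow can be sketched as plain function composition. `call_llm` below is a hypothetical stand-in for a real LLM API call:

```python
# Minimal prompt-chaining sketch: the output of each step becomes input
# to the next prompt.
def call_llm(prompt: str) -> str:
    # A real implementation would call an LLM API here; this stub just
    # echoes the first line of the prompt so the chain is inspectable.
    return f"[LLM output for: {prompt.splitlines()[0]}]"

def extract_classify_summarize(document: str) -> str:
    facts = call_llm(f"Extract the key facts from the text below.\n{document}")
    topic = call_llm(f"Classify these facts into one topic.\n{facts}")
    return call_llm(f"Summarize the facts for a reader interested in {topic}.\n{facts}")
```

Frameworks like LangChain wrap this same pattern with retries, structured output parsing, and logging, but the core idea is just feeding one prompt's output into the next.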
tools
LangChain / LlamaIndex: Useful for chaining prompts, managing context, building apps on top of LLMs.
Embedding and Vector DB
Embedding
is a numerical representation of data (e.g., text) in vector form, usually a list of floating-point numbers.
The goal: capture the meaning or semantic similarity of content.
Two similar texts will have embeddings that are close to each other in vector space.
Example:
"Hello world" → [0.12, -0.44, 0.88, ...] # A 1536-dimension vector
"Hi there" → [0.10, -0.40, 0.85, ...] # Very similar vector
Use cases:
- Search
- Recommendation systems
- Semantic similarity
- Retrieval-augmented generation (RAG)
A vector DB
stores and indexes embeddings.
- Enables fast similarity search: "Find the top-5 documents most similar to this query."
- Often used in RAG systems to retrieve context before prompting an LLM.
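At its core, the "top-5 most similar" query is a nearest-neighbor search over stored vectors. A minimal in-memory sketch (real vector DBs add approximate indexes for speed at scale):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, index, k=5):
    """index: list of (doc_id, vector). Returns the k most similar doc_ids."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = [
    ("doc-a", [0.9, 0.1]),
    ("doc-b", [0.1, 0.9]),
    ("doc-c", [0.8, 0.2]),
]
print(top_k([1.0, 0.0], index, k=2))  # -> ['doc-a', 'doc-c']
```

A production vector DB does the same thing conceptually, but with approximate nearest-neighbor indexes (e.g., HNSW) so it scales to millions of vectors.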
How LLMs Use Embeddings + Vector DB
Typical RAG pipeline:
- Chunk + embed all documents
- Store vectors in a vector DB
- At query time:
- Embed the query
- Search for similar vectors
- Return top results
- Inject retrieved text into prompt → LLM
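The indexing half of the pipeline above (chunk + embed + store) can be sketched like this. The `embed` function is a toy stand-in for a real embedding model (it just counts words from a tiny made-up vocabulary), and the "vector DB" is a plain Python list:

```python
VOCAB = ["tax", "quebec", "register", "dress", "office", "code"]

def embed(text: str) -> list[float]:
    # Toy embedding: word counts over a tiny vocabulary. A real system
    # would call an embedding model here.
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [float(words.count(w)) for w in VOCAB]

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

documents = [
    "Register for sales tax in Quebec through the Revenu Quebec portal.",
    "Our office dress code is casual.",
]

vector_db = []  # list of (chunk_text, vector); a real vector DB also indexes these
for doc in documents:
    for piece in chunk(doc):
        vector_db.append((piece, embed(piece)))
```

The query-time half (embed the query, search, inject into the prompt) follows in the RAG section below.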
Retrieval-Augmented Generation (RAG)
RAG, or Retrieval-Augmented Generation, is an AI framework that enhances the accuracy and relevance of large language model (LLM) responses by integrating information retrieval from external knowledge sources before generating text.
- LLMs are combined with knowledge bases or document stores.
- Instead of cramming everything into the prompt, relevant context is retrieved in real-time and injected into the prompt dynamically.
I initially had the impression that RAG might simply be appending data to a prompt string, or a find-and-replace (and that is part of the process). But the retrieval step can be sophisticated, using vector search or similarity checks.
A REST API that fetches user data from a DB and inserts it into a prompt that is then passed to an LLM can be considered a basic or simplistic form of RAG. But it lacks the sophistication of semantic search and relies on exact match/structured queries.
RAG search is a hybrid approach that:
- retrieves relevant documents or data using vector search (semantic similarity)
- augments the prompt sent to the LLM with those documents
- generates a final answer that's grounded in context
RAG Pipeline (Simplified)
- Preprocess Data
- Split documents into chunks (e.g., 500 words)
- Generate embeddings for each chunk
- Store embeddings in a vector database
- At Query Time
- Embed the user's question
- Use vector search to retrieve top-k similar chunks
- Inject those into the prompt:
  Based on the following documents: [doc1], [doc2], ...
  Answer: "How do I register for sales tax in Quebec?"
- Send to LLM and get answer
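The query-time steps above can be sketched end to end. `embed` is the same toy word-count stand-in as in the indexing sketch, and instead of calling a real LLM, `answer` returns the assembled prompt so the injection step is visible:

```python
import math

VOCAB = ["tax", "quebec", "register", "dress", "office", "code"]

def embed(text):
    # Toy stand-in for a real embedding model.
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def answer(question, vector_db, k=1):
    query_vec = embed(question)
    top = sorted(vector_db, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
    context = "\n".join(text for text, _ in top)
    # Retrieved chunks are injected into the prompt; a real system would now
    # send this to an LLM.
    return f"Based on the following documents:\n{context}\n\nAnswer: {question}"

docs = [
    "Register for sales tax in Quebec through the Revenu Quebec portal.",
    "Our office dress code is casual.",
]
vector_db = [(text, embed(text)) for text in docs]

print(answer("How do I register for sales tax in Quebec?", vector_db))
```

Note how the dress-code chunk is never retrieved: the answer is grounded only in the relevant document, which is the whole point of RAG.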
Why RAG is Useful
| Problem LLMs Have | How RAG Helps |
|---|---|
| Hallucinations | Grounds answers in real facts |
| Limited context window | Retrieves only relevant info |
| No access to custom data | Injects private/company data |
| Outdated model knowledge | Real-time retrieval from fresh sources |
MAKER Framework
Ways to make a chatbot agentic.
This is a problem we are tackling at Snapshot. As of 2026-01-14, we are trying to build a solution with the third option; let's see if it succeeds.
Here are the options to consider:
- RAG search should be the last option because it's not accurate by nature; it is a similarity search.
- Defined interfaces using the backend server
  - High maintenance: for each new interface you need to go to the backend, define schemas, run migrations, and perhaps add more code in ai-manager as well
- Read replica of the DB with a read-only user; give the agent free access to query this DB on the fly
- low maintenance
- uncontrolled and highly experimental
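Even with a read-only DB user, it can help to reject non-SELECT statements before they reach the replica. A sketch of such a guard, using SQLite in place of the real replica (the table and check are made up for illustration; the naive substring check is fine for a sketch, not production):

```python
import sqlite3

# Statements an agent-generated query must never contain. Defense in depth
# on top of the read-only DB user.
FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "create", "attach")

def run_agent_query(conn: sqlite3.Connection, sql: str):
    lowered = sql.strip().lower()
    if not lowered.startswith("select") or any(w in lowered for w in FORBIDDEN):
        raise PermissionError("agent queries must be read-only SELECTs")
    return conn.execute(sql).fetchall()

# Stand-in for the read replica.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
print(run_agent_query(conn, "SELECT name FROM users"))  # [('alice',)]
```

This keeps the "free access" option low-maintenance while capping the blast radius of an uncontrolled query.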
Storing data generated by AI
We ask LLMs to generate JSON data (some LLMs have a parameter to set the output format to JSON) and store the output in a relational SQL DB.
Some AI-generated data/summaries we save in a single column of type JSON. This is useful for experimental outputs we need to iterate over a lot.
For the rest of the data, where we know the schema and expect it not to change much, we keep using a regular relational structure.
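A sketch of this hybrid approach using SQLite (the table name, columns, and example payload are made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_summaries (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL,  -- stable, known-schema relational field
        payload TEXT NOT NULL          -- experimental LLM output stored as JSON
    )
""")

# Imagine this came back from an LLM asked for JSON output.
llm_output = {"summary": "Q3 revenue grew 12%", "confidence": 0.82}
conn.execute(
    "INSERT INTO ai_summaries (document_id, payload) VALUES (?, ?)",
    (42, json.dumps(llm_output)),
)

row = conn.execute(
    "SELECT payload FROM ai_summaries WHERE document_id = 42"
).fetchone()
print(json.loads(row[0])["summary"])  # Q3 revenue grew 12%
```

The JSON column absorbs schema churn while you iterate on prompts; once the output shape stabilizes, fields can be promoted to proper relational columns.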