AI Development
TODO
- https://www.youtube.com/watch?v=0Zr3NwcvpA0
- similarity search as an alternative to vector embeddings
Fine-tuning is taking a pre-existing model and training it further on your own data set.
You might want to fine-tune instead of using RAG when:
- You want consistent tone/format (e.g., legal summaries, internal policies)
- You have lots of repetitive tasks and want more consistency
- Prompt engineering isn't enough
- You want to reduce token usage (less prompt length)
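If the target is an OpenAI-style fine-tune, the training data is typically a JSONL file of chat examples. A minimal sketch of building one (the example content and the `train.jsonl` filename are made up for illustration):

```python
import json

# Each line of the JSONL file is one full example conversation demonstrating
# the tone/format we want the fine-tuned model to reproduce.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You summarize contracts in plain English."},
            {"role": "user", "content": "Summarize: The lessee shall remit payment no later than the first day of each month."},
            {"role": "assistant", "content": "The tenant must pay rent by the 1st of each month."},
        ]
    },
]

# Write one JSON object per line (JSONL), the format fine-tuning APIs expect.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

With enough examples like this, the model learns the consistent tone/format directly, so the prompt no longer has to carry it (which is also where the token savings come from).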
resources
- Andrej Karpathy Youtube and following him in general
- Ex-director of AI @ Tesla. His presentation on computer vision helped me a lot with my thesis
- Deep Dive into LLMs like ChatGPT (Andrej Karpathy)
AI Tools concepts
Top-level goals and challenges
- reduce hallucinations
Governance & Guardrails
Prompts are reviewed for bias, hallucination risk, safety, and regulatory compliance.
Enterprises often apply structured fallback prompts if the primary one fails.
Prompt Chaining
Multi-step workflows where outputs from one prompt are fed into the next (e.g., extract → classify → summarize).
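The extract → classify → summarize flow can be sketched as plain function composition. `call_llm` below is a hypothetical stand-in for a real LLM API call:

```python
# Minimal prompt-chaining sketch: the output of each step becomes input
# to the next prompt.
def call_llm(prompt: str) -> str:
    # A real implementation would call an LLM API here; this stub just
    # echoes the first line of the prompt so the chain is inspectable.
    return f"[LLM output for: {prompt.splitlines()[0]}]"

def extract_classify_summarize(document: str) -> str:
    facts = call_llm(f"Extract the key facts from the text below.\n{document}")
    topic = call_llm(f"Classify these facts into one topic.\n{facts}")
    return call_llm(f"Summarize the facts for a reader interested in {topic}.\n{facts}")
```

Frameworks like LangChain wrap this same pattern with retries, structured output parsing, and logging, but the core idea is just feeding one prompt's output into the next.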
tools
LangChain / LlamaIndex: Useful for chaining prompts, managing context, building apps on top of LLMs.
Embedding and Vector DB
Embedding
is a numerical representation of data (e.g., text) in vector form, usually a list of floating-point numbers.
The goal: capture the meaning or semantic similarity of content.
Two similar texts will have embeddings that are close to each other in vector space.
Example:
"Hello world" → [0.12, -0.44, 0.88, ...] # A 1536-dimension vector
"Hi there" → [0.10, -0.40, 0.85, ...] # Very similar vector
Use cases:
- Search
- Recommendation systems
- Semantic similarity
- Retrieval-augmented generation (RAG)
A vector DB
stores and indexes embeddings.
- Enables fast similarity search: "Find the top-5 documents most similar to this query."
- Often used in RAG systems to retrieve context before prompting an LLM.
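At its core, the "top-5 most similar" query is a nearest-neighbor search over stored vectors. A minimal in-memory sketch (real vector DBs add approximate indexes for speed at scale):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, index, k=5):
    """index: list of (doc_id, vector). Returns the k most similar doc_ids."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = [
    ("doc-a", [0.9, 0.1]),
    ("doc-b", [0.1, 0.9]),
    ("doc-c", [0.8, 0.2]),
]
print(top_k([1.0, 0.0], index, k=2))  # -> ['doc-a', 'doc-c']
```

A production vector DB does the same thing conceptually, but with approximate nearest-neighbor indexes (e.g., HNSW) so it scales to millions of vectors.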
How LLMs Use Embeddings + Vector DB
Typical RAG pipeline:
- Chunk + embed all documents
- Store vectors in a vector DB
- At query time:
- Embed the query
- Search for similar vectors
- Return top results
- Inject retrieved text into prompt → LLM
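The indexing half of the pipeline above (chunk + embed + store) can be sketched like this. The `embed` function is a toy stand-in for a real embedding model (it just counts words from a tiny made-up vocabulary), and the "vector DB" is a plain Python list:

```python
VOCAB = ["tax", "quebec", "register", "dress", "office", "code"]

def embed(text: str) -> list[float]:
    # Toy embedding: word counts over a tiny vocabulary. A real system
    # would call an embedding model here.
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [float(words.count(w)) for w in VOCAB]

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

documents = [
    "Register for sales tax in Quebec through the Revenu Quebec portal.",
    "Our office dress code is casual.",
]

vector_db = []  # list of (chunk_text, vector); a real vector DB also indexes these
for doc in documents:
    for piece in chunk(doc):
        vector_db.append((piece, embed(piece)))
```

The query-time half (embed the query, search, inject into the prompt) follows in the RAG section below.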
Retrieval-Augmented Generation (RAG)
RAG, or Retrieval-Augmented Generation, is an AI framework that enhances the accuracy and relevance of large language model (LLM) responses by integrating information retrieval from external knowledge sources before generating text.
- LLMs are combined with knowledge bases or document stores.
- Instead of cramming everything into the prompt, relevant context is retrieved in real-time and injected into the prompt dynamically.
I initially had the impression that RAG might simply be appending data to a prompt string, or a find-and-replace (and that is part of the process). But the retrieval step can be sophisticated, using vector search or similarity checks.
A REST API that fetches user data from a DB and inserts it into a prompt that is then passed to an LLM can be considered a basic or simplistic form of RAG. But it lacks the sophistication of semantic search and relies on exact match/structured queries.
RAG search is a hybrid approach that:
- retrieves relevant documents or data using vector search (semantic similarity)
- augments the prompt sent to the LLM with those documents
- generates a final answer that's grounded in context
RAG Pipeline (Simplified)
- Preprocess Data
- Split documents into chunks (e.g., 500 words)
- Generate embeddings for each chunk
- Store embeddings in a vector database
- At Query Time
- Embed the user's question
- Use vector search to retrieve top-k similar chunks
- Inject those into the prompt:
  Based on the following documents: [doc1], [doc2], ...
  Answer: "How do I register for sales tax in Quebec?"
- Send to LLM and get answer
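The query-time steps above can be sketched end to end. `embed` is the same toy word-count stand-in as in the indexing sketch, and instead of calling a real LLM, `answer` returns the assembled prompt so the injection step is visible:

```python
import math

VOCAB = ["tax", "quebec", "register", "dress", "office", "code"]

def embed(text):
    # Toy stand-in for a real embedding model.
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def answer(question, vector_db, k=1):
    query_vec = embed(question)
    top = sorted(vector_db, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
    context = "\n".join(text for text, _ in top)
    # Retrieved chunks are injected into the prompt; a real system would now
    # send this to an LLM.
    return f"Based on the following documents:\n{context}\n\nAnswer: {question}"

docs = [
    "Register for sales tax in Quebec through the Revenu Quebec portal.",
    "Our office dress code is casual.",
]
vector_db = [(text, embed(text)) for text in docs]

print(answer("How do I register for sales tax in Quebec?", vector_db))
```

Note how the dress-code chunk is never retrieved: the answer is grounded only in the relevant document, which is the whole point of RAG.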
Why RAG is Useful
| Problem LLMs Have | How RAG Helps |
|---|---|
| Hallucinations | Grounds answers in real facts |
| Limited context window | Retrieves only relevant info |
| No access to custom data | Injects private/company data |
| Outdated model knowledge | Real-time retrieval from fresh sources |
MAKER Framework
Ways to make a chatbot agentic.
This is a problem we are tackling at Snapshot. As of 2026-01-14, we are trying to build a solution with the third option; let's see if it succeeds.
Here are the options to consider:
- RAG search should be the last option because it's not accurate by nature; it is a similarity search.
- Defined interfaces using the backend server
  - High maintenance: for each new interface you need to go to the backend, define schemas, run migrations, and perhaps add more code in ai-manager as well
- Read replica of the DB with a read-only user; give the agent free access to query this DB on the fly
- low maintenance
- uncontrolled and highly experimental
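Even with a read-only DB user, it can help to reject non-SELECT statements before they reach the replica. A sketch of such a guard, using SQLite in place of the real replica (the table and check are made up for illustration; the naive substring check is fine for a sketch, not production):

```python
import sqlite3

# Statements an agent-generated query must never contain. Defense in depth
# on top of the read-only DB user.
FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "create", "attach")

def run_agent_query(conn: sqlite3.Connection, sql: str):
    lowered = sql.strip().lower()
    if not lowered.startswith("select") or any(w in lowered for w in FORBIDDEN):
        raise PermissionError("agent queries must be read-only SELECTs")
    return conn.execute(sql).fetchall()

# Stand-in for the read replica.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
print(run_agent_query(conn, "SELECT name FROM users"))  # [('alice',)]
```

This keeps the "free access" option low-maintenance while capping the blast radius of an uncontrolled query.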
Storing data generated by AI
We ask LLMs to generate JSON data (some LLMs have a parameter to set the output format to JSON) and store the output in a relational SQL DB.
Some AI-generated data/summaries we save in a single column of type JSON. This is useful for experimental outputs we need to iterate over a lot.
For the rest of the data, where we know the schema and expect it not to change much, we keep using a regular relational structure.
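A sketch of this hybrid approach using SQLite (the table name, columns, and example payload are made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ai_summaries (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL,  -- stable, known-schema relational field
        payload TEXT NOT NULL          -- experimental LLM output stored as JSON
    )
""")

# Imagine this came back from an LLM asked for JSON output.
llm_output = {"summary": "Q3 revenue grew 12%", "confidence": 0.82}
conn.execute(
    "INSERT INTO ai_summaries (document_id, payload) VALUES (?, ?)",
    (42, json.dumps(llm_output)),
)

row = conn.execute(
    "SELECT payload FROM ai_summaries WHERE document_id = 42"
).fetchone()
print(json.loads(row[0])["summary"])  # Q3 revenue grew 12%
```

The JSON column absorbs schema churn while you iterate on prompts; once the output shape stabilizes, fields can be promoted to proper relational columns.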