AI Development

TODO

Fine-tuning is taking a pre-existing model and training it further on your own data set.
You might want to fine-tune instead of using RAG.
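A minimal sketch of preparing fine-tuning data in the JSONL chat format used by several fine-tuning APIs (one JSON object per line); the example content and file name are made up for illustration.

```python
import json

# Hypothetical training examples: each line is one conversation the
# model should learn to imitate.
examples = [
    {"messages": [
        {"role": "user", "content": "What is our refund window?"},
        {"role": "assistant", "content": "Refunds are accepted within 30 days."},
    ]},
    {"messages": [
        {"role": "user", "content": "Do you ship internationally?"},
        {"role": "assistant", "content": "Yes, to over 40 countries."},
    ]},
]

# Serialize as JSONL: one compact JSON object per line
jsonl = "\n".join(json.dumps(ex) for ex in examples)
with open("train.jsonl", "w") as f:
    f.write(jsonl)
```

The resulting file is what you'd upload to the fine-tuning endpoint of whichever provider you use.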

resources

AI Tools concepts

AI Agents concepts

top-level goals and challenges

Governance & Guardrails

Prompts are reviewed for bias, hallucination risk, safety, and regulatory compliance.
Enterprises often apply structured fallback prompts if the primary one fails.
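A sketch of what a structured fallback could look like, assuming the validation step and the stub model are placeholders (real checks would cover bias, hallucination risk, and compliance):

```python
def ask_with_fallback(llm, question, prompts):
    """Try each prompt template in order; return the first valid answer."""
    for template in prompts:
        answer = llm(template.format(question=question))
        if answer:  # placeholder validation: swap in real safety checks
            return answer
    return "Unable to answer safely."

# Stub model that fails on the primary (detailed) prompt, to show the flow
def stub_llm(prompt):
    return "" if "detailed" in prompt else "42"

result = ask_with_fallback(
    stub_llm,
    "What is 6 x 7?",
    ["Give a detailed answer to: {question}", "Answer briefly: {question}"],
)
```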

Prompt Chaining

Multi-step workflows where outputs from one prompt are fed into the next (e.g., extract → classify → summarize).
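The extract → classify → summarize flow can be sketched as plain function composition; `run_chain` and the stub model are hypothetical stand-ins for real API calls:

```python
# Each prompt consumes the previous step's output
def run_chain(call_llm, document):
    entities = call_llm(f"Extract key entities from: {document}")
    label = call_llm(f"Classify (entities: {entities}): {document}")
    return call_llm(f"Summarize this {label} text: {document}")

# Stub model that just echoes the task verb, to show the wiring
def stub_llm(prompt):
    return prompt.split(" ")[0].rstrip(":").lower()

result = run_chain(stub_llm, "ACME earnings rose 10% in Q3.")
```

Libraries like LangChain formalize exactly this pattern, adding context management and error handling.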

tools

LangChain / LlamaIndex: Useful for chaining prompts, managing context, building apps on top of LLMs.

Embedding and Vector DB

Embedding

An embedding is a numerical representation of data (e.g., text) in vector form, usually a list of floating-point numbers.
The goal: capture the meaning or semantic similarity of content.
Two similar texts will have embeddings that are close to each other in vector space.

šŸ” Example:
"Hello world" → [0.12, -0.44, 0.88, ...] # A 1536-dimension vector
"Hi there" → [0.10, -0.40, 0.85, ...] # Very similar vector

🧠 Use Case: A vector DB stores and indexes embeddings.

🧠 How LLMs Use Embeddings + Vector DB

Typical RAG pipeline:

  1. Chunk + embed all documents
  2. Store vectors in a vector DB
  3. At query time:
    • Embed the query
    • Search for similar vectors
    • Return top results
  4. Inject retrieved text into prompt → LLM
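The steps above can be sketched end to end in memory. The bag-of-words "embedder" is a toy; in practice you'd call a real embedding model and a real vector DB:

```python
import math

def embed(text, vocab):
    """Toy embedding: word-count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

docs = ["the cat sat on the mat", "stocks rallied on earnings news"]
vocab = sorted({w for d in docs for w in d.lower().split()})

# Steps 1-2: chunk (trivially, one chunk per doc), embed, and "store"
index = [(embed(d, vocab), d) for d in docs]

# Step 3: embed the query and retrieve the most similar chunk
query = "where did the cat sit"
q_vec = embed(query, vocab)
top_doc = max(index, key=lambda item: cosine(item[0], q_vec))[1]

# Step 4: inject the retrieved text into the prompt for the LLM
prompt = f"Answer using this context:\n{top_doc}\n\nQuestion: {query}"
```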

Retrieval-Augmented Generation (RAG)

MAKER Framework

AI Theory - MAKER framework

ways to make a chatbot agentic

This is a problem we are tackling at Snapshot. As of 2026-01-14 we are trying to build a solution with the 3rd option; let's see if it succeeds.

Here are the options to consider:

  1. RAG search: should be the last option because it is inaccurate by nature — it is a similarity search.

  2. Defined interfaces exposed by the backend server.

  3. A read replica of the DB with a read-only user, giving the agent free access to query this DB on the fly.

Storing data generated by AI

We ask LLMs to generate JSON data (some LLMs have a parameter to force JSON output) and store the output in a relational SQL DB.

Some data/summaries generated by AI we save in a single column of type JSON. This is useful for experimental outputs we need to iterate on a lot.

For the rest of the data, where we know the schema and expect it not to change much in the future, we keep using a regular relational structure.
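A sketch of the split described above, using SQLite and made-up field names: the stable field (`title`) gets its own column, while the experimental summary is stored as JSON text in a single column:

```python
import json
import sqlite3

# Pretend this string came back from an LLM asked for JSON output
llm_output = '{"title": "Q3 report", "summary": {"tone": "positive", "topics": ["revenue"]}}'
record = json.loads(llm_output)  # fails fast if the model returned bad JSON

conn = sqlite3.connect(":memory:")
# summary is JSON serialized to TEXT; schema-stable fields get real columns
conn.execute(
    "CREATE TABLE ai_docs (id INTEGER PRIMARY KEY, title TEXT, summary TEXT)"
)
conn.execute(
    "INSERT INTO ai_docs (title, summary) VALUES (?, ?)",
    (record["title"], json.dumps(record["summary"])),
)

title, summary_json = conn.execute("SELECT title, summary FROM ai_docs").fetchone()
summary = json.loads(summary_json)
```

Parsing with `json.loads` before the insert also acts as a validation gate on the model's output.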