OpenAI Embeddings and Vector Databases Crash Course

Published by Fudgy McFarlen on

Summary

A quick run-through of vector databases. He moves through it a bit quickly. The video looks to be about two years old, so the free online tools he used might no longer be available; others may have to be substituted.

  • Demonstrates how words and phrases can be vectorized { how does this process get done? Math please }. He uses OpenAI to generate the embeddings.
  • Demonstrates storing vectorized phrases in a vector database consisting of: Phrase, Vectorized(Phrase) -> SQL columns: Text, Blob.
  • Creates a search phrase -> Vectorize(search phrase) and uses it to search against the stored Vectorized(Phrase) values. The search results are returned in descending order of match. { how does the match get calculated? } A sketch follows this list.
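
Rough sketch of the store-and-search step in Python (my own illustration, not the video's exact code): the embedding is stored as a BLOB next to the phrase TEXT in SQLite, and the match score is assumed to be cosine similarity, the usual metric for embeddings (the video does not say which metric its tool uses). Table and column names are made up.

import math
import sqlite3
import struct

def to_blob(vec):
    # Pack a list of floats into bytes so it can sit in a BLOB column.
    return struct.pack(f"{len(vec)}f", *vec)

def from_blob(blob):
    # 4 bytes per float32.
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine_similarity(a, b):
    # Match score: (a . b) / (|a| * |b|); closer to 1.0 = better match.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

con = sqlite3.connect("phrases.db")
con.execute("CREATE TABLE IF NOT EXISTS phrases (phrase TEXT, embedding BLOB)")

def store(phrase, vec):
    con.execute("INSERT INTO phrases VALUES (?, ?)", (phrase, to_blob(vec)))
    con.commit()

def search(query_vec, top_k=5):
    rows = con.execute("SELECT phrase, embedding FROM phrases").fetchall()
    scored = [(cosine_similarity(query_vec, from_blob(blob)), phrase)
              for phrase, blob in rows]
    # Descending order of match, as in the video.
    return sorted(scored, reverse=True)[:top_k]

A real vector database replaces the linear scan in search() with an approximate nearest-neighbor index, but the scoring idea is the same.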

Tools Used

  • Postman
  • GAI: what is postman software
  • Postman is a widely used API (Application Programming Interface) development platform that allows developers to design, build, test, document, and share APIs. It acts as an API client, enabling users to send various HTTP requests (GET, POST, PUT, DELETE) to web services and analyze responses, supporting REST, SOAP, and GraphQL. 
  • GAI: is there an open source version of postman for ubuntu
  • Yes, there are several open-source, offline-friendly Postman alternatives for Ubuntu that offer similar API testing functionality without requiring cloud accounts. The top recommendations include Bruno, Hoppscotch, and Insomnium, all of which are cross-platform and available on Linux.  ( ….more ) 
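
For reference, the request the video builds in Postman can also be sent from a short Python script. A minimal sketch using the requests library, assuming an OPENAI_API_KEY environment variable and the text-embedding-3-small model (the older video most likely used text-embedding-ada-002):

import os
import requests

# Same call one would assemble in Postman: POST to the OpenAI embeddings endpoint.
resp = requests.post(
    "https://api.openai.com/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "text-embedding-3-small", "input": "the quick brown fox jumps"},
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # length of the returned embedding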

Research Links

Followup Questions

LLMs create word vectors (embeddings) by first converting text into numerical tokens and then passing them through a learned neural network layer that maps each token to a high-dimensional vector, placing similar words close together in a semantic space. These vectors are refined during training to capture context, grammar, and semantic relationships. 

Key Steps in Creating Word Vectors:  

  • Tokenization: Text is broken down into smaller units called tokens (words, sub-words, or characters).
  • Input Embedding Layer: Each token ID is looked up in an embedding matrix—a large, learnable table—which converts the token into a dense vector (e.g., 512 or 768 dimensions).
  • Contextualization (Transformers): The initial embeddings pass through transformer layers using attention mechanisms, allowing the vector for a word to be modified by surrounding words, making it context-aware.
  • Training (Learning Relationships): Models like Word2Vec or transformer-based models are trained on massive datasets to adjust these vectors so that words with similar meanings (e.g., "dog" and "puppy") or relationships (e.g., "king" - "man" + "woman" ≈ "queen", where ≈ means approximately equal) are mathematically close.
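
A toy sketch of the embedding-lookup step, with a made-up four-word vocabulary and a random matrix standing in for the learned one (in a real model these values come from training, so the king/man/woman arithmetic only lands near "queen" after training, not here):

import numpy as np

# Toy vocabulary: token -> token ID (real models use subword tokenizers).
vocab = {"king": 0, "man": 1, "woman": 2, "queen": 3}

# Embedding matrix: one row per token ID, 8 dimensions here instead of 512/768.
# In a real model these values are learned; here they are random placeholders.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), 8))

def embed(token):
    # The "input embedding layer" is just a row lookup by token ID.
    return embedding_matrix[vocab[token]]

analogy = embed("king") - embed("man") + embed("woman")
print(analogy.shape)  # (8,) -- same dimensionality as every other token vector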

Summary Table

Step           Action                Example
Input          Raw text              "Unbelievable!"
Normalization  Clean & lowercase     "unbelievable!"
Tokenization   Split into subwords   ["un", "believ", "able", "!"]
Mapping        Convert to IDs        [345, 12903, 762, 0]
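
To see tokenization and ID mapping on real data, OpenAI's tiktoken library can be used (assuming it is installed; the exact subword splits and IDs depend on the encoding, so they will not match the illustrative table above):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Unbelievable!")
print(ids)                              # token IDs (integers)
print([enc.decode([i]) for i in ids])   # the subword piece each ID maps back to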

 

Yes, when embedding phrases using a specific, trained machine learning model, all input phrases—regardless of their length or complexity—are mapped into the same fixed-dimension vector space.  While the content of the vectors (the numerical values) will differ to represent the distinct meaning of each phrase, the dimensionality (length of the vector) remains constant, which is crucial for computational efficiency and semantic comparison. 

Example:

  • "hello" > [0.12, -0.5, …, 0.03]       (768 dimensions)
  • "the quick brown fox jumps"  > [0.88, 0.1, …, -0.2]      (768 dimensions)

 

 
