Embeddings & Vector Search
Overview
Events are represented as vector embeddings for semantic search. The chatbot uses vector similarity to find events that match natural-language queries, going beyond keyword and structured filter matching. Embeddings are stored in PostgreSQL using the pgvector extension on Neon.
Architecture
    Event created/updated
            │
            ▼
    AI Model Router (task: "embedding")
            │
            ▼
    Embedding model generates vector
            │
            ▼
    Stored in EventEmbedding table (pgvector)

    ───────────────────────────────

    User query (via chatbot or search)
            │
            ▼
    AI Model Router (task: "embedding")
            │
            ▼
    Query text → query vector
            │
            ▼
    pgvector cosine similarity search
            │
            ▼
    Top-N matching events returned

Data Model
EventEmbedding

| Field | Type | Constraints | Notes |
|---|---|---|---|
| id | String | PK, cuid | |
| eventId | String | FK → Event, ON DELETE CASCADE, unique | One embedding per event |
| embedding | vector(768) | not null | pgvector column. Dimensions match the embedding model config |
| textHash | String | not null | Hash of the source text used to generate the embedding. Used to detect staleness |
| createdAt | DateTime | default now | |
| updatedAt | DateTime | auto | |

The textHash is a hash of title + description + category + tags.join(','). If the hash changes, the embedding is stale and must be regenerated.
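The staleness check can be sketched as follows. This is a minimal illustration, not the project's implementation: the document does not name a hash function, so SHA-256 is an assumption, and the `EventText` interface and both function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape holding only the fields that feed the hash.
interface EventText {
  title: string;
  description: string;
  category: string;
  tags: string[];
}

// Concatenates the fields in the order given above and hashes the result.
// SHA-256 is an assumption; any stable hash would serve the same purpose.
function computeTextHash(event: EventText): string {
  const source =
    event.title + event.description + event.category + event.tags.join(",");
  return createHash("sha256").update(source, "utf8").digest("hex");
}

// An embedding is stale when the stored hash no longer matches the current text.
function isStale(storedHash: string, event: EventText): boolean {
  return storedHash !== computeTextHash(event);
}
```

Because only the four content fields feed the hash, changes to other fields (status, for example) leave the hash untouched.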
pgvector Index
An HNSW (Hierarchical Navigable Small World) index is created on the embedding column for fast approximate nearest-neighbor queries:
    CREATE INDEX event_embedding_idx ON "EventEmbedding"
    USING hnsw (embedding vector_cosine_ops);

Embedding Generation
On Event Creation
When an event is created (user-created or external ingestion), the system generates an embedding from the event’s text content and stores it in EventEmbedding. The source text is: "{title}. {description}. Category: {category}. Tags: {tags.join(', ')}".
If the AI service is unavailable, the event is created without an embedding. A background job retries failed embeddings.
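The creation path and its failure handling might look like the sketch below. The `EventInput`, `EmbedFn`, and `EmbeddingResult` types and the `embedOnCreate` function are hypothetical names introduced for illustration; the `embed` parameter stands in for the model router call, whose real API is not shown in this document.

```typescript
// Hypothetical types; the real schema and router API may differ.
interface EventInput {
  id: string;
  title: string;
  description: string;
  category: string;
  tags: string[];
}

type EmbedFn = (text: string) => Promise<number[]>;

interface EmbeddingResult {
  eventId: string;
  embedding: number[] | null; // null when generation failed
  needsRetry: boolean;
}

// Builds the source text in the format given above.
function buildSourceText(e: EventInput): string {
  return `${e.title}. ${e.description}. Category: ${e.category}. Tags: ${e.tags.join(", ")}`;
}

// Tries to embed the event. A failure is reported rather than thrown,
// so the caller can still save the event and leave the retry to a background job.
async function embedOnCreate(e: EventInput, embed: EmbedFn): Promise<EmbeddingResult> {
  try {
    const embedding = await embed(buildSourceText(e));
    return { eventId: e.id, embedding, needsRetry: false };
  } catch {
    return { eventId: e.id, embedding: null, needsRetry: true };
  }
}
```

Returning a result object instead of throwing keeps event creation independent of the AI service's availability.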
On Event Update
When an event’s title, description, category, or tags change, the system recomputes the textHash. If the hash differs from the stored hash, a new embedding is generated and the row is updated.
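A minimal sketch of the update path, assuming the hash and embedding functions are injected as dependencies. `EmbeddingRow`, `RefreshDeps`, and `refreshIfStale` are hypothetical names, not part of the documented system.

```typescript
// Hypothetical row and dependency shapes, for illustration only.
interface EmbeddingRow {
  eventId: string;
  embedding: number[];
  textHash: string;
}

interface RefreshDeps {
  hash: (text: string) => string;              // e.g. a hex digest of the source text
  embed: (text: string) => Promise<number[]>;  // stand-in for the model router call
}

// Returns the row to persist: the original row when the hash is unchanged,
// or a copy with a fresh vector and hash when the content changed.
async function refreshIfStale(
  row: EmbeddingRow,
  sourceText: string,
  deps: RefreshDeps,
): Promise<{ row: EmbeddingRow; regenerated: boolean }> {
  const newHash = deps.hash(sourceText);
  if (newHash === row.textHash) {
    // Non-content update: skip the model call entirely.
    return { row, regenerated: false };
  }
  const embedding = await deps.embed(sourceText);
  return { row: { ...row, embedding, textHash: newHash }, regenerated: true };
}
```

Comparing hashes before calling the model means updates that do not touch the content fields cost no embedding requests.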
External Event Ingestion
External events receive embeddings on initial ingestion. On subsequent syncs, embeddings are regenerated only if the textHash changes (i.e., the content actually changed).
Batch Backfill
A CLI command or cron job can generate embeddings for events that have no EventEmbedding row (e.g., after initial deployment or if embedding generation failed).
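One way such a backfill could be structured is sketched below. The query string, the `backfill` function, its `embedAndStore` callback, and the default batch size of 50 are all assumptions made for illustration; the document does not specify how the job is implemented.

```typescript
// Assumed query for finding events that lack an embedding row.
const FIND_MISSING_SQL = `
  SELECT e.id
  FROM "Event" e
  LEFT JOIN "EventEmbedding" ee ON ee."eventId" = e.id
  WHERE ee.id IS NULL
`;

// Processes the missing event IDs in batches. A failure for one event is
// recorded and does not stop the rest, so a later run can pick it up again.
async function backfill(
  missingEventIds: string[],
  embedAndStore: (eventId: string) => Promise<void>, // hypothetical callback
  batchSize = 50,
): Promise<{ created: number; failed: string[] }> {
  let created = 0;
  const failed: string[] = [];
  for (let i = 0; i < missingEventIds.length; i += batchSize) {
    const batch = missingEventIds.slice(i, i + batchSize);
    // Map each promise to an outcome so a rejection cannot abort the batch.
    const outcomes = await Promise.all(
      batch.map((id) =>
        embedAndStore(id).then(
          () => ({ id, ok: true }),
          () => ({ id, ok: false }),
        ),
      ),
    );
    for (const outcome of outcomes) {
      if (outcome.ok) created += 1;
      else failed.push(outcome.id);
    }
  }
  return { created, failed };
}
```

Since the job only targets events without an EventEmbedding row, running it repeatedly is safe: events that already received an embedding are not selected again.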
Semantic Search
Query Flow
- The caller provides a natural-language query string.
- The model router generates a query embedding using the embedding task.
- pgvector performs a cosine similarity search against EventEmbedding.
- Results are filtered by event status (OPEN, IN_PROGRESS) and startAt (future).
- Top-N events are returned, ordered by similarity score descending.
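The ranking logic in these steps can be illustrated in memory, without a database. The sketch below is not how pgvector executes the search (it uses the HNSW index); it only shows the filtering and ordering semantics. The `IndexedEvent` type, the `rankEvents` function, and the "CLOSED" status are hypothetical.

```typescript
// "CLOSED" is a hypothetical example of a status excluded from search.
type EventStatus = "OPEN" | "IN_PROGRESS" | "CLOSED";

interface IndexedEvent {
  id: string;
  status: EventStatus;
  startAt: Date;
  embedding: number[];
}

// Cosine similarity of two vectors assumed to have the same length.
// pgvector's <=> operator returns cosine distance, which is 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Keeps open or in-progress events that start in the future, scores them
// against the query vector, and returns the top N by similarity descending.
function rankEvents(
  queryVector: number[],
  events: IndexedEvent[],
  topN: number,
  now: Date = new Date(),
): { id: string; similarity: number }[] {
  return events
    .filter((e) => e.status === "OPEN" || e.status === "IN_PROGRESS")
    .filter((e) => e.startAt.getTime() > now.getTime())
    .map((e) => ({ id: e.id, similarity: cosineSimilarity(queryVector, e.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topN);
}
```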
Hybrid Search
Semantic search can be combined with structured filters. For example, the chatbot might search semantically within a category or date range:
    SELECT e.*, 1 - (ee.embedding <=> $queryVector) AS similarity
    FROM "Event" e
    JOIN "EventEmbedding" ee ON ee."eventId" = e.id
    WHERE e.status IN ('OPEN', 'IN_PROGRESS')
      AND e."startAt" > NOW()
      AND e.category = $category  -- optional structured filter
    ORDER BY similarity DESC
    LIMIT $limit;

Integration with Chatbot Tools
The chatbot’s searchEvents and searchGigs tools use semantic search when a free-text query parameter is provided. If only structured filters are provided (category, date range), standard SQL filtering is used without embeddings.
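The dispatch described above, combined with an optional category filter, might be sketched as follows. All names here (`SearchParams`, `chooseStrategy`, `buildHybridQuery`) are hypothetical, and treating a whitespace-only query as absent is an assumption. The query builder orders by the distance operator ascending, which yields the same ranking as similarity descending; as far as I understand pgvector, that is the form its indexes are designed to serve, but treat that as an assumption to verify.

```typescript
// Hypothetical tool parameters; the real tools may accept more fields.
interface SearchParams {
  query?: string;    // free-text query
  category?: string; // structured filter
}

type SearchStrategy = "semantic" | "structured";

// Semantic search is used only when a non-empty free-text query is present.
function chooseStrategy(params: SearchParams): SearchStrategy {
  const hasQuery = params.query !== undefined && params.query.trim().length > 0;
  return hasQuery ? "semantic" : "structured";
}

// Builds a parameterized hybrid query. The category clause is added only
// when a category is supplied, and placeholders are numbered to match.
function buildHybridQuery(
  queryVector: number[],
  category: string | undefined,
  limit: number,
): { text: string; values: unknown[] } {
  // pgvector accepts vectors in the text form '[x,y,z]'.
  const values: unknown[] = [`[${queryVector.join(",")}]`];
  const where = [`e.status IN ('OPEN', 'IN_PROGRESS')`, `e."startAt" > NOW()`];
  if (category !== undefined) {
    values.push(category);
    where.push(`e.category = $${values.length}`);
  }
  values.push(limit);
  const text = [
    `SELECT e.*, 1 - (ee.embedding <=> $1::vector) AS similarity`,
    `FROM "Event" e`,
    `JOIN "EventEmbedding" ee ON ee."eventId" = e.id`,
    `WHERE ${where.join(" AND ")}`,
    `ORDER BY ee.embedding <=> $1::vector`,
    `LIMIT $${values.length}`,
  ].join("\n");
  return { text, values };
}
```

Passing values as parameters rather than interpolating them into the SQL string keeps user-supplied text, such as the category, out of the query itself.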
Scenarios
S-EMBED-1: Embedding generated on event creation
    GIVEN user A creates an event with title "Jazz Night" and description "Live jazz at the Union"
    WHEN the event is saved
    THEN an EventEmbedding row is created with the event's embedding vector
    AND textHash is computed from the event's text content

S-EMBED-2: Embedding regenerated on content change
    GIVEN event E has an embedding with textHash "abc123"
    WHEN user A updates event E's description
    AND the new textHash is "def456"
    THEN the embedding is regenerated
    AND the EventEmbedding row is updated with the new vector and hash

S-EMBED-3: Embedding not regenerated on non-content change
    GIVEN event E has an embedding with textHash "abc123"
    WHEN user A updates event E's status from OPEN to IN_PROGRESS
    THEN the textHash is unchanged
    AND the embedding is not regenerated

S-EMBED-4: Semantic search returns relevant results
    GIVEN events exist: "Jazz Night at the Union", "Rock Concert at Newport", "Study Group for CS 101"
    WHEN the chatbot searches with query "live music events"
    THEN "Jazz Night at the Union" and "Rock Concert at Newport" rank highest by similarity
    AND "Study Group for CS 101" ranks lowest

S-EMBED-5: Semantic search combined with structured filter
    GIVEN events exist in categories "music" and "academic"
    WHEN the chatbot searches with query "something fun tonight" and category = "music"
    THEN only music events are returned, ranked by semantic similarity

S-EMBED-6: Embedding failure does not block event creation
    GIVEN the embedding model is unavailable
    WHEN user A creates an event
    THEN the event is created without an EventEmbedding row
    AND the event is still searchable via structured filters (keyword, category, date)

S-EMBED-7: Backfill generates missing embeddings
    GIVEN 10 events exist without EventEmbedding rows
    WHEN the backfill job runs
    THEN EventEmbedding rows are created for all 10 events

S-EMBED-8: Event deletion cascades to embedding
    GIVEN event E has an EventEmbedding row
    WHEN event E is deleted
    THEN the EventEmbedding row is also deleted (ON DELETE CASCADE)