Grounded Systems
Independent ML Research & Engineering

Independent research and engineering work in retrieval systems, structured prediction, and language-grounded generation. Projects center on controlled-vocabulary mapping, embedding-based retrieval, and constrained LLM output, in domains where precision and reliability matter more than open-ended generation.

Projects
Boring Embeddings

Negative textual inversion embeddings for generative image models. Trained on community-engagement signals rather than manually curated defect lists, capturing unnamed visual patterns associated with low-quality outputs that tag-based approaches cannot express. Used as a default negative embedding in several popular Stable Diffusion workflows.

10M+ generations · Textual Inversion · Representation Learning · Generative Models
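The engagement-based training-set selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the record fields, the scoring, and the bottom-fraction cutoff are all assumptions.

```python
# Hypothetical sketch: building a negative-embedding training set from
# community-engagement signals instead of a hand-curated defect list.
# Field names ('image_id', 'engagement') and the cutoff are illustrative.

def select_negative_training_set(records, bottom_fraction=0.1):
    """Pick the least-engaged generations as exemplars of 'boring' output.

    records: list of dicts with 'image_id' and a normalized 'engagement'
    score (e.g. combining views, favorites, reuse). Returns the image ids
    in the bottom `bottom_fraction` of the ranking.
    """
    ranked = sorted(records, key=lambda r: r["engagement"])
    cutoff = max(1, int(len(ranked) * bottom_fraction))
    return [r["image_id"] for r in ranked[:cutoff]]

sample = [
    {"image_id": "a", "engagement": 0.91},
    {"image_id": "b", "engagement": 0.02},
    {"image_id": "c", "engagement": 0.40},
    {"image_id": "d", "engagement": 0.05},
]
print(select_negative_training_set(sample, bottom_fraction=0.5))  # ['b', 'd']
```

The selected images would then serve as the training set for a textual-inversion embedding, so the learned vector absorbs whatever visual patterns the low-engagement outputs share, named or not.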
Prompt Squirrel — Tag Retrieval System

Retrieval-augmented system that maps natural language to a controlled vocabulary. The three-stage pipeline runs LLM query reformulation into tag-like search phrases, HNSW approximate nearest-neighbor retrieval over fine-tuned FastText embeddings, and constrained LLM selection restricted to valid vocabulary items, which rules out hallucinated out-of-vocabulary terms by construction. Embeddings are fine-tuned with alias augmentation: domain alias metadata is injected as controlled noise during training, pulling misspellings and paraphrases toward their canonical forms in embedding space. Context rescoring applies late fusion of dense embedding similarity with a co-occurrence signal derived from SVD-reduced tag co-occurrence pseudo-documents. Modular per-category LLM query strategies run in parallel, and listwise reranking was chosen over pointwise scoring specifically to enforce consistency across the selected tag set. The approach generalizes to any domain requiring unstructured-text-to-taxonomy mapping.
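The late-fusion rescoring step amounts to a weighted sum of two similarity signals per candidate tag. The sketch below is a simplified stand-in: in the real system the second signal comes from SVD-reduced co-occurrence pseudo-documents, and the fusion weight `alpha` here is an assumed, illustrative value.

```python
# Illustrative late-fusion rescoring: combine dense embedding similarity
# with a co-occurrence-based similarity via a weighted sum, then rank.
# Vector contents and alpha are assumptions for demonstration only.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rescore(query_dense, query_cooc, candidates, alpha=0.7):
    """candidates: {tag: (dense_vec, cooc_vec)}. Returns tags ranked by
    alpha * dense_similarity + (1 - alpha) * cooc_similarity, best first."""
    scored = {
        tag: alpha * cosine(query_dense, dv) + (1 - alpha) * cosine(query_cooc, cv)
        for tag, (dv, cv) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

ranking = rescore(
    [1.0, 0.0],                       # query in dense embedding space
    [0.0, 1.0],                       # query in co-occurrence space
    {"x": ([1.0, 0.0], [1.0, 0.0]),   # strong dense match
     "y": ([0.0, 1.0], [0.0, 1.0])},  # strong co-occurrence match
)
print(ranking)  # ['x', 'y'] — dense similarity dominates at alpha=0.7
```

Fusing at the score level (late fusion) keeps the two signals independently tunable, rather than entangling them in a single joint embedding.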

Retrieval Pipeline · Constrained Generation · FastText · FAISS · HNSW · Late Fusion · Listwise LLM Reranking · Alias Augmentation · Co-occurrence Modeling
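The guarantee behind the constrained-selection stage is structural: whatever the LLM proposes, only valid vocabulary items can survive, so out-of-vocabulary tags are impossible by construction. A minimal sketch of that filter, with illustrative names and example tags:

```python
# Minimal sketch of constrained selection: LLM-proposed tags are kept
# only if they appear in the controlled vocabulary, preserving the
# proposed (listwise) order and dropping duplicates. Example tags are
# hypothetical; the real system constrains the LLM's output directly.

def constrain_to_vocabulary(proposed_tags, vocabulary):
    vocab = set(vocabulary)
    seen = set()
    selected = []
    for tag in proposed_tags:
        if tag in vocab and tag not in seen:
            seen.add(tag)
            selected.append(tag)
    return selected

vocab = ["red_dress", "blue_sky", "long_hair"]
proposed = ["red_dress", "crimson_gown", "red_dress", "long_hair"]
print(constrain_to_vocabulary(proposed, vocab))  # ['red_dress', 'long_hair']
```

In practice the same guarantee can be enforced earlier, at decoding time, by restricting the LLM's choices to vocabulary items rather than filtering afterward; the post-hoc filter above just makes the invariant explicit.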