Authors
Siddhesh Ramesh Surve, USA
Abstract
Data cubes are a cornerstone of Online Analytical Processing (OLAP), yet they traditionally operate on structured, symbolic dimensions. With the rise of unstructured data and vector embeddings, there is a critical need to bridge the gap between precise SQL-like aggregation and fuzzy vector similarity search. In this paper, we propose VectorCube, a novel neuro-symbolic framework that enables "drill-down" and "roll-up" operations within continuous vector spaces while retaining symbolic interpretability. We introduce Semantic Dimensions, which are dynamically induced by Large Language Models (LLMs), and Vector Measures, which store aggregate high-dimensional embeddings. Our key contribution is a Distributional Aggregation method that ensures a roll-up preserves the semantic distribution of the underlying vectors rather than collapsing them into a meaningless average. Experimental results on standard text classification datasets demonstrate that VectorCube supports complex natural language queries (e.g., "Show optimism trends in tech news") and outperforms both traditional Text Cubes and flat Retrieval-Augmented Generation (RAG) systems in semantic precision (by 14%) and in query response speed.
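To illustrate why averaging can destroy semantic information during a roll-up, the following is a minimal sketch and not the VectorCube implementation. It assumes, purely for illustration, that each cell is summarized as a list of (count, mean vector) components and that a distribution-preserving roll-up keeps the child components as a mixture. The function names `cell_summary`, `rollup_mean`, `rollup_mixture`, and `best_match` are hypothetical, as are the 2-D toy embeddings.

```python
import numpy as np

def cell_summary(vectors):
    """Summarize one leaf cell as a single (count, mean) component."""
    vectors = np.asarray(vectors, dtype=float)
    return [(len(vectors), vectors.mean(axis=0))]

def rollup_mean(summaries):
    """Naive roll-up: collapse all child components into one weighted mean."""
    comps = [c for s in summaries for c in s]
    total = sum(n for n, _ in comps)
    mean = sum(n * mu for n, mu in comps) / total
    return [(total, mean)]

def rollup_mixture(summaries):
    """Illustrative distribution-preserving roll-up: keep every child
    component, so the parent cell is a mixture rather than a single point."""
    return [c for s in summaries for c in s]

def best_match(summary, query):
    """Highest cosine similarity between the query and any component."""
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    sims = []
    for _, mu in summary:
        norm = np.linalg.norm(mu)
        # A zero vector carries no direction, so treat its similarity as 0.
        sims.append(float(mu @ q / norm) if norm > 0 else 0.0)
    return max(sims)

# Toy embeddings (assumed): two child cells with opposite sentiment.
optimistic = cell_summary([[1.0, 0.1], [1.0, -0.1]])
pessimistic = cell_summary([[-1.0, 0.1], [-1.0, -0.1]])

query = [1.0, 0.0]  # stands in for an "optimism" query vector
collapsed = rollup_mean([optimistic, pessimistic])
preserved = rollup_mixture([optimistic, pessimistic])

print(best_match(collapsed, query))  # the averaged parent loses the signal
print(best_match(preserved, query))  # the mixture parent still matches
```

In this toy case the two child means cancel, so the averaged parent cell becomes the zero vector and no longer matches the query, while the mixture parent retains a component aligned with it. A practical system would also need to bound the number of components per cell, for example by clustering, which this sketch omits.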
Keywords
Data Cube, Neuro-Symbolic AI, Vector Databases, OLAP, Semantic Search