
What are vector databases?

Vector databases store information as numbers:

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. (source)

This enables fast and accurate similarity searches. Instead of writing conventional database queries, you can search a vector database for relevant data based on semantic and contextual meaning.

A simplified example

A vector database could store the sentence "n8n is a source-available automation tool that you can self-host", but instead of storing it as text, the vector database stores an array of dimensions (numbers between 0 and 1) that represent its features. This doesn't mean turning each letter in the sentence into a number. Instead, the vectors in the vector database describe the sentence.

Suppose that in a vector store 0.1 represents "automation tool", 0.2 represents "source available", and 0.3 represents "can be self-hosted". You could end up with the following vectors:

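Sentence                                                           Vector (array of dimensions)
n8n is a source-available automation tool that you can self-host   [0.1, 0.2, 0.3]
Zapier is an automation tool                                       [0.1]
Make is an automation tool                                         [0.1]
Confluence is a wiki tool that you can self-host                   [0.3]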

This example is very simplified

In practice, vectors are far more complex. A vector can range in size from tens to thousands of dimensions. The dimensions don't map one-to-one to features, so you can't translate an individual dimension directly into a single concept. This example gives an approximate mental model, not a true technical understanding.
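To make the idea of similarity search concrete, here is a minimal Python sketch. It assumes cosine similarity as the comparison measure, which is one common choice rather than anything this page prescribes. The four-dimensional vectors and the `store`, `search`, and `query` names are invented for illustration; a real embedding model would produce far longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, made up for this example only.
store = {
    "n8n is a source-available automation tool that you can self-host": [0.9, 0.8, 0.7, 0.1],
    "Zapier is an automation tool": [0.9, 0.1, 0.1, 0.1],
    "Confluence is a wiki tool that you can self-host": [0.1, 0.2, 0.8, 0.9],
}

def search(query_vector, top_k=2):
    """Return the top_k (score, sentence) pairs, most similar first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Made-up vector standing in for a query about automation tools.
query = [0.8, 0.2, 0.1, 0.0]
results = search(query)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With these invented numbers, the two automation tools rank ahead of the wiki tool, even though the query never matches any sentence word for word.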

Demonstrating the power of similarity search

Qdrant provides vector search demos to help users understand the power of vector databases. The food discovery demo shows how a vector store can help match pictures based on visual similarities.

This demo uses data from Delivery Service. Users may like or dislike the photo of a dish, and the app will recommend more similar meals based on how they look. It's also possible to choose to view results from the restaurants within the delivery radius. (source)

For full technical details, refer to the Qdrant demo-food-discovery GitHub repository.

Embeddings, retrievers, text splitters, and document loaders

Vector databases require other tools to function:

  • Document loaders and text splitters: document loaders pull in documents and data, and prepare them for embedding. Document loaders can use text splitters to break documents into chunks.

  • Embeddings: these are the tools that turn the data (text, images, and so on) into vectors, and back into raw data. Note that n8n only supports text embeddings.

  • Retrievers: retrievers fetch documents from vector databases. You need to pair them with an embedding to translate the vectors back into data.
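The way these pieces fit together can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not n8n's implementation or any library's API: `split_text`, `make_embedder`, and `ToyVectorStore` are hypothetical names, the "embedding" is a plain word count over a vocabulary, and the document text is invented. The sketch keeps each chunk's original text next to its vector, and the retriever embeds the query with the same embedder before comparing.

```python
import math
import re

def split_text(text, chunk_size=40):
    """Toy text splitter: pack whole sentences into chunks of about chunk_size characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def make_embedder(chunks):
    """Toy embedding: count how often each vocabulary word appears in the text."""
    vocab = sorted({word for chunk in chunks for word in tokenize(chunk)})
    def embed(text):
        words = tokenize(text)
        return [words.count(term) for term in vocab]
    return embed

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Stores (vector, original chunk) pairs and retrieves the closest chunks."""
    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    def add(self, chunk):
        self.rows.append((self.embed(chunk), chunk))

    def retrieve(self, query, top_k=1):
        query_vector = self.embed(query)  # same embedder as the stored chunks
        ranked = sorted(self.rows, key=lambda row: cosine_similarity(query_vector, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]

# "Document loader" step: here the document is just an invented string.
document = ("n8n is a workflow automation tool. You can self-host n8n. "
            "Qdrant is a vector database. It stores embeddings.")
chunks = split_text(document)
store = ToyVectorStore(make_embedder(chunks))
for chunk in chunks:
    store.add(chunk)

answer = store.retrieve("Where can I self-host?")
print(answer)
```

A real setup would replace the word-count embedder with an embedding model and the in-memory list with a vector database, but the order of the steps (load, split, embed, store, retrieve) stays the same.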