Apache Cassandra
This page provides a quickstart for using Apache Cassandra® as a Vector Store.
Cassandra is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with vector search capabilities.
Note: in addition to access to the database, an OpenAI API Key is required to run the full example.
Setup and general dependencies
Use of the integration requires the following Python package.
%pip install --upgrade --quiet langchain-community "cassio>=0.1.4"
Note: depending on your LangChain setup, you may need to install or upgrade other dependencies for this demo (specifically, recent versions of datasets, openai, pypdf and tiktoken are required, along with langchain-community).
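Before running the rest of the demo, it can help to confirm which of these packages are actually installed. The following is a minimal sketch that uses only the standard library. The helper name missing_packages and the REQUIRED list are illustrative and not part of LangChain or cassio. The list mirrors the packages named above and may need extending for your environment.

```python
from importlib import metadata

# Distribution names taken from the note above; extend as needed (assumption).
REQUIRED = [
    "langchain-community",
    "cassio",
    "datasets",
    "openai",
    "pypdf",
    "tiktoken",
]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_installed = missing_packages(REQUIRED)
    if not_installed:
        print("Missing packages:", ", ".join(not_installed))
    else:
        print("All listed packages are installed.")
```

Any names reported as missing can be passed to the %pip install command shown earlier. This check only confirms that a package is present. It does not verify that the installed version is recent enough.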
import os
from getpass import getpass
from datasets import load_dataset
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY = ")
embe = OpenAIEmbeddings()
Import the Vector Store
from langchain_community.vectorstores import Cassandra