Unlocking AI-Powered Search: Vector and Hybrid Search with OpenSearch and the Neural Plugin
Reading time: 7 minutes
Modern search engines have evolved far beyond simple keyword matching. With the rise of artificial intelligence and machine learning, we can now search for content based on meaning, not just words. This is where vector search comes in. Instead of looking for exact matches, vector search finds results that are semantically similar—making search more accurate and user-friendly.
In this blog post, we’ll summarize our three-part tutorial series on how to implement vector and hybrid search using OpenSearch and its Neural Plugin. We’ll walk through the key steps and by the end, you’ll have a clear understanding of what it takes to build a smart, AI-powered search experience using OpenSearch.
Part 1: Preparing the Foundation – Configuring OpenSearch for Vector Search
The journey begins by setting up an OpenSearch cluster that’s ready to handle machine learning workloads. The Neural Plugin is available by default in OpenSearch distributions, so there’s no need for any extra installation steps. However, before using it, we need to make a few important cluster configuration changes.
Enable Machine Learning Settings
To run ML models, OpenSearch must be configured to allow machine learning tasks across the cluster. This includes:
- Allowing ML tasks to run on any node, which increases flexibility and resource utilization. In production, you might want to restrict machine learning tasks to dedicated ML nodes.
- Raising the native memory threshold to accommodate computationally intensive model inference.
- Activating access control for ML models, which helps with governance and security when multiple users or services interact with different models.
These settings ensure the environment is ready to support the ingestion and inference processes required by vector search.
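A minimal sketch of that settings request (the specific flags and the 99% native memory threshold below are commonly used values, not requirements from this series):

```
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": "false",
    "plugins.ml_commons.model_access_control_enabled": "true",
    "plugins.ml_commons.native_memory_threshold": "99"
  }
}
```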
Create a Model Group (Optional but Recommended)
Model groups allow you to organize ML models in a structured way. While optional, they provide several benefits:
- Easier management of models as the number of models grows.
- Granular access control for different models or teams.
- Better traceability when working with multiple use cases or domains.
Once the model group is created, OpenSearch returns a model group ID, which will be needed in the next step.
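Creating a group is a single request; the name, description, and access mode here are placeholders:

```
POST /_plugins/_ml/model_groups/_register
{
  "name": "nlp_model_group",
  "description": "Model group for text embedding models",
  "access_mode": "public"
}
```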
Register a Model
Next, we register a model—downloaded directly from Hugging Face—to use for generating text embeddings. 
This registration process accomplishes several objectives:
- It specifies the exact model we intend to use for our vector embeddings.
- It defines the version of the model, ensuring consistency and reproducibility.
- It associates the model with our previously created model group.
- It indicates the format of the model, which is crucial for proper execution.
Registration happens asynchronously: instead of a model ID, OpenSearch initially returns a task ID. We then use the task API to track the download progress and, once the process completes, retrieve the final model ID. This identifier is important and will be referenced in subsequent steps when configuring our ingest and search pipelines.
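As an illustration, a registration request could look like this; the Hugging Face model name and version shown are just one of the pretrained models OpenSearch supports, not necessarily the one used in the original series:

```
POST /_plugins/_ml/models/_register
{
  "name": "huggingface/sentence-transformers/msmarco-distilbert-base-tas-b",
  "version": "1.0.1",
  "model_group_id": "<your model group ID>",
  "model_format": "TORCH_SCRIPT"
}

# Poll the returned task until it completes and exposes the model_id
GET /_plugins/_ml/tasks/<task_id>
```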
Deploy the Model
With the model registered, it still needs to be deployed, which means loading it into memory so it can perform inference. Again, OpenSearch handles this in the background, and we monitor it via the task API.
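Deployment and monitoring then boil down to two calls, using the model ID retrieved above:

```
POST /_plugins/_ml/models/<model_id>/_deploy

# The deploy call also returns a task ID; poll it until the state is COMPLETED
GET /_plugins/_ml/tasks/<task_id>
```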
Once deployed, we can verify the model’s status using OpenSearch Dashboards under the Machine Learning section. This visual confirmation makes it easy to see that everything is working as expected.
Choosing the Right Model
One common question is: Which model should I use? The answer depends on your data, your search needs, and the specific context of your application. OpenSearch supports several models out of the box, including general-purpose text embedding models. You’ll want to test different options to find the one that delivers the best results for your use case. The documentation lists supported models and offers guidance on how to choose between them.
Part 2: Automating Embeddings – Ingest Pipelines and k-NN Index Setup
With our model successfully registered and deployed, the next step is to set up an ingest pipeline. This pipeline is responsible for processing incoming documents and generating the vector embeddings for each ingested document automatically.
Create an Ingest Pipeline
An ingest pipeline allows OpenSearch to apply a series of transformations to documents before they’re indexed. In our case, the pipeline uses the Neural Plugin’s embedding processor to:
- Take the input text from a specified field.
- Generate an embedding using the deployed model.
- Store the resulting vector in a separate field within the document.
By establishing this pipeline, we ensure that all documents indexed with this pipeline will automatically have their textual content transformed into vector embeddings, ready for neural and hybrid search operations.
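A sketch of such a pipeline, assuming the source text lives in a field named text and the vector is written to passage_embedding (both field names, and the pipeline name, are placeholders):

```
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "Generate embeddings for the text field",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your model ID>",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}
```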
Create a k-NN-Capable Index
To support efficient vector search, we create an index that’s optimized for k-Nearest Neighbor (k-NN) operations. Key configuration details include:
- Enabling k-NN functionality on the index.
- Using our ingest pipeline as the default pipeline for document ingestion.
- Specifying the vector field, including its name and the dimensionality of the embeddings.
- Choosing the search engine, typically Lucene with the HNSW algorithm, for fast and accurate similarity search.
This structure allows the index to store both the original document content and its corresponding vector, providing a unified base for traditional and vector-based search.
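For example, an index definition along these lines, assuming a 768-dimensional embedding model and the field and pipeline names used above (the dimension and space type must match the model you chose):

```
PUT /my-nlp-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "cosinesimil"
        }
      }
    }
  }
}
```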
Ingest Documents
With our index and pipeline in place, we can now begin ingesting data. This step involves inserting documents into our newly created index, where they will be automatically processed by our ingest pipeline. When we index a new document:
- The text is automatically processed by the pipeline.
- The embedding is generated and stored.
- The document is indexed with both its metadata and vector representation.
One of the key advantages here is the transparency and simplicity: developers don’t need to manually generate or store embeddings—they’re handled under the hood by the ingest pipeline.
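Indexing then looks like any ordinary document write; the pipeline fills in passage_embedding behind the scenes (the document content here is purely illustrative):

```
PUT /my-nlp-index/_doc/1
{
  "text": "A couple walks their dog along a quiet beach at sunset"
}
```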
Part 3: Bringing It All Together – Implementing Search
With everything set up, it’s time to put our vector-enhanced search capabilities to the test:
1. Keyword Search
This is the traditional form of search, relying on text matching based on tokens and analyzers. It works well for exact or partial term matches, especially when the search input closely aligns with the document content.
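In query form this is a plain match query against the text field (index and field names follow the earlier examples):

```
GET /my-nlp-index/_search
{
  "query": {
    "match": {
      "text": "dog walking on the beach"
    }
  }
}
```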
While effective in many cases, keyword search can fall short when queries use synonyms, paraphrases, or related concepts not explicitly mentioned in the documents.
2. Neural Search
Neural search uses vector embeddings to perform semantic similarity search. Here’s how it works:
- The user’s query is embedded using the same model as the indexed documents.
- OpenSearch then searches for the k nearest neighbors (in this case, 5) to this query embedding in our vector space.
- The results are ordered by their cosine similarity to the query embedding.
This approach shines when users search with natural language, incomplete phrases, or domain-specific terms.
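A neural query along these lines might look as follows; model_id is the ID retrieved in Part 1, and k controls how many nearest neighbors are retrieved:

```
GET /my-nlp-index/_search
{
  "_source": { "excludes": ["passage_embedding"] },
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "pet owners strolling near the ocean",
        "model_id": "<your model ID>",
        "k": 5
      }
    }
  }
}
```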
3. Hybrid Search
To get the best of both worlds, hybrid search combines keyword and neural search results using a custom search pipeline. This pipeline:
- Executes both keyword and vector searches in parallel.
- Normalizes the scores from each method.
- Merges and re-ranks the results based on a weighted score.
In our example, we gave more weight (0.7) to semantic similarity while still considering keyword relevance (0.3). This balanced method helps surface the most relevant documents—even if they don’t share exact terms with the query.
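Sketched out, this takes two requests: a search pipeline that normalizes and combines scores with the 0.3/0.7 weights, and a hybrid query that runs both sub-queries through it (the pipeline name, index name, and the min_max/arithmetic_mean combination are illustrative choices):

```
PUT /_search/pipeline/hybrid-search-pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": { "weights": [0.3, 0.7] }
        }
      }
    }
  ]
}

GET /my-nlp-index/_search?search_pipeline=hybrid-search-pipeline
{
  "_source": { "excludes": ["passage_embedding"] },
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "text": "dog walking on the beach" } },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "dog walking on the beach",
              "model_id": "<your model ID>",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
```

The weights are applied in the order the sub-queries appear, so the match query contributes 0.3 and the neural query 0.7 of the final score.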
Conclusion
By following this three-part guide, we’ve implemented a fully functional hybrid search system using OpenSearch and the Neural Plugin. This setup allows users to perform:
- Classic keyword searches.
- Context-aware semantic queries.
- Powerful hybrid searches that combine the strengths of both.
Best of all, OpenSearch makes this advanced functionality approachable and easy to integrate. Whether you’re building a product catalog search, a document discovery engine, or a chatbot backend, vector and hybrid search can significantly boost relevance and user satisfaction.
Curious to learn more? Check out the full three-part series on our blog.