Amazon Nova 2 Multimodal Embeddings with Amazon S3 Vectors and AWS Java SDK – Part 4: Implement similarity search
Introduction
In part 2, we covered creating text and image embeddings, and in part 3, audio and video embeddings, with Amazon Nova 2 Multimodal Embeddings, storing them in Amazon S3 Vectors using the AWS Java SDK.
In this part of the series, we'll take a look at similarity search across all multimodal embeddings created in S3 Vectors: text, image, audio, and video embeddings.
You can find the code examples in my GitHub repository amazon-nova-2-multimodal-embeddings. Please give it a star if you like it, and follow me on GitHub for more examples.
Implement multimodal similarity search with Amazon Nova and Amazon S3 Vectors
We'll reuse many parts of the process for creating and storing embeddings described in part 2 and part 3. The relevant business logic of our sample application can still be found in the AmazonNovaMultimodalEmbeddings class.
Here is how the relevant search method looks:
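(The complete implementation is in the repository; the following is a minimal sketch, assuming the s3VectorsClient field, the VECTOR_BUCKET_NAME and VECTOR_INDEX_NAME constants, and the createTextEmbeddings helper from part 2 are members of the AmazonNovaMultimodalEmbeddings class.)

```java
import java.util.List;

import software.amazon.awssdk.services.s3vectors.model.QueryVectorsRequest;
import software.amazon.awssdk.services.s3vectors.model.QueryVectorsResponse;
import software.amazon.awssdk.services.s3vectors.model.VectorData;

// Minimal sketch of the search method. VECTOR_BUCKET_NAME, VECTOR_INDEX_NAME,
// s3VectorsClient and createTextEmbeddings are assumed members of the sample class.
public void search(String searchText, int topK) {
    // 1) Create the text embedding for the search query (see part 2).
    List<Float> embeddings = createTextEmbeddings(searchText, "GENERIC_RETRIEVAL");

    // 2) Wrap the embedding into VectorData and query the S3 Vectors index.
    VectorData vectorData = VectorData.builder()
            .float32(embeddings)
            .build();

    QueryVectorsRequest queryVectorsRequest = QueryVectorsRequest.builder()
            .vectorBucketName(VECTOR_BUCKET_NAME)
            .indexName(VECTOR_INDEX_NAME)
            .queryVector(vectorData)
            .topK(topK)
            .returnDistance(true)
            .returnMetadata(true)
            .build();

    QueryVectorsResponse queryVectorsResponse = s3VectorsClient.queryVectors(queryVectorsRequest);

    // 3) Print key, distance and metadata for each returned vector.
    queryVectorsResponse.vectors().forEach(vector ->
            System.out.println("key: " + vector.key()
                    + ", distance: " + vector.distance()
                    + ", metadata: " + vector.metadata()));
}
```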
Let’s describe what’s happening here.
Create text embeddings for the search text
First of all, we invoke the createTextEmbeddings method described in part 2. We pass the search query as the text and GENERIC_RETRIEVAL as the value of the embeddingPurpose. We set all other text embedding parameters exactly as we did when creating the embeddings (taskType as SINGLE_EMBEDDING and dimension as 384). For the complete embeddings request and response schema, I refer you to the following article. Then we create the text embeddings by invoking the corresponding Bedrock model. For that, we build an InvokeModelRequest object and pass it the JSON request (containing the text to create embeddings from) converted to SdkBytes. We also set the model ID to amazon.nova-2-multimodal-embeddings-v1:0. Then we use the Bedrock Runtime Client to invoke the InvokeModelRequest synchronously, map the JSON response to the EmbeddingResponse, and return its embeddings.
This all happens in the code part below:
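(A sketch of this step follows; the exact JSON field names are covered by the request schema from part 2 and are illustrative here, and objectMapper (Jackson) as well as the EmbeddingResponse class are assumed helpers of the sample application.)

```java
import java.util.List;

import com.fasterxml.jackson.core.JsonProcessingException;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.bedrockruntime.model.InvokeModelRequest;
import software.amazon.awssdk.services.bedrockruntime.model.InvokeModelResponse;

// Sketch of createTextEmbeddings. The JSON field names below are illustrative;
// see part 2 for the complete request schema. objectMapper and EmbeddingResponse
// are assumed helpers of the sample application.
private List<Float> createTextEmbeddings(String text, String embeddingPurpose) {
    String jsonRequest = """
            {
              "taskType": "SINGLE_EMBEDDING",
              "singleEmbeddingParams": {
                "embeddingPurpose": "%s",
                "embeddingDimension": 384,
                "text": { "truncationMode": "END", "value": "%s" }
              }
            }
            """.formatted(embeddingPurpose, text);

    InvokeModelRequest invokeModelRequest = InvokeModelRequest.builder()
            .modelId("amazon.nova-2-multimodal-embeddings-v1:0")
            .body(SdkBytes.fromUtf8String(jsonRequest))
            .build();

    // Invoke the model synchronously and map the JSON response to EmbeddingResponse.
    InvokeModelResponse invokeModelResponse = bedrockRuntimeClient.invokeModel(invokeModelRequest);
    try {
        EmbeddingResponse embeddingResponse = objectMapper.readValue(
                invokeModelResponse.body().asUtf8String(), EmbeddingResponse.class);
        return embeddingResponse.getEmbeddings();
    } catch (JsonProcessingException e) {
        throw new RuntimeException("Could not parse the embedding response", e);
    }
}
```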
Implement querying S3 Vectors
Then we build a VectorData object and set the embeddings returned from the previous step on it. After that, we build a QueryVectorsRequest object by passing the S3 vector bucket and index names and the vector data as the query vector. We also pass how many results to retrieve (topK). Finally, we specify whether we'd like to return the distance and metadata, both of which we set to true. Then we use the S3 Vectors Client to query vectors by passing it the QueryVectorsRequest.
This all happens in the code part below:
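(This is an excerpt from the search method sketch above; VECTOR_BUCKET_NAME and VECTOR_INDEX_NAME are assumed constants of the sample class.)

```java
// Build the query vector from the embedding and query the S3 Vectors index.
VectorData vectorData = VectorData.builder()
        .float32(embeddings)
        .build();

QueryVectorsRequest queryVectorsRequest = QueryVectorsRequest.builder()
        .vectorBucketName(VECTOR_BUCKET_NAME) // name of the S3 vector bucket
        .indexName(VECTOR_INDEX_NAME)         // name of the vector index
        .queryVector(vectorData)
        .topK(topK)                           // how many results to retrieve
        .returnDistance(true)
        .returnMetadata(true)
        .build();

QueryVectorsResponse queryVectorsResponse = s3VectorsClient.queryVectors(queryVectorsRequest);
```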
The final part is to iterate over all vectors from the QueryVectorsResponse and print them:
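(Again as an excerpt from the sketch above.)

```java
// Print key, distance and metadata for each of the returned vectors.
queryVectorsResponse.vectors().forEach(vector ->
        System.out.println("key: " + vector.key()
                + ", distance: " + vector.distance()
                + ", metadata: " + vector.metadata()));
```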
Now let's search by executing `search("AWS Lambda", 20)`. Here we search for the term "AWS Lambda" and would like to get the top 20 results. Remember that we created the following text embedding: "AWS Lambda is a serverless compute service for running code without having to provision or manage servers. You pay only for the compute time you consume". We also created AWS Lambda image embeddings in part 2, and audio and video embeddings from the AWS video about the AWS Lambda function in part 3. We split both the audio and the video into 7 individual segments of 15 seconds each and created separate embeddings for each of them.
Here is the output of our query:
Interpret search results
The impression is somewhat mixed. The results are sorted by distance, and the key is printed for each of them. There is no metadata printed, as we didn't set any when storing the embeddings in S3 Vectors. Certain video embeddings (15-second segments) rank at the top, having a lower distance than the corresponding Lambda image (the key is "AWS-Lambda") and the text (the key is "AWS Lambda Definition"). But I also see that all audio segments are at the very bottom of the search results; for them, I'd expect a much lower distance (similar to their corresponding video segments). Also, the Azure Functions embeddings (the key is "Azure Functions Definition" for the text embedding and "Azure-Functions" for the image embedding) have a rather low distance, which I'd expect to be higher. Of course, the explanation for this can be that Azure Functions is quite similar to AWS Lambda: a serverless service with similar benefits and features, but offered by a different cloud provider (Microsoft Azure). In any case, you should now have gotten an impression of how the similarity search works.
Conclusion
In this part of the series, we covered the implementation of similarity search across all embeddings (text, image, audio, and video) created and stored in S3 Vectors.
If you like my content, please follow me on GitHub and give my repositories a star!