Using Multi-modal LLMs from Spring AI
A use case I’m currently building revolves around understanding the “Customer Basket” element of shopping.
The goal is to encourage bank customers to upload their shopping receipts so consumer behavior can be better understood.
To implement the use case, I thought I’d use multi-modal LLMs to look at photos of receipts and tell me what’s in the basket (that is, use an LLM to turn each receipt into JSON).
Multi-modal LLMs Seriously Speed Things Up!
Yes, this use case could have been built with a simple OCR approach and, to some degree, that’s all I’m initially using the LLM for. But offloading the minutiae of OCR to the LLM, and being able to query the information within the image dynamically through a prompt, makes the application simpler and very quick to build.
The non-AI analogy I can think of is Relational Databases and where logic resides. A table row filter can be done in the application layer, but why not leverage the power of SQL and push predicates to the database?
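As a rough sketch of what "pushing the predicate" into the prompt could look like, the helper below builds a prompt that asks the model for only the fields the caller wants. Everything here is an assumption made for illustration: the class name, the field names, the wording of the prompt, and the value of the NEGATIVE_ANSWER sentinel are not taken from the actual application.

```java
import java.util.List;

// Hypothetical helper: none of these names come from the original application.
final class ReceiptPromptBuilder {

    // Assumed sentinel the model is told to return when the image is not a receipt.
    static final String NEGATIVE_ANSWER = "NOT_A_RECEIPT";

    private ReceiptPromptBuilder() {}

    // Builds a prompt that asks the model for only the requested fields,
    // much like a SELECT list narrows what a SQL query returns.
    static String build(List<String> fields) {
        if (fields == null || fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        return "You are given a photo. If it is not a shopping receipt, reply with exactly "
                + NEGATIVE_ANSWER
                + ". Otherwise reply with a single JSON object containing only these keys: "
                + String.join(", ", fields)
                + ". Do not add commentary.";
    }
}
```

Calling ReceiptPromptBuilder.build(List.of("merchant", "total")) yields a prompt that names just those two keys, so changing what is extracted from the image becomes a change to a list rather than to parsing code.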
Here’s a quick sample of what’s being built… more soon
Dependencies
plugins {
    id 'org.springframework.boot' version '3.5.0'
    id 'io.spring.dependency-management' version '1.1.4'
    id 'java'
}
group = 'ai.someexamplesof'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = '18'
repositories {
    mavenCentral()
    maven { url 'https://repo.spring.io/milestone' }
    maven { url 'https://repo.spring.io/snapshot' }
    maven {
        name = 'Central Portal Snapshots'
        url = 'https://central.sonatype.com/repository/maven-snapshots/'
    }
}
dependencies {
    // AI specific imports
    implementation 'org.springframework.ai:spring-ai-client-chat:1.0.0'
    // inference service implementation
    implementation 'org.springframework.ai:spring-ai-starter-model-vertex-ai-gemini:1.0.0'
    // generic
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'io.micrometer:micrometer-registry-prometheus' // prometheus exposure
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
The Core Code
@PostMapping(value = "/receipt", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public Map<String, Object> processReceipt(@RequestParam("receipt") MultipartFile multipartFile, HttpServletResponse response) throws Exception {
    logger.info("invoked /receipt");
    Map<String, Object> responseObject = null;
    // upload the file
    try {
        Path tempDir = Files.createTempDirectory("receipt-upload");
        // keep only the last path element so a crafted filename cannot escape tempDir
        String originalName = multipartFile.getOriginalFilename();
        Path fileName = (originalName == null || originalName.isBlank()) ? null : Path.of(originalName).getFileName();
        Path destination = tempDir.resolve(fileName == null ? "receipt" : fileName.toString());
        multipartFile.transferTo(destination);
        // TODO put the receipt somewhere else
        // wrap the stored file as a Resource for the model call
        Resource imageResource = new PathResource(destination);
        // UserMessage userMessage = new UserMessage(prompt,List.of(new Media(MimeTypeUtils.IMAGE_JPEG,imageResource)),null);
        // note: the media type is hard-coded, so this assumes the upload is a JPEG
        UserMessage userMessage = UserMessage.builder()
                .media(new Media(MimeTypeUtils.IMAGE_JPEG, imageResource))
                .text(prompt)
                .build();
        // send the message and get the response
        String reply = chatModel.call(userMessage);
        // parse the JSON out of the response
        if (reply.contains(NEGATIVE_ANSWER)) {
            // the model says this is not a receipt: set the 400
            response.setStatus(HttpStatus.BAD_REQUEST.value());
            responseObject = new HashMap<>();
            responseObject.put("Error", "Unfortunately, this image is not a receipt");
            return responseObject;
        } else {
            responseObject = vertexResponseProcessor.parseJSON(reply);
        } // end if
    } catch (Exception e) {
        logger.error("Error processing receipt upload: ", e);
        // return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build();
        response.setStatus(HttpStatus.INTERNAL_SERVER_ERROR.value());
        return null; // dump out
    }
    // return
    return responseObject;
}
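The vertexResponseProcessor used above is not shown here. Chat models often wrap JSON in Markdown code fences or add a sentence of prose around it, so one plausible first step for such a processor is to isolate the JSON object before parsing. The sketch below is only an assumption about how that step could work; the class and method names are hypothetical, and it handles a single top-level JSON object only.

```java
// Hypothetical helper; the original vertexResponseProcessor is not shown.
final class ReplyJsonExtractor {

    private ReplyJsonExtractor() {}

    // Returns the substring from the first '{' to the last '}' of the reply,
    // which drops Markdown code fences and any prose around the JSON object.
    static String extractJsonObject(String reply) {
        if (reply == null) {
            throw new IllegalArgumentException("reply must not be null");
        }
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("no JSON object found in reply");
        }
        return reply.substring(start, end + 1);
    }
}
```

The extracted string could then be handed to a JSON library, for example Jackson's ObjectMapper (which spring-boot-starter-web already brings in), to produce the Map<String,Object> the controller returns.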