Spring AI 与 NVIDIA LLM API-工具盒子

Spring AI 现在支持 NVIDIA®（英伟达™）的大型语言模型 API，可与各种模型集成。通过利用 NVIDIA 的 OpenAI 兼容 API，Spring AI 允许开发人员通过熟悉的 Spring AI API 使用 NVIDIA LLM。

SpringAI 和 NVIDIA API

本文将带你了解如何配置和使用 Spring AI OpenAI 聊天客户端来连接 NVIDIA LLM API。

完整的示例代码可从 nvidia-llm GitHub 仓库获取。
SpringAI / NVIDIA 整合文档。

先决条件 {#先决条件}

创建 NVIDIA 帐户并获得足够的积分。
从 NVIDIA 提供的 LLM 模型中选择自己喜欢的模型。如下面截图中的 meta/llama-3.1-70b-instruct。
从模型页面获取所选模型的 API Key。

NVIDIA API KEY

依赖 {#依赖}

首先，将 Spring AI OpenAI Starter 添加到 Maven pom.xml 中：

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

对于 Gradle 来说，需要在 build.gradle 添加如下依赖：

gradledependencies {
  implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
}

确保在构建文件中已添加了 Spring Milestone 和 Snapshot 仓库，以及 Spring AI BOM。

配置 Spring AI {#配置-spring-ai}

要在 Spring AI 中使用 NVIDIA LLM API，我们需要配置 OpenAI 客户端，使其指向 NVIDIA LLM API 端点，并使用 NVIDIA 特定的模型。

在项目中添加以下环境变量：

export SPRING_AI_OPENAI_API_KEY=<NVIDIA_API_KEY>
export SPRING_AI_OPENAI_BASE_URL=https://integrate.api.nvidia.com
export SPRING_AI_OPENAI_CHAT_OPTIONS_MODEL=meta/llama-3.1-70b-instruct
export SPRING_AI_OPENAI_EMBEDDING_ENABLED=false
export SPRING_AI_OPENAI_CHAT_OPTIONS_MAX_TOKENS=2048

或者，你也可以将这些内容添加到 application.properties 文件中：

spring.ai.openai.api-key=<NVIDIA_API_KEY> spring.ai.openai.base-url=https://integrate.api.nvidia.com spring.ai.openai.chat.options.model=meta/llama-3.1-70b-instruct NVIDIA LLM API 不支持 Embedding。 spring.ai.openai.embedding.enabled=false NVIDIA LLM API 要求明确设置该参数，否则会异常

spring.ai.openai.chat.options.max-tokens=2048

关键点：

api-key 设置为你的 NVIDIA API key。
base-url 设置为 NVIDIA LLM API 端点：https://integrate.api.nvidia.com。
model 设置为 NVIDIA LLM API 中可用的模型之一。
NVIDIA LLM API 要求明确设置 max-tokens，否则会出现服务器错误。
由于 NVIDIA LLM API 仅支持 LLM，因此我们可以禁用 embedding 端点：embedding.enabled=false。

有关配置属性的完整列表，请查阅参考文档。

示例代码 {#示例代码}

我们已经将 Spring AI 配置为使用 NVIDIA LLM API，现在来看一个如何在应用中使用它的简单示例。

@RestController
public class ChatController {
private final ChatClient chatClient;

@Autowired
public ChatController(ChatClient.Builder builder) {
    this.chatClient = builder.build();
}

@GetMapping(&quot;/ai/generate&quot;)
public String generate(@RequestParam(value = &quot;message&quot;, defaultValue =  &quot;Tell me a joke&quot;) String message) {
return  chatClient.prompt().user(message).call().content();
}

@GetMapping(&quot;/ai/generateStream&quot;)
public Flux&lt;String&gt; generateStream(
    @RequestParam(value = &quot;message&quot;, defaultValue = &quot;Tell me a joke&quot;) String message) {
    return chatClient.prompt().user(message).stream().content();
}

}

在 ChatController.java 示例中，我们创建了一个带有两个端点的简单 REST Controller：

/ai/generate：对给定的提示生成一个回复。
/ai/generateStream：流式响应，这对较长的输出或实时交互非常有用。

工具/函数调用 {#工具函数调用}

选择支持工具/函数的模型时，NVIDIA LLM API 端点支持工具/函数调用。

SpringAI - NVIDIA 函数调用

你可以在 ChatModel 中注册自定义 Java 函数，并让所提供的 LLM 模型智能地选择输出一个 JSON 对象，其中包含调用已注册函数中的一个或多个参数。这是一种强大的技术，可将 LLM 的功能与外部工具和 API 连接起来。

你可以点击这里了解有关 SpringAI/OpenAI 函数调用支持的更多信息。

工具示例 {#工具示例}

下面是一个如何在 Spring AI 中使用工具/函数调用的简单示例：

@SpringBootApplication
public class NvidiaLlmApplication {
public static void main(String[] args) {
    SpringApplication.run(NvidiaLlmApplication.class, args);
}

@Bean
CommandLineRunner runner(ChatClient.Builder chatClientBuilder) {
    return args -&gt; {
        var chatClient = chatClientBuilder.build();

        var response = chatClient.prompt()
            .user(&quot;What is the weather in Amsterdam and Paris?&quot;)
            .functions(&quot;weatherFunction&quot;) // reference by bean name.
            .call()
            .content();

        System.out.println(response);
    };
}

@Bean
@Description(&quot;Get the weather in location&quot;)
public Function&lt;WeatherRequest, WeatherResponse&gt; weatherFunction() {
    return new MockWeatherService();
}

public static class MockWeatherService implements Function&lt;WeatherRequest, WeatherResponse&gt; {

    public record WeatherRequest(String location, String unit) {}
    public record WeatherResponse(double temp, String unit) {}

    @Override
    public WeatherResponse apply(WeatherRequest request) {
        double temperature = request.location().contains(&quot;Amsterdam&quot;) ? 20 : 25;
        return new WeatherResponse(temperature, request.unit);
    }
}

}

在 NvidiaLlmApplication.java 示例中，当模型需要天气信息时，它会自动调用 weatherFunction Bean，然后获取实时天气数据。预期响应如下

The weather in Amsterdam is currently 20 degrees Celsius, and the weather in Paris is currently 25 degrees Celsius（阿姆斯特丹目前的天气为 20 摄氏度，巴黎目前的天气为 25 摄氏度）。