Implementing an LLM fallback mechanism

How we quickly implemented an LLM fallback mechanism when our OpenAI account was unexpectedly deactivated

Yesterday was... fun. OpenAI accidentally deactivated our account (it’s back now, false positive, all is good), but for a couple of hours, everything at Vespper was breaking. We started noticing that almost all Vespper investigations (aka “runs”) were failing.

Diving into the logs, we found this message:
"Your OpenAI account has been deactivated. Contact support."

We immediately contacted OpenAI support to fix the issue, but waiting around wasn’t an option—our core product was down. So, here’s what we did to patch things up.

Context: how Vespper works

Here’s a quick rundown of how Vespper works:

Whenever an alert comes in (from Datadog, New Relic, SigNoz, PagerDuty, etc.), our LLM-based agent starts investigating. It digs through the organization’s internal tools—o11y systems, codebases, Slack—and, within minutes, posts findings back to Slack to help devs resolve incidents faster.

We use OpenAI as our main LLM provider. To keep this flexible and user-friendly, we abstract LLM calls behind LiteLLM, which lets us swap between multiple LLM providers through a single API. If customers provide their own LLM keys, we use those; if not, we fall back to our own.

So yeah... OpenAI being down was a big problem.

The Fix: Fallbacks to the Rescue

Thankfully, LiteLLM has this neat feature called fallbacks. It’s exactly what it sounds like: you pass an array of models to LiteLLM’s acompletion() function, and it tries them in order until something works. The first successful response stops the loop and gets returned.

In theory, the fix was straightforward. We just needed to create a fallback array with other models (like Claude) and pass it to LiteLLM. Here’s what the main part of our LLM call looked like before:
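In simplified form (the real code has more plumbing around it, and the function name here is just for illustration):

```python
# Simplified sketch of the original call: one model, no safety net.
from litellm import acompletion

async def call_llm(messages, model="gpt-4o", api_key=None):
    response = await acompletion(
        model=model,
        messages=messages,
        api_key=api_key,  # the customer's key if they provided one, otherwise ours
    )
    return response
```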

And here’s what it looked like after adding fallbacks:
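Roughly (again simplified):

```python
# Same sketch, now passing a fallback list through to LiteLLM. If the
# primary model fails, LiteLLM tries each fallback entry in order.
from litellm import acompletion

async def call_llm(messages, model="gpt-4o", api_key=None, fallbacks=None):
    response = await acompletion(
        model=model,
        messages=messages,
        api_key=api_key,
        fallbacks=fallbacks,  # list of fallback model entries to try in order
    )
    return response
```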

The fallback array itself looked something like this:
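(Model names and environment variables here are placeholders; what matters is the shape, a list of dicts with a model and an api_key.)

```python
import os

# Placeholder fallback list: each entry carries the model to try and the
# API key to use for it.
fallbacks = [
    {"model": "claude-3-5-sonnet-20240620", "api_key": os.environ["ANTHROPIC_API_KEY"]},
    {"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]},
]
```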

Seemed easy enough. We ran it locally… and then hit another wall.

Problem #2: A Bug in LiteLLM

Instead of gracefully failing over, we got this error:
TypeError: litellm.main.acompletion() got multiple values for keyword argument 'model'

Back to debugging.

After diving into LiteLLM’s source code, we found the issue. The acompletion() function calls another function, async_completion_with_fallbacks, which iterates through the fallback array and makes API calls to the specified models.

Inside async_completion_with_fallbacks, there’s this line:
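Paraphrasing the relevant lines slightly (the exact variable names in LiteLLM’s source differ a bit):

```python
# Inside the fallback loop of async_completion_with_fallbacks (paraphrased):
model = fallback.get("model")
completion_kwargs.update(fallback)  # merges the whole dict, "model" key included
response = await litellm.acompletion(**completion_kwargs, model=model)
```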

The problem is that it merges the fallback dictionary ({"model": ..., "api_key": ...}) into completion_kwargs, model key included, while also passing model separately to litellm.acompletion. That’s what produces the "got multiple values for keyword argument 'model'" TypeError.

The fix was to change fallback.get("model") to fallback.pop("model"). Using pop retrieves the model value and removes it from the fallback dictionary before merging it, so there’s no conflict. Here’s the updated code:
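(Same paraphrased snippet, with the one-word change applied.)

```python
# pop() removes "model" from the fallback dict before the merge, so the
# keyword only reaches acompletion() once.
model = fallback.pop("model")
completion_kwargs.update(fallback)  # no "model" key left to collide
response = await litellm.acompletion(**completion_kwargs, model=model)
```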

Deploying the Fix

We tested the fix locally, and everything worked :) However, we had only patched the LiteLLM source code directly in our local environment. Our production service uses Docker + Poetry and installs LiteLLM at build time from the official PyPI registry, which doesn’t include the fix.

We couldn’t wait for the contributors to publish a patch, so we forked LiteLLM, implemented the fix ourselves, and installed our fork via Poetry as a git dependency. After confirming everything worked in production, we opened an issue and a pull request to contribute the fix upstream.
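For reference, pointing Poetry at a fork is a small change in pyproject.toml (the repository URL and branch below are placeholders):

```toml
[tool.poetry.dependencies]
# Install LiteLLM from our fork (placeholder URL and branch) instead of
# PyPI until the upstream fix is released.
litellm = { git = "https://github.com/your-org/litellm.git", branch = "fix-fallbacks-model-kwarg" }
```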

Conclusion

This incident reminded us of an essential lesson: we should avoid hidden single points of failure. In our case, we relied too heavily on OpenAI, and when our account was deactivated, it disrupted our entire product.

Going forward, we’re auditing our systems to reduce dependency on any single provider. Fallbacks saved the day this time, but we’re committed to building even greater resiliency into Vespper to ensure uninterrupted service for our users.

Dudu Lasry
January 30, 2025