Skip to main content

VertexAI [Anthropic, Gemini, Model Garden]

Open In Colab

Pre-requisites​

  • pip install google-cloud-aiplatform (pre-installed on proxy docker image)

  • Authentication:

    • run gcloud auth application-default login See Google Cloud Docs

    • Alternatively you can set GOOGLE_APPLICATION_CREDENTIALS

      Here's how: Jump to Code

      • Create a service account on GCP
      • Export the credentials as a json
      • load the json and json.dump the json as a string
      • store the json string in your environment as GOOGLE_APPLICATION_CREDENTIALS

Sample Usage​

import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1" # proj location

response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])

Usage with LiteLLM Proxy Server​

Here's how to use Vertex AI with the LiteLLM Proxy Server

  1. Modify the config.yaml

    Use this when you need to set a different location for each vertex model

    model_list:
    - model_name: gemini-vision
    litellm_params:
    model: vertex_ai/gemini-1.0-pro-vision-001
    vertex_project: "project-id"
    vertex_location: "us-central1"
    - model_name: gemini-vision
    litellm_params:
    model: vertex_ai/gemini-1.0-pro-vision-001
    vertex_project: "project-id2"
    vertex_location: "us-east"
  2. Start the proxy

    $ litellm --config /path/to/config.yaml
  3. Send Request to LiteLLM Proxy Server

    import openai
    client = openai.OpenAI(
    api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
    )

    response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages = [
    {
    "role": "user",
    "content": "what llm are you"
    }
    ],
    )

    print(response)

Specifying Safety Settings​

In certain use-cases you may need to make calls to the models and pass safety settigns different from the defaults. To do so, simple pass the safety_settings argument to completion or acompletion. For example:

response = completion(
model="gemini/gemini-pro",
messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
safety_settings=[
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE",
},
]
)

Set Vertex Project & Vertex Location​

All calls using Vertex AI require the following parameters:

  • Your Project ID
import os, litellm 

# set via env var
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811" # Your Project ID`

### OR ###

# set directly on module
litellm.vertex_project = "hardy-device-38811" # Your Project ID`
  • Your Project Location
import os, litellm 

# set via env var
os.environ["VERTEXAI_LOCATION"] = "us-central1 # Your Location

### OR ###

# set directly on module
litellm.vertex_location = "us-central1 # Your Location

Anthropic​

Model NameFunction Call
claude-3-sonnet@20240229completion('vertex_ai/claude-3-sonnet@20240229', messages)
claude-3-haiku@20240307completion('vertex_ai/claude-3-haiku@20240307', messages)

Usage​

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "claude-3-sonnet@20240229"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
model="vertex_ai/" + model,
messages=[{"role": "user", "content": "hi"}],
temperature=0.7,
vertex_ai_project=vertex_ai_project,
vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

Model Garden​

Model NameFunction Call
llama2completion('vertex_ai/<endpoint_id>', messages)

Using Model Garden​

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
model="vertex_ai/<your-endpoint-id>",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Gemini Pro​

Model NameFunction Call
gemini-procompletion('gemini-pro', messages), completion('vertex_ai/gemini-pro', messages)

Gemini Pro Vision​

Model NameFunction Call
gemini-pro-visioncompletion('gemini-pro-vision', messages), completion('vertex_ai/gemini-pro-vision', messages)

Gemini 1.5 Pro (and Vision)​

Model NameFunction Call
gemini-1.5-procompletion('gemini-1.5-pro', messages), completion('vertex_ai/gemini-pro', messages)

Using Gemini Pro Vision​

Call gemini-pro-vision in the same input/output format as OpenAI gpt-4-vision

LiteLLM Supports the following image types passed in url

Example Request - image url

import litellm

response = litellm.completion(
model = "vertex_ai/gemini-pro-vision",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Whats in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
],
)
print(response)

Chat Models​

Model NameFunction Call
chat-bison-32kcompletion('chat-bison-32k', messages)
chat-bisoncompletion('chat-bison', messages)
chat-bison@001completion('chat-bison@001', messages)

Code Chat Models​

Model NameFunction Call
codechat-bisoncompletion('codechat-bison', messages)
codechat-bison-32kcompletion('codechat-bison-32k', messages)
codechat-bison@001completion('codechat-bison@001', messages)

Text Models​

Model NameFunction Call
text-bisoncompletion('text-bison', messages)
text-bison@001completion('text-bison@001', messages)

Code Text Models​

Model NameFunction Call
code-bisoncompletion('code-bison', messages)
code-bison@001completion('code-bison@001', messages)
code-gecko@001completion('code-gecko@001', messages)
code-gecko@latestcompletion('code-gecko@latest', messages)

Extra​

Using GOOGLE_APPLICATION_CREDENTIALS​

Here's the code for storing your service account credentials as GOOGLE_APPLICATION_CREDENTIALS environment variable:

def load_vertex_ai_credentials():
# Define the path to the vertex_key.json file
print("loading vertex ai credentials")
filepath = os.path.dirname(os.path.abspath(__file__))
vertex_key_path = filepath + "/vertex_key.json"

# Read the existing content of the file or create an empty dictionary
try:
with open(vertex_key_path, "r") as file:
# Read the file content
print("Read vertexai file path")
content = file.read()

# If the file is empty or not valid JSON, create an empty dictionary
if not content or not content.strip():
service_account_key_data = {}
else:
# Attempt to load the existing JSON content
file.seek(0)
service_account_key_data = json.load(file)
except FileNotFoundError:
# If the file doesn't exist, create an empty dictionary
service_account_key_data = {}

# Create a temporary file
with tempfile.NamedTemporaryFile(mode="w+", delete=False) as temp_file:
# Write the updated content to the temporary file
json.dump(service_account_key_data, temp_file, indent=2)

# Export the temporary file as GOOGLE_APPLICATION_CREDENTIALS
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(temp_file.name)

Using GCP Service Account​

  1. Figure out the Service Account bound to the Google Cloud Run service
  1. Get the FULL EMAIL address of the corresponding Service Account

  2. Next, go to IAM & Admin > Manage Resources , select your top-level project that houses your Google Cloud Run Service

Click Add Principal

  1. Specify the Service Account as the principal and Vertex AI User as the role

Once that's done, when you deploy the new container in the Google Cloud Run service, LiteLLM will have automatic access to all Vertex AI endpoints.

s/o @Darien Kindlund for this tutorial