


Evaluate ADK agent performance using the Vertex AI Generative AI Evaluation Service

Lab · 1 hour 30 minutes · No cost · Intermediate
Note: This lab may incorporate AI tools to support your learning.

GENAI110

Overview

Agent Development Kit (ADK) is a modular and extensible open-source framework for building AI agents. While ADK provides its own built-in evaluation module, this lab demonstrates how to use the Vertex AI Generative AI Evaluation Service to assess the performance of an ADK-based agent. This approach offers a broader, explainable, and quality-controlled toolkit to evaluate generative models or applications using custom metrics and human-aligned benchmarks.

In this lab, you will walk through a step-by-step guide to evaluate your ADK agent using Vertex AI Gen AI Evaluation.

Objective

By the end of this lab, you will be able to:

  • Build and run a local ADK agent.
  • Create and format an agent evaluation dataset.
  • Evaluate agent performance using:
    • Single tool usage evaluation
    • Trajectory-based evaluation
    • Response quality evaluation
  • Use Vertex AI’s Evaluation Service to generate explainable metrics and benchmark results.

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This Qwiklabs hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

What you need

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
  • Time to complete the lab.

Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.

Note: If you are using a Pixelbook, open an Incognito window to run this lab.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:

    • The Open Google Cloud console button
    • Time remaining
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).

    The lab spins up resources, and then opens another tab that shows the Sign in page.

    Tip: Arrange the tabs in separate windows, side-by-side.

    Note: If you see the Choose an account dialog, click Use Another Account.
  3. If necessary, copy the Username below and paste it into the Sign in dialog.

    {{{user_0.username | "Username"}}}

    You can also find the Username in the Lab Details pane.

  4. Click Next.

  5. Copy the Password below and paste it into the Welcome dialog.

    {{{user_0.password | "Password"}}}

    You can also find the Password in the Lab Details pane.

  6. Click Next.

    Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  7. Click through the subsequent pages:

    • Accept the terms and conditions.
    • Do not add recovery options or two-factor authentication (because this is a temporary account).
    • Do not sign up for free trials.

After a few moments, the Google Cloud console opens in this tab.

Note: To access Google Cloud products and services, click the Navigation menu or type the service or product name in the Search field.

Task 1. Prepare the environment in Vertex AI Workbench

  1. In the Google Cloud console, navigate to Vertex AI by searching for it at the top of the console.

  2. Navigate to the Vertex AI > Dashboard page and click on Enable all recommended APIs.

  3. Search for Workbench in the console's top search bar, and click on the first result to navigate to Vertex AI > Workbench.

  4. Under Instances, click on Open JupyterLab next to your vertex-ai-jupyterlab instance. JupyterLab will launch in a new tab.

  5. Open the notebook file.

Note: If you do not see notebooks in JupyterLab, please follow these additional steps to reset the instance:

1. Close the browser tab for JupyterLab, and return to the Workbench home page.

2. Select the checkbox next to the instance name, and click Reset.

3. After the Open JupyterLab button is enabled again, wait one minute, and then click Open JupyterLab.

  6. In the Select Kernel dialog, choose Python 3 from the list of available kernels.

  7. In the first code cell of the Python notebook, install the necessary Google Cloud dependencies.

  8. Either click the play button at the top or press SHIFT+ENTER on your keyboard to execute the cell.

  9. To use the newly installed packages in this Jupyter runtime, you must restart the runtime. Wait for the [*] beside the cell to change to [1] to show that the cell has completed, then in the JupyterLab menus, select Kernel > Restart Kernel....

  10. When prompted to confirm, select Restart.

  11. Once the kernel has restarted, set the Google Cloud project information and initialize Vertex AI in a new cell (an illustrative example appears after this list).

  12. Run through the Getting Started, Import libraries, Define helper functions, and Set Google Cloud project information sections of the notebook.

  • For PROJECT_ID use , and for BUCKET_NAME use . All EvalTask results will be stored in this bucket.
Note: You can skip any notebook cells that are noted Colab only. If you experience a 429 response from any of the notebook cell executions, wait 1 minute before running the cell again to proceed.
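
The installation and initialization cells live in the notebook itself; as a rough sketch of what they typically contain (the package list, region, and the PROJECT_ID and BUCKET_NAME values below are placeholders, so use the values from your Lab Details pane and the exact versions pinned in the notebook):

    # Install the SDKs used in this lab (run in its own cell, then restart the kernel).
    # Package names are indicative; follow the exact pins in the notebook.
    %pip install --upgrade --quiet "google-cloud-aiplatform[evaluation]" google-adk pandas

    import vertexai

    # Placeholder values -- replace with the PROJECT_ID, region, and bucket from your lab.
    PROJECT_ID = "your-lab-project-id"
    LOCATION = "us-central1"
    BUCKET_NAME = "your-lab-bucket"
    BUCKET_URI = f"gs://{BUCKET_NAME}"

    # Initialize the Vertex AI SDK; EvalTask results are written under the staging bucket.
    vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)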

Task 2. Build and Run the ADK Agent

In this task, you will build your application using the Agent Development Kit (ADK), integrating the Gemini model and defining custom tools to simulate a product research agent. A simplified code sketch of the agent follows the steps below.

  1. Define Tools:

    • Create get_product_details() and get_product_price() functions to return product information and pricing.
  2. Set the Model:

    • Assign "gemini-2.0-flash" to the model variable.
  3. Assemble the Agent:

    • Define agent_parsed_outcome(query) to:
      • Set application, user, and session IDs.
      • Initialize the Agent with instructions and tools.
      • Start a session and run the agent.
      • Parse and return the response.
  4. Test the Agent:

    • Run queries like "Get product details for shoes" using agent_parsed_outcome().
    • Display the output using Markdown.
Note: You may see warnings while running some steps. Those can be ignored.
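
For orientation only, here is a simplified sketch of the agent defined in this task. The names get_product_details, get_product_price, model, and the gemini-2.0-flash model ID come from the lab; the product data, instruction text, and agent name are illustrative assumptions, not the notebook's exact code.

    from google.adk.agents import Agent

    def get_product_details(product_name: str) -> str:
        """Return mock product details for a product (illustrative data, not the lab's)."""
        details = {
            "shoes": "Lightweight running shoes with cushioned soles.",
            "backpack": "Water-resistant backpack with a padded laptop sleeve.",
        }
        return details.get(product_name.lower(), "Product details not found.")

    def get_product_price(product_name: str) -> str:
        """Return a mock price for a product (illustrative data, not the lab's)."""
        prices = {"shoes": "$79", "backpack": "$49"}
        return prices.get(product_name.lower(), "Price not found.")

    model = "gemini-2.0-flash"

    product_research_agent = Agent(
        name="product_research_agent",  # illustrative agent name
        model=model,
        instruction=(
            "You are a product research assistant. Use get_product_details for "
            "descriptions and get_product_price for pricing."
        ),
        tools=[get_product_details, get_product_price],
    )

The notebook's agent_parsed_outcome(query) wraps an agent like this in a session-backed runner, sends the query, and parses out the final response (and, for the later evaluation tasks, the tool calls it made) so the output can be rendered with Markdown.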

Task 3. Evaluate an ADK Agent with Vertex AI Gen AI Evaluation

In this task, you will evaluate your ADK-based agent using the Vertex AI Gen AI Evaluation service. This includes building an evaluation dataset, running an evaluation task to check tool selection, and visualizing the results. A code sketch illustrating these steps follows the list below.

  1. Understand Evaluation Goals:

    • Learn about key evaluation types:
      • Monitoring: Tool selection, trajectory, response quality.
      • Observability: Latency and failure rate.
  2. Prepare Evaluation Dataset:

    • Define a set of prompts and their expected tool calls (reference trajectory).
    • Create a Pandas DataFrame with these examples.
    • Optionally include generated responses and predicted trajectories.
  3. Display Sample Data:

    • Use display_dataframe_rows() to preview a few rows from the dataset.
  4. Set Evaluation Metric:

    • Use TrajectorySingleToolUse to check if the correct tool was used, regardless of tool order.
  5. Run Evaluation Task:

    • Create an EvalTask using the dataset and metrics.
    • Run the task with agent_parsed_outcome and a unique experiment run name.
    • Store and organize results using output_uri_prefix.
  6. Visualize Results:

    • Display a sample of the evaluation metrics using helper functions to interpret the agent’s behavior.
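
The exact dataset and configuration are provided in the notebook; the sketch below only illustrates the shape of steps 2 through 6. The class names (EvalTask, TrajectorySingleToolUse) and the dataset schema follow the Vertex AI Gen AI Evaluation preview SDK, while the prompts, tool inputs, experiment name, run name, and bucket paths are placeholders; if your notebook imports from a different module path, follow the notebook.

    import pandas as pd
    from vertexai.preview.evaluation import EvalTask
    from vertexai.preview.evaluation.metrics import TrajectorySingleToolUse

    # Prompts paired with the tool calls the agent is expected to make (reference trajectory).
    eval_data = {
        "prompt": [
            "Get price for shoes",
            "Get product details for backpack",
        ],
        "reference_trajectory": [
            [{"tool_name": "get_product_price", "tool_input": {"product_name": "shoes"}}],
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "backpack"}}],
        ],
    }
    eval_sample_dataset = pd.DataFrame(eval_data)

    # Check whether one specific tool shows up anywhere in the predicted trajectory.
    single_tool_usage_metrics = [TrajectorySingleToolUse(tool_name="get_product_price")]

    single_tool_eval_task = EvalTask(
        dataset=eval_sample_dataset,
        metrics=single_tool_usage_metrics,
        experiment="adk-agent-eval",                    # placeholder experiment name
        output_uri_prefix=f"{BUCKET_URI}/single-tool",  # placeholder results location
    )

    # The runnable is called once per prompt; a unique run name keeps results separate.
    single_tool_result = single_tool_eval_task.evaluate(
        runnable=agent_parsed_outcome,
        experiment_run_name="single-tool-run-1",        # placeholder run name
    )
    print(single_tool_result.summary_metrics)

The notebook's display_dataframe_rows helper can then preview per-prompt scores from single_tool_result.metrics_table.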

Task 4. Perform Trajectory Evaluation

In this task, you will evaluate your agent’s tool usage sequence (trajectory) to determine whether it makes logical and effective tool choices, in the correct order, based on the user's prompt. A code sketch of the trajectory evaluation run follows the steps below.

  1. Understand Trajectory Evaluation:

    • Evaluate whether the agent uses the right tools in the correct sequence. This goes beyond checking if the right tool was used — it assesses the full reasoning path.
  2. Set Trajectory Metrics:

    • Use ground-truth-based metrics to measure different aspects of the agent’s trajectory:
      • trajectory_exact_match: Same tools, same order.
      • trajectory_in_order_match: Reference tools in correct order (extras allowed).
      • trajectory_any_order_match: All reference tools used (order/extras don’t matter).
      • trajectory_precision: Share of predicted actions found in the reference.
      • trajectory_recall: Share of reference actions found in the prediction.
  3. Run Evaluation Task:

    • Create an EvalTask with the evaluation dataset and trajectory metrics.
    • Run the task with agent_parsed_outcome and assign a unique run name.
    • Store results under a designated output path.
  4. Visualize Results:

    • Display a sample of metric results using display_dataframe_rows.
    • Generate bar plots to visualize trajectory metrics using plot_bar_plot.
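
Reusing the dataset and EvalTask import from the previous sketch, the trajectory run might look like the following. The metric identifiers are the computed trajectory metrics named in this task; the experiment, run, and path names remain placeholders.

    # Ground-truth trajectory metrics, compared against reference_trajectory in the dataset.
    trajectory_metrics = [
        "trajectory_exact_match",
        "trajectory_in_order_match",
        "trajectory_any_order_match",
        "trajectory_precision",
        "trajectory_recall",
    ]

    trajectory_eval_task = EvalTask(
        dataset=eval_sample_dataset,
        metrics=trajectory_metrics,
        experiment="adk-agent-eval",                   # placeholder experiment name
        output_uri_prefix=f"{BUCKET_URI}/trajectory",  # placeholder results location
    )

    trajectory_eval_result = trajectory_eval_task.evaluate(
        runnable=agent_parsed_outcome,
        experiment_run_name="trajectory-run-1",        # placeholder run name
    )
    print(trajectory_eval_result.summary_metrics)

The notebook's display_dataframe_rows and plot_bar_plot helpers then visualize trajectory_eval_result.metrics_table.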

Task 5. Conduct Response Evaluation

In this task, you will evaluate the final responses generated by the ADK agent, both for language quality and for how well the response follows the agent's tool-usage trajectory. A code sketch illustrating the custom metric and the combined evaluation appears after the steps below.

  1. Understand Response Evaluation:

    • Evaluate the quality and appropriateness of the agent's final output using built-in and custom response metrics.
  2. Set Response Metrics:

    • Start with base response metrics such as:
      • safety: Checks for safe, non-toxic outputs.
      • coherence: Assesses fluency and logical flow.
  3. Run Evaluation Task:

    • Use an EvalTask to evaluate responses from agent_parsed_outcome using the selected response metrics.
    • Assign a unique run name and store results in the configured bucket path.
  4. Visualize Results:

    • Use helper functions to display evaluation results for inspection.
  5. Define Custom Metric for Trajectory-Conditioned Response:

    • Define criteria to assess whether the response logically follows the trajectory.
    • Define a binary rubric: 1 = Follows trajectory, 0 = Does not follow.
    • Use PointwiseMetricPromptTemplate to generate the evaluation prompt.
    • Define a new PointwiseMetric using this template.
  6. Set Combined Response and Trajectory Metrics:

    • Combine trajectory and response-level metrics:
      • trajectory_exact_match
      • trajectory_in_order_match
      • safety
      • response_follows_trajectory (custom)
  7. Run Custom Evaluation Task:

    • Create a new EvalTask with the combined metrics.
    • Run evaluation and visualize sample results with plots and tables.
Note: Bonus – You can optionally bring your own dataset and evaluate a LangGraph-based agent using Vertex AI Gen AI Evaluation. This allows you to apply the same evaluation metrics (tool usage, trajectory, response quality) on your custom agent logic and prompts.
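
As an illustration of the custom metric and combined run described in steps 5 through 7: the class names (PointwiseMetric, PointwiseMetricPromptTemplate) and the built-in metric identifiers are the ones named in this task, while the criterion wording, rubric text, input variables, and run names below are assumptions.

    from vertexai.preview.evaluation.metrics import (
        PointwiseMetric,
        PointwiseMetricPromptTemplate,
    )

    # Criterion and binary rubric: does the response logically follow the trajectory?
    criteria = {
        "follows_trajectory": (
            "Evaluate whether the agent's final response logically follows from the "
            "sequence of tool calls it made (the predicted trajectory)."
        )
    }
    rating_rubric = {
        "1": "The response follows the trajectory.",
        "0": "The response does not follow the trajectory.",
    }

    response_follows_trajectory_metric = PointwiseMetric(
        metric="response_follows_trajectory",
        metric_prompt_template=PointwiseMetricPromptTemplate(
            criteria=criteria,
            rating_rubric=rating_rubric,
            input_variables=["prompt", "predicted_trajectory"],
        ),
    )

    # Combine trajectory metrics, a built-in response metric, and the custom metric.
    response_tool_metrics = [
        "trajectory_exact_match",
        "trajectory_in_order_match",
        "safety",
        response_follows_trajectory_metric,
    ]

    response_eval_tool_task = EvalTask(
        dataset=eval_sample_dataset,
        metrics=response_tool_metrics,
        experiment="adk-agent-eval",                 # placeholder experiment name
        output_uri_prefix=f"{BUCKET_URI}/response",  # placeholder results location
    )

    response_eval_tool_result = response_eval_tool_task.evaluate(
        runnable=agent_parsed_outcome,
        experiment_run_name="response-trajectory-run-1",  # placeholder run name
    )
    print(response_eval_tool_result.summary_metrics)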

Congratulations!

You have successfully evaluated an ADK agent using Vertex AI Generative AI Evaluation. You built and executed the agent locally, prepared a custom evaluation dataset, and assessed the agent’s tool usage, action trajectory, and final response quality using built-in and custom evaluation metrics.

You evaluated the agent using the following components:

  • ADK (Agent Development Kit) for building and running the agent
  • A structured evaluation dataset with expected tool usage and references
  • Vertex AI Gen AI Evaluation Service
  • Tool usage evaluation metrics
  • Trajectory matching for step-by-step accuracy
  • Response quality metrics such as safety and coherence, plus a custom trajectory-conditioned response metric

Manual Last Updated September 01, 2025

Lab Last Updated September 01, 2025

Copyright 2023 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.

