
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
Agent Development Kit (ADK) is a modular and extensible open-source framework for building AI agents. While ADK provides its own built-in evaluation module, this lab demonstrates how to use the Vertex AI Generative AI Evaluation Service to assess the performance of an ADK-based agent. This approach offers a broader, explainable, and quality-controlled toolkit to evaluate generative models or applications using custom metrics and human-aligned benchmarks.
In this lab, you will walk through a step-by-step guide to evaluate your ADK agent using Vertex AI Gen AI Evaluation.
By the end of this lab, you will be able to:
- Build and run an ADK agent locally with the Gemini model and custom tools
- Prepare a custom evaluation dataset for the agent
- Evaluate the agent's tool selection, tool-use trajectory, and final responses with Vertex AI Gen AI Evaluation
- Define a custom metric and visualize evaluation results
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This Qwiklabs hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended)
- Time to complete the lab. Remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.
Note: If you are using a Pixelbook, open an Incognito window to run this lab.
Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:
- The Open Google Cloud console button
- Time remaining
- The temporary credentials that you must use for this lab
- Other information, if needed, to step through this lab
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
If necessary, copy the Username below and paste it into the Sign in dialog.
You can also find the Username in the Lab Details pane.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
You can also find the Password in the Lab Details pane.
Click Next.
Click through the subsequent pages:
- Accept the terms and conditions.
- Do not add recovery options or two-factor authentication (because this is a temporary account).
- Do not sign up for free trials.
After a few moments, the Google Cloud console opens in this tab.
In the Google Cloud console, navigate to Vertex AI by searching for it at the top of the console.
Navigate to the Vertex AI > Dashboard page and click on Enable all recommended APIs.
Search for Workbench in the console's top search bar, and click on the first result to navigate to Vertex AI > Workbench.
Under Instances, click Open JupyterLab next to your vertex-ai-jupyterlab instance. JupyterLab will launch in a new tab.
Open the notebook file provided for this lab.
Note: If JupyterLab does not open or stops responding, reset the instance:
1. Close the browser tab for JupyterLab, and return to the Workbench home page.
2. Select the checkbox next to the instance name, and click Reset.
3. After the Open JupyterLab button is enabled again, wait one minute, and then click Open JupyterLab.
In the Select Kernel dialog, choose Python 3 from the list of available kernels.
In the first code cell of the Python notebook, install the necessary Google Cloud dependencies.
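The notebook defines the exact install command; as a rough sketch (the package names and extras below are assumptions, not the notebook's exact contents), the install cell looks something like this:

```python
# Hypothetical install cell -- the lab notebook supplies the authoritative package list.
# google-cloud-aiplatform[evaluation] provides the Vertex AI Gen AI Evaluation SDK,
# and google-adk provides the Agent Development Kit.
%pip install --upgrade --quiet "google-cloud-aiplatform[evaluation]" "google-adk"
```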
Either click the play button at the top or enter SHIFT+ENTER on your keyboard to execute the cell.
To use the newly installed packages in this Jupyter runtime, you must restart the runtime. Wait for the [*] beside the cell to change to [1], which shows that the cell has completed, then in the JupyterLab menus, select Kernel > Restart Kernel....
When prompted to confirm, select Restart.
Once the kernel has restarted, run the following in a new cell to set Google Cloud project information and initialize Vertex AI:
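The notebook provides the actual cell; as a minimal sketch (the project ID, region, and bucket name below are placeholders, not the lab-provided values), the initialization looks roughly like this:

```python
import vertexai

# Placeholders -- substitute the project ID and region shown in the Lab Details pane.
PROJECT_ID = "your-lab-project-id"
LOCATION = "us-central1"
BUCKET_URI = f"gs://{PROJECT_ID}-eval-bucket"  # staging bucket for evaluation artifacts

vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)
```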
Run through the Getting Started, Import libraries, Define helper functions, and Set Google Cloud project information sections of the notebook. EvalTask results will be stored in the bucket created there.

In this task, you will build your application using the Agent Development Kit (ADK), integrating the Gemini model and defining custom tools to simulate a product research agent.
- Define Tools: Create the get_product_details() and get_product_price() functions to return product information and pricing.
- Set the Model: Assign "gemini-2.0-flash" to the model variable.
- Assemble the Agent: Define agent_parsed_outcome(query) to build an Agent with instructions and tools, run the query, and parse the result.
- Test the Agent: Send the query "Get product details for shoes" through agent_parsed_outcome() and display the response as Markdown. (A sketch of these steps follows this list.)
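For orientation, here is a minimal sketch of those steps. The product data, instruction text, and agent name are made up for illustration; the lab notebook defines the real ones, and its agent_parsed_outcome(query) helper additionally wraps the agent in an ADK runner, sends the query, and parses the returned events into a final response plus the list of tool calls (the trajectory).

```python
from google.adk.agents import Agent

def get_product_details(product_name: str) -> str:
    """Returns a short description of the product (illustrative data only)."""
    details = {
        "shoes": "Lightweight running shoes with cushioned soles.",
        "backpack": "Water-resistant backpack with a laptop sleeve.",
    }
    return details.get(product_name.lower(), "Product details not found.")

def get_product_price(product_name: str) -> str:
    """Returns the price of the product (illustrative data only)."""
    prices = {"shoes": "$89", "backpack": "$59"}
    return prices.get(product_name.lower(), "Price not found.")

model = "gemini-2.0-flash"

# The agent combines the model, an instruction, and the two tools.
product_research_agent = Agent(
    name="product_research_agent",
    model=model,
    instruction=(
        "You are a product research assistant. Use get_product_details for "
        "descriptions and get_product_price for pricing."
    ),
    tools=[get_product_details, get_product_price],
)
```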
In this task, you will evaluate your ADK-based agent using the Vertex AI Gen AI Evaluation service. This includes building an evaluation dataset, running an evaluation task to check tool selection, and visualizing the results.
- Understand Evaluation Goals:
- Prepare Evaluation Dataset:
- Display Sample Data: Use display_dataframe_rows() to preview a few rows from the dataset.
- Set Evaluation Metric: Use TrajectorySingleToolUse to check whether the correct tool was used, regardless of tool order.
- Run Evaluation Task: Create an EvalTask using the dataset and metrics, then evaluate agent_parsed_outcome with a unique experiment run name; results are written to the output_uri_prefix. (A sketch follows this list.)
- Visualize Results:
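As a rough sketch of that flow (the prompts, reference trajectories, tool name, experiment name, run name, and bucket path below are placeholders, and EvalTask argument names can vary slightly by SDK version):

```python
import pandas as pd
from vertexai.preview.evaluation import EvalTask
from vertexai.preview.evaluation.metrics import TrajectorySingleToolUse

# Illustrative dataset: each row pairs a prompt with the expected (reference) tool calls.
eval_sample_dataset = pd.DataFrame(
    {
        "prompt": [
            "Get price for shoes",
            "Get product details for backpack",
        ],
        "reference_trajectory": [
            [{"tool_name": "get_product_price", "tool_input": {"product_name": "shoes"}}],
            [{"tool_name": "get_product_details", "tool_input": {"product_name": "backpack"}}],
        ],
    }
)

# Check whether one specific tool was called, regardless of where it appears in the trajectory.
single_tool_usage_metrics = [TrajectorySingleToolUse(tool_name="get_product_price")]

single_tool_eval_task = EvalTask(
    dataset=eval_sample_dataset,
    metrics=single_tool_usage_metrics,
    experiment="evaluate-adk-agent",                    # placeholder experiment name
    output_uri_prefix="gs://your-lab-bucket/eval-results",  # results land under this prefix
)

# The agent callable produces a response and predicted trajectory for each prompt.
single_tool_eval_result = single_tool_eval_task.evaluate(
    runnable=agent_parsed_outcome,
    experiment_run_name="single-tool-use-run-1",  # placeholder run name
)
```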
In this task, you will evaluate your agent’s tool usage sequence (trajectory) to determine if it is making logical and effective tool choices in the correct order based on the user's prompt.
- Understand Trajectory Evaluation:
- Set Trajectory Metrics:
  - trajectory_exact_match: Same tools, same order.
  - trajectory_in_order_match: Reference tools in correct order (extras allowed).
  - trajectory_any_order_match: All reference tools used (order/extras don't matter).
  - trajectory_precision: Share of predicted actions found in the reference.
  - trajectory_recall: Share of reference actions found in the prediction.
- Run Evaluation Task: Create an EvalTask with the evaluation dataset and trajectory metrics, evaluate agent_parsed_outcome, and assign a unique run name. (A sketch follows this list.)
- Visualize Results: Display metric rows with display_dataframe_rows and plot scores with plot_bar_plot.
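A minimal sketch, assuming these computation-based metrics can be passed to EvalTask by name as strings (the experiment and run names are placeholders, and the EvalTask and dataset are the same as in the previous sketch):

```python
# Computation-based trajectory metrics, referenced by name.
trajectory_metrics = [
    "trajectory_exact_match",
    "trajectory_in_order_match",
    "trajectory_any_order_match",
    "trajectory_precision",
    "trajectory_recall",
]

trajectory_eval_task = EvalTask(
    dataset=eval_sample_dataset,
    metrics=trajectory_metrics,
    experiment="evaluate-adk-agent",  # placeholder
)

trajectory_eval_result = trajectory_eval_task.evaluate(
    runnable=agent_parsed_outcome,
    experiment_run_name="trajectory-run-1",  # placeholder
)

# The notebook's helper functions then render the results, for example:
# display_dataframe_rows(trajectory_eval_result.metrics_table)
```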
In this task, you will evaluate the final responses generated by the ADK agent, both in terms of language quality and how well the response follows the agent's tool usage.
- Understand Response Evaluation:
- Set Response Metrics:
  - safety: Checks for safe, non-toxic outputs.
  - coherence: Assesses fluency and logical flow.
- Run Evaluation Task: Create an EvalTask to evaluate responses from agent_parsed_outcome using the selected response metrics.
- Visualize Results:
- Define Custom Metric for Trajectory-Conditioned Response: Write criteria to assess if the response logically follows the trajectory (1 = Follows trajectory, 0 = Does not follow). Use PointwiseMetricPromptTemplate to generate the evaluation prompt, then create a PointwiseMetric using this template.
- Set Combined Response and Trajectory Metrics:
  - trajectory_exact_match
  - trajectory_in_order_match
  - safety
  - response_follows_trajectory (custom)
- Run Custom Evaluation Task: Create an EvalTask with the combined metrics. (A sketch of the custom metric and the combined run follows this list.)
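A hedged sketch of the custom metric and the combined run is below; the criteria wording, rubric text, and experiment and run names are illustrative, not the notebook's exact prompt or values:

```python
from vertexai.preview.evaluation import EvalTask
from vertexai.preview.evaluation.metrics import (
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

# Criterion: does the final response logically follow from the tool calls the agent made?
criteria = {
    "follows_trajectory": (
        "Evaluate whether the agent's response logically follows from the sequence "
        "of tool calls it made, and reflects the information those calls returned."
    )
}

# Binary rubric: 1 = follows the trajectory, 0 = does not.
rating_rubric = {
    "1": "Follows trajectory",
    "0": "Does not follow trajectory",
}

response_follows_trajectory_prompt_template = PointwiseMetricPromptTemplate(
    criteria=criteria,
    rating_rubric=rating_rubric,
    input_variables=["prompt", "predicted_trajectory"],
)

response_follows_trajectory_metric = PointwiseMetric(
    metric="response_follows_trajectory",
    metric_prompt_template=response_follows_trajectory_prompt_template,
)

# Combined response and trajectory metrics for the final run.
response_tool_metrics = [
    "trajectory_exact_match",
    "trajectory_in_order_match",
    "safety",
    response_follows_trajectory_metric,
]

combined_eval_task = EvalTask(
    dataset=eval_sample_dataset,      # same dataset as the earlier sketches
    metrics=response_tool_metrics,
    experiment="evaluate-adk-agent",  # placeholder
)

combined_eval_result = combined_eval_task.evaluate(
    runnable=agent_parsed_outcome,
    experiment_run_name="combined-metrics-run-1",  # placeholder
)
```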
You have successfully evaluated an ADK agent using Vertex AI Generative AI Evaluation. You built and executed the agent locally, prepared a custom evaluation dataset, and assessed the agent's tool usage, action trajectory, and final response quality using built-in evaluation tools.
You evaluated the agent using the following components:
- An ADK agent built with the Gemini model and custom product tools
- A custom evaluation dataset of prompts and reference trajectories
- Trajectory metrics, response metrics, and a custom response_follows_trajectory metric, run through EvalTask
Manual Last Updated September 01, 2025
Lab Last Updated September 01, 2025
Copyright 2023 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
This lab tests your ability to build and run an ADK agent locally, create evaluation datasets, and assess the agent's performance using Vertex AI's Evaluation Service with explainable metrics and benchmarking methods.
Duration: 5m setup · 90m access · 90m completion
Levels: intermediate
Permalink: https://partner.cloudskillsboost.google/catalog_lab/32138