WorkArena
workarena · v1.0.0
WorkArena evaluates agents on enterprise service-desk workflows inside a real ServiceNow Personal Developer Instance. Tasks are organized into three levels: L1 atomic tasks (~33 unique tasks x multiple seeds), L2 compositional tasks, and L3 extended tasks with company protocols. Requires a free ServiceNow PDI and browser-automation tooling.
Install
pip install workarena-cube
Version: 1.0.0
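After installing, it can be useful to confirm that the environment actually holds the version listed here. The following is a minimal sketch using only the standard library; the distribution name `workarena-cube` and the version `1.0.0` come from this page, while the helper names `installed_version` and `matches_expected` are hypothetical and not part of the package.

```python
from importlib import metadata

# Version listed on this page; adjust if you intend to pin a different release.
EXPECTED_VERSION = "1.0.0"


def installed_version(dist_name: str = "workarena-cube"):
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(dist_name: str = "workarena-cube",
                     expected: str = EXPECTED_VERSION) -> bool:
    """True only when the distribution is installed at exactly `expected`."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("workarena-cube is not installed")
    else:
        print(f"workarena-cube {found} (expected {EXPECTED_VERSION})")
```

Recording the installed version alongside any reported score makes it easier to tell package drift apart from infrastructure drift.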
Feature Flags
Legal
Notices
Requires a free ServiceNow Personal Developer Instance (PDI). Register at developer.servicenow.com
Reproducibility journal
This is a reproducibility journal, not a leaderboard.
Submissions document how reference agents and models score over time, across infrastructures, cube versions, and package versions. Use the journal to detect drift and validate environments. It is not a place to publish a new agent or fine-tune to "win": there is no ranking, scores are self-reported, and submissions are unverified. To showcase a new agent or model, use ATLAS / EEE / your own benchmark page.
No submissions yet. Be the first — see how to submit.
Parallelization
task-parallel
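This page does not define "task-parallel". One plausible reading, assumed here, is that individual tasks are independent and can be run concurrently. The sketch below illustrates that pattern with the standard library; `run_task` is a hypothetical placeholder and does not call any WorkArena or cube-harness API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id: str) -> dict:
    """Hypothetical stand-in for running one task episode.

    A real runner would drive a browser against the ServiceNow instance;
    this placeholder only returns a record so the pattern is runnable.
    """
    return {"task_id": task_id, "done": True}


def run_task_parallel(task_ids, max_workers: int = 4) -> list:
    """Run independent tasks concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `task_ids`,
        # regardless of which task finishes first.
        return list(pool.map(run_task, task_ids))


if __name__ == "__main__":
    for record in run_task_parallel(["task-a", "task-b", "task-c"], max_workers=2):
        print(record)
```

Threads suit this sketch because a browser-driven task would spend most of its time waiting on I/O; whether a single PDI tolerates several concurrent sessions is not stated on this page.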
Registry Entry (YAML)
id: workarena
name: "WorkArena"
version: "1.0.0"
description: >
  WorkArena evaluates agents on enterprise service-desk workflows inside a
  real ServiceNow Personal Developer Instance. Tasks are organized into three
  levels: L1 atomic tasks (~33 unique tasks x multiple seeds), L2
  compositional tasks, and L3 extended tasks with company protocols. Requires
  a free ServiceNow PDI and browser-automation tooling.
package: workarena-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/workarena"
authors:
  - github: younik
    name: Omar Younis
  - github: NicolasAG
    name: Nicolas Gontier
legal:
  wrapper_license: MIT
  benchmark_license:
    reported: Apache-2.0
    source_url: "https://github.com/ServiceNow/WorkArena/blob/main/LICENSE"
    verified_by_original_authors: false
  notices:
    - type: software_registration
      description: "Requires a free ServiceNow Personal Developer Instance (PDI). Register at developer.servicenow.com"
      url: "https://developer.servicenow.com"
tags:
  - web
  - gui
paper: "https://arxiv.org/abs/2402.05181"
parallelization_mode: task-parallel
status: degraded
task_count: 333
has_debug_task: true
has_debug_agent: true
resources: []
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false
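As a rough illustration of how a consumer might read this entry, the sketch below works on a subset of it written out as the dictionary a YAML parser would produce. The field names and values are copied from the entry above; the nesting of `features` as a mapping is an assumption, and `enabled_features` and `summarize` are hypothetical helpers, not part of any registry tooling.

```python
# Subset of the registry entry, as a parsed dictionary.
# Assumption: `features` is a nested mapping of flag name -> bool.
ENTRY = {
    "id": "workarena",
    "version": "1.0.0",
    "package": "workarena-cube",
    "parallelization_mode": "task-parallel",
    "status": "degraded",
    "task_count": 333,
    "features": {
        "async": False,
        "streaming": False,
        "multi_agent": False,
        "multi_dim_reward": False,
    },
}


def enabled_features(entry: dict) -> list:
    """Return the names of feature flags set to true, sorted alphabetically."""
    return sorted(name for name, on in entry.get("features", {}).items() if on)


def summarize(entry: dict) -> str:
    """Build a one-line summary of the entry for logs or reports."""
    flags = ", ".join(enabled_features(entry)) or "none"
    return (
        f"{entry['id']} v{entry['version']} ({entry['status']}): "
        f"{entry['task_count']} tasks, features: {flags}"
    )


if __name__ == "__main__":
    print(summarize(ENTRY))
```

With every flag set to false, the summary reports no enabled features, which is consistent with the empty Feature Flags section on this page.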