WorkArena
workarena · v1.0.0
Tags: web, gui
WorkArena evaluates agents on enterprise service-desk workflows inside a real ServiceNow Personal Developer Instance. Tasks are organized into three levels: L1 atomic tasks (~33 unique tasks x multiple seeds), L2 compositional tasks, and L3 extended tasks with company protocols. Requires a free ServiceNow PDI and browser-automation tooling.
Install
pip install workarena-cube
Version: 1.0.0 · PyPI page
Tasks: 333
Infra: local
Debug Task: Yes
Debug Agent: Yes
Feature Flags
async: No
streaming: No
multi_agent: No
multi_dim_reward: No
Legal
Wrapper license: MIT
Benchmark license: Apache-2.0 (self-reported, not verified by the original authors)
Notices
Software Registration: Requires a free ServiceNow Personal Developer Instance (PDI). Register at https://developer.servicenow.com
License information is self-reported by the cube developer and has not been verified by the AI Alliance.
Always consult the source URL and original benchmark authors for authoritative terms.
Parallelization
Mode: task-parallel
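As a rough illustration of what a task-parallel mode implies, the sketch below runs independent tasks concurrently and collects one result per task. It assumes that "task-parallel" means tasks do not depend on each other; `run_task`, `run_task_parallel`, the task ids, and the result shape are hypothetical names for illustration and are not part of the workarena-cube API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id: str) -> dict:
    """Hypothetical stand-in for one independent task episode."""
    # A real runner would drive a browser against the ServiceNow instance here.
    return {"task_id": task_id, "reward": 0.0}


def run_task_parallel(task_ids: list[str], max_workers: int = 4) -> list[dict]:
    """Run independent tasks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of task_ids in its output.
        return list(pool.map(run_task, task_ids))


if __name__ == "__main__":
    results = run_task_parallel([f"task-{i}" for i in range(8)])
    print(len(results))
```

Threads fit here because browser-driven tasks spend most of their time waiting on I/O. A process pool or separate machines would follow the same pattern.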
Registry Entry (YAML)
id: workarena
name: "WorkArena"
version: "1.0.0"
description: >
  WorkArena evaluates agents on enterprise service-desk workflows inside a
  real ServiceNow Personal Developer Instance. Tasks are organized into three
  levels: L1 atomic tasks (~33 unique tasks x multiple seeds), L2
  compositional tasks, and L3 extended tasks with company protocols. Requires
  a free ServiceNow PDI and browser-automation tooling.
package: workarena-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/workarena"
authors:
  - github: younik
    name: Omar Younis
  - github: NicolasAG
    name: Nicolas Gontier
legal:
  wrapper_license: MIT
  benchmark_license:
    reported: Apache-2.0
    source_url: "https://github.com/ServiceNow/WorkArena/blob/main/LICENSE"
    verified_by_original_authors: false
  notices:
    - type: software_registration
      description: "Requires a free ServiceNow Personal Developer Instance (PDI). Register at developer.servicenow.com"
      url: "https://developer.servicenow.com"
tags:
  - web
  - gui
paper: "https://arxiv.org/abs/2402.05181"
parallelization_mode: task-parallel
status: active
task_count: 333
has_debug_task: true
has_debug_agent: true
resources: []
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false
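A registry entry like the one above can be sanity-checked before submission. The sketch below is a minimal example under stated assumptions: the entry is copied into a Python dict (a subset of fields, to avoid a YAML parser dependency), and the rules in the hypothetical `check_entry` helper are illustrative guesses, not the registry's actual schema.

```python
FEATURE_FLAGS = ("async", "streaming", "multi_agent", "multi_dim_reward")

# Subset of the registry entry above, written as a Python dict.
ENTRY = {
    "id": "workarena",
    "name": "WorkArena",
    "version": "1.0.0",
    "package": "workarena-cube",
    "parallelization_mode": "task-parallel",
    "task_count": 333,
    "features": {flag: False for flag in FEATURE_FLAGS},
}


def check_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Assumed rule: identifying fields are non-empty strings.
    for key in ("id", "name", "version", "package"):
        value = entry.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key}: expected a non-empty string")
    # Assumed rule: task_count is a positive integer (bool is rejected).
    count = entry.get("task_count")
    if isinstance(count, bool) or not isinstance(count, int) or count <= 0:
        problems.append("task_count: expected a positive integer")
    # Assumed rule: every feature flag is present and boolean.
    features = entry.get("features") or {}
    for flag in FEATURE_FLAGS:
        if not isinstance(features.get(flag), bool):
            problems.append(f"features.{flag}: expected true or false")
    return problems


if __name__ == "__main__":
    print(check_entry(ENTRY))
```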