
OSWorld

osworld · v0.2.0

Paper: https://arxiv.org/abs/2404.07972
Tags: os, gui, desktop, multimodal

OSWorld benchmarks multimodal agents on open-ended computer tasks executed inside a real Ubuntu 22.04 desktop environment. The original benchmark defines 369 tasks across applications such as Chrome, LibreOffice, Thunderbird, and the OS shell; this cube registers 368 of them. Each task launches a fresh VM from a qcow2 image, restores a snapshot, runs setup scripts, and then evaluates the final desktop state.
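The per-task lifecycle described above can be sketched as follows. This is a minimal illustration, not the osworld-cube API: `FakeVM`, `Task`, `run_task`, and the snapshot name `init_state` are all hypothetical, and the VM only records each step instead of booting anything.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeVM:
    """Hypothetical stand-in for a desktop VM: records each step instead of booting."""
    image: str
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.log.append(f"boot:{self.image}")

    def restore_snapshot(self, name: str) -> None:
        self.log.append(f"restore:{name}")

    def run_script(self, script: str) -> None:
        self.log.append(f"setup:{script}")


@dataclass
class Task:
    task_id: str
    setup_scripts: List[str]
    evaluate: Callable[[FakeVM], float]  # inspects the final state, returns a reward


def run_task(task: Task, image: str, agent: Callable[[FakeVM], None]) -> float:
    vm = FakeVM(image)                  # 1. fresh VM from the disk image
    vm.restore_snapshot("init_state")   # 2. restore a snapshot (name is hypothetical)
    for script in task.setup_scripts:   # 3. run the task's setup scripts
        vm.run_script(script)
    agent(vm)                           # the agent interacts with the desktop
    return task.evaluate(vm)            # 4. score the final desktop state
```

Because every task starts from a fresh VM, no state leaks between tasks; the cost is a full boot and restore per task.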

By: @kushasareen (Kusha Sareen), @amanjaiswal73892 (Aman Jaiswal), @recursix (Alexandre Lacoste)

Install

pip install osworld-cube

Version: 0.2.0
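To confirm that the installed package matches the version listed here, the standard-library `importlib.metadata.version` call is enough. The helper names below (`installed_version`, `install_status`) are illustrative only, and the sketch assumes the distribution is installed under the name `osworld-cube`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

REGISTRY_VERSION = "0.2.0"  # version listed on this registry page


def installed_version(dist: str = "osworld-cube") -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def install_status(dist: str = "osworld-cube") -> str:
    found = installed_version(dist)
    if found is None:
        return f"{dist} not installed; run: pip install {dist}"
    if found != REGISTRY_VERSION:
        return f"{dist} {found} installed; registry lists {REGISTRY_VERSION}"
    return f"{dist} {found} matches the registry"


if __name__ == "__main__":
    print(install_status())
```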

Tasks: 368
Infra: aws
Debug task: Yes
Debug agent: Yes

Feature Flags

async: no
streaming: no
multi_agent: no
multi_dim_reward: no

Resources

Type: VMResourceConfig
Name: osworld-ubuntu-vm
Image URL, format, size, RAM, GPU: not listed

Legal

Wrapper license: MIT
Benchmark license: CC-BY-4.0 (self-reported; verify before commercial use)
Source: https://creativecommons.org/licenses/by/4.0/legalcode

Notices

Software Registration

Ubuntu desktop with pre-installed third-party applications, including LibreOffice and Thunderbird, among others.

License information is self-reported by the cube developer and has not been verified by the AI Alliance. Always consult the source URL and original benchmark authors for authoritative terms.

Parallelization

Mode: benchmark-parallel
Max concurrent tasks: 1
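One way a harness could honor the task cap is a worker pool whose size equals `max_concurrent_tasks`. This sketch assumes (the page does not say) that "benchmark-parallel" means parallelism happens across separate benchmark instances, so a single instance never runs more than `max_concurrent_tasks` tasks at once. `run_benchmark` is a hypothetical helper, and `time.sleep` stands in for a VM-backed episode. Threads are used rather than asyncio because the cube's `async` flag is false.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

MAX_CONCURRENT_TASKS = 1  # value from the registry entry


def run_benchmark(task_ids: List[str],
                  max_concurrent: int = MAX_CONCURRENT_TASKS) -> Tuple[Dict[str, float], int]:
    """Run tasks with at most `max_concurrent` in flight; return rewards and peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def run_one(task_id: str) -> Tuple[str, float]:
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            time.sleep(0.01)  # placeholder for one VM-backed episode
            return task_id, 0.0  # placeholder reward
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the concurrency cap: extra tasks queue until a worker frees up.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = dict(pool.map(run_one, task_ids))
    return results, state["peak"]
```

With the registry value of 1, tasks run strictly one after another, which keeps a single host from having to run several KVM guests at once.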

Registry Entry (YAML)

id: osworld
name: "OSWorld"
version: "0.2.0"
description: >
  OSWorld benchmarks multimodal agents on open-ended computer tasks executed
  inside a real Ubuntu 22.04 desktop environment. Tasks span 369 scenarios
  across applications such as Chrome, LibreOffice, Thunderbird, and the OS
  shell. Each task launches a fresh VM from a qcow2 image, restores a
  snapshot, runs setup scripts, and evaluates the final desktop state.
package: osworld-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/osworld-cube"

authors:
- github: kushasareen
  name: Kusha Sareen
- github: amanjaiswal73892
  name: Aman Jaiswal
- github: recursix
  name: Alexandre Lacoste

legal:
  wrapper_license: MIT
  benchmark_license:
    reported: CC-BY-4.0
    source_url: "https://creativecommons.org/licenses/by/4.0/legalcode"
    verified_by_original_authors: false
  notices:
  - type: software_registration
    description: "Ubuntu desktop with pre-installed commercial applications including LibreOffice, Thunderbird, and others"

tags:
- os
- gui
- desktop
- multimodal

paper: "https://arxiv.org/abs/2404.07972"
supported_infra: [aws]
max_concurrent_tasks: 1
parallelization_mode: benchmark-parallel

resources:
- name: osworld-ubuntu-vm
  scope: task
  max_concurrent_agents:
  source_url: https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/Ubuntu.qcow2.zip
  source_hash:
  default_ttl_seconds: 86400
  bootstrap_script_extra:
  requires_kvm: true
  _type: cube.resource.VMResourceConfig
  type: VMResourceConfig
status: active
task_count: 368
has_debug_task: true
has_debug_agent: true
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false
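The `resources` block above carries three operational settings: a 24-hour TTL (`default_ttl_seconds: 86400`), a KVM requirement, and an empty `source_hash`. The sketch below shows how a harness might act on them. `VMResourceSketch` is a hypothetical mirror of a few fields, not the real `cube.resource.VMResourceConfig`, and it assumes `source_hash` would hold a hex SHA-256 digest if it were filled in.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class VMResourceSketch:
    """Hypothetical mirror of a few fields from the resources entry above."""
    name: str
    source_url: str
    default_ttl_seconds: int = 86400
    requires_kvm: bool = True
    source_hash: Optional[str] = None  # empty in the registry entry

    def is_expired(self, created_at: float, now: float) -> bool:
        # A VM older than the TTL should be torn down and recreated.
        return now - created_at >= self.default_ttl_seconds

    def host_supports(self) -> bool:
        # On Linux hosts, KVM is exposed through the /dev/kvm device node.
        return not self.requires_kvm or os.path.exists("/dev/kvm")

    def verify_download(self, path: str) -> Optional[bool]:
        # Assumes a hex SHA-256 digest; returns None when no hash is published.
        if not self.source_hash:
            return None
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest() == self.source_hash.lower()
```

Since `source_hash` is empty for `osworld-ubuntu-vm`, the downloaded `Ubuntu.qcow2.zip` cannot be integrity-checked this way until a digest is published.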