OSWorld
osworld · v0.2.0
OSWorld benchmarks multimodal agents on open-ended computer tasks executed inside a real Ubuntu 22.04 desktop environment. Tasks span 369 scenarios across applications such as Chrome, LibreOffice, Thunderbird, and the OS shell. Each task launches a fresh VM from a qcow2 image, restores a snapshot, runs setup scripts, and evaluates the final desktop state.
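The per-task lifecycle described above can be pictured as a fixed sequence of phases. The following is only an illustrative sketch: `PHASES`, `run_task`, and the hook names are hypothetical and are not part of the `osworld-cube` API, and the `run_agent` phase is implied by the description rather than stated in it.

```python
from typing import Callable, Dict

# Hypothetical phase names mirroring the description: fresh VM from the
# qcow2 image, snapshot restore, setup scripts, then evaluation of the
# final desktop state. "run_agent" is an assumed step between setup and
# evaluation.
PHASES = ["launch_vm", "restore_snapshot", "run_setup", "run_agent", "evaluate"]


def run_task(hooks: Dict[str, Callable[[], object]]) -> object:
    """Call one hook per phase, in order, and return the evaluate() result."""
    result = None
    for phase in PHASES:
        result = hooks[phase]()
    return result
```

Because every task starts from a fresh VM and a restored snapshot, no state from one task should leak into the next.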
By: @kushasareen (Kusha Sareen), @amanjaiswal73892 (Aman Jaiswal), @recursix (Alexandre Lacoste)
Install
pip install osworld-cube
Version: 0.2.0 · PyPI page
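After installing, one way to confirm which version of the distribution is present is to query the package metadata from Python. This is a minimal sketch that uses only the standard library; `installed_version` is a helper name chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "osworld-cube"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```

If the install succeeded, the printed value should match the version listed on this page.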
Feature Flags
Resources
| Type | Name | Image URL | Format | Size | RAM | GPU |
|---|---|---|---|---|---|---|
| VMResourceConfig | osworld-ubuntu-vm | — | — | — | — | — |
Legal
Notices
Ubuntu desktop with pre-installed commercial applications including LibreOffice, Thunderbird, and others
Parallelization
Mode: benchmark-parallel
Max concurrent tasks: 1
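A concurrency cap of 1 means tasks run strictly one at a time. The sketch below shows one generic way to enforce such a cap with a bounded worker pool. It illustrates the limit only: `run_capped` and `run_one` are hypothetical names, and nothing here reflects how the harness itself implements `benchmark-parallel`.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_TASKS = 1  # value listed for this benchmark


def run_capped(task_ids, run_one, limit=MAX_CONCURRENT_TASKS):
    """Run run_one(task_id) for each id, with at most `limit` in flight."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(run_one, task_ids))
```

With `limit=1` this degenerates to sequential execution, which is the behavior the listed cap implies.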
Registry Entry (YAML)
id: osworld
name: "OSWorld"
version: "0.2.0"
description: >
  OSWorld benchmarks multimodal agents on open-ended computer tasks executed
  inside a real Ubuntu 22.04 desktop environment. Tasks span 369 scenarios
  across applications such as Chrome, LibreOffice, Thunderbird, and the OS
  shell. Each task launches a fresh VM from a qcow2 image, restores a
  snapshot, runs setup scripts, and evaluates the final desktop state.
package: osworld-cube
dev_install_url: "git+https://github.com/The-AI-Alliance/cube-harness#subdirectory=cubes/osworld-cube"
authors:
  - github: kushasareen
    name: Kusha Sareen
  - github: amanjaiswal73892
    name: Aman Jaiswal
  - github: recursix
    name: Alexandre Lacoste
legal:
  wrapper_license: MIT
  benchmark_license:
    reported: CC-BY-4.0
    source_url: "https://creativecommons.org/licenses/by/4.0/legalcode"
    verified_by_original_authors: false
  notices:
    - type: software_registration
      description: "Ubuntu desktop with pre-installed commercial applications including LibreOffice, Thunderbird, and others"
tags:
  - os
  - gui
  - desktop
  - multimodal
paper: "https://arxiv.org/abs/2404.07972"
supported_infra: [aws]
max_concurrent_tasks: 1
parallelization_mode: benchmark-parallel
resources:
  - name: osworld-ubuntu-vm
    scope: task
    max_concurrent_agents:
    source_url: https://huggingface.co/datasets/xlangai/ubuntu_osworld/resolve/main/Ubuntu.qcow2.zip
    source_hash:
    default_ttl_seconds: 86400
    bootstrap_script_extra:
    requires_kvm: true
    _type: cube.resource.VMResourceConfig
    type: VMResourceConfig
status: active
task_count: 368
has_debug_task: true
has_debug_agent: true
action_space: []
features:
  async: false
  streaming: false
  multi_agent: false
  multi_dim_reward: false
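Two of the resource fields in the entry above translate directly into host-side checks: `requires_kvm: true` and `default_ttl_seconds: 86400`. The sketch below is an assumption-laden illustration, not the harness's own logic: `kvm_available` is a hypothetical helper, and testing for a readable and writable `/dev/kvm` device is a common Linux convention rather than something the registry entry specifies.

```python
import os
from datetime import timedelta

DEFAULT_TTL_SECONDS = 86400  # default_ttl_seconds from the registry entry


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and is readable/writable."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


# 86400 seconds is exactly 24 hours.
ttl = timedelta(seconds=DEFAULT_TTL_SECONDS)

print(f"KVM usable: {kvm_available()}")
print(f"default TTL: {ttl}")  # prints "default TTL: 1 day, 0:00:00"
```

A check like this can fail fast on a host without hardware virtualization, before the VM image is downloaded.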