Screenshot of the COWPILOT evaluation result page. After each task is completed, the evaluation metric values are shown as a summary.

Source publication
Preprint
While much work on web agents emphasizes the promise of autonomously performing tasks on behalf of users, in reality, agents often fall short on complex tasks in real-world contexts and in modeling user preferences. This presents an opportunity for humans to collaborate with the agent and leverage its capabilities effectively. We propose CowPil...

Contexts in source publication

Context 1
... the human agent can choose to reject or pause the action (Figure 2, 3) and take over. They can also hand control back to the LLM-based agent by hitting the resume button (Figure 2, 4). This takeover-then-back process can be repeated any number of times per task-solving session. ...
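The takeover-then-back protocol described in this excerpt behaves like a two-state controller: the agent acts until the human rejects or pauses an action, the human acts until they press resume, and the cycle may repeat without limit. The following is a minimal sketch under that reading only; the `Session` class, the `Mode` enum, and the `accepted` flag are hypothetical names for illustration, not COWPILOT's actual implementation.

```python
from enum import Enum, auto


class Mode(Enum):
    AGENT = auto()  # the LLM-based agent proposes and executes actions
    HUMAN = auto()  # the human has taken over control


class Session:
    """Hypothetical model of one takeover-then-back task-solving session."""

    def __init__(self):
        self.mode = Mode.AGENT
        self.log = []        # (actor, action) pairs in execution order
        self.takeovers = 0   # how many times the human took control

    def agent_step(self, action, accepted=True):
        if self.mode is not Mode.AGENT:
            raise RuntimeError("agent is paused while the human is in control")
        if accepted:
            self.log.append(("agent", action))
        else:
            # Reject or pause: the proposed action is discarded and the human takes over.
            self.mode = Mode.HUMAN
            self.takeovers += 1

    def human_step(self, action):
        if self.mode is not Mode.HUMAN:
            raise RuntimeError("human must take over before acting")
        self.log.append(("human", action))

    def resume(self):
        # Hand control back to the agent; nothing limits how often this cycle repeats.
        if self.mode is not Mode.HUMAN:
            raise RuntimeError("agent is already in control")
        self.mode = Mode.AGENT


s = Session()
s.agent_step("click #search")
s.agent_step("type 'laptops'", accepted=False)  # human rejects and takes over
s.human_step("type 'gaming laptops'")
s.resume()
s.agent_step("press Enter")
print(s.takeovers, [actor for actor, _ in s.log])
# → 1 ['agent', 'human', 'agent']
```

Keeping the rejected proposal out of the log is a design choice of this sketch; a real system might record it separately to study where humans intervene.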
Context 2
... this approach isolates browsing sessions, restricts multi-tab navigation, and diverges from standard workflows, which limits practical usability. Chrome extensions, as adopted by tools like WebCanvas (Pan et al., 2024b), WebOlympus (Zheng et al., 2024b), OpenWebAgent (Iong et al., 2024), and Taxy (TaxyAI, 2024), present a more user-friendly alternative. They are easy to install, lightweight, and integrate seamlessly into standard browsing environments, making them accessible to end-users. ...
Context 3
... work can focus on addressing such safety risks and improving transparency, including developing robust safeguards to prevent unintended actions. Figure 4 shows a screenshot of evaluation results by COWPILOT. After the task is completed, a summary is shown containing the metric values described in Subsection 2.2. ...