CLAUDE LAB
Claude.ai · 2026-04-20 · Advanced

Claude Computer Use 2026 — Desktop Automation Across Browser, Desktop, and CLI

A practical guide to Claude Computer Use. It covers the latest setup following general availability on macOS, browser and desktop automation patterns, and strategies for production deployment.

Claude Computer Use · Desktop Automation · Browser Automation · Claude API · RPA


After about six months of running Claude Computer Use in production, my initial skepticism ("will this actually work?") has given way to a more interesting question: "how much can I delegate to it?" With general availability on macOS now in place and real-world adoption growing, I want to lay out the full picture of this capability and how to use it effectively.

What Computer Use Actually Is — The Screenshot Loop

The fundamental difference between Computer Use and other AI features is that Claude autonomously runs a "see → decide → act" loop. Internally, it works like this:

1. Take a screenshot.
2. Claude looks at the image and decides where to click.
3. Execute the specified click, keystroke, or scroll at the given coordinates.
4. Take another screenshot to confirm the result.
5. Move to the next action.
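
The steps above can be sketched as a small, model-agnostic loop. This is only an illustration of the control flow, not Anthropic's implementation: `run_see_decide_act` and its three callables (`observe`, `decide`, `act`) are hypothetical names, and the convention that `decide` returns `None` when the task is finished is an assumption made for the sketch.

```python
from typing import Callable, Optional


def run_see_decide_act(
    observe: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> int:
    """Run the loop until decide() returns None or max_steps is reached.

    Returns the number of actions that were executed.
    """
    steps = 0
    while steps < max_steps:
        observation = observe()        # see: capture the current screen
        action = decide(observation)   # decide: the model picks the next action
        if action is None:             # assumed "task finished" signal
            break
        act(action)                    # act: click, type, or scroll
        steps += 1
    return steps


# Usage with stand-ins for the screen and the model:
state = {"clicks": 0}
steps = run_see_decide_act(
    observe=lambda: f"clicks={state['clicks']}",
    decide=lambda obs: None if obs == "clicks=3" else "left_click",
    act=lambda action: state.update(clicks=state["clicks"] + 1),
)
```

The `max_steps` cap matters in practice: every pass through the loop costs a screenshot and a model call, so an upper bound keeps a confused run from continuing indefinitely.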

The elegant part of this design is that it does not use a dedicated vision model trained on UI analysis. It uses Claude's general-purpose multimodal understanding — which means it can handle unfamiliar interfaces flexibly. The tradeoff is that pixel-coordinate precision has limits, making errors more likely with dense UIs or dynamic content.

Understanding this characteristic before you start is what separates successful production deployments from frustrating experiments.

Setup — From API Key to First Working Task

What you need

pip install anthropic pillow

Use claude-opus-4-6 or claude-sonnet-4-6. Computer Use is also available through Bedrock and Vertex AI, but new features reach the direct Anthropic API first.

Minimal working code

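import anthropic
import base64
import io
from PIL import ImageGrab
 
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
 
def take_screenshot():
    """Capture the screen, downscale it, and return base64-encoded PNG data."""
    screenshot = ImageGrab.grab()
    # Halve the resolution to reduce token cost. Claude's click coordinates
    # will refer to this downscaled image, so they only line up with the
    # screen if your click tool addresses the same coordinate space.
    screenshot = screenshot.resize(
        (screenshot.width // 2, screenshot.height // 2)
    )
    # Encode in memory instead of round-tripping through a file in /tmp
    buffer = io.BytesIO()
    screenshot.save(buffer, format="PNG")
    return base64.standard_b64encode(buffer.getvalue()).decode("utf-8")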
 
def run_computer_use_task(task: str):
    screenshot_b64 = take_screenshot()
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
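        # display_width_px / display_height_px should match the size of the
        # screenshots you send (1280x800 here is an example, not a default).
        # The tool type is version-specific and computer use has shipped as a
        # beta tool, so confirm the type and any required beta flag for your
        # model in Anthropic's computer use documentation.
        tools=[
            {
                "type": "computer_20241022",
                "name": "computer",
                "display_width_px": 1280,
                "display_height_px": 800,
            }
        ],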
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": screenshot_b64,
                        },
                    },
                    {"type": "text", "text": task}
                ],
            }
        ],
    )
    return response
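
Because take_screenshot halves the capture, the coordinates Claude returns refer to the downscaled image. If your click tool addresses the screen in a different coordinate space (full-resolution pixels, for example), the point has to be mapped back before clicking. The helper below is a minimal sketch of that mapping; `to_screen_coords` is a hypothetical name, and both sizes are values you would measure on your own machine.

```python
def to_screen_coords(x, y, image_size, screen_size):
    """Map a point from screenshot space to screen space.

    image_size:  (width, height) of the image that was sent to Claude
    screen_size: (width, height) of the space the click tool uses
    """
    image_w, image_h = image_size
    screen_w, screen_h = screen_size
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image_size must be positive")
    scaled_x = round(x * screen_w / image_w)
    scaled_y = round(y * screen_h / image_h)
    # Clamp so a point on the far edge never lands outside the screen
    return (
        min(max(scaled_x, 0), screen_w - 1),
        min(max(scaled_y, 0), screen_h - 1),
    )


# A click at the centre of a 1280x800 screenshot of a 2560x1600 screen:
center = to_screen_coords(640, 400, (1280, 800), (2560, 1600))
```

When the two spaces already match, the function returns the point unchanged, so it is safe to leave in the path unconditionally.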

The Action Loop

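The heart of Computer Use is the loop: when Claude returns a tool_use block, you execute that action, then send the resulting screenshot back to Claude. The macOS examples here shell out to cliclick, a command-line mouse and keyboard tool that must be installed separately (it is available through Homebrew).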

import subprocess
import time
 
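def execute_action(action: dict) -> str:
    """Run one computer-tool action via cliclick (macOS) and return a fresh screenshot."""
    action_type = action.get("action")
    
    if action_type == "screenshot":
        return take_screenshot()
    
    elif action_type == "left_click":
        x, y = action["coordinate"]
        subprocess.run(["cliclick", f"c:{x},{y}"])  # macOS
        time.sleep(0.5)  # give the UI time to react before capturing
        return take_screenshot()
    
    elif action_type == "type":
        text = action["text"]
        subprocess.run(["cliclick", f"t:{text}"])
        time.sleep(0.3)
        return take_screenshot()
    
    elif action_type == "key":
        # The computer tool passes the key name in "text" (for example
        # "Return"), not in a "key" field. cliclick's kp: command expects its
        # own lowercase names such as "return" or "tab", so some keys and all
        # key combinations need more mapping than this lowercase conversion.
        key = action["text"].lower()
        subprocess.run(["cliclick", f"kp:{key}"])
        time.sleep(0.3)
        return take_screenshot()
    
    # Unhandled actions (for example mouse_move or right_click) fall through
    # to a plain screenshot so Claude can see that nothing changed.
    return take_screenshot()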
 
def run_task_with_loop(task: str, max_iterations: int = 20):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": take_screenshot(),
                    },
                },
                {"type": "text", "text": task}
            ],
        }
    ]
    
    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=[{
                "type": "computer_20241022",
                "name": "computer",
                "display_width_px": 1280,
                "display_height_px": 800,
            }],
            messages=messages,
        )
        
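        if response.stop_reason == "end_turn":
            print("Task complete")
            break
        
        # Execute every computer action Claude requested in this turn
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                new_screenshot = execute_action(block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": new_screenshot,
                            },
                        }
                    ],
                })
        
        if not tool_results:
            # Nothing to answer (for example stop_reason "max_tokens"), so stop
            # here rather than sending a user message with empty content.
            print(f"Stopped without a tool call: {response.stop_reason}")
            break
        
        # Keep the conversation alternating: assistant turn, then tool results
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})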
    
    return response
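
execute_action above runs whatever action arrives. Given the coordinate-precision limits described earlier, one inexpensive safeguard is to check each action before executing it. The following is a sketch under stated assumptions: `validate_action` and `KNOWN_ACTIONS` are hypothetical names, the set only lists the actions that execute_action handles, and the bounds are the display size declared in the tool definition.

```python
# Actions handled by execute_action in this article (assumption: extend as needed)
KNOWN_ACTIONS = {"screenshot", "left_click", "type", "key"}


def validate_action(action: dict, width: int, height: int):
    """Return (ok, reason) for a computer-tool action dict."""
    action_type = action.get("action")
    if action_type not in KNOWN_ACTIONS:
        return False, f"unsupported action: {action_type!r}"
    if action_type == "left_click":
        coordinate = action.get("coordinate")
        if not (isinstance(coordinate, (list, tuple)) and len(coordinate) == 2):
            return False, "left_click needs an [x, y] coordinate"
        x, y = coordinate
        # Reject clicks outside the display size declared to the tool
        if not (0 <= x < width and 0 <= y < height):
            return False, f"coordinate {x},{y} is outside {width}x{height}"
    if action_type == "type" and not action.get("text"):
        return False, "type needs non-empty text"
    return True, "ok"


ok, reason = validate_action(
    {"action": "left_click", "coordinate": [1500, 20]}, 1280, 800
)
```

When validation fails, one option is to skip the action and send the reason back in the tool_result (the API accepts an is_error flag on tool results) so that Claude can correct itself on the next turn.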
