After about six months of running Claude Computer Use in production, my initial skepticism ("will this actually work?") has given way to a more interesting question: "how much can I delegate to it?" With general availability on macOS now in place and real-world adoption growing, I want to lay out the full picture of this capability and how to use it effectively.
What Computer Use Actually Is — The Screenshot Loop
The fundamental difference between Computer Use and other AI features is that Claude autonomously runs a "see → decide → act" loop. Internally, it works like this:
Take a screenshot → Claude looks at the image and decides where to click → Execute the specified coordinate click, type, or scroll → Take another screenshot to confirm the result → Move to the next action
The elegant part of this design is that it does not use a dedicated vision model trained on UI analysis. It uses Claude's general-purpose multimodal understanding — which means it can handle unfamiliar interfaces flexibly. The tradeoff is that pixel-coordinate precision has limits, making errors more likely with dense UIs or dynamic content.
Understanding this characteristic before you start is what separates successful production deployments from frustrating experiments.
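One practical consequence of working in pixel coordinates: if the screenshot you send is smaller than the real screen, the coordinates Claude returns refer to the image, not the screen, and need to be mapped back before you click. Here is a minimal sketch of that mapping. The helper name `to_screen_coords` is hypothetical, and it assumes you know both the screenshot size and the size of the coordinate space your click tool uses.

```python
def to_screen_coords(point, image_size, screen_size):
    """Map an (x, y) point from screenshot space to screen space.

    Hypothetical helper for illustration: image_size is the size of the
    screenshot sent to Claude, screen_size is the coordinate space used
    by whatever tool performs the click.
    """
    x, y = point
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image_size must be positive")
    # Scale each axis independently, then round to whole pixels.
    sx = round(x * scr_w / img_w)
    sy = round(y * scr_h / img_h)
    # Clamp so a point on the image edge never lands off-screen.
    return (min(max(sx, 0), scr_w - 1), min(max(sy, 0), scr_h - 1))


# A click at the center of a 1280x800 screenshot of a 2560x1600 screen.
print(to_screen_coords((640, 400), (1280, 800), (2560, 1600)))  # (1280, 800)
```

If the screenshot and the click tool already share one coordinate space, the mapping is the identity apart from the edge clamp and can be skipped.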
Setup — From API Key to First Working Task
What you need
pip install anthropic pillow
Use claude-opus-4-6 or claude-sonnet-4-6. Computer Use is also available on Bedrock and Vertex AI, but connecting directly to the Anthropic API gets new features fastest.
Minimal working code
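import anthropic
import base64
from PIL import ImageGrab

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def take_screenshot() -> str:
    """Capture the screen and return it as a base64-encoded PNG."""
    screenshot = ImageGrab.grab()
    # Halve the resolution to reduce token cost. Coordinates Claude returns
    # refer to the image it sees, so the resized image should match the
    # display_width_px / display_height_px declared in the tool definition.
    screenshot = screenshot.resize(
        (screenshot.width // 2, screenshot.height // 2)
    )
    screenshot.save("/tmp/screen.png")
    with open("/tmp/screen.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")


def run_computer_use_task(task: str):
    """Send one screenshot plus the task text and return Claude's first response."""
    screenshot_b64 = take_screenshot()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=[
            {
                # Tool version as given in this example. Newer models may need
                # a newer computer tool type and a beta flag; check the
                # current API docs for the pairing that matches your model.
                "type": "computer_20241022",
                "name": "computer",
                "display_width_px": 1280,
                "display_height_px": 800,
            }
        ],
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": screenshot_b64,
                        },
                    },
                    {"type": "text", "text": task},
                ],
            }
        ],
    )
    return response

The Action Loop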
The heart of Computer Use is the loop: when Claude returns a tool_use block, you execute that action, then send the resulting screenshot back to Claude.
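import subprocess
import time


def execute_action(action: dict) -> str:
    """Run one computer-tool action via cliclick (macOS) and return a fresh screenshot."""
    action_type = action.get("action")
    if action_type == "screenshot":
        return take_screenshot()
    elif action_type == "left_click":
        x, y = action["coordinate"]
        subprocess.run(["cliclick", f"c:{x},{y}"])  # macOS
        time.sleep(0.5)  # give the UI time to react before capturing
        return take_screenshot()
    elif action_type == "type":
        text = action["text"]
        subprocess.run(["cliclick", f"t:{text}"])
        time.sleep(0.3)
        return take_screenshot()
    elif action_type == "key":
        # The computer tool sends the key name in the "text" field.
        key = action["text"]
        # cliclick's kp: command expects its own key names (for example
        # "return"), which can differ from the names Claude sends.
        subprocess.run(["cliclick", f"kp:{key}"])
        time.sleep(0.3)
        return take_screenshot()
    # Actions not handled above (scroll, drag, and so on) fall through
    # to a plain screenshot so the loop can continue.
    return take_screenshot()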
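Key presses are the part of this handler most likely to need translation. Claude tends to send key names in an xdotool-like style such as "Return" or "cmd+s", while cliclick uses `kp:` for special keys and `kd:` / `ku:` for holding and releasing modifiers. The sketch below shows one way to bridge the two. The function name and both mapping tables are hypothetical, the tables are deliberately incomplete, and the cliclick key names should be verified against `cliclick -h` for your installed version.

```python
# Hypothetical mapping from xdotool-style key names to cliclick key names.
# Assumption: check these against `cliclick -h` before relying on them.
SPECIAL_KEYS = {
    "return": "return", "enter": "enter", "escape": "esc", "esc": "esc",
    "tab": "tab", "space": "space", "backspace": "delete",
    "delete": "fwd-delete", "home": "home", "end": "end",
    "up": "arrow-up", "down": "arrow-down",
    "left": "arrow-left", "right": "arrow-right",
    "page_up": "page-up", "page_down": "page-down",
}
MODIFIERS = {
    "ctrl": "ctrl", "control": "ctrl", "alt": "alt", "option": "alt",
    "shift": "shift", "cmd": "cmd", "super": "cmd", "meta": "cmd",
}


def cliclick_args_for_key(combo: str) -> list[str]:
    """Translate a combo such as 'cmd+s' into a list of cliclick commands."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty key combo")
    *mods, key = parts  # everything before the last part is a modifier
    mod_names = []
    for mod in mods:
        if mod not in MODIFIERS:
            raise ValueError(f"unknown modifier: {mod!r}")
        mod_names.append(MODIFIERS[mod])
    if key in SPECIAL_KEYS:
        press = f"kp:{SPECIAL_KEYS[key]}"
    elif len(key) == 1:
        press = f"t:{key}"  # single printable character: type it
    else:
        raise ValueError(f"unmapped key: {key!r}")
    args = []
    if mod_names:
        args.append("kd:" + ",".join(mod_names))  # hold modifiers down
    args.append(press)
    if mod_names:
        args.append("ku:" + ",".join(mod_names))  # release modifiers
    return args
```

With something like this in place, the key branch could call `subprocess.run(["cliclick", *cliclick_args_for_key(action["text"])])`. Raising on unmapped keys is a deliberate choice here: a loud failure is easier to debug than a silently dropped keystroke.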
def run_task_with_loop(task: str, max_iterations: int = 20):
    """Drive the screenshot/action loop until Claude finishes or the cap is hit."""
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": take_screenshot(),
                    },
                },
                {"type": "text", "text": task},
            ],
        }
    ]
    response = None
    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=[{
                "type": "computer_20241022",
                "name": "computer",
                "display_width_px": 1280,
                "display_height_px": 800,
            }],
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            print("Task complete")
            break
        # Execute every computer action Claude requested in this turn.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                new_screenshot = execute_action(block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": new_screenshot,
                            },
                        }
                    ],
                })
        if not tool_results:
            # No tool call to answer (for example, the response stopped on
            # max_tokens), so stop rather than send an empty user turn.
            break
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
    return response
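Because every iteration appends another screenshot to `messages`, the request grows with each step, and the resize in `take_screenshot` only slows that growth. One common mitigation is to keep just the most recent few screenshots and replace older ones with a short text stub. The sketch below is one way to do that under stated assumptions: the helper name `prune_old_screenshots` and the default of three are hypothetical, and it only touches images inside `tool_result` blocks built as plain dicts, as in the loop above.

```python
def prune_old_screenshots(messages, keep_last=3):
    """Replace all but the newest `keep_last` tool_result screenshots
    with a text stub. Mutates `messages` in place and returns the
    number of screenshots replaced. Hypothetical helper for illustration.
    """
    slots = []  # (inner_content_list, index) for each screenshot, oldest first
    for message in messages:
        content = message.get("content")
        if message.get("role") != "user" or not isinstance(content, list):
            continue
        for block in content:
            if not (isinstance(block, dict) and block.get("type") == "tool_result"):
                continue
            inner = block.get("content")
            if not isinstance(inner, list):
                continue
            for i, item in enumerate(inner):
                if isinstance(item, dict) and item.get("type") == "image":
                    slots.append((inner, i))
    # Everything except the last keep_last entries is considered stale.
    stale = slots[:-keep_last] if keep_last > 0 else slots
    for inner, i in stale:
        inner[i] = {"type": "text", "text": "[older screenshot omitted]"}
    return len(stale)
```

Calling `prune_old_screenshots(messages)` just before each `client.messages.create` call would bound the number of images per request. The tradeoff is that Claude loses visual detail from earlier steps, so the right value for `keep_last` depends on how much history a given task needs.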