The opensource code that powers Claude's computer control and the maintainer they didn't hire
Input simulation is at the heart of the AI revolution. But what is it, and how can so many world-changing technologies rely on the kindness of a lone maintainer?
Robin Grell's blog post about being rejected by Anthropic blew up when it hit the front page of Hacker News. The opensource library he maintains, enigo, powers an unreleased Claude Desktop feature that controls your computer by moving the mouse and typing on the keyboard. When he applied for a position at Anthropic working on his own library, their hiring system automatically rejected his application, seemingly without a single human in the loop.
Robin’s story exposed three broken systems:
Permissive opensource licensing that lets companies extract infinite value while giving nothing back
AI-driven hiring that screens out the very people you're trying to hire
The general state of tech hiring in 2025 (spoilers: it’s not fun unless you’re an experienced ML engineer Meta has been clamoring to hire for Reasons)
But I wanted to dig deeper into the technical story behind the outrage. Input simulation is the linchpin for connecting AI to external systems, but what is it, and how did a German master's student end up maintaining critical infrastructure for the AI industry? Robin generously hopped on a call to explain.
Input Simulation is hard, actually
When JavaScript programmers think about simulating mouse clicks and keystrokes, we imagine it's as easy as sending a click event. Robin quickly corrected this assumption.
"It's surprisingly difficult to even do that simple thing," he told me. "The complexity arises mostly because of the different platforms that are supported. Everyone has their own ideas how to do that."
Enigo supports Windows, macOS, BSD, and Linux across multiple protocols (Wayland, X11, and libei). That's not just "write once, run everywhere"—each platform has different APIs, different security models, different ways of handling input events. Linux alone requires supporting multiple display server protocols as the ecosystem transitions from X11 to Wayland.
"Some compositors don't support all the protocols," Robin explained. "The goal of enigo is to just use the library, and the developer doesn't have to care what the compositor understands, which versions, and all that."
Even keyboard support requires manual labor. "For Linux, at least, I have to add [keys] manually. Recently, [F13 to F24 keys] were added, which existed on Windows, but on Linux, they didn't exist." (Correction: turns out enigo probably supported these already.)
My mind boggled trying to imagine that setup. Keyboards with 24 function keys? Robin laughed. "Most of them are simulated, but apparently, there were some old keyboards with two rows of function keys at the top." And apparently, there are some modern ones as well!
This kind of unglamorous infrastructure work makes "magical" AI features possible. Claude's computer control isn't breakthrough AI research. It's careful engineering on top of enigo’s cross-platform input abstraction.
The accidental infrastructure maintainer
Robin’s path to maintaining critical AI infrastructure started with a simple, unrelated goal: building a better smartphone keyboard for Linux.
"It all started with my master's thesis," he said. "I wanted to use the PinePhone, which is a Linux smartphone, and I got so used to swipe typing and next word prediction that I thought I would really miss it because it wasn't available for Linux at the time. That was 2020."
He planned to write a virtual keyboard with advanced features for Wayland. But first, he needed the basics: input simulation. That's when he discovered enigo, created by Dustin Bensing but abandoned with pending merge requests.
"I just asked him if he needed help maintaining it, and he was happy about it, gave me permissions on GitHub. He just completely trusted me, and that's how I started."
What Robin thought would be a quick detour became his primary focus. His keyboard project is still on the back burner because enigo keeps demanding more attention. "I naively thought it wouldn't change much, but I just have to continue to invest time into it."
In the opensource era, it is common for critical infrastructure to depend completely on one person's side project. One person’s hobby can become everyone's dependency. Robin now maintains a library with more than 300,000 downloads that powers everything from AI agents to accessibility tools to remote desktop software.
What Input Simulation enables
Most people's mental model of input simulation stops at gaming bots and auto-clickers. But Robin has seen many more interesting applications:
AI computer control: "Recently [enigo] was very popular with people writing AI agents such as Anthropic. You basically send screenshots to the server, the server analyzes it, and then tells the client where to move the mouse and click and type stuff."
Remote desktop software: RustDesk, a TeamViewer alternative, uses enigo to relay mouse and keyboard input between computers.
Input methods for underserved languages: The Afrim project uses enigo to handle African languages that can't fit on standard keyboards—users type phonetically and the input method converts to the correct characters.
Accessibility tools: Voice-controlled computers for people who can't use traditional input methods.
All these applications need to bridge the gap between different modalities—voice to keyboard, network to local input, and one language system to another.
Input simulation and bot detection
I was curious whether systems can distinguish between simulated input and human input.
"It depends," Robin said. "On X11, with the protocol I'm using, the compositor can tell, but most compositors don't mind. On macOS, I think if you wanted to as the OS, you can always tell that it's a simulated input."
For many websites, bot detection prevents scraping, DDoS attacks, and other unwanted automated activity (think of scalpers who buy up all the concert tickets as soon as they’re released so they can resell them at a markup, or shady organizations using bots to flood political articles with comments designed to sway public opinion). Bot detection is an arms race between site owners and bot operators. "You could analyze the [mouse] path, or you can tell if you simulate text that no human can physically type that quickly."
Developers can counter by adding delays, fuzzing mouse trajectories, or using more human-like timing patterns and inputs. And short of paying actual humans to perform actions, the hardest technique to detect is system-level input simulation, where key presses arrive just as they would from a physical keyboard.
Of course, when personal AI agents operate on our computers to perform tasks on our behalf, this is perfectly acceptable to the user and difficult for the site operator to detect. That makes distinguishing between bots, agents, and humans incredibly difficult, and it is one of the biggest open questions facing the Agentic Web (which I’ll get into in a future post).
The academic hacker's perspective
While maintaining infrastructure for billion-dollar companies, Robin is pursuing a second master's degree in IT security at TU Darmstadt. "We have very cool classes where you don't just theoretically learn about it, but you do projects. I did a reverse engineering lab last semester and reverse-engineered the authentication method of the app to log into your tax revenue service in Germany."
He also hacked his bike: "It's just a replay attack on the electronic shifters. They send wireless signals to the rear derailleurs. But I had to downgrade the firmware and man-in-the-middle some requests of the app."
Robin's academic approach provides him with a unique perspective on the industry's chaos surrounding AI hiring and opensource extraction. "I love studying. I would like to do that for the rest of my life. But then every now and then, I also want to build things more long-term."
Robin didn't set out to solve input simulation for the entire Rust ecosystem or to “fill in his green squares” on GitHub to attract a FAANG employer. He just needed functionality for one project, found an abandoned library, and started fixing bugs.
The happy ending (sort of)
It’s been a few weeks since the Hacker News uproar, and Robin's viral blog post has created some unexpected opportunities. Multiple CTOs from AI companies reached out, and Robin also got offers for Rust consulting work.
"I didn't expect it to be that popular, but it was an interesting experience," he reflected.
Robin did get in touch with real humans at Anthropic, but nothing came of it. And the deeper issues remain. Anthropic is using Robin's code in production, deployed to thousands of devices, generating revenue for a company valued at $60+ billion. His compensation? Stars on GitHub and download counts on crates.io—what he calls "the nerd equivalent of street creds."
enigo’s MIT license maximizes compatibility but also leaves room for exploitation. Had Dustin Bensing used AGPL, Anthropic might have been forced to contribute back or pay for a commercial license. Perhaps licensing is something we should consider more when taking over maintainership of a repository.
What this means for AI
Robin's story illustrates a fundamental tension in how AI capabilities get built. The most impressive demos often rely on surprisingly mundane infrastructure—cross-platform input simulation, screen capture APIs, text parsing libraries. This infrastructure is typically maintained by individuals or small teams, often as hobby projects.
As AI agents become more capable, they'll need more of this "boring" infrastructure. Libraries for controlling browsers, parsing documents, interfacing with APIs, handling authentication. The companies building billion-dollar AI systems will increasingly depend on the Robin Grells of the world.
The current norm is that companies extract infinite value from permissively licensed code while giving nothing back. This works fine when the stakes are low. But as this infrastructure becomes critical to AI capabilities, the sustainability questions become urgent.
What happens when the maintainer of a critical library burns out? When they get hit by a bus? When they decide their time is worth more than GitHub stars?
We’ll find out soon enough.
The Real Magic Trick
Claude's computer control isn't some breakthrough in artificial intelligence. It's carefully engineered, built on top of opensource libraries maintained by graduate students from wealthy nations.
This is actually good news. It means these capabilities are more accessible and reproducible than they appear. If you want to build AI computer control, you don't need to reverse-engineer Anthropic's research. You can use the same libraries they do.
The magic isn't in the AI model. The magic is in developers like Robin who solve the unglamorous but essential problems that make everything else possible. Every time Claude moves your mouse cursor, it's Robin and Dustin's hard work making it happen.
Anthropic might be getting the credit, but the arms and legs belong to opensource.
The original interview was much longer—we covered everything from German tax software security to electronic bike shifter vulnerabilities. You can catch it on YouTube.
Robin continues to maintain enigo at github.com/enigo-rs/enigo. If you're building something with input simulation, consider contributing back to make the project more sustainable. And if you're hiring for AI infrastructure roles, make sure applications from maintainers of libraries you use in production reach a human reviewer.
You can read Robin's original blog post at grell.dev.
I'll be speaking at the O'Reilly AI Codecon: Coding for the Agentic World on September 9 at 11:00 AM ET. Join us for an intensive exploration of the tools, workflows, and architectures defining this next era of programming. Sign up now to save your spot—for free!