
1. Beyond Elements: Mouse, Keyboard, and Image Automation

Mastering the "Human-Like" tools to automate any software, even the most stubborn ones

Sophie avatar
Written by Sophie
Updated this week

In an ideal world, every button and input box would have a clear "Element ID" that our bot could easily find. But as you progress in RPA, you will eventually encounter "stubborn" software—legacy accounting systems, secure portals, or custom-built tools—that hide their internal structure.

This is where we shift our strategy. Instead of trying to talk to the software's code, we use Mouse, Keyboard, and Image-based automation to mimic exactly how a human interacts with a computer. By looking at the screen, moving the cursor, and typing on the keys, these methods allow you to automate anything you can see.

Navigating with mouse and keyboard

When a button doesn't have a selectable ID, you can treat the interface like a physical workspace. Mouse automation navigates by targeting specific screen coordinates, which makes it a practical fallback for opening hidden menus or triggering custom controls that don't respond to standard commands, as long as the window stays in a predictable position.
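One common way to make coordinate clicks less brittle is to store positions as fractions of the screen rather than raw pixels. The helper below is a minimal sketch (not from this article); the commented `pyautogui.click` call at the end is one popular library you could plug in, mentioned only as an assumption:

```python
def to_pixels(rel_x, rel_y, screen_w, screen_h):
    """Convert relative positions (0.0-1.0) to pixel coordinates,
    so the same script survives a change of screen resolution."""
    return round(rel_x * screen_w), round(rel_y * screen_h)

# Example: a "Save" button that sits 80% across and 90% down the screen.
x, y = to_pixels(0.8, 0.9, 1920, 1080)  # (1536, 972) on a 1080p display
# With a GUI library such as pyautogui you would then perform the click:
# pyautogui.click(x, y)
```

The same fractions then resolve to the right spot on a 4K monitor without editing the script.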

To complement the mouse, Keyboard automation acts as your bot’s voice. Beyond simple typing, its true power lies in Global Shortcuts. Using commands like Ctrl+C or Alt+Tab can often navigate a complex interface much faster and more reliably than a coordinate-based mouse click.

Pro Tip: To make these actions robust, always include a small "Deterministic Pause." Adding a fraction of a second between a mouse click and a keyboard stroke gives the application time to react, ensuring the bot doesn't move faster than the software can process the input.
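The "deterministic pause" above can be baked into a tiny runner so you never forget it. This is an illustrative sketch: the actions are arbitrary callables, and the `pyautogui` calls in the comment are hypothetical examples of what you might pass in:

```python
import time

def do_with_pause(actions, pause=0.3):
    """Run UI actions in order, sleeping between each one so the
    application has time to react before the next input arrives.
    Each action is any callable, e.g.
    lambda: pyautogui.click(x, y) or lambda: pyautogui.hotkey('ctrl', 's')."""
    for action in actions:
        action()
        time.sleep(pause)

# Demonstration with recording stubs instead of real clicks:
log = []
do_with_pause([lambda: log.append("click"),
               lambda: log.append("type")], pause=0.05)
# log is now ["click", "type"], executed with a pause in between
```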

Adding "Eyes" with image recognition

Sometimes, even coordinates aren't enough—perhaps a button moves every time you open the app, or a pop-up appears in a random spot. This is where Image-based automation becomes your most valuable ally. It grants your bot "visual recognition" capabilities.

Instead of looking for a coordinate or a piece of code, the bot scans the screen for a specific visual pattern, such as an icon or a logo you’ve taught it to recognize. This is particularly effective for graphical buttons, custom-drawn interfaces, and other controls that expose nothing to code-level automation. You can even provide multiple versions of the same image, allowing the bot to find a button even if its color or background changes slightly. This visual independence makes your automation far more resilient to cosmetic software updates.
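Trying multiple versions of the same image can be sketched as a simple loop over templates. Everything below is illustrative: the `locate` argument stands in for a real screen-search function (for example, `pyautogui.locateCenterOnScreen`), and `fake_screen` is a made-up stand-in for what is actually on screen:

```python
def find_any(templates, locate):
    """Try each image template in turn and return the first match's
    center coordinates, or None if no variant is found on screen."""
    for template in templates:
        center = locate(template)
        if center is not None:
            return center
    return None

# Simulated screen for illustration: only the dark-theme icon is visible.
fake_screen = {"submit_dark.png": (410, 220)}
hit = find_any(["submit_light.png", "submit_dark.png"],
               lambda name: fake_screen.get(name))
# hit is (410, 220): the dark-theme variant was matched
```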

Putting it all together: a multi-layered strategy

The most robust automations don't rely on a single method; they layer them into a professional workflow. Try element-based actions first, fall back to mouse/keyboard coordinates, and keep image recognition in reserve as the final way to locate a target. Within a single step, the three tools also chain together. Here is how a "human-like" automation typically flows:

  • Image Recognition to "look for" a stubborn Submit button or a specific icon.

  • Mouse Action to move the cursor to that visual target and perform a click.

  • Keyboard Action to type a confirmation message or use a shortcut to save the work.

By combining these actions, you create a fallback plan that works when conventional methods fail. Start with a simple exercise—like using the bot to open Notepad, type a sentence, and save the file—to get a feel for how these three methods work in harmony. With these tools in your kit, no software is truly "un-automatable."
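The fallback plan described above can be expressed as a small dispatcher that tries each layer in order. This is a sketch of the pattern only; the three callables are placeholders for whatever element, coordinate, and image actions your RPA tool provides:

```python
def click_with_fallback(element_click, coordinate_click, image_click):
    """Layered strategy: try element-based automation first, then
    raw coordinates, then image recognition. Each callable should
    return True on success; the name of the winning layer is returned."""
    layers = [("element", element_click),
              ("coordinates", coordinate_click),
              ("image", image_click)]
    for name, attempt in layers:
        try:
            if attempt():
                return name
        except Exception:
            pass  # this layer failed; fall through to the next one
    raise RuntimeError("all automation layers failed")

# Simulated run: the stubborn app exposes no Element ID,
# so the dispatcher falls back to a coordinate click.
def element_click():
    raise LookupError("no selectable Element ID")

layer = click_with_fallback(element_click,
                            coordinate_click=lambda: True,
                            image_click=lambda: True)
# layer == "coordinates"
```

Wrapping the order of attempts in one place like this also gives you a single spot to log which layer each step ended up using.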

Conclusion

Mastering mouse, keyboard, and image automation broadens the scope of your RPA deployments. You can automate high-volume, rule-based computer tasks such as data entry, software operation sequences, and routine communications even when conventional automation methods fail. This reduces manual effort, minimizes human error, and speeds up repetitive workflows, making your automation more resilient in diverse environments.
