All Hands AI Open Sources OpenHands CodeAct 2.1: A New Software Development Agent to Solve Over 50% of Real Github Issues in SWE-Bench

The world of software development has seen an explosion in the use of AI agents over the last few years, promising to enhance productivity, automate complex tasks, and make developers' lives easier. However, a significant gap remains between what these agents promise and how well they handle real-world problems. Most AI agents struggle with the complexity and contextual nuance of software development, especially when it comes to solving the real GitHub issues that developers face every day. They often fall short and require extensive oversight or manual correction, which defeats their purpose. Closing this gap requires a solution that is not just smarter but can also keep up with the demands of software engineering, a field full of fast-moving projects and project-specific challenges.

All Hands AI has open-sourced OpenHands CodeAct 2.1, a new software development agent and the first to solve over 50% of real GitHub issues in SWE-Bench, the standard benchmark for evaluating AI-assisted software engineering tools. OpenHands CodeAct 2.1 reports a 53% resolve rate on SWE-Bench (measured on the human-validated SWE-Bench Verified subset) and a 41.7% resolve rate on SWE-Bench Lite. What makes these results notable is that the benchmark tasks are drawn from issues in real GitHub repositories rather than from synthetic exercises, and the agent resolves them autonomously. Unlike tools that are closed to outside contribution or too niche to be useful to the broader community, OpenHands is an open-source agent that developers can freely use, improve, and adapt. That combination of openness and competitive performance makes it a strong option for developers seeking an effective AI solution.

OpenHands CodeAct 2.1’s performance improvements are primarily rooted in three updates. First, it switched to Anthropic’s new Claude 3.5 Sonnet model, which improves natural language understanding and lets CodeAct better interpret the issues developers raise. Second, the agent’s actions now use function calling: instead of emitting free-form text that has to be parsed, the model emits structured calls that name a tool and its arguments, which makes task execution more precise and leaves less room for misinterpretation. Lastly, the developers behind CodeAct 2.1 improved directory traversal, reducing how often the agent gets stuck in repetitive or circular tasks, a common problem in earlier iterations. With better directory navigation, the agent handles larger and more complicated issues more smoothly and efficiently.
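To make the function-calling idea concrete, here is a minimal Python sketch of how an agent loop might dispatch structured tool calls and notice when it is repeating itself. This is not the OpenHands implementation: the tool names (`list_dir`, `read_file`), the JSON shape with `name` and `arguments` keys, and the `RepeatGuard` helper are all assumptions made for illustration.

```python
import json
import os
from typing import Callable, Dict

# Hypothetical tools an agent might expose; names and signatures are
# illustrative assumptions, not the OpenHands tool set.
def list_dir(path: str) -> str:
    """Return the sorted entries of a directory, one per line."""
    return "\n".join(sorted(os.listdir(path)))

def read_file(path: str, max_chars: int = 2000) -> str:
    """Return up to max_chars characters from a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read(max_chars)

TOOLS: Dict[str, Callable[..., str]] = {
    "list_dir": list_dir,
    "read_file": read_file,
}

def dispatch(tool_call_json: str) -> str:
    """Run one structured tool call of the assumed form
    {"name": ..., "arguments": {...}} and return its result as text.
    Failures are returned as text so the model can see and react to them."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError as exc:
        return f"error: invalid JSON ({exc})"
    if not isinstance(call, dict):
        return "error: tool call must be a JSON object"
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    try:
        return TOOLS[name](**args)
    except (TypeError, OSError) as exc:
        # TypeError covers wrong or missing arguments; OSError covers bad paths.
        return f"error: {exc}"

class RepeatGuard:
    """Flags when the same call is issued `limit` times in a row,
    a simple stand-in for detecting an agent stuck in a loop."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.last_key = None
        self.count = 0

    def check(self, tool_call_json: str) -> bool:
        # Normalise key order so equivalent calls compare equal.
        key = json.dumps(json.loads(tool_call_json), sort_keys=True)
        if key == self.last_key:
            self.count += 1
        else:
            self.last_key, self.count = key, 1
        return self.count >= self.limit
```

In a loop built this way, the model's output is validated against a known set of tools before anything runs, and an unknown tool or bad argument comes back as an error message rather than a crash. A guard like `RepeatGuard` is only one simple way to catch circular behaviour; the article does not say how CodeAct 2.1 addresses it internally.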

These updates matter. A 53% resolve rate on SWE-Bench means that over half of the issues in the benchmark were solved without any human intervention. Because SWE-Bench is built from real GitHub issues, this milestone suggests that OpenHands CodeAct 2.1 can directly affect software engineering workflows by resolving a substantial share of issues autonomously. For automated development assistance more broadly, that is significant: it saves developers time and lets them focus on higher-level challenges rather than getting bogged down in tedious issue resolution. Moreover, the open-source nature of OpenHands invites developers from around the globe to contribute and further improve the agent, something the development community values highly. The 41.7% resolve rate on SWE-Bench Lite, a smaller subset of the benchmark, indicates that the results are not limited to a single evaluation set.

In conclusion, OpenHands CodeAct 2.1 is a notable step forward in AI-driven software development, moving us closer to autonomous coding assistants that genuinely enhance productivity. Its ability to solve over 50% of real GitHub issues in SWE-Bench demonstrates both technical progress and practical usability. The open-source nature of OpenHands keeps it a community-driven effort with the promise of continued improvement. Developers can run OpenHands locally, integrate it through GitHub Actions, or sign up for the soon-to-be-released online version. With its adoption of Anthropic’s Claude 3.5 Sonnet, its move to function calling, and its improved directory traversal, OpenHands CodeAct 2.1 sets a high bar for what an AI development agent should be: effective, accessible, and continuously evolving.


Check out the Details and GitHub here. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
