DARPA, the Defense Department’s (DOD) R&D agency, will lean on emerging AI capabilities in a new program to tackle the costly and time-consuming challenge of rewriting C and C++ code in Rust, a move designed to support the push for federal agencies and private organizations to adopt memory-safe programming languages.
DARPA – the Defense Advanced Research Projects Agency – this month announced its Translating All C to Rust (TRACTOR) program, which will use large language models (LLMs) and other machine learning techniques to automate the bulk of the tasks needed to move more than two decades of legacy code to the safer Rust language.
The White House Office of the National Cyber Director (ONCD) and CISA for months have urged developers to adopt Rust or other modern languages, such as Python or C#, to better protect their software by eliminating a whole class of memory safety vulnerabilities that account for most common vulnerabilities in languages like C and C++.
Those flaws can include buffer overflows, use of uninitialized memory, and use after free, which arises when a program continues to use a memory location after it has been freed, or deallocated.
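As a rough illustration of how Rust handles two of the flaw classes named above, the following minimal sketch shows a bounds-checked read and an ownership transfer. The function names `read_at` and `consume` are hypothetical and chosen only for this example; they do not come from DARPA, CISA, or the TRACTOR program.

```rust
/// Read one byte from a buffer without risking an out-of-bounds access.
/// `slice::get` returns `None` for an index past the end instead of
/// reading adjacent memory, which is what a buffer over-read would do in C.
fn read_at(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Take ownership of a buffer and return its length.
/// The buffer is freed when this function returns. Because ownership moved
/// into the function, the compiler rejects any later use of the buffer by
/// the caller, so a use after free cannot compile in safe Rust.
fn consume(buf: Vec<u8>) -> usize {
    buf.len()
}
```

Calling `read_at` with an index beyond the buffer yields `None` rather than stray data, and any attempt to touch a vector after passing it to `consume` is a compile-time error. Safe Rust likewise refuses to compile a read of a variable that has not been initialized.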
CISA Director Jen Easterly late last year said that as much as two-thirds of all software vulnerabilities are the result of a lack of memory-safe coding.
That’s a Lot of Code
However, officials with DARPA said that while there is broad consensus in the software engineering community on the need for memory-safe languages, the problem is that C and C++ have been used in public and private-sector organizations for decades, making the chore of moving the massive amounts of such code to Rust a significant challenge.
They noted that the C language was created in the 1970s and has been used to create applications for everything from smartphones to space vehicles. The DOD itself “has long-lived systems that disproportionately depend on programming languages like C,” they wrote in a statement, adding that what’s needed is a way to rewrite “legacy code at scale that matches the vastness of the problem.”
This is where AI comes in, according to Dan Wallach, DARPA’s program manager for TRACTOR.
“You can go to any of the LLM websites, start chatting with one of the AI chatbots, and all you need to say is ‘here’s some C code, please translate it to safe idiomatic Rust code,’ cut, paste and something comes out, and it’s often very good, but not always,” Wallach said in a statement. “The research challenge is to dramatically improve the automated translation from C to Rust, particularly for program constructs with the most relevance.”
Enlisting Developers’ Help
The goal of TRACTOR is automated translation that produces Rust code of the same quality and style that a skilled Rust programmer would write. DARPA is turning to the software developer community for help, hosting public competitions where LLM-powered proposals will be tested.
According to Wallach, proposals likely will include novel combinations of software analysis, such as static and dynamic analysis, and LLMs.
The program starts August 26 with a Proposers Day, which interested software engineers can attend in person or virtually. Participants need to register by August 19. Details and registration information can be found at SAM.gov.
A Global Problem
In a 19-page technical report, the ONCD wrote that memory safety vulnerabilities “represent a major problem for the software industry as they cause manufacturers to continually release security updates and their customers to continually patch. These vulnerabilities persist despite software manufacturers historically expending significant resources attempting to reduce their prevalence and impact through various methods, including analyzing, patching, publishing new code and investing in training programs for developers.”
Meanwhile, customers are forced to expend significant resources responding to the flaws through complex patch management programs and incident response initiatives.
Languages like C allow programmers to manipulate memory directly, which makes it easier to inadvertently introduce coding errors that could lead to a seemingly routine operation corrupting the state of memory, DARPA officials wrote.
In addition, security issues can come when a programming language shows what the agency called “undefined behavior,” which happens when the language standard has no specification or guidance on how the program should behave in conditions that aren’t explicitly defined in the standard.
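One familiar case of undefined behavior in C is signed integer overflow. The sketch below, which is only an illustration and not part of any DARPA material, shows how Rust's standard integer methods make the outcome explicit instead. The wrapper names `add_checked` and `add_wrapping` are hypothetical; `checked_add` and `wrapping_add` are standard library methods on `i32`.

```rust
/// Add two signed 32-bit integers and report overflow explicitly.
/// Returns `None` when the mathematical result does not fit in an `i32`.
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Add two signed 32-bit integers with defined two's-complement wraparound.
/// The result on overflow is specified by the language, not left undefined.
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}
```

Here `add_checked(i32::MAX, 1)` returns `None` and `add_wrapping(i32::MAX, 1)` returns `i32::MIN`, so the caller chooses the overflow behavior rather than leaving it to the compiler.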
A 23-page report by CISA and other U.S. and international cybersecurity agencies from such countries as Canada, the UK, Australia and New Zealand found that about 70% of Microsoft CVEs and flaws in Google’s Chromium project are memory safety vulnerabilities. In addition, for Mozilla, 32 of 34 critical- or high-rated flaws fell into this category.
Eugen Boglaru is an AI aficionado covering the fascinating and rapidly advancing field of Artificial Intelligence. From machine learning breakthroughs to ethical considerations, Eugen provides readers with a deep dive into the world of AI, demystifying complex concepts and exploring the transformative impact of intelligent technologies.