The idea of a lone programmer relying on their own genius and technical acumen to create the next great piece of software was always a stretch. Today it is more of a myth than ever. Competitive market forces mean that software developers must rely on code created by an unknown number of other programmers. As a result, most software is best thought of as bricolage — diverse, usually open-source components, often called dependencies, stitched together with bits of custom code into a new application.
This software engineering paradigm — programmers reusing open-source software components rather than repeatedly duplicating the efforts of others — has led to massive economic gains. According to the best available analysis, open-source components now comprise 90 percent of most software applications. And the list of economically important and widely used open-source components — Google’s deep learning framework TensorFlow or its Facebook-sponsored competitor PyTorch, the ubiquitous encryption library OpenSSL, or the container management software Kubernetes — is long and growing longer. The military and intelligence community, too, are dependent on open-source software: programs like Palantir have become crucial for counter-terrorism operations, while the F-35 contains millions of lines of code.
The problem is that the open-source software supply chain can introduce unknown, possibly intentional, security weaknesses. One previous analysis of all publicly reported software supply chain compromises revealed that the majority of malicious attacks targeted open-source software. In other words, headline-grabbing software supply-chain attacks on proprietary software, like SolarWinds, actually constitute the minority of cases. Stopping these attacks is difficult because of the immense complexity of the modern software dependency tree: components that depend on other components that depend on other components ad infinitum. Knowing which vulnerabilities lurk in your software is a full-time and nearly impossible job for software developers.
Fortunately, there is hope. We recommend three steps that software producers and government regulators can take to make open-source software more secure. First, producers and consumers should embrace software transparency, creating an auditable ecosystem where software is not simply mysterious blobs passed over a network connection. Second, software builders and consumers ought to adopt software integrity and analysis tools to enable informed supply chain risk management. Third, government reforms can help reduce the number and impact of open-source software compromises.
The Road to Dependence
Conventional accounts of the rise of reusable software components often date it to the 1960s. Software experts such as Douglas McIlroy of Bell Laboratories had noted the tremendous expense of building new software. To make the task easier, McIlroy called for the creation of a “software components” sub-industry for mass-producing software components that would be widely applicable across machines, users, and applications — or in other words, exactly what modern open-source software delivers.
Open source initially coalesced around technical communities that provided oversight, some management, and quality control. For instance, Debian, the Linux-based operating system, is supported by a global network of open-source software developers who maintain and implement standards about what software packages will and will not become part of the Debian distribution. But this relatively close oversight has given way to a more free-wheeling, arguably more innovative system of package registries largely organized by programming language. Think of these registries as app stores for software developers, allowing the developer to download no-cost open-source components from which to construct new applications. One example is the Python Package Index, a registry of packages for the programming language Python that enables anyone, from an idealistic volunteer to a corporate employee to a malicious programmer, to publish code on it. The number of these registries is astounding, and now every programmer is virtually required to use them.
The effectiveness of this software model makes much of society dependent on open-source software. Open-source advocates are quick to defend the current system by invoking Linus’s law: “Given enough eyes, all bugs are shallow.” That is, because the software source code is free to inspect, software developers working and sharing code online will find problems before they affect society, and consequently, society shouldn’t worry too much about its dependence on open-source software because this invisible army will protect it. That may, if you squint, have been true in 1993. But a lot has changed since then. In 2022, when hundreds of millions of new lines of open-source code will be written, there are too few eyes and the bugs run deep. That’s why in August 2018, it took two full months to discover that cryptocurrency-stealing code had been slipped into a piece of software downloaded over 7 million times.
Transferring control of a piece of open-source software to another party happens all the time without consequence. But in the case of that software, the JavaScript package event-stream, there was a malicious twist. After its maintainer, Dominic Tarr, transferred control to a user known as right9ctrl, the new maintainer added a component that tried to steal bitcoins from the victim’s computer. Millions upon millions of computers downloaded this malicious software package until developer Jayden Seric noticed an abnormality in October 2018.
Event-stream was simply the canary in the code mine. In recent years, computer-security researchers have found attackers using a range of new techniques. Some are mimicking domain-name squatting: tricking software developers who misspell a package name into downloading malicious software (dajngo vs. django). Other attacks take advantage of software tool misconfigurations, which trick developers into downloading software packages from the wrong package registry. The frequency and severity of these attacks have been increasing over the last decade. And these tallies don’t even include the arguably more numerous cases of unintentional security vulnerabilities in open-source software. Most recently, the unintentional vulnerability of the widely used log4j software package led to a White House summit on open-source software security. After this vulnerability was discovered, one journalist titled an article, with only slight exaggeration, “The Internet Is on Fire.”
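To make the misspelling attack concrete, here is a minimal sketch of how a tool might flag a requested package name that sits suspiciously close to a well-known one. The list of popular packages and the distance threshold are assumptions chosen for illustration, not values drawn from any real registry or scanner.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


# Hypothetical allow-list of well-known package names (an assumption).
POPULAR_PACKAGES = {"django", "requests", "numpy"}


def possible_typosquat(name, known=POPULAR_PACKAGES, max_distance=2):
    """Return the well-known package that `name` closely resembles, or None."""
    if name in known:
        return None  # exact match: this is the real package name
    for candidate in sorted(known):
        if edit_distance(name, candidate) <= max_distance:
            return candidate
    return None


print(possible_typosquat("dajngo"))  # a near miss of "django"
print(possible_typosquat("django"))  # the legitimate name itself
```

A name-similarity check like this catches only one class of attack. It does nothing about a legitimate package whose maintainer turns malicious, as in the event-stream case.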
The Three-Step Plan
Thankfully, there are several steps that software producers and consumers, including the U.S. government, can take that would enable society to achieve the benefits of open-source software while minimizing these risks. The first step, which has already received support from the U.S. Department of Commerce and from industry as well, involves making software transparent so it can be evaluated and understood. This has started with efforts to encourage the use of a software bill of materials. This bill is a complete list or inventory of the components for a piece of software. With this list in hand, it becomes far easier to search a piece of software for components that may be compromised.
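As a rough illustration of why such an inventory helps, the sketch below searches a tiny bill of materials for components that appear on a watch list. The JSON layout is only loosely modeled on common bill-of-materials formats, and the component names, versions, and the watch list itself are illustrative placeholders rather than real advisory data.

```python
import json

# A toy bill of materials. The structure (a "components" list with
# "name" and "version" fields) is an assumption for this sketch.
SBOM_JSON = """
{
  "components": [
    {"name": "example-web-framework", "version": "4.1.0"},
    {"name": "event-stream", "version": "3.3.6"},
    {"name": "example-logger", "version": "2.0.1"}
  ]
}
"""

# Hypothetical watch list of (name, version) pairs flagged as compromised.
FLAGGED = {("event-stream", "3.3.6")}


def find_flagged_components(sbom_text, flagged):
    """Return every (name, version) pair in the inventory that is on the watch list."""
    sbom = json.loads(sbom_text)
    return [
        (component["name"], component["version"])
        for component in sbom.get("components", [])
        if (component["name"], component["version"]) in flagged
    ]


matches = find_flagged_components(SBOM_JSON, FLAGGED)
print(matches)
```

Without the inventory, answering the same question means reverse-engineering what a shipped application contains, which is exactly the "mysterious blob" problem.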
In the long term, this bill should grow beyond simply a list of components to include information about who wrote the software and how it was built. To borrow logic from everyday life, imagine a food product with clearly specified but unknown and unanalyzed ingredients. That list is a good start, but without further analysis of these ingredients, most people will pass. Individual programmers, tech giants, and federal organizations should all take a similar approach to software components. One way to do so would be embracing Supply-chain Levels for Software Artifacts, a set of guidelines for tamper-proofing organizations’ software supply chains.
The next step involves software-security companies and researchers building tools that, first, sign and verify software and, second, analyze the software supply chain and allow software teams to make informed choices about components. The Sigstore project, a collaboration between the Linux Foundation, Google, and a number of other organizations, is one such effort focused on using digital signatures to make the chain of custody for open-source software transparent and auditable. These technical approaches amount to the digital equivalent of a tamper-proof seal. The Department of Defense’s Platform One software team has already adopted elements of Sigstore. Additionally, a software supply chain “observatory” that collects, curates, and analyzes the world’s software supply chain with an eye to countering attacks could also help. An observatory, potentially run by a university consortium, could simultaneously help measure the prevalence and severity of open-source software compromises, provide the underlying data that enable detection, and quantitatively compare the effectiveness of different solutions. The Software Heritage Dataset provides the seeds of such an observatory. Governments should help support this and other similar security-focused initiatives. Tech companies can also embrace various “nutrition label” projects, which provide an at-a-glance overview of the “health” of a software project’s supply chain.
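The tamper-proof seal idea can be sketched in a few lines. The example below is not Sigstore and does not use its tooling. It is a deliberately simplified stand-in that compares a cryptographic digest of a downloaded artifact against the digest the publisher announced, using only the Python standard library. The artifact contents and function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def artifact_is_untampered(artifact: bytes, expected_digest: str) -> bool:
    """Check that the artifact's digest matches the one the publisher announced."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(artifact), expected_digest.lower())


# Hypothetical package contents as published by the maintainer.
published = b"print('hello from a hypothetical package')\n"
expected = sha256_hex(published)

# The same package after an attacker appends a malicious line.
tampered = published + b"steal_wallet()\n"

print(artifact_is_untampered(published, expected))  # digest matches
print(artifact_is_untampered(tampered, expected))   # digest differs
```

A bare digest is only as trustworthy as the channel that delivered it. Digital signatures and publicly auditable logs, the approach the Sigstore project takes, are meant to close that gap by tying the artifact to an identity and a public record.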
These relatively technical efforts would benefit, however, from broader government reforms. This should start with fixing the incentive structure for identifying and disclosing open-source vulnerabilities. For example, “DeWitt clauses” commonly included in software licenses require vendor approval prior to publishing certain evaluations of the software’s security. This reduces society’s knowledge about which security practices work and which ones do not. Lawmakers should find a way to ban this anti-competitive practice. The Department of Homeland Security should also consider launching a non-profit fund for open-source software bug bounties, which reward researchers for finding and fixing such bugs. Finally, as proposed by the recent Cyberspace Solarium Commission, a bureau of cyber statistics could track and assess software supply chain compromise data. This would ensure that interested parties are not stuck building duplicative, idiosyncratic datasets.
Without these reforms, modern software will come to resemble Frankenstein’s monster, an ungainly compilation of suspect parts that ultimately turns upon its creator. With reform, however, the U.S. economy and national security infrastructure can continue to benefit from the dynamism and efficiency created by open-source collaboration.
John Speed Meyers is a security data scientist at Chainguard. Zack Newman is a senior software engineer at Chainguard. Tom Pike is the dean of the Oettinger School of Science and Technology at the National Intelligence University. Jacqueline Kazil is an applied research engineer at Rebellion Defense. Anyone interested in national security and open-source software security can also find out more at the GitHub page of a nascent open-source software neighborhood watch. The views expressed in this publication are those of the authors and do not imply endorsement by the Office of the Director of National Intelligence or any other institution, organization, or U.S. government agency.