Why Google stores billions of lines of code in a single repository
Meta Info
Published in Communications of the ACM, 2016.
Authors: Rachel Potvin, Josh Levenberg (Google)
Understanding the paper
TL;DR
Google chose to stay with a single, centrally hosted monolithic repository because, for its workflow, the advantages outweigh the costs.
The monolithic model of source code management is not for everyone, e.g., organizations where large parts of the codebase are private or hidden between groups.
Key systems
Piper: The distributed source-code repository
Implemented on top of standard Google infrastructure (originally Bigtable, now Spanner)
Relies on the Paxos algorithm to guarantee consistency across replicas
CitC (Clients in the Cloud): The workspace client
With a cloud-based storage backend and a Linux-only FUSE file system
Critique: The code-review tool
Tricorder: Static analysis system
Code quality, test coverage, and test results
Rosie: large-scale cleanups and code changes
Create a large patch, e.g., through a repository-wide find-and-replace
Split the large patch into smaller patches; test them independently; send for code review; commit them automatically once they pass tests and a code review
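The Rosie workflow above (split, test, review, commit) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Rosie's actual implementation: the function names `split_by_owner_dir` and `process_shards`, the directory-depth sharding rule, and the `run_tests` / `review` callbacks are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_by_owner_dir(changed_files, depth=2):
    """Group changed paths into shards keyed by their leading directories.

    Assumption: a directory prefix of `depth` components approximates a
    unit that can be tested and reviewed on its own.
    """
    shards = defaultdict(list)
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) > depth:
            key = "/".join(parts[:depth])
        else:
            # Shallow paths are grouped by their parent directory.
            key = "/".join(parts[:-1])
        shards[key].append(path)
    return dict(shards)


def process_shards(shards, run_tests, review):
    """Return the keys of shards that pass both tests and code review.

    `run_tests` and `review` are hypothetical callbacks taking a list of
    files and returning True or False; a real system would commit here.
    """
    committed = []
    for key, files in sorted(shards.items()):
        if run_tests(files) and review(files):
            committed.append(key)
    return committed


changed = [
    "search/index/a.py",
    "search/index/b.py",
    "ads/serving/c.py",
    "maps/d.py",
]
shards = split_by_owner_dir(changed)
committed = process_shards(
    shards,
    run_tests=lambda files: True,
    review=lambda files: "ads/serving/c.py" not in files,
)
```

The point of the design is that each shard succeeds or fails independently, so one rejected review does not block the rest of the cleanup.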
Statistics
Google’s monolithic software repository is used by 95% of its software developers worldwide.
The Google codebase includes
approximately 1 billion files
a history of 35 million commits
86TB of data (excluding release branches)
Over 99% of files stored in Piper are visible to all full-time Google engineers.
Over 80% of Piper users today use CitC.
Advantages of a monolithic codebase
Unified versioning → a single source of truth
Code sharing and reuse
Simplified dependency management
Avoid diamond dependency problem
Atomic changes
Large-scale refactoring
Collaboration across teams
Flexible team boundaries and code ownership
Code visibility and clear tree structure → implicit team namespacing
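To make the diamond dependency problem listed above concrete, here is a minimal sketch. It assumes a multi-repository setup where each package pins its own dependency versions. The package names, version strings, and the helper `find_version_conflicts` are made up for illustration.

```python
def find_version_conflicts(requirements):
    """Return dependencies that are pinned to more than one version.

    `requirements` maps package -> {dependency: pinned_version}.
    The result maps each conflicting dependency to
    {version: [packages that pin it]}.
    """
    seen = {}
    for package, deps in requirements.items():
        for dep, version in deps.items():
            seen.setdefault(dep, {}).setdefault(version, []).append(package)
    return {dep: versions for dep, versions in seen.items() if len(versions) > 1}


# A depends on B and C; B and C need different versions of D (the diamond).
requirements = {
    "A": {"B": "1.0", "C": "1.0"},
    "B": {"D": "1.0"},
    "C": {"D": "2.0"},
}
conflicts = find_version_conflicts(requirements)
```

In a single repository where everything builds against one version of D, the `conflicts` mapping would be empty by construction, because whoever changes D must update B and C in the same change.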
Costs and trade-offs
Tooling investments for both development and execution
Code-indexing system
Automated test infrastructure
Build infrastructure
Code search and browsing tools
Codebase complexity
Unnecessary dependencies → binary size bloating
Efforts invested in code health
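The link between unnecessary dependencies and binary size can be sketched with a toy build graph. This is an illustration only: the target names and the `transitive_closure` helper are hypothetical and not taken from Google's build tooling.

```python
def transitive_closure(target, graph):
    """Return every target reachable from `target` through dependency edges."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical build graph: "app" declares "imaging" but never uses it.
graph = {
    "app": ["net", "imaging"],
    "net": ["base"],
    "imaging": ["codec", "base"],
    "codec": [],
}
before = transitive_closure("app", graph)

# Dropping the unused direct dependency prunes everything only it pulled in.
trimmed = dict(graph, app=["net"])
after = transitive_closure("app", trimmed)
removed = before - after
```

Here one unused direct dependency drags in two targets (`imaging` and `codec`) that would otherwise not be linked, which is why easy dependency addition in a monolithic codebase needs tooling and code-health effort to keep it in check.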
Alternatives
Git (distributed version control systems)
A team at Google is focused on supporting Git, which is used by Google’s Android and Chrome teams outside the main Google repository.
Important for these teams due to external partner and open source collaborations.
The Git community strongly prefers that developers use more, smaller repositories.
A git clone copies the entire repository content to the local machine, which is impractical for a repository as large as Google's.
Mercurial
An experimental effort to make the Mercurial client scale to a codebase of Google's size