What Is Code Coverage and How to Improve It

Your tests pass. Your CI pipeline is green. But how much of your source code actually ran during those tests?
That’s what code coverage measures: the percentage of your codebase executed by your automated test suite. It’s one of the most common software testing metrics, and one of the most misunderstood.
A high number doesn’t guarantee quality. A low number almost always signals risk.
This article breaks down how coverage is measured, what the different coverage metrics track, which tools work for each language, and how to improve your coverage percentage without writing pointless tests. You’ll also learn where coverage falls short and what to pair it with for a realistic picture of test effectiveness.
What Is Code Coverage

Code coverage is a software testing metric that measures the percentage of source code executed when a test suite runs.
It tells you which lines, functions, and branches your automated tests actually touched, and which ones they skipped entirely. The result is a percentage. If your tests execute 800 out of 1,000 lines, you’ve got 80% line coverage.
But here’s the thing people get wrong constantly. Code coverage and test coverage are not the same concept.
Code coverage tracks whether specific code paths were executed. Test coverage is broader. It asks whether every feature, condition, and user scenario has been tested meaningfully.
You can hit 100% code coverage with tests that never check a single result. Martin Fowler calls this “assertion-free testing,” and it happens more often than you’d think.
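To make that concrete, here is a minimal sketch built around a hypothetical `apply_discount` function (invented for illustration). Both tests execute exactly the same lines, so a coverage tool would score them identically, but only the second one can ever fail.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


def test_without_assertions() -> None:
    # Runs both the normal path and the error path, so every line is
    # executed, yet no result is ever checked.
    apply_discount(200.0, 25.0)
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        pass


def test_with_assertions() -> None:
    # Same execution, but a wrong result or a missing error now fails the test.
    assert apply_discount(200.0, 25.0) == 150.0
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```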
According to Testlio, Google considers 60% acceptable, 75% commendable, and 90% exemplary for code coverage benchmarks internally. Most teams aim somewhere around 80%.
The metric is expressed in different ways depending on the tool and context. Per-file, per-function, per-line. Sometimes as a total across the entire codebase, sometimes scoped to a single module.
Coverage data shows up in HTML reports, JSON exports, XML output, and LCOV format. Your CI dashboard probably already supports at least one of these.
So where does code coverage actually fit? It’s a diagnostic tool inside the software quality assurance process. Not a guarantee of quality. Not a substitute for thinking carefully about what your tests actually verify.
IBM’s Systems Sciences Institute found that fixing a bug after product release costs 4 to 5 times more than catching it during design, and up to 100 times more if found during maintenance.
Coverage won’t prevent every bug. But it points to the blind spots you haven’t tested yet, and those blind spots are where production failures tend to live.
How Code Coverage Is Measured

Measuring code coverage is a two-step process: instrument the code, then run the tests.
What Instrumentation Does
Instrumentation is the mechanism that makes coverage tracking possible.
Before your tests execute (or sometimes during execution), the coverage tool injects tracking markers into your source code or compiled bytecode. These markers record which statements, branches, and functions get hit.
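Source-level instrumentation modifies the actual code before it runs. Istanbul does this for JavaScript. Coverage.py works differently for Python: it leaves your source untouched and records executed lines through the interpreter’s tracing hooks.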
Bytecode-level instrumentation works at the compiled level. JaCoCo operates this way for Java, hooking into the JVM without touching your source files.
The distinction matters. Bytecode instrumentation tends to be less invasive. Source-level instrumentation gives more granular reporting but can slow down execution.
Google’s infrastructure computes coverage for one billion lines of code daily across seven programming languages, according to their published research (Ivanković and Petrović, ACM 2019).
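The idea of recording which lines get hit can be sketched in a few lines with Python’s standard `sys.settrace` hook. This is a toy illustration, not how any particular coverage tool is implemented; `record_executed_lines` and `classify` are hypothetical names.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional, Set


def record_executed_lines(func: Callable[..., Any], *args: Any) -> Set[int]:
    """Run func and return the executed line offsets, relative to its def line."""
    code = func.__code__
    executed: Set[int] = set()

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if frame.f_code is not code:
            return None  # ignore frames belonging to other functions
        if event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous hook
    return executed


def classify(n: int) -> str:
    if n < 0:
        return "negative"
    return "non-negative"
```

Calling `record_executed_lines(classify, 5)` returns the offsets of the `if` line and the final `return`; the `return "negative"` line is missing from the set, which is exactly the gap a report would show in red.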
How Coverage Reports Work
After your test suite finishes, the tool produces a coverage report. This is where the numbers come from.
A typical report breaks down coverage by file, function, and line. Most tools generate output in multiple formats.
| Format | Use Case | Common Tools |
|---|---|---|
| HTML | Visual inspection, team reviews | Istanbul, JaCoCo, Coverage.py |
| LCOV | CI pipeline integration | gcov, llvm-cov |
| JSON | Programmatic analysis | Jest, nyc |
| XML | SonarQube and Codecov uploads | Cobertura, JaCoCo |
The report itself is straightforward. Green lines were executed. Red lines were not. Yellow usually means partial branch coverage, where one side of a conditional ran but the other didn’t.
What the report won’t tell you is whether the tests that executed those lines actually verified anything useful. That’s the part you need to evaluate yourself, and it’s where the code review process picks up the slack.
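If you want to analyze a report yourself, the LCOV tracefile format is plain text: `SF:` starts a source file, each `DA:` record carries a line number and its execution count, and `end_of_record` closes the file. The parser below is a minimal sketch that handles only those three record types; the sample data and file paths are invented.

```python
from typing import Dict, Optional, Tuple

SAMPLE_LCOV = """\
SF:src/cart.py
DA:1,4
DA:2,4
DA:3,0
DA:4,4
end_of_record
SF:src/tax.py
DA:1,0
DA:2,0
end_of_record
"""


def line_coverage_by_file(tracefile: str) -> Dict[str, Tuple[int, int]]:
    """Return {source file: (lines hit, lines found)} from LCOV text."""
    results: Dict[str, Tuple[int, int]] = {}
    current: Optional[str] = None
    hit = found = 0
    for raw in tracefile.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current, hit, found = line[3:], 0, 0
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>
            count = int(line[3:].split(",")[1])
            found += 1
            if count > 0:
                hit += 1
        elif line == "end_of_record" and current is not None:
            results[current] = (hit, found)
            current = None
    return results


for path, (hit, found) in line_coverage_by_file(SAMPLE_LCOV).items():
    print(f"{path}: {hit}/{found} lines executed")
```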
Types of Code Coverage Metrics
Not all coverage metrics are created equal. Each one tracks a different dimension of your test suite’s reach.
Line Coverage (Statement Coverage)
The simplest metric. Did each line of executable code run at least once?
Line coverage is what most teams look at first. If a line shows up green in the report, it ran. If it’s red, no test touched it.
The catch: a single line can contain multiple statements. And just because a line executed doesn’t mean the logic inside it was tested thoroughly.
Most professional developers use line coverage as a starting point, then go deeper with branch or condition metrics.
Branch Coverage
Branch coverage checks whether both outcomes of every conditional have been tested. Every if has a true path and a false path. Every switch has multiple cases.
This is where things get more useful. A test might execute an if statement, but only when the condition is true. Branch coverage flags that the false side was never tested.
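Here is a small sketch of that situation, using a hypothetical `shipping_fee` function. The first test alone gives 100% line coverage, because an `if` without an `else` has no line of its own for the false side.

```python
def shipping_fee(total: float, is_member: bool) -> float:
    fee = 5.0
    if is_member:
        fee = 0.0
    return fee


def test_member_ships_free() -> None:
    # Executes every line of shipping_fee, so line coverage is 100%.
    # The implicit "else" (is_member is False) is never taken, so a
    # branch-aware tool would report one of the two branches as missed.
    assert shipping_fee(40.0, True) == 0.0


def test_non_member_pays_fee() -> None:
    # Adding this test exercises the False side of the condition.
    assert shipping_fee(40.0, False) == 5.0
```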
| Metric | What It Measures | Typical Blind Spot |
|---|---|---|
| Line coverage | Which lines ran | Untested conditional paths |
| Branch coverage | Both sides of decisions | Complex boolean sub-expressions |
| Function coverage | Which functions were called | Internal logic within functions |
The IEEE Standard for Software Unit Testing specifies 100% statement coverage as a completeness requirement, and recommends 100% branch coverage for critical code.
Function Coverage
Did each function or method get called at all?
Function coverage is the coarsest metric. A function counts as “covered” even if only its first line executed before returning early. It’s useful for spotting dead code (functions nobody calls), but it tells you almost nothing about the quality of testing within the function.
Condition and Path Coverage
Condition coverage (also called predicate coverage) goes further than branch coverage. It tests whether each individual boolean sub-expression within a compound condition has been evaluated as both true and false.
Path coverage tracks every possible execution path through a block of code. This sounds good in theory. In practice, path coverage is computationally expensive. A function with 10 independent if statements has 1,024 possible paths.
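The 1,024 figure is just 2 to the power of 10: each independent if is either taken or skipped. A short sketch that counts the paths by brute-force enumeration (the `count_paths` helper is hypothetical) shows how quickly the number grows.

```python
from itertools import product


def count_paths(independent_ifs: int) -> int:
    """Count paths through a sequence of independent if statements."""
    # Each if is either taken (True) or skipped (False);
    # one path is one combination of those choices.
    return sum(1 for _ in product((True, False), repeat=independent_ifs))


for n in (1, 3, 10):
    print(n, "ifs ->", count_paths(n), "paths")
```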
For safety-critical industries like aviation, the DO-178C standard requires Modified Condition/Decision Coverage (MC/DC). This sits between branch coverage and full path coverage. Each condition must be shown to independently affect the outcome of the decision.
It’s the gold standard for test thoroughness, but unless you’re building avionics or medical software, you probably won’t need it.
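To see what “independently affect the outcome” means, the sketch below searches for MC/DC independence pairs: two input vectors that differ in exactly one condition and flip the decision. It assumes the simple unique-cause reading of MC/DC, and `can_board` is a hypothetical decision invented for the example.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Vector = Tuple[bool, ...]


def independence_pairs(
    decision: Callable[..., bool], n_conditions: int
) -> Dict[int, List[Tuple[Vector, Vector]]]:
    """For each condition index, list input pairs that differ only in that
    condition and produce different decision outcomes."""
    pairs: Dict[int, List[Tuple[Vector, Vector]]] = {
        i: [] for i in range(n_conditions)
    }
    for vector in product((False, True), repeat=n_conditions):
        for i in range(n_conditions):
            if vector[i]:
                continue  # visit each pair once, starting from the False side
            flipped = vector[:i] + (True,) + vector[i + 1:]
            if decision(*vector) != decision(*flipped):
                pairs[i].append((vector, flipped))
    return pairs


def can_board(has_ticket: bool, has_passport: bool, has_id_card: bool) -> bool:
    # Hypothetical decision with three conditions.
    return has_ticket and (has_passport or has_id_card)


pairs = independence_pairs(can_board, 3)
```

For this decision, four test vectors are enough to show every condition’s independent effect: (True, False, False), (True, True, False), (True, False, True), and (False, True, False). Exhaustive testing would need all eight combinations.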
Code Coverage Tools by Language

The tool you use depends on your language and ecosystem. Some languages have coverage built right in. Others rely on third-party libraries.
JavaScript and TypeScript
Istanbul (nyc) is the most widely used JavaScript coverage tool. It instruments your code at the source level and produces reports in every format you’d want.
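Jest includes built-in coverage. By default it instruments your code with Istanbul through Babel, and it can use V8’s native coverage instead if you set coverageProvider to v8. If you’re already using Jest for unit testing, adding --coverage to your command is all it takes.
For TypeScript projects, both tools work with minimal configuration: they measure the compiled JavaScript output and rely on source maps to report against your original TypeScript files.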
Python
Coverage.py is the standard. Pair it with pytest-cov and you’ve got coverage reports integrated directly into your test runs.
Took me a while to figure out that Coverage.py measures line coverage by default and you need to explicitly enable branch coverage with --branch. Easy to miss.
Java
JaCoCo is the go-to for Java projects. It works at the bytecode level, integrates with Maven and Gradle, and produces detailed reports covering line, branch, and method coverage.
Cobertura is an older option that still gets used in some legacy build pipelines, though JaCoCo has largely replaced it for new projects.
C and C++
gcov ships with GCC. llvm-cov works with the Clang toolchain. Both can produce LCOV-formatted output (gcov through the separate lcov tool, llvm-cov through its export command) that integrates with most CI reporting tools.
Go
Go has coverage built in. Run go test -cover and you get a percentage immediately. Run go test -coverprofile=coverage.out for detailed output you can visualize in a browser.
This is honestly one of Go’s underrated strengths. No third-party setup, no plugin management.
Coverage Reporting Platforms
| Platform | Purpose | Key Feature |
|---|---|---|
| SonarQube | Code quality + coverage | Multi-language support, quality gates |
| Codecov | Coverage tracking | Diff coverage, PR comments |
| Coveralls | Coverage history | Badge generation, trend tracking |
These platforms pull in data from your coverage tools and track trends over time. Codecov is especially useful for diff coverage, which measures only the lines changed in a pull request rather than the entire codebase.
What Percentage of Code Coverage Is Enough

This is the question everyone asks. And the answer is frustrating: it depends.
80% is the most common target. Atlassian, Sonar, and most industry guides converge on this number as a reasonable goal. Empirical research from Bullseye Testing Technology found that pushing coverage above 70-80% becomes increasingly time-consuming with diminishing returns in bug detection.
Google’s internal study surveyed 3,000 developers. Despite code coverage not being mandatory, over 90% of projects used automated coverage tools by Q1 2018 (Ivanković and Petrović, ACM 2019).
But only 45% of Google developers use code coverage frequently when writing code, while 40% check it during code reviews.
According to Codecov, the broad consensus, backed by empirical research, is 70 to 80 percent for most projects.
Martin Fowler put it this way: he’d expect coverage in the upper 80s or 90s from a team testing thoughtfully. But he’d be suspicious of 100%, because it often signals developers writing tests to satisfy a number rather than catch real problems.
When High Coverage Misleads
High coverage numbers create a false sense of security when the tests themselves are shallow.
A test that hits every line but never asserts the output is worthless. A test suite at 95% coverage that skips all error handling paths is hiding risk behind a good number.
A Microsoft Research study analyzing 100 large open-source Java projects found that coverage had insignificant correlation with the number of post-release bugs at the project level, and no such correlation at the file level.
That’s not an argument against measuring coverage. It’s an argument against treating the number as proof of quality.
What actually matters:
- Are the tests verifying behavior, not just executing lines?
- Are edge cases and error handling paths covered?
- Do tests break when the code changes in ways that affect users?
Teams practicing test-driven development tend to land in the 80-90% range naturally, because they write tests before code. The coverage is a byproduct of the practice, not the goal.
The CISQ (Consortium for Information & Software Quality) reported that poor software quality cost the US economy $2.41 trillion in 2022. Coverage alone won’t fix that. But used correctly, it flags the untested gaps where expensive production bugs tend to hide.
Why Low Code Coverage Happens
Low coverage is rarely a single problem. It’s usually a combination of technical debt, team habits, and missing infrastructure.
Legacy Codebases Without Tests
This is the most common cause. Code that was written years ago, before the team had software testing lifecycle practices in place, often has zero tests.
Adding tests to legacy code is hard because the code wasn’t designed to be testable. Functions do too many things. Dependencies are hardwired. State is shared everywhere.
VentureBeat reports that developers spend 20% of their time fixing bugs, roughly $20,000 per year in salary costs for the average US developer. Code without coverage makes this worse, because bugs surface later and cost more to fix.
Tight Coupling and Poor Testability
When classes and functions depend heavily on each other, isolating them for unit tests becomes painful.
Signs of tight coupling that block testing:
- Functions that directly instantiate their own dependencies instead of accepting them
- Business logic mixed into UI components or database layers
- Global state that changes behavior unpredictably between test runs
Code refactoring specifically for testability is often the only path forward. Techniques like dependency injection and mocking in unit tests make isolated testing possible, but they require time and buy-in from the team.
Time Pressure and Missing Enforcement
Sprints are tight. Deadlines are real. Tests get skipped.
Teams that spend 30-50% of sprint cycles fixing defects instead of building features (Aspire Systems research) often fell into that pattern precisely because earlier sprints skipped test coverage.
Without coverage gates in the CI pipeline, there’s no mechanism to prevent test-free code from reaching production. The coverage percentage drifts lower with every merge, and nobody notices until the bug reports start piling up.
Dead Code Inflating the Denominator
Sometimes the coverage number is artificially low because the codebase includes code nobody calls anymore.
Unused feature flags, deprecated modules, leftover experiment branches. They all count against your total, even though no test should cover them. The fix is simple: remove the dead code. But teams are often afraid to delete things in case something still depends on them.
Google addressed this at scale. Their coverage infrastructure identifies dependencies, and if tests for code A never reach code B (despite a declared dependency), automated tools can flag that dependency as potentially removable.
How to Improve Code Coverage

Raising coverage is straightforward in theory. Find the gaps, write the tests, lock in the gains. The tricky part is doing it without wasting time on low-value tests.
The Ratcheting Approach to Coverage Growth
Never let coverage drop. That’s the core idea behind ratcheting.
You set your current coverage percentage as the minimum threshold in CI. Every time someone adds tests and coverage ticks up, the threshold updates automatically. Coverage can only go in one direction: up.
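The mechanics fit in a few lines. The sketch below is a hand-rolled illustration, not the behavior of any specific ratcheting tool; the `ratchet` function, the JSON file layout, and the `min_coverage` key are all assumptions.

```python
import json
from pathlib import Path


def ratchet(current: float, threshold_file: Path, tolerance: float = 0.0) -> bool:
    """Return True if the build should pass; raise the stored floor on improvement."""
    floor = 0.0
    if threshold_file.exists():
        floor = json.loads(threshold_file.read_text())["min_coverage"]
    if current + tolerance < floor:
        return False  # coverage dropped below the recorded floor: fail the build
    if current > floor:
        # Coverage improved: lock in the gain as the new minimum.
        threshold_file.write_text(json.dumps({"min_coverage": current}))
    return True
```

A small `tolerance` (say 0.1) can absorb rounding noise between runs. Whether to allow one at all is a team decision, since it lets coverage slip slightly on each merge.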
Tools like jest-ratchet for JavaScript and Cobertura’s plugin for Jenkins support this pattern natively. Your build pipeline rejects any merge that brings the number down.
GitHub used a similar ratcheting strategy during their Rails upgrade process. They ran parallel builds, one on the current version and one on the upgrade target, and only enforced the new standard once it stabilized.
TestDevLab recommends aiming for 8-10% coverage improvement per sprint as a realistic incremental goal.
Refactoring for Testability
Some code just can’t be tested as-is. You have to change the structure before you can write a meaningful test.
Common patterns that block testing:
- Business logic buried inside UI components or controller layers
- Hard-coded dependencies that can’t be swapped with test doubles
- Functions that read from and write to global state
The fix is usually to extract logic into smaller, isolated functions that take inputs and return outputs. Dependency injection makes it possible to replace real services with fakes during test execution.
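A minimal sketch of that refactoring, with hypothetical names throughout: `convert` accepts its rate source as a parameter instead of constructing a network client internally, so a test can hand it a fake.

```python
from typing import Dict, Protocol


class RateSource(Protocol):
    def get_rate(self, currency: str) -> float: ...


class FakeRates:
    """Test double standing in for a real exchange-rate service."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self._rates = rates

    def get_rate(self, currency: str) -> float:
        return self._rates[currency]


def convert(amount: float, currency: str, rates: RateSource) -> float:
    # The dependency is passed in rather than created inside the function,
    # so a test can supply FakeRates instead of calling a network service.
    return round(amount * rates.get_rate(currency), 2)


def test_convert_uses_injected_rate() -> None:
    assert convert(10.0, "EUR", FakeRates({"EUR": 0.5})) == 5.0
```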
According to IBM, fixing a bug during implementation costs about 6 times more than catching it during design. Code that’s testable catches bugs earlier by default.
Prioritize by Risk, Not by File
Not every file deserves the same testing effort.
Start with the code paths that handle payments, authentication, data writes, or anything where failure directly costs money or trust. The file that encrypts user data needs close to 100% coverage. The file that formats a date label probably doesn’t.
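One way to express risk-based priorities is a per-path minimum rather than a single project-wide number. The sketch below assumes you already have a per-file coverage percentage from your tool; the paths, thresholds, and function names are invented for illustration.

```python
from fnmatch import fnmatch
from typing import Dict, List, Tuple

# Hypothetical policy: the first matching pattern wins.
POLICY: List[Tuple[str, float]] = [
    ("src/payments/*", 95.0),
    ("src/auth/*", 95.0),
    ("src/ui/formatting/*", 50.0),
    ("*", 80.0),
]


def required_coverage(path: str) -> float:
    for pattern, minimum in POLICY:
        if fnmatch(path, pattern):
            return minimum
    return 0.0


def files_below_policy(coverage_by_file: Dict[str, float]) -> List[str]:
    """Return files whose measured coverage is below their risk-based minimum."""
    return sorted(
        path
        for path, pct in coverage_by_file.items()
        if pct < required_coverage(path)
    )
```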
A Stack Overflow analysis noted that treating every file equally at 80% coverage leads teams to write low-value tests for trivial code while ignoring the remaining 20% of higher-risk files.
Code Coverage in CI/CD Pipelines
Coverage is most useful when it’s automated. Running it locally tells you about your machine. Running it in CI tells you about the project.
| CI/CD Component | Coverage Function | Tool Examples |
|---|---|---|
| PR gate | Block merges below threshold | Codecov, SonarQube |
| Diff coverage | Measure only changed lines | Codecov, Coveralls |
| Trend tracking | Monitor coverage over time | SonarQube dashboards |
| Badge generation | Public accountability signal | Coveralls, Shields.io |
Setting Coverage Gates
Coverage gates fail the build when coverage drops below a defined threshold.
Atlassian recommends setting the failure threshold slightly below your goal. If you’re targeting 80%, set the gate at 70% as a safety net. Setting it too high will cause frequent build failures and push developers toward writing shallow tests just to pass the check.
The CD Foundation’s 2024 State of CI/CD Report found continued high adoption of DevOps practices and the growing importance of integrating quality checks directly into CI/CD workflows.
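GitHub Actions and GitLab CI can both enforce coverage thresholds through their pipeline configuration files, typically by running the coverage tool with a minimum-coverage setting that fails the job. The setup usually takes under 30 minutes.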
Diff Coverage vs. Total Coverage
Diff coverage measures only the lines changed in a pull request, not the entire codebase.
This is the metric that actually changes developer behavior. Seeing “your PR has 45% coverage” on a specific changeset is more actionable than seeing “the project is at 78%.”
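The calculation itself is simple set arithmetic. This sketch assumes you have already extracted the changed executable lines from the diff and the executed lines from a coverage report; `diff_coverage` is a hypothetical helper, not the API of any platform.

```python
from typing import Dict, Set


def diff_coverage(
    changed: Dict[str, Set[int]], executed: Dict[str, Set[int]]
) -> float:
    """Percentage of changed executable lines that were executed by tests."""
    total = sum(len(lines) for lines in changed.values())
    if total == 0:
        return 100.0  # nothing measurable changed
    covered = sum(
        len(lines & executed.get(path, set())) for path, lines in changed.items()
    )
    return 100.0 * covered / total
```

A real implementation also has to exclude comments, blank lines, and other non-executable lines from `changed`, which is most of the work the hosted platforms do for you.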
Google’s coverage infrastructure applies coverage at the changeset level during code review. This is what made the metric useful at their scale, processing coverage for one billion lines of code daily across seven languages.
Codecov and Coveralls both generate automatic PR comments showing exactly which new lines lack test coverage.
Dealing with Flaky Tests in Coverage Data
Flaky tests corrupt coverage data because they produce inconsistent results across runs.
Google’s data shows 84% of pass-to-fail transitions in their CI system were caused by flaky tests, not actual bugs. An ICST 2024 case study found developers spend about 2.5% of productive time dealing with flaky tests.
An estimated 15-30% of all automated test failures come from test flakiness rather than real bugs, according to a CloudQA analysis.
If your coverage numbers fluctuate between runs without any code changes, flaky tests are likely the cause. Quarantine them in a separate test suite until they’re fixed.
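A rough way to confirm the suspicion is to run the suite several times on the same commit and compare which lines were executed. The sketch below assumes each run has been reduced to a mapping of file path to executed line numbers; `unstable_lines` is a hypothetical helper.

```python
from typing import Dict, List, Set


def unstable_lines(runs: List[Dict[str, Set[int]]]) -> Dict[str, Set[int]]:
    """Lines executed in some runs but not in all, for identical code."""
    if not runs:
        return {}
    files: Set[str] = set().union(*(run.keys() for run in runs))
    unstable: Dict[str, Set[int]] = {}
    for path in sorted(files):
        per_run = [run.get(path, set()) for run in runs]
        always = set.intersection(*per_run)
        ever = set.union(*per_run)
        if ever - always:
            unstable[path] = ever - always
    return unstable
```

Lines that show up here are reached by tests whose execution path varies between runs, which makes those tests good candidates for quarantine.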
Limitations of Code Coverage as a Quality Metric
Coverage measures execution. It does not measure correctness. Those are two very different things.
A test can execute every single line and assert absolutely nothing. A test suite can hit 95% and still miss every edge case in your error handling. This gap between “code that ran” and “code that was actually verified” is where real bugs survive.
Martin Fowler put it plainly: test coverage is useful for finding untested parts of a codebase, but of little use as a numeric statement of how good your tests are.
What Coverage Cannot Tell You
Coverage is blind to these problems:
- Missing assertions (the test runs the code but never checks the output)
- Untested edge cases like null inputs, empty arrays, or boundary values
- Concurrency bugs that only appear under specific thread timing
- Logic errors where the wrong answer is produced but matches a weak assertion
A Microsoft Research study of 100 large open-source Java projects found no correlation between file-level coverage and post-release bug counts. Coverage at the project level showed only insignificant correlation with defect rates.
Mutation Testing as a Complement
Mutation testing directly measures whether your tests actually detect changes in behavior.
The tool introduces small modifications to your source code (called mutants), like changing a >= to > or removing a return statement. Then it runs your test suite. If a test fails, the mutant is “killed.” If all tests still pass, your test suite has a blind spot.
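Here is the idea in miniature, with a hand-written mutant standing in for what a real tool would generate automatically. All names are hypothetical, and the “tests” are plain functions that return True when they pass.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutant: ">=" replaced with ">".
    return age > 18


def weak_test(fn: Callable[[int], bool]) -> bool:
    # Covers the line but only checks values far from the boundary.
    return fn(30) is True and fn(5) is False


def strong_test(fn: Callable[[int], bool]) -> bool:
    # Adds the boundary value, which distinguishes ">=" from ">".
    return weak_test(fn) and fn(18) is True


def mutant_killed(test: Callable[[Callable[[int], bool]], bool]) -> bool:
    """A mutant is killed if the test passes on the original but fails on the mutant."""
    return test(is_adult) and not test(is_adult_mutant)


print("weak test kills mutant:", mutant_killed(weak_test))
print("strong test kills mutant:", mutant_killed(strong_test))
```

Both tests give `is_adult` full line coverage. Only the one that checks the boundary value notices the mutation.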
| Mutation Tool | Language | Key Feature |
|---|---|---|
| PIT (Pitest) | Java | Fast, widely adopted |
| Stryker | JavaScript, .NET | Multi-language support |
| mutmut | Python | Minimal configuration |
A developer on DEV Community shared a case study with 93% line coverage but only 58.62% mutation score, a 34-point gap. After three rounds of targeted assertion improvements, both metrics converged at 93%.
That gap, between coverage and mutation score, is where bugs survive your CI pipeline and make it to production.
Coverage tells you what ran. Mutation score tells you what was verified. Use both.
Code Coverage vs. Test Effectiveness
High coverage is not the same thing as effective testing. This is the distinction most teams get wrong.
What Test Effectiveness Measures
Test effectiveness answers a specific question: does the test catch bugs when code changes?
A test that runs 50 lines and checks nothing is “covered” code. But it’s useless for catching regressions. The test adds to your coverage number without adding to your safety net.
| Metric | Measures | Limitation |
|---|---|---|
| Code coverage | Lines/branches executed | Ignores assertion quality |
| Mutation score | Tests that detect code changes | Slower to compute |
| Defect detection rate | Bugs caught before release | Lagging indicator |
Li et al. found in a comparative study that mutation testing exposed more faults and required fewer tests than path coverage, edge pair coverage, or all-uses testing. Coverage quantity doesn’t equal detection quality.
Using Both Metrics Together
Coverage and mutation testing answer different questions. Coverage tells you where you haven’t tested. Mutation testing tells you where your tests are too weak.
The practical workflow:
- Use coverage reports to find untested code areas
- Prioritize tests for high-risk paths first
- Run mutation testing on covered code to check assertion strength
- Fix surviving mutants by adding or strengthening assertions
Teams practicing behavior-driven development tend to produce tests with stronger assertions because each test maps to a specific behavioral expectation rather than just executing code.
The software verification side of quality assurance benefits most when coverage and mutation testing work together, because you’re not just running the code, you’re confirming it behaves correctly.
Coverage is a useful negative indicator. Low coverage means low test quality, almost always. But high coverage is not a useful positive indicator. It might mean quality, or it might mean someone wrote tests to hit a number.
Use coverage to find the gaps. Use mutation testing to prove the tests actually work. That combination gives you a realistic picture of software reliability.
FAQ on Code Coverage
What is the difference between code coverage and test coverage?
Code coverage measures the percentage of source code executed during testing. Test coverage is broader, evaluating whether all features, scenarios, and requirements have been tested. One tracks execution. The other tracks completeness.
What is a good code coverage percentage?
Most teams target 80% as a reasonable goal. Google considers 60% acceptable and 90% exemplary. Pushing past 90% often produces diminishing returns, with developers writing shallow tests just to hit a number.
Does 100% code coverage mean no bugs?
No. A test can execute every line without verifying a single output. Coverage measures execution, not correctness. You can reach 100% with assertion-free tests that catch nothing. Mutation testing is a better indicator of test quality.
What are the main types of code coverage metrics?
The most common types are line coverage, branch coverage, function coverage, and condition coverage. Each tracks a different dimension. Branch coverage checks both sides of conditionals, which catches more issues than line coverage alone.
Which tools measure code coverage?
It depends on the language. Istanbul and Jest work for JavaScript. Coverage.py handles Python. JaCoCo is the standard for Java. gcov and llvm-cov cover C/C++. Go has coverage built into its test command natively.
How is code coverage calculated?
Coverage tools instrument your source code or bytecode with tracking markers. When your test suite runs, the tool records which lines, branches, and functions executed. The percentage is calculated by dividing executed elements by total elements.
Can code coverage be integrated into CI/CD pipelines?
Yes. Most teams set coverage gates that fail the build if coverage drops below a defined threshold. Platforms like Codecov and SonarQube generate automatic PR comments showing which new lines lack tests.
What causes low code coverage?
Legacy codebases without tests, tightly coupled code that’s hard to isolate, time pressure during sprints, and dead code inflating the denominator. Missing coverage enforcement in the CI pipeline lets untested code merge without anyone noticing.
What is diff coverage?
Diff coverage measures only the lines changed in a specific pull request, not the entire project. It’s more actionable than total coverage because it shows whether new code is being tested before it gets merged.
How does mutation testing relate to code coverage?
Mutation testing checks whether your tests detect small code changes. Coverage tells you what ran. Mutation score tells you what was actually verified. A project can have 93% coverage but only a 58% mutation score, exposing weak assertions.
Conclusion
Understanding code coverage comes down to one thing: knowing which parts of your source code your tests actually execute. The metric itself is simple. Using it well is where most teams struggle.
Coverage reports from tools like Istanbul, JaCoCo, or Coverage.py show you the gaps. Branch coverage and condition coverage reveal the decision paths your tests skip. Diff coverage keeps new code accountable at the PR level.
But the number alone doesn’t guarantee software reliability. Pair it with mutation testing to verify your assertions actually catch regressions.
Set a realistic threshold, enforce it in your CI pipeline, and ratchet it upward over time. Coverage works best as a diagnostic signal, not a target to chase.