How to Eliminate Flaky Tests: Proven Methods from Google & Microsoft

Flaky tests, meaning automated tests that sometimes pass and sometimes fail without any changes to the code, are one of the biggest pain points in modern test automation. They consume time, slow down CI/CD pipelines, and erode confidence in test results, forcing teams to rerun tests or ignore failures that might be real issues. Fast delivery depends on reliable tests, because a test result is only a useful quality signal if the team can trust it.

What Makes Tests Flaky?

Before diving into solutions, it’s important to understand common root causes of flaky tests. Flakiness often stems from:

Timing issues and asynchronous operations, where tests fail because UI or APIs respond at variable times.
Shared state or dependencies between tests, which can cause one test to fail depending on others.
Unstable environments, such as inconsistent CI configs or network delays.
Poor selector strategies in UI tests (e.g., dynamic IDs), which break when the app changes.

These factors make automated tests unreliable, leading to wasted time diagnosing failures that aren’t real problems.

Here are the key strategies QA teams use to reduce flaky tests in automated suites:

1. Stabilize Test Design

Unreliable tests often come from poor structure or timing assumptions. Address this by:

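Replacing hard waits (sleep) with explicit or dynamic waits so tests wait for a condition to become true instead of pausing for a fixed amount of time.
Using stable selectors (e.g., static IDs or dedicated test attributes such as data-testid) so tests don’t break when the UI layout changes or when IDs are generated dynamically.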
Refactoring tests to be independent, with clear setup and teardown steps and no shared state.
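
To make the first point concrete, here is a minimal sketch of a dynamic wait written with only the Python standard library. The helper name wait_until and its parameters are illustrative assumptions, not part of any framework; UI frameworks usually ship their own equivalent, which is normally the better choice when it exists.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 5.0, interval: float = 0.05) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical scenario: a background job that finishes after some delay.
done = threading.Event()
threading.Timer(0.1, done.set).start()

# Instead of time.sleep(2) and hoping the job has finished,
# wait for the actual condition and return as soon as it holds.
wait_until(done.is_set, timeout=2.0)
```

The test now takes only as long as the job actually needs, and a slow CI machine gets the full timeout instead of a failure.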

2. Improve Test Isolation

Test isolation ensures that one test doesn’t influence another. Methods include:

Clearing application state between tests (e.g., resetting databases or cache).
Using mocks and stubs for external services so tests don’t depend on unstable APIs.
Running each test in a clean container or virtual environment for consistency.

Isolation often drastically reduces flakiness because every test starts from the same known state, regardless of which tests ran before it or in what order.
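
As a rough illustration of the first two methods, the sketch below gives every test a fresh in-memory SQLite database and replaces an external service with a stub. The function record_price, the prices table, and the exchange-rate service are all hypothetical names made up for this example.

```python
import sqlite3
import unittest
from unittest import mock


def record_price(db, amount, currency, get_rate):
    """Convert `amount` using an external rate lookup and store the result."""
    converted = amount * get_rate(currency)
    db.execute("INSERT INTO prices (value) VALUES (?)", (converted,))
    return converted


class RecordPriceTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database for every test: no rows leak between tests.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE prices (value REAL)")
        # Stub for the (hypothetical) exchange-rate service: no network involved.
        self.get_rate = mock.Mock(return_value=2.0)

    def tearDown(self):
        self.db.close()

    def test_stores_converted_value(self):
        record_price(self.db, 10.0, "EUR", self.get_rate)
        rows = self.db.execute("SELECT value FROM prices").fetchall()
        self.assertEqual(rows, [(20.0,)])
        self.get_rate.assert_called_once_with("EUR")

    def test_table_starts_empty(self):
        # Passes in any execution order because setUp rebuilds the database.
        count = self.db.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
        self.assertEqual(count, 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RecordPriceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Passing the rate lookup in as a parameter is a design choice made here to keep the stub trivial; patching a module-level function with unittest.mock.patch would work as well.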

3. Retry and Quarantine Strategies

Retries and quarantines work well as short-term mitigation techniques:

Automatic retries: Configure your CI pipeline to rerun a failed test a few times before marking it as a real failure. This absorbs transient failures caused by timing or environmental hiccups, so only failures that persist across reruns block the build.
Quarantine flaky tests: Temporarily remove flaky tests from the main suite and run them separately until they’re fixed. Microsoft follows this pattern internally, separating flaky tests so they don’t block the rest of the test suite.

Both strategies reduce noise in automated results, helping the team focus on real regressions.

Google’s Approach

Google emphasizes test isolation and reruns to assess whether a failure represents a real issue or flakiness. They also invest in tooling that tracks flaky patterns, and they use retry logic internally to confirm whether failures are consistent.

Their internal testing philosophy also involves analyzing where flakiness can originate — from the test, the framework, the system under test, or the environment — and focusing on eliminating those root causes rather than just handling symptoms.
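
The rerun idea can be illustrated with a small sketch: after a test has failed once, run it several more times and label the failure by what happens. This is an assumption-laden toy, not Google's tooling; classify_failure and the two sample tests are hypothetical.

```python
from typing import Callable


def classify_failure(test_fn: Callable[[], None], reruns: int = 5) -> str:
    """Rerun a test that has already failed once and label the failure."""
    passes = 0
    for _ in range(reruns):
        try:
            test_fn()
            passes += 1
        except AssertionError:
            pass
    # The test failed at least once before this call, so any pass means
    # the outcome changed without a code change.
    return "flaky" if passes > 0 else "consistent failure"


def always_fails():
    assert False, "real regression"


# Scripted outcomes standing in for an intermittent failure.
outcomes = iter([False, True, False, True, True])


def sometimes_fails():
    assert next(outcomes), "intermittent failure"


consistent_label = classify_failure(always_fails)
flaky_label = classify_failure(sometimes_fails)
```

A label of "flaky" only says that the outcome is not deterministic; finding whether the cause sits in the test, the framework, the system under test, or the environment still takes investigation.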

Microsoft’s Strategy

Microsoft has built infrastructure to infer, track, and manage flaky tests at scale:

They use data from test execution telemetry to identify tests that fail intermittently.
Flaky tests are quarantined so they don’t block CI/CD pipelines, and detailed bug reports are automatically filed to help developers fix the test.
Once a fix is verified, the test is re-introduced into the main suite, preventing coverage loss.

This approach strikes a balance: flaky tests can no longer undermine confidence in the pipeline, yet they are still tracked and fixed rather than forgotten.
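
One simple way to infer flakiness from execution data is to look for tests that both passed and failed on the same code revision. The sketch below assumes a hypothetical record format (test name, revision, pass/fail) and is not a description of Microsoft's actual system.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Run(NamedTuple):
    test: str       # test name
    revision: str   # code revision the test ran against
    passed: bool


def find_flaky_tests(runs: Iterable[Run]) -> Set[str]:
    """Return tests that both passed and failed on at least one identical revision."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.revision)].add(run.passed)
    return {test for (test, _rev), seen in outcomes.items() if seen == {True, False}}


# Made-up history for illustration.
history = [
    Run("test_login", "abc123", True),
    Run("test_login", "abc123", False),     # same revision, different outcome
    Run("test_checkout", "abc123", False),
    Run("test_checkout", "def456", True),   # outcome changed along with the code
    Run("test_search", "abc123", True),
]

flaky = find_flaky_tests(history)
```

Here only test_login is flagged: test_checkout changed outcome between revisions, which a code change could explain, so it is not treated as flaky by this rule.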

Successful teams also incorporate the following into their workflows:

Prioritizing critical tests: Focus automation on the most important paths; flaky tests in low-value areas can be temporarily disabled until stable.
Continuous refactoring: Regularly clean up and improve test code to avoid the complexity and brittleness that lead to flakiness.
Use of advanced tools: Some teams use tools or plugins that track test results over time and automatically flag tests whose failures follow an intermittent pattern.