Continue-on-Error Critical Job
Description
Marking a critical deployment or verification step with `continue-on-error: true` hides failures: the job, and therefore CI, passes even when that step fails, so broken releases or incomplete security checks can reach production. GitHub advises using explicit `if:` conditions for optional steps instead of swallowing errors. [1]
Vulnerable Instance
- A critical job (build, test, deploy) sets `continue-on-error: true` on key steps.
- Downstream jobs rely on artifacts or state from that step.
- Failure logs are ignored, so branch protection sees a green check even when the step fails.
```yaml
name: Deploy
on: workflow_dispatch
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run database migrations
        continue-on-error: true
        run: ./scripts/migrate.sh
      - name: Deploy app
        run: ./scripts/deploy.sh
```

If `migrate.sh` fails, the job still reports success and continues with the deploy step, leaving databases inconsistent.
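The mechanics behind the green check can be made visible. GitHub Actions records two results per step: `steps.<id>.outcome` is the result before `continue-on-error` is applied, and `steps.<id>.conclusion` is the result after it. The following is a minimal sketch; the step id `migrate` and the reporting step are illustrative additions, not part of the workflow above.

```yaml
name: Inspect swallowed failure (sketch)
on: workflow_dispatch
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run database migrations
        id: migrate              # illustrative id, needed to read the step's result
        continue-on-error: true
        run: ./scripts/migrate.sh
      - name: Show what continue-on-error hides
        run: |
          # If migrate.sh failed, outcome is "failure" while conclusion is "success".
          echo "outcome:    ${{ steps.migrate.outcome }}"
          echo "conclusion: ${{ steps.migrate.conclusion }}"
```

The job status follows the conclusion, which is why the failure never surfaces in the check that branch protection reads.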
Mitigation Strategies
- Remove `continue-on-error` from critical steps: fail fast so branch protection and humans see the error.
- Use conditional execution for optional checks: replace `continue-on-error` with explicit `if:` conditions (for example `if: always()` or `if: failure()`) or with dedicated jobs that can fail independently.
- Add retries instead of ignoring errors: wrap flaky commands with retry logic or backoff scripts.
- Emit explicit status artifacts: if a step is optional, write clear logs or artifacts and gate downstream jobs on their presence.
- Document exception handling: if swallowing errors is unavoidable, explain the rationale in comments and alert channels.
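Several of these strategies can be combined in one workflow. The sketch below assumes the migration script is safe to re-run (idempotent); `./scripts/lint.sh` and `./scripts/notify-failure.sh` are hypothetical placeholders, and the retry count and delays are arbitrary. The optional check moves into its own job, the flaky command is retried with exponential backoff, and a notification step runs only when an earlier step has failed.

```yaml
name: Deploy with retries (sketch)
on: workflow_dispatch
jobs:
  # Optional check in a dedicated job: it can fail on its own and stays visible,
  # without continue-on-error on any critical step.
  optional-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint (hypothetical script)
        run: ./scripts/lint.sh

  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run database migrations with retries
        run: |
          # Up to 3 attempts with exponential backoff; the step fails if all attempts fail.
          delay=5
          for attempt in 1 2 3; do
            if ./scripts/migrate.sh; then
              exit 0
            fi
            echo "Attempt ${attempt} failed"
            if [ "${attempt}" -lt 3 ]; then
              sleep "${delay}"
              delay=$((delay * 2))
            fi
          done
          exit 1
      - name: Deploy app
        run: ./scripts/deploy.sh
      - name: Report failure (hypothetical script)
        if: ${{ failure() }}     # runs only when a previous step in this job failed
        run: ./scripts/notify-failure.sh
```

Because the migration step exits non-zero after the last attempt, the deploy step is skipped and the job is reported as failed.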
Secure Version
```diff
 name: Deploy (Safe)
 on: workflow_dispatch
 jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Smoke tests
+        run: npm run test:smoke
+      - name: Optional telemetry
+        if: ${{ always() }}
+        run: ./scripts/report-telemetry.sh || echo "Telemetry failed"
+
   release:
+    needs: validate
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
       - name: Run database migrations
-        continue-on-error: true
         run: ./scripts/migrate.sh
       - name: Deploy app
         run: ./scripts/deploy.sh
```
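Where an optional step genuinely has to keep `continue-on-error`, its result can still be surfaced explicitly so that downstream jobs decide on it rather than inherit a silent pass. The following is a sketch under assumptions: the job name `checks`, the step id `scan`, the output name `scan_outcome`, and `./scripts/scan.sh` are all hypothetical.

```yaml
name: Gate on explicit status (sketch)
on: workflow_dispatch
jobs:
  checks:
    runs-on: ubuntu-latest
    outputs:
      # Expose the step's raw result (before continue-on-error is applied).
      scan_outcome: ${{ steps.scan.outcome }}
    steps:
      - uses: actions/checkout@v4
      - name: Optional scan (hypothetical script)
        id: scan
        # Rationale for tolerating failure should be documented here.
        continue-on-error: true
        run: ./scripts/scan.sh

  release:
    needs: checks
    # Run the release only if the scan actually succeeded.
    if: ${{ needs.checks.outputs.scan_outcome == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run database migrations
        run: ./scripts/migrate.sh
      - name: Deploy app
        run: ./scripts/deploy.sh
```

With this gate, a failed scan leaves the `checks` job green but causes `release` to be skipped, so the release does not proceed against an unverified state.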
Impact
| Dimension | Severity | Notes |
|---|---|---|
| Likelihood | | Teams often enable `continue-on-error` temporarily and forget to remove it. |
| Risk | | Hidden failures push broken builds or skip security gates. |
| Blast radius | | All downstream deployments, migrations, or releases rely on the silently failing step. |
References
1. GitHub Docs, “jobs.<job_id>.steps[].continue-on-error,” https://docs.github.com/actions/using-jobs/using-jobs-in-a-workflow#jobsjob_idstepscontinue-on-error
2. GitHub Docs, “Workflow syntax for GitHub Actions,” https://docs.github.com/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepscontinue-on-error