Beyond the Build: Conceptualizing Image Lifecycles as State Machines in Vivido Workflows

Every image starts somewhere—a commit, a build trigger, a manual upload. But what happens after that first docker push? In many teams, the answer is fuzzy. The image is "in production" until someone remembers to clean it up, or it gets overwritten by a newer tag, or a security scan runs weeks late. The lack of clear states leads to incidents: a vulnerable image running in production because no one knew it was still active, or a rollback that pulls an image that was already garbage-collected.

This guide proposes a different mental model: treat each image as a state machine. At any moment, an image is in exactly one state—Building, Scanned, Staged, Active, Deprecated, Retired—and transitions between states happen only through defined actions. By mapping your Vivido workflows to this state machine, you gain clarity, auditability, and automation opportunities. Let's explore how.

Why This Topic Matters Now

Containerization and microservices have made image lifecycle management more complex than ever. A single application might have dozens of image variants—different architectures, base images, configuration layers—each with its own lifecycle. Without explicit state management, teams rely on naming conventions and manual checks to determine what is safe to deploy or delete. This approach breaks under scale.

Consider a typical incident: a developer pushes a fix, the CI pipeline builds and tags the image as latest, and the orchestrator pulls it. Later, a security scan reveals a critical vulnerability in the base layer. The team scrambles to find which environments are running that image, whether older versions are still in use, and whether the scan result applies to all tags. In a state-machine model, each image version would reach the Scanned state only with a recorded pass result, and the orchestrator would deploy only images that had passed through that state. The vulnerability would be caught before deployment, not after.

Another driver is compliance. Frameworks like SOC 2 or FedRAMP expect evidence that only approved images run in production. A state machine backed by an append-only transition log produces that audit trail: every transition is recorded with a timestamp and actor. Auditors can see that image abc123 moved from Building to Scanned to Staged to Active, and never went back to Building after being promoted. This is much harder to fake or misread than a tag-based system where tags can be overwritten.

Finally, the rise of GitOps and policy-as-code (e.g., OPA, Kyverno) aligns naturally with state machines. Policies can reference image states directly: "Only allow deployments from images in state Active with scan score ≥ 9.0." Vivido workflows can enforce these policies at transition points, creating a closed-loop system where images cannot accidentally drift into an invalid state.

Core Idea in Plain Language

A state machine is a simple concept: an object can be in one of a finite set of states, and it moves between states only when a specific event occurs. Think of a traffic light: it can be green, yellow, or red, but never green and red at the same time. The events (timer, sensor) trigger transitions. Image lifecycles work the same way.

In Vivido, an image lifecycle might have these states:

  • Building – The image is being created by a CI pipeline. It is not yet ready for any use.
  • Scanned – The image has passed a security and compliance scan. It is safe to deploy to staging.
  • Staged – The image is deployed to a staging environment and undergoing integration tests.
  • Active – The image is running in production, serving traffic.
  • Deprecated – The image is superseded by a newer version. It may still run but should be replaced.
  • Retired – The image is no longer in use and can be deleted.

Transitions between states happen through events: scan passed moves an image from Building to Scanned; deployed to staging moves it from Scanned to Staged; deploy to prod moves it from Staged to Active; new version available moves it from Active to Deprecated; all instances replaced moves it from Deprecated to Retired. Each transition can have preconditions: for example, you cannot move from Scanned to Staged unless the scan score is above a threshold.
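The six states and the order they may be visited in can be written down as a small lookup table. The following is a minimal sketch, not Vivido's own representation: the names State, ALLOWED, and can_transition are hypothetical, and only the direct state-to-state moves listed above are included.

```python
from enum import Enum


class State(Enum):
    BUILDING = "Building"
    SCANNED = "Scanned"
    STAGED = "Staged"
    ACTIVE = "Active"
    DEPRECATED = "Deprecated"
    RETIRED = "Retired"


# Each state maps to the set of states it may move to directly.
ALLOWED = {
    State.BUILDING: {State.SCANNED},
    State.SCANNED: {State.STAGED},
    State.STAGED: {State.ACTIVE},
    State.ACTIVE: {State.DEPRECATED},
    State.DEPRECATED: {State.RETIRED},
    State.RETIRED: set(),
}


def can_transition(current: State, target: State) -> bool:
    """True only if `target` is a direct successor of `current`."""
    return target in ALLOWED[current]


print(can_transition(State.BUILDING, State.SCANNED))  # True
print(can_transition(State.BUILDING, State.ACTIVE))   # False
```

Failure states or rollback paths would be added as extra entries in the same table, which keeps the full set of legal moves visible in one place.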

What makes this powerful is that the state machine is explicit and machine-readable. Instead of relying on tags like v1.2.3-prod or latest, the state is stored as metadata in your image registry or Vivido workflow engine. Tools can query the state directly: "List all images in Active state" or "Show images that have been in Deprecated for more than 30 days." This eliminates guesswork.

The model also handles failure gracefully. If a scan fails, the image stays in Building (or moves to a Failed state). It never reaches Staged. If a deployment to production fails, the image remains in Staged. Rollback becomes a matter of promoting an earlier image back to Active, not re-tagging or rebuilding.

How It Works Under the Hood

Implementing image lifecycles as state machines in Vivido requires a few key components: a state store, a transition engine, and integration points. Let's break each down.

State Store

The state store holds the current state of every image version. This can be a database table, a key-value store, or even annotations on the image manifest in a registry like Harbor or ECR. Vivido workflows can use their built-in workflow state mechanism, or you can use an external store like Redis or PostgreSQL. The important thing is that the state is durable and queryable.
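To make "queryable" concrete, here is a rough sketch that uses an in-memory SQLite database as a stand-in for whichever store you choose. The table name image_state, its columns, and the sample digests are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE image_state (
        digest      TEXT PRIMARY KEY,
        state       TEXT NOT NULL,
        state_since TEXT NOT NULL  -- ISO 8601 UTC timestamp
    )
    """
)
conn.executemany(
    "INSERT INTO image_state VALUES (?, ?, ?)",
    [
        ("sha256:aaa", "Active", "2024-01-15T10:10:00Z"),
        ("sha256:bbb", "Deprecated", "2023-11-01T09:00:00Z"),
        ("sha256:ccc", "Deprecated", "2024-01-10T09:00:00Z"),
    ],
)


def digests_in_state(state: str) -> list[str]:
    """List every digest currently in the given state."""
    rows = conn.execute(
        "SELECT digest FROM image_state WHERE state = ? ORDER BY digest", (state,)
    )
    return [digest for (digest,) in rows]


def deprecated_before(cutoff_iso: str) -> list[str]:
    """List digests that entered Deprecated before the cutoff timestamp."""
    rows = conn.execute(
        "SELECT digest FROM image_state "
        "WHERE state = 'Deprecated' AND state_since < ? ORDER BY digest",
        (cutoff_iso,),
    )
    return [digest for (digest,) in rows]
```

Comparing timestamps as strings works here only because every value uses the same UTC ISO 8601 layout; a production store would more likely use a native timestamp column.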

Each image is identified by its digest (a SHA-256 hash of the manifest), not its tag. Tags can change, but digests are immutable. The state store maps digest → current state + transition history. For example:

digest: sha256:abc123...
state: Active
history: [
  { from: Building, to: Scanned, at: 2024-01-15T10:00:00Z, by: CI-pipeline },
  { from: Scanned, to: Staged, at: 2024-01-15T10:05:00Z, by: deploy-bot },
  { from: Staged, to: Active, at: 2024-01-15T10:10:00Z, by: deploy-bot }
]
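One way to hold a record like this in code is a pair of small data classes, where every state change appends to the history before the current state is updated. This is a sketch under assumptions: ImageRecord, Transition, and move_to are illustrative names, the store is a plain dictionary, and the digest is a shortened placeholder, not a real hash.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Transition:
    from_state: str
    to_state: str
    at: datetime
    by: str


@dataclass
class ImageRecord:
    digest: str
    state: str = "Building"
    history: list[Transition] = field(default_factory=list)

    def move_to(self, new_state: str, actor: str) -> None:
        """Record the transition, then update the current state."""
        self.history.append(
            Transition(self.state, new_state, datetime.now(timezone.utc), actor)
        )
        self.state = new_state


# In-memory stand-in for the state store, keyed by digest rather than tag.
store: dict[str, ImageRecord] = {}

record = store.setdefault("sha256:abc123", ImageRecord("sha256:abc123"))
record.move_to("Scanned", actor="CI-pipeline")
record.move_to("Staged", actor="deploy-bot")
record.move_to("Active", actor="deploy-bot")
```

Because the history is only ever appended to, the current state can always be rechecked against the last entry's to_state.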

Transition Engine

The transition engine is a service (or a set of Vivido workflow steps) that validates and executes state changes. It checks preconditions, updates the state store, and triggers side effects like sending notifications or updating a deployment manifest.

Preconditions are rules that must be true for a transition to be allowed. For example:

  • Transition Scanned → Staged requires that the image has a scan result with no findings of critical severity.
  • Transition Staged → Active requires that the staging tests passed and that the deployment target (e.g., Kubernetes namespace) is approved.
  • Transition Active → Deprecated requires that a newer image is in Active state.

If a precondition fails, the transition is blocked and an error is logged. This prevents invalid states from ever occurring.
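A precondition check can be as simple as a list of predicate functions attached to each transition. The sketch below covers the first two rules above; the metadata field names (max_severity, staging_tests_passed, target_approved) and the severity scale are assumptions, not fields defined by any particular scanner or by Vivido.

```python
from typing import Callable

# Severity levels in ascending order; the names are an assumption for this sketch.
SEVERITIES = ["none", "low", "medium", "high", "critical"]


def below_critical(image: dict) -> bool:
    return SEVERITIES.index(image["max_severity"]) < SEVERITIES.index("critical")


def staging_ok(image: dict) -> bool:
    return image["staging_tests_passed"] and image["target_approved"]


# (from_state, to_state) -> checks that must all pass.
PRECONDITIONS: dict[tuple[str, str], list[Callable[[dict], bool]]] = {
    ("Scanned", "Staged"): [below_critical],
    ("Staged", "Active"): [staging_ok],
}


class TransitionBlocked(Exception):
    """Raised when a transition is undefined or a precondition fails."""


def transition(image: dict, to_state: str) -> None:
    """Move `image` to `to_state` only if every precondition holds."""
    key = (image["state"], to_state)
    if key not in PRECONDITIONS:
        raise TransitionBlocked(f"{key[0]} -> {key[1]} is not defined")
    for check in PRECONDITIONS[key]:
        if not check(image):
            raise TransitionBlocked(f"{check.__name__} failed for {key[0]} -> {key[1]}")
    image["state"] = to_state
```

The state is written only after every check has passed, so a blocked transition leaves the record untouched. The Active → Deprecated rule needs to look at other images in the store and is left out of this sketch.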

Integration Points

The state machine integrates with your existing tooling through webhooks, CLI commands, or Vivido workflow triggers. For instance:

  • When a CI pipeline finishes building, it registers the new digest with the transition engine in the Building state and requests a scan.
  • A scanner (e.g., Trivy, Snyk) runs and then triggers the transition to Scanned or Failed.
  • A deployment tool (e.g., Argo CD, Spinnaker) checks the image state before deploying; if the image is not in a state approved for the target environment (Staged or Active for production), it refuses.

Vivido's workflow engine excels here because you can model these transitions as steps in a larger pipeline. A single workflow might: build, scan, wait for approval, deploy to staging, run tests, then deploy to production—all while updating the state machine at each step.
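The deployment-side check can be sketched as a single policy function. This assumes, as in the worked example that follows, that an image becomes Active only once its production rollout is confirmed, so a first production deploy has to accept Staged. The DEPLOYABLE_FROM mapping and the may_deploy name are hypothetical; the 9.0 threshold echoes the policy example earlier in this article.

```python
# States from which an image may be deployed to each environment (an assumption).
DEPLOYABLE_FROM = {
    "staging": {"Scanned", "Staged"},
    "production": {"Staged", "Active"},
}

MIN_SCAN_SCORE = 9.0  # threshold taken from the policy example earlier in the article


def may_deploy(state: str, scan_score: float, environment: str) -> bool:
    """Return True if policy allows deploying an image in `state` to `environment`."""
    return state in DEPLOYABLE_FROM[environment] and scan_score >= MIN_SCAN_SCORE


print(may_deploy("Staged", 9.5, "production"))    # True
print(may_deploy("Building", 9.5, "production"))  # False
```

A deployment tool would call a check like this from a pre-sync hook or admission step and refuse the rollout when it returns False.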

Worked Example: Container Image Promotion Pipeline

Let's walk through a concrete example using Vivido workflows. We have a microservice called payment-api. Its image lifecycle states are: Building, Scanned, Staged, Active, Deprecated, Retired. We'll define a workflow that promotes an image from development to production.

Step 1: Build

A developer pushes code to the main branch. The CI pipeline (GitHub Actions) builds the image and pushes it to the registry with digest sha256:xyz. The pipeline then calls a Vivido workflow webhook with the digest and the event build_complete. The workflow updates the state store: sha256:xyz → state: Building. No other action is taken yet.

Step 2: Scan

The Vivido workflow triggers a security scan (using a built-in step or external service). The scan completes with a score of 9.5 (pass). The workflow moves the image to Scanned. If the scan had failed, the workflow would move it to a Failed state and notify the developer.

Step 3: Deploy to Staging

The workflow now deploys the image to a staging Kubernetes cluster. It updates the deployment manifest with the new digest. Once the pods are running and health checks pass, the workflow moves the image to Staged. It also runs integration tests against the staging environment.

Step 4: Approve and Deploy to Production

The workflow pauses for a manual approval step. A release manager reviews the test results and approves. The workflow then deploys the image to the production cluster. After confirming the deployment, it moves the image to Active. The previous active image (say sha256:abc) is moved to Deprecated automatically.
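The automatic hand-off in this step, where the new image becomes Active and the one it replaces becomes Deprecated, might look like the following. It is a sketch that assumes one state map per service; promote_to_active is a hypothetical helper, not a Vivido step.

```python
def promote_to_active(states: dict[str, str], digest: str) -> list[str]:
    """Mark a Staged image Active and deprecate the image(s) it replaces."""
    if states.get(digest) != "Staged":
        raise ValueError(f"{digest} must be Staged before it can become Active")
    replaced = [d for d, s in states.items() if s == "Active"]
    # Promote first, so a newer Active image exists before the old one is deprecated.
    states[digest] = "Active"
    for old in replaced:
        states[old] = "Deprecated"
    return replaced


# States for one service, keyed by digest (placeholder digests from the example).
states = {"sha256:abc": "Active", "sha256:xyz": "Staged"}
replaced = promote_to_active(states, "sha256:xyz")
```

Promoting before deprecating respects the precondition that Active → Deprecated requires a newer image to already be Active.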

Step 5: Cleanup

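A scheduled workflow runs daily, finding images that have been in the Deprecated state for more than 30 days. After confirming that no running instance still uses them, it moves them to Retired and eventually deletes them from the registry. Every step of the lifecycle is recorded in the transition history.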
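The daily cleanup reduces to two small functions: one that finds digests past the retention window, and one that retires them. In this sketch the in_use set, which holds digests that still have running instances, is an assumed input reflecting the "all instances replaced" condition on the Deprecated → Retired transition. All names are illustrative.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def expired_digests(deprecated_since: dict[str, datetime], now: datetime) -> list[str]:
    """Return digests that have been Deprecated for longer than RETENTION."""
    return sorted(
        digest for digest, since in deprecated_since.items() if now - since > RETENTION
    )


def retire(states: dict[str, str], digests: list[str], in_use: set[str]) -> list[str]:
    """Move expired digests to Retired, skipping any that still have running instances."""
    retired = []
    for digest in digests:
        if digest in in_use:
            continue
        states[digest] = "Retired"
        retired.append(digest)
    return retired
```

Passing `now` in as an argument, instead of reading the clock inside the function, keeps the retention rule easy to test.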

This example shows how the state machine enforces order: you cannot skip from Building to Active, and you cannot deploy an image that hasn't been scanned. The workflow is self-documenting; an auditor can see exactly when each transition occurred.

Edge Cases and Exceptions

No model is perfect. Here are common edge cases you'll encounter and how to handle them.

Rollback

What if the new production image has a bug? You need to revert to the previous Active image. In a state machine, rollback is a pair of transitions: the previous image moves from Deprecated back to Active, and the faulty image moves from Active to Deprecated. This is straightforward if you keep the history. However, you must ensure that the previous image still meets current compliance requirements (e.g., its scan might be outdated). A good practice is to re-scan the image before promoting it back to Active, or at least log that a rollback occurred without re-scan.
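A rollback that swaps the two images and insists on a fresh scan first could be sketched like this. The rescan_passes callback is a hypothetical hook standing in for whatever scanner you call; it is not a real scanner API.

```python
from typing import Callable


def rollback(
    states: dict[str, str],
    current: str,
    previous: str,
    rescan_passes: Callable[[str], bool],
) -> None:
    """Swap `previous` back to Active and deprecate `current`, after a re-scan."""
    if states.get(current) != "Active":
        raise ValueError(f"{current} is not Active")
    if states.get(previous) != "Deprecated":
        raise ValueError(f"{previous} is not Deprecated")
    if not rescan_passes(previous):
        raise RuntimeError(f"{previous} failed re-scan; rollback refused")
    states[previous] = "Active"
    states[current] = "Deprecated"
```

All checks run before either state is written, so a refused rollback leaves both images exactly as they were.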

Partial Deployments (Canary, Blue-Green)

In a canary deployment, an image is running in production but only on a subset of instances. Should it be considered Active? One approach is to introduce a Canary state between Staged and Active. The image moves to Canary when it's deployed to a subset, then to Active after the canary passes. Alternatively, you can keep the state as Active but add a metadata field indicating the deployment percentage. The state machine should be flexible enough to accommodate your deployment strategy without adding too many states.
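The metadata option, where the state stays Active and a separate field carries the rollout share, might be modelled as below. The class and field names are assumptions chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ProductionImage:
    digest: str
    state: str = "Active"
    traffic_percent: int = 0  # share of production instances running this digest

    def set_traffic(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("traffic_percent must be between 0 and 100")
        self.traffic_percent = percent

    @property
    def is_canary(self) -> bool:
        """Active, but serving only part of production."""
        return self.state == "Active" and 0 < self.traffic_percent < 100
```

This keeps the state set unchanged at the cost of one more attribute that policies must read. A dedicated Canary state makes the phase more visible but adds two transitions to maintain.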

Multi-Region Replication

If your registry replicates images across regions, the state must be synchronized. A single image digest may be present in multiple regions, but its lifecycle state should be global. Use a central state store that all regions write to. When a transition occurs (e.g., from Staged to Active), the state update propagates to all regions. If a region is temporarily offline, the transition can be queued and retried.
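The queue-and-retry behaviour can be illustrated with a toy model in which each region keeps a copy of the global state and missed updates are replayed in order. Region, propagate, and flush are hypothetical names; a real system would put a durable queue or the store's own replication in place of the in-memory deque.

```python
from collections import defaultdict, deque


class Region:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.states: dict[str, str] = {}


# Updates waiting for an offline region, kept in arrival order.
pending: dict[str, deque] = defaultdict(deque)


def propagate(regions: list[Region], digest: str, state: str) -> None:
    """Apply a state change everywhere, queueing it for regions that are offline."""
    for region in regions:
        if region.online and not pending[region.name]:
            region.states[digest] = state
        else:
            # Also queue behind any backlog so updates are never applied out of order.
            pending[region.name].append((digest, state))


def flush(region: Region) -> None:
    """Replay queued updates once a region is reachable again."""
    queue = pending[region.name]
    while region.online and queue:
        digest, state = queue.popleft()
        region.states[digest] = state
```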

Image Re-tagging

Users sometimes re-tag an existing image (e.g., pointing latest at a different digest). This does not change the state of any image; it is only a labeling change. The state machine should ignore tag updates, because it tracks digests. However, you might want to trigger a transition when a specific tag is updated (e.g., when latest points to a new digest, the old digest should be deprecated). This can be handled by a policy that watches tag changes and initiates state transitions accordingly.
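A tag-watching policy of this kind could be a small handler invoked by a registry webhook. The sketch below assumes the webhook supplies the tag name and both digests; the handler name and the WATCHED_TAGS set are illustrative.

```python
WATCHED_TAGS = {"latest"}


def on_tag_moved(states: dict[str, str], tag: str, old_digest: str, new_digest: str) -> bool:
    """Deprecate the digest a watched tag used to point at. Returns True if a state changed."""
    if tag not in WATCHED_TAGS or old_digest == new_digest:
        return False
    if states.get(old_digest) != "Active":
        return False  # only an Active image can be deprecated
    states[old_digest] = "Deprecated"
    return True
```

Tags outside the watched set leave every state untouched, which keeps the state machine keyed on digests.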

Manual Override

Sometimes an operator needs to force a transition (e.g., retiring an image that is still running, in response to an emergency). The state machine should allow overrides with proper logging and approval. Each override should record who did it and why. Overrides should be rare and auditable.
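An override path might bypass the normal transition rules while refusing to run without an actor, an approver, and a reason. This is a sketch; the function name and the audit entry fields are assumptions.

```python
from datetime import datetime, timezone

audit_log: list[dict] = []


def force_transition(
    states: dict[str, str],
    digest: str,
    to_state: str,
    actor: str,
    approved_by: str,
    reason: str,
) -> None:
    """Bypass normal preconditions, but never without a recorded who and why."""
    if not actor.strip() or not approved_by.strip() or not reason.strip():
        raise ValueError("an override needs an actor, an approver, and a reason")
    audit_log.append(
        {
            "digest": digest,
            "from": states.get(digest),
            "to": to_state,
            "actor": actor,
            "approved_by": approved_by,
            "reason": reason,
            "override": True,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    states[digest] = to_state
```

The audit entry is written before the state changes, so an override cannot take effect without leaving a trace.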

Limits of the Approach

State machines bring clarity, but they are not a silver bullet. Here are some limitations to consider.

Complexity at Scale

If you have hundreds of microservices, each with its own lifecycle, managing state machines for every image version can become unwieldy. You need a robust state store that can handle high write throughput and queries. Also, defining all possible states and transitions upfront is hard; you may need to iterate as your deployment process evolves.

State Explosion

Adding too many states (e.g., Building-x86, Building-arm, Scanned-low, Scanned-medium, Scanned-high) makes the model hard to reason about. Stick to a minimal set of states that map to meaningful lifecycle phases. Use metadata (like architecture or scan score) as attributes, not separate states.

Race Conditions

If multiple processes try to transition the same image simultaneously, you can get inconsistent states. Use optimistic locking or a distributed lock to ensure atomic transitions. Vivido workflows can help by serializing transitions for a given image.
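Optimistic locking is usually implemented with a version number on each record: a writer reads the version, and the write succeeds only if the version is still the same. The sketch below shows the idea in memory. StateStore and VersionConflict are illustrative names, and the threading.Lock stands in for the atomic conditional update a real database would provide.

```python
import threading


class VersionConflict(Exception):
    """Another writer changed the record after it was read."""


class StateStore:
    def __init__(self) -> None:
        self._rows: dict[str, tuple[str, int]] = {}  # digest -> (state, version)
        self._lock = threading.Lock()  # stands in for the store's atomic update

    def create(self, digest: str, state: str) -> None:
        with self._lock:
            self._rows[digest] = (state, 0)

    def read(self, digest: str) -> tuple[str, int]:
        with self._lock:
            return self._rows[digest]

    def compare_and_set(self, digest: str, expected_version: int, new_state: str) -> None:
        """Write only if nobody else has written since `expected_version` was read."""
        with self._lock:
            _, version = self._rows[digest]
            if version != expected_version:
                raise VersionConflict(
                    f"{digest}: expected v{expected_version}, found v{version}"
                )
            self._rows[digest] = (new_state, version + 1)
```

When two workflows read version 0 and both try to write, the first succeeds and bumps the version to 1, and the second gets VersionConflict and must re-read before deciding whether its transition is still valid.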

Legacy Images

Images that were built before the state machine was introduced have no state. You need a migration strategy: either assign them a default state (e.g., Active if they are running, or Unknown) and require manual review, or simply ignore them and let the state machine only track new images. The latter is simpler but leaves blind spots.
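The first migration option, assigning default states and flagging the results for review, could be a one-off backfill along these lines. The inputs (a list of digests from the registry and a set of digests currently running) are assumed to come from your own inventory tooling.

```python
def backfill(registry_digests: list[str], running: set[str], states: dict[str, str]) -> list[str]:
    """Give untracked images a starting state; return the digests that need manual review."""
    needs_review = []
    for digest in registry_digests:
        if digest in states:
            continue  # already tracked by the state machine
        states[digest] = "Active" if digest in running else "Unknown"
        needs_review.append(digest)
    return needs_review
```

Every backfilled digest is returned for review, including the ones marked Active, because none of them passed through the normal transitions.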

Tooling Integration

Not all tools understand state machines. Your CI/CD pipeline, registry, and orchestrator may need custom plugins or webhooks to read and write state. This requires development effort. However, once integrated, the benefits often outweigh the cost.

Reader FAQ

Q: How do I handle images that are built but never deployed?
A: They remain in Building or Scanned state. A scheduled cleanup workflow can move them to Retired after a timeout (e.g., 7 days).

Q: Can I use tags as states instead of a separate state machine?
A: Tags are mutable and ambiguous. A tag like v1.2.3-prod might be overwritten. A state machine provides an immutable, auditable record. We recommend using both: tags for human readability, states for machine enforcement.

Q: What if my scan tool reports different results over time (e.g., new CVEs discovered)?
A: You can re-scan images periodically. If a previously scanned image now fails, you can transition it back to a Failed or Quarantined state, and alert the team. The state machine supports this by allowing transitions from any state to a failure state.

Q: How do I model multi-stage approvals?
A: Add intermediate states like AwaitingSecurityApproval or AwaitingQA. Each approval step moves the image to the next state. This makes the approval process explicit and auditable.

Q: Is this approach only for container images?
A: No. You can apply state machines to VM images, AMIs, firmware images, or any artifact that has a lifecycle. The principles are the same.

Practical Takeaways

Start small. Pick one service or image type and model its lifecycle with 4–6 states. Use Vivido workflows to implement transitions. Monitor for a few weeks and adjust states as needed. Once you see the benefits—fewer deployment incidents, easier audits, clearer communication—expand to other services.

Here are three specific next moves:

  1. Map your current lifecycle. List all the stages an image goes through from creation to deletion. Identify where ambiguity or manual steps exist. Those are candidates for state machine states.
  2. Define transition rules. For each state change, write down the preconditions and the actions that trigger it. Share this with your team for feedback.
  3. Implement a prototype. Use Vivido's workflow builder to create a simple pipeline for one image. Include state transitions and a basic state store (e.g., a JSON file or a small database). Run a few test cycles to see if the model holds.

State machines are not a new idea, but applying them to image lifecycles is a practical step toward more reliable and auditable deployments. The upfront investment pays off every time you avoid a production incident or pass an audit with clear evidence.
