Workflow cost optimization
This guide provides strategies for optimizing costs associated with workloads running on Temporal Cloud while maintaining Workflow reliability and observability.
Overview
Temporal Cloud uses consumption-based pricing with two primary cost components: Actions and Storage. Optimization opportunities vary significantly based on your workload characteristics: Workflows with high Signal volume face different cost drivers than long-running Workflows with large payloads.
Build Workflows following best practices first, then optimize based on observed costs. Premature optimization can compromise observability and create operational challenges.
Every optimization involves tradeoffs. This guide helps you make informed decisions about where and how to optimize based on your specific requirements.
If you need additional guidance on Workflow design, reach out to a Temporal Solutions Architect.
Common anti-patterns
Avoid these patterns that either inflate costs unnecessarily or create problems through aggressive optimization:
Premature Activity consolidation
Combining Activities before understanding failure modes reduces observability and retry control. Activities should be split based on failure boundaries and retry requirements, not cost optimization alone. See How many Activities should I use in my Temporal Workflow for a decision framework.
Inappropriate use of Local Activities
Using Local Activities for every operation, without understanding their failure semantics and limitations, is an anti-pattern: Local Activities don't provide Worker-level isolation and have different retry behavior. See Local Activities for guidance.
Missing Continue-As-New
Long-running Workflows that don't implement Continue-As-New accumulate large Event Histories, increasing storage costs and degrading performance. Workflows that run for days or weeks, or that process thousands of events, require Continue-As-New.
High volume of Activity retries
Generally, the default values for Activity retries are quite good. However, excessive Activity retries often indicate underlying issues, such as timeouts that are too short or Activities that frequently fail. Monitor Activity retry frequency; if it is high, consider increasing retry intervals or lengthening Activity timeouts before failures occur. See Spooky Stories: Chilling Temporal Anti-Patterns for guidance on retry defaults and patterns.
Large payloads in Workflow History
Passing multi-megabyte payloads through Workflows when external storage (S3, blob storage) is more appropriate. Use compression or the claim check pattern for large data.
Over-optimization at the expense of observability
Aggressively optimizing costs without maintaining sufficient visibility for debugging and operational needs. Balance cost reduction with your team's observability requirements.
Excessive Activity Heartbeats
Each Heartbeat counts as 1 Action. Only use Heartbeats for long-running Activities (10+ minutes) where you need to detect Worker failures and track progress. Short-running Activities that complete in seconds or minutes don't need Heartbeats. See Activity Heartbeat documentation for guidance.
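As a rough sketch of why Heartbeat cadence matters for cost, a client-side throttle can cap how often Heartbeats are recorded. This is illustrative Python, not a Temporal SDK API, and SDKs may already throttle Heartbeat recording internally:

```python
class HeartbeatThrottler:
    """Illustrative sketch (not a Temporal SDK API): suppress Heartbeats
    sent more often than a minimum interval, since each recorded
    Heartbeat counts as 1 Action."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last_sent = None  # timestamp of the last Heartbeat allowed through

    def should_send(self, now):
        """Return True (and record the send) only if enough time has passed."""
        if self._last_sent is None or now - self._last_sent >= self.min_interval:
            self._last_sent = now
            return True
        return False
```

With a 30-second minimum interval, an Activity loop that would otherwise heartbeat every second records one Action per 30 seconds instead.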
Understanding cost drivers
Temporal Cloud pricing consists of Actions, Storage, and Support. If you are new to Temporal Cloud, see the pricing documentation to learn more and familiarize yourself with what results in a billable Action in Temporal Cloud.
Cost distribution
For most workloads, Actions represent the majority of total costs, with storage typically accounting for 10% or less of a monthly bill. Focus optimization efforts on what's driving costs for a specific workload:
High Actions costs generally indicate:
- Many Activities per Workflow
- Frequent Signals, Queries, or Updates
- Long-running Activities with Heartbeats
- High Activity retry rates
- Extensive Query usage
High Storage costs generally indicate:
- Large payloads in Workflow inputs, outputs, or Activity results
- Long retention periods with high Workflow volume
- Long-running Workflows without Continue-As-New
- Workflows accumulating large Event Histories
Optimization priority
- Actions optimization: Usually provides the largest cost reduction opportunity
- Active Storage optimization: Relevant for long-running Workflows or large payloads
- Retained Storage optimization: Relevant for high volume combined with long retention periods
Measuring
Establish baseline metrics before optimizing, then validate impact after implementation. Track:
- Actions consumption (per Workflow, per day/month, by Namespace)
- Storage consumption (Active and Retained)
- Monthly costs (total, per Namespace, per Workflow Type)
- Observability metrics (time to debug, incident detection)
Actions optimization
Actions encompass Workflow operations, Activity Executions, Signals, Queries, and other interactions with Temporal. Each represents a unit of consumption.
Activity granularity
Activity granularity is a fundamental architectural decision that impacts both costs and observability. More Activities provide better visibility and retry control but increase the Action count. Fewer Activities reduce costs but limit observability.
For detailed discussion of this tradeoff, see How many Activities should I use in my Temporal Workflow?
Child Workflows vs Activities
Child Workflows cost 2 Actions compared to an Activity's 1 Action. See Child Workflows documentation for detailed comparison of capabilities and use cases.
Retry Policies
Each Activity retry counts as 1 Action. Default Retry Policies can be aggressive, which is appropriate for most operations but costly for expensive external operations. Consider limiting maximum attempts, increasing initial intervals, and using larger backoff coefficients for expensive operations while maintaining standard retry behavior for normal operations.
For particularly expensive operations, consider using next retry delay to dynamically control retry timing based on failure types, or implement an Activity pause pattern to wait for manual intervention rather than automatic retries.
Refer to this blog post on Mastering Workflow retry logic for resilient applications for additional guidance.
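The backoff math behind a Retry Policy can be sketched directly. Because each attempt costs 1 Action, the maximum-attempts setting bounds how many Actions a persistently failing Activity can consume. Illustrative Python, not SDK code:

```python
def retry_schedule(initial_interval_s, backoff_coefficient,
                   maximum_interval_s, maximum_attempts):
    """Sketch of the exponential-backoff delays a Retry Policy produces:
    the delay before each retry attempt, capped at the maximum interval.
    Total Actions for a persistently failing Activity = maximum_attempts."""
    delays = []
    delay = initial_interval_s
    for _ in range(maximum_attempts - 1):  # the first attempt has no delay
        delays.append(min(delay, maximum_interval_s))
        delay *= backoff_coefficient
    return delays
```

For example, a 1-second initial interval with a backoff coefficient of 2.0 and 5 maximum attempts yields retry delays of 1, 2, 4, and 8 seconds; raising the initial interval or lowering maximum attempts directly reduces Actions spent on failures.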
Local Activities
A Local Activity is an Activity Execution that runs in the same process as the Workflow Execution that spawns it. Multiple Local Activities that run back-to-back therefore count as a single billable Action, whereas each regular Activity counts as its own billable Action. However, converting regular Activities to Local Activities involves tradeoffs: for example, if one Local Activity fails, the whole group is retried together. Review the Local Activities documentation or reach out to your account team to learn more.
When to stick with Regular Activities
Use Regular Activities instead of Local Activities if you require any of the following:
- Activities that may take more than 10 seconds to complete
- Independent retry control for each Activity
- Avoiding re-runs of expensive Activities when unrelated Activities fail
- Immediate Signal/Update handling during execution
- Separate resource management (such as rate limits) for each Activity
Batching operations
Search Attributes
- Search Attributes provided at Workflow start do not count as billable Actions. If Search Attribute values are known before starting the Workflow, provide them at Workflow start to eliminate these costs entirely.
- For Search Attributes that must be updated during Workflow Execution, each UpsertSearchAttributes call counts as 1 Action regardless of how many attributes are updated. Batch multiple related attribute updates into single operations to reduce Actions consumed.
See the Temporal Cloud Action Documentation for details.
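The batching advice above can be sketched as a small accumulator that defers attribute changes and flushes them in one call. This is an illustrative pattern, not an SDK API; upsert_fn stands in for whatever upsert call your SDK exposes:

```python
class SearchAttributeBatcher:
    """Illustrative sketch (not an SDK API): accumulate Search Attribute
    changes and flush them in one upsert call, since each call counts
    as 1 Action regardless of how many attributes it carries."""

    def __init__(self, upsert_fn):
        self._upsert = upsert_fn  # stand-in for the SDK's upsert call
        self._pending = {}
        self.actions_used = 0

    def set(self, key, value):
        self._pending[key] = value  # staged locally; no Action yet

    def flush(self):
        if self._pending:
            self._upsert(dict(self._pending))  # one call, one Action
            self._pending.clear()
            self.actions_used += 1
```

Setting three attributes and flushing once consumes 1 Action instead of 3.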
Signal handling
Where feasible, implement deduplication logic client-side or aggregate data into fewer Signals.
Use SignalWithStart instead of separate StartWorkflow and SignalWorkflow calls when initiating Workflows with Signals.
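Client-side deduplication can be sketched as a thin wrapper that drops Signals whose deduplication id was already sent; every delivered Signal costs 1 Action. The id scheme and send_fn here are assumptions of the example:

```python
class SignalDeduplicator:
    """Illustrative client-side sketch: skip sending a Signal whose
    deduplication id has already been sent, avoiding the Action a
    duplicate delivery would cost."""

    def __init__(self, send_fn):
        self._send = send_fn  # stand-in for the SDK's signal call
        self._seen = set()

    def signal(self, dedup_id, payload):
        if dedup_id in self._seen:
            return False  # duplicate: no call made, no Action consumed
        self._seen.add(dedup_id)
        self._send(payload)
        return True
```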
Storage optimization
Storage costs are divided into Active Storage (open Workflows) and Retained Storage (closed Workflow History during retention period). Active Storage is significantly more expensive than Retained Storage.
Active Storage
Active Storage applies to open Workflows and their Event Histories. The following sections detail optimization opportunities for Active Storage.
Continue-As-New
For long-running Workflows with extended sleep/wait periods, calling Continue-As-New before sleeping closes the current execution (moving to cheaper Retained Storage) and starts fresh when work resumes, reducing Active Storage costs. See Continue-As-New documentation to learn more.
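One way to frame the decision is a simple policy function. The thresholds below are assumptions for illustration, not Temporal defaults; some SDKs expose a similar built-in hint on workflow info:

```python
def should_continue_as_new(history_events, sleep_seconds,
                           event_threshold=10_000,
                           long_sleep_seconds=24 * 3600):
    """Illustrative policy (thresholds are assumptions): call
    Continue-As-New before a long sleep, or once the Event History
    has grown large, so the closed execution moves from Active to
    cheaper Retained Storage."""
    return (history_events >= event_threshold
            or sleep_seconds >= long_sleep_seconds)
```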
Compression
Large payloads increase Active Storage costs. Implement a custom DataConverter with compression for moderately large payloads (100 KB-1 MB). See the Data Converter documentation to learn more.
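The compression step such a codec might apply can be sketched with zlib. The size cutoff is an assumption, and a real implementation would wrap this logic inside a DataConverter payload codec:

```python
import zlib

def compress_payload(data, min_size=1024):
    """Sketch of the compression step a custom DataConverter codec
    might apply (the 1 KB cutoff is an assumption): compress payloads
    above a size threshold, pass small ones through untouched.
    Returns (payload_bytes, was_compressed)."""
    if len(data) < min_size:
        return data, False
    return zlib.compress(data), True

def decompress_payload(data, compressed):
    """Inverse step a codec would apply when reading the payload back."""
    return zlib.decompress(data) if compressed else data
```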
Claim check pattern
For very large payloads or binary data, store data externally (S3 or GCS) and pass references through Workflows.
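A minimal sketch of the claim check pattern, with an in-memory dict standing in for external storage such as S3 or GCS:

```python
import uuid

class ClaimCheckStore:
    """Claim check sketch: a dict stands in for external blob storage.
    The Workflow sees only the small reference string, keeping Event
    History (and therefore Active Storage) small."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        ref = f"blob://{uuid.uuid4()}"  # this reference is what flows through the Workflow
        self._blobs[ref] = data
        return ref

    def get(self, ref):
        return self._blobs[ref]
```

Activities call put before returning and get before processing, so only the reference appears in Workflow inputs, outputs, and Activity results.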
Retained Storage
Retained Storage applies to closed Workflow History during the retention period. The following sections detail optimization opportunities for Retained Storage.
Retention periods
Default Namespace retention is 30 days (configurable 1-90 days). Adjust based on operational and compliance requirements.
Considerations:
- Shorter retention reduces costs but limits historical analysis
- Audit investigation patterns before shortening retention
- Ensure compliance requirements are met
See Namespace retention documentation for configuration details.
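Back-of-the-envelope math shows how retention length drives Retained Storage. No prices are assumed here, only the steady-state volume of retained history:

```python
def retained_storage_gb(closed_workflows_per_day, avg_history_mb, retention_days):
    """Illustrative steady-state estimate: closed histories accumulate
    for retention_days before aging out, so retained volume grows
    linearly with both Workflow volume and retention length. Halving
    retention roughly halves this figure."""
    return closed_workflows_per_day * retention_days * avg_history_mb / 1024
```

For example, 1,000 closed Workflows per day at 2 MB of history each, retained for 30 days, holds roughly 59 GB of Retained Storage at steady state.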
Workflow export
Temporal Cloud supports exporting Workflow Histories to external storage for compliance while maintaining shorter retention periods. Note that Workflow export costs 1 Action per export.
See Workflow History export documentation for more details. Alternatively, if you are looking to do analysis on closed Workflow Executions, review this blog post to learn how to gain insights from exported Workflow Histories.
Validation
Validation approach
- Test in non-production: Validate functional correctness before production deployment
- Monitor comprehensively: Leverage the Usage dashboard in the Cloud UI to track the impact on Actions and Storage after optimizations are made
- Progressive rollout: Deploy to small percentage, validate, then expand. Review Worker Versioning documentation to learn about rolling out changes to Workflows
- Continuous review: Re-evaluate optimization effectiveness quarterly as system evolves
Success criteria
- Cost reduced without increasing mean time to repair (MTTR)
- Workflow success rates maintained or improved
- Any reduction in observability does not increase mean time to detect (MTTD) for incidents
Tools
- Temporal Cloud Usage dashboard for Actions and Storage metrics
- Workflow History for per-Workflow billable Action estimates
- Export metrics to observability platforms (Datadog, Grafana, etc.) for custom monitoring
When to get help
Engage the Temporal team for Workflow audits when experiencing:
- Complex Workflow patterns with unclear optimization paths
- Compliance requirements limiting optimization options
- Need for custom DataConverters or advanced patterns
- Desire for expert validation of optimization strategies
Contact your Temporal Account Representative or Temporal support to discuss optimization services.