Workflow cost optimization
This guide provides strategies for optimizing costs associated with workloads running on Temporal Cloud while maintaining Workflow reliability and observability.
Overview
Temporal Cloud uses consumption-based pricing with two primary cost components: Actions and Storage. Optimization opportunities vary significantly based on your workload characteristics: Workflows with high Signal volume face different cost drivers than long-running Workflows with large payloads.
Build Workflows following best practices first, then optimize based on observed costs. Premature optimization can compromise observability and create operational challenges.
Every optimization involves tradeoffs. This guide helps you make informed decisions about where and how to optimize based on your specific requirements.
If you need additional guidance on Workflow design, reach out to a Temporal Solutions Architect.
Common anti-patterns
Avoid these patterns that either inflate costs unnecessarily or create problems through aggressive optimization:
Premature Activity consolidation
Combining Activities before understanding failure modes reduces observability and retry control. Activities should be split based on failure boundaries and retry requirements, not cost optimization alone. See How many Activities should I use in my Temporal Workflow for a decision framework.
Inappropriate use of Local Activities
Using Local Activities for every operation, without understanding their failure semantics and limitations, is an anti-pattern: Local Activities don't provide Worker-level isolation and have different retry behavior. See Local Activities for guidance.
Missing Continue-As-New
Long-running Workflows that don't implement Continue-As-New accumulate large Event Histories, increasing storage costs and degrading performance. Workflows that run for days or weeks, or that process thousands of events, require Continue-As-New.
High volume of Activity retries
Generally, the default values for Activity retries are quite good. However, excessive Activity retries often indicate underlying issues, such as timeouts that are too short or Activities that frequently fail. Monitor Activity retry frequency; if it is high, consider increasing retry intervals or lengthening Activity timeouts before failures occur. See Spooky Stories: Chilling Temporal Anti-Patterns for guidance on retry defaults and patterns.
Large payloads in Workflow History
Passing multi-megabyte payloads through Workflows when external storage (S3, blob storage) is more appropriate. Use compression or the claim check pattern for large data.
Over-optimization at the expense of observability
Aggressively optimizing costs without maintaining sufficient visibility for debugging and operational needs. Balance cost reduction with your team's observability requirements.
Excessive Activity Heartbeats
Each Heartbeat counts as 1 Action. Only use Heartbeats for long-running Activities (10+ minutes) where you need to detect Worker failures and track progress. Short-running Activities that complete in seconds or minutes don't need Heartbeats. See Activity Heartbeat documentation for guidance.
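As a rough sketch of why Heartbeat cadence matters for cost, a client-side throttle can cap how often Heartbeats are recorded. This is illustrative Python, not a Temporal SDK API, and SDKs may already throttle Heartbeat recording internally:

```python
class HeartbeatThrottler:
    """Illustrative sketch (not a Temporal SDK API): suppress Heartbeats
    sent more often than a minimum interval, since each recorded
    Heartbeat counts as 1 Action."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last_sent = None  # timestamp of the last Heartbeat allowed through

    def should_send(self, now):
        """Return True (and record the send) only if enough time has passed."""
        if self._last_sent is None or now - self._last_sent >= self.min_interval:
            self._last_sent = now
            return True
        return False
```

With a 30-second minimum interval, an Activity loop that would otherwise heartbeat every second records one Action per 30 seconds instead.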
Understanding cost drivers
Temporal Cloud pricing consists of Actions, Storage, and Support. If you are new to Temporal Cloud, see the pricing documentation to learn more and familiarize yourself with what results in a billable Action in Temporal Cloud.
Cost distribution
For most workloads, Actions represent the majority of total costs, with storage typically accounting for 10% or less of a monthly bill. Focus optimization efforts on what's driving costs for a specific workload:
High Actions costs generally indicate:
- Many Activities per Workflow
- Frequent Signals, Queries, or Updates
- Long-running Activities with Heartbeats
- High Activity retry rates
- Extensive Query usage
High Storage costs generally indicate:
- Large payloads in Workflow inputs, outputs, or Activity results
- Long retention periods with high Workflow volume
- Long-running Workflows without Continue-As-New
- Workflows accumulating large Event Histories
Optimization priority
- Actions optimization: Usually provides the largest cost reduction opportunity
- Active Storage optimization: Relevant for long-running Workflows or large payloads
- Retained Storage optimization: Relevant for high volume combined with long retention periods
Measuring
Establish baseline metrics before optimizing, then validate impact after implementation. Track:
- Actions consumption (per Workflow, per day/month, by Namespace)
- Storage consumption (Active and Retained)
- Monthly costs (total, per Namespace, per Workflow Type)
- Observability metrics (time to debug, incident detection)
Actions optimization
Actions encompass Workflow operations, Activity Executions, Signals, Queries, and other interactions with Temporal. Each represents a unit of consumption.
Activity granularity
Activity granularity is a fundamental architectural decision that impacts both costs and observability. More Activities provide better visibility and retry control but increase the Action count. Fewer Activities reduce costs but limit observability.
For detailed discussion of this tradeoff, see How many Activities should I use in my Temporal Workflow?
Child Workflows vs Activities
Child Workflows cost 2 Actions compared to an Activity's 1 Action. See Child Workflows documentation for detailed comparison of capabilities and use cases.
Retry Policies
Each Activity retry counts as 1 Action. Default Retry Policies can be aggressive, which is appropriate for most operations but costly for expensive external operations. Consider limiting maximum attempts, increasing initial intervals, and using larger backoff coefficients for expensive operations while maintaining standard retry behavior for normal operations.
For particularly expensive operations, consider using next retry delay to dynamically control retry timing based on failure types, or implement an Activity pause pattern to wait for manual intervention rather than automatic retries.
Refer to this blog post on Mastering Workflow retry logic for resilient applications for additional guidance.
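The backoff math behind a Retry Policy can be sketched directly. Because each attempt costs 1 Action, the maximum-attempts setting bounds how many Actions a persistently failing Activity can consume. Illustrative Python, not SDK code:

```python
def retry_schedule(initial_interval_s, backoff_coefficient,
                   maximum_interval_s, maximum_attempts):
    """Sketch of the exponential-backoff delays a Retry Policy produces:
    the delay before each retry attempt, capped at the maximum interval.
    Total Actions for a persistently failing Activity = maximum_attempts."""
    delays = []
    delay = initial_interval_s
    for _ in range(maximum_attempts - 1):  # the first attempt has no delay
        delays.append(min(delay, maximum_interval_s))
        delay *= backoff_coefficient
    return delays
```

For example, a 1-second initial interval with a backoff coefficient of 2.0 and 5 maximum attempts yields retry delays of 1, 2, 4, and 8 seconds; raising the initial interval or lowering maximum attempts directly reduces Actions spent on failures.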
Local Activities
A Local Activity is an Activity Execution that runs in the same process as the Workflow Execution that spawns it. Multiple Local Activities that run back-to-back therefore count as a single billable Action, whereas each regular Activity counts as its own billable Action. However, converting regular Activities to Local Activities involves tradeoffs: for example, if one Local Activity fails, the whole group is retried together. Review the Local Activities documentation or reach out to your account team to learn more.
When to stick with Regular Activities
Use Regular Activities instead of Local Activities if you require any of the following:
- Activities that may take more than 10 seconds to complete
- Independent retry control for each Activity
- Avoiding re-runs of expensive Activities when unrelated Activities fail
- Immediate Signal/Update handling during execution
- Separate resource management (such as rate limits) for each Activity
Batching operations
Search Attributes
- Search Attributes provided at Workflow start do not count as billable Actions. If Search Attribute values are known before starting the Workflow, provide them at Workflow start to eliminate these costs entirely.
- For Search Attributes that must be updated during Workflow Execution, each UpsertSearchAttributes call counts as 1 Action regardless of how many attributes are updated. Batch multiple related attribute updates into single operations to reduce Actions consumed.
See the Temporal Cloud Action Documentation for details.
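The batching advice above can be sketched as a small accumulator that defers attribute changes and flushes them in one call. This is an illustrative pattern, not an SDK API; upsert_fn stands in for whatever upsert call your SDK exposes:

```python
class SearchAttributeBatcher:
    """Illustrative sketch (not an SDK API): accumulate Search Attribute
    changes and flush them in one upsert call, since each call counts
    as 1 Action regardless of how many attributes it carries."""

    def __init__(self, upsert_fn):
        self._upsert = upsert_fn  # stand-in for the SDK's upsert call
        self._pending = {}
        self.actions_used = 0

    def set(self, key, value):
        self._pending[key] = value  # staged locally; no Action yet

    def flush(self):
        if self._pending:
            self._upsert(dict(self._pending))  # one call, one Action
            self._pending.clear()
            self.actions_used += 1
```

Setting three attributes and flushing once consumes 1 Action instead of 3.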
Signal handling
Where feasible, implement deduplication logic client-side or aggregate data into fewer Signals.
Use SignalWithStart instead of separate StartWorkflow and SignalWorkflow calls when initiating Workflows with Signals.
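Client-side deduplication can be sketched as a thin wrapper that drops Signals whose deduplication id was already sent; every delivered Signal costs 1 Action. The id scheme and send_fn here are assumptions of the example:

```python
class SignalDeduplicator:
    """Illustrative client-side sketch: skip sending a Signal whose
    deduplication id has already been sent, avoiding the Action a
    duplicate delivery would cost."""

    def __init__(self, send_fn):
        self._send = send_fn  # stand-in for the SDK's signal call
        self._seen = set()

    def signal(self, dedup_id, payload):
        if dedup_id in self._seen:
            return False  # duplicate: no call made, no Action consumed
        self._seen.add(dedup_id)
        self._send(payload)
        return True
```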
Storage optimization
Storage costs are divided into Active Storage (open Workflows) and Retained Storage (closed Workflow History during retention period). Active Storage is significantly more expensive than Retained Storage.
Active Storage
Active Storage applies to open Workflows and their Event Histories. The following sections detail optimization opportunities for Active Storage.
Continue-As-New
For long-running Workflows with extended sleep/wait periods, calling Continue-As-New before sleeping closes the current execution (moving to cheaper Retained Storage) and starts fresh when work resumes, reducing Active Storage costs. See Continue-As-New documentation to learn more.
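One way to frame the decision is a simple policy function. The thresholds below are assumptions for illustration, not Temporal defaults; some SDKs expose a similar built-in hint on workflow info:

```python
def should_continue_as_new(history_events, sleep_seconds,
                           event_threshold=10_000,
                           long_sleep_seconds=24 * 3600):
    """Illustrative policy (thresholds are assumptions): call
    Continue-As-New before a long sleep, or once the Event History
    has grown large, so the closed execution moves from Active to
    cheaper Retained Storage."""
    return (history_events >= event_threshold
            or sleep_seconds >= long_sleep_seconds)
```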
Compression
Large payloads increase Active Storage costs. Implement a custom DataConverter with compression for moderately large payloads (100 KB-1 MB). See the Data Converter documentation to learn more.
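The compression step such a codec might apply can be sketched with zlib. The size cutoff is an assumption, and a real implementation would wrap this logic inside a DataConverter payload codec:

```python
import zlib

def compress_payload(data, min_size=1024):
    """Sketch of the compression step a custom DataConverter codec
    might apply (the 1 KB cutoff is an assumption): compress payloads
    above a size threshold, pass small ones through untouched.
    Returns (payload_bytes, was_compressed)."""
    if len(data) < min_size:
        return data, False
    return zlib.compress(data), True

def decompress_payload(data, compressed):
    """Inverse step a codec would apply when reading the payload back."""
    return zlib.decompress(data) if compressed else data
```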
Claim check pattern
For very large payloads or binary data, store data externally (S3 or GCS) and pass references through Workflows.
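A minimal sketch of the claim check pattern, with an in-memory dict standing in for external storage such as S3 or GCS:

```python
import uuid

class ClaimCheckStore:
    """Claim check sketch: a dict stands in for external blob storage.
    The Workflow sees only the small reference string, keeping Event
    History (and therefore Active Storage) small."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        ref = f"blob://{uuid.uuid4()}"  # this reference is what flows through the Workflow
        self._blobs[ref] = data
        return ref

    def get(self, ref):
        return self._blobs[ref]
```

Activities call put before returning and get before processing, so only the reference appears in Workflow inputs, outputs, and Activity results.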
Retained Storage
Retained Storage applies to closed Workflow History during the retention period. The following sections detail optimization opportunities for Retained Storage.
Retention periods
Default Namespace retention is 30 days (configurable 1-90 days). Adjust based on operational and compliance requirements.
Considerations:
- Shorter retention reduces costs but limits historical analysis
- Audit investigation patterns before shortening retention
- Ensure compliance requirements are met
See Namespace retention documentation for configuration details.
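Back-of-the-envelope math shows how retention length drives Retained Storage. No prices are assumed here, only the steady-state volume of retained history:

```python
def retained_storage_gb(closed_workflows_per_day, avg_history_mb, retention_days):
    """Illustrative steady-state estimate: closed histories accumulate
    for retention_days before aging out, so retained volume grows
    linearly with both Workflow volume and retention length. Halving
    retention roughly halves this figure."""
    return closed_workflows_per_day * retention_days * avg_history_mb / 1024
```

For example, 1,000 closed Workflows per day at 2 MB of history each, retained for 30 days, holds roughly 59 GB of Retained Storage at steady state.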
Workflow export
Temporal Cloud supports exporting Workflow Histories to external storage for compliance while maintaining shorter retention periods. Note that Workflow export costs 1 Action per export.
See Workflow History export documentation for more details. Alternatively, if you are looking to do analysis on closed Workflow Executions, review this blog post to learn how to gain insights from exported Workflow Histories.
Validation
Validation approach
- Test in non-production: Validate functional correctness before production deployment
- Monitor comprehensively: Leverage the Usage dashboard in the Cloud UI to track the impact on Actions and Storage after optimizations are made
- Progressive rollout: Deploy to small percentage, validate, then expand. Review Worker Versioning documentation to learn about rolling out changes to Workflows
- Continuous review: Re-evaluate optimization effectiveness quarterly as system evolves
Success criteria
- Cost reduced without increasing mean time to repair (MTTR)
- Workflow success rates maintained or improved
- Any reduction in observability does not increase mean time to detect (MTTD) for incidents
Tools
- Temporal Cloud Usage dashboard for Actions and Storage metrics
- Workflow History for per-Workflow billable Action estimates
- Export metrics to observability platforms (Datadog, Grafana, etc.) for custom monitoring
When to get help
Engage the Temporal team for Workflow audits when experiencing:
- Complex Workflow patterns with unclear optimization paths
- Compliance requirements limiting optimization options
- Need for custom DataConverters or advanced patterns
- Desire for expert validation of optimization strategies
Contact your Temporal Account Representative or Temporal support to discuss optimization services.