Securing Pulumi secrets with AWS KMS and HashiCorp Vault
Production infrastructure demands cryptographic control over state files. Pulumi’s default service-managed encryption lacks audit trails and cross-account portability. Migrating to AWS KMS or HashiCorp Vault enforces compliance boundaries. This guide details atomic provider swaps, strict Python 3.9+ typing patterns, and state recovery workflows.
Environment Isolation & Python 3.9+ Baseline
Virtual Environment & Dependency Pinning
Infrastructure code requires deterministic dependency resolution. Floating versions introduce silent breaking changes during provider upgrades. Pin pulumi, boto3, and hvac in pyproject.toml or requirements.txt. Isolate each stack in a dedicated virtual environment.
CLI: Initialize and activate a clean Python environment.
python3.9 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
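Pinning can be checked mechanically before the environment is built. The sketch below is one possible CI guard, not part of Pulumi: the helper name find_unpinned is hypothetical, and it treats only exact == specifiers as pinned, so lines with environment markers or hashes are reported and need manual review.

```python
import re
from typing import List

# A requirement counts as pinned only when it uses an exact "==" specifier.
# Assumption: plain "name==version" lines; markers and hashes are not handled.
_PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+==[^=\s]+$")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that do not use an exact == pin."""
    unpinned: List[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Feeding the contents of requirements.txt to find_unpinned and failing the build on a non-empty result keeps floating versions of pulumi, boto3, and hvac out of the stack.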
Strict Type Checking with mypy
Dynamic typing obscures configuration resolution errors until deployment. Enforce mypy --strict in CI pipelines. Annotate all configuration loaders and resource constructors. Catch None propagation before the Pulumi engine evaluates the dependency graph.
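A minimal sketch of an annotated loader follows. ConfigLike, NetworkSettings, load_network_settings, the vpc_cidr key, and its default are hypothetical names chosen for illustration; the protocol is assumed to match the require and get methods of pulumi.Config, which lets the loader be type-checked and tested without the Pulumi engine.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class ConfigLike(Protocol):
    """Structural subset of a config object (assumed to match pulumi.Config)."""

    def require(self, key: str) -> str: ...

    def get(self, key: str) -> Optional[str]: ...


@dataclass(frozen=True)
class NetworkSettings:
    region: str
    vpc_cidr: str


def load_network_settings(config: ConfigLike) -> NetworkSettings:
    """Resolve configuration into a fully typed object with no None fields."""
    region = config.require("deployment_region")
    cidr: Optional[str] = config.get("vpc_cidr")
    if cidr is None:
        # Narrow Optional[str] to str explicitly. Passing the unchecked
        # Optional to NetworkSettings would be rejected by mypy --strict.
        cidr = "10.0.0.0/16"  # hypothetical default
    return NetworkSettings(region=region, vpc_cidr=cidr)
```

Because every field of NetworkSettings is a plain str, a forgotten None check surfaces as a mypy error in CI rather than as a failure during pulumi up.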
IAM & Vault Auth Pre-Flight Checks
Authentication deadlocks halt stack operations mid-execution. Validate AWS IAM kms:Decrypt and kms:Encrypt permissions before initializing the provider. For Vault, verify AppRole or TLS certificate validity. Run a dry-run credential fetch to confirm network routing and policy attachment.
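One way to run such a dry run for KMS is an encrypt and decrypt round trip on a throwaway value, which exercises both permissions at once. This is a sketch under assumptions: kms_round_trip_ok is a hypothetical helper, and the client is expected to behave like a boto3 KMS client (encrypt and decrypt with KeyId, Plaintext, and CiphertextBlob).

```python
from typing import Any


def kms_round_trip_ok(kms_client: Any, key_id: str) -> bool:
    """Encrypt then decrypt a probe value to exercise both KMS permissions."""
    probe = b"pulumi-preflight"
    try:
        encrypted = kms_client.encrypt(KeyId=key_id, Plaintext=probe)
        decrypted = kms_client.decrypt(
            KeyId=key_id, CiphertextBlob=encrypted["CiphertextBlob"]
        )
    except Exception as exc:  # e.g. botocore ClientError on AccessDenied
        print(f"KMS pre-flight failed for {key_id}: {exc}")
        return False
    # The round trip only counts if the original bytes come back.
    return bool(decrypted["Plaintext"] == probe)
```

With boto3 installed, a call such as kms_round_trip_ok(boto3.client("kms", region_name="us-east-1"), "alias/pulumi-secrets-key") can gate the pipeline before any stack command runs.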
Migrating to AWS KMS Secrets Provider
CLI Provider Swap Command
State migration must remain atomic. The change-secrets-provider subcommand re-encrypts ciphertext values without altering resource URNs. Target a specific KMS alias and AWS SDK version to avoid legacy API deprecation.
CLI: Execute the atomic migration sequence.
pulumi stack change-secrets-provider "awskms://alias/pulumi-secrets-key?region=us-east-1&awssdk=v2"
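When the swap runs from a pipeline, building the provider URL in code avoids quoting mistakes around the & character. The helpers below are a hypothetical wrapper around the same command shown above; they assume the pulumi CLI is on PATH and the target stack is already selected.

```python
import subprocess
from typing import List
from urllib.parse import urlencode


def build_awskms_url(alias: str, region: str) -> str:
    """Compose the awskms:// URL used by the command above."""
    query = urlencode({"region": region, "awssdk": "v2"})
    return f"awskms://alias/{alias}?{query}"


def change_provider_command(provider_url: str) -> List[str]:
    """Argument list for the provider swap; no shell quoting is involved."""
    return ["pulumi", "stack", "change-secrets-provider", provider_url]


def run_provider_swap(alias: str, region: str) -> None:
    """Run the swap against the currently selected stack; raises on failure."""
    url = build_awskms_url(alias, region)
    subprocess.run(change_provider_command(url), check=True)
```

Passing the arguments as a list means the URL reaches the CLI unmodified, so the region and awssdk parameters are not split by the shell.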
Typed Secret Retrieval in Python
Raw string interpolation bypasses Pulumi’s secret masking engine. Wrap sensitive values in pulumi.Output types immediately. Use explicit return annotations to prevent accidental serialization during resource graph compilation.
from typing import Dict, Optional

import pulumi
from pulumi import Output


def get_db_credentials(config: pulumi.Config) -> Dict[str, Output[str]]:
    """Retrieve typed database credentials with explicit secret wrapping."""
    username: str = config.require("db_username")
    # get_secret returns None when the key is unset, so check before use.
    password: Optional[Output[str]] = config.get_secret("db_password")
    if password is None:
        raise ValueError("Missing required secret: db_password")
    # Lift the plain username into an Output so both values share one type.
    return {"user": Output.from_input(username), "pass": password}
IAM Policy Scoping & Least Privilege
Broad KMS permissions violate zero-trust architectures. Scope policies to specific key aliases and restrict kms:GenerateDataKey to the Pulumi CLI execution role. Cross-account decryption requires explicit grant propagation. Consult the AWS Provider Deep Dive for granular IAM policy templates and alias routing strategies.
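A scoped policy can be generated rather than hand-edited. The sketch below is illustrative only: build_kms_policy and the Sid are hypothetical, the key ARN must be supplied by the caller, and the action list simply mirrors the permissions named in this guide. It restricts the statement to one key ARN and additionally requires that the key carries the expected alias through the kms:ResourceAliases condition key.

```python
import json
from typing import Any, Dict


def build_kms_policy(key_arn: str, alias: str) -> Dict[str, Any]:
    """Build an IAM policy document limited to one KMS key and alias."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PulumiSecretsProvider",  # hypothetical statement id
                "Effect": "Allow",
                "Action": [
                    "kms:Encrypt",
                    "kms:Decrypt",
                    "kms:GenerateDataKey",
                    "kms:DescribeKey",
                ],
                # Scope to a single key rather than "*".
                "Resource": key_arn,
                "Condition": {
                    "ForAnyValue:StringEquals": {
                        "kms:ResourceAliases": f"alias/{alias}"
                    }
                },
            }
        ],
    }


def render_policy(key_arn: str, alias: str) -> str:
    """Serialize the policy for attachment to the Pulumi execution role."""
    return json.dumps(build_kms_policy(key_arn, alias), indent=2)
```

Generating the document from the same alias value used in the awskms:// URL keeps the policy and the provider configuration from drifting apart.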
Integrating HashiCorp Vault Secrets Provider
Vault Transit Engine Configuration
The transit backend provides encryption-as-a-service without persistent secret storage. Enable the transit secrets engine and generate a dedicated keyring. Configure key rotation policies to align with organizational compliance windows.
CLI: Provision the transit path and key.
vault secrets enable transit
vault write -f transit/keys/pulumi-stack type=aes256-gcm96
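Once the key exists, a round trip through the transit engine confirms that the path, key name, and policy line up. This is a sketch, assuming a client shaped like an authenticated hvac.Client (secrets.transit.encrypt_data and decrypt_data); transit_round_trip_ok is a hypothetical helper name.

```python
import base64
from typing import Any


def transit_round_trip_ok(vault_client: Any, key_name: str) -> bool:
    """Encrypt and decrypt a probe value through the transit key."""
    probe = b"pulumi-preflight"
    # Transit expects base64-encoded plaintext and returns base64 on decrypt.
    encoded = base64.b64encode(probe).decode("ascii")
    encrypted = vault_client.secrets.transit.encrypt_data(
        name=key_name, plaintext=encoded
    )
    ciphertext = encrypted["data"]["ciphertext"]
    decrypted = vault_client.secrets.transit.decrypt_data(
        name=key_name, ciphertext=ciphertext
    )
    return base64.b64decode(decrypted["data"]["plaintext"]) == probe
```

Running the check with the key name pulumi-stack from the commands above exercises both the encrypt and decrypt endpoints that the secrets provider depends on.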
Token & Auth Method Mapping
Pulumi requires persistent authentication during stack operations. Map AppRole, TLS, or Kubernetes service accounts to the transit path. Align token TTLs with maximum deployment durations. Short-lived tokens trigger mid-apply 403 Forbidden failures.
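The TTL alignment can be enforced before a deployment starts. The sketch below assumes the response shape of a Vault token self-lookup, where the remaining lifetime in seconds sits under data.ttl; token_outlives_deployment and the duration budget are hypothetical.

```python
from typing import Any, Mapping


def token_outlives_deployment(
    lookup: Mapping[str, Any], max_deploy_seconds: int
) -> bool:
    """Return True when the token should survive the longest expected apply."""
    ttl = int(lookup["data"]["ttl"])
    # Vault reports ttl == 0 for tokens that never expire (for example root).
    if ttl == 0:
        return True
    return ttl > max_deploy_seconds
```

With hvac, the lookup mapping would typically come from client.auth.token.lookup_self(); a False result is a signal to renew or re-authenticate before running pulumi up.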
Python Fallback Typing Patterns
Dynamic secret resolution often requires conditional fallbacks. Use typing.Optional and typing.Dict[str, Any] for heterogeneous configuration maps. Validate secret presence before passing values to resource constructors.
from typing import Any, Dict, Optional

import pulumi
from pulumi import Output


def resolve_vault_secrets(config: pulumi.Config) -> Dict[str, Any]:
    """Dynamically resolve Vault-backed secrets with safe fallback typing."""
    # get_secret returns a secret-wrapped Output, or None when the key is unset.
    api_key: Optional[Output[str]] = config.get_secret("vault_api_key")
    region: str = config.require("deployment_region")
    resolved: Dict[str, Any] = {
        "api_key": api_key,
        "region": region,
        "fallback_enabled": api_key is not None,
    }
    return resolved
State Safety, Drift Detection & Safe Rollback
Pre-Migration State Snapshots
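A provider transition re-encrypts every secret in state, so a failed swap can leave the stack unable to decrypt its own values. Export the current state before executing any migration command. The export keeps secret values encrypted by default; do not pass --show-secrets for snapshots that leave the workstation. Store snapshots in versioned artifact storage and maintain immutable backups for compliance audits.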
CLI: Export stack state to a local artifact.
pulumi stack export --file state-pre-migration.json
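An immutable backup is easier to audit when a checksum travels with it. The following sketch writes a SHA-256 sidecar file next to an exported snapshot; the helper names and the .sha256 suffix are hypothetical conventions, not Pulumi features.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large state files stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(snapshot: Path) -> Path:
    """Write '<hash>  <filename>' beside the snapshot and return its path."""
    sidecar = snapshot.with_name(snapshot.name + ".sha256")
    sidecar.write_text(f"{sha256_of(snapshot)}  {snapshot.name}\n")
    return sidecar
```

Calling write_checksum(Path("state-pre-migration.json")) after the export lets a later rollback confirm that the snapshot was not altered in storage.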
Drift Detection via pulumi refresh
Post-migration state verification prevents silent configuration divergence. Run pulumi refresh to reconcile the local state file with live infrastructure. Review diff outputs for unexpected resource replacements or property resets.
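In CI the review step can be reduced to an exit code. The sketch below assumes the pulumi CLI is on PATH and relies on its --expect-no-changes flag, which makes the command fail when the refresh finds differences; refresh_is_clean and the injectable runner are hypothetical conveniences for testing. Note that a refresh run with --yes updates the state file to match live infrastructure.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def refresh_is_clean(stack: str, runner: Runner = _run) -> bool:
    """Return True when the refresh reports no drift for the given stack."""
    cmd = [
        "pulumi", "refresh",
        "--stack", stack,
        "--yes",                # non-interactive; state is updated in place
        "--expect-no-changes",  # non-zero exit code if anything changed
    ]
    return runner(cmd) == 0
```

A pipeline step can call refresh_is_clean with the migrated stack name and stop the deployment when it returns False, leaving the diff output in the job log for review.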
Atomic State Import & Rollback
Decryption failures require immediate state restoration. Import the pre-migration snapshot to revert cryptographic bindings, and restore the pre-migration Pulumi.<stack>.yaml from version control as well, since the provider swap also rewrites the encrypted values stored there. Force overwrite the corrupted state file to unblock subsequent deployments. Reference Pulumi Patterns & Provider Management for automated stack lifecycle governance and versioned state recovery pipelines.
CLI: Execute forced state rollback on failure.
pulumi stack import --file state-pre-migration.json --force
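The export, swap, and rollback steps can be chained so that a failed swap restores the snapshot automatically. This is a sketch of one possible sequence, not an official workflow: migrate_with_rollback and the injectable runner are hypothetical, the commands are the ones used in this guide plus pulumi preview --expect-no-changes as a verification step, and the currently selected stack is assumed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Default runner: raise CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)


def migrate_with_rollback(
    provider_url: str, snapshot: str, runner: Runner = _run
) -> bool:
    """Swap the secrets provider; re-import the snapshot if any step fails."""
    # Snapshot first. A failure here aborts before anything is changed.
    runner(["pulumi", "stack", "export", "--file", snapshot])
    try:
        runner(["pulumi", "stack", "change-secrets-provider", provider_url])
        # Verify the stack still decrypts and plans cleanly.
        runner(["pulumi", "preview", "--expect-no-changes"])
    except subprocess.CalledProcessError:
        runner(["pulumi", "stack", "import", "--file", snapshot, "--force"])
        return False
    return True
```

A False return means the state was rolled back; the stack configuration file is not restored by this sketch and would still need to be reverted from version control.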
Testing Boundaries & Secret Masking Validation
pytest Isolation for IaC
Unit tests must never invoke live cloud providers. Isolate configuration parsing from resource provisioning logic. Mock the Pulumi runtime engine to simulate stack evaluation without network calls.
Mocking KMS/Vault Responses
Patch pulumi.runtime.invoke and pulumi.config.Config using unittest.mock. Return deterministic ciphertext payloads during test execution. Validate type coercion and error handling paths without exposing real credentials.
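A minimal sketch of the configuration side of this boundary follows. It uses only unittest.mock, with a stand-in object in place of pulumi.Config; resolve_settings, make_fake_config, and the test function are hypothetical names, and the config keys mirror the ones used earlier in this guide.

```python
from typing import Any, Dict
from unittest import mock


def resolve_settings(config: Any) -> Dict[str, Any]:
    """Hypothetical parser under test; uses the same config calls as above."""
    api_key = config.get_secret("vault_api_key")
    if api_key is None:
        raise ValueError("Missing required secret: vault_api_key")
    return {"api_key": api_key, "region": config.require("deployment_region")}


def make_fake_config(values: Dict[str, Any]) -> mock.Mock:
    """Build a stand-in exposing only the two methods the parser needs."""
    fake = mock.Mock(spec=["require", "get_secret"])
    fake.require.side_effect = lambda key: values[key]
    fake.get_secret.side_effect = lambda key: values.get(key)
    return fake


def test_missing_secret_raises() -> None:
    """The error path is exercised without credentials or network access."""
    fake = make_fake_config({"deployment_region": "us-east-1"})
    try:
        resolve_settings(fake)
    except ValueError as exc:
        assert "vault_api_key" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Restricting the mock with spec means a typo such as config.get_secrt fails the test with AttributeError instead of silently returning another mock.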
CLI Output Redaction Verification
Secret masking relies on Pulumi’s internal serialization layer. Verify that pulumi preview and pulumi up outputs display [secret] placeholders. Also assert that formatting an Output directly into a string, for example with an f-string, never yields the plaintext value; build derived strings inside .apply() so the result remains an Output that is still marked as secret.
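The placeholder check can be automated against captured CLI output. The sketch below is a hypothetical helper pair: it assumes the output of pulumi preview or pulumi up has been captured as text and that the test knows the dummy secret values it configured.

```python
from typing import Iterable, List


def leaked_secrets(cli_output: str, known_secrets: Iterable[str]) -> List[str]:
    """Return every known secret value that appears verbatim in the output."""
    return [value for value in known_secrets if value and value in cli_output]


def assert_masked(cli_output: str, known_secrets: Iterable[str]) -> None:
    """Fail when a secret leaks or when no [secret] placeholder is present."""
    leaks = leaked_secrets(cli_output, known_secrets)
    assert not leaks, f"{len(leaks)} secret value(s) found in CLI output"
    assert "[secret]" in cli_output, "expected at least one [secret] placeholder"
```

The failure message reports only the number of leaked values so the assertion itself does not echo a secret into the test log.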
Common Mistakes & Remediation
| Mistake | Remediation | Impact |
|---|---|---|
| Using pulumi config set without --secret during migration | Always append --secret or enforce config.require_secret() in code. Verify ciphertext format in Pulumi.<stack>.yaml. | Plaintext secrets committed to VCS, triggering compliance violations and audit failures. |
| Skipping IAM policy scoping or Vault transit path validation | Apply least-privilege kms:Decrypt/kms:Encrypt or Vault transit/encrypt/* and transit/decrypt/* policies. Validate with aws kms describe-key or vault read transit/keys/pulumi-stack. | pulumi up fails with opaque AccessDenied or 403 Forbidden errors. |
| Ignoring Python type hints for pulumi.Output secrets | Wrap secrets in pulumi.Output types. Avoid synchronous string operations on Output objects. Use .apply() for async transformations. | Runtime TypeError during dependency resolution and failed resource graph compilation. |
Frequently Asked Questions
Can I migrate Pulumi secrets to KMS/Vault without recreating resources?
Yes. pulumi stack change-secrets-provider only re-encrypts state values. Resource IDs and URNs remain intact. After the swap, run pulumi preview and confirm that it reports no resource changes before deploying anything else.
How does drift detection handle rotated KMS keys or Vault tokens?
Pulumi does not auto-detect key rotation. Implement CI/CD checks with pulumi refresh and monitor AWS CloudTrail or Vault audit logs for AccessDenied events during stack operations. Align token lifecycles with deployment windows.
What is the testing boundary for mocking KMS/Vault in Python IaC?
Use unittest.mock to patch pulumi.runtime.invoke and pulumi.config.Config. Never call the real secrets provider from unit tests. Test configuration resolution, type safety, and error propagation in strict isolation.