Operations model

Deployment Shape

Recommended internet-facing shape:

client -> proxy/load balancer/TLS -> Nacelle service

The proxy should own TLS, coarse connection filtering, and external idle timeouts. Nacelle owns application limits, protocol handling, body limits, and graceful shutdown.

Startup

Use explicit limits and print the effective config for stress or benchmark services. For production services, record:

  • process version and git SHA
  • configured limits
  • listener addresses
  • feature flags
  • allocator settings

Shutdown

Wire OS signals to NacelleHost::shutdown_and_wait_timeout. Pick a drain deadline that matches service semantics. Short deadlines protect deploy velocity but can abort in-flight work.

Expected shutdown telemetry:

  • shutdown requested
  • listener stopped accepting
  • drain started
  • drain completed or timed out
  • active connections aborted

Metrics To Watch

  • nacelle.connections.active
  • nacelle.requests.active
  • nacelle.streaming_tasks.active
  • nacelle.memory.used_bytes
  • nacelle.connections.accepted
  • nacelle.connections.closed
  • nacelle.connections.in_flight
  • nacelle.requests.started
  • nacelle.requests.completed
  • nacelle.rejections
  • nacelle.timeouts
  • nacelle.requests.failed
  • nacelle.request.bytes
  • nacelle.response.bytes

Alerts should focus on sustained saturation, rising rejections, timeout spikes, and memory approaching the configured budget.

Benchmarking

The default OpenTelemetry profile keeps lifecycle metrics on. Request metrics are grouped under NacelleTelemetryConfig::request_metrics: started, completed, and byte_counts are on by default, while in_flight, duration_ms, and phase histograms are opt-in. The stress server's default OTel build prints a compact console snapshot every 5 seconds.

Request duration metrics remain opt-in through NacelleTelemetryConfig. With the default config, core/HTTP request paths avoid request timer work unless HTTP access logging is enabled.

Canonical OpenTelemetry metric names are resource-first. Instrument type is documented here rather than embedded in the metric name:

MetricTypeNotes
nacelle.connections.activeGaugeCurrent runtime active connections.
nacelle.requests.activeGaugeCurrent runtime active requests.
nacelle.streaming_tasks.activeGaugeCurrent runtime streaming body tasks.
nacelle.memory.used_bytesGaugeCurrent bytes allocated by runtime memory accounting.
nacelle.connections.acceptedCounterAccepted connections, labeled by listener/transport/TLS where available.
nacelle.connections.closedCounterClosed connections, labeled with close reason where available.
nacelle.connections.in_flightUpDownCounterPer-listener connection delta for transport-level detail.
nacelle.requests.startedCounterRequests started.
nacelle.requests.completedCounterRequests completed, labeled by status where available.
nacelle.requests.failedCounterRequests failed before normal completion.
nacelle.request.bytesCounterRequest bytes accounted by the transport/protocol path.
nacelle.response.bytesCounterResponse bytes accounted by the transport/protocol path.
nacelle.request.duration_msHistogramRequest duration, opt-in.
nacelle.phase.duration_msHistogramInternal phase duration, opt-in.

Run microbenchmarks before and after hot-path changes:

cargo bench -p nacelle --features bench,reference_protocol