You've Built a GenServer. Now Make It Fast, Observable and Bulletproof.

Last updated: June 3, 2026

Elixir

Ihor Katkov

Software Engineer

Sofiia Yurkevska

Content Writer

Sometimes you ship a shiny new GenServer to production with high hopes, and for good reason: it passed unit tests, it handles happy-path demo traffic, and there's so much work in it already that it just can't be that bad, right? Right? And then users – fellow humans – come in: latency spikes, CPU utilization climbs, and suddenly the BEAM scheduler view in `:observer` resembles a Christmas tree.

TL;DR

GenServer is powerful for state management, concurrency control, background processing, and resource management - but many problems don't need it and can use simple structs/functions instead

GenServer overhead includes process startup/management, inter-process communication, supervision/recovery, memory state maintenance, and lifecycle callback implementation

Common design flaws: business logic overload (mixing business rules with process management), treating GenServers like OOP objects, and ignoring Single Responsibility Principle (SRP)

Proper design: keep GenServers thin as coordinators, extract complex business logic to external modules like OrderService, focus each GenServer on one specific task

Testing strategies: isolated callback testing (call handle_call/handle_cast directly for simple state transitions) and live GenServer testing (run actual process for complex interactions like timeouts/retries)

Use explicit contracts and adapters for external services - create behavior modules with Stub implementations for tests and real implementations for production

Swap implementations either via compile-time config (Application.compile_env) or runtime options (pass mailer as GenServer option)

Bottom line: if testing feels difficult, your GenServer is doing too much - let that guide better design decisions

We've been there and learned that building a GenServer is the easy part; making it fast, observable, and bulletproof is where the real work starts.

In the previous article, we walked through a TDD approach to GenServers. This follow-up is the field manual I wish I had when I first pushed one of those servers to production. We'll build a mental model of how GenServers consume CPU cycles, then apply a toolbox of performance and observability techniques that you can drop into your code today.

By the end, you'll know how to:

Read the BEAM's "cost model" – mailbox size (message queue length), reductions, scheduler migrations – so you can spot trouble early.

Refactor hot paths so callbacks never block schedulers.

Push read-heavy state to ETS / `persistent_term` without losing consistency.

Add cheap, composable Telemetry so dashboards light up before pagers do.

Choose when to graduate from a single GenServer to GenStage, Broadway, or full-blown distributed sharding.

Let's dive in.

The GenServer Cost Mental Model

A GenServer is just a process with a mailbox, but the devil is in the scheduler details. The BEAM VM runs N schedulers – one per logical CPU core by default – and each scheduler works through its own run queue of processes. Key things to watch:

Mailbox size

`Process.info(pid, :message_queue_len)` tells you how many messages are waiting. A consistently growing queue is a red flag: every new message waits behind the backlog, so replies arrive late and end-to-end latency grows for every caller that depends on this server.

Reductions

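Function calls and BIFs cost reductions, and a process is preempted once it uses up its reduction budget (a few thousand per time slice). A long-running callback therefore doesn't freeze the VM, but it burns scheduler time and keeps every message queued behind it in the same mailbox waiting.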

Scheduler migrations

When run queues become unbalanced, for example because a few busy processes keep one scheduler saturated, the VM's load balancing may migrate processes to a different scheduler. After a migration the process's data is no longer in that core's L1/L2 cache, so the resulting cache misses add latency.

Sync vs. async

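`GenServer.call/3` blocks the caller until the server replies or the timeout (5 seconds by default) expires; `cast/2` returns immediately. Calls are convenient and give you natural back-pressure, but they couple the caller's latency and lifecycle to the server's.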

Tools to keep under the belt:

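:observer.start()                          # GUI: scheduler load, process table, mailboxes
:recon.proc_count(:message_queue_len, 10)  # top 10 processes by mailbox size (needs the recon library)
# A library like telemetry_metrics_statsd to consume telemetry events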

Spend five minutes watching these metrics during load and your optimisation story usually writes itself.
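
When `:observer` isn't an option (a remote shell, a health check), the same numbers can be read with `Process.info/2`. Below is a minimal sketch; the `MailboxProbe` module and its function names are hypothetical helpers, not part of any library.

```elixir
defmodule MailboxProbe do
  @moduledoc "Hypothetical helper: point-in-time cost snapshot of a process."

  # Returns a map such as %{message_queue_len: 3, reductions: ..., memory: ...},
  # or nil when the process is no longer alive.
  def snapshot(pid) when is_pid(pid) do
    case Process.info(pid, [:message_queue_len, :reductions, :memory]) do
      nil -> nil
      info -> Map.new(info)
    end
  end

  # True when the mailbox is longer than the given threshold.
  def overloaded?(pid, max_queue_len) do
    case snapshot(pid) do
      %{message_queue_len: len} -> len > max_queue_len
      nil -> false
    end
  end
end

# Usage: a process that never reads its mailbox accumulates messages.
stuck =
  spawn(fn ->
    receive do
      :never_sent -> :ok
    end
  end)

for n <- 1..3, do: send(stuck, {:job, n})
IO.inspect(MailboxProbe.snapshot(stuck), label: "stuck process")
```

Sampling `snapshot/1` a few seconds apart shows whether the queue is draining or growing, which is usually more telling than a single reading.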

Performance & Throughput Techniques

Keep Callbacks Non-Blocking

If a callback waits on disk, network, or heavy CPU work, your entire GenServer stalls. The key is to move blocking work out of the GenServer's main loop. The `Task` module provides several patterns for this.

For "fire-and-forget" work where the caller doesn't need a result, `Task.start/1` offloads the work into a new process that is not linked to the GenServer (use `Task.start_link/1` when you do want crashes to propagate). The GenServer can immediately process the next message.

def handle_cast({:track_event, event}, state) do
  # Task.start/1 does not link: if the task crashes, the GenServer keeps
  # running, and the only trace of the failure is the crash log.
  Task.start(fn -> Analytics.track(event) end)
  {:noreply, state}
end

When a result is needed but you can't block the GenServer, a common pattern is to hand the caller's `from` reference to a task, return `{:noreply, state}`, and let the task answer with `GenServer.reply/2`. The client still makes an ordinary `GenServer.call/3`, while the GenServer is free to handle other messages. (Returning a `Task.async/1` struct to the caller does not work: only the process that created a task may `Task.await/2` it.)

# In the GenServer
def handle_call({:compute, input}, from, state) do
  # The task replies on the server's behalf; if it crashes, the call times out.
  Task.start(fn -> GenServer.reply(from, {:ok, heavy_math(input)}) end)
  {:noreply, state}
end

# In a client module
def compute(server, input) do
  GenServer.call(server, {:compute, input}, 30_000) # Always use a timeout!
end

When background jobs should be supervised and isolated from your GenServer, start them under a `Task.Supervisor` so they run as independent processes that still shut down cleanly with the application.
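
As an illustration, here is a sketch of a GenServer that hands side-effect work to a `Task.Supervisor`. The `EventTracker` module, its options (`:task_sup`, `:sink`), and the `count/1` helper are assumptions made for this example; only `Task.Supervisor.start_child/2` is the real API being demonstrated.

```elixir
defmodule EventTracker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  def track(server, event), do: GenServer.cast(server, {:track_event, event})
  def count(server), do: GenServer.call(server, :count)

  @impl true
  def init(opts) do
    state = %{
      task_sup: Keyword.fetch!(opts, :task_sup),
      # Hypothetical one-arity function that performs the slow side effect.
      sink: Keyword.fetch!(opts, :sink),
      count: 0
    }

    {:ok, state}
  end

  @impl true
  def handle_cast({:track_event, event}, state) do
    # Bind the sink first so the closure does not copy the whole state.
    sink = state.sink

    # The task runs under the supervisor, not linked to this GenServer:
    # a crash in the sink is logged but leaves the tracker running.
    {:ok, _pid} = Task.Supervisor.start_child(state.task_sup, fn -> sink.(event) end)
    {:noreply, %{state | count: state.count + 1}}
  end

  @impl true
  def handle_call(:count, _from, state), do: {:reply, state.count, state}
end
```

In an application, the `Task.Supervisor` would normally be a named child in your supervision tree, started before the GenServer that uses it.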

Freshcode Tip

The goal is to keep your `handle_call` and `handle_cast` callbacks consistently fast (a good budget is <1ms). When profiling reveals a slow callback, delegate the work using one of these patterns.

Post-Init Heavy Work with `handle_continue`

Boot time matters when your GenServer sits inside a supervision tree - a slow `init/1` delays the whole app. Load large datasets after the process is up:

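def init(_opts) do
  # Start with an empty :cache key so the map-update syntax below cannot
  # raise a KeyError.
  {:ok, %{cache: nil}, {:continue, :warm_cache}}
end

def handle_continue(:warm_cache, state) do
  # Runs right after init/1 returns, before any other message is handled.
  cache = load_big_table()
  {:noreply, %{state | cache: cache}}
end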

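`init/1` returns immediately, so the supervisor moves on to the next child and the tree comes online without waiting. The heavy work then runs inside the process before it handles any other message, so calls that arrive during warm-up simply wait in the mailbox.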
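
To see that ordering in isolation, here is a self-contained sketch. `WarmCache` and its injected `loader` function are hypothetical stand-ins for your own module and for `load_big_table/0`.

```elixir
defmodule WarmCache do
  use GenServer

  # `loader` is a zero-arity function returning a map; it stands in for the
  # expensive load step.
  def start_link(loader) when is_function(loader, 0) do
    GenServer.start_link(__MODULE__, loader)
  end

  def fetch(server, key), do: GenServer.call(server, {:fetch, key})

  @impl true
  def init(loader) do
    # Cheap: no loading here, just schedule the continuation.
    {:ok, %{cache: %{}, loader: loader}, {:continue, :warm_cache}}
  end

  @impl true
  def handle_continue(:warm_cache, state) do
    {:noreply, %{state | cache: state.loader.()}}
  end

  @impl true
  def handle_call({:fetch, key}, _from, state) do
    {:reply, Map.fetch(state.cache, key), state}
  end
end
```

Because the continuation runs before the first `fetch/2` is handled, a caller never observes the empty cache; the trade-off is that early callers wait for the load, so keep call timeouts in mind.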

Externalize Read-Heavy State (ETS / `persistent_term`)

A GenServer's state is its bottleneck; every read is a serialized request. For highly contended data, moving state to `:ets` or `:persistent_term` can unlock massive read concurrency. But this power comes with sharp trade-offs: ETS tables, especially with `read_concurrency: true`, offer fast, parallel reads that come at a cost of:

Write Serialization

With the default `:protected` access, only the owning process can write, so all writes are still serialized through it. For a `:public` table written by many processes, consider `write_concurrency: true` or `write_concurrency: :auto` (OTP 25+): different objects of the same table can then be modified and read concurrently by different processes. This capability comes at the cost of higher memory usage and can reduce efficiency for sequential operations and concurrent reads.

Consistency

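Operations on a single object are atomic and isolated, so a reader never sees a half-written record. What ETS does not give you is a transaction across several keys or several calls: a reader can observe the table between two related writes, or act on a value that another process is about to replace.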

Ownership

The table's lifecycle is tied to the owner process. If it dies, the table vanishes, unless you configured an `heir` or handed the table over with `:ets.give_away/3`.
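
One common way to balance these trade-offs is to let a GenServer own the table and serialize writes while readers hit ETS directly. The sketch below assumes a hypothetical `PriceCache` module and table name; the ETS options are the standard ones discussed above.

```elixir
defmodule PriceCache do
  use GenServer

  @table :price_cache_demo

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Reads run in the caller's process: no message to the GenServer at all.
  def get(sku) do
    case :ets.lookup(@table, sku) do
      [{^sku, price}] -> {:ok, price}
      [] -> :error
    end
  end

  # Writes go through the owner, so they stay serialized.
  def put(sku, price), do: GenServer.call(__MODULE__, {:put, sku, price})

  @impl true
  def init(:ok) do
    # :protected means any process may read, only the owner may write.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, sku, price}, _from, state) do
    :ets.insert(@table, {sku, price})
    {:reply, :ok, state}
  end
end
```

If `PriceCache` crashes, the table disappears with it and `get/1` raises until the supervisor restarts the process, so decide whether readers should rescue that error or let it propagate.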

For truly static data that is read frequently and written rarely, `:persistent_term` is a powerful alternative. Reads are virtually free: no message passing, and the term is not copied onto the reader's heap. The catch is on the write side: replacing or erasing an existing term with `:persistent_term.put/2` or `:persistent_term.erase/1` triggers a global garbage collection in which every process is scanned for references to the old term, which can cause a multi-millisecond pause across the entire BEAM (on a modern OTP (25+), it's typically sub‑millisecond for small updates). It should only be used for data that is set once at application boot or updated very rarely during a maintenance window.
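
A minimal sketch of the set-once pattern follows. The `FeatureFlags` module and its key are hypothetical; `:persistent_term.put/2` and `:persistent_term.get/2` are the real functions.

```elixir
defmodule FeatureFlags do
  # Namespacing the key with the module name avoids clashes with other users
  # of :persistent_term.
  @key {__MODULE__, :flags}

  # Intended to be called once, at application boot.
  def load(flags) when is_map(flags), do: :persistent_term.put(@key, flags)

  # Reads the whole map without copying it and looks up a single flag.
  def enabled?(flag) do
    @key
    |> :persistent_term.get(%{})
    |> Map.get(flag, false)
  end
end
```

Storing one map under one key keeps updates to a single replace operation; if flags start changing often, that is a sign the data belongs in ETS instead.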

Freshcode Tip

Use these tools surgically. Profile your application, understand the read/write ratio, and always measure the performance impact of both reads and writes before committing to this pattern.

Batching & Coalescing Patterns

Sometimes the cheapest optimisation is to do less. Accumulate writes and flush every X milliseconds:

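def init(_opts) do
  schedule_flush()
  {:ok, %{buffer: []}}
end

def handle_cast({:track, metric}, state) do
  # Prepending is O(1); the buffer is newest-first until it is flushed.
  {:noreply, %{state | buffer: [metric | state.buffer]}}
end

def handle_info(:flush, state) do
  schedule_flush()
  # Restore arrival order and skip the flush when nothing was buffered.
  if state.buffer != [], do: flush(Enum.reverse(state.buffer))
  {:noreply, %{state | buffer: []}}
end

defp schedule_flush do
  Process.send_after(self(), :flush, 1000)
end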

Used sparingly, batching can smooth traffic spikes without complex back-pressure logic.

Back-Pressure & Demand Control

If producers outpace your GenServer, queues explode. Options:

Bounded mailbox – set a max queue length and reject or drop messages once the threshold is exceeded.

Timeouts on `call/3` – force callers to handle slowness.

A bounded mailbox can be approximated by checking the queue length inside the callback:

# Threshold is an example value; tune it against your own load tests.
@max_queue_len 1_000

@impl true
def handle_call({:process, _item}, _from, state) do
  # Check the mailbox size first.
  case Process.info(self(), :message_queue_len) do
    {:message_queue_len, len} when len > @max_queue_len ->
      # "Reject" the call because the server is overloaded.
      {:reply, {:error, :overloaded}, state}

    _ ->
      # Mailbox is not full, process the request.
      # ... do actual work ...
      {:reply, :ok, %{state | processed: state.processed + 1}}
  end
end
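
The timeout option deserves a sketch too, because a timed-out `GenServer.call/3` exits the caller unless the exit is handled. `SlowServer` and `SlowClient` below are hypothetical names; the sleep simulates a slow dependency.

```elixir
defmodule SlowServer do
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  @impl true
  def handle_call(:work, _from, delay_ms) do
    # Simulated slowness; a real server would be waiting on I/O or CPU work.
    Process.sleep(delay_ms)
    {:reply, :done, delay_ms}
  end
end

defmodule SlowClient do
  # Turns a call timeout into a value the caller can branch on.
  def work(server, timeout_ms) do
    {:ok, GenServer.call(server, :work, timeout_ms)}
  catch
    :exit, {:timeout, _} -> {:error, :timeout}
  end
end
```

Note that the server keeps working on the request after the caller gave up, so timeouts protect callers but do not shed load on their own; pair them with the queue-length check above.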

Consider moving to `GenStage` or `Broadway` when:

You need standardized, pull-based back-pressure across a multi-stage data processing pipeline.

Your workload naturally fits a consumer-producer model (e.g., consuming from SQS).

You need concurrent processing of events while preserving order within a partition.

Migration can be incremental. You can embed a `GenStage` producer inside an existing `GenServer` and fan out from there.

Sharding Hot Keys

One GenServer → one mailbox. A hot key funnels all of its traffic through that single process and will eventually hit the limit. Partition with a `Registry`:

shard = :erlang.phash2(customer_id, 16) # integer in 0..15
# Note: if `customer_id` is user-controllable, this could be
# vulnerable to hash-collision attacks creating a hot shard.
{:ok, pid} = MyShardSupervisor.start_child(shard)

Or reach for libraries like `hash_ring`.
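
To make the routing concrete, here is a sketch that starts a fixed set of shards up front and looks them up through a `Registry` via-tuple. `ShardedCounter`, its registry name, and the shard count of 16 are assumptions for this example, not the article's `MyShardSupervisor`.

```elixir
defmodule ShardedCounter do
  @shards 16
  @registry ShardedCounter.Registry

  defmodule Shard do
    use GenServer

    # Each shard registers itself under its index in the given registry.
    def start_link({registry, index}) do
      GenServer.start_link(__MODULE__, %{}, name: {:via, Registry, {registry, index}})
    end

    @impl true
    def init(state), do: {:ok, state}

    @impl true
    def handle_call({:incr, key}, _from, state) do
      state = Map.update(state, key, 1, &(&1 + 1))
      {:reply, Map.fetch!(state, key), state}
    end
  end

  def start_link do
    shards =
      for index <- 0..(@shards - 1) do
        Supervisor.child_spec({Shard, {@registry, index}}, id: {:shard, index})
      end

    children = [{Registry, keys: :unique, name: @registry} | shards]

    # :rest_for_one restarts the shards if the registry itself restarts.
    Supervisor.start_link(children, strategy: :rest_for_one)
  end

  # The same key always maps to the same shard index (0..15).
  def shard_for(key), do: :erlang.phash2(key, @shards)

  def incr(key) do
    GenServer.call({:via, Registry, {@registry, shard_for(key)}}, {:incr, key})
  end
end
```

Each key still has a single owner, so per-key ordering is preserved, while unrelated keys no longer queue behind each other. Changing the shard count remaps keys, which is where a consistent-hashing library becomes useful.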

Observability & Instrumentation

You can't fix what you can't see. Many libraries in the ecosystem already emit `:telemetry` events, but your own GenServers won't unless you instrument them – so do it.

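start = System.monotonic_time()
# ... callback body ...
duration = System.monotonic_time() - start # in native time units

:telemetry.execute(
  [:my_app, :genserver, :callback, :stop],
  %{duration: duration},
  %{module: __MODULE__, callback: :handle_call}
)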

Pipe these events into a library like `PromEx` (Prometheus and Grafana) or a StatsD reporter (for example, for Datadog). Add tracing (`OpenTelemetry`) around external calls to stitch latency graphs end-to-end. Set *budgets* (SLOs) and alert on the 95th percentile, not on averages.

Conclusion

A GenServer is a beautiful abstraction, but it hides sharp edges. With a clear mental model and a small set of techniques – non-blocking callbacks, state externalisation, batching, back-pressure, and solid instrumentation – you can take that weekend prototype and run it under serious production load.

Every optimization is a trade-off. Always profile your application to identify true bottlenecks before adding complexity. Instrument first, optimise second.

Next up in this series: distributed GenServers and cluster-wide coordination – we'll tackle hand-off, global registries, and truly elastic scaling. Stay tuned. If you're looking for hands-on Elixir expertise for your next project, drop us a line.