You've Built a GenServer. Now Make It Fast, Observable and Bulletproof.
Last updated:
June 3, 2026
7 min read
Elixir
Ihor Katkov
Software Engineer

Sofiia Yurkevska
Content Writer
Sometimes you ship a shiny new GenServer to production full of hope, and for good reason: it passed its unit tests, it handles happy-path demo traffic, and there's so much work in it already that it just can't be that bad, right? Right? Then real users arrive, latency spikes, and CPU utilization climbs; suddenly, the BEAM scheduler view in `:observer` resembles a Christmas tree.
We've been there and learned that building a GenServer is the easy part; making it fast, observable, and bulletproof is where the real work starts.
In the previous article, we walked through a TDD approach to GenServers. This follow-up is the field manual I wish I had when I first pushed one of those servers to production. We'll build a mental model of how GenServers consume CPU cycles, then apply a toolbox of performance and observability techniques that you can drop into your code today.
By the end, you'll know how to:
Let's dive in.
The GenServer Cost Mental Model
A GenServer is just a process with a mailbox, but the devil is in the scheduler details. The BEAM VM runs N schedulers – one per CPU core by default – and each scheduler processes a run queue of tasks. Key things to watch:
Tools to keep under the belt:
:observer.start()
# Top 10 processes by mailbox length (requires the :recon dependency)
:recon.proc_count(:message_queue_len, 10)
# A library like telemetry_metrics_statsd to consume telemetry events
Spend five minutes watching these metrics during load and your optimisation story usually writes itself.
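When `:observer` is not available (for example, on a headless production node), a few of the same numbers can be read with plain standard-library calls. The sketch below is only an illustration: the `VmSnapshot` module and its function names are made up for this example, and walking every process with `Process.list/0` is itself not free on a node with millions of processes.

```elixir
defmodule VmSnapshot do
  @moduledoc "Hypothetical helper: reads a few scheduler and mailbox numbers from the VM."

  # Number of online schedulers and the total length of the normal run queues.
  def schedulers do
    %{
      online: System.schedulers_online(),
      run_queue: :erlang.statistics(:run_queue)
    }
  end

  # The `n` processes with the longest mailboxes, as {pid, length} tuples.
  def top_mailboxes(n \\ 5) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      case Process.info(pid, :message_queue_len) do
        {:message_queue_len, len} -> [{pid, len}]
        # The process exited between Process.list/0 and Process.info/2.
        nil -> []
      end
    end)
    |> Enum.sort_by(fn {_pid, len} -> len end, :desc)
    |> Enum.take(n)
  end
end

IO.inspect(VmSnapshot.schedulers(), label: "schedulers")
IO.inspect(VmSnapshot.top_mailboxes(3), label: "longest mailboxes")
```

A run queue that stays well above the number of online schedulers, or one mailbox that keeps growing, is usually the first hint of where to look.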
Performance & Throughput Techniques
Keep Callbacks Non-Blocking
If a callback waits on disk, network, or heavy CPU work, your entire GenServer stalls. The key is to move blocking work out of the GenServer's main loop. The `Task` module provides several patterns for this.
For "fire-and-forget" work where the caller doesn't need a result, `Task.start/1` offloads the work into a new, unlinked process. The GenServer can immediately process the next message.
def handle_cast({:track_event, event}, state) do
  # Task.start/1 does not link: a crash in the task leaves the GenServer running.
  # Use Task.start_link/1 if a failed task should take the GenServer down with it.
  Task.start(fn -> Analytics.track(event) end)
  {:noreply, state}
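end
When a result is needed but you can't block the GenServer, a common pattern is to return `{:noreply, state}` from `handle_call/3`, hand the `from` reference to a task, and let the task answer with `GenServer.reply/2`. The caller still waits inside `GenServer.call/3`, but the GenServer is free to process the next message. Returning a `Task.async/1` struct to the caller does not work, because only the process that created a task may `Task.await/2` it.
# In the GenServer
def handle_call({:compute, input}, from, state) do
  # The task replies to the caller directly once the work is done.
  # If heavy_math/1 crashes, no reply is sent and the call times out.
  Task.start(fn -> GenServer.reply(from, heavy_math(input)) end)
  {:noreply, state}
end
# In a client module
def compute(server, input) do
  # Allow up to 30 seconds instead of the default 5.
  GenServer.call(server, {:compute, input}, 30_000)
end
When background jobs need supervision and must not be tied to your GenServer's lifecycle, use a `Task.Supervisor` to run them as supervised, independent processes.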
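As a rough sketch of the `Task.Supervisor` variant: the GenServer receives the supervisor's pid at start-up and uses `Task.Supervisor.start_child/2` for each job. `EventTracker` and `deliver/1` are hypothetical names standing in for your own module and for a call like `Analytics.track/1`; in a real application both processes would sit in your supervision tree rather than being started by hand.

```elixir
defmodule EventTracker do
  use GenServer

  def start_link(task_sup), do: GenServer.start_link(__MODULE__, task_sup)
  def track(server, event), do: GenServer.cast(server, {:track_event, event})

  @impl true
  def init(task_sup), do: {:ok, %{task_sup: task_sup}}

  @impl true
  def handle_cast({:track_event, event}, state) do
    # The task runs under the Task.Supervisor and is not linked to this GenServer,
    # so a crash inside deliver/1 cannot take the GenServer down.
    {:ok, _pid} = Task.Supervisor.start_child(state.task_sup, fn -> deliver(event) end)
    {:noreply, state}
  end

  # Hypothetical stand-in for the real side effect (e.g. an HTTP call).
  defp deliver({:notify, pid, payload}), do: send(pid, {:tracked, payload})
  defp deliver(:boom), do: raise("simulated failure")
end

{:ok, sup} = Task.Supervisor.start_link()
{:ok, server} = EventTracker.start_link(sup)

# The first job crashes (and logs an error); the server keeps serving the second.
EventTracker.track(server, :boom)
EventTracker.track(server, {:notify, self(), :signup})
```

Tasks started with `start_child/2` are not restarted by default, which is usually what you want for best-effort side effects.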
Post-Init Heavy Work with `handle_continue`
Boot time matters when your GenServer sits inside a supervision tree - a slow `init/1` delays the whole app. Load large datasets after the process is up:
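def init(_opts) do
  # Start with an empty cache so the process is up immediately.
  {:ok, %{cache: nil}, {:continue, :warm_cache}}
end
def handle_continue(:warm_cache, state) do
  cache = load_big_table()
  {:noreply, %{state | cache: cache}}
end
Your supervision tree comes online right away, and the heavy work runs before the GenServer handles its first message, without blocking the supervisor.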
Externalize Read-Heavy State (ETS / `persistent_term`)
A GenServer's state is its bottleneck; every read is a serialized request. For highly contended data, moving state to `:ets` or `:persistent_term` can unlock massive read concurrency. But this power comes with sharp trade-offs: ETS tables, especially with `read_concurrency: true`, offer fast, parallel reads that come at a cost of:
For truly static data that is read frequently and written rarely, `:persistent_term` is a powerful alternative. Reads are virtually free: no message passing and no copying of the term onto the reader's heap. The catch is on the write side: replacing or erasing an existing key with `:persistent_term.put/2` or `:persistent_term.erase/1` triggers a global garbage collection pass over every process, which can cause multi-millisecond pauses across the entire BEAM (on a modern OTP (25+), it's typically sub-millisecond for small updates). Use it only for data that is set once at application boot or updated very rarely, for example during a maintenance window.
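To make the split concrete, here is a minimal sketch of a GenServer that owns an ETS table: reads bypass the process entirely, while writes still go through its mailbox. The `PriceCache` module, the table name, and the keys are assumptions made up for this example.

```elixir
defmodule PriceCache do
  use GenServer

  @table :price_cache

  def start_link, do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Reads run in the caller's process and never touch the GenServer mailbox.
  def get(sku) do
    case :ets.lookup(@table, sku) do
      [{^sku, price}] -> {:ok, price}
      [] -> :error
    end
  end

  # Writes stay serialized through the owning process.
  def put(sku, price), do: GenServer.call(__MODULE__, {:put, sku, price})

  @impl true
  def init(:ok) do
    # :protected lets any process read, but only the owner write.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, %{}}
  end

  @impl true
  def handle_call({:put, sku, price}, _from, state) do
    true = :ets.insert(@table, {sku, price})
    {:reply, :ok, state}
  end
end

{:ok, _pid} = PriceCache.start_link()
:ok = PriceCache.put("sku-1", 1999)
IO.inspect(PriceCache.get("sku-1"), label: "ETS read")

# Set-once configuration is a better fit for :persistent_term.
:persistent_term.put({PriceCache, :currency}, "USD")
IO.inspect(:persistent_term.get({PriceCache, :currency}), label: "persistent_term read")
```

The table disappears when its owner process exits, so the owner should do as little as possible besides owning and writing.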
Batching & Coalescing Patterns
Sometimes the cheapest optimisation is to do less. Accumulate writes and flush every X milliseconds:
def init(_opts) do
  schedule_flush()
  {:ok, %{buffer: []}}
end
def handle_cast({:track, metric}, state) do
  {:noreply, %{state | buffer: [metric | state.buffer]}}
end
def handle_info(:flush, state) do
  schedule_flush()
  flush(state.buffer)
  {:noreply, %{state | buffer: []}}
end
defp schedule_flush do
  Process.send_after(self(), :flush, 1000)
end
Used sparingly, batching can smooth traffic spikes without complex back-pressure logic.
Back-Pressure & Demand Control
If producers outpace your GenServer, queues explode. Options:
@impl true
def handle_call({:process, _item}, _from, state) do
  # Check the mailbox size first.
  case Process.info(self(), :message_queue_len) do
    {:message_queue_len, len} when len > @max_queue_len ->
      # "Reject" the call because the server is overloaded.
      {:reply, {:error, :overloaded}, state}
    _ ->
      # Mailbox is not full, process the request.
      # ... do actual work ...
      {:reply, :ok, %{state | processed: state.processed + 1}}
  end
end
Consider moving to `GenStage` or `Broadway` when:
Migration can be incremental. You can embed a `GenStage` producer inside an existing `GenServer` and fan out from there.
Sharding Hot Keys
One GenServer → one mailbox. Hot keys will hit the limit. Partition with a `Registry`:
key = :erlang.phash2(customer_id, 16)
# Note: if `customer_id` is user-controllable, this could be
# vulnerable to hash-collision attacks creating a hot shard.
{:ok, pid} = MyShardSupervisor.start_child(key)
Or reach for libraries like `hash_ring`.
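One possible shape for the `Registry`-based partitioning, as a sketch only: a fixed number of shard processes is registered under integer keys, and `:erlang.phash2/2` picks the shard for each customer. `ShardedCounter`, its registry name, and the counter state are illustrative assumptions; a production version would start the registry and the shards under a supervisor instead of linking them to the caller.

```elixir
defmodule ShardedCounter do
  use GenServer

  @shards 16
  @registry ShardedCounter.Registry

  # Starts the registry and one GenServer per shard (linked to the caller here).
  def start_shards do
    {:ok, _registry} = Registry.start_link(keys: :unique, name: @registry)

    for shard <- 0..(@shards - 1) do
      {:ok, _pid} = GenServer.start_link(__MODULE__, shard, name: via(shard))
    end

    :ok
  end

  # All calls for the same customer land in the same shard's mailbox.
  def increment(customer_id) do
    GenServer.call(via(shard_for(customer_id)), :increment)
  end

  def shard_for(customer_id), do: :erlang.phash2(customer_id, @shards)

  defp via(shard), do: {:via, Registry, {@registry, shard}}

  @impl true
  def init(shard), do: {:ok, %{shard: shard, count: 0}}

  @impl true
  def handle_call(:increment, _from, state) do
    count = state.count + 1
    {:reply, {state.shard, count}, %{state | count: count}}
  end
end

:ok = ShardedCounter.start_shards()
IO.inspect(ShardedCounter.increment("customer-42"), label: "{shard, count}")
```

Because the shard is derived from the key, no lookup table is needed, but changing the shard count remaps keys, which is where a consistent-hashing library earns its keep.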
Observability & Instrumentation
You can't fix what you can't see. Libraries across the Elixir ecosystem emit rich `:telemetry` events, and your own code can too. Use them.
:telemetry.execute(
  [:my_app, :genserver, :callback, :stop],
  %{duration: duration},
  %{module: __MODULE__, callback: :handle_call}
)
Pipe these events into a library like `PromEx` to expose them to Grafana or Datadog. Add tracing (`OpenTelemetry`) around external calls to stitch latency graphs end-to-end. Set *budgets* (SLOs) and alert on the 95th percentile, not averages.
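For completeness, here is a sketch of where `duration` could come from and how an event reaches a handler. It assumes the `:telemetry` package is available (pulled in with `Mix.install/1` so the script runs on its own); `TimedServer` and `TimingHandler` are hypothetical names, and the handler simply forwards events to the calling process instead of a real metrics backend.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule TimedServer do
  use GenServer

  @event [:my_app, :genserver, :callback, :stop]

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call({:echo, value}, _from, state) do
    start = System.monotonic_time()
    reply = {:ok, value}
    # Duration is reported in native time units; convert it when displaying.
    duration = System.monotonic_time() - start

    :telemetry.execute(@event, %{duration: duration}, %{
      module: __MODULE__,
      callback: :handle_call
    })

    {:reply, reply, state}
  end
end

defmodule TimingHandler do
  # Runs in the process that emitted the event; keep handlers cheap.
  def handle_event(event, measurements, metadata, listener) do
    send(listener, {:telemetry_event, event, measurements, metadata})
  end
end

:ok =
  :telemetry.attach(
    "timing-handler",
    [:my_app, :genserver, :callback, :stop],
    &TimingHandler.handle_event/4,
    self()
  )

{:ok, pid} = TimedServer.start_link()
{:ok, 42} = GenServer.call(pid, {:echo, 42})
```

`System.convert_time_unit(duration, :native, :microsecond)` turns the measurement into something a dashboard can display.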
Conclusion
A GenServer is a beautiful abstraction, but it hides sharp edges. With a clear mental model and a small set of techniques – non-blocking callbacks, state externalisation, batching, back-pressure, and solid instrumentation – you can take that weekend prototype and run it under serious production load.
Every optimisation is a trade-off. Always profile your application to identify true bottlenecks before adding complexity. Instrument first, optimise second.
Next up in this series: distributed GenServers and cluster-wide coordination – we'll tackle hand-off, global registries, and truly elastic scaling. Stay tuned. If you're looking for hands-on Elixir expertise for your next project, drop us a line.
Key Takeaways
Resources & Further Reading
Official OTP docs – `gen_server`, `:erlang.process_info/2`.
Fred Hébert – "Adopting Erlang/OTP" chapters on monitoring.
Saša Jurić – "Elixir in Action", sections on performance.
Erlang Solutions – "Designing for Scalability with Erlang/OTP".





