You've Built a GenServer. Now Make It Fast, Observable and Bulletproof.

Last updated: June 3, 2026

Elixir

By Ihor Katkov (Software Engineer) and Sofiia Yurkevska (Content Writer)

Sometimes you ship a shiny new GenServer to production with the best of hopes, and for good reason: it passed its unit tests, it handles happy-path demo traffic, and there's so much work in it already that it just can't be that bad, right? Right? And then users – fellow humans – come in, bringing latency spikes. CPU utilization climbs, and suddenly the BEAM scheduler view in `:observer` resembles a Christmas tree.

TL;DR
  • GenServer is powerful for state management, concurrency control, background processing, and resource management - but many problems don't need it and can use simple structs/functions instead
  • GenServer overhead includes process startup/management, inter-process communication, supervision/recovery, memory state maintenance, and lifecycle callback implementation
  • Common design flaws: business logic overload (mixing business rules with process management), treating GenServers like OOP objects, and ignoring the Single Responsibility Principle (SRP)
  • Proper design: keep GenServers thin as coordinators, extract complex business logic to external modules like OrderService, focus each GenServer on one specific task
  • Testing strategies: isolated callback testing (call handle_call/handle_cast directly for simple state transitions) and live GenServer testing (run actual process for complex interactions like timeouts/retries)
  • Use explicit contracts and adapters for external services - define behaviours with stub implementations for tests and real implementations for production
  • Swap implementations either via compile-time config (Application.compile_env) or runtime options (pass mailer as GenServer option)
  • Bottom line: if testing feels difficult, your GenServer is doing too much - let that guide better design decisions
  • We've been there and learned that building a GenServer is the easy part; making it fast, observable, and bulletproof is where the real work starts.

    In the previous article, we walked through a TDD approach to GenServers. This follow-up is the field manual I wish I had when I first pushed one of those servers to production. We'll build a mental model of how GenServers consume CPU cycles, then apply a toolbox of performance and observability techniques that you can drop into your code today.

    By the end, you'll know how to:

    • Read the BEAM's "cost model" – mailbox size (message queue length), reductions, scheduler load – so you can spot trouble early.
    • Refactor hot paths so callbacks never block the GenServer's message loop.
    • Push read-heavy state to ETS / `persistent_term` without losing consistency.
    • Add cheap, composable Telemetry so dashboards light up before pagers do.
    • Choose when to graduate from a single GenServer to GenStage, Broadway, or full-blown distributed sharding.

    Let's dive in.

    The GenServer Cost Mental Model

    A GenServer is just a process with a mailbox, but the devil is in the scheduler details. The BEAM VM runs N schedulers – one per logical CPU core by default – and each scheduler works through its own run queue of processes. Key things to watch:

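    1. Mailbox size
    `Process.info(pid, :message_queue_len)` tells you how many messages are waiting. A consistently growing queue is a red flag: every message waits behind the ones ahead of it, so an overloaded mailbox delays replies and inflates the end-to-end latency of every caller waiting on that process.
    2. Reductions
    Every BEAM operation costs reductions, and each process gets a budget of a few thousand reductions before the scheduler preempts it. A long-running callback is therefore sliced fairly against other processes, but the GenServer itself cannot pick up its next message until the callback returns, so everything queued behind it waits.
    3. Scheduler migrations
    The VM periodically balances load across run queues and may migrate a busy process to a different scheduler. After a migration the process's data is no longer in that core's local L1/L2 cache, so the resulting cache misses can introduce latency.
    4. Sync vs. async
    `GenServer.call/3` blocks the caller; `cast/2` doesn't. Calls are convenient and give you natural back-pressure, but they couple the caller to the server's latency and lifecycle: by default the caller exits if no reply arrives within 5 seconds or if the server dies mid-call.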
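
To make the first two metrics concrete, here is a minimal sketch of a probe built only on `Process.info/2`. The module name `MailboxProbe` and its function names are made up for illustration; only `Process.info/2` and its `:message_queue_len` and `:reductions` items come from the standard library.

```elixir
defmodule MailboxProbe do
  # Hypothetical helper module, not part of any library.

  # Returns a map with the mailbox length and the total reductions of `pid`,
  # or nil if the process is no longer alive.
  def snapshot(pid) do
    case Process.info(pid, [:message_queue_len, :reductions]) do
      nil -> nil
      info -> Map.new(info)
    end
  end

  # Reductions consumed by `pid` during `interval_ms` milliseconds.
  # Returns nil if the process dies in the meantime.
  def reductions_delta(pid, interval_ms \\ 1_000) do
    with {:reductions, before} <- Process.info(pid, :reductions),
         :ok <- Process.sleep(interval_ms),
         {:reductions, later} <- Process.info(pid, :reductions) do
      later - before
    end
  end
end
```

Calling `MailboxProbe.snapshot(pid)` a few times under load is often enough to tell whether a queue is draining or growing; a large `reductions_delta/2` combined with a growing mailbox usually points at an expensive callback.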

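    Tools to keep under the belt:

    :observer.start()                         # GUI: schedulers, processes, ETS tables
    :recon.proc_count(:message_queue_len, 10) # top 10 mailboxes (needs the recon library)
    # A library like telemetry_metrics_statsd to consume telemetry events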

    Spend five minutes watching these metrics during load and your optimisation story usually writes itself.

    Performance & Throughput Techniques

    Keep Callbacks Non-Blocking

    If a callback waits on disk, network, or heavy CPU work, your entire GenServer stalls. The key is to move blocking work out of the GenServer's main loop. The `Task` module provides several patterns for this.

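    For "fire-and-forget" work where the caller doesn't need a result, `Task.start/1` offloads the work into a new process, and the GenServer can immediately process the next message. Note that `Task.start/1` does not link the task to the GenServer; use `Task.start_link/1` if a crash in the task should also take the GenServer down.

    def handle_cast({:track_event, event}, state) do
      # Task.start/1 is not linked: if the task crashes, the GenServer keeps running.
      # Swap in Task.start_link/1 if the crash should propagate.
      Task.start(fn -> Analytics.track(event) end)
      {:noreply, state}
    end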

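    When a result is needed but you can't block the GenServer, a common pattern is to hand the work to a task and reply later: return `{:noreply, state}` from `handle_call/3` and let the task answer the caller with `GenServer.reply/2`. (Returning a `Task.async/1` struct to the caller does not work, because only the process that started a task may `Task.await/2` it.) This frees the GenServer while the client waits.

    # In the GenServer
    def handle_call({:compute, input}, from, state) do
      # Do the heavy work in a separate process and reply from there.
      # If the task crashes, no reply is sent and the caller's timeout fires.
      Task.start(fn -> GenServer.reply(from, heavy_math(input)) end)
      {:noreply, state}
    end
    
    # In a client module
    def compute(server, input) do
      GenServer.call(server, {:compute, input}, 30_000) # Always use a timeout!
    end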

    When background jobs should be supervised and observable rather than tied to your GenServer, use a `Task.Supervisor` to run them as supervised, independent processes.
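
As a sketch of the `Task.Supervisor` variant: the GenServer below starts each job with `Task.Supervisor.async_nolink/2`, remembers who asked, and replies when the task's result (or its `:DOWN` message) arrives. The module name `Worker`, the `:task_supervisor` option, and `heavy_math/1` are assumptions for illustration, not part of the original code.

```elixir
defmodule Worker do
  use GenServer

  # `:task_supervisor` is a hypothetical option: the pid or name of a
  # running Task.Supervisor.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def compute(server, input), do: GenServer.call(server, {:compute, input}, 30_000)

  @impl true
  def init(opts) do
    {:ok, %{sup: Keyword.fetch!(opts, :task_supervisor), pending: %{}}}
  end

  @impl true
  def handle_call({:compute, input}, from, state) do
    # async_nolink/2 monitors the task but does not link it, so a crash in
    # the task reaches us as a :DOWN message instead of killing the GenServer.
    task = Task.Supervisor.async_nolink(state.sup, fn -> heavy_math(input) end)
    {:noreply, %{state | pending: Map.put(state.pending, task.ref, from)}}
  end

  @impl true
  def handle_info({ref, result}, state) when is_reference(ref) do
    case Map.pop(state.pending, ref) do
      {nil, _pending} ->
        {:noreply, state}

      {from, pending} ->
        # The task succeeded; drop its monitor and flush a pending :DOWN.
        Process.demonitor(ref, [:flush])
        GenServer.reply(from, {:ok, result})
        {:noreply, %{state | pending: pending}}
    end
  end

  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    case Map.pop(state.pending, ref) do
      {nil, _pending} ->
        {:noreply, state}

      {from, pending} ->
        GenServer.reply(from, {:error, reason})
        {:noreply, %{state | pending: pending}}
    end
  end

  def handle_info(_other, state), do: {:noreply, state}

  # Stand-in for real CPU-heavy work.
  defp heavy_math(n), do: Enum.sum(1..n)
end
```

With a supervisor started via `Task.Supervisor.start_link/0`, `Worker.compute(pid, 10)` returns `{:ok, 55}`, and a crashing job comes back as `{:error, reason}` instead of a timeout.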

    Freshcode Tip
    The goal is to keep your `handle_call` and `handle_cast` callbacks consistently fast (a good budget is <1ms). When profiling reveals a slow callback, delegate the work using one of these patterns.

    Post-Init Heavy Work with `handle_continue`

    Boot time matters when your GenServer sits inside a supervision tree - a slow `init/1` delays the whole app. Load large datasets after the process is up:

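    def init(_opts) do
      # Start with an empty cache; returning quickly unblocks the supervisor.
      {:ok, %{cache: nil}, {:continue, :warm_cache}}
    end
    
    def handle_continue(:warm_cache, state) do
      # Runs right after init/1, before any other message is handled.
      cache = load_big_table()
      {:noreply, %{state | cache: cache}}
    end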

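    Your supervision tree comes online without waiting for the load. Keep in mind that the GenServer itself stays busy until `handle_continue/2` returns, so calls that arrive in the meantime queue up behind it.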
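
The ordering guarantee can be seen in a small self-contained sketch. `WarmCache` and its `loader` argument (a zero-arity function standing in for the expensive load) are hypothetical names used only for this example.

```elixir
defmodule WarmCache do
  use GenServer

  # `loader` is a hypothetical zero-arity function that returns a map.
  def start_link(loader), do: GenServer.start_link(__MODULE__, loader)

  def fetch(server, key), do: GenServer.call(server, {:fetch, key})

  @impl true
  def init(loader) do
    # Return immediately; the expensive load is deferred to handle_continue/2.
    {:ok, %{cache: nil, loader: loader}, {:continue, :warm_cache}}
  end

  @impl true
  def handle_continue(:warm_cache, state) do
    {:noreply, %{state | cache: state.loader.()}}
  end

  @impl true
  def handle_call({:fetch, key}, _from, state) do
    # By the time any call is handled, the continue step has already run.
    {:reply, Map.fetch(state.cache, key), state}
  end
end
```

Even if `fetch/2` is called right after `start_link/1` returns, the call is only served once the cache is loaded, because the continue instruction is handled before anything in the mailbox.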

    Externalize Read-Heavy State (ETS / `persistent_term`)

    A GenServer's state is its bottleneck; every read is a serialized request. For highly contended data, moving state to `:ets` or `:persistent_term` can unlock massive read concurrency. But this power comes with sharp trade-offs: ETS tables, especially with `read_concurrency: true`, offer fast, parallel reads that come at a cost of:

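    1. Write serialization
    By default a table is `:protected`: only the owner process can write to it, so all writes are still serialized through that single process. If you need parallel writes, make the table `:public` and consider `write_concurrency: true` or `:auto` (OTP 25+), which lets different objects in the same table be modified simultaneously by different processes. This capability comes at the cost of higher memory usage and reduced efficiency for sequential operations and concurrent reads.
    2. Consistency
    Operations on a single object are atomic and isolated, so a reader never sees a half-written record. What you give up is consistency across objects: a reader that performs several lookups, or traverses the table, while a writer is midway through a multi-key update can observe a mix of old and new values.
    3. Ownership
    The table's lifecycle is tied to the owner process. If it dies, the table vanishes, unless you configured an `:heir` or handed the table over with `:ets.give_away/3`.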
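
A minimal sketch of the usual split, assuming a hypothetical `PriceCache` module: the GenServer owns the table and serializes writes, while readers go straight to ETS without sending a message.

```elixir
defmodule PriceCache do
  use GenServer

  # Hypothetical table name used only in this sketch.
  @table :price_cache

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Reads bypass the GenServer entirely and run in the caller's process.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  # Writes go through the owner, so they stay serialized.
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  @impl true
  def init(_opts) do
    # :protected lets any process read but only the owner write.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, %{}}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    :ets.insert(@table, {key, value})
    {:reply, :ok, state}
  end
end
```

Because the table dies with its owner, this module belongs under a supervisor, and readers should be prepared for `:ets.lookup/2` to raise while the owner is restarting.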

    For truly static data that is read frequently and written rarely, `:persistent_term` is a powerful alternative. Reads are virtually free: no message passing, no copying onto the reader's heap, no GC impact. The catch is on the write side. Updating or erasing an existing key with `:persistent_term.put/2` or `:persistent_term.erase/1` makes the VM scan every process for references to the old term, which can cause a multi-millisecond pause across the entire BEAM (on a modern OTP (25+), it's typically sub-millisecond for small updates). It should only be used for data that is set once at application boot or updated very rarely during a maintenance window.
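
A minimal sketch of the "write once at boot, read everywhere" usage. `FeatureFlags` and its key are hypothetical; `:persistent_term.put/2` and `:persistent_term.get/2` are the standard OTP functions.

```elixir
defmodule FeatureFlags do
  # Namespacing the key with the module name avoids clashes with other users
  # of :persistent_term. The module itself is a made-up example.
  @key {__MODULE__, :flags}

  # Intended to be called once, e.g. from Application.start/2.
  def load(flags) when is_map(flags), do: :persistent_term.put(@key, flags)

  # Reads do not copy the map onto the caller's heap.
  def enabled?(flag) do
    @key
    |> :persistent_term.get(%{})
    |> Map.get(flag, false)
  end
end
```

Storing one map under a single key keeps the number of terms small, at the price of replacing the whole map on every change, which is acceptable only because changes are rare.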

    Freshcode Tip
    Use these tools surgically. Profile your application, understand the read/write ratio, and always measure the performance impact of both reads and writes before committing to this pattern.

    Batching & Coalescing Patterns

    Sometimes the cheapest optimisation is to do less. Accumulate writes and flush every X milliseconds:

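    @flush_interval 1_000 # milliseconds

    def init(_opts) do
      schedule_flush()
      {:ok, %{buffer: []}}
    end
    
    def handle_cast({:track, metric}, state) do
      # Prepending is O(1); the buffer is reversed once at flush time.
      {:noreply, %{state | buffer: [metric | state.buffer]}}
    end
    
    def handle_info(:flush, state) do
      schedule_flush()
      # flush/1 is your own function; it receives the metrics in arrival order.
      flush(Enum.reverse(state.buffer))
      {:noreply, %{state | buffer: []}}
    end
    
    defp schedule_flush do
      Process.send_after(self(), :flush, @flush_interval)
    end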

    Used sparingly, batching can smooth traffic spikes without complex back-pressure logic.
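
A timer alone lets the buffer grow without limit during a spike. One possible refinement, sketched here under assumed names (`MetricBatcher`, and the `:flush_fun`, `:interval` and `:max_size` options are all hypothetical), is to flush on whichever comes first, the timer or a size cap.

```elixir
defmodule MetricBatcher do
  use GenServer

  # Hypothetical options:
  #   :flush_fun - 1-arity function that receives the batch (required)
  #   :interval  - flush period in milliseconds (default 1_000)
  #   :max_size  - flush as soon as this many items are buffered (default 100)
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def track(server, metric), do: GenServer.cast(server, {:track, metric})

  @impl true
  def init(opts) do
    state = %{
      flush_fun: Keyword.fetch!(opts, :flush_fun),
      interval: Keyword.get(opts, :interval, 1_000),
      max_size: Keyword.get(opts, :max_size, 100),
      buffer: [],
      size: 0
    }

    schedule_flush(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_cast({:track, metric}, state) do
    state = %{state | buffer: [metric | state.buffer], size: state.size + 1}

    if state.size >= state.max_size do
      {:noreply, flush(state)}
    else
      {:noreply, state}
    end
  end

  @impl true
  def handle_info(:flush, state) do
    schedule_flush(state.interval)
    {:noreply, flush(state)}
  end

  # Nothing buffered: skip the call to the sink.
  defp flush(%{size: 0} = state), do: state

  defp flush(state) do
    # Reverse once so the sink sees items in arrival order.
    state.flush_fun.(Enum.reverse(state.buffer))
    %{state | buffer: [], size: 0}
  end

  defp schedule_flush(interval), do: Process.send_after(self(), :flush, interval)
end
```

The flush function still runs inside the GenServer here, so it should be quick or hand the batch to a task.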

    Back-Pressure & Demand Control

    If producers outpace your GenServer, queues explode. Options:

    • Timeouts on `call/3` – force callers to handle slowness.
    • Bounded mailbox – set a max queue length and reject or drop messages after a threshold. The BEAM does not cap mailboxes for you, so the check lives in the callback and sheds load cheaply instead of doing the full work:

    @max_queue_len 1_000 # example threshold; tune it for your workload

    @impl true
    def handle_call({:process, _item}, _from, state) do
      # Check the mailbox size first.
      case Process.info(self(), :message_queue_len) do
        {:message_queue_len, len} when len > @max_queue_len ->
          # "Reject" the call because the server is overloaded.
          {:reply, {:error, :overloaded}, state}
    
        _ ->
          # Mailbox is not full, process the request.
          # ... do actual work ...
          {:reply, :ok, %{state | processed: state.processed + 1}}
      end
    end
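
For the timeout option, the caller has to decide what slowness means for it. When `GenServer.call/3` times out, the calling process exits with `{:timeout, _}`; a minimal sketch of turning that into an error tuple follows. `SlowServer` and `Client` are made-up modules for the demonstration.

```elixir
defmodule SlowServer do
  use GenServer

  # Hypothetical server that takes `delay_ms` to answer every call.
  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  @impl true
  def handle_call(:work, _from, delay_ms) do
    Process.sleep(delay_ms)
    {:reply, :done, delay_ms}
  end
end

defmodule Client do
  # Converts a call timeout into an error tuple the caller can act on
  # (retry, degrade, or surface the failure).
  def work(server, timeout \\ 5_000) do
    {:ok, GenServer.call(server, :work, timeout)}
  catch
    :exit, {:timeout, _} -> {:error, :timeout}
  end
end
```

Note that a timeout does not cancel the work: the server still finishes the request, so callers that retry aggressively can make an overload worse.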

    Consider moving to `GenStage` or `Broadway` when:

    • You need standardized, pull-based back-pressure across a multi-stage data processing pipeline.
    • Your workload naturally fits a producer-consumer model (e.g., consuming from SQS).
    • You need concurrent processing of events while preserving order within a partition.

    Migration can be incremental. For example, you can turn an existing `GenServer` into a `GenStage` producer (or have it hand events to one) and fan out to consumers from there.

    Sharding Hot Keys

    One GenServer → one mailbox. Hot keys will hit the limit. Partition with a `Registry`:

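    shard = :erlang.phash2(customer_id, 16) # shard index in 0..15
    # Note: if `customer_id` is user-controllable, this could be
    # vulnerable to hash-collision attacks creating a hot shard.
    # Assumes 16 shard processes were started at boot and registered under their
    # index in a Registry; MyApp.ShardRegistry is a placeholder name.
    GenServer.call({:via, Registry, {MyApp.ShardRegistry, shard}}, {:process, customer_id})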
    

    Or reach for libraries like `hash_ring`. 
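
Put together, a self-contained sketch of Registry-based sharding might look like the following. The names `Shard` and `ShardRegistry`, the shard count of 4, and the per-customer counter are assumptions chosen to keep the example small.

```elixir
defmodule Shard do
  use GenServer

  # Hypothetical registry name and shard count for this sketch.
  @registry ShardRegistry
  @shards 4

  # Child specs for the registry plus one GenServer per shard index.
  def child_specs do
    shards =
      for i <- 0..(@shards - 1) do
        Supervisor.child_spec({__MODULE__, i}, id: {__MODULE__, i})
      end

    [{Registry, keys: :unique, name: @registry} | shards]
  end

  def start_link(index), do: GenServer.start_link(__MODULE__, index, name: via(index))

  # Every call for the same customer lands on the same shard process.
  def increment(customer_id) do
    GenServer.call(via(shard_for(customer_id)), {:increment, customer_id})
  end

  def shard_for(customer_id), do: :erlang.phash2(customer_id, @shards)

  defp via(index), do: {:via, Registry, {@registry, index}}

  @impl true
  def init(index), do: {:ok, %{index: index, counts: %{}}}

  @impl true
  def handle_call({:increment, id}, _from, state) do
    counts = Map.update(state.counts, id, 1, &(&1 + 1))
    {:reply, {state.index, counts[id]}, %{state | counts: counts}}
  end
end
```

Starting it with `Supervisor.start_link(Shard.child_specs(), strategy: :rest_for_one)` brings up the registry first and restarts the shards if the registry itself goes down.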

    Observability & Instrumentation

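    You can't fix what you can't see. Much of the Elixir ecosystem (Phoenix, Ecto, Broadway) already emits `:telemetry` events, and you can emit your own around GenServer callbacks - use them.

    # do_work/0 stands in for the body of your callback.
    start = System.monotonic_time()
    result = do_work()
    duration = System.monotonic_time() - start # in native time units

    :telemetry.execute(
      [:my_app, :genserver, :callback, :stop],
      %{duration: duration},
      %{module: __MODULE__, callback: :handle_call}
    )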

    Pipe these events into a library like `PromEx` to expose them to Grafana or Datadog. Add tracing (`OpenTelemetry`) around external calls to stitch latency graphs end-to-end. Set *budgets* (SLOs) and alert on 95th percentile, not averages.
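
To see why the 95th percentile is the better alert signal, here is a small worked sketch using only the standard library. `LatencyStats` is a hypothetical helper, and the percentile uses the simple nearest-rank definition; metrics backends usually compute this for you with their own estimators.

```elixir
defmodule LatencyStats do
  # Hypothetical helper for illustration only.

  def mean(samples), do: Enum.sum(samples) / length(samples)

  # Nearest-rank percentile: the smallest sample such that at least
  # p percent of all samples are less than or equal to it.
  def percentile(samples, p) when is_integer(p) and p > 0 and p <= 100 do
    sorted = Enum.sort(samples)
    rank = ceil(p * length(sorted) / 100)
    Enum.at(sorted, rank - 1)
  end
end

# 90 fast requests (10 ms) and 10 slow ones (500 ms).
samples = List.duplicate(10, 90) ++ List.duplicate(500, 10)

IO.inspect(LatencyStats.mean(samples), label: "mean ms")
IO.inspect(LatencyStats.percentile(samples, 50), label: "p50 ms")
IO.inspect(LatencyStats.percentile(samples, 95), label: "p95 ms")
```

With these numbers the mean is 59 ms and the median is 10 ms, both of which look healthy, while the p95 of 500 ms shows that one request in ten is painfully slow.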

    Conclusion

    A GenServer is a beautiful abstraction, but it hides sharp edges. With a clear mental model and a small set of techniques – non-blocking callbacks, state externalisation, batching, back-pressure, and solid instrumentation – you can take that weekend prototype and run it under serious production load.

    Every optimization is a trade-off. Always profile your application to identify true bottlenecks before adding complexity. Instrument first, optimise second.

    Next up in this series: distributed GenServers and cluster-wide coordination – we'll tackle hand-off, global registries, and truly elastic scaling. Stay tuned. In case you’re looking for hands-on Elixir expertise for your next project, drop us a line.

    Key Takeaways

    • Non-blocking callbacks keep schedulers healthy.
    • Move highly contended reads to ETS or `:persistent_term`, but understand the trade-offs.
    • Instrument first, optimise second.
    • Utilize supervision and back-pressure patterns to maintain resilience.

    Resources & Further Reading

    Official OTP docs – `gen_server`, `:erlang.process_info/2`.

    Fred Hébert – "Adopting Erlang/OTP" chapters on monitoring.

    Saša Jurić – "Elixir in Action", sections on performance.

    Erlang Solutions – "Designing for Scalability with Erlang/OTP". 

    Author

    Ihor Katkov, Software Engineer
    Experienced IT professional and team leader with a specialty in the trading domain.

    Sofiia Yurkevska, Content Writer
    Infodumper, storyteller and linguist in love with programming - what a mixture for your guide to the technology landscape!
