Using Clojure Transducers: A Practical Guide

August 26, 2024

Oleksandr Druk

Clojure Developer

Sofiia Yurkevska

Content Writer

September 6, 2024

Freshcode

Data transformation has been fundamental since the dawn of programming. But what if you could make these transformations more efficient, composable, and versatile?

That's what transducers offer a Clojure developer: a blend of efficiency, flexibility, and composability that can noticeably improve your code's performance and readability. Whether you're working with collections, streams, or channels, transducers provide a unified approach to data transformation that's hard to beat. Let's see exactly how they do that.

What are Transducers?

Transducers provide a highly efficient and composable way to process and transform data. A transducer is a function that describes a transformation without knowing how the thing it transforms is organized. Concretely, it takes a reducing function (such as conj or +) and returns a new reducing function that applies the transformation before passing each value on. Because it never touches the source or the result directly, the same transformation can be applied to collections, streams, or channels without modification, greatly enhancing flexibility.
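
To make that concrete, here is a hand-rolled stand-in for (map f). It is a simplified sketch for illustration, not clojure.core's actual implementation (which also handles multiple input collections), and my-map is a hypothetical name. The outer function receives the next reducing function, rf, and returns a new reducing function with the three arities a transducer needs: init, completion, and step.

```clojure
;; Hypothetical, simplified version of (map f), for illustration only.
(defn my-map [f]
  (fn [rf]                      ; rf: the reducing function further down the chain
    (fn
      ([] (rf))                 ; init: delegate to rf
      ([result] (rf result))    ; completion: delegate to rf
      ([result input]           ; step: transform the input, then pass it on
       (rf result (f input))))))

;; The same transducer works in different reducing contexts.
(transduce (my-map inc) conj [] [1 2 3]) ; => [2 3 4]
(into #{} (my-map inc) [1 2 3])          ; => #{2 3 4}
(transduce (my-map inc) + 0 [1 2 3])     ; => 9
```

Nothing in my-map mentions vectors, sets, or sums; the caller decides all of that.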

Transducers are versatile because they are independent of the context in which they are applied. They abstract away the input source and the accumulation mechanism, focusing purely on the transformation logic. This also reduces developers' cognitive load: you can think about each transformation step on its own, without worrying about where the data comes from or how the results are collected. This separation of concerns leads to clearer, more understandable code.

They avoid creating intermediate collections, reducing memory usage and improving performance.

They can be composed to build complex transformations from simpler ones. They support eager evaluation and also work well with lazy evaluation, which makes it possible to process potentially infinite sequences efficiently, a common scenario in many programming tasks. Stateless transducers can also be run in parallel (for example, with core.async's pipeline, which accepts a parallelism level and a transducer), making use of multiple cores without changing the transformation logic.

Another significant advantage is reusability. A transducer can be defined once and used in multiple contexts, whether with different collections or with core.async channels, adhering to the DRY principle, a cornerstone of good software design.
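
As a sketch of that reuse, the same transformation (odd-squares is a hypothetical name) can feed a vector, a set, a sum, and a lazy sequence over an infinite range. Only sequence is lazy, so it is the one that can consume (range) without a step such as take to stop it.

```clojure
;; Hypothetical example transformation: keep odd numbers, then square them.
(def odd-squares
  (comp (filter odd?)
        (map #(* % %))))

(into [] odd-squares (range 10))        ; => [1 9 25 49 81]
(into #{} odd-squares [1 1 3 3])        ; => #{1 9}
(transduce odd-squares + 0 (range 10))  ; => 165
(take 3 (sequence odd-squares (range))) ; => (1 9 25), lazy over an infinite range
```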

Creating and Using Transducers

Let's start with a basic example to understand how transducers work. Suppose we want to triple a sequence of numbers and keep only the even results.

  (def numbers [1 2 3 4 5 6 7 8 9 10])

  (defn triple [x]
    (* x 3))

  (->> numbers
       (map triple)
       (filter even?)) ; => (6 12 18 24 30)

In the code above, map produces an intermediate lazy sequence that filter then consumes. For large data sets, building and walking that extra sequence can be inefficient. Now, let's see how we can achieve the same result using transducers:

  (def numbers [1 2 3 4 5 6 7 8 9 10])

  (defn triple [x]
    (* x 3))

  (def xform
    (comp
      (map triple)
      (filter even?)))

  (transduce xform conj [] numbers) ; => [6 12 18 24 30]

Here, we use the comp function to compose the map and filter transducers into a single transformation, xform. Note that a composed transducer applies its steps left to right (map first, then filter), in the same order as the ->> version. Then, the transduce function applies this transformation to the numbers collection, accumulating the results into an empty vector with conj. The key difference is that no intermediate collections are created: the transducer passes each element through all steps before moving on to the next one.
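
The one-pass behaviour can be observed by logging each step. This is an illustrative sketch: log, traced-triple and traced-even? are hypothetical helpers, and the order shown for the threaded version reflects how map and filter currently process chunked sources such as vectors (a whole chunk at a time), which is an implementation detail rather than a guarantee.

```clojure
;; Hypothetical helpers that record every call in an atom.
(def log (atom []))

(defn traced-triple [x]
  (swap! log conj [:map x])
  (* x 3))

(defn traced-even? [x]
  (swap! log conj [:filter x])
  (even? x))

;; Transducer: each element goes through map AND filter before the next one starts.
(transduce (comp (map traced-triple) (filter traced-even?)) conj [] [1 2]) ; => [6]
@log ; => [[:map 1] [:filter 3] [:map 2] [:filter 6]]

;; Threaded seq version: map realizes a whole chunk before filter sees anything.
(reset! log [])
(doall (->> [1 2] (map traced-triple) (filter traced-even?))) ; => (6)
@log ; => [[:map 1] [:map 2] [:filter 3] [:filter 6]]
```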

Notice that we used the transduce function to get a result in the example above. clojure.core provides four functions that apply transducers to a source: transduce, into, sequence, and eduction.

transduce

transduce is reduce for transducers. reduce walks a collection, applying a reducing function to an accumulator and each element in turn, and returns the final accumulated value without building a collection of intermediate results. transduce does the same, but first wraps the reducing function with the transformation described by xform:

  (transduce xform f coll)
  (transduce xform f init coll)

What happens here is:

coll is the collection to process;
xform holds the transformation steps, which wrap f, the reducing function that says how to accumulate results;
transduce starts immediately (eagerly): each element of coll goes through xform and is accumulated into init using f. If init is omitted, transduce gets it by calling (f) with no arguments, and at the end it calls (f result) once so the reducing function can finish the result.
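
A few calls that exercise the pieces described above. triple-evens is a hypothetical name for the tripling-and-filtering transformation used earlier; completing is a clojure.core helper that turns a two-argument function into a full reducing function, optionally with a completion function.

```clojure
;; Hypothetical name for the tripling-and-filtering transformation used earlier.
(def triple-evens
  (comp (map #(* % 3))
        (filter even?)))

;; No init: transduce calls (+) to get the starting value, 0.
(transduce triple-evens + (range 1 11)) ; => 90

;; Explicit init with conj collects the results.
(transduce triple-evens conj [] (range 1 11)) ; => [6 12 18 24 30]

;; completing adds a completion step, called once with the final accumulator.
(transduce triple-evens
           (completing + (fn [total] {:total total}))
           0
           (range 1 11)) ; => {:total 90}
```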

into

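Use into to pour the transformed input into the target collection you pass as its first argument, keeping that collection's type. into always accumulates with conj (internally using transients when the target supports them, which is part of why it is fast), so it is a good fit when conj is the reducing function you want; you cannot supply a different one.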

  (into #{}
        (comp
          (take 8)
          (filter even?)
          (map triple))
        numbers) ; => #{24 6 12 18}
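
Because into conjoins onto whatever collection you pass, the target decides the shape of the result. A short sketch, with users as hypothetical sample data:

```clojure
;; Hypothetical sample data.
(def users [{:id 1 :name "Ann"} {:id 2 :name "Bob"}])

;; conj onto a map accepts [key value] pairs, so this builds a lookup table.
(into {} (map (juxt :id :name)) users) ; => {1 "Ann", 2 "Bob"}

;; conj onto a list adds at the front, so the order is reversed.
(into '() (map :id) users) ; => (2 1)

;; Existing contents of the target are kept.
(into [0] (map :id) users) ; => [0 1 2]
```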

sequence

As mentioned above, transduce reduces over a collection immediately, not lazily. When you need a lazy sequence instead, use sequence, which realizes its results incrementally as they are consumed.

  (def xs
    (sequence
      (comp (filter even?) (map triple))
      numbers))

  (type xs) ; => clojure.lang.LazySeq

  (take 3 xs) ; => (6 12 18)

If you have a ->> chain of seq functions that you want to speed up with transducers, try the sequence function. It is usually the easiest way to convert existing code, because the result stays a lazy sequence, just like the original chain.
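
The conversion is mostly mechanical: drop ->> and the collection, wrap the steps in comp in the same order, and pass the collection to sequence. A sketch on made-up input:

```clojure
;; Before: a threaded chain, each step producing its own lazy sequence.
(->> (range 20)
     (map inc)
     (filter even?)
     (take 5)) ; => (2 4 6 8 10)

;; After: the same steps in the same order, wrapped in comp and handed to sequence.
(sequence (comp (map inc)
                (filter even?)
                (take 5))
          (range 20)) ; => (2 4 6 8 10)
```

Both forms return a lazy sequence, so callers of the converted code should not notice the change.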

eduction

Use the eduction function to capture the application of one or more transducers to a collection. Instead of running the transformation on the spot, eduction returns a reducible, iterable object that describes it; the work happens only when you reduce over or iterate that object. You can keep an eduction and consume it as many times as you need, but it does not cache its results: every consumer re-runs the transformation over the source. This makes eduction a good fit when the result will be consumed by reduce, transduce, or run!, since no intermediate sequence is kept around, and when working with data from external sources (like files), where you want the processing to happen only when the consumer asks for it.

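  (def xf (comp (filter odd?) (map inc))) ; keep odd numbers, then increment
  (def iter (eduction xf (range 5)))      ; nothing is computed yet
  (reduce + 0 iter) ; => 6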
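
A quick way to see that an eduction is a recipe rather than a cached result is to count how often its step runs. calls and ed are hypothetical names for this sketch.

```clojure
;; Hypothetical counter of how many times the map step runs.
(def calls (atom 0))

(def ed
  (eduction (map (fn [x] (swap! calls inc) (* x 10)))
            [1 2 3]))

@calls          ; => 0, creating the eduction runs nothing
(reduce + 0 ed) ; => 60
(into [] ed)    ; => [10 20 30]
@calls          ; => 6, the map step ran again for the second consumer
```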

In my experience, eduction is rarely used in real-world code.

To better understand the distinctions between these core transducer functions and guide your choice in various scenarios, let's examine the following comparison table:

| | transduce | into | sequence | eduction |
|---|---|---|---|---|
| Primary Use | General-purpose reduction | Collection transformation | Lazy sequence creation | Delayed transducer application |
| Execution | Eager | Eager | Lazy | Delayed |
| Result Type | Any (determined by reducing function) | Collection | Lazy Sequence | Reducible/Iterable |
| Reducing Function | Custom | Always conj | N/A | N/A |
| Initial Value | Optional | Target collection | N/A | N/A |
| Memory Efficiency | High | High | High (due to laziness) | High |
| Use with Infinite Sequences | Only if a step such as take stops early | Only if a step such as take stops early | Yes | Yes |
| Flexibility | High | Medium | Medium | High |
| Typical Use Case | Complex reductions | Simple collection transformations | Working with large data sets | Reusable transformations |
| Refactoring from | reduce | map/filter chains | ->> macro | N/A |
| Integration with core.async | Indirect | Indirect | Indirect | Direct |
| Code Conciseness | Medium | High | High | Medium |
| Performance | Very High | Very High | Very High | Very High |

Transducers with core.async

Transducers can also be used with core.async channels, providing a powerful way to transform data as it flows through a channel.

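  ;; Requires the org.clojure/core.async dependency;
  ;; numbers and triple are defined in the earlier examples
  (require '[clojure.core.async :as a])

  (def xform
    (comp
      (map triple)
      (filter even?)))

  ;; A channel with a transducer needs a buffer; xform runs on every value put in
  (def ch (a/chan 10 xform))

  ;; Producer: put every number on the channel, then close it
  (a/go
    (doseq [x numbers]
      (a/>! ch x))
    (a/close! ch))

  ;; Consumer: take values until the channel is closed (<! then returns nil)
  (a/go-loop []
    (when-let [x (a/<! ch)]
      (println x)
      (recur)))

  ;; Prints 6, 12, 18, 24 and 30, each on its own line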

Performance

Transducers perform transformations without creating intermediate collections or intermediate lazy sequences, which can bring significant performance gains on large collections.

  ;; Requires the Criterium library on the classpath
  (require '[criterium.core :refer [quick-bench]])

  (quick-bench
    (->> (range 1e6)
         (filter odd?)
         (map inc)
         (take 1000000)
         (vec))) ; => Execution time mean : 109.297682 ms

  (quick-bench
    (into []
          (comp
            (filter odd?)
            (map inc)
            (take 1000000))
          (range 1e6))) ; => Execution time mean : 60.394658 ms

As we can see, on this benchmark the transducer version is nearly twice as fast as the traditional approach (about 60 ms versus 109 ms). Here's why:

By eliminating intermediate collections, transducers significantly reduce memory allocations.

Transducers combine multiple operations into a single pass, avoiding the per-layer sequence calls (first, next, and realizing lazy cells) that a threaded chain makes at every step.

The simpler call structure may also give the JVM's JIT compiler more room to optimize, for example by inlining operations.

Lazy sequences are powerful, but they carry overhead: each step allocates and caches sequence cells (or chunks) as it is realized. Transducers avoid that bookkeeping, which makes them more efficient for pipelines like the one above.

Writing transducer-friendly code

It's not rare to see functions like this in the wild:

  (defn increment-all [coll]
    (map inc coll))

  (defn filter-evens [coll]
    (filter even? coll))

  (defn double-all [coll]
    (map #(* 2 %) coll))

These functions thread nicely with ->>, but they can't be used as transducers because each one requires a collection up front. In other words, the code is not transducer-friendly. The fix is to make each function multi-arity, following the same convention as clojure.core's map and filter: the 0-arity returns a transducer, and the 1-arity applies that transducer to a collection and returns the result.

So, instead, we get something like this:

  (defn increment-all
    ([]
     (map inc))
    ([coll]
     (sequence (increment-all) coll)))

  (defn filter-evens
    ([]
     (filter even?))
    ([coll]
     (sequence (filter-evens) coll)))

  (defn double-all
    ([]
     (map #(* 2 %)))
    ([coll]
     (sequence (double-all) coll)))

The beauty of writing functions like this is that the caller can choose between transducers and plain old seq functions.

  ;; thread version
  (->> numbers
       (increment-all)
       (filter-evens)
       (double-all)) ; => (4 8 12 16 20)

  ;; transducer version
  (def xform
    (comp
      (increment-all)
      (filter-evens)
      (double-all)))

  (into [] xform numbers) ; => [4 8 12 16 20]

When writing transducer-friendly code:

Where possible, use type hints to avoid reflection, especially in performance-critical sections.

Consider using primitive-aware functions to avoid boxing/unboxing overhead for numerical operations.

Always profile your code to identify actual bottlenecks before optimizing. Tools like Criterium can help benchmark different approaches.

Ensure that the code remains readable and maintainable.

The performance difference may be negligible for small collections or simple transformations. Use transducers judiciously.
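
A small illustration of the type-hint advice inside a transducer. string-lengths is a hypothetical name; the ^String hint lets the compiler call .length directly instead of resolving it by reflection for every element, and *warn-on-reflection* makes the compiler report any interop call it could not resolve.

```clojure
;; Ask the compiler to report reflective interop calls
;; (works at the REPL and at the top of a source file).
(set! *warn-on-reflection* true)

;; Without ^String, (.length s) would be resolved via reflection at runtime,
;; and the setting above would make the compiler print a warning.
(def string-lengths
  (map (fn [^String s] (.length s))))

(into [] string-lengths ["a" "bb" "ccc"]) ; => [1 2 3]
```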

Conclusion

Transducers are a powerful feature in Clojure, enabling efficient, composable, and reusable data transformations. They abstract away the source and accumulation, allowing you to focus solely on the transformation logic. They let you compose powerful transformations from simple parts, apply those transformations to almost any source, and decide separately how to build the final result, which gives you more control over the performance of your code.

Perhaps you're looking to integrate these advanced techniques into your existing codebase but aren't sure where to start. Feel free to reach out to discuss how we can assist you in harnessing the full power of Clojure for your development needs.