22.5 Data parallelism with rayon

Last updated June 13, 2026

The last four lessons built concurrency from the ground up: threads, channels, locks, the traits that keep it all safe. That's the machinery, and it's worth understanding. But for one extremely common job, parallelizing work over a collection, you don't have to touch any of it. The rayon crate turns a sequential iterator into a parallel one by changing a single method name, and it handles the threads, the work distribution, and the joining for you. This lesson is a victory lap: the chapter-19 iterators you already know, suddenly running on every core at once.

It's also your second real dependency, after rand back in chapter 7. Adding a crate to do something genuinely hard in one line is the Rust ecosystem at its best, and worth experiencing.

A sequential baseline

Here's a computation to stand in for real per-item work: summing the squares of a range. Squaring is cheap, so pretend each square is expensive. Sequentially, with the iterators from chapter 19:

fn main() {
    let total: u64 = (1..=1_000_000u64)
        .map(|n| n * n)
        .sum();

    println!("{total}");
}

333333833333500000

This runs on one core, one item after another. For a million cheap multiplications that's fine, but if the per-item work were heavy (processing images, hashing, simulating), doing them one at a time would leave most of your machine's cores idle. We'd like to split the items across all cores.

Add the dependency

rayon lives on crates.io. Add it the way you added rand (lesson 7.9):

$ cargo add rayon

    Updating crates.io index
      Adding rayon v1.10.0 to dependencies

That writes rayon = "1.10.0" into Cargo.toml's [dependencies]. (The exact version will be whatever's current; the 1.10 line is what these examples were written against.) Now the one import that unlocks everything:

use rayon::prelude::*;

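rayon's prelude brings its parallel-iterator traits into scope, and those traits are what add methods like par_iter and into_par_iter to standard collections and ranges. It plays the same role for rayon that the standard prelude plays for Vec and String: one line, and the names are simply there.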

Change one word

Here is the entire payoff. Take the sequential program and swap its entry point, so that the pipeline is driven by a parallel iterator instead of a normal one. For a collection, where you'd normally reach for .iter(), the parallel version is .par_iter(). For an owned value such as a range, the parallel constructor is .into_par_iter(), so the range becomes (1..=n).into_par_iter():

use rayon::prelude::*;

fn main() {
    let total: u64 = (1..=1_000_000u64)
        .into_par_iter()
        .map(|n| n * n)
        .sum();

    println!("{total}");
}

333333833333500000

Same answer. But into_par_iter instead of the implicit sequential iteration means rayon splits the range into chunks, runs the map and sum on multiple threads across your CPU cores, and combines the partial results, all behind that one method. The .map(|n| n * n).sum() is identical to the sequential version; you wrote the same pipeline you learned in chapter 19, and only the entry point changed. On a multi-core machine with genuinely expensive per-item work, this finishes in a fraction of the sequential time.
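To make "splits the range into chunks and combines the partial results" concrete, here is a hand-written sketch of that idea using only the standard library's scoped threads. It is an approximation for illustration, not how rayon is implemented: rayon schedules work on a work-stealing thread pool and sizes its chunks adaptively. The helper name sum_of_squares_chunked and the fixed, equal-sized chunks are assumptions made for this example.

```rust
use std::thread;

// Hypothetical helper: sums n * n over 1..=limit using `workers` threads.
fn sum_of_squares_chunked(limit: u64, workers: u64) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so the chunks cover the whole range.
    let chunk = limit.div_ceil(workers).max(1);

    thread::scope(|s| {
        // Spawn every worker first, so they all run at the same time.
        let handles: Vec<_> = (0..workers)
            .map(|i| {
                let start = i * chunk + 1;
                let end = ((i + 1) * chunk).min(limit);
                // Each thread sums its own sub-range; an empty range sums to 0.
                s.spawn(move || (start..=end).map(|n| n * n).sum::<u64>())
            })
            .collect();

        // Join each thread and combine the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    // Fall back to one worker if the core count can't be determined.
    let workers = thread::available_parallelism().map_or(1, |n| n.get() as u64);
    let total = sum_of_squares_chunked(1_000_000, workers);
    println!("{total}"); // prints 333333833333500000
}
```

That is roughly twenty lines of bookkeeping to do, less flexibly, what .into_par_iter() does in one.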

For a Vec it's exactly as smooth: v.iter() becomes v.par_iter(), and the rest of the chain (filter, map, sum, collect) works unchanged.

use rayon::prelude::*;

fn main() {
    let numbers: Vec<u64> = (1..=20).collect();

    let sum_of_even_squares: u64 = numbers
        .par_iter()
        .filter(|&&n| n % 2 == 0)
        .map(|&n| n * n)
        .sum();

    println!("{sum_of_even_squares}");
}

1540

par_iter() in place of iter(), and this filter-map-sum now runs in parallel. Everything else is the chapter-19 pipeline you already know.

Why this is safe, and not magic

rayon can parallelize your iterator without you writing a single lock because of Send and Sync (lesson 22.4). Its parallel-iterator methods require the closures and data in your pipeline to be Send/Sync, and the compiler checks those bounds for you. If your map closure tried to mutate shared non-thread-safe state, it wouldn't compile. That's the same fearless-concurrency guarantee, now protecting code you didn't even realize was concurrent. rayon isn't bypassing Rust's safety; it's standing on top of it. That's why "just change iter to par_iter" is actually safe advice and not a footgun.

When it helps, and when it doesn't

Parallelism isn't free: splitting work across threads and recombining has overhead. For the filter-map-sum over twenty tiny numbers above, the parallel version is almost certainly slower than sequential, because the coordination costs more than the work saved. rayon shines when there are many items and the per-item work is non-trivial: processing thousands of files, transforming a large image, running a simulation step over a big grid. The rule of thumb: reach for par_iter when you have a real amount of independent, CPU-bound work, and measure (in release mode, lesson 19.7) to confirm it actually helped. For small or I/O-bound work, plain sequential iterators are the right call.

Best practice

When you have a CPU-bound computation over many independent items, write it as an ordinary chapter-19 iterator pipeline first, and get it correct and readable. Then, if profiling shows it's a bottleneck, switch the entry iterator to par_iter/into_par_iter and measure the speedup. Don't parallelize speculatively: for small inputs or I/O-bound work, the coordination overhead can make rayon slower. The beauty is that the change is one word, so there's no reason to parallelize before you know you need to.

Quiz time

Question #1

What single change turns a sequential iterator pipeline into a parallel one with rayon, and what stays the same?

Solution

You change the entry point of the pipeline: .iter() becomes .par_iter() (or .into_par_iter() for owned iteration), after bringing in use rayon::prelude::*. Everything else (the map, filter, sum, collect) stays exactly the same. rayon splits the work across threads and recombines the results behind that one method; the pipeline you wrote in chapter 19 is unchanged.

Question #2

How can rayon parallelize your code safely without you writing any locks?

Solution

It relies on Send/Sync (lesson 22.4): rayon's parallel-iterator methods require the closures and data to be thread-safe, and the compiler checks those bounds. If your pipeline tried to do something unsafe across threads (mutating shared non-thread-safe state), it wouldn't compile. So rayon builds on Rust's existing fearless-concurrency guarantee rather than bypassing it; the safety is the same one protecting the rest of the chapter.

Question #3

When is par_iter not worth using?

Solution

When there are few items or the per-item work is trivial (the overhead of splitting and recombining outweighs the savings), or when the work is I/O-bound rather than CPU-bound (parallelism helps with CPU work, not waiting). For small or light workloads, parallel can be slower than sequential. Use par_iter for large amounts of independent, CPU-heavy work, and measure in release mode to confirm it helped.

That completes the concurrency chapter. The summary and quiz (22.x) tie threads, channels, shared state, the marker traits, and rayon together. After it, chapter 23 turns to a different flavor of concurrency: async, the model for handling thousands of waiting tasks (network connections, timers) without a thread for each. It's also where Rust is honest about showing some seams.