In the previous tutorial, we explored embedded Rust. Now we look at Rust for AI and Machine Learning — why Rust is growing in this space, what tools exist, and how to implement core ML concepts from scratch.
This tutorial builds everything with the standard library alone, with no heavy dependencies. You will understand the math and patterns behind AI/ML, and learn about the crates that make production use practical.
Why Rust for AI/ML?
Python dominates AI/ML. So why use Rust?
- Speed — often 10-100x faster than pure Python for numeric computation
- Memory efficiency — no garbage collector, predictable allocation
- Concurrency — safe parallelism with no data races
- Deployment — single binary, no Python runtime needed
- Production — many Python ML libraries already use Rust/C++ underneath
The Rust ML ecosystem is growing fast:
| Crate | Purpose | Python Equivalent |
|---|---|---|
| Polars | DataFrames | pandas |
| Burn | ML framework | PyTorch |
| ndarray | N-dimensional arrays | NumPy |
| PyO3 | Python interop | — |
| candle | LLM inference | transformers |
| linfa | Classical ML | scikit-learn |
Matrix Operations
Every ML algorithm uses matrices. Let’s build a basic matrix type:
```rust
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    data: Vec<f64>,
    rows: usize,
    cols: usize,
}

impl Matrix {
    fn new(rows: usize, cols: usize) -> Self {
        Self {
            data: vec![0.0; rows * cols],
            rows,
            cols,
        }
    }

    fn from_vec(rows: usize, cols: usize, data: Vec<f64>) -> Self {
        assert_eq!(data.len(), rows * cols);
        Self { data, rows, cols }
    }

    fn get(&self, row: usize, col: usize) -> f64 {
        self.data[row * self.cols + col]
    }

    fn set(&mut self, row: usize, col: usize, value: f64) {
        self.data[row * self.cols + col] = value;
    }

    fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }
}
```
We store the matrix as a flat Vec<f64>. Element (i, j) is at index i * cols + j. This is row-major order — the same layout used by C and NumPy.
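A quick standalone check of that layout, using a hypothetical 2x3 example stored as a flat vector:

```rust
// Row-major indexing: element (i, j) of a rows x cols matrix
// lives at data[i * cols + j].
fn index(row: usize, col: usize, cols: usize) -> usize {
    row * cols + col
}

fn main() {
    // A 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored flat, row after row.
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let cols = 3;
    // Element (1, 2) is the last entry, 6.0.
    assert_eq!(data[index(1, 2, cols)], 6.0);
    // Element (0, 1) is 2.0.
    assert_eq!(data[index(0, 1, cols)], 2.0);
    println!("row-major indexing checks out");
}
```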
Transpose
Flipping rows and columns:
```rust
// Inside impl Matrix
fn transpose(&self) -> Matrix {
    let mut result = Matrix::new(self.cols, self.rows);
    for r in 0..self.rows {
        for c in 0..self.cols {
            result.set(c, r, self.get(r, c));
        }
    }
    result
}
```
Matrix Multiplication
The core operation in neural networks:
```rust
// Inside impl Matrix
fn multiply(&self, other: &Matrix) -> Matrix {
    assert_eq!(self.cols, other.rows);
    let mut result = Matrix::new(self.rows, other.cols);
    for i in 0..self.rows {
        for j in 0..other.cols {
            let mut sum = 0.0;
            for k in 0..self.cols {
                sum += self.get(i, k) * other.get(k, j);
            }
            result.set(i, j, sum);
        }
    }
    result
}
```
This is O(n^3) — the naive algorithm. Real libraries like ndarray and BLAS use optimized versions that are much faster.
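As a sanity check, here is the same triple loop written over flat row-major slices so it runs standalone, verified against a hand-computed 2x2 product:

```rust
// Naive matmul over row-major flat slices: a is m x n, b is n x p.
fn matmul(a: &[f64], b: &[f64], m: usize, n: usize, p: usize) -> Vec<f64> {
    let mut out = vec![0.0; m * p];
    for i in 0..m {
        for j in 0..p {
            let mut sum = 0.0;
            for k in 0..n {
                sum += a[i * n + k] * b[k * p + j];
            }
            out[i * p + j] = sum;
        }
    }
    out
}

fn main() {
    // [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    assert_eq!(matmul(&a, &b, 2, 2, 2), vec![19.0, 22.0, 43.0, 50.0]);
}
```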
Element-wise Operations
```rust
// Inside impl Matrix
fn add(&self, other: &Matrix) -> Matrix {
    let data: Vec<f64> = self.data.iter()
        .zip(&other.data)
        .map(|(a, b)| a + b)
        .collect();
    Matrix::from_vec(self.rows, self.cols, data)
}

fn scalar_multiply(&self, scalar: f64) -> Matrix {
    let data: Vec<f64> = self.data.iter().map(|v| v * scalar).collect();
    Matrix::from_vec(self.rows, self.cols, data)
}

fn apply(&self, f: impl Fn(f64) -> f64) -> Matrix {
    let data: Vec<f64> = self.data.iter().map(|v| f(*v)).collect();
    Matrix::from_vec(self.rows, self.cols, data)
}
```
apply() is powerful. You can pass any function — sigmoid, relu, or any custom activation.
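The same pattern over a plain slice, to show how a closure plugs in (ReLU here, defined inline as an illustration):

```rust
// Apply an arbitrary f64 -> f64 function element-wise, as apply() does.
fn apply(data: &[f64], f: impl Fn(f64) -> f64) -> Vec<f64> {
    data.iter().map(|&v| f(v)).collect()
}

fn main() {
    let v = [-1.0, 0.0, 2.0];
    // ReLU as a plain closure: negatives become 0, positives pass through.
    let relu = |x: f64| if x > 0.0 { x } else { 0.0 };
    assert_eq!(apply(&v, relu), vec![0.0, 0.0, 2.0]);
}
```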
DataFrame Operations
Data processing is the first step in any ML pipeline. Here’s a simple DataFrame:
```rust
#[derive(Debug, Clone)]
enum ColumnData {
    Float(Vec<f64>),
    Int(Vec<i64>),
    Text(Vec<String>),
}

struct DataFrame {
    columns: Vec<(String, ColumnData)>,
}
```
Statistical Summary
```rust
#[derive(Debug)]
struct ColumnStats {
    count: usize,
    mean: f64,
    min: f64,
    max: f64,
    std_dev: f64,
}

// Inside impl DataFrame; get_float_column() is a helper that returns
// Option<&Vec<f64>> for a named Float column.
fn describe_float(&self, name: &str) -> Option<ColumnStats> {
    let col = self.get_float_column(name)?;
    let sum: f64 = col.iter().sum();
    let mean = sum / col.len() as f64;
    let min = col.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = col.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let variance = col.iter()
        .map(|v| (v - mean).powi(2))
        .sum::<f64>() / col.len() as f64;
    let std_dev = variance.sqrt();
    Some(ColumnStats { count: col.len(), mean, min, max, std_dev })
}
```
Filtering and Grouping
```rust
use std::collections::HashMap;

// Inside impl DataFrame
fn filter_by_float(&self, column: &str, predicate: impl Fn(f64) -> bool) -> Vec<usize> {
    if let Some(col) = self.get_float_column(column) {
        col.iter().enumerate()
            .filter(|(_, v)| predicate(**v))
            .map(|(i, _)| i)
            .collect()
    } else {
        vec![]
    }
}

fn group_by_int_sum_float(&self, group_col: &str, sum_col: &str) -> HashMap<i64, f64> {
    let groups = self.get_int_column(group_col);
    let values = self.get_float_column(sum_col);
    match (groups, values) {
        (Some(g), Some(v)) => {
            let mut result: HashMap<i64, f64> = HashMap::new();
            for (group, value) in g.iter().zip(v.iter()) {
                *result.entry(*group).or_insert(0.0) += value;
            }
            result
        }
        _ => HashMap::new(),
    }
}
```
In production, use Polars for DataFrames. On large datasets it is often 10-100x faster than pandas, and it has a lazy evaluation engine.
Activation Functions
Neural networks use activation functions to introduce non-linearity:
```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn sigmoid_derivative(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 - s)
}

fn relu(x: f64) -> f64 {
    if x > 0.0 { x } else { 0.0 }
}

fn relu_derivative(x: f64) -> f64 {
    if x > 0.0 { 1.0 } else { 0.0 }
}
```
- Sigmoid — squashes output to (0, 1). Used for binary classification.
- ReLU — simple and fast. Used in most modern networks.
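A few spot checks of the behavior described above, using the same two functions:

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn relu(x: f64) -> f64 {
    if x > 0.0 { x } else { 0.0 }
}

fn main() {
    assert_eq!(sigmoid(0.0), 0.5);  // centered at 0.5
    assert!(sigmoid(10.0) > 0.99);  // saturates toward 1
    assert!(sigmoid(-10.0) < 0.01); // saturates toward 0
    assert_eq!(relu(-3.0), 0.0);    // negative inputs are zeroed
    assert_eq!(relu(3.0), 3.0);     // positive inputs pass through
}
```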
Single Neuron — Learning an AND Gate
A single neuron with sigmoid activation can learn basic logic:
```rust
struct Neuron {
    weights: Vec<f64>,
    bias: f64,
    learning_rate: f64,
}

impl Neuron {
    fn new(num_inputs: usize, learning_rate: f64) -> Self {
        Self {
            weights: vec![0.0; num_inputs],
            bias: 0.0,
            learning_rate,
        }
    }

    fn forward(&self, inputs: &[f64]) -> f64 {
        let sum: f64 = self.weights.iter()
            .zip(inputs)
            .map(|(w, x)| w * x)
            .sum::<f64>() + self.bias;
        sigmoid(sum)
    }

    fn train(&mut self, inputs: &[f64], target: f64) -> f64 {
        let prediction = self.forward(inputs);
        let error = target - prediction;
        let raw_sum = self.weights.iter()
            .zip(inputs)
            .map(|(w, x)| w * x)
            .sum::<f64>() + self.bias;
        let gradient = error * sigmoid_derivative(raw_sum);
        for (w, x) in self.weights.iter_mut().zip(inputs) {
            *w += self.learning_rate * gradient * x;
        }
        self.bias += self.learning_rate * gradient;
        error.powi(2)
    }
}
```
Training on AND gate data:
```rust
let training_data = vec![
    (vec![0.0, 0.0], 0.0),
    (vec![0.0, 1.0], 0.0),
    (vec![1.0, 0.0], 0.0),
    (vec![1.0, 1.0], 1.0),
];

let mut neuron = Neuron::new(2, 0.5);
for _ in 0..2000 {
    for (inputs, target) in &training_data {
        neuron.train(inputs, *target);
    }
}

// After training:
// [0, 0] -> 0.03 (correctly near 0)
// [0, 1] -> 0.15 (correctly near 0)
// [1, 0] -> 0.15 (correctly near 0)
// [1, 1] -> 0.82 (correctly near 1)
```
This is backpropagation in its simplest form. Real neural networks have many layers and millions of parameters, but the principle is the same: compute error, compute gradient, update weights.
K-Nearest Neighbors (KNN)
KNN is the simplest classification algorithm. To classify a new point, find the K closest training points and use majority voting:
```rust
use std::collections::HashMap;

// A labeled training example.
struct DataPoint {
    features: Vec<f64>,
    label: String,
}

fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn knn_classify(
    train_data: &[DataPoint],
    query: &[f64],
    k: usize,
) -> String {
    let mut distances: Vec<(f64, &str)> = train_data
        .iter()
        .map(|p| (euclidean_distance(&p.features, query), p.label.as_str()))
        .collect();
    distances.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    let mut votes: HashMap<&str, usize> = HashMap::new();
    for (_, label) in distances.iter().take(k) {
        *votes.entry(label).or_insert(0) += 1;
    }
    votes.into_iter()
        .max_by_key(|(_, count)| *count)
        .map(|(label, _)| label.to_string())
        .unwrap_or_default()
}
```
KNN has no training phase. It just stores the data and computes distances at prediction time. This makes it simple but slow for large datasets.
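The classifier above, exercised end to end on a small made-up 2D dataset with two well-separated clusters:

```rust
use std::collections::HashMap;

struct DataPoint {
    features: Vec<f64>,
    label: String,
}

fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

fn knn_classify(train_data: &[DataPoint], query: &[f64], k: usize) -> String {
    let mut distances: Vec<(f64, &str)> = train_data.iter()
        .map(|p| (euclidean_distance(&p.features, query), p.label.as_str()))
        .collect();
    distances.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    let mut votes: HashMap<&str, usize> = HashMap::new();
    for (_, label) in distances.iter().take(k) {
        *votes.entry(label).or_insert(0) += 1;
    }
    votes.into_iter()
        .max_by_key(|(_, count)| *count)
        .map(|(label, _)| label.to_string())
        .unwrap_or_default()
}

fn main() {
    let train = vec![
        DataPoint { features: vec![1.0, 1.0], label: "red".into() },
        DataPoint { features: vec![1.2, 0.8], label: "red".into() },
        DataPoint { features: vec![5.0, 5.0], label: "blue".into() },
        DataPoint { features: vec![5.2, 4.8], label: "blue".into() },
    ];
    // A query near the "red" cluster: its two nearest neighbors are red,
    // so with k = 3 the red votes win 2 to 1.
    assert_eq!(knn_classify(&train, &[1.1, 0.9], 3), "red");
}
```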
Linear Regression
Find the best-fit line through data points:
```rust
fn linear_regression(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let sum_x: f64 = x.iter().sum();
    let sum_y: f64 = y.iter().sum();
    let sum_xy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let sum_x2: f64 = x.iter().map(|a| a * a).sum();
    let slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x);
    let intercept = (sum_y - slope * sum_x) / n;
    (slope, intercept)
}

fn r_squared(x: &[f64], y: &[f64], slope: f64, intercept: f64) -> f64 {
    let mean_y = y.iter().sum::<f64>() / y.len() as f64;
    let ss_tot: f64 = y.iter().map(|yi| (yi - mean_y).powi(2)).sum();
    let ss_res: f64 = x.iter().zip(y).map(|(xi, yi)| {
        let pred = slope * xi + intercept;
        (yi - pred).powi(2)
    }).sum();
    1.0 - ss_res / ss_tot
}
```
R-squared tells you how well the line fits. 1.0 means perfect fit. 0.0 means the line is no better than just using the mean.
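A quick check of the fit on points that lie exactly on a known line, y = 2x + 1, so the recovered slope and intercept can be verified by hand:

```rust
fn linear_regression(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let sum_x: f64 = x.iter().sum();
    let sum_y: f64 = y.iter().sum();
    let sum_xy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let sum_x2: f64 = x.iter().map(|a| a * a).sum();
    let slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x);
    let intercept = (sum_y - slope * sum_x) / n;
    (slope, intercept)
}

fn main() {
    // Points on y = 2x + 1 exactly.
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [3.0, 5.0, 7.0, 9.0];
    let (slope, intercept) = linear_regression(&x, &y);
    assert!((slope - 2.0).abs() < 1e-9);
    assert!((intercept - 1.0).abs() < 1e-9);
}
```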
Feature Scaling
Most ML algorithms need normalized input. Raw values like “age: 25, salary: 80000” give too much weight to salary.
Min-Max Normalization
Scales values to [0, 1]:
```rust
fn min_max_normalize(data: &[f64]) -> Vec<f64> {
    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    if range == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - min) / range).collect()
}
```
Z-Score Normalization
Centers data around 0 with unit standard deviation:
```rust
fn z_score_normalize(data: &[f64]) -> Vec<f64> {
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    let variance = data.iter()
        .map(|v| (v - mean).powi(2))
        .sum::<f64>() / data.len() as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - mean) / std_dev).collect()
}
```
Use min-max when the data has known, fixed bounds. Use z-score when it doesn't, or when a few outliers would compress everything else into a narrow slice of the min-max range.
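Both normalizers from above, run end to end on tiny inputs chosen so the expected outputs are obvious:

```rust
fn min_max_normalize(data: &[f64]) -> Vec<f64> {
    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    if range == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - min) / range).collect()
}

fn z_score_normalize(data: &[f64]) -> Vec<f64> {
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    let variance = data.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / data.len() as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - mean) / std_dev).collect()
}

fn main() {
    // Min-max maps the smallest value to 0 and the largest to 1.
    assert_eq!(min_max_normalize(&[0.0, 5.0, 10.0]), vec![0.0, 0.5, 1.0]);
    // Z-score output is centered at 0; symmetric input gives symmetric output.
    let z = z_score_normalize(&[1.0, 2.0, 3.0]);
    assert!(z[1].abs() < 1e-9);
    assert!((z[0] + z[2]).abs() < 1e-9);
}
```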
Confusion Matrix — Evaluating Models
A confusion matrix tells you how well a classifier performs:
```rust
struct ConfusionMatrix {
    true_positive: usize,
    true_negative: usize,
    false_positive: usize,
    false_negative: usize,
}

impl ConfusionMatrix {
    fn accuracy(&self) -> f64 {
        let total = self.true_positive + self.true_negative
            + self.false_positive + self.false_negative;
        (self.true_positive + self.true_negative) as f64 / total as f64
    }

    fn precision(&self) -> f64 {
        self.true_positive as f64
            / (self.true_positive + self.false_positive) as f64
    }

    fn recall(&self) -> f64 {
        self.true_positive as f64
            / (self.true_positive + self.false_negative) as f64
    }

    fn f1_score(&self) -> f64 {
        let p = self.precision();
        let r = self.recall();
        2.0 * p * r / (p + r)
    }
}
```
- Accuracy — what fraction of predictions are correct
- Precision — of items predicted positive, how many are actually positive
- Recall — of actually positive items, how many did we find
- F1 Score — harmonic mean of precision and recall
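To make the metrics concrete, here is the ConfusionMatrix type above exercised on a hypothetical result set of 16 predictions (8 TP, 5 TN, 2 FP, 1 FN):

```rust
struct ConfusionMatrix {
    true_positive: usize,
    true_negative: usize,
    false_positive: usize,
    false_negative: usize,
}

impl ConfusionMatrix {
    fn accuracy(&self) -> f64 {
        let total = self.true_positive + self.true_negative
            + self.false_positive + self.false_negative;
        (self.true_positive + self.true_negative) as f64 / total as f64
    }
    fn precision(&self) -> f64 {
        self.true_positive as f64 / (self.true_positive + self.false_positive) as f64
    }
    fn recall(&self) -> f64 {
        self.true_positive as f64 / (self.true_positive + self.false_negative) as f64
    }
    fn f1_score(&self) -> f64 {
        let p = self.precision();
        let r = self.recall();
        2.0 * p * r / (p + r)
    }
}

fn main() {
    let cm = ConfusionMatrix {
        true_positive: 8,
        true_negative: 5,
        false_positive: 2,
        false_negative: 1,
    };
    assert!((cm.accuracy() - 0.8125).abs() < 1e-9);  // 13 correct out of 16
    assert!((cm.precision() - 0.8).abs() < 1e-9);    // 8 of 10 predicted positives
    assert!((cm.recall() - 8.0 / 9.0).abs() < 1e-9); // 8 of 9 actual positives
    assert!((cm.f1_score() - 0.8421).abs() < 1e-3);  // harmonic mean of the two
}
```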
The Rust ML Ecosystem
Polars — DataFrames
Polars is one of the fastest DataFrame libraries available. It's written in Rust and available for Python too.
```rust
// Example Polars code (add polars to Cargo.toml)
use polars::prelude::*;

let df = df! {
    "name" => ["Alex", "Sam", "Jordan"],
    "age" => [25, 30, 35],
    "salary" => [50000, 65000, 80000],
}?;

// Filter, group, aggregate
let result = df.lazy()
    .filter(col("age").gt(lit(25)))
    .group_by([col("name")])
    .agg([col("salary").mean()])
    .collect()?;
```
Polars uses lazy evaluation — it builds a query plan and optimizes it before running. This makes complex pipelines fast.
Burn — ML Framework
Burn is a deep learning framework written in Rust. It supports multiple backends (CPU, CUDA, Metal, WebGPU).
```rust
// Conceptual Burn example
use burn::prelude::*;

#[derive(Module, Debug)]
struct MyModel<B: Backend> {
    linear1: Linear<B>,
    linear2: Linear<B>,
}

impl<B: Backend> MyModel<B> {
    fn forward(&self, input: Tensor<B, 2>) -> Tensor<B, 2> {
        let x = self.linear1.forward(input);
        let x = relu(x);
        self.linear2.forward(x)
    }
}
```
Burn feels like PyTorch but with Rust’s type safety. The backend is generic — switch between CPU and GPU without changing model code.
PyO3 — Python Interop
PyO3 lets you call Python from Rust or Rust from Python:
```rust
// Expose Rust functions to Python
use pyo3::prelude::*;

#[pyfunction]
fn fast_compute(data: Vec<f64>) -> f64 {
    data.iter().map(|x| x.powi(2)).sum()
}

#[pymodule]
fn my_rust_module(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(fast_compute, m)?)?;
    Ok(())
}
```
Then in Python:
```python
import my_rust_module
result = my_rust_module.fast_compute([1.0, 2.0, 3.0])
```
This is how many Python libraries get their speed — Polars, pydantic, and ruff are all written in Rust with Python bindings.
Candle — LLM Inference
Candle is Hugging Face’s Rust framework for running large language models. It’s designed for fast inference, not training.
ndarray — N-dimensional Arrays
Like NumPy for Rust. Provides efficient array operations with BLAS support.
When to Use Rust for ML
Use Rust when:
- You need maximum performance (real-time inference, edge devices)
- You’re building data pipelines that process terabytes
- You want a single binary deployment (no Python environment)
- You’re building a library that Python users will call via PyO3
Stay with Python when:
- You’re prototyping and experimenting
- You need the latest model architectures (PyTorch ecosystem is bigger)
- Training speed is less important than development speed
The best approach is often both: prototype in Python, deploy critical paths in Rust.
Source Code
You can find the complete source code for this tutorial on GitHub:
kemalcodes/rust-tutorial (branch: tutorial-28-ai-ml)
What’s Next?
In the next tutorial, we explore WebAssembly with Rust — compiling Rust to WASM, wasm-bindgen, and the Leptos framework.