In the previous tutorial, we explored embedded Rust. Now we look at Rust for AI and Machine Learning — why Rust is growing in this space, what tools exist, and how to implement core ML concepts from scratch.
This tutorial builds everything with the standard library alone, with no heavy dependencies. You will understand the math and patterns behind AI/ML, and learn about the crates that make production use practical.
Why Rust for AI/ML?
Python dominates AI/ML. So why use Rust?
- Speed — often 10-100x faster than pure Python for numeric computation
- Memory efficiency — no garbage collector, predictable allocation
- Concurrency — safe parallelism with no data races
- Deployment — single binary, no Python runtime needed
- Production — many Python ML libraries already use Rust/C++ underneath
The Rust ML ecosystem is growing fast:
| Crate | Purpose | Python Equivalent |
|---|---|---|
| Polars | DataFrames | pandas |
| Burn | ML framework | PyTorch |
| ndarray | N-dimensional arrays | NumPy |
| PyO3 | Python interop | — |
| candle | LLM inference | transformers |
| linfa | Classical ML | scikit-learn |
Matrix Operations
Every ML algorithm uses matrices. Let’s build a basic matrix type:
```rust
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    data: Vec<f64>,
    rows: usize,
    cols: usize,
}

impl Matrix {
    fn new(rows: usize, cols: usize) -> Self {
        Self {
            data: vec![0.0; rows * cols],
            rows,
            cols,
        }
    }

    fn from_vec(rows: usize, cols: usize, data: Vec<f64>) -> Self {
        assert_eq!(data.len(), rows * cols);
        Self { data, rows, cols }
    }

    fn get(&self, row: usize, col: usize) -> f64 {
        self.data[row * self.cols + col]
    }

    fn set(&mut self, row: usize, col: usize, value: f64) {
        self.data[row * self.cols + col] = value;
    }

    fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }
}
```
We store the matrix as a flat Vec<f64>. Element (i, j) is at index i * cols + j. This is row-major order — the same layout used by C and NumPy.
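A quick standalone check of that layout, using a hypothetical 2x3 example stored as a flat vector:

```rust
// Row-major indexing: element (i, j) of a rows x cols matrix
// lives at data[i * cols + j].
fn index(row: usize, col: usize, cols: usize) -> usize {
    row * cols + col
}

fn main() {
    // A 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored flat, row after row.
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let cols = 3;
    // Element (1, 2) is the last entry, 6.0.
    assert_eq!(data[index(1, 2, cols)], 6.0);
    // Element (0, 1) is 2.0.
    assert_eq!(data[index(0, 1, cols)], 2.0);
    println!("row-major indexing checks out");
}
```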
Transpose
Flipping rows and columns:
```rust
// Inside impl Matrix
fn transpose(&self) -> Matrix {
    let mut result = Matrix::new(self.cols, self.rows);
    for r in 0..self.rows {
        for c in 0..self.cols {
            result.set(c, r, self.get(r, c));
        }
    }
    result
}
```
Matrix Multiplication
The core operation in neural networks:
```rust
// Inside impl Matrix
fn multiply(&self, other: &Matrix) -> Matrix {
    assert_eq!(self.cols, other.rows);
    let mut result = Matrix::new(self.rows, other.cols);
    for i in 0..self.rows {
        for j in 0..other.cols {
            let mut sum = 0.0;
            for k in 0..self.cols {
                sum += self.get(i, k) * other.get(k, j);
            }
            result.set(i, j, sum);
        }
    }
    result
}
```
This is O(n^3) — the naive algorithm. Real libraries like ndarray and BLAS use optimized versions that are much faster.
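As a sanity check, here is the same triple loop written over flat row-major slices so it runs standalone, verified against a hand-computed 2x2 product:

```rust
// Naive matmul over row-major flat slices: a is m x n, b is n x p.
fn matmul(a: &[f64], b: &[f64], m: usize, n: usize, p: usize) -> Vec<f64> {
    let mut out = vec![0.0; m * p];
    for i in 0..m {
        for j in 0..p {
            let mut sum = 0.0;
            for k in 0..n {
                sum += a[i * n + k] * b[k * p + j];
            }
            out[i * p + j] = sum;
        }
    }
    out
}

fn main() {
    // [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    assert_eq!(matmul(&a, &b, 2, 2, 2), vec![19.0, 22.0, 43.0, 50.0]);
}
```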
Element-wise Operations
```rust
// Inside impl Matrix
fn add(&self, other: &Matrix) -> Matrix {
    let data: Vec<f64> = self.data.iter()
        .zip(&other.data)
        .map(|(a, b)| a + b)
        .collect();
    Matrix::from_vec(self.rows, self.cols, data)
}

fn scalar_multiply(&self, scalar: f64) -> Matrix {
    let data: Vec<f64> = self.data.iter().map(|v| v * scalar).collect();
    Matrix::from_vec(self.rows, self.cols, data)
}

fn apply(&self, f: impl Fn(f64) -> f64) -> Matrix {
    let data: Vec<f64> = self.data.iter().map(|v| f(*v)).collect();
    Matrix::from_vec(self.rows, self.cols, data)
}
```
apply() is powerful. You can pass any function — sigmoid, relu, or any custom activation.
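The same pattern over a plain slice, to show how a closure plugs in (ReLU here, defined inline as an illustration):

```rust
// Apply an arbitrary f64 -> f64 function element-wise, as apply() does.
fn apply(data: &[f64], f: impl Fn(f64) -> f64) -> Vec<f64> {
    data.iter().map(|&v| f(v)).collect()
}

fn main() {
    let v = [-1.0, 0.0, 2.0];
    // ReLU as a plain closure: negatives become 0, positives pass through.
    let relu = |x: f64| if x > 0.0 { x } else { 0.0 };
    assert_eq!(apply(&v, relu), vec![0.0, 0.0, 2.0]);
}
```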
DataFrame Operations
Data processing is the first step in any ML pipeline. Here’s a simple DataFrame:
```rust
#[derive(Debug, Clone)]
enum ColumnData {
    Float(Vec<f64>),
    Int(Vec<i64>),
    Text(Vec<String>),
}

struct DataFrame {
    columns: Vec<(String, ColumnData)>,
}
```
Statistical Summary
```rust
#[derive(Debug)]
struct ColumnStats {
    count: usize,
    mean: f64,
    min: f64,
    max: f64,
    std_dev: f64,
}

// Inside impl DataFrame; get_float_column() is a helper that returns
// Option<&Vec<f64>> for a named Float column.
fn describe_float(&self, name: &str) -> Option<ColumnStats> {
    let col = self.get_float_column(name)?;
    let sum: f64 = col.iter().sum();
    let mean = sum / col.len() as f64;
    let min = col.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = col.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let variance = col.iter()
        .map(|v| (v - mean).powi(2))
        .sum::<f64>() / col.len() as f64;
    let std_dev = variance.sqrt();
    Some(ColumnStats { count: col.len(), mean, min, max, std_dev })
}
```
Filtering and Grouping
```rust
use std::collections::HashMap;

// Inside impl DataFrame
fn filter_by_float(&self, column: &str, predicate: impl Fn(f64) -> bool) -> Vec<usize> {
    if let Some(col) = self.get_float_column(column) {
        col.iter().enumerate()
            .filter(|(_, v)| predicate(**v))
            .map(|(i, _)| i)
            .collect()
    } else {
        vec![]
    }
}

fn group_by_int_sum_float(&self, group_col: &str, sum_col: &str) -> HashMap<i64, f64> {
    let groups = self.get_int_column(group_col);
    let values = self.get_float_column(sum_col);
    match (groups, values) {
        (Some(g), Some(v)) => {
            let mut result: HashMap<i64, f64> = HashMap::new();
            for (group, value) in g.iter().zip(v.iter()) {
                *result.entry(*group).or_insert(0.0) += value;
            }
            result
        }
        _ => HashMap::new(),
    }
}
```
In production, use Polars for DataFrames. On large datasets it is often 10-100x faster than pandas, and it has a lazy evaluation engine.
Activation Functions
Neural networks use activation functions to introduce non-linearity:
```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn sigmoid_derivative(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 - s)
}

fn relu(x: f64) -> f64 {
    if x > 0.0 { x } else { 0.0 }
}

fn relu_derivative(x: f64) -> f64 {
    if x > 0.0 { 1.0 } else { 0.0 }
}
```
- Sigmoid — squashes output to (0, 1). Used for binary classification.
- ReLU — simple and fast. Used in most modern networks.
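A few spot checks of the behavior described above, using the same two functions:

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn relu(x: f64) -> f64 {
    if x > 0.0 { x } else { 0.0 }
}

fn main() {
    assert_eq!(sigmoid(0.0), 0.5);  // centered at 0.5
    assert!(sigmoid(10.0) > 0.99);  // saturates toward 1
    assert!(sigmoid(-10.0) < 0.01); // saturates toward 0
    assert_eq!(relu(-3.0), 0.0);    // negative inputs are zeroed
    assert_eq!(relu(3.0), 3.0);     // positive inputs pass through
}
```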
Single Neuron — Learning an AND Gate
A single neuron with sigmoid activation can learn basic logic:
```rust
struct Neuron {
    weights: Vec<f64>,
    bias: f64,
    learning_rate: f64,
}

impl Neuron {
    fn new(num_inputs: usize, learning_rate: f64) -> Self {
        Self {
            weights: vec![0.0; num_inputs],
            bias: 0.0,
            learning_rate,
        }
    }

    fn forward(&self, inputs: &[f64]) -> f64 {
        let sum: f64 = self.weights.iter()
            .zip(inputs)
            .map(|(w, x)| w * x)
            .sum::<f64>() + self.bias;
        sigmoid(sum)
    }

    fn train(&mut self, inputs: &[f64], target: f64) -> f64 {
        let prediction = self.forward(inputs);
        let error = target - prediction;
        let raw_sum = self.weights.iter()
            .zip(inputs)
            .map(|(w, x)| w * x)
            .sum::<f64>() + self.bias;
        let gradient = error * sigmoid_derivative(raw_sum);
        for (w, x) in self.weights.iter_mut().zip(inputs) {
            *w += self.learning_rate * gradient * x;
        }
        self.bias += self.learning_rate * gradient;
        error.powi(2)
    }
}
```
Training on AND gate data:
```rust
let training_data = vec![
    (vec![0.0, 0.0], 0.0),
    (vec![0.0, 1.0], 0.0),
    (vec![1.0, 0.0], 0.0),
    (vec![1.0, 1.0], 1.0),
];

let mut neuron = Neuron::new(2, 0.5);
for _ in 0..2000 {
    for (inputs, target) in &training_data {
        neuron.train(inputs, *target);
    }
}

// After training:
// [0, 0] -> 0.03 (correctly near 0)
// [0, 1] -> 0.15 (correctly near 0)
// [1, 0] -> 0.15 (correctly near 0)
// [1, 1] -> 0.82 (correctly near 1)
```
This is backpropagation in its simplest form. Real neural networks have many layers and millions of parameters, but the principle is the same: compute error, compute gradient, update weights.
K-Nearest Neighbors (KNN)
KNN is the simplest classification algorithm. To classify a new point, find the K closest training points and use majority voting:
```rust
use std::collections::HashMap;

// A labeled training example.
struct DataPoint {
    features: Vec<f64>,
    label: String,
}

fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

fn knn_classify(
    train_data: &[DataPoint],
    query: &[f64],
    k: usize,
) -> String {
    let mut distances: Vec<(f64, &str)> = train_data
        .iter()
        .map(|p| (euclidean_distance(&p.features, query), p.label.as_str()))
        .collect();
    distances.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    let mut votes: HashMap<&str, usize> = HashMap::new();
    for (_, label) in distances.iter().take(k) {
        *votes.entry(label).or_insert(0) += 1;
    }
    votes.into_iter()
        .max_by_key(|(_, count)| *count)
        .map(|(label, _)| label.to_string())
        .unwrap_or_default()
}
```
KNN has no training phase. It just stores the data and computes distances at prediction time. This makes it simple but slow for large datasets.
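The classifier above, exercised end to end on a small made-up 2D dataset with two well-separated clusters:

```rust
use std::collections::HashMap;

struct DataPoint {
    features: Vec<f64>,
    label: String,
}

fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

fn knn_classify(train_data: &[DataPoint], query: &[f64], k: usize) -> String {
    let mut distances: Vec<(f64, &str)> = train_data.iter()
        .map(|p| (euclidean_distance(&p.features, query), p.label.as_str()))
        .collect();
    distances.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    let mut votes: HashMap<&str, usize> = HashMap::new();
    for (_, label) in distances.iter().take(k) {
        *votes.entry(label).or_insert(0) += 1;
    }
    votes.into_iter()
        .max_by_key(|(_, count)| *count)
        .map(|(label, _)| label.to_string())
        .unwrap_or_default()
}

fn main() {
    let train = vec![
        DataPoint { features: vec![1.0, 1.0], label: "red".into() },
        DataPoint { features: vec![1.2, 0.8], label: "red".into() },
        DataPoint { features: vec![5.0, 5.0], label: "blue".into() },
        DataPoint { features: vec![5.2, 4.8], label: "blue".into() },
    ];
    // A query near the "red" cluster: its two nearest neighbors are red,
    // so with k = 3 the red votes win 2 to 1.
    assert_eq!(knn_classify(&train, &[1.1, 0.9], 3), "red");
}
```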
Linear Regression
Find the best-fit line through data points:
```rust
fn linear_regression(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let sum_x: f64 = x.iter().sum();
    let sum_y: f64 = y.iter().sum();
    let sum_xy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let sum_x2: f64 = x.iter().map(|a| a * a).sum();
    let slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x);
    let intercept = (sum_y - slope * sum_x) / n;
    (slope, intercept)
}

fn r_squared(x: &[f64], y: &[f64], slope: f64, intercept: f64) -> f64 {
    let mean_y = y.iter().sum::<f64>() / y.len() as f64;
    let ss_tot: f64 = y.iter().map(|yi| (yi - mean_y).powi(2)).sum();
    let ss_res: f64 = x.iter().zip(y).map(|(xi, yi)| {
        let pred = slope * xi + intercept;
        (yi - pred).powi(2)
    }).sum();
    1.0 - ss_res / ss_tot
}
```
R-squared tells you how well the line fits. 1.0 means perfect fit. 0.0 means the line is no better than just using the mean.
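A quick check of the fit on points that lie exactly on a known line, y = 2x + 1, so the recovered slope and intercept can be verified by hand:

```rust
fn linear_regression(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let sum_x: f64 = x.iter().sum();
    let sum_y: f64 = y.iter().sum();
    let sum_xy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let sum_x2: f64 = x.iter().map(|a| a * a).sum();
    let slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x);
    let intercept = (sum_y - slope * sum_x) / n;
    (slope, intercept)
}

fn main() {
    // Points on y = 2x + 1 exactly.
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [3.0, 5.0, 7.0, 9.0];
    let (slope, intercept) = linear_regression(&x, &y);
    assert!((slope - 2.0).abs() < 1e-9);
    assert!((intercept - 1.0).abs() < 1e-9);
}
```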
Feature Scaling
Most ML algorithms need normalized input. Raw values like “age: 25, salary: 80000” give too much weight to salary.
Min-Max Normalization
Scales values to [0, 1]:
```rust
fn min_max_normalize(data: &[f64]) -> Vec<f64> {
    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    if range == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - min) / range).collect()
}
```
Z-Score Normalization
Centers data around 0 with unit standard deviation:
```rust
fn z_score_normalize(data: &[f64]) -> Vec<f64> {
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    let variance = data.iter()
        .map(|v| (v - mean).powi(2))
        .sum::<f64>() / data.len() as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - mean) / std_dev).collect()
}
```
Use min-max when the data has known, fixed bounds. Use z-score when it doesn't, or when a few outliers would compress everything else into a narrow slice of the min-max range.
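Both normalizers from above, run end to end on tiny inputs chosen so the expected outputs are obvious:

```rust
fn min_max_normalize(data: &[f64]) -> Vec<f64> {
    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    if range == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - min) / range).collect()
}

fn z_score_normalize(data: &[f64]) -> Vec<f64> {
    let mean = data.iter().sum::<f64>() / data.len() as f64;
    let variance = data.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / data.len() as f64;
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return vec![0.0; data.len()];
    }
    data.iter().map(|v| (v - mean) / std_dev).collect()
}

fn main() {
    // Min-max maps the smallest value to 0 and the largest to 1.
    assert_eq!(min_max_normalize(&[0.0, 5.0, 10.0]), vec![0.0, 0.5, 1.0]);
    // Z-score output is centered at 0; symmetric input gives symmetric output.
    let z = z_score_normalize(&[1.0, 2.0, 3.0]);
    assert!(z[1].abs() < 1e-9);
    assert!((z[0] + z[2]).abs() < 1e-9);
}
```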
Confusion Matrix — Evaluating Models
A confusion matrix tells you how well a classifier performs:
```rust
struct ConfusionMatrix {
    true_positive: usize,
    true_negative: usize,
    false_positive: usize,
    false_negative: usize,
}

impl ConfusionMatrix {
    fn accuracy(&self) -> f64 {
        let total = self.true_positive + self.true_negative
            + self.false_positive + self.false_negative;
        (self.true_positive + self.true_negative) as f64 / total as f64
    }

    fn precision(&self) -> f64 {
        self.true_positive as f64
            / (self.true_positive + self.false_positive) as f64
    }

    fn recall(&self) -> f64 {
        self.true_positive as f64
            / (self.true_positive + self.false_negative) as f64
    }

    fn f1_score(&self) -> f64 {
        let p = self.precision();
        let r = self.recall();
        2.0 * p * r / (p + r)
    }
}
```
- Accuracy — what fraction of predictions are correct
- Precision — of items predicted positive, how many are actually positive
- Recall — of actually positive items, how many did we find
- F1 Score — harmonic mean of precision and recall
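To make the metrics concrete, here is the ConfusionMatrix type above exercised on a hypothetical result set of 16 predictions (8 TP, 5 TN, 2 FP, 1 FN):

```rust
struct ConfusionMatrix {
    true_positive: usize,
    true_negative: usize,
    false_positive: usize,
    false_negative: usize,
}

impl ConfusionMatrix {
    fn accuracy(&self) -> f64 {
        let total = self.true_positive + self.true_negative
            + self.false_positive + self.false_negative;
        (self.true_positive + self.true_negative) as f64 / total as f64
    }
    fn precision(&self) -> f64 {
        self.true_positive as f64 / (self.true_positive + self.false_positive) as f64
    }
    fn recall(&self) -> f64 {
        self.true_positive as f64 / (self.true_positive + self.false_negative) as f64
    }
    fn f1_score(&self) -> f64 {
        let p = self.precision();
        let r = self.recall();
        2.0 * p * r / (p + r)
    }
}

fn main() {
    let cm = ConfusionMatrix {
        true_positive: 8,
        true_negative: 5,
        false_positive: 2,
        false_negative: 1,
    };
    assert!((cm.accuracy() - 0.8125).abs() < 1e-9);  // 13 correct out of 16
    assert!((cm.precision() - 0.8).abs() < 1e-9);    // 8 of 10 predicted positives
    assert!((cm.recall() - 8.0 / 9.0).abs() < 1e-9); // 8 of 9 actual positives
    assert!((cm.f1_score() - 0.8421).abs() < 1e-3);  // harmonic mean of the two
}
```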
The Rust ML Ecosystem
Polars — DataFrames
Polars is one of the fastest DataFrame libraries available. It's written in Rust and available for Python too.
```rust
// Example Polars code (add polars to Cargo.toml)
use polars::prelude::*;

let df = df! {
    "name" => ["Alex", "Sam", "Jordan"],
    "age" => [25, 30, 35],
    "salary" => [50000, 65000, 80000],
}?;

// Filter, group, aggregate
let result = df.lazy()
    .filter(col("age").gt(lit(25)))
    .group_by([col("name")])
    .agg([col("salary").mean()])
    .collect()?;
```
Polars uses lazy evaluation — it builds a query plan and optimizes it before running. This makes complex pipelines fast.
Burn — ML Framework
Burn is a deep learning framework written in Rust. It supports multiple backends (CPU, CUDA, Metal, WebGPU).
```rust
// Conceptual Burn example
use burn::prelude::*;

#[derive(Module, Debug)]
struct MyModel<B: Backend> {
    linear1: Linear<B>,
    linear2: Linear<B>,
}

impl<B: Backend> MyModel<B> {
    fn forward(&self, input: Tensor<B, 2>) -> Tensor<B, 2> {
        let x = self.linear1.forward(input);
        let x = relu(x);
        self.linear2.forward(x)
    }
}
```
Burn feels like PyTorch but with Rust’s type safety. The backend is generic — switch between CPU and GPU without changing model code.
PyO3 — Python Interop
PyO3 lets you call Python from Rust or Rust from Python:
```rust
// Expose Rust functions to Python
use pyo3::prelude::*;

#[pyfunction]
fn fast_compute(data: Vec<f64>) -> f64 {
    data.iter().map(|x| x.powi(2)).sum()
}

#[pymodule]
fn my_rust_module(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(fast_compute, m)?)?;
    Ok(())
}
```
Then in Python:
```python
import my_rust_module
result = my_rust_module.fast_compute([1.0, 2.0, 3.0])
```
This is how many Python libraries get their speed — Polars, pydantic, and ruff are all written in Rust with Python bindings.
Candle — LLM Inference
Candle is Hugging Face’s Rust framework for running large language models. It’s designed for fast inference, not training.
ndarray — N-dimensional Arrays
Like NumPy for Rust. Provides efficient array operations with BLAS support.
When to Use Rust for ML
Use Rust when:
- You need maximum performance (real-time inference, edge devices)
- You’re building data pipelines that process terabytes
- You want a single binary deployment (no Python environment)
- You’re building a library that Python users will call via PyO3
Stay with Python when:
- You’re prototyping and experimenting
- You need the latest model architectures (PyTorch ecosystem is bigger)
- Training speed is less important than development speed
The best approach is often both: prototype in Python, deploy critical paths in Rust.
Source Code
You can find the complete source code for this tutorial on GitHub:
kemalcodes/rust-tutorial (branch: tutorial-28-ai-ml)
What’s Next?
In the next tutorial, we explore WebAssembly with Rust — compiling Rust to WASM, wasm-bindgen, and the Leptos framework.