PyTorch is the de facto standard framework for deep learning, in both research and production. Most new AI papers, Hugging Face models, and state-of-the-art systems are built on it.

This article gets you from zero to a working neural network.

Setup

pip install torch torchvision

import torch
print(torch.__version__)   # 2.x
print(torch.cuda.is_available())  # True if a CUDA GPU is available

Tensors

A tensor is the fundamental data structure in PyTorch. It is like a NumPy array, but it can run on a GPU and supports automatic differentiation.

Creating Tensors

import torch

# From a Python list
a = torch.tensor([1.0, 2.0, 3.0])
print(a)         # tensor([1., 2., 3.])
print(a.shape)   # torch.Size([3])
print(a.dtype)   # torch.float32

# 2D tensor
m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(m.shape)   # torch.Size([2, 2])

# Zeros, ones, random
zeros = torch.zeros(3, 4)
ones = torch.ones(2, 3)
rand = torch.rand(2, 3)        # uniform random [0, 1)
randn = torch.randn(2, 3)      # normal distribution
arange = torch.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
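
One thing to watch: torch.tensor() infers the dtype from its input, so a list of Python ints produces an int64 tensor. Pass dtype= explicitly when you need floats; most models expect float32. A quick illustration:

i = torch.tensor([1, 2, 3])
print(i.dtype)    # torch.int64, inferred from the integer input

f = torch.tensor([1, 2, 3], dtype=torch.float32)
print(f.dtype)    # torch.float32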

Tensor Operations

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(a + b)         # tensor([5., 7., 9.])
print(a * b)         # tensor([ 4., 10., 18.])
print(torch.dot(a, b))  # tensor(32.)  (dot product)

# Matrix multiplication
m1 = torch.ones(2, 3)
m2 = torch.ones(3, 4)
print(torch.mm(m1, m2).shape)   # torch.Size([2, 4])
# Or use the @ operator
print((m1 @ m2).shape)          # torch.Size([2, 4])

# Reshaping
x = torch.arange(12, dtype=torch.float32)
print(x.reshape(3, 4))   # 3 rows, 4 columns
print(x.view(2, 6))      # like reshape, but always shares memory (input must be contiguous)
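
Elementwise operations also broadcast, as in NumPy: when shapes differ, size-1 dimensions are stretched to match. A small sketch:

row = torch.tensor([10.0, 20.0, 30.0])   # shape (3,)
col = torch.tensor([[1.0], [2.0]])       # shape (2, 1)
print(col + row)                         # broadcasts to shape (2, 3)
# tensor([[11., 21., 31.],
#         [12., 22., 32.]])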

Moving Between NumPy and PyTorch

import numpy as np

# NumPy → PyTorch
arr = np.array([1.0, 2.0, 3.0])
tensor = torch.from_numpy(arr)

# PyTorch → NumPy
tensor = torch.tensor([1.0, 2.0, 3.0])
arr = tensor.numpy()
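
One detail worth knowing: torch.from_numpy() shares memory with the source array instead of copying, so in-place changes are visible on both sides (the same holds for .numpy() on a CPU tensor). The NumPy default dtype, float64, also carries over; use torch.tensor(arr, dtype=torch.float32) when you want an independent float32 copy. A minimal sketch:

arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)    # no copy: same underlying buffer
arr[0] = 99.0
print(shared)                     # tensor([99.,  2.,  3.], dtype=torch.float64)

copied = torch.tensor(arr, dtype=torch.float32)  # independent float32 copy
arr[1] = -1.0
print(copied)                     # tensor([99.,  2.,  3.]); unaffected by the change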

Autograd: Automatic Differentiation

Autograd is what makes PyTorch powerful. It automatically computes the gradients needed to train a network.

# requires_grad=True tells PyTorch to track this tensor
x = torch.tensor(3.0, requires_grad=True)

# Define a computation
y = x ** 2 + 2 * x + 1

# Compute gradients
y.backward()

# dy/dx = 2x + 2 = 2*3 + 2 = 8
print(x.grad)   # tensor(8.)

In raw PyTorch, you call .backward() yourself inside the training loop — once per batch. Higher-level frameworks like PyTorch Lightning hide this call, but you still need to understand what it does.
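
A consequence worth seeing once: gradients accumulate across .backward() calls rather than being overwritten, which is why the training loop further down calls optimizer.zero_grad() on every batch. A minimal sketch:

x = torch.tensor(3.0, requires_grad=True)

(x ** 2).backward()
print(x.grad)    # tensor(6.), since d(x^2)/dx = 2x = 6 at x = 3

(x ** 2).backward()
print(x.grad)    # tensor(12.): the new gradient was added to the old one

x.grad.zero_()   # reset; optimizer.zero_grad() does this for every parameter
print(x.grad)    # tensor(0.)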

GPU Usage

# Check device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using: {device}")

# Move tensor to GPU
x = torch.randn(3, 3)
x = x.to(device)

# Create tensor directly on GPU
x_gpu = torch.randn(3, 3, device=device)

On Apple Silicon, use the mps backend (Metal Performance Shaders) instead:

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using: {device}")

Building a Model with nn.Module

nn.Module is the base class for all neural networks in PyTorch.

import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        return self.network(x)

model = MyNet(input_size=8, hidden_size=64, output_size=1)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")  # 4,801

Full Training Loop

Here is a complete example: train a neural net to predict California housing prices. The output values are approximate — your exact numbers will vary because the model weights are not seeded.

Add torch.manual_seed(42) before creating the model if you want reproducible results.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# --- 1. Load and prepare data ---
housing = fetch_california_housing()
X, y = housing.data, housing.target.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Normalize features (important for neural nets)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32)

# DataLoader for batching
train_ds = TensorDataset(X_train_t, y_train_t)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)

# --- 2. Define model ---
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

model = MyNet(input_size=8, hidden_size=64, output_size=1).to(device)

# --- 3. Loss and optimizer ---
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# --- 4. Training loop ---
EPOCHS = 50

for epoch in range(EPOCHS):
    model.train()
    total_loss = 0.0

    for X_batch, y_batch in train_loader:
        X_batch = X_batch.to(device)
        y_batch = y_batch.to(device)

        # Forward pass
        predictions = model(X_batch)
        loss = criterion(predictions, y_batch)

        # Backward pass
        optimizer.zero_grad()   # clear old gradients
        loss.backward()         # compute gradients
        optimizer.step()        # update weights

        total_loss += loss.item()

    if (epoch + 1) % 10 == 0:
        avg_loss = total_loss / len(train_loader)
        print(f"Epoch {epoch+1}/{EPOCHS} — Loss: {avg_loss:.4f}")

# --- 5. Evaluate ---
model.eval()
with torch.no_grad():
    X_test_dev = X_test_t.to(device)
    y_pred = model(X_test_dev).cpu()
    test_loss = criterion(y_pred, y_test_t)
    rmse = torch.sqrt(test_loss)
    print(f"\nTest RMSE: {rmse:.4f}")

Expected output:

Epoch 10/50 — Loss: 0.4821
Epoch 20/50 — Loss: 0.3974
Epoch 30/50 — Loss: 0.3612
Epoch 40/50 — Loss: 0.3401
Epoch 50/50 — Loss: 0.3287

Test RMSE: 0.5823

Since the target is expressed in units of $100,000, an RMSE around 0.58 means predictions are off by roughly $58,000 on average.

Making Predictions

model.eval()
with torch.no_grad():
    # Feature order: MedInc, HouseAge, AveRooms, AveBedrms,
    # Population, AveOccup, Latitude, Longitude
    new_house = np.array([[5.0, 20.0, 6.0, 1.0, 1200.0, 3.0, 37.88, -122.23]])
    new_house_scaled = scaler.transform(new_house)
    new_house_tensor = torch.tensor(new_house_scaled, dtype=torch.float32).to(device)

    price = model(new_house_tensor).item()
    print(f"Predicted price: ${price * 100_000:,.0f}")

Key Concepts Summary

Concept                  What It Does
torch.tensor()           Creates a tensor
requires_grad=True       Enables gradient tracking
.backward()              Computes gradients
nn.Module                Base class for all models
nn.Linear(in, out)       Fully connected layer
nn.Sequential            Stack layers in order
optimizer.zero_grad()    Clears old gradients before each step
optimizer.step()         Updates model weights
model.train()            Training mode (enables dropout etc.)
model.eval()             Evaluation mode
torch.no_grad()          Disables gradient tracking (faster inference)

What’s Next?

You now have a working PyTorch model. The next steps in this series cover LLM APIs — calling GPT-5, Claude Sonnet 4, and Gemini 2.5 Flash with a few lines of Python, and building your first AI-powered app.

Stay tuned for the next article in the AI/ML for Developers series.