PyTorch is the standard framework for deep learning research and production. Most AI papers, Hugging Face models, and state-of-the-art systems use PyTorch.
This article gets you from zero to a working neural network.
## Setup

```bash
pip install torch torchvision
```

```python
import torch

print(torch.__version__)          # 2.x
print(torch.cuda.is_available())  # True if you have a GPU
```
## Tensors

A tensor is the fundamental data structure in PyTorch. It is like a NumPy array, but it can also run on a GPU and supports automatic differentiation.

### Creating Tensors
```python
import torch

# From a Python list
a = torch.tensor([1.0, 2.0, 3.0])
print(a)        # tensor([1., 2., 3.])
print(a.shape)  # torch.Size([3])
print(a.dtype)  # torch.float32

# 2D tensor
m = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(m.shape)  # torch.Size([2, 2])

# Zeros, ones, random
zeros = torch.zeros(3, 4)
ones = torch.ones(2, 3)
rand = torch.rand(2, 3)          # uniform random [0, 1)
randn = torch.randn(2, 3)        # standard normal distribution
arange = torch.arange(0, 10, 2)  # tensor([0, 2, 4, 6, 8])
```
### Tensor Operations

```python
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(a + b)            # tensor([5., 7., 9.])   (element-wise sum)
print(a * b)            # tensor([ 4., 10., 18.])  (element-wise product)
print(torch.dot(a, b))  # tensor(32.)  (dot product)

# Matrix multiplication
m1 = torch.ones(2, 3)
m2 = torch.ones(3, 4)
print(torch.mm(m1, m2).shape)  # torch.Size([2, 4])

# Or use the @ operator
print((m1 @ m2).shape)  # torch.Size([2, 4])
```
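One detail not shown above but worth knowing: like NumPy, PyTorch broadcasts tensors of compatible shapes in element-wise operations. A minimal sketch:

```python
m = torch.ones(2, 3)                  # shape (2, 3)
row = torch.tensor([1.0, 2.0, 3.0])   # shape (3,)

# The row vector is broadcast across both rows of m
print(m + row)
# tensor([[2., 3., 4.],
#         [2., 3., 4.]])
```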
```python
# Reshaping
x = torch.arange(12, dtype=torch.float32)
print(x.reshape(3, 4))  # 3 rows, 4 columns
print(x.view(2, 6))     # alternative reshape (shares memory)
```
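The difference between the two: `.view()` never copies and therefore only works on tensors with a contiguous memory layout, while `.reshape()` falls back to copying when it must. A quick sketch:

```python
x = torch.arange(12, dtype=torch.float32).reshape(3, 4)
t = x.T  # transpose: same data, non-contiguous layout

print(t.is_contiguous())  # False
print(t.reshape(12))      # works: reshape copies when it has to
# t.view(12)              # would raise a RuntimeError on this layout
```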
### Moving Between NumPy and PyTorch

```python
import numpy as np

# NumPy → PyTorch
arr = np.array([1.0, 2.0, 3.0])
tensor = torch.from_numpy(arr)

# PyTorch → NumPy
tensor = torch.tensor([1.0, 2.0, 3.0])
arr = tensor.numpy()
```
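Both conversions share memory on CPU rather than copying, so mutating one side mutates the other:

```python
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)

arr[0] = 99.0  # modify the NumPy array in place
print(t)       # tensor([99.,  2.,  3.], dtype=torch.float64)

# Use t.clone() (or torch.tensor(arr)) if you need an independent copy
```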
## Autograd: Automatic Differentiation

Autograd is what makes PyTorch powerful: it automatically computes the gradients that gradient-based training needs.
```python
# requires_grad=True tells PyTorch to track this tensor
x = torch.tensor(3.0, requires_grad=True)

# Define a computation
y = x ** 2 + 2 * x + 1

# Compute gradients
y.backward()

# dy/dx = 2x + 2 = 2*3 + 2 = 8
print(x.grad)  # tensor(8.)
```
In raw PyTorch, you call `.backward()` yourself inside the training loop, once per batch. Higher-level frameworks like PyTorch Lightning hide this call, but you still need to understand what it does.
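To make the connection to training concrete, here is a hand-rolled gradient-descent step on a single scalar weight; this is essentially what the `optimizer.step()` call you will meet below does for every parameter in a model. A minimal sketch:

```python
w = torch.tensor(5.0, requires_grad=True)
lr = 0.1

for _ in range(3):
    loss = (w - 2.0) ** 2  # minimized at w = 2
    loss.backward()        # populates w.grad

    with torch.no_grad():  # update without tracking the update itself
        w -= lr * w.grad
    w.grad.zero_()         # gradients accumulate, so reset each step

print(w)  # tensor(3.5360, requires_grad=True), moving toward 2.0
```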
## GPU Usage

```python
# Check device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using: {device}")

# Move a tensor to the GPU
x = torch.randn(3, 3)
x = x.to(device)

# Create a tensor directly on the GPU
x_gpu = torch.randn(3, 3, device=device)
```
On Apple Silicon, use the `mps` backend instead:

```python
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using: {device}")
```
## Building a Model with `nn.Module`

`nn.Module` is the base class for all neural networks in PyTorch.

```python
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x):
        return self.network(x)

model = MyNet(input_size=8, hidden_size=64, output_size=1)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")  # 4,801
## Full Training Loop

Here is a complete example: train a neural net to predict California housing prices. The output values below are approximate; your exact numbers will vary because the model weights are not seeded.
If you want reproducible results, seed the random number generator before creating the model:
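```python
torch.manual_seed(42)  # seeds PyTorch's global RNG: weight init, DataLoader shuffling, etc.
```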
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# --- 1. Load and prepare data ---
housing = fetch_california_housing()
X, y = housing.data, housing.target.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Normalize features (important for neural nets)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32)

# DataLoader for batching
train_ds = TensorDataset(X_train_t, y_train_t)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)

# --- 2. Define model (reuses the MyNet class from above) ---
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model = MyNet(input_size=8, hidden_size=64, output_size=1).to(device)

# --- 3. Loss and optimizer ---
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# --- 4. Training loop ---
EPOCHS = 50
for epoch in range(EPOCHS):
    model.train()
    total_loss = 0.0

    for X_batch, y_batch in train_loader:
        X_batch = X_batch.to(device)
        y_batch = y_batch.to(device)

        # Forward pass
        predictions = model(X_batch)
        loss = criterion(predictions, y_batch)

        # Backward pass
        optimizer.zero_grad()  # clear old gradients
        loss.backward()        # compute gradients
        optimizer.step()       # update weights

        total_loss += loss.item()

    if (epoch + 1) % 10 == 0:
        avg_loss = total_loss / len(train_loader)
        print(f"Epoch {epoch+1}/{EPOCHS} — Loss: {avg_loss:.4f}")

# --- 5. Evaluate ---
model.eval()
with torch.no_grad():
    X_test_dev = X_test_t.to(device)
    y_pred = model(X_test_dev).cpu()
    test_loss = criterion(y_pred, y_test_t)
    rmse = torch.sqrt(test_loss)

print(f"\nTest RMSE: {rmse:.4f}")
```
Expected output:

```
Epoch 10/50 — Loss: 0.4821
Epoch 20/50 — Loss: 0.3974
Epoch 30/50 — Loss: 0.3612
Epoch 40/50 — Loss: 0.3401
Epoch 50/50 — Loss: 0.3287

Test RMSE: 0.5823
```
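Since the target is the median house value in units of $100,000, an RMSE around 0.58 means the model's predictions are typically off by roughly $58,000.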
## Making Predictions

```python
model.eval()
with torch.no_grad():
    new_house = np.array([[5.0, 20.0, 6.0, 1.0, 1200.0, 3.0, 37.88, -122.23]])
    new_house_scaled = scaler.transform(new_house)
    new_house_tensor = torch.tensor(new_house_scaled, dtype=torch.float32).to(device)
    price = model(new_house_tensor).item()

print(f"Predicted price: ${price * 100_000:,.0f}")
```
## Key Concepts Summary

| Concept | What It Does |
|---|---|
| `torch.tensor()` | Creates a tensor |
| `requires_grad=True` | Enables gradient tracking |
| `.backward()` | Computes gradients |
| `nn.Module` | Base class for all models |
| `nn.Linear(in, out)` | Fully connected layer |
| `nn.Sequential` | Stacks layers in order |
| `optimizer.zero_grad()` | Clears old gradients before each step |
| `optimizer.step()` | Updates model weights |
| `model.train()` | Training mode (enables dropout etc.) |
| `model.eval()` | Evaluation mode |
| `torch.no_grad()` | Disables gradient tracking (faster inference) |
## What's Next?

You now have a working PyTorch model. The next steps in this series cover LLM APIs: calling GPT-5, Claude Sonnet 4, and Gemini 2.5 Flash with a few lines of Python, and building your first AI-powered app.

Stay tuned for the next article in the AI/ML for Developers series.