Neural networks power most of the AI you use today. ChatGPT, image recognition, voice assistants — all neural networks.

You do not need a math degree to understand them. This article explains the concepts clearly, with code examples in plain Python, NumPy, and PyTorch.

What Is a Neural Network?

A neural network is a function. It takes numbers in, does math, and produces numbers out.

That’s it. The magic is in how it learns which math to do.

Imagine you want to predict house prices from size and location. A simple model: price = a * size + b * location + c. The job of training is finding the right values for a, b, and c. Neural networks do the same thing, but with millions of parameters instead of three.
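
To make this concrete, here is that three-parameter model as a few lines of Python. The values of a, b, and c are invented purely for illustration — in a real model, training would find them:

def predict_price(size, location):
    # Hypothetical "learned" parameters — made up for this example
    a, b, c = 150.0, 10000.0, 50000.0
    return a * size + b * location + c

print(predict_price(size=120, location=3))  # 150*120 + 10000*3 + 50000 = 98000.0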

Neurons

A single neuron does three things:

  1. Multiplies each input by a weight
  2. Adds a bias
  3. Applies an activation function

output = activation(w1*x1 + w2*x2 + w3*x3 + bias)

Here is a single neuron in Python:

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation
    return sigmoid(z)

# Example: two inputs
result = neuron(
    inputs=[2.0, 3.0],
    weights=[0.5, -0.3],
    bias=0.1
)
print(f"Neuron output: {result:.4f}")  # 0.5498

One neuron is weak. Thousands of them, arranged in layers, become powerful.

Layers

Neurons are organized into layers:

  • Input layer — raw features (e.g., image pixels, sensor readings)
  • Hidden layers — where the learning happens
  • Output layer — the prediction

Each layer passes its output to the next layer. This chain is called the forward pass.

Input → [Hidden Layer 1] → [Hidden Layer 2] → Output

More hidden layers = deeper network = more capacity to learn complex patterns. That is why it is called deep learning.

Forward Pass in Code

Here is a 2-layer network in plain NumPy — no frameworks:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# Random weights (in real training, these are learned)
np.random.seed(42)

W1 = np.random.randn(4, 3)   # 3 inputs → 4 neurons in hidden layer
b1 = np.zeros(4)

W2 = np.random.randn(2, 4)   # 4 hidden → 2 outputs (2 classes)
b2 = np.zeros(2)

# Input: 3 features
x = np.array([1.5, -0.5, 2.0])

# Forward pass
z1 = W1 @ x + b1       # linear transformation
a1 = relu(z1)           # activation

z2 = W2 @ a1 + b2      # second linear transformation
output = softmax(z2)    # probabilities for 2 classes

print("Class probabilities:", output)
# with seed 42, roughly [0.08, 0.92] → class 1 is more likely

Activation Functions

Without activation functions, a 10-layer network would behave like a single linear equation. Activation functions add non-linearity — the ability to learn complex patterns.
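
You can check the "collapses into linear" claim directly: with no activation between them, two linear layers multiply out to a single linear layer. A minimal NumPy sketch:

import numpy as np

np.random.seed(0)
W1, W2 = np.random.randn(4, 3), np.random.randn(2, 4)
x = np.random.randn(3)

# Two stacked linear layers with no activation in between...
deep = W2 @ (W1 @ x)
# ...are equivalent to one linear layer with weights W2 @ W1
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # True — no activation, no extra expressive power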

Activation | Formula                      | When to Use
ReLU       | max(0, x)                    | Hidden layers (default choice)
Sigmoid    | 1 / (1 + e^-x)               | Binary output (0–1)
Tanh       | (e^x - e^-x) / (e^x + e^-x)  | Hidden layers (centered at 0)
Softmax    | e^x_i / sum(e^x)             | Multi-class output (probabilities)

In practice: use ReLU for hidden layers, sigmoid for binary output, softmax for multi-class output.

How the Network Learns: Backpropagation

The network starts with random weights. It predicts badly. Then it improves.

Here is how:

  1. Forward pass — compute prediction with current weights
  2. Calculate loss — measure how wrong the prediction is (e.g., Mean Squared Error)
  3. Backward pass (backpropagation) — calculate how much each weight contributed to the error (gradients)
  4. Update weights (gradient descent) — nudge each weight in the direction that reduces error

Note that steps 3 and 4 are distinct: backpropagation computes the gradients; gradient descent uses those gradients to update the weights.

The key formula for weight update:

weight = weight - learning_rate * gradient

learning_rate controls how big each step is. Too large = unstable. Too small = slow.
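
To see the update rule in action, here is a minimal sketch that minimizes a simple quadratic loss, where the gradient can be worked out by hand. The loss function, starting point, and learning rate are arbitrary choices for illustration:

# Minimize loss(w) = (w - 3)^2; its gradient is d(loss)/dw = 2*(w - 3)
w = 0.0                 # arbitrary starting weight
learning_rate = 0.1

for step in range(25):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient   # the update rule from above

print(w)  # ≈ 2.989 — w has moved close to the minimum at 3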

You do not need to compute gradients by hand. Frameworks like PyTorch do this automatically.

The Same Network in PyTorch

Here is the same 2-layer network, now using PyTorch:

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(3, 4)   # 3 inputs → 4 hidden
        self.layer2 = nn.Linear(4, 2)   # 4 hidden → 2 outputs
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.layer2(x)
        return x

model = SimpleNet()
print(model)

# Single forward pass
x = torch.tensor([1.5, -0.5, 2.0])
output = model(x)
print("Output:", output)
print("Predicted class:", output.argmax().item())

PyTorch handles all the weight initialization, matrix multiplication, and gradient bookkeeping.
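
To give a taste of that bookkeeping (and of what the next article covers in depth), here is a minimal sketch of one training step with this model. The target label, loss function, and optimizer settings are arbitrary choices for illustration:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.tensor([[1.5, -0.5, 2.0]])   # batch of one example
target = torch.tensor([1])             # made-up "correct" class

optimizer.zero_grad()                  # clear old gradients
loss = criterion(model(x), target)     # forward pass + loss
loss.backward()                        # backpropagation: compute gradients
optimizer.step()                       # gradient descent: update weights

One call to backward() fills in a gradient for every parameter; step() then applies the update rule from earlier to all of them at once.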

Why Depth Matters

A shallow network (one hidden layer) can approximate any continuous function — in theory. In practice, it may need an exponentially large number of neurons.

A deep network (many layers) learns hierarchical features. For images:

  • Layer 1 detects edges
  • Layer 2 detects shapes (eyes, wheels)
  • Layer 3 detects objects (faces, cars)

Each layer builds on the previous one. This is why deep networks outperform shallow ones on complex tasks.

Key Terms Summary

Term                | Meaning
Weight              | A learnable parameter multiplied by the input
Bias                | A learnable offset added to the neuron output
Activation function | Adds non-linearity so the network can learn complex patterns
Forward pass        | Computing the prediction from input to output
Loss                | How wrong the prediction is
Gradient            | Direction of steepest increase in loss
Backpropagation     | Calculating gradients layer by layer
Learning rate       | Step size when updating weights
Epoch               | One full pass through the training data

What’s Next?

Now that you understand the concepts, it is time to build real neural networks. The next article covers PyTorch — tensors, autograd, building models with nn.Module, and writing a full training loop.

Getting Started with PyTorch: Tensors, Autograd, and Your First Neural Net