Neural networks power most of the AI you use today. ChatGPT, image recognition, voice assistants — all neural networks.

You do not need a math degree to understand them. This article explains the concepts clearly, with code examples in plain Python, NumPy, and PyTorch.

What Is a Neural Network?

A neural network is a function. It takes numbers in, does math, and produces numbers out.

That’s it. The magic is in how it learns which math to do.

Imagine you want to predict house prices from size and location. A simple model: price = a * size + b * location + c. The job of training is finding the right values for a, b, and c. Neural networks do the same thing, but with millions of parameters instead of three.
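
To make this concrete, here is that three-parameter model as a few lines of Python. The values of a, b, and c are invented purely for illustration — in a real model, training would find them:

def predict_price(size, location):
    # Hypothetical "learned" parameters — made up for this example
    a, b, c = 150.0, 10000.0, 50000.0
    return a * size + b * location + c

print(predict_price(size=120, location=3))  # 150*120 + 10000*3 + 50000 = 98000.0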

Neurons

A single neuron does three things:

  1. Multiplies each input by a weight
  2. Adds a bias
  3. Applies an activation function

output = activation(w1*x1 + w2*x2 + w3*x3 + bias)

Here is a single neuron in Python:

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation
    return sigmoid(z)

# Example: two inputs
result = neuron(
    inputs=[2.0, 3.0],
    weights=[0.5, -0.3],
    bias=0.1
)
print(f"Neuron output: {result:.4f}")  # 0.5498

One neuron is weak. Thousands of them, arranged in layers, become powerful.

Layers

Neurons are organized into layers:

  • Input layer — raw features (e.g., image pixels, sensor readings)
  • Hidden layers — where the learning happens
  • Output layer — the prediction

Each layer passes its output to the next layer. This chain is called the forward pass.

Input → [Hidden Layer 1] → [Hidden Layer 2] → Output

More hidden layers = deeper network = more capacity to learn complex patterns. That is why it is called deep learning.

Forward Pass in Code

Here is a 2-layer network in plain NumPy — no frameworks:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# Random weights (in real training, these are learned)
np.random.seed(42)

W1 = np.random.randn(4, 3)   # 3 inputs → 4 neurons in hidden layer
b1 = np.zeros(4)

W2 = np.random.randn(2, 4)   # 4 hidden → 2 outputs (2 classes)
b2 = np.zeros(2)

# Input: 3 features
x = np.array([1.5, -0.5, 2.0])

# Forward pass
z1 = W1 @ x + b1       # linear transformation
a1 = relu(z1)           # activation

z2 = W2 @ a1 + b2      # second linear transformation
output = softmax(z2)    # probabilities for 2 classes

print("Class probabilities:", output)
# with seed 42, roughly [0.08, 0.92] → class 1 is more likely

Activation Functions

Without activation functions, a 10-layer network would behave like a single linear equation. Activation functions add non-linearity — the ability to learn complex patterns.
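
You can check the "collapses into linear" claim directly: with no activation between them, two linear layers multiply out to a single linear layer. A minimal NumPy sketch:

import numpy as np

np.random.seed(0)
W1, W2 = np.random.randn(4, 3), np.random.randn(2, 4)
x = np.random.randn(3)

# Two stacked linear layers with no activation in between...
deep = W2 @ (W1 @ x)
# ...are equivalent to one linear layer with weights W2 @ W1
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # True — no activation, no extra expressive power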

Activation | Formula                      | When to Use
ReLU       | max(0, x)                    | Hidden layers (default choice)
Sigmoid    | 1 / (1 + e^-x)               | Binary output (0–1)
Tanh       | (e^x - e^-x) / (e^x + e^-x)  | Hidden layers (centered at 0)
Softmax    | e^x_i / sum(e^x)             | Multi-class output (probabilities)

In practice: use ReLU for hidden layers, sigmoid for binary output, softmax for multi-class output.

How the Network Learns: Backpropagation

The network starts with random weights. It predicts badly. Then it improves.

Here is how:

  1. Forward pass — compute prediction with current weights
  2. Calculate loss — measure how wrong the prediction is (e.g., Mean Squared Error)
  3. Backward pass (backpropagation) — calculate how much each weight contributed to the error (gradients)
  4. Update weights (gradient descent) — nudge each weight in the direction that reduces error

Note that steps 3 and 4 are distinct: backpropagation computes the gradients; gradient descent uses those gradients to update the weights.

The key formula for weight update:

weight = weight - learning_rate * gradient

learning_rate controls how big each step is. Too large = unstable. Too small = slow.
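
To see the update rule in action, here is a minimal sketch that minimizes a simple quadratic loss, where the gradient can be worked out by hand. The loss function, starting point, and learning rate are arbitrary choices for illustration:

# Minimize loss(w) = (w - 3)^2; its gradient is d(loss)/dw = 2*(w - 3)
w = 0.0                 # arbitrary starting weight
learning_rate = 0.1

for step in range(25):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient   # the update rule from above

print(w)  # ≈ 2.989 — w has moved close to the minimum at 3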

You do not need to compute gradients by hand. Frameworks like PyTorch do this automatically.

The Same Network in PyTorch

Here is the same 2-layer network, now using PyTorch:

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(3, 4)   # 3 inputs → 4 hidden
        self.layer2 = nn.Linear(4, 2)   # 4 hidden → 2 outputs
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.layer2(x)
        return x

model = SimpleNet()
print(model)

# Single forward pass
x = torch.tensor([1.5, -0.5, 2.0])
output = model(x)
print("Output:", output)
print("Predicted class:", output.argmax().item())

PyTorch handles all the weight initialization, matrix multiplication, and gradient bookkeeping.
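
To give a taste of that bookkeeping (and of what the next article covers in depth), here is a minimal sketch of one training step with this model. The target label, loss function, and optimizer settings are arbitrary choices for illustration:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.tensor([[1.5, -0.5, 2.0]])   # batch of one example
target = torch.tensor([1])             # made-up "correct" class

optimizer.zero_grad()                  # clear old gradients
loss = criterion(model(x), target)     # forward pass + loss
loss.backward()                        # backpropagation: compute gradients
optimizer.step()                       # gradient descent: update weights

One call to backward() fills in a gradient for every parameter; step() then applies the update rule from earlier to all of them at once.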

Why Depth Matters

A shallow network (one hidden layer) can approximate any continuous function — in theory. In practice, it may need an exponentially large number of neurons.

A deep network (many layers) learns hierarchical features. For images:

  • Layer 1 detects edges
  • Layer 2 detects shapes (eyes, wheels)
  • Layer 3 detects objects (faces, cars)

Each layer builds on the previous one. This is why deep networks outperform shallow ones on complex tasks.

Key Terms Summary

Term                | Meaning
Weight              | A learnable parameter multiplied by the input
Bias                | A learnable offset added to the neuron output
Activation function | Adds non-linearity so the network can learn complex patterns
Forward pass        | Computing the prediction from input to output
Loss                | How wrong the prediction is
Gradient            | Direction of steepest increase in loss
Backpropagation     | Calculating gradients layer by layer
Learning rate       | Step size when updating weights
Epoch               | One full pass through the training data

What’s Next?

Now that you understand the concepts, it is time to build real neural networks. The next article covers PyTorch — tensors, autograd, building models with nn.Module, and writing a full training loop.

Getting Started with PyTorch: Tensors, Autograd, and Your First Neural Net