Where Do Weights Come From?

We make them up! At first, at least. Weights and biases are initialized, often with some form of randomness, and then iteratively adjusted using the training data to minimize the network’s prediction error. This adjustment process uses the gradient of the loss function with respect to each weight and bias to guide the updates.
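To make that update rule concrete, here is a minimal sketch of gradient descent on a single weight and bias. The toy model, squared-error loss, learning rate, and input values are illustrative assumptions, not code from this module.

```python
import numpy as np

# One weight, one bias, one training pair (x, y_true); all values here
# are illustrative assumptions for this sketch.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1)   # small random initial weight
b = 0.0                     # bias initialized to zero
x, y_true = 2.0, 1.0
learning_rate = 0.1

for step in range(5):
    y_pred = w * x + b                  # forward pass
    loss = (y_pred - y_true) ** 2       # squared prediction error
    grad_w = 2 * (y_pred - y_true) * x  # gradient of the loss w.r.t. w
    grad_b = 2 * (y_pred - y_true)      # gradient of the loss w.r.t. b
    w -= learning_rate * grad_w         # step downhill on the loss
    b -= learning_rate * grad_b
    print(f"step {step}: loss={loss:.4f}, w={w:.3f}, b={b:.3f}")
```

Running it shows the loss shrinking as the weight and bias move toward values that fit the training pair, which is the same process a full network performs across millions of parameters.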

  1. Random Initialization: Weights are typically initialized with small random values. This breaks symmetry, ensuring that each neuron learns something different from the start. If all weights started with the same value, every neuron in a layer would update identically, making them redundant (a sketch of this appears after the list).
  2. Bias Initialization: Biases can be initialized to zero or to small values. Starting at zero is usually fine because the random weights already provide the needed asymmetry.
  3. Advanced Initialization Techniques: There are also methods for choosing starting weights and biases that aren’t random, but we won’t go into them here. Later, we will explore Transfer Learning, a powerful technique that builds on this idea by starting from weights learned on another task.
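
As a concrete illustration of points 1 and 2, here is a sketch of initializing one fully connected layer with small random weights and zero biases. The layer sizes and the 0.01 scale are illustrative assumptions, not values prescribed by this module.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_layer(n_inputs, n_outputs, scale=0.01):
    """Small random weights break symmetry between neurons;
    biases can safely start at zero."""
    weights = rng.normal(loc=0.0, scale=scale, size=(n_inputs, n_outputs))
    biases = np.zeros(n_outputs)
    return weights, biases

W, b = init_layer(n_inputs=4, n_outputs=3)
print(W)  # each column (one neuron) starts with different small weights
print(b)  # [0. 0. 0.]

# By contrast, a constant start such as np.full((4, 3), 0.5) would give
# every neuron in the layer identical weights, identical outputs, and
# identical gradients, so they would never differentiate (the redundancy
# described in point 1).
```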
