Walker Evolution: Neuroevolution from Scratch
Evolving walking creatures in real time, entirely in your browser
This demo evolves a population of spring-mass bipedal creatures to walk using
neuroevolution. Each creature has a small neural network brain (MLP with ~1,000 weights) that
controls four muscles. Every generation, the top performers survive and mutate to produce
offspring. No gradients, no backpropagation — just selection and mutation. Everything runs
client-side with zero dependencies.
Live Evolution
Watch 20 creatures compete each generation. The best walkers survive and pass their neural network weights to mutated offspring. Use speed controls to fast-forward through generations.
Fitness History
Best and average fitness per generation. Fitness = distance traveled + survival bonus.
Evolution Pipeline
How neuroevolution trains the walking creatures each generation.
Run all 20 creatures in a physics simulation for up to 520 steps. Each creature's neural network decides muscle activations at every timestep.
Score each creature by distance traveled plus a small survival bonus. Creatures that fall (body hits ground) are eliminated early.
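The rollout and scoring steps above can be sketched as a single evaluation loop. This is an illustrative sketch, not the demo's actual code: the method names (`bodyX`, `hasFallen`, `senseInputs`, `setMuscles`, `stepPhysics`) and the survival-bonus constant are assumptions.

```javascript
// Hypothetical evaluation loop: run one creature for up to 520 physics steps,
// letting its brain pick muscle activations each step, then score it.
// All creature method names and the 0.01 bonus weight are illustrative.
const MAX_STEPS = 520;

function evaluate(creature) {
  const startX = creature.bodyX();
  let steps = 0;
  while (steps < MAX_STEPS && !creature.hasFallen()) {
    // The 18-input sensor vector goes in; 4 muscle activations come out.
    const activations = creature.brain.forward(creature.senseInputs());
    creature.setMuscles(activations);
    creature.stepPhysics();
    steps++;
  }
  // Fitness = distance traveled + a small bonus for every step survived.
  return (creature.bodyX() - startX) + 0.01 * steps;
}
```

Falling ends the rollout early, so the survival bonus rewards creatures that stay upright even before they learn to move forward.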
The top 5 elite creatures survive unchanged. The remaining 15 slots are filled by mutated copies of randomly chosen elites.
Each weight is perturbed by Gaussian noise (sigma=0.09). With 4% probability, a weight is completely replaced — enabling occasional large jumps in behavior space.
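The selection and mutation steps can be sketched as follows, using the numbers stated above (5 elites, 20 creatures, sigma = 0.09, 4% reset chance). The flat-weight-array representation and the `gaussian` helper are assumptions about how the demo stores genomes.

```javascript
// Sketch of one selection + mutation step. Constants match the text above;
// the genome layout (flat array of weights) is an assumption.
const ELITES = 5, POP = 20, SIGMA = 0.09, RESET_P = 0.04;

// Box-Muller transform: sample from a standard normal distribution.
function gaussian() {
  const u = 1 - Math.random(), v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function mutate(weights) {
  return weights.map(w =>
    Math.random() < RESET_P
      ? gaussian()               // 4%: full replacement, a large jump in behavior space
      : w + SIGMA * gaussian()   // otherwise: small Gaussian perturbation
  );
}

function nextGeneration(scored) {
  // scored: [{ weights, fitness }], already sorted by fitness, best first.
  const elites = scored.slice(0, ELITES).map(c => c.weights);
  const next = elites.slice(); // elites survive unchanged
  while (next.length < POP) {
    const parent = elites[Math.floor(Math.random() * ELITES)];
    next.push(mutate(parent));
  }
  return next;
}
```

Keeping the elites unmutated guarantees the best fitness never decreases between generations, while the 4% reset chance keeps the population from stagnating in a local optimum.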
Creature Anatomy
Each creature is a spring-mass biped with 5 nodes and 7 springs.
Physics Model
- 5 point masses: body, 2 knees, 2 feet
- 4 controllable muscle springs (2 hip-to-knee, 2 knee-to-foot)
- 3 structural springs (1 knee brace, 2 max-leg-reach limits)
- Gravity, damping, and ground collision
- Velocity clamping for stability
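A minimal integrator in the spirit of the model above might look like this. The constants (`K`, `DAMP`, `GRAVITY`, `DT`, `V_MAX`) and the unit-mass, y-grows-downward canvas convention are illustrative, not the demo's actual values.

```javascript
// Sketch of one semi-implicit Euler step for a spring-mass creature.
// All constants are illustrative; nodes are treated as unit masses.
const K = 80, DAMP = 0.98, GRAVITY = 0.5, DT = 1 / 60, V_MAX = 30;

function stepPhysics(nodes, springs) {
  // Accumulate spring forces (Hooke's law along each spring's axis).
  for (const s of springs) {
    const a = nodes[s.a], b = nodes[s.b];
    const dx = b.x - a.x, dy = b.y - a.y;
    const len = Math.hypot(dx, dy) || 1e-9;
    const f = K * (len - s.rest); // stretched => positive, pulls ends together
    const fx = (f * dx) / len, fy = (f * dy) / len;
    a.vx += fx * DT; a.vy += fy * DT;
    b.vx -= fx * DT; b.vy -= fy * DT;
  }
  for (const n of nodes) {
    n.vy += GRAVITY * DT;                             // gravity (y points down)
    n.vx *= DAMP; n.vy *= DAMP;                       // damping
    n.vx = Math.max(-V_MAX, Math.min(V_MAX, n.vx));   // velocity clamping
    n.vy = Math.max(-V_MAX, Math.min(V_MAX, n.vy));   //   for stability
    n.x += n.vx * DT; n.y += n.vy * DT;
    if (n.y > 0) { n.y = 0; n.vy = 0; n.vx *= 0.5; }  // ground collision + friction
  }
}
```

Muscle control in this scheme amounts to the brain adjusting the `rest` length of the four muscle springs each step; the structural springs keep fixed rest lengths.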
Neural Network
- Architecture: 18 → 22 → 22 → 4 (MLP)
- Inputs: relative joint positions, velocities, foot contacts, body height
- Outputs: 4 muscle activations [0, 1]
- Activation: tanh (hidden), sigmoid (output)
- ~1,000 trainable weights per creature
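The forward pass of that architecture can be sketched directly. The per-layer `{w, b}` weight layout is an assumption about the demo's internals, but the sizes and activations follow the list above, and the parameter count works out to 18·22+22 + 22·22+22 + 22·4+4 = 1,016, matching the "~1,000 weights" figure.

```javascript
// Sketch of the 18 -> 22 -> 22 -> 4 MLP: tanh on hidden layers, sigmoid on
// the output so muscle activations land in (0, 1). Layout is an assumption.
const sigmoid = x => 1 / (1 + Math.exp(-x));

function forward(layers, input) {
  // layers: [{ w: out-by-in matrix, b: bias vector }, ...]
  let h = input;
  layers.forEach((layer, i) => {
    const isOutput = i === layers.length - 1;
    h = layer.w.map((row, j) => {
      const z = row.reduce((sum, wjk, k) => sum + wjk * h[k], layer.b[j]);
      return isOutput ? sigmoid(z) : Math.tanh(z);
    });
  });
  return h; // 4 muscle activations
}

// Random initialization for one creature's brain (1,016 parameters total).
function randomBrain(sizes = [18, 22, 22, 4]) {
  const layers = [];
  for (let i = 1; i < sizes.length; i++) {
    layers.push({
      w: Array.from({ length: sizes[i] }, () =>
        Array.from({ length: sizes[i - 1] }, () => (Math.random() * 2 - 1) * 0.5)),
      b: new Array(sizes[i]).fill(0),
    });
  }
  return layers;
}
```

A network this small is exactly why pure mutation-based search stays tractable: the whole genome fits in one short flat array, so cloning and perturbing it each generation is cheap.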
Neuroevolution sidesteps the need for differentiable reward signals entirely. Unlike PPO or other policy gradient methods that require carefully shaped reward functions and gradient computation, evolutionary strategies simply ask: "which creatures walked the farthest?" This makes them naturally suited to problems with sparse, non-differentiable, or deceptive reward landscapes. The trade-off is sample efficiency — evolution needs many more evaluations to converge. But for small networks and simple physics, it works remarkably well, and the emergent gaits are often surprisingly creative and diverse.