Feed-Forward Network (FFN)
A feed-forward network (FFN) is a component within transformer architectures consisting of fully connected layers that process each position independently, applying a non-linear transformation to increase the model's expressive capacity. It is a small multilayer perceptron, typically two linear transformations with a non-linear activation function between them, most commonly ReLU or GELU. Because the FFN operates position-wise, every token position passes through the same parameters separately, which allows fully parallel computation across the sequence; mixing information between positions is left to the attention layers.

The FFN expands model capacity by widening the intermediate representation, often by a factor of four or more relative to the embedding size, before projecting back down to the original dimension. Advanced implementations incorporate techniques such as gated linear units, mixture-of-experts routing, and sparse activation patterns to improve computational efficiency while preserving performance. FFN layers are thus a crucial component for capturing complex patterns and non-linear relationships within transformer models.
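As a minimal sketch of the standard two-layer form (assuming PyTorch; the module name, the GELU choice, and the 4x expansion from d_model to d_ff are illustrative defaults, not prescribed by any particular model):

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise FFN: FFN(x) = W2 * act(W1 * x + b1) + b2.

    Hypothetical module for illustration; d_model is the embedding size,
    d_ff the expanded hidden size (commonly about 4 * d_model).
    """

    def __init__(self, d_model: int = 512, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)    # expand to the wider hidden dimension
        self.act = nn.GELU()                  # non-linearity (ReLU is also common)
        self.down = nn.Linear(d_ff, d_model)  # project back to the embedding size
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). The same weights are applied at every
        # position, so no information moves between positions in this layer.
        return self.dropout(self.down(self.act(self.up(x))))
```

Applied to a tensor of shape (batch, seq_len, d_model), the output keeps the same shape: each token is transformed independently through identical weights, which is what makes the layer trivially parallel across the sequence.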