the (discretized) probability density <-> the (discrete) signal intensity <-> the coefficients of the polynomial
If we look at the discrete case, since computing a convolution directly is expensive (O(n^2)), people often rely on the Fast Fourier Transform (FFT), which runs in O(n log(n)). An excellent and very enjoyable video on the FFT applied to polynomial multiplication is https://youtu.be/h7apO7q16V0?t=1.
EDIT: never mind, I just figured out that you pointed out a typo in the parent comment. The straightforward implementation is indeed O(n^2).
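To make the O(n^2) versus O(n log(n)) contrast concrete, here is a minimal sketch in plain Python. It multiplies two polynomials (given as coefficient lists) once with the schoolbook double loop and once through a textbook recursive radix-2 FFT. The function names are made up for illustration, and a real program would normally call a library FFT rather than this toy one.

```python
import cmath


def convolve_direct(a, b):
    """Schoolbook convolution: one multiplication per pair, so O(len(a) * len(b))."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT. len(values) must be a power of two.

    With invert=True the transform is unnormalized; the caller divides by n.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def convolve_fft(a, b):
    """Convolution via FFT: transform, multiply pointwise, transform back."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:  # pad to the next power of two
        n *= 2
    fa = fft(list(a) + [0] * (n - len(a)))
    fb = fft(list(b) + [0] * (n - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    inverse = fft(product, invert=True)
    return [v.real / n for v in inverse[:size]]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
direct = convolve_direct([1, 2, 3], [4, 5])
fast = convolve_fft([1, 2, 3], [4, 5])
```

Both calls return the same coefficients up to floating-point rounding. The FFT version only pays off for long inputs.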
If you consider the operator of convolution (rescaling for unit variance), the normal distribution is the only attractor (under many "natural" choices of metrics).
In the generating function picture, the distribution is carried as the coefficients of a polynomial or of a formal power series, and convolution is polynomial multiplication. One example of your step is multiplying by (x+1)/2 (a Bernoulli trial); repeating it approaches the Gaussian through rows of normalized binomial coefficients.
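A small sketch of that step, assuming nothing beyond the standard library: start from the polynomial 1 and keep multiplying by (x+1)/2. The coefficients after n steps are the binomial coefficients divided by 2^n, and for large n they sit close to a Gaussian density with mean n/2 and variance n/4. The helper names are invented for this example.

```python
from math import comb, exp, pi, sqrt


def multiply_by_bernoulli(coeffs):
    """Multiply a polynomial (coefficient list, lowest degree first) by (x + 1) / 2."""
    out = [0.0] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        out[k] += c / 2      # the "+1" term keeps the degree
        out[k + 1] += c / 2  # the "x" term shifts the degree up by one
    return out


def bernoulli_power(n):
    """Coefficients of ((x + 1) / 2) ** n, built by n repeated multiplications."""
    coeffs = [1.0]
    for _ in range(n):
        coeffs = multiply_by_bernoulli(coeffs)
    return coeffs


def gaussian_density(x, mean, variance):
    return exp(-((x - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)


n = 100
coeffs = bernoulli_power(n)
# Largest gap between the coefficients and the matching Gaussian density.
max_gap = max(
    abs(c - gaussian_density(k, n / 2, n / 4)) for k, c in enumerate(coeffs)
)
```

For n = 100 the gap between the two curves is already small compared with the peak height of about 0.08.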
Another related limit, described as a functional central limit theorem, is given by Donsker's theorem, which gives the passage from the discrete situation (random walk) to the continuous one (Brownian motion).
Probability of a certain sum value s is:
sum of probabilities of all (a,b) with a + b = s
(a: value from first input distribution. b: value from second input distribution)
with probability (a,b) = probability(a) * probability(b)
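The recipe above translates almost line by line into code. This is a minimal sketch for two independent discrete distributions stored as dicts from value to probability. The name sum_distribution and the two-dice example are my own choices for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def sum_distribution(p, q):
    """Distribution of a + b for independent a ~ p and b ~ q.

    p and q map each possible value to its probability.
    """
    out = defaultdict(Fraction)
    for a, prob_a in p.items():
        for b, prob_b in q.items():
            # probability(a, b) = probability(a) * probability(b),
            # accumulated over every pair with the same sum a + b
            out[a + b] += prob_a * prob_b
    return dict(out)


die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = sum_distribution(die, die)
```

With two fair dice, two_dice[7] comes out as 1/6 and two_dice[2] as 1/36, the familiar triangle-shaped distribution.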
For me, the way to understand convolutions was to sample rand(0, 1) a bunch of times, bucket the samples, then plot the number of samples in each bucket. You get a more-or-less flat graph. Now if you add or multiply two variables, you get different shapes. Once you've spent enough effort on trying to understand why, you derive convolutions.
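Here is roughly what that experiment looks like, as a sketch using only the standard library. The bucket count, sample size and seed are arbitrary choices, and the counts are printed as a crude text plot instead of a real graph.

```python
import random


def histogram(samples, lo, hi, buckets):
    """Count how many samples fall into each of `buckets` equal-width bins on [lo, hi]."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for s in samples:
        idx = min(int((s - lo) / width), buckets - 1)  # clamp s == hi into the last bin
        counts[idx] += 1
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# One uniform variable: roughly flat.
single = histogram((rng.random() for _ in range(n)), 0.0, 1.0, 10)

# Sum of two independent uniform variables: a triangle peaking at 1.
summed = histogram((rng.random() + rng.random() for _ in range(n)), 0.0, 2.0, 10)

for label, counts in (("single", single), ("summed", summed)):
    print(label)
    for count in counts:
        print("#" * (count // 500))
```

The triangle is exactly the convolution of the flat density with itself.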
(Cross-)correlation is convolution with a reversed kernel (second signal), or vice versa, of course. (For discrete signals, reversing the kernel is just swapping the indices around, which makes absolutely no difference for deep learning.) Convolution is more "natural" because it's abelian, whereas swapping signal and kernel in cross-correlation time-reverses the result.
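A quick check of those three claims on small integer lists, as a sketch in plain Python. Both functions compute the "full" output (length len(signal) + len(kernel) - 1), and the indexing convention for cross_correlate is one common choice, not the only one.

```python
def convolve(signal, kernel):
    """Full discrete convolution: out[i + j] accumulates signal[i] * kernel[j]."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, x in enumerate(signal):
        for j, y in enumerate(kernel):
            out[i + j] += x * y
    return out


def cross_correlate(signal, kernel):
    """Full discrete cross-correlation: the kernel slides without being flipped."""
    m = len(kernel)
    out = [0] * (len(signal) + m - 1)
    for i, x in enumerate(signal):
        for j, y in enumerate(kernel):
            out[i - j + m - 1] += x * y
    return out


signal = [1, 2, 3, 4]
kernel = [1, 0, -1]

# Cross-correlation is convolution with the reversed kernel.
same_as_reversed = cross_correlate(signal, kernel) == convolve(signal, kernel[::-1])

# Convolution commutes.
commutes = convolve(signal, kernel) == convolve(kernel, signal)

# Swapping the arguments of cross-correlation reverses the result.
swap_reverses = cross_correlate(kernel, signal) == cross_correlate(signal, kernel)[::-1]
```

All three flags come out True for these inputs. A learned kernel can absorb the flip, which is why the distinction does not matter in deep learning.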
Crucially, independent (not merely uncorrelated) random variables.
(At the other extreme, the sum of maximally correlated random variables is computed as a "comonotonic sum" instead).