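A (probability) measure \(\mu\) on a space \(\mathcal{X}\) is a countably additive map \(\mu:\mathcal{B}\rightarrow[0,\infty]\), where \(\mathcal{B}\) is a \(\sigma\)-algebra of subsets of \(\mathcal{X}\). For a probability measure the range is \([0,1]\) and \(\mu(\mathcal{X})=1\). It can be thought of as a way of measuring the size of every measurable set in \(\mathcal{X}\), e.g.:
\(\mu(A)=a\) says that the volume (size) of the set \(A\) under the measure \(\mu\) is \(a\).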
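Lebesgue integration (Real Analysis): for a measurable \(f\ge 0\), \(\int f\,d\mu=\lim_{n\rightarrow\infty}\int s_n\,d\mu\), where \(s_n=\sum_{k=1}^{m_n}c_{n,k}1_{A_{n,k}}\) (with \(c_{n,k}\ge 0\) and \(A_{n,k}\in\mathcal{B}\)) is a nondecreasing sequence of simple functions, i.e. finite linear combinations of indicator functions, converging pointwise to \(f\). The integral of each simple function is just a weighted sum of set sizes, \(\int s_n\,d\mu=\sum_{k=1}^{m_n}c_{n,k}\,\mu(A_{n,k})\). A general \(f\) is handled by writing \(f=f^+-f^-\).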
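To see the limit in action, here is a minimal sketch using the standard dyadic construction \(s_n=\sum_{k=0}^{n2^n-1}\frac{k}{2^n}1_{\{k/2^n\le f<(k+1)/2^n\}}+n\,1_{\{f\ge n\}}\). The function names, and the choice of \(f(x)=x^2\) on \([0,1]\) with Lebesgue measure, are hypothetical and only for illustration. Instead of sampling \(f\), the code only asks for \(\mu(\{f\ge t\})\), which matches the "weighted sum of set sizes" view of \(\int s_n\,d\mu\).

```python
import math


def simple_integral(mu_ge, n):
    """Integral of the n-th dyadic simple function s_n below a nonnegative f.

    mu_ge(t) must return mu({x : f(x) >= t}). s_n takes the value k/2^n on
    {k/2^n <= f < (k+1)/2^n} for k < n*2^n, and the value n on {f >= n}.
    """
    step = 2.0 ** -n
    total = 0.0
    for k in range(n * 2 ** n):
        lo, hi = k * step, (k + 1) * step
        # mu({lo <= f < hi}) = mu({f >= lo}) - mu({f >= hi})
        total += lo * (mu_ge(lo) - mu_ge(hi))
    # top piece: s_n equals n wherever f >= n
    total += n * mu_ge(n)
    return total


def mu_ge_square(t):
    """Hypothetical example: Lebesgue measure of {x in [0, 1] : x**2 >= t}."""
    if t <= 0:
        return 1.0
    if t > 1:
        return 0.0
    # {x in [0, 1] : x**2 >= t} = [sqrt(t), 1]
    return 1.0 - math.sqrt(t)
```

Since \(0\le f-s_n\le 2^{-n}\) on \(\{f<n\}\), `simple_integral(mu_ge_square, n)` increases with `n` and approaches \(\int_0^1 x^2\,dx=1/3\) from below.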
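A probability measure is naturally associated with a distribution function \(F\): if \(\mathcal{X}=\mathbb{R}\), let \(F(x)=\mu\big((-\infty, x]\big)\). Conversely, every distribution function determines a unique probability measure on the Borel sets of \(\mathbb{R}\), so from now on we can use measures in place of distributions.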
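As a small sketch of \(F(x)=\mu\big((-\infty,x]\big)\), assume a discrete probability measure stored as a hypothetical `{point: mass}` dictionary, with a set \(A\) represented by a predicate. Exact `Fraction` masses avoid floating-point rounding in the sums.

```python
from fractions import Fraction


def measure(atoms, A):
    """mu(A) for a discrete measure {point: mass}; A is a predicate standing in for a set."""
    return sum(p for x, p in atoms.items() if A(x))


def cdf(atoms, x):
    """Distribution function F(x) = mu((-inf, x])."""
    return measure(atoms, lambda y: y <= x)


# Hypothetical example: a fair die, mass 1/6 on each of 1, ..., 6
die = {k: Fraction(1, 6) for k in range(1, 7)}
```

For the die, `cdf(die, 3.5)` is \(\mu(\{1,2,3\})=1/2\): \(F\) jumps by the atom's mass at each point and is flat in between.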
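Let's say we have samples \(X_1, \dots, X_n\in \mathcal{X}\) drawn from a probability measure \(P\). From them we can construct the empirical measure \(\mathbb{P}_n\):
\(\mathbb{P}_n:=n^{-1} \sum_{i=1}^n\delta_{X_i},\quad\) where \(\delta_{x}(A)=1_A(x)\) is the Dirac measure at \(x\). Thus \(\mathbb{P}_n(A)=n^{-1}\#\{i: X_i\in A\}\) is the fraction of the samples that fall in \(A\).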
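A minimal sketch of \(\mathbb{P}_n\) built literally as an average of Dirac measures, assuming (as an illustration only) that sets are passed as predicates and the samples as a plain list. The function names are hypothetical. Taking \(A=(-\infty,x]\) gives the empirical distribution function \(F_n(x)=\mathbb{P}_n\big((-\infty,x]\big)\).

```python
from fractions import Fraction


def dirac(x, A):
    """Dirac measure: delta_x(A) = 1_A(x), with A given as a predicate."""
    return 1 if A(x) else 0


def empirical_measure(samples, A):
    """P_n(A) = n^{-1} * sum_i delta_{X_i}(A), i.e. the fraction of samples in A."""
    n = len(samples)
    return Fraction(sum(dirac(x, A) for x in samples), n)


def empirical_cdf(samples, x):
    """Empirical distribution function F_n(x) = P_n((-inf, x])."""
    return empirical_measure(samples, lambda y: y <= x)
```

For example, with `samples = [0.3, -1.2, 2.5, 0.3]`, `empirical_measure(samples, lambda y: y > 0)` is `Fraction(3, 4)`. For i.i.d. samples, the law of large numbers gives \(\mathbb{P}_n(A)\rightarrow P(A)\) almost surely for each fixed \(A\).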