## Convexity

Let $f$ be a function whose domain is the set of real numbers. If the second derivative is non-negative for all $x \in \mathbb{R}$, the function is convex (it is strictly convex if $f''(x) > 0$ everywhere)\footnote{For vector-valued inputs $x \in \mathbb{R}^n$, the analogous condition is that the Hessian matrix of second derivatives is positive semi-definite.}.

Consider the case of two points $x_1$ and $x_2$ in the domain of $f$, as seen in the diagram below:

Definition. Let $f$ be a real-valued function defined on an interval $I = [a, b]$. $f$ is said to be convex on $I$ if $\forall x_1, x_2 \in I,\ \lambda \in [0, 1]$,

$$f(\lambda x_1 + (1 - \lambda)\,x_2) \le \lambda f(x_1) + (1- \lambda)\,f(x_2).$$

$f$ is said to be strictly convex if the inequality is strict for all $x_1 \ne x_2$ and $\lambda \in (0, 1)$. Intuitively, this definition states that the graph of the function lies below the straight line (the secant) joining the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$. In other words, for a strictly convex function equality holds only at the endpoints, i.e. for $\lambda = 0$ and $\lambda = 1$.
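As a quick numerical sanity check of the definition, the sketch below evaluates the gap between the secant and the function for the convex function $f(x) = x^2$ (a hypothetical example; any convex $f$ would do):

```python
def f(x):
    """A convex function used for illustration: f(x) = x^2."""
    return x * x

def convexity_gap(f, x1, x2, lam):
    """Return lam*f(x1) + (1-lam)*f(x2) - f(lam*x1 + (1-lam)*x2).

    This is (secant minus function); it is non-negative for every
    lam in [0, 1] when f is convex.
    """
    return lam * f(x1) + (1 - lam) * f(x2) - f(lam * x1 + (1 - lam) * x2)

# The chord lies on or above the graph at every sampled point.
for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert convexity_gap(f, -1.0, 3.0, lam) >= 0.0
```

At $\lambda = 0$ and $\lambda = 1$ the gap is exactly zero, matching the equality cases in the definition.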

## Jensen’s Inequality

Let $f$ be a convex function defined on an interval $I$. If $x_1, x_2, \dots, x_n \in I$ and $\lambda_1, \lambda_2, \ldots, \lambda_n \ge 0$ with $\sum_{i=1}^n \lambda_i = 1$, then

$$f \left( \sum_{i=1}^n \lambda_i \, x_i \right) \le \sum_{i=1}^n \lambda_i f(x_i)$$

Proof: For $n = 1$ this is trivial. The case $n = 2$ corresponds to the definition of convexity (see above). To show that this is true for all natural numbers, we proceed by induction. Assume the inequality holds for some $n$, and consider weights $\lambda_1, \ldots, \lambda_{n+1} \ge 0$ summing to $1$ with $\lambda_{n+1} < 1$. Then

$$f\left( \sum_{i=1}^{n+1} \lambda_i x_i \right) = f\left( \lambda_{n+1} x_{n+1} + (1 - \lambda_{n+1}) \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i \right) \le \lambda_{n+1} f(x_{n+1}) + (1 - \lambda_{n+1}) f\left( \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i \right)$$

by the two-point definition of convexity. The weights $\lambda_i / (1 - \lambda_{n+1})$, $i = 1, \ldots, n$, are non-negative and sum to $1$, so the induction hypothesis bounds the last term by $(1 - \lambda_{n+1}) \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} f(x_i) = \sum_{i=1}^{n} \lambda_i f(x_i)$, which completes the induction step. $\square$
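The finite form of Jensen's inequality can also be checked numerically; here is a minimal sketch with hypothetical points and weights, using the convex function $f(x) = e^x$:

```python
import math

# Jensen's inequality for the convex function f(x) = exp(x),
# with hypothetical points x_i and weights lambda_i.
f = math.exp
xs = [0.0, 1.0, 2.0]
lams = [0.2, 0.5, 0.3]          # non-negative and sums to 1

lhs = f(sum(l * x for l, x in zip(lams, xs)))    # f(weighted mean of x_i)
rhs = sum(l * f(x) for l, x in zip(lams, xs))    # weighted mean of f(x_i)

assert abs(sum(lams) - 1.0) < 1e-12
assert lhs <= rhs                                # Jensen's inequality holds
```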

By interpreting the $\lambda_i$ as a probability distribution over a discrete variable $x$ taking the values $\{x_i\}$, Jensen's inequality can be stated in terms of expectations:

Theorem. Let $f$ be a convex function and $X$ be a random variable; then

$$E[f(X)] \ge f(E[X]).$$

Furthermore, if $f$ is strictly convex (i.e. $f''(x) > 0$), then $E[f(X)] = f(E[X])$ only if $X = E[X]$ with probability 1, i.e. $X$ is constant.
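The probabilistic form can be illustrated with a small discrete example (a sketch with a hypothetical distribution, using the strictly convex function $f(x) = x^2$):

```python
# Probabilistic Jensen's inequality for a hypothetical discrete
# random variable X and the strictly convex function f(x) = x**2.
def f(x):
    return x * x

values = [-1.0, 0.0, 2.0]    # support of X
probs = [0.25, 0.25, 0.5]    # P(X = value); non-negative, sums to 1

e_x = sum(p * x for p, x in zip(probs, values))      # E[X]
e_fx = sum(p * f(x) for p, x in zip(probs, values))  # E[f(X)]

assert e_fx >= f(e_x)    # Jensen: E[f(X)] >= f(E[X])
```

For this choice of $f$, the gap $E[X^2] - (E[X])^2$ is exactly the variance of $X$, which is zero only when $X$ is constant, in agreement with the strict-convexity statement above.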