From Richard Feynman’s Lectures on Computation, p. 123:

Now the average information in a message is calculated in standard probabilistic fashion; it is just:

\begin{array}{rcl} \text{Average information} & = & \sum (\text{information in symbol } \alpha_{i}) \\ & & \qquad \cdot\, (\text{expected number of appearances of } \alpha_{i}) \\ & = & -\sum (\log_{2} p_{i}) \times (N p_{i}) \end{array}

which is our previous result. Incidentally, Shannon called this average information the “entropy”, which some think was a big mistake, as it led many to overemphasize the link between information theory and thermodynamics.[1]
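As an illustration that is not part of Feynman's text, the formula can be checked numerically. The sketch below assumes a message of N symbols drawn independently with probabilities p_i; the function name `average_information` and the example distributions are hypothetical choices made here, not anything from the lectures.

```python
import math


def average_information(probabilities, message_length):
    """Average information, in bits, of a message of N symbols.

    Implements -sum(log2(p_i) * (N * p_i)): the information in each
    symbol, -log2(p_i), weighted by its expected number of
    appearances, N * p_i.
    """
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Symbols with p = 0 never appear and contribute nothing.
    return -sum(
        math.log2(p) * (message_length * p)
        for p in probabilities
        if p > 0
    )


# Hypothetical examples: 100 tosses of a fair coin and of a biased coin.
fair = average_information([0.5, 0.5], 100)
biased = average_information([0.9, 0.1], 100)
print(f"fair coin:   {fair:.1f} bits")
print(f"biased coin: {biased:.1f} bits")
```

The fair coin gives 100 bits, one bit per toss, while the biased coin gives roughly 47 bits: a more predictable source carries less information per symbol.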

[1] Legend has it that Shannon adopted this term on the advice of the mathematician John von Neumann, who declared that it would give him “… a great edge in debates because nobody really knows what entropy is anyway.” [RPF]