Shannon, information contained in a message in a noiseless channel:

N items (messages, characters), all equally likely
Pick a log base to set the unit (base 2 gives bits)
Information in selecting one of the N items is log N, i.e. the bit count needed to index it.
Each item contributes 1/N of that to the average, i.e. (1/N) log N
log N = -log (1/N)
1/N is the probability p of a particular message.
Therefore the entropy contribution of one char/message is -p log p
Total, H(X) = SUM -p log p (= log N when every p = 1/N)
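
A quick numeric check of the derivation above, as a minimal sketch. The function name entropy_bits and the choice N = 8 are illustrative assumptions, not from any source. It shows that one -p log p term equals (1/N) log N and that the N terms sum back to log N.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(X) = SUM -p log2 p, in bits."""
    # Terms with p == 0 are skipped; they contribute nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

N = 8                                   # illustrative number of equally likely messages
p = 1 / N                               # probability of one particular message
one_term = -p * math.log2(p)            # contribution of a single message
total = entropy_bits([p] * N)           # sum over all N messages

print(one_term)   # (1/N) log2 N
print(total)      # log2 N
```

With N = 8 the single term is 3/8 of a bit and the total is 3 bits, i.e. the bit count needed to index one of 8 items.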

***************

Shannon source coding (noiseless) theorem.

For symbols of equal probability:
Average information per symbol = entropy (H) = log (base 2) of the number of possible symbols (e.g. number of letters),
expressed in bits.
Total entropy of a message is the sum over its symbols.

In general, for symbols of different probabilities, it is the probability-weighted average number of bits per symbol.
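
A small worked case for unequal probabilities, as a sketch only. The three-symbol source and the prefix code are made-up examples chosen so the numbers come out exactly. It compares the weighted-average entropy with the average code length, and sums the per-symbol information over a short message.

```python
import math

# Hypothetical source and prefix code, chosen for illustration.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
code = {"a": "0", "b": "10", "c": "11"}

# Weighted average information per symbol: SUM -p log2 p.
entropy = sum(-p * math.log2(p) for p in probs.values())

# Weighted average code length in bits per symbol.
avg_len = sum(probs[s] * len(code[s]) for s in probs)

# What the equal-probability formula would give for 3 symbols.
equal_prob_bits = math.log2(len(probs))

# Total information of one message: sum of -log2 p over its symbols.
message = "aabc"
message_info = sum(-math.log2(probs[s]) for s in message)
encoded = "".join(code[s] for s in message)

print(entropy, avg_len, equal_prob_bits)
print(message_info, encoded, len(encoded))
```

Here the entropy and the average code length are both 1.5 bits per symbol, below the log2 3 (about 1.585) that equal probabilities would need, and the 4-symbol message carries 6 bits, the same as its encoded length.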

Q. If it's noiseless and temperature is noise, then is Shannon entropy just energy? i.e. it represents the limit where alphabets are most efficient.

Q. How does the number of possible symbols relate to the number of possible states?

Information Is Not Uncertainty
http://www.lecb.ncifcrf.gov/~toms/information.is.not.uncertainty.html
Information is always a measure of the decrease of uncertainty at a receiver (or molecular machine).
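
A minimal sketch of "decrease of uncertainty", assuming the relation information = uncertainty before - uncertainty after, as I read the linked page. The receiver and its probabilities are hypothetical, picked to give whole numbers.

```python
import math

def uncertainty_bits(probs):
    """Uncertainty H = SUM -p log2 p, in bits; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical receiver: four equally likely symbols before reception.
before = [0.25, 0.25, 0.25, 0.25]
# After a noisy reception, two symbols are ruled out and two remain equally likely.
after = [0.5, 0.5, 0.0, 0.0]

h_before = uncertainty_bits(before)
h_after = uncertainty_bits(after)
information = h_before - h_after      # decrease in uncertainty at the receiver

print(h_before, h_after, information)
```

In the noiseless case the uncertainty after reception is zero, so the information gained equals the uncertainty before, which is why the two get conflated.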

Information Theory Primer With an Appendix on Logarithms
Postscript version: ftp://ftp.ncifcrf.gov/pub/delila/primer.ps
Web versions: http://www.lecb.ncifcrf.gov/toms/paper/primer/
http://www.ccrnp.ncifcrf.gov/~toms/paper/primer/latex/index.html

tag as paper

Source coding and channel coding are the same,
i.e. a message with noise is the same as re-encoding from alphabet 1 to alphabet 2.

Is there a way to merge them?

The semantics of information theory are muddled.