In a previous article entropy was defined as the expected number of bits in a binary number required to enumerate all the outcomes.  This was expressed as follows:

$$\text{entropy} = H(x) = \sum_{i=1}^{N} \left[ -P(x_i) \log_2 P(x_i) \right]$$

In physics (that is, in nature) the probability distribution that represents a physical process is found to be the one with the maximum entropy given the constraints on the physical system. What are constraints? A constraint is anything the probabilities are required to satisfy. An example of a probabilistic system is a die with 6 sides. For now, pretend you do not know that it is equally likely to show any one of the 6 faces when you roll it. Assume only that it is balanced.

In the case of a die the above summation is equivalent to the following sort of computation:

  • Start with an assumed set of 6 probabilities that sum to 1. This is a given, as the die has to show at least one of the 6 faces unless it stands on edge, Twilight Zone style. Let's assume P(x_i) = 0.05, 0.05, 0.05, 0.05, 0.05, 0.75. You know instinctively this is not correct, but it demonstrates the maximum entropy principle.

The total entropy given these probabilities = 5 * (0.05) * (4.322) + (0.75) * (0.415) = 1.0805 + 0.311 = 1.39 bits
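The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same sum; the variable names `probs` and `terms` are chosen here for illustration and do not come from the earlier article.

```python
import math

# The deliberately lopsided guess from the text: five faces at 0.05, one at 0.75.
probs = [0.05, 0.05, 0.05, 0.05, 0.05, 0.75]

# Each outcome contributes -p * log2(p) bits to the total.
terms = [-p * math.log2(p) for p in probs]
entropy = sum(terms)

print(round(terms[0], 4))   # one 0.05 face: 0.2161
print(round(terms[-1], 4))  # the 0.75 face: 0.3113
print(round(entropy, 2))    # total: 1.39
```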

Let us use our common sense now. We know there are 6 equally probable states that can roll up, so it's easy to calculate the number of bits required.

  • Bits required = $\log_2 6$ = 2.585 bits
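As a quick cross-check, the entropy sum applied to six equal probabilities should give the same number as taking the base-2 logarithm of 6 directly. The helper name `entropy_bits` below is an assumption made for this sketch.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n_faces = 6
uniform = [1.0 / n_faces] * n_faces

print(round(entropy_bits(uniform), 3))  # 2.585
print(round(math.log2(n_faces), 3))     # 2.585
```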

Thus we can see our initial assumption of probabilities yields an entropy number less than we would expect from common sense.   How do we find the maximum entropy possible? 

  • Use the Lagrangian maximization method (Lagrange multipliers).
  • Maximize the entropy expression subject to the constraint that the sum over all probabilities must equal 1:

$$\sum_{i=1}^{N} P(x_i) = 1$$

The Lagrangian is formed as follows:

$$L = \sum_{i=1}^{N} \left[ -P(x_i) \log_2 P(x_i) \right] + \lambda \left( 1 - \sum_{i=1}^{N} P(x_i) \right)$$

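Now, differentiating the Lagrangian with respect to each P(x_i) and setting the derivative equal to 0, we can find the maximum-entropy probabilities. Because the logarithm is base 2, differentiating $P \log_2 P$ produces $\log_2 P + 1/\ln 2$:

$$\frac{\partial L}{\partial P(x_i)} = -\log_2 P(x_i) - \frac{1}{\ln 2} - \lambda = 0$$

$$-\log_2 P(x_i) = \frac{1}{\ln 2} + \lambda$$

Solving for P(x_i) yields

$$P(x_i) = 2^{-\left( \frac{1}{\ln 2} + \lambda \right)}$$

The right-hand side does not depend on i, so all the P(x_i) equal the same constant. Since the probabilities must sum to 1, P(x_i) = 1/N, and thus P(x_i) = 1/6 since N = 6.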
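The result can also be checked numerically. The sketch below does not use the Lagrangian method itself. It swaps in a simple exponentiated-gradient ascent that pushes the probabilities uphill in entropy and renormalizes them after every step so they keep summing to 1. The function names, step size, and iteration count are assumptions chosen for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with p == 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def maximize_entropy(probs, step=0.5, iterations=200):
    """Exponentiated-gradient ascent on entropy (all probs must be > 0)."""
    p = list(probs)
    for _ in range(iterations):
        # Derivative of -p*log2(p) with respect to p is -log2(p) - 1/ln(2).
        grad = [-math.log2(x) - 1.0 / math.log(2) for x in p]
        # Multiplicative update keeps every probability positive.
        weights = [x * math.exp(step * g) for x, g in zip(p, grad)]
        total = sum(weights)
        p = [w / total for w in weights]  # re-impose the sum-to-1 constraint
    return p

# Start from the lopsided guess used earlier in the text.
start = [0.05, 0.05, 0.05, 0.05, 0.05, 0.75]
result = maximize_entropy(start)

print([round(x, 4) for x in result])   # every entry close to 0.1667
print(round(entropy_bits(result), 3))  # 2.585
```

Starting from a different set of positive probabilities should lead to the same place, which is the point of the derivation: with only the sum-to-1 constraint, the uniform distribution is the maximum.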

While this is a lot of work to derive the obvious, there is a purpose. In more complicated situations, where the probability distribution is not obvious, this method still works. One example is Planck's black-body emission curve: given just the quantization of energy levels, you can derive the black-body curve! This principle is woven all through nature. Learn it, because it will serve you well.


