Archive for the ‘Maximum-Entropy-Principle’ Category

Maximum Entropy Distribution for a Random Variable of Extent [0, Infinity) and a Mean Value Mu

Sunday, June 22nd, 2014

The maximum entropy constraints are as follows:

  • Over the interval [0, infinity)
  •  sum{i=0}{infty}{P(x_i)}=1       …. sum over all probabilities must = 1
  •  sum{i=0}{infty}{{P(x_i)}{x_i}}=mu     …. a given average value AKA the "mean"

The Lagrangian is formed as follows:

 L=sum{i=0}{infty}{{-P(x_i)}{log_2 P(x_i)}}+lambda_0(1-sum{i=0}{infty}{P(x_i)} )+lambda_1(mu-sum{i=0}{infty}{{P(x_i)}{x_i}})  

 {partial L} / {partial P(x_i)}= {-log_2 P(x_i)}-1-lambda_0-lambda_1{x_i}=0    ….setting the derivative equal to zero to find the extremum

 {log_2 P(x_i)}=-1-lambda_0-lambda_1{x_i}

Allowing lambda_0 and lambda_1 to take up the slack in converting the base-2 log into a natural log:

 {P(x_i)}=e^{-1-lambda_0} e^{-lambda_1{x_i}}

Using the sum-of-probabilities = 1 criterion:

 sum{i=0}{infty}{P(x_i)}=e^{-1-lambda_0} {1/{1-e^{-lambda_1}}}=1  

 sum{i=0}{infty}{{x_i}{P(x_i)}}=e^{-1-lambda_0} {e^{-lambda_1}/{(1-e^{-lambda_1})^2}}={mu}    ( see below for the derivation of the infinite sum )

 

Derivation of the mean value infinite sum (with x_i = i):

                 sum{i=0}{infty}{x_i}{e^{-lambda_1{x_i}}}=0+1*e^{-lambda_1}+2*e^{-2{lambda_1}}+3*e^{-3{lambda_1}}+cdots  

           e^{-lambda_1}{sum{i=0}{infty}{x_i}{e^{-lambda_1{x_i}}}}=1*e^{-2{lambda_1}}+2*e^{-3{lambda_1}}+3*e^{-4{lambda_1}}+cdots  

Subtracting, we get the same old geometric series that we all know:

(1-e^{-lambda_1}){sum{i=0}{infty}{x_i}{e^{-lambda_1{x_i}}}}=0+1*e^{-lambda_1}+1*e^{-2{lambda_1}}+1*e^{-3{lambda_1}}+cdots

Rearranging terms:

(1-e^{-lambda_1}){sum{i=0}{infty}{x_i}{e^{-lambda_1{x_i}}}}={e^{-lambda_1}/{1-e^{-lambda_1}}}

{sum{i=0}{infty}{x_i}{e^{-lambda_1{x_i}}}}={e^{-lambda_1}/{(1-e^{-lambda_1})^2}}
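
A quick numerical sanity check of these closed forms (a Python sketch; lambda_1 = 0.5 is an arbitrary illustrative choice, and x_i = i as above):

    import math

    # Arbitrary illustrative multiplier; any positive lambda_1 works.
    lam1 = 0.5
    q = math.exp(-lam1)

    # Truncated brute-force sums stand in for the infinite series (x_i = i).
    N = 1000
    norm_sum = sum(q ** i for i in range(N))        # sum of e^{-lambda_1 i}
    mean_sum = sum(i * q ** i for i in range(N))    # sum of i * e^{-lambda_1 i}

    print(norm_sum, 1 / (1 - q))         # both ~2.5415
    print(mean_sum, q / (1 - q) ** 2)    # both ~3.9177

    # Normalization forces e^{-1-lambda_0} = 1 - q, so the max entropy
    # distribution is the geometric P(i) = (1 - q) * q^i with mean q/(1 - q).
    P = [(1 - q) * q ** i for i in range(N)]
    print(sum(P), sum(i * p for i, p in enumerate(P)))   # ~1.0 and ~1.5415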

Another way of looking at the series:

Infinite Series Multiplication Table – the product of the two series is the sum of all the product entries, ad infinitum:

e^{-3lambda_1}  |  e^{-3lambda_1}
e^{-2lambda_1}  |  e^{-2lambda_1}   e^{-3lambda_1}
e^{-lambda_1}   |  e^{-lambda_1}    e^{-2lambda_1}   e^{-3lambda_1}
1               |  1                e^{-lambda_1}    e^{-2lambda_1}   e^{-3lambda_1}
                |  1                e^{-lambda_1}    e^{-2lambda_1}   e^{-3lambda_1}

The table uses 2 exponential series, each starting with 1: one runs along the bottom row, the other up the left column, and each interior entry is the product of its row and column labels.  The entries along each anti-diagonal are equal: there are n+1 entries of e^{-n{lambda_1}}, so the product of the two series is sum{n=0}{infty}{(n+1)e^{-n{lambda_1}}}=1/{(1-e^{-lambda_1})^2}.  In order to get the same series as the solution in the derivation above, multiply the result by e^{-lambda_1}.

It forms a sort of number wedge or number cone. I wonder if it extends to 3 dimensions?
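
The wedge is just coefficient counting, and a short Python sketch shows it, including the 3 dimensional case wondered about above (convolving a third series turns the wedge into a cone whose layers hold triangular numbers of entries):

    # Convolving the all-ones coefficient lists of two geometric series
    # counts the entries on each anti-diagonal of the wedge above.
    N = 8
    ones = [1] * N
    two = [sum(ones[k] * ones[n - k] for k in range(n + 1)) for n in range(N)]
    print(two)    # [1, 2, 3, ...] -> (n+1) copies of e^{-n lambda_1}

    # A third convolution is the 3 dimensional case: each layer of the
    # cone holds a triangular number of entries, matching 1/(1-q)^3.
    three = [sum(two[k] * ones[n - k] for k in range(n + 1)) for n in range(N)]
    print(three)  # [1, 3, 6, 10, ...]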

Observations  ( Need to complete this ) 

  • ….delay like Z transform
  • continuous form correspondence with discrete form


Long hand division generation of polynomials

Wednesday, November 12th, 2008

Do a long-hand division of 1/{1-x} : the quotient generates the polynomial 1 + x + x^2 + x^3 + cdots

|x| greater than or equal to 1 does not result in convergence of this sum.  However, this algorithm can still be used to do some interesting things.  Let us use a complex value of   x = 0.707+0.707i  

Each power of x yields a result one step around the unit circle.  Thus this series is the Z transform of the associated sequence:  [1,0] , [0.707,0.707] , [0,1] ……. the imaginary parts of which form the sequence  Sin( (n-1)*pi/{4})  

Thus the Z transform of this sequence is:   1/{1-(0.707+0.707i)*x}    

If you want to express it in terms of n instead of n-1, you can multiply by 1/x.  Since x is the placeholder, it is easy to see that if you want to slide a series one unit to the left, you divide by x. 

 Sin( n*pi/{4})    : note this series starts at 45 degrees phase!
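
A quick Python check of this stepping behavior, using the exact unit circle value e^{i pi/4} in place of the rounded 0.707 + 0.707i:

    import cmath, math

    # x sits on the unit circle at 45 degrees, so each power of x
    # advances one 45 degree step around the circle.
    x = cmath.exp(1j * math.pi / 4)
    for n in range(9):
        z = x ** n
        # The imaginary parts trace out the sine sequence.
        print(n, round(z.real, 3), round(z.imag, 3), round(math.sin(n * math.pi / 4), 3))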

 


Z transform of an exponentially decaying sequence

Tuesday, November 11th, 2008

The series:   x^n      doubleleftright 1+ x + x^2 + x^3 + ....  = 1/{1-x}    n=0,1,2...   This converges for |x| < 1.  Both of these expressions are the Z transform of the  x^n   exponential decay sequence.  The closed form is easier to deal with because it is compact.  The following diagram uses a decay sequence with  x = 0.75 

 

The filter takes whatever input it is fed and every interval multiplies it by 0.75.   To see the filter's time response you can thus feed in a single ping.  This results in the filter output tracing out a special response, called the impulse response.  It is the same as the filter plot above.
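
Here is a minimal Python sketch of such a filter, assuming the simplest first-order recursion (each interval the stored value decays by the ratio 0.75 and the current input is added in):

    # First-order decay filter: y[n] = 0.75 * y[n-1] + input[n]
    def decay_filter(inputs, ratio=0.75):
        y, out = 0.0, []
        for u in inputs:
            y = ratio * y + u
            out.append(y)
        return out

    # Feeding in a single ping (unit impulse) traces out the impulse
    # response, which is just the sequence 0.75^n.
    ping = [1.0] + [0.0] * 7
    print(decay_filter(ping))   # [1.0, 0.75, 0.5625, 0.421875, ...]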

The exponential decay is maximum entropy.  That is to say, this is how concentrated things soak out into the rest of the world as they become more dilute. 

2 dimensional determinant of a matrix animation showing it is equal to the area of the parallelogram

Tuesday, November 4th, 2008

The 2 dimensional determinant of a matrix can be interpreted as the area of a parallelogram as shown in the following diagram.

This carries on through higher dimensions.  The following depicts a 3 variable system.

The rows r1, r2, r3 are each vectors. The various vector sums taken 1, 2, and 3 at a time define a parallelepiped. 
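
A small Python sketch of both facts, with made-up example vectors:

    # 2-D: |det| of the 2x2 matrix with rows r1, r2 equals the area of
    # the parallelogram spanned by r1 and r2.
    def det2(r1, r2):
        return r1[0] * r2[1] - r1[1] * r2[0]

    print(abs(det2((3, 1), (1, 2))))   # 5 = parallelogram area

    # 3-D: |det| of the 3x3 matrix with rows r1, r2, r3 equals the
    # volume of the parallelepiped they span.
    def det3(r1, r2, r3):
        (a, b, c), (d, e, f), (g, h, i) = r1, r2, r3
        return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

    print(abs(det3((1, 0, 0), (1, 2, 0), (0, 1, 3))))   # 6 = parallelepiped volume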

 

The following excerpt is from X and may yield some insight when the maximum entropy principle is applied. ( Still working on this. )

 

 

Derivation of the Normal Gaussian distribution from physical principles

Monday, August 25th, 2008

In many physical systems the question arises: what is the probability distribution that describes a system with a given expected energy E over the interval from -infinity to +infinity?  Again you will use the maximum entropy principle to determine this.

The constraints are as follows:

  •  sum{i=1}{N}{P(x_i)}=1       …. sum over all probabilities must = 1
  •  sum{i=1}{N}{P(x_i){x_i}}=mu     …. a given average value AKA the "mean"
  •  sum{i=1}{N}{P(x_i){x_i}^2}=E     ….. a given "energy" AKA the "variance" (the second moment, taken about a zero mean)

The Lagrangian is formed as follows:

 L=sum{i=1}{N}{{-P(x_i)}{log_2 P(x_i)}}+lambda_0(1-sum{i=1}{N}{P(x_i)} )+lambda_1(mu-sum{i=1}{N}{{P(x_i)}{x_i}})+lambda_2(E-sum{i=1}{N}{{P(x_i)}{x_i}^2})  

 {partial L} / {partial P(x_i)}= {-log_2 P(x_i)}-1-lambda_0-lambda_1{x_i}-lambda_2{x_i}^2=0    ….setting the derivative equal to zero to find the extremum

Now the problem is to solve for the lambda coefficients.   I use a trick: I assume the curve is centered on the Y axis.   The curve must have the same amount of entropy on the left side of the Y axis as on the right, or entropy will not be maximized.  That means the probability curve must be "even", i.e. symmetric about the Y axis, so its exponent can contain only even powers of x and the lambda_1{x_i} term must vanish.  Folding the constant -1-lambda_0 into the eventual normalization reduces the previous Lagrangian condition to:

 {partial L} / {partial P_i}= {-log_2 P(x_i)}-lambda_2{x_i}^2=0 

 {log_2 P(x_i)} = -lambda_2{x_i}^2 

   P(x_i) = e^{-lambda_2{x_i}^2}          …..which you will recognize as the gaussian distribution

 

 

Solve for the coefficient using the identity for the integral of the normal distribution from -infinity to +infinity.

Using the identity and setting it to 1: 

int{-infty}{+infty}{e^{-{(x)^2/{2sigma^2}}} dx}=sigma{sqrt{2pi}} =1     yields  sigma^2 = 1/{2pi};  matching  e^{-lambda_2{x}^2}  against  e^{-{x^2}/{2sigma^2}}  gives  lambda_2=1/{2sigma^2}={pi} 

         The resultant distribution is:      {e^{-{pi}x^2}}    which is the normal distribution.
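
A quick trapezoid rule integration in Python confirms that lambda_2 = pi normalizes the distribution:

    import math

    # Check that e^{-pi x^2} integrates to 1; the tails beyond |x| = 10
    # are negligible, so a finite interval suffices.
    def p(x):
        return math.exp(-math.pi * x * x)

    N, lo, hi = 100_000, -10.0, 10.0
    h = (hi - lo) / N
    area = h * (sum(p(lo + k * h) for k in range(1, N)) + 0.5 * (p(lo) + p(hi)))
    print(area)   # ~1.0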

 Now for the other coefficients, the job is made easier by observing that the distribution can only retain this even-about-the-mean form if the polynomial is of the form  (x-mu)^2 : this form can propagate along the X axis without the distribution changing shape.  A Gaussian wave packet cannot change shape because it is already at max entropy.  If it changed shape it would decrease entropy, which requires force and increases its energy.  But that would be a change in state, which we are assuming is not happening.

 

Observations

  • The base state of the quantum harmonic oscillator is gaussian: it is maximum entropy.  
  • A gaussian wave packet cannot become misshapen because it is already max entropy.  If it changes shape it decreases entropy, which requires force.
  • Is the gaussian the base state of the wave packet?  Is it possible to have forms of the higher energy states?

 

The Maximum Entropy Principle – The distribution with the maximum entropy is the distribution nature chooses

Sunday, August 17th, 2008

In a previous article entropy was defined as the expected number of bits in a binary number required to enumerate all the outcomes.  This was expressed as follows:

entropy= H(x)= sum{i=1}{N}{delim{[}{-P(x_i) * log_2 P(x_i) }{]}} 

In physics ( nature ) it is found that the probability distribution that represents a physical process is the one that has the maximum entropy given the constraints on the physical system.   What are constraints?  An example of a probabilistic system is a die with 6 sides.  For now pretend you do not know that it is equally likely to show any 1 of the 6 faces when you roll it.  Assume only that it is balanced.

In the case of a die the above summation is equivalent to the following sort of computation:

  • Initial assumption: a set of 6 probabilities that sum up to 1  … this is a given, as it has to be at least one of the 6 faces unless it stands on edge Twilight Zone style.  Let's assume P(x_i) = 0.05, 0.05, 0.05, 0.05, 0.05, 0.75  …. you know instinctively this is not correct, but it demonstrates the maximum entropy principle

The total entropy given these probabilities = (0.05)(4.322)(5) + (0.75)(0.415) = 1.0805 + 0.311 = 1.39 bits

Let us use our common sense now.  We know there are 6 equally probable states that can roll up, so it is easy to calculate the number of bits required.  

  • Bits required = log_2 6 = 2.585 bits 
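
The same arithmetic in a short Python sketch:

    import math

    def entropy_bits(probs):
        # H = sum of -p * log2(p); zero-probability outcomes contribute nothing
        return sum(-p * math.log2(p) for p in probs if p > 0)

    skewed = [0.05] * 5 + [0.75]
    print(entropy_bits(skewed))        # ~1.39 bits, matching the hand computation
    print(entropy_bits([1 / 6] * 6))   # ~2.585 bits = log2(6)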

Thus we can see our initial assumption of probabilities yields an entropy number less than we would expect from common sense.   How do we find the maximum entropy possible? 

  • Use the Lagrangian maximization method. 
  • Maximize the entropy expression with the constraint that  

          sum{i=1}{N}{P(x_i)}=1       …. sum over all probabilities must = 1

The Lagrangian is formed as follows:

     L=sum{i=1}{N}{delim{[}{-P(x_i) * log_2 P(x_i) }{]}}+lambda(1-sum{i=1}{N}{delim{[}{P(x_i)}{]}}  )  

Now differentiating the Lagrangian and setting the derivative = 0, we can find the maximum entropy probability:

     {partial L} / {partial P_i}= {-log_2 P(x_i)}-1-{lambda}=0 

     {-log_2 P(x_i)}=1+{lambda}    solving for P(x_i) yields

     {P(x_i)}= 2^{-(1+{lambda})}   All the P(x_i) equal the same constant, and with the probabilities summing to 1, P(x_i)=1/6 since N=6
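
As a brute-force cross-check of the Lagrangian result, random probability assignments over the 6 faces never beat the uniform one (a sketch, not a proof):

    import math, random

    def entropy_bits(probs):
        return sum(-p * math.log2(p) for p in probs if p > 0)

    random.seed(1)
    best = 0.0
    for _ in range(100_000):
        w = [random.random() for _ in range(6)]
        total = sum(w)
        best = max(best, entropy_bits([v / total for v in w]))

    print(best)                        # creeps toward, but never exceeds,
    print(entropy_bits([1 / 6] * 6))   # log2(6) = 2.585 bits, the uniform value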

While this is a lot of work to derive the obvious, there is a purpose: in more complicated situations where the probability distribution is not obvious, this method still works.  For example, take the black body emission curve of Planck: given just the quantization of energy levels you can derive the black body curve!  This principle is woven all through nature.  Learn it, because it will serve you well. 

Some interesting Notes to myself — myself? I meant me.

Use of Maximum Entropy to explain the form of Energy States of an Electron in a Potential Well

Friday, May 23rd, 2008

The base state of an electron in an infinite potential well has the most "space" for the electron state.  Thus it has the maximum entropy.  Take that same state and imagine pinching the electron's existence to nil in the middle of the trough.  Now you have state-2.  The electron now exists in a smaller entropic state, and guess what?  It now contains exploitable energy.  This is like a compressed spring: the electron can decompress and exert force / expend energy.  For example, in an interaction with another atom a recoil could possibly occur.  In a crystal lattice an electron can transfer its energy to the atom next door and in effect yield conduction.  All these are preliminary suppositions subject to more scrutiny.

[Figure: electron-in-infinite-well.bmp]

As mentioned before, since the electron exists in this potential well in a state of free fall it cannot have any acceleration.  Thus its distribution must thoroughly avoid the edges of the well, where it would indeed experience accelerations by bouncing and recoiling off of the walls.