Archive for November 21st, 2008

Find the equilateral triangle

Friday, November 21st, 2008

Show that the curve:

x^3 + 3xy + y^3 = 1

contains only one set of three distinct points, A, B, and C, which are vertices of an equilateral triangle, and find its area.

The “curve” x^3 + 3xy + y^3 - 1 = 0 is actually reducible, because the left side factors as

(x + y - 1)(x^2 - xy + y^2 + x + y + 1).

Moreover, the second factor is

\frac{1}{2}\left((x + 1)^2 + (y + 1)^2 + (x - y)^2\right)

which is a sum of squares, so over the reals it vanishes only at (-1,-1). Thus the curve in question consists of the single point (-1,-1) together with the line

x + y = 1

To form a triangle with three points on this curve, one of its vertices must be (-1,-1), since three points on the line would be collinear. The other two vertices lie on the line x + y = 1, so the length of the altitude from (-1,-1) is the distance from (-1,-1) to (1/2,1/2), or $3\sqrt{2}/2$. The altitude fixes the side length, so the other two vertices are determined and the triangle is unique. The area of an equilateral triangle of height h is $h^2\sqrt{3}/3$, so the desired area is $3\sqrt{3}/2$.
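As a sanity check on the argument above, here is a minimal Python sketch. It spot-checks the factorization on a small integer grid and rebuilds the triangle from the altitude. The names `curve`, `factored`, `apex`, `foot`, `B` and `C` are my own labels for illustration, not part of the original problem.

```python
import math

def curve(x, y):
    """Left side of x^3 + 3xy + y^3 - 1 = 0."""
    return x**3 + 3*x*y + y**3 - 1

def factored(x, y):
    """Product of the linear and quadratic factors from the text."""
    return (x + y - 1) * (x*x - x*y + y*y + x + y + 1)

# Spot-check the factorization on an integer grid (exact arithmetic).
for x in range(-3, 4):
    for y in range(-3, 4):
        assert curve(x, y) == factored(x, y)

apex = (-1.0, -1.0)                     # the isolated point of the curve
foot = (0.5, 0.5)                       # foot of the altitude on x + y = 1
h = math.dist(apex, foot)               # altitude length
side = 2 * h / math.sqrt(3)             # side of an equilateral triangle of height h

# Unit vector along the line x + y = 1.
ux, uy = 1 / math.sqrt(2), -1 / math.sqrt(2)
B = (foot[0] + side / 2 * ux, foot[1] + side / 2 * uy)
C = (foot[0] - side / 2 * ux, foot[1] - side / 2 * uy)

sides = [math.dist(apex, B), math.dist(B, C), math.dist(C, apex)]
area = 0.5 * side * h
```

In this sketch the other two vertices come out as (1/2 ± √3/2, 1/2 ∓ √3/2), up to floating-point rounding.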

Remark:

The factorization used above is the special case z = -1 of the fact that

 x^3 + y^3 + z^3 - 3xyz = (x + y + z)(x + \omega y + \omega^2 z)(x + \omega^2 y + \omega z),

where $\omega$ denotes a primitive cube root of unity. That fact in turn follows from the evaluation of the determinant of the circulant matrix

\begin{pmatrix} x & y & z \\ z & x & y \\ y & z & x \end{pmatrix}

by reading off the eigenvalues of the eigenvectors $(1, \omega^i, \omega^{2i})$ for $i = 0, 1, 2$.
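The remark can be checked numerically. The sketch below uses only the Python standard library and compares the determinant of the circulant matrix, expanded by cofactors, with the product of the three eigenvalues. The function names are hypothetical helpers of mine, and the comparison is approximate because the cube root of unity is a floating-point complex number.

```python
import cmath

# A primitive cube root of unity.
omega = cmath.exp(2j * cmath.pi / 3)

def det_circulant(x, y, z):
    """Determinant of [[x, y, z], [z, x, y], [y, z, x]] by cofactor expansion."""
    return x * (x*x - y*z) - y * (z*x - y*y) + z * (z*z - x*y)

def eigenvalues(x, y, z):
    """Eigenvalue belonging to (1, omega^i, omega^(2i)) for i = 0, 1, 2."""
    return [x + y * omega**i + z * omega**(2 * i) for i in range(3)]

def product(values):
    """Multiply a list of numbers together."""
    result = 1
    for v in values:
        result *= v
    return result

x, y, z = 2, 3, 5
lhs = x**3 + y**3 + z**3 - 3*x*y*z      # left side of the identity
det = det_circulant(x, y, z)            # same value, via the determinant
rhs = product(eigenvalues(x, y, z))     # product of the three linear factors
```

Setting z = -1 in `det_circulant` gives x^3 + 3xy + y^3 - 1, which is the polynomial from the problem.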

Nvidia Tesla Desktop SuperComputer Notes

Friday, November 21st, 2008

Nvidia has developed a desktop supercomputer based on graphics processing units (GPUs). I will keep my notes on the subject here.

This is probably a good area to get into early. Few people currently have experience with these computational techniques, and GPU computing is likely to enable many new applications.

GPU processing architecture

In a GPU, more transistors are devoted to data processing than to data caching and flow control. More specifically, the GPU is especially well suited to problems that can be expressed as data-parallel computations (the same program is executed on many data elements in parallel) with high arithmetic intensity (the ratio of arithmetic operations to memory operations). Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control; and because it is executed on many data elements and has high arithmetic intensity, the memory access latency can be hidden with calculations instead of big data caches.
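To make the two terms concrete, here is a small CPU-side sketch in Python. It is not GPU code. It applies the same function to every element of a list, which is the data-parallel pattern, and compares a crude arithmetic-intensity count for two kernels. The kernels, the helper names, and the operation counts (one load and one store per element, coefficients held in registers) are simplifying assumptions of mine.

```python
def scale_and_shift(x, a=2.0, b=1.0):
    """Low arithmetic intensity: 2 arithmetic ops on one loaded element."""
    return a * x + b

def polynomial(x, coeffs=(1.0, 2.0, 3.0, 4.0, 5.0)):
    """Higher arithmetic intensity: Horner's rule does 2 ops per coefficient
    on one loaded element (coefficients assumed to stay in registers)."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def arithmetic_intensity(arithmetic_ops, memory_ops):
    """Ratio of arithmetic operations to memory operations."""
    return arithmetic_ops / memory_ops

def data_parallel_map(kernel, data):
    """Run the same kernel on every element. No element depends on another,
    so a GPU could give each call its own thread; here it is a plain loop."""
    return [kernel(x) for x in data]

data = [0.0, 1.0, 2.0]
shifted = data_parallel_map(scale_and_shift, data)
evaluated = data_parallel_map(polynomial, data)

# Assumed cost model: one load and one store per element for both kernels.
low = arithmetic_intensity(2, 2)        # scale_and_shift
high = arithmetic_intensity(2 * 5, 2)   # polynomial with 5 coefficients
```

Under this rough model the polynomial kernel does five times as much arithmetic per memory operation, which is the kind of workload where calculation can cover memory latency.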

On page 57 of reference #1, a matrix multiplication example is worked through with code. Matrix multiplication is central to many types of simulation.
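For my own notes, here is a minimal Python sketch of how matrix multiplication splits into independent pieces of work. It is not the code from reference #1 and it does not run on a GPU. The function names `matmul_element` and `matmul` are hypothetical, and the mapping of one output element to one thread is the usual way I understand the decomposition, stated here as an assumption.

```python
def matmul_element(A, B, row, col):
    """Compute the single output element C[row][col].
    On a GPU this would be the work assigned to one thread (assumption)."""
    acc = 0.0
    for k in range(len(B)):
        acc += A[row][k] * B[k][col]
    return acc

def matmul(A, B):
    """Multiply A (n x m) by B (m x p) as nested lists.
    Every output element is independent, so the two loops over (row, col)
    are the part a GPU would run in parallel; here they run sequentially."""
    rows, cols = len(A), len(B[0])
    return [[matmul_element(A, B, r, c) for c in range(cols)]
            for r in range(rows)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
```

Each call to `matmul_element` reads a row of A and a column of B and does one multiply and one add per pair, so without any data reuse the arithmetic intensity is low. That is why GPU versions typically work hard to reuse loaded values.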

References

  1. NVIDIA CUDA Compute Unified Device Architecture
  2. Thread - A thread of execution is a fork of a program into two or more concurrently running tasks. The implementation of threads and processes differs from one operating system to another, but in general, a thread is contained inside a process and different threads in the same process share some resources (most commonly memory), while different processes do not.