=====================================================================
project: Matrix v1.0
 author: Jean-Michel RICHER
  email: jean-michel.richer@univ-angers.fr
   date: February, 2016 
=====================================================================

The README file shows that some dimensions (1024, 2048, 3072) cause a
cache access problem for implementation 1, which is the implementation
that follows the mathematical formula directly.
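
As a reminder, the formula for the product of two square matrices is
c[i][j] = sum over k of a[i][k] * b[k][j]. The sketch below is one way to
write it, assuming row-major storage in flat arrays. The names
matmul_naive, a, b, c and n are hypothetical and are not taken from the
project sources.

```python
def matmul_naive(a, b, n):
    """Multiply two n x n matrices stored row-major in flat lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                # a is read along a row (stride of 1 element),
                # b is read down a column (stride of n elements).
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c
```

Python is used here only to show the access order; it is not expected to
reproduce the timings. The point to notice is the stride of n elements
between two consecutive reads of b in the inner loop.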

For example on an Intel Core i5 4570 we have the following execution times:

_________________
dimension | time
_________________
   1023   | 2.781
   1024   | 4.860 !!
   1025   | 2.496

The time for a square matrix of size 1024 should be around 2.xxx, in line
with the times for sizes 1023 and 1025.


---------------------------------------------------------------------
Intel VTune Amplifier
---------------------------------------------------------------------

The sample analysis shows that L2 cache misses are roughly 7 times more
frequent for dimension 1024 than for dimension 1025.
 
__________________________________________
dimension | L2_RQSTS.DEMAND_DATA_RD_MISS
__________________________________________
   1024   |        1,094,216,413
   1025   |          153,202,298
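
A common explanation for this kind of gap is set conflicts: when the row
length in bytes is a large power of two, the successive elements of a
column map to only a few cache sets. The sketch below counts how many
distinct sets are touched when walking one column. It assumes 4-byte
elements, 64-byte cache lines, an L2 cache with 512 sets (256 KiB, 8-way)
and a matrix starting at address 0. These parameters and the function
name sets_touched are assumptions made for illustration, not values taken
from the project or from the VTune report.

```python
def sets_touched(n, elem_size=4, line_size=64, num_sets=512):
    """Count the distinct cache sets hit when walking one column of an
    n x n row-major matrix whose base address is assumed to be 0."""
    sets = set()
    for k in range(n):
        address = k * n * elem_size  # byte offset of element b[k][0]
        sets.add((address // line_size) % num_sets)
    return len(sets)


for n in (1023, 1024, 1025):
    print(n, sets_touched(n))
```

Under these assumptions, dimension 1024 touches only 8 sets, while 1023
and 1025 touch all 512. With 8 ways per set, 8 sets can hold 64 lines,
far fewer than the 1024 lines needed to keep one column in cache, which
would be consistent with the higher miss count measured for 1024.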


