* *: How does BLAS get such extreme performance?
* How to optimize GEMM: http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/
* ulmBLAS: http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/
* CPU intrinsics optimizations: http://www.cnblogs.com/zyl910/
* book1: Automatic Blocking of Nested Loops
* book2: The Science of Programming Matrix Computations
Amazing topics!