SHORT Double Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0  SIMD_COMP_INST_RETIRED_PACKED_DOUBLE
PMC1  SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
CPI  FIXC1/FIXC0
DP [MFLOP/s]    1.0E-06*(PMC0*2.0+PMC1)/time
Packed [MUOPS/s] 1.0E-06*PMC0/time
Scalar [MUOPS/s] 1.0E-06*PMC1/time
Vectorization ratio [%] 100*PMC0/(PMC0+PMC1)

LONG
Formulas:
DP [MFLOP/s] = 1.0E-06*(SIMD_COMP_INST_RETIRED_PACKED_DOUBLE*2+SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE)/time
Packed [MUOPS/s] = 1.0E-06*SIMD_COMP_INST_RETIRED_PACKED_DOUBLE/time
Scalar [MUOPS/s] = 1.0E-06*SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE/time
Vectorization ratio [%] = 100*SIMD_COMP_INST_RETIRED_PACKED_DOUBLE/(SIMD_COMP_INST_RETIRED_PACKED_DOUBLE+SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE)
-
Profiling group to measure double-precision SSE FLOPs. Keep in mind that your code
might also execute x87 FLOPs, which the events in this group do not count.
Comparing SIMD_COMP_INST_RETIRED_PACKED_DOUBLE with
SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE shows how well your code was vectorized.
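
As a rough illustration of how the derived metrics follow from the raw counts,
the sketch below recomputes them in Python. It is not part of LIKWID: the
function name dp_metrics and the sample counts are hypothetical, and the
vectorization ratio is assumed here to mean the packed share of all counted
SSE double instructions, packed/(packed+scalar). The factor 2.0 mirrors the
DP [MFLOP/s] formula above, where each packed instruction is weighted as two
floating-point operations.

```python
def dp_metrics(packed, scalar, time_s):
    """Recompute the group's derived metrics from raw counter values.

    packed -- count of SIMD_COMP_INST_RETIRED_PACKED_DOUBLE (PMC0)
    scalar -- count of SIMD_COMP_INST_RETIRED_SCALAR_DOUBLE (PMC1)
    time_s -- measured runtime in seconds ("time" in the formulas)
    """
    total = packed + scalar
    return {
        # Each packed instruction is weighted as two double-precision operations.
        "DP [MFLOP/s]": 1.0e-06 * (packed * 2.0 + scalar) / time_s,
        "Packed [MUOPS/s]": 1.0e-06 * packed / time_s,
        "Scalar [MUOPS/s]": 1.0e-06 * scalar / time_s,
        # Assumption of this sketch: ratio = packed share of all counted
        # instructions; report 0 when nothing was counted to avoid dividing by zero.
        "Vectorization ratio [%]": 100.0 * packed / total if total else 0.0,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = dp_metrics(packed=3_000_000, scalar=1_000_000, time_s=2.0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A purely scalar run gives a ratio of 0 and a purely packed run gives 100, so
the value can be read directly as a percentage.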


