Enabling HVX on the Hexagon DSP using instruction intrinsics

I was using Hexagon-SDK 3.0 to compile my sample application for the HVX DSP architecture. Many Hexagon-LLVM tools are available in the folder:

~/Qualcomm/HEXAGON_Tools/7.2.12/Tools/bin

I wrote a small example that calculates the product of two arrays to make sure I can use HVX hardware acceleration. However, when I generate assembly, either with -S or with -S -emit-llvm, I don't find any HVX instructions such as vmem, vX, etc. My C application runs on hexagon-sim for now, until I find a way to run it on the board as well.

As far as I understand, I need to write the HVX part of my code with C intrinsics, but I was not able to adapt the existing examples to my own needs. It would be great if somebody could demonstrate how this can be done. Also, many of the intrinsics are not defined in the [Hexagon V62 Programmer's Reference Manual][1].

Here is my small app in pure C:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#if defined(__hexagon__)
#include "hexagon_standalone.h"
#include "subsys.h"
#endif
#include "io.h"
#include "hvx.cfg.h"


#define KERNEL_SIZE     9
#define Q               8
#define PRECISION       (1<<Q)

double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i+=4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}


int main (int argc, char* argv[])
{
    int n = 4;  /* number of elements; matches the length used in the calls below */
    long long start_time, total_cycles;
/* -----------------------------------------------------*/
/*  Allocate memory for input/output                    */
/* -----------------------------------------------------*/
    //double *res  = memalign(VLEN, 4 *sizeof(double));
    const double *x  = memalign(VLEN, n *sizeof(double));
    const double *y  = memalign(VLEN, n *sizeof(double));

    /* Check the pointers themselves, not the values they point to */
    if (x == NULL || y == NULL) {
        printf("Error: Could not allocate memory for the input arrays\n");
        return 1;
    }
    #if defined(__hexagon__)
        subsys_enable();
        SIM_ACQUIRE_HVX;
    #if LOG2VLEN == 7
        SIM_SET_HVX_DOUBLE_MODE;
    #endif
    #endif

    /* -----------------------------------------------------*/                                                
    /*  Call function                                       */
    /* -----------------------------------------------------*/
    RESET_PMU();
    start_time = READ_PCYCLES();

    vectors_dot_prod2(x,y,n);

    total_cycles = READ_PCYCLES() - start_time;
    DUMP_PMU();



    printf("Array product of x[i] * y[i] = %f\n",vectors_dot_prod2(x,y,4));

    #if defined(__hexagon__)
        printf("AppReported (HVX%db-mode):  Array product of x[i] * y[i] =%f\n", VLEN, vectors_dot_prod2(x,y,4));
    #endif

    return 0;
}

I compile it using hexagon-clang:

hexagon-clang -v  -O2 -mv60 -mhvx-double -DLOG2VLEN=7 -I../../common/include -I../include -DQDSP6SS_PUB_BASE=0xFE200000 -o arrayProd.o  -c  arrayProd.c

Then I link it with subsys.o (found in the SDK and already compiled) and -lhexagon to generate my executable:

hexagon-clang -O2 -mv60 -o arrayProd.exe  arrayProd.o subsys.o -lhexagon

Finally, run it using the sim:

hexagon-sim -mv60 arrayProd.exe

1 Answer

#1

A bit late, but might still be useful.

Hexagon Vector eXtensions (HVX) instructions are not emitted automatically, and the current instruction set (as of 8.0 SDK) supports only integer operations, so the compiler will not emit any HVX code for C code that uses the "double" type. This is similar to SSE programming, where you have to pack the xmm registers manually and use SSE intrinsics to do what you need.

You need to define what your application really requires. For example, if you are writing something 3D-related and really need to calculate double (or float) dot products, you might convert your floats to 16.16 fixed point and then use instructions (that is, C intrinsics) like Q6_Vw_vmpyio_VwVh and Q6_Vw_vmpye_VwVuh to emulate fixed-point multiplication.
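
To make the 16.16 idea concrete, here is a minimal scalar sketch in portable C (no HVX involved). The type and helper names (q16_16, q16_from_float, q16_mul, q16_mul_split, q16_dot) are made up for illustration. q16_mul_split builds the product from two word-times-halfword multiplies, which is the same general shape as the two intrinsics named above; whether it matches their exact shift and rounding behaviour is an assumption that should be checked against the reference manual.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits (illustrative type). */
typedef int32_t q16_16;

/* Convert a float to Q16.16, rounding to nearest. */
static q16_16 q16_from_float(float f)
{
    return (q16_16)(f * 65536.0f + (f >= 0.0f ? 0.5f : -0.5f));
}

/* Convert a Q16.16 value back to float. */
static float q16_to_float(q16_16 q)
{
    return (float)q / 65536.0f;
}

/* Reference multiply: 64-bit product, then drop 16 fractional bits.
   Assumes >> on a negative value is an arithmetic shift (true on mainstream compilers). */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}

/* Same result from two "word x halfword" products:
   b = b_hi * 65536 + b_lo, so (a * b) >> 16 = a * b_hi + ((a * b_lo) >> 16). */
static q16_16 q16_mul_split(q16_16 a, q16_16 b)
{
    int32_t  b_hi = b >> 16;                /* signed upper halfword   */
    uint32_t b_lo = (uint32_t)b & 0xFFFFu;  /* unsigned lower halfword */
    int64_t hi_part = (int64_t)a * b_hi;
    int64_t lo_part = ((int64_t)a * (int64_t)b_lo) >> 16;
    return (q16_16)(hi_part + lo_part);
}

/* Dot product of two Q16.16 arrays; accumulate in 64 bits, scale once at the end. */
static q16_16 q16_dot(const q16_16 *x, const q16_16 *y, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * (int64_t)y[i];
    return (q16_16)(acc >> 16);
}
```

On the DSP, the inner multiply would be replaced by the vector intrinsics, so that many lanes are processed per instruction instead of one.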

To "enable" HVX you should use HVX-related types defined in

#include <hexagon_types.h>
#include <hexagon_protos.h>

The instructions like 'vmem' and 'vmemu' are emitted automatically for statements like

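// I assume 64-byte mode, no `-mhvx-double`. For 128-byte mode use a 32-int array.
// The array should be aligned to the vector size, because vmem is an aligned load.
int values[16] __attribute__((aligned(64))) = { 1, 2, 3, ..... };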

/* The following line compiles to 
     {
          r4 = __address_of_values
          v1 = vmem(r4 + #0)
     }
   You can get the exact code by using '-S' switch, as you already do
*/
HVX_Vector v = *(HVX_Vector*)values;
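
Going one step further, a whole function written this way might look roughly like the sketch below. Treat it as a sketch under assumptions, not tested on hardware: add_words is a made-up name, I assume the compiler defines __HVX__ when HVX is enabled, and I assume Q6_Vw_vadd_VwVw (word-wise vector add) is available through the two headers above. When HVX is not enabled it falls back to a plain loop, so it also builds on a host machine.

```c
#include <stdint.h>

#if defined(__HVX__)
#include <hexagon_types.h>
#include <hexagon_protos.h>
#endif

/* out[i] = a[i] + b[i] for n 32-bit integers.
   On the HVX path all three arrays are assumed to be aligned to the vector size
   (for example, allocated with memalign as in the question). */
static void add_words(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    int done = 0;
#if defined(__HVX__)
    const HVX_Vector *va = (const HVX_Vector *)a;
    const HVX_Vector *vb = (const HVX_Vector *)b;
    HVX_Vector *vo = (HVX_Vector *)out;
    int lanes = (int)(sizeof(HVX_Vector) / sizeof(int32_t));

    /* One vector load per input, one vector add, one vector store per iteration. */
    for (int i = 0; i < n / lanes; i++)
        vo[i] = Q6_Vw_vadd_VwVw(va[i], vb[i]);
    done = (n / lanes) * lanes;
#endif
    /* Scalar tail on HVX builds, or the whole array when HVX is not enabled. */
    for (int i = done; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Compiling this with -S should show whether vmem loads and a vector add actually appear in the generated assembly.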

Your (fixed-point) version of dot_product may read 16 integers at a time and multiply all 16 of them in a couple of instructions (see the HVX62 programming manual, which has a tip on implementing 32-bit integer multiplication from 16-bit multiplies). Then shuffle/deal/ror the data around and sum the rearranged vectors to get the dot product. This way you can calculate 4 dot products almost at once, and if you preload 4 HVX registers (that is, 16 4D vectors) you can calculate 16 dot products in parallel.
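
The "multiply lane by lane, then rotate and sum" pattern can be modelled in plain C to see the data flow. This is only a scalar model: dot_lanes and LANES are made-up names, the inner loops stand in for single vector instructions, and the 64-bit accumulators are a simplification (real HVX lanes here would be 32 bits wide).

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* number of 32-bit lanes in one 64-byte vector */

/* Dot product processed LANES elements at a time.
   n must be a multiple of LANES. */
static int64_t dot_lanes(const int32_t *x, const int32_t *y, int n)
{
    int64_t acc[LANES] = {0};

    assert(n % LANES == 0);

    /* Lane-wise multiply-accumulate: one "vector" of products per iteration. */
    for (int i = 0; i < n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] += (int64_t)x[i + l] * y[i + l];

    /* Rotate-and-add reduction: log2(LANES) steps, after which
       every lane holds the sum of all lanes. */
    for (int shift = LANES / 2; shift > 0; shift /= 2) {
        int64_t rotated[LANES];
        for (int l = 0; l < LANES; l++)
            rotated[l] = acc[(l + shift) % LANES];
        for (int l = 0; l < LANES; l++)
            acc[l] += rotated[l];
    }
    return acc[0];
}
```

Stopping the reduction two steps early (and rearranging lanes accordingly) would leave four partial sums instead of one, which is the idea behind computing several 4D dot products at once.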

If what you are doing is really just byte/int image processing, you might use the specific 16-bit and 8-bit hardware dot products in the Hexagon instruction set instead of emulating doubles and floats.
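
As a rough picture of what such a byte dot product computes, here is a scalar model in plain C. The function name dot4_u8 is made up, and which HVX instruction to map it to is an assumption on my part (I believe the vrmpy family is the relevant one, but check the manual): each 32-bit output lane receives the sum of four unsigned 8-bit products.

```c
#include <stdint.h>

/* Scalar model of a reducing byte multiply:
   out[i] = a[4i]*b[4i] + a[4i+1]*b[4i+1] + a[4i+2]*b[4i+2] + a[4i+3]*b[4i+3].
   a and b must each hold 4 * n_out bytes. */
static void dot4_u8(const uint8_t *a, const uint8_t *b, uint32_t *out, int n_out)
{
    for (int i = 0; i < n_out; i++) {
        uint32_t sum = 0;
        for (int k = 0; k < 4; k++)
            sum += (uint32_t)a[4 * i + k] * (uint32_t)b[4 * i + k];
        out[i] = sum;
    }
}
```

For a 3x3 image kernel like the KERNEL_SIZE in the question, this kind of operation covers most of one filter row per instruction, with no fixed-point emulation needed.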