Implementing GPU Video Stream Decoding in FFmpeg (Part 1): Basic Concepts

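Recently I have been implementing GPU-based video stream decoding and ran into many problems along the way.

I received guidance from Cai Ding (蔡鼎), a video processing expert at Alibaba, and Ji Guang (季光), a developer at NVIDIA. My thanks to both of them!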

Basic commands (on Linux)

1. List the physical graphics cards

lspci  | grep -i vga

root@g1060server:/home/user# lspci  | grep -i vga

:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev )

:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)

:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)

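2. Check the NVIDIA cards directly
Sometimes, because of incompatibilities involving the server model or the GPU model, the motherboard fails to detect an inserted graphics card.
The following command shows whether the motherboard has detected the cards: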

root@g1060server:/home/user# lspci | grep -i nvidia

:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)

:00.1 Audio device: NVIDIA Corporation Device 10f1 (rev a1)

:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)

:00.1 Audio device: NVIDIA Corporation Device 10f1 (rev a1)

Output like the above means the motherboard has detected the cards.

CUDA version and driver information

root@g1060server:/home/user# nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) - NVIDIA Corporation

Built on Wed_Jul_17_18::13_PDT_2013

Cuda compilation tools, release 5.5, V5.5.0

NVIDIA GPU runtime status

root@g1060server:/home/user# nvidia-smi

modprobe: ERROR: could not insert 'nvidia_340': No such device

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The query failed, which usually means the driver is not installed.

user@g1060server:~$ nvidia-smi

Fri Jan   ::

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |

|-------------------------------+----------------------+----------------------+

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |

| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |

|===============================+======================+======================|

|     GeForce GTX ...  Off  | ::00.0  On |                  N/A |

| %   35C    P8    10W / 120W |   3083MiB /  6071MiB |      %      Default |

+-------------------------------+----------------------+----------------------+

|     GeForce GTX ...  Off  | ::00.0 Off |                  N/A |

| %   37C    P8    10W / 120W |   2542MiB /  6072MiB |      %      Default |

+-------------------------------+----------------------+----------------------+

The query succeeded.

Check whether the CUDA driver is installed correctly

root@g1060server:/home/user# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery

root@g1060server:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# ls

deviceQuery  deviceQuery.cpp  deviceQuery.o  Makefile  NsightEclipse.xml  readme.txt

root@g1060server:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# make

make: Nothing to be done for `all'.

root@g1060server:/usr/local/cuda-8.0/samples/1_Utilities/deviceQuery# ./deviceQuery

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned

-> CUDA driver version is insufficient for CUDA runtime version

Result = FAIL

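This confirms again that the CUDA driver installation failed: the installed driver is older than the CUDA runtime requires.

Check whether CUDA is installed correctly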
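
The "driver version is insufficient" message comes down to a version comparison. As a minimal sketch: the CUDA runtime functions `cudaDriverGetVersion` and `cudaRuntimeGetVersion` report a version as one integer encoded as 1000 * major + 10 * minor (so 9.0 is 9000), and a driver version of 0 means no driver is installed. The names `CudaVersion`, `decodeCudaVersion`, and `driverIsSufficient` below are hypothetical helpers, and the code works on integers that were already obtained, so it builds without CUDA.

```cpp
#include <cassert>

// Hypothetical helper type: a CUDA version split into its two parts.
struct CudaVersion {
    int majorVersion;
    int minorVersion;
};

// Decode the integer form used by the CUDA runtime API,
// which is 1000 * major + 10 * minor (for example 9000 for 9.0).
inline CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// The runtime can only work when the driver supports at least the
// runtime's CUDA version. A driver version of 0 (no driver installed)
// therefore always fails this check.
inline bool driverIsSufficient(int driverVersion, int runtimeVersion) {
    return driverVersion >= runtimeVersion;
}
```

With the values from the successful `deviceQuery` run further down (driver 9.0, runtime 8.0), `driverIsSufficient(9000, 8000)` is true.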

/usr/local/cuda/extras/demo_suite/deviceQuery

root@g1060server:/home/user/mjl/test# /usr/local/cuda/extras/demo_suite/deviceQuery

/usr/local/cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected  CUDA Capable device(s)

Device : "GeForce GTX 1060 6GB"

  CUDA Driver Version / Runtime Version          9.0 / 8.0

  CUDA Capability Major/Minor version number:    6.1

  Total amount of global memory:                  MBytes ( bytes)

  () Multiprocessors, () CUDA Cores/MP:      CUDA Cores

  GPU Max Clock rate:                             MHz (1.71 GHz)

  Memory Clock rate:                              Mhz

  Memory Bus Width:                              -bit

  L2 Cache Size:                                  bytes

  Maximum Texture Dimension Size (x,y,z)         1D=(), 2D=(, ), 3D=(, , )

  Maximum Layered 1D Texture Size, (num) layers  1D=(),  layers

  Maximum Layered 2D Texture Size, (num) layers  2D=(, ),  layers

  Total amount of constant memory:                bytes

  Total amount of shared memory per block:        bytes

  Total number of registers available per block:

  Warp size:

  Maximum number of threads per multiprocessor:

  Maximum number of threads per block:

  Max dimension size of a thread block (x,y,z): (, , )

  Max dimension size of a grid size    (x,y,z): (, , )

  Maximum memory pitch:                           bytes

  Texture alignment:                              bytes

  Concurrent copy and kernel execution:          Yes with  copy engine(s)

  Run time limit on kernels:                     No

  Integrated GPU sharing Host Memory:            No

  Support host page-locked memory mapping:       Yes

  Alignment requirement for Surfaces:            Yes

  Device has ECC support:                        Disabled

  Device supports Unified Addressing (UVA):      Yes

  Device PCI Domain ID / Bus ID / location ID:    /  /

  Compute Mode:

     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device : "GeForce GTX 1060 6GB"

  CUDA Driver Version / Runtime Version          9.0 / 8.0

  CUDA Capability Major/Minor version number:    6.1

  Total amount of global memory:                  MBytes ( bytes)

  () Multiprocessors, () CUDA Cores/MP:      CUDA Cores

  GPU Max Clock rate:                             MHz (1.71 GHz)

  Memory Clock rate:                              Mhz

  Memory Bus Width:                              -bit

  L2 Cache Size:                                  bytes

  Maximum Texture Dimension Size (x,y,z)         1D=(), 2D=(, ), 3D=(, , )

  Maximum Layered 1D Texture Size, (num) layers  1D=(),  layers

  Maximum Layered 2D Texture Size, (num) layers  2D=(, ),  layers

  Total amount of constant memory:                bytes

  Total amount of shared memory per block:        bytes

  Total number of registers available per block:

  Warp size:

  Maximum number of threads per multiprocessor:

  Maximum number of threads per block:

  Max dimension size of a thread block (x,y,z): (, , )

  Max dimension size of a grid size    (x,y,z): (, , )

  Maximum memory pitch:                           bytes

  Texture alignment:                              bytes

  Concurrent copy and kernel execution:          Yes with  copy engine(s)

  Run time limit on kernels:                     No

  Integrated GPU sharing Host Memory:            No

  Support host page-locked memory mapping:       Yes

  Alignment requirement for Surfaces:            Yes

  Device has ECC support:                        Disabled

  Device supports Unified Addressing (UVA):      Yes

  Device PCI Domain ID / Bus ID / location ID:    /  /

  Compute Mode:

     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

> Peer access from GeForce GTX  6GB (GPU0) -> GeForce GTX  6GB (GPU1) : Yes

> Peer access from GeForce GTX  6GB (GPU1) -> GeForce GTX  6GB (GPU0) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = , Device0 = GeForce GTX  6GB, Device1 = GeForce GTX  6GB

Result = PASS

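The query succeeded.

Main flow

To move FFmpeg decoding onto the GPU, you first need a basic understanding of FFmpeg's decoding flow, because GPU decoding follows the same flow, except that part of the middle is computed on the GPU.

I previously posted the CPU decoding flow at http://www.cnblogs.com/baldermurphy/p/7828337.html

The main flow is as follows (key calls only, not a complete program):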

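    avformat_network_init();

    av_register_all(); // 1. Register the encoder and decoder modules; in 3.3 and later this includes the GPU decoders (deprecated since FFmpeg 4.0, where registration is automatic)

    std::string tempfile = "xxxx"; // URL of the video stream

    avformat_find_stream_info(format_context_, nullptr); // 2. Pull a short piece of the stream and analyze it to learn the basic data format

    if (AVMEDIA_TYPE_VIDEO == enc->codec_type && video_stream_index_ < 0) // 3. Pick out the video stream

    codec_ = avcodec_find_decoder(enc->codec_id); // 4. Find the matching decoder

    codec_context_ = avcodec_alloc_context3(codec_); // 5. Allocate the context structure for that decoder

    av_read_frame(format_context_, &packet_); // 6. Read a data packet

    avcodec_send_packet(codec_context_, &packet_); // 7. Submit the packet for decoding

    avcodec_receive_frame(codec_context_, yuv_frame_); // 8. Receive the decoded frame

    sws_scale(y2r_sws_context_, yuv_frame_->data, yuv_frame_->linesize, 0, codec_context_->height, rgb_data_, rgb_line_size_); // 9. Convert the pixel format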

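GPU decoding changes steps 4, 7, 8, and 9, that is:

find the GPU decoder,

feed the packets to the GPU decoder,

fetch the decoded data,

convert the data format on the GPU (if needed, for example NV12 to BGRA).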
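
For the first item in this list, FFmpeg builds with NVIDIA support expose CUVID-backed decoders under names such as `h264_cuvid` and `hevc_cuvid`. Such a name can be passed to `avcodec_find_decoder_by_name` in place of the `avcodec_find_decoder` lookup in step 4. The sketch below only builds that name; `cuvidDecoderName` is a hypothetical helper, the table is deliberately not exhaustive, and whether a given decoder is actually present depends on how FFmpeg was configured.

```cpp
#include <map>
#include <string>

// Hypothetical helper: map an FFmpeg codec name (for example "h264")
// to the name of the matching CUVID decoder (for example "h264_cuvid").
// Returns an empty string when the codec is not in the table.
inline std::string cuvidDecoderName(const std::string& codecName) {
    // Assumption: an illustrative subset of the CUVID decoders, not a full list.
    static const std::map<std::string, std::string> table = {
        {"h264", "h264_cuvid"},
        {"hevc", "hevc_cuvid"},
        {"mpeg2video", "mpeg2_cuvid"},
        {"vp8", "vp8_cuvid"},
        {"vp9", "vp9_cuvid"},
    };
    const auto it = table.find(codecName);
    return it == table.end() ? std::string() : it->second;
}

// Intended use with FFmpeg (not compiled here):
//   codec_ = avcodec_find_decoder_by_name(cuvidDecoderName("h264").c_str());
//   if (!codec_) codec_ = avcodec_find_decoder(enc->codec_id);  // CPU fallback
```

Falling back to the software decoder when the lookup returns null keeps the same program usable on machines without a supported GPU.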

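The final format depends on the specific requirement. For example, with OpenGL interop the frame is often converted to a specified format (BGRA) and shares one block of memory with OpenGL, so the data refreshes immediately and no copy is needed at all.

Converting frames to image files is a different requirement (BGR).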
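
To make the NV12 to BGRA conversion concrete, here is a CPU reference sketch of what a GPU kernel would compute for each pixel. It assumes tightly packed buffers (no row padding), even width and height, and BT.601 limited-range coefficients. Real decoder output usually has a pitch larger than the width, and the right color matrix depends on the stream. The names `clampToByte` and `nv12ToBgra` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value to the 0..255 range of one color channel.
inline std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// CPU reference for NV12 -> BGRA (hypothetical helper, BT.601 limited range).
// NV12 layout: a full-resolution Y plane, followed by one half-resolution
// plane of interleaved U,V pairs; each pair is shared by a 2x2 pixel block.
inline std::vector<std::uint8_t> nv12ToBgra(const std::vector<std::uint8_t>& nv12,
                                            int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t ySize = static_cast<std::size_t>(width) * height;
    assert(nv12.size() == ySize + ySize / 2);

    std::vector<std::uint8_t> bgra(ySize * 4);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const std::size_t yIndex = static_cast<std::size_t>(row) * width + col;
            // The UV plane has height/2 rows of `width` bytes (width/2 U,V pairs).
            const std::size_t uvIndex = ySize +
                static_cast<std::size_t>(row / 2) * width +
                static_cast<std::size_t>(col / 2) * 2;

            const int c = nv12[yIndex] - 16;
            const int d = nv12[uvIndex] - 128;      // U (Cb)
            const int e = nv12[uvIndex + 1] - 128;  // V (Cr)

            // Fixed-point BT.601 conversion, scaled by 256.
            const int r = (298 * c + 409 * e + 128) / 256;
            const int g = (298 * c - 100 * d - 208 * e + 128) / 256;
            const int b = (298 * c + 516 * d + 128) / 256;

            std::uint8_t* out = &bgra[yIndex * 4];
            out[0] = clampToByte(b);
            out[1] = clampToByte(g);
            out[2] = clampToByte(r);
            out[3] = 255;  // opaque alpha
        }
    }
    return bgra;
}
```

Every output pixel depends only on one Y sample and one U,V pair, which is why this step maps well onto one GPU thread per pixel. For the BGR image case, the same arithmetic applies with the alpha byte dropped.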

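Matching the technique to the scenario

It has to be said that GPU computing is a great capability, but it still has to fit the requirement and the scenario. Ignore that and the cost may outweigh the gain.

Take an extreme example: OpenCV also implements image decoding on the GPU, but it is not used when efficiency matters,

because the image data has to be uploaded to the GPU (not parallel, very slow), decoded (very fast), and then copied from GPU memory back to host memory (not parallel, very slow).

The upload and the copy back alone take several hundred milliseconds, and image operations are frequent, so CPU usage is not relieved much and sometimes even rises instead of falling. The decode itself is fast, but what the user experiences is slowness, with both the CPU and the GPU occupied.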
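
This trade-off can be written as a simple time budget: the GPU path only pays off when upload plus decode plus download takes less time than decoding on the CPU. The sketch below is a toy model with hypothetical names (`GpuPathCost`, `totalGpuMs`, `gpuPathIsFaster`); the timings fed to it have to be measured on the target machine.

```cpp
#include <cassert>

// Hypothetical per-image timings for the GPU path, all in milliseconds.
struct GpuPathCost {
    double uploadMs;    // host memory -> GPU memory
    double decodeMs;    // decode on the GPU
    double downloadMs;  // GPU memory -> host memory
};

// Total time the caller waits for one image on the GPU path.
inline double totalGpuMs(const GpuPathCost& cost) {
    assert(cost.uploadMs >= 0 && cost.decodeMs >= 0 && cost.downloadMs >= 0);
    return cost.uploadMs + cost.decodeMs + cost.downloadMs;
}

// The GPU path is only worth it when its total beats the CPU decode time.
inline bool gpuPathIsFaster(const GpuPathCost& cost, double cpuDecodeMs) {
    return totalGpuMs(cost) < cpuDecodeMs;
}
```

With purely illustrative numbers, `gpuPathIsFaster({100.0, 2.0, 100.0}, 30.0)` is false: a 2 ms GPU decode still loses to a 30 ms CPU decode once 100 ms of transfer is added in each direction. Keeping decoded frames on the GPU, as in the OpenGL interop case, removes the download term.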

Main websites

The SDK NVIDIA recommends for GPU decoding with FFmpeg

https://developer.nvidia.com/nvidia-video-codec-sdk

Tool for checking GPU memory leaks

http://docs.nvidia.com/cuda/cuda-memcheck/index.html#device-side-allocation-checking
