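Parallel frameworks supported by OpenCV
OpenCV supports parallel code execution through cv::parallel_for_, an API that has been available since the 2.4 series. This post takes a common per-pixel image convolution as its example, shows the convolution written with and without OpenCV's parallel execution, and compares the run times. The OpenCV parallel framework can be backed by any of the following:
1. Intel TBB (third-party library, must be enabled explicitly)
2. C= Parallel C/C++ Programming Language Extension (third-party library, must be enabled explicitly)
3. OpenMP (integrated into the compiler, must be enabled explicitly)
4. APPLE GCD (used automatically on Apple systems)
5. Windows RT concurrency (used automatically on Windows RT)
6. Windows concurrency (part of the runtime, used automatically on Windows with MSVC++ >= 10)
7. Pthreads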
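To enable OpenMP in the Visual Studio IDE, right-click the project, open Properties, and switch on OpenMP support there.
This turns on parallel acceleration for the project. Note that the setting applies to the project's own code; which backend parallel_for_ itself uses is decided by how the OpenCV library was built.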
Parallel convolution and timing comparison
OpenCV supports two ways of writing parallel code:
parallel_for_ (with a lambda)
ParallelLoopBody
Taking a 3x3 convolution as the example, the original (serial) implementation is:

    double start = (double)cv::getTickCount();
    // Skip the one-pixel border so every 3x3 neighbourhood stays inside the image.
    for (int row = 1; row < rows - 1; row++) {
        for (int col = 1; col < cols - 1; col++) {
            int sum = src.at<uchar>(row, col) + src.at<uchar>(row - 1, col) + src.at<uchar>(row + 1, col)
                    + src.at<uchar>(row, col - 1) + src.at<uchar>(row - 1, col - 1) + src.at<uchar>(row + 1, col - 1)
                    + src.at<uchar>(row, col + 1) + src.at<uchar>(row - 1, col + 1) + src.at<uchar>(row + 1, col + 1);
            int pv = sum / 9;
            dst.at<uchar>(row, col) = pv;
        }
    }

The parallel_for_ implementation of the 3x3 convolution is:

    // Assumption: sz = kernel.rows / 2 (as in the class version below), kernel is a
    // CV_64F Mat, and src has already been padded by sz pixels on every side
    // (for example with copyMakeBorder), so i + sz + k and j + sz + l stay in range.
    start = (double)cv::getTickCount();
    parallel_for_(Range(0, rows * cols), [&](const Range &range) {
        for (int r = range.start; r < range.end; r++) {
            int i = r / cols, j = r % cols;  // flat index -> (row, col)
            double value = 0;
            for (int k = -sz; k <= sz; k++) {
                uchar *sptr = src.ptr(i + sz + k);
                for (int l = -sz; l <= sz; l++) {
                    value += kernel.ptr<double>(k + sz)[l + sz] * sptr[j + sz + l];
                }
            }
            dst.ptr(i)[j] = saturate_cast<uchar>(value);
        }
    });
    double time = (((double)cv::getTickCount() - start)) / cv::getTickFrequency();
    std::cout << "parallel_for_conv3x3 execute time: " << time * 1000 << " ms" << std::endl;

The ParallelLoopBody implementation of the 3x3 convolution is:

    class parallelConvolution : public ParallelLoopBody
    {
    private:
        Mat m_src, &m_dst;
        Mat m_kernel;
        int sz;  // kernel radius

    public:
        parallelConvolution(Mat src, Mat &dst, Mat kernel)
            : m_src(src), m_dst(dst), m_kernel(kernel)
        {
            sz = kernel.rows / 2;
        }

        // Called by parallel_for_ with the sub-range assigned to one worker.
        virtual void operator()(const Range &range) const CV_OVERRIDE
        {
            for (int r = range.start; r < range.end; r++) {
                int i = r / m_src.cols, j = r % m_src.cols;
                double value = 0;
                for (int k = -sz; k <= sz; k++) {
                    const uchar *sptr = m_src.ptr(i + sz + k);
                    for (int l = -sz; l <= sz; l++) {
                        value += m_kernel.ptr<double>(k + sz)[l + sz] * sptr[j + sz + l];
                    }
                }
                m_dst.ptr(i)[j] = saturate_cast<uchar>(value);
            }
        }
    };
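To make the flattened Range(0, rows * cols) idea concrete, here is a minimal standard-library sketch of the same pattern: split a flat index range into contiguous stripes, hand each stripe to a worker, and map every flat index back to (row, col). It is not how OpenCV implements parallel_for_. RangeSketch, parallelForSketch, meanRange, meanSerial and meanParallel are hypothetical names introduced only for illustration, and the input is assumed to be already padded by one pixel on every side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::Range: half-open interval [start, end).
struct RangeSketch { int start; int end; };

// Hypothetical helper (not an OpenCV API): split [0, total) into at most
// `nthreads` contiguous stripes and run `body` on each stripe in its own thread.
inline void parallelForSketch(int total, int nthreads,
                              const std::function<void(const RangeSketch&)>& body) {
    assert(total >= 0 && nthreads > 0);
    const int stripe = (total + nthreads - 1) / nthreads;  // ceiling division
    std::vector<std::thread> workers;
    for (int begin = 0; begin < total; begin += stripe) {
        const RangeSketch r{begin, std::min(begin + stripe, total)};
        workers.emplace_back([&body, r] { body(r); });
    }
    for (auto& w : workers) w.join();
}

// 3x3 mean filter over one stripe. `padded` is a row-major (rows + 2) x (cols + 2)
// image, `dst` is rows x cols; each flat index r maps to output pixel (i, j).
inline void meanRange(const std::vector<std::uint8_t>& padded,
                      std::vector<std::uint8_t>& dst,
                      int cols, const RangeSketch& range) {
    const int pcols = cols + 2;
    for (int r = range.start; r < range.end; ++r) {
        const int i = r / cols, j = r % cols;
        int sum = 0;
        for (int k = 0; k < 3; ++k)
            for (int l = 0; l < 3; ++l)
                sum += padded[(i + k) * pcols + (j + l)];
        dst[r] = static_cast<std::uint8_t>(sum / 9);
    }
}

// Serial reference: one stripe covering the whole image.
inline std::vector<std::uint8_t> meanSerial(const std::vector<std::uint8_t>& padded,
                                            int rows, int cols) {
    std::vector<std::uint8_t> dst(rows * cols);
    meanRange(padded, dst, cols, RangeSketch{0, rows * cols});
    return dst;
}

// Parallel version: stripes write to disjoint parts of dst, so no locking is needed.
inline std::vector<std::uint8_t> meanParallel(const std::vector<std::uint8_t>& padded,
                                              int rows, int cols, int nthreads) {
    std::vector<std::uint8_t> dst(rows * cols);
    parallelForSketch(rows * cols, nthreads, [&](const RangeSketch& range) {
        meanRange(padded, dst, cols, range);
    });
    return dst;
}
```

Both variants here run exactly the same per-pixel work, so any timing difference between meanSerial and meanParallel comes from the threading alone. That is a useful property to aim for when comparing serial and parallel code.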
It is called like this:

    start = (double)cv::getTickCount();
    parallelConvolution obj(src, dst, kernel);
    parallel_for_(Range(0, rows * cols), obj);
    time = (((double)cv::getTickCount() - start)) / cv::getTickFrequency();
    std::cout << "parallelConvolution conv3x3 execute time: " << time * 1000 << " ms" << std::endl;
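The results are as follows:
I was floored. Parallel execution was supposed to speed things up, yet I could not reproduce the clear speedup shown in the official OpenCV tutorial, because the tutorial never says which parallel framework produced its numbers. Sometimes "it is better to have no books at all than to believe everything in them".
I have some ideas of my own about the cause, but I would much rather hear yours in the comments: why is there no speedup?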
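One thing that can be ruled out cheaply when chasing this question is measurement noise. Each variant above is timed exactly once, and a single run may be dominated by one-off costs such as starting worker threads. Below is a minimal standard-library sketch of a more robust measurement: one untimed warm-up call, then the median of several timed runs. medianMillis is a hypothetical helper, not an OpenCV function, and the repeat count is up to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: call `work` once as an untimed warm-up, then time it
// `repeats` times and return the median wall-clock duration in milliseconds.
inline double medianMillis(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    work();  // warm-up run, not measured
    std::vector<double> samples;
    samples.reserve(repeats);
    for (int n = 0; n < repeats; ++n) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    // Partial sort so the middle element is the median (upper median for even counts).
    std::nth_element(samples.begin(), samples.begin() + samples.size() / 2, samples.end());
    return samples[samples.size() / 2];
}
```

Each of the three convolution variants could be wrapped in a lambda and passed to this helper, so that all of them are measured the same way.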