Question: thrust::device_vector with global scope

Stack Overflow user

Asked 2019-02-18 07:26:55

Answers: 1, Views: 1.7K, Votes: 3

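I am writing a program that computes many properties of triangle mesh data. Some of these properties I want to compute with thrust:: methods; others need to be computed from raw memory pointers inside CUDA kernels.

To transfer the data to the GPU, I have this in a transfer.cu file (since creating and manipulating a thrust::device_vector is not supported in plain C++ code):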

// thrust vectors (global)
thrust::host_vector<glm::vec3> trianglethrust_host;
thrust::device_vector<glm::vec3> trianglethrust_device;

extern "C" void trianglesToGPU_thrust(const trimesh::TriMesh *mesh, float** triangles) {
    // fill host vector
    for (size_t i = 0; i < mesh->faces.size(); i++) {
        // PUSHING DATA INTO HOST_VECTOR HERE (OMITTED FOR CLARITY)
    }
    // copy to GPU by assigning host vector to device vector, like in the Thrust documentation
    trianglethrust_device = trianglethrust_host;
    // save raw pointer
    *triangles = (float*)thrust::raw_pointer_cast(&(trianglethrust_device[0]));
}

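This function, trianglesToGPU_thrust, is called from the main method of my C++ program. Everything works fine until the program exits and the (globally defined) trianglethrust_device vector goes out of scope. Thrust tries to free it, but the CUDA context is already gone, resulting in a cudaErrorInvalidDevicePointer.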

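What is the best practice for solving this problem?

TL;DR: I want a thrust::device_vector that lives for the duration of my program, because I want to apply thrust:: functions to it (such as transform) as well as read and manipulate it through raw pointer access in CUDA kernels.

Solution: In my case, it turned out that I was calling free on the raw data pointer further along in the process. Removing that free, and ending my main loop with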

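trianglethrust_device.clear();
trianglethrust_device.shrink_to_fit();
// no explicit destructor call here: the global's destructor still runs at exit,
// so calling it by hand as well would destroy the object twice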

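forces the vector's storage to be released before the CUDA runtime is torn down. This works, but it is probably still a rather ugly way of doing it.

I recommend Robert's answer to this question and have marked it as accepted.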

Tags: c++, cuda, thrust

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-02-18 07:51:59

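As you have already discovered, Thrust vector containers themselves cannot be placed at file scope: their destructors run during program teardown, by which point the CUDA context may already be gone.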

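One possible solution is simply to create the vectors you need at the start of main, and then pass references to those vectors to any function that needs them.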
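A minimal sketch of that first option, assuming main itself is compiled by nvcc (so it does not cover the case where main has to live in a host-compiled .cpp file). The names process and double_all are invented for illustration. The vector is a local of main, so it is destroyed when main returns, before static destruction begins.

```cuda
// Build (assumed file name): nvcc -o by_reference by_reference.cu
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>
#include <cstdio>

// Raw-pointer kernel: doubles every element in place.
__global__ void double_all(float *data, size_t n) {
  size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

// Takes the vector by reference: no copy is made and ownership stays in main.
void process(thrust::device_vector<float> &d) {
  // Thrust algorithm applied to the container: negate every element.
  thrust::transform(d.begin(), d.end(), d.begin(), thrust::negate<float>());

  // The same storage, accessed through a raw pointer inside a kernel.
  float *raw = thrust::raw_pointer_cast(d.data());
  unsigned int threads = 128;
  unsigned int blocks = (unsigned int)((d.size() + threads - 1) / threads);
  double_all<<<blocks, threads>>>(raw, d.size());
  cudaDeviceSynchronize();
}

int main() {
  thrust::host_vector<float> h(3);
  h[0] = 0.1f; h[1] = 0.2f; h[2] = 0.3f;

  thrust::device_vector<float> d = h;   // host-to-device copy
  process(d);

  thrust::host_vector<float> out = d;   // device-to-host copy
  for (size_t i = 0; i < out.size(); i++) std::printf("%f,", out[i]);
  std::printf("\n");
  return 0;   // d is destroyed here, while the CUDA runtime is still usable
}
```

The raw pointer stays valid only as long as the vector is not resized or destroyed, so it is fetched right before the kernel launch rather than cached.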

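If you really want "global behavior", you can place pointers to the vectors at global/file scope, create the vectors at the start of main, and set the global pointers to point to the vectors created in main.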
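A small sketch of the pointer-at-file-scope idea, again assuming a single .cu file built with nvcc. The names g_triangles and sum_triangles are hypothetical. Only a plain pointer lives at file scope, and a pointer has no destructor, so no Thrust code runs during static destruction.

```cuda
// Build (assumed file name): nvcc -o global_ptr global_ptr.cu
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cstdio>

// File scope holds only a pointer to the vector, not the vector itself.
thrust::device_vector<float> *g_triangles = nullptr;

// Any function in this translation unit can reach the vector through the pointer.
float sum_triangles() {
  return thrust::reduce(g_triangles->begin(), g_triangles->end(), 0.0f);
}

int main() {
  thrust::device_vector<float> triangles(3, 1.0f);   // owned by main
  g_triangles = &triangles;                          // publish it "globally"

  std::printf("%f\n", sum_triangles());

  g_triangles = nullptr;   // the vector is about to be destroyed with main's scope
  return 0;
}
```

Resetting the pointer before main returns is not strictly required here, but it makes any late use fail on a null pointer instead of touching a destroyed vector.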

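Based on the question in the comments, I gather that it is important/desirable for the main file to be a .cpp file compiled with the host compiler. We can therefore use the concept above, combined with allocating the vectors on the heap, so that nothing is deallocated automatically at program termination; finish() deletes them explicitly instead. Here is a complete example: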

$ cat main.cpp
#include "transfer.h"

int main(){

  float **triangles, *mesh;
  triangles = new float *[1];
  mesh = new float[4];
  mesh[0] = 0.1f; mesh[1] = 0.2f; mesh[2] = 0.3f;
  mesh[3] = 0.0f;  // sentinel: trianglesToGPU_thrust reads until the first 0.0f
  trianglesToGPU_thrust(mesh, triangles);
  do_global_work(triangles);
  finish();
  delete[] mesh;
  delete[] triangles;
}
$ cat transfer.h
void trianglesToGPU_thrust(const float *, float **);
void do_global_work(float **);
void finish();
$ cat transfer.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include "transfer.h"
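#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <cstdio>
#include <thrust/copy.h>

// Prints ds floats from device memory; meant to be launched with a single thread.
__global__ void k(float *data, size_t ds){
  for (size_t i = 0; i < ds; i++) printf("%f,", data[i]);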
}

// thrust vectors (global)
thrust::host_vector<float> *trianglethrust_host;
thrust::device_vector<float> *trianglethrust_device;

void trianglesToGPU_thrust(const float *mesh, float** triangles) {
  // create the vectors on the heap so they are not destroyed automatically at exit
  trianglethrust_host = new thrust::host_vector<float>;
  trianglethrust_device = new thrust::device_vector<float>;

  // fill host vector (the input is terminated by a 0.0f sentinel)
  size_t i = 0;
  while (mesh[i] != 0.0f) {
    trianglethrust_host->push_back(mesh[i++]);
  }
  // copy to GPU by assigning host vector to device vector, like in the Thrust documentation
  *trianglethrust_device = *trianglethrust_host;
  // save raw pointer
  *triangles = thrust::raw_pointer_cast(trianglethrust_device->data());
}

void do_global_work(float** triangles){

  std::cout << "from device vector:" << std::endl;
  thrust::copy((*trianglethrust_device).begin(), (*trianglethrust_device).end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl << "from kernel:" << std::endl;
  k<<<1,1>>>(*triangles, (*trianglethrust_device).size());
  cudaDeviceSynchronize();
  std::cout << std::endl;
}

void finish(){
  // delete on a null pointer is a no-op, so no checks are needed
  delete trianglethrust_host;   trianglethrust_host = nullptr;
  delete trianglethrust_device; trianglethrust_device = nullptr;
}
$ nvcc -c transfer.cu
$ g++ -c main.cpp
$ g++ -o test main.o transfer.o -L/usr/local/cuda/lib64 -lcudart
$ ./test
from device vector:
0.1,0.2,0.3,
from kernel:
0.100000,0.200000,0.300000,
$

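Here is another approach, similar to the previous one, that uses a std::vector of Thrust containers at global scope (only the transfer.cu file differs from the previous example; main.cpp and transfer.h are the same). Calling clear() on the outer std::vector in finish() destroys the contained Thrust vectors, which frees the device memory while the CUDA runtime is still alive: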

$ cat transfer.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include "transfer.h"
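#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <cstdio>
#include <thrust/copy.h>
#include <vector>

// Prints ds floats from device memory; meant to be launched with a single thread.
__global__ void k(float *data, size_t ds){
  for (size_t i = 0; i < ds; i++) printf("%f,", data[i]);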
}

// thrust vectors (global)
std::vector<thrust::host_vector<float> > trianglethrust_host;
std::vector<thrust::device_vector<float> > trianglethrust_device;

void trianglesToGPU_thrust(const float *mesh, float** triangles) {
  // create vectors
  trianglethrust_host.resize(1);
  trianglethrust_device.resize(1);

  // fill host vector
  size_t i = 0;
  while (mesh[i] != 0.0f) {
    trianglethrust_host[0].push_back(mesh[i++]);
  }
  // copy to GPU by assigning host vector to device vector, like in the Thrust documentation
  trianglethrust_device[0] = trianglethrust_host[0];
  // save raw pointer
  *triangles = (float*)thrust::raw_pointer_cast(trianglethrust_device[0].data());
}

void do_global_work(float** triangles){

  std::cout << "from device vector:" << std::endl;
  thrust::copy(trianglethrust_device[0].begin(), trianglethrust_device[0].end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl << "from kernel:" << std::endl;
  k<<<1,1>>>(*triangles, trianglethrust_device[0].size());
  cudaDeviceSynchronize();
  std::cout << std::endl;
}

void finish(){
  // clear() destroys the contained Thrust vectors now, before the runtime is torn down
  trianglethrust_host.clear();
  trianglethrust_device.clear();
}
$ nvcc -c transfer.cu
$ g++ -o test main.o transfer.o -L/usr/local/cuda/lib64 -lcudart
$ ./test
from device vector:
0.1,0.2,0.3,
from kernel:
0.100000,0.200000,0.300000,
$
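As a further variation, not part of the answer above, the same two ideas (lifetime controlled from main, main compiled by the host compiler) can be combined without any globals by handing main an opaque handle. The sketch below is only an illustration under those assumptions: MeshGPU and every mesh_* function are hypothetical names, and the block is written to compile as a single .cu file, with comments marking where it could be split into a header, a .cu file and main.cpp.

```cuda
// Build as one file (assumed name): nvcc -o handle handle.cu
#include <cstddef>
#include <cstdio>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

// ---- would go in mesh_gpu.h: plain C++, no Thrust types exposed ----
struct MeshGPU;                                    // opaque handle
MeshGPU *mesh_upload(const float *data, std::size_t n);
float *mesh_raw(MeshGPU *m);                       // raw device pointer for kernels
void mesh_negate(MeshGPU *m);                      // example Thrust operation
void mesh_download(const MeshGPU *m, float *out);  // out must hold n floats
void mesh_free(MeshGPU *m);                        // call before main returns

// ---- would go in mesh_gpu.cu: compiled by nvcc ----
struct MeshGPU {
  thrust::device_vector<float> v;
  // Constructing from a host range copies the data to the device.
  MeshGPU(const float *data, std::size_t n) : v(data, data + n) {}
};

MeshGPU *mesh_upload(const float *data, std::size_t n) { return new MeshGPU(data, n); }

float *mesh_raw(MeshGPU *m) { return thrust::raw_pointer_cast(m->v.data()); }

void mesh_negate(MeshGPU *m) {
  thrust::transform(m->v.begin(), m->v.end(), m->v.begin(), thrust::negate<float>());
}

void mesh_download(const MeshGPU *m, float *out) {
  thrust::copy(m->v.begin(), m->v.end(), out);     // device-to-host copy
}

// Deleting the handle runs the device_vector destructor and frees device memory.
void mesh_free(MeshGPU *m) { delete m; }

// ---- would go in main.cpp: compiled by the host compiler ----
int main() {
  const float mesh[3] = {0.1f, 0.2f, 0.3f};
  MeshGPU *gpu = mesh_upload(mesh, 3);

  mesh_negate(gpu);

  float out[3];
  mesh_download(gpu, out);
  for (int i = 0; i < 3; i++) std::printf("%f,", out[i]);
  std::printf("\n");

  mesh_free(gpu);   // explicit release while the CUDA runtime is still up
  return 0;
}
```

Compared with the file-scope pointers used above, the handle makes ownership explicit at the call site, at the cost of threading one extra argument through every function that needs the data.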

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54742267
