OpenCL - clEnqueueNDRangeKernel

1. Executing Kernels

https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/

2. clEnqueueNDRangeKernel

Enqueues a command to execute a kernel on a device.

cl_int clEnqueueNDRangeKernel (cl_command_queue command_queue,
                               cl_kernel kernel,
                               cl_uint work_dim,
                               const size_t *global_work_offset,
                               const size_t *global_work_size,
                               const size_t *local_work_size,
                               cl_uint num_events_in_wait_list,
                               const cl_event *event_wait_list,
                               cl_event *event);

2.1 Parameters

command_queue

A valid command-queue. The kernel is queued for execution on the device associated with command_queue.

kernel

A valid kernel object. kernel and command_queue must be associated with the same OpenCL context.

work_dim

The number of dimensions used to specify the global work-items and the work-items in a work-group. work_dim must be greater than zero and less than or equal to CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS.

global_work_offset

An optional array of work_dim unsigned values that give the offset used to calculate the global ID of a work-item. If global_work_offset is NULL, the global IDs start at offset (0, 0, ..., 0).

global_work_size

Points to an array of work_dim unsigned values that give the number of global work-items in each dimension that will execute the kernel function. The total number of global work-items is global_work_size[0] * ... * global_work_size[work_dim - 1].
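The product described above can be sketched in plain C. This is only an illustration: total_work_items is a hypothetical helper, not part of the OpenCL API, and it does not guard against overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Product of the per-dimension sizes:
 * sizes[0] * ... * sizes[work_dim - 1].
 * Hypothetical helper; overflow is not checked in this sketch. */
size_t total_work_items(const size_t *sizes, unsigned work_dim)
{
    assert(sizes != NULL && work_dim > 0);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        total *= sizes[d];
    return total;
}
```

The same helper applies to local_work_size, where the product is the work-group size.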

local_work_size

Points to an array of work_dim unsigned values that give the number of work-items in each dimension of a work-group (also referred to as the size of the work-group) that will execute the kernel specified by kernel. The total number of work-items in a work-group is local_work_size[0] * ... * local_work_size[work_dim - 1].

Two device limits apply. The total number of work-items in the work-group must be less than or equal to the CL_DEVICE_MAX_WORK_GROUP_SIZE value returned by clGetDeviceInfo (see the table of OpenCL Device Queries). In addition, each of local_work_size[0], ..., local_work_size[work_dim - 1] must be less than or equal to the corresponding value in CL_DEVICE_MAX_WORK_ITEM_SIZES[0], ..., CL_DEVICE_MAX_WORK_ITEM_SIZES[work_dim - 1].
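A host program could check both limits before enqueueing. The sketch below is plain C and assumes the caller has already obtained the two limits with clGetDeviceInfo; local_size_fits_device and its parameter names are hypothetical, not OpenCL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when local_work_size respects both device limits.
 * max_group_size stands for CL_DEVICE_MAX_WORK_GROUP_SIZE and
 * max_item_sizes for CL_DEVICE_MAX_WORK_ITEM_SIZES; both are assumed
 * to have been queried by the caller with clGetDeviceInfo. */
bool local_size_fits_device(const size_t *local_work_size, unsigned work_dim,
                            size_t max_group_size,
                            const size_t *max_item_sizes)
{
    assert(local_work_size != NULL && max_item_sizes != NULL && work_dim > 0);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local_work_size[d] > max_item_sizes[d])
            return false;            /* per-dimension limit exceeded */
        total *= local_work_size[d];
    }
    return total <= max_group_size;  /* limit on the whole work-group */
}
```

For example, with a maximum work-group size of 256, a 16 x 16 work-group passes while a 32 x 32 one does not, even if each dimension is individually within range.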

An explicitly specified local_work_size determines how the global work-items specified by global_work_size are divided into work-group instances. If local_work_size is specified, each global_work_size[i] must be evenly divisible by the corresponding local_work_size[i], for 0 ≤ i ≤ work_dim - 1.
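When the problem size is not a multiple of the chosen work-group size, a common way to satisfy the divisibility rule is to round the global size up. The following is a minimal sketch in plain C; round_up_to_multiple, padded_global_size, and problem_size are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local that is greater than or equal to n. */
size_t round_up_to_multiple(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Fills global_work_size so that every entry is evenly divisible by the
 * matching local_work_size entry. problem_size (hypothetical) is the
 * number of elements the application wants to process per dimension. */
void padded_global_size(const size_t *problem_size,
                        const size_t *local_work_size,
                        unsigned work_dim,
                        size_t *global_work_size)
{
    for (unsigned d = 0; d < work_dim; ++d)
        global_work_size[d] =
            round_up_to_multiple(problem_size[d], local_work_size[d]);
}
```

If the global size is padded this way, the kernel has to compare its global ID with the real problem size and return early for the extra work-items, otherwise they would touch memory outside the intended range.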

local_work_size can also be NULL, in which case the OpenCL implementation determines how to break the global work-items into appropriate work-group instances.

event_wait_list and num_events_in_wait_list

Specify the events that must complete before this command can execute. If event_wait_list is NULL, the command does not wait on any event and num_events_in_wait_list must be 0. If event_wait_list is not NULL, the list of events it points to must be valid and num_events_in_wait_list must be greater than 0. The events in event_wait_list act as synchronization points, and they must belong to the same context as command_queue. The memory associated with event_wait_list can be reused or freed after the function returns.
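The pairing rule between the two arguments can be written as a small predicate. This sketch is plain C so that it builds without the OpenCL headers: wait_list_args_consistent is a hypothetical helper, and const void * stands in for const cl_event *. It checks only the NULL/count pairing, not whether each event is valid. In the OpenCL 1.2 specification an inconsistent pair is reported as CL_INVALID_EVENT_WAIT_LIST.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when the (event_wait_list, num_events_in_wait_list) pair is
 * consistent: NULL goes with 0, non-NULL goes with a count above 0.
 * Hypothetical helper; const void * replaces const cl_event * here. */
bool wait_list_args_consistent(const void *event_wait_list,
                               unsigned num_events_in_wait_list)
{
    if (event_wait_list == NULL)
        return num_events_in_wait_list == 0;
    return num_events_in_wait_list > 0;
}
```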

event

Returns an event object that identifies this particular kernel execution instance. Event objects are unique and can be used later to query the command or wait for it to complete. If event is NULL, no event is created for this kernel execution instance, so the application cannot query it or queue a wait for it.

2.2 Notes

The work-group size for a kernel can also be specified in the program source using the __attribute__ ((reqd_work_group_size(X, Y, Z))) qualifier. In this case the work-group size specified by local_work_size must match the value given by the reqd_work_group_size attribute qualifier.

These work-group instances are executed in parallel across multiple compute units or concurrently on the same compute unit.

Each work-item is uniquely identified by a global ID. The global ID, which can be read inside the kernel, is computed from the values given by global_work_size and global_work_offset. A work-item is also identified within its work-group by a unique local ID. The local ID, which can also be read inside the kernel, is computed from the value given by local_work_size. The starting local ID is always (0, 0, ..., 0).
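The relation between the IDs in one dimension can be sketched in plain C. The arithmetic follows the OpenCL execution model, where the global ID equals group ID times work-group size, plus local ID, plus global offset. Inside a kernel these values come from the built-ins get_global_id, get_group_id, get_local_size, get_local_id, and get_global_offset. The helper names global_id and num_groups are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID of a work-item in one dimension:
 * group_id * local_size + local_id + global_offset. */
size_t global_id(size_t group_id, size_t local_size,
                 size_t local_id, size_t global_offset)
{
    assert(local_size > 0 && local_id < local_size);
    return group_id * local_size + local_id + global_offset;
}

/* Number of work-groups in one dimension. The global size is assumed
 * to be evenly divisible by the local size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

With local_work_size 64 and no offset, the work-item with local ID 5 in work-group 2 has global ID 2 * 64 + 5 = 133. The offset shifts every global ID by the same amount and leaves local IDs unchanged.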
