I think we need to break this into two parts for Windows and LabVIEW.
In the first part, you need a user-level API (C based) for your kernel driver. The only way to reach a kernel driver from user space is through the OS I/O layer (ReadFile/WriteFile/DeviceIoControl) - any API methods you have inside the kernel driver cannot be called directly from user space. As far as avoiding copies goes, you need to look at the buffering options for the memory passed through these calls. With buffered I/O the I/O manager copies the data through an intermediate system buffer. With direct I/O it locks the caller's pages in memory and hands the driver an MDL describing them, so the driver (or the card's DMA engine) can read or write the user-mode buffer directly.
For simplicity of testing I would recommend creating this initial user-mode API in straight C, rather than trying to either (a) get LV to call the OS methods directly or (b) use the CINs to call them.
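As a rough illustration of what that user-mode layer might look like, here is a minimal sketch in C. Everything named mycard_* or MYCARD_*, the function number 0x800, and the choice of a single "acquire" request are assumptions made up for the example; your driver defines its own control codes and device name. The control code uses METHOD_OUT_DIRECT, which is the direct I/O case where the output buffer is locked rather than copied. The #ifdef only exists so the file still compiles away from Windows, where every call simply fails.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Same bit layout as the CTL_CODE macro in the Windows headers:
   device type, required access, function number, transfer method. */
#define MYCARD_CTL_CODE(type, func, method, access) \
    (((uint32_t)(type) << 16) | ((uint32_t)(access) << 14) | \
     ((uint32_t)(func) << 2) | (uint32_t)(method))

#define MYCARD_DEVICE_TYPE       0x22u  /* value of FILE_DEVICE_UNKNOWN */
#define MYCARD_METHOD_OUT_DIRECT 2u     /* value of METHOD_OUT_DIRECT   */
#define MYCARD_READ_ACCESS       1u     /* value of FILE_READ_ACCESS    */

/* Hypothetical request: function number 0x800 is an assumption and must
   match whatever the kernel driver actually handles. */
#define IOCTL_MYCARD_ACQUIRE \
    MYCARD_CTL_CODE(MYCARD_DEVICE_TYPE, 0x800u, \
                    MYCARD_METHOD_OUT_DIRECT, MYCARD_READ_ACCESS)

/* Opaque wrapper so callers never see a Windows HANDLE. */
typedef struct mycard_handle {
    void *os_handle;
} mycard_handle;

/* Open the device by its user-visible path. Returns 0 on success, -1 on failure. */
int mycard_open(mycard_handle *h, const char *device_path)
{
    if (h == NULL || device_path == NULL)
        return -1;
    h->os_handle = NULL;
#ifdef _WIN32
    HANDLE dev = CreateFileA(device_path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (dev == INVALID_HANDLE_VALUE)
        return -1;
    h->os_handle = dev;
    return 0;
#else
    return -1; /* no Windows I/O layer on this platform */
#endif
}

/* Ask the driver to fill the caller's buffer. With a direct I/O control code
   the driver is handed the locked pages of this buffer, not a copy. */
int mycard_acquire(mycard_handle *h, void *buffer, uint32_t size,
                   uint32_t *bytes_returned)
{
    if (h == NULL || h->os_handle == NULL || buffer == NULL ||
        bytes_returned == NULL)
        return -1;
    *bytes_returned = 0;
#ifdef _WIN32
    DWORD got = 0;
    BOOL ok = DeviceIoControl((HANDLE)h->os_handle, IOCTL_MYCARD_ACQUIRE,
                              NULL, 0, buffer, size, &got, NULL);
    if (!ok)
        return -1;
    *bytes_returned = got;
    return 0;
#else
    (void)size;
    return -1;
#endif
}

/* Release the device; safe to call on an unopened handle. */
void mycard_close(mycard_handle *h)
{
    if (h == NULL || h->os_handle == NULL)
        return;
#ifdef _WIN32
    CloseHandle((HANDLE)h->os_handle);
#endif
    h->os_handle = NULL;
}
```

A small console test program would call mycard_open with a path such as "\\\\.\\MyCard" (a hypothetical name; use whatever symbolic link your driver creates), then mycard_acquire into a local buffer, then mycard_close. That lets you debug the driver path with an ordinary C debugger before LV is involved at all.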
Second part - once you have that API, you can use either the Call Library Function Node or a CIN to invoke it from LV.
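For the Call Library Function Node route, it usually helps to export a flat function that takes only plain integers and array pointers. Below is a minimal sketch of such an export. The name lv_mycard_acquire, the error codes, and the stand-in acquire_from_driver are all assumptions for illustration; in a real DLL the stand-in would be replaced by the call into your user-mode API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef _WIN32
#define LV_EXPORT __declspec(dllexport)
#else
#define LV_EXPORT
#endif

/* Stand-in for the real user-mode API call that reaches the driver.
   It just fills the buffer with a recognizable pattern. */
static int32_t acquire_from_driver(uint8_t *dst, int32_t len)
{
    memset(dst, 0xAB, (size_t)len);
    return len;
}

/* Flat entry point intended for the Call Library Function Node.
   data     : U8 array passed as "Array Data Pointer", preallocated in LV
   capacity : I32 passed by value, number of elements in data
   count    : I32 passed as pointer to value, receives elements written
   Returns 0 on success, a negative value on error. */
LV_EXPORT int32_t lv_mycard_acquire(uint8_t *data, int32_t capacity,
                                    int32_t *count)
{
    if (data == NULL || count == NULL || capacity <= 0)
        return -1;
    *count = 0;
    int32_t got = acquire_from_driver(data, capacity);
    if (got < 0)
        return -2;
    *count = got;
    return 0;
}
```

On the diagram you would allocate the array first (for example with Initialize Array) and wire it into the node, so the DLL writes into the buffer LV already owns instead of allocating one of its own. The calling convention selected in the node's configuration has to match the DLL, which is the C convention for the export shown here.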
BTW - the document link I sent you was regarding creating a driver for your PCI card without having to write a kernel driver. If you already have the kernel driver, it isn't going to help that much.