For reference, the hardware I'm using is a Lenovo Y410P laptop with an i7 4600 and a GeForce 755M graphics card, on a completely fresh install.

CUDA seems to be the bane of laptop graphics cards in Kali. After banging my head against the wall trying every which way and going nowhere with the repository drivers, I went for an install outside of the prepackaged DKMS method that BlackmoreOps uses, which relies on Kali's built-in repositories. I also get a newer driver in the process. I figured I'd help out anyone else crawling the troubleshooting forums for help.

NOTE: I have not tried to play games or anything with this method. This is 100% for getting CUDA utilities running for programs like cudaHashcat (which I've replaced cpyrit with). I have not tried this with VirtualBox, VMware, or Parallels, nor do I plan to, and I cannot make any assurances that it will work there. It also strikes me as slightly strange to try to get a video card to play games on a Kali distro, since its only real purpose is security analysis for network administrators and hobbyists. I have a dedicated box for Kali; my free time is spent on a Mac. Summing up, don't ask me about games.

cpyrit testing will come along in the future, and I'll hopefully update the post shortly.

Also: this is pretty painless (which was sadly anticlimactic).

Things you will need before beginning:
CUDA Kit: https://developer.nvidia.com/cuda-downloads
The kit that I used for this is CUDA 7.0.28, which is more recent than the 5.5 that others on this forum are using.
cudaHashcat: http://hashcat.net/oclhashcat/

I LOVE cudaHashcat. It's set, forget, and throw a cold pack under the laptop to keep it below 90C.

Note again: This is on a perfectly fresh Kali 1.1.0 install from 4/16/2015. I make zero guarantees about other installations. And by fresh, I mean not even upgraded through apt, 100% virgin except for what is installed below.

Prenote: obtain linux-headers as always. By now, you should know how to do it if you're running Kali, but if you don't: apt-get install linux-headers-$(uname -r)

Prepackages: install freeglut3-dev and libxmu-dev
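
For anyone who wants the two prerequisite lines above as something to copy, here is a small sketch. It only prints the apt-get command instead of running it, so you can check that the header package name matches your running kernel before pasting it in as root. The variable names are mine and purely for illustration; the second package is spelled libxmu-dev in the repositories.

```shell
# Header package for the kernel that is running right now
headers_pkg="linux-headers-$(uname -r)"

# Prerequisites from the two lines above (variable names are just for illustration)
pkgs="$headers_pkg freeglut3-dev libxmu-dev"

# Print the command for review; run it as root once it looks right
echo "apt-get install $pkgs"
```

If the printed header package does not exist in the repositories, the running kernel and the available headers are out of step, and that needs sorting out before the installer can build its module.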

1. Download the CUDA dev kit for Ubuntu 14.04. This does work, and the version I am using is 7.0. It also includes the 346 driver, which is several versions later than the 340 found in the repositories as of writing this post.

2. Download the cudaHashcat kit for testing purposes.

3. chmod +x both these .RUN files

4. apt-get purge nvidia-* if you have previous files installed. Get rid of it all so that the new package doesn't conflict and wig out.

Reboot just to be safe

5. Blacklist nouveau without mucking with grub. Open your favorite editor and place blacklist nouveau in /etc/modprobe.d/kali-blacklist.conf
As an observation, I've seen a lot of guides that say you should uninstall nouveau. This is unnecessary and, in my opinion, a dangerous game to play. Nouveau works, it just does, so if it all goes belly up, you'll like having a driver that works on hand.
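
If you would rather not open an editor for step 5, something like the sketch below should do the same job. It is only an illustration: the BLACKLIST_CONF variable is my own invention, and it defaults to a scratch file in the current directory so you can try it without root. Point it at the real path once you are happy with what it writes.

```shell
# Path is an assumption for safe testing: defaults to a scratch file in the
# current directory. For the real thing, run as root with
# BLACKLIST_CONF=/etc/modprobe.d/kali-blacklist.conf
conf="${BLACKLIST_CONF:-./kali-blacklist.conf}"
line="blacklist nouveau"

# Create the file if needed, then append the line only if it is not already there
touch "$conf"
grep -qxF "$line" "$conf" || echo "$line" >> "$conf"

# Show the result so it can be checked by eye
cat "$conf"
```

The grep check means running it twice does not leave duplicate lines in the file.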

6. Ctrl+Alt+F1 and service gdm3 stop

Now you're ready to begin installing the nvidia drivers and cuda package.

7. ./cuda_7.0.28_linux.run

Say yes to everything, including the DKMS install. It'll install your drivers and the packages. For what it's worth, I put my libraries in /usr/share/..... rather than /local/

reboot

8. Install bumblebee primus

once this is installed

8.1. tee /proc/acpi/bbswitch <<<OFF
8.2. tee /proc/acpi/bbswitch <<<ON

reboot

I don't know why this worked for me, and I am 100% willing to bet that it's because of an idiosyncrasy in my hardware. Probably Lenovo's underwhelming BIOS...

9. lsmod | grep nouveau and then lsmod | grep nvidia

Both were blank for me, so modprobe nvidia

Now the nvidia driver is loaded into the system and I can grep its existence.
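
The check in step 9 can also be written as a small helper, sketched below. The module_loaded function is something I made up for illustration, not part of any package. It matches the module name exactly in the first column of lsmod, whereas a plain grep nvidia would also match related modules whose names merely contain "nvidia".

```shell
# Hypothetical helper: succeeds only if lsmod lists a module with exactly this name
module_loaded() {
    lsmod 2>/dev/null | awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Report the state of both drivers
for mod in nouveau nvidia; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done

# If nvidia shows as not loaded, run this as root: modprobe nvidia
```

On my setup both came back as not loaded the first time, which is the cue for the modprobe.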

The dangerous thing to do now is to go and do what everyone says next: glxinfo | grep -i "direct rendering". I get the GLX missing error. It looks like the world is lost when this doesn't come up, but it's not.

10. Finally, install cudaHashcat. Just unzip and go for a test run.
Sometimes, it'll say that it didn't find an nvidia driver. Try running example 0, then 400, then 500, and for some bizarre reason, it gets the idea. I have a fair grasp of Linux going all the way back to Dapper Drake and its wonderful little quirks, but this one escapes me.

If you have a decent or even nicer card than mine, you'll love cudaHashcat. It's fast, easy, and versatile, and there is no reason to precompute hashes, so you get the exhilarating rush of watching big numbers in black and white actively working rather than big numbers prepping.