


No, it is not. CPU and GPU overhead is close to zero anyway if you are loading weights at 10 GB/s.

NVMe drives are much, much slower than RAM, especially unified/soldered RAM.

Bandwidth-wise, it gets fun when you have a storage array instead of a single NVMe drive. Then you can saturate the PCIe lanes and go beyond what's cost-effective with RAM. Interesting to think of this as opening the door to 10-100T MoEs.
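
For a rough feel of the numbers, here is a back-of-the-envelope sketch in Python. It only assumes that aggregate read bandwidth is capped by whichever is smaller, the sum of the drives or the PCIe lanes feeding them. The per-lane figures follow from the PCIe transfer rates with 128b/130b encoding; the drive speed, drive count, lane count, and weight size in the usage lines are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
def pcie_lane_gbps(gen: int) -> float:
    """Approximate GB/s per PCIe lane for gen 3, 4 or 5 (128b/130b line encoding only)."""
    gt_per_s = {3: 8.0, 4: 16.0, 5: 32.0}[gen]
    return gt_per_s * (128 / 130) / 8  # GT/s -> GB/s, one direction


def array_bandwidth(n_drives: int, per_drive_gbps: float, gen: int, lanes_total: int) -> float:
    """Aggregate read bandwidth: the smaller of the drive total and the PCIe link total."""
    drive_limit = n_drives * per_drive_gbps
    link_limit = lanes_total * pcie_lane_gbps(gen)
    return min(drive_limit, link_limit)


def seconds_to_stream(size_gb: float, bandwidth_gbps: float) -> float:
    """Time to read size_gb of weights once at a sustained bandwidth."""
    return size_gb / bandwidth_gbps


if __name__ == "__main__":
    # Assumed hardware: 7 GB/s per drive, PCIe 4.0, 16 lanes shared by the array.
    for n in (1, 4, 8):
        bw = array_bandwidth(n, per_drive_gbps=7.0, gen=4, lanes_total=16)
        # 1000 GB is an arbitrary example size for a large set of weights.
        print(f"{n} drive(s): {bw:.1f} GB/s, {seconds_to_stream(1000, bw):.0f} s per 1000 GB")
```

Real throughput will be lower once protocol overhead, filesystem behaviour, and access patterns come into play, so treat the output as an upper bound.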

Wasn't there a storage device some years ago (a decade or more) that was RAM strapped to a PCIe card, with the electronics to present the RAM as a storage device?

To be fair, llama.cpp has had this feature for over a year now. It just applies to GGUF.

I have an M3; I will test it on Metal and see how it goes.


