
How to serve DeepSeek-R1 & V3 on NVIDIA GH200 Grace Hopper Superchip (400 tok/sec throughput, 10 tok/sec/query)
Guest post from Luke Miles, a San Francisco-based engineer who develops an MLP training accelerator (https://ohmchip.com/). We've invited him to share the ...