nvidia rtx spark local ai agents
qwen3 6 27b dflash speculative decoding
unsloth qwen3 6 27b gguf
tiny vllm cpp cuda inference engine
advanced quantization algorithm for llms