llm architectures kv sharing mhc
takeaways discussion about the deepseek v4 architecture
lego sagrada familia