Excited to announce our HyperP and SqrtGate paper! Check out my Twitter to see how we achieve both stable and optimal LR transfer when scaling training FLOPs!