With Quiet-STaR, language models learn to think before speaking