Near-Optimal Scaling of Large Deep Network Training on Public Cloud