A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics - Nature Biomedical Engineering