Computer Vision;Accuracy;Navigation;Urban Areas;Computer Architecture;Transformers;Trajectory;Vision Transformers;Human Motion Prediction;Semantic Scene Understanding;Masked Autoencoders;Occupancy Priors