Vision-Language-Action (VLA) models for robotics

Project description

We are seeking motivated Student Research Assistants to join a research project at the intersection of robotics, machine learning, and multimodal foundation models. The project focuses on developing and fine-tuning Vision-Language-Action (VLA) models for robotic manipulation and embodied intelligence. If you're passionate about robotics and cutting-edge AI systems that integrate perception, language understanding, and control, we encourage you to apply!

This research explores how to train and fine-tune VLA models that enable robots to:

  • Understand visual scenes
  • Interpret natural language instructions
  • Execute manipulation and locomotion tasks
  • Generalize across tasks and environments

Research activities will involve:

  • Training and fine-tuning VLA models on robotic platforms (simulated and potentially real)
  • Running large-scale experiments in simulation (e.g., MuJoCo)
  • Integrating learned policies with ROS-based robotic systems
  • Designing evaluation protocols for embodied intelligence
  • Investigating data generation, fine-tuning strategies, and model adaptation

Qualifications

Applicants should ideally have the following qualifications:

  • Programming skills in Python and PyTorch
  • Background in dynamical systems and control theory
  • Experience with ROS and/or MuJoCo simulation

Optional qualifications:  

  • Strong background in probability and statistics
  • Prior experience working with Large Language Models (LLMs), Vision-Language-Action (VLA) models, or other multimodal models
  • Experience with imitation learning, reinforcement learning, or policy fine-tuning
  • Familiarity with robotic manipulation or locomotion systems