Vision-language-action (VLA) models for robotics

Project description

We are seeking motivated Student Research Assistants to join a research project at the intersection of robotics, machine learning, and multimodal foundation models. The project focuses on developing and fine-tuning Vision-Language-Action (VLA) models for robotic manipulation and embodied intelligence. If you are passionate about robotics and about AI systems that integrate perception, language understanding, and control, we encourage you to apply!

This research explores how to train and fine-tune Vision-Language-Action (VLA) models that enable robots to:

Understand visual scenes
Interpret natural language instructions
Execute manipulation and locomotion tasks
Generalize across tasks and environments

Research activities will involve:

Training and fine-tuning VLA models on robotic platforms (simulated and potentially real)
Running large-scale experiments in simulation (e.g., MuJoCo)
Integrating learned policies with ROS-based robotic systems
Designing evaluation protocols for embodied intelligence
Investigating data generation, fine-tuning strategies, and model adaptation

Qualifications

Applicants should ideally have the following qualifications:

Programming skills in Python and PyTorch
Background in dynamical systems and control theory
Experience with ROS and/or MuJoCo simulation

Optional qualifications:

Strong background in probability and statistics
Prior experience working with Large Language Models (LLMs), Vision-Language-Action (VLA) models, or other multimodal models
Experience with imitation learning, reinforcement learning, or policy fine-tuning
Familiarity with robotic manipulation or locomotion systems