Internship: Enriching 3D with Language F/M


Updated on 26/09/2025

Contract type: Internship
Work time: Full time
Location: Meylan

About NAVER LABS Europe

NAVER LABS Europe is part of the R&D division of NAVER, Korea’s leading Internet portal and a global tech company with a range of services that include search, commerce, content, fintech, robotics and cloud.

The position

3D-LLMs [1-4] are a groundbreaking advancement that brings Large Language Models (LLMs) into the realm of three-dimensional understanding, essentially teaching them to "see" and reason about the physical world in 3D. 3D-LLMs are designed to handle 3D data inputs such as point clouds, meshes, and multi-view images, which allows them to perform tasks that require spatial reasoning and physical context.

At NLE we proposed the *St3R family [5-8], a set of breakthrough approaches in 3D reconstruction which, contrary to most existing methods, don’t require any camera parameters to perform direct 3D reconstruction from image content. They work on any number of images, even when they do not overlap, offering a substantial simplification over traditional methods and significant potential and versatility in the handling of diverse 3D vision challenges.

The goal of this internship is to bring these two worlds together by investigating different strategies to enrich our *St3R models with new capabilities such as 3D captioning, 3D question-answering, 3D grounding as well as planning and language-based navigation.

About the research team

In the Vision group, we conduct research to enable robots to understand and interact with their environment through advanced perception systems. We focus on visual perception, a fundamental capability for any intelligent system designed to engage with the world. Robots must perceive the structure, objects, and people in their surroundings to build a deeper understanding of their environment and effectively perform assigned tasks. Our research combines 3D vision, visual representation learning, self-supervised learning, and human behavior understanding to develop AI components that help robots navigate in 3D environments, detect and interact with surrounding objects and people, and continuously adapt when deployed in new environments.

What we're looking for

already in a PhD program
strong knowledge of computer vision with a solid deep learning background
good knowledge of 3D geometry and multi-modal LLMs
strong programming skills
experience with vision transformers, LLMs, torchvision and LoRA fine-tuning

What we offer

We foster a collaborative environment dedicated to ambitious, multidisciplinary projects that translate advanced research into impactful, real-world solutions, supported by 30+ years of experience in AI and related fields.
Flexible work/life balance.
We are an equal opportunity employer that hires based on skills, experience, and merit. We foster an inclusive and diverse workplace where all qualified candidates are considered fairly, regardless of background.
We’re based in Meylan, close to Grenoble, a city that offers the perfect balance of urban life, cutting-edge research and technology, and spectacular mountain landscapes that provide countless opportunities to relax, recharge, and enjoy the outdoors.

All applications will be carefully considered, even if not all required skills are met. We value diverse backgrounds and the potential of each candidate, and we offer training to support the development of necessary skills.

NAVER LABS, co-located in Korea and France, is the organization dedicated to preparing NAVER’s future. Scientists at NAVER LABS Europe are empowered to pursue long-term research problems that, if successful, can have significant impact and transform NAVER. We take our ideas as far as research can go to create the best technology of its kind. Active participation in the academic community and collaborations with world-class public research groups are, among others, important ways to achieve these goals. Teamwork, focus and persistence are important values for us.

When applying for this position online, please don't forget to upload your CV and cover letter. Incomplete applications will not be considered.

NAVER LABS Europe is subject to French jurisdiction requiring organisations to stipulate that a job/internship is open to both women and men. None of our jobs/internships are gender specific.

References

[1] Hong et al, 3D-LLM: Injecting the 3D World into Large Language Models, NeurIPS’23

[2] Chen et al, LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning, CVPR’24

[3] Jia et al, SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding, ECCV’24

[4] Yu et al, Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning, CVPR’25

[5] Wang et al, DUSt3R: Geometric 3D Vision Made Easy, CVPR’24

[6] Cabon et al, MUSt3R: Multi-view Network for Stereo 3D Reconstruction, CVPR’25

[7] Zust et al, PanSt3R: Multi-view consistent panoptic segmentation, ICCV’25

Ref: 76163a8d-6d5a-494d-b7f8-67a4737c997e

This position has been filled.
