
Job offer

  • Posted on: 25 March 2026

Research Internship / Priming Preference Elicitation with Goal-Conditioned Reinforcement Learning (F/M)


Job Information

Organisation/Company
Inria, the French national research institute for the digital sciences
Research Field
Computer science
Researcher Profile
First Stage Researcher (R1)
Application Deadline
Country
France
Type of Contract
Temporary
Job Status
Full-time
Hours Per Week
38.5
Offer Starting Date
Is the job funded through the EU Research Framework Programme?
Not funded by an EU programme
Reference Number
2026-09717
Is the Job related to staff position within a Research Infrastructure?
No

Offer Description

At Scool, Gautron (2022) turned a high-fidelity crop simulator into an RL environment. In this problem, an AI advises a farmer throughout a harvesting season, deciding daily how much the farmer should water, fertilize, and so on, with the goal of striking a balance between several criteria, such as yield and nitrate pollution, under varying weather conditions. By running an off-the-shelf deep RL algorithm such as PPO (Schulman et al., 2017), Gautron (2022) showed that RL can find more efficient solutions than human expert policies. However, the main drawback of the current decision support system is that it provides recommendations under a pre-defined trade-off between the different criteria (such as yield, pollution or workload) and thus cannot adapt to the varying needs of individual farmers. An existing solution in the literature is to wrap a preference elicitation mechanism around an RL solver, allowing non-RL-expert users to tune the reward function to their needs while only interacting with the AI at a very abstract level. This is so-called preference-based RL (PbRL, Wirth et al. (2017)), also known as RL from human feedback (RLHF). These methods have seen a recent surge of popularity, as they were shown to be useful for training large language models (Ouyang et al., 2022). An illustration of the RLHF framework is given in Figure 1, and a survey can be found in Wirth et al. (2017).
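As a toy illustration of the PbRL loop described above, the sketch below fits a linear reward model to pairwise preferences under a Bradley-Terry choice model. All quantities (trajectory features, the hidden "user", the query set) are synthetic stand-ins invented for the example, not gym-dssat data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trajectory features (e.g. summaries of yield, pollution,
# workload over a season) -- illustrative only, not gym-dssat outputs.
n_traj, n_feat = 50, 3
features = rng.normal(size=(n_traj, n_feat))

# A hidden "user" whose preferences the reward model tries to recover.
true_w = np.array([1.0, 0.5, 0.2])
utility = features @ true_w

# Simulate pairwise preference queries: the user prefers the
# higher-utility trajectory (noiseless responses, for clarity).
pairs = rng.integers(0, n_traj, size=(200, 2))
prefs = (utility[pairs[:, 0]] > utility[pairs[:, 1]]).astype(float)

# Fit a linear reward model by maximizing the Bradley-Terry
# log-likelihood with plain gradient ascent.
w = np.zeros(n_feat)
for _ in range(500):
    diff = features[pairs[:, 0]] - features[pairs[:, 1]]
    p = 1.0 / (1.0 + np.exp(-diff @ w))  # P(first item preferred)
    w += 0.5 * diff.T @ (prefs - p) / len(pairs)

# The learned reward should rank trajectories like the hidden utility.
corr = np.corrcoef(features @ w, utility)[0, 1]
print(f"rank agreement (corr): {corr:.3f}")
```

In a full PbRL loop this reward-fitting step would alternate with policy optimization, which is precisely the costly part the internship aims to accelerate.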

Despite continuous efforts to improve PbRL algorithms (Hu et al., 2024; Zhu et al., 2025; Driss et al.), in their current state they remain inadequate for real-world applications such as the aforementioned task. One limitation of current PbRL methods is that they perform a costly deep-RL policy optimization step between query rounds, making the overall interaction potentially last several hours. The idea of this M2 internship is to exploit a specificity of the task, namely that preferences can be expressed as proximity to a goal (a vector containing a target average crop yield, amount of used fertilizer, etc.), and to use an unsupervised training phase with goal-conditioned RL (Liu et al., 2022) to learn quantities and models that can speed up PbRL. These models include, for instance, a prior over possible goals, a set of pre-computed queries, and a goal-conditioned policy able to reach target goals and adapt faster to specific user preferences.
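The goal-as-preference idea can be sketched concretely: outcomes reached during a (simulated) unsupervised phase are used to fit a crude goal prior, a user's preference is a negative distance to a target vector, and an infeasible user target is projected back onto the prior. All dimensions and numbers below are illustrative assumptions, not gym-dssat values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Outcome vectors (yield, fertilizer used, irrigation) that a
# goal-conditioned policy might reach during an unsupervised training
# phase -- synthetic stand-ins for simulator rollouts.
achieved_goals = rng.normal(loc=[8.0, 120.0, 300.0],
                            scale=[1.0, 15.0, 40.0], size=(1000, 3))

# A crude "goal prior": a Gaussian fitted to achieved outcomes,
# describing which targets look reachable before any user interaction.
mu = achieved_goals.mean(axis=0)
cov = np.cov(achieved_goals, rowvar=False)

def goal_reward(outcome, goal):
    """User preference expressed as (negative) distance to a goal."""
    return -np.linalg.norm(np.asarray(outcome) - np.asarray(goal))

def project_to_prior(target, n_candidates=5000):
    """Replace an infeasible user target with the nearest goal sampled
    from the prior, i.e. the closest outcome believed achievable."""
    candidates = rng.multivariate_normal(mu, cov, size=n_candidates)
    dists = np.linalg.norm(candidates - target, axis=1)
    return candidates[np.argmin(dists)]

# A user asks for an outcome far outside what the simulator delivers;
# the prior snaps it back to something reachable.
user_target = np.array([15.0, 0.0, 100.0])
feasible = project_to_prior(user_target)
print("requested:", user_target, "-> feasible:", np.round(feasible, 1))
```

A goal-conditioned policy trained in the unsupervised phase could then be conditioned directly on the projected goal, avoiding a fresh deep-RL optimization for each user.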

For more information, please see Scool's job offer website https://team.inria.fr/scool/job-offers/

Main Activities:

  • Perform a literature review of PbRL and goal-conditioned RL and propose a model of users and user responses in PbRL from a goal-conditioned RL perspective.
  • Use existing goal-conditioned RL algorithms on the gym-dssat task to learn goal-conditioned policies and a prior over achievable goals.
  • Propose an approach that uses the goal-conditioned policy and the goal prior to speed up query generation and policy optimization in PbRL. Compare against existing PbRL baselines and evaluate improvements in compute time and reduction in the number of queries.

Optional Activities:

  • Review the preference-elicitation literature with a focus on Bayesian approaches such as (Viappiani and Boutilier, 2010).
  • With the previously learned prior and user response model, develop a Bayesian query generation mechanism and evaluate its performance against existing query selection mechanisms in PbRL.
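One way such a Bayesian query-generation mechanism might look, under strong simplifying assumptions (a small discrete set of goal hypotheses and a Bradley-Terry response model, both invented for this sketch), is greedy selection of the pairwise query that minimizes the expected posterior entropy over goals:

```python
import numpy as np

rng = np.random.default_rng(2)

# Discrete hypothesis space: candidate goal vectors with a uniform
# prior (in practice this would come from the learned goal prior).
goals = rng.normal(size=(8, 3))
prior = np.full(len(goals), 1.0 / len(goals))

# Candidate items for a pairwise query (e.g. trajectory summaries).
items = rng.normal(size=(20, 3))

def response_prob(goal, a, b, beta=2.0):
    """Bradley-Terry user model: P(prefer item a over item b | goal),
    with utility defined as negative distance to the goal."""
    ua = -np.linalg.norm(items[a] - goal)
    ub = -np.linalg.norm(items[b] - goal)
    return 1.0 / (1.0 + np.exp(-beta * (ua - ub)))

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_posterior_entropy(a, b):
    """Expected entropy of the goal posterior after asking query (a, b)."""
    likes_a = np.array([response_prob(g, a, b) for g in goals])
    p_a = (prior * likes_a).sum()            # marginal P(answer = a)
    post_a = prior * likes_a / p_a
    post_b = prior * (1 - likes_a) / (1 - p_a)
    return p_a * entropy(post_a) + (1 - p_a) * entropy(post_b)

# Greedy selection: ask the pair whose answer is most informative.
pairs = [(a, b) for a in range(len(items)) for b in range(a + 1, len(items))]
best = min(pairs, key=lambda q: expected_posterior_entropy(*q))
print("most informative query:", best)
```

Pre-computing such queries offline, using the goal prior learned in the unsupervised phase, is one way the interaction time with the user could be reduced.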

Requirements

Languages
FRENCH
Level
Basic
Languages
ENGLISH
Level
Good

Additional Information

Benefits
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave (on a full time annual basis): 7 weeks of annual leave
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

In accordance with current regulations

Selection process

Please send your CV and cover letter

Website for additional job details

Work Location(s)

Number of offers available
1
Company/Institute
Inria
Country
France
City
Villeneuve d'Ascq

Contact

City
LE CHESNAY CEDEX
Website
Street
Domaine de Voluceau - Rocquencourt
Postal Code
78153
