The evaluation toolkit is available at the following GitHub repository: rgmc2025-handover-track.
Note that the score below revisits the performance scores provided in the CORSMAL Benchmark.
Teams are ranked based on a 100-point score \(S\) that results from the aggregation (weighted average) across all task configurations. Each task configuration \(i\) has a pre-assigned score \(\omega_i\) and accounts for the delivery location \(d_i\), the total execution time \(t_i\), and the final mass of the delivered container \(m_i\). The score is computed as follows:
$$ S = \frac{1}{3} \sum_i \left[ \omega_i \Psi_i \frac{\delta(d_i, \rho) + \gamma(t_i, \tau, \eta) + \mu(m_i, \hat{m}_i)}{3}\right],$$
where \( [ \cdot ] \) is the rounding operation to the nearest integer. The score of a configuration is set to 0 (not computed) if the container is not delivered within the area or within the maximum allowed time (indicator function \( \Psi_i \in \{0,1\} \)). The scores of all configurations sum up to 300 points, hence we divide by 3 to obtain the 100-point score \(S\).
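The aggregation can be sketched in Python as follows. This is a minimal illustration and not the code of the evaluation toolkit: the `ConfigResult` class, the `round_half_up` helper, and the `total_score` function are hypothetical names, the three component scores are assumed to be already computed in the range [0, 1], and ties in the rounding are assumed to be rounded up (the text only states rounding to the nearest integer).

```python
import math
from dataclasses import dataclass


@dataclass
class ConfigResult:
    """Outcome of one task configuration (hypothetical container type)."""
    omega: float     # pre-assigned score of the configuration
    delivered: bool  # Psi_i: delivered within the area and the time limit
    delta: float     # delivery location score, in [0, 1]
    gamma: float     # execution time score, in [0, 1]
    mu: float        # final mass score, in [0, 1]


def round_half_up(x: float) -> int:
    # Assumption: ties (x.5) are rounded up; the rules only say "nearest integer".
    return math.floor(x + 0.5)


def total_score(results: list[ConfigResult]) -> float:
    """Sum the rounded per-configuration scores and divide by 3."""
    total = 0
    for r in results:
        psi = 1 if r.delivered else 0
        total += round_half_up(r.omega * psi * (r.delta + r.gamma + r.mu) / 3)
    return total / 3
```

For example, a configuration with \(\omega_i = 30\) and all three component scores equal to 0.5 contributes 15 points to the sum before the division by 3, while a configuration that is not delivered contributes 0 regardless of its component scores.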
For the delivery location, we define \(d_i\) as the distance (in millimetres, mm) between the centre of the base of the container at the end of the task and the target position. We therefore compute a score based on the following normalisation function: $$ \delta(d_i, \rho) = \begin{cases} \displaystyle 1 - \frac{d_i}{\rho} & \text{if} \quad d_i < \rho \\ \displaystyle 0 & \text{otherwise,} \end{cases} $$ where \( \rho \) is a threshold that defines when an algorithm is unsuccessful for that measure and here represents the radius of the concentric delivery area where the object must be delivered. Specifically, we use \( \rho=500 \) mm.
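As an illustration, the delivery location score can be written as a short Python function. This is a sketch under the definitions above, not the toolkit implementation; the function and parameter names are hypothetical.

```python
def delivery_score(d_mm: float, rho_mm: float = 500.0) -> float:
    """Delivery location score delta(d_i, rho), with distances in millimetres."""
    if d_mm < rho_mm:
        return 1.0 - d_mm / rho_mm
    # At or beyond the radius of the delivery area the score is 0.
    return 0.0
```

With the default \( \rho=500 \) mm, a container placed exactly on the target scores 1, and one placed 250 mm away scores 0.5.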
For the total execution time, we define \(t_i\) (in milliseconds, ms) as the time from the moment the subject is instructed to grasp the container to the moment the robot releases the gripper at the delivery location to place the container after the handover (unless the handover failed). We therefore compute a score based on a similar normalisation function: $$ \gamma(t_i, \tau, \eta) = \begin{cases} \displaystyle 1 - \frac{\max(t_i, \eta) - \eta}{\tau - \eta} & \text{if} \quad t_i < \tau \\ \displaystyle 0 & \text{otherwise,} \end{cases} $$ where \( \tau \) is a threshold that defines when an algorithm is unsuccessful for that measure and here represents the maximum allowed execution time, and \( \eta \) is the minimum expected time to perform a handover. Any execution time up to \( \eta \) therefore receives the full score. Specifically, we use \(\tau=5000\) ms and \(\eta=1000\) ms.
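The execution time score can be sketched in the same way. Again, this is only an illustration of the formula above with hypothetical function and parameter names, not the toolkit implementation.

```python
def time_score(t_ms: float, tau_ms: float = 5000.0, eta_ms: float = 1000.0) -> float:
    """Execution time score gamma(t_i, tau, eta), with times in milliseconds."""
    if t_ms < tau_ms:
        # Times at or below eta are clamped to eta, so they receive the full score.
        return 1.0 - (max(t_ms, eta_ms) - eta_ms) / (tau_ms - eta_ms)
    # At or beyond the maximum allowed time the score is 0.
    return 0.0
```

With the default thresholds, an execution time of 3000 ms lies halfway between \( \eta \) and \( \tau \) and scores 0.5.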
For the final mass of the delivered container, we use a different normalisation function that accounts for the mass of the filled container (if not empty) measured before (\(\hat{m}_i\)) and after (\(m_i\)) the execution of the task: $$ \mu(m_i, \hat{m}_i) = \begin{cases} \displaystyle 1 - \frac{|m_i - \hat{m}_i|}{\hat{m}_i} & \text{if} \quad |m_i - \hat{m}_i| < \hat{m}_i \\ \displaystyle 0 & \text{otherwise,} \end{cases} $$ to assess the amount of content (in grams, g) that was spilled due to an inaccurate robot grasp or an unstable delivery phase.
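A minimal Python sketch of the mass score is given below, with hypothetical function and parameter names. The text does not state how empty containers are scored, so the sketch simply follows the formula: a zero initial mass falls into the "otherwise" branch and returns 0 without dividing by zero, which may differ from how the toolkit treats that case.

```python
def mass_score(m_after_g: float, m_before_g: float) -> float:
    """Mass score mu(m_i, m_hat_i), with masses in grams."""
    diff = abs(m_after_g - m_before_g)
    if diff < m_before_g:
        return 1.0 - diff / m_before_g
    # Covers a fully spilled content, a mass error at least as large as the
    # initial mass, and (by the formula alone) a zero initial mass.
    return 0.0
```

For instance, a container that starts at 100 g and is delivered at 75 g scores 0.75.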