In search of diverse and connected teams

A computational approach to assemble diverse teams based on members' social networks

Authors

Abstract

Previous research shows that teams with diverse backgrounds and skills can outperform homogeneous teams. However, people often prefer to work with others who are similar and familiar to them, and so fail to assemble highly diverse teams. We study the problem of team formation given a pool of individuals who possess different skills and characteristics and a social network that captures the familiarity among these individuals. The goal is to assign all individuals to diverse teams while drawing on their social connections, thereby allowing them to preserve a level of familiarity. To address this problem, we implement an algorithm based on the NSGA-II genetic algorithm that splits the members of a social network into well-connected and diverse teams. It optimizes measures of team communication cost and diversity in O(n^2) time. We tested the algorithm on three empirically collected team formation datasets and against three benchmark algorithms. The experimental results confirm that the proposed algorithm succeeded in forming teams that have both diversity in member attributes and previous connections between members. We discuss the benefits of using computational approaches to augment team formation and composition.
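The two objectives named in the abstract can be illustrated with a small sketch. The abstract does not define either measure, so the code below makes assumptions: communication cost is taken as the sum of pairwise hop distances between teammates in the social network, and diversity as the mean Blau index of one categorical attribute per team. All function names, the toy network, and the attribute values are hypothetical. The sketch covers only how candidate team assignments might be scored and filtered for non-dominated solutions, the comparison that NSGA-II uses to rank candidates; it is not the NSGA-II search itself and not the deposited scripts.

```python
from collections import Counter, deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def communication_cost(graph, team, unreachable_penalty=None):
    """Assumed measure: sum of pairwise hop distances in a team (lower is better)."""
    if unreachable_penalty is None:
        # Assumption: disconnected pairs cost as much as the network size.
        unreachable_penalty = len(graph)
    members = sorted(team)
    cost = 0
    for i, u in enumerate(members):
        dist = shortest_path_lengths(graph, u)
        for v in members[i + 1:]:
            cost += dist.get(v, unreachable_penalty)
    return cost


def blau_index(categories):
    """Assumed measure: Blau index 1 - sum(p_k^2) (higher is more diverse)."""
    n = len(categories)
    counts = Counter(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def evaluate(graph, attribute, teams):
    """Score one assignment as (total communication cost, mean team diversity)."""
    cost = sum(communication_cost(graph, team) for team in teams)
    diversity = sum(
        blau_index([attribute[m] for m in team]) for team in teams
    ) / len(teams)
    return cost, diversity


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def pareto_front(scores):
    """Indices of the scores that no other score dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(o, s) for j, o in enumerate(scores) if j != i)
    ]


# Hypothetical toy network: a path a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
attribute = {"a": "eng", "b": "eng", "c": "design", "d": "design"}

candidates = [
    [["a", "b"], ["c", "d"]],   # close-knit but homogeneous teams
    [["a", "c"], ["b", "d"]],   # mixed teams, members further apart
    [["a", "d"], ["b", "c"]],   # mixed teams, members further apart
    [["a"], ["b", "c", "d"]],   # uneven split
]
scores = [evaluate(graph, attribute, teams) for teams in candidates]
front = pareto_front(scores)
print(scores, front)
```

On this toy input the first candidate has the lowest cost but no diversity, the second and third have higher cost and higher diversity, and the uneven split is dominated. The front therefore keeps the first three candidates, which is the kind of familiarity-versus-diversity trade-off a multi-objective search has to expose.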

Data and Scripts

We have deposited the following files in the GitHub repository (https://nusoniclab.github.io/): (1) the pre-processed and de-identified data used in this study, (2) the Python scripts that pre-process the original datasets, and (3) the Python scripts that run the proposed algorithm and the benchmark algorithms, including their plots and quantitative metrics.

The first dataset (MyDreamTeam) is administered by the SONIC Research Group, Northwestern University. Because some of the variables collected are sensitive, the protocol approved by the Northwestern University Institutional Review Board (IRB) does not permit individual-level data to be made publicly available without restriction. Researchers interested in restricted, anonymized versions of this individual-level data should contact the authors about obtaining an IRB-approved institutional data sharing agreement. The second dataset (bibsonomy) is administered by the Knowledge and Data Engineering Group, University of Kassel. This dataset is available under a license agreement and can be requested at https://www.kde.cs.uni-kassel.de/wp-content/uploads/bibsonomy/. The third dataset (GHTorrent) is administered by Georgios Gousios. It is freely and publicly available at https://ghtorrent.org/. Although we do not maintain these datasets, we provide the scripts that generate the pre-processed datasets used in this study.

Pre-processed Data and Python Scripts

© Copyright 2022. All Rights Reserved.
