AnonymizeBAM: Versatile anonymization of human sequence data for open data sharing

The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences. Here, we developed anonymizeBAM, a versatile tool for the anonymization of genetic variant information present in sequence data. Applying anonymizeBAM to single-cell RNA-seq and ATAC-seq datasets confirmed the complete removal of donor-related genetic information. Therefore, the accurate generation of de-identified sequence data will re-enable open sharing in sequencing-based studies for improved transparency, reproducibility, and innovation.

The preprint is available on bioRxiv: Ziegenhain et al. 2021

Our implementation is openly available on GitHub, and the tool can be installed with: pip install anonymizeBAM
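To confirm that the installation succeeded, one option is to query the package metadata from Python. The snippet below is a minimal sketch that uses only the standard library (Python 3.8+). It assumes the installed distribution is registered under the same name as in the pip command, `anonymizeBAM`. The helper name `installed_version` is hypothetical and is not part of the tool.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "anonymizeBAM") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent.

    Assumption: the distribution name matches the name used with pip install.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("anonymizeBAM is not installed; run: pip install anonymizeBAM")
    else:
        print(f"anonymizeBAM {version} is installed")
```

The check reads installation metadata only, so it does not import the package or depend on how its command-line interface is laid out.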

© 2021 Sandberg lab at Karolinska Institutet, Stockholm, Sweden.