AnonymizeBAM: Versatile anonymization of human sequence data for open data sharing
The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences. Here, we developed anonymizeBAM, a versatile tool for the anonymization of genetic variant information present in sequence data. Applying anonymizeBAM to single-cell RNA-seq and ATAC-seq datasets confirmed the complete removal of donor-related genetic information. Therefore, the accurate generation of de-identified sequence data will re-enable open sharing in sequencing-based studies for improved transparency, reproducibility, and innovation.
The preprint is available on bioRxiv: Ziegenhain et al. 2021
Our implementation is freely available on GitHub, and the tool can be installed with pip: pip install anonymizeBAM