TY - JOUR
T1 - COSGAP
T2 - COntainerized Statistical Genetics Analysis Pipelines
AU - Akdeniz, Bayram Cevdet
AU - Frei, Oleksandr
AU - Hagen, Espen
AU - Filiz, Tahir Tekin
AU - Karthikeyan, Sandeep
AU - Pasman, Joëlle
AU - Jangmo, Andreas
AU - Bergstedt, Jacob
AU - Shorter, John R.
AU - Zetterberg, Richard
AU - Meijsen, Joeri
AU - Sønderby, Ida Elken
AU - Buil, Alfonso
AU - Tesli, Martin
AU - Lu, Yi
AU - Sullivan, Patrick
AU - Andreassen, Ole A.
AU - Hovig, Eivind
N1 - Funding Information:
This work is supported by the European Union’s Horizon 2020 Research and Innovation Programme [Grant No. 847776; CoMorMent] and the US National Institutes of Mental Health [R01MH123724]; South-Eastern Norway Regional Health Authority [#2022073 to B.C.A. and O.F., #2020060 to I.E.S.]; NordForsk to the NeIC Heilsa “Tryggvedottir” [#101021 to B.C.A. and E.H.]; Research Council of Norway [#324499 to O.F., #296030 to A.J.]; European Union’s Horizon 2020 Research and Innovation Programme [RealMent, Grant No. 964874 to E.H., T.T.F.]; the US National Institutes of Mental Health [#R01MH123724 to J.P. and J.M.]; and the European Research Council [grant agreement ID 101042183 to Y.L.].
Publisher Copyright:
© 2024 The Author(s). Published by Oxford University Press.
PY - 2024
Y1 - 2024
AB - The collection and analysis of sensitive data in large-scale consortia for statistical genetics is hampered by multiple challenges, due to their non-shareable nature. Time-consuming issues in installing software frequently arise due to different operating systems, software dependencies, and limited internet access. For federated analysis across sites, it can be challenging to resolve different problems, including format requirements, data wrangling, setting up analysis on high-performance computing (HPC) facilities, etc. Easier, more standardized, automated protocols and pipelines can be solutions to overcome these issues. We have developed one such solution for statistical genetic data analysis using software container technologies. This solution, named COSGAP: "COntainerized Statistical Genetics Analysis Pipelines," consists of already established software tools placed into Singularity containers, alongside corresponding code and instructions on how to perform statistical genetic analyses, such as genome-wide association studies, polygenic scoring, LD score regression, Gaussian Mixture Models, and gene-set analysis. Using provided helper scripts written in Python, users can obtain auto-generated scripts to conduct the desired analysis either on HPC facilities or on a personal computer. COSGAP is actively being applied by users from different countries and projects to conduct genetic data analyses without spending much effort on software installation, converting data formats, and other technical requirements.
UR - http://www.scopus.com/inward/record.url?scp=85194950028&partnerID=8YFLogxK
U2 - 10.1093/bioadv/vbae067
DO - 10.1093/bioadv/vbae067
M3 - Journal article
AN - SCOPUS:85194950028
SN - 2635-0041
VL - 4
JO - Bioinformatics Advances
JF - Bioinformatics Advances
IS - 1
M1 - vbae067
ER -