The GHGA data infrastructure is designed as a federated network consisting of a central coordination node (GHGA Central) and several GHGA Data Hubs. The Data Hubs are located at seven sites throughout Germany.
The GHGA Data Portal serves as a central point of contact for uploading, downloading and analysing genome data. Behind the scenes, data processing is coordinated by GHGA Central and the GHGA Data Hubs.
The GHGA Data Hubs are operated by universities and research institutions. They are connected to local genome research centres that make genome data available via GHGA. The Data Hubs act as data processors and provide the IT resources required to operate the research platform, including storage and computing systems, as well as the staff for maintenance and operation.
Currently, the GHGA Data Hubs are located at seven sites:
Because the archived human genome data is highly sensitive, all GHGA Data Hubs are obliged to guarantee the highest level of data security. To this end, the data is encrypted both in transit and at rest. The software follows current security concepts such as the Zero Trust principle. In addition, the sites conduct regular security audits and penetration tests and implement the established security standards within an information security management system (ISMS). As consortium leader of GHGA, the DKFZ approves the GHGA Data Hubs and audits their compliance with the defined security standards.
The GHGA Data Hubs also play an important role as part of the federated data platform in the genome sequencing pilot project (MV GenomSeq), which is supported by the Federal Institute for Drugs and Medical Devices (BfArM). Within this framework, some GHGA Data Hubs operate so-called Genome Data Centres (GRZ).