Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.
                
                 
                   
                     Dataset ID 
                     Description 
                     Technology 
                     Samples 
                    
                  
                 
                   
                     
                       
        		               EGAD50000000276 
        	              
                       
The synthetic genomes were created to mimic real cancer data from 4 patients (named 185, 186, 187 and 188). Mutations are based on real CRC patients from the PCAWG dataset. For each patient, two tumor samples at different time points and one healthy sample were simulated. Intra-tumor heterogeneity and tumor evolution in the patients are represented by simulating reads from each tumor subclone separately and then mixing them according to the clonal proportions in each sample. For rapid use and transfer, only selected chromosomes were generated for each patient.
Chromosomes per patient:
- 185: chr4, chr5, chr7, chr17
- 186: chr1, chr7, chr12, chr17
- 187: chr1, chr2, chr5, chr12, chr17
- 188: chr2, chr5, chr12, chr13, chr17
Workflows used to create BAM/BAI, VCF and MAF files from FASTQ (alignment to GRCh38):
- https://usegalaxy.eu/published/workflow?id=2c3d05023c02113e
- https://usegalaxy.eu/published/workflow?id=1da86d74f8535f4e 
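
The mixing step described above (reads simulated per subclone, then combined according to clonal proportions) can be illustrated with a minimal sketch. It is not the authors' code: the function name, the subclone names and the proportions are hypothetical, and largest-remainder rounding is just one reasonable way to turn proportions into whole read counts.

```python
def reads_per_subclone(total_reads, proportions):
    """Split total_reads across subclones in proportion to their clonal fractions.

    Uses largest-remainder rounding so the counts add up to total_reads.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Exact (fractional) share of reads for each subclone.
    exact = {name: total_reads * p / total for name, p in proportions.items()}
    # Start from the integer part of each share.
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_reads - sum(counts.values())
    # Give the remaining reads to the subclones with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:max(leftover, 0)]:
        counts[name] += 1
    return counts


# Hypothetical clonal proportions for one tumor sample (not taken from the dataset).
example = reads_per_subclone(1_000_000, {"clone_A": 0.6, "clone_B": 0.3, "clone_C": 0.1})
```

Reads drawn in these amounts from each subclone's simulated read set would then be concatenated to form the mixed tumor sample.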
        	              
                       
        		             
                           
                             unspecified 
                           
                         
                        
                       8 
                      
                   
                     
                       
        		               EGAD50000000564 
        	              
                       
This dataset contains synthetic WGS data for 10 tumor-normal pairs of colorectal cancer, simulated in the standard format of Illumina paired-end reads. The NEAT read simulator (version 3.0, https://github.com/zstephens/neat-genreads) was used to synthesize these 10 pairs of tumor and normal WGS data. During data generation, the simulation parameters (i.e., sequencing error statistics, read fragment length distribution and GC% coverage bias) were learned from data models provided by NEAT. The target average sequencing depth was about 110X for tumor samples and 60X for normal samples.
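
As a rough illustration of what these target depths mean in read counts, the sketch below applies the usual coverage relation (depth = 2 x read length x read pairs / genome length). The genome length and the read length are assumptions made for the example; neither is stated in the dataset description.

```python
import math


def read_pairs_for_depth(depth, genome_length, read_length):
    """Smallest number of read pairs giving at least `depth` average coverage."""
    return math.ceil(depth * genome_length / (2 * read_length))


GENOME_LENGTH = 3_100_000_000  # assumed approximate human genome length in bp
READ_LENGTH = 150              # assumed read length in bp (not given in the description)

tumor_pairs = read_pairs_for_depth(110, GENOME_LENGTH, READ_LENGTH)
normal_pairs = read_pairs_for_depth(60, GENOME_LENGTH, READ_LENGTH)
```

Under these assumptions a tumor sample needs a little under twice as many read pairs as its matched normal.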
 
To generate the synthetic normal WGS data for each sample, the germline variant profile of a real patient was randomly down-sampled to 50% of that patient's germline variants. These were combined with in silico germline variants, modelled randomly using an average mutation rate of 0.001, which made up the other 50% and completed the germline profile for the normal synthetic WGS data.
 
To generate the synthetic tumor WGS data for each sample, a pre-defined somatic short variant profile (SNVs and indels) learnt from a real CRC patient was added to the germline variant profile used for the normal synthetic WGS data of the same patient; together these make up the variants of the tumor sample. Neither a copy number profile nor a structural variation profile was introduced into the tumor synthetic WGS data. Tumor content and ploidy were assumed to be 100% and 2, respectively.
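
To show what the stated tumor content (100%) and ploidy (2) imply for variant allele fractions, here is a small sketch using the standard purity/ploidy relation. It assumes a somatic variant sits on one of the two tumor copies; the description does not state the zygosity of the inserted variants, so treat that as an assumption.

```python
def expected_vaf(purity, tumor_ploidy, mutated_copies=1, normal_ploidy=2):
    """Expected variant allele fraction of a somatic variant in a bulk sample."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    tumor_alleles = purity * tumor_ploidy          # allele copies from tumor cells
    normal_alleles = (1 - purity) * normal_ploidy  # allele copies from normal cells
    return purity * mutated_copies / (tumor_alleles + normal_alleles)


# Values stated in the description: tumor content 100%, ploidy 2.
# mutated_copies=1 is the assumption named above.
vaf = expected_vaf(purity=1.0, tumor_ploidy=2)
```

With no normal contamination and no copy number changes, such variants would be expected at an allele fraction of about 0.5, apart from sampling noise and sequencing error.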
 
For mapping and variant detection, the Sarek pipeline v3.1.2 (https://nf-co.re/sarek/3.1.2) was used, specifically:
1. BWA v0.7.17-r1188 for read mapping
2. GATK v4.3.0.0 for pre-processing BAM files (including MarkDuplicates and recalibration)
3. Mutect2 (GATK v4.3.0.0) for somatic variant calling
4. Strelka2 v2.9.10 for germline and somatic variant calling
 
 
Metadata for the 10 CRC patients used to generate the synthetic normal and tumor WGS data:
 
Patient_id  Tumor_barcode  Normal_barcode  Age  Sex  Tissue        Cancer
SIM007      SIM007_T       SIM007_N        71   F    Rectal        Primary CRC
SIM008      SIM008_T       SIM008_N        45   F    Colon         Neuroendocrine Metastasis CRC
SIM010      SIM010_T       SIM010_N        62   M    Colon         Metastasis CRC
SIM011      SIM011_T       SIM011_N        55   M    Colon         Neuroendocrine Metastasis CRC
SIM012      SIM012_T       SIM012_N        57   M    Rectal        Metastasis CRC
SIM013      SIM013_T       SIM013_N        69   M    Colon         Metastasis CRC
SIM014      SIM014_T       SIM014_N        68   M    Colon         Neuroendocrine primary CRC
SIM015      SIM015_T       SIM015_N        58   F    Colon         Primary CRC
SIM016      SIM016_T       SIM016_N        49   M    Colon/Rectal  Primary CRC
SIM017      SIM017_T       SIM017_N        78   M    Colon         Neuroendocrine primary CRC
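
For anyone wanting to re-run a Sarek-style analysis on the FASTQ files, the patient table above maps naturally onto a tumor/normal samplesheet. The sketch below is an illustration only: the column names (patient, sex, status, sample, lane, fastq_1, fastq_2) and the 0 = normal / 1 = tumor coding follow the Sarek 3.x samplesheet layout as I understand it and should be checked against the Sarek documentation, and the lane label and FASTQ file names are hypothetical placeholders.

```python
import csv
import io

# (patient_id, sex) pairs copied from the metadata table.
PATIENTS = [
    ("SIM007", "F"), ("SIM008", "F"), ("SIM010", "M"), ("SIM011", "M"),
    ("SIM012", "M"), ("SIM013", "M"), ("SIM014", "M"), ("SIM015", "F"),
    ("SIM016", "M"), ("SIM017", "M"),
]
SEX_CODE = {"F": "XX", "M": "XY"}  # assumed samplesheet coding for sex


def samplesheet_rows(patients):
    """One normal row (status 0) and one tumor row (status 1) per patient."""
    rows = []
    for patient, sex in patients:
        for status, suffix in ((0, "N"), (1, "T")):
            sample = f"{patient}_{suffix}"  # matches the Normal/Tumor barcodes
            rows.append({
                "patient": patient,
                "sex": SEX_CODE[sex],
                "status": status,
                "sample": sample,
                "lane": "L1",                        # hypothetical lane label
                "fastq_1": f"{sample}_R1.fastq.gz",  # hypothetical file name
                "fastq_2": f"{sample}_R2.fastq.gz",  # hypothetical file name
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = samplesheet_rows(PATIENTS)
sheet = to_csv(rows)
```

The resulting sheet has 20 sample rows, one per tumor or normal sample listed for this dataset.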
        	              
                       
        		             
                           
                             unspecified 
                           
                         
                        
                       20