# NCBI dbGaP analysis accession: pha002878
# Name: Prostate Cancer Genome-Wide Association Study (GWAS) - Primary Scan with Incidence density sampling, aggressive and non-aggressive versus control (trichotomous), genotype-specific effect model, adjusted analysis
# Description: We present results from two distinct analytic approaches. The first scheme is more frequently used in case-control studies. The second scheme reported here takes full advantage of the prospective nature of the PLCO cohort and the power from incidence density sampling.
Cumulative density sampling: This scheme, which will be more familiar to non-epidemiologists, does not account for the dynamic nature of the cohort. Genotypes of individuals who have been selected as a case in the relevant phenotype case group are counted once as a case and never as a control. Individuals who were selected several times as controls but did not develop prostate cancer during follow-up are counted only once in the control group.
Incidence density sampling: Selection of controls for cases identified in a cohort, done in a way that accounts for the dynamic nature of the cohort (including development of disease during follow-up and timing of entry to and exit from follow-up), may have more power to detect an association than the single-selection method. The main feature of incidence density sampling, as used for control selection here, is that controls are selected independently for each case from among those who are at risk at the time of the diagnosis of the case; i.e., among those who would have become a case in the study had they developed disease at the same time. Inclusion as a control for a given case set is independent of future diagnosis as a case, of selection as a control for other case sets, and of entry and exit times. Thus, an individual may be included both as a case and as a control.
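The control selection rule for incidence density sampling described above can be illustrated with a minimal sketch. It is not the CGEMS/PLCO selection code: the record fields (`id`, `entry`, `exit`, `dx`), the function name, and the number of controls per case are hypothetical, and any matching on covariates is omitted.

```python
import random

def incidence_density_sample(cohort, n_controls=1, seed=0):
    """Draw controls for each case from the risk set at the case's diagnosis time.

    cohort: list of dicts with hypothetical keys
        id    - subject identifier
        entry - time of entry to follow-up
        exit  - time of exit from follow-up
        dx    - time of diagnosis, or None if never diagnosed
    """
    rng = random.Random(seed)
    cases = sorted((p for p in cohort if p["dx"] is not None), key=lambda p: p["dx"])
    case_sets = []
    for case in cases:
        t = case["dx"]
        # At risk at time t: under follow-up at t and not yet diagnosed.
        # A later diagnosis does not exclude a subject, so a future case
        # can serve as a control here.
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"]
            and p["entry"] <= t <= p["exit"]
            and (p["dx"] is None or p["dx"] > t)
        ]
        # Each case set is sampled independently of all other case sets,
        # so the same subject may be drawn as a control more than once.
        controls = rng.sample(risk_set, min(n_controls, len(risk_set)))
        case_sets.append(
            {"time": t, "case": case["id"], "controls": [c["id"] for c in controls]}
        )
    return case_sets

# Toy cohort (made-up values, for illustration only).
cohort = [
    {"id": "A", "entry": 0, "exit": 3, "dx": 3},
    {"id": "B", "entry": 0, "exit": 7, "dx": 7},
    {"id": "C", "entry": 0, "exit": 10, "dx": None},
    {"id": "D", "entry": 5, "exit": 10, "dx": None},  # enters follow-up late
    {"id": "E", "entry": 0, "exit": 2, "dx": None},   # leaves follow-up early
]
for case_set in incidence_density_sample(cohort, n_controls=2):
    print(case_set)
```

In this toy cohort, subject B is eligible as a control for case A and later becomes a case, and subject C is eligible for both case sets, which mirrors the multiple-selection behaviour described in the text.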
The genotype of an individual who has been selected multiple times is taken into account each time he is selected; his time-varying covariates, such as age, are defined separately each time, depending on the characteristics of the case set for which he was selected as a control.
The number of association models we fit increased from 4 in Build 1.0 to 32 in Build 2.0 (2 x 2 x 2 x 4), covering all combinations from the following four categories:
Sampling
  Cumulative density: Whole genome association analysis of main effects for 554,291 SNPs on 1,151 cases diagnosed with tumors and 1,101 controls who had not been diagnosed with prostate cancer at the start of the CGEMS project.
  Incidence density: Whole genome association analysis of main effects for 554,291 SNPs on 1,151 cases diagnosed with tumors and 1,156 controls selected using an incidence density sampling strategy.
Dependent variable in model
  Dichotomous: A dichotomous logistic model was constructed to contrast the risk of all prostate cancer cases (both non-aggressive and aggressive) against that of all controls (m=2).
  Polytomous: A polytomous logistic model was constructed to separately contrast the risk of non-aggressive and aggressive prostate cancer cases against that of all controls (m=3).
Covariate adjustment
  Unadjusted: A 3-by-m contingency table of genotypes by phenotypes was constructed.
  Adjusted: The m phenotypes were regressed on indicator variables for genotype effects, age group at randomization (4 groups), region of recruitment (9 non-reference regions), and a single eigenvector to account for population stratification.
Genotype effects (m is the number of phenotype categories)
  Genotypic: The p-value was obtained from a score test of each estimated genotype effect with up to 2(m-1) degrees of freedom.
  Trend: The p-value was obtained from a score test for the estimated trend of the genotype effects with up to m-1 degrees of freedom.
  Dominant: The p-value was obtained from a score test for the minor homozygote + heterozygote versus major homozygote effect with up to m-1 degrees of freedom.
  Recessive: The p-value was obtained
# Method: The GLU assoc.logit1 module (http://code.google.com/p/glu-genetics/) was used to fit all models and to perform score tests of all genetic terms for association with phenotype. For each set of duplicate assays for a sample at each locus, we construct the set of non-missing genotypes observed. If that set contains exactly one genotype, that genotype is the consensus genotype for that individual at that locus. Otherwise, the genotype is set to missing. Hemizygous genotypes from the male non-pseudo-autosomal X chromosomal region are counted as homozygous genotypes.
# Human genome build: 37
# dbSNP build: 132
# SNP ID: Marker accession
# Chr ID: chromosome
# Chr Position: chromosome position
# P-value: testing p-value
# pHWE (case): p-value from HWE testing in cases
# pHWE (case 2): p-value from HWE testing in cases 2
# pHWE (control): p-value from HWE testing in controls
# Call rate (case): Call rate for cases
# Call rate (case 2): Call rate for cases 2
# Call rate (control): Call rate for controls
# CI low: the lower limit of the 95% confidence interval
# CI high: the upper limit of the 95% confidence interval
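The consensus genotype rule stated under Method can be illustrated with a minimal sketch. It is not the GLU implementation: the function name and the use of `None` for a missing call are assumptions made for this example.

```python
def consensus_genotype(duplicate_calls):
    """Return the consensus genotype for one sample at one locus.

    duplicate_calls: genotype calls from duplicate assays, with None
    standing in for a missing call (an assumption of this sketch).
    """
    # Set of distinct non-missing genotypes observed across the duplicates.
    observed = {g for g in duplicate_calls if g is not None}
    # Exactly one distinct genotype: that genotype is the consensus.
    # Zero (all missing) or several (discordant): set to missing.
    return next(iter(observed)) if len(observed) == 1 else None

print(consensus_genotype(["AG", "AG", None]))  # concordant duplicates
print(consensus_genotype(["AG", "GG"]))        # discordant duplicates
```

Concordant duplicates keep their genotype even when some assays failed, while discordant duplicates are set to missing rather than resolved by majority vote.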