###########################
# euL1db data version 1.00
# Date : 05-10-14
# Study Table
###########################
#Study_id	Description	PMID	Original_assembly	Mapped_assembly	Source_Insertions	Source_validation	Additional_curation	Citation	Methods
Awano2010	Contemporary retrotransposition of a novel non-coding gene induces exon-skipping in dystrophin mRNA	20827276	.	hg19_NCBI37	.	.	.	Awano, H., Malueka, R.G., Yagi, M., Okizuka, Y., Takeshima, Y. and Matsuo, M. (2010) Contemporary retrotransposition of a novel non-coding gene induces exon-skipping in dystrophin mRNA. J Hum Genet, 55, 785-790.	11
Baillie2011	In this study, the authors applied a high-throughput method to identify numerous L1, Alu and SVA germline mutations, as well as putative somatic L1 insertions, in the hippocampus and caudate nucleus of three individuals. Surprisingly, the authors also found somatic Alu insertions and SVA insertions. Their results demonstrate that retrotransposons mobilize to protein-coding genes differentially expressed and active in the brain. Thus, somatic genome mosaicism driven by retrotransposition may reshape the genetic circuitry that underpins normal and abnormal neurobiological processes.	22037309	hg19_NCBI37	hg19_NCBI37	Tables S4, S5	Tables S6, S7	The Baillie et al. publication {Baillie 2011} contains two distinct sets of data obtained by RC-seq: (i) germline polymorphic retrotransposon insertions discovered in a pool of genomic DNA isolated from the blood of several individuals; and (ii) putative somatic and germline retrotransposon insertions discovered in genomic DNA isolated from different brain regions of three individuals. Because many of the putative somatic insertions are only supported by a single sequencing read, we only kept in euL1db somatic insertions that have been validated by PCR and/or Sanger sequencing. In contrast, germline insertions were more robustly identified (several sequencing reads in multiple independent libraries) and were kept, whether further tested by PCR or not. Only L1HS elements were included (L1-Ta and L1-Pre-Ta).	Baillie, J.K., Barnett, M.W., Upton, K.R., Gerhardt, D.J., Richmond, T.A., De Sapio, F., Brennan, P.M., Rizzu, P., Smith, S., et al. (2011) Somatic retrotransposition alters the genetic landscape of the human brain. Nature, 479, 534-537.	1
Beck2010	In this study, a fosmid-based, paired-end DNA sequencing strategy was used to identify 68 full-length L1s that are differentially present among individuals but are absent from the human genome reference sequence. The majority of these L1s were highly active in a cultured cell retrotransposition assay. Genotyping 26 elements revealed that two L1s are only found in Africa and that two more are absent from the H952 subset of the Human Genome Diversity Panel. Therefore, these results suggest that hot L1s are more abundant in the human population than previously appreciated, and that ongoing L1 retrotransposition continues to be a major source of interindividual genetic variation.	20602998	hg18_NCBI36	hg19_NCBI37	Table S2	Table S2	The original Beck et al. 2010 publication {Beck 2010} only reports a short DNA sequence corresponding to the preintegration site, and its chromosome number and cytogenetic band. We remapped the preintegration site sequence to reference genome hg19 with BWA {Li 2010} to obtain the precise genomic coordinates. For sequences with multiple possible positions (MAPQ=0), we selected the position consistent with the reported cytogenetic band.	Beck, C.R., Collier, P., Macfarlane, C., Malig, M., Kidd, J.M., Eichler, E.E., Badge, R.M. and Moran, J.V. (2010) LINE-1 Retrotransposition Activity in Human Genomes. Cell, 141, 1159-1170.	3
Bernard2009	Exon deletions and intragenic insertions are not rare in ataxia with oculomotor apraxia 2.	19744353	.	hg19_NCBI37	.	.	.	Bernard, V., Minnerop, M., Bürk, K., Kreuz, F., Gillessen-Kaesbach, G. and Zuhlke, C. (2009) Exon deletions and intragenic insertions are not rare in ataxia with oculomotor apraxia 2. BMC Med Genet, 10, 87.	11
Brouha2002	Evidence consistent with human L1 retrotransposition in maternal meiosis I.	12094329	.	hg19_NCBI37	.	.	.	Brouha, B., Meischl, C., Ostertag, E., de Boer, M., Zhang, Y., Neijens, H., Roos, D. and Kazazian, H.H. (2002) Evidence consistent with human L1 retrotransposition in maternal meiosis I. Am J Hum Genet, 71, 327-336.	11
Evrony2012	In this study, genome-wide L1 insertion profiling of 300 single neurons from cerebral cortex and caudate nucleus of three normal individuals was performed, recovering >80% of germline insertions from single neurons. Although the authors found somatic L1 insertions, they estimate <0.6 unique somatic insertions per neuron, and most neurons lack detectable somatic insertions, suggesting that L1 is not a major generator of neuronal diversity in cortex and caudate. The authors then genotyped single cortical cells to characterize the mosaicism of a somatic AKT3 mutation identified in a child with hemimegalencephaly. Single-neuron sequencing allows systematic assessment of genomic diversity in the human brain.	23101622	hg19_NCBI37	hg19_NCBI37	Table S3	Table S3	none	Evrony, G.D., Cai, X., Lee, E., Hills, L.B., Elhosary, P.C., Lehmann, H.S., Parker, J.J., Atabay, K.D., Gilmore, E.C., et al. (2012) Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell, 151, 483-496.	5
Ewing2010	In this study, the authors used high-throughput sequencing to devise a technique to determine the insertion sites of virtually all members of the human-specific L1 retrotransposon family in any human genome. Using diagnostic nucleotides, they were able to locate the L1Hs copies corresponding specifically to the pre-Ta, Ta-0, and Ta-1 L1Hs subfamilies, with over 90% of sequenced reads corresponding to human-specific elements.	20488934	hg18_NCBI36	hg19_NCBI37	Table S1	.	.	Ewing, A.D. and Kazazian, H.H. (2010) High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res, 20, 1262-1270.	5
Ewing2011	In this study, the authors present evidence for L1 insertions across all studies to date that are not represented in the reference human genome assembly, many of which appear to be specific to populations or groups of populations, particularly Africans. Additionally, a cross-comparison of several studies showed that, on average, 27% of surveyed non-reference insertions are present in only one study, indicating the low frequency of many RIPs.	20980553	hg18_NCBI36	hg19_NCBI37	Table S4	.	.	Ewing, A.D. and Kazazian, H.H. (2011) Whole-genome resequencing allows detection of many rare LINE-1 insertion alleles in humans. Genome Res, 21, 985-990.	8
Helman2014	Whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project	24823667	hg19_NCBI37	hg19_NCBI37	.	.	.	Helman, E., Lawrence, M.S., Stewart, C., Sougnez, C., Getz, G. and Meyerson, M. (2014) Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res, 24, 1053-1063.	6;7
Holmes1994	A new retrotransposable human L1 element from the LRE2 locus on chromosome 1q produces a chimaeric insertion.	7920631	.	hg19_NCBI37	.	.	.	Holmes, S.E., Dombroski, B.A., Krebs, C.M., Boehm, C.D. and Kazazian, H.H. (1994) A new retrotransposable human L1 element from the LRE2 locus on chromosome 1q produces a chimaeric insertion. Nat Genet, 7, 143-148.	11
Iskow2010	In this study, the authors used a new technology for detecting young retrotransposon insertions and demonstrated that such insertions are indeed abundant in human populations. New somatic L1 insertions were found to occur at high frequencies in human lung cancer genomes. Genome-wide analysis suggested that altered DNA methylation may be responsible for the high levels of L1 mobilization observed in these tumors. These data indicated that transposon-mediated mutagenesis is extensive in human genomes and is likely to have a major impact on human biology and diseases.	20603005	hg18_NCBI36	hg19_NCBI37	Tables S1, S2	Tables S1, S2	The original Iskow et al. 2010 publication {Iskow 2010a} provides the DNA sequence downstream of each putative L1 insertion obtained by 454 sequencing (in opposite orientation relative to L1) and the genomic coordinates of the BLAT best hit after mapping them on the hg18 reference genome. To obtain the strand information of these putative insertions, we remapped the published DNA sequences (1389 in total) to hg19 using BWA. In a first step, we used bwa mem (with options -t4 -M). We recovered 980 uniquely mapped positions consistent with the original BLAT analysis; 285 unmapped sequences; and 124 sequences mapping to multiple positions or corresponding to chimeric sequence. The latter were discarded and not included in euL1db. In a second step, unmapped sequences from the first step were mapped again using an algorithm with increased sensitivity for short sequences (bwa aln with options -l12 -o2 / bwa samse with the -n 10 option), allowing us to recover an additional set of 116 uniquely mapping sequences consistent with the original BLAT coordinates. In total, 1095 high-confidence insertions out of 1389 sequences from the 454 Iskow experiments have been included in euL1db. The published table reporting the results of ABI Sanger sequencing experiments includes the coordinates of the flanking region sequenced but not the DNA sequence itself, nor its orientation. We used the middle of this segment as the coordinate for the insertion point and could not deduce strand information.	Iskow, R.C., McCabe, M.T., Mills, R.E., Torene, S., Pittard, W.S., Neuwald, A.F., Van Meir, E.G., Vertino, P.M. and Devine, S.E. (2010) Natural mutagenesis of human genomes by endogenous retrotransposons. Cell, 141, 1253-1261.	4
Kazazian1988	Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man.	2831458	.	hg19_NCBI37	.	.	.	Kazazian, H.H., Wong, C., Youssoufian, H., Scott, A.F., Phillips, D.G. and Antonarakis, S.E. (1988) Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature, 332, 164-166.	11
Kondo-lida1999	Novel mutations and genotype-phenotype relationships in 107 families with Fukuyama-type congenital muscular dystrophy (FCMD).	10545611	.	hg19_NCBI37	.	.	.	Kondo-Iida, E., Kobayashi, K., Watanabe, M., Sasaki, J., Kumagai, T., Koide, H., Saito, K., Osawa, M., Nakamura, Y. and Toda, T. (1999) Novel mutations and genotype-phenotype relationships in 107 families with Fukuyama-type congenital muscular dystrophy (FCMD). Hum Mol Genet, 8, 2303-2309.	11
Lanikova2013	A novel mechanism of β-thalassemia. The insertion of L1 retrotransposable element into β globin IVSII.	23878091	.	hg19_NCBI37	Sup Fig 1	Sup Fig 1	?	Lanikova, L., Kucerova, J., Indrak, K., Divoka, M., Issa, J.P., Papayannopoulou, T., Prchal, J.T. and Divoky, V. (2013) β-Thalassemia due to intronic LINE-1 insertion in the β-globin gene (HBB): molecular mechanisms underlying reduced transcript levels of the β-globin(L1) allele. Hum Mutat, 34, 1361-1365.	11
Lee2012	In this study, single-nucleotide resolution analysis of TE insertions was performed in 43 high-coverage whole-genome sequencing data sets from five cancer types. The authors identified 194 high-confidence somatic TE insertions, as well as thousands of polymorphic TE insertions in matched normal genomes. Somatic insertions were present in epithelial tumors but not in blood or brain cancers. Somatic L1 insertions tend to occur in genes that are commonly mutated in cancer, disrupt the expression of the target genes, and are biased toward regions of cancer-specific DNA hypomethylation, highlighting their potential impact in tumorigenesis.	22745252	hg18_NCBI36	hg19_NCBI37	Tables S2, S6, S8	Tables S3, S4, S7	Filtered for L1HS	Lee, E., Iskow, R., Yang, L., Gokcumen, O., Haseley, P., Luquette, L.J., Lohr, J.G., Harris, C.C., Ding, L., et al. (2012) Landscape of somatic retrotransposition in human cancers. Science, 337, 967-971.	9
Li2001	Frequency of recent retrotransposition events in the human factor IX gene.	11385709	.	hg19_NCBI37	.	.	.	Li, X., Scaringe, W.A., Hill, K.A., Roberts, S., Mengos, A., Careri, D., Pinto, M.T., Kasper, C.K. and Sommer, S.S. (2001) Frequency of recent retrotransposition events in the human factor IX gene. Hum Mutat, 17, 511-519.	11
Martinez-Garay2003	Intronic L1 insertion and F268S, novel mutations in RPS6KA3 (RSK2) causing Coffin-Lowry syndrome.	14986828	.	hg19_NCBI37	.	.	.	Martinez-Garay, I., Ballesta, M.J., Oltra, S., Orellana, C., Palomeque, A., Molto, M.D., Prieto, F. and Martinez, F. (2003) Intronic L1 insertion and F268S, novel mutations in RPS6KA3 (RSK2) causing Coffin-Lowry syndrome. Clin Genet, 64, 491-496.	11
Meischl2000	A new exon created by intronic insertion of a rearranged LINE-1 element as the cause of chronic granulomatous disease.	10980575	.	hg19_NCBI37	.	.	.	Meischl, C., Boer, M., Ahlin, A. and Roos, D. (2000) A new exon created by intronic insertion of a rearranged LINE-1 element as the cause of chronic granulomatous disease. Eur J Hum Genet, 8, 697-703.	11
Miki1992	Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer.	1310068	.	hg19_NCBI37	.	.	.	Miki, Y., Nishisho, I., Horii, A., Miyoshi, Y., Utsunomiya, J., Kinzler, K.W., Vogelstein, B. and Nakamura, Y. (1992) Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res, 52, 643-645.	11
Mine2007	A large genomic deletion in the PDHX gene caused by the retrotranspositional insertion of a full-length LINE-1 element.	17152059	.	hg19_NCBI37	.	.	.	Mine, M., Chen, J.M., Brivet, M., Desguerre, I., Marchant, D., de Lonlay, P., Bernard, A., Ferec, C., Abitbol, M., et al. (2007) A large genomic deletion in the PDHX gene caused by the retrotranspositional insertion of a full-length LINE-1 element. Hum Mutat, 28, 137-142.	11
Morisada2010	Branchio-oto-renal syndrome caused by partial EYA1 deletion due to LINE-1 insertion.	20130917	.	hg19_NCBI37	.	.	.	Morisada, N., Rendtorff, N.D., Nozu, K., Morishita, T., Miyakawa, T., Matsumoto, T., Hisano, S., Iijima, K., Tranebjaerg, L., et al. (2010) Branchio-oto-renal syndrome caused by partial EYA1 deletion due to LINE-1 insertion. Pediatr Nephrol, 25, 1343-1348.	11
Mukherjee2004	Molecular pathology of haemophilia B: identification of five novel mutations including a LINE 1 insertion in Indian patients.	15086324	.	hg19_NCBI37	.	.	.	Mukherjee, S., Mukhopadhyay, A., Banerjee, D., Chandak, G.R. and Ray, K. (2004) Molecular pathology of haemophilia B: identification of five novel mutations including a LINE 1 insertion in Indian patients. Haemophilia, 10, 259-263.	11
Musova2006	A novel insertion of a rearranged L1 element in exon 44 of the dystrophin gene: Further evidence for possible bias in retroposon integration.	16808900	.	hg19_NCBI37	.	.	.	Musova, Z., Hedvicakova, P., Mohrmann, M., Tesarova, M., Krepelova, A., Zeman, J. and Sedlacek, Z. (2006) A novel insertion of a rearranged L1 element in exon 44 of the dystrophin gene: further evidence for possible bias in retroposon integration. Biochem Biophys Res Commun, 347, 145-149.	11
Narita1993	Insertion of a 5' truncated L1 element into the 3' end of exon 44 of the dystrophin gene resulted in skipping of the exon during splicing in a case of Duchenne muscular dystrophy.	8387534	.	hg19_NCBI37	.	.	.	Narita, N., Nishio, H., Kitoh, Y., Ishikawa, Y., Ishikawa, Y., Minami, R., Nakamura, H. and Matsuo, M. (1993) Insertion of a 5' truncated L1 element into the 3' end of exon 44 of the dystrophin gene resulted in skipping of the exon during splicing in a case of Duchenne muscular dystrophy. J Clin Invest, 91, 1862-1867.	11
Samuelov2011	An exceptional mutational event leading to Chanarin-Dorfman syndrome in a large consanguineous family.	21332462	.	hg19_NCBI37	.	.	.	Samuelov, L., Fuchs-Telem, D., Sarig, O. and Sprecher, E. (2011) An exceptional mutational event leading to Chanarin-Dorfman syndrome in a large consanguineous family. Br J Dermatol, 164, 1390-1392.	11
Schwahn1998	Positional cloning of the gene for X-linked retinitis pigmentosa 2.	9697692	.	hg19_NCBI37	.	.	.	Schwahn, U., Lenzner, S., Dong, J., Feil, S., Hinzmann, B., van Duijnhoven, G., Kirschner, R., Hemberger, M., Bergen, A.A., et al. (1998) Positional cloning of the gene for X-linked retinitis pigmentosa 2. Nat Genet, 19, 327-332.	11
Shukla2013	In this study, enhanced retrotransposon capture sequencing (RC-seq) was applied to 19 hepatocellular carcinoma (HCC) genomes and elucidated two archetypal L1-mediated mechanisms enabling tumorigenesis. In the first example, 4/19 (21.1%) donors presented germline retrotransposition events in the tumor suppressor mutated in colorectal cancers (MCC). MCC expression was ablated in each case, enabling oncogenic β-catenin/Wnt signaling. In the second example, suppression of tumorigenicity 18 (ST18) was activated by a tumor-specific L1 insertion. Experimental assays confirmed that the L1 interrupted a negative feedback loop by blocking ST18 repression of its enhancer. ST18 was also frequently amplified in HCC nodules from Mdr2-/- mice, supporting its assignment as a candidate liver oncogene. These proof-of-principle results substantiate L1-mediated retrotransposition as an important etiological factor in HCC.	23540693	hg19_NCBI37	hg19_NCBI37	Table S3	Tables S5, S6	Filtered for L1HS	Shukla, R., Upton, K.R., Munoz-Lopez, M., Gerhardt, D.J., Fisher, M.E., Nguyen, T., Brennan, P.M., Baillie, J.K., Collino, A., et al. (2013) Endogenous retrotransposition activates oncogenic pathways in hepatocellular carcinoma. Cell, 153, 101-111.	1
Solyom2012	The purpose of this study was to ascertain the locations of somatic LINE-1 retrotransposition events in human colon tumor samples by pooling multiple tumor samples from different patients and performing a targeted resequencing assay (Ewing and Kazazian, Genome Research 2010) to sequence the 3 prime flanking regions of all insertions in the pooled sample. The result was compared with the result of applying the same method to pooled normal samples from the same patients as were used in the pooled tumor sample, selecting sites that show an insertion in the tumor but not in the normal tissue and that do not correspond to any known non-reference LINE-1 insertion allele. The selected sites were then validated by site-specific PCR and capillary sequencing to confirm that they represent LINE-1 insertions and to obtain breakpoint sequences.	22968929	hg19_NCBI37	hg19_NCBI37	Tables a, S1A, S1B, S2	Table S3	The Solyom et al. article {Solyom 2012} reports insertions found in multiple colon cancer samples or their matched normal tissues. This was achieved by a combination of L1-seq and RC-seq approaches. For the L1-seq approach (their Sup. Tab. 1A and 1B), we kept as germline insertions those found in both normal and tumor tissues and, as somatic insertions, those found only in one tissue and validated by PCR and/or Sanger sequencing (their Sup. Tab. 3). For the RC-seq data (their Sup. Table 2), we only kept those tagged as "high confidence" and we further added/removed insertions validated/invalidated by PCR and/or Sanger sequencing data (their Sup. Tab. 3). Here as well, only L1HS elements were included.	Solyom, S., Ewing, A.D., Rahrmann, E.P., Doucet, T., Nelson, H.H., Burns, M.B., Harris, R.S., Sigmon, D.F., Casella, A., et al. (2012) Extensive somatic L1 retrotransposition in colorectal tumors. Genome Res, 22, 2328-2338.	5;1
Stewart2011	In this study, two whole-genome datasets produced by the 1000GP were analyzed: the low coverage pilot dataset, consisting of 179 individuals sequenced to 1-3X coverage, and the trio pilot dataset, consisting of two family trios sequenced to high (15-40X) coverage. These datasets included samples from three continental population groups: 60 samples of European origin (CEU), 59 African (YRI), and 60 Asian samples from Japan and China (CHBJPT). The two pilot datasets were produced and analyzed for complementary purposes. The trio dataset was used for assessing detection methods in high coverage samples and for the purpose of finding candidate de novo insertions in the trio children. The high coverage dataset was used to assess population properties of MEI. Both datasets contributed to the overall catalog of events.	21876680	hg18_NCBI36	hg19_NCBI37	Table S1	.	Filtered for L1HS	Stewart, C., Kural, D., Stromberg, M.P., Walker, J.A., Konkel, M.K., Stutz, A.M., Urban, A.E., Grubert, F., Lam, H.Y., et al. (2011) A comprehensive map of mobile element insertion polymorphisms in humans. PLoS Genet, 7, e1002236.	12
Van_den_Hurk2003	Novel types of mutation in the choroideremia (CHM) gene: a full-length L1 insertion and an intronic mutation activating a cryptic exon.	12827496	.	hg19_NCBI37	.	.	.	van den Hurk, J.A., van de Pol, D.J., Wissinger, B., van Driel, M.A., Hoefsloot, L.H., de Wijs, I.J., van den Born, L.I., Heckenlively, J.R., Brunner, H.G., et al. (2003) Novel types of mutation in the choroideremia (CHM) gene: a full-length L1 insertion and an intronic mutation activating a cryptic exon. Hum Genet, 113, 268-275.	11
Wimmer2011	The NF1 gene contains hotspots for L1 endonuclease-dependent de novo insertion.	22125493	.	hg19_NCBI37	.	.	.	Wimmer, K., Callens, T., Wernstedt, A. and Messiaen, L. (2011) The NF1 gene contains hotspots for L1 endonuclease-dependent de novo insertion. PLoS Genet, 7, e1002371.	11
Yoshida1998	Insertional mutation by transposable element, L1, in the DMD gene results in X-linked dilated cardiomyopathy.	9618170	.	hg19_NCBI37	.	.	.	Yoshida, K., Nakamura, A., Yazaki, M., Ikeda, S. and Takeda, S. (1998) Insertional mutation by transposable element, L1, in the DMD gene results in X-linked dilated cardiomyopathy. Hum Mol Genet, 7, 1129-1132.	11
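
The study table above can be loaded programmatically. The following is a minimal sketch, not part of euL1db itself. It assumes the table is tab-separated with one study per line, that the column names are given on the `#Study_id` line, that `.` marks an empty field, and that `;` separates multiple method identifiers in the `Methods` column. The function name `parse_study_table` is hypothetical.

```python
def parse_study_table(text):
    """Parse a euL1db-style study table into a list of dicts.

    Assumptions (not confirmed by the data file itself):
    - fields are tab-separated, one study per line;
    - the column header is the '#'-prefixed line that contains tabs;
    - '.' marks an empty field;
    - ';' separates multiple identifiers in the 'Methods' column.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Banner comments contain no tabs; the column header does.
            if "\t" in line:
                header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no column header found before first data row")
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(
                f"expected {len(header)} fields, got {len(fields)}: {fields[0]!r}"
            )
        # Map '.' to None; other placeholders ('none', '?') are kept verbatim.
        row = {k: (None if v == "." else v) for k, v in zip(header, fields)}
        methods = row.get("Methods")
        row["Methods"] = methods.split(";") if methods else []
        rows.append(row)
    return rows
```

Called on the full file contents, this returns one dictionary per study keyed by the column names, so that a lookup such as `row["Mapped_assembly"]` or a filter on `row["Methods"]` does not depend on column order.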