Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking

Abstract

Although lung cancer risk among smokers is dependent on smoking dose, it remains unknown if this increased risk reflects an increased rate of somatic mutation accumulation in normal lung cells. Here, we applied single-cell whole-genome sequencing of proximal bronchial basal cells from 33 participants aged between 11 and 86 years with smoking histories varying from never-smoking to 116 pack-years. We found an increase in the frequency of single-nucleotide variants and small insertions and deletions with chronological age in never-smokers, with mutation frequencies significantly elevated among smokers. When plotted against smoking pack-years, mutations followed the linear increase in cancer risk until about 23 pack-years, after which no further increase in mutation frequency was observed, pointing toward individual selection for mutation avoidance. Known lung cancer-defined mutation signatures tracked with both age and smoking. No significant enrichment for somatic mutations in lung cancer driver genes was observed.

Fig. 1: Mutation accumulation in PBBCs with age in never-smokers.
Fig. 2: Mutation accumulation in PBBCs of smokers.
Fig. 3: Cancer driver mutations in normal PBBC nuclei.
Fig. 4: Mutational signatures and smoking.

Data availability

WGS data are available at dbGaP (accession number: phs002758.v1.p1) and can be accessed at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002758.v1.p1. Somatic mutation calls, including single-base substitutions and indels from all 134 samples, have been deposited in SomaMutDB at http://vijglab.einsteinmed.org/static/vcf/lung_Huang.et.al.Naturegenetics.tar.gz.

Code availability

Sequencing reads were filtered to remove adapter sequences and low-quality reads with Trim Galore (version 0.4.1) and mapped to the human reference genome (GRCh37, including decoy contigs) using BWA (mem; version 0.7.10), with PCR duplicates removed by Samtools (version 0.1.19). Realignment of reads and recalibration of base quality scores were performed with GATK (version 3.5.0). Somatic mutations were called using SCcaller (version 1.2; https://github.com/biosinodx/SCcaller). MASS (version 7.3-53) and lme4 (version 1.1-26) were employed for statistical analysis in R (version 4.0.3, GUI 1.73, Catalina build 7892). Custom code for statistical analysis and permutation analysis is available through GitHub (https://github.com/Zhenqiu85/Lung_Smoke_analysis).
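Because the statement above pins each tool to a specific version, a reproduction attempt may want to check its own environment against that list. The following is a minimal sketch of such a check, not part of the authors' code: the `PIPELINE` table copies the versions stated above, while the function name `version_mismatches` and the user-supplied `installed` mapping are hypothetical, and how versions are queried from each tool is deliberately left out.

```python
# Tool versions copied from the Code availability statement above.
PIPELINE = [
    ("adapter and low-quality read filtering", "Trim Galore", "0.4.1"),
    ("mapping to GRCh37 with decoy contigs", "BWA", "0.7.10"),
    ("PCR duplicate removal", "Samtools", "0.1.19"),
    ("realignment and base quality recalibration", "GATK", "3.5.0"),
    ("somatic mutation calling", "SCcaller", "1.2"),
]


def version_mismatches(installed):
    """Compare a {tool: version} mapping against the stated versions.

    `installed` is a hypothetical, user-supplied dictionary.
    Returns {tool: (expected, found)}; found is None when the tool is missing.
    """
    mismatches = {}
    for _step, tool, expected in PIPELINE:
        found = installed.get(tool)
        if found != expected:
            mismatches[tool] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Illustrative environment only; these values are made up for the example.
    example = {"Trim Galore": "0.4.1", "BWA": "0.7.10",
               "Samtools": "1.9", "GATK": "3.5.0"}
    for tool, (expected, found) in version_mismatches(example).items():
        print(f"{tool}: expected {expected}, found {found}")
```

An empty result means every listed tool matches the stated version; it says nothing about tool options or reference files, which the statement does not specify.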

Acknowledgements

This study was supported by National Institutes of Health grants U01 ES029519-01 (J.V. and S.D.S.), U01HL145560 (S.D.S. and J.V.), AG017242 (J.V.), AG056278 (J.V.), AG047200 (J.V. and V.G.) and the Glenn Foundation for Medical Research. We thank A. Desai and D. Patel (Pulmonary Medicine) for bronchoscopy sample procurement, S. Khader for cytopathology and X. Hao for assisting with data analysis.

Author information

Contributions

J.V., A.Y.M. and S.D.S. conceived this study and designed the experiments. S.D.S., M.S., T.S., Y.P., C.S. and A.S. provided clinical, procedural and specimen-specific study expertise and logistics. Z.H. performed the experiments. Z.H., J.V., A.Y.M., S.S. and K.Y. analyzed the data. Z.H. and J.V. wrote the manuscript.

Corresponding authors

Correspondence to Zhenqiu Huang, Simon D. Spivack or Jan Vijg.

Ethics declarations

Competing interests

A.Y.M., X.D. and J.V. are cofounders of SingulOmics. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Peter Campbell, Benjamin Izar, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Mutation frequency and correction deviation error.

SNV frequency of never-smokers versus age. Each data point indicates the mutation frequency per nucleus from each individual, with color intensity indicating relative standard error value (see Methods). The four cells of highest mutational burden were plotted separately with each data point representing median value with standard deviation errors.

Extended Data Fig. 2 Distribution of shared mutations in subject 1320.

a, Stacked bar plot showing the proportional contribution of shared SNVs between all sequenced 3-8 nuclei per subject. b, Upset plot showing the distribution of shared SNVs in six nuclei from subject 1320 (lower part). The bar chart (upper part) represents the number of SNVs shared by each nucleus combination.

Extended Data Fig. 3 An a priori semi-parametric B-spline model to test the non-linearity between mutation frequency and smoking pack-years.

Each data point indicates the SNV frequency of the nuclei of an individual. The spline fit, evaluated at the average age and the average of the random effects, is shown with its 95% confidence interval by the gray line; the piecewise linear model fit is shown by the blue line. The P value for the spline model is 0.0043 compared with the linear model and 0.0034 compared with the null model (see Methods).
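To make the dose axis and the plateau concrete, here is a minimal sketch under stated assumptions. `pack_years` uses the standard definition (packs of 20 cigarettes smoked per day multiplied by years smoked). `plateau_linear` is one simple reading of a piecewise linear fit, rising linearly up to a breakpoint and flat afterwards; the default breakpoint of 23 pack-years comes from the abstract, while the function names and the `baseline` and `slope` values in the usage lines are hypothetical and are not estimates from the paper. The exact form of the authors' piecewise model is not given on this page.

```python
def pack_years(cigarettes_per_day, years_smoked, cigarettes_per_pack=20):
    """Cumulative smoking dose: packs smoked per day times years smoked."""
    return (cigarettes_per_day / cigarettes_per_pack) * years_smoked


def plateau_linear(dose, baseline, slope, breakpoint=23.0):
    """Linear increase with dose up to `breakpoint`, constant beyond it.

    `baseline` is the predicted value at zero dose and `slope` the increase
    per pack-year below the breakpoint (both hypothetical inputs here).
    """
    return baseline + slope * min(dose, breakpoint)


if __name__ == "__main__":
    # One pack a day for 23 years reaches the breakpoint quoted in the abstract.
    print(pack_years(20, 23))
    # Made-up coefficients, for illustration only.
    for dose in (0, 10, 23, 116):
        print(dose, plateau_linear(dose, baseline=1000.0, slope=50.0))
```

With these made-up coefficients, the predictions at 23 and 116 pack-years are identical, which is the behavior a plateau model encodes.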

Extended Data Fig. 4 INDEL frequency and smoking dose.

a, INDEL frequency versus smoking pack-years across all individuals (n = 33). Each dot indicates the median value, with the minimal and maximal range, of the INDEL frequency of an individual. b, INDEL frequency of different groups of individuals according to smoking pack-years, with boxes indicating the median and interquartile range of the never (n = 14), light (n = 6), moderate (n = 6) and heavy (n = 7) smoking groups, respectively.

Extended Data Fig. 5 Effects of smoking cessation on mutation frequency.

Median SNV and INDEL frequencies among former smokers (n = 7) and current smokers (n = 12). a, Each data point indicates the median value, with the minimal and maximal range, of the SNV frequency of 3-8 nuclei per subject. b, Each data point indicates the median value, with the minimal and maximal range, of the INDEL frequency of 3-8 nuclei per subject. P values were obtained by likelihood ratio tests using a negative binomial mixed-effects model.
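A likelihood ratio test compares two nested models through their maximized log-likelihoods. The sketch below shows only that final step, assuming the log-likelihoods have already been obtained from fitted models and that the usual asymptotic chi-square reference distribution applies. It is not the authors' code (their analysis used R), and the names `chi2_sf` and `lrt_p_value` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    df = int(df)
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) is the df = 1 case; add one term per extra 2 df.
    total = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def lrt_p_value(loglik_reduced, loglik_full, df_diff):
    """P value for a reduced model nested in a full model.

    `df_diff` is the number of extra parameters in the full model.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf(max(statistic, 0.0), df_diff)


if __name__ == "__main__":
    # Made-up log-likelihoods, for illustration only.
    print(lrt_p_value(loglik_reduced=-102.0, loglik_full=-100.0, df_diff=1))
```

The closed forms keep the sketch dependency-free; in practice a statistics library's chi-square survival function would serve the same purpose.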

Extended Data Fig. 6 SNV frequency in the lung functional genome using scRNA-seq human lung data instead of GTEx.

Each data point represents the number of mutations per nucleus in the functional genome (x axis) and the whole genome (y axis) for all subjects, colored by smoking status.

Extended Data Fig. 7 Cancer driver mutations.

a, Distribution of driver gene mutations in single nuclei of subjects, with number of mutations and smoking status indicated by colors. b, Total number of single nuclei with unique mutations found in pan-cancer driver genes and number of unique mutations in pan-cancer driver genes across the sample set (n = 134), 22 of 85 driver genes shown (Supplementary Table 5).

Extended Data Fig. 8 Mutational signatures and smoking.

a, Mutation spectra of four novel signatures identified among never-smokers and smokers. The six substitution types are shown across the top. Within each substitution type, the trinucleotide context is shown as four sets of four bars, grouped by whether an A, C, G or T, respectively, is 5′ or 3′ to the mutated base. b-f, Absolute number of major signatures discovered from never-smokers (n = 14) and smokers (n = 19). Each dot indicates the median SNV frequency of each individual. Boxes indicate median values and interquartile ranges among each group. The quoted P values were obtained by likelihood ratio tests using linear mixed-effect models. g, Relative contribution of APOBEC signatures versus SNV frequency of nuclei of never-smokers. Each data point represents a nucleus.
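The layout described in panel a (six substitution types, each with four sets of four bars) corresponds to 6 × 4 × 4 = 96 trinucleotide channels. The sketch below enumerates them, assuming the widely used convention of reporting substitutions from the pyrimidine base (C>A, C>G, C>T, T>A, T>C, T>G); the caption does not name the six types, and the function name `trinucleotide_channels` and the label format are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# Assumed convention: substitutions reported from the pyrimidine reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_channels():
    """List the 96 channels as 5'-base[substitution]3'-base labels.

    Within each substitution type the 5' base varies slowest and the 3' base
    fastest, giving four sets of four contexts.
    """
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


if __name__ == "__main__":
    channels = trinucleotide_channels()
    print(len(channels), channels[:4])
```

Counting the mutations of each nucleus into these channels yields the spectrum vector on which signature extraction operates.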

Extended Data Fig. 9 The INDEL mutation signature analysis.

a, Mutation spectra of INDELs in single nuclei from never-smokers (n = 14) and smokers (n = 19). The contributions of different types of INDELs are shown, grouped by whether variants are deletions or insertions; the size of the event; whether they occur at repeat units; and the sequence content of the INDEL. b, Stacked bar plot showing the proportional contribution of mutational signatures to INDELs across all nuclei (n = 134) measured from never-smokers and smokers; four signatures (N1, ID1, ID3, ID4) were extracted by HDP.

Extended Data Fig. 10 Germline genetic variants associated with solid cancers.

A heatmap showing six germline variants associated with solid cancers, with one subject per column and presence or absence indicated by color. Variant IDs at the left of each row of the heatmap represent six different solid cancer-associated single-nucleotide polymorphisms found through ClinVar (Supplementary Table 7).

Supplementary information

Supplementary Information

Supplementary Note, Figures 1–4 and Tables 1–7.

Reporting Summary

Peer Review File

Supplementary Tables 1–7

Supplementary Tables 1–7, with a title included in each tab.

About this article

Cite this article

Huang, Z., Sun, S., Lee, M. et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nat Genet 54, 492–498 (2022). https://doi.org/10.1038/s41588-022-01035-w

  • DOI: https://doi.org/10.1038/s41588-022-01035-w
