Reference grade characterization of polymorphisms in full-length HLA Class I and II genes with short-read sequencing on the ION PGM system and long-reads generated by single molecule, real-time sequencing on the PacBio platform

Shingo Suzuki, Swati Ranade, Ken Osaki, Sayaka Ito, Atsuko Shigenari, Yuko Ohnuki, Akira Oka, Anri Masuya, John Harting, Primo Baybayan, Miwako Kitazume, Junichi Sunaga, Satoko Morishima, Yasuo Morishima, Hidetoshi Inoko, Jerzy K. Kulski, Takashi Shiina

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short-read data generated by the Ion Torrent technology with long, amplicon-spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference-grade, high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA locus-specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of these, 137 were novel alleles: 101 with SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution, including allelic variations assigned up to the field-4 level, for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.
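
The "field-4 level" mentioned above refers to the colon-separated fields of a standard HLA allele name: field 1 is the allele group, field 2 the specific protein, field 3 synonymous changes in the coding region, and field 4 differences in non-coding regions. The following is a minimal sketch of how such names could be split to check their resolution. The function name `parse_hla_allele`, the `HlaAllele` type, and the example allele names are illustrative assumptions and are not taken from the article or from any software it used.

```python
import re
from typing import NamedTuple, Optional, Tuple


class HlaAllele(NamedTuple):
    """Hypothetical container for a parsed HLA allele name."""
    gene: str                 # e.g. "DRB1"
    fields: Tuple[str, ...]   # one to four colon-separated fields
    suffix: Optional[str]     # optional expression suffix such as "N"

    @property
    def resolution(self) -> int:
        # Number of fields present; 4 corresponds to the field-4 level.
        return len(self.fields)


# Optional "HLA-" prefix, gene name, "*", 1-4 numeric fields, optional suffix.
_ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){0,3})([NLSCAQ]?)$")


def parse_hla_allele(name: str) -> HlaAllele:
    """Split an allele name such as 'HLA-DRB1*04:05:01:01' into its parts."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognizable HLA allele name: {name!r}")
    gene, digits, suffix = match.groups()
    return HlaAllele(gene, tuple(digits.split(":")), suffix or None)


if __name__ == "__main__":
    # Illustrative names only; they are not results reported in the article.
    for text in ("HLA-DRB1*04:05:01:01", "A*24:02"):
        allele = parse_hla_allele(text)
        print(allele.gene, allele.fields, allele.resolution)
```

An allele typed only from exon data would typically stop at field 2 or 3, whereas full-length genomic sequences of the kind described here allow assignment of all four fields.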

Original language: English
Article number: 2294
Number of pages: 15
Journal: Frontiers in Immunology
Volume: 9
Issue number: OCT
DOI: 10.3389/fimmu.2018.02294
Publication status: Published - 4 Oct 2018

Fingerprint

MHC Class I Genes
MHC Class II Genes
Alleles
Gene Frequency
HLA-DRB1 Chains
Precision Medicine
Genes
Technology
Transplantation
Ions
Population
Libraries
Exons
Rheumatoid Arthritis
Healthy Volunteers
Software
Databases
Polymerase Chain Reaction
DNA

Cite this

Suzuki, Shingo ; Ranade, Swati ; Osaki, Ken ; Ito, Sayaka ; Shigenari, Atsuko ; Ohnuki, Yuko ; Oka, Akira ; Masuya, Anri ; Harting, John ; Baybayan, Primo ; Kitazume, Miwako ; Sunaga, Junichi ; Morishima, Satoko ; Morishima, Yasuo ; Inoko, Hidetoshi ; Kulski, Jerzy K. ; Shiina, Takashi. / Reference grade characterization of polymorphisms in full-length HLA Class I and II genes with short-read sequencing on the ION PGM system and long-reads generated by single molecule, real-time sequencing on the PacBio platform. In: Frontiers in Immunology. 2018 ; Vol. 9, No. OCT.
@article{8903b11a1e974010a3e0eaee953aaeb1,
title = "Reference grade characterization of polymorphisms in full-length HLA Class I and II genes with short-read sequencing on the ION PGM system and long-reads generated by single molecule, real-time sequencing on the PacBio platform",
keywords = "Genotyping, HLA, Human leukocyte antigen, Ion PGM, Next-generation sequencing, NGS, PacBio RS II, SMRT sequencing",
author = "Shingo Suzuki and Swati Ranade and Ken Osaki and Sayaka Ito and Atsuko Shigenari and Yuko Ohnuki and Akira Oka and Anri Masuya and John Harting and Primo Baybayan and Miwako Kitazume and Junichi Sunaga and Satoko Morishima and Yasuo Morishima and Hidetoshi Inoko and Kulski, {Jerzy K.} and Takashi Shiina",
year = "2018",
month = "10",
day = "4",
doi = "10.3389/fimmu.2018.02294",
language = "English",
volume = "9",
journal = "Frontiers in Immunology",
issn = "1664-3224",
publisher = "Frontiers Media SA",
number = "OCT",

}

Suzuki, S, Ranade, S, Osaki, K, Ito, S, Shigenari, A, Ohnuki, Y, Oka, A, Masuya, A, Harting, J, Baybayan, P, Kitazume, M, Sunaga, J, Morishima, S, Morishima, Y, Inoko, H, Kulski, JK & Shiina, T 2018, 'Reference grade characterization of polymorphisms in full-length HLA Class I and II genes with short-read sequencing on the ION PGM system and long-reads generated by single molecule, real-time sequencing on the PacBio platform', Frontiers in Immunology, vol. 9, no. OCT, 2294. https://doi.org/10.3389/fimmu.2018.02294

TY - JOUR
T1 - Reference grade characterization of polymorphisms in full-length HLA Class I and II genes with short-read sequencing on the ION PGM system and long-reads generated by single molecule, real-time sequencing on the PacBio platform
AU - Suzuki, Shingo
AU - Ranade, Swati
AU - Osaki, Ken
AU - Ito, Sayaka
AU - Shigenari, Atsuko
AU - Ohnuki, Yuko
AU - Oka, Akira
AU - Masuya, Anri
AU - Harting, John
AU - Baybayan, Primo
AU - Kitazume, Miwako
AU - Sunaga, Junichi
AU - Morishima, Satoko
AU - Morishima, Yasuo
AU - Inoko, Hidetoshi
AU - Kulski, Jerzy K.
AU - Shiina, Takashi
PY - 2018/10/4
Y1 - 2018/10/4
KW - Genotyping
KW - HLA
KW - Human leukocyte antigen
KW - Ion PGM
KW - Next-generation sequencing
KW - NGS
KW - PacBio RS II
KW - SMRT sequencing
UR - http://www.scopus.com/inward/record.url?scp=85055073678&partnerID=8YFLogxK
U2 - 10.3389/fimmu.2018.02294
DO - 10.3389/fimmu.2018.02294
M3 - Article
VL - 9
JO - Frontiers in Immunology
JF - Frontiers in Immunology
SN - 1664-3224
IS - OCT
M1 - 2294
ER -