
nf-core/taxprofiler: highly parallelised and flexible pipeline for metagenomic taxonomic classification and profiling


Abstract

Metagenomic classification tackles the problem of characterising the taxonomic source of all DNA sequencing reads in a sample. A common approach to address the differences and biases between the many different taxonomic classification tools is to run metagenomic data through multiple classification tools and databases. This, however, is a very time-consuming task when performed manually, particularly when combined with the appropriate preprocessing of sequencing reads before classification. Here we present nf-core/taxprofiler, a highly parallelised read-processing and taxonomic classification pipeline. It is designed for the automated and simultaneous classification and/or profiling of both short- and long-read metagenomic sequencing libraries against 11 taxonomic classifiers and profilers, as well as databases, within a single pipeline run. Implemented in Nextflow and as part of the nf-core initiative, the pipeline benefits from high levels of scalability and portability, accommodating projects from small to extremely large on a wide range of computing infrastructure. It has been developed following best-practice software development practices and with community support to ensure the longevity and adaptability of the pipeline, helping to keep it up to date with the field of metagenomics.
   1
    2
  3
        4
       5
     
6
        7
      8
        9
  10
        11
      12
         13
        14
      15
        16
          17
     18
        19
      20
21
        22
          23
     24
         25
      26
1 Abstract27
        28
            29
          30
          31
        32
          33
34
        35
bioRxiv preprint doi: https://doi.org/10.1101/2023.10.20.563221; this version posted October 23, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
           36
          37
            38
             39
          40
            41
        42
               43
     44
2 Introduction
         46
           47
         48
                49
           50
           51
        52
            53
             54
            55
      56
          57
              58
           59
          60
          61
              62
             63
             64
             65
            66
             67
           68
          69
70
               71
          72
              73
             74
           75
             76
             77
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
             78
           79
           80
 81
             82
                 83
              84
        85
                86
87
          88
              89
          90
               91
          92
           93
           94
            95
           96
             97
         98
   99
            100
           101
            102
           103
           104
           105
          106
          107
           108
            109
          110
          111
112
             113
           114
         115
        116
         117
          118
            119
                 120
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
            121
          122
          123
               124
            125
      126
          127
             128
            129
          130
           131
            132
       133
        134
3 Description
         136
         137
            138
 139
         140
             141
            142
         143
         144
        145
        146
         147
         148
         149
           150
            151
         152
             153
        154
             155
             156
              157
          158
            159
            160
           161
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
              162
             163
        164
           165
            166
      167
         168
            169
            170
           171
           172
     173
        
               
          
           
        
            
           

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
          
           
           
             
            
     
       
    
     
    
    
    
    
     
     
    
    
    
         174
           175
          176
         177
        
  178
               179
 180
4 Discussion
          182
        183
         184
          185
           186
         187
          188
          189
         190
           191
             192
             193
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
           
           
        
   


 

 


 






























  


 













    








 



    


    
 

 





    
 

    
 

    
 

    



    
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint


 

 


 


    
  





    
 


    
 


    
 
 


    



    
 

    
 

    


    



    


    
      
 




 

    


    
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint


 

 





    


    
     
          194
        195
         196
          197
          198
           199
          200
      201
          202
         203
           204
            205
          206
            207
           208
         209
         210
           211
          212
          213
 214
         215
             216
            217
         218
           219
             220
           221
           222
           223
       224
            225
           226
           227
.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
           228
          229
             230
          231
             232
           233
           234
         235
           236
            237
           238
        239
          240
         241
            242
           243
        244
          245
         246
             247
        248
       249
5 Conclusion
         251
           252
             253
            254
            255
           256
          257
            258
         259
         260
261
6 Code Availability
        263
           264
265
             266

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
   267
7 Acknowledgments
          269
            270
             271
           272
8 Funding
          274
          275
         276
             277
         278
        279
           280
         281
282
9 Conict of Interest Statement283
            284
             285
            286
    287

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
10 Supplementary Information
10.1 Implementation
10.1.1 Input and Execution
             291
            292
          293
           294
          295
           296
        297
      298
            299
             300
            301
   302
Listing 1
$ nextflow run nf-core/taxprofiler \
    -r 1.1.0 \
    -profile singularity,<institute> \
    --input <samplesheet.csv> \
    --databases <database.csv> \
    --perform_shortread_qc \
    --shortread_qc_minlength 20 \
    --preprocessing_qc_tool falco \
    --run_host_removal --hostremoval_reference 'host_genome.fasta' \
    --run_kraken2 --kraken2_save_reads \
    --run_metaphlan \
    --run_krona \
    --run_profile_standardisation
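
The command in Listing 1 takes two comma-separated sheets, `<samplesheet.csv>` and `<database.csv>`. As a minimal sketch of what these might look like, the snippet below writes one of each and checks that every row has as many fields as its header. The column names are an assumption based on the pipeline's usage documentation for version 1.1.0 and should be verified against the release in use; all sample names, run accessions, and paths are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample sheet: one row per sequencing run.
# Column layout is ASSUMED from the v1.1.0 usage docs; paths are hypothetical.
cat > samplesheet.csv <<'EOF'
sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
sample1,run1,ILLUMINA,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,
sample2,run1,OXFORD_NANOPORE,/path/to/sample2.fastq.gz,,
EOF

# Database sheet: one row per tool/database combination.
# Column layout is ASSUMED from the v1.1.0 usage docs; paths are hypothetical.
cat > database.csv <<'EOF'
tool,db_name,db_params,db_path
kraken2,db1,,/path/to/kraken2_db
metaphlan,db1,,/path/to/metaphlan_db
EOF

# Sanity check: each row must have the same number of fields as the header.
for sheet in samplesheet.csv database.csv; do
  awk -F',' 'NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad }' "$sheet" \
    && echo "$sheet: OK"
done
```

The resulting files would be passed to the pipeline as `--input samplesheet.csv --databases database.csv`. The field-count check is only a convenience for catching missing commas in hand-written sheets and does not replace the pipeline's own input validation.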
             303
             304
          305
            306
        307
          308
             309
            310

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
            
        
             
            
           
          
                


.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
     
10.1.2 Preprocessing
          326
          327
        328
           329
            330
           331
         332
              333
            334
        335
            336
           337
           338
            339
               340
         341
          342
         343
          344
           345
           346
         347
              

  
  
   

 

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
        348
          349
          350
             351
            352
          353
          354
       355
          356
           357
          358
           359
             360
           361
            362
            363
              364
              365
           366
      367
        368
             369
             370
           371
              372
               373
   374
10.1.3 Proling375
          376
          377
           378
          379
       380
                381
     
--run_<tool>
   382
            383
           384
    385
           386
          387
           388
            389

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
           390
             391
        392
             393
            394
          395
           396
          397
         398
          399
         400
            401
          402
           403
             404
           405
          406
           407
          408
           409
         410
        411
         412
            413
       414
10.1.4 Post-proling415
            416
             417
           418
            419
      420
           421
            422
            423
       424
           425
 426
           427
             428
           429
            430
    431

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
10.1.5 Output
          433
         434
             435
      436
            437
               438
         439
           440
    441
            442
               443
          444
            445
10.2 Comparison with other solutions
         447
metagenomic          448
            449
           450
             451
           452
         453
454
           455
             456
           457
           458
             459
          460
            461
           462
           463
           464
        465
           466
         467
             468
          469
          470
            471
        472

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
            473
           474
           475
           476
          477
            478
 479
           480
           481
                482
              483
        484
             485
             486
           487
         488
            489
           490
          491
        492
       493
            494
             495
            496
            497
            498
 499
           500
         501
             502
          503
             504
         505
         506
           507
            508
           509
            510
          511
             512
        
    


.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
            513
             514
          515
            516
          517
           518
         519
         520
           521
              522
           523
           524
          525
            526
         527
             528
              529
            530
            531
         532
         533
              534
            535
          536
           537
         538
            539
             540
           541
          542
        543
          544
          545
           546
         547
    548
           549
        550
              551
         552
              553
          554
              555
             556
             557

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
     558
          559
      560
       de novo561
         562
   de novo       563
            564
     de novo     565
        566
       567

.CC-BY 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted October 23, 2023. ; https://doi.org/10.1101/2023.10.20.563221doi: bioRxiv preprint
References568
          569
  570
           571
      Journal of572
Open Source Soware   573
         574
          575
         576
    Molecular Ecology Resources    577
578
       579
         580
       581
     Nature Biotechnology  582
583
          584
     Bioinformatics (Oxford, England)  585
 586
          587
    588
589
        590
      Genome Biology  591
 592
            593
        Briengs in594
Bioinformatics    595
          596
         597
 Genome Research    598
599
         600
      Nature Methods   601
 602
    603
          604
           605
       606
607
             608
   Bioinformatics    609
610
         Nature Reviews.611
Genetics    612

.CC-BY 4.0 International licenseavailable under a
... The effectiveness of the workflow was validated in four different cases, including the detection of potentially life-threatening, systemic pathogens (case 1), atypical pathogens causing meningoencephalitis (case 2), and potential co-infections and genotyping of viral RNA (case 3 and case 4). Our approach differs from previous studies [39, 41, 65-67] by providing a comprehensive solution that can be adapted to different sequencing read types, as evidenced by its application to real-world clinical metagenomics data. In three of the four cases, we successfully detected the presence of DNA/RNA pathogens using the combination of the proposed approaches. ...
Article
Over the past decade, there have been many improvements in the field of metagenomics, including sequencing technologies, advances in bioinformatics and the development of reference databases, but a one-size-fits-all sequencing and bioinformatics pipeline does not yet seem achievable. In this study, we address the bioinformatics part of the analysis by combining three methods into a three-step workflow that increases the sensitivity and specificity of clinical metagenomics and improves pathogen detection. The individual tools are combined into MetaAll, a user-friendly workflow suitable for analysing short paired-end (PE) and long reads from metagenomics datasets. To demonstrate the applicability of the developed workflow, four complicated clinical cases with different disease presentations and multiple samples collected from different biological sites as well as the CAMI Clinical pathogen detection challenge dataset were used. MetaAll was able to identify putative pathogens in all but one case. In this case, however, traditional microbiological diagnostics were also unsuccessful. In addition, co-infection with Haemophilus influenzae and Human rhinovirus C54 was detected in case 1 and co-infection with SARS-CoV-2 and Influenza A virus (FluA) subtype H3N2 was detected in case 3. In case 2, in which conventional diagnostics could not find a pathogen, mNGS pointed to Klebsiella pneumoniae as the suspected pathogen. Finally, this study demonstrated the importance of combining read classification, contig validation and targeted reference mapping for more reliable detection of infectious agents in clinical metagenome samples.
Article
Previously, Klebsiella pneumoniae was found to occur more frequently in healthy turkey flocks than in healthy broiler flocks in Norway. This study aimed to investigate whether this higher occurrence could be attributed to a greater abundance of K. pneumoniae in turkey flocks. We compared culturing, qPCR, and shotgun metagenomic sequencing for the detection and quantification of K. pneumoniae. Using qPCR, we found that 20.7% of broiler flock samples and 63.9% of turkey flock samples were positive for K. pneumoniae. Culturing revealed a significantly higher abundance of K. pneumoniae in turkey flocks compared to broiler flocks. However, metagenomic analysis showed no difference in the relative abundance of Klebsiella spp. between broiler and turkey flocks, and no correlation between the results of culturing and metagenomic quantification. Interestingly, the differential abundance of K. quasipneumoniae was significantly different between the two hosts. Our results indicate that Klebsiella spp. are present in both turkey and broiler flocks at relatively low levels but with a higher abundance in turkey flocks. Our findings also suggest that shotgun metagenomic studies targeting low‐abundance taxa such as Klebsiella have poor sensitivity when comparing groups, indicating that reliance on results from metagenomic analysis without experimental validation should be done with caution.
Article
Background: Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for a variety of metagenomics analyses, including taxonomic classification and profiling. The development of long-read specific tools for taxonomic classification is accelerating, yet there is a lack of information regarding their relative performance. Here, we perform a critical benchmarking study using 11 methods, including five methods designed specifically for long reads. We applied these tools to several mock community datasets generated using Pacific Biosciences (PacBio) HiFi or Oxford Nanopore Technology sequencing, and evaluated their performance based on read utilization, detection metrics, and relative abundance estimates. Results: Our results show that long-read classifiers generally performed best. Several short-read classification and profiling methods produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, two long-read methods (BugSeq, MEGAN-LR & DIAMOND) and one generalized method (sourmash) displayed high precision and recall without any filtering required. Furthermore, in the PacBio HiFi datasets these methods detected all species down to the 0.1% abundance level with high precision. Some long-read methods, such as MetaMaps and MMseqs2, required moderate filtering to reduce false positives to resemble the precision and recall of the top-performing methods. We found read quality affected performance for methods relying on protein prediction or exact k-mer matching, and these methods performed better with PacBio HiFi datasets. We also found that long-read datasets with a large proportion of shorter reads (
Article
Background: Taxonomic profiling is a fundamental task in microbiome research that aims to detect and quantify the relative abundance of microorganisms in biological samples. Available methods using shotgun metagenomic data generally depend on the deposition of sequenced and taxonomically annotated genomes, usually from cultures of isolated strains, in reference databases (reference genomes). However, the majority of microorganisms have not been cultured yet. Thus, a substantial fraction of microbial community members remains unaccounted for during taxonomic profiling, particularly in samples from underexplored environments. To address this issue, we developed the mOTU profiler, a tool that enables reference genome-independent species-level profiling of metagenomes. As such, it supports the identification and quantification of both “known” and “unknown” species based on a set of select marker genes. Results: We present mOTUs3, a command line tool that enables the profiling of metagenomes for >33,000 species-level operational taxonomic units. To achieve this, we leveraged the reconstruction of >600,000 draft genomes, most of which are metagenome-assembled genomes (MAGs), from diverse microbiomes, including soil, freshwater systems, and the gastrointestinal tract of ruminants and other animals, which we found to be underrepresented by reference genomes. Overall, two thirds of all species-level taxa lacked a reference genome. The cumulative relative abundance of these newly included taxa was low in well-studied microbiomes, such as the human body sites (6–11%). By contrast, they accounted for substantial proportions (ocean, freshwater, soil: 43–63%) or even the majority (pig, fish, cattle: 60–80%) of the relative abundance across diverse non-human-associated microbiomes. Using community-developed benchmarks and datasets, we found mOTUs3 to be more accurate than other methods and to be more congruent with 16S rRNA gene-based methods for taxonomic profiling. Furthermore, we demonstrate that mOTUs3 increases the resolution of well-known microbial groups into species-level taxa and helps identify new differentially abundant taxa in comparative metagenomic studies. Conclusions: We developed mOTUs3 to enable accurate species-level profiling of metagenomes. Compared to other methods, it provides a more comprehensive view of prokaryotic community diversity, in particular for currently underexplored microbiomes. To facilitate comparative analyses by the research community, it is released with >11,000 precomputed profiles for publicly available metagenomes and is freely available at: https://github.com/motu-tool/mOTUs.
Preprint
Analysis of microbial data from archaeological samples is a rapidly growing field with a great potential for understanding ancient environments, lifestyles and disease spread in the past. However, high error rates have been a long-standing challenge in ancient metagenomics analysis. This is also complicated by a limited choice of ancient microbiome specific computational frameworks that meet the growing computational demands of the field. Here, we propose aMeta, an accurate ancient Metagenomic profiling workflow designed primarily to minimize the amount of false discoveries and computer memory requirements. Using simulated ancient metagenomic samples, we benchmark aMeta against a current state-of-the-art workflow, and demonstrate its superior sensitivity and specificity in both microbial detection and authentication, as well as substantially lower usage of computer memory. aMeta is implemented as a Snakemake workflow to facilitate use and reproducibility.
Article
Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
Article
Metagenomic studies unravel details about the taxonomic composition and the functions performed by microbial communities. As a complete metagenomic analysis requires different tools for different purposes, the selection and setup of these tools remain challenging. Furthermore, the chosen toolset will affect the accuracy, the formatting, and the functional identifiers reported in the results, impacting the results interpretation and the biological answer obtained. Thus, we surveyed state-of-the-art tools available in the literature, created simulated datasets, and performed benchmarks to design a sensitive and flexible metagenomic analysis pipeline. Here we present MEDUSA, an efficient pipeline to conduct comprehensive metagenomic analyses. It performs preprocessing, assembly, alignment, taxonomic classification, and functional annotation on shotgun data, supporting user-built dictionaries to transfer annotations to any functional identifier. MEDUSA includes several tools, as fastp, Bowtie2, DIAMOND, Kaiju, MEGAHIT, and a novel tool implemented in Python to transfer annotations to BLAST/DIAMOND alignment results. These tools are installed via Conda, and the workflow is managed by Snakemake, easing the setup and execution. Compared with MEGAN 6 Community Edition, MEDUSA correctly identifies more species, especially the less abundant, and is more suited for functional analysis using Gene Ontology identifiers.
Article
Accurate microbial identification and abundance estimation are crucial for metagenomics analysis. Various methods for classification of metagenomic data and estimation of taxonomic profiles, broadly referred to as metagenomic profilers, have been developed. Nevertheless, benchmarking of metagenomic profilers remains challenging because some tools are designed to report relative sequence abundance while others report relative taxonomic abundance. Here we show how misleading conclusions can be drawn by neglecting this distinction between relative abundance types when benchmarking metagenomic profilers. Moreover, we show compelling evidence that interchanging sequence abundance and taxonomic abundance will influence both per-sample summary statistics and cross-sample comparisons. We suggest that the microbiome research community pay attention to potentially misleading biological conclusions arising from this issue when benchmarking metagenomic profilers, by carefully considering the type of abundance data that were analyzed and interpreted and clearly stating the strategy used for metagenomic profiling. Many computational tools for metagenomic profiling have been developed, with different algorithms and features. This analysis shows that, when comparing these tools, the distinction of different types of relative sequence abundance should be taken into consideration.
Article
Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Article
Quality control is an essential first step in sequencing data analysis, and software tools for quality control are deeply entrenched in standard pipelines at most sequencing centers. Although the associated computations are straightforward, in many settings the total computing effort required for quality control is appreciable and warrants optimization. We present Falco, an emulation of the popular FastQC tool that runs on average three times faster while generating equivalent results. Compared to FastQC, Falco also requires less memory to run and provides more flexible visualization of HTML reports.
Article
One of the major methods to identify microbial community composition, to unravel microbial population dynamics, and to explore microbial diversity in environmental samples is high-throughput DNA- or RNA-based 16S rRNA (gene) amplicon sequencing in combination with bioinformatics analyses. However, for environmental samples from contrasting habitats, it has not been systematically evaluated (i) which analysis methods provide results that reflect reality most accurately, (ii) how the interpretations of microbial community studies are biased by different analysis methods, and (iii) whether the optimal analysis workflow can be implemented in an easy-to-use pipeline. Here, we compared the performance of 16S rRNA (gene) amplicon sequencing analysis tools (i.e., Mothur, QIIME1, QIIME2, and MEGAN) using three mock datasets with known microbial community composition that differed in sequencing quality, species number and abundance distribution (i.e., even or uneven), and phylogenetic diversity (i.e., closely related or well-separated amplicon sequences). Our results showed that QIIME2 outcompeted all other investigated tools in sequence recovery (>10 times fewer false positives), taxonomic assignments (>22% better F-score) and diversity estimates (>5% better assessment), suggesting that this approach is able to reflect the in situ microbial community most accurately. Further analysis of 24 environmental datasets obtained from four contrasting terrestrial and freshwater sites revealed dramatic differences in the resulting microbial community composition for all pipelines at genus level. For instance, at the investigated river water sites Sphaerotilus was only reported when using QIIME1 (8% abundance) and Agitococcus with QIIME1 or QIIME2 (2 or 3% abundance, respectively), but both genera remained undetected when analyzed with Mothur or MEGAN. Since these abundant taxa probably have implications for important biogeochemical cycles (e.g., nitrate and sulfate reduction) at these sites, their detection and semi-quantitative enumeration is crucial for valid interpretations. A high-performance computing conformant workflow was constructed to allow FAIR (Findable, Accessible, Interoperable, and Re-usable) 16S rRNA (gene) amplicon sequence analysis starting from raw sequence files, using the optimal methods identified in our study. Our presented workflow should be considered for future studies, thereby facilitating the analysis of high-throughput 16S rRNA (gene) sequencing data substantially, while maximizing reliability and confidence in microbial community data analysis.
Background: Microorganisms are important occupants of many different environments. Identifying the composition of microbes and estimating their abundance promote understanding of microbial interactions in environmental samples. To understand their environments more deeply, the composition of microorganisms in environmental samples has been studied using metagenomes, which are collections of the genomes of the microorganisms. Although many tools based on different algorithms have been developed for taxonomy analysis, the variability in the outputs of existing tools for the same input metagenome datasets is the main obstacle for many researchers in this field. Results: Here, we present a novel meta-analysis tool for metagenome taxonomy analysis, called TAMA, which integrates outputs from three different taxonomy analysis tools. Using an integrated reference database, TAMA performs taxonomy assignment for input metagenome reads based on a meta-score that integrates the taxonomy assignment scores from different taxonomy classification tools. TAMA outperformed existing tools when evaluated using various benchmark datasets. It was also successfully applied to obtain relative species abundance profiles and differences in the composition of microorganisms in two types of cheese metagenome and human gut metagenome. Conclusion: TAMA can be easily installed and used for metagenome read classification and the prediction of relative species abundance from multiple numbers and types of metagenome read samples. TAMA can be used to more accurately uncover the composition of microorganisms in metagenome samples collected from various environments, especially when the use of a single taxonomy analysis tool is unreliable. TAMA is an open-source tool and can be downloaded at https://github.com/jkimlab/TAMA.