3.1 years ago

novice

Hello,

I have a few assumptions about viral data that I would like to test. Is anyone familiar with published papers that have determined, experimentally or otherwise, the percentage of viral sequences in a sample? I have looked into some HMP samples but couldn't find any that list the percentage of viral sequences explicitly.

An example of what I am looking for: a paper that sequenced a metagenomic sample from a cheek swab (e.g. with Illumina short reads) and reported that 0.001% of the reads were viral.

Thanks in advance.

Not exactly what you are asking for, but see if it is of interest: Viral sequences in TCGA data.

Since known viral reference sequences probably exist, you could estimate this percentage by BLASTing your sequences against the genome of interest, taking the union of the nucleotide positions covered by hits, and dividing by the total sequence length.
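A minimal sketch of that calculation, assuming you ran something like `blastn -outfmt 6` with your reads (or contigs) as the query and the viral genome(s) as the subject, so each hit gives `qstart`/`qend` on the query in the default 12-column tabular layout. It also assumes you already know the length of every query sequence (e.g. from your FASTA), and it reads "whole length" as the total length of the query sequences. The function names are hypothetical, and no e-value or identity filtering is applied, so spurious hits would inflate the result. It reports both the fraction of reads with at least one hit (the "percentage of viral reads" from the question) and the fraction of nucleotides covered by the union of hits.

```python
from collections import defaultdict

# Column indices in BLAST+ tabular output (-outfmt 6, default columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QSEQID, QSTART, QEND = 0, 6, 7


def parse_blast_tab(lines):
    """Yield (query_id, start, end) as 1-based inclusive intervals on the query."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[QSTART]), int(fields[QEND])
        # normalise so that start <= end regardless of orientation
        yield fields[QSEQID], min(start, end), max(start, end)


def union_length(intervals):
    """Number of positions covered by the union of 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # disjoint from the current merged block: close it, start a new one
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # overlapping or adjacent: extend the current merged block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def viral_fraction(blast_lines, query_lengths):
    """Return (fraction of reads with >= 1 hit, fraction of nucleotides covered).

    query_lengths maps every query ID in the sample to its length; hits on IDs
    not in this dict are ignored.
    """
    hits = defaultdict(list)
    for qid, start, end in parse_blast_tab(blast_lines):
        if qid in query_lengths:
            hits[qid].append((start, end))
    covered = sum(union_length(iv) for iv in hits.values())
    total_nt = sum(query_lengths.values())
    read_frac = len(hits) / len(query_lengths)
    return read_frac, covered / total_nt
```

For example, with four 100 bp reads where `read1` has overlapping hits at 1-50 and 30-70 and `read3` has one hit at 81-100, two of four reads are counted as viral (50%), and the union of covered positions is 70 + 20 = 90 of 400 nucleotides (22.5%). Merging the intervals per query first is what keeps overlapping hits (e.g. the same read matching several related viral genomes) from being double-counted.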

Thanks, but these were determined using novel methods. I am wondering how such methods are verified if there are no samples with experimentally known viral abundance.