Database Objects
IsoPops stores data in collections of data frames called Database objects. Each data frame is a table containing information on individual transcripts (the TranscriptDB), unique ORFs (the OrfDB), gene-level information (the GeneDB), or GFF/GTF-formatted transcript data (the GffDB). When a filtering step is applied to a Database, the affected isoforms are removed from every table in it, so the tables stay consistent with one another. This makes it possible to, for example, examine which ORFs are lost when transcripts are filtered by read count. Each Database object can be used to store data from a single filtering step, a particular experiment, a specific cell type, etc.
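To make the idea of linked tables concrete, here is a minimal conceptual sketch in Python. It is not the IsoPops API: plain lists of dictionaries stand in for the TranscriptDB and OrfDB, and the column names, the sample values, and the function filter_by_read_count() are all hypothetical. It only illustrates how filtering transcripts by read count can also remove ORFs that no remaining transcript encodes.

```python
# Conceptual sketch only: plain Python structures stand in for IsoPops data frames.
# All names and values here are hypothetical, chosen for illustration.
transcript_db = [
    {"transcript": "T1", "gene": "GeneA", "orf": "ORF1", "reads": 50},
    {"transcript": "T2", "gene": "GeneA", "orf": "ORF1", "reads": 3},
    {"transcript": "T3", "gene": "GeneA", "orf": "ORF2", "reads": 2},
    {"transcript": "T4", "gene": "GeneB", "orf": "ORF3", "reads": 12},
]
orf_db = [{"orf": "ORF1"}, {"orf": "ORF2"}, {"orf": "ORF3"}]


def filter_by_read_count(transcripts, orfs, min_reads):
    """Drop low-count transcripts, then drop ORFs no surviving transcript uses."""
    kept_transcripts = [t for t in transcripts if t["reads"] >= min_reads]
    live_orfs = {t["orf"] for t in kept_transcripts}
    kept_orfs = [o for o in orfs if o["orf"] in live_orfs]
    return kept_transcripts, kept_orfs


kept_transcripts, kept_orfs = filter_by_read_count(transcript_db, orf_db, min_reads=10)

# ORFs that disappeared as a side effect of the transcript-level filter
lost_orfs = sorted({o["orf"] for o in orf_db} - {o["orf"] for o in kept_orfs})
print(lost_orfs)  # ['ORF2']
```

ORF1 survives because one of its two transcripts passes the cutoff, while ORF2 is lost because its only transcript does not.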
There are two kinds of Database objects: Raw Databases and processed Databases. The distinction separates non-targeted experiments from targeted ones, since processed Databases only retain isoforms from the genes which were targeted by the experimental protocol. Raw Databases, on the other hand, are for use in whole-transcriptome analysis.
Raw Databases
A Raw Database consists of a compiled TranscriptDB and a compiled GffDB from the same experiment. Unlike processed Databases, Raw Databases do not include any gene information. They also do not contain an OrfDB, although ORFs are included as part of the TranscriptDB if provided. Raw Databases are unfiltered: if your dataset includes off-target reads, they will still be present in the Raw Database. Raw Databases are created using the function compile_raw_db().
At the moment, much of the functionality of IsoPops is specific to targeted sequencing experiments, and thus will only work with processed Database objects. Compiling your data into a Raw Database can still be useful, however, for examining the quality and distribution of reads from an experiment.
Processed Databases
One of the first steps in filtering reads in a targeted sequencing approach is removing off-target reads. These are any reads that come from genes other than those you designed probes for. You can filter these reads out, after compiling your data into a Raw Database, by processing that Raw Database. From there, additional filtering steps can be applied, such as read count cutoffs and minimum exon requirements. Processed Databases are created by passing a Raw Database along with a gene-ID table into the function process_db().
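The off-target filtering described above can be pictured with a small conceptual sketch in Python. It is not the IsoPops API and does not reproduce what process_db() does internally. It assumes, purely for illustration, that each transcript ID carries a locus prefix (such as "PB.1" in "PB.1.2") and that the gene-ID table maps those prefixes to the names of the targeted genes. The helper names locus_of() and process() and all sample values are hypothetical.

```python
# Conceptual sketch only; not the IsoPops API.
# Assumption: transcript IDs look like "<locus>.<isoform number>", and the
# gene-ID table maps each targeted locus to a gene name.
raw_transcripts = [
    {"id": "PB.1.1", "reads": 40},
    {"id": "PB.1.2", "reads": 4},
    {"id": "PB.2.1", "reads": 25},  # locus absent from the gene-ID table: off-target
    {"id": "PB.3.1", "reads": 9},
]
gene_id_table = {"PB.1": "GeneA", "PB.3": "GeneB"}  # hypothetical targeted genes


def locus_of(transcript_id):
    """Return the locus prefix, e.g. 'PB.1' for 'PB.1.2'."""
    return transcript_id.rsplit(".", 1)[0]


def process(raw, gene_table):
    """Keep only transcripts from targeted loci and annotate them with a gene name."""
    processed = []
    for transcript in raw:
        gene = gene_table.get(locus_of(transcript["id"]))
        if gene is not None:
            processed.append({**transcript, "gene": gene})
    return processed


processed = process(raw_transcripts, gene_id_table)

# A later filtering step, here a read count cutoff, applied to the processed data
filtered = [t for t in processed if t["reads"] >= 5]
print([t["id"] for t in filtered])  # ['PB.1.1', 'PB.3.1']
```

The sketch keeps the two stages separate on purpose: removing off-target transcripts comes first, and cutoffs such as a minimum read count are applied afterwards to the processed data.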