Standard Plots

IsoPops implements the following suite of summary plots, which can help you get a sense of the isoform diversity in your data. All standard plot functions require the ggplot2 package and take a processed Database object. They also allow subsetting the dataset by a list of genes, and most let you toggle between summary statistics for transcripts and ORFs.

Isoform Length Distributions


Example plot

plot_length_dist(database, use_ORFs = F, bins = 200, horiz_spread = 0.3, ...)

Generates a dot plot showing how transcripts/ORFs are distributed in length for each gene in the database.

Arguments
database A compiled Database object.
use_ORFs Logical. Set to TRUE to use abundances from OrfDB instead of abundances from TranscriptDB. Note that OrfDB collapses isoforms with non-unique transcripts, so abundances may differ significantly.
bins The number of bins to use in segmenting the range of lengths over the entire dataset. This parameter determines vertical dot spread.
genes_to_include Vector of gene names to subset from the database. Default is to plot all genes in the database.
horiz_spread Numeric. This parameter determines horizontal dot spread across all the genes visualized.
insert_title String to customize the title of the plot.
Returns
A length distribution dot plot constructed as a ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
                      plot_length_dist(DB)
                    
Notes
Requires the ggplot2 package.




Treemap Plots


Example plot

plot_treemap(database, use_ORFs = F, ...)

Generates a treemap plot showing how individual transcripts and genes account for abundance within the dataset as a whole.

Arguments
database A compiled Database object.
use_ORFs Logical. Set to TRUE to use abundances from OrfDB instead of abundances from TranscriptDB. Note that OrfDB collapses isoforms with non-unique transcripts, so abundances may differ significantly.
genes_to_include Vector of gene names to subset from the database. Default is to plot all genes in the database.
insert_title String to customize the title of the plot.
Returns
A treemap plot constructed as a ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
                      plot_treemap(DB)
                    
Notes
Requires the ggplot2 package.




Exon-Abundance Distribution Plots


Example plot

Example plot

plot_exon_dist(database, sum_dist = T, bin_width = 0.02, ...)

Generates a bar plot showing the abundances of normalized exon counts for one or more genes. If each transcript is represented as the fraction of exons it contains out of the maximum number of exons found in a gene, this plot is merely a histogram of those representations, weighted by the read count for each transcript. Normalized exon percent is along the x-axis, and abundance is along the y-axis. If only one gene name is given, a second plot is generated where the x-axis is not normalized, instead showing the exon count of each transcript individually, and the y-axis is log-transformed. Jitter along the x-axis is added to improve visibility.
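
The weighted histogram described above can be sketched in base R. This is only an illustration on made-up data, not the package's internal code: the data frame, its column names, and the exact binning rule (right-closed bins via cut) are all assumptions.

```r
# Toy transcript table (hypothetical values and column names)
transcripts <- data.frame(
  gene       = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  exon_count = c(10, 8, 5, 4, 2),
  reads      = c(50, 30, 20, 70, 30)
)

# Normalize each transcript's exon count by the maximum exon count in its gene
max_exons <- ave(transcripts$exon_count, transcripts$gene, FUN = max)
transcripts$exon_frac <- transcripts$exon_count / max_exons

# Bin the normalized fractions (assumed binning rule: right-closed intervals)
bin_width <- 0.25
breaks <- seq(0, 1, by = bin_width)
transcripts$bin <- cut(transcripts$exon_frac, breaks = breaks,
                       include.lowest = TRUE)

# Sum read counts per bin; empty bins come back as NA, so set them to 0
weighted_hist <- tapply(transcripts$reads, transcripts$bin, sum)
weighted_hist[is.na(weighted_hist)] <- 0

print(weighted_hist)
```

Each bar height is the total read count of the transcripts whose normalized exon fraction falls in that bin, so a few highly abundant transcripts can dominate a bin even when many low-abundance transcripts fall elsewhere.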

Arguments
database A compiled Database object.
genes_to_include Vector of gene names to subset from the database. Default is to plot all genes in the database.
sum_dist Logical. If TRUE, the result is a histogram-like bar plot, where the x-axis is binned. Otherwise, individual isoforms are plotted as points, and the y-axis is log-transformed (single gene only).
bin_width The histogram bin width, used only when multiple genes are input and the x-axis is the fraction of total exons per gene.
insert_title String to customize the title of the plot.
Returns
A ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
                      plot_exon_dist(DB, sum_dist = T)  # set sum_dist = F for a scatter plot (single gene only)
                    
Notes
Requires the ggplot2 package.




Unique Isoforms/ORFs Barplots


Example plot

plot_counts(database, use_log = F, use_counts = c("Isoforms", "ORFs"))

Generates a bar plot showing the number of unique isoform transcripts and unique ORFs for each gene.

Arguments
database A compiled Database object.
use_log Logical. If TRUE, the y-axis is plotted on a log scale (base 2).
genes_to_include Vector of gene names to subset from the database. Default is to plot all genes in the database.
use_counts One or both of the strings "Isoforms" and "ORFs", indicating which should be included in the plot. Default is both; a warning is given if no ORF information is in the database.
insert_title String to customize the title of the plot.
Returns
A ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
                      plot_counts(DB)
                    
Notes
Requires the ggplot2 package.




N50/N75 Barplots


Example plot

plot_N50_N75(database, use_ORFs = F, ...)

Generates a bar plot showing the number of isoform transcripts and/or unique ORFs for each gene, using the thresholding concepts of N50 and N75. N50 refers to the minimum number of isoforms/ORFs needed to represent at least 50% of the abundance of a gene, while N75 refers to the minimum number of isoforms/ORFs needed to represent at least 75% of the abundance of a gene. This plot can help to identify which genes are dominated by very few isoforms.
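
To make the N50/N75 definitions concrete, here is a minimal base-R sketch on made-up read counts. The helper n_threshold is hypothetical and is not part of IsoPops; it simply applies the definition above to one gene's abundances.

```r
# Toy full-length read counts for the isoforms of one hypothetical gene
abundances <- c(iso1 = 60, iso2 = 25, iso3 = 10, iso4 = 5)

# Smallest number of isoforms whose summed abundance reaches `threshold`
# (a fraction between 0 and 1) of the gene's total abundance.
n_threshold <- function(abund, threshold) {
  sorted <- sort(abund, decreasing = TRUE)   # most abundant isoforms first
  reached <- cumsum(sorted) >= threshold * sum(sorted)
  unname(which(reached)[1])
}

n50 <- n_threshold(abundances, 0.50)
n75 <- n_threshold(abundances, 0.75)

print(c(N50 = n50, N75 = n75))
```

In this toy gene the single most abundant isoform already covers 60% of the reads, so N50 is 1, while two isoforms are needed to pass 75%, so N75 is 2.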

Arguments
database A compiled Database object.
use_ORFs Logical. Set to TRUE to use abundances from OrfDB instead of abundances from TranscriptDB. Note that OrfDB collapses isoforms with non-unique transcripts, so abundances may differ significantly.
genes_to_include Vector of gene names to subset from the database. Default is to plot all genes in the database.
insert_title String to customize the title of the plot.
Returns
A ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
                      plot_N50_N75(DB)
                    
Notes
Requires the ggplot2 package.




Shannon Diversity Index Plots


Example plot

plot_Shannon_index(database, use_ORFs = F, ...)

Generates a plot showing the Shannon Index for isoform/ORF diversity on the x-axis, and each gene on the y-axis.
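
As a rough illustration of what the index measures, the sketch below computes a Shannon index from isoform abundances in base R. The helper shannon_index is hypothetical, and the use of the natural logarithm is an assumption; the package may use a different log base or normalization.

```r
# Shannon index H = -sum(p * log(p)), where p is each isoform's share of the
# gene's total abundance. Natural log is assumed here.
shannon_index <- function(abund) {
  p <- abund[abund > 0] / sum(abund)  # drop zero counts to avoid 0 * log(0)
  -sum(p * log(p))
}

# Toy abundances for two hypothetical genes with four isoforms each
even_gene      <- c(25, 25, 25, 25)  # reads spread evenly across isoforms
dominated_gene <- c(97, 1, 1, 1)     # one isoform dominates

h_even      <- shannon_index(even_gene)
h_dominated <- shannon_index(dominated_gene)

print(c(even = h_even, dominated = h_dominated))
```

Under this definition an evenly expressed gene with four isoforms reaches the maximum value log(4), a gene dominated by a single isoform scores close to 0, and a gene with only one isoform scores exactly 0.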

Arguments
database A compiled Database object.
use_ORFs Logical. Set to TRUE to use abundances from OrfDB instead of abundances from TranscriptDB. Note that OrfDB collapses isoforms with non-unique transcripts, so abundances may differ significantly.
genes_to_include Vector of gene names to subset from the database. Default is to plot all genes in the database.
insert_title String to customize the title of the plot.
Returns
A ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
                      plot_Shannon_index(DB)
                    
Notes
Requires the ggplot2 package.




Exon Correlation Plots


Example plot

plot_exon_correlations(database, exon_filename, gene, weighted = T, exons_to_include = NULL, weights = NULL, plot_hist = F, symmetric = F)

Generates a 2D heatmap where each axis is the exons for a gene, and the values in the heatmap correspond to the correlation between the splicing events of pairs of exons. For example, the heatmap cell in row i and column j contains the Pearson correlation of all the observed splicing inclusions and exclusions of exons i and j, according to the transcripts in the data. For instance, this plot can show which exons tend to be included or spliced out together, and which exon pairs may have mutually exclusive splicing patterns. Exon presence within a transcript is determined by literal string matching, so only full and completely correct matches between exon sequence and transcript sequence are considered.
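
The idea can be sketched in base R on toy sequences. This is not the package's implementation: the sequences, the weights, and the helper weighted_cor are all made up for illustration, and the weighted Pearson formula shown is one common definition that IsoPops may or may not use.

```r
# Toy exon sequences (hypothetical; real input would come from the exon file)
exons <- c(E1 = "ACGTAC", E2 = "GGGCCC", E3 = "TTTAAA")

# Toy transcripts for a single gene, with read counts used as weights
transcripts <- c(
  "ACGTACGGGCCCTTTAAA",  # E1 + E2 + E3
  "ACGTACTTTAAA",        # E1 + E3
  "GGGCCCTTTAAA",        # E2 + E3
  "ACGTACGGGCCC"         # E1 + E2
)
weights <- c(10, 5, 1, 4)

# Inclusion matrix: rows are transcripts, columns are exons, 1 = exon found
# by exact (literal) substring matching
inclusion <- sapply(exons, function(e) grepl(e, transcripts, fixed = TRUE)) * 1

# Weighted Pearson correlation between two numeric vectors
weighted_cor <- function(x, y, w) {
  w <- w / sum(w)
  mx <- sum(w * x)
  my <- sum(w * y)
  sum(w * (x - mx) * (y - my)) /
    sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
}

# Pairwise correlation matrix over exons (the values a heatmap would display)
n_exons <- ncol(inclusion)
cor_mat <- matrix(NA_real_, n_exons, n_exons,
                  dimnames = list(colnames(inclusion), colnames(inclusion)))
for (i in seq_len(n_exons)) {
  for (j in seq_len(n_exons)) {
    cor_mat[i, j] <- weighted_cor(inclusion[, i], inclusion[, j], weights)
  }
}

print(round(cor_mat, 3))
```

Note that an exon present in every transcript (or in none) has zero variance, so its correlations are undefined under this formula; the toy data avoids that case.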

Arguments
database A compiled Database object.
exon_filename Path to a file in either FASTA or TSV format. If in FASTA format, the sequences are the annotated sequences for all exons in the gene, and the IDs are the exon names (these will be displayed in the plot). The ID line must have the format ">exonname". If in TSV format, there must be one column for exon names and one column for exon sequences, tab-separated.
gene The desired gene to plot. Note that the plot will be generated only from exon matches to transcripts for the given gene, so no off-target exon matches are possible.
weighted Logical. If TRUE, transcript abundances will be taken into account when correlations are calculated (recommended).
exons_to_include Vector of exon names to subset from the input file. Default is to include all exons in the input file. This list is ordered; in other words, if you would like to rearrange the order of exon names on the axes of the heatmap, use this argument to do so.
weights A numeric vector specifying the weights to apply to each transcript for the given gene. Default is the number of full-length reads for the transcript.
plot_hist Logical. If TRUE, a histogram of all exon correlations across the gene is produced.
symmetric Logical. If TRUE, both sides of the symmetric heatmap are shown.
Returns
A ggplot object.
Example Usage
                       # database setup 
                      gene_ID_table <- data.frame(ID = c("PB.1"), Name = c("Gene1"))
                      rawDB <- compile_raw_db(transcript_file, abundance_file, gff_file, ORF_file)
                      DB <- process_db(rawDB, gene_ID_table)
                      
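                      # exon_filename: path to a FASTA or TSV file of exon sequences for the gene
                      plot_exon_correlations(DB, exon_filename, gene = "Gene1")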
                    
Notes
Requires the ggplot2 package.