Coverage analysis of chromosome contig_2

Report created with Sequana (v0.18.0)

This report presents the genome coverage analysis of chromosome contig_2.

Coverage

The following figures show the per-base coverage along the reference genome (black line). The blue line indicates the running median. From the normalised coverage, we estimate z-scores at the per-base level. The red lines indicate the z-scores at plus or minus N standard deviations, where N is chosen by the user (default: 4). Only a million points are shown. This may explain some visual discrepancies.
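The normalisation described above can be sketched as follows. This is a minimal illustration, not Sequana's implementation: the function names are hypothetical, the window simply shrinks at the contig ends, and the naive median recomputation would be far too slow for the 20001-base window used in this report.

```python
from statistics import median


def running_median(coverage, window):
    """Centered running median; the window shrinks at both ends (an assumption)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n = len(coverage)
    return [
        median(coverage[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def normalise(coverage, window):
    """Divide each depth by its local running median (NaN where the median is 0)."""
    rm = running_median(coverage, window)
    return [c / m if m else float("nan") for c, m in zip(coverage, rm)]


# Toy per-base depths with a single dip at index 3
depths = [40, 41, 42, 20, 41, 40, 42]
norm = normalise(depths, window=5)
print([round(x, 3) for x in norm])
```

In the toy data only the dip moves far from 1, which is the property the z-score thresholds rely on.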

Subcoverage

Subchromosome

Basic stats

Here are some basic statistics about the genome coverage.

Regions Of Interest (ROI)

The following tables give regions of interest detected by Sequana. Here are the definitions of the columns:

Low coverage region

Regions with a z-score lower than -2.00 and at least one base with a z-score lower than -4.00 are detected. There are 12 low regions of interest.

chr,start,end,size,mean_cov,max_cov,mean_rm,mean_zscore,max_zscore,log2_ratio,link
contig_2,1697,1698,1,36,36,41,-5.05,-5.05,-0.188,subplots/contig_2_1_12729.html
contig_2,2103,2825,722,38.1,39,41,-2.98,-7.01,-0.106,subplots/contig_2_1_12729.html
contig_2,4354,4355,1,41,41,46,-4.52,-4.52,-0.166,subplots/contig_2_1_12729.html
contig_2,4525,4526,1,41,41,46,-4.52,-4.52,-0.166,subplots/contig_2_1_12729.html
contig_2,5147,5148,1,40,40,47,-6.13,-6.13,-0.233,subplots/contig_2_1_12729.html
contig_2,5934,5935,1,43,43,48,-4.33,-4.33,-0.159,subplots/contig_2_1_12729.html
contig_2,6415,6416,1,44,44,49,-4.25,-4.25,-0.155,subplots/contig_2_1_12729.html
contig_2,6597,6598,1,43,43,49,-5.07,-5.07,-0.188,subplots/contig_2_1_12729.html
contig_2,7434,7435,1,46,46,51,-4.09,-4.09,-0.149,subplots/contig_2_1_12729.html
contig_2,7474,7475,1,41,41,51,-8.03,-8.03,-0.315,subplots/contig_2_1_12729.html
contig_2,8481,8482,1,49,49,56,-5.17,-5.17,-0.193,subplots/contig_2_1_12729.html
contig_2,9425,9426,1,49,49,58,-6.38,-6.38,-0.243,subplots/contig_2_1_12729.html
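As a quick consistency check on the table above, the log2_ratio column appears to equal log2(mean_cov / mean_rm). This relation is inferred from the reported numbers, not taken from this report's text, and the helper name below is hypothetical.

```python
from math import log2


def log2_ratio(mean_cov, mean_rm):
    """log2 of mean coverage over mean running median (inferred relation)."""
    return log2(mean_cov / mean_rm)


# (mean_cov, mean_rm, reported log2_ratio) copied from three rows of the table
rows = [
    (36.0, 41.0, -0.188),
    (38.1, 41.0, -0.106),
    (41.0, 51.0, -0.315),
]

for mean_cov, mean_rm, reported in rows:
    computed = log2_ratio(mean_cov, mean_rm)
    print(f"computed {computed:.3f}, reported {reported:.3f}")
```

The computed and reported values agree to three decimals for these rows, even though mean_cov and mean_rm are themselves rounded in the table.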

High coverage region

Regions with a z-score higher than 2.00 and at least one base with a z-score higher than 4.00 are detected. There are 0 high regions of interest.

chr,start,end,size,mean_cov,max_cov,mean_rm,mean_zscore,max_zscore,log2_ratio,link
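The two detection rules above (a soft threshold of ±2.00 and a hard threshold of ±4.00) can be sketched as a double-threshold scan. This follows the sentences in this report literally and is not Sequana's actual clustering code. The function name is hypothetical, and treating the soft threshold as the hard threshold times the clustering parameter (4.0 × 0.5) is an assumption based on the command line shown at the end of the report.

```python
def find_rois(z, threshold, clustering=0.5):
    """Return half-open (start, end) index pairs of regions of interest.

    A region is a maximal run of positions beyond the soft threshold
    (threshold * clustering) containing at least one position beyond the
    hard threshold. A negative threshold selects low regions, a positive
    one selects high regions.
    """
    sign = 1 if threshold > 0 else -1
    hard = abs(threshold)
    soft = hard * clustering  # assumed relation between the two thresholds
    regions, start, has_hit = [], None, False
    for i, value in enumerate(z):
        v = sign * value  # fold both cases into "larger is more extreme"
        if v > soft:
            if start is None:
                start, has_hit = i, False
            has_hit = has_hit or v > hard
        elif start is not None:
            if has_hit:
                regions.append((start, i))
            start = None
    if start is not None and has_hit:
        regions.append((start, len(z)))
    return regions


# Toy z-scores: one run with a hard hit, one run without, one single-base hit
z = [0.1, -2.5, -4.6, -3.0, -1.0, -2.2, -2.9, 0.3, -5.1]
low = find_rois(z, threshold=-4.0)   # [(1, 4), (8, 9)]
high = find_rois(z, threshold=4.0)   # []
print(low, high)
```

The run at indices 5 to 6 is dropped because it never crosses the hard threshold, while a single base beyond it is enough to report a region of size 1, as in most rows of the low-coverage table.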

Coverage histogram

The following figures contain the histogram of the genome coverage. Both the X and Y axes are in log scale in the left panel, while only the Y axis is in log scale in the right panel.

Coverage vs GC content

The correlation coefficient between the coverage and GC content is 0.209 with a window size of 201 bp.

image not created

Note: the correlation coefficient lies between -1.0 and 1.0. A coefficient of 0 means no linear correlation, while a coefficient of -1 or 1 means a perfect negative or positive correlation between GC content and coverage.
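For readers who want to reproduce this kind of figure, a windowed GC content and a Pearson coefficient can be computed as in the sketch below. It assumes a centered window that shrinks at the sequence ends and a plain Pearson correlation. Both function names are hypothetical, and Sequana's own windowing may differ.

```python
from math import sqrt


def gc_content(seq, window):
    """GC fraction in a centered window around each base (shrinks at the ends)."""
    half = window // 2
    is_gc = [1 if base in "GCgc" else 0 for base in seq]
    n = len(seq)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(is_gc[lo:hi]) / (hi - lo))
    return out


def pearson(x, y):
    """Pearson correlation coefficient; undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy sequence and made-up depths, for illustration only
seq = "ATATGCGCGCATAT"
depths = [40, 41, 40, 42, 45, 46, 47, 46, 45, 44, 41, 40, 41, 40]
r = pearson(gc_content(seq, window=5), depths)
print(round(r, 3))
```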

Normalised coverage

Distribution of the normalised coverage with the predicted Gaussian. The red line should follow the trend of the bar plot.

Z-Score distribution

Distribution of the z-scores (normalised coverage). You should see a Gaussian distribution centered around 0. The estimated parameters are mu=1.00 and sigma=0.02.
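As a hedged worked example, suppose the z-score of a base is (coverage / running median - mu) / sigma, with mu and sigma taken from the line above. That formula is an assumption, since the report does not spell it out. Under it, the printed sigma of 0.02 is too coarse to reproduce the table values exactly, as the sketch below shows for the row at position 7474 (coverage 41, running median 51, z-score -8.03).

```python
def zscore(coverage, running_median, mu, sigma):
    """z-score of the normalised coverage (assumed formula, see lead-in)."""
    return (coverage / running_median - mu) / sigma


mu = 1.00

# Using sigma exactly as printed in the report (rounded to two decimals)
z_printed_sigma = zscore(41, 51, mu, sigma=0.02)

# Sigma that would reproduce the tabulated z-score of -8.03 for this row
sigma_implied = (41 / 51 - mu) / -8.03

print(round(z_printed_sigma, 2))  # -9.8
print(round(sigma_implied, 4))    # 0.0244
```

An implied sigma of about 0.0244 rounds to the printed 0.02, so the gap between -9.8 and -8.03 is consistent with rounding in the report and does not point to a different formula. The running median in the table is also rounded, which adds a small further uncertainty.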

Command

Command used:

sequana_coverage --input-file hifi3/minimap2/hifi3.bed -H 4.0 -L -4.0 --clustering-parameter 0.5 --chunk-size 5000000 --window-gc 201 --mixture-models 2 --output-directory hifi3/sequana_coverage --window-median 20001 --reference-file hifi3/sorted_contigs/hifi3.fasta
Sequana version: 0.18.0