Iso-Seq Study Notes

SMRT Portal installation guide:

http://www.pacb.com/wp-content/uploads/2015/09/SMRT-Analysis-Software-Installation-v2.3.0.pdf

Iso-Seq data location:

/share/backups/pacbio/20160222_68, cells A01 and B01.

The <1 kb fraction yielded 1.28 G of data; the >1 kb fraction yielded 2.8 G.

SMRT Portal address:

http://59.79.232.10:8080/smrtportal/#/Design-Job/

Software installation root directory:

/share/workplace/software/PACBIO

Reference dropbox directory:

/share/workplace/software/PACBIO/userdata/references_dropbox

username: pbuser
password: pacbio-one2three

Goal: collect the results for these two cells (number of reads, number of full-length reads, number of isoforms). The SMRT Portal reports contain all of these.
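To cross-check these numbers outside the SMRT Portal report, one option is to count FASTA records in the job's output files. The following is a minimal Python sketch, assuming the outputs are plain FASTA. The `results/<cell>` directory and the three file names are hypothetical placeholders, not paths taken from this setup.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_cell(result_dir, filenames):
    """Return {label: record count} for the files found in result_dir.

    `filenames` maps a label to a file name. Missing files are reported
    as None rather than guessed.
    """
    result_dir = Path(result_dir)
    summary = {}
    for label, name in filenames.items():
        fasta = result_dir / name
        summary[label] = count_fasta_records(fasta) if fasta.is_file() else None
    return summary


if __name__ == "__main__":
    # Hypothetical file names; adjust to whatever the job actually wrote.
    FILES = {
        "reads": "reads_of_insert.fasta",
        "full_length": "isoseq_flnc.fasta",
        "isoforms": "final.consensus.fasta",
    }
    for cell in ("A01", "B01"):
        print(cell, summarize_cell(Path("results") / cell, FILES))
```

A count of None means the file was not found, which is easier to spot than a silent zero.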

Aligning Iso-Seq data to the reference genome

Written tutorial:

https://github.com/PacificBiosciences/cDNA_primer/wiki

Video tutorial:

http://www.pacb.com/training/IsoformSequencingIsoSeqOverview/story.html

THE CHALLENGE OF ISOFORM RECONSTRUCTION

Simply put, second-generation (short-read) sequencing cannot reliably tell apart the different isoforms of the same gene!

In eukaryotic organisms, the majority of genes are alternatively spliced to produce multiple transcript isoforms, dramatically increasing the protein-coding potential of a genome.

Alternatively spliced isoforms from the same gene can have significantly different, even antagonistic, effects. To study gene expression, researchers have looked at fragments of an organism’s genes utilizing next-generation sequencing methods, commonly referred to as RNA sequencing (RNA-seq). However, short-read RNA-seq cannot span full-length transcripts, making it difficult to accurately characterize the diverse landscape of isoforms.

Produce full-length transcripts without assembly

Simply put, third-generation sequencing can read straight through an entire isoform in a single read. That is Iso-Seq.

The isoform sequencing (Iso-Seq) application generates full-length cDNA sequences — from the 5’ end of transcripts to the poly-A tail — eliminating the need for transcriptome reconstruction using isoform-inference algorithms. The Iso-Seq method generates accurate information about alternatively spliced exons and transcriptional start sites. It also delivers information about poly-adenylation sites for transcripts up to 10 kb in length across the full complement of isoforms within targeted genes or the entire transcriptome.

The goal of Iso-Seq is to understand transcriptome complexity using accurate, unassembled, full-length long reads.

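Directory structure of the data produced by the lab's sequencing run:

[Figure: directory listing of the raw sequencing output]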

Files under Analysis_Results:

[Figure: listing of the files under Analysis_Results]

The correct data layout is as follows:

Note the metadata.xml file and the bax.h5 files in the subdirectory.

[Figure: correct data layout]
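Before importing, the layout can be checked with a short script. This is a rough Python sketch, assuming each cell directory holds a `*.metadata.xml` file next to an `Analysis_Results` subdirectory containing the `*.bax.h5` files. The glob patterns and the function name `check_cell_layout` are my own assumptions for illustration, not part of any PacBio tool.

```python
from pathlib import Path


def check_cell_layout(cell_dir):
    """Report whether a SMRT cell directory matches the assumed layout.

    Assumed layout (an assumption for illustration):
        <cell_dir>/*.metadata.xml
        <cell_dir>/Analysis_Results/*.bax.h5
    """
    cell_dir = Path(cell_dir)
    metadata = sorted(cell_dir.glob("*.metadata.xml"))
    bax = sorted((cell_dir / "Analysis_Results").glob("*.bax.h5"))
    problems = []
    if not metadata:
        problems.append("no metadata.xml next to Analysis_Results")
    if not bax:
        problems.append("no bax.h5 files under Analysis_Results")
    return {"metadata": metadata, "bax": bax, "problems": problems}
```

Calling `check_cell_layout` on each of the A01 and B01 directories and printing the `problems` list gives a quick sanity check. An empty list means both kinds of file were found.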

There are three ways to process the data: RS_IsoSeq through SMRT Portal, the GitHub code, and RS_IsoSeq on the command line. The main differences between them are:

The differences between the GitHub code and the RS_IsoSeq code are:

GitHub code requires you to set up a virtual environment and install all libraries on your own
GitHub code is more step-by-step and allows more flexibility
GitHub code is updated faster
GitHub code is all source code - you can modify the code as needed

The difference between the SMRT Portal version and the command-line version (pbtranscript.py) is that the command-line version additionally allows you to:

Use more CPUs than default
Directly start from the isoform-level clustering (ICE) part of RS_IsoSeq. Since v2.3.0, we have added additional entry points to the ICE/Quiver pipeline.

To analyze the data with SMRT Portal, the steps are:

1. Getting FL reads

First import your raw data, then select the RS_IsoSeq protocol (SMRT Portal must be v2.3.0 or later).

For the detailed steps, see my earlier blog post (http://www.cnblogs.com/freemao/p/3783475.html).
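For intuition about what "FL" means here: a read counts as full-length when the 5' primer, the poly-A tail, and the 3' primer are all seen. The toy Python sketch below illustrates that rule with exact string matching. It is not how RS_IsoSeq detects primers, and the primer sequences and the `min_poly_a` threshold are hypothetical values chosen only for the example.

```python
def is_full_length(read, five_prime, three_prime, min_poly_a=20):
    """Toy check: 5' primer at the start, poly-A then 3' primer at the end.

    Exact matching on one strand only; real primer detection tolerates
    sequencing errors and handles both orientations.
    """
    if not read.startswith(five_prime) or not read.endswith(three_prime):
        return False
    # Strip both primers, then measure the run of A's at the insert's end.
    insert = read[len(five_prime):len(read) - len(three_prime)]
    tail = len(insert) - len(insert.rstrip("A"))
    return tail >= min_poly_a


if __name__ == "__main__":
    FIVE, THREE = "GGCC", "TTGG"  # hypothetical primer sequences
    read = FIVE + "ACGTACGT" + "A" * 25 + THREE
    print(is_full_length(read, FIVE, THREE))
```

Reads that fail any of the three checks would be treated as non-full-length in this toy model.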

Iso-Seq library preparation workflow:

[Figure: Iso-Seq library preparation workflow]