remove adapter

Date: 2023-03-09 04:46:01

Although adapter and other technical sequences can potentially occur in any location within reads, by far the most common cause of adapter contamination is sequencing of a DNA fragment which is shorter than the read length. In this scenario, the beginning of the read contains valid data, but when the end of the fragment is reached, the sequencer continues to "read-through" into the adapter. This results in a partial or full adapter sequence towards the 3' end of the read. While a full adapter sequence can be identified relatively easily, reliably identifying a short partial adapter sequence is inherently difficult.
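To make the read-through case concrete, here is a minimal Python sketch of how a full or partial adapter at the 3' end could be located. It is only an illustration, not what Trimmomatic does internally: it uses exact matching, and the function name `find_adapter_start`, the `min_overlap` parameter, and the 20-base fragment are all made up for the example. The adapter string is the prefix shared by the indexed adapters listed further down.

```python
def find_adapter_start(read, adapter, min_overlap=3):
    """Return the position in `read` where the adapter begins, or None.

    Exact matching only (an assumption of this sketch): either the full
    adapter occurs somewhere in the read, or a prefix of the adapter
    forms the 3' end of the read.
    """
    # Case 1: the complete adapter is contained in the read.
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Case 2: only the beginning of the adapter fits before the read ends.
    # Try the longest possible overlap first.
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


# Prefix shared by the indexed TruSeq adapters in the list below.
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# Hypothetical 20-base fragment sequenced with a 30-base read length,
# so the last 10 bases of the read come from the adapter.
fragment = "ACGTTGCATTGACCGTAAGC"
read_length = 30
read = (fragment + ADAPTER)[:read_length]

start = find_adapter_start(read, ADAPTER)
trimmed = read if start is None else read[:start]
print(start, trimmed)
```

With a small `min_overlap`, a few bases at the end of a genuine read can match the adapter prefix purely by chance, which is why short partial adapters are hard to identify reliably.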

If you use FastQC, the "Overrepresented Sequences" report can help indicate which adapter file is best suited for your data.

All adapters used by our center:

TruSeq Universal Adapter
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
TruSeq Adapter, Index 1 
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 2
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 3
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 4
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 5
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 6
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 7
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 8
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 9
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 10
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAGCTTATCTCGTATGCCGTCTTCTGCTTG

TruSeq Adapter, Index 11
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGCTACATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 12
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG

TruSeq Adapter, Index 13
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 14
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 15
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATGTCAGAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 16
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 18
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCCGCACATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 19
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAACGATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 20
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGGCCTTATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 21
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTTTCGGAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 22
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGTACGTAATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 23
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 25
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACACTGATATATCTCGTATGCCGTCTTCTGCTTG
TruSeq Adapter, Index 27
5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTCCTTTATCTCGTATGCCGTCTTCTGCTTG

Location of the FASTA-format file on the server:

/share/bioinfo/miaochenyong/center-adapters/adapters.fa

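Our libraries carry an index on one end only (single-indexed). There are 24 indexes in total, so at most 24 samples can be sequenced together in one run.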

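There are two ways adapter sequence can end up in the reads:

1. The insert is too short, so the read runs through into the adapter at the other end.

2. Primer dimers (adapter dimers) formed during library construction, so the whole read is adapter sequence.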
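The two cases above can be told apart roughly by where the adapter starts in the read. The following Python sketch is a simplified illustration under my own assumptions: it searches for an exact 12-base seed (the first 12 bases of the indexed adapters listed above), and the function name `classify_read` and the three example reads are hypothetical.

```python
from collections import Counter

# First 12 bases shared by the indexed TruSeq adapters listed above.
ADAPTER_SEED = "GATCGGAAGAGC"


def classify_read(read, seed=ADAPTER_SEED):
    """Label a read by the position of the first exact seed match."""
    pos = read.find(seed)
    if pos == -1:
        return "no_adapter"
    if pos == 0:
        # Adapter from the very first base: no insert at all.
        return "adapter_dimer"
    # Some insert bases first, then adapter: the insert was too short.
    return "short_insert"


# Hypothetical reads, one per category.
reads = [
    "GATCGGAAGAGCACACGTCTGAACTC",   # starts with adapter
    "ACGTTGCAGATCGGAAGAGCACACGT",   # 8 insert bases, then adapter
    "ACGTTGCATTGACCGTAAGCTTAGGC",   # no adapter seed
]

counts = Counter(classify_read(r) for r in reads)
for label, n in sorted(counts.items()):
    print(label, n)
```

Real data would need mismatch-tolerant matching and partial matches at the 3' end, which this sketch deliberately leaves out.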

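Removing the adapters from the reads:

Use Trimmomatic's palindrome mode, which requires both the forward and the reverse adapter sequences.

This information has to be looked up with IEM (Illumina Experiment Manager).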

Software user guide

http://support.illumina.com/downloads/illumina-experiment-manager-user-guide-15031335.html

Software download

http://support.illumina.com/downloads/illumina-experiment-manager-v1-11.html

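I had a look and it is really too much trouble, so I gave up on that route.

Instead, I plan to put every possible adapter sequence into the adapter file. Wouldn't that do the job?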
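One way to read "every possible adapter sequence" is to include each adapter both as listed and as its reverse complement. The Python sketch below builds such a FASTA text for two of the adapters from the list above. The record names, the `_rc` suffix, and the function names are my own choices for illustration. As far as I understand Trimmomatic's naming convention, records not named with the `Prefix.../1` and `/2` pattern are matched in simple mode rather than palindrome mode, so treat that as an assumption to check against the manual.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(table)[::-1]


def both_orientations_fasta(adapters):
    """Build FASTA text with each adapter as given and reverse-complemented."""
    lines = []
    for name, seq in adapters.items():
        lines.append(">" + name)
        lines.append(seq)
        lines.append(">" + name + "_rc")  # hypothetical naming scheme
        lines.append(reverse_complement(seq))
    return "\n".join(lines) + "\n"


# Two adapters copied from the list above; the rest would be added the same way.
adapters = {
    "TruSeq_Universal_Adapter":
        "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "TruSeq_Adapter_Index_1":
        "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG",
}

fasta_text = both_orientations_fasta(adapters)
print(fasta_text)
```

The resulting text could be written to a file and passed to Trimmomatic's ILLUMINACLIP step in place of a hand-curated adapter file.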

freemao

FAFU