Introduction to the org.Hs.eg.db package (converting between gene IDs, symbols, etc. from NCBI, Ensembl and other databases)

Time: 2023-01-18 16:23:22

1) Installation and loading

-------------------------------------------

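# biocLite() was replaced by BiocManager in Bioconductor 3.8
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) BiocManager::install("org.Hs.eg.db")
suppressMessages(library(org.Hs.eg.db))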

2) List all objects in the package

--------------------------------------------

ls("package:org.Hs.eg.db")


Purpose: these objects are used to convert between gene IDs. The main ones are:

org.Hs.egACCNUM: Map Entrez Gene identifiers to GenBank accession numbers
org.Hs.egALIAS2EG: Map between common gene symbol identifiers (aliases) and Entrez Gene
org.Hs.eg.db: Bioconductor annotation data package
org.Hs.egCHR: Map Entrez Gene IDs to chromosomes
org.Hs.egCHRLENGTHS: A named vector for the length of each of the chromosomes
org.Hs.egCHRLOC: Entrez Gene IDs to chromosomal location
org.Hs.egENSEMBL: Map Ensembl gene accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLPROT: Map Ensembl protein accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLTRANS: Map Ensembl transcript accession numbers with Entrez Gene identifiers
org.Hs.egENZYME: Map between Entrez Gene IDs and Enzyme Commission (EC) numbers
org.Hs.egGENENAME: Map between Entrez Gene IDs and gene names
org.Hs.egGO: Map between Entrez Gene IDs and Gene Ontology (GO) IDs
org.Hs.egMAP: Map between Entrez Gene identifiers and cytogenetic maps/bands
org.Hs.egMAPCOUNTS: Number of mapped keys for the maps in package org.Hs.eg.db
org.Hs.egOMIM: Map between Entrez Gene identifiers and Mendelian Inheritance in Man (MIM) identifiers
org.Hs.egORGANISM: The organism for org.Hs.eg
org.Hs.egPATH: Mappings between Entrez Gene identifiers and KEGG pathway identifiers
org.Hs.egPFAM: Map between Entrez Gene identifiers and PFAM identifiers
org.Hs.egPMID: Map between Entrez Gene identifiers and PubMed identifiers
org.Hs.egPROSITE: Map between Entrez Gene identifiers and PROSITE identifiers
org.Hs.egREFSEQ: Map between Entrez Gene identifiers and RefSeq identifiers
org.Hs.egSYMBOL: Map between Entrez Gene identifiers and gene symbols
org.Hs.egUNIGENE: Map between Entrez Gene identifiers and UniGene cluster identifiers
org.Hs.egUNIPROT: Map UniProt accession numbers with Entrez Gene identifiers
org.Hs.eg_dbconn: Collect information about the package annotation DB
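Besides the Bimap objects listed above, the package also supports the select()-style interface used in the examples below. A short sketch, assuming the AnnotationDbi functions columns(), keytypes() and keys() (attached together with org.Hs.eg.db), shows which fields can be requested and which can be used as input keys; the exact list varies between package versions.

```r
suppressMessages(library(org.Hs.eg.db))

# Fields that select()/mapIds() can return
cols <- columns(org.Hs.eg.db)
# Fields that can be used as input keys (the keytype argument)
kts <- keytypes(org.Hs.eg.db)
print(cols)
print(kts)

# The first five keys of one keytype, here Entrez Gene IDs
head(keys(org.Hs.eg.db, keytype = "ENTREZID"), 5)
```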

Examples:

(Using mget):
myEIDs <- c("1", "10", "100", "1000", "37690")
mySymbols <- mget(myEIDs, org.Hs.egSYMBOL, ifnotfound=NA)     # myEIDs are your own IDs; org.Hs.egSYMBOL is one of the objects listed above
mySymbols <- unlist(mySymbols)
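The same lookup can also be written with AnnotationDbi's mapIds(), which returns a named vector directly instead of a list that has to be unlisted. A minimal sketch: multiVals = "first" keeps only the first symbol when a key has several, and IDs that are not human Entrez IDs should come back as NA.

```r
suppressMessages(library(org.Hs.eg.db))

myEIDs <- c("1", "10", "100", "1000", "37690")
# Named character vector: names are the input Entrez IDs, values the symbols
mySymbols2 <- mapIds(org.Hs.eg.db, keys = myEIDs, column = "SYMBOL",
                     keytype = "ENTREZID", multiVals = "first")
mySymbols2
```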

(Using select):
myEIDs <- c("ENSG00000130720", "ENSG00000103257", "ENSG00000156414")
cols <- c("SYMBOL", "GENENAME")
select(org.Hs.eg.db, keys=myEIDs, columns=cols, keytype="ENSEMBL")  # returns a data.frame

How it works: take org.Hs.egACCNUM as an example. It is a simple mapping between Entrez Gene identifiers (https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) and GenBank accession numbers, and the data behind it come from Entrez Gene at ftp://ftp.ncbi.nlm.nih.gov/gene/DATA



Take one of the files in DATA, gene2ensembl, as an example to get a feel for how this is done:

wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2ensembl.gz

Decompress it and look at the contents:


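The first column is the taxonomy ID, the second the GeneID, the third the Ensembl gene ID, the fourth the RNA accession, the fifth the Ensembl RNA ID, the sixth the protein accession, and the seventh the Ensembl protein ID. So these R packages most likely build their ID conversions from files like these, published by NCBI, Ensembl and similar databases, processed by a series of scripts. If you know how NCBI and Ensembl organize their data and can write scripts, you can do the conversion yourself without these packages. Of course, since someone has already written them, why reinvent the wheel? Reinventing the wheel is worthwhile mainly for understanding it deeply.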
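The idea above can be sketched in a few lines of base R: read a table in the gene2ensembl layout, keep the human rows (tax_id 9606), and collapse them into a GeneID-to-Ensembl-gene lookup. This is a minimal sketch, not how org.Hs.eg.db is actually built; the three data rows are made-up placeholders, and the short names passed to col.names are my own shorthand for the header fields.

```r
# Made-up rows in the gene2ensembl layout (tab-separated, header starts with "#").
txt <- c(
  "#tax_id\tGeneID\tEnsembl_gene_identifier\tRNA_nucleotide_accession.version\tEnsembl_rna_identifier\tprotein_accession.version\tEnsembl_protein_identifier",
  "9606\t1001\tENSG_HYPO_1\tNM_HYPO_1.1\tENST_HYPO_1.1\tNP_HYPO_1.1\tENSP_HYPO_1.1",
  "9606\t1001\tENSG_HYPO_1\tNM_HYPO_2.1\tENST_HYPO_2.1\tNP_HYPO_2.1\tENSP_HYPO_2.1",
  "10090\t2002\tENSMUSG_HYPO_1\tNM_HYPO_3.1\tENSMUST_HYPO_3.1\tNP_HYPO_3.1\tENSMUSP_HYPO_3.1"
)

# Shorthand column names (hypothetical, chosen for this sketch)
cn <- c("tax_id", "GeneID", "Ensembl_gene", "RNA_acc",
        "Ensembl_rna", "protein_acc", "Ensembl_protein")

# The "#" header line is skipped as a comment; read every field as text
g2e <- read.delim(text = txt, header = FALSE, comment.char = "#",
                  colClasses = "character", col.names = cn)

# Keep human rows, then drop the transcript-level duplicates
human <- unique(g2e[g2e$tax_id == "9606", c("GeneID", "Ensembl_gene")])

# Named vector: names are Entrez GeneIDs, values are Ensembl gene IDs
entrez2ensembl <- setNames(human$Ensembl_gene, human$GeneID)
entrez2ensembl["1001"]
```

With the real file, replace text = txt with gzfile("gene2ensembl.gz"). The file covers many species, which is why the tax_id filter matters.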

3) Basic usage of each object

-----------------------------------------------------------

3.1) org.Hs.egACCNUM (map Entrez Gene identifiers to GenBank accession numbers)

x <- org.Hs.egACCNUM  ### Bimap interface
mapped_genes <- mappedkeys(x) ## Get the entrez gene identifiers that are mapped to an ACCNUM
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5] # Get the ACCNUM for the first five genes
xx[[1]] # Get the first one
}
#For the reverse map ACCNUM2EG:
xx <- as.list(org.Hs.egACCNUM2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene identifiers for the first five accession numbers
xx[[1]] # Get the first one
}
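org.Hs.egACCNUM2EG is the reverse of org.Hs.egACCNUM. As a sketch, AnnotationDbi's revmap() builds the reverse direction of any Bimap, so the two lookups below should agree; the accession used is simply the first mapped key, chosen for illustration.

```r
suppressMessages(library(org.Hs.eg.db))

# Reverse the Entrez -> accession map into accession -> Entrez
acc2eg <- revmap(org.Hs.egACCNUM)

# Pick any accession that has a mapping
acc <- mappedkeys(org.Hs.egACCNUM2EG)[1]

acc2eg[[acc]]              # via revmap()
org.Hs.egACCNUM2EG[[acc]]  # via the ready-made reverse map
```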

3.2) org.Hs.egALIAS2EG (convert between common gene symbols/aliases and Entrez Gene IDs)

x <- org.Hs.egALIAS2EG  ## Bimap interface
xx <- as.list(org.Hs.egALIAS2EG)   # Convert the object to a list
xx <- xx[!is.na(xx)] # Remove aliases that do not map to any entrez gene id
if(length(xx) > 0){
xx[1:2]   # The entrez gene identifiers for the first two elements of XX
xx[[1]]   # Get the first one
}
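Aliases are not unique: one alias can point to several Entrez IDs, which is why ALIAS2EG values are vectors. The sketch below counts aliases with more than one gene and then uses mapIds() with multiVals = "list" to keep every hit for one of them; with multiVals = "first" only one of those genes would be kept.

```r
suppressMessages(library(org.Hs.eg.db))

alias_map <- as.list(org.Hs.egALIAS2EG)
n_genes <- lengths(alias_map)          # number of Entrez IDs per alias
ambiguous <- names(n_genes)[n_genes > 1]
length(ambiguous)                      # how many aliases are ambiguous
head(ambiguous)

# Keep all Entrez IDs for the first ambiguous alias
hits <- mapIds(org.Hs.eg.db, keys = ambiguous[1], column = "ENTREZID",
               keytype = "ALIAS", multiVals = "list")
hits
```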

3.3) org.Hs.egCHR (map Entrez Gene IDs to chromosomes)

x <- org.Hs.egCHR        ## Bimap interface
mapped_genes <- mappedkeys(x) #Get entrez gene that are mapped to a chromosome
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]         # Get the CHR for the first five genes
xx[[1]]         # Get the first one
}
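AnnotationDbi's toTable() turns a Bimap into a data.frame with one row per mapping, which is often easier to summarise than a list. This sketch counts mapped Entrez IDs per chromosome; a gene mapped to more than one chromosome (for example in the X/Y pseudoautosomal regions) is counted once for each.

```r
suppressMessages(library(org.Hs.eg.db))

# One row per (Entrez ID, chromosome) pair
chr_df <- toTable(org.Hs.egCHR)
head(chr_df)

# The second column holds the chromosome name
genes_per_chr <- table(chr_df[[2]])
genes_per_chr[intersect(c(as.character(1:22), "X", "Y"), names(genes_per_chr))]
```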

3.4) org.Hs.egCHRLENGTHS (length of each chromosome)

tt <- org.Hs.egCHRLENGTHS  ## a plain named vector, not a Bimap
tt["1"]       # Length of chromosome 1
for (i in c(1:22,'X','Y')){print(tt[i])}   # print the length of each chromosome
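Because org.Hs.egCHRLENGTHS is an ordinary named vector, the loop above can be replaced by plain indexing. A sketch; the lengths come from whatever genome build the installed package was built against, and as.numeric() avoids integer overflow when summing.

```r
suppressMessages(library(org.Hs.eg.db))

tt <- org.Hs.egCHRLENGTHS
main_chr <- c(as.character(1:22), "X", "Y")

tt[main_chr]                                      # all primary chromosomes at once
autosome_bp <- sum(as.numeric(tt[as.character(1:22)]))
autosome_bp                                       # total autosome length in bp
```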

3.5) org.Hs.egCHRLOC (chromosomal locations of Entrez Gene IDs)

x <- org.Hs.egCHRLOC  ### Bimap interface
mapped_genes <- mappedkeys(x) #Get the entrez gene identifiers that are mapped to chromosome locations
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the CHRLOC for the first five genes
xx[[1]]   # Get the first one
}
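The CHRLOC values encode strand in their sign: according to the package documentation, locations on the antisense strand carry a leading minus sign. A sketch that turns the entry for Entrez ID 7157 (TP53, chosen only as an example) into chromosome, position and strand columns; the exact coordinates depend on the genome build the package uses.

```r
suppressMessages(library(org.Hs.eg.db))

loc <- org.Hs.egCHRLOC[["7157"]]   # named numeric vector; names are chromosomes
loc

data.frame(chromosome = names(loc),
           start      = abs(loc),                   # position without the sign
           strand     = ifelse(loc < 0, "-", "+"),  # minus sign = antisense strand
           row.names  = NULL)
```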

3.6) org.Hs.egENSEMBL (map Ensembl gene accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBL  ## Bimap interface
mapped_genes <- mappedkeys(x)# Get the entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]       # Get the Ensembl gene IDs for the first five genes
xx[[1]]   # Get the first one
}
#For the reverse map ENSEMBL2EG:
xx <- as.list(org.Hs.egENSEMBL2EG) # Convert to a list
if(length(xx) > 0){              
xx[1:5]       # Gets the entrez gene IDs for the first five Ensembl IDs
xx[[1]]       # Get the first one
}
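Entrez-to-Ensembl is not strictly one-to-one. Before converting a gene list it can be worth checking how many Entrez IDs map to several Ensembl gene IDs; a sketch using lengths() on the same kind of list as above:

```r
suppressMessages(library(org.Hs.eg.db))

x <- org.Hs.egENSEMBL
eg2ens <- as.list(x[mappedkeys(x)])
n_ens <- lengths(eg2ens)          # Ensembl gene IDs per Entrez ID

table(n_ens > 1)                  # FALSE: one Ensembl ID, TRUE: several
head(eg2ens[n_ens > 1], 3)        # a few one-to-many examples
```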

3.7) org.Hs.egENSEMBLPROT (map Ensembl protein accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBLPROT   ## Bimap interface
mapped_genes <- mappedkeys(x) #Get the entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {  
xx[1:5]   # Get the Ensembl protein IDs for the first five genes
xx[[1]]     # Get the first one
}
#For the reverse map ENSEMBLPROT2EG:
xx <- as.list(org.Hs.egENSEMBLPROT2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene IDs for the first five Ensembl protein IDs
xx[[1]] # Get the first one
}

3.8) org.Hs.egENSEMBLTRANS (map Ensembl transcript accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBLTRANS   ## Bimap interface:
mapped_genes <- mappedkeys(x) #entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the Ensembl transcript IDs for the first five genes
xx[[1]] # Get the first one
}
#For the reverse map ENSEMBLTRANS2EG:
xx <- as.list(org.Hs.egENSEMBLTRANS2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene IDs for the first five Ensembl transcript IDs
xx[[1]] # Get the first one
}

3.9) org.Hs.egGENENAME (map Entrez Gene IDs to gene names)

x <- org.Hs.egGENENAME    ## Bimap interface
mapped_genes <- mappedkeys(x) # entrez gene identifiers that are mapped to a gene name
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5] # Get the GENE NAME for the first five genes
xx[[1]] # Get the first one
}
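GENENAME can also be reached from symbols instead of Entrez IDs. A sketch with mapIds(), using TP53 and BRCA1 purely as example symbols:

```r
suppressMessages(library(org.Hs.eg.db))

syms <- c("TP53", "BRCA1")
# Named vector: names are the input symbols, values the full gene names
gene_names <- mapIds(org.Hs.eg.db, keys = syms, column = "GENENAME",
                     keytype = "SYMBOL", multiVals = "first")
gene_names
```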

3.10) org.Hs.egGO (map Entrez Gene IDs to Gene Ontology (GO) IDs)

x <- org.Hs.egGO  ## Bimap interface:
mapped_genes <- mappedkeys(x) # entrez gene identifiers that are mapped to a GO ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
got <- xx[[1]] # Try the first one
got[[1]][["GOID"]]
got[[1]][["Ontology"]]
got[[1]][["Evidence"]]
}
# For the reverse map:
xx <- as.list(org.Hs.egGO2EG) # Convert to a list
if(length(xx) > 0){
goids <- xx[2:3] # Gets the entrez gene ids for the 2nd and 3rd GO identifiers
goids[[1]] # Gets the entrez gene ids for the first element of goids
names(goids[[1]]) # Evidence code for the mappings
}
# For org.Hs.egGO2ALLEGS
xx <- as.list(org.Hs.egGO2ALLEGS)
if(length(xx) > 0){

goids <- xx[2:3] # Entrez Gene identifiers for the 2nd and 3rd GO identifiers
goids[[1]] # Gets all the Entrez Gene identifiers for the first element of goids
names(goids[[1]]) # Evidence code for the mappings
}
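The two reverse maps used above differ: org.Hs.egGO2EG holds genes annotated directly to a GO term, while org.Hs.egGO2ALLEGS also includes genes annotated to its more specific child terms. A sketch that checks this on the first mapped GO ID, so the direct set should be contained in the full set:

```r
suppressMessages(library(org.Hs.eg.db))

go_id  <- mappedkeys(org.Hs.egGO2EG)[1]
# Values are Entrez IDs named by evidence code; keep the unique IDs
direct <- unique(org.Hs.egGO2EG[[go_id]])
all_eg <- unique(org.Hs.egGO2ALLEGS[[go_id]])

c(direct = length(direct), all = length(all_eg))
all(direct %in% all_eg)   # direct annotations are a subset of all annotations
```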

3.11) org.Hs.egPATH (map Entrez Gene identifiers to KEGG pathway identifiers)

x <- org.Hs.egPATH  ## Bimap interface:
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
xx <- as.list(org.Hs.egPATH2EG)
xx <- xx[!is.na(xx)] # Remove pathway identifiers that do not map to any entrez gene id
if(length(xx) > 0){
xx[1:2]
xx[[1]]
}
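A common follow-up is listing the genes in one pathway by symbol. A sketch, assuming "04110" (the KEGG cell cycle pathway) is present as a key; the KEGG mappings shipped with this package may be out of date, so check ?org.Hs.egPATH for the source date before relying on them.

```r
suppressMessages(library(org.Hs.eg.db))

path_genes <- org.Hs.egPATH2EG[["04110"]]   # Entrez IDs in the pathway
path_syms  <- mapIds(org.Hs.eg.db, keys = unique(path_genes), column = "SYMBOL",
                     keytype = "ENTREZID", multiVals = "first")
length(path_syms)
head(path_syms)
```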

3.12) org.Hs.egREFSEQ (map Entrez Gene identifiers to RefSeq identifiers)

x <- org.Hs.egREFSEQ
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egREFSEQ2EG
mapped_seqs <- mappedkeys(x)
xx <- as.list(x[mapped_seqs])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
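The RefSeq IDs returned for one gene mix several record types, distinguished by prefix (NM_/XM_ for mRNA, NR_/XR_ for non-coding RNA, NP_/XP_ for protein; the X* prefixes mark predicted records). A sketch that groups the entries for Entrez ID 7157 (TP53, an arbitrary example) by prefix:

```r
suppressMessages(library(org.Hs.eg.db))

refs <- org.Hs.egREFSEQ[["7157"]]          # all RefSeq IDs for this gene
split(refs, substr(refs, 1, 3))            # group by prefix such as "NM_", "NP_"
grep("^NM_", refs, value = TRUE)           # curated mRNA records only
```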

3.13) org.Hs.egSYMBOL (map Entrez Gene identifiers to gene symbols)

x <- org.Hs.egSYMBOL
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egSYMBOL2EG
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
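Gene lists often mix official symbols with older aliases. The sketch below defines two hypothetical helpers, map_to_entrez() and symbol_to_entrez() (neither is part of org.Hs.eg.db), that try SYMBOL first and fall back to ALIAS only for symbols that were not found. The tryCatch() guards against mapIds() raising an error when none of the keys is valid, and because aliases can be ambiguous the fallback keeps just the first hit, which is a simplification.

```r
suppressMessages(library(org.Hs.eg.db))

# Hypothetical helper: map keys of one keytype to Entrez IDs,
# returning NA for every key if none of them is valid
map_to_entrez <- function(keys, keytype) {
  tryCatch(
    mapIds(org.Hs.eg.db, keys = keys, column = "ENTREZID",
           keytype = keytype, multiVals = "first"),
    error = function(e) setNames(rep(NA_character_, length(keys)), keys)
  )
}

# Hypothetical helper: official symbols first, aliases only as a fallback
symbol_to_entrez <- function(symbols) {
  symbols <- unique(symbols)
  eg <- map_to_entrez(symbols, "SYMBOL")[symbols]
  missing <- is.na(eg)
  if (any(missing)) {
    eg[missing] <- map_to_entrez(symbols[missing], "ALIAS")[symbols[missing]]
  }
  setNames(eg, symbols)
}

symbol_to_entrez(c("TP53", "P53"))
```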

3.14) org.Hs.egUNIGENE (map Entrez Gene identifiers to UniGene cluster identifiers)

x <- org.Hs.egUNIGENE
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egUNIGENE2EG
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
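NCBI has retired UniGene, so recent org.Hs.eg.db releases may no longer include org.Hs.egUNIGENE, in which case the code above fails with an "object not found" error. A defensive sketch that checks first, using ls() as in section 2:

```r
suppressMessages(library(org.Hs.eg.db))

has_unigene <- "org.Hs.egUNIGENE" %in% ls("package:org.Hs.eg.db")

if (has_unigene) {
  x <- get("org.Hs.egUNIGENE", envir = as.environment("package:org.Hs.eg.db"))
  print(head(as.list(x[mappedkeys(x)]), 3))
} else {
  message("org.Hs.egUNIGENE is not available in this version of org.Hs.eg.db")
}
```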

3.15) org.Hs.egUNIPROT (map UniProt accession numbers to Entrez Gene identifiers)

x <- org.Hs.egUNIPROT
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
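The reverse direction, from UniProt accession to Entrez ID and symbol, can be done with select() and keytype = "UNIPROT". A sketch using P04637 (the UniProt accession of human p53) as an example key:

```r
suppressMessages(library(org.Hs.eg.db))

res <- select(org.Hs.eg.db, keys = "P04637",
              columns = c("ENTREZID", "SYMBOL"), keytype = "UNIPROT")
res   # data.frame with UNIPROT, ENTREZID and SYMBOL columns
```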
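I hope this walkthrough helps you understand how gene IDs, gene names and so on are converted into one another, and gives you some familiarity with databases such as NCBI, Ensembl and Pfam.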
