Extracting Data with R
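Hello everyone, Xiaoguo here again. When working with GEO data we often need R to process it, so this post uses the GSE70493 dataset as an example to walk through data extraction.
First, we load the downloaded GEO data with the code below and take a look at how it is laid out.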
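library(GEOquery)  # provides getGEO()

# Fetch GSE70493, reusing the series matrix file in the current directory if present
eSet <- getGEO("GSE70493",
               destdir = ".",
               getGPL = FALSE)
# Convert the result to a data frame (rows: samples, columns: probes)
eSet <- data.frame(eSet)
View(eSet)  # capital V: opens the data viewer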
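The resulting data frame is laid out as follows: each row is a sample (GSM accession) and each column is a probe.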
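For readers who want to try the indexing without downloading anything, here is a minimal sketch of a toy data frame with the same layout. The values and sample IDs are copied from the outputs shown in this post; the name `toy` and the shortened column names (without the `GSE70493_series_matrix.txt.gz.` prefix) are assumptions made for illustration.

```r
# Toy stand-in: 3 samples (rows) x 3 probes (columns).
# Values come from the outputs in this post; `toy` and the short
# column names are illustrative assumptions, not the real object.
toy <- data.frame(
  X2824546_st = c(12.74828, 12.51457, 12.36111),
  X2824549_st = c(11.99485, 11.75353, 11.62739),
  X2824551_st = c(12.13391, 11.93643, 11.79148),
  row.names   = c("GSM1784987", "GSM1784988", "GSM1784989")
)

dim(toy)  # 3 3 (rows, then columns)
str(toy)  # every column is numeric
```

All of the `[row, column]` expressions in this post work the same way on this small table.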
Data extraction
1
Extract the value in the first row and first column of the table
> eSet[1,1]
[1] 12.74828
2
Extract the value in the eighth row and second column of the table
> eSet[8,2]
[1] 11.92521
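A single cell can also be addressed by row and column name instead of by position, which tends to be safer when the row order may change. The sketch below assumes a small toy data frame (values copied from the outputs in this post, shortened column names invented for illustration) rather than the full GEO table.

```r
# Toy data; the name `toy` and the short column names are assumptions.
toy <- data.frame(
  X2824546_st = c(12.74828, 12.51457),
  X2824549_st = c(11.99485, 11.75353),
  row.names   = c("GSM1784987", "GSM1784988")
)

by_position <- toy[1, 2]                          # row 1, column 2
by_name     <- toy["GSM1784987", "X2824549_st"]   # same cell, by name

identical(by_position, by_name)  # TRUE
```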
3
Extract all values in the first column of the table
> eSet[,1]
[1] 12.74828 12.51457 12.36111 12.34661 12.26977 11.72063 12.44381 12.76442 12.45263 12.45226 12.56951 12.34902
[13] 12.09763 12.02630 11.93051 12.61690 12.32644 12.21138 12.48559 12.50485 12.49083 12.58834 12.27134 12.07632
[25] 12.30095 12.47895 12.42524 11.75825 11.95987 12.13507 12.03302 12.67437 11.94832 12.31930 12.30004 12.45438
[37] 12.57792 12.37737 12.09112 12.54679 11.98876 12.25262 12.81430 12.56049 12.46378 12.31240 12.14513 12.63002
[49] 12.85054 12.29841 12.59145 11.96589 12.18872 12.36106 12.69952 12.94856 12.66841 11.99111 11.97198 12.65475
[61] 12.25758 12.61998 12.09979
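Note that the output above is a plain numeric vector: selecting a single column with `[, 1]` drops the row names. A minimal sketch of the common alternatives, again on an assumed toy data frame (values from this post, short column names invented for illustration):

```r
# Toy data; the name `toy` and the short column names are assumptions.
toy <- data.frame(
  X2824546_st = c(12.74828, 12.51457, 12.36111),
  X2824549_st = c(11.99485, 11.75353, 11.62739),
  row.names   = c("GSM1784987", "GSM1784988", "GSM1784989")
)

as_vector <- toy[, 1]                # numeric vector, row names dropped
as_frame  <- toy[, 1, drop = FALSE]  # one-column data frame, row names kept
by_name   <- toy[["X2824546_st"]]    # same vector, selected by column name
by_dollar <- toy$X2824546_st         # shorthand for the line above

is.data.frame(as_frame)        # TRUE
identical(as_vector, by_name)  # TRUE
```

Use `drop = FALSE` when later steps need the sample IDs to stay attached to the values.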
4
Extract rows 1 to 8 and columns 1 to 2 of the table
> eSet[1:8,1:2]
GSE70493_series_matrix.txt.gz.X2824546_st GSE70493_series_matrix.txt.gz.X2824549_st
GSM1784987 12.74828 11.99485
GSM1784988 12.51457 11.75353
GSM1784989 12.36111 11.62739
GSM1784990 12.34661 11.56868
GSM1784991 12.26977 11.42177
GSM1784993 11.72063 10.65466
GSM1784994 12.44381 11.50969
GSM1784995 12.76442 11.92521
5
Extract rows 1 to 3 and columns 1 to 3 of the table
> eSet[1:3,1:3]
GSE70493_series_matrix.txt.gz.X2824546_st GSE70493_series_matrix.txt.gz.X2824549_st
GSM1784987 12.74828 11.99485
GSM1784988 12.51457 11.75353
GSM1784989 12.36111 11.62739
GSE70493_series_matrix.txt.gz.X2824551_st
GSM1784987 12.13391
GSM1784988 11.93643
GSM1784989 11.79148
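Range indexing with `1:3` is one of several ways to take a block of rows. The sketch below shows a few related forms on an assumed toy data frame (values from this post; the name `toy`, the short column names, and the 12.4 cutoff are chosen purely for illustration).

```r
# Toy data; names and the threshold below are illustrative assumptions.
toy <- data.frame(
  X2824546_st = c(12.74828, 12.51457, 12.36111),
  X2824549_st = c(11.99485, 11.75353, 11.62739),
  X2824551_st = c(12.13391, 11.93643, 11.79148),
  row.names   = c("GSM1784987", "GSM1784988", "GSM1784989")
)

first_two <- head(toy, 2)                     # first two rows, all columns
no_first  <- toy[-1, ]                        # negative index drops row 1
high_expr <- toy[toy$X2824546_st > 12.4, ]    # rows where the probe exceeds 12.4

rownames(high_expr)  # "GSM1784987" "GSM1784988"
```

The logical form is handy for filtering samples by expression level rather than by position.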
6
Extract the row names of the table (the sample IDs)
> rownames(eSet)
[1] "GSM1784987" "GSM1784988" "GSM1784989" "GSM1784990" "GSM1784991" "GSM1784993" "GSM1784994" "GSM1784995"
[9] "GSM1784996" "GSM1784997" "GSM1784998" "GSM1784999" "GSM1785000" "GSM1785001" "GSM1785003" "GSM1785004"
[17] "GSM1785005" "GSM1785006" "GSM1785007" "GSM1785008" "GSM1785009" "GSM1785011" "GSM1785012" "GSM1785013"
[25] "GSM1785014" "GSM1785015" "GSM1785016" "GSM1785017" "GSM1785018" "GSM1785020" "GSM1785021" "GSM1785022"
[33] "GSM1785023" "GSM1785024" "GSM1785025" "GSM1785027" "GSM1785028" "GSM1785029" "GSM1785030" "GSM1785031"
[41] "GSM1785032" "GSM1785033" "GSM1785035" "GSM1785036" "GSM1785037" "GSM1785038" "GSM1785039" "GSM1785040"
[49] "GSM1785041" "GSM1785042" "GSM1785044" "GSM1785045" "GSM1785046" "GSM1785047" "GSM1785048" "GSM1785049"
[57] "GSM1785050" "GSM1785052" "GSM1785053" "GSM1785054" "GSM1785055" "GSM1785056" "GSM1785057"
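Once the row names are known, they can be used to look up or select specific samples. This is a minimal sketch on an assumed toy data frame (sample IDs and values from this post; `toy`, `wanted`, and the short column name are hypothetical).

```r
# Toy data; the names `toy` and `wanted` are illustrative assumptions.
toy <- data.frame(
  X2824546_st = c(12.74828, 12.51457, 12.36111),
  row.names   = c("GSM1784987", "GSM1784988", "GSM1784989")
)

wanted <- c("GSM1784989", "GSM1784987")

all(wanted %in% rownames(toy))             # TRUE: both samples are present
pos <- match("GSM1784988", rownames(toy))  # 2: position of that sample
picked <- toy[wanted, , drop = FALSE]      # rows come back in the order requested

rownames(picked)  # "GSM1784989" "GSM1784987"
```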
7
Extract the column names of the table (the probe IDs)
> colnames(eSet)
[1] "GSE70493_series_matrix.txt.gz.X2824546_st" "GSE70493_series_matrix.txt.gz.X2824549_st"
[3] "GSE70493_series_matrix.txt.gz.X2824551_st" "GSE70493_series_matrix.txt.gz.X2824554_st"
[5] "GSE70493_series_matrix.txt.gz.X2827992_st" "GSE70493_series_matrix.txt.gz.X2827995_st"
[7] "GSE70493_series_matrix.txt.gz.X2827996_st" "GSE70493_series_matrix.txt.gz.X2828010_st"
[9] "GSE70493_series_matrix.txt.gz.X2828012_st" "GSE70493_series_matrix.txt.gz.X2835442_st"
[11] "GSE70493_series_matrix.txt.gz.X2835447_st" "GSE70493_series_matrix.txt.gz.X2835453_st"
[13] "GSE70493_series_matrix.txt.gz.X2835456_st" "GSE70493_series_matrix.txt.gz.X2835459_st"
[15] "GSE70493_series_matrix.txt.gz.X2835461_st" "GSE70493_series_matrix.txt.gz.X2839509_st"
[17] "GSE70493_series_matrix.txt.gz.X2839511_st" "GSE70493_series_matrix.txt.gz.X2839513_st"
[19] "GSE70493_series_matrix.txt.gz.X2839515_st" "GSE70493_series_matrix.txt.gz.X2839517_st"
[21] "GSE70493_series_matrix.txt.gz.X2839524_st" "GSE70493_series_matrix.txt.gz.X2839528_st"
[23] "GSE70493_series_matrix.txt.gz.X2839532_st" "GSE70493_series_matrix.txt.gz.X2839538_st"
[25] "GSE70493_series_matrix.txt.gz.X2839539_st" "GSE70493_series_matrix.txt.gz.X2858288_st"
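The column names above all carry the file name `GSE70493_series_matrix.txt.gz.` as a prefix, which makes them awkward to type. One possible way to shorten them is sketched below on two names copied from the output; the variable names are hypothetical. The leading `X` most likely comes from R making the probe ID a syntactically valid name, which `make.names()` illustrates.

```r
# Two column names copied from the output above.
cols <- c("GSE70493_series_matrix.txt.gz.X2824546_st",
          "GSE70493_series_matrix.txt.gz.X2824549_st")

# fixed = TRUE treats the dots literally instead of as regex wildcards.
short <- sub("GSE70493_series_matrix.txt.gz.", "", cols, fixed = TRUE)
short  # "X2824546_st" "X2824549_st"

# Names starting with a digit get an "X" prepended to become valid.
make.names("2824546_st")  # "X2824546_st"
```

On the real table, the same `sub()` call could be applied to `colnames(eSet)` and the result assigned back with `colnames(eSet) <- ...`.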
That covers simple data extraction with R.
That's all from Xiaoguo for today. Plenty of practical tips in there, so go and give them a try!