R: use a for loop to build a data frame

I wanted to test the correlation between biological replicates of several samples in a ChIP-Seq experiment. Ideally the output is a data frame with the features (histone marks) along one dimension and the samples along the other.

I have read counts of different histone marks for each biological replicate of each sample. I would like to compute the correlation between replicates and print the Pearson’s r values in a compact form.

So the approach is to create a vector for each sample and fill in the cor values with a for loop.

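# histone marks, in the order used for the row names of the result
marks <- c("H3K23ac", "H3K27me3", "H3K36me3", "H3K4me3", "H3K9ac", "H3K9me3", "H4K12ac", "H4K20me1")

# pre-allocate one numeric vector per sample (one slot per mark)
bOcor <- vector("numeric", length(marks))
sOcor <- vector("numeric", length(marks))

# Pearson's r between replicates A and B, for each mark
for (i in seq_along(marks)) {
  bOcor[i] <- cor(Ov.combined[, paste0("bO.", marks[i], "_A")], Ov.combined[, paste0("bO.", marks[i], "_B")])
  sOcor[i] <- cor(Ov.combined[, paste0("sO.", marks[i], "_A")], Ov.combined[, paste0("sO.", marks[i], "_B")])
}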

Then combine the vectors into a data frame:

cordata<-data.frame(bOcor, sOcor, row.names = marks)
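For comparison, the same table can be built without pre-allocating vectors by nesting sapply. This is only a sketch: the data below is simulated (random numbers, not ChIP-Seq counts), only two marks are used, and rep_cor is a hypothetical helper name introduced here.

```r
set.seed(1)                       # reproducible simulated data
marks   <- c("H3K4me3", "H3K9ac") # two marks only, for brevity
samples <- c("bO", "sO")

# Simulated stand-in for Ov.combined: replicate B is replicate A plus noise
Ov.combined <- data.frame(row.names = 1:50)
for (s in samples) for (m in marks) {
  a <- rnorm(50)
  Ov.combined[[paste0(s, ".", m, "_A")]] <- a
  Ov.combined[[paste0(s, ".", m, "_B")]] <- a + rnorm(50, sd = 0.2)
}

# Hypothetical helper: Pearson's r between replicates A and B
rep_cor <- function(df, sample, mark) {
  cor(df[, paste0(sample, ".", mark, "_A")],
      df[, paste0(sample, ".", mark, "_B")])
}

# Inner sapply runs over marks (rows), outer sapply over samples (columns)
cordata <- as.data.frame(
  sapply(samples, function(s)
    sapply(marks, function(m) rep_cor(Ov.combined, s, m)))
)
cordata
```

The result has one row per mark and one column per sample, like the data frame built from the loop, and adding a sample only means extending the samples vector.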

Now we can check the data:

# transpose so that samples are rows and marks are columns
t(cordata)

        H3K23ac  H3K27me3  H3K36me3   H3K4me3    H3K9ac   H3K9me3   H4K12ac  H4K20me1
bOcor 0.9969142 0.9595674 0.9167808 0.5551019 0.9923818 0.9842282 0.9986214 0.9976846
sOcor 0.9778408 0.9804653 0.9774632 0.9726195 0.9820424 0.9501024 0.9907387 0.9665508

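There is no special tip here, this is just to refresh my memory. It’s also worth knowing that when the column name is built with paste, you have to use df[, name] (or df[[name]]) instead of df$name. The “$” operator does not evaluate what follows it: it looks for a column literally called “name”, finds none and returns NULL, which is why cor then complains that the input is not numeric.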
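A small illustration of the difference, using a made-up data frame (the column names and values are for demonstration only):

```r
# Toy data frame; names and values are invented for illustration
df <- data.frame(x_A = c(1, 2, 3, 4), x_B = c(2, 4, 6, 9))

col <- paste0("x", "_A")   # column name built at run time

by_dollar  <- df$col       # looks for a column literally named "col"
by_bracket <- df[, col]    # looks up the column whose name is stored in col
by_double  <- df[[col]]    # same result for a single column

is.null(by_dollar)         # TRUE: there is no column called "col"
is.numeric(by_bracket)     # TRUE

# Works because both arguments are numeric vectors
r <- cor(df[, col], df[, paste0("x", "_B")])
```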

Z. Lu