
Printing lines with sliding windows in bash

This originally came from work in which I was trying to find enriched functional clusters on chromosomes. I made 50-gene blocks that slide 5 genes at a time and tested functional enrichment in each block.

Figure from Medium

I would like to print the genes in each sliding window to a new file, so that I can run a test on each one.

For a simple example, create a test file containing the numbers 1 to 22, one per line:

for i in {1..22}; do echo $i; done > testfile

I would like to print sliding windows of 5 elements, with window size = 5 and sliding size = 2, e.g.

1 2 3 4 5
    3 4 5 6 7
        5 6 7 8 9

For the above 22 elements, I need to print 1-5, 3-7, 5-9, 7-11, 9-13, 11-15, 13-17, 15-19 and 17-21, for a total of 9 sliding windows, which equals (22-5)/2 + 1 using integer (floor) division. Element 22 is left over because it does not fill a complete window. We need a for loop to print the elements in each sliding window:

If the first line of a window is i, then its last line is i+4 (window size minus 1).
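The window count and the first/last line of each window can be checked with a few lines of shell arithmetic. This is only a minimal sketch: the function names `count_windows` and `window_ranges` are made up for illustration and are not part of the script in this post.

```shell
#!/bin/bash
# Illustrative helpers (hypothetical names, not part of the script in this post).

# Number of full windows: floor((n - winsize) / step) + 1, or 0 if n < winsize.
count_windows() {
    local n=$1 winsize=$2 step=$3
    if (( n < winsize )); then
        echo 0
    else
        echo $(( (n - winsize) / step + 1 ))
    fi
}

# First and last line number of every full window, one "start-end" pair per line.
window_ranges() {
    local n=$1 winsize=$2 step=$3 start
    for (( start = 1; start + winsize - 1 <= n; start += step )); do
        echo "${start}-$(( start + winsize - 1 ))"
    done
}

count_windows 22 5 2    # prints 9
window_ranges 22 5 2    # prints 1-5, 3-7, ..., 17-21, one pair per line
```

The explicit `n < winsize` check matters because bash integer division truncates toward zero, so for example (4-5)/2 + 1 would otherwise evaluate to 1 instead of 0.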

Below is a bash script to do this. It defines variables for the window size, the sliding size, and so on, and takes the input file as its first argument.

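#!/bin/bash
# Usage: ./test.sh FILE
# Prints each full sliding window of FILE's lines on a single line.

declare -i WINSIZE=5    # number of lines per window
declare -i SLIDING=2    # number of lines to advance between windows
NELEM=$(wc -l < "$1")   # total number of lines in the input file

# Nothing to print if the file is shorter than one window
(( NELEM < WINSIZE )) && exit 0

# 0-based index of the last full window; trailing lines that
# do not fill a complete window are not considered
NSLIDES=$(( (NELEM - WINSIZE) / SLIDING ))

for ((i = 0; i <= NSLIDES; i++)); do
        START=$((1 + i * SLIDING))
        END=$((START + WINSIZE - 1))
        # Print lines START..END joined by spaces, followed by a tab
        awk -v start="$START" -v end="$END" 'NR>=start && NR<=end' "$1" | tr '\n' ' ' | awk '{print $0 "\t"}'
done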
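The script above starts a new awk process for every window and re-reads the file each time, which is fine for small inputs. As a possible alternative, here is a sketch that reads the file once and buffers the lines inside awk. The function name `sliding_windows` and its optional arguments are my own assumptions, and unlike the script above it prints no trailing space or tab.

```shell
#!/bin/bash
# Hypothetical one-pass variant: buffer every line in awk, then print each full window.
sliding_windows() {
    local file=$1 winsize=${2:-5} step=${3:-2}
    awk -v w="$winsize" -v s="$step" '
        { line[NR] = $0 }                      # remember each line by its line number
        END {
            # start is the first line of a window; stop when a full window no longer fits
            for (start = 1; start + w - 1 <= NR; start += s) {
                out = line[start]
                for (j = start + 1; j < start + w; j++)
                    out = out " " line[j]
                print out
            }
        }' "$file"
}

# Demo on a temporary file holding the numbers 1 to 22
tmpfile=$(mktemp)
seq 1 22 > "$tmpfile"
sliding_windows "$tmpfile" 5 2
rm -f "$tmpfile"
```

Because all lines are held in memory, this trades memory for fewer passes over the file.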

If we run

./test.sh testfile

the output is:

1 2 3 4 5
3 4 5 6 7
5 6 7 8 9
7 8 9 10 11
9 10 11 12 13
11 12 13 14 15
13 14 15 16 17
15 16 17 18 19
17 18 19 20 21
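Since the original goal was to put each window into its own file for testing, the same loop can write files instead of printing to the screen. The following is a rough sketch under my own assumptions: the function name `split_windows`, the output directory argument, and the `window_N.txt` naming scheme are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write each full window to OUTDIR/window_1.txt, window_2.txt, ...
split_windows() {
    local file=$1 outdir=$2 winsize=${3:-5} step=${4:-2}
    local n start k=1
    n=$(wc -l < "$file")        # total number of lines in the input
    mkdir -p "$outdir"
    for (( start = 1; start + winsize - 1 <= n; start += step )); do
        # sed -n 'A,Bp' prints only lines A through B
        sed -n "${start},$(( start + winsize - 1 ))p" "$file" > "$outdir/window_${k}.txt"
        k=$(( k + 1 ))
    done
}

# Demo in a temporary directory with the numbers 1 to 22
demo_dir=$(mktemp -d)
seq 1 22 > "$demo_dir/testfile"
split_windows "$demo_dir/testfile" "$demo_dir/windows" 5 2
ls "$demo_dir/windows"
rm -rf "$demo_dir"
```

Each output file keeps one element per line, so it can be passed directly to a per-window test.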
CC-BY-NC 4.0