How to generate sequential identifiers

Different ways to generate sequential numbers or IDs, using Excel, Vim, or a shell command.

After annotating a genome, you will probably want to assign new identifiers to novel genes, sometimes thousands of them, like this:

Smp_300010
Smp_300020
Smp_300030
Smp_300040
Smp_300050
Smp_300060
Smp_300070
Smp_300080
Smp_300090
Smp_300100
......
Smp_333600
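
The size of such a list follows from the first number, the last number and the step. A minimal sketch in shell arithmetic, assuming the range shown above (300010 to 333600 in steps of 10); the variable names are only illustrative:

```shell
# Range taken from the example list above; names are illustrative.
first=300010
last=333600
step=10

# Number of identifiers = number of steps between first and last, plus one.
count=$(( (last - first) / step + 1 ))
echo "$count"   # prints 3360
```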

I have come up with several ways to achieve this:

Excel autofill function

This is probably the most common and easiest way, even for people without any scripting experience. The ‘magic drag’ works well for hundreds of identifiers, but creating and copying thousands takes some time.

Vim Ctrl-A + recording function

  • Step 1: enter “Smp_30001” in the first line
  • Step 2: “Esc” to return to normal mode; “qa” to start recording a macro into register “a”
  • Step 3: “yy” to copy the whole line (“Y” does the same in Vim, but not with Neovim's defaults), and “p” to paste it on the next line
  • Step 4: Ctrl-A to increment the number, which now becomes “Smp_30002”
  • Step 5: “q” to finish recording
  • Step 6: “3358@a” to replay the “a” macro 3358 times; recording already produced the second line, so this gives 3360 identifiers in total, ending at “Smp_33360”
  • Step 7: “:%s/$/0/g” to add “0” at each line end
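
After a manual method like this one, it is easy to end up one line short or one line over, so a quick check of the result can help. The following is only a sketch: `check_ids` is a made-up helper, and the prefix, first number, step and expected count are assumptions taken from the example range at the top of this post.

```shell
# Hypothetical helper: reads identifiers from stdin and checks that they
# run Smp_300010, Smp_300020, ... with exactly 3360 lines (assumed values).
check_ids() {
  awk -v prefix="Smp_" -v first=300010 -v step=10 -v count=3360 '
    {
      # Identifier expected on this line, based on its line number.
      expected = prefix (first + (NR - 1) * step)
      if ($0 != expected) {
        printf "line %d: got %s, expected %s\n", NR, $0, expected
        bad = 1
        exit
      }
    }
    END {
      if (bad) exit 1
      if (NR != count) {
        printf "expected %d lines, got %d\n", count, NR
        exit 1
      }
    }
  '
}

# Usage, assuming the Vim buffer was saved as ids.txt (hypothetical name):
#   check_ids < ids.txt && echo "IDs look fine"
```

The function returns a non-zero status at the first line that breaks the sequence, or at the end if the line count is off.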

Shell seq command

This is the easiest way, using a single command:

seq 30001 33360 | awk '{print "Smp_" $1 "0"}'
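
The same idea can be wrapped in a small function so that the prefix, range and step are not hard-coded. This is only a sketch: `gen_ids` and its argument order are made up for illustration, and it assumes a `seq` that accepts the `FIRST INCREMENT LAST` form (GNU and BSD `seq` both do).

```shell
# Hypothetical helper: gen_ids PREFIX FIRST LAST STEP
# Prints PREFIX followed by each number from FIRST to LAST in steps of STEP.
gen_ids() {
  seq "$2" "$4" "$3" | awk -v prefix="$1" '{ print prefix $0 }'
}

# Generate the example range directly in steps of 10.
ids=$(gen_ids Smp_ 300010 333600 10)

# Show the first and last identifier as a sanity check.
printf '%s\n' "$ids" | sed -n '1p;$p'
# Smp_300010
# Smp_333600
```

Stepping by 10 in `seq` itself avoids appending the trailing “0” afterwards; the output can be redirected to a file in the usual way.
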
Z. Lu
Computer biologist, amateur photographer, vintage fan and web lover.