In practice, this is used to replace all IDs in a GFF file with your desired IDs, e.g. g10 with Smp_300010, and also g10.t1 with Smp_300010.1 (dropping the 't').
This script does the same task as the Perl-based one in a previous post, but in Python.
Briefly, you have a GFF file and would like to replace its gene IDs (such as g10, g100) with the paired IDs listed in another file (such as Smp_300010, Smp_300100). Besides this whole-string replacement, you would also like to change IDs such as g10.t1 to Smp_300010.1 (removing the 't' in the middle).
I just worked out a Python solution that takes IDPAIRS, GFF, and FILEOUT as command-line arguments:
# usage: python Pair_Replace.py <idpair> <gff> <output>
import sys
import re

IDPAIRS = sys.argv[1]  # tab-separated ID pairs, one pair per line
GFF = sys.argv[2]      # gff file
FILEOUT = sys.argv[3]  # output file

d = {}   # dictionary of raw ID pairs, e.g. 'g1' -> 'Smp_1'
dt = {}  # same pairs with '.t' appended to each key, e.g. 'g1.t' -> 'Smp_1'

with open(IDPAIRS) as f1:
    for line1 in f1:
        line1 = line1.rstrip()
        (aug, smp) = line1.split("\t")
        d[aug] = smp
        dt[aug + '.t'] = smp

# Compile both patterns once, outside the loop.
# re.escape keeps the '.' in 'g1.t' literal instead of matching any character.
pat1 = re.compile(r'\b(' + '|'.join(map(re.escape, dt)) + r')')   # \b(key1\.t|key2\.t|...)
pat2 = re.compile(r'\b(' + '|'.join(map(re.escape, d)) + r')\b')  # \b(key1|key2|...)\b

with open(GFF) as f2, open(FILEOUT, 'w') as fo:  # both files are closed automatically
    for line2 in f2:
        line2 = line2.rstrip()
        s1 = pat1.sub(lambda x: dt[x.group()] + '.', line2)  # replace 'g1.t1' with 'Smp_1.1'
        s2 = pat2.sub(lambda x: d[x.group()], s1)            # replace 'g1' with 'Smp_1'
        fo.write(s2 + "\n")
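To see why the word boundaries matter (so that g10 is not replaced inside g100), here is a minimal sketch of the same two-pass replacement applied to in-memory strings rather than files. The helper name `replace_ids` and the sample GFF-like lines are made up for illustration and are not part of the script above.

```python
import re


def replace_ids(line, pairs):
    """Apply the two-pass replacement to one line.

    `pairs` maps old IDs to new IDs, e.g. {'g10': 'Smp_300010'}.
    (Hypothetical helper, written only for this illustration.)
    """
    # Pass 1 keys: old ID followed by '.t', e.g. 'g10.t'
    with_t = {old + ".t": new for old, new in pairs.items()}
    pat_t = re.compile(r"\b(" + "|".join(map(re.escape, with_t)) + r")")
    # Pass 2 keys: the bare old ID, bounded on both sides
    pat_id = re.compile(r"\b(" + "|".join(map(re.escape, pairs)) + r")\b")

    # 'g10.t1' -> 'Smp_300010.1' (the 't' is dropped, the '.' is kept)
    line = pat_t.sub(lambda m: with_t[m.group()] + ".", line)
    # 'g10' -> 'Smp_300010'
    return pat_id.sub(lambda m: pairs[m.group()], line)


# Made-up ID pairs and GFF-like lines, for illustration only.
pairs = {"g10": "Smp_300010", "g100": "Smp_300100"}
gene_line = "chr1\tAUGUSTUS\tgene\t1\t900\t.\t+\t.\tID=g10"
mrna_line = "chr1\tAUGUSTUS\tmRNA\t1\t900\t.\t+\t.\tID=g100.t1;Parent=g100"

print(replace_ids(gene_line, pairs))  # ID column becomes ID=Smp_300010
print(replace_ids(mrna_line, pairs))  # ID=Smp_300100.1;Parent=Smp_300100
```

Because the second pattern ends in `\b`, an ID that is not in the pair list, such as g1000, is left untouched even though it starts with g10 and g100.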