Wednesday, June 4, 2014

filter strings with awk, extended regular expressions

I wanted to strip DNA sequences (runs of ten or more A, C, T, or G characters) out of a line of text using awk, but my first attempt was unsuccessful. I tried:

gawk '{gsub(/[ACTG]{10,}/,""); print}'

which I expected to work, but it did not. I've found that gawk (GNU awk) has an extra option that enables the {n,} interval syntax in regular expressions (gawk 4.0 and later enable it by default):

gawk --re-interval '{gsub(/[ACTG]{10,}/,""); print}'

This works.
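
To check the behaviour on a known input, here is a minimal sketch that wraps the command above in a shell function and feeds it two made-up lines. The function name strip_dna and the sample records are my own, purely for illustration; it assumes gawk is installed and on the PATH.

```shell
# Hypothetical helper (name is illustrative): remove every run of
# 10 or more A/C/T/G characters from each line read on stdin.
strip_dna() {
    # --re-interval turns on {n,} interval expressions; gawk 4.0 and later
    # already enable them by default but still accept the option.
    gawk --re-interval '{ gsub(/[ACTG]{10,}/, ""); print }'
}

# Made-up sample input: the first record holds a 14-base run, which is removed;
# the second holds only 4 bases, so it is left unchanged.
printf 'id=sample_1|ACGTACGTACGTAC|q=30\nid=sample_2|ACGT|q=12\n' | strip_dna
# prints:
#   id=sample_1||q=30
#   id=sample_2|ACGT|q=12
```

Only runs of at least ten bases are removed, so short words that happen to be made of capital A, C, T, or G are left alone.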
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.