My vtclean project [1] cleans terminal escape sequences. It handles the full ECMA-48 spec, every behavior described in all the vt100 documentation I could find, and the observed behavior of many command-line programs and terminals.
It has good test coverage, fails safe (if it doesn't understand a sequence, it still strips the escape character), and can preserve colors in cleaned-up text. It is used, directly and indirectly, by quite a few projects. It receives extremely few bug reports, because I took the time to do it right.
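To make the fail-safe and color-preserving behavior concrete, here is a minimal Go sketch. It is not vtclean's code: the pattern is a hypothetical simplification that recognizes only CSI sequences (ESC, '[', parameter bytes, intermediate bytes, final byte, as laid out in ECMA-48) and otherwise matches a bare ESC, so an unrecognized sequence still loses its escape character. With keepColor set, SGR sequences (CSI ... m) are passed through unchanged.

```go
package main

import "regexp"

// Hypothetical, simplified patterns for illustration; NOT vtclean's regexes.
var (
	// ESC, optionally followed by a complete CSI sequence: '[', parameter
	// bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E.
	// If no CSI follows, the match is just the bare ESC.
	escRe = regexp.MustCompile(`\x1b(?:\[[0-?]*[ -/]*[@-~])?`)
	// SGR (color/attribute) sequences are CSI sequences ending in 'm'.
	sgrRe = regexp.MustCompile(`^\x1b\[[0-9;]*m$`)
)

// stripEscapes removes CSI sequences and any other ESC byte it finds.
// Unknown sequences fail safe: their ESC is dropped even though the rest
// of the sequence is left in the text. If keepColor is true, SGR
// sequences are kept so colors survive cleaning.
func stripEscapes(s string, keepColor bool) string {
	return escRe.ReplaceAllStringFunc(s, func(seq string) string {
		if keepColor && sgrRe.MatchString(seq) {
			return seq
		}
		return ""
	})
}
```

Failing safe has a visible cost: an OSC title sequence such as ESC ] 0;title BEL is not recognized by this sketch, so only the ESC is removed and the remaining bytes stay in the text, but no raw escape ever reaches the output.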
If you want a good way to do this, probably start here. It's three short regular expressions and a simple state machine (to clean up single-line edits).
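The state-machine half can be sketched like this for the two most common in-line edits, carriage return and backspace. This is an assumption about what "single-line edits" covers, not vtclean's implementation, and the function name applyLineEdits is hypothetical. The cursor moves within a line buffer and later characters overwrite earlier ones instead of being appended, which is why a progress indicator redrawn with \r collapses to its final state.

```go
package main

import "strings"

// applyLineEdits replays carriage returns and backspaces on each line the
// way a terminal would: they move the cursor, and subsequent characters
// overwrite whatever is already at that position.
func applyLineEdits(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		var buf []rune // visible contents of the line
		cur := 0       // cursor column; invariant: 0 <= cur <= len(buf)
		for _, r := range line {
			switch r {
			case '\r': // carriage return: back to column 0, nothing erased
				cur = 0
			case '\b': // backspace: move left one column, nothing erased
				if cur > 0 {
					cur--
				}
			default:
				if cur < len(buf) {
					buf[cur] = r // overwrite existing character
				} else {
					buf = append(buf, r) // extend the line
				}
				cur++
			}
		}
		lines[i] = string(buf)
	}
	return strings.Join(lines, "\n")
}
```

As on a real terminal, a carriage return alone erases nothing: "long line\rshort" becomes "shortline". Programs that redraw a shorter line usually follow it with an erase-line sequence, which a fuller implementation would need to interpret rather than simply strip.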
My only complaint about the code at this point is that it operates on string instead of []byte, but I don't want to break the API, and nobody has complained.
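One way to add a []byte entry point without breaking an existing string API is to move the core onto []byte, which Go's regexp package supports directly, and keep the string function as a thin wrapper. The names cleanBytes and cleanString, and the single simplified pattern, are hypothetical and are not vtclean's API.

```go
package main

import "regexp"

// Hypothetical simplified pattern: a CSI sequence, or else a bare ESC.
var escBytesRe = regexp.MustCompile(`\x1b(?:\[[0-?]*[ -/]*[@-~])?`)

// cleanBytes is a hypothetical []byte entry point: it strips escapes
// without first converting the whole input to a string.
func cleanBytes(b []byte) []byte {
	return escBytesRe.ReplaceAllLiteral(b, nil)
}

// cleanString keeps a string-based signature for existing callers by
// delegating to the byte-oriented core.
func cleanString(s string) string {
	return string(cleanBytes([]byte(s)))
}
```

The string wrapper still pays for two conversions, so existing callers gain nothing; the benefit goes to callers that already hold []byte, such as data read straight from an io.Reader.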
It could be interesting to extract this into a context-free grammar that could be implemented in other languages.
There are many reasons users might not want to, or be able to, use Go, or any other language this might be published in. Given the terminal's provenance, I can't help but be absolutely certain (despite having no substantive evidence) that a lot of people working with C in Interesting™ situations would love something like this, for example.
(I must admit I skimmed it so fast, on my phone, that I didn't realize the comment about RGB colors referred only to the line directly underneath it... and then didn't stop to think "RGB can't need all that". Whoops.)
The nice thing is that this is definitely simple enough to port directly somewhere else, which I guess would make extracting a higher-level representation premature-optimization-level superfluous complexity.
Thanks for writing this! A few years ago I used it to strip the ANSI escape sequences from the output of HP/Aruba network devices, which clear your screen or line; that was a nightmare for scripting. It works like a dream.
[1] https://github.com/lunixbochs/vtclean