Regex... Magic
What is regex?
A regex (regular expression) is a sequence of characters that defines a search pattern for a line or block of text. [1]
So why would you want to use it?
Regex has a few great uses that can speed up your workflow when you are doing menial tasks in an editor like VS Code, or in any other tool that works with text. Several of them have nothing to do with validating user input.
Here are a few use cases that I have run into over the years.
My use cases
-
Filtering down massive error logs to find out what was not catered for during data imports. Each batch produced about 500K lines of JSON error logs, each log covered roughly 100K entries, and we were importing a few million customers' worth of data. (If you are really interested in the context, you can email me.)
TL;DR: We validated all of the data to make sure we could import it into our system.
- This checks a JSON error log for certain error messages after a data import. I used it to remove the ones I did not want to see anymore.
- regex = \{.*\n(^.*\n){5,6}.*\]\n\}(\n|)|^.*(append error messages here, separated by pipes).*\n
- This does three things:
- First, it matches JSON blocks that contain no error messages and selects the whole block.
- Secondly, it skips full JSON blocks that still contain error messages you have not filtered out.
- Lastly, it selects the lines containing the error messages you provide inside the parentheses, separated by pipes.
- Example: regex = \{.*\n(^.*\n){5,6}.*\]\n\}(\n|)|^.*(asdasdds|asdasssdasda).*\n (You can beautify the JSON below in your editor to use the regex.)
JSON [ { "something": "", "asdasd": "", "asdasd": "", "asdasd": "", "asdasd": "", "ImportantErrorMessage": [ "asdasssdasda", "asdasdds" ] }, { "something": "", "asdasd": "", "asdasd": "", "asdasd": "", "asdasd": "", "ImportantErrorMessage": [ "asdasdds" ] }, { "something": "", "asdasd": "", "asdasd": "", "asdasd": "", "asdasd": "", "ImportantErrorMessage": [ "asdasssdasdasd", ] }, { "something": "", "asdasd": "", "asdasd": "", "asdasd": "", "asdasd": "", "ImportantErrorMessage": [] } ]
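If you would rather script this than run it in an editor, here is a rough Python sketch of the same idea. It is not the exact setup from the import project: the `block` helper, the field names, and the sample messages are made up for illustration, and it assumes the log is a run of beautified objects whose braces sit at the start of a line, which is what the pattern needs. The substitution is repeated until nothing changes, because a block only becomes "empty" after its known error lines have been removed in an earlier pass.

```python
import re


def filter_log(text, known_errors):
    """Strip known error lines and blocks that have no errors left."""
    if not known_errors:
        raise ValueError("known_errors must not be empty")
    # Escape each message so characters like '.' are matched literally.
    alternatives = "|".join(re.escape(msg) for msg in known_errors)
    pattern = re.compile(
        r"\{.*\n(^.*\n){5,6}.*\]\n\}(\n|)|^.*(" + alternatives + r").*\n",
        re.MULTILINE,  # lets ^ match at the start of every line
    )
    # Repeat until stable: pass 1 removes known error lines,
    # later passes remove the blocks that are now empty.
    while True:
        cleaned = pattern.sub("", text)
        if cleaned == text:
            return cleaned
        text = cleaned


def block(entry_id, errors):
    """Hypothetical helper: build one unindented log entry."""
    lines = ["{", f'"id": "{entry_id}",', '"a": "",', '"b": "",', '"c": "",', '"d": "",']
    if errors:
        lines.append('"errors": [')
        lines += [f'"{e}",' for e in errors[:-1]] + [f'"{errors[-1]}"']
        lines.append("]")
    else:
        lines.append('"errors": []')
    lines.append("}")
    return "\n".join(lines) + "\n"


log = (
    block(1, ["known error", "new error"])
    + block(2, ["known error"])
    + block(3, [])
)
result = filter_log(log, ["known error"])
print(result)
```

What is left over is only the entry that still carries an error you have not dealt with yet, which is the whole point of the exercise.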
-
Pipe: matching multiple terms in one document or line
- regex for one line = ^.*(one|two|three).*$
- regex for document = one|two|three
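To see the difference between the two patterns, here is a small Python sketch using the standard `re` module. The sample text is made up. Note that `re.MULTILINE` is needed so `^` and `$` anchor to each line, which most editors do by default.

```python
import re

text = "line one\nnothing here\nline two and three\n"

# Per-line pattern: every match is a whole line containing at least one term.
line_pattern = re.compile(r"^.*(one|two|three).*$", re.MULTILINE)
matching_lines = [m.group(0) for m in line_pattern.finditer(text)]

# Document pattern: every match is a single occurrence of a term.
term_pattern = re.compile(r"one|two|three")
terms = term_pattern.findall(text)

print(matching_lines)
print(terms)
```

The first pattern is handy when you want to select or delete whole lines, the second when you just want to jump between hits.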
-
Removing double spaces and empty lines
- regex = ([ ]{2,})|^\n|\n^$
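In an editor you run that pattern through find and replace. Scripted, it is easier to split it into two steps, because the two halves want different replacements. The sketch below assumes you want runs of spaces collapsed to a single space and empty lines dropped entirely. The `tidy` name is just for illustration.

```python
import re


def tidy(text):
    """Collapse repeated spaces and drop completely empty lines."""
    # Two or more spaces in a row become one space.
    text = re.sub(r"[ ]{2,}", " ", text)
    # A newline sitting at the very start of a line is an empty line.
    text = re.sub(r"^\n", "", text, flags=re.MULTILINE)
    return text


messy = "first  line\n\n\nsecond   line\n"
print(tidy(messy))
```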
-
Finding tabs in YAML pipelines... YAML doesn't like tabs...
- Believe me, this is a quick diagnosis that can save you a bunch of time.
- regex = \t
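The same check is easy to script if you want it in a pre-commit hook or similar. This is a minimal sketch, with a made-up `find_tabs` helper and a made-up pipeline snippet, that reports where each tab is hiding.

```python
import re


def find_tabs(text):
    """Return (line_number, column) pairs, both 1-based, for every tab."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"\t", line):
            hits.append((line_no, match.start() + 1))
    return hits


pipeline = "steps:\n  - script: echo ok\n\t- script: echo broken\n"
for line_no, column in find_tabs(pipeline):
    print(f"tab found on line {line_no}, column {column}")
```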
Other use cases
- Validation on user inputs
- Text transformations
- Text Debugging
- Web Scraping
- JSON Validation
- Scrubbing sensitive data from logs/seeding data
- Code Obfuscation (Pretty cool)
- Be a Hacker
Who should use it?
Well, the quick answer is everyone... If you are doing something menial and boring with find and replace, try regex instead. It might speed up your menial task or slow it down, depending on what you are doing, but in the end it will either be more efficient or at least a bit more entertaining than doing a Ctrl+F.
The other group this is relevant to is people who validate logs and look for patterns in large bodies of text to figure out WTF is going on.
Believe me when I say this is an investment in your career and can help you level up. Taking the extra time to write your searches as regex helps it become second nature, and there are endless use cases for it. In the end it will save you more time than it wastes.
OK, so you wanna make a game of it?
Fine, I will entertain that short attention span of yours... There are these things called pattern searching algorithms, and regex is a good way of teaching yourself to recognise these types of things. You can go learn about that over here [2]. By the by, this website is awesome and you can learn a bunch from it. Check out their puzzle section for a challenge.
Cool, so how do I learn it?
Here are a few helpful resources that can get you started.
My personal favorite for testing and verifying Regex: Regex101.
The rest are:
And here are some pattern cheat sheets: