Overview
A regular expression (regex) is a pattern that matches text. ObservePoint uses regular expressions to define Include and Exclude Lists and to specify the values that custom rules look for. Use the ObservePoint Regular Expression Tester to try out and modify the following examples or to create your own.
Simple Regular Expressions for Rules
You can use regular expressions when creating rules to look for patterns in values that are captured. Below are some common regular expressions:
Days of the Week
Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday
Timestamp, matching something like 10:12PM or 02:05AM (with preceding zero)
^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$
Timestamp, matching something like 10:12PM or 2:05AM (no preceding zero)
^([1-9]|10|11|12):([0-5]\d)(AM|PM)$
Timestamp matching 24 Hour Time
^([01]\d|2[0-3]):([0-5][0-9])
Timestamp matching something like 14:12|Saturday
^([01]\d|2[0-3]):?([0-5]\d)\|(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)
Any 400 level codes, such as a 404 code
(4\d\d)
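To sanity-check these patterns outside ObservePoint, the sketch below runs several of them through Python's `re` module. Using Python is an assumption made for illustration; ObservePoint's regex engine may behave differently in edge cases, and the helper name `matches` is hypothetical.

```python
import re

# Patterns copied from the list above.
TIME_12H_PADDED = re.compile(r"^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$")
TIME_12H_UNPADDED = re.compile(r"^([1-9]|10|11|12):([0-5]\d)(AM|PM)$")
TIME_24H = re.compile(r"^([01]\d|2[0-3]):([0-5][0-9])")
TIME_AND_DAY = re.compile(
    r"^([01]\d|2[0-3]):?([0-5]\d)\|"
    r"(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)"
)
STATUS_4XX = re.compile(r"(4\d\d)")


def matches(pattern, value):
    """Return True if the pattern is found anywhere in the value."""
    return pattern.search(value) is not None


# Sample values are made up for illustration.
print(matches(TIME_12H_PADDED, "02:05AM"))        # padded hour is accepted
print(matches(TIME_12H_PADDED, "2:05AM"))         # unpadded hour is rejected
print(matches(TIME_AND_DAY, "14:12|Saturday"))    # literal pipe before the day
print(matches(STATUS_4XX, "404"))
```

The patterns without `^` and `$` anchors, such as `(4\d\d)`, match anywhere inside a value, so a value like `1404` would also match.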
Default Include List
Audits have two default regular expressions for the Include List (sometimes called the Include Filter), depending on whether the Audit has only a single Starting URL defined or multiple ones defined.
For a single Starting URL, the default regular expression makes any page on any subdomain of the Starting URL's domain eligible to be visited, like this:
Regex: |
|
Note: | Matches pages from any subdomain separated by a period (.) before the primary domain. Do not use this for excluding. |
Type: | Include |
Valid: |
|
Valid: |
|
Valid: |
|
Valid: |
|
Valid: |
|
Invalid: |
|
For multiple Starting URLs, the default Include List allows any valid URL to be scanned, even URLs from different primary domains:
^https?://.*
This matches any URL that begins with http:// or https://, regardless of primary domain; every such URL is valid.
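The pattern can be checked quickly in Python, assuming Python's `re` flavor behaves like ObservePoint's for this simple case. The sample URLs are hypothetical.

```python
import re

# Default Include List pattern for multiple Starting URLs (from the text above).
ANY_HTTP_URL = re.compile(r"^https?://.*")

# Hypothetical URLs, used only for illustration.
samples = [
    "https://www.example.com/",
    "http://shop.example.org/cart?id=1",
    "ftp://files.example.com/readme.txt",
]

for url in samples:
    # The s? makes the "s" optional, so both http and https are accepted.
    print(url, bool(ANY_HTTP_URL.search(url)))
```

Only the first two samples match, because the pattern requires the URL to start with `http://` or `https://`.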
Site Sections
Match on any page in the blog directory:
Regex: |
|
Note: | Any page with /blog/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an Audit. |
Type: | Include or Exclude |
Valid: |
|
Valid: |
|
Valid: |
|
Invalid: |
|
Match on any page in either the lifestyle or entertainment directories:
Regex: |
|
Note: | Any page with /lifestyle/ or /entertainment/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an Audit. |
Type: | Include or Exclude |
Valid: |
|
Valid: |
|
Valid: |
|
Invalid: |
|
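As a rough illustration of how directory patterns like these behave in Include and Exclude Lists, here is a minimal Python sketch. The two patterns are hypothetical, written only to fit the notes above. The filter logic (a URL is eligible when it matches at least one Include pattern and no Exclude pattern) is also an assumption, not a description of ObservePoint's implementation.

```python
import re

# Hypothetical patterns written to fit the notes above.
BLOG = re.compile(r"/blog/")
LIFESTYLE_OR_ENTERTAINMENT = re.compile(r"/(lifestyle|entertainment)/")
ANY_HTTP_URL = re.compile(r"^https?://.*")


def is_eligible(url, include, exclude):
    """Assumed logic: some Include pattern matches and no Exclude pattern does."""
    included = any(p.search(url) for p in include)
    excluded = any(p.search(url) for p in exclude)
    return included and not excluded


# Hypothetical URLs, used only for illustration.
print(is_eligible("https://www.example.com/blog/2018/post", [BLOG], []))
print(is_eligible("https://www.example.com/lifestyle/travel",
                  [ANY_HTTP_URL], [LIFESTYLE_OR_ENTERTAINMENT]))
```

Because the patterns include both slashes, a path such as `/blogs/post` does not match `/blog/`.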
Product Detail Pages
Match on any product detail page:
Regex: |
|
Note: | For this example, the product detail page name always follows /p/ in the URL and always has the html extension. The .+ is a wildcard that looks for at least one character following the /p/ directory. The period before html is escaped with a backslash (\), making it a literal character. |
Type: | Include or Exclude |
Valid: |
|
Valid: |
|
Invalid: |
|
Invalid: |
|
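The effect of escaping the period can be seen in a short Python sketch. The pattern below is a hypothetical one assembled from the note above (a /p/ directory, at least one character, then a literal .html), and the URLs are invented for illustration.

```python
import re

# Hypothetical pattern built from the note: /p/, one or more characters,
# then a literal ".html" (the backslash escapes the period).
PRODUCT_PAGE = re.compile(r"/p/.+\.html")

# Same pattern without the backslash: the period now matches any character.
UNESCAPED = re.compile(r"/p/.+.html")


def is_product_page(url):
    """Return True if the URL contains a product detail page path."""
    return PRODUCT_PAGE.search(url) is not None


print(is_product_page("https://www.example.com/p/blue-widget.html"))
print(is_product_page("https://www.example.com/p/.html"))  # page name is blank

# Without the escape, a URL with no real .html extension still matches.
print(UNESCAPED.search("https://www.example.com/p/widget-html") is not None)
```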
Match on any page two levels deep under the products directory:
Regex: |
|
Note: | For this example, at least one of any character must be found on three levels below the product directory (blanks are not allowed). Putting this in the Exclude List prevents all pages three levels below the product directory from being visited. |
Type: | Include or Exclude |
Valid: |
|
Invalid: |
|
Invalid: |
|
Page Types
Match any page in the forms directory if it has an application form, indicated by a formtype parameter following the question mark:
Regex: |
|
Note: | Matches any page with a type parameter in the query string. The parameter cannot be blank. If the query string has other parameters, the type parameter must be first. The backslash (\) is an escape character that turns the question mark into a literal character instead of a regex operator. |
Type: | Include or Exclude |
Valid: |
|
Invalid: |
|
Invalid: |
|
Match any page in the forms directory where the type parameter is located anywhere in the query string:
Regex: |
|
Note: | The first question mark is literal because it is escaped with a backslash (\). The second question mark is part of the regex syntax, and means 0 or 1 of the preceding element. The \w matches a word character (a letter, digit, or underscore) and the pipe (|) means or. |
Type: | Include or Exclude |
Valid: |
|
Valid: |
|
Invalid: |
|
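The difference between requiring the parameter first and allowing it anywhere in the query string can be sketched in Python. Both patterns below are hypothetical stand-ins written to fit the notes above, not the patterns from the original tables, and the URLs are invented.

```python
import re

# Hypothetical: "type" must be the first parameter after the literal "?".
TYPE_FIRST = re.compile(r"/forms/.*\?type=.+")

# Hypothetical: "type" may follow other parameters. The escaped \? is a
# literal question mark; the trailing ? after the group makes it optional.
TYPE_ANYWHERE = re.compile(r"/forms/.*\?(.*&)?type=.+")

urls = [
    "https://www.example.com/forms/apply?type=loan",
    "https://www.example.com/forms/apply?lang=en&type=loan",
    "https://www.example.com/forms/apply?type=",
]

for url in urls:
    print(url,
          TYPE_FIRST.search(url) is not None,
          TYPE_ANYWHERE.search(url) is not None)
```

In both sketches the trailing `.+` is what rejects a blank parameter value.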
Page Date Ranges
Match on any blog page posted during the month of May:
Regex: |
|
Note: | For this example, the blog posts are shown by date. The days of the month must be two digits each. Not typically used for excluding. |
Type: | Include |
Valid: |
|
Valid: |
|
Invalid: |
|
Invalid: |
|
Match on any article posted in January or February:
Regex: |
|
Note: | In this example, the pipe (|) is an or operator, meaning either value is acceptable. Not typically used for excluding. |
Type: | Include |
Valid: |
|
Valid: |
|
Invalid: |
|
Exclude a Logout Link
If the Audit logs in to a site, exclude any page that would log it out:
Regex: |
|
Note: | Any link with /logout in the path will not be visited. In these examples, valid URLs are excluded from the Audit. Not typically used for including. |
Type: | Exclude |
Valid: |
|
Valid: |
|
Invalid: |
|
Detecting PII
To find | Use this RegEx | Example of match |
Email addresses | ^[\w\.=-]+@[\w\.-]+\.[\w]{2,3}$ | |
U.S. Social Security numbers | \b(?!000|666|9\d{2})([0-8]\d{2}|7([0-6]\d))([-]?|\s{1})(?!00)\d\d\3(?!0000)\d{4}\b | 513-84-7329 |
IPv4 addresses | ^\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}$ | 192.168.1.1 |
Dates in MM/DD/YYYY format | ^(1[0-2]|0?[1-9])[\/-]([3][01]|[12]\d|[0]?[1-9])[\/-](\d{4}|\d{2})$ | 05/05/2018 |
MasterCard numbers | ^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$ | 5258704108753590 |
Visa card numbers | \b([4]\d{3}[\s]\d{4}[\s]\d{4}[\s]\d{4}|[4]\d{3}[-]\d{4}[-]\d{4}[- ]\d{4}|[4]\d{3}[.]\d{4}[.]\d{4}[.]\d{4}|[4]\d{3}\d{4}\d{4}\d{4})\b | 4563-7568-5698-4587 |
American Express card numbers | ^3[47][0-9]{13}$ | 345835478586821 |
U.S. ZIP codes and Canadian postal codes | ^((\d{5}-\d{4})|(\d{5})|([A-Z]\d[A-Z]\s\d[A-Z]\d))$ | 97589 |
File paths | \\[^\\]+$ | \\fs1\shared |
URLs | (?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’])) |
Source - Netwrix
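As an illustration of how a few of these patterns might be applied to captured values, the following Python sketch checks a value against several of them and reports which ones match. The function name `detect_pii` and the dictionary labels are hypothetical, and Python's `re` flavor is assumed to be close enough to ObservePoint's for these patterns.

```python
import re

# Patterns copied from the table above. The anchored ones (^...$) only match
# when the whole value is the sensitive item.
PII_PATTERNS = {
    "ipv4": re.compile(r"^\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}$"),
    "mastercard": re.compile(
        r"^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}"
        r"|27[01][0-9]|2720)[0-9]{12}$"
    ),
    "visa": re.compile(
        r"\b([4]\d{3}[\s]\d{4}[\s]\d{4}[\s]\d{4}"
        r"|[4]\d{3}[-]\d{4}[-]\d{4}[- ]\d{4}"
        r"|[4]\d{3}[.]\d{4}[.]\d{4}[.]\d{4}"
        r"|[4]\d{3}\d{4}\d{4}\d{4})\b"
    ),
    "zip": re.compile(r"^((\d{5}-\d{4})|(\d{5})|([A-Z]\d[A-Z]\s\d[A-Z]\d))$"),
}


def detect_pii(value):
    """Return the sorted labels of every pattern found in the value."""
    return sorted(
        label for label, pattern in PII_PATTERNS.items() if pattern.search(value)
    )


# Example values taken from the table above.
for sample in ["192.168.1.1", "5258704108753590", "4563-7568-5698-4587", "97589"]:
    print(sample, detect_pii(sample))
```

These patterns check format only. For example, the IPv4 pattern also accepts `999.999.999.999`, since it does not limit each octet to the range 0 to 255.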