Common Regular Expressions for ObservePoint

Overview

A regular expression finds patterns in text. It is used in ObservePoint to define Include and Exclude Lists and to define what values to look for in custom rules. Use the ObservePoint Regular Expression Tester to try out and modify the following examples or to create your own.

Simple Regular Expressions for Rules

You can use regular expressions when creating rules to look for patterns in values that are captured. Below are some common regular expressions:

Days of the Week

Timestamp, matching something like 10:12PM or 02:05AM (with preceding zero)

^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$

Timestamp, matching something like 10:12PM or 2:05AM (no preceding zero)

^([1-9]|10|11|12):([0-5]\d)(AM|PM)$

Timestamp matching 24 Hour Time

^([01]\d|2[0-3]):([0-5][0-9])

Timestamp matching something like 14:12|Saturday

Any 400 level codes, such as a 404 code

(4\d\d)
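
The timestamp and status-code patterns above can be tried outside ObservePoint before they go into a rule. The following is a minimal sketch in Python. It assumes Python's `re` module treats these simple patterns the same way ObservePoint's regex engine does. The `PATTERNS` keys and the `matches` helper are illustrative names, not part of ObservePoint.

```python
import re

# Patterns copied verbatim from the examples above.
PATTERNS = {
    "12h_padded": r"^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$",
    "12h_unpadded": r"^([1-9]|10|11|12):([0-5]\d)(AM|PM)$",
    "24h": r"^([01]\d|2[0-3]):([0-5][0-9])",
    "status_4xx": r"(4\d\d)",
}


def matches(name, value):
    """Return True if the named pattern is found anywhere in value."""
    return re.search(PATTERNS[name], value) is not None


# Sample values chosen for illustration only.
for name, value in [
    ("12h_padded", "02:05AM"),
    ("12h_padded", "2:05AM"),
    ("12h_unpadded", "2:05AM"),
    ("24h", "14:12|Saturday"),
    ("24h", "24:00"),
    ("status_4xx", "404"),
    ("status_4xx", "200"),
]:
    print(name, value, matches(name, value))
```

Because `(4\d\d)` has no anchors, it also matches inside longer values such as 1404. Add `^` and `$` if the whole value must be a three-digit code.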

Default Include List

Audits have two default regular expressions for the Include List (sometimes called the Include Filter), depending on whether the Audit has only a single Starting URL defined or multiple ones defined.

For a single Starting URL, the default regular expression allows any page from any subdomain of the starting page to be eligible to be visited, like this:

Regex:	`^https?://([^/:\?]*\.)?mysite.com`
Note:	Matches pages from any subdomain separated by a period (.) before the primary domain. Do not use this for excluding.
Type:	Include
Valid:	`http://mysite.com`
Valid:	`https://mysite.com`
Valid:	`http://www.mysite.com/home`
Valid:	`https://dev.mysite.com/home`
Valid:	`http://my.mysite.com/products/products_and_services.html`
Invalid:	`http://anothersite.com/`
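
The default pattern can be checked against the sample URLs with a short script. This is a sketch in Python under the assumption that ObservePoint applies the pattern as a search against the full URL. `is_included` is a hypothetical helper name.

```python
import re

# Default single-Starting-URL pattern from above, using mysite.com as the domain.
DEFAULT_INCLUDE = re.compile(r"^https?://([^/:\?]*\.)?mysite.com")


def is_included(url):
    """Return True if the URL matches the default Include List pattern."""
    return DEFAULT_INCLUDE.search(url) is not None


for url in [
    "http://mysite.com",
    "https://dev.mysite.com/home",
    "http://my.mysite.com/products/products_and_services.html",
    "http://anothersite.com/",
]:
    print(url, is_included(url))
```

Note that the period in `mysite.com` is not escaped, so it matches any single character. Writing `mysite\.com` is stricter if that matters for your domain.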

For multiple Starting URLs, the default Include List allows any valid URL to be scanned, even URLs from different primary domains:

^https?://.*

This matches any URL regardless of primary domain; every URL is valid.

Site Sections

Match on any page in the blog directory:

Regex:	`/blog/`
Note:	Any page with /blog/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an Audit.
Type:	Include or Exclude
Valid:	`http://mysite.com/blog/`
Valid:	`http://mysite.com/blog/posts?id=100023`
Valid:	`http://mysite.com/blog/reviews/iPhoneX-review`
Invalid:	`http://mysite.com/myblog/iPhoneX-comparison`

Match on any page in either the lifestyle or entertainment directories:

Regex:	`/(lifestyle|entertainment)/`
Note:	Any page with /lifestyle/ or /entertainment/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an Audit.
Type:	Include or Exclude
Valid:	`http://mysite.com/entertainment/`
Valid:	`http://mysite.com/lifestyle/looking-for-spring.html`
Valid:	`http://mysite.com/articles/entertainment/movie-reviews/`
Invalid:	`http://mysite.com/features/out_of_entertainment_ideas.html`
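
To see which directory a URL was matched on, the capture group in the second pattern can be read back. The sketch below uses Python's `re` module as a stand-in for ObservePoint's engine. `section_of` is an illustrative helper, not an ObservePoint function.

```python
import re

# Patterns from the two site-section examples above.
BLOG = re.compile(r"/blog/")
SECTIONS = re.compile(r"/(lifestyle|entertainment)/")


def section_of(url):
    """Return 'lifestyle' or 'entertainment' if the URL is in one of
    those directories, otherwise None."""
    match = SECTIONS.search(url)
    return match.group(1) if match else None


print(bool(BLOG.search("http://mysite.com/blog/posts?id=100023")))
print(bool(BLOG.search("http://mysite.com/myblog/iPhoneX-comparison")))
print(section_of("http://mysite.com/articles/entertainment/movie-reviews/"))
print(section_of("http://mysite.com/features/out_of_entertainment_ideas.html"))
```

The surrounding slashes are what keep `/myblog/` and `out_of_entertainment_ideas` from matching: the directory name must be a complete path segment.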

Product Detail Pages

Match on any product detail page:

Regex:	`/p/.+\.html`
Note:	For this example, the product detail page name always follows /p/ in the URL and always has the html extension. The .+ is a wildcard that matches at least one character following the /p/ directory. The period before html is escaped with a backslash, making it a literal character.
Type:	Include or Exclude
Valid:	`http://mysite.com/p/mens-black-suit.html?id=100023`
Valid:	`http://mysite.com/p/100023.html?desc=mens_black_suit`
Invalid:	`http://mysite.com/p/mens_black_suit_1000123456`
Invalid:	`http://mysite.com/products/mens/1000123456.html`

Match on any page three levels deep under the products directory:

Regex:	`/products/.+/.+/.+`
Note:	For this example, at least one character must be found at each of the three levels below the products directory (blanks are not allowed). Putting this in the Exclude List prevents all pages three or more levels below the products directory from being visited.
Type:	Include or Exclude
Valid:	`http://mysite.com/products/mens/suits/100034256.htm`
Invalid:	`http://mysite.com/products/mens/100034256.htm`
Invalid:	`http://mysite.com/products/mens/shoes/`
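
Both product patterns can be checked against the sample URLs as follows. This is a sketch in Python, assuming search semantics (the pattern may match anywhere in the URL). The constant names are illustrative.

```python
import re

# Patterns from the two product examples above.
PRODUCT_DETAIL = re.compile(r"/p/.+\.html")
THREE_LEVELS = re.compile(r"/products/.+/.+/.+")

detail_urls = [
    "http://mysite.com/p/mens-black-suit.html?id=100023",
    "http://mysite.com/p/mens_black_suit_1000123456",
    "http://mysite.com/products/mens/1000123456.html",
]
level_urls = [
    "http://mysite.com/products/mens/suits/100034256.htm",
    "http://mysite.com/products/mens/100034256.htm",
    "http://mysite.com/products/mens/shoes/",
]

for url in detail_urls:
    print(url, bool(PRODUCT_DETAIL.search(url)))
for url in level_urls:
    print(url, bool(THREE_LEVELS.search(url)))
```

The last URL fails because the final `.+` needs at least one character after the trailing slash.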

Page Types

Match any page in the forms directory if it has an application form, indicated by a type parameter immediately following the question mark:

Regex:	`/forms\?type=.+&?.*`
Note:	Matches any page with a type query string parameter. The parameter cannot be blank. If the query string has other parameters, the type parameter must be first. The backslash (\) is an escape character that turns the question mark into a literal character instead of a regex operator.
Type:	Include or Exclude
Valid:	`http://mysite.com/forms?type=car_loan`
Invalid:	`http://mysite.com/forms?customer=1&type=car_loan&zip=94003`
Invalid:	`http://mysite.com/forms/personal/loans/application.html`

Match any page in the forms directory where the type parameter is located anywhere in the query string:

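Regex:	`/forms\?(&?([\w]=|type)=[\w])*`
Note:	The first question mark is literal because it is escaped with a backslash. The second question mark is part of the regex syntax and means 0 or 1 of the preceding character. The \w matches any word character (a letter, digit, or underscore) and the pipe (|) means or. Because the group is followed by an asterisk (0 or more), this expression matches any URL containing /forms? regardless of which parameters follow.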
Type:	Include or Exclude
Valid:	`http://mysite.com/forms?type=car_loan`
Valid:	`http://mysite.com/forms?customer=1&type=car_loan&zip=94003`
Invalid:	`http://mysite.com/forms/personal/loans/application.html`
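
The difference between the two forms patterns is easiest to see side by side. The sketch below runs both in Python and adds a third pattern, `TYPE_REQUIRED`. That third pattern is a hypothetical stricter alternative written for this illustration, not an ObservePoint default. It requires a non-blank type parameter at any position in the query string.

```python
import re

# Patterns from the two examples above.
TYPE_FIRST = re.compile(r"/forms\?type=.+&?.*")
TYPE_ANYWHERE = re.compile(r"/forms\?(&?([\w]=|type)=[\w])*")

# Hypothetical alternative (an assumption, not from ObservePoint):
# type must be present and non-blank, in any position.
TYPE_REQUIRED = re.compile(r"/forms\?(?:.*&)?type=[^&]+")

URLS = [
    "http://mysite.com/forms?type=car_loan",
    "http://mysite.com/forms?customer=1&type=car_loan&zip=94003",
    "http://mysite.com/forms?customer=1",
    "http://mysite.com/forms/personal/loans/application.html",
]

for url in URLS:
    print(
        url,
        bool(TYPE_FIRST.search(url)),
        bool(TYPE_ANYWHERE.search(url)),
        bool(TYPE_REQUIRED.search(url)),
    )
```

The third URL has no type parameter. It is matched by the second pattern but not by the stricter alternative.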

Page Date Ranges

Match on any blog page posted during May 2018:

Regex:	`/blog/2018/05/\d\d/.+`
Note:	For this example, the blog posts are organized by date. The day of the month must be two digits. Not typically used for excluding.
Type:	Include
Valid:	`http://mysite.com/blog/2018/05/01/tesla_model_3_review`
Valid:	`http://mysite.com/blog/2018/05/29/test_drive_tesla_model_3`
Invalid:	`http://mysite.com/blog/2018/05/1/tesla_model_3_review`
Invalid:	`http://mysite.com/blog/2018/12/01/tesla_model_3_review`

Match on any article posted in January or February:

Regex:	`/articles/(jan|feb)/.+`
Note:	In this example, the pipe (|) is an or operator, meaning either value is acceptable. Not typically used for excluding.
Type:	Include
Valid:	`http://mysite.com/articles/jan/tesla_model_3_review`
Valid:	`http://mysite.com/articles/feb/tesla_model_3_review`
Invalid:	`http://mysite.com/articles/mar/tesla_model_3_review`
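
If you audit a different month each time, the date pattern can be generated rather than typed by hand. The helper below, `blog_month_pattern`, is a hypothetical convenience function written for this sketch. It assumes the same /blog/YYYY/MM/DD/ URL layout as the example above.

```python
import re


def blog_month_pattern(year, month):
    """Build a pattern for blog posts in the given year and month,
    assuming URLs of the form /blog/YYYY/MM/DD/title."""
    return f"/blog/{year}/{month:02d}/\\d\\d/.+"


MAY_2018 = blog_month_pattern(2018, 5)
JAN_OR_FEB = r"/articles/(jan|feb)/.+"

print(MAY_2018)
for url in [
    "http://mysite.com/blog/2018/05/01/tesla_model_3_review",
    "http://mysite.com/blog/2018/05/1/tesla_model_3_review",
    "http://mysite.com/blog/2018/12/01/tesla_model_3_review",
]:
    print(url, bool(re.search(MAY_2018, url)))
print(bool(re.search(JAN_OR_FEB, "http://mysite.com/articles/mar/tesla_model_3_review")))
```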

Exclude a Logout Link

If the Audit logs in to the site, exclude any page that would log it out:

Regex:	`/logout`
Note:	Any link with /logout in the path will not be visited. In these examples, valid URLs are excluded from the Audit. Not typically used for including.
Type:	Exclude
Valid:	`http://mysite.com/logout/`
Valid:	`http://mysite.com/account/logout-instructions.html`
Invalid:	`http://mysite.com/help?articleid=logout-instructions`
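
The way Include and Exclude Lists work together can be sketched as a simple filter. The code below is an illustration only. It assumes a URL is visited when it matches at least one Include pattern and no Exclude pattern, which may not reflect every detail of how ObservePoint evaluates the lists. `should_visit` is a hypothetical function name.

```python
import re

# Default single-Starting-URL Include pattern and the logout Exclude pattern.
INCLUDE = [r"^https?://([^/:\?]*\.)?mysite.com"]
EXCLUDE = [r"/logout"]


def should_visit(url, include=INCLUDE, exclude=EXCLUDE):
    """Assumed rule: visit if any Include pattern matches and no
    Exclude pattern matches."""
    if not any(re.search(pattern, url) for pattern in include):
        return False
    return not any(re.search(pattern, url) for pattern in exclude)


for url in [
    "http://mysite.com/logout/",
    "http://mysite.com/account/logout-instructions.html",
    "http://mysite.com/help?articleid=logout-instructions",
    "http://anothersite.com/",
]:
    print(url, should_visit(url))
```

The second URL shows why a broad pattern like `/logout` can exclude more than intended: a help page whose file name starts with logout is skipped too.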

Detecting PII

To find	Use this RegEx	Example of match
Email addresses	^[\w\.=-]+@[\w\.-]+\.[\w]{2,3}$	user@example.com
U.S. Social Security numbers	\b(?!000|666|9\d{2})([0-8]\d{2}|7([0-6]\d))([-]?|\s{1})(?!00)\d\d\3(?!0000)\d{4}\b	513-84-7329
IPV4 addresses	^\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}$	192.168.1.1
Dates in MM/DD/YYYY format	^([1][0-2]|[0]?[1-9])[\/-]([3][01]|[12]\d|[0]?[1-9])[\/-](\d{4}|\d{2})$	05/05/2018
MasterCard numbers	^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$	5258704108753590
Visa card numbers	\b([4]\d{3}[\s]\d{4}[\s]\d{4}[\s]\d{4}|[4]\d{3}[-]\d{4}[-]\d{4}[- ]\d{4}|[4]\d{3}[.]\d{4}[.]\d{4}[.]\d{4}|[4]\d{3}\d{4}\d{4}\d{4})\b	4563-7568-5698-4587
American Express card numbers	^3[47][0-9]{13}$	345835478586821
U.S. ZIP codes	^((\d{5}-\d{4})|(\d{5})|([A-Z]\d[A-Z]\s\d[A-Z]\d))$	97589
File paths	\\[^\\]+$	\\fs1\shared
URLs	(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))	www.netwrix.com

Source - Netwrix
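
A rule that checks captured values for PII can be prototyped by running several of these patterns over each value. The sketch below uses three patterns from the table with Python's `re` module. `detect_pii` and the `PII_PATTERNS` keys are illustrative names, and the sample values are made up for the example.

```python
import re

# A subset of the patterns from the table above.
PII_PATTERNS = {
    "email": r"^[\w\.=-]+@[\w\.-]+\.[\w]{2,3}$",
    "ipv4": r"^\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}$",
    "mastercard": (
        r"^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}"
        r"|27[01][0-9]|2720)[0-9]{12}$"
    ),
}


def detect_pii(value):
    """Return the names of all patterns that match the whole value."""
    return [
        name
        for name, pattern in PII_PATTERNS.items()
        if re.search(pattern, value)
    ]


for value in ["user@example.com", "192.168.1.1", "5258704108753590", "hello"]:
    print(value, detect_pii(value))
```

These patterns are anchored with `^` and `$`, so they only match when the entire value is the PII item. A value such as ip=192.168.1.1 is not flagged. Remove the anchors if PII may be embedded in a longer string.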
