Common Regular Expressions for ObservePoint
Written by Luiza Gircoveanu
Updated over 8 months ago

Overview

A regular expression (regex) describes a pattern to find in text. ObservePoint uses regular expressions to define Include and Exclude Lists and to specify which values custom rules look for. Use the ObservePoint Regular Expression Tester to try out and modify the following examples or to create your own.

Simple Regular Expressions for Rules

You can use regular expressions when creating rules to look for patterns in values that are captured. Below are some common regular expressions:

Days of the Week

Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday

Timestamp, matching something like 10:12PM or 02:05AM (with leading zero)

^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$

Timestamp, matching something like 10:12PM or 2:05AM (no leading zero)

^([1-9]|10|11|12):([0-5]\d)(AM|PM)$

Timestamp matching 24 Hour Time

^([01]\d|2[0-3]):([0-5][0-9])

Timestamp matching something like 14:12|Saturday

^([01]\d|2[0-3]):?([0-5]\d)\|(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)

Any 400 level codes, such as a 404 code

(4\d\d)
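
The rule patterns above can be tried outside the Regular Expression Tester as well. The following is a minimal sketch, assuming Python's re module behaves like ObservePoint's regex engine for these patterns (an assumption, not something this article confirms). The constant names and the matches helper are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the examples above.
TIME_12H_PADDED = re.compile(r"^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$")
TIME_12H_UNPADDED = re.compile(r"^([1-9]|10|11|12):([0-5]\d)(AM|PM)$")
TIME_24H_WITH_DAY = re.compile(
    r"^([01]\d|2[0-3]):?([0-5]\d)\|"
    r"(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)"
)
STATUS_4XX = re.compile(r"(4\d\d)")


def matches(pattern, value):
    """Return True if the pattern is found anywhere in the value."""
    return pattern.search(value) is not None


# Print each sample value next to the result for its pattern.
samples = [
    (TIME_12H_PADDED, "02:05AM"),
    (TIME_12H_PADDED, "2:05AM"),
    (TIME_12H_UNPADDED, "2:05AM"),
    (TIME_24H_WITH_DAY, "14:12|Saturday"),
    (STATUS_4XX, "404"),
    (STATUS_4XX, "200"),
]
for pattern, value in samples:
    print(value, matches(pattern, value))
```

Patterns anchored with ^ and $ must match the whole value, while an unanchored pattern such as (4\d\d) is found anywhere inside a longer value.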

Default Include List

Audits have two default regular expressions for the Include List (sometimes called the Include Filter), depending on whether the Audit has only a single Starting URL defined or multiple ones defined.

For a single Starting URL, the default regular expression allows any page from any subdomain of the starting page to be eligible to be visited, like this:

Regex:

^https?://([^/:\?]*\.)?mysite.com

Note:

Matches pages from any subdomain separated by a period (.) before the primary domain. Do not use this for excluding.

Type:

Include

Valid:

http://mysite.com

Valid:

https://mysite.com

Valid:

http://www.mysite.com/home

Valid:

https://dev.mysite.com/home

Valid:

http://my.mysite.com/products/products_and_services.html

Invalid:

http://anothersite.com/
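
The valid and invalid URLs above can be checked with a short script. This is a sketch only, assuming Python's re module is a reasonable stand-in for the engine ObservePoint uses; the is_included helper is a hypothetical name, not part of ObservePoint.

```python
import re

# Default single-Starting-URL Include pattern from the example above.
DEFAULT_INCLUDE = re.compile(r"^https?://([^/:\?]*\.)?mysite.com")


def is_included(url):
    """Return True if the URL matches the default Include pattern."""
    return DEFAULT_INCLUDE.search(url) is not None


urls = [
    "http://mysite.com",
    "https://mysite.com",
    "http://www.mysite.com/home",
    "https://dev.mysite.com/home",
    "http://my.mysite.com/products/products_and_services.html",
    "http://anothersite.com/",
]
for url in urls:
    print(url, is_included(url))
```

Note that the period in mysite.com is not escaped in this pattern, so it matches any single character; writing mysite\.com would be stricter.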

For multiple Starting URLs, the default Include List allows any valid URL to be scanned, even URLs from different primary domains:

^https?://.*

This matches any URL regardless of primary domain; every URL is valid.

Site Sections

Match on any page in the blog directory:

Regex:

/blog/

Note:

Any page with /blog/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an Audit.

Type:

Include or Exclude

Valid:

http://mysite.com/blog/

Valid:

http://mysite.com/blog/posts?id=100023

Valid:

http://mysite.com/blog/reviews/iPhoneX-review

Invalid:

http://mysite.com/myblog/iPhoneX-comparison

Match on any page in either the lifestyle or entertainment directories:

Regex:

/(lifestyle|entertainment)/

Note:

Any page with /lifestyle/ or /entertainment/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an Audit.

Type:

Include or Exclude

Valid:

http://mysite.com/entertainment/

Valid:

http://mysite.com/lifestyle/looking-for-spring.html

Valid:

http://mysite.com/articles/entertainment/movie-reviews/

Invalid:

http://mysite.com/features/out_of_entertainment_ideas.html
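
To see how site-section patterns behave in an Exclude List, the sketch below combines them with the default Include pattern. It assumes that a URL is eligible when it matches at least one Include pattern and no Exclude pattern; that rule and the is_eligible helper are illustrative assumptions, not ObservePoint's implementation.

```python
import re

# Include pattern from the Default Include List section.
INCLUDE = [r"^https?://([^/:\?]*\.)?mysite.com"]
# Site-section patterns from the examples above, used here as excludes.
EXCLUDE = [r"/blog/", r"/(lifestyle|entertainment)/"]


def is_eligible(url, include_patterns, exclude_patterns):
    """Assumed rule: matches any include pattern and no exclude pattern."""
    included = any(re.search(p, url) for p in include_patterns)
    excluded = any(re.search(p, url) for p in exclude_patterns)
    return included and not excluded


urls = [
    "http://mysite.com/blog/posts?id=100023",
    "http://mysite.com/myblog/iPhoneX-comparison",
    "http://mysite.com/articles/entertainment/movie-reviews/",
    "http://mysite.com/features/out_of_entertainment_ideas.html",
]
for url in urls:
    print(url, is_eligible(url, INCLUDE, EXCLUDE))
```

The slashes on both sides of each directory name are what keep /myblog/ and out_of_entertainment_ideas.html from being caught by the excludes.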

Product Detail Pages

Match on any product detail page:

Regex:

/p/.+\.html

Note:

For this example, the product detail page name always follows /p/ in the URL and always has the html extension. The .+ is a wildcard that looks for at least one character following the /p/ directory. The period before html is escaped with a backslash (\), making it a literal character.

Type:

Include or Exclude

Valid:

http://mysite.com/p/mens-black-suit.html?id=100023

Valid:

http://mysite.com/p/100023.html?desc=mens_black_suit

Invalid:

http://mysite.com/p/mens_black_suit_1000123456

Invalid:

http://mysite.com/products/mens/1000123456.html

Match on any page at least three levels deep under the products directory:

Regex:

/products/.+/.+/.+

Note:

For this example, each of the three levels below the products directory must contain at least one character (empty levels are not allowed). Because .+ can also match a slash, pages on deeper levels match as well. Putting this in the Exclude List prevents all pages three or more levels below the products directory from being visited.

Type:

Include or Exclude

Valid:

http://mysite.com/products/mens/suits/100034256.htm

Invalid:

http://mysite.com/products/mens/100034256.htm

Invalid:

http://mysite.com/products/mens/shoes/
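
The two product patterns above can be exercised against the listed URLs. This is a minimal sketch, assuming Python's re module matches these patterns the same way ObservePoint does; PRODUCT_DETAIL and DEEP_PRODUCT are hypothetical names used only here.

```python
import re

# Patterns copied from the Product Detail Pages examples above.
PRODUCT_DETAIL = re.compile(r"/p/.+\.html")
DEEP_PRODUCT = re.compile(r"/products/.+/.+/.+")

detail_urls = [
    "http://mysite.com/p/mens-black-suit.html?id=100023",
    "http://mysite.com/p/mens_black_suit_1000123456",
    "http://mysite.com/products/mens/1000123456.html",
]
for url in detail_urls:
    print(url, PRODUCT_DETAIL.search(url) is not None)

deep_urls = [
    "http://mysite.com/products/mens/suits/100034256.htm",
    "http://mysite.com/products/mens/100034256.htm",
    "http://mysite.com/products/mens/shoes/",
    # .+ may span slashes, so deeper pages are found too.
    "http://mysite.com/products/mens/suits/formal/100.htm",
]
for url in deep_urls:
    print(url, DEEP_PRODUCT.search(url) is not None)
```

The trailing-slash URL is rejected because the final .+ needs at least one character after the last slash.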

Page Types

Match any forms page that has an application form, indicated by a type parameter immediately following the question mark:

Regex:

/forms\?type=.+&?.*

Note:

Matches any forms page with a type query string parameter. The parameter cannot be blank. If the query string has other parameters, the type parameter must be first. The backslash (\) is an escape character that turns the question mark into a literal character instead of a regex operator.

Type:

Include or Exclude

Valid:

http://mysite.com/forms?type=car_loan

Invalid:

http://mysite.com/forms?customer=1&type=car_loan&zip=94003

Invalid:

http://mysite.com/forms/personal/loans/application.html
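
The behavior described in the note, including the rule that the type parameter cannot be blank, can be confirmed with a quick check. The sketch assumes Python's re module as a stand-in for ObservePoint's engine, and has_leading_type is a hypothetical helper name.

```python
import re

# Pattern copied from the example above: type must be the first parameter.
FORMS_TYPE_FIRST = re.compile(r"/forms\?type=.+&?.*")


def has_leading_type(url):
    """Return True if the URL has a non-blank type parameter right after '?'."""
    return FORMS_TYPE_FIRST.search(url) is not None


urls = [
    "http://mysite.com/forms?type=car_loan",
    "http://mysite.com/forms?type=",
    "http://mysite.com/forms?customer=1&type=car_loan&zip=94003",
    "http://mysite.com/forms/personal/loans/application.html",
]
for url in urls:
    print(url, has_leading_type(url))
```

Without the backslash, the question mark would make the preceding s optional instead of matching a literal ? in the URL.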

Match any page in the forms directory where the type parameter is located anywhere in the query string:

Regex:

/forms\?(\w+=\w*&)*type=\w+

Note:

The question mark is literal because it is escaped with a backslash (\). The group (\w+=\w*&)* matches zero or more other parameters that come before type, so type can appear anywhere in the query string. The \w matches a word character (a letter, digit, or underscore), and the asterisk (*) means zero or more of the preceding group.

Type:

Include or Exclude

Valid:

http://mysite.com/forms?type=car_loan

Valid:

http://mysite.com/forms?customer=1&type=car_loan&zip=94003

Invalid:

http://mysite.com/forms/personal/loans/application.html

Page Date Ranges

Match on any blog page posted during the month of May:

Regex:

/blog/2018/05/\d\d/.+

Note:

For this example, the blog posts are organized by date in the URL. The day of the month must be two digits. Not typically used for excluding.

Type:

Include

Valid:

http://mysite.com/blog/2018/05/01/tesla_model_3_review

Valid:

http://mysite.com/blog/2018/05/29/test_drive_tesla_model_3

Invalid:

http://mysite.com/blog/2018/05/1/tesla_model_3_review

Invalid:

http://mysite.com/blog/2018/12/01/tesla_model_3_review

Match on any article posted in January or February:

Regex:

/articles/(jan|feb)/.+

Note:

In this example, the pipe (|) is an or operator, meaning either value is acceptable. Not typically used for excluding.

Type:

Include

Valid:

http://mysite.com/articles/jan/tesla_model_3_review

Valid:

http://mysite.com/articles/feb/tesla_model_3_review

Invalid:

http://mysite.com/articles/mar/tesla_model_3_review
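
When the list of months changes often, the alternation can be generated rather than typed by hand. The following sketch assumes Python's re module and a hypothetical articles_in_months helper; it only illustrates how the pipe-separated group is assembled and is not an ObservePoint feature.

```python
import re

# Pattern copied from the May 2018 blog example above.
MAY_2018_POSTS = re.compile(r"/blog/2018/05/\d\d/.+")


def articles_in_months(months):
    """Build an /articles/(a|b|...)/.+ pattern from month abbreviations."""
    alternation = "|".join(re.escape(m) for m in months)
    return re.compile(rf"/articles/({alternation})/.+")


JAN_FEB = articles_in_months(["jan", "feb"])
print(JAN_FEB.pattern)

urls = [
    "http://mysite.com/articles/jan/tesla_model_3_review",
    "http://mysite.com/articles/mar/tesla_model_3_review",
]
for url in urls:
    print(url, JAN_FEB.search(url) is not None)

# A single-digit day is not found because \d\d requires two digits.
print(MAY_2018_POSTS.search("http://mysite.com/blog/2018/05/1/tesla_model_3_review"))
```

The generated pattern string can then be pasted into the Include List like any other regex.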

Exclude a Logout Link

If the Audit logs in to a site, prevent it from visiting any page that would log it out:

Regex:

/logout

Note:

Any link with /logout in the path will not be visited. In these examples, valid URLs are excluded from the Audit. Not typically used for including.

Type:

Exclude

Valid:

http://mysite.com/logout/

Valid:

http://mysite.com/account/logout-instructions.html

Invalid:

http://mysite.com/help?articleid=logout-instructions

Detecting PII

To find

Use this RegEx

Example of match

Email addresses

^[\w\.=-]+@[\w\.-]+\.[\w]{2,3}$

U.S. Social Security numbers

\b(?!000|666|9\d{2})([0-8]\d{2}|7([0-6]\d))([-]?|\s{1})(?!00)\d\d\3(?!0000)\d{4}\b

513-84-7329

IPV4 addresses

^\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}$

192.168.1.1

Dates in MM/DD/YYYY format

^(1[0-2]|[0]?[1-9])[\/-]([3][01]|[12]\d|[0]?[1-9])[\/-](\d{4}|\d{2})$

05/05/2018

MasterCard numbers

^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}|27[01][0-9]|2720)[0-9]{12}$

5258704108753590

Visa card numbers

\b([4]\d{3}[\s]\d{4}[\s]\d{4}[\s]\d{4}|[4]\d{3}[-]\d{4}[-]\d{4}[-]\d{4}|[4]\d{3}[.]\d{4}[.]\d{4}[.]\d{4}|[4]\d{3}\d{4}\d{4}\d{4})\b

4563-7568-5698-4587

American Express card numbers

^3[47][0-9]{13}$

345835478586821

U.S. ZIP codes and Canadian postal codes

^((\d{5}-\d{4})|(\d{5})|([A-Z]\d[A-Z]\s\d[A-Z]\d))$

97589

File paths

\\[^\\]+$

\\fs1\shared

URLs

(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

Source - Netwrix
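
Several of the PII patterns above can be combined to label captured values. This is a sketch under stated assumptions: it uses Python's re module rather than ObservePoint's engine, it covers only the IPv4, MasterCard, and Visa patterns, and the PII_PATTERNS dictionary and classify helper are hypothetical names introduced for illustration.

```python
import re

# Patterns copied from the table above (Visa pattern joined onto one line).
PII_PATTERNS = {
    "ipv4": re.compile(r"^\d{1,3}[.]\d{1,3}[.]\d{1,3}[.]\d{1,3}$"),
    "mastercard": re.compile(
        r"^(?:5[1-5][0-9]{2}|222[1-9]|22[3-9][0-9]|2[3-6][0-9]{2}"
        r"|27[01][0-9]|2720)[0-9]{12}$"
    ),
    "visa": re.compile(
        r"\b([4]\d{3}[\s]\d{4}[\s]\d{4}[\s]\d{4}"
        r"|[4]\d{3}[-]\d{4}[-]\d{4}[-]\d{4}"
        r"|[4]\d{3}[.]\d{4}[.]\d{4}[.]\d{4}"
        r"|[4]\d{3}\d{4}\d{4}\d{4})\b"
    ),
}


def classify(value):
    """Return the names of all PII patterns found in the value."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(value)]


for value in ["192.168.1.1", "5258704108753590", "4563-7568-5698-4587", "hello"]:
    print(value, classify(value))
```

These patterns check format only. For example, the IPv4 pattern also accepts 999.999.999.999, so a match suggests possible PII rather than confirming it.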
