Common Regular Expressions for ObservePoint

A regular expression finds patterns in text. It is used in ObservePoint to define Include and Exclude Lists and to define what values to look for in custom rules. Use the ObservePoint Regular Expression Tester to try out and modify the following examples or to create your own.

Simple Regular Expressions for Rules

You can use regular expressions when creating rules to look for patterns in values that are captured. Below are some common regular expressions:

Days of the Week
Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday

Timestamp, matching something like 10:12PM or 02:05AM (with leading zero)
^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$

Timestamp, matching something like 10:12PM or 2:05AM (no leading zero)
^([1-9]|10|11|12):([0-5]\d)(AM|PM)$

Timestamp matching 24 Hour Time
^([01]\d|2[0-3]):([0-5][0-9])

Timestamp matching something like 14:12|Saturday
^([01]\d|2[0-3]):?([0-5]\d)\|(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)

Any 400-level status code, such as 404
(4\d\d)
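
The timestamp patterns above can also be checked outside the tester with a short script. The sketch below uses Python's re module, which is an assumption: ObservePoint's own regex engine may differ in small ways, and the sample values are made up for illustration.

```python
import re

# Patterns copied from the examples above.
TWELVE_HOUR_PADDED = re.compile(r"^(0[1-9]|10|11|12):([0-5]\d)(AM|PM)$")
TWELVE_HOUR_UNPADDED = re.compile(r"^([1-9]|10|11|12):([0-5]\d)(AM|PM)$")
TIME_AND_DAY = re.compile(
    r"^([01]\d|2[0-3]):?([0-5]\d)\|"
    r"(Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)"
)

def matches(pattern, value):
    """Return True if the pattern is found in the value."""
    return pattern.search(value) is not None

# Hypothetical captured values, for illustration only.
print(matches(TWELVE_HOUR_PADDED, "02:05AM"))    # True
print(matches(TWELVE_HOUR_PADDED, "2:05AM"))     # False: leading zero required
print(matches(TWELVE_HOUR_UNPADDED, "2:05AM"))   # True
print(matches(TIME_AND_DAY, "14:12|Saturday"))   # True
print(matches(TIME_AND_DAY, "25:12|Saturday"))   # False: 25 is not a valid hour
```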

Audit Include and Exclude Examples

Here are some examples for including or excluding pages in an audit using both paths and query string parameters.

Note: In the examples below, mysite.com and the directory names are placeholders for your own text. For example, replace mysite.com in http://mysite.com with your own domain.

Default Include List

Audits have two default regular expressions for the Include List (sometimes called the Include Filter), depending on whether the audit has only a single Starting URL defined or multiple ones defined.

For a single Starting URL, the default regular expression makes any page on the starting page's domain, or on any of its subdomains, eligible to be visited, like this:

Regex: ^https?://([^/:\?]*\.)?mysite.com
Note: Matches pages on the primary domain and on any subdomain separated from it by a period (.). The s in https is optional. The period in mysite.com is not escaped, so it matches any single character; write mysite\.com for a strict match. Do not use this for excluding.
Type: Include
Valid: http://mysite.com
Valid: https://mysite.com
Valid: http://www.mysite.com/home
Valid: https://dev.mysite.com/home
Valid: http://my.mysite.com/products/products_and_services.html
Invalid: http://anothersite.com/
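
To adapt this pattern to another domain, the domain can be inserted programmatically. The following is a minimal sketch in Python, not ObservePoint's own implementation: default_include_pattern is a hypothetical helper, and it escapes the periods in the domain so they are treated as literal characters.

```python
import re

def default_include_pattern(domain):
    """Build a single-Starting-URL style Include pattern (hypothetical helper)."""
    # re.escape turns "mysite.com" into "mysite\.com" so the period is literal.
    return re.compile(r"^https?://([^/:\?]*\.)?" + re.escape(domain))

pattern = default_include_pattern("mysite.com")

urls = [
    "http://mysite.com",
    "https://dev.mysite.com/home",
    "http://my.mysite.com/products/products_and_services.html",
    "http://anothersite.com/",
]
for url in urls:
    # Prints each URL with True (eligible) or False (not eligible).
    print(url, pattern.search(url) is not None)
```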

For multiple Starting URLs, the default Include List allows any valid URL to be scanned, even URLs from different primary domains:

^https?://.*

This matches any URL regardless of primary domain; every URL is valid.

Site Sections

Match on any page in the blog directory:

Regex: /blog/
Note: Any page with /blog/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an audit.
Type: Include or Exclude
Valid: http://mysite.com/blog/
Valid: http://mysite.com/blog/posts?id=100023
Valid: http://mysite.com/blog/reviews/iPhoneX-review
Invalid: http://mysite.com/myblog/iPhoneX-comparison

Match on any page in either the lifestyle or entertainment directories:

Regex: /(lifestyle|entertainment)/
Note: Any page with /lifestyle/ or /entertainment/ in the path will be matched. If placed in the Exclude List, the valid URLs in the following examples will not be visited during an audit.
Type: Include or Exclude
Valid: http://mysite.com/entertainment/
Valid: http://mysite.com/lifestyle/looking-for-spring.html
Valid: http://mysite.com/articles/entertainment/movie-reviews/
Invalid: http://mysite.com/features/out_of_entertainment_ideas.html

Product Detail Pages

Match on any product detail page:

Regex: /p/.+\.html
Note: For this example, the product detail page name always follows /p/ in the URL and always has the html extension. The .+ is a wildcard that requires at least one character after the /p/ directory. The period before html is escaped with a backslash (\), making it a literal character.
Type: Include or Exclude
Valid: http://mysite.com/p/mens-black-suit.html?id=100023
Valid: http://mysite.com/p/100023.html?desc=mens_black_suit
Invalid: http://mysite.com/p/mens_black_suit_1000123456
Invalid: http://mysite.com/products/mens/1000123456.html

Match on any page three levels deep under the products directory:

Regex: /products/.+/.+/.+
Note: For this example, at least one character must be found at each of three levels below the products directory (blanks are not allowed). Putting this in the Exclude List prevents pages three levels below the products directory from being visited.
Type: Include or Exclude
Valid: http://mysite.com/products/mens/suits/100034256.htm
Invalid: http://mysite.com/products/mens/100034256.htm
Invalid: http://mysite.com/products/mens/shoes/
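
Because .+ matches any character, including a slash, the pattern above also matches pages deeper than three levels. The sketch below, written in Python as an assumption about how you might test this, compares it with a stricter variant that is not from the original examples: [^/]+ cannot cross a slash, and $ requires the URL to end at the third level. The four-level URL is made up for illustration.

```python
import re

# Pattern from the example above.
LOOSE = re.compile(r"/products/.+/.+/.+")
# Stricter variant (an assumption, not part of the original examples).
STRICT = re.compile(r"/products/[^/]+/[^/]+/[^/]+$")

three_levels = "http://mysite.com/products/mens/suits/100034256.htm"
four_levels = "http://mysite.com/products/mens/suits/formal/100034256.htm"
two_levels = "http://mysite.com/products/mens/100034256.htm"

for url in (three_levels, four_levels, two_levels):
    loose_hit = LOOSE.search(url) is not None
    strict_hit = STRICT.search(url) is not None
    print(url, loose_hit, strict_hit)
```

Use the stricter form only if pages below the third level should be treated differently from pages exactly at it.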

Page Types

Match any page in the forms directory if it has an application form, indicated by a type parameter immediately following the question mark:

Regex: /forms\?type=.+&?.*
Note: Matches any page with a type query string parameter. The parameter cannot be blank. If the query string has other parameters, the type parameter must be first. The backslash (\) is an escape character that turns the question mark into a literal character instead of regex syntax.
Type: Include or Exclude
Valid: http://mysite.com/forms?type=car_loan
Invalid: http://mysite.com/forms?customer=1&type=car_loan&zip=94003
Invalid: http://mysite.com/forms/personal/loans/application.html

Match any page in the forms directory where the type parameter is located anywhere in the query string:

Regex: /forms\?(.*&)?type=[^&]+
Note: The first question mark is literal because it is escaped with a backslash (\). The second question mark is regex syntax meaning 0 or 1 of the preceding group, so (.*&)? allows other parameters to come before type. The [^&]+ requires the type value to contain at least one character.
Type: Include or Exclude
Valid: http://mysite.com/forms?type=car_loan
Valid: http://mysite.com/forms?customer=1&type=car_loan&zip=94003
Invalid: http://mysite.com/forms/personal/loans/application.html
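
One way to cross-check which URLs ought to match, independently of any regex, is to parse the query string and look for the parameter directly. The sketch below uses Python's standard urllib.parse module; has_type_param is a hypothetical helper for checking expectations, not something ObservePoint provides.

```python
from urllib.parse import urlsplit, parse_qs

def has_type_param(url):
    """Return True if the URL path ends in /forms and has a non-blank type parameter."""
    parts = urlsplit(url)
    # parse_qs drops parameters with blank values by default.
    params = parse_qs(parts.query)
    return parts.path.endswith("/forms") and "type" in params

print(has_type_param("http://mysite.com/forms?type=car_loan"))                          # True
print(has_type_param("http://mysite.com/forms?customer=1&type=car_loan&zip=94003"))     # True
print(has_type_param("http://mysite.com/forms/personal/loans/application.html"))        # False
```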

Page Date Ranges

Match on any blog page posted in May 2018:

Regex: /blog/2018/05/\d\d/.+
Note: For this example, the blog posts are organized by date in the URL path. The day of the month must be two digits. Not typically used for excluding.
Type: Include
Valid: http://mysite.com/blog/2018/05/01/tesla_model_3_review
Valid: http://mysite.com/blog/2018/05/29/test_drive_tesla_model_3
Invalid: http://mysite.com/blog/2018/05/1/tesla_model_3_review
Invalid: http://mysite.com/blog/2018/12/01/tesla_model_3_review

Match on any article posted in January or February:

Regex: /articles/(jan|feb)/.+
Note: In this example, the pipe (|) is an or operator, meaning either value is acceptable. Not typically used for excluding.
Type: Include
Valid: http://mysite.com/articles/jan/tesla_model_3_review
Valid: http://mysite.com/articles/feb/tesla_model_3_review
Invalid: http://mysite.com/articles/mar/tesla_model_3_review
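
When the set of months changes often, the alternation can be generated rather than typed by hand. This is a minimal sketch in Python, where month_pattern is a hypothetical helper and the /articles/ path layout is taken from the example above.

```python
import re

def month_pattern(months):
    """Build an /articles/(a|b|...)/.+ pattern from a list of month names (hypothetical helper)."""
    # re.escape guards against regex metacharacters in the names.
    alternation = "|".join(re.escape(month) for month in months)
    return re.compile(r"/articles/(" + alternation + r")/.+")

pattern = month_pattern(["jan", "feb"])
print(pattern.pattern)  # /articles/(jan|feb)/.+

print(pattern.search("http://mysite.com/articles/feb/tesla_model_3_review") is not None)  # True
print(pattern.search("http://mysite.com/articles/mar/tesla_model_3_review") is not None)  # False
```

The printed pattern string can then be pasted into the Include List.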

Exclude a Logout Link

If the audit logs in to a site, exclude any page that would log it out:

Regex: /logout
Note: Any link with /logout in the path will not be visited. In these examples, valid URLs are excluded from the audit. Not typically used for including.
Type: Exclude
Valid: http://mysite.com/logout/
Valid: http://mysite.com/account/logout-instructions.html
Invalid: http://mysite.com/help?articleid=logout-instructions
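
The interaction between the two lists can be sketched as a small filter. The Python below is illustrative only: is_eligible is a hypothetical function, and the rule that an Exclude match overrides an Include match is an assumption made for this sketch rather than a documented description of how ObservePoint evaluates the lists.

```python
import re

def is_eligible(url, include_patterns, exclude_patterns):
    """Return True if the URL matches an Include pattern and no Exclude pattern."""
    included = any(re.search(p, url) for p in include_patterns)
    excluded = any(re.search(p, url) for p in exclude_patterns)
    # Assumption: an Exclude match takes precedence over an Include match.
    return included and not excluded

include_list = [r"^https?://([^/:\?]*\.)?mysite\.com"]
exclude_list = [r"/logout"]

for url in [
    "http://mysite.com/logout/",
    "http://mysite.com/account/logout-instructions.html",
    "http://mysite.com/help?articleid=logout-instructions",
    "http://anothersite.com/",
]:
    print(url, is_eligible(url, include_list, exclude_list))
```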