
Audit Include & Exclude Lists

Written by Product Enablement

Overview

Audit Include and Exclude lists define which pages an Audit may scan after it visits the Starting URLs. Items in the Include List restrict the scan to pages that match them. Any page that matches an item in the Exclude List is not scanned.

Note: Sometimes these lists are also referred to as filters. Items in the Include and Exclude lists can be full URLs, partial URLs, or regular expressions that match a valid page.

Order of Precedence

Starting URLs take precedence over everything else and will always be visited during an Audit, even if a URL matches an item in the Exclude List. Starting URLs are always visited before any other URLs.

Exclude items override Include items: a URL that matches both lists is not eligible to be visited.

Include URLs must be found on a starting page; otherwise, they cannot be discovered and won't be visited.
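The precedence rules above can be sketched as a small decision function. This is an illustration only, not ObservePoint's implementation: the function name `is_eligible` is hypothetical, every list item is treated as a regular expression applied with a partial match, and link discovery is not modeled.

```python
import re

def is_eligible(url, starting_urls, include_patterns, exclude_patterns):
    """Return True if `url` may be visited under the precedence rules above.

    Assumption: each Include/Exclude item is treated as a regular expression
    and matched anywhere in the URL (partial match).
    """
    # 1. Starting URLs are always visited, even if an Exclude item matches.
    if url in starting_urls:
        return True
    # 2. Exclude items override Include items.
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    # 3. Otherwise the URL must match at least one Include item.
    return any(re.search(p, url) for p in include_patterns)

# Hypothetical configuration for illustration.
starts = {"https://example.com/private/start"}
include = [r"example\.com"]
exclude = [r"/private/"]

print(is_eligible("https://example.com/private/start", starts, include, exclude))
print(is_eligible("https://example.com/private/other", starts, include, exclude))
print(is_eligible("https://example.com/blog", starts, include, exclude))
```

In this sketch the first URL is eligible because it is a Starting URL, the second is rejected by the Exclude item, and the third passes because it matches the Include item and no Exclude item.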

Starting URLs

The Starting URL list can be one or more URLs that are always visited before any other URLs. Any links discovered from the starting pages are eligible to be visited, subject to the Include and Exclude filters. If an Exclude item matches a Starting URL, the Exclude item is ignored for that URL and the Starting URL is still visited.

Include List

The Include List limits what pages are eligible to be scanned during an Audit. Each item can be a fully qualified or partial URL, or a regular expression matching a full or partial URL.

Adding any URL or partial URL automatically limits what pages are eligible to be scanned in the Audit. However, there is no guarantee that all the eligible pages or directories listed will actually be visited.

Default Include Filter

The default Include List allows any page from the primary domain of the Starting URL to be scanned, including subdomains. By default it is a modified version of the Starting URL:

^https?://([^/:\?]*\.)?examplesite.com([^.]|$)
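To see what this default pattern accepts, it can be tried against a few URLs with Python's `re` module. The sample URLs below are hypothetical, and the Audit's own regular expression engine may differ from Python's in edge cases, so treat this as a rough check.

```python
import re

# Default Include filter from this article, for the placeholder domain examplesite.com.
DEFAULT_INCLUDE = re.compile(r"^https?://([^/:\?]*\.)?examplesite.com([^.]|$)")

def matches_default(url):
    """Return True if the default Include filter matches `url`."""
    return DEFAULT_INCLUDE.search(url) is not None

# Hypothetical URLs chosen for illustration.
samples = [
    "https://examplesite.com/",           # primary domain
    "http://www.examplesite.com",         # subdomain, no trailing path
    "https://shop.examplesite.com/cart",  # another subdomain
    "https://examplesite.com.evil.org/",  # different domain with a look-alike prefix
    "https://other.com/examplesite.com",  # domain name appears only in the path
    "https://notexamplesite.com/",        # different domain ending in the same text
]

for url in samples:
    verdict = "eligible" if matches_default(url) else "not eligible"
    print(f"{url}: {verdict}")
```

The first three URLs match and the last three do not. The optional group `([^/:\?]*\.)?` is what admits subdomains, and the trailing `([^.]|$)` rejects hosts that continue with another dot after `.com`.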

The Include List can contain exact URLs, partial URLs, or regular expressions.

Typically you won't change anything in this box unless you want to direct your Audits to specific areas of the site. In that case, replace the default value with the directories that you want the Audit to scan.

You can also use the Include List to perform cross-domain auditing, where the Audit starts on one domain and ends on another. To do this, add the domains you want to traverse to the Include List. For any Include List URLs to be found, they must be discovered on a page that is audited.

For complex URL patterns, use ObservePoint's regular expression tester.

Also, refer to the Regular Expressions document for common pattern matching use cases.

Exclude List

The Exclude List prevents URLs from being audited. You may use exact URLs, partial URLs, or regular expressions, just as you would in the Include List. Any URL that matches an item in the Exclude List will not be visited unless it is expressly defined in the Starting URL field.

Note: Exclude List expressions are evaluated against the Initial URL only, not against the Final URL reached after a redirect. Write exclusions to match the Initial URL, because a redirect may change the path or structure of the URL after the initial request.
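The redirect behavior in the note can be illustrated with a short sketch. The function name `is_excluded` and the URLs are hypothetical, and Exclude items are assumed to be regular expressions applied with a partial match.

```python
import re

def is_excluded(initial_url, final_url, exclude_patterns):
    """Check Exclude items against the Initial URL only.

    `final_url` is accepted but deliberately ignored, mirroring the note above:
    the URL reached after a redirect plays no part in the decision.
    """
    return any(re.search(p, initial_url) for p in exclude_patterns)

# Hypothetical redirect: /promo redirects to a login page.
initial = "https://example.com/promo"
final = "https://example.com/login?next=/promo"

print(is_excluded(initial, final, [r"/login"]))  # pattern matches only the Final URL
print(is_excluded(initial, final, [r"/promo"]))  # pattern matches the Initial URL
```

Under these assumptions, an exclusion written for `/login` does not stop the page from being visited, while an exclusion written for `/promo` does.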

Overlapping Rules & Page Limits

A page URL can match more than one rule. For example, suppose you scan example.com with the following rules:

  • Rule 1: example.com: limit of 10 pages

  • Rule 2: example.com/store: limit of 2 pages

Which rule does example.com/store/123 match?

This page counts toward Rule 2 (example.com/store) because that rule is longer.

In general, when a page triggers multiple overlapping rules, it will count toward the rule with the longest character length (the most specific rule).

Another example:

  • Rule 1: /blog/ (Limit: 100 pages) - Length: 6 characters

  • Rule 2: /blog/news/ (Limit: 50 pages) - Length: 11 characters

  • The Scenario: The Audit visits the page https://example.com/blog/news/latest-update.

  • Result: Because both rules match but Rule 2 is longer (more specific), this page will count toward the 50-page limit of Rule 2. It will not consume a slot in Rule 1's 100-page limit.
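The longest-match selection described above can be sketched in a few lines. This is a simplified model: the function name `counting_rule` is hypothetical, and rules are assumed to match as plain substrings of the page URL.

```python
def counting_rule(url, rules):
    """Return the matching rule with the longest character length, or None.

    Assumption: a rule matches when its text appears anywhere in the URL.
    """
    matches = [rule for rule in rules if rule in url]
    return max(matches, key=len, default=None)

# Rules from the example above, mapped to their page limits.
rules = {"/blog/": 100, "/blog/news/": 50}

url = "https://example.com/blog/news/latest-update"
rule = counting_rule(url, rules)
print(rule, len(rule), rules[rule])
```

Both rules match the URL, but `/blog/news/` (11 characters) is longer than `/blog/` (6 characters), so the page counts toward the 50-page limit.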

How do I use overlapping filters and still apply a limit?

You must have a non-zero limit on all of the overlapping rules. See the following examples:

Bad Example (overlapping rules with no limit):

  • Rule 1: /products/ (Limit: 0 / Unlimited)

  • Rule 2: /products/appliances/ (Limit: 200 pages)

  • Result: Even though Rule 2 matches, a page under /products/appliances/ does not count toward Rule 2's 200-page limit, because Rule 1 has a limit of zero (unlimited).

Good Example (no unlimited rules):

  • Rule 1: /products/ (Limit: 800)

  • Rule 2: /products/appliances/ (Limit: 200 pages)

  • Result: The Audit counts a page under /products/appliances/ toward Rule 2 (200-page limit), because Rule 2 is more specific than Rule 1 and neither rule is unlimited.

  • Result: The Audit will visit a maximum of 1,000 pages, with a maximum of 200 pages from /products/appliances/.
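The arithmetic of the good example can be checked with a small simulation. This is a sketch under stated assumptions, not the Audit's actual algorithm: the function name `simulate` and the generated URLs are hypothetical, rules match as plain substrings, a page is skipped once its most specific rule has reached its limit, and rules with a limit of 0 are not modeled.

```python
from collections import Counter

def simulate(urls, limits):
    """Count each URL toward its longest matching rule until that rule is full."""
    counts = Counter()
    visited = []
    for url in urls:
        matches = [rule for rule in limits if rule in url]
        if not matches:
            continue  # URLs that match no rule are left out of this sketch
        rule = max(matches, key=len)  # most specific (longest) rule
        if counts[rule] < limits[rule]:
            counts[rule] += 1
            visited.append(url)
    return visited, counts

# Limits from the good example above.
limits = {"/products/": 800, "/products/appliances/": 200}

# Hypothetical crawl: 500 appliance pages and 1,000 other product pages.
urls = [f"https://example.com/products/appliances/{i}" for i in range(500)]
urls += [f"https://example.com/products/other/{i}" for i in range(1000)]

visited, counts = simulate(urls, limits)
print(len(visited), counts["/products/appliances/"], counts["/products/"])
```

With these inputs the simulation visits 1,000 pages in total: 200 counted toward `/products/appliances/` and 800 counted toward `/products/`.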
