Regular expression constructs

Last updated: Feb 26, 2015

For more information, see the java.util.regex.Pattern class documentation at http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html.

Construct

Matches

Characters

x

The character x

\\

The backslash character
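A minimal sketch of the two literal constructs above, assuming Java 6 or later. The class and method names are hypothetical; only `Pattern.matches` comes from java.util.regex. It shows that in Java source every backslash in a regex must itself be escaped, so the regex \\ (one backslash) is written as the string literal "\\\\".

```java
import java.util.regex.Pattern;

public class LiteralCharacters {
    // The literal x matches only the character 'x'.
    static boolean matchesX(String input) {
        return Pattern.matches("x", input);
    }

    // The regex \\ matches a single backslash. In a Java string literal
    // each backslash is escaped again, so the regex is written "\\\\".
    static boolean isSingleBackslash(String input) {
        return Pattern.matches("\\\\", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesX("x"));             // true
        System.out.println(matchesX("xx"));            // false: matches() needs the whole input to match
        System.out.println(isSingleBackslash("\\"));   // true: "\\" is a one-character string
        System.out.println(isSingleBackslash("\\\\")); // false: two backslashes
    }
}
```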

Character classes

[abc]

a, b, or c (simple class)

[^abc]

Any character except a, b, or c (negation)

[a-zA-Z]

a through z or A through Z, inclusive (range)

[a-d[m-p]]

a through d, or m through p: [a-dm-p] (union)

[a-z&&[def]]

d, e, or f (intersection)

[a-z&&[^bc]]

a through z, except for b and c: [ad-z] (subtraction)

[a-z&&[^m-p]]

a through z, and not m through p: [a-lq-z] (subtraction)
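To check the union, intersection, and subtraction rows against their flat equivalents, the sketch below lists which lowercase ASCII letters each class accepts. The class name `CharacterClasses` and the helper `lettersMatching` are hypothetical illustrations; restricting the test alphabet to a-z is an assumption made to keep the output short.

```java
import java.util.regex.Pattern;

public class CharacterClasses {
    // Returns the letters from a to z that the given character class matches.
    static String lettersMatching(String characterClass) {
        Pattern p = Pattern.compile(characterClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersMatching("[abc]"));         // abc
        System.out.println(lettersMatching("[^abc]"));        // defghijklmnopqrstuvwxyz
        System.out.println(lettersMatching("[a-d[m-p]]"));    // abcdmnop (union)
        System.out.println(lettersMatching("[a-z&&[def]]"));  // def (intersection)
        System.out.println(lettersMatching("[a-z&&[^bc]]"));  // adefghijklmnopqrstuvwxyz (subtraction)
        System.out.println(lettersMatching("[a-z&&[^m-p]]")); // abcdefghijklqrstuvwxyz (subtraction)
    }
}
```

Each nested form produces the same letters as the flat class given in the table, for example [a-z&&[^bc]] and [ad-z].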

Predefined character classes

.

Any character (matches line terminators only when the DOTALL flag is set)

\d

A digit: [0-9]

\D

A non-digit: [^0-9]

\s

A whitespace character: [ \t\n\x0B\f\r]

\S

A non-whitespace character: [^\s]

\w

A word character: [a-zA-Z_0-9]

\W

A non-word character: [^\w]
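A short sketch of the predefined classes in use, assuming a hypothetical `findAll` helper built on `Matcher.find()`; the sample strings are invented. Note that \d is written "\\d" in Java source, and that . stops at a line terminator unless `Pattern.DOTALL` is passed to `Pattern.compile`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClasses {
    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "order_42 shipped\t3 items";
        System.out.println(findAll("\\d+", text));       // [42, 3]
        System.out.println(findAll("\\w+", text));       // [order_42, shipped, 3, items]
        System.out.println(findAll("\\s", text).size()); // 3 (two spaces and a tab)

        // By default . does not match a line terminator; DOTALL changes that.
        String twoLines = "a\nb";
        System.out.println(Pattern.matches("a.b", twoLines));                                   // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher(twoLines).matches()); // true
    }
}
```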

java.lang.Character classes (simple Java character type)

\p{javaLowerCase}

Equivalent to java.lang.Character.isLowerCase()

\p{javaUpperCase}

Equivalent to java.lang.Character.isUpperCase()

\p{javaWhitespace}

Equivalent to java.lang.Character.isWhitespace()

\p{javaMirrored}

Equivalent to java.lang.Character.isMirrored()
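The sketch below, with hypothetical class and helper names, contrasts the ASCII-only range [a-z] with \p{javaLowerCase} on a word containing an accented letter, and checks that two of these classes agree with their `Character` methods on a few sample characters. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

public class JavaCharacterClasses {
    static final Pattern LOWER = Pattern.compile("\\p{javaLowerCase}");
    static final Pattern WHITESPACE = Pattern.compile("\\p{javaWhitespace}");

    // True if each regex class gives the same answer as its Character method for every char of s.
    static boolean agreesWithCharacter(String s) {
        for (char c : s.toCharArray()) {
            String one = String.valueOf(c);
            if (LOWER.matcher(one).matches() != Character.isLowerCase(c)) return false;
            if (WHITESPACE.matcher(one).matches() != Character.isWhitespace(c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "cafe" ending in e with an acute accent
        // [a-z] covers ASCII only, so the accented letter breaks the match.
        System.out.println(Pattern.matches("[a-z]+", cafe));              // false
        // javaLowerCase defers to Character.isLowerCase, which accepts it.
        System.out.println(Pattern.matches("\\p{javaLowerCase}+", cafe)); // true
        System.out.println(agreesWithCharacter("aZ 9\t\u00e9\u00df"));     // true
    }
}
```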

Classes for Unicode blocks and categories

\p{InGreek}

A character in the Greek block (simple block)

\p{Lu}

An uppercase letter (simple category)

\p{Sc}

A currency symbol

\P{InGreek}

Any character except one in the Greek block (negation)

[\p{L}&&[^\p{Lu}]]

Any letter except an uppercase letter (subtraction)
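A minimal sketch exercising the Unicode block and category classes on invented sample text (ASCII letters followed by Greek capital alpha, small beta, and small gamma). The class name `UnicodeClasses` and its `findAll` helper are hypothetical; the main method prints counts and ASCII-only results so the output does not depend on console encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeClasses {
    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello \u0391\u03b2\u03b3"; // "Hello " plus three Greek letters
        System.out.println(findAll("\\p{InGreek}+", text).size());          // 1 (the Greek run)
        System.out.println(findAll("\\p{Lu}", text).size());               // 2 (H and capital alpha)
        System.out.println(findAll("\\P{InGreek}+", text));                // [Hello ] (the space is outside the block too)
        System.out.println(findAll("[\\p{L}&&[^\\p{Lu}]]+", text).size()); // 2 ("ello" and beta-gamma)
        // Dollar, euro, and pound signs all belong to category Sc.
        System.out.println(findAll("\\p{Sc}", "$5, \u20ac4, \u00a33").size()); // 3
    }
}
```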

Boundary matchers

^

The beginning of the input, or of each line when MULTILINE mode is enabled

$

The end of the input, or of each line when MULTILINE mode is enabled
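The effect of `Pattern.MULTILINE` on ^ and $ is easiest to see on multi-line input. This sketch uses an invented three-line log string and a hypothetical `findAll` helper; it is illustrative rather than a recommended log parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatchers {
    // Collects every match of the compiled pattern in input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String log = "ERROR disk full\nINFO ok\nERROR timeout";
        // Without MULTILINE, ^ anchors only at the start of the whole input.
        System.out.println(findAll(Pattern.compile("^ERROR.*"), log));
        // [ERROR disk full]
        // With MULTILINE, ^ also anchors just after each line terminator.
        System.out.println(findAll(Pattern.compile("^ERROR.*", Pattern.MULTILINE), log));
        // [ERROR disk full, ERROR timeout]
        // With MULTILINE, $ also anchors just before each line terminator.
        System.out.println(findAll(Pattern.compile("\\w+$", Pattern.MULTILINE), log));
        // [full, ok, timeout]
    }
}
```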

Greedy quantifiers

X?

X, once or not at all

X*

X, zero or more times

X+

X, one or more times

X{n}

X, exactly n times

X{n,}

X, at least n times

X{n,m}

X, at least n but not more than m times
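One check per quantifier row, plus a case showing what "greedy" means in practice: .+ first consumes as much input as it can and then gives characters back only until the rest of the pattern matches. Class and helper names are hypothetical, and the inputs are chosen purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyQuantifiers {
    // True if regex matches the entire input.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(full("colou?r", "color"));  // true  (u once or not at all)
        System.out.println(full("ab*c", "ac"));        // true  (b zero times)
        System.out.println(full("ab+c", "ac"));        // false (b needs at least one)
        System.out.println(full("\\d{3}", "123"));     // true  (exactly 3 digits)
        System.out.println(full("\\d{2,}", "1"));      // false (at least 2 digits)
        System.out.println(full("\\d{2,4}", "12345")); // false (at most 4 digits)
        // Greedy .+ runs to the end, then backs off just enough for > to match.
        System.out.println(firstMatch("<.+>", "<a><b>")); // <a><b>
    }
}
```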

Logical operators

XY

X followed by Y

X|Y

Either X or Y
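A brief sketch of concatenation and alternation, with a hypothetical class name and invented inputs. It also uses parentheses, which group a subexpression but are not listed in the rows above. Concatenation binds more tightly than |, so ab|cd means (ab)|(cd); parentheses are needed to limit the alternation to a single position.

```java
import java.util.regex.Pattern;

public class LogicalOperators {
    // True if regex matches the entire input.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("gray|grey", "grey")); // true
        System.out.println(full("gr(a|e)y", "gray"));  // true: the alternation is limited to one position
        // ab|cd means (ab)|(cd) ...
        System.out.println(full("ab|cd", "cd"));       // true
        // ... so it rejects "abd", which a(b|c)d accepts.
        System.out.println(full("ab|cd", "abd"));      // false
        System.out.println(full("a(b|c)d", "abd"));    // true
    }
}
```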