Java Regex Cheat Sheet
Java provides the java.util.regex package for pattern matching with regular expressions. Java's regular expression syntax is very similar to Perl's and is easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.
The java.util.regex package primarily consists of the following three classes −
- Pattern Class − A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile() methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
- Matcher Class − A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
- PatternSyntaxException − A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
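The relationship between the first two classes can be sketched as follows. This is a minimal illustration, not code from the original; the class name `RegexClassesSketch` and the helper `firstMatch` are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexClassesSketch {
    // Returns the first substring of input that matches regex, or null if none does.
    static String firstMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // static factory; Pattern has no public constructor
        Matcher matcher = pattern.matcher(input);  // the Matcher is obtained from the Pattern
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("o+", "foo bar"));
    }
}
```

With the input shown in `main`, the first match of `o+` is the substring `oo`.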
Capturing Groups
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters 'd', 'o', and 'g'.
Capturing groups are numbered by counting their opening parentheses from the left to the right. In the expression ((A)(B(C))), for example, there are four such groups −
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
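The numbering rule can be checked with a short sketch. The class and method names below are illustrative assumptions; the expression `((A)(B(C)))` is the one discussed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingSketch {
    // Returns group 0 followed by every capturing group, or an empty array if
    // the whole input does not match the regex.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];  // +1 because group 0 is not counted
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
    }
}
```

For the input `ABC`, `groupCount` is 4, group 0 and group 1 are both `ABC`, group 2 is `A`, group 3 is `BC` and group 4 is `C`.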
Example
The following example illustrates how to find a digit string in a given alphanumeric string:
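The original listing is not available here, so what follows is only a sketch of one way to do it. The pattern, the input line and the names `DigitStringSketch` and `firstDigits` are assumptions chosen for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitStringSketch {
    // Group 1: leading non-digits, group 2: the first run of digits, group 3: the rest.
    private static final Pattern PATTERN = Pattern.compile("(\\D*)(\\d+)(.*)");

    // Returns the first digit string in the input, or null if it contains no digit.
    static String firstDigits(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("This order was placed for QT3000! OK?"));
    }
}
```

For the line used in `main`, group 2 captures `3000`.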
Regular Expression Syntax
Here is the table listing down all the regular expression metacharacter syntax available in Java −
Subexpression | Matches |
---|---|
^ | Matches the beginning of the input, or the beginning of each line in MULTILINE mode. |
$ | Matches the end of the input, or the end of each line in MULTILINE mode. |
. | Matches any single character except a line terminator. The DOTALL flag, (?s), allows it to match line terminators as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |
\A | Matches the beginning of the entire string. |
\z | Matches the end of the entire string. |
\Z | Matches the end of the entire string, except for an allowable final line terminator. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a \| b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?>re) | Matches the independent pattern without backtracking. |
\w | Matches a word character. Equivalent to [a-zA-Z_0-9]. |
\W | Matches a nonword character. |
\s | Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches a nonwhitespace character. |
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a nondigit. |
\G | Matches the point where the last match finished. |
\1, \2, etc. | Back-reference to capture group number 1, 2, etc. |
\b | Matches a word boundary. |
\B | Matches a nonword boundary. |
\n, \r, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\Q | Escapes (quotes) all characters up to \E. |
\E | Ends quoting begun with \Q. |
Methods of the Matcher Class
Here is a list of useful instance methods −
Index Methods
Index methods provide useful index values that show precisely where the match was found in the input string −
Sr.No. | Method & Description |
---|---|
1 | public int start() Returns the start index of the previous match. |
2 | public int start(int group) Returns the start index of the subsequence captured by the given group during the previous match operation. |
3 | public int end() Returns the offset after the last character matched. |
4 | public int end(int group) Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |
Study Methods
Study methods review the input string and return a Boolean indicating whether or not the pattern is found −
Sr.No. | Method & Description |
---|---|
1 | public boolean lookingAt() Attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
2 | public boolean find() Attempts to find the next subsequence of the input sequence that matches the pattern. |
3 | public boolean find(int start) Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
4 | public boolean matches() Attempts to match the entire region against the pattern. |
Replacement Methods
Replacement methods are useful methods for replacing text in an input string −
Sr.No. | Method & Description |
---|---|
1 | public Matcher appendReplacement(StringBuffer sb, String replacement) Implements a non-terminal append-and-replace step. |
2 | public StringBuffer appendTail(StringBuffer sb) Implements a terminal append-and-replace step. |
3 | public String replaceAll(String replacement) Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
4 | public String replaceFirst(String replacement) Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
5 | public static String quoteReplacement(String s) Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. |
The start and end Methods
The following example counts the number of times the word 'cat' appears in the input string:
Example
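The original listing is missing, so this is a sketch under stated assumptions: the input string and the names `CatCounterSketch` and `findCats` are illustrative. It uses `\bcat\b` so that only the whole word is counted, and records `start()` and `end()` for each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatCounterSketch {
    // Returns {start, end} for every whole-word occurrence of "cat".
    static List<int[]> findCats(String input) {
        Matcher m = Pattern.compile("\\bcat\\b").matcher(input);
        List<int[]> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        List<int[]> spans = findCats("cat cat cat cattie cat");
        for (int i = 0; i < spans.size(); i++) {
            System.out.println("Match number " + (i + 1)
                    + ": start = " + spans.get(i)[0] + ", end = " + spans.get(i)[1]);
        }
    }
}
```

For this input, four matches are reported, with start and end indices 0 and 3, 4 and 7, 8 and 11, and 19 and 22. The word `cattie` is skipped.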
You can see that this example uses word boundaries to ensure that the letters 'c' 'a' 't' are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred.
The start method returns the start index of the previous match, and the end method returns the index of the last character matched, plus one.
The matches and lookingAt Methods
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.
Both methods always start at the beginning of the input string. Here is the example explaining the functionality −
Example
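The difference can be sketched as follows. The regex, the input and the class name `MatchesVsLookingAtSketch` are illustrative assumptions, not taken from the original listing.

```java
import java.util.regex.Pattern;

class MatchesVsLookingAtSketch {
    // True if the regex matches a prefix of the input.
    static boolean lookingAt(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // True only if the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "foo";
        String input = "fooooooooooooooooo";
        System.out.println("lookingAt(): " + lookingAt(regex, input));
        System.out.println("matches(): " + matches(regex, input));
    }
}
```

Here `lookingAt()` returns true because the input starts with `foo`, while `matches()` returns false because the trailing `o` characters are not covered by the pattern.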
The replaceFirst and replaceAll Methods
The replaceFirst and replaceAll methods replace the text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.
Here is the example explaining the functionality −
Example
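A minimal sketch of both methods on a Matcher follows. The input sentence, the replacement and the class name `ReplaceSketch` are assumptions chosen for the illustration.

```java
import java.util.regex.Pattern;

class ReplaceSketch {
    // Replaces only the first match of regex in input.
    static String replaceFirst(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    // Replaces every match of regex in input.
    static String replaceAll(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(replaceFirst("dog", input, "cat"));
        System.out.println(replaceAll("dog", input, "cat"));
    }
}
```

The first call changes only the first `dog`; the second call changes both, giving `The cat says meow. All cats say meow.`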
The appendReplacement and appendTail Methods
The Matcher class also provides appendReplacement and appendTail methods for text replacement.
Here is the example explaining the functionality −
Example
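The two methods are normally used together in a find loop. The following is a sketch of that loop; the regex, input and the names `AppendReplacementSketch` and `replaceEach` are illustrative assumptions. `StringBuffer` is used because that is the type in the signatures listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementSketch {
    // Replaces every match of regex with a literal replacement, one match at a time.
    static String replaceEach(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Appends the text between the previous match and this one, then the replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);  // appends whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEach("a*b", "aabfooaabfooabfoob", "-"));
    }
}
```

For this input the result is `-foo-foo-foo-`. The loop form is useful when the replacement has to be computed per match, which `replaceAll` with a fixed string cannot do.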
PatternSyntaxException Class Methods
A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong −
Sr.No. | Method & Description |
---|---|
1 | public String getDescription() Retrieves the description of the error. |
2 | public int getIndex() Retrieves the error index. |
3 | public String getPattern() Retrieves the erroneous regular expression pattern. |
4 | public String getMessage() Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern. |
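A short sketch of how these methods might be used when a pattern comes from user input. The class name `SyntaxErrorSketch` and the helper `describe` are assumptions for the example; the exact description text depends on the JDK, so it is not shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorSketch {
    // Returns "valid" if the regex compiles, otherwise a one-line error summary.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "valid";
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("a+"));
        System.out.println(describe("[abc"));  // unclosed character class
    }
}
```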
Java regex is the official Java regular expression API. The term Java regex is an abbreviation of Java regular expression. The Java regex API is located in the `java.util.regex` package, which has been part of standard Java (JSE) since Java 1.4. This Java regex tutorial will explain how to use this API to match regular expressions against text. Although Java regex has been part of standard Java since Java 1.4, this Java regex tutorial covers the Java regex API released with Java 8.
Regular Expressions
A regular expression is a textual pattern used to search in text. You do so by 'matching' the regular expression against the text. The result of matching a regular expression against a text is either:
- A `true`/`false` value specifying if the regular expression matched the text.
- A set of matches, one match for every occurrence of the regular expression found in the text.
For instance, you could use a regular expression to search a Java String for email addresses, URLs, telephone numbers, dates etc. This would be done by matching different regular expressions against the String. The result of matching each regular expression against the String would be a set of matches, one set for each regular expression (each regular expression may match more than once).
I will show you some examples of how to match regular expressions against text with the Java regex API further down this page. But first I will introduce the core classes of the Java regex API in the following section.
Java Regex Core Classes
The Java regex API consists of two core classes. These are `Pattern` and `Matcher`.

The `Pattern` class is used to create patterns (regular expressions). A pattern is a precompiled regular expression in object form (as a `Pattern` instance), capable of matching itself against a text.

The `Matcher` class is used to match a given regular expression (`Pattern` instance) against a text multiple times. In other words, to look for multiple occurrences of the regular expression in the text. The `Matcher` will tell you where in the text (character index) it found the occurrences. You can obtain a `Matcher` instance from a `Pattern` instance.

Both the `Pattern` and `Matcher` classes are covered in detail in their own texts.

Java Regular Expression Example
As mentioned above the Java regex API can either tell you if a regular expression matches a certain String, or return all the matches of that regular expression in the String. The following sections will show you examples of both of these ways to use the Java regex API.
Pattern Example
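The listing for this section is not available, so here is a minimal sketch of the check it walks through. The input text, the class name `PatternMatchesSketch` and the helper `containsHttp` are assumptions; the regular expression `.*http://.*` and the use of `Pattern.matches()` follow the description.

```java
import java.util.regex.Pattern;

class PatternMatchesSketch {
    // Reusable form of the same check, added here for convenience.
    static boolean containsHttp(String text) {
        return Pattern.matches(".*http://.*", text);
    }

    public static void main(String[] args) {
        String text = "This is the text to be searched for occurrences of the http:// pattern.";
        String pattern = ".*http://.*";
        boolean matches = Pattern.matches(pattern, text);
        System.out.println("matches = " + matches);
    }
}
```

Note that `.` does not match line terminators by default, so this check is meant for single-line texts.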
The example in this section uses a regular expression to check if a text contains the substring `http://`.

The `text` variable contains the text to be checked with the regular expression. The `pattern` variable contains the regular expression as a `String`. The regular expression matches all texts which contain zero or more characters (`.*`) followed by the text `http://` followed by zero or more characters (`.*`).

The third line uses the `Pattern.matches()` static method to check if the regular expression (pattern) matches the text. If the regular expression matches the text, then `Pattern.matches()` returns true. If the regular expression does not match the text, `Pattern.matches()` returns false.

The example does not actually check if the found `http://` string is part of a valid URL, with domain name and suffix (.com, .net etc.). The regular expression just checks for an occurrence of the string `http://`.

Matcher Example
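A minimal sketch of locating every occurrence of 'is' with a Matcher. The input text and the names `MatcherFindSketch` and `starts` are illustrative assumptions, not the original listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherFindSketch {
    // Returns the start index of every occurrence of regex in text.
    static List<Integer> starts(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        List<Integer> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(starts("is", "This is his list"));
    }
}
```

For the text `This is his list`, the substring 'is' is found at indices 2, 5, 9 and 13.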
The second Java regex example uses the `Matcher` class to locate multiple occurrences of the substring 'is' inside a text. From the `Pattern` instance a `Matcher` instance is obtained. Via this `Matcher` instance the example finds all occurrences of the regular expression in the text.

Java Regular Expression Syntax
A key aspect of regular expressions is the regular expression syntax. Java is not the only programming language that has support for regular expressions. Most modern programming languages support regular expressions. The syntax each language uses to define regular expressions is not exactly the same, though. Therefore you will need to learn the syntax used by your programming language.
In the following sections of this Java regex tutorial I will give you examples of the Java regular expression syntax, to get you started with the Java regex API and regular expressions in general. The regular expression syntax used by the Java regex API is covered in detail in the text about the Java regular expression syntax.
Matching Characters
The first thing to look at is how to write a regular expression that matches characters against a given text. For instance, the regular expression `http://` will match all strings that are exactly the same as the regular expression. There can be no characters before or after the `http://`, or the regular expression will not match the text. For instance, this regex will match the text `http://`, but not a text that contains characters both before and after the `http://` that is matched against.

Metacharacters
Metacharacters are characters in a regular expression that are interpreted to have special meanings. These metacharacters are:
Character | Description |
---|---|
< | |
> | |
( | |
) | |
[ | |
] | |
{ | |
} | |
\ | |
^ | |
- | |
= | |
$ | |
! | |
\| | |
? | |
* | |
+ | |
. | |
What exactly these metacharacters mean will be explained further down this Java Regex tutorial. Just keep in mind that if you include e.g. a '.' (fullstop) in a regular expression it will not match a fullstop character, but match something else which is defined by that metacharacter (also explained later).
Escaping Characters
As mentioned above, metacharacters in Java regular expressions have a special meaning. If you really want to match these characters in their literal form, and not their metacharacter meaning, you must 'escape' the metacharacter you want to match. To escape a metacharacter you use the Java regular expression escape character, the backslash character. Escaping a character means preceding it with the backslash character. For instance, like this: `\.`

In this example the `.` character is preceded (escaped) by the `\` character. When escaped, the full stop character will actually match a full stop character in the input text. The special metacharacter meaning of an escaped metacharacter is ignored; only its actual literal value (e.g. a full stop) is used.

Java regular expression syntax uses the backslash character as escape character, just like Java Strings do. This gives a little challenge when writing a regular expression in a Java string. Look at this regular expression example: `String regex = "\\.";`

Notice that the regular expression String contains two backslashes after each other, and then a `.`. The reason is that first the Java compiler interprets the two `\` characters as an escaped Java String character. After the Java compiler is done, only one `\` is left, as `\\` means the character `\`. The string thus looks like this: `\.`

Now the Java regular expression interpreter kicks in, and interprets the remaining backslash as an escape character. The following character `.` is now interpreted to mean an actual full stop, not to have the special regular expression meaning it otherwise has. The remaining regular expression thus matches the full stop character and nothing more.

Several characters have a special meaning in the Java regular expression syntax. If you want to match that explicit character and not use it with its special meaning, you need to escape it with the backslash character first. For instance, to match the full stop character, you need to write: `String regex = "\\.";`

To match the backslash character itself, you need to write: `String regex = "\\\\";`

Getting the escaping of characters right in regular expressions can be tricky. For advanced regular expressions you might have to play around with it a while before you get it right.
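The two levels of escaping can be tried out with a small sketch. The class and method names are illustrative assumptions. `Pattern.quote()` is a standard alternative that is not mentioned above: it escapes a whole literal so that none of its characters are treated as metacharacters.

```java
import java.util.regex.Pattern;

class EscapingSketch {
    // "\\." in Java source is the regex \. which matches only a literal full stop.
    static boolean isFullStop(String s) {
        return Pattern.matches("\\.", s);
    }

    // "\\\\" in Java source is the regex \\ which matches a single backslash.
    static boolean isBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    // Pattern.quote() turns an arbitrary string into a regex that matches it literally.
    static boolean matchesLiterally(String literal, String s) {
        return Pattern.matches(Pattern.quote(literal), s);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches(".", "a"));          // unescaped dot: any character
        System.out.println(isFullStop("a"));                    // escaped dot: only a full stop
        System.out.println(isFullStop("."));
        System.out.println(isBackslash("\\"));                  // "\\" is a one-character string
        System.out.println(matchesLiterally("1+1=2", "1+1=2")); // + is not a quantifier here
    }
}
```

The five lines printed are true, false, true, true and true.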
Matching Any Character
So far we have only seen how to match specific characters like 'h', 't', 'p' etc. However, you can also just match any character without regard to what character it is. The Java regular expression syntax lets you do that using the `.` character (period / full stop). Here is an example regular expression that matches any character: `.`

This regular expression matches a single character, no matter what character it is.

The `.` character can be combined with other characters to create more advanced regular expressions. Here is an example: `H.llo`

This regular expression will match any Java string that contains the character 'H' followed by any character, followed by the characters 'llo'. Thus, this regular expression will match all of the strings 'Hello', 'Hallo', 'Hullo', 'Hxllo' etc.
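A quick way to try this is a sketch like the following, where the class name `AnyCharSketch` and the helper `matchesHDotLlo` are illustrative assumptions. It uses `Pattern.matches()`, which requires the whole string to match.

```java
import java.util.regex.Pattern;

class AnyCharSketch {
    // True if s is 'H', then exactly one arbitrary character, then "llo".
    static boolean matchesHDotLlo(String s) {
        return Pattern.matches("H.llo", s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Hello", "Hallo", "Hxllo", "Hllo"}) {
            System.out.println(s + " -> " + matchesHDotLlo(s));
        }
    }
}
```

The first three strings match. `Hllo` does not, because the `.` must consume exactly one character.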
Matching Any of a Set of Characters
Java regular expressions support matching any of a specified set of characters using what is referred to as character classes. Here is a character class example: `H[ae]llo`

The character class (set of characters to match) is enclosed in the square brackets, the `[ae]` part of the regular expression, in other words. The square brackets are not matched, only the characters inside them. The character class will match one of the enclosed characters regardless of which, but no more than one. Thus, the regular expression above will match either of the two strings 'Hallo' or 'Hello', but no other strings. Only an 'a' or an 'e' is allowed between the 'H' and the 'llo'.

You can match a range of characters by specifying the first and the last character in the range with a dash in between. For instance, the character class `[a-z]` will match all characters between a lowercase `a` and a lowercase `z`, both `a` and `z` included.

You can have more than one character range within a character class. For instance, the character class `[a-zA-Z]` will match all letters between `a` and `z` or between `A` and `Z`.

You can also use ranges for digits. For instance, the character class `[0-9]` will match the characters between 0 and 9, both included.

If you want to actually match one of the square brackets in a text, you will need to escape them. Here is how escaping the square brackets looks: `H\[llo`

The `\[` is the escaped left square bracket. This regular expression will match the string 'H[llo'.

If you want to match the square brackets inside a character class, here is how that looks: `H[\[\]]llo`

The character class is this part: `[\[\]]`. The character class contains the two square brackets escaped (`\[` and `\]`). This regular expression will match the strings 'H[llo' and 'H]llo'.
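The two character classes discussed above can be exercised with a sketch such as this one. The class and method names are illustrative assumptions; note the doubled backslashes needed in Java string literals.

```java
import java.util.regex.Pattern;

class CharacterClassSketch {
    // 'H', then exactly one of 'a' or 'e', then "llo".
    static boolean isHalloOrHello(String s) {
        return Pattern.matches("H[ae]llo", s);
    }

    // 'H', then exactly one literal square bracket, then "llo".
    static boolean hasBracketAfterH(String s) {
        return Pattern.matches("H[\\[\\]]llo", s);
    }

    public static void main(String[] args) {
        System.out.println(isHalloOrHello("Hallo"));
        System.out.println(isHalloOrHello("Hullo"));
        System.out.println(hasBracketAfterH("H[llo"));
        System.out.println(hasBracketAfterH("H]llo"));
    }
}
```

This prints true, false, true and true.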
Matching a Range of Characters
The Java regex API allows you to specify a range of characters to match. Specifying a range of characters is easier than explicitly specifying each character to match. For instance, you can match the characters a to z like this: `[a-z]`
This regular expression will match any single character from a to z in the alphabet.
The character classes are case sensitive. To match all characters from a to z regardless of case, you must include both uppercase and lowercase character ranges. Here is how that looks: `[a-zA-Z]`
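Case sensitivity of ranges is easy to verify with a small sketch. The names `RangeSketch`, `isLowercaseLetter` and `isLetter` are illustrative assumptions.

```java
import java.util.regex.Pattern;

class RangeSketch {
    // Exactly one character in the range a to z.
    static boolean isLowercaseLetter(String s) {
        return Pattern.matches("[a-z]", s);
    }

    // Exactly one character in either range, so case no longer matters.
    static boolean isLetter(String s) {
        return Pattern.matches("[a-zA-Z]", s);
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseLetter("q"));
        System.out.println(isLowercaseLetter("Q"));
        System.out.println(isLetter("Q"));
    }
}
```

This prints true, false and true.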
Matching Digits
You can match digits of a number with the predefined character class with the code `\d`. The digit character class corresponds to the character class `[0-9]`.

Since the `\` character is also an escape character in Java, you need two backslashes in the Java string to get a `\d` in the regular expression. Here is how such a regular expression string looks: `String regex = "Hi\\d";`

This regular expression will match strings starting with 'Hi' followed by a digit (`0` to `9`). Thus, it will match the string 'Hi5' but not the string 'Hip'.

Matching Non-digits
Matching non-digits can be done with the predefined character class `\D` (uppercase D). Here is a regular expression containing the non-digit character class: `String regex = "Hi\\D";`

This regular expression will match any string which starts with 'Hi' followed by one character which is not a digit.
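The digit and non-digit classes are complements of each other, as this sketch illustrates. The class and method names are assumptions for the example.

```java
import java.util.regex.Pattern;

class DigitClassSketch {
    // "Hi" followed by exactly one digit.
    static boolean hiDigit(String s) {
        return Pattern.matches("Hi\\d", s);
    }

    // "Hi" followed by exactly one character that is not a digit.
    static boolean hiNonDigit(String s) {
        return Pattern.matches("Hi\\D", s);
    }

    public static void main(String[] args) {
        System.out.println(hiDigit("Hi5") + " " + hiDigit("Hip"));
        System.out.println(hiNonDigit("Hi5") + " " + hiNonDigit("Hip"));
    }
}
```

The first line printed is `true false` and the second is `false true`.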
Matching Word Characters
You can match word characters with the predefined character class with the code `\w`. The word character class corresponds to the character class `[a-zA-Z_0-9]`. For instance, the regular expression `Hi\w` will match any string that starts with 'Hi' followed by a single word character.
Matching Non-word Characters
You can match non-word characters with the predefined character class `\W` (uppercase W). Since the `\` character is also an escape character in Java, you need two backslashes in the Java string to get a `\W` in the regular expression. Here is a regular expression string using the non-word character class: `String regex = "Hi\\W";`
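A sketch comparing the word and non-word character classes follows. The names `WordClassSketch`, `hiWord` and `hiNonWord` are illustrative assumptions.

```java
import java.util.regex.Pattern;

class WordClassSketch {
    // "Hi" followed by exactly one word character (letter, digit or underscore).
    static boolean hiWord(String s) {
        return Pattern.matches("Hi\\w", s);
    }

    // "Hi" followed by exactly one character that is not a word character.
    static boolean hiNonWord(String s) {
        return Pattern.matches("Hi\\W", s);
    }

    public static void main(String[] args) {
        System.out.println(hiWord("Hi_") + " " + hiWord("Hi!"));
        System.out.println(hiNonWord("Hi_") + " " + hiNonWord("Hi!"));
    }
}
```

The first line printed is `true false` and the second is `false true`.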
Boundaries
The Java Regex API can also match boundaries in a string. A boundary could be the beginning of a string, the end of a string, the beginning of a word etc. The Java Regex API supports the following boundaries:
Symbol | Description |
---|---|
^ | The beginning of a line. |
$ | The end of a line. |
\b | A word boundary (where a word starts or ends, e.g. space, tab etc.). |
\B | A non-word boundary. |
\A | The beginning of the input. |
\G | The end of the previous match. |
\Z | The end of the input but for the final terminator (if any). |
\z | The end of the input. |
Some of these boundary matchers are explained below.
Beginning of Line (or String)
The `^` boundary matcher matches the beginning of a line according to the Java API specification. By default, however, it only matches the beginning of the input string; it matches the beginning of each line only when the pattern is compiled with the `Pattern.MULTILINE` flag. For instance, a search for `^` alone gets only a single match, at index 0. Even if the input string contains several line breaks, the `^` character then only matches the beginning of the input string, not the beginning of each line (after each line break).

The beginning of line / string matcher is often used in combination with other characters, to check if a string begins with a certain substring. For instance, the regular expression `^http://` checks if the input string starts with the substring `http://`. On an input string that starts with that substring, it finds a single match of the substring `http://` from index 0 to index 7. Even if the input string had contained more instances of the substring `http://`, they would not have been matched by this regular expression, since the regular expression starts with the `^` character.

End of Line (or String)
The `$` boundary matcher matches the end of a line according to the Java specification. By default, however, it only matches at the end of the input string (or just before a final line terminator); it matches the end of each line only when the pattern is compiled with the `Pattern.MULTILINE` flag. The end of line (or string) matcher is often used in combination with other characters, most commonly to check if a string ends with a certain substring.
Without the MULTILINE flag, such a pattern will find at most a single match, at the end of the input string.
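The behaviour of both line matchers, with and without `Pattern.MULTILINE`, can be observed with a sketch like this. The input text and the names `LineAnchorsSketch` and `count` are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineAnchorsSketch {
    // Counts the matches of regex in input, optionally in MULTILINE mode.
    static int count(String regex, boolean multiline, String input) {
        Pattern p = Pattern.compile(regex, multiline ? Pattern.MULTILINE : 0);
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Line 1\nLine2\nLine3";
        System.out.println(count("^", false, text));  // beginning of input only
        System.out.println(count("^", true, text));   // beginning of every line
        System.out.println(count("$", false, text));  // end of input only
        System.out.println(count("$", true, text));   // end of every line
    }
}
```

For this three-line text the counts are 1, 3, 1 and 3.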
Word Boundaries
The `\b` boundary matcher matches a word boundary, meaning a location in an input string where a word either starts or ends. A regular expression consisting of just `\b` matches all word boundaries found in the input string.

Notice that in a Java string the word boundary matcher is written as `\\b`, with two `\` (backslash) characters. The reason for this is explained in the section about escaping characters. The Java compiler uses `\` as an escape character, and thus requires two backslash characters after each other in order to insert a single backslash character into the string.
Matching `\b` against an input string lists all the locations where a word either starts or ends. The indices of word beginnings point to the first character of the word, whereas the indices of word endings point to the first character after the word.
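The positions can be listed with a short sketch. The input text and the names `WordBoundarySketch` and `wordBoundaries` are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundarySketch {
    // Returns the index of every word boundary in text.
    static List<Integer> wordBoundaries(String text) {
        Matcher m = Pattern.compile("\\b").matcher(text);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());  // \b is zero-width, so start() equals end()
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(wordBoundaries("Mary had a little lamb"));
    }
}
```

For this text the boundaries are at 0, 4, 5, 8, 9, 10, 11, 17, 18 and 22: index 0 is where `Mary` starts and index 4 is the first character after it.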
You can combine the word boundary matcher with other characters to search for words beginning with specific characters. For example, the regular expression `\bl` will find all the locations where a word starts with the letter `l` (lowercase). In fact it will also find the ends of these matches, meaning the last character of the pattern, which is the lowercase `l` letter.

Non-word Boundaries
The `\B` boundary matcher matches non-word boundaries. A non-word boundary is a boundary between two characters which are both part of the same word (or which are both non-word characters). In other words, the character combination is not a word-to-non-word character sequence (which is a word boundary).
In a text with single spaces between words, the match indexes found by `\B` therefore correspond to boundaries between characters within the same word.
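A sketch listing the non-word boundaries of a sample text follows. The input text and the names `NonWordBoundarySketch` and `nonWordBoundaries` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonWordBoundarySketch {
    // Returns the index of every position in text that is not a word boundary.
    static List<Integer> nonWordBoundaries(String text) {
        Matcher m = Pattern.compile("\\B").matcher(text);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(nonWordBoundaries("Mary had a little lamb"));
    }
}
```

For this text the result is 1, 2, 3, 6, 7, 12, 13, 14, 15, 16, 19, 20 and 21. Every index lies inside a word; the one-letter word `a` contributes none.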
Quantifiers
Quantifiers can be used to match characters more than once. There are several types of quantifiers which are listed in the Java Regex Syntax. I will introduce some of the most commonly used quantifiers here.
The first two quantifiers are the `*` and `+` characters. You put one of these characters after the character you want to match multiple times. Here is a regular expression with a quantifier: `Hello*`

This regular expression matches strings with the text 'Hell' followed by zero or more `o` characters. Thus, the regular expression will match 'Hell', 'Hello', 'Helloo' etc.

If the quantifier had been the `+` character instead of the `*` character, the string would have had to end with 1 or more `o` characters.

If you want to match any of the two quantifier characters you will need to escape them. Here is an example of escaping the `+` quantifier: `Hell\+`

This regular expression will match the string 'Hell+'.
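The difference between `*`, `+` and an escaped `+` can be checked with a sketch like this one. The class and method names are illustrative assumptions.

```java
import java.util.regex.Pattern;

class QuantifierSketch {
    // "Hell" followed by zero or more 'o' characters.
    static boolean zeroOrMoreO(String s) {
        return Pattern.matches("Hello*", s);
    }

    // "Hell" followed by one or more 'o' characters.
    static boolean oneOrMoreO(String s) {
        return Pattern.matches("Hello+", s);
    }

    // "Hell" followed by a literal plus sign.
    static boolean literalPlus(String s) {
        return Pattern.matches("Hell\\+", s);
    }

    public static void main(String[] args) {
        System.out.println(zeroOrMoreO("Hell") + " " + zeroOrMoreO("Helloo"));
        System.out.println(oneOrMoreO("Hell") + " " + oneOrMoreO("Helloo"));
        System.out.println(literalPlus("Hell+") + " " + literalPlus("Helll"));
    }
}
```

The three lines printed are `true true`, `false true` and `true false`.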
You can also match an exact number of a specific character using the `{n}` quantifier, where `n` is the number of characters you want to match. Here is an example: `Hello{2}`

This regular expression will match the string 'Helloo' (with two `o` characters at the end).

You can set an upper and a lower bound on the number of characters you want to match, like this: `Hello{2,4}`

This regular expression will match the strings 'Helloo', 'Hellooo' and 'Helloooo'. In other words, the string 'Hell' with 2, 3 or 4 `o` characters at the end.

Logical Operators
The Java Regex API supports a set of logical operators which can be used to combine multiple subpatterns within a single regular expression. The Java Regex API supports two logical operators: The and operator and the or operator.
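Both operators can be sketched as follows before looking at each in turn. The target text, the or-pattern `Ariel|Sleeping Beauty` and the class and method names are assumptions chosen for the illustration.

```java
import java.util.regex.Pattern;

class LogicalOperatorsSketch {
    // Implicit "and": [Cc], then [Ii], then .* must all match, in that order.
    static boolean startsWithCi(String s) {
        return Pattern.matches("[Cc][Ii].*", s);
    }

    // Explicit "or": either alternative may occur anywhere in the string.
    static boolean mentionsArielOrSleepingBeauty(String s) {
        return Pattern.compile("Ariel|Sleeping Beauty").matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "Cindarella and Sleeping Beauty sat in a tree";
        System.out.println(startsWithCi(text));
        System.out.println(mentionsArielOrSleepingBeauty(text));
    }
}
```

Both checks print true for this text. The or check uses `find()`, so the alternatives do not have to cover the whole string.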
The and operator is implicit. If two characters (or other subpatterns) follow each other in a regular expression, that means that both the first and the second subpattern must match the target string. Here is an example of a regular expression that uses an implicit and operator: `[Cc][Ii].*`

Notice the 3 subpatterns `[Cc]`, `[Ii]` and `.*`. Since there are no characters between these subpatterns in the regular expression, there is implicitly an and operator in between them. This means that the target string must match all 3 subpatterns in the given order to match the regular expression as a whole. The string should start with either an uppercase or lowercase `C`, followed by an uppercase or lowercase `I` and then zero or more characters. Any target string that meets these criteria matches the expression.

The or operator is explicit and is represented by the pipe character `|`. A regular expression can contain two subexpressions with the logical or operator in between, for example the subpattern `Ariel` and the subpattern `Sleeping Beauty`. Such a pattern will match either the subpattern `Ariel` or the subpattern `Sleeping Beauty` somewhere in the target string. If the target string contains the text `Sleeping Beauty`, the regular expression matches the target string.

Java String Regex Methods
The Java String class has a few regular expression methods too. I will cover some of those here:
matches()
The Java String `matches()` method takes a regular expression as parameter, and returns `true` if the regular expression matches the string, and `false` if not. For instance, `"one two three two one".matches(".*two.*")` returns `true`.

split()
The Java String `split()` method splits the string into N substrings and returns a String array with these substrings. The `split()` method takes a regular expression as parameter and splits the string at all positions in the string where the regular expression matches a part of the string. The text matched by the regular expression is not returned as part of the returned substrings. Here is a `split()` example: `"one two three two one".split("two")`

This example will return the three strings 'one ', ' three ' and ' one'.
replaceFirst()
The Java String `replaceFirst()` method returns a new String with the first match of the regular expression passed as first parameter replaced with the string value of the second parameter. Here is a `replaceFirst()` example: `"one two three two one".replaceFirst("two", "five")`

This example will return the string 'one five three two one'.
replaceAll()
The Java String `replaceAll()` method returns a new String with all matches of the regular expression passed as first parameter replaced with the string value of the second parameter. Here is a `replaceAll()` example: `"one two three two one".replaceAll("two", "five")`

This example will return the string 'one five three five one'.
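The four String methods covered in this section can be tried together in one sketch. The text `one two three two one` is inferred from the results quoted above; the class and method names and the `matches()` pattern are illustrative assumptions.

```java
import java.util.Arrays;

class StringRegexMethodsSketch {
    static final String TEXT = "one two three two one";

    static boolean containsTwo() {
        return TEXT.matches(".*two.*");         // whole string must match the regex
    }

    static String[] splitOnTwo() {
        return TEXT.split("two");               // the separator itself is dropped
    }

    static String firstReplaced() {
        return TEXT.replaceFirst("two", "five");
    }

    static String allReplaced() {
        return TEXT.replaceAll("two", "five");
    }

    public static void main(String[] args) {
        System.out.println(containsTwo());
        System.out.println(Arrays.toString(splitOnTwo()));
        System.out.println(firstReplaced());
        System.out.println(allReplaced());
    }
}
```

Note that `split("two")` keeps the spaces around the separator, so the parts are `one `, ` three ` and ` one`.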