Text parser
You can use the Text parser tool to parse text for use in other Adobe Workfront Fusion scenario modules. The Text parser does not require a connection.
Access requirements
You must have the following access to use the functionality in this article:
| |
---|---|
Adobe Workfront package | Any |
Adobe Workfront license | New: Standard, or Current: Work or higher |
Adobe Workfront Fusion license** | No Workfront Fusion license requirement. |
Product | New: Or Current: Your organization must purchase Adobe Workfront Fusion. |
For more detail about the information in this table, see Access requirements in documentation.
For information on Adobe Workfront Fusion licenses, see Adobe Workfront Fusion licenses.
Text parser API information
The Text parser connector uses the following:
Text parser modules and their fields
When you configure Text parser modules, Adobe Workfront Fusion displays the fields listed below. A bolded title in a module indicates a required field.
If you see the map button above a field or function, you can use it to set variables and functions for that field. For more information, see Map information from one module to another.
Transformers
Get Elements from HTML
Retrieves the desired elements from HTML code.
Get Elements from text
Parses elements from text based on the given pattern.
HTML to Text
Match Pattern
The Match pattern module enables you to find and extract string elements matching a search pattern from a given text. This module uses regular expressions (also known as regex or regexp).
A regular expression is a sequence of characters in which each character is either a metacharacter, which has a special meaning, or a regular character, which has a literal meaning. Together, these characters and metacharacters define a pattern that can be used to search text. For example, if you wanted to search for names, you could set up a regular expression to search for a pattern that consists of two consecutive words that begin with capital letters. Regular expressions are a powerful tool for searching and manipulating text.
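As a rough illustration of the name-search idea above, here is a minimal sketch in Python. The pattern, the variable names, and the sample sentence are all assumptions made for this example. Python's `re` module is also not the ECMAScript flavor that the module uses, although a pattern this simple behaves the same in both.

```python
import re

# Hypothetical pattern: two consecutive words, each starting with a capital
# letter followed by one or more lowercase letters.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

text = "The ticket was opened by Ada Lovelace and closed by Alan Turing."

# findall returns every non-overlapping full match, in order of appearance.
names = NAME_PATTERN.findall(text)
print(names)
```

A pattern this simple also matches any other pair of capitalized words, so a real scenario would likely need something stricter.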
A discussion of regular expressions is beyond the scope of this article. We recommend the following resources:
- For the complete list of metacharacters, see in MDN web docs.
- For a tutorial on how to create regular expressions, we recommend .
- For experimenting with regular expressions, we recommend the website. Select the ECMAScript (JavaScript) FLAVOR in the left panel.
Replace
Searches the entered text for a specified value or regular expression and replaces the result with the new value.
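The difference between replacing a specified value and replacing a regular expression match can be sketched in Python as follows. This is only an analogy with made-up sample text, not the module's implementation. How many occurrences the module itself replaces depends on its own settings, which this sketch does not model.

```python
import re

text = "Order 123 shipped; order 456 pending."

# Replacing a specified value: the literal string "pending" is swapped out.
plain = text.replace("pending", "delivered")

# Replacing by regular expression: each run of digits becomes "#".
masked = re.sub(r"\d+", "#", text)

print(plain)
print(masked)
```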
Data Scraping
Data scraping, sometimes called web scraping, data extraction, or web harvesting, is the process of collecting data from websites and storing it in your local database or spreadsheets. If you want to scrape data from a website and you are not familiar with regular expressions, you may use a data scraping tool.
If the data scraping tool provides a REST API, you can connect to it via our universal HTTP modules and Webhooks modules.
Text parser troubleshooting
Use this information if you cannot get a text parser to produce any output.
Example:
The module should parse the file type of a document named “filename.docx”, where the extension of the filename varies among DOCX, PDF, and CSV.
The expression that you might choose to use in this case is \..+
This regular expression would normally result in a full match.
However, implementing this expression in your text parser does not result in a match:
The reason is that the “i” value shows only the ordinal number of each match. In this case there are 2 matches, so “i” takes the numerical values 1 and 2. This is useful if you ever need to pass only a particular match through a filter, for example the second one: you can identify that match by its numerical value.
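To make the idea of a match number concrete, the following Python sketch numbers matches starting at 1 and then keeps only the second one. The sample text and pattern are hypothetical, and the code only approximates the way the module reports “i”.

```python
import re

text = "Attachments: report.docx, data.csv"

# Hypothetical pattern: a dot followed by one or more word characters.
pattern = re.compile(r"\.\w+")

# Number the matches starting at 1, similar in spirit to the "i" value.
numbered = [(i, m.group(0)) for i, m in enumerate(pattern.finditer(text), start=1)]

# Keep only the second match, as a filter on i == 2 would.
second = [value for i, value in numbered if i == 2]

print(numbered)
print(second)
```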
To get the matched values, you need to add parentheses (a capturing group) around the part that you want to parse. For example, to extract only “docx” from “filename.docx”, apply the parentheses to the expression used in this case as follows: \.(.+)
This captures the DOCX, places it in a group, and leaves the “.” out of it.
In the output shown in the picture below, the capturing group will match any character (except for line terminators).
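The effect of the capturing group can be reproduced outside Fusion. The sketch below uses Python's `re` module as a stand-in and assumes that the expression without parentheses is `\..+`. It compares the full match with the contents of group 1.

```python
import re

filename = "filename.docx"

# Without parentheses, the full match includes the leading dot.
full = re.search(r"\..+", filename).group(0)

# With a capturing group, group 1 holds only the part after the dot.
ext = re.search(r"\.(.+)", filename).group(1)

print(full)
print(ext)
```

The same two calls behave the same way for the other extensions mentioned in the example, such as “filename.pdf” or “filename.csv”.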
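Another workaround that also uses a regular expression is the replace function. The following expression replaces everything up to and including the last “.” with a single “.”, which leaves only the extension:
{{replace("abcdefghijklmno pqr stuvw xyz.docx"; "/.*\./"; ".")}}
Then replace abcdefghijklmno pqr stuvw xyz.docx with your actual filename variable.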
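For comparison, a Python approximation of this workaround is sketched below. It assumes that the intent is to strip everything up to and including the last dot, using the pattern `.*\.`. The function name `extension_with_dot` is hypothetical, and Fusion's `replace` function is not guaranteed to behave identically to `re.sub`.

```python
import re

def extension_with_dot(filename: str) -> str:
    """Return the extension including its dot, e.g. ".docx"."""
    # Greedy ".*\." consumes everything up to and including the last dot.
    # count=1 limits the substitution to the first match.
    return re.sub(r".*\.", ".", filename, count=1)

print(extension_with_dot("abcdefghijklmno pqr stuvw xyz.docx"))
```

If the filename contains no dot, the pattern does not match and the input is returned unchanged. A scenario may need to handle that case separately.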