Overview of the Expression Engine Language

About the Expression Engine Language

DataFlux Data Management Platform is a powerful suite of data cleansing and data integration software applications. You can use the Data Job Expression node to run a scripting language to process your data sets in ways that are not built into the DataFlux Data Management Studio. The Expression Engine Language (EEL) provides many statements, functions, and variables for manipulating data in DataFlux Data Management Studio and SAS Event Stream Processing Engine.
This reference guide guides you through solutions to address some common EEL tasks. Most examples use the Expression node in the Data Job Editor. All of the examples in this guide also apply to other nodes where EEL is used in DataFlux Data Management Studio.

Introduction to the EEL Operations

Operations in the EEL are processed in symbols. Symbols are similar to variables; they are either fields passed from the node above or are variables declared in the code. EEL code consists of declarations, statements, and labels.
Declarations
Declarations establish the existence of variables in memory. Declared variables are available only after their declaration, so it is better to make all declarations at the beginning of a code segment. Place declarations in the code outside of programmatic constructs, so declaring a variable in a FOR loop is not valid.
Statements
Statements are either assignments (for example: x=y) or keywords (for example: goto) followed by parameters. Statements can be located anywhere in a code segment.
Labels
Labels are named locations in a code segment and can be located anywhere in the code segment. Reserved keywords cannot be used for label names. See Reserved Words.
Pieces of Expression code do not need to be separated by anything, but it is best to use white space and newline characters for readability. Code might include comments. A comment is text within a code segment that is not executed. Comments can be either C style (starts with /* and ends with */) or C++ style (starts with // and continues to the end of a line).
Assume that there are two symbols (output fields from the previous step) named "x" and "y." Here is an example of Expression code:
// Declaration of integer z
integer z
// Assignment statement
z=x+y
This example creates another symbol (field), "z" and sets the value of z to x + y, making z ready for the next step.
A segment of Expression code can also be a straight expression. In the context of the Expression main code area, if a straight expression value is false, then the row is not sent to output. For example, assume the same fields from the previous example, "x" and "y." Consider the following straight expression in the Expression code area:
x<=y
EEL in the Expression code area executes on each record of your data set. Only records where the value of x is less than or equal to y are output to the next node. If you have more than one function in the main code area, the last function to execute determines the overall expression value. For example, if the last function returns a true value, then the entire expression returns true.
The following example includes several of the concepts discussed above:
// declarations
integer x; /*semicolon is safely ignored, and can use C-style comments*/
real y
 
// statements 
x=10 y=12.4 /* more than one statement can be on a line */

Declaration of Symbols

Declarations have the following syntax:
["static"]["private"|"public"]["hidden"|"visible"] type[(*size)] ["array"] identifier
where type is:
"integer"|"string"|"real"|"boolean"|"date"
and identifier is a non-keyword starting with an alphabetic character followed by characters, digits, underscores, or any string delimited by back quotation marks (`). Refer to Reserved Words for a list of reserved keywords.
Note: Size is applicable to the string type only.
Note: The global symbol type is deprecated but is equivalent to static public.
Additional information about declaring symbols:
  • The default symbol type is public.
  • Private symbols are visible only within the code block in which they are declared.
  • Static symbols are public by default. You can also declare a static symbol as private.
  • String symbols can be declared with a size. If you assign a value to a string symbol, it is truncated to this size. If you do not specify a size, 255 is used by default.
    Note: The maximum size is 5 MB. However, this applies only to fields within the Expression node. If the symbol is available in the output, it is truncated to 32 KB when the Expression node passes the value on to the next node. For example, if you define a string of length 45 KB, you can work with it inside the expression node. However, it is truncated to 32 KB on output. To override the maximum size, set the EXPRESS_MAX_STRING_LENGTH setting.
  • The keyword, bytes, qualifies a string size in bytes. See the previous note for additional details.
  • Symbols can be declared anywhere in code except within programmatic constructs, such as loops. It is good practice to declare symbols at the beginning of the code block.
  • In the Data Job Editor of Data Management Studio, all symbols declared in code are available in the output unless they are declared private or hidden.
  • Before code is executed, symbols are reset to null. If the symbols are declared static or have been declared in the pre-processing step, they retain their value from the previous execution.
  • The static keyword can be used when declaring symbols. It specifies that the value of the symbol value is not reset between calls to the expression (between rows read in Data Job Editor). This replaces the global keyword. The pre-processing expression defaults all symbols to static public whether they are declared static or not.
  • Hidden and visible keywords can be used when declaring symbols. The default is visible if none is specified. Hidden symbols are not output from the expression step in data jobs. Note that this differs from public and private. Private variables are not output either, but they are not visible outside the expression block. Hidden variables are visible outside the expression block but are not output.
    Public or global symbols declared in one area are available to other areas as follows:
    • Symbols declared in Pre-Processing are available to any block.
    • Symbols declared in Expression are available to Expression and Post-Processing.
    • Symbols declared in Post-Processing are available only to Post-Processing.
    • Automatic symbols, which are symbols from the previous step, are available to any of the three blocks.
  • To declare a variable with spaces or other special characters in the name, write your variable name between back quotation marks (`). For example:
    string `my var`
    `my var`="Hello"
    Note: It is the grave accent character (`), also known as the back quote, that is used, and not the apostrophe (') or quotation marks ("). The grave accent is found above the tab key on standard keyboards.
Here are some sample declarations:
// a 30-character string available
// only to the code block
private string(30) name

// a 30-byte string
string(30 bytes) name
 
// a 255-character public string
string address
 
// a global real number
global real number
 
// a public date field. Use back
// quotes if symbols include spaces
date `birth date`

Statements

Statements have the following syntax:
statement:
| "goto" label
| identifier "=" expression
| "return" expression
| "if" expression ["then"] statement ["else" statement]
| "for" identifier ["="] expression ["to"] expression ["step" expression] statement
| "begin" statement [statement...] "end"
| ["call"] function
| "while" expression statement
 
label: identifier ":"
 
expression:
described later
 
function: identifier "(" parameter [,parameter...] ")"
Statements can be separated by a semicolon, a space, or newline character. To group more than one statement together (for example, in a FOR loop), use begin/end.

GOTO and LABEL

GOTO LABEL
LABEL: identifier ":"
A GOTO statement jumps code control to a LABEL statement. A label can occur anywhere in the same code block. For example:
integer x
x=0
// label statement called start
start:
     x=x+1 
     if x < 10 goto start

Assignment

Assigns the value of an expression to a symbol as follows:
  • Only read-write symbols can be assigned a value.
  • In data jobs, all symbols are read-write.
  • A symbol assigned an expression of a different type receives the converted (or coerced) value of that expression. For example, if you assign a number to a string-type symbol, the symbol contains a string representation of that number.
  • If the expression cannot be converted into the type of symbol, the symbol's value is null. For example, if you assign a non-date string to a date symbol, it will be set to null.
integer num 
string str
date dt 
boolean b 
real r 
 
// assign 1 to num
num=1 
// assign Jan 28 '03 to the date symbol
dt=#01/28/03#
// sets boolean to true
b=true 
// also sets boolean to true
b='yes' 
// sets real to 30.12 (converting from string)
r="30.12" 
// sets string to the string representation of the date
str=dt 
// sets num to the rounded value of r
num=r

Arrays

In the EEL, you can create arrays of primitive types such as integers, strings, reals, dates, and Booleans. It is not possible to create arrays of objects such as dbcursor, dbconnection, regex, and file.
The syntax to create arrays of primitive types is as follows:
  • string array string_list
  • integer array integer_list
  • date array date_list
  • Boolean array boolean_list
  • real array real_list
There are three supported functions on arrays: DIM, SET, and GET. For more information about arrays, see Arrays.

Return

"return" expression
The return statement exits the code block immediately, returning a value.
  • In the data jobs, the return type is converted to Boolean.
  • If a false value is returned from the expression, the record is not included in the output.
The following is an example of a return statement:
// only include rows where ID >= 200
if id < 200
     return false

IF/ELSE

"IF" expression ["then"] statement ["ELSE" statement]
The IF/ELSE statement branches to one or more statements, depending on the expression.
  • Use this to execute code conditionally.
  • If you need to execute more than one statement in a branch, use begin/end.
  • The THEN keyword is optional.
  • If you nest IF/ELSE statements, the ELSE statement corresponds to the closest if statement. (See the previous example.) It is better to use begin/end statements if you do this, as it makes the code more readable.
In the following example, you can change the value of age to see different outcomes:
string(20) person
integer x
integer y
integer age
Age=10
 
if Age < 20 then
     person="child"
else
     person="adult"
 
if Age==10
     begin
               x=50
          y=20
     end
 
// nested if/else
if Age <= 60
               if Age < 40
          call print("Under 40")
     // this else corresponds to the inner if statement
     else
 
          call print("Age 40 to 60")
// this else corresponds to the outer if statement
else
     call print("Over 60")

FOR

The FOR loop executes one or more statements multiple times.
  • FOR loops are based on a symbol, which is set to some value at the start of the loop, and changes with each iteration of the loop.
  • A FOR loop has a start value, an end value, and an optional step value.
  • The start, end, and step value can be any expression.
  • If the step value is not specified, it defaults to 1.
  • If you are starting at a high number and ending at a lower number, you must use a negative step.
  • If you need to execute more than one statement in the loop, use begin/end.
For example:
integer i
for i = 1 to 10 step 2
call print('Value of i is ' & i)
 
integer x
integer y
x=10 y=20
 
for i = x to y
     call print('Value of i is ' & i)
 
for i = y to x step -1
     begin
          call print('Value of i is ' & i)
          x=i /*does not affect the loop since start/end/step
          expressions are only evaluated before loop*/
     end

WHILE

"WHILE" expression statement
The WHILE loop enables you to execute the same code multiple times; a condition remains true. For example:
integer i
i=1000
// keep looping while the value of i is > 10
while i > 10 
     i=i/2
 
// you can use begin/end to enclose more than one statement 
while i < 1000
     begin
          i=i*2
          call print('Value if i is ' & i)
     end

BEGIN/END

"BEGIN" statement [statement...] "END"
The BEGIN/END statement groups multiple statements together. If you need to execute multiple statements in a FOR or WHILE loop or in an IF/THEN/ELSE statement, you must use begin/end. These can be nested as well.

CALL

"CALL" statement [statement...] "END"
This statement calls a function and discards the return value. By default, the return value of the last function called in an expression is returned, unless there is an explicit expression or "return" statement at the end. This enables you to prevent the function return value from being used to determine the expression if it is the last function in the expression.

Expressions

  • An expression can include operators in combination with numbers, strings, functions, functions, which use other functions.
  • An expression always has a resulting value.
  • The resulting value can be one of the following: string, integer, real, date, and Boolean.
  • The resulting value can also be null (a special type of value).
This section covers different types of expressions.

Operators

The following table lists operators in order of precedence:
Operators
Description
! or not
Boolean
(,)
parentheses (can be nested to any depth)
*
multiply
/
divide
%
modulo
+
add
-
subtract
&
string concatenation
!=
not equal ("!=" and "<>" are the same)
<>
not equal
==
comparison operator (= is an assignment and should not be used for comparisons)
>
greater than
<
less than
>=
greater than or equal to
<=
less than or equal to
and
Boolean and
or
Boolean or

Modulo Operator

The modulo operator is represented by the % symbol. The result of the expression a%d ("a modulo d") returns a value r, for example:
a = qd + r and 0 ≤ r < | d |, where | d | denotes the absolute value of d
If either a or d are not integers, they are rounded down to the nearest integer before the modulo calculation is performed.
For positive values of a and d, it can be the remainder on division of a by d. For example:
a
d
a%d (r)
11
3
2
11
-3
2
-11
3
-2
9.4
3
0
9.6
3
0
10
3
1
9.4
3.2
0
9.6
3.2
0
10
3.2
1
–10.2
3.2
–2

Comparison Operator

Do not confuse the comparison operator (==) with the assignment operator (=). For example:
// correct statements to compare the value of x and y
if x==y then statement1
else statement2
 
// Assigning a value
x=y

String Expressions

A string expression is a string of undeclared length. Strings can be concatenated using an ampersand (&) or operated upon with built-in functions. For information about defining the length of a string, see Declaration of Symbols. For example:
string str 
// simple string
str="Hello"
// concatenate two strings
str="Hello" & " There"
Note: Setting a string variable to a string expression results in a truncated string if the variable was declared with a shorter length than the expression.
When a string value is used in a Boolean expression, the value is evaluated and the following values are considered true, (upper/lower/mixed): true, t; yes, y; or 1. The following values are considered false (also upper/lower/mixed): false, no, n, or 0.
For more information about string expressions, see Strings.

Integer and Real Expressions

Integer and real expressions result in an integer or real value, respectively. For example:
integer x 
real r
 
// order of precedence starts with parentheses,
// then multiplication, then addition
x=1+(2+3)*4
 
// string is converted to value 10
x=5 + "10"
r=3.14 
 
// x will now be 3
x=r

Date Expressions

  • A date value is stored as a real value with the whole portion representing number of days since January 1, 1900, and the fraction representing the fraction of a day.
  • If years are specified as two digits, then the years between 51 and 99 are assumed to be in 1900. Other years are assumed to be in 2000.
  • A date constant is denoted with a number sign (#).
  • If a whole number is added to a date, the resulting date is advanced by the specified number of days.
  • To make changes to the locale setting in Microsoft Windows, refer to the LOCALE() function topic in this reference guide.
For example:
date date_value1
date data_value2
date_value1 = todate("01/02/03")
date_value2 = #01-02-03#
Note: The actual results depend on your Windows system settings.
For more on date expressions, see Dates and Times.

Boolean Expressions

  • A Boolean expression can either be true or false.
  • Results of comparisons are always Boolean.
  • Using AND or OR in an expression also results in a Boolean value.
For example:
boolean a
boolean b
boolean c
 
a=true
b=false
// c is true
c=a or b 
// c is false
c=a and b 
// c is true
c=10<20 
// c is false
c=10==20 
// c is true
c=10!=20 
// c is true
c='yes' 
// c is false
c='no'

Null Propagations

If any part of a mathematical expression has a null value, the entire expression is usually null.
The following table shows how nulls are propagated:
Expression
Result
null == value
null (applies to all comparison operators)
null & string
string
null & null
null
number + null
null (applies to all arithmetic operations)
null + null
null (applies to all arithmetic operations)
null AND null
null
null AND true
null
null AND false
false
null OR null
null
null OR true
true
null OR false
false
not null
null
if null
statement following if is not executed
FOR loop
run-time error if any of the terms are null
while null
statement following while is not executed
For example:
integer x
integer y
integer z 
boolean b 
string s 
x=10 
y=null 
// z has a value of null
z=x + y 
// b is true
b=true or null
// b is null
b=false or null
// use isnull function to determine if null
if isnull(b) 
     call print("B is null")
// s is "str"
s="str" & null

Coercion

If a part of an expression is not the type expected in that context, it is converted into the correct type.
  • A type can be coerced into other types.
  • If a value cannot be coerced, it results in null.
To explicitly coerce one type to another type, use one of the following functions: TOBOOLEAN, TODATE, TOINTEGER, TOREAL, or TOSTRING. These functions are helpful when there is a need to force a comparison of different types. For example, to compare a string variable called xyz with a number 123.456, the number is converted to a string before the comparison is completed using the following example:
toreal(xyz) > 123.456
The following table shows the rules for coercion:
Coercion Type
TOSTRING
TOINTEGER
TOREAL
TODATE
TOBOOLEAN
from String
yes
yes
yes
yes
from Integer
yes
yes
yes
yes
from Real
yes
yes
yes
yes
from Date
yes
yes
yes
no
from Boolean
yes
yes
yes
no
The following table shows special considerations for coercion:
Coercion Type
Resulting Action
date to string
A default date format is used: YYYY/MM/DD hh:mm:ss. Use the FORMATDATE function for a more flexible conversion.
date to number
The number represents days since 12/30/1899. Hours, minutes, seconds, and milliseconds are converted to a fraction, where 1 hour = 1/24 units, 1 minute = 1/(24*60) units, and so on.
string to date
Most date formats are recognized and intelligently converted. See the Date Expressions section for more information.
string to Boolean
The values yes, no, true, false, y, n, t, and f are recognized.
integer to real to Boolean
Any nonzero value is true. Zero is false.

Functions

  • A function can be part of an expression.
  • If you need to call a function but do not want the return value, use CALL.
  • Each function has a specific return type and parameter type.
  • If the parameters provided to the function are not the correct type, they are sometimes coerced.
  • A function sometimes requires a parameter to be a specific type. If you pass a parameter of the wrong type, it is not coerced and you get an error.
  • Functions normally propagate null (there might be exceptions).
  • Some functions might modify the value of their parameters if they are documented to do so.
  • Some functions might accept a variable number of parameters.
For example:
string str
integer x 
str="Hello there"
// calls the upper function
if upper(str)=='HELLO THERE'
     // calls the print function
     call print("yes")
 
// x is set to 7 (position of word 'there')
x=instr(str,"there",1)

Global Functions

You can register global functions (or user-defined functions, known as UDFs) that can be reused from any expression in the system. This includes data and process flow nodes, business rules, and more. To do this, create one text file (or more) in the installation under etc/udf. Each file might contain one or more function definitions. Each definition is enclosed in a function/end function block. The return type for each function can be integer, real, date, string, or Boolean. To get to function parameters, use the PARAMETER() and PARAMETERCOUNT() functions as well as individual functions for the types (PARAMETERBOOLEAN, PARAMETERDATE, PARAMETERINTEGER, PARAMETERREAL, or PARAMETERSTRING).
The following functions are applicable when registering the global functions mentioned earlier. Each global function accepts an integer as a parameter to indicate which parameter is desired. Each function returns the type specified and converts the parameter to that type, if it is not already the type specified.
  • PARAMETERBOOLEAN
  • PARAMETERDATE
  • PARAMETERINTEGER
  • PARAMETERREAL
  • PARAMETERSTRING
For example:
function example_udf return string 255
   // this function is declared to return a string of
   // up to 255 characters retrieve the number of
   // parameters passed
   print("You passed " & parametercount() & " parameters")
   integer x
   for x = 1 to parametercount()
   begin
      print("Parameter " & x & " type is " &
      parametertype(x) & " value is " & parameter(x))
      if(parametertype(x)=="integer")
         print("Integer: " & parameterinteger(x))
      else if parametertype(x)=="real"
         print("Real: " & parameterreal(x))
   end
   return "string val"
end function

Objects

The EEL supports a number of objects. Generally, an object is a type of code in which not only the data type of a data structure is defined, but also the types of operations that can be applied to the data structure. In particular, the EEL supports objects for the following:
  • Data Quality: Expressions and Functions
  • Databases: Database connectivity (DBCONNECT object)
  • Files: Text file reading and writing (FILE object)
  • Regular Expressions: Regular expression searches (REGEX object)