Functions¶
Functions operate at the row level.
if_else¶
if_else(condition, expr_if_true, expr_if_false)
// A ternary operator is also supported.
(condition ? expr_if_true : expr_if_false)
condition- An expression that evaluates toTRUEorFALSE.expr_if_true- Value to return ifcondition=-TRUEexpr_if_false- Value to return ifcondition==FALSE
The if_else function evaluates an expression and returns expr_if_true or expr_if_false depending on the evaluation of condition.
flavor: if_else(flavor == "Chocolate", sugar > 50, sugar < 50)
max¶
Returns the max of arguments passed
orders: max(1,2,3) == 3 # Returns 3 == 3; true
items: max(col1, col2, col3) # Returns the max value for the given row of col1-3.
inventory: max(col1) # This does not return the max for an entire column; col1 is a scaler value.
min¶
Returns the minimum of arguments passed
to_lower¶
Converts a string to lowercase.
to_upper¶
Converts a string to uppercase.
replace¶
replace(value, find, replace)
count¶
count(column)
Returns the number of times the passed value(s) have been observed.
color: count(color) <= 20 # Fails if a value is observed more than 20x times.
configuration: count(color, size) <= 10 # Fails if the combination of values is observed more than 10 times.
Note
count does not work well on large datasets. It stores a hash digest of the arguments to test for uniqueness.
null coalescence¶
?? can be used to set a default value.
(colname ?? 1) == 1 # returns TRUE if colname==NA/nil
Two-Pass Functions¶
Two-pass functions allow for more advanced expressions to be evaluated, but they require a first-pass through the file to collect information.
group_count¶
group_count(group_column, count_column, eq_value)
group_count will group data by the group_column, and count the number of occurences of eq_value in the count_column.
example
family_id: is_int()
person: group_count(person, is_head_of_household, true) == 1
is_head_of_household: is_bool()
In the example above, we are checking to see that only one person in a family is set to true for the column is_head_of_household. If you have missing data in your grouping column you may need use an if_else statement to conditionally validate a row as true:
family_id: is_int()
person: if_else(is_missing(), true, group_count(person, is_head_of_household, true) == 1)
is_head_of_household: is_bool()