Column Expressions
Create new columns or modify existing ones by writing column_name = expression indented under the with statement.
Simple expressions
Any arithmetic expression is valid: +, -, *, /, ** (power).
Python runtime functions
Python functions in the runtime namespace use : in column expressions:
python
def clean_price(s):
return s.str.replace("$", "").astype(float)
end
with sales
price = :clean_price(price)
Bare function calls are for Pivotal built-ins such as upper(name) or backend-native functions. Use :my_func(col) when calling Python.
Column loops
Column loops are covered with checks and pipeline functions in Pipeline Control.
String functions
| Function | Description |
|---|---|
upper(col) |
Uppercase |
lower(col) |
Lowercase |
trim(col) |
Strip leading/trailing whitespace |
ltrim(col) |
Strip leading whitespace |
rtrim(col) |
Strip trailing whitespace |
left(col, n) |
First n characters |
right(col, n) |
Last n characters |
substr(col, start, length) |
Substring |
len(col) |
String length |
replace(col, old, new) |
Replace substring |
regex_extract(col, pattern) |
Extract first regex match |
regex_extract(col, pattern, group) |
Extract regex capture group |
regex_replace(col, pattern, replacement) |
Replace regex matches |
with products
name = upper(name)
code = left(sku, 4)
slug = lower(trim(name))
abbr = substr(code, 1, 3)
n = len(description)
clean = replace(notes, "N/A", "")
postcode = regex_extract(address, "\\b\\d{4}\\b")
clean_phone = regex_replace(phone, "[^0-9]", "")
Regex patterns use normal Pivotal strings. Escape backslashes inside quoted patterns, for example "\\b\\d{4}\\b" for a four-digit postcode on a word boundary.
String concatenation with +:
Nesting is supported:
Date functions
Extract components from date/datetime columns or perform date arithmetic:
| Function | Description |
|---|---|
year(col) |
Year as integer |
month(col) |
Month (1–12) |
day(col) |
Day of month |
quarter(col) |
Quarter (1–4) |
dayofweek(col) |
Day of week (0=Monday in pandas, 1=Monday in Polars) |
hour(col) |
Hour (0–23) |
minute(col) |
Minute (0–59) |
date_format(col, fmt) |
Format as string, e.g. "%b %Y" → "Mar 2024" |
to_date(col) |
Parse a string column to date/datetime |
date_diff(end_col, start_col) |
Difference in days (integer) |
date_add(col, n) |
Add n days to a date column |
with sales
yr = year(order_date)
mo = month(order_date)
quarter = quarter(order_date)
label = date_format(order_date, "%b %Y")
days_open = date_diff(close_date, open_date)
due_date = date_add(order_date, 30)
Tip: to filter or group by a date part, create the column first, then use it:
with sales as monthly
yr = year(order_date)
mo = month(order_date)
filter yr == 2024
group by yr, mo
agg sum amount as total
Type casting
Convert column data types using cast ... as <type>:
with sales
cast price as float # coerce — bad values become NaN/null
cast qty as int
cast created_at as datetime
cast price, cost as float # multiple columns in one statement
cast price as float strict # strict — raises error on bad values
Types: int / integer float string / str bool / boolean datetime
Inline cast inside an expression:
Rounding
Round numeric columns with round:
Use as with one source column to create a new rounded column. Omit as to
round one or more columns in place.
Aggregate functions in expressions
Use aggregate functions to compute values relative to the whole table:
| Function | Description |
|---|---|
sum(col) |
Total |
mean(col) |
Average |
std(col) |
Standard deviation |
min(col) |
Minimum |
max(col) |
Maximum |
count(col) |
Count |
Windowed aggregates — compute the aggregate within groups using by:
with sales
pct_of_region = amount / sum(amount)
by region
regional_z = (amount - mean(amount)) / std(amount)
by region, category
Conditional assignment — where
Create a column with different values depending on a condition:
else is optional. Without it, rows where the condition is false receive null / NaN. Use else to provide a default:
Multi-case assignment
Test multiple conditions in order; use explicit else for the default:
The conditions are evaluated in order; the first match wins. The else line is the default value.