Column Expressions
Create new columns or modify existing ones by writing column_name = expression indented under the with statement.
Simple expressions
Any arithmetic expression is valid: +, -, *, /, ** (power).
Python runtime functions
Python functions in the runtime namespace use : in column expressions:
python
def clean_price(s):
return s.str.replace("$", "").astype(float)
end
with sales
price = :clean_price(price)
Bare function calls are for Pivotal built-ins such as upper(name) or backend-native functions. Use :my_func(col) when calling Python.
Column loops
Column loops are covered with checks and pipeline functions in Pipeline Control.
String functions
| Function | Description |
|---|---|
upper(col) |
Uppercase |
lower(col) |
Lowercase |
trim(col) |
Strip leading/trailing whitespace |
ltrim(col) |
Strip leading whitespace |
rtrim(col) |
Strip trailing whitespace |
left(col, n) |
First n characters |
right(col, n) |
Last n characters |
substr(col, start, length) |
Substring |
len(col) |
String length |
replace(col, old, new) |
Replace substring |
regex_extract(col, pattern) |
Extract first regex match |
regex_extract(col, pattern, group) |
Extract regex capture group |
regex_replace(col, pattern, replacement) |
Replace regex matches |
with products
name = upper(name)
code = left(sku, 4)
slug = lower(trim(name))
abbr = substr(code, 1, 3)
n = len(description)
clean = replace(notes, "N/A", "")
postcode = regex_extract(address, "\\b\\d{4}\\b")
clean_phone = regex_replace(phone, "[^0-9]", "")
Regex patterns use normal Pivotal strings. Escape backslashes inside quoted patterns, for example "\\b\\d{4}\\b" for a four-digit postcode on a word boundary.
String concatenation with +:
Nesting is supported:
Date functions
Extract components from date/datetime columns or perform date arithmetic:
| Function | Description |
|---|---|
year(col) |
Year as integer |
month(col) |
Month (1–12) |
day(col) |
Day of month |
quarter(col) |
Quarter (1–4) |
dayofweek(col) |
Day of week (0=Monday in pandas, 1=Monday in Polars) |
hour(col) |
Hour (0–23) |
minute(col) |
Minute (0–59) |
date_format(col, fmt) |
Format as string, e.g. "%b %Y" → "Mar 2024" |
to_date(col) |
Parse a string column to date/datetime |
date_diff(end_col, start_col) |
Difference in days (integer) |
date_add(col, n) |
Add n days to a date column |
with sales
yr = year(order_date)
mo = month(order_date)
quarter = quarter(order_date)
label = date_format(order_date, "%b %Y")
days_open = date_diff(close_date, open_date)
due_date = date_add(order_date, 30)
Tip: to filter or group by a date part, create the column first, then use it:
with sales as monthly
yr = year(order_date)
mo = month(order_date)
filter yr == 2024
group by yr, mo
agg sum amount as total
Type casting
Convert column data types using cast ... as <type>:
with sales
cast price as float # coerce — bad values become NaN/null
cast qty as int
cast created_at as datetime
cast price, cost as float # multiple columns in one statement
cast price as float strict # strict — raises error on bad values
Types: int / integer float string / str bool / boolean datetime
Inline cast inside an expression:
Rounding
Round numeric columns with round:
Use as with one source column to create a new rounded column. Omit as to
round one or more columns in place.
Aggregate functions in expressions
Use aggregate functions to compute values relative to the whole table:
| Function | Description |
|---|---|
sum(col) |
Total |
mean(col) |
Average |
std(col) |
Standard deviation |
min(col) |
Minimum |
max(col) |
Maximum |
count(col) |
Count |
Windowed aggregates — compute the aggregate within groups using by:
with sales
pct_of_region = amount / sum(amount)
by region
regional_z = (amount - mean(amount)) / std(amount)
by region, category
Conditional assignment — where
Create a column with different values depending on a condition:
else is optional. Without it, rows where the condition is false receive null / NaN. Use else to provide a default:
Multi-case assignment
Test multiple conditions in order; use explicit else for the default:
The conditions are evaluated in order; the first match wins. The else line is the default value.
map is also accepted as an alias for where in this form. It behaves the same way and is mainly intended as a friendlier keyword for exact remapping examples and AI-generated code.