Documentation
DataFrameMacros.DataFrameMacros — ModuleDataFrameMacros offers macros which transform expressions for DataFrames functions that use the source => function => sink mini-language. The supported functions are @transform/@transform!, @select/@select!, @groupby, @combine, @subset/@subset!, @sort/@sort! and @unique.
All macros have signatures of the form:
@macro(df, args...; kwargs...)Each positional argument in args is converted to a source .=> function .=> sink expression for the transformation mini-language of DataFrames. By default, all macros execute the given function by-row, only @combine executes by-column. There is automatic broadcasting across all column specifiers, so it is possible to directly use multi-column specifiers such as All(), Not(:x), r"columnname" and startswith("prefix").
For example, the following pairs of expressions are equivalent:
transform(df, :x .=> ByRow(x -> x + 1) .=> :y)
@transform(df, :y = :x + 1)
select(df, names(df, All()) .=> ByRow(x -> x ^ 2))
@select(df, $(All()) ^ 2)
combine(df, :x .=> (x -> sum(x) / 5) .=> :result)
@combine(df, :result = sum(:x) / 5)Column references
Each positional argument must be of the form [sink =] some_expression. Columns can be referenced within sink or some_expression using a Symbol, a String, or an Int. Any column identifier that is not a Symbol must be prefaced with the interpolation symbol $. The $ interpolation symbol also allows to use variables or expressions that evaluate to column identifiers.
The five expressions in the following code block are equivalent.
using DataFrames
using DataFrameMacros
df = DataFrame(x = 1:3)
@transform(df, :y = :x + 1)
@transform(df, :y = $"x" + 1)
@transform(df, :y = $1 + 1)
col = :x
@transform(df, :y = $col + 1)
cols = [:x, :y, :z]
@transform(df, :y = $(cols[1]) + 1)Passing multiple expressions
Multiple expressions can be passed as multiple positional arguments, or alternatively as separate lines in a begin end block. You can use parentheses, or omit them. The following expressions are equivalent:
@transform(df, :y = :x + 1, :z = :x * 2)
@transform df :y = :x + 1 :z = :x * 2
@transform df begin
:y = :x + 1
:z = :x * 2
end
@transform(df, begin
:y = :x + 1
:z = :x * 2
end)Flag macros
You can modify the behavior of all macros using flag macros, which are not real macros but only signal changed behavior for a positional argument to the outer macro.
Each flag is specified with a single character, and you can combine these characters as well. The supported flags are:
| character | meaning |
|---|---|
| r | Switch to by-row processing. |
| c | Switch to by-column processing. |
| m | Wrap the function expression in passmissing. |
| t | Collect all :symbol = expression expressions into a NamedTuple where (; symbol = expression, ...) and set the sink to AsTable. |
Example @c
To compute a centered column with @transform, you need access to the whole column at once and signal this with the @c flag.
using Statistics
using DataFrames
using DataFrameMacros
julia> df = DataFrame(x = 1:3)
3×1 DataFrame
Row │ x
│ Int64
─────┼───────
1 │ 1
2 │ 2
3 │ 3
julia> @transform(df, :x_centered = @c :x .- mean(:x))
3×2 DataFrame
Row │ x x_centered
│ Int64 Float64
─────┼───────────────────
1 │ 1 -1.0
2 │ 2 0.0
3 │ 3 1.0Example @m
Many functions need to be wrapped in passmissing to correctly return missing if any input is missing. This can be achieved with the @m flag macro.
julia> df = DataFrame(name = ["alice", "bob", missing])
3×1 DataFrame
Row │ name
│ String?
─────┼─────────
1 │ alice
2 │ bob
3 │ missing
julia> @transform(df, :name_upper = @m uppercasefirst(:name))
3×2 DataFrame
Row │ name name_upper
│ String? String?
─────┼─────────────────────
1 │ alice Alice
2 │ bob Bob
3 │ missing missing Example @t
In DataFrames, you can return a NamedTuple from a function and then automatically expand it into separate columns by using AsTable as the sink value. To simplify this process, you can use the @t flag macro, which collects all statements of the form :symbol = expression in the function body, collects them into a NamedTuple, and sets the sink argument to AsTable.
julia> df = DataFrame(name = ["Alice Smith", "Bob Miller"])
2×1 DataFrame
Row │ name
│ String
─────┼─────────────
1 │ Alice Smith
2 │ Bob Miller
julia> @transform(df, @t begin
s = split(:name)
:first_name = s[1]
:last_name = s[2]
end)
2×3 DataFrame
Row │ name first_name last_name
│ String SubString… SubString…
─────┼─────────────────────────────────────
1 │ Alice Smith Alice Smith
2 │ Bob Miller Bob MillerThe @t flag also works with tuple destructuring syntax, so the previous example can be shortened to:
@transform(df, @t :first_name, :last_name = split(:name))DataFrameMacros.@combine — Macro@combine(df, args...; kwargs...)The @combine macro builds a DataFrames.combine call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to combine but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@select! — Macro@select!(df, args...; kwargs...)The @select! macro builds a DataFrames.select! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to select! but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@select — Macro@select(df, args...; kwargs...)The @select macro builds a DataFrames.select call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to select but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@subset! — Macro@subset!(df, args...; kwargs...)The @subset! macro builds a DataFrames.subset! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to subset! but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@subset — Macro@subset(df, args...; kwargs...)The @subset macro builds a DataFrames.subset call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to subset but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@transform! — Macro@transform!(df, args...; kwargs...)The @transform! macro builds a DataFrames.transform! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to transform! but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@transform — Macro@transform(df, args...; kwargs...)The @transform macro builds a DataFrames.transform call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to transform but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@unique — Macro@unique(df, args...; kwargs...)The @unique macro builds a DataFrames.unique call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to unique but have to be separated from the positional arguments by a semicolon ;.
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.