Documentation
DataFrameMacros.DataFrameMacros — Module
DataFrameMacros offers macros which transform expressions for DataFrames functions that use the source => function => sink mini-language. The supported macros are @transform/@transform!, @select/@select!, @groupby, @combine, @subset/@subset!, @sort/@sort! and @unique.
All macros have signatures of the form:
@macro(df, args...; kwargs...)
Each positional argument in args is converted to a source .=> function .=> sink expression for the transformation mini-language of DataFrames. By default, all macros execute the given function by-row; only @combine executes by-column. There is automatic broadcasting across all column specifiers, so it is possible to directly use multi-column specifiers such as All(), Not(:x), r"columnname" and startswith("prefix").
For example, the following pairs of expressions are equivalent:
transform(df, :x .=> ByRow(x -> x + 1) .=> :y)
@transform(df, :y = :x + 1)
select(df, names(df, All()) .=> ByRow(x -> x ^ 2))
@select(df, $(All()) ^ 2)
combine(df, :x .=> (x -> sum(x) / 5) .=> :result)
@combine(df, :result = sum(:x) / 5)
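The by-row and by-column defaults described above can be checked directly. The following is a minimal sketch, assuming DataFrames and DataFrameMacros are installed; the sample data and the variable names (with_macro, by_hand, total_macro, total_hand) are illustrative and not part of the package.

```julia
using DataFrames
using DataFrameMacros

# Illustrative sample data (an assumption, not from the package docs).
df = DataFrame(x = [1, 2, 3])

# @transform runs by-row, so it should match an explicit ByRow wrapper.
with_macro = @transform(df, :y = :x + 1)
by_hand = transform(df, :x => ByRow(x -> x + 1) => :y)

# @combine runs by-column, so sum receives the whole column at once.
total_macro = @combine(df, :result = sum(:x) / 5)
total_hand = combine(df, :x => (x -> sum(x) / 5) => :result)
```

Comparing with_macro == by_hand and total_macro == total_hand is a quick way to confirm that a macro call builds the DataFrames call you expect.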
Column references
Each positional argument must be of the form [sink =] some_expression. Columns can be referenced within sink or some_expression using a Symbol, a String, or an Int. Any column identifier that is not a Symbol must be prefaced with the interpolation symbol $. The $ interpolation symbol also lets you use variables or expressions that evaluate to column identifiers.
The five expressions in the following code block are equivalent.
using DataFrames
using DataFrameMacros
df = DataFrame(x = 1:3)
@transform(df, :y = :x + 1)
@transform(df, :y = $"x" + 1)
@transform(df, :y = $1 + 1)
col = :x
@transform(df, :y = $col + 1)
cols = [:x, :y, :z]
@transform(df, :y = $(cols[1]) + 1)
Passing multiple expressions
Multiple expressions can be passed as multiple positional arguments, or alternatively as separate lines in a begin end block. You can use parentheses, or omit them. The following expressions are equivalent:
@transform(df, :y = :x + 1, :z = :x * 2)
@transform df :y = :x + 1 :z = :x * 2
@transform df begin
    :y = :x + 1
    :z = :x * 2
end
@transform(df, begin
    :y = :x + 1
    :z = :x * 2
end)
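As a quick check of the equivalence claimed above, here is a small sketch comparing the positional form with the begin end block form. It assumes DataFrames and DataFrameMacros are installed; the data and the names a and b are made up for illustration.

```julia
using DataFrames
using DataFrameMacros

# Illustrative sample data (an assumption).
df = DataFrame(x = [1, 2, 3])

# Two expressions passed as separate positional arguments.
a = @transform(df, :y = :x + 1, :z = :x * 2)

# The same two expressions as separate lines of a begin end block.
b = @transform(df, begin
    :y = :x + 1
    :z = :x * 2
end)
```

Both calls should produce the same three-column data frame, so a == b is expected to hold.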
Flag macros
You can modify the behavior of all macros using flag macros, which are not real macros but only signal changed behavior for a positional argument to the outer macro.
Each flag is specified with a single character, and you can combine these characters as well. The supported flags are:
character | meaning |
---|---|
r | Switch to by-row processing. |
c | Switch to by-column processing. |
m | Wrap the function expression in passmissing. |
t | Collect all :symbol = expression expressions into a NamedTuple of the form (; symbol = expression, ...) and set the sink to AsTable. |
Example @c
To compute a centered column with @transform, you need access to the whole column at once and signal this with the @c flag.
using Statistics
using DataFrames
using DataFrameMacros
julia> df = DataFrame(x = 1:3)
3×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
julia> @transform(df, :x_centered = @c :x .- mean(:x))
3×2 DataFrame
 Row │ x      x_centered
     │ Int64  Float64
─────┼───────────────────
   1 │     1        -1.0
   2 │     2         0.0
   3 │     3         1.0
Example @m
Many functions need to be wrapped in passmissing to correctly return missing if any input is missing. This can be achieved with the @m flag macro.
julia> df = DataFrame(name = ["alice", "bob", missing])
3×1 DataFrame
 Row │ name
     │ String?
─────┼─────────
   1 │ alice
   2 │ bob
   3 │ missing
julia> @transform(df, :name_upper = @m uppercasefirst(:name))
3×2 DataFrame
 Row │ name     name_upper
     │ String?  String?
─────┼─────────────────────
   1 │ alice    Alice
   2 │ bob      Bob
   3 │ missing  missing
Example @t
In DataFrames, you can return a NamedTuple from a function and then automatically expand it into separate columns by using AsTable as the sink value. To simplify this process, you can use the @t flag macro, which gathers all statements of the form :symbol = expression in the function body into a NamedTuple and sets the sink argument to AsTable.
julia> df = DataFrame(name = ["Alice Smith", "Bob Miller"])
2×1 DataFrame
 Row │ name
     │ String
─────┼─────────────
   1 │ Alice Smith
   2 │ Bob Miller
julia> @transform(df, @t begin
           s = split(:name)
           :first_name = s[1]
           :last_name = s[2]
       end)
2×3 DataFrame
 Row │ name         first_name  last_name
     │ String       SubString…  SubString…
─────┼─────────────────────────────────────
   1 │ Alice Smith  Alice       Smith
   2 │ Bob Miller   Bob         Miller
The @t flag also works with tuple destructuring syntax, so the previous example can be shortened to:
@transform(df, @t :first_name, :last_name = split(:name))
DataFrameMacros.@combine — Macro
@combine(df, args...; kwargs...)
The @combine macro builds a DataFrames.combine call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to combine but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
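To illustrate how a keyword argument is forwarded after the semicolon, here is a hedged sketch that applies @combine to a grouped data frame. It assumes DataFrames and DataFrameMacros are installed and that the first argument may be a GroupedDataFrame, as with DataFrames.combine; the columns g and x and the result names are invented for the example.

```julia
using DataFrames
using DataFrameMacros

# Illustrative sample data (an assumption).
df = DataFrame(g = ["a", "a", "b"], x = [1, 2, 3])
gdf = groupby(df, :g)

# By-column per group: sum sees all x values of one group at a time.
totals = @combine(gdf, :total = sum(:x))

# keepkeys is a keyword argument of DataFrames.combine; it goes after the semicolon.
totals_nokeys = @combine(gdf, :total = sum(:x); keepkeys = false)
```

With keepkeys = false the grouping column is dropped from the result, so only the total column should remain.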
DataFrameMacros.@select! — Macro
@select!(df, args...; kwargs...)
The @select! macro builds a DataFrames.select! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to select! but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@select — Macro
@select(df, args...; kwargs...)
The @select macro builds a DataFrames.select call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to select but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@subset! — Macro
@subset!(df, args...; kwargs...)
The @subset! macro builds a DataFrames.subset! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to subset! but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@subset — Macro
@subset(df, args...; kwargs...)
The @subset macro builds a DataFrames.subset call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to subset but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
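A short sketch of row filtering may help here. It assumes DataFrames and DataFrameMacros are installed and relies on skipmissing being a keyword argument of DataFrames.subset; the data and the names kept and both are illustrative only.

```julia
using DataFrames
using DataFrameMacros

# Illustrative sample data with one missing value (an assumption).
df = DataFrame(x = [1, missing, 3], y = [10, 20, 30])

# The condition runs by-row; skipmissing = true is forwarded to subset,
# which then treats a missing condition result as false.
kept = @subset(df, :x > 1; skipmissing = true)

# Several conditions can be passed as separate positional arguments;
# a row is kept only if all of them are true.
both = @subset(df, :y > 10, :y < 30)
```

Without skipmissing = true, the first call would be expected to throw, because the condition evaluates to missing for the second row.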
DataFrameMacros.@transform! — Macro
@transform!(df, args...; kwargs...)
The @transform! macro builds a DataFrames.transform! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to transform! but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
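The difference between the mutating and non-mutating variants can be sketched as follows, assuming DataFrames and DataFrameMacros are installed. The variable names (copied, names_before) are hypothetical and chosen only for this example.

```julia
using DataFrames
using DataFrameMacros

# Illustrative sample data (an assumption).
df = DataFrame(x = [1, 2, 3])

# @transform returns a new data frame and leaves df untouched.
copied = @transform(df, :y = :x + 1)
names_before = names(df)

# @transform! adds the column to df itself.
@transform!(df, :y = :x + 1)
```

After the second call, df has gained the y column, while names_before still records the single original column.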
DataFrameMacros.@transform — Macro
@transform(df, args...; kwargs...)
The @transform macro builds a DataFrames.transform call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to transform but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.
DataFrameMacros.@unique — Macro
@unique(df, args...; kwargs...)
The @unique macro builds a DataFrames.unique call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.
Keyword arguments kwargs are passed down to unique but have to be separated from the positional arguments by a semicolon (;).
The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.