Documentation

DataFrameMacros.DataFrameMacrosModule

DataFrameMacros offers macros which transform expressions for DataFrames functions that use the source => function => sink mini-language. The supported functions are @transform/@transform!, @select/@select!, @groupby, @combine, @subset/@subset!, @sort/@sort! and @unique.

All macros have signatures of the form:

@macro(df, args...; kwargs...)

Each positional argument in args is converted to a source .=> function .=> sink expression for the transformation mini-language of DataFrames. By default, all macros execute the given function by-row, only @combine executes by-column. There is automatic broadcasting across all column specifiers, so it is possible to directly use multi-column specifiers such as All(), Not(:x), r"columnname" and startswith("prefix").

For example, the following pairs of expressions are equivalent:

transform(df, :x .=> ByRow(x -> x + 1) .=> :y)
@transform(df, :y = :x + 1)

select(df, names(df, All()) .=> ByRow(x -> x ^ 2))
@select(df, $(All()) ^ 2)

combine(df, :x .=> (x -> sum(x) / 5) .=> :result)
@combine(df, :result = sum(:x) / 5)

Column references

Each positional argument must be of the form [sink =] some_expression. Columns can be referenced within sink or some_expression using a Symbol, a String, or an Int. Any column identifier that is not a Symbol must be prefaced with the interpolation symbol $. The $ interpolation symbol also allows to use variables or expressions that evaluate to column identifiers.

The five expressions in the following code block are equivalent.

using DataFrames
using DataFrameMacros

df = DataFrame(x = 1:3)

@transform(df, :y = :x + 1)
@transform(df, :y = $"x" + 1)
@transform(df, :y = $1 + 1)
col = :x
@transform(df, :y = $col + 1)
cols = [:x, :y, :z]
@transform(df, :y = $(cols[1]) + 1)

Passing multiple expressions

Multiple expressions can be passed as multiple positional arguments, or alternatively as separate lines in a begin end block. You can use parentheses, or omit them. The following expressions are equivalent:

@transform(df, :y = :x + 1, :z = :x * 2)
@transform df :y = :x + 1 :z = :x * 2
@transform df begin
    :y = :x + 1
    :z = :x * 2
end
@transform(df, begin
    :y = :x + 1
    :z = :x * 2
end)

Flag macros

You can modify the behavior of all macros using flag macros, which are not real macros but only signal changed behavior for a positional argument to the outer macro.

Each flag is specified with a single character, and you can combine these characters as well. The supported flags are:

charactermeaning
rSwitch to by-row processing.
cSwitch to by-column processing.
mWrap the function expression in passmissing.
tCollect all :symbol = expression expressions into a NamedTuple where (; symbol = expression, ...) and set the sink to AsTable.

Example @c

To compute a centered column with @transform, you need access to the whole column at once and signal this with the @c flag.

using Statistics
using DataFrames
using DataFrameMacros

julia> df = DataFrame(x = 1:3)
3×1 DataFrame
 Row │ x     
     │ Int64 
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> @transform(df, :x_centered = @c :x .- mean(:x))
3×2 DataFrame
 Row │ x      x_centered 
     │ Int64  Float64    
─────┼───────────────────
   1 │     1        -1.0
   2 │     2         0.0
   3 │     3         1.0

Example @m

Many functions need to be wrapped in passmissing to correctly return missing if any input is missing. This can be achieved with the @m flag macro.

julia> df = DataFrame(name = ["alice", "bob", missing])
3×1 DataFrame
 Row │ name    
     │ String? 
─────┼─────────
   1 │ alice
   2 │ bob
   3 │ missing 

julia> @transform(df, :name_upper = @m uppercasefirst(:name))
3×2 DataFrame
 Row │ name     name_upper 
     │ String?  String?    
─────┼─────────────────────
   1 │ alice    Alice
   2 │ bob      Bob
   3 │ missing  missing    

Example @t

In DataFrames, you can return a NamedTuple from a function and then automatically expand it into separate columns by using AsTable as the sink value. To simplify this process, you can use the @t flag macro, which collects all statements of the form :symbol = expression in the function body, collects them into a NamedTuple, and sets the sink argument to AsTable.

julia> df = DataFrame(name = ["Alice Smith", "Bob Miller"])
2×1 DataFrame
 Row │ name        
     │ String      
─────┼─────────────
   1 │ Alice Smith
   2 │ Bob Miller

julia> @transform(df, @t begin
           s = split(:name)
           :first_name = s[1]
           :last_name = s[2]
       end)
2×3 DataFrame
 Row │ name         first_name  last_name  
     │ String       SubString…  SubString… 
─────┼─────────────────────────────────────
   1 │ Alice Smith  Alice       Smith
   2 │ Bob Miller   Bob         Miller

The @t flag also works with tuple destructuring syntax, so the previous example can be shortened to:

@transform(df, @t :first_name, :last_name = split(:name))
source
DataFrameMacros.@combineMacro
@combine(df, args...; kwargs...)

The @combine macro builds a DataFrames.combine call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to combine but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@select!Macro
@select!(df, args...; kwargs...)

The @select! macro builds a DataFrames.select! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to select! but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@selectMacro
@select(df, args...; kwargs...)

The @select macro builds a DataFrames.select call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to select but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@subset!Macro
@subset!(df, args...; kwargs...)

The @subset! macro builds a DataFrames.subset! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to subset! but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@subsetMacro
@subset(df, args...; kwargs...)

The @subset macro builds a DataFrames.subset call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to subset but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@transform!Macro
@transform!(df, args...; kwargs...)

The @transform! macro builds a DataFrames.transform! call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to transform! but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@transformMacro
@transform(df, args...; kwargs...)

The @transform macro builds a DataFrames.transform call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to transform but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source
DataFrameMacros.@uniqueMacro
@unique(df, args...; kwargs...)

The @unique macro builds a DataFrames.unique call. Each expression in args is converted to a src => function => sink construct that conforms to the transformation mini-language of DataFrames.

Keyword arguments kwargs are passed down to unique but have to be separated from the positional arguments by a semicolon ;.

The transformation logic for all DataFrameMacros macros is explained in the DataFrameMacros module docstring, accessible via ?DataFrameMacros.

source