DataFrameMacros.jl
DataFrameMacros.jl offers macros for DataFrame manipulation with a syntax geared towards clarity, brevity and convenience. Each macro translates expressions into the more verbose source => function => sink mini-language from DataFrames.jl.
Here is a simple example:
julia> using DataFrameMacros, DataFrames

julia> df = DataFrame(name = ["Mary Louise Parker", "Thomas John Fisher"])
2×1 DataFrame
 Row │ name
     │ String
─────┼────────────────────
   1 │ Mary Louise Parker
   2 │ Thomas John Fisher

julia> result = @transform(df, :middle_initial = split(:name)[2][1] * ".")
2×2 DataFrame
 Row │ name                middle_initial
     │ String              String
─────┼────────────────────────────────────
   1 │ Mary Louise Parker  L.
   2 │ Thomas John Fisher  J.
Unlike DataFrames.jl, most operations are row-wise by default. This often results in cleaner code that's easier to understand and reason about, especially when string or object manipulation is involved. Such operations often don't have a clean broadcasting syntax. For example, somestring[2] is easier to read than getindex.(somestrings, 2), and someobject.property is easier to read than getproperty.(someobjects, :property).
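To make the translation concrete, the @transform call above corresponds roughly to the following plain DataFrames.jl call. This is a hand-written sketch of an equivalent source => function => sink pair, not the macro's literal expansion; the anonymous function and its argument name are illustrative.

```julia
using DataFrames

df = DataFrame(name = ["Mary Louise Parker", "Thomas John Fisher"])

# source   => function                                   => sink
# :name    => row-wise function extracting the initial   => :middle_initial
# ByRow applies the function to each element of :name instead of the whole column.
result = transform(
    df,
    :name => ByRow(name -> split(name)[2][1] * ".") => :middle_initial,
)
```

The row-wise default of the macro replaces the explicit ByRow wrapper and the anonymous function, which is where most of the brevity comes from.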
The following macros are currently available:
- @transform / @transform!
- @select / @select!
- @groupby
- @combine
- @subset / @subset!
- @sort / @sort!
- @unique
Together with Chain.jl, you get a convenient syntax for chains of transformations:
using DataFrameMacros
using DataFrames
using Chain
using Random
using Statistics
Random.seed!(123)
df = DataFrame(
    id = shuffle(1:5),
    group = rand('a':'b', 5),
    weight_kg = randn(5) .* 5 .+ 60,
    height_cm = randn(5) .* 10 .+ 170)

result = @chain df begin
    @subset(:weight_kg > 50)
    @transform(:BMI = :weight_kg / (:height_cm / 100) ^ 2)
    @groupby(iseven(:id), :group)
    @combine(:mean_BMI = mean(:BMI))
    @sort(sqrt(:mean_BMI))
end
show(result)

4×3 DataFrame
 Row │ id_iseven  group  mean_BMI
     │ Bool       Char   Float64
─────┼────────────────────────────
   1 │     false  a       19.0728
   2 │      true  a       20.4405
   3 │     false  b       22.097
   4 │      true  b       22.9701

Design choices
These are the most important aspects that differ from other packages (DataFramesMeta.jl in particular):
- All macros except @combine work row-wise by default. This reduces syntax complexity in most cases because no broadcasting is necessary. A flag macro (@c or @r) can be used to switch between row-based and column-based mode when needed.
- @groupby and @sort allow using arbitrary expressions including multiple columns, without having to @transform first and repeat the new column names.
- Column expressions are interpolated into the macro with $. All column expressions are broadcasted implicitly to create a collection of src-func-sink pairs. This allows using multi-column specifiers like All() or Not(:x), where the specified transformation is executed for each column.
- Keyword arguments to the macro-underlying functions work by separating them from column expressions with the ; character.
- Target column names are written with : symbols to avoid visual ambiguity (:newcol = ...). This also allows using AsTable as a target, like in DataFrames.jl.
- The flag macro can also include the character m to switch on automatic passmissing in row-wise mode.
- There is also a @t flag macro, which extracts every expression of the form :sym = expression and collects the new symbols in a named tuple, while setting the target to AsTable.
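For comparison, the chained example from earlier can be written with plain DataFrames.jl functions, which shows what the row-wise default and the expression support in @groupby and @sort save. This is a sketch of one possible equivalent, not the code the macros generate. The data below is made up for illustration (the original uses random values), and the intermediate names step1 through step4 are hypothetical.

```julia
using DataFrames
using Statistics

# Illustrative data, not the random data from the example above.
df = DataFrame(
    id = 1:5,
    group = ['a', 'b', 'a', 'b', 'a'],
    weight_kg = [58.0, 62.5, 49.0, 71.0, 66.0],
    height_cm = [165.0, 172.0, 158.0, 181.0, 175.0],
)

# @subset(:weight_kg > 50): row-wise predicate needs an explicit ByRow.
step1 = subset(df, :weight_kg => ByRow(>(50)))

# @transform(:BMI = ...): two source columns, a row-wise function, one sink.
step2 = transform(
    step1,
    [:weight_kg, :height_cm] => ByRow((w, h) -> w / (h / 100)^2) => :BMI,
)

# @groupby(iseven(:id), :group): the derived key must be materialized first.
step3 = transform(step2, :id => ByRow(iseven) => :id_iseven)
grouped = groupby(step3, [:id_iseven, :group])

# @combine(:mean_BMI = mean(:BMI)): column-wise, so no ByRow here.
step4 = combine(grouped, :BMI => mean => :mean_BMI)

# @sort(sqrt(:mean_BMI)): sort by a function of the column.
result = sort(step4, order(:mean_BMI, by = sqrt))
```

The extra transform before groupby is the step that @groupby makes unnecessary, and only the combine step operates on whole columns, matching the exception noted for @combine.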