DataFrameMacros.jl
DataFrameMacros.jl offers macros for manipulating DataFrames with a syntax geared towards clarity, brevity and convenience. Each macro translates expressions into the source .=> function .=> sink mini-language from DataFrames.jl.
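As a rough illustration of that mini-language, the sketch below writes by hand, in plain DataFrames.jl, approximately what a call like @select(df, :a_plus_one = :a + 1) stands for. The variable names spec and result are illustrative only, and the exact expression the macro generates may differ.

```julia
using DataFrames

df = DataFrame(a = 1:5, b = 6:10, c = 11:15)

# source => function => sink
# ByRow mirrors the row-wise default of the macros (an assumption about
# the equivalent hand-written form, not the literal macro expansion).
spec = :a => ByRow(a -> a + 1) => :a_plus_one

result = select(df, spec)
```

The source is the column :a, the function adds one to each row value, and the sink names the new column.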
The following macros are currently available:
- @transform / @transform!
- @select / @select!
- @groupby
- @combine
- @subset / @subset!
- @sort / @sort!
- @unique
Differences to DataFramesMeta.jl
- Except @combine, all macros work row-wise by default in DataFrameMacros.jl.
- DataFrameMacros.jl uses {} to signal column expressions instead of $().
- In DataFrameMacros.jl, you can apply the same expression to several columns in {} braces at once and even broadcast across multiple sets of columns.
- In DataFrameMacros.jl, you can use special {{ }} multi-column expressions where you can operate on a tuple of all values at once, which makes it easier to do aggregates across columns.
- DataFrameMacros.jl has a special syntax to make use of transform! on a view returned from subset, so you can easily transform only some rows of your dataset with @transform!(df, @subset(...), ...).
If any of these points have changed, please open an issue.
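To make the row-wise default from the first point concrete, here is a minimal sketch in plain DataFrames.jl that contrasts a row-wise function (wrapped in ByRow) with a column-wise one (receiving whole vectors, as @bycol requests). The column names a_plus_one and a_centered are made up for this illustration.

```julia
using DataFrames
using Statistics

df = DataFrame(a = 1:5, b = 6:10, c = 11:15)

# Row-wise: the function sees one value of :a at a time.
rowwise = transform(df, :a => ByRow(a -> a + 1) => :a_plus_one)

# Column-wise: the function sees the full :a and :b vectors, so an
# aggregate such as mean(b) is computed over the whole column.
colwise = transform(df, [:a, :b] => ((a, b) -> a .- mean(b)) => :a_centered)
```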
Examples
@select
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(a = 1:5, b = 6:10, c = 11:15)
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      6     11
   2 │     2      7     12
   3 │     3      8     13
   4 │     4      9     14
   5 │     5     10     15

julia> @select(df, :a)
5×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4
   5 │     5

julia> @select(df, :a, :b)
5×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     2      7
   3 │     3      8
   4 │     4      9
   5 │     5     10

julia> @select(df, :A = :a, :B = :b)
5×2 DataFrame
 Row │ A      B
     │ Int64  Int64
─────┼──────────────
   1 │     1      6
   2 │     2      7
   3 │     3      8
   4 │     4      9
   5 │     5     10

julia> @select(df, :a + 1)
5×1 DataFrame
 Row │ a_function
     │ Int64
─────┼────────────
   1 │          2
   2 │          3
   3 │          4
   4 │          5
   5 │          6

julia> @select(df, :a_plus_one = :a + 1)
5×1 DataFrame
 Row │ a_plus_one
     │ Int64
─────┼────────────
   1 │          2
   2 │          3
   3 │          4
   4 │          5
   5 │          6

julia> @select(df, {[:a, :b]} / 2)
5×2 DataFrame
 Row │ a_function  b_function
     │ Float64     Float64
─────┼────────────────────────
   1 │        0.5         3.0
   2 │        1.0         3.5
   3 │        1.5         4.0
   4 │        2.0         4.5
   5 │        2.5         5.0

julia> @select(df, sqrt({Not(:b)}))
5×2 DataFrame
 Row │ a_sqrt   c_sqrt
     │ Float64  Float64
─────┼──────────────────
   1 │ 1.0      3.31662
   2 │ 1.41421  3.4641
   3 │ 1.73205  3.60555
   4 │ 2.0      3.74166
   5 │ 2.23607  3.87298

julia> @select(df, 5 * {All()})
5×3 DataFrame
 Row │ a_function  b_function  c_function
     │ Int64       Int64       Int64
─────┼────────────────────────────────────
   1 │          5          30          55
   2 │         10          35          60
   3 │         15          40          65
   4 │         20          45          70
   5 │         25          50          75

julia> @select(df, {Between(1, 2)} - {Between(2, 3)})
5×2 DataFrame
 Row │ a_b_-  b_c_-
     │ Int64  Int64
─────┼──────────────
   1 │    -5     -5
   2 │    -5     -5
   3 │    -5     -5
   4 │    -5     -5
   5 │    -5     -5

julia> @select(df, "{1}_plus_{2}" = {Between(1, 2)} + {Between(2, 3)})
5×2 DataFrame
 Row │ a_plus_b  b_plus_c
     │ Int64     Int64
─────┼────────────────────
   1 │        7        17
   2 │        9        19
   3 │       11        21
   4 │       13        23
   5 │       15        25

julia> @select(df, @bycol :a .- :b)
ERROR: UndefVarError: .- not defined
julia> @select(df, :d = @bycol :a .+ 1)
5×1 DataFrame
 Row │ d
     │ Int64
─────┼───────
   1 │     2
   2 │     3
   3 │     4
   4 │     5
   5 │     6

julia> @select(df, "a_minus_{2}" = :a - {[:b, :c]})
5×2 DataFrame
 Row │ a_minus_b  a_minus_c
     │ Int64      Int64
─────┼──────────────────────
   1 │        -5        -10
   2 │        -5        -10
   3 │        -5        -10
   4 │        -5        -10
   5 │        -5        -10

julia> @select(df, "{1}_minus_{2}" = {[:a, :b, :c]} - {[:a, :b, :c]'})
5×9 DataFrame
 Row │ a_minus_a  b_minus_a  c_minus_a  a_minus_b  b_minus_b  c_minus_b  a_min ⋯
     │ Int64      Int64      Int64      Int64      Int64      Int64      Int64 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │         0          5         10         -5          0          5        ⋯
   2 │         0          5         10         -5          0          5
   3 │         0          5         10         -5          0          5
   4 │         0          5         10         -5          0          5
   5 │         0          5         10         -5          0          5        ⋯
                                                               3 columns omitted

julia> @select(df, :a + mean({{[:b, :c]}}))
5×1 DataFrame
 Row │ a_b_c_function
     │ Float64
─────┼────────────────
   1 │            9.5
   2 │           11.5
   3 │           13.5
   4 │           15.5
   5 │           17.5

@transform
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(a = 1:5, b = 6:10, c = 11:15)
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      6     11
   2 │     2      7     12
   3 │     3      8     13
   4 │     4      9     14
   5 │     5     10     15

julia> @transform(df, :a + 1)
5×4 DataFrame
 Row │ a      b      c      a_function
     │ Int64  Int64  Int64  Int64
─────┼─────────────────────────────────
   1 │     1      6     11           2
   2 │     2      7     12           3
   3 │     3      8     13           4
   4 │     4      9     14           5
   5 │     5     10     15           6

julia> @transform(df, :a_plus_one = :a + 1)
5×4 DataFrame
 Row │ a      b      c      a_plus_one
     │ Int64  Int64  Int64  Int64
─────┼─────────────────────────────────
   1 │     1      6     11           2
   2 │     2      7     12           3
   3 │     3      8     13           4
   4 │     4      9     14           5
   5 │     5     10     15           6

julia> @transform(df, @bycol :a .- mean(:b))
5×4 DataFrame
 Row │ a      b      c      a_b_function
     │ Int64  Int64  Int64  Float64
─────┼───────────────────────────────────
   1 │     1      6     11          -7.0
   2 │     2      7     12          -6.0
   3 │     3      8     13          -5.0
   4 │     4      9     14          -4.0
   5 │     5     10     15          -3.0

julia> @transform(df, :d = @bycol :a .+ 1)
5×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      6     11      2
   2 │     2      7     12      3
   3 │     3      8     13      4
   4 │     4      9     14      5
   5 │     5     10     15      6

julia> @transform(df, "a_minus_{2}" = :a - {[:b, :c]})
5×5 DataFrame
 Row │ a      b      c      a_minus_b  a_minus_c
     │ Int64  Int64  Int64  Int64      Int64
─────┼───────────────────────────────────────────
   1 │     1      6     11         -5        -10
   2 │     2      7     12         -5        -10
   3 │     3      8     13         -5        -10
   4 │     4      9     14         -5        -10
   5 │     5     10     15         -5        -10

julia> @transform(df, "{1}_minus_{2}" = {[:a, :b, :c]} - {[:a, :b, :c]'})
5×12 DataFrame
 Row │ a      b      c      a_minus_a  b_minus_a  c_minus_a  a_minus_b  b_minu ⋯
     │ Int64  Int64  Int64  Int64      Int64      Int64      Int64      Int64  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     1      6     11          0          5         10         -5         ⋯
   2 │     2      7     12          0          5         10         -5
   3 │     3      8     13          0          5         10         -5
   4 │     4      9     14          0          5         10         -5
   5 │     5     10     15          0          5         10         -5         ⋯
                                                               5 columns omitted

@combine
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(a = 1:5, b = 6:10, c = 11:15)
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      6     11
   2 │     2      7     12
   3 │     3      8     13
   4 │     4      9     14
   5 │     5     10     15

julia> @combine(df, :mean_a = mean(:a))
1×1 DataFrame
 Row │ mean_a
     │ Float64
─────┼─────────
   1 │     3.0

julia> @combine(df, "mean_{}" = mean({All()}))
1×3 DataFrame
 Row │ mean_a   mean_b   mean_c
     │ Float64  Float64  Float64
─────┼───────────────────────────
   1 │     3.0      8.0     13.0

julia> @combine(df, "first_3_{}" = first({Not(:b)}, 3))
3×2 DataFrame
 Row │ first_3_a  first_3_c
     │ Int64      Int64
─────┼──────────────────────
   1 │         1         11
   2 │         2         12
   3 │         3         13

julia> @combine(df, begin
           :mean_a = mean(:a)
           :median_b = median(:b)
           :sum_c = sum(:c)
       end)
1×3 DataFrame
 Row │ mean_a   median_b  sum_c
     │ Float64  Float64   Int64
─────┼──────────────────────────
   1 │     3.0       8.0     65

@sort
julia> using DataFrames
julia> using DataFrameMacros
julia> using Random
julia> Random.seed!(123)
MersenneTwister(123)
julia> df = DataFrame(randn(5, 5), :auto)
5×5 DataFrame
 Row │ x1         x2          x3         x4         x5
     │ Float64    Float64     Float64    Float64    Float64
─────┼────────────────────────────────────────────────────────
   1 │  1.19027   -0.664713   -0.339366   0.368002  -0.979539
   2 │  2.04818    0.980968   -0.843878  -0.281133   0.260402
   3 │  1.14265   -0.0754831  -0.888936  -0.734886  -0.468489
   4 │  0.459416   0.273815    0.327215  -0.71741   -0.880897
   5 │ -0.396679  -0.194229    0.592403  -0.77507    0.277726

julia> @sort(df, :x1)
5×5 DataFrame
 Row │ x1         x2          x3         x4         x5
     │ Float64    Float64     Float64    Float64    Float64
─────┼────────────────────────────────────────────────────────
   1 │ -0.396679  -0.194229    0.592403  -0.77507    0.277726
   2 │  0.459416   0.273815    0.327215  -0.71741   -0.880897
   3 │  1.14265   -0.0754831  -0.888936  -0.734886  -0.468489
   4 │  1.19027   -0.664713   -0.339366   0.368002  -0.979539
   5 │  2.04818    0.980968   -0.843878  -0.281133   0.260402

julia> @sort(df, -:x1)
5×5 DataFrame
 Row │ x1         x2          x3         x4         x5
     │ Float64    Float64     Float64    Float64    Float64
─────┼────────────────────────────────────────────────────────
   1 │  2.04818    0.980968   -0.843878  -0.281133   0.260402
   2 │  1.19027   -0.664713   -0.339366   0.368002  -0.979539
   3 │  1.14265   -0.0754831  -0.888936  -0.734886  -0.468489
   4 │  0.459416   0.273815    0.327215  -0.71741   -0.880897
   5 │ -0.396679  -0.194229    0.592403  -0.77507    0.277726

julia> @sort(df, :x2 * :x3)
5×5 DataFrame
 Row │ x1         x2          x3         x4         x5
     │ Float64    Float64     Float64    Float64    Float64
─────┼────────────────────────────────────────────────────────
   1 │  2.04818    0.980968   -0.843878  -0.281133   0.260402
   2 │ -0.396679  -0.194229    0.592403  -0.77507    0.277726
   3 │  1.14265   -0.0754831  -0.888936  -0.734886  -0.468489
   4 │  0.459416   0.273815    0.327215  -0.71741   -0.880897
   5 │  1.19027   -0.664713   -0.339366   0.368002  -0.979539

julia> df2 = DataFrame(a = [1, 2, 2, 1, 2], b = [4, 4, 4, 3, 3], c = [5, 7, 5, 7, 5])
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      4      5
   2 │     2      4      7
   3 │     2      4      5
   4 │     1      3      7
   5 │     2      3      5

julia> @sort(df2, :a, :b)
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      3      7
   2 │     1      4      5
   3 │     2      3      5
   4 │     2      4      7
   5 │     2      4      5

julia> @sort(df2, :c - :a - :b)
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     2      4      5
   2 │     1      4      5
   3 │     2      3      5
   4 │     2      4      7
   5 │     1      3      7

@groupby
julia> using DataFrames
julia> using DataFrameMacros
julia> using Random
julia> Random.seed!(123)
MersenneTwister(123)
julia> df = DataFrame(
           color = ["red", "red", "red", "blue", "blue"],
           size = ["big", "small", "big", "small", "big"],
           height = [1, 2, 3, 4, 5],
       )
5×3 DataFrame
 Row │ color   size    height
     │ String  String  Int64
─────┼────────────────────────
   1 │ red     big          1
   2 │ red     small        2
   3 │ red     big          3
   4 │ blue    small        4
   5 │ blue    big          5

julia> @groupby(df, :color)
GroupedDataFrame with 2 groups based on key: color
First Group (3 rows): color = "red"
 Row │ color   size    height
     │ String  String  Int64
─────┼────────────────────────
   1 │ red     big          1
   2 │ red     small        2
   3 │ red     big          3
⋮
Last Group (2 rows): color = "blue"
 Row │ color   size    height
     │ String  String  Int64
─────┼────────────────────────
   1 │ blue    small        4
   2 │ blue    big          5

julia> @groupby(df, :color, :size)
GroupedDataFrame with 4 groups based on keys: color, size
First Group (2 rows): color = "red", size = "big"
 Row │ color   size    height
     │ String  String  Int64
─────┼────────────────────────
   1 │ red     big          1
   2 │ red     big          3
⋮
Last Group (1 row): color = "blue", size = "big"
 Row │ color   size    height
     │ String  String  Int64
─────┼────────────────────────
   1 │ blue    big          5

julia> @groupby(df, :evenheight = iseven(:height))
GroupedDataFrame with 2 groups based on key: evenheight
First Group (3 rows): evenheight = false
 Row │ color   size    height  evenheight
     │ String  String  Int64   Bool
─────┼────────────────────────────────────
   1 │ red     big          1       false
   2 │ red     big          3       false
   3 │ blue    big          5       false
⋮
Last Group (2 rows): evenheight = true
 Row │ color   size    height  evenheight
     │ String  String  Int64   Bool
─────┼────────────────────────────────────
   1 │ red     small        2        true
   2 │ blue    small        4        true

@astable
julia> using DataFrames
julia> using DataFrameMacros
julia> df = DataFrame(name = ["Jeff Bezanson", "Stefan Karpinski", "Alan Edelman", "Viral Shah"])
4×1 DataFrame
 Row │ name
     │ String
─────┼──────────────────
   1 │ Jeff Bezanson
   2 │ Stefan Karpinski
   3 │ Alan Edelman
   4 │ Viral Shah

julia> @select(df, @astable :first, :last = split(:name))
4×2 DataFrame
 Row │ first      last
     │ SubStrin…  SubStrin…
─────┼──────────────────────
   1 │ Jeff       Bezanson
   2 │ Stefan     Karpinski
   3 │ Alan       Edelman
   4 │ Viral      Shah

julia> @select(df, @astable begin
           f, l = split(:name)
           :first, :last = f, l
           :initials = first(f) * "." * first(l) * "."
       end)
4×3 DataFrame
 Row │ first      last       initials
     │ SubStrin…  SubStrin…  String
─────┼────────────────────────────────
   1 │ Jeff       Bezanson   J.B.
   2 │ Stefan     Karpinski  S.K.
   3 │ Alan       Edelman    A.E.
   4 │ Viral      Shah       V.S.

@passmissing
julia> using DataFrames
julia> using DataFrameMacros
julia> df = DataFrame(short = ["cat", "dog", "mouse", "duck"], long = ["catch", "dogged", missing, "docks"])
4×2 DataFrame
 Row │ short   long
     │ String  String?
─────┼─────────────────
   1 │ cat     catch
   2 │ dog     dogged
   3 │ mouse   missing
   4 │ duck    docks

julia> @transform(df, :startswith = @passmissing startswith(:long, :short))
4×3 DataFrame
 Row │ short   long     startswith
     │ String  String?  Bool?
─────┼─────────────────────────────
   1 │ cat     catch          true
   2 │ dog     dogged         true
   3 │ mouse   missing     missing
   4 │ duck    docks         false

Multiple columns in {}
If {} contains a multi-column expression, then the function is run for each combination of arguments determined by broadcasting all sets together.
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(a = 1:5, b = 6:10, c = 11:15)
5×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      6     11
   2 │     2      7     12
   3 │     3      8     13
   4 │     4      9     14
   5 │     5     10     15

julia> @select(df, :a + {[:b, :c]})
5×2 DataFrame
 Row │ a_b_+  a_c_+
     │ Int64  Int64
─────┼──────────────
   1 │     7     12
   2 │     9     14
   3 │    11     16
   4 │    13     18
   5 │    15     20

julia> @select(df, :a + {Not(:a)})
5×2 DataFrame
 Row │ a_b_+  a_c_+
     │ Int64  Int64
─────┼──────────────
   1 │     7     12
   2 │     9     14
   3 │    11     16
   4 │    13     18
   5 │    15     20

julia> @select(df, {[:a, :b]} + {[:b, :c]})
5×2 DataFrame
 Row │ a_b_+  b_c_+
     │ Int64  Int64
─────┼──────────────
   1 │     7     17
   2 │     9     19
   3 │    11     21
   4 │    13     23
   5 │    15     25

julia> @select(df, {[:a, :b]} + {[:b, :c]'})
5×4 DataFrame
 Row │ a_b_+  b_b_+  a_c_+  b_c_+
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     7     12     12     17
   2 │     9     14     14     19
   3 │    11     16     16     21
   4 │    13     18     18     23
   5 │    15     20     20     25

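The pairing of column sets in this section follows ordinary Julia broadcasting rules. The sketch below uses only Base Julia to show which column combinations arise; pairs_zip and pairs_grid are illustrative names, and permutedims stands in for the transposed set written with ' in the macro call. It is a picture of the combinations, not of the package's internals.

```julia
# Two column sets of the same shape are paired element-wise,
# like {[:a, :b]} + {[:b, :c]} in the session above.
pairs_zip = tuple.([:a, :b], [:b, :c])

# A column vector broadcast against a 1×2 row yields every combination,
# like {[:a, :b]} + {[:b, :c]'} in the session above.
pairs_grid = tuple.([:a, :b], permutedims([:b, :c]))

# Flattening the 2×2 grid in column-major order gives the column order
# of the result: a_b, b_b, a_c, b_c.
combos = vec(pairs_grid)
```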
{{}} syntax
The double brace syntax refers to multiple columns as a tuple, which means that you can aggregate over a larger number of columns than it would be practical to write out explicitly.
julia> using DataFrames
julia> using DataFrameMacros
julia> using Random
julia> using Statistics
julia> Random.seed!(123)
MersenneTwister(123)
julia> df = DataFrame(
           jan = randn(5),
           feb = randn(5),
           mar = randn(5),
           apr = randn(5),
           may = randn(5),
           jun = randn(5),
           jul = randn(5),
       )
5×7 DataFrame
 Row │ jan        feb         mar        apr        may        jun        jul  ⋯
     │ Float64    Float64     Float64    Float64    Float64    Float64    Floa ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │  1.19027   -0.664713   -0.339366   0.368002  -0.979539   1.52392   -0.8 ⋯
   2 │  2.04818    0.980968   -0.843878  -0.281133   0.260402  -1.77773    0.3
   3 │  1.14265   -0.0754831  -0.888936  -0.734886  -0.468489  -2.93306   -0.1
   4 │  0.459416   0.273815    0.327215  -0.71741   -0.880897   0.782258   2.3
   5 │ -0.396679  -0.194229    0.592403  -0.77507    0.277726   2.31358   -0.9 ⋯
                                                                1 column omitted

julia> @select(df, :july_larger = :jul > median({{Between(:jan, :jun)}}))
5×1 DataFrame
 Row │ july_larger
     │ Bool
─────┼─────────────
   1 │       false
   2 │        true
   3 │        true
   4 │        true
   5 │       false

julia> @select(df, :mean_smaller = mean({{All()}}) < median({{All()}}))
5×1 DataFrame
 Row │ mean_smaller
     │ Bool
─────┼──────────────
   1 │        false
   2 │         true
   3 │         true
   4 │        false
   5 │        false

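For comparison, a hand-written DataFrames.jl version of "operate on a tuple of values per row" might look like the sketch below. It mirrors the earlier example @select(df, :a + mean({{[:b, :c]}})) on the small a/b/c table; the varargs parameter rest and the output name a_plus_mean are assumptions made for this illustration, not the macro's actual expansion.

```julia
using DataFrames
using Statistics

df = DataFrame(a = 1:5, b = 6:10, c = 11:15)

# For each row, `a` is the value of :a and `rest` is a tuple holding the
# values of :b and :c, so mean(rest) aggregates across columns.
result = select(
    df,
    [:a, :b, :c] => ByRow((a, rest...) -> a + mean(rest)) => :a_plus_mean,
)
```

With many columns, writing out every argument by hand becomes impractical, which is the situation the double brace syntax is meant for.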
@transform! on @subset
DataFrames.jl allows transform!ing a view returned by subset(df, ..., view = true). If you pass a @subset macro call without a dataframe argument to @transform!, a view is created automatically, then the transform is executed and the original argument is returned.
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(
           name = ["Chicken", "Pork", "Apple", "Pear", "Beef"],
           type = ["Meat", "Meat", "Fruit", "Fruit", "Meat"],
           price = [4.99, 5.99, 0.99, 1.29, 6.99],
       )
5×3 DataFrame
 Row │ name     type    price
     │ String   String  Float64
─────┼──────────────────────────
   1 │ Chicken  Meat       4.99
   2 │ Pork     Meat       5.99
   3 │ Apple    Fruit      0.99
   4 │ Pear     Fruit      1.29
   5 │ Beef     Meat       6.99

julia> @transform!(df, @subset(:type == "Meat"), :price = :price + 2)
5×3 DataFrame
 Row │ name     type    price
     │ String   String  Float64
─────┼──────────────────────────
   1 │ Chicken  Meat       6.99
   2 │ Pork     Meat       7.99
   3 │ Apple    Fruit      0.99
   4 │ Pear     Fruit      1.29
   5 │ Beef     Meat       8.99

julia> @transform!(df, @subset(:price < 7, :name != "Pear"), :n_sold = round(Int, :price * 5))
5×4 DataFrame
 Row │ name     type    price    n_sold
     │ String   String  Float64  Int64?
─────┼───────────────────────────────────
   1 │ Chicken  Meat       6.99       35
   2 │ Pork     Meat       7.99  missing
   3 │ Apple    Fruit      0.99        5
   4 │ Pear     Fruit      1.29  missing
   5 │ Beef     Meat       8.99  missing

julia> @transform!(
           @groupby(df, :type),
           @subset(@bycol :price .< mean(:price)),
           :price = 100 * :price)
5×4 DataFrame
 Row │ name     type    price    n_sold
     │ String   String  Float64  Int64?
─────┼───────────────────────────────────
   1 │ Chicken  Meat     699.0        35
   2 │ Pork     Meat       7.99  missing
   3 │ Apple    Fruit     99.0         5
   4 │ Pear     Fruit      1.29  missing
   5 │ Beef     Meat       8.99  missing

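The two-step pattern that this syntax abbreviates can be sketched in plain DataFrames.jl as follows, assuming a DataFrames.jl version that supports transform! on a view created by subset(...; view = true). The variable name meat is illustrative, and the macro may build its view differently.

```julia
using DataFrames

df = DataFrame(
    name = ["Chicken", "Pork", "Apple", "Pear", "Beef"],
    type = ["Meat", "Meat", "Fruit", "Fruit", "Meat"],
    price = [4.99, 5.99, 0.99, 1.29, 6.99],
)

# Step 1: take a view of the matching rows instead of copying them.
meat = subset(df, :type => ByRow(==("Meat")), view = true)

# Step 2: transform the view in place; the change is written through
# to the parent data frame df.
transform!(meat, :price => ByRow(p -> p + 2) => :price)
```

After these two calls, only the "Meat" rows of df carry the increased price, matching the first @transform! call in the session above.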
Special case @nrow
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(x = [1, 1, 1, 2, 2])
5×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     1
   3 │     1
   4 │     2
   5 │     2

julia> @transform(df, @nrow)
5×2 DataFrame
 Row │ x      nrow
     │ Int64  Int64
─────┼──────────────
   1 │     1      5
   2 │     1      5
   3 │     1      5
   4 │     2      5
   5 │     2      5

julia> @combine(groupby(df, :x), :count = @nrow)
2×2 DataFrame
 Row │ x      count
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      2

Special case @eachindex
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(x = [1, 1, 1, 2, 2])
5×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     1
   3 │     1
   4 │     2
   5 │     2

julia> @transform(df, @eachindex)
5×2 DataFrame
 Row │ x      eachindex
     │ Int64  Int64
─────┼──────────────────
   1 │     1          1
   2 │     1          2
   3 │     1          3
   4 │     2          4
   5 │     2          5

julia> @combine(groupby(df, :x), :i = @eachindex)
5×2 DataFrame
 Row │ x      i
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     1      2
   3 │     1      3
   4 │     2      1
   5 │     2      2

Special case @proprow
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(x = [1, 1, 1, 2, 2])
5×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     1
   3 │     1
   4 │     2
   5 │     2

julia> @combine(groupby(df, :x), :p = @proprow)
2×2 DataFrame
 Row │ x      p
     │ Int64  Float64
─────┼────────────────
   1 │     1      0.6
   2 │     2      0.4

Special case @groupindices
julia> using DataFrames
julia> using DataFrameMacros
julia> using Statistics
julia> df = DataFrame(x = [1, 1, 1, 2, 2])
5×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     1
   3 │     1
   4 │     2
   5 │     2

julia> @combine(groupby(df, :x), :gi = @groupindices)
2×2 DataFrame
 Row │ x      gi
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2