Tutorial

In this tutorial, we'll get to know the macros of DataFrameMacros while working with the well-known Titanic dataset from Kaggle.

Loading the data

The titanic function returns a DataFrame with data about passengers of the Titanic.

julia> using DataFrameMacros, DataFrames, Statistics
julia> df = DataFrameMacros.titanic()
891×12 DataFrame
 Row │ PassengerId  Survived  Pclass  Name                               Sex     Age        SibSp  Parch  Ticket            Fare     Cabin    Embarked
     │ Int64        Int64     Int64   String                             String  Float64?   Int64  Int64  String            Float64  String?  String?
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │           1         0       3  Braund, Mr. Owen Harris            male         22.0      1      0  A/5 21171          7.25    missing  S
   2 │           2         1       1  Cumings, Mrs. John Bradley (Flor…  female       38.0      1      0  PC 17599          71.2833  C85      C
   3 │           3         1       3  Heikkinen, Miss. Laina             female       26.0      0      0  STON/O2. 3101282   7.925   missing  S
   4 │           4         1       1  Futrelle, Mrs. Jacques Heath (Li…  female       35.0      1      0  113803            53.1     C123     S
  ⋮  │      ⋮          ⋮        ⋮                     ⋮                    ⋮         ⋮        ⋮      ⋮           ⋮             ⋮        ⋮        ⋮
 889 │         889         0       3  Johnston, Miss. Catherine Helen …  female  missing        1      2  W./C. 6607        23.45    missing  S
 890 │         890         1       1  Behr, Mr. Karl Howell              male         26.0      0      0  111369            30.0     C148     C
 891 │         891         0       3  Dooley, Mr. Patrick                male         32.0      0      0  370376             7.75    missing  Q
                                                                                                                                       884 rows omitted

The simplest operation one can do is to select columns from a DataFrame. DataFrames.jl has the select function for that purpose and DataFrameMacros has the corresponding @select macro. We can pass symbols or strings with the names of the columns we're interested in.

julia> @select(df, :Name, :Age, :Survived)
891×3 DataFrame
 Row │ Name                               Age        Survived
     │ String                             Float64?   Int64
─────┼────────────────────────────────────────────────────────
   1 │ Braund, Mr. Owen Harris                 22.0         0
   2 │ Cumings, Mrs. John Bradley (Flor…       38.0         1
   3 │ Heikkinen, Miss. Laina                  26.0         1
   4 │ Futrelle, Mrs. Jacques Heath (Li…       35.0         1
  ⋮  │                 ⋮                      ⋮         ⋮
 889 │ Johnston, Miss. Catherine Helen …  missing           0
 890 │ Behr, Mr. Karl Howell                   26.0         1
 891 │ Dooley, Mr. Patrick                     32.0         0
                                              884 rows omitted

We can also compute new columns with @select. We can either name a new column ourselves or let DataFrames choose a name automatically.

For example, we can extract the last name from each name string by splitting at the comma.

julia> @select(df, :last_name = split(:Name, ",")[1])
891×1 DataFrame
 Row │ last_name
     │ SubStrin…
─────┼───────────
   1 │ Braund
   2 │ Cumings
   3 │ Heikkinen
   4 │ Futrelle
  ⋮  │     ⋮
 889 │ Johnston
 890 │ Behr
 891 │ Dooley
 884 rows omitted

The split function operates on a single string, so for this expression to work on the whole column :Name, there must be an implicit broadcast expansion happening. In DataFrameMacros, every macro but @combine works by-row by default. The expression that the @select macro creates is equivalent to the following ByRow construct:

select(df, :Name => ByRow(x -> split(x, ",")[1]) => :last_name)
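For the other case mentioned earlier, where no column name is given, here is a minimal sketch on a made-up toy table (toy and its values are assumptions for illustration, not part of the Titanic data). The plain DataFrames call shows the naming scheme DataFrames applies to a named function; the macro call is expected to produce the same values.

```julia
using DataFrames, DataFrameMacros

# Hypothetical toy table, not the Titanic data.
toy = DataFrame(Age = [4.0, 9.0, 16.0])

# No left-hand side: the result column gets an automatically generated name.
auto_macro = @select(toy, sqrt(:Age))

# Plain DataFrames version; for a named function the generated name is "Age_sqrt".
auto_plain = select(toy, :Age => ByRow(sqrt))

println(names(auto_plain))
```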

@transform

Another thing we can try is to categorize every passenger into child or adult at the boundary of 18 years.

Let's use the @transform macro this time, which appends new columns to an existing DataFrame.

julia> @transform(df, :type = :Age >= 18 ? "adult" : "child")
ERROR: TypeError: non-boolean (Missing) used in boolean context

This command fails because some passengers have no age recorded, and the ternary operator ... ? ... : ... (a shortcut for if ... else ... end) cannot operate on missing values.
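The failure can be reproduced without any DataFrame. The sketch below uses only Base Julia; the variable names are made up for illustration.

```julia
# Comparing with missing yields missing rather than true or false.
cmp = missing >= 18

# Using that result as a condition raises a TypeError,
# because a condition must be a Bool.
threw_type_error = try
    cmp ? "adult" : "child"
    false
catch err
    err isa TypeError
end

println((cmp, threw_type_error))
```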

The @m `passmissing` flag macro

One option is to remove the missing values beforehand, but then we would have to delete rows from the dataset. A simpler option is to make the expression pass missing values through by using the special flag macro @m.

julia> @transform(df, :type = @m :Age >= 18 ? "adult" : "child")
891×13 DataFrame
 Row │ PassengerId  Survived  Pclass  Name                               Sex     Age        SibSp  Parch  Ticket            Fare     Cabin    Embarked  type
     │ Int64        Int64     Int64   String                             String  Float64?   Int64  Int64  String            Float64  String?  String?   String?
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │           1         0       3  Braund, Mr. Owen Harris            male         22.0      1      0  A/5 21171          7.25    missing  S         adult
   2 │           2         1       1  Cumings, Mrs. John Bradley (Flor…  female       38.0      1      0  PC 17599          71.2833  C85      C         adult
   3 │           3         1       3  Heikkinen, Miss. Laina             female       26.0      0      0  STON/O2. 3101282   7.925   missing  S         adult
   4 │           4         1       1  Futrelle, Mrs. Jacques Heath (Li…  female       35.0      1      0  113803            53.1     C123     S         adult
  ⋮  │      ⋮          ⋮        ⋮                     ⋮                    ⋮         ⋮        ⋮      ⋮           ⋮             ⋮        ⋮        ⋮         ⋮
 889 │         889         0       3  Johnston, Miss. Catherine Helen …  female  missing        1      2  W./C. 6607        23.45    missing  S         missing
 890 │         890         1       1  Behr, Mr. Karl Howell              male         26.0      0      0  111369            30.0     C148     C         adult
 891 │         891         0       3  Dooley, Mr. Patrick                male         32.0      0      0  370376             7.75    missing  Q         adult
                                                                                                                                                884 rows omitted

This is equivalent to a DataFrames construct, in which the function is wrapped in passmissing:

transform(df, :Age => ByRow(passmissing(x -> x >= 18 ? "adult" : "child")) => :type)

This way, if any input argument is missing, the function returns missing, too.
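To see the wrapper in isolation, here is a small sketch that calls a passmissing-wrapped function directly. The name classify is made up for this illustration; passmissing comes from Missings.jl and is available after using DataFrames.

```julia
using DataFrames  # makes passmissing from Missings.jl available

# Hypothetical helper: the same ternary as above, wrapped in passmissing.
classify = passmissing(x -> x >= 18 ? "adult" : "child")

println(classify(22.0))     # a regular value goes through the ternary
println(classify(4.0))
println(classify(missing))  # a missing input short-circuits to missing
```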

@subset

To retain only rows that fulfill certain conditions, you can use the @subset macro. For this macro it does not make sense to specify sink column names, because derived columns do not appear in the result. If there are missing values, you can use the @m flag to pass them through the boolean condition, and add the keyword argument skipmissing = true which the underlying subset function requires to remove such rows.

julia> @subset(df, @m startswith(:Name, "M") && :Age > 50; skipmissing = true)
7×12 DataFrame
 Row │ PassengerId  Survived  Pclass  Name                         Sex     Age       SibSp  Parch  Ticket       Fare     Cabin    Embarked
     │ Int64        Int64     Int64   String                       String  Float64?  Int64  Int64  String       Float64  String?  String?
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │           7         0       1  McCarthy, Mr. Timothy J      male        54.0      0      0  17463        51.8625  E46      S
   2 │         153         0       3  Meo, Mr. Alfonzo             male        55.5      0      0  A.5. 11206    8.05    missing  S
   3 │         318         0       2  Moraweck, Dr. Ernest         male        54.0      0      0  29011        14.0     missing  S
   4 │         457         0       1  Millet, Mr. Francis Davis    male        65.0      0      0  13509        26.55    E38      S
   5 │         493         0       1  Molson, Mr. Harry Markland   male        55.0      0      0  113787       30.5     C30      S
   6 │         673         0       2  Mitchell, Mr. Henry Michael  male        70.0      0      0  C.A. 24580   10.5     missing  S
   7 │         773         0       2  Mack, Mrs. (Mary)            female      57.0      0      0  S.O./P.P. 3  10.5     E77      S
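For comparison, a rough plain-DataFrames counterpart of this call could look like the sketch below. It assumes that @m wraps the whole row function in passmissing, as described for @transform, and it runs on a made-up toy table rather than the Titanic data.

```julia
using DataFrames

# Hypothetical toy table with one missing age.
toy = DataFrame(Name = ["Mack", "Moran", "Behr", "Meo"],
                Age = [57.0, missing, 26.0, 55.5])

# The row function returns true, false, or missing;
# skipmissing = true makes subset drop the rows where it is missing.
res = subset(toy,
    [:Name, :Age] => ByRow(passmissing((n, a) -> startswith(n, "M") && a > 50));
    skipmissing = true)

println(res)
```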

@groupby

The groupby function in DataFrames does not use the src => function => sink mini-language; it requires you to create any columns you want to group by beforehand. In DataFrameMacros, the @groupby macro works like a transform and groupby combination, so that you can create columns and group by them in one stroke.

For example, we could group the passengers based on whether their last name begins with a letter from the first or the second half of the alphabet.

julia> @groupby(df, :alphabet_half = :Name[1] <= 'M' ? "first" : "second")
GroupedDataFrame with 2 groups based on key: alphabet_half
First Group (570 rows): alphabet_half = "first"
 Row │ PassengerId  Survived  Pclass  Name                               Sex     Age        SibSp  Parch  Ticket            Fare     Cabin    Embarked  alphabet_half
     │ Int64        Int64     Int64   String                             String  Float64?   Int64  Int64  String            Float64  String?  String?   String
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │           1         0       3  Braund, Mr. Owen Harris            male         22.0      1      0  A/5 21171          7.25    missing  S         first
   2 │           2         1       1  Cumings, Mrs. John Bradley (Flor…  female       38.0      1      0  PC 17599          71.2833  C85      C         first
   3 │           3         1       3  Heikkinen, Miss. Laina             female       26.0      0      0  STON/O2. 3101282   7.925   missing  S         first
   4 │           4         1       1  Futrelle, Mrs. Jacques Heath (Li…  female       35.0      1      0  113803            53.1     C123     S         first
  ⋮  │      ⋮          ⋮        ⋮                     ⋮                    ⋮         ⋮        ⋮      ⋮           ⋮             ⋮        ⋮        ⋮            ⋮
 567 │         888         1       1  Graham, Miss. Margaret Edith       female       19.0      0      0  112053            30.0     B42      S         first
 568 │         889         0       3  Johnston, Miss. Catherine Helen …  female  missing        1      2  W./C. 6607        23.45    missing  S         first
 569 │         890         1       1  Behr, Mr. Karl Howell              male         26.0      0      0  111369            30.0     C148     C         first
 570 │         891         0       3  Dooley, Mr. Patrick                male         32.0      0      0  370376             7.75    missing  Q         first
                                                                                                                                                      562 rows omitted
⋮
Last Group (321 rows): alphabet_half = "second"
 Row │ PassengerId  Survived  Pclass  Name                               Sex     Age        SibSp  Parch  Ticket           Fare     Cabin    Embarked  alphabet_half
     │ Int64        Int64     Int64   String                             String  Float64?   Int64  Int64  String           Float64  String?  String?   String
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │           8         0       3  Palsson, Master. Gosta Leonard     male          2.0      3      1  349909           21.075   missing  S         second
   2 │          10         1       2  Nasser, Mrs. Nicholas (Adele Ach…  female       14.0      1      0  237736           30.0708  missing  C         second
   3 │          11         1       3  Sandstrom, Miss. Marguerite Rut    female        4.0      1      1  PP 9549          16.7     G6       S         second
   4 │          13         0       3  Saundercock, Mr. William Henry     male         20.0      0      0  A/5. 2151         8.05    missing  S         second
  ⋮  │      ⋮          ⋮        ⋮                     ⋮                    ⋮         ⋮        ⋮      ⋮           ⋮            ⋮        ⋮        ⋮            ⋮
 318 │         880         1       1  Potter, Mrs. Thomas Jr (Lily Ale…  female       56.0      0      1  11767            83.1583  C50      C         second
 319 │         881         1       2  Shelley, Mrs. William (Imanita P…  female       25.0      0      1  230433           26.0     missing  S         second
 320 │         885         0       3  Sutehall, Mr. Henry Jr             male         25.0      0      0  SOTON/OQ 392076   7.05    missing  S         second
 321 │         886         0       3  Rice, Mrs. William (Margaret Nor…  female       39.0      0      5  382652           29.125   missing  Q         second
                                                                                                                                                     313 rows omitted
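The transform and groupby combination described above can be spelled out in plain DataFrames. This is a sketch on a made-up toy table; the names toy, with_half and groups are assumptions for illustration.

```julia
using DataFrames

# Hypothetical toy table with last names only.
toy = DataFrame(Name = ["Braund", "Palsson", "Behr", "Rice"])

# Step 1: create the grouping column row by row.
with_half = transform(toy,
    :Name => ByRow(n -> n[1] <= 'M' ? "first" : "second") => :alphabet_half)

# Step 2: group by the newly created column.
groups = groupby(with_half, :alphabet_half)

println(length(groups))
```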

`begin ... end` syntax

You can of course group by multiple columns; in that case, just add more positional arguments. To write more readable code, we can arrange multiple arguments as lines in a begin ... end block instead of comma-separated positional arguments.

julia> group = @groupby df begin
           :alphabet_half = :Name[1] <= 'M' ? "first" : "second"
           :Sex
       end
GroupedDataFrame with 4 groups based on keys: alphabet_half, Sex
First Group (368 rows): alphabet_half = "first", Sex = "male"
 Row │ PassengerId  Survived  Pclass  Name                           Sex     Age        SibSp  Parch  Ticket            Fare     Cabin    Embarked  alphabet_half
     │ Int64        Int64     Int64   String                         String  Float64?   Int64  Int64  String            Float64  String?  String?   String
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │           1         0       3  Braund, Mr. Owen Harris        male         22.0      1      0  A/5 21171          7.25    missing  S         first
   2 │           5         0       3  Allen, Mr. William Henry       male         35.0      0      0  373450             8.05    missing  S         first
   3 │           6         0       3  Moran, Mr. James               male    missing        0      0  330877             8.4583  missing  Q         first
   4 │           7         0       1  McCarthy, Mr. Timothy J        male         54.0      0      0  17463             51.8625  E46      S         first
  ⋮  │      ⋮          ⋮        ⋮                   ⋮                  ⋮         ⋮        ⋮      ⋮           ⋮             ⋮        ⋮        ⋮            ⋮
 365 │         884         0       2  Banfield, Mr. Frederick James  male         28.0      0      0  C.A./SOTON 34068  10.5     missing  S         first
 366 │         887         0       2  Montvila, Rev. Juozas          male         27.0      0      0  211536            13.0     missing  S         first
 367 │         890         1       1  Behr, Mr. Karl Howell          male         26.0      0      0  111369            30.0     C148     C         first
 368 │         891         0       3  Dooley, Mr. Patrick            male         32.0      0      0  370376             7.75    missing  Q         first
                                                                                                                                                  360 rows omitted
⋮
Last Group (112 rows): alphabet_half = "second", Sex = "female"
 Row │ PassengerId  Survived  Pclass  Name                               Sex     Age        SibSp  Parch  Ticket    Fare      Cabin    Embarked  alphabet_half
     │ Int64        Int64     Int64   String                             String  Float64?   Int64  Int64  String    Float64   String?  String?   String
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │          10         1       2  Nasser, Mrs. Nicholas (Adele Ach…  female       14.0      1      0  237736     30.0708  missing  C         second
   2 │          11         1       3  Sandstrom, Miss. Marguerite Rut    female        4.0      1      1  PP 9549    16.7     G6       S         second
   3 │          15         0       3  Vestrom, Miss. Hulda Amanda Adol…  female       14.0      0      0  350406      7.8542  missing  S         second
   4 │          19         0       3  Vander Planke, Mrs. Julius (Emel…  female       31.0      1      0  345763     18.0     missing  S         second
  ⋮  │      ⋮          ⋮        ⋮                     ⋮                    ⋮         ⋮        ⋮      ⋮       ⋮         ⋮         ⋮        ⋮            ⋮
 109 │         876         1       3  Najib, Miss. Adele Kiamie "Jane"   female       15.0      0      0  2667        7.225   missing  C         second
 110 │         880         1       1  Potter, Mrs. Thomas Jr (Lily Ale…  female       56.0      0      1  11767      83.1583  C50      C         second
 111 │         881         1       2  Shelley, Mrs. William (Imanita P…  female       25.0      0      1  230433     26.0     missing  S         second
 112 │         886         0       3  Rice, Mrs. William (Margaret Nor…  female       39.0      0      5  382652     29.125   missing  Q         second
                                                                                                                                               104 rows omitted

@combine

We can compute summary statistics on groups using the @combine macro. This is the only macro that works by-column by default because aggregations are most commonly computed on full columns, not on each row.

For example, we can compute survival rates for the groups we created above.

julia> @combine(group, :survival_rate = mean(:Survived))
4×3 DataFrame
 Row │ alphabet_half  Sex     survival_rate
     │ String         String  Float64
─────┼──────────────────────────────────────
   1 │ first          male         0.214674
   2 │ first          female       0.752475
   3 │ second         male         0.143541
   4 │ second         female       0.723214
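Because @combine works by-column, the function receives each group's whole column. A plain-DataFrames sketch of the same idea, on a made-up toy table grouped by a single column, might look like this:

```julia
using DataFrames, Statistics

# Hypothetical toy table, not the Titanic data.
toy = DataFrame(Sex = ["male", "female", "male", "female"],
                Survived = [0, 1, 1, 1])

# No ByRow here: mean is applied to the full Survived column of each group.
res = combine(groupby(toy, :Sex), :Survived => mean => :survival_rate)

# Collect the result into a Dict so it can be read independently of group order.
rates = Dict(res.Sex .=> res.survival_rate)
println(rates)
```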

@chain

The @chain macro from Chain.jl is useful to build sequences of operations. It is not included in DataFrameMacros but works well with it.

In a chain, the first argument of each function or macro call is by default the result from the previous line.

julia> using Chain
julia> @chain df begin
           @select(:Sex, :Age, :Survived)
           dropmissing(:Age)
           @groupby(:Sex, :age_range =
               floor(Int, :Age/10) * 10 : ceil(Int, :Age/10) * 10 - 1)
           @combine(:survival_rate = mean(:Survived))
           @sort(first(:age_range), :Sex)
       end
17×3 DataFrame
 Row │ Sex     age_range  survival_rate
     │ String  UnitRang…  Float64
─────┼──────────────────────────────────
   1 │ female  0:9            0.633333
   2 │ male    0:9            0.59375
   3 │ female  10:19          0.772727
   4 │ male    10:19          0.125
  ⋮  │   ⋮         ⋮            ⋮
  15 │ female  60:69          1.0
  16 │ male    60:69          0.0833333
  17 │ male    70:79          0.0
                         10 rows omitted

This example also uses the @sort macro, which is useful when you want to sort by values that are derived from columns but that you don't want to include in the DataFrame.

The @c flag macro

Some @transform or @select calls require access to whole columns at once. One scenario is computing a z-score. Because @transform and @select work by-row by default, you need to add the @c flag macro to signal that you want to work by-column. This is exactly the opposite of DataFrames, where you work by-column by default and signal by-row behavior with the ByRow wrapper.

julia> @select(
           dropmissing(df, :Age),
           :age_z = @c (:Age .- mean(:Age)) ./ std(:Age))
714×1 DataFrame
 Row │ age_z
     │ Float64
─────┼───────────
   1 │ -0.530005
   2 │  0.57143
   3 │ -0.254646
   4 │  0.364911
  ⋮  │     ⋮
 712 │ -0.736524
 713 │ -0.254646
 714 │  0.158392
 707 rows omitted
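In plain DataFrames the by-column version needs no flag, since a function without ByRow already receives the whole column. A minimal sketch on a made-up toy table:

```julia
using DataFrames, Statistics

# Hypothetical toy table with three ages.
toy = DataFrame(Age = [10.0, 20.0, 30.0])

# The anonymous function gets the full Age vector, so mean and std
# are computed over the column and the dots broadcast over its elements.
res = select(toy, :Age => (a -> (a .- mean(a)) ./ std(a)) => :age_z)

println(res.age_z)
```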

The @t flag macro

If a computation should return multiple columns, DataFrames allows you to do this by returning a NamedTuple and setting the sink argument to AsTable. To streamline this process, you can use the @t flag macro. It signals that every :symbol = expression assignment it finds should be rewritten so that a NamedTuple like (symbol = expression, symbol2...) is returned and the sink argument is set to AsTable.

julia> @select(df, @t begin
           nameparts = split(:Name, r"[\s,]+")
           :title = nameparts[2]
           :first_name = nameparts[3]
           :last_name = nameparts[1]
       end)
891×3 DataFrame
 Row │ title      first_name  last_name
     │ SubStrin…  SubStrin…   SubStrin…
─────┼──────────────────────────────────
   1 │ Mr.        Owen        Braund
   2 │ Mrs.       John        Cumings
   3 │ Miss.      Laina       Heikkinen
   4 │ Mrs.       Jacques     Futrelle
  ⋮  │     ⋮          ⋮           ⋮
 889 │ Miss.      Catherine   Johnston
 890 │ Mr.        Karl        Behr
 891 │ Mr.        Patrick     Dooley
                        884 rows omitted

You can also use tuple destructuring syntax with the @t macro. This can often make assignments of multiple columns even more terse:

julia> @select(df, @t begin
           :last_name, :title, :first_name, rest... = split(:Name, r"[\s,]+")
       end)
891×3 DataFrame
 Row │ last_name  title      first_name
     │ SubStrin…  SubStrin…  SubStrin…
─────┼──────────────────────────────────
   1 │ Braund     Mr.        Owen
   2 │ Cumings    Mrs.       John
   3 │ Heikkinen  Miss.      Laina
   4 │ Futrelle   Mrs.       Jacques
  ⋮  │     ⋮          ⋮          ⋮
 889 │ Johnston   Miss.      Catherine
 890 │ Behr       Mr.        Karl
 891 │ Dooley     Mr.        Patrick
                        884 rows omitted
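For reference, the NamedTuple and AsTable construct that the first @t example stands for can be written by hand roughly as follows. This is a sketch on a one-row toy table; the exact code the macro generates may differ.

```julia
using DataFrames

# Hypothetical one-row toy table.
toy = DataFrame(Name = ["Braund, Mr. Owen Harris"])

# Each row returns a NamedTuple; the AsTable sink spreads its fields into columns.
res = select(toy, :Name => ByRow(n -> begin
    nameparts = split(n, r"[\s,]+")
    (title = nameparts[2], first_name = nameparts[3], last_name = nameparts[1])
end) => AsTable)

println(res)
```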

Multi-column specifications

So far we have only accessed a single column with each column specifier, like :Survived. But often, transformations are supposed to be applied over a set of columns.

In DataFrameMacros, the source-function-sink pair construct being created is automatically broadcast over all column specifiers. This means an expression marked by $ may evaluate not only to a single column identifier but also to a multi-column identifier. The broadcasting is "invisible" to users who limit themselves to single-column identifiers, as broadcasting over singular objects results in a single source-function-sink expression.

Possible identifiers are n-dimensional arrays of strings, symbols or integers and all valid inputs to the DataFrames.names(df, specifier) function. Examples of these are All(), Not(:x), Between(:x, :z), any Type, or any Function that returns true or false given a column name String.
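To get a feeling for how such specifiers resolve to column names, we can call DataFrames.names directly. The toy table below is made up and only mimics a few Titanic column names.

```julia
using DataFrames

# Hypothetical toy table with four columns.
toy = DataFrame(Name = ["a"], Sex = ["m"], Age = [1.0], Fare = [2.5])

println(names(toy, Between(:Name, :Age)))  # a contiguous range of columns
println(names(toy, Not(:Name)))            # everything except Name
println(names(toy, Real))                  # columns whose element type is a subtype of Real
println(names(toy, endswith("e")))         # a predicate on the column name
```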

Let's look at a few basic examples. Here's a simple selection of columns without transformation:

julia> @select(df, $(Between(:Name, :Age)))
891×3 DataFrame
 Row │ Name                               Sex     Age
     │ String                             String  Float64?
─────┼──────────────────────────────────────────────────────
   1 │ Braund, Mr. Owen Harris            male         22.0
   2 │ Cumings, Mrs. John Bradley (Flor…  female       38.0
   3 │ Heikkinen, Miss. Laina             female       26.0
   4 │ Futrelle, Mrs. Jacques Heath (Li…  female       35.0
  ⋮  │                 ⋮                    ⋮         ⋮
 889 │ Johnston, Miss. Catherine Helen …  female  missing
 890 │ Behr, Mr. Karl Howell              male         26.0
 891 │ Dooley, Mr. Patrick                male         32.0
                                            884 rows omitted

Or another example with a Function that selects all columns ending with "e":

julia> @select(df, $(endswith("e")))
891×3 DataFrame
 Row │ Name                               Age        Fare
     │ String                             Float64?   Float64
─────┼───────────────────────────────────────────────────────
   1 │ Braund, Mr. Owen Harris                 22.0   7.25
   2 │ Cumings, Mrs. John Bradley (Flor…       38.0  71.2833
   3 │ Heikkinen, Miss. Laina                  26.0   7.925
   4 │ Futrelle, Mrs. Jacques Heath (Li…       35.0  53.1
  ⋮  │                 ⋮                      ⋮         ⋮
 889 │ Johnston, Miss. Catherine Helen …  missing    23.45
 890 │ Behr, Mr. Karl Howell                   26.0  30.0
 891 │ Dooley, Mr. Patrick                     32.0   7.75
                                             884 rows omitted

The next step is to actually compute with the selected columns. The resulting DataFrames mini-language construct is sources .=> function[s] .=> sinks where in the default case, there is just a single function, even when using multiple columns.

For example, we can select all columns that are subtypes of Real and convert them to Float32:

julia> @select(df, Float32($Real))
891×6 DataFrame
 Row │ PassengerId_Float32  Survived_Float32  Pclass_Float32  SibSp_Float32  Parch_Float32  Fare_Float32
     │ Float32              Float32           Float32         Float32        Float32        Float32
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
   1 │                 1.0               0.0             3.0            1.0            0.0        7.25
   2 │                 2.0               1.0             1.0            1.0            0.0       71.2833
   3 │                 3.0               1.0             3.0            0.0            0.0        7.925
   4 │                 4.0               1.0             1.0            1.0            0.0       53.1
  ⋮  │          ⋮                  ⋮                ⋮               ⋮              ⋮             ⋮
 889 │               889.0               0.0             3.0            1.0            2.0       23.45
 890 │               890.0               1.0             1.0            0.0            0.0       30.0
 891 │               891.0               0.0             3.0            0.0            0.0        7.75
                                                                                         884 rows omitted

On the left-hand side of left_expression = right_expression, we can also create a multi-column-specifier object in order to choose a collection of column names for the result of right_expression. We can splice collections of existing names in with $ which makes it easy to create new names based on old ones. For example, to continue with the Float32 example, we could lowercase each column name and append a _32 suffix instead of relying on the automatic renaming.

julia> @select(df, lowercase.($Real) .* "_32" = Float32($Real))
891×6 DataFrame
 Row │ passengerid_32  survived_32  pclass_32  sibsp_32  parch_32  fare_32
     │ Float32         Float32      Float32    Float32   Float32   Float32
─────┼─────────────────────────────────────────────────────────────────────
   1 │            1.0          0.0        3.0       1.0       0.0   7.25
   2 │            2.0          1.0        1.0       1.0       0.0  71.2833
   3 │            3.0          1.0        3.0       0.0       0.0   7.925
   4 │            4.0          1.0        1.0       1.0       0.0  53.1
  ⋮  │       ⋮              ⋮           ⋮         ⋮         ⋮         ⋮
 889 │          889.0          0.0        3.0       1.0       2.0  23.45
 890 │          890.0          1.0        1.0       0.0       0.0  30.0
 891 │          891.0          0.0        3.0       0.0       0.0   7.75
                                                           884 rows omitted

Just to reiterate, this expression amounts to something close to:

select(df, DataFrameMacros.stringargs(df, Real) .=> ByRow(Float32) .=> lowercase.(DataFrameMacros.stringargs(df, Real) .* "_32"))

The stringargs function handles the conversion from input object to column names and is almost equivalent to using DataFrames.names, except that Symbols, Strings, and collections thereof are passed through as-is.

We can see the broadcasting aspect better by combining column specifiers of different length in one expression. Let's pretend for example, that we wanted to have columns that compute interactions of multiple numeric variables, such as age with survival status or passenger class:

julia> @select(df, :Age * $[:Survived, :Pclass])
891×2 DataFrame
 Row │ Age_Survived_*  Age_Pclass_*
     │ Float64?        Float64?
─────┼──────────────────────────────
   1 │            0.0          66.0
   2 │           38.0          38.0
   3 │           26.0          78.0
   4 │           35.0          35.0
  ⋮  │       ⋮              ⋮
 889 │      missing       missing
 890 │           26.0          26.0
 891 │            0.0          96.0
                    884 rows omitted

As you can see, the :Age column was multiplied element-wise with each of the other two columns.

This process also works with n-dimensional arrays. For example, to multiply multiple columns in all possible combinations, we can use one column vector and one row vector:

julia> @select(df, $[:Survived, :Pclass] * $(permutedims([:Survived, :Pclass])))
891×4 DataFrame
 Row │ Survived_Survived_*  Pclass_Survived_*  Survived_Pclass_*  Pclass_Pclass_*
     │ Int64                Int64              Int64              Int64
─────┼────────────────────────────────────────────────────────────────────────────
   1 │                   0                  0                  0                9
   2 │                   1                  1                  1                1
   3 │                   1                  3                  3                9
   4 │                   1                  1                  1                1
  ⋮  │          ⋮                   ⋮                  ⋮                 ⋮
 889 │                   0                  0                  0                9
 890 │                   1                  1                  1                1
 891 │                   0                  0                  0                9
                                                                  884 rows omitted

The sink specifier can be an n-dimensional array as well, which is flattened column-first into a sequence of columns.

julia> @select(df, ["a" "c"; "b" "d"] = $[:Survived, :Pclass] * $(permutedims([:Survived, :Pclass])))
891×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     0      0      0      9
   2 │     1      1      1      1
   3 │     1      3      3      9
   4 │     1      1      1      1
  ⋮  │   ⋮      ⋮      ⋮      ⋮
 889 │     0      0      0      9
 890 │     1      1      1      1
 891 │     0      0      0      9
                  884 rows omitted
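The shape and ordering seen in the last two examples follow from ordinary Julia broadcasting. The sketch below leaves DataFrames out entirely and only pairs up the column identifiers; tuple stands in for the source-function-sink construction, which is an illustration choice rather than what the macro does internally.

```julia
cols = [:Survived, :Pclass]

# A column vector broadcast against a row vector gives a 2×2 matrix
# holding every combination of the two identifiers.
combos = tuple.(cols, permutedims(cols))

# Flattening a matrix with vec goes column-first, which matches
# the order of the result columns shown above.
flat = vec(combos)

println(size(combos))
println(flat)
```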

The left-hand side doesn't have to match the size of the right-hand side expression (remember, we're broadcasting). If there are more names than source columns, the source columns are simply copied multiple times.

julia> @select(df, ["a", "b", "c"] = :Survived)
891×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     0      0      0
   2 │     1      1      1
   3 │     1      1      1
   4 │     1      1      1
  ⋮  │   ⋮      ⋮      ⋮
 889 │     0      0      0
 890 │     1      1      1
 891 │     0      0      0
           884 rows omitted
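A plain-DataFrames sketch of this last case broadcasts one source column against three sink names. It uses a made-up toy table, and identity is an assumption standing in for whatever function the macro passes.

```julia
using DataFrames

# Hypothetical toy table with a single column.
toy = DataFrame(Survived = [0, 1, 1])

# The scalar source and function are repeated for each of the three sink names,
# which yields three source => function => sink pairs.
res = select(toy, :Survived .=> identity .=> ["a", "b", "c"])

println(names(res))
```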