class: center, middle, inverse, title-slide

.title[
# Data processing with the tidyverse package
]
.subtitle[
## R + Social Sciences
]
.author[
### Pablo Tiscornia
]

---

<style type="text/css">
.remark-slide-content {
  font-size: 25px;
  padding: 1em 1em 1em 1em;
}
</style>

---
class: inverse, middle, center

# What is the [Tidyverse](https://www.tidyverse.org/)?

***

---
# Tidyverse

.pull-left[
#### `Tidyverse` is a collection of R packages designed for so-called "data science".

#### They share the same philosophy of use, so they work in harmony with one another.
]

.pull-right[
<img src="data:image/png;base64,#../img/tidyverse.png" width="781" style="display: block; margin: auto;" />
]

---
class: inverse, middle, center

# Why tidyverse?

<html>
  <div style='float:left'></div>
  <hr color='#EB811B' size=1px width=1125px>
</html>

---
# __Why tidyverse?__

- ### Designed to be read and written by and for human beings

--

- ### Functions designed not for one specific task but for a whole workflow

<img src="data:image/png;base64,#../img/circuito del dato.png" width="50%" style="display: block; margin: auto;" />

--

- ### Its community, built on the principles of open source and collaborative work

---
# __Installation and use__

* Only once (per computer):

```r
install.packages("tidyverse")
```

--

* At the start of every R or RStudio session:

```r
library(tidyverse)
```

--

_This is not necessary:_

```r
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
```

---
# Roadmap

### Introducing the `dplyr` and `tidyr` packages

.pull-left[
## ✔️ dplyr

☑️️ `select()`

☑️️ `filter()`

☑️️ `mutate()`

☑️️ `rename()`

☑️️ `arrange()`

☑️️ `summarise()`

☑️️ `group_by()`
]

.pull-right[
## ✔️ tidyr

☑️ `pivot_longer()`

☑️ `pivot_wider()`

<br>

## ✔️ magrittr

☑️ `%>%` (_the pipe_)
]

***

```r
library(eph)
b_eph_ind <- get_microdata(year = 2019, trimester = 3, type = "individual")
```

---
class: middle, center, inverse

THE PIPE

<img
src="data:image/png;base64,#../img/pipe.png" width="200" />

***

_<p style="color:grey;" align="center">A way of writing</p>_

---
# THE PIPE

<br><br>

.pull-left[
```r
base_de_datos `%>%`
      funcion1 `%>%`
      funcion2 `%>%`
      funcion3
```
]

.pull-right[
![](data:image/png;base64,#../img/pipe_paso_a_paso.gif)<!-- -->
]

---
# THE PIPE

.pull-left[
### **Without THE PIPE:**

```r
# Step2(Step1(base_de_datos$variable))
prop.table(table(`b_eph_ind$CH04`))
```

```

        1         2 
0.4818711 0.5181289 
```
]

--

.pull-right[
### **With THE PIPE**

```r
`b_eph_ind$CH04` `%>%`  # base_de_datos$variable
  table() `%>%`         # Step 1
  prop.table()          # Step 2
```

```
.
        1         2 
0.4818711 0.5181289 
```
]

---
# magrittr - a way of writing

<br><br>

### **Case:** I want to obtain the relative distribution of cases by sex:

#### Functions: `table()` - `prop.table()` - `round()`

--

---
class: middle, center, inverse

<img src="data:image/png;base64,#../img/logo dplyr.png" width="30%" style="display: block; margin: auto;" />

---
# dplyr

## Functions in the dplyr package:

<br>

| __Function__  | __Action__                        |
| :---          | ---:                              |
| `select()`    | *selects or drops variables*      |
| `filter()`    | *selects rows*                    |
| `mutate()`    | *creates / edits variables*       |
| `rename()`    | *renames variables*               |
| `group_by()`  | *splits the data by a variable*   |
| `summarize()` | *generates a summary table*       |

---
class: inverse, middle, center

# __select()__

<html>
  <div style='float:left'></div>
  <hr color='#EB811B' size=1px width=1125px>
</html>

_<p style="color:grey;" align="center">Chooses or drops columns from a dataset</p>_

---
# select()

### The function follows this pattern:

```r
base_de_datos %>% 
* select(id, nombre)
```

<img src="data:image/png;base64,#../img/select_presentacion.png" width="65%" style="display: block; margin: auto;" />

---
# **Case**

### - **Indicator 1:** *Main labour market rates for the CABA and Partidos del GBA agglomerations*

### - **Indicator 2:** *Indicator 1 broken down by the __sex__ and __age__ of the people.*

--

According to the [record layout (Diseño de registro)](https://www.indec.gob.ar/ftp/cuadros/menusuperior/eph/EPH_registro_t318.pdf), the variables we will work with are:

- **Agglomeration of residence** = `AGLOMERADO`

- **Activity status** = `ESTADO`

- **Sex** = `CH04`

- **Age** = `CH06`

- **Weighting factor** = `PONDERA`

---
# **Case**

### Working libraries and data import:

```r
library(tidyverse)
library(eph)

b_eph_ind <- read.table("entradas/usu_individual_t119.txt", header = TRUE, sep = ";")
```

---
# select() - by variable name

### I select the columns I want from the dataset:

```r
b_eph_ind_seleccion <- `b_eph_ind` %>% 
  `select`(ESTADO, CH04, CH06, PONDERA)
```

--

### Check the operation:

```r
colnames(b_eph_ind_seleccion)
```

```
[1] "ESTADO"  "CH04"    "CH06"    "PONDERA"
```

---
# select() - by column position

```r
b_eph_ind_seleccion <- b_eph_ind %>% 
  select(`10, 12, 14, 28`)
```

--

### Check the selection:

```r
colnames(b_eph_ind_seleccion)
```

```
[1] "PONDERA" "CH04"    "CH06"    "ESTADO" 
```

---
count: false

# Another way to select
.panel1-select_1-auto[
```r
*b_eph_ind
```
]

.panel2-select_1-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>,
IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# Another way to select
.panel1-select_1-auto[
```r
b_eph_ind %>%
* select(12:16)
```
]

.panel2-select_1-auto[
```
# A tibble: 57,229 × 5
    CH04 CH05        CH06  CH07  CH08
   <int> <fct>      <int> <int> <int>
 1     1 12/04/1963    56     2     1
 2     2 24/09/1972    46     2     1
 3     2 14/09/1998    20     1     1
 4     1 11/04/2007    12     5     1
 5     2 03/03/1981    38     2     4
 6     2 17/12/2011     7     5     4
 7     1 10/12/2013     5     5     4
 8     1 27/02/2016     3     5     4
 9     1 15/07/1965    54     3     4
10     1 19/08/2000    19     5     4
# … with 57,219 more rows
```
]

<style>
.panel1-select_1-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-select_1-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-select_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

<img src="data:image/png;base64,#https://media.tenor.com/images/4474c747b4bba7b72172078cbf2e797b/tenor.gif" width="65%" style="display: block; margin: auto;" />

---
class: inverse, middle, center

## One more.
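---
# select() - positions and name ranges

The position range used above and a name range pick columns in the same way. The sketch below is only an illustration on a small, made-up table: `toy` and its values are hypothetical stand-ins for `b_eph_ind`, reusing some of its column names.

```r
library(dplyr)

# Hypothetical toy table standing in for b_eph_ind (values are made up)
toy <- tibble(
  CODUSU  = c("A", "B"),
  CH03    = c(1, 2),
  CH04    = c(1, 2),
  CH05    = c("12/04/1963", "24/09/1972"),
  CH06    = c(56, 46),
  PONDERA = c(547, 547)
)

# Select a block of neighbouring columns by position...
by_position <- toy %>%
  select(2:4)

# ...or the same block by naming its first and last column
by_name <- toy %>%
  select(CH03:CH05)

colnames(by_position)
colnames(by_name)
```

Name ranges tend to be safer than positions when the column order of the file may change between releases.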
---
count: false

# Another way to select
.panel1-select_2-auto[
```r
*b_eph_ind
```
]

.panel2-select_2-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# Another way to select
.panel1-select_2-auto[
```r
b_eph_ind %>%
* select(CH03:CH10)
```
]

.panel2-select_2-auto[
```
# A tibble: 57,229 × 8
    CH03  CH04 CH05        CH06  CH07  CH08  CH09  CH10
   <int> <int> <fct>      <int> <int> <int> <int> <int>
 1     1     1 12/04/1963    56     2     1     1     2
 2     2     2 24/09/1972    46     2     1     1     2
 3     3     2 14/09/1998    20     1     1     1     2
 4     3     1 11/04/2007    12     5     1     1     1
 5     2     2 03/03/1981    38     2     4     1     2
 6     3     2 17/12/2011     7     5     4     1     1
 7     3     1 10/12/2013     5     5     4     2     1
 8     3     1 27/02/2016     3     5     4     2     1
 9     1     1 15/07/1965    54     3     4     1     2
10     3     1 19/08/2000    19     5     4     1     1
# … with 57,219 more rows
```
]

<style>
.panel1-select_2-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-select_2-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-select_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: inverse, middle, center

## One more.

---
count: false

# Another way to select
.panel1-select_3-auto[
```r
*b_eph_ind
```
]

.panel2-select_3-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# Another way to select
.panel1-select_3-auto[
```r
b_eph_ind %>%
* select(starts_with("CH"))
```
]

.panel2-select_3-auto[
```
# A tibble: 57,229 × 16
    CH03  CH04 CH05   CH06  CH07  CH08  CH09  CH10  CH11  CH12  CH13 CH14   CH15
   <int> <int> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <chr> <int>
 1     1     1 12/0…    56     2     1     1     2     0     4     1 <NA>      1
 2     2     2 24/0…    46     2     1     1     2     0     4     2 3         1
 3     3     2 14/0…    20     1     1     1     2     0     7     2 1         1
 4     3     1 11/0…    12     5     1     1     1     2     4     2 0         1
 5     2     2 03/0…    38     2     4     1     2     0     4     2 2         4
 6     3     2 17/1…     7     5     4     1     1     1     2     2 1         1
 7     3     1 10/1…     5     5     4     2     1     1     1     2 4         1
 8     3     1 27/0…     3     5     4     2     1     1     1     2 0         1
 9     1     1 15/0…    54     3     4     1     2     0     2     1 <NA>      3
10     3     1 19/0…    19     5     4     1     1     1     4     2 5         1
# … with 57,219 more
rows, and 3 more variables: CH15_COD <int>, CH16 <int>,
#   CH16_COD <int>
```
]

<style>
.panel1-select_3-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-select_3-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-select_3-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: inverse, middle, center

## One more!

---
count: false

# Another way to select
.panel1-select_4-auto[
```r
*b_eph_ind
```
]

.panel2-select_4-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# Another way to select
.panel1-select_4-auto[
```r
b_eph_ind %>%
* select(ends_with("_COD"))
```
]

.panel2-select_4-auto[
```
# A tibble: 57,229 × 6
   CH15_COD CH16_COD PP04B_COD PP04D_COD PP11B_COD PP11D_COD
      <int>    <int> <chr>     <chr>     <chr>     <chr>
 1       NA       NA 8401      34323     <NA>      <NA>
 2       NA       NA 9700      55314     <NA>      <NA>
 3       NA       NA 1009
20333 <NA> <NA>
 4       NA       NA <NA>      <NA>      <NA>      <NA>
 5      202       NA 4803      30113     <NA>      <NA>
 6       NA       NA <NA>      <NA>      <NA>      <NA>
 7       NA       NA <NA>      <NA>      <NA>      <NA>
 8       NA       NA <NA>      <NA>      <NA>      <NA>
 9       22       NA 1009      30113     <NA>      <NA>
10       NA       NA 1009      30314     <NA>      <NA>
# … with 57,219 more rows
```
]

<style>
.panel1-select_4-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-select_4-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-select_4-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

<img src="data:image/png;base64,#https://media.tenor.com/images/31210518c407ef4392726bd7ab3a1625/tenor.gif" width="65%" style="display: block; margin: auto;" />

---
class: inverse, middle, center

## One more.

---
count: false

# Another way to select
.panel1-select_5-auto[
```r
*b_eph_ind
```
]

.panel2-select_5-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]
---
count: false

# Another way to select
.panel1-select_5-auto[
```r
b_eph_ind %>%
* select(contains("03"))
```
]

.panel2-select_5-auto[
```
# A tibble: 57,229 × 7
    CH03 PP03C PP03D PP03G PP03H PP03I PP03J
   <int> <int> <int> <int> <int> <int> <int>
 1     1     0     0     2     0     2     2
 2     2     2     2     2     0     2     1
 3     3     1     0     1     1     1     1
 4     3    NA    NA    NA    NA    NA    NA
 5     2     1     0     2     0     2     2
 6     3    NA    NA    NA    NA    NA    NA
 7     3    NA    NA    NA    NA    NA    NA
 8     3    NA    NA    NA    NA    NA    NA
 9     1     1     0     2     0     2     2
10     3     1     0     2     0     2     2
# … with 57,219 more rows
```
]

<style>
.panel1-select_5-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-select_5-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-select_5-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

<img src="data:image/png;base64,#https://media.tenor.com/images/b8718c934090ad1a36acd7ef9d0b846c/tenor.gif" width="65%" style="display: block; margin: auto;" />

---
class: inverse, middle, center

# _PRACTICE_

<html>
  <div style='float:left'></div>
  <hr color='#EB811B' size=1px width=1125px>
</html>

---
class: inverse, middle

## Practice

1) Create an object into which we import the EPH dataset (remember to take the file extension into account)

2) Create another object selecting 3 columns of interest by name

3) Create another object selecting 3 columns of interest by position

4) Rewrite the following code in the "step by step (with pipes)" style

```r
base_ejercicio <- select(b_eph_ind, ESTADO, CH04, CAT_OCUP)
```

---
class: inverse, middle, center

# filter()

***

_<p style="color:grey;" align="center">Selects the cases (rows) based on a condition</p>_

---
# filter()

### The function follows this pattern:

```r
base_de_datos %>% 
  filter(condicion)
```

<img src="data:image/png;base64,#../img/filter_presentacion.png" width="65%" style="display: block; margin: auto;" />

---
# filter()

- ### For example:

```r
base %>% 
  `filter(Edad > 70)`
```

<img src="data:image/png;base64,#../img/filter_presentacion.png" width="65%" style="display: block; margin: auto;" />

---
# filter()

### To compute the proposed **indicator**, we restrict the universe to **people aged 14 or older**

---
count: false

# filter()
.panel1-filter-auto[
```r
*b_eph_ind
```
]

.panel2-filter-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# filter()
.panel1-filter-auto[
```r
b_eph_ind %>%
* select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA)
```
]

.panel2-filter-auto[
```
# A tibble: 57,229 × 5
   AGLOMERADO  CH04  CH06 ESTADO PONDERA
        <int> <int> <int>  <int>   <int>
 1          2     1    56      1     547
 2          2     2    46      1     547
 3          2     2    20      1     547
 4          2     1    12      3     547
 5          2     2    38      1     584
 6          2     2     7      4     584
 7          2     1     5      4     584
 8          2     1     3      4     584
 9          2     1    54      1     584
10          2     1    19      1     584
# … with 57,219 more rows
```
]

---
count: false

# filter()
.panel1-filter-auto[
```r
b_eph_ind %>%
  select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) %>%
* filter(CH06 >= 14)
```
]
.panel2-filter-auto[
```
# A tibble: 45,344 × 5
   AGLOMERADO  CH04  CH06 ESTADO PONDERA
        <int> <int> <int>  <int>   <int>
 1          2     1    56      1     547
 2          2     2    46      1     547
 3          2     2    20      1     547
 4          2     2    38      1     584
 5          2     1    54      1     584
 6          2     1    19      1     584
 7          2     2    44      1     815
 8          2     1    16      3     815
 9          2     1    31      1     815
10          2     1    58      1     563
# … with 45,334 more rows
```
]

<style>
.panel1-filter-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-filter-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-filter-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
# filter()

#### Operators for filtering:

<br>

.pull-left[
| Condition | Meaning                    |
| :---      | :---                       |
|           |                            |
| `==`      | *equal to*                 |
| `%in%`    | *is included in*           |
| `!=`      | *not equal to*             |
| `>`       | *greater than*             |
| `<`       | *less than*                |
| `>=`      | *greater than or equal to* |
| `<=`      | *less than or equal to*    |
]

.pull-right[
| Operator           | Description                                   |
| :---               | :---                                          |
|                    |                                               |
| `&`                | *and* - When both conditions are met          |
| <code>&#124;</code> | *or* - When one condition or the other is met |
]

---
# filter()

### **Case:** I need to restrict the universe to the population living in the _Ciudad Autónoma de Buenos Aires_ __or__ in the _Partidos del Gran Buenos Aires_.

--

- Check the categories of the variable:

```r
unique(b_eph_ind$AGLOMERADO)
```

```
 [1]  2  3  4  5  6  7  9 10 12 13 14 15 17 18 19 20 22 23 25 26 27 29 30 31 32
[26] 33 34 36 38 91 93
```

--

- Look up the corresponding codes in the record layout.
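---
# filter() - combining conditions

A condition on several codes can be written either with the "or" operator or with `%in%`, and it can be combined with a second condition through `&`. The sketch below shows this on a small made-up table: `toy` and its values are hypothetical stand-ins for `b_eph_ind`, and the codes 32 and 33 are used only as example values of `AGLOMERADO`.

```r
library(dplyr)

# Hypothetical toy table standing in for b_eph_ind (values are made up)
toy <- tibble(
  AGLOMERADO = c(2, 32, 33, 32, 13),
  CH06       = c(56, 9, 41, 72, 30)
)

# Option 1: two comparisons joined with "or"
with_or <- toy %>%
  filter(AGLOMERADO == 32 | AGLOMERADO == 33)

# Option 2: membership test against a vector of codes
with_in <- toy %>%
  filter(AGLOMERADO %in% c(32, 33))

# Both options keep the same rows
all.equal(with_or, with_in)

# "and": the agglomeration condition AND an age condition
adults <- toy %>%
  filter(AGLOMERADO %in% c(32, 33) & CH06 >= 14)

adults
```

`%in%` usually reads better once more than two codes are involved, since the list of codes sits in a single vector.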
--- count: false #filter .panel1-filter_1-auto[ ```r *b_eph_ind ``` ] .panel2-filter_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIME…¹ NRO_H…² COMPO…³ H15 REGION MAS_500 AGLOM…⁴ PONDERA <fct> <int> <int> <int> <int> <int> <int> <fct> <int> <int> 1 TQRMNOQXY… 2019 3 1 1 1 43 S 2 547 2 TQRMNOQXY… 2019 3 1 2 1 43 S 2 547 3 TQRMNOQXY… 2019 3 1 3 1 43 S 2 547 4 TQRMNOQXY… 2019 3 1 4 1 43 S 2 547 5 TQRMNOQST… 2019 3 1 2 1 43 S 2 584 6 TQRMNOQST… 2019 3 1 3 0 43 S 2 584 7 TQRMNOQST… 2019 3 1 4 0 43 S 2 584 8 TQRMNOQST… 2019 3 1 5 0 43 S 2 584 9 TQRMNOSRQ… 2019 3 1 1 1 43 S 2 584 10 TQRMNOSRQ… 2019 3 1 2 1 43 S 2 584 # … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>, # CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, # CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, # CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false #filter .panel1-filter_1-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) ``` ] .panel2-filter_1-auto[ ``` # A tibble: 57,229 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 2 1 56 1 547 2 2 2 46 1 547 3 2 2 20 1 547 4 2 1 12 3 547 5 2 2 38 1 584 6 2 2 7 4 584 7 2 1 5 4 584 8 2 1 3 4 584 9 2 1 54 1 584 10 2 1 19 1 584 # … with 57,219 more rows ``` ] --- count: false #filter .panel1-filter_1-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) %>% * filter(AGLOMERADO == 32 | AGLOMERADO == 33) ``` ] .panel2-filter_1-auto[ ``` # A tibble: 10,097 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 32 2 49 1 1031 2 32 1 9 4 1031 3 32 2 81 3 1031 4 32 1 72 1 1234 5 32 2 73 1 1234 6 32 1 28 3 1234 7 32 2 69 3 640 8 32 2 87 3 1923 9 32 1 40 1 
2424 10 32 2 41 1 2424 # … with 10,087 more rows ``` ] <style> .panel1-filter_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-filter_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-filter_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false #filter .panel1-filter_2-auto[ ```r *b_eph_ind ``` ] .panel2-filter_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIME…¹ NRO_H…² COMPO…³ H15 REGION MAS_500 AGLOM…⁴ PONDERA <fct> <int> <int> <int> <int> <int> <int> <fct> <int> <int> 1 TQRMNOQXY… 2019 3 1 1 1 43 S 2 547 2 TQRMNOQXY… 2019 3 1 2 1 43 S 2 547 3 TQRMNOQXY… 2019 3 1 3 1 43 S 2 547 4 TQRMNOQXY… 2019 3 1 4 1 43 S 2 547 5 TQRMNOQST… 2019 3 1 2 1 43 S 2 584 6 TQRMNOQST… 2019 3 1 3 0 43 S 2 584 7 TQRMNOQST… 2019 3 1 4 0 43 S 2 584 8 TQRMNOQST… 2019 3 1 5 0 43 S 2 584 9 TQRMNOSRQ… 2019 3 1 1 1 43 S 2 584 10 TQRMNOSRQ… 2019 3 1 2 1 43 S 2 584 # … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>, # CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, # CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, # CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false #filter .panel1-filter_2-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) ``` ] .panel2-filter_2-auto[ ``` # A tibble: 57,229 × 5 AGLOMERADO CH04 CH06 ESTADO PONDERA <int> <int> <int> <int> <int> 1 2 1 56 1 547 2 2 2 46 1 547 3 2 2 20 1 547 4 2 1 12 3 547 5 2 2 38 1 584 6 2 2 7 4 584 7 2 1 5 4 584 8 2 1 3 4 584 9 2 1 54 1 584 10 2 1 19 1 584 # … with 57,219 more rows ``` ] --- 
count: false

#filter
.panel1-filter_2-auto[
```r
b_eph_ind %>%
  select(AGLOMERADO, CH04, CH06, ESTADO, PONDERA) %>%
* filter(AGLOMERADO %in% c(32,33))
```
]

.panel2-filter_2-auto[
```
# A tibble: 10,097 × 5
   AGLOMERADO  CH04  CH06 ESTADO PONDERA
        <int> <int> <int>  <int>   <int>
 1         32     2    49      1    1031
 2         32     1     9      4    1031
 3         32     2    81      3    1031
 4         32     1    72      1    1234
 5         32     2    73      1    1234
 6         32     1    28      3    1234
 7         32     2    69      3     640
 8         32     2    87      3    1923
 9         32     1    40      1    2424
10         32     2    41      1    2424
# … with 10,087 more rows
```
]

<style>
.panel1-filter_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-filter_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-filter_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: inverse, middle, center

# _PRACTICE_

<html>
  <div style='float:left'></div>
  <hr color='#EB811B' size=1px width=1125px>
</html>

---
class: inverse, middle

# Practice

- Starting from the EPH dataset, create a new object that **contains** the variables __AGLOMERADO__ and __CH06__, and **filter** for the population that is _18 years of age or older_ and lives in the _Neuquén_ or _Río Negro_ agglomerations

- Check that the operations worked (_hint: functions such as **unique()**, **table()** or **colnames()** can help_)

---
class: inverse, middle, center

# _mutate()_

<html>
  <div style='float:left'></div>
  <hr color='#EB811B' size=1px width=1125px>
</html>

_<p style="color:grey;" align="center">Creates / edits variables (columns)</p>_

---
# mutate()

- ### In base R:

```r
base_de_datos$var_nueva <- base_de_datos$var_1 + base_de_datos$var_2
```

<br>

- ### In `tidyverse`:

```r
base_de_datos %>% 
  mutate(var_nueva = var_1 + var_2)
```

---
# mutate()

<br><br>

### **Indicator:** Total income from the main and secondary occupation(s)

<br><br>

---
count: false

# mutate()
.panel1-mutate_1-auto[
```r
*b_eph_ind
``` ] .panel2-mutate_1-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIME…¹ NRO_H…² COMPO…³ H15 REGION MAS_500 AGLOM…⁴ PONDERA <fct> <int> <int> <int> <int> <int> <int> <fct> <int> <int> 1 TQRMNOQXY… 2019 3 1 1 1 43 S 2 547 2 TQRMNOQXY… 2019 3 1 2 1 43 S 2 547 3 TQRMNOQXY… 2019 3 1 3 1 43 S 2 547 4 TQRMNOQXY… 2019 3 1 4 1 43 S 2 547 5 TQRMNOQST… 2019 3 1 2 1 43 S 2 584 6 TQRMNOQST… 2019 3 1 3 0 43 S 2 584 7 TQRMNOQST… 2019 3 1 4 0 43 S 2 584 8 TQRMNOQST… 2019 3 1 5 0 43 S 2 584 9 TQRMNOSRQ… 2019 3 1 1 1 43 S 2 584 10 TQRMNOSRQ… 2019 3 1 2 1 43 S 2 584 # … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>, # CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, # CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, # CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false # mutate() .panel1-mutate_1-auto[ ```r b_eph_ind %>% * select(P21, TOT_P12) ``` ] .panel2-mutate_1-auto[ ``` # A tibble: 57,229 × 2 P21 TOT_P12 <int> <int> 1 28000 700 2 9500 3600 3 -9 0 4 0 0 5 -9 0 6 0 0 7 0 0 8 0 0 9 -9 0 10 0 0 # … with 57,219 more rows ``` ] --- count: false # mutate() .panel1-mutate_1-auto[ ```r b_eph_ind %>% select(P21, TOT_P12) %>% * mutate(ingreso_ocup_tot = P21 + TOT_P12) ``` ] .panel2-mutate_1-auto[ ``` # A tibble: 57,229 × 3 P21 TOT_P12 ingreso_ocup_tot <int> <int> <int> 1 28000 700 28700 2 9500 3600 13100 3 -9 0 -9 4 0 0 0 5 -9 0 -9 6 0 0 0 7 0 0 0 8 0 0 0 9 -9 0 -9 10 0 0 0 # … with 57,219 more rows ``` ] <style> .panel1-mutate_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mutate_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } 
.panel3-mutate_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
# mutate() - case_when()

### Companion function: `case_when()`, mostly used to recode variables

<img src="data:image/png;base64,#../img/mutate_case.png" width="100%" style="display: block; margin: auto;" />

---
count: false

# Recoding with mutate() and case_when()
.panel1-mutate_2-auto[
```r
*b_eph_ind
```
]

.panel2-mutate_2-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# Recoding with mutate() and case_when()
.panel1-mutate_2-auto[
```r
b_eph_ind %>%
* select(CH04, CH06)
```
]

.panel2-mutate_2-auto[
```
# A tibble: 57,229 × 2
    CH04  CH06
   <int> <int>
 1     1    56
 2     2    46
 3     2    20
 4     1    12
 5     2    38
 6     2     7
 7     1     5
 8     1     3
 9     1    54
10     1    19
# … with 57,219 more rows
```
]

---
count: false

# Recoding with mutate() and case_when()
.panel1-mutate_2-auto[
```r
b_eph_ind %>%
  select(CH04, CH06) %>%
* mutate(sexo = case_when(CH04 == 1 ~ "Varón",
*                         CH04 == 2 ~ "Mujer"))
```
]

.panel2-mutate_2-auto[
```
# A tibble: 57,229 × 3
    CH04  CH06 sexo
   <int> <int> <chr>
 1     1    56 Varón
 2     2    46 Mujer
 3     2    20 Mujer
 4     1    12 Varón
 5     2    38 Mujer
 6     2     7 Mujer
 7     1     5 Varón
 8     1     3 Varón
 9     1    54 Varón
10     1    19 Varón
# … with 57,219 more rows
```
]

<style>
.panel1-mutate_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-mutate_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-mutate_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
count: false

# Recoding with mutate() and case_when()
.panel1-mutate_3-auto[
```r
*b_eph_ind
```
]

.panel2-mutate_3-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# Recoding with mutate() and case_when()
.panel1-mutate_3-auto[
```r
b_eph_ind %>%
* select(CH06)
```
]

.panel2-mutate_3-auto[
```
# A tibble:
57,229 × 1 CH06 <int> 1 56 2 46 3 20 4 12 5 38 6 7 7 5 8 3 9 54 10 19 # … with 57,219 more rows ``` ] --- count: false # Recodificando con mutate() y case_when() .panel1-mutate_3-auto[ ```r b_eph_ind %>% select(CH06) %>% * mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", * CH06 %in% c(19:29) ~ "19 a 29", * CH06 %in% c(30:39) ~ "30 a 39", * CH06 %in% c(40:49) ~ "40 a 49", * CH06 %in% c(50:59) ~ "50 a 59", * CH06 >= 60 ~ "60 o más")) ``` ] .panel2-mutate_3-auto[ ``` # A tibble: 57,229 × 2 CH06 edad_rango <int> <chr> 1 56 50 a 59 2 46 40 a 49 3 20 19 a 29 4 12 0 a 18 5 38 30 a 39 6 7 0 a 18 7 5 0 a 18 8 3 0 a 18 9 54 50 a 59 10 19 19 a 29 # … with 57,219 more rows ``` ] <style> .panel1-mutate_3-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-mutate_3-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-mutate_3-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center # _PRÁCTICA_ *** --- class: inverse # Práctica 1) Crear una variable nueva con las etiquetas correspondientes a los valores de **CAT_OCUP**: ```r 1 --> Patrón 2 --> Cuenta propia 3 --> Obrero o empleado 4 --> Trabajador familiar sin remuneración 9 --> Ns./Nr. ``` 1) Recodificar la variable de ingresos P21 en 5 rangos. 
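---

# Practice: one possible starting point

A minimal sketch of how both exercises could be approached with `mutate()` and `case_when()`. It runs on a small made-up table rather than on the EPH microdata. The names `toy`, `toy_rec`, `cat_ocup_label` and `p21_rango` are hypothetical, the income cut points are arbitrary, and treating negative `P21` values as missing data is an assumption.

```r
library(dplyr)

# Toy data: invented values, NOT taken from the EPH microdata
toy <- tibble(CAT_OCUP = c(1, 2, 3, 4, 9, 3),
              P21      = c(50000, 12000, 28000, 0, -9, 9500))

toy_rec <- toy %>%
  mutate(
    # Exercise 1: one label per code, as listed on the practice slide
    cat_ocup_label = case_when(CAT_OCUP == 1 ~ "Patrón",
                               CAT_OCUP == 2 ~ "Cuenta propia",
                               CAT_OCUP == 3 ~ "Obrero o empleado",
                               CAT_OCUP == 4 ~ "Trabajador familiar sin remuneración",
                               CAT_OCUP == 9 ~ "Ns./Nr."),
    # Exercise 2: five ranges; the cut points are arbitrary (assumption)
    p21_rango = case_when(P21 < 0      ~ "No data",
                          P21 == 0     ~ "No income",
                          P21 <= 10000 ~ "1 to 10000",
                          P21 <= 30000 ~ "10001 to 30000",
                          P21 > 30000  ~ "Over 30000"))

toy_rec
```

`case_when()` checks the conditions in order and keeps the first one that matches, which is why `P21 <= 30000` can follow `P21 <= 10000` without overlapping it. Rows that match no condition are left as `NA`.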
---
class: inverse, middle, center

# _summarise()_

<html>
  <div style='float:left'></div>
  <hr color='#EB811B' size=1px width=1125px>
</html>

_<p style="color:grey;" align="center">Summarizes the information in a new table</p>_

---

# summarise()

<br><br>
<br><br>

#### **Case:**

- **Indicator 1:** I want to know how many employed people there are

- **Indicator 2:** I want to know the mean income from the main occupation

---
count: false

# _summarise()_
.panel1-summarise_1-auto[
```r
*b_eph_ind
```
]

.panel2-summarise_1-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# _summarise()_
.panel1-summarise_1-auto[
```r
b_eph_ind %>%
*  select(ESTADO, P21, PONDERA)
```
]

.panel2-summarise_1-auto[
```
# A tibble: 57,229 × 3
   ESTADO   P21 PONDERA
    <int> <int>   <int>
 1      1 28000     547
 2      1  9500     547
 3      1    -9     547
 4      3     0     547
 5      1    -9     584
 6      4     0     584
 7      4     0     584
 8      4     0     584
 9      1    -9     584
10      1     0     584
# … with 57,219 more rows
```
]

---
count: false

# _summarise()_
.panel1-summarise_1-auto[
```r
b_eph_ind %>%
select(ESTADO, P21, PONDERA) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = questionr::wtd.mean(x = P21, * weights = PONDERA)) ``` ] .panel2-summarise_1-auto[ ``` # A tibble: 1 × 5 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ ingr_oc_princ…¹ <int> <int> <int> <int> <dbl> 1 27989128 11933503 -9 540000 8269. # … with abbreviated variable name ¹ingr_oc_princ_media ``` ] <style> .panel1-summarise_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-summarise_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-summarise_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r *library(questionr) ``` ] .panel2-summarise_2-auto[ ] --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r library(questionr) *b_eph_ind ``` ] .panel2-summarise_2-auto[ ``` # A tibble: 57,229 × 177 CODUSU ANO4 TRIME…¹ NRO_H…² COMPO…³ H15 REGION MAS_500 AGLOM…⁴ PONDERA <fct> <int> <int> <int> <int> <int> <int> <fct> <int> <int> 1 TQRMNOQXY… 2019 3 1 1 1 43 S 2 547 2 TQRMNOQXY… 2019 3 1 2 1 43 S 2 547 3 TQRMNOQXY… 2019 3 1 3 1 43 S 2 547 4 TQRMNOQXY… 2019 3 1 4 1 43 S 2 547 5 TQRMNOQST… 2019 3 1 2 1 43 S 2 584 6 TQRMNOQST… 2019 3 1 3 0 43 S 2 584 7 TQRMNOQST… 2019 3 1 4 0 43 S 2 584 8 TQRMNOQST… 2019 3 1 5 0 43 S 2 584 9 TQRMNOSRQ… 2019 3 1 1 1 43 S 2 584 10 TQRMNOSRQ… 2019 3 1 2 1 43 S 2 584 # … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>, # CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, # CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, # CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, 
IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r library(questionr) b_eph_ind %>% * select(ESTADO, P21, PONDERA) ``` ] .panel2-summarise_2-auto[ ``` # A tibble: 57,229 × 3 ESTADO P21 PONDERA <int> <int> <int> 1 1 28000 547 2 1 9500 547 3 1 -9 547 4 3 0 547 5 1 -9 584 6 4 0 584 7 4 0 584 8 4 0 584 9 1 -9 584 10 1 0 584 # … with 57,219 more rows ``` ] --- count: false # _summarise()_ .panel1-summarise_2-auto[ ```r library(questionr) b_eph_ind %>% select(ESTADO, P21, PONDERA) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-summarise_2-auto[ ``` # A tibble: 1 × 5 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ ingr_oc_princ…¹ <int> <int> <int> <int> <dbl> 1 27989128 11933503 -9 540000 8269. 
# … with abbreviated variable name ¹ingr_oc_princ_media
```
]

<style>
.panel1-summarise_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-summarise_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-summarise_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: inverse, middle, center

# _group_by()_

***

_<p style="color:grey;" align="center">Applies an operation to the population, one group at a time</p>_

---

# group_by()

<br><br>
<br><br>

```r
base_de_datos %>%
  group_by(variable_de_corte) #<<
```

---
count: false

# _group_by()_
.panel1-group_by_1-auto[
```r
*library(questionr)
```
]

.panel2-group_by_1-auto[

]

---
count: false

# _group_by()_
.panel1-group_by_1-auto[
```r
library(questionr)

*b_eph_ind
```
]

.panel2-group_by_1-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count:
false # _group_by()_ .panel1-group_by_1-auto[ ```r library(questionr) b_eph_ind %>% * group_by(CH04) ``` ] .panel2-group_by_1-auto[ ``` # A tibble: 57,229 × 177 # Groups: CH04 [2] CODUSU ANO4 TRIME…¹ NRO_H…² COMPO…³ H15 REGION MAS_500 AGLOM…⁴ PONDERA <fct> <int> <int> <int> <int> <int> <int> <fct> <int> <int> 1 TQRMNOQXY… 2019 3 1 1 1 43 S 2 547 2 TQRMNOQXY… 2019 3 1 2 1 43 S 2 547 3 TQRMNOQXY… 2019 3 1 3 1 43 S 2 547 4 TQRMNOQXY… 2019 3 1 4 1 43 S 2 547 5 TQRMNOQST… 2019 3 1 2 1 43 S 2 584 6 TQRMNOQST… 2019 3 1 3 0 43 S 2 584 7 TQRMNOQST… 2019 3 1 4 0 43 S 2 584 8 TQRMNOQST… 2019 3 1 5 0 43 S 2 584 9 TQRMNOSRQ… 2019 3 1 1 1 43 S 2 584 10 TQRMNOSRQ… 2019 3 1 2 1 43 S 2 584 # … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>, # CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>, # CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, # CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false # _group_by()_ .panel1-group_by_1-auto[ ```r library(questionr) b_eph_ind %>% group_by(CH04) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-group_by_1-auto[ ``` # A tibble: 2 × 6 CH04 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ ingr_oc…¹ <int> <int> <int> <int> <int> <dbl> 1 1 13528065 6793308 -9 540000 10805. 2 2 14461063 5140195 -9 300000 5896. 
# … with abbreviated variable name ¹ingr_oc_princ_media
```
]

<style>
.panel1-group_by_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-group_by_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-group_by_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---

# Step by step

<img src="data:image/png;base64,#https://media.tenor.com/images/6c8cf7404cd3fdc8f518221899116825/tenor.gif" width="60%" style="display: block; margin: auto;" />

---

# **Case**

### - **Indicator 1:** *Main labor-market rates for the CABA and Partidos del GBA agglomerations*

### - **Indicator 2:** *Indicator 1 broken down by the __sex__ and __age__ of the respondents.*

--

According to the [record layout (Diseño de registro)](https://www.indec.gob.ar/ftp/cuadros/menusuperior/eph/EPH_registro_t318.pdf), the variables we will work with are:

- **Agglomeration of residence** = `AGLOMERADO`

- **Activity status** = `ESTADO`

- **Sex** = `CH04`

- **Age** = `CH06`

- **Weighting factor** = `PONDERA`

---
count: false

# _group_by()_
.panel1-group_by_2-auto[
```r
*b_eph_ind
```
]

.panel2-group_by_2-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#
CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% * select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 57,229 × 6 AGLOMERADO CH04 CH06 ESTADO P21 PONDERA <int> <int> <int> <int> <int> <int> 1 2 1 56 1 28000 547 2 2 2 46 1 9500 547 3 2 2 20 1 -9 547 4 2 1 12 3 0 547 5 2 2 38 1 -9 584 6 2 2 7 4 0 584 7 2 1 5 4 0 584 8 2 1 3 4 0 584 9 2 1 54 1 -9 584 10 2 1 19 1 0 584 # … with 57,219 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% * mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", * CH06 %in% c(19:29) ~ "19 a 29", * CH06 %in% c(30:39) ~ "30 a 39", * CH06 %in% c(40:49) ~ "40 a 49", * CH06 %in% c(50:59) ~ "50 a 59", * CH06 >= 60 ~ "60 o más"), * sexo = case_when(CH04 == 1 ~ "Varón", * CH04 == 2 ~ "Mujer")) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 57,229 × 8 AGLOMERADO CH04 CH06 ESTADO P21 PONDERA edad_rango sexo <int> <int> <int> <int> <int> <int> <chr> <chr> 1 2 1 56 1 28000 547 50 a 59 Varón 2 2 2 46 1 9500 547 40 a 49 Mujer 3 2 2 20 1 -9 547 19 a 29 Mujer 4 2 1 12 3 0 547 0 a 18 Varón 5 2 2 38 1 -9 584 30 a 39 Mujer 6 2 2 7 4 0 584 0 a 18 Mujer 7 2 1 5 4 0 584 0 a 18 Varón 8 2 1 3 4 0 584 0 a 18 Varón 9 2 1 54 1 -9 584 50 a 59 Varón 10 2 1 19 1 0 584 19 a 29 Varón # … with 57,219 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", CH06 %in% c(19:29) ~ "19 a 29", CH06 %in% c(30:39) ~ "30 a 39", CH06 %in% c(40:49) ~ "40 a 49", CH06 %in% 
c(50:59) ~ "50 a 59", CH06 >= 60 ~ "60 o más"), sexo = case_when(CH04 == 1 ~ "Varón", CH04 == 2 ~ "Mujer")) %>% * filter(AGLOMERADO %in% c(32, 33)) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 10,097 × 8 AGLOMERADO CH04 CH06 ESTADO P21 PONDERA edad_rango sexo <int> <int> <int> <int> <int> <int> <chr> <chr> 1 32 2 49 1 30000 1031 40 a 49 Mujer 2 32 1 9 4 0 1031 0 a 18 Varón 3 32 2 81 3 0 1031 60 o más Mujer 4 32 1 72 1 0 1234 60 o más Varón 5 32 2 73 1 20000 1234 60 o más Mujer 6 32 1 28 3 0 1234 19 a 29 Varón 7 32 2 69 3 0 640 60 o más Mujer 8 32 2 87 3 0 1923 60 o más Mujer 9 32 1 40 1 -9 2424 40 a 49 Varón 10 32 2 41 1 -9 2424 40 a 49 Mujer # … with 10,087 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", CH06 %in% c(19:29) ~ "19 a 29", CH06 %in% c(30:39) ~ "30 a 39", CH06 %in% c(40:49) ~ "40 a 49", CH06 %in% c(50:59) ~ "50 a 59", CH06 >= 60 ~ "60 o más"), sexo = case_when(CH04 == 1 ~ "Varón", CH04 == 2 ~ "Mujer")) %>% filter(AGLOMERADO %in% c(32, 33)) %>% * group_by(sexo, edad_rango) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 10,097 × 8 # Groups: sexo, edad_rango [14] AGLOMERADO CH04 CH06 ESTADO P21 PONDERA edad_rango sexo <int> <int> <int> <int> <int> <int> <chr> <chr> 1 32 2 49 1 30000 1031 40 a 49 Mujer 2 32 1 9 4 0 1031 0 a 18 Varón 3 32 2 81 3 0 1031 60 o más Mujer 4 32 1 72 1 0 1234 60 o más Varón 5 32 2 73 1 20000 1234 60 o más Mujer 6 32 1 28 3 0 1234 19 a 29 Varón 7 32 2 69 3 0 640 60 o más Mujer 8 32 2 87 3 0 1923 60 o más Mujer 9 32 1 40 1 -9 2424 40 a 49 Varón 10 32 2 41 1 -9 2424 40 a 49 Mujer # … with 10,087 more rows ``` ] --- count: false # _group_by()_ .panel1-group_by_2-auto[ ```r b_eph_ind %>% select(AGLOMERADO, CH04, CH06, ESTADO, P21, PONDERA) %>% mutate(edad_rango = case_when(CH06 %in% c(0:18) ~ "0 a 18", CH06 %in% c(19:29) ~ "19 a 29", CH06 %in% c(30:39) ~ "30 a 39", CH06 %in% 
c(40:49) ~ "40 a 49", CH06 %in% c(50:59) ~ "50 a 59", CH06 >= 60 ~ "60 o más"), sexo = case_when(CH04 == 1 ~ "Varón", CH04 == 2 ~ "Mujer")) %>% filter(AGLOMERADO %in% c(32, 33)) %>% group_by(sexo, edad_rango) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-group_by_2-auto[ ``` # A tibble: 14 × 7 # Groups: sexo [2] sexo edad_rango cant_pob_tot cant_ocupados min_ingr_oc_princ max_i…¹ ingr_…² <chr> <chr> <int> <int> <int> <int> <dbl> 1 Mujer 0 a 18 1946718 17926 -9 15000 45.4 2 Mujer 19 a 29 1192959 517320 -9 60000 5603. 3 Mujer 30 a 39 1039620 637976 -9 103000 11186. 4 Mujer 40 a 49 1076082 766799 -9 300000 13659. 5 Mujer 50 a 59 817229 511513 -9 200000 11132. 6 Mujer 60 o más 1692597 320630 -9 75000 2264. 7 Mujer <NA> 67672 0 0 0 0 8 Varón 0 a 18 2113559 47708 -9 64008 226. 9 Varón 19 a 29 1252010 808136 -9 85000 10114. 10 Varón 30 a 39 975293 858522 -9 122000 18043. 11 Varón 40 a 49 1017797 895313 -9 260000 23085. 12 Varón 50 a 59 772758 671746 -9 500000 22032. 13 Varón 60 o más 1229724 491218 -9 300000 8332. 
14 Varón <NA>              88090             0                 0        0      0
# … with abbreviated variable names ¹max_ingr_oc_princ, ²ingr_oc_princ_media
```
]

<style>
.panel1-group_by_2-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-group_by_2-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-group_by_2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: middle, center, inverse

<img src="data:image/png;base64,#../img/logo tidyr.png" width="30%" style="display: block; margin: auto;" />

---

# Functions in the tidyr package:

<br><br>
<br><br>

| __Function__ | __Action__ |
| :--- | ---: |
| `pivot_longer()` | *Turns several columns into rows*|
| `pivot_wider()` | *Turns several rows into columns*|

---

# Data structure

<br>

.pull-left[
<img src="data:image/png;base64,#../img/dato_ancho.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#../img/dato_largo.png" width="80%" style="display: block; margin: auto;" />
]

---
class: inverse, middle, center

# _pivot_longer()_

***

_<p style="color:grey;" align="center">Restructures the data by stacking several columns into one. From wide to long</p>_

---
count: false

# _pivot_longer()_
.panel1-pivot_longer_1-auto[
```r
*b_eph_ind
```
]

.panel2-pivot_longer_1-auto[
```
# A tibble: 57,229 × 177
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>, CH09 <int>, CH10 <int>,
#   CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>,
#   CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>,
#   CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>,
#   PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>,
#   PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, …
```
]

---
count: false

# _pivot_longer()_
.panel1-pivot_longer_1-auto[
```r
b_eph_ind %>%
*  group_by(CH04)
```
]

.panel2-pivot_longer_1-auto[
```
# A tibble: 57,229 × 177
# Groups:   CH04 [2]
   CODUSU      ANO4 TRIME…¹ NRO_H…² COMPO…³   H15 REGION MAS_500 AGLOM…⁴ PONDERA
   <fct>      <int>   <int>   <int>   <int> <int>  <int> <fct>     <int>   <int>
 1 TQRMNOQXY…  2019       3       1       1     1     43 S             2     547
 2 TQRMNOQXY…  2019       3       1       2     1     43 S             2     547
 3 TQRMNOQXY…  2019       3       1       3     1     43 S             2     547
 4 TQRMNOQXY…  2019       3       1       4     1     43 S             2     547
 5 TQRMNOQST…  2019       3       1       2     1     43 S             2     584
 6 TQRMNOQST…  2019       3       1       3     0     43 S             2     584
 7 TQRMNOQST…  2019       3       1       4     0     43 S             2     584
 8 TQRMNOQST…  2019       3       1       5     0     43 S             2     584
 9 TQRMNOSRQ…  2019       3       1       1     1     43 S             2     584
10 TQRMNOSRQ…  2019       3       1       2     1     43 S             2     584
# … with 57,219 more rows, 167 more variables: CH03 <int>, CH04 <int>,
#   CH05 <fct>, CH06 <int>, CH07 <int>, CH08 <int>,
CH09 <int>, CH10 <int>, # CH11 <int>, CH12 <int>, CH13 <int>, CH14 <chr>, CH15 <int>, CH15_COD <int>, # CH16 <int>, CH16_COD <int>, NIVEL_ED <int>, ESTADO <int>, CAT_OCUP <int>, # CAT_INAC <int>, IMPUTA <int>, PP02C1 <int>, PP02C2 <int>, PP02C3 <int>, # PP02C4 <int>, PP02C5 <int>, PP02C6 <int>, PP02C7 <int>, PP02C8 <int>, # PP02E <int>, PP02H <int>, PP02I <int>, PP03C <int>, PP03D <int>, … ``` ] --- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r b_eph_ind %>% group_by(CH04) %>% * summarise(cant_pob_tot = sum(PONDERA), * cant_ocupados = sum(PONDERA[ESTADO == 1]), * min_ingr_oc_princ = min(P21), * max_ingr_oc_princ = max(P21), * ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr * weights = PONDERA)) ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 2 × 6 CH04 cant_pob_tot cant_ocupados min_ingr_oc_princ max_ingr_oc_princ ingr_oc…¹ <int> <int> <int> <int> <int> <dbl> 1 1 13528065 6793308 -9 540000 10805. 2 2 14461063 5140195 -9 300000 5896. # … with abbreviated variable name ¹ingr_oc_princ_media ``` ] --- count: false # _pivot_longer()_ .panel1-pivot_longer_1-auto[ ```r b_eph_ind %>% group_by(CH04) %>% summarise(cant_pob_tot = sum(PONDERA), cant_ocupados = sum(PONDERA[ESTADO == 1]), min_ingr_oc_princ = min(P21), max_ingr_oc_princ = max(P21), ingr_oc_princ_media = wtd.mean(x = P21, # Paquete questionr weights = PONDERA)) %>% * select(CH04, cant_ocupados, ingr_oc_princ_media) ``` ] .panel2-pivot_longer_1-auto[ ``` # A tibble: 2 × 3 CH04 cant_ocupados ingr_oc_princ_media <int> <int> <dbl> 1 1 6793308 10805. 2 2 5140195 5896. 
```
]

---
count: false

# _pivot_longer()_
.panel1-pivot_longer_1-auto[
```r
b_eph_ind %>%
  group_by(CH04) %>%
  summarise(cant_pob_tot        = sum(PONDERA),
            cant_ocupados       = sum(PONDERA[ESTADO == 1]),
            min_ingr_oc_princ   = min(P21),
            max_ingr_oc_princ   = max(P21),
            ingr_oc_princ_media = wtd.mean(x = P21, # questionr package
                                           weights = PONDERA)) %>%
  select(CH04, cant_ocupados, ingr_oc_princ_media) %>%
*  pivot_longer(cols = c(cant_ocupados, ingr_oc_princ_media), #<<
*               names_to = "variable",
*               values_to = "valor")
```
]

.panel2-pivot_longer_1-auto[
```
# A tibble: 4 × 3
   CH04 variable                valor
  <int> <chr>                   <dbl>
1     1 cant_ocupados        6793308
2     1 ingr_oc_princ_media    10805.
3     2 cant_ocupados        5140195
4     2 ingr_oc_princ_media     5896.
```
]

<style>
.panel1-pivot_longer_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-pivot_longer_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-pivot_longer_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class: inverse, middle, center

# _pivot_wider()_

***

_<p style="color:grey;" align="center">Restructures the data by spreading the rows of one variable into several columns. From long to wide</p>_

---
count: false

# _pivot_wider()_
.panel1-pivot_wider_1-auto[
```r
# base_largo: saved pivot_longer() result (assumed)
*base_largo
```
]

.panel2-pivot_wider_1-auto[
```
# A tibble: 4 × 3
   CH04 variable                valor
  <int> <chr>                   <dbl>
1     1 cant_ocupados        6793308
2     1 ingr_oc_princ_media    10805.
3     2 cant_ocupados        5140195
4     2 ingr_oc_princ_media     5896.
```
]

---
count: false

# _pivot_wider()_
.panel1-pivot_wider_1-auto[
```r
# base_largo: saved pivot_longer() result (assumed)
base_largo %>%
*  pivot_wider(names_from = "variable", #<<
*              values_from = "valor")
```
]

.panel2-pivot_wider_1-auto[
```
# A tibble: 2 × 3
   CH04 cant_ocupados ingr_oc_princ_media
  <int>         <dbl>               <dbl>
1     1       6793308              10805.
2     2       5140195               5896.
``` ] <style> .panel1-pivot_wider_1-auto { color: black; width: 42.4666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-pivot_wider_1-auto { color: black; width: 55.5333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-pivot_wider_1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style>
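
---

# pivot_longer() and pivot_wider(): round trip

The two functions undo each other, which is easiest to see on a very small table. This is a minimal sketch with invented numbers rather than EPH results. `base_ancho` and `base_ancho_2` are hypothetical names, and `base_largo` is assigned here explicitly on the assumption that the long table was saved the same way before the `pivot_wider()` slides.

```r
library(dplyr)
library(tidyr)

# Toy wide table: invented numbers, same shape as the summary table
base_ancho <- tibble(CH04                = c(1, 2),
                     cant_ocupados       = c(100, 80),
                     ingr_oc_princ_media = c(250.5, 190))

# Wide to long: one row per (CH04, variable) pair, saved for reuse
base_largo <- base_ancho %>%
  pivot_longer(cols      = c(cant_ocupados, ingr_oc_princ_media),
               names_to  = "variable",
               values_to = "valor")

# Long to wide: the names stored in "variable" become columns again
base_ancho_2 <- base_largo %>%
  pivot_wider(names_from  = "variable",
              values_from = "valor")

base_largo
base_ancho_2
```

Saving the intermediate result with `<-` is what makes `base_largo` available to a later pipeline. Without the assignment, the long table is only printed.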