Introduction
This vignette shows the typical workflow to standardize a camera trap file from Snapshot Safari.
Set up a logger
This is an optional but recommended step. If you want to not only print messages to the console but also save them to a file, you can use a logger.
The function create_logger creates a file at the specified location and sets up the logging:
logfile <- file.path(tempdir(), "log", "logger.log")
logfile
#> [1] "/tmp/Rtmp2TFAKG/log/logger.log"
logger <- create_logger(my_logfile = logfile,
console = FALSE)
#> Create logger /tmp/Rtmp2TFAKG/log/logger.log
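The call above sets console = FALSE, so messages go only to the log file. If you also want them echoed to the console, a minimal sketch (assuming the console argument simply toggles console printing, as its name suggests; logfile2 is a hypothetical second log file):

```r
# Hedged sketch: assumes console = TRUE makes create_logger echo
# messages to the console in addition to writing them to the file.
logfile2 <- file.path(tempdir(), "log", "logger_verbose.log")
logger_verbose <- create_logger(my_logfile = logfile2,
                                console = TRUE)
```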
Read the file
First, we need to read the data. Here, we read a file that was previously written in /tmp/Rtmp2TFAKG/data_in (not shown). This dataframe represents Digikam-like data:
in_folder <- file.path(tempdir(), "data_in")
in_folder
#> [1] "/tmp/Rtmp2TFAKG/data_in"
df <- read.csv(file.path(in_folder, "digikam.csv"))
head(df, 3)
#> X Station Species DateTimeOriginal Date Time delta.time.secs
#> 1 1 G03 porcupine 2018-06-28 8:56 2018-06-28 17:38:42 0
#> 2 2 D06 kudu 2018-06-25 16:13 2018-06-25 7:18:05 0
#> 3 3 E06 springbok 2018-06-29 18:33 2018-06-29 0:53:56 353978
#> delta.time.mins delta.time.hours delta.time.days Directory
#> 1 0.0 0.0 0.0 E:/MOK/MOK_Roll1/G03
#> 2 0.0 0.0 0.0 E:/MOK/MOK_Roll1/D06
#> 3 5899.6 98.3 4.1 E:/MOK/MOK_Roll1/E06
#> FileName EXIF.Model EXIF.Make metadata_Species metadata_Number
#> 1 I_00006a.JPG E3 CUDDEBACK porcupine 1
#> 2 I_00003a.JPG E3 CUDDEBACK kudu 1
#> 3 I__00013.JPG E3 CUDDEBACK springbok 1
#> metadata_Behaviour metadata_Sex n_images metadata_young_present
#> 1 <NA> <NA> 1 <NA>
#> 2 Moving Female 1 <NA>
#> 3 Moving <NA> 1 <NA>
#> metadata_Numberofindividuals
#> 1 NA
#> 2 NA
#> 3 NA
#> HierarchicalSubject
#> 1 Species, Species|porcupine, Number|1, Number
#> 2 Species|kudu, Behaviour, Sex|Female, Number|1, Behaviour|Moving, Species, Number, Sex
#> 3 Number, Behaviour|Moving, Species, Number|1, Species|springbok, Behaviour
NB: this file is the same as the digikam dataset included in the package. You can reproduce the following results by using:
data(digikam)
df <- digikam
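Before standardizing, it can be useful to verify that the export actually contains the Digikam columns this workflow relies on. A minimal base-R sketch (check_digikam_cols is a hypothetical helper, not a package function; the column names are taken from the preview above):

```r
# Hypothetical helper: fail early if expected Digikam columns are absent.
check_digikam_cols <- function(df) {
  needed <- c("Station", "Species", "DateTimeOriginal",
              "metadata_Species", "metadata_Number")
  missing <- setdiff(needed, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```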
Standardize the file
Then, we standardize the file. The function standardize_snapshot_df standardizes a single dataframe.
This function has a number of options, but the only mandatory ones are:
- df: the dataframe to standardize
- standard_df: the reference dataframe telling the function how to rename the columns. Here, we use the built-in dataset standard.
std_df <- standardize_snapshot_df(df = df,
standard_df = standard,
locationID_digikam = "MOK",
logger = logger)
#> Initial file: 22 columns, 100 rows.
#> Standardizing columns
#> Match found in column names: renaming column metadata_Numberofindividuals into metadata_NumberOfIndividuals
#> Standardizing dates/times
#> Getting location code for Digikam data
#> Fill capture info
#> Cleaning location/camera, species and columns values
#> Final file: 27 columns, 100 rows. Here is a sneak peek:
#> locationID cameraID season roll eventID snapshotName eventDate eventTime
#> MOK MOK_A09 NA 1 MOK_A09#1#1 giraffe 2018-07-08 12:15:34
#> MOK MOK_A09 NA 1 MOK_A09#1#2 springbok 2018-08-26 10:45:55
#> MOK MOK_A09 NA 1 MOK_A09#1#3 unresolvable 2018-09-02 18:11:28
#> MOK MOK_B07 NA 1 MOK_B07#1#1 zebraburchells 2018-06-28 07:49:42
#> MOK MOK_B07 NA 1 MOK_B07#1#2 gemsbok 2018-08-19 09:11:55
Here, we also use two optional arguments:
- locationID_digikam: a location code, needed only if the data was processed with Digikam; in that case, the location (reserve) cannot be determined from the dataframe alone.
- logger: a logger created with create_logger. If you did not set up a logger, you can ignore this argument.
By default, the function displays the head of the first 8 columns of the file along with numerous messages.
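The eventID values in the preview (e.g. MOK_A09#1#1) follow a cameraID#roll#sequence pattern. A base-R sketch of how such an identifier can be composed (make_event_id is a hypothetical illustration, not a package function):

```r
# Hypothetical illustration of the cameraID#roll#sequence pattern
# seen in the eventID column of the standardized output.
make_event_id <- function(cameraID, roll, seq) {
  paste(cameraID, roll, seq, sep = "#")
}
make_event_id("MOK_A09", 1, 1)  # "MOK_A09#1#1"
```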
Write the file
The last step is to write the standardized file to a destination. For this, we use the function write_standardized_df.
This function has only two mandatory arguments:
- df: the file to write
- to: the folder in which the file should be written.
out_folder <- file.path(tempdir(), "data_out") # the folder in which to copy the file
out_folder
#> [1] "/tmp/Rtmp2TFAKG/data_out"
write_standardized_df(df = std_df,
to = out_folder,
logger = logger)
#> Creating folder /tmp/Rtmp2TFAKG/data_out
#> Writing file /tmp/Rtmp2TFAKG/data_out/MOK_SNA_R1.csv ---
Here, we also use the logger argument (as in the data standardization step).
The file is now written to the destination.
list.files(out_folder)
#> [1] "MOK_SNA_R1.csv"
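To double-check the result, the written file can be read back with base R (assuming a standard comma-separated CSV, which read.csv expects; adjust sep if your output uses another separator):

```r
# Read the standardized file back and inspect its shape.
std_check <- read.csv(file.path(out_folder, "MOK_SNA_R1.csv"))
names(std_check)
nrow(std_check)
```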
We can check that the log file was populated:
list.files(file.path(tempdir(), "log"))
#> [1] "logger.log"
readLines(logfile)
#> [1] "INFO [2024-04-13 18:02:56] Create logger /tmp/Rtmp2TFAKG/log/logger.log"
#> [2] "INFO [2024-04-13 18:02:56] Initial file: 22 columns, 100 rows."
#> [3] "INFO [2024-04-13 18:02:56] Standardizing columns"
#> [4] "INFO [2024-04-13 18:02:57] Match found in column names: renaming column metadata_Numberofindividuals into metadata_NumberOfIndividuals"
#> [5] "INFO [2024-04-13 18:02:57] Standardizing dates/times"
#> [6] "INFO [2024-04-13 18:02:57] Getting location code for Digikam data"
#> [7] "INFO [2024-04-13 18:02:57] Fill capture info"
#> [8] "INFO [2024-04-13 18:02:57] Cleaning location/camera, species and columns values"
#> [9] "INFO [2024-04-13 18:02:57] Final file: 27 columns, 100 rows. Here is a sneak peek:"
#> [10] "locationID\tcameraID\tseason\troll\teventID\tsnapshotName\teventDate\teventTime"
#> [11] "MOK\tMOK_A09\tNA\t1\tMOK_A09#1#1\tgiraffe\t2018-07-08\t12:15:34"
#> [12] "MOK\tMOK_A09\tNA\t1\tMOK_A09#1#2\tspringbok\t2018-08-26\t10:45:55"
#> [13] "MOK\tMOK_A09\tNA\t1\tMOK_A09#1#3\tunresolvable\t2018-09-02\t18:11:28"
#> [14] "MOK\tMOK_B07\tNA\t1\tMOK_B07#1#1\tzebraburchells\t2018-06-28\t07:49:42"
#> [15] "MOK\tMOK_B07\tNA\t1\tMOK_B07#1#2\tgemsbok\t2018-08-19\t09:11:55"
#> [16] ""
#> [17] "INFO [2024-04-13 18:02:57] Creating folder /tmp/Rtmp2TFAKG/data_out"
#> [18] "INFO [2024-04-13 18:02:57] Writing file /tmp/Rtmp2TFAKG/data_out/MOK_SNA_R1.csv ---"