basedata.inventory package
Module contents
The basedata.inventory submodule contains functions for generating an inventory of the data files in a target directory’s subdirectories.
-
basedata.inventory.list_datafiles(directory, add_extensions=None)
Generates a list of data files in a directory whose extensions match the target extension types.
The default extensions are ['.csv', '.xls', '.xlsx', '.sqlite3'].
Additional extensions can be appended to this list with the add_extensions parameter.
- Parameters
directory – str pathname of target parent directory
add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']
- Returns
list of filenames for files with matching extension type
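The behavior described above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation: `sketch_list_datafiles` and `DEFAULT_EXTENSIONS` are hypothetical names, and the case-sensitive matching, the exclusion of subdirectories, and the sorted output are assumptions that the documentation does not confirm.

```python
import os
import tempfile

# Default extensions as listed in the documentation above.
DEFAULT_EXTENSIONS = ['.csv', '.xls', '.xlsx', '.sqlite3']


def sketch_list_datafiles(directory, add_extensions=None):
    """Hypothetical stand-in for list_datafiles, not the library's own code."""
    extensions = list(DEFAULT_EXTENSIONS)
    if add_extensions is not None:
        extensions.extend(add_extensions)
    # Assumptions: only the final extension is compared, the comparison is
    # case-sensitive, and results are sorted for a stable order.
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1] in extensions
    )


# Build a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as target:
    for name in ('a.csv', 'b.txt', 'c.parquet'):
        open(os.path.join(target, name), 'w').close()
    found_default = sketch_list_datafiles(target)
    found_extended = sketch_list_datafiles(target, add_extensions=['.txt'])
```

In this sketch the default call picks up only the .csv file, while passing add_extensions=['.txt'] extends the match to the .txt file as well; the .parquet file is ignored in both cases.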
-
basedata.inventory.list_files_with_extensions(directory, ext_list)
Generates a list of files in a directory whose extensions match those specified in ext_list.
- Parameters
directory – str pathname of target parent directory
ext_list – list of strings specifying target extension types, e.g. ['.csv', '.xls']
- Returns
list of filenames for files with matching extension type
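Unlike list_datafiles, this function has no default extensions: only the entries in ext_list are matched. The sketch below illustrates that contract with a suffix test. It is an assumption-laden approximation rather than the package's code; `sketch_list_files_with_extensions` is a hypothetical name, and skipping directories and sorting the result are choices made here for illustration.

```python
import os
import tempfile


def sketch_list_files_with_extensions(directory, ext_list):
    """Hypothetical stand-in for list_files_with_extensions."""
    # str.endswith accepts a tuple of suffixes, so one call covers every
    # requested extension.
    targets = tuple(ext_list)
    return sorted(
        name for name in os.listdir(directory)
        # Assumption: directories are skipped even if their name matches.
        if os.path.isfile(os.path.join(directory, name))
        and name.endswith(targets)
    )


with tempfile.TemporaryDirectory() as target:
    for name in ('report.csv', 'legacy.xls', 'notes.txt'):
        open(os.path.join(target, name), 'w').close()
    # A directory whose name ends in .csv should not be reported as a file.
    os.mkdir(os.path.join(target, 'archive.csv'))
    matched = sketch_list_files_with_extensions(target, ['.csv', '.xls'])
```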
-
basedata.inventory.list_subdir_paths(directory)
Generates a list of subdirectory paths.
- Parameters
directory – str pathname of target parent directory
- Returns
list of paths for each subdirectory in the target parent directory
-
basedata.inventory.list_subdirs(directory)
Generates a list of subdirectory basenames.
- Parameters
directory – str pathname of target parent directory
- Returns
list of subdirectory basenames for each subdirectory in the target parent directory
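The difference between list_subdir_paths and list_subdirs is only whether each subdirectory is reported as a path or as a basename. A minimal sketch of that relationship, using os.scandir, might look like the following. The function names are hypothetical, and the sorted ordering and the choice to derive basenames from the paths are assumptions, not documented behavior.

```python
import os
import tempfile


def sketch_list_subdir_paths(directory):
    """Hypothetical stand-in: paths of the immediate subdirectories."""
    with os.scandir(directory) as entries:
        # entry.path joins the scanned directory with the entry name.
        return sorted(entry.path for entry in entries if entry.is_dir())


def sketch_list_subdirs(directory):
    """Hypothetical stand-in: basenames of the immediate subdirectories."""
    return [os.path.basename(path)
            for path in sketch_list_subdir_paths(directory)]


with tempfile.TemporaryDirectory() as parent:
    os.mkdir(os.path.join(parent, 'raw'))
    os.mkdir(os.path.join(parent, 'clean'))
    # Plain files in the parent directory are not subdirectories.
    open(os.path.join(parent, 'notes.txt'), 'w').close()
    subdir_paths = sketch_list_subdir_paths(parent)
    subdir_names = sketch_list_subdirs(parent)
    expected_paths = [os.path.join(parent, 'clean'),
                      os.path.join(parent, 'raw')]
```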
-
basedata.inventory.make_datafile_array(directory, add_extensions=None)
Generates an array of datafile names in a specified directory, along with the basename of the directory repeated in a separate column.
The default extensions are ['.csv', '.xls', '.xlsx', '.sqlite3'].
Additional extensions can be appended to this list with the add_extensions parameter.
- Parameters
directory – str pathname of target parent directory
add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']
- Returns
numpy.ndarray of filenames for files with matching extension type along with a column repeating the basename of the directory
-
basedata.inventory.make_datafile_dataframe(directory, columns=('directory', 'filename'), add_extensions=None, return_df=True, to_file=None, **kwargs)
Generates a dataframe of subdirectory names and associated datafiles contained in each of those subdirectories.
The default extensions are ['.csv', '.xls', '.xlsx', '.sqlite3'].
Additional extensions can be appended to this list with the add_extensions parameter.
- Parameters
directory – str pathname of target parent directory
columns – tuple specifying the name of each column, default=('directory', 'filename')
add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']
return_df – bool indicating whether the dataframe object is returned, default=True
to_file – str or None; target filepath to which the dataframe is saved as a .csv file. If None, no file is saved. default=None
kwargs – additional named parameters for pandas.DataFrame.to_csv()
- Returns
pandas.DataFrame of subdirectory names and associated datafiles stored in each subdirectory, returned only if return_df=True
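The (directory, filename) row layout described above can be illustrated without pandas. The sketch below is a standard-library approximation under stated assumptions: `sketch_inventory_rows` is a hypothetical name, plain tuples stand in for dataframe rows, and the csv module stands in for the pandas.DataFrame.to_csv() call that the to_file parameter implies. The actual function returns a pandas.DataFrame, and its row order is not documented.

```python
import csv
import os
import tempfile

# Default extensions as listed in the documentation above.
DEFAULT_EXTENSIONS = ('.csv', '.xls', '.xlsx', '.sqlite3')


def sketch_inventory_rows(directory, add_extensions=None):
    """Hypothetical stand-in: one (subdirectory, filename) row per datafile."""
    extensions = DEFAULT_EXTENSIONS + tuple(add_extensions or ())
    rows = []
    for subdir in sorted(os.listdir(directory)):
        subdir_path = os.path.join(directory, subdir)
        if not os.path.isdir(subdir_path):
            continue  # only subdirectories contribute rows
        for filename in sorted(os.listdir(subdir_path)):
            is_file = os.path.isfile(os.path.join(subdir_path, filename))
            if is_file and filename.endswith(extensions):
                rows.append((subdir, filename))
    return rows


with tempfile.TemporaryDirectory() as parent:
    layout = [('2019', 'sales.csv'), ('2019', 'readme.md'),
              ('2020', 'sales.xlsx')]
    for subdir, filename in layout:
        os.makedirs(os.path.join(parent, subdir), exist_ok=True)
        open(os.path.join(parent, subdir, filename), 'w').close()

    rows = sketch_inventory_rows(parent)

    # Stand-in for to_file: write a header row followed by the inventory.
    out_path = os.path.join(parent, 'inventory.csv')
    with open(out_path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(('directory', 'filename'))
        writer.writerows(rows)
    with open(out_path, newline='') as handle:
        saved = list(csv.reader(handle))
```

Each subdirectory name repeats once per datafile it contains, which is the same shape the columns default of ('directory', 'filename') suggests for the dataframe.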