basedata.inventory package

Module contents

This submodule, basedata.inventory, contains functions for building inventories of the data files stored in a target directory’s subdirectories.

basedata.inventory.list_datafiles(directory, add_extensions=None)

Generates a list of the data files in a directory whose extensions match the target extension types.

The default extensions are ['.csv', '.xls', '.xlsx', '.sqlite3'].

This list can be extended with the add_extensions parameter.

Parameters
  • directory – str pathname of target parent directory

  • add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']

Returns

list of filenames for files with matching extension type
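A minimal sketch of behaviour consistent with this description, not the package's actual source. It assumes that only regular files directly inside the directory are considered (no recursion), that matching is a case-sensitive suffix test with str.endswith, and that results are sorted; all three choices are assumptions. It copies the default list before extending it so repeated calls do not accumulate extensions.

```python
import os
import tempfile

# Default extensions as documented for basedata.inventory.list_datafiles.
DEFAULT_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]


def list_datafiles(directory, add_extensions=None):
    """Sketch: sorted names of regular files directly in `directory` whose
    names end with a default or additional extension (non-recursive)."""
    extensions = list(DEFAULT_EXTENSIONS)  # copy so the defaults stay unchanged
    if add_extensions:
        # Extend, rather than replace, the default list.
        extensions.extend(add_extensions)
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and name.endswith(tuple(extensions))
    )


# Usage: inventory a throwaway directory with and without an extra extension.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sales.csv", "notes.txt", "db.sqlite3", "readme.md"):
        open(os.path.join(tmp, name), "w").close()
    defaults_only = list_datafiles(tmp)
    with_txt = list_datafiles(tmp, add_extensions=[".txt"])

print(defaults_only)  # ['db.sqlite3', 'sales.csv']
print(with_txt)       # ['db.sqlite3', 'notes.txt', 'sales.csv']
```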

basedata.inventory.list_files_with_extensions(directory, ext_list)

Generates a list of the files in a directory whose extensions match the target extension types.

Parameters
  • directory – str pathname of target parent directory

  • ext_list – list of strings specifying target extension types, e.g. ['.csv', '.xls']

Returns

list of filenames for files with matching extension type
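The following is a hedged sketch of this kind of extension filter, not the package's implementation. Passing a tuple to str.endswith tests every suffix in one call and also handles multi-dot extensions such as '.sqlite3'; the non-recursive scan, the case-sensitive match, the exclusion of directories, and the sorted output are assumptions.

```python
import os
import tempfile


def list_files_with_extensions(directory, ext_list):
    """Sketch: sorted names of regular files directly inside `directory`
    whose names end with any suffix in `ext_list`."""
    suffixes = tuple(ext_list)  # str.endswith accepts a tuple of suffixes
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and name.endswith(suffixes)
    )


# Usage: a directory named like a datafile is skipped because it is not a file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.txt", "c.xls"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "nested.csv"))
    found = list_files_with_extensions(tmp, [".csv", ".xls"])

print(found)  # ['a.csv', 'c.xls']
```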

basedata.inventory.list_subdir_paths(directory)

Generates a list of paths to the subdirectories of the target directory.

Parameters

directory – str pathname of target parent directory

Returns

list of paths for each subdirectory in the target parent directory
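A minimal sketch, assuming only immediate (first-level) subdirectories are returned and that each path is the parent directory joined with the subdirectory name; the sorting is also an assumption. os.scandir is used here because each DirEntry already carries its joined path and a cheap is_dir() check.

```python
import os
import tempfile


def list_subdir_paths(directory):
    """Sketch: sorted paths of the immediate subdirectories of `directory`."""
    with os.scandir(directory) as entries:
        # entry.path is os.path.join(directory, entry.name)
        return sorted(entry.path for entry in entries if entry.is_dir())


# Usage: files in the parent directory are excluded.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2021", "2022"):
        os.mkdir(os.path.join(tmp, name))
    open(os.path.join(tmp, "summary.csv"), "w").close()
    paths = list_subdir_paths(tmp)
    relative = [os.path.relpath(p, tmp) for p in paths]

print(relative)  # ['2021', '2022']
```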

basedata.inventory.list_subdirs(directory)

Generates a list of subdirectory basenames.

Parameters

directory – str pathname of target parent directory

Returns

list of subdirectory basenames for each subdirectory in the target parent directory
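This sketch mirrors the path-returning variant but keeps only each subdirectory's basename. It is an illustration under the same assumptions (first-level subdirectories only, sorted output), not the package's code.

```python
import os
import tempfile


def list_subdirs(directory):
    """Sketch: sorted basenames of the immediate subdirectories of `directory`."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


# Usage: only the directory names are returned, not full paths.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2022", "2021"):
        os.mkdir(os.path.join(tmp, name))
    open(os.path.join(tmp, "summary.csv"), "w").close()
    names = list_subdirs(tmp)

print(names)  # ['2021', '2022']
```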

basedata.inventory.make_datafile_array(directory, add_extensions=None)

Generates an array of datafile names in a specified directory, along with the basename of the directory repeated in a separate column.

The default extensions are ['.csv', '.xls', '.xlsx', '.sqlite3'].

This list of extensions can be extended with the add_extensions parameter.

Parameters
  • directory – str pathname of target parent directory

  • add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']

Returns

numpy.ndarray of filenames for files with a matching extension type, along with a column repeating the basename of the directory
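A hedged sketch of the array this function is described as returning. The column order (directory basename first, then filename) is an assumption chosen to match the default columns of make_datafile_dataframe; the non-recursive scan, the sorting, and the object dtype are also assumptions. The reshape keeps a (0, 2) shape when no file matches, so callers can stack results from several directories without special cases.

```python
import os
import tempfile

import numpy as np

DEFAULT_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]


def make_datafile_array(directory, add_extensions=None):
    """Sketch: 2-column array of (directory basename, filename) rows, one row
    per matching datafile. Column order is an assumption."""
    extensions = tuple(DEFAULT_EXTENSIONS + list(add_extensions or []))
    filenames = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and name.endswith(extensions)
    )
    # normpath drops a trailing separator so basename is not an empty string.
    dir_name = os.path.basename(os.path.normpath(directory))
    rows = [(dir_name, name) for name in filenames]
    # dtype=object stores plain Python str values in each cell.
    return np.array(rows, dtype=object).reshape(-1, 2)


# Usage: one matching directory and one with no datafiles at its top level.
with tempfile.TemporaryDirectory() as tmp:
    site = os.path.join(tmp, "site_a")
    os.mkdir(site)
    for name in ("flow.csv", "chem.xlsx", "log.txt"):
        open(os.path.join(site, name), "w").close()
    arr = make_datafile_array(site)
    empty = make_datafile_array(tmp)  # tmp holds only a subdirectory

print(arr.tolist())  # [['site_a', 'chem.xlsx'], ['site_a', 'flow.csv']]
print(empty.shape)   # (0, 2)
```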

basedata.inventory.make_datafile_dataframe(directory, columns=('directory', 'filename'), add_extensions=None, return_df=True, to_file=None, **kwargs)

Generates a pandas.DataFrame of subdirectory names and the datafiles contained in each of those subdirectories.

The default extensions are ['.csv', '.xls', '.xlsx', '.sqlite3'].

This list of extensions can be extended with the add_extensions parameter.

Parameters
  • directory – str pathname of target parent directory

  • columns – tuple specifying the name of each column, default=('directory', 'filename')

  • add_extensions – list of strings specifying additional target extension types, e.g. [‘.txt’, ‘.parquet’]

  • return_df – bool indicating whether the DataFrame is returned, default=True

  • to_file – str or None; target filepath to which the DataFrame is saved as a .csv file. If None, no .csv file is written. default=None

  • kwargs – additional keyword arguments passed to pandas.DataFrame.to_csv()

Returns

pandas.DataFrame of subdirectory names and the associated datafiles stored in each subdirectory; returned only if return_df=True
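A minimal sketch of how the documented parameters could fit together; it is not the package's implementation. It assumes one row per matching file found at the top level of each immediate subdirectory, sorted by subdirectory and then filename, and that **kwargs are forwarded unchanged to pandas.DataFrame.to_csv (for example index=False). Files sitting directly in the parent directory are not listed in this sketch.

```python
import os
import tempfile

import pandas as pd

DEFAULT_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]


def make_datafile_dataframe(directory, columns=("directory", "filename"),
                            add_extensions=None, return_df=True,
                            to_file=None, **kwargs):
    """Sketch: one (subdirectory, filename) row per matching datafile."""
    extensions = tuple(DEFAULT_EXTENSIONS + list(add_extensions or []))
    subdirs = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    rows = []
    for sub in subdirs:
        sub_path = os.path.join(directory, sub)
        for name in sorted(os.listdir(sub_path)):
            if os.path.isfile(os.path.join(sub_path, name)) and name.endswith(extensions):
                rows.append((sub, name))
    df = pd.DataFrame(rows, columns=list(columns))
    if to_file is not None:
        # Extra keyword arguments (e.g. index=False) go straight to to_csv().
        df.to_csv(to_file, **kwargs)
    return df if return_df else None


# Usage: two subdirectories, one extra extension, and a saved .csv copy.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"site_a": ["flow.csv", "log.txt"],
              "site_b": ["chem.xlsx", "raw.parquet"]}
    for sub, files in layout.items():
        os.mkdir(os.path.join(tmp, sub))
        for name in files:
            open(os.path.join(tmp, sub, name), "w").close()
    out_csv = os.path.join(tmp, "inventory.csv")
    df = make_datafile_dataframe(tmp, add_extensions=[".parquet"],
                                 to_file=out_csv, index=False)
    saved = pd.read_csv(out_csv)

print(df.values.tolist())
# [['site_a', 'flow.csv'], ['site_b', 'chem.xlsx'], ['site_b', 'raw.parquet']]
```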