basedata.inventory package

Submodules

basedata.inventory.test_inventory module

unittests for basedata.inventory submodule functions

class basedata.inventory.test_inventory.DirBuildTests(methodName='runTest')

Bases: unittest.case.TestCase

unittests for the functions that build subdirectories and test files

test_make_dirfiles()

ensure subdirectories are populated with test files

test_make_files()

ensure test files are created

test_make_subdirs()

ensure subdirectories are created

class basedata.inventory.test_inventory.InventoryTests(methodName='runTest')

Bases: unittest.case.TestCase

unittests for basedata.inventory submodule

test_list_datafiles()

ensure list_datafiles returns an accurate file list

test_list_files_with_extensions()

ensure list_files_with_extensions returns an accurate file list

test_list_subdir_paths()

ensure list_subdir_paths generates an accurate path list

test_list_subdirs()

ensure list_subdirs returns an accurate subdir name list

test_make_datafile_array()

ensure make_datafile_array returns an accurate array

test_make_datafile_dataframe()

ensure make_datafile_dataframe returns an accurate dataframe

test_make_datafile_dataframe_save()

ensure the to_file argument of make_datafile_dataframe saves a .csv file

basedata.inventory.test_inventory.make_dirfiles(root_dir, subdir_list, file_list)

makes subdirectories and populates them with test files for unittests

basedata.inventory.test_inventory.make_files(dir_path, file_list)

makes files in directory path for use in unittests

basedata.inventory.test_inventory.make_subdirs(root_dir, subdir_list)

makes subdirectories in root directory for use in unittests
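A minimal sketch of how fixture helpers with these three signatures could work, assuming the test files are empty and the fixture tree lives in a temporary directory. The function bodies are illustrative assumptions and are not taken from the package source.

```python
import os
import tempfile


def make_subdirs(root_dir, subdir_list):
    """Create each named subdirectory under root_dir (assumed behavior)."""
    for subdir in subdir_list:
        os.makedirs(os.path.join(root_dir, subdir), exist_ok=True)


def make_files(dir_path, file_list):
    """Create an empty file in dir_path for each name in file_list."""
    for filename in file_list:
        with open(os.path.join(dir_path, filename), "w"):
            pass  # an empty file is enough for inventory tests


def make_dirfiles(root_dir, subdir_list, file_list):
    """Create the subdirectories, then populate each with the test files."""
    make_subdirs(root_dir, subdir_list)
    for subdir in subdir_list:
        make_files(os.path.join(root_dir, subdir), file_list)


# Usage: build a throwaway fixture tree that is deleted on exit.
with tempfile.TemporaryDirectory() as root:
    make_dirfiles(root, ["a", "b"], ["one.csv", "two.txt"])
    print(sorted(os.listdir(os.path.join(root, "a"))))  # ['one.csv', 'two.txt']
```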

Module contents

This submodule, basedata.inventory, contains functions for generating datafile inventory data for a target directory’s subdirectories.

basedata.inventory.list_datafiles(directory, add_extensions=None)

Generates a list of the data files in a directory whose extensions match the target extension types.

Default extensions captured in this function are ['.csv', '.xls', '.xlsx', '.sqlite3'].

This list can be extended with the add_extensions parameter.

Parameters
  • directory – str pathname of target parent directory

  • add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']

Returns

list of filenames for files with matching extension type
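The way the default extensions combine with add_extensions can be sketched as follows. This is a stand-in written with the standard library, not the package's implementation; the name list_datafiles_sketch, the sorted output, and the case-sensitive match are assumptions.

```python
import os
import tempfile

# Defaults as listed in the documentation above.
DEFAULT_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]


def list_datafiles_sketch(directory, add_extensions=None):
    """Hypothetical stand-in: default extensions plus any extras."""
    # Copy the defaults so repeated calls never mutate the shared list.
    extensions = list(DEFAULT_EXTENSIONS)
    if add_extensions:
        extensions.extend(add_extensions)
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1] in extensions
    )


# Usage: the extra extension only matches when it is passed in.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("sales.csv", "notes.txt", "model.parquet"):
        open(os.path.join(demo_dir, name), "w").close()
    print(list_datafiles_sketch(demo_dir))  # ['sales.csv']
    print(list_datafiles_sketch(demo_dir, add_extensions=[".parquet"]))
```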

basedata.inventory.list_files_with_extensions(directory, ext_list)

Generates a list of the files in a directory whose extensions match the specified extension types.

Parameters
  • directory – str pathname of target parent directory

  • ext_list – list of strings specifying target extension types, e.g. ['.csv', '.xls']

Returns

list of filenames for files with matching extension type
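As an illustration of extension filtering, the sketch below compares each file's extension against ext_list. It assumes a non-recursive listing, a case-sensitive match, and sorted output, none of which the documentation above confirms; the function name is hypothetical.

```python
import os
import tempfile


def list_files_with_extensions_sketch(directory, ext_list):
    """Hypothetical stand-in: names of files whose extension is in ext_list."""
    matches = []
    for name in os.listdir(directory):
        # Skip subdirectories; only regular files are candidates.
        if not os.path.isfile(os.path.join(directory, name)):
            continue
        # splitext returns ('report', '.csv') for 'report.csv'.
        if os.path.splitext(name)[1] in ext_list:
            matches.append(name)
    return sorted(matches)


# Usage: a directory named like a data file is not reported.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "report.csv"), "w").close()
    open(os.path.join(demo_dir, "readme.md"), "w").close()
    os.mkdir(os.path.join(demo_dir, "archive.csv"))
    print(list_files_with_extensions_sketch(demo_dir, [".csv", ".xls"]))  # ['report.csv']
```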

basedata.inventory.list_subdir_paths(directory)

Generates a list of subdirectory paths.

Parameters

directory – str pathname of target parent directory

Returns

list of paths for each subdirectory in the target parent directory

basedata.inventory.list_subdirs(directory)

Generates a list of subdirectory basenames.

Parameters

directory – str pathname of target parent directory

Returns

list of subdirectory basenames for each subdirectory in the target parent directory
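The relationship between subdirectory paths and subdirectory basenames can be sketched with os.scandir. Both functions below are hypothetical stand-ins; listing only immediate subdirectories and sorting the result are assumptions.

```python
import os
import tempfile


def list_subdir_paths_sketch(directory):
    """Hypothetical stand-in: paths of the immediate subdirectories."""
    with os.scandir(directory) as entries:
        return sorted(entry.path for entry in entries if entry.is_dir())


def list_subdirs_sketch(directory):
    """Hypothetical stand-in: basename of each immediate subdirectory."""
    return [os.path.basename(path) for path in list_subdir_paths_sketch(directory)]


# Usage: plain files in the parent directory are ignored.
with tempfile.TemporaryDirectory() as parent:
    os.mkdir(os.path.join(parent, "2021"))
    os.mkdir(os.path.join(parent, "2020"))
    open(os.path.join(parent, "index.csv"), "w").close()
    print(list_subdirs_sketch(parent))  # ['2020', '2021']
```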

basedata.inventory.make_datafile_array(directory, add_extensions=None)

Generates an array of datafile names in a specified directory, along with the basename of the directory repeated in a separate column.

Default extensions captured in this function are ['.csv', '.xls', '.xlsx', '.sqlite3'].

This list of extensions can be extended with the add_extensions parameter.

Parameters
  • directory – str pathname of target parent directory

  • add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']

Returns

numpy.ndarray of filenames for files with matching extension type along with a column repeating the basename of the directory
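To show the two-column shape described above, the sketch below builds the rows as plain Python lists rather than a numpy.ndarray. The column order (directory basename first, filename second) and the function name are assumptions; for a non-empty result, passing the rows to numpy.array would give a two-dimensional array with one row per file.

```python
import os
import tempfile

DEFAULT_EXTENSIONS = (".csv", ".xls", ".xlsx", ".sqlite3")


def datafile_rows_sketch(directory, add_extensions=None):
    """Hypothetical stand-in: [directory basename, filename] per datafile."""
    extensions = set(DEFAULT_EXTENSIONS) | set(add_extensions or ())
    # normpath drops a trailing separator so basename is never empty.
    dirname = os.path.basename(os.path.normpath(directory))
    return [
        [dirname, name]
        for name in sorted(os.listdir(directory))
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1] in extensions
    ]


# Usage: the directory basename repeats in the first column of every row.
with tempfile.TemporaryDirectory() as parent:
    target = os.path.join(parent, "survey")
    os.mkdir(target)
    for name in ("wave1.csv", "wave2.xlsx", "codebook.pdf"):
        open(os.path.join(target, name), "w").close()
    print(datafile_rows_sketch(target))
    # [['survey', 'wave1.csv'], ['survey', 'wave2.xlsx']]
```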

basedata.inventory.make_datafile_dataframe(directory, columns=('directory', 'filename'), add_extensions=None, return_df=True, to_file=None, **kwargs)

Generates a dataframe of subdirectory names and associated datafiles contained in each of those subdirectories.

Default extensions captured in this function are ['.csv', '.xls', '.xlsx', '.sqlite3'].

This list of extensions can be extended with the add_extensions parameter.

Parameters
  • directory – str pathname of target parent directory

  • columns – tuple specifying the name of each column, default=('directory', 'filename')

  • add_extensions – list of strings specifying additional target extension types, e.g. ['.txt', '.parquet']

  • return_df – bool indicating whether the dataframe object is returned, default=True

  • to_file – str or None; target filepath to which the dataframe is saved as a .csv file. If None, no .csv file is saved. Default=None

  • kwargs – additional named parameters for pandas.DataFrame.to_csv()

Returns

pandas.DataFrame of subdirectory names and associated datafiles stored in each subdirectory, returned only if return_df=True
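The inventory described above, one row per datafile paired with the subdirectory that holds it, can be sketched with the standard library alone. This stand-in swaps pandas for plain tuples and the csv module, so it is not the package's implementation; the function name and the sorted row order are assumptions. The same rows could be handed to pandas.DataFrame(rows, columns=list(columns)) to obtain a dataframe.

```python
import csv
import os
import tempfile

DEFAULT_EXTENSIONS = (".csv", ".xls", ".xlsx", ".sqlite3")


def datafile_inventory_sketch(directory, columns=("directory", "filename"),
                              add_extensions=None, to_file=None):
    """Hypothetical stand-in: (subdirectory, filename) rows, optional CSV."""
    extensions = set(DEFAULT_EXTENSIONS) | set(add_extensions or ())
    rows = []
    for subdir in sorted(os.listdir(directory)):
        subdir_path = os.path.join(directory, subdir)
        if not os.path.isdir(subdir_path):
            continue  # only subdirectories are inventoried
        for name in sorted(os.listdir(subdir_path)):
            if os.path.splitext(name)[1] in extensions:
                rows.append((subdir, name))
    if to_file is not None:
        # newline="" lets the csv module control line endings.
        with open(to_file, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)
            writer.writerows(rows)
    return rows


# Usage: inventory two subdirectories and save the result as a .csv file.
with tempfile.TemporaryDirectory() as workspace:
    data_root = os.path.join(workspace, "data")
    for subdir, name in (("a", "x.csv"), ("a", "notes.txt"), ("b", "y.xlsx")):
        os.makedirs(os.path.join(data_root, subdir), exist_ok=True)
        open(os.path.join(data_root, subdir, name), "w").close()
    out_path = os.path.join(workspace, "inventory.csv")
    print(datafile_inventory_sketch(data_root, to_file=out_path))
    # [('a', 'x.csv'), ('b', 'y.xlsx')]
```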