basedata.inventory package
Submodules
basedata.inventory.test_inventory module
Unit tests for the basedata.inventory submodule functions.
class basedata.inventory.test_inventory.DirBuildTests(methodName='runTest')
Bases: unittest.case.TestCase
Unit tests for the functions that build subdirectories and test files.
test_make_dirfiles()
Ensure subdirectories are populated with test files.
test_make_files()
Ensure test files are created.
test_make_subdirs()
Ensure subdirectories are created.
class basedata.inventory.test_inventory.InventoryTests(methodName='runTest')
Bases: unittest.case.TestCase
Unit tests for the basedata.inventory submodule.
test_list_datafiles()
Ensure list_datafiles returns an accurate file list.
test_list_files_with_extensions()
Ensure list_files_with_extensions returns an accurate file list.
test_list_subdir_paths()
Ensure list_subdir_paths generates an accurate path list.
test_list_subdirs()
Ensure list_subdirs returns an accurate list of subdirectory names.
test_make_datafile_array()
Ensure make_datafile_array returns an accurate array.
test_make_datafile_dataframe()
Ensure make_datafile_dataframe returns an accurate dataframe.
test_make_datafile_dataframe_save()
Ensure the to_file argument of make_datafile_dataframe saves a .csv file.
basedata.inventory.test_inventory.make_dirfiles(root_dir, subdir_list, file_list)
Makes subdirectories and populates them with test files for unit tests.
basedata.inventory.test_inventory.make_files(dir_path, file_list)
Makes the files in file_list inside dir_path for use in unit tests.
basedata.inventory.test_inventory.make_subdirs(root_dir, subdir_list)
Makes the subdirectories in subdir_list inside root_dir for use in unit tests.
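The three test helpers above can be sketched with the standard library as follows. This is an illustrative reconstruction from the docstrings, not the code in test_inventory; it assumes that make_dirfiles places the same file_list in every subdirectory and that empty files are sufficient as test fixtures.

```python
import os


def make_subdirs(root_dir, subdir_list):
    """Sketch: create each named subdirectory under root_dir."""
    for name in subdir_list:
        # exist_ok=True lets repeated test setup run without raising
        os.makedirs(os.path.join(root_dir, name), exist_ok=True)


def make_files(dir_path, file_list):
    """Sketch: create an empty file in dir_path for each name in file_list."""
    for name in file_list:
        # 'w' mode creates the file, or truncates it if it already exists
        with open(os.path.join(dir_path, name), "w"):
            pass


def make_dirfiles(root_dir, subdir_list, file_list):
    """Sketch (assumption): every subdirectory receives the same file_list."""
    make_subdirs(root_dir, subdir_list)
    for name in subdir_list:
        make_files(os.path.join(root_dir, name), file_list)
```

In a TestCase these would typically be pointed at a tempfile.TemporaryDirectory() so that each test starts from a clean directory tree.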
Module contents
This submodule, basedata.inventory, contains functions that generate an inventory of the data files in a target directory’s subdirectories.
basedata.inventory.list_datafiles(directory, add_extensions=None)
Generates a list of the data files in a directory, that is, the files whose extensions match the target extension types.
The default target extensions are ['.csv', '.xls', '.xlsx', '.sqlite3']; this list can be extended with the add_extensions parameter.
Parameters:
directory – str pathname of the target parent directory
add_extensions – list of str specifying additional target extension types, e.g. ['.txt', '.parquet']
Returns:
list of filenames for files with a matching extension type
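A minimal sketch of the documented behavior follows; it is not the package's implementation. It assumes that only regular files count, that matching uses the final suffix from os.path.splitext, and that results are sorted. DEFAULT_EXTENSIONS is a hypothetical name for the default list.

```python
import os

# Hypothetical constant holding the documented default extensions
DEFAULT_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]


def list_datafiles(directory, add_extensions=None):
    """Sketch: sorted names of files in `directory` with a data-file extension."""
    # Concatenate into a new list so the defaults are never mutated
    ext_list = DEFAULT_EXTENSIONS + list(add_extensions or [])
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1] in ext_list
    )
```

Building a new list rather than calling DEFAULT_EXTENSIONS.extend(add_extensions) matters: extending in place would leak the extra extensions into every later call.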
basedata.inventory.list_files_with_extensions(directory, ext_list)
Generates a list of the files in a directory whose extensions match the specified extension types.
Parameters:
directory – str pathname of the target parent directory
ext_list – list of str specifying the target extension types, e.g. ['.csv', '.xls']
Returns:
list of filenames for files with a matching extension type
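One way to express this with pathlib is sketched below, as an assumption rather than the package's code. The sketch matches exactly and case-sensitively on the last suffix, so 'D.CSV' does not match '.csv' and 'archive.csv.gz' matches only '.gz'; it does not recurse into subdirectories and it sorts the result.

```python
from pathlib import Path


def list_files_with_extensions(directory, ext_list):
    """Sketch: sorted names of regular files whose last suffix is in ext_list."""
    wanted = set(ext_list)  # set membership keeps the per-file check cheap
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix in wanted  # directories are skipped
    )
```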
basedata.inventory.list_subdir_paths(directory)
Generates a list of subdirectory paths.
Parameters:
directory – str pathname of the target parent directory
Returns:
list of paths, one for each subdirectory in the target parent directory
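A minimal sketch using os.scandir, assuming a non-recursive listing in sorted order. Because DirEntry.path joins the scandir argument with the entry name, a relative directory argument yields relative paths. DirEntry.is_dir() follows symlinks by default, so links to directories are included.

```python
import os


def list_subdir_paths(directory):
    """Sketch: sorted paths of the immediate subdirectories of `directory`."""
    # The context manager closes the scandir iterator once sorting is done
    with os.scandir(directory) as entries:
        return sorted(entry.path for entry in entries if entry.is_dir())
```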
basedata.inventory.list_subdirs(directory)
Generates a list of subdirectory basenames.
Parameters:
directory – str pathname of the target parent directory
Returns:
list of basenames, one for each subdirectory in the target parent directory
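Since os.listdir already returns basenames, a sketch of this function only has to filter out non-directories. Applying os.path.basename to each subdirectory path would give the same names. Sorting and the non-recursive scope are assumptions.

```python
import os


def list_subdirs(directory):
    """Sketch: sorted basenames of the immediate subdirectories of `directory`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))  # skip plain files
    )
```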
basedata.inventory.make_datafile_array(directory, add_extensions=None)
Generates an array of the data file names in the specified directory, with the basename of the directory repeated in a separate column.
The default target extensions are ['.csv', '.xls', '.xlsx', '.sqlite3']; this list can be extended with the add_extensions parameter.
Parameters:
directory – str pathname of the target parent directory
add_extensions – list of str specifying additional target extension types, e.g. ['.txt', '.parquet']
Returns:
numpy.ndarray with one row per matching file, holding the filename and, in a separate column, the basename of the directory
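The array layout can be sketched as below. This sketch makes several assumptions. It puts the directory basename in the first column to mirror the ('directory', 'filename') default of make_datafile_dataframe. It sorts the filenames and uses an object dtype. DATA_EXTENSIONS is a hypothetical name, and numpy must be installed.

```python
import os

import numpy as np

DATA_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]  # hypothetical name


def make_datafile_array(directory, add_extensions=None):
    """Sketch: (n, 2) array of [directory basename, filename] rows."""
    exts = DATA_EXTENSIONS + list(add_extensions or [])
    filenames = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and os.path.splitext(name)[1] in exts
    )
    # normpath drops a trailing separator, so 'data/site_a/' gives 'site_a'
    dir_name = os.path.basename(os.path.normpath(directory))
    # object dtype keeps whole Python strings; np.empty(..., dtype=str) would
    # allocate 1-character fields and silently truncate the assigned names
    arr = np.empty((len(filenames), 2), dtype=object)
    arr[:, 0] = dir_name
    arr[:, 1] = filenames
    return arr
```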
basedata.inventory.make_datafile_dataframe(directory, columns=('directory', 'filename'), add_extensions=None, return_df=True, to_file=None, **kwargs)
Generates a dataframe of subdirectory names and the data files contained in each of those subdirectories.
The default target extensions are ['.csv', '.xls', '.xlsx', '.sqlite3']; this list can be extended with the add_extensions parameter.
Parameters:
directory – str pathname of the target parent directory
columns – tuple specifying the name of each column, default=('directory', 'filename')
add_extensions – list of str specifying additional target extension types, e.g. ['.txt', '.parquet']
return_df – bool indicating whether the dataframe object is returned, default=True
to_file – str or None, target file path to which the dataframe is saved as a .csv file; None saves no file, default=None
kwargs – additional keyword arguments passed to pandas.DataFrame.to_csv()
Returns:
pandas.DataFrame of subdirectory names and the associated data files stored in each subdirectory; returned only if return_df=True
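How the parameters might fit together is sketched below under stated assumptions; it is not the package's implementation. The sketch scans only the immediate subdirectories of directory, skips files at the top level, sorts both levels, and returns None when return_df is False. DATA_EXTENSIONS is a hypothetical name, and pandas must be installed.

```python
import os

import pandas as pd

DATA_EXTENSIONS = [".csv", ".xls", ".xlsx", ".sqlite3"]  # hypothetical name


def make_datafile_dataframe(directory, columns=("directory", "filename"),
                            add_extensions=None, return_df=True, to_file=None,
                            **kwargs):
    """Sketch: inventory data files in each immediate subdirectory."""
    exts = DATA_EXTENSIONS + list(add_extensions or [])
    rows = []
    for sub in sorted(os.listdir(directory)):
        sub_path = os.path.join(directory, sub)
        if not os.path.isdir(sub_path):
            continue  # assumption: top-level files are not inventoried
        for name in sorted(os.listdir(sub_path)):
            if (os.path.isfile(os.path.join(sub_path, name))
                    and os.path.splitext(name)[1] in exts):
                rows.append((sub, name))
    df = pd.DataFrame(rows, columns=list(columns))
    if to_file is not None:
        # extra keyword arguments (e.g. index=False) are forwarded to to_csv
        df.to_csv(to_file, **kwargs)
    return df if return_df else None
```

With this sketch, a call such as make_datafile_dataframe('data', return_df=False, to_file='inventory.csv', index=False) would write the inventory without the pandas row index and return nothing.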