Damavand Documentation - Datasets Module API Reference
Data is the essential ingredient for any data-driven analysis, including intelligent condition monitoring of rotating machinery. On this page, we go through both submodules of the datasets module: 1) Downloaders and 2) Digestors. While the former is focused on downloading the raw datasets from the internet, the latter is built around structuring the raw data.
Downloaders Submodule
read_addresses
Loading the download addresses of the available datasets
Arguments:
- This function receives no arguments.
Return Value:
- A dict object whose keys are dataset names and values are download addresses.
Description:
Using this function, one is able to load the download addresses of the available datasets as a python dictionary. For single-file datasets, the value corresponding to the dataset name key is the download link. For datasets consisting of multiple files, the value corresponding to the dataset name is another python dictionary whose keys are file names and whose values are the download links.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses
# Loading the addresses
addresses = read_addresses()
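As a quick sketch of the structure described above (relying only on the documented dict layout, nothing library-specific), one can tell single-file and multi-file datasets apart by the type of each value:
# Inspecting the loaded addresses
for name, value in addresses.items():
    if isinstance(value, dict):
        print(name, '->', len(value), 'files')  # multi-file dataset
    else:
        print(name, '->', value)  # single-file dataset: value is the download link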
ZipDatasetDownloader
Downloading datasets stored in single Zip files
Description
Using this class, one is able to download and extract datasets that are available as single zip files (e.g., SEU).
Instantiation: ZipDatasetDownloader(url)
- url is the download link.
Downloading: ZipDatasetDownloader.download(download_file)
- download_file is the path the zip file is downloaded to.
- This value is stored in ZipDatasetDownloader.download_file once ZipDatasetDownloader.download() is called.
Extraction: ZipDatasetDownloader.extract(extraction_path)
- extraction_path is the directory where the zip file is extracted.
- This value is stored in ZipDatasetDownloader.extraction_path once ZipDatasetDownloader.extract() is called.
Merging downloading and extraction steps: ZipDatasetDownloader.download_extract(download_file, extraction_path)
- download_file is the path the zip file is downloaded to. This value is stored in ZipDatasetDownloader.download_file once ZipDatasetDownloader.download_extract() is called.
- extraction_path is the directory where the zip file is extracted. This value is stored in ZipDatasetDownloader.extraction_path once ZipDatasetDownloader.download_extract() is called.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, ZipDatasetDownloader
# Loading the addresses
addresses = read_addresses()
# Downloading the dataset
downloader = ZipDatasetDownloader(addresses['SEU'])
downloader.download_extract('SEU.zip', 'SEU/')
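After the call above, the paths used are kept on the instance, as documented earlier; a minimal check, relying only on the attributes described above:
# The paths are stored on the downloader once download_extract() is called
print(downloader.download_file)    # 'SEU.zip'
print(downloader.extraction_path)  # 'SEU/'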
CwruDownloader
A custom downloader for the CWRU dataset.
Description:
Instantiation: CwruDownloader(files)
- files is a python dictionary whose keys are file names and corresponding values are the download links.
Downloading: CwruDownloader.download(download_path, chunk_size = 512, delay = 1)
- download_path is the directory where the desired files are downloaded to. This value is stored in CwruDownloader.download_path once CwruDownloader.download() is called.
- chunk_size: to avoid corrupted downloads, responses are read in chunks; this argument controls the size of the chunks. The default value is 512 bytes.
- delay: to avoid overloading the server, a small delay is placed between requests for consecutive files. This argument controls the delay interval, in seconds. The default value is 1 second.
Re-downloading failed downloads: CwruDownloader.redownload(chunk_size = 512, delay = 1)
- chunk_size: to avoid corrupted downloads, responses are read in chunks; this argument controls the size of the chunks. The default value is 512 bytes.
- delay: to avoid overloading the server, a small delay is placed between requests for consecutive files. This argument controls the delay interval, in seconds. The default value is 1 second.
Undownloaded files: CwruDownloader.undownloaded
If a file is not downloaded properly - either during CwruDownloader.download() or CwruDownloader.redownload() - it is added to CwruDownloader.undownloaded as a key-value pair, where the key is the file name and the value is the corresponding error. This can later be used to complete the downloading process.
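As a minimal sketch of how this mapping might be inspected before retrying (assuming only the key/value structure described above):
# Reporting files that failed to download, together with their errors
for name, error in downloader.undownloaded.items():
    print('failed:', name, '->', error)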
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, CwruDownloader
# Loading the addresses
addresses = read_addresses()
# Downloading the dataset
downloader = CwruDownloader(addresses['CWRU'])
downloader.download('CWRU/')
while downloader.undownloaded:
downloader.redownload()
PuDownloader
A custom downloader for the PU dataset.
Description:
Instantiation: PuDownloader(files)
- files is a python dictionary whose keys are file names and corresponding values are the download links.
Downloading: PuDownloader.download(download_path, timeout = 10)
- download_path is the directory where the rar files are downloaded to. This value is stored in PuDownloader.download_path once PuDownloader.download() is called.
- timeout is the number of seconds the downloader waits while downloading a file.
Extracting: PuDownloader.extract(extraction_path)
- extraction_path is the directory the rar files are extracted to. This value is stored in PuDownloader.extraction_path once PuDownloader.extract() is called.
Merging downloading and extraction steps: PuDownloader.download_extract(download_path, extraction_path)
- download_path is the directory the rar files are downloaded to. This value is stored in PuDownloader.download_path once PuDownloader.download_extract() is called.
- extraction_path is the directory the rar files are extracted to. This value is stored in PuDownloader.extraction_path once PuDownloader.download_extract() is called.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, PuDownloader
# Loading the addresses
addresses = read_addresses()
# Downloading the dataset
downloader = PuDownloader(addresses['PU'])
downloader.download_extract('PU_rarfiles/', 'PU/')
MaFaulDaDownloader
A custom downloader for the MaFaulDa dataset.
Description:
Instantiation: MaFaulDaDownloader(files)
- files is a python dictionary whose keys are file names and corresponding values are the download links.
Downloading: MaFaulDaDownloader.download(download_path)
- download_path is the directory where the desired files are downloaded to. This value is stored in MaFaulDaDownloader.download_path once MaFaulDaDownloader.download() is called.
Extracting: MaFaulDaDownloader.extract(extraction_path)
- extraction_path is the directory the zip files are extracted to. This value is stored in MaFaulDaDownloader.extraction_path once MaFaulDaDownloader.extract() is called.
Merging downloading and extraction steps: MaFaulDaDownloader.download_extract(download_path, extraction_path)
- download_path is the directory the zip files are downloaded to. This value is stored in MaFaulDaDownloader.download_path once MaFaulDaDownloader.download_extract() is called.
- extraction_path is the directory the zip files are extracted to. This value is stored in MaFaulDaDownloader.extraction_path once MaFaulDaDownloader.download_extract() is called.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, MaFaulDaDownloader
# Loading the addresses
addresses = read_addresses()
# Downloading the dataset
downloader = MaFaulDaDownloader(addresses['MaFaulDa'])
downloader.download_extract('MaFaulDa_zipfiles/', 'MaFaulDa/')
Digestors Submodule
KAIST
A digestor to mine the dataset by Korea Advanced Institute of Science and Technology (KAIST)
Original title: Vibration, Acoustic, Temperature, and Motor Current Dataset of Rotating Machine Under Varying Operating Conditions for Fault Diagnosis
External resources:
- https://data.mendeley.com/datasets/ztmf3m7h5x/6
- https://www.sciencedirect.com/science/article/pii/S2352340923001671
Description:
Instantiation: KAIST(base_directory, files, channels)
- base_directory is the home directory of the extracted files.
- files is the list of files of interest; to include all files, use os.listdir(base_directory).
- channels is the list of channels to include; 0, 1, 2 and 3 correspond to the x direction of housing A, the y direction of housing A, the x direction of housing B and the y direction of housing B, respectively. The default value is [0, 1, 2, 3].
Mining: KAIST.mine(mining_params)
- mining_params is a python dictionary whose keys are win_len and hop_len, with their corresponding values.
Accessing data: KAIST.data
- Mined data is presented as a python dictionary whose keys correspond to the channels and whose values are lists of pd.DataFrame objects.
Usage example:
# Importing
import os
from damavand.damavand.datasets.downloaders import read_addresses, ZipDatasetDownloader
from damavand.damavand.datasets.digestors import KAIST
# Downloading the dataset
addresses = read_addresses()
downloader = ZipDatasetDownloader(addresses['KAIST'])
downloader.download_extract('KAIST.zip', 'KAIST/')
# Mining the dataset (using only two channels out of four)
kaist = KAIST('KAIST/', os.listdir('KAIST/'), list(range(2)))
mining_params = {
'win_len': 20000,
'hop_len': 20000,
}
kaist.mine(mining_params)
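Following the structure documented under Accessing data, the mined windows can then be read off KAIST.data; a minimal sketch, assuming each list entry is one mined window:
# Accessing the mined windows of channel 0
windows = kaist.data[0]
print(len(windows))      # number of mined windows
print(windows[0].shape)  # first window; win_len = 20000 rows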
MFPT
A digestor to mine the dataset by Society for Machinery Failure Prevention Technology (MFPT)
Original title: Condition Based Maintenance Fault Database for Testing of Diagnostic and Prognostics Algorithms - Bearing Fault Dataset
External resources:
- https://www.mfpt.org/fault-data-sets/
- https://mfpt.org/wp-content/uploads/2018/03/MFPT-Bearing-Envelope-Analysis.pdf
Description:
Instantiation: MFPT(base_directory, files)
- base_directory is the home directory of the extracted file.
- files is the list of files to include; valid elements are:
  - baseline_1.mat
  - InnerRaceFault_vload_1.mat
  - InnerRaceFault_vload_2.mat
  - InnerRaceFault_vload_4.mat
  - InnerRaceFault_vload_7.mat
  - OuterRaceFault_1.mat
  - OuterRaceFault_vload_1.mat
  - OuterRaceFault_vload_2.mat
  - OuterRaceFault_vload_4.mat
  - OuterRaceFault_vload_7.mat
Mining: MFPT.mine(mining_params)
- mining_params is a nested python dictionary whose keys are 97656 and 48828 (the sampling frequencies the dataset is collected at) and whose values are python dictionaries. Each of these secondary dictionaries has two keys, win_len and hop_len, with their corresponding values.
Accessing data: MFPT.data
- Mined data is presented as a python dictionary whose keys are 97656 and 48828. The corresponding values are lists of pd.DataFrame objects belonging to the data files recorded at the corresponding sampling frequency.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, ZipDatasetDownloader
from damavand.damavand.datasets.digestors import MFPT
# Downloading the dataset
addresses = read_addresses()
downloader = ZipDatasetDownloader(addresses['MFPT'])
downloader.download_extract('MFPT.zip', 'MFPT/')
# Mining the dataset
mfpt = MFPT('MFPT/', [
'baseline_1.mat',
'InnerRaceFault_vload_1.mat',
'InnerRaceFault_vload_2.mat',
'InnerRaceFault_vload_4.mat',
'InnerRaceFault_vload_7.mat',
'OuterRaceFault_1.mat',
'OuterRaceFault_vload_1.mat',
'OuterRaceFault_vload_2.mat',
'OuterRaceFault_vload_4.mat',
'OuterRaceFault_vload_7.mat',
])
mining_params = {
97656: {'win_len': 16671, 'hop_len': 2000},
48828: {'win_len': 8337, 'hop_len': 1000},
}
mfpt.mine(mining_params)
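As a sketch of the nested layout described above (and assuming the usual sliding-window convention, in which a signal of n samples yields (n - win_len) // hop_len + 1 windows per file):
# Accessing the windows mined at 97656 Hz
windows_97k = mfpt.data[97656]
print(len(windows_97k))      # windows pooled across the matching files
print(windows_97k[0].shape)  # first window; win_len = 16671 rows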
CWRU
A digestor to mine the bearing dataset by Case Western Reserve University (CWRU)
External resource:
- https://engineering.case.edu/bearingdatacenter
Description:
Instantiation: CWRU(base_directory, channels)
- base_directory is the home directory of the downloaded files.
- channels is a list of strings specifying the desired measurement channels; available choices are 'FE', 'DE' and 'BA', corresponding to fan-end acceleration, drive-end acceleration and base acceleration, respectively. The default value is ['FE', 'DE'].
Mining: CWRU.mine(mining_params, synchronous_only)
- mining_params is a nested python dictionary whose keys are '12K' and '48K' (the sampling frequencies used to collect the dataset) and whose values are again python dictionaries with keys win_len and hop_len.
- synchronous_only is a boolean flag; once it is set True, only files which contain all the desired channels are mined, and files missing one of the channels are skipped. The default value is False.
Accessing data: CWRU.data
- Mined data is organized as a nested python dictionary whose keys are the elements of channels; the corresponding values are again python dictionaries whose keys are the sampling frequencies, '12K' and '48K'.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, CwruDownloader
from damavand.damavand.datasets.digestors import CWRU
# Downloading the dataset
addresses = read_addresses()
downloader = CwruDownloader(addresses['CWRU'])
downloader.download('CWRU/')
while downloader.undownloaded:
downloader.redownload()
# Mining the dataset
mining_params = {
'12K': {'win_len': 12000, 'hop_len': 3000},
'48K': {'win_len': 48000, 'hop_len': 16000},
}
cwru = CWRU('CWRU/')
cwru.mine(mining_params, synchronous_only = True)
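Given the two-level layout documented above, and assuming the innermost values are lists of pd.DataFrame windows as in the other digestors, access looks like:
# Drive-end windows mined from the 12K recordings
de_12k = cwru.data['DE']['12K']
print(len(de_12k))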
SEU
A digestor to mine the gearbox dataset from Southeast University (SEU)
Original title: Gearbox dataset from Highly Accurate Machine Fault Diagnosis Using Deep Transfer Learning
External resources:
- https://ieeexplore.ieee.org/abstract/document/8432110
- https://github.com/cathysiyu/Mechanical-datasets/tree/master/gearbox
Description
Instantiation: SEU(base_directory, channels)
- base_directory is the home directory of the downloaded files.
- channels is a list of integers (from 0 to 7), corresponding to the 8 accelerometers. The default value is [0, 1, 2, 3, 4, 5, 6, 7].
Mining: SEU.mine(mining_params)
- mining_params is a python dictionary whose keys are win_len and hop_len.
Accessing data: SEU.data
- Mined data is organized as a python dictionary whose keys are the elements of channels; the corresponding values are lists of pd.DataFrame objects.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, ZipDatasetDownloader
from damavand.damavand.datasets.digestors import SEU
# Downloading the dataset
addresses = read_addresses()
downloader = ZipDatasetDownloader(addresses['SEU'])
downloader.download_extract('SEU.zip', 'SEU/')
# Mining the dataset
mining_params = {'win_len': 10000, 'hop_len': 10000}
seu = SEU('SEU/')
seu.mine(mining_params)
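Since hop_len equals win_len here, consecutive windows do not overlap; a minimal sketch of reading the result, assuming the dictionary layout documented above:
# Windows mined from accelerometer channel 0
ch0 = seu.data[0]
print(len(ch0))      # number of non-overlapping windows
print(ch0[0].shape)  # each window holds win_len = 10000 rows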
MaFaulDa
A digestor to mine the Machinery Fault Database (MaFaulDa)
Original title: Machinery Fault Database
External resources:
- https://www02.smt.ufrj.br/~offshore/mfs/page_01.html
Description:
Instantiation: MaFaulDa(base_directory, folders, channels)
- base_directory is the home directory of the extracted folders.
- folders is a list of the folders to include during the mining process.
- channels is a list of integers (from 0 to 7), corresponding to the tachometer, the 3 accelerometers on the underhang bearing (axial, radial and tangential), the 3 accelerometers on the overhang bearing (axial, radial and tangential) and a microphone.
Mining: MaFaulDa.mine(mining_params)
- mining_params is a python dictionary whose keys are win_len and hop_len.
Accessing data: MaFaulDa.data
- Mined data is organized as a python dictionary whose keys are the elements of channels; the corresponding values are lists of pd.DataFrame objects.
Usage example:
# Importing
import os
from damavand.damavand.datasets.downloaders import read_addresses, MaFaulDaDownloader
from damavand.damavand.datasets.digestors import MaFaulDa
# Downloading the dataset
addresses = read_addresses()
downloader = MaFaulDaDownloader({key: addresses['MaFaulDa'][key] for key in ['normal.zip', 'imbalance.zip']})
downloader.download_extract('mafaulda_zip_files/', 'mafaulda/')
# Mining the dataset (using only the third channel)
mafaulda = MaFaulDa('mafaulda/', os.listdir('mafaulda/'), channels = [2])
mining_params = {
'win_len': 50000,
'hop_len': 50000
}
mafaulda.mine(mining_params)
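With only channel 2 selected above, the mined dictionary holds a single key; a minimal sketch, assuming the layout documented under Accessing data:
# Windows of the selected accelerometer channel
print(list(mafaulda.data.keys()))  # [2]
print(len(mafaulda.data[2]))       # number of mined windows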
MEUT
A digestor to mine the dataset by Mehran University of Engineering & Technology (MEUT)
Original title: Triaxial bearing vibration dataset of induction motor under varying load conditions
External resources:
- https://data.mendeley.com/datasets/fm6xzxnf36/2
- https://www.sciencedirect.com/science/article/pii/S2352340922005170
Description:
Instantiation: MUET(base_directory, folders, channels)
- base_directory is the home directory of the extracted folders.
- folders is the list of folders to include during the mining process.
- channels is the list of integers corresponding to the triaxial acceleration signals; 1, 2 and 3 correspond to the X, Y and Z axes. The default value is [1, 2, 3].
Mining: MUET.mine(mining_params)
- mining_params is a python dictionary whose keys are win_len and hop_len.
Accessing data: MUET.data
- Mined data is organized as a python dictionary whose keys are the elements of channels; the corresponding values are lists of pd.DataFrame objects.
Usage example:
# Importing
import os
from damavand.damavand.datasets.downloaders import read_addresses, ZipDatasetDownloader
from damavand.damavand.datasets.digestors import MUET
# Downloading the dataset
addresses = read_addresses()
downloader = ZipDatasetDownloader(addresses['MEUT'])
downloader.download_extract('MEUT.zip', 'MEUT/')
# Mining the dataset
dataset = MUET('MEUT/fm6xzxnf36-2/', os.listdir('MEUT/fm6xzxnf36-2/'), [3])
mining_params = {
'win_len': 10000,
'hop_len': 5000
}
dataset.mine(mining_params)
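If a single continuous table is preferred over the list of windows, the DataFrames can be stacked; a minimal sketch, assuming pandas is installed and the layout documented above:
import pandas as pd
# Stacking all Z-axis windows into one DataFrame
z_windows = dataset.data[3]
z_table = pd.concat(z_windows, ignore_index=True)
print(z_table.shape)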
UoO
A digestor to mine the bearing dataset from University of Ottawa (UoO)
Original title: Bearing vibration data collected under time-varying rotational speed conditions
External resources:
- https://www.sciencedirect.com/science/article/pii/S2352340918314124
- https://data.mendeley.com/datasets/v43hmbwxpm/1
Description:
Instantiation: UoO(base_directory, channels, reps)
- base_directory is the home directory of the extracted folders.
- channels is a list of strings specifying the desired channels; available choices are 'channel_1' and 'channel_2', corresponding to the acceleration and the rotational speed, respectively. The default value is ['channel_1', 'channel_2'].
- reps is a list of integers specifying the measurement repetitions (1, 2 and 3) to include. The default value is [1, 2, 3].
Mining: UoO.mine(mining_params)
- mining_params is a python dictionary whose keys are win_len and hop_len.
Accessing data: UoO.data
- Mined data is organized as a python dictionary whose keys are the elements of channels; the corresponding values are lists of pd.DataFrame objects.
Usage example:
# Importing
from damavand.damavand.datasets.downloaders import read_addresses, ZipDatasetDownloader
from damavand.damavand.datasets.digestors import UoO
# Downloading the dataset
addresses = read_addresses()
downloader = ZipDatasetDownloader(addresses['UoO'])
downloader.download_extract('UoO.zip', 'UoO/')
# Mining the dataset
dataset = UoO('UoO/', ['channel_1', 'channel_2'], [1])
mining_params = {'win_len': 10000, 'hop_len': 10000}
dataset.mine(mining_params)
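Because both channels are mined with the same windowing parameters, acceleration and speed windows can be read side by side; a sketch assuming windows at the same list index cover the same time span:
# Pairing the first acceleration window with the first speed window
acc = dataset.data['channel_1'][0]
speed = dataset.data['channel_2'][0]
print(acc.shape, speed.shape)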
PU
A digestor to mine the bearing dataset by Paderborn University (PU)
Original title: Condition Monitoring of Bearing Damage in Electromechanical Drive Systems by Using Motor Current Signals of Electric Motors: A Benchmark Data Set for Data-Driven Classification
External resources:
- https://www.papers.phmsociety.org/index.php/phme/article/view/1577
- https://mb.uni-paderborn.de/kat/forschung/kat-datacenter/bearing-datacenter/data-sets-and-download
Description:
Instantiation: PU(base_directory, folders, channels, reps)
- base_directory is the home directory of the extracted folders.
- folders is the list of the extracted folders to include.
- channels is the list of strings specifying the desired channels; available choices are 'CP1', 'CP2' and 'Vib', corresponding to the two current phases (1 and 2) and the acceleration. The default value is ['CP1', 'CP2', 'Vib'].
- reps is a list of integers specifying the repetitions (from 1 to 20) to include. The default value is [1, 2, 3, ..., 19, 20].
Mining: PU.mine(mining_params)
- mining_params is a python dictionary whose keys are win_len and hop_len.
Accessing data: PU.data
- Mined data is organized as a python dictionary whose keys are the elements of channels; the corresponding values are lists of pd.DataFrame objects.
Usage example:
# Importing
import os
from damavand.damavand.datasets.downloaders import read_addresses, PuDownloader
from damavand.damavand.datasets.digestors import PU
# Downloading the dataset
addresses = read_addresses()
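# Excluding the real-damage files from the download list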
addresses['PU'].pop('real_damage')
downloader = PuDownloader(addresses['PU'])
downloader.download_extract(download_path = 'PU_compressed/', extraction_path = 'PU/', timeout = 10)
# Mining the dataset
mining_params = {'win_len': 16000, 'hop_len': 16000}
pu = PU('PU/', os.listdir('PU/'), ['Vib'], reps = [1])
pu.mine(mining_params)
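Only the 'Vib' channel was requested above, so the mined dictionary exposes that single key; a minimal sketch, assuming the layout documented under Accessing data:
# Vibration windows mined from the first repetition of each selected folder
vib_windows = pu.data['Vib']
print(len(vib_windows))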