astartes package
Subpackages
- astartes.samplers package
- Subpackages
- astartes.samplers.extrapolation package
- Submodules
- astartes.samplers.extrapolation.dbscan module
- astartes.samplers.extrapolation.kmeans module
- astartes.samplers.extrapolation.molecular_weight module
- astartes.samplers.extrapolation.optisim module
- astartes.samplers.extrapolation.scaffold module
- astartes.samplers.extrapolation.sphere_exclusion module
- astartes.samplers.extrapolation.target_property module
- astartes.samplers.extrapolation.time_based module
- Module contents
- astartes.samplers.interpolation package
- Submodules
- astartes.samplers.abstract_sampler module
- Module contents
- Subpackages
- astartes.utils package
- Submodules
- astartes.utils.aimsim_featurizer module
Molecule
Molecule.mol_graph
Molecule.mol_text
Molecule.mol_property_val
Molecule.descriptor
Molecule.get_descriptor_val()
Molecule.match_fingerprint_from()
Molecule.get_similarity_to()
Molecule.get_name()
Molecule.get_mol_property_val()
Molecule.draw()
Molecule.is_same()
Molecule.__init__()
Molecule.set_descriptor()
featurize_molecules()
- astartes.utils.array_type_helpers module
- astartes.utils.exceptions module
- astartes.utils.fast_kennard_stone module
- astartes.utils.sampler_factory module
- astartes.utils.user_utils module
- astartes.utils.warnings module
- Module contents
Submodules
astartes.main module
- astartes.main.train_test_split(X: array, y: array | None = None, labels: array | None = None, train_size: float = 0.75, test_size: float | None = None, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, return_indices: bool = False)
Deterministic train/test splitting of arbitrary arrays.
- Parameters:
X (np.array) – NumPy array of feature vectors.
y (np.array, optional) – Targets corresponding to X, must be of same size. Defaults to None.
labels (np.array, optional) – Labels corresponding to X, must be of same size. Defaults to None.
train_size (float, optional) – Fraction of dataset to use in training set. Defaults to 0.75.
test_size (float, optional) – Fraction of dataset to use in test set. Defaults to None.
sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.
random_state (int, optional) – The random seed used throughout astartes. Defaults to None.
hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.
return_indices (bool, optional) – True to return indices of train/test after the values. Defaults to False.
- Returns:
X, y, and labels train/test data, or indices.
- Return type:
np.array
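How train_size, random_state, and return_indices interact can be illustrated with a minimal standard-library sketch. This is not the astartes implementation: sketch_train_test_split is a hypothetical helper that mimics only a seeded random split, and it assumes that the training count is the rounded product of train_size and the number of samples and that indices are appended after the values.

```python
import random


def sketch_train_test_split(X, y=None, train_size=0.75, random_state=None,
                            return_indices=False):
    """Hypothetical stand-in for a seeded random split; not astartes code."""
    indices = list(range(len(X)))
    # A generator seeded with random_state makes the shuffle reproducible.
    random.Random(random_state).shuffle(indices)
    # Assumption: the training count is the rounded fraction of the samples.
    n_train = int(round(train_size * len(X)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    out = [[X[i] for i in train_idx], [X[i] for i in test_idx]]
    if y is not None:
        out += [[y[i] for i in train_idx], [y[i] for i in test_idx]]
    if return_indices:
        # Assumption: indices follow the values in the returned tuple.
        out += [train_idx, test_idx]
    return tuple(out)


X = [[float(i), float(i) ** 2] for i in range(8)]
y = [i % 2 for i in range(8)]

first = sketch_train_test_split(X, y, random_state=0, return_indices=True)
second = sketch_train_test_split(X, y, random_state=0, return_indices=True)
X_train, X_test, y_train, y_test, train_idx, test_idx = first

print(len(X_train), len(X_test))  # 6 training rows and 2 test rows
print(first == second)            # same seed, same split
```

Calling the helper twice with the same random_state gives identical results, which is the sense in which a seeded random split is deterministic.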
- astartes.main.train_val_test_split(X: array | DataFrame, y: array | Series | None = None, labels: array | Series | None = None, train_size: float = 0.8, val_size: float = 0.1, test_size: float = 0.1, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, return_indices: bool = False)
Deterministic train/val/test splitting of arbitrary arrays.
- Parameters:
X (np.array, pd.DataFrame) – NumPy array or pandas DataFrame of feature vectors.
y (np.array, pd.Series, optional) – Targets corresponding to X, must be of same size. Defaults to None.
labels (np.array, pd.Series, optional) – Labels corresponding to X, must be of same size. Defaults to None.
train_size (float, optional) – Fraction of dataset to use in training set. Defaults to 0.8.
val_size (float, optional) – Fraction of dataset to use in validation set. Defaults to 0.1.
test_size (float, optional) – Fraction of dataset to use in test set. Defaults to 0.1.
sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.
random_state (int, optional) – The random seed used throughout astartes. Defaults to None.
hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.
return_indices (bool, optional) – True to return indices of train/val/test after the values. Defaults to False.
- Returns:
X, y, and labels train/val/test data, or indices.
- Return type:
np.array(s)
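The three size parameters can be read as fractions that partition a shuffled list of row indices. The sketch below shows one plausible way to turn them into index sets. Both helper names are hypothetical, and rescaling fractions that do not sum to 1 is an assumption made for illustration, not behaviour stated in this reference.

```python
import random


def normalize_sizes(train_size, val_size, test_size):
    """Rescale the three fractions so they sum to 1 (assumed behaviour)."""
    total = train_size + val_size + test_size
    if total <= 0:
        raise ValueError("split sizes must sum to a positive number")
    return train_size / total, val_size / total, test_size / total


def sketch_train_val_test_indices(n_samples, train_size=0.8, val_size=0.1,
                                  test_size=0.1, random_state=None):
    """Hypothetical helper: partition shuffled indices into three sets."""
    train_frac, val_frac, _ = normalize_sizes(train_size, val_size, test_size)
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    # Whatever remains after train and val goes to the test set.
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


train_idx, val_idx, test_idx = sketch_train_val_test_indices(20, random_state=42)
print(len(train_idx), len(val_idx), len(test_idx))  # 16, 2 and 2 indices
```

Assigning the remainder to the test set guarantees that every sample lands in exactly one of the three sets, even when rounding makes the counts uneven.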
astartes.molecules module
- astartes.molecules.train_test_split_molecules(molecules: array, y: array | None = None, labels: array | None = None, train_size: float = 0.75, test_size: float | None = None, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, fingerprint: str = 'morgan_fingerprint', fprints_hopts: dict = {}, return_indices: bool = False)
Deterministic train/test splitting of molecules (SMILES strings or RDKit objects).
- Parameters:
molecules (np.array) – List of SMILES strings or RDKit molecule objects representing molecules or reactions.
y (np.array, optional) – Targets corresponding to SMILES, must be of same size. Defaults to None.
labels (np.array, optional) – Labels corresponding to SMILES, must be of same size. Defaults to None.
train_size (float, optional) – Fraction of dataset to use in training set (train_size and test_size should sum to approximately 1). Defaults to 0.75.
test_size (float, optional) – Fraction of dataset to use in test set. Defaults to None.
sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.
random_state (int, optional) – The random seed used throughout astartes. Defaults to None.
hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.
fingerprint (str, optional) – Molecular fingerprint to be used from AIMSim. Defaults to “morgan_fingerprint”.
fprints_hopts (dict, optional) – Hyperparameters for AIMSim featurization. Defaults to {}.
return_indices (bool, optional) – True to return indices of train/test after the values. Defaults to False.
- Returns:
X, y, and labels train/test data, or indices.
- Return type:
np.array
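Conceptually, splitting molecules involves two steps: each molecule is turned into a feature vector, and the sampler then operates on those vectors. The sketch below illustrates that pipeline under loud assumptions. toy_featurize is a hypothetical character-count stand-in for a real fingerprint such as the Morgan fingerprint, and a simple ordering by feature sum stands in for the sampler. Neither is taken from astartes or AIMSim.

```python
def toy_featurize(smiles, alphabet="CcNO"):
    """Hypothetical stand-in for a fingerprint: counts of a few SMILES characters."""
    return [smiles.count(ch) for ch in alphabet]


def sketch_split_molecules(molecules, train_size=0.75):
    """Hypothetical two-step split: featurize, then sample on the features."""
    # Step 1: convert every molecule into a numeric feature vector.
    features = [toy_featurize(s) for s in molecules]
    # Step 2: a deliberately simple deterministic "sampler" that orders
    # molecules by the sum of their features (ties keep input order).
    order = sorted(range(len(molecules)), key=lambda i: (sum(features[i]), i))
    n_train = int(round(train_size * len(molecules)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return ([molecules[i] for i in train_idx],
            [molecules[i] for i in test_idx],
            train_idx,
            test_idx)


smiles = ["C", "CC", "CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCC", "CCOC"]
train_mols, test_mols, train_idx, test_idx = sketch_split_molecules(smiles)
print(train_mols)
print(test_mols)
```

Because the ordering depends only on the input, repeated calls return the same split. The molecules with the largest feature sums end up in the test set, so the model is evaluated on examples outside the range seen during training.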
- astartes.molecules.train_val_test_split_molecules(molecules: array, y: array | None = None, labels: array | None = None, train_size: float = 0.8, val_size: float = 0.1, test_size: float = 0.1, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, fingerprint: str = 'morgan_fingerprint', fprints_hopts: dict = {}, return_indices: bool = False)
Deterministic train/val/test splitting of molecules (SMILES strings or RDKit objects).
- Parameters:
molecules (np.array) – List of SMILES strings or RDKit molecule objects representing molecules or reactions.
y (np.array, optional) – Targets corresponding to SMILES, must be of same size. Defaults to None.
labels (np.array, optional) – Labels corresponding to SMILES, must be of same size. Defaults to None.
train_size (float, optional) – Fraction of dataset to use in training set. Defaults to 0.8.
val_size (float, optional) – Fraction of dataset to use in validation set. Defaults to 0.1.
test_size (float, optional) – Fraction of dataset to use in test set. Defaults to 0.1.
sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.
random_state (int, optional) – The random seed used throughout astartes. Defaults to None.
hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.
fingerprint (str, optional) – Molecular fingerprint to be used from AIMSim. Defaults to “morgan_fingerprint”.
fprints_hopts (dict, optional) – Hyperparameters for AIMSim featurization. Defaults to {}.
return_indices (bool, optional) – True to return indices of train/val/test after the values. Defaults to False.
- Returns:
X, y, and labels train/val/test data, or indices.
- Return type:
np.array