astartes package

Subpackages

Submodules

astartes.main module

astartes.main.train_test_split(X: array, y: array | None = None, labels: array | None = None, train_size: float = 0.75, test_size: float | None = None, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, return_indices: bool = False)

Deterministic train/test splitting of arbitrary arrays.

Parameters:
  • X (np.array) – Numpy array of feature vectors.

  • y (np.array, optional) – Targets corresponding to X, must be of same size. Defaults to None.

  • labels (np.array, optional) – Labels corresponding to X, must be of same size. Defaults to None.

  • train_size (float, optional) – Fraction of dataset to use in training set. Defaults to 0.75.

  • test_size (float, optional) – Fraction of dataset to use in test set. Defaults to None.

  • sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.

  • random_state (int, optional) – The random seed used throughout astartes. Defaults to None.

  • hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.

  • return_indices (bool, optional) – True to return indices of train/test after the values. Defaults to False.

Returns:

X, y, and labels train/test data, or indices.

Return type:

np.array
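A minimal usage sketch of the call documented above, assuming astartes and NumPy are installed. The helper name split_arrays_demo, the synthetic data, and the seed value are illustrative choices and not part of the library. The imports sit inside the helper so that defining it does not itself require astartes.

```python
def split_arrays_demo(n_samples=100, seed=42):
    """Split synthetic features and targets 75/25 with the 'random' sampler.

    Hypothetical helper for illustration only; not part of astartes.
    """
    import numpy as np
    from astartes.main import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((n_samples, 4))  # synthetic feature vectors
    y = rng.random(n_samples)       # synthetic targets, same length as X

    # With y supplied, labels omitted, and return_indices left at False,
    # the call is unpacked here into four pieces: X and y, train then test.
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        train_size=0.75,
        sampler="random",
        random_state=seed,
    )
    return X_train, X_test, y_train, y_test
```

Calling the helper twice with the same seed should produce the same split, which is the sense in which the splitting is described as deterministic.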

astartes.main.train_val_test_split(X: array | DataFrame, y: array | Series | None = None, labels: array | Series | None = None, train_size: float = 0.8, val_size: float = 0.1, test_size: float = 0.1, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, return_indices: bool = False)

Deterministic train/val/test splitting of arbitrary arrays.

Parameters:
  • X (np.array, pd.DataFrame) – Numpy array or pandas DataFrame of feature vectors.

  • y (np.array, pd.Series, optional) – Targets corresponding to X, must be of same size. Defaults to None.

  • labels (np.array, pd.Series, optional) – Labels corresponding to X, must be of same size. Defaults to None.

  • train_size (float, optional) – Fraction of dataset to use in training set. Defaults to 0.8.

  • val_size (float, optional) – Fraction of dataset to use in validation set. Defaults to 0.1.

  • test_size (float, optional) – Fraction of dataset to use in test set. Defaults to 0.1.

  • sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.

  • random_state (int, optional) – The random seed used throughout astartes. Defaults to None.

  • hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.

  • return_indices (bool, optional) – True to return indices of train/val/test after the values. Defaults to False.

Returns:

X, y, and labels train/val/test data, or indices.

Return type:

np.array(s)
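Because the signature above accepts pandas inputs, a three-way split of a DataFrame can be sketched as follows. This assumes astartes, NumPy, and pandas are installed. The helper name split_dataframe_demo, the column names, and the data are invented for illustration, and the train, val, test unpacking order is an assumption based on the Returns description.

```python
def split_dataframe_demo(n_samples=100, seed=42):
    """Split a synthetic DataFrame 80/10/10 with the 'random' sampler.

    Hypothetical helper for illustration only; not part of astartes.
    """
    import numpy as np
    import pandas as pd
    from astartes.main import train_val_test_split

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.random((n_samples, 3)), columns=["f0", "f1", "f2"])
    y = pd.Series(rng.random(n_samples), name="target")

    # Assumed order: the X pieces (train, val, test), then the y pieces.
    X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(
        X,
        y,
        train_size=0.8,
        val_size=0.1,
        test_size=0.1,
        sampler="random",
        random_state=seed,
    )
    return X_train, X_val, X_test, y_train, y_val, y_test
```

The three fractions are passed explicitly here to match the documented defaults; omitting them should give the same 80/10/10 proportions.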

astartes.molecules module

astartes.molecules.train_test_split_molecules(molecules: array, y: array | None = None, labels: array | None = None, train_size: float = 0.75, test_size: float | None = None, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, fingerprint: str = 'morgan_fingerprint', fprints_hopts: dict = {}, return_indices: bool = False)

Deterministic train/test splitting of molecules (SMILES strings or RDKit objects).

Parameters:
  • molecules (np.array) – List of SMILES strings or RDKit molecule objects representing molecules or reactions.

  • y (np.array, optional) – Targets corresponding to SMILES, must be of same size. Defaults to None.

  • labels (np.array, optional) – Labels corresponding to SMILES, must be of same size. Defaults to None.

  • train_size (float, optional) – Fraction of dataset to use in training set (train_size and test_size should sum to approximately 1). Defaults to 0.75.

  • test_size (float, optional) – Fraction of dataset to use in test set. Defaults to None.

  • sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.

  • random_state (int, optional) – The random seed used throughout astartes. Defaults to None.

  • hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.

  • fingerprint (str, optional) – Molecular fingerprint to be used from AIMSim. Defaults to “morgan_fingerprint”.

  • fprints_hopts (dict, optional) – Hyperparameters for AIMSim featurization. Defaults to {}.

  • return_indices (bool, optional) – True to return indices of train/test after the values. Defaults to False.

Returns:

X, y, and labels train/test data, or indices.

Return type:

np.array
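A minimal sketch of splitting SMILES strings with the function documented above. It assumes astartes is installed together with its molecule dependencies (AIMSim and RDKit). The helper name split_smiles_demo, the eight SMILES strings, and the placeholder targets are illustrative and not taken from the library.

```python
def split_smiles_demo(seed=42):
    """Split a handful of SMILES strings 75/25 using Morgan fingerprints.

    Hypothetical helper for illustration only; not part of astartes.
    """
    import numpy as np
    from astartes.molecules import train_test_split_molecules

    smiles = np.array(
        ["CCO", "CCN", "CCC", "c1ccccc1", "CC(=O)O", "CCOC", "CCCl", "C1CCCCC1"]
    )
    y = np.arange(len(smiles), dtype=float)  # placeholder targets

    # The first pair is the "X" portion named in the Returns text above,
    # followed by the matching target pieces.
    X_train, X_test, y_train, y_test = train_test_split_molecules(
        molecules=smiles,
        y=y,
        train_size=0.75,
        sampler="random",
        random_state=seed,
        fingerprint="morgan_fingerprint",
    )
    return X_train, X_test, y_train, y_test
```

The fingerprint argument is spelled out only to show where it goes; it matches the documented default and could be left off.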

astartes.molecules.train_val_test_split_molecules(molecules: array, y: array | None = None, labels: array | None = None, train_size: float = 0.8, val_size: float = 0.1, test_size: float = 0.1, sampler: str = 'random', random_state: int | None = None, hopts: dict = {}, fingerprint: str = 'morgan_fingerprint', fprints_hopts: dict = {}, return_indices: bool = False)

Deterministic train/val/test splitting of molecules (SMILES strings or RDKit objects).

Parameters:
  • molecules (np.array) – List of SMILES strings or RDKit molecule objects representing molecules or reactions.

  • y (np.array, optional) – Targets corresponding to SMILES, must be of same size. Defaults to None.

  • labels (np.array, optional) – Labels corresponding to SMILES, must be of same size. Defaults to None.

  • train_size (float, optional) – Fraction of dataset to use in training set. Defaults to 0.8.

  • val_size (float, optional) – Fraction of dataset to use in validation set. Defaults to 0.1.

  • test_size (float, optional) – Fraction of dataset to use in test set. Defaults to 0.1.

  • sampler (str, optional) – Sampler to use, see IMPLEMENTED_INTER/EXTRAPOLATION_SAMPLERS. Defaults to “random”.

  • random_state (int, optional) – The random seed used throughout astartes. Defaults to None.

  • hopts (dict, optional) – Hyperparameters for the sampler used above. Defaults to {}.

  • fingerprint (str, optional) – Molecular fingerprint to be used from AIMSim. Defaults to “morgan_fingerprint”.

  • fprints_hopts (dict, optional) – Hyperparameters for AIMSim featurization. Defaults to {}.

  • return_indices (bool, optional) – True to return indices of train/val/test after the values. Defaults to False.

Returns:

X, y, and labels train/val/test data, or indices.

Return type:

np.array

Module contents