Hit Dexter: How likely is my compound a frequent hitter?

About

Hit Dexter employs machine learning models to predict the likelihood of small molecules triggering positive signals in biochemical assays. Compounds prone to causing (false-)positive signals are often referred to as frequent hitters. These compounds include aggregators, pan-assay interference compounds (PAINS) and compounds with reactive groups.

Hit Dexter consists of two extra trees classifiers trained on a dataset of 250k compounds that have been measured for activity on at least 50 different protein groups. Morgan2 (i.e. extended-connectivity) fingerprints served as molecular descriptors. Hit Dexter includes a model to discriminate highly promiscuous from non-promiscuous compounds, and a further model to discriminate promiscuous (i.e. moderately and highly promiscuous) compounds from non-promiscuous compounds. Highly, moderately, and non-promiscuous compounds are defined as compounds active on >9.6%, >4.2% and <1.4% of all measured target groups, respectively.

The results of Hit Dexter can serve as a valuable source of information to direct further experimental tests, but they should not be used as a hard filter to discard compounds. For further detail see "Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters".
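The class definitions quoted above can be written down as a small function. This is only an illustrative sketch, not the code used by Hit Dexter: the function name and constants are hypothetical, and it assumes that "moderately promiscuous" means above 4.2% but not above 9.6%, and that compounds falling between 1.4% and 4.2% receive no label.

```python
from typing import Optional

# Thresholds quoted in the text, as fractions of all measured target groups.
HIGH_THRESHOLD = 0.096
MODERATE_THRESHOLD = 0.042
NON_PROMISCUOUS_THRESHOLD = 0.014


def promiscuity_class(n_active: int, n_measured: int) -> Optional[str]:
    """Label a compound from its hit rate (hypothetical helper).

    Returns None for compounds between the non-promiscuous and the
    moderately promiscuous thresholds (assumed to be left unlabeled).
    """
    if n_measured <= 0:
        raise ValueError("n_measured must be positive")
    if not 0 <= n_active <= n_measured:
        raise ValueError("n_active must lie between 0 and n_measured")

    fraction = n_active / n_measured
    if fraction > HIGH_THRESHOLD:
        return "highly promiscuous"
    if fraction > MODERATE_THRESHOLD:
        return "moderately promiscuous"
    if fraction < NON_PROMISCUOUS_THRESHOLD:
        return "non-promiscuous"
    return None


# A compound active on 5 of 50 measured target groups (10%).
print(promiscuity_class(5, 50))
```

Under these assumptions, a compound active on 1 of 50 target groups (2%) falls into the gap between the classes and is returned as None.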

Usage

Enter SMILES or upload a file with a list of SMILES. Click submit to start the calculation. You will then be forwarded to the result page. The result page contains a table with the input SMILES, the name of each molecule (if no name was provided, the molecules are numbered consecutively), and the probabilities, calculated by the two models, that each compound is highly promiscuous or at least moderately promiscuous. Errors and warnings are listed in the ".log" file, which can be downloaded. The table of results can be downloaded in .csv format.

Example input file

Example 1.1:
CCOC(=O)N1CCN(CC1)C2=C(C(=O)C2=O)N3CCN(CC3)C4=CC=C(C=C4)OC
C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O)O
C1=CC(=C(C=C1O)O)C(=O)C=CC2=CC(=C(C(=C2)O)O)O

Example 2.1:
CCOC(=O)N1CCN(CC1)C2=C(C(=O)C2=O)N3CCN(CC3)C4=CC=C(C=C4)OC PhantomPAINSexample
C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O)O exampleAggegator
C1=CC(=C(C=C1O)O)C(=O)C=CC2=CC(=C(C(=C2)O)O)O exampleReactive

Generalized example 1.2:
SMILES
SMILES
SMILES
...

Generalized example 2.2:
SMILES Some_Name_Of_The_Molecule
SMILES Some_Name_Of_The_Molecule
SMILES Some_Name_Of_The_Molecule
...
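To check a file against this format before uploading it, the layout can be read with a few lines of Python. This is a minimal sketch, not the parser used by the web service: the function name is hypothetical, it assumes that SMILES and name are separated by whitespace, and it assumes that unnamed molecules are numbered by their position in the file, starting at 1.

```python
from typing import List, Tuple


def parse_smiles_input(text: str) -> List[Tuple[str, str]]:
    """Split input text into (SMILES, name) pairs (hypothetical helper)."""
    records: List[Tuple[str, str]] = []
    for line in text.splitlines():
        # First token is the SMILES; anything after it is taken as the name.
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        smiles = parts[0]
        if len(parts) > 1:
            name = parts[1].strip()
        else:
            # Assumption: unnamed molecules are numbered consecutively from 1.
            name = str(len(records) + 1)
        records.append((smiles, name))
    return records


example = """C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O)O
C1=CC(=C(C=C1O)O)C(=O)C=CC2=CC(=C(C(=C2)O)O)O exampleReactive
"""
for smiles, name in parse_smiles_input(example):
    print(name, smiles)
```

The sketch does not validate the SMILES strings themselves; invalid structures would still be reported only by the service, in the log file.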

Example output heat map

Example 1.1: column headers of the result table, with sub-columns indented under their group.
SMILES
Molecule name
Comment
Hit Dexter: Probability and prediction confidence of a compound being moderately or highly promiscuous
    Moderate or high promiscuity
    Distance to closest training instance
    High promiscuity
    Distance to closest training instance
Similarity of a compound to known aggregators and dark chemical matter (DCM)
    Distance to closest aggregator
    Distance to closest DCM
Number of undesired functional groups present in a compound
    PAINS SMARTS (480 patterns)
    BMS (180 patterns)
    Dundee (105 patterns)
    Glaxo (55 patterns)
    Pfizer (57 patterns)
    MLSMR (116 patterns)
    SureChEMBL (166 patterns)
Error/Warning (see log file for details)

Example output tabular view

Example 1.1: column headers of the result table, with sub-columns indented under their group.
SMILES
Molecule name
Comment
Hit Dexter: Probability and prediction confidence of a compound being moderately or highly promiscuous
    Moderate or high promiscuity
    Distance to closest training instance
    High promiscuity
    Distance to closest training instance
Similarity of a compound to known aggregators and dark chemical matter (DCM)
    Distance to closest aggregator
    Distance to closest DCM
Number of undesired functional groups present in a compound
    PAINS SMARTS (480 patterns)
    BMS (180 patterns)
    Dundee (105 patterns)
    Glaxo (55 patterns)
    Pfizer (57 patterns)
    MLSMR (116 patterns)
    SureChEMBL (166 patterns)
Error/Warning (see log file for details)

Interpretation of example output

The molecule "CCOC(=O)N1CCN(CC1)C2=C(C(=O)C2=O)N3CCN(CC3)C4=CC=C(C=C4)OC", named "1", contains two known PAINS substructures but shows no activity in any screening assay. Both models predict with a probability of 100% that this molecule is non-promiscuous.
The molecule "C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O)O", named "2", was submitted to the two machine learning models. The model discriminating between non-promiscuous and highly promiscuous molecules predicts with a probability of 100% that this molecule is highly promiscuous. The second model, which discriminates between non-promiscuous and at least moderately promiscuous molecules, gives the same result. Treat such compounds with caution. In this example, the molecule is a known aggregator that gives false-positive readouts in screening assays.
The third molecule, "C1=CC(=C(C=C1O)O)C(=O)C=CC2=CC(=C(C(=C2)O)O)O", named "3", is predicted to be highly promiscuous with a probability of 0.860 and at least moderately promiscuous with a probability of 0.960. This compound also deserves a second look. It has reactive groups and might interact covalently with assay components or the protein.
In general, it is difficult to state at which probability cutoff a compound should be investigated further, and you should adapt the cutoff to your individual case.
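One way to work with a user-chosen cutoff is to list the compounds whose predicted probability reaches it and to review those first. The sketch below is an illustration only: the function name and the data layout are hypothetical, the probabilities are the ones quoted in the three examples above, and the cutoff of 0.9 is an arbitrary choice, not a recommendation.

```python
from typing import Dict, List, Tuple

# name -> (P(highly promiscuous), P(moderately or highly promiscuous)),
# taken from the three interpreted examples above.
predictions: Dict[str, Tuple[float, float]] = {
    "1": (0.0, 0.0),
    "2": (1.0, 1.0),
    "3": (0.860, 0.960),
}


def flag_for_follow_up(
    preds: Dict[str, Tuple[float, float]], cutoff: float, model: int
) -> List[str]:
    """Return names whose probability from the chosen model is >= cutoff.

    model=0 selects the high-promiscuity model, model=1 the
    moderate-or-high model (hypothetical convention for this sketch).
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must lie between 0 and 1")
    return [name for name, probs in preds.items() if probs[model] >= cutoff]


print(flag_for_follow_up(predictions, cutoff=0.9, model=0))
print(flag_for_follow_up(predictions, cutoff=0.9, model=1))
```

With this cutoff, molecule "3" is flagged only by the moderate-or-high model, which shows how the choice of cutoff and model changes the set of compounds selected for a second look. Flagged compounds are candidates for further checks, not for automatic removal.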