Concepts
Keep four concepts separate:
| Concept | Meaning | Where to look |
|---|---|---|
| Support | The complete set of SMILES strings Grimace can emit for one molecule and writer options. | MolToSmilesEnum(...) |
| Root | The atom where a rooted traversal starts, or all roots when rootedAtAtom=-1. | rootedAtAtom |
| Decoder token | One string emitted by one decoder transition; not necessarily one character. | MolToSmilesDecoder(...), MolToSmilesDeterminizedDecoder(...) |
| Writer parity | String-level agreement with RDKit’s supported writer behavior, not just chemical equivalence. | Correctness contracts, RDKit serializer coverage |
Support
For one molecule and one set of writer flags, the support is the set of SMILES strings that Grimace can emit.
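```python
from rdkit import Chem

import grimace

mol = Chem.MolFromSmiles("CCO")  # example molecule; any RDKit Mol works here

# Collect every string Grimace can emit for this molecule and these writer flags.
support = tuple(
    grimace.MolToSmilesEnum(
        mol,
        rootedAtAtom=-1,  # all valid roots
        isomericSmiles=False,
        canonical=False,
        doRandom=True,
    )
)
```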
This is different from RDKit's MolToSmiles(..., doRandom=True). RDKit returns
one sampled string per call, whereas MolToSmilesEnum(...) enumerates every string
in Grimace's supported writer language for the same molecule and writer options.
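The contrast between sampling and enumeration can be sketched without RDKit or Grimace. The snippet below assumes a support has already been computed (the four strings are a hypothetical stand-in, not verified Grimace output) and models a doRandom-style writer as a uniform draw from that support. Uniformity is an assumption made only for illustration; real random writers need not sample uniformly.

```python
import random

# Hypothetical support for one molecule and one set of writer flags.
# Illustrative only; not produced by Grimace.
SUPPORT = frozenset({"CCO", "OCC", "C(C)O", "C(O)C"})


def sample_one(support, rng):
    """Stand-in for a doRandom-style writer: return one string per call.

    Assumes a uniform draw, which real writers need not follow.
    """
    return rng.choice(sorted(support))


def seen_after(support, draws, seed=0):
    """Collect the distinct strings produced by `draws` sampled calls."""
    rng = random.Random(seed)
    return {sample_one(support, rng) for _ in range(draws)}


if __name__ == "__main__":
    for draws in (1, 2, 4, 8):
        seen = seen_after(SUPPORT, draws)
        print(f"{draws} draws -> {len(seen)}/{len(SUPPORT)} strings seen")
```

No number of sampled calls guarantees that every string has appeared; enumerating the support directly removes that uncertainty.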
Root
A root is the atom where a rooted SMILES traversal starts.
- rootedAtAtom=-1 means all valid roots.
- rootedAtAtom=0 means start at atom 0.
- For disconnected molecules, the original RDKit fragment order is preserved.
Most users should start with rootedAtAtom=-1. Use one explicit root when you
need to compare or constrain a particular traversal start.
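To make roots concrete, here is a toy enumerator for a small acyclic molecular graph. It is a simplified sketch, not Grimace's implementation: it ignores rings, stereochemistry, aromaticity and charges, writes single bonds implicitly, and allows every ordering of unvisited neighbors, writing all but the last neighbor as parenthesized branches. The hypothetical `rooted_at` parameter mimics the rootedAtAtom convention, with `-1` taking the union over every root.

```python
from itertools import permutations, product


def _walk(atoms, bonds, atom, parent):
    """Yield every string for the subtree reached from `atom` (parent excluded)."""
    children = [(nbr, sym) for nbr, sym in bonds[atom] if nbr != parent]
    if not children:
        yield atoms[atom]
        return
    for order in permutations(children):
        # Each child contributes its bond symbol plus one of its subtree strings.
        options = [
            [sym + text for text in _walk(atoms, bonds, nbr, atom)]
            for nbr, sym in order
        ]
        for pieces in product(*options):
            # All children except the last are written as branches.
            branches = "".join(f"({p})" for p in pieces[:-1])
            yield atoms[atom] + branches + pieces[-1]


def toy_support(atoms, bonds, rooted_at=-1):
    """Return the strings from one root, or from all roots when rooted_at == -1."""
    roots = range(len(atoms)) if rooted_at == -1 else [rooted_at]
    return {text for root in roots for text in _walk(atoms, bonds, root, None)}


# Ethanol as a hypothetical input: atom index -> element, bonds as (neighbor, symbol).
ATOMS = ["C", "C", "O"]
BONDS = {0: [(1, "")], 1: [(0, ""), (2, "")], 2: [(1, "")]}

if __name__ == "__main__":
    print(sorted(toy_support(ATOMS, BONDS, rooted_at=0)))  # ['CCO']
    print(sorted(toy_support(ATOMS, BONDS, rooted_at=1)))  # ['C(C)O', 'C(O)C']
    print(sorted(toy_support(ATOMS, BONDS)))  # ['C(C)O', 'C(O)C', 'CCO', 'OCC']
```

In this toy, per-root sets can overlap for symmetric molecules (both atoms of `CC` yield `CC`), so per-root counts need not add up to the all-roots count.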
Decoder token
A Grimace token is one string emitted by one decoder transition. It is not an integer token id, and it is not necessarily one character.
Examples include:
- atom or bond fragments such as `C`, `c`, `Cl`, `[C@H]`, `=`, `/`, and `\`
- syntax fragments such as `(`, `)`, `1`, and `%10`
MolToSmilesDecoder(...) is branch-preserving: two choices can have the same
token text if they represent different underlying writer branches.
MolToSmilesDeterminizedDecoder(...) merges same-text choices.
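The difference between the two decoders can be modeled with explicit token paths. In this sketch each path stands for one underlying writer branch; the paths and the function names `branch_choices` and `merged_choices` are hypothetical and are not Grimace's API. The branch-preserving view returns one choice per path, so the same token text can appear several times, while the merged view keeps each distinct text once.

```python
# Hypothetical token paths, one per underlying writer branch.
PATHS = {
    "root0": ["C", "C", "O"],
    "root1-a": ["C", "(", "C", ")", "O"],
    "root1-b": ["C", "(", "O", ")", "C"],
    "root2": ["O", "C", "C"],
}


def branch_choices(paths, prefix):
    """One (branch, token) pair per path that extends the list `prefix` by one token."""
    n = len(prefix)
    return [
        (branch, tokens[n])
        for branch, tokens in paths.items()
        if tokens[:n] == prefix and len(tokens) > n
    ]


def merged_choices(paths, prefix):
    """Distinct next-token texts, with same-text choices merged."""
    return sorted({token for _, token in branch_choices(paths, prefix)})


if __name__ == "__main__":
    print(branch_choices(PATHS, []))  # four choices, three with text "C"
    print(merged_choices(PATHS, []))  # ['C', 'O']
```

A multi-character token such as `Cl` or `[C@H]` would simply be one list element in a path.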
Writer parity
Writer parity is a string-level claim, not just a chemical-equivalence claim. It asks whether a string belongs to the relevant writer language.
A SMILES string can be chemically valid and parse to the same molecule while still being outside that writer language.
Use Correctness contracts for the detailed boundary between chemical semantics and writer parity. Use Limitations for the current supported scope.
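A string-level check can be sketched as a prefix test against an enumerated support. The function below is a hypothetical stand-in for the kind of question Deviation diagnostics answer, not its API, and the support is hypothetical. `C-C-O` spells out its single bonds, so it describes the same molecule as `CCO`, yet it leaves a writer language that, in this support, never writes `-`.

```python
# Hypothetical support, assumed to be already enumerated for one molecule and flags.
SUPPORT = frozenset({"CCO", "OCC", "C(C)O", "C(O)C"})


def first_deviation(candidate, support):
    """Return None if `candidate` is in `support`.

    Otherwise return the index of the first character that no supported
    string allows after the preceding characters, or len(candidate) when
    the candidate is a valid but unfinished prefix.
    """
    if candidate in support:
        return None
    for i in range(1, len(candidate) + 1):
        prefix = candidate[:i]
        if not any(s.startswith(prefix) for s in support):
            return i - 1
    return len(candidate)


if __name__ == "__main__":
    for text in ("CCO", "C-C-O", "CC"):
        print(text, first_deviation(text, SUPPORT))  # None, 1, 2
```

A chemistry-level check would instead parse both strings and compare molecules, and it would accept `C-C-O`; that gap is exactly the difference between chemical equivalence and writer parity.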
API choices
| Need | Use |
|---|---|
| Every supported finished string | MolToSmilesEnum(...) |
| Legal next tokens while building a string | MolToSmilesDecoder(...) or MolToSmilesDeterminizedDecoder(...) |
| The first unsupported token or character in a candidate | Deviation diagnostics |
| Dataset vocabulary coverage | Token inventories |
| Repeated calls or storage without RDKit on read | Prepared molecules |