The ens Command

The ens command is the generic command used to manipulate molecular ensembles. Ensembles are the most commonly used chemistry major object. Ensembles contain atoms, bonds, molecules and other minor objects.

The syntax of this command follows the standard schema of command/subcommand/majorhandle. Since molecular ensembles are major objects, they are not addressed via labels.

Similar to molfile and dataset objects, ensembles can be persistent or transient. Persistent ensembles are those created by the ens create command or similar functions. They possess a handle and exist until explicitly deleted. Transient ensembles only exist for the duration of a single command. They are deleted as soon as the command finishes, regardless of whether the command was successful or not.

Examples:

ens get $ehandle E_SMILES
ens merge [ens create CCC] [ens create CCC]
ens get lycorine E_CID

This is the list of officially supported subcommands:

ens add

ens add ehandle ?ehandle_list?...

This command performs the same operation as the ens merge command, but preserves the ensembles in the merge lists (argument four and onwards). The base ensemble (third argument) is modified.

Please refer to the ens merge command for more detailed documentation.
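The difference to ens merge can be sketched as follows. This is only an illustration under the behavior described above; the variable names are arbitrary.

```tcl
# Two independent ensembles created from SMILES strings.
set base [ens create CC]
set extra [ens create CCC]

# Accumulate the structure data of $extra in $base. $base is modified in place.
ens add $base $extra

# According to the description above, $extra is preserved by ens add,
# so this check is expected to succeed. With ens merge, $extra would be destroyed.
ens exists $extra
```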

ens align3d

ens align3d ehandle box/center/masscenter/pmi ?usehydrogens?

Perform a 3D alignment by modifying the standard atom coordinates property A_XYZ.

The possible alignment modes are box, center, masscenter and pmi.

By default all atoms are used to compute the alignment rotation and movement vectors, including hydrogens. If these should be omitted from computing the movement vectors (but not the subsequent atom movement), the optional parameter can be set to false.
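A minimal usage sketch, assuming that $ehandle refers to an ensemble for which A_XYZ coordinates are present or can be computed:

```tcl
# Select the masscenter alignment mode and exclude hydrogens from the
# computation of the movement vectors. The hydrogens are still moved
# together with the other atoms.
ens align3d $ehandle masscenter false
```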

ens append

ens append ehandle property value ?property value?...

Standard data manipulation command for appending property data. It is explained in more detail in the section about setting property data.

Example:

ens append $ehandle E_NAME "_linker"

ens assign

ens assign ehandle srcproperty dstproperty

Assign property data to another property on the same ensemble. This process is more efficient than going through a pair of ens get/ens set commands, because in most cases no string or Tcl script object representations of the property data need to be created.

Both source and destination properties may be addressed with subfields. A data conversion path must exist between the data types of the involved properties. If any data conversion fails, the command fails. For example, it is possible to assign a string property to a numeric property - but only if all property values can be successfully converted to that numeric type. The reverse example case always succeeds, out-of-memory errors and similar global events excluded.

The original property data remains valid. The command variant ens rename directly exchanges the property name without any data duplication or conversion, if that is possible. In any case, the original property data is no longer present after the execution of this command variant.

Examples:

ens assign $ehandle A_XY A_XY%
ens assign $ehandle E_NMRSPECTRUM(spectrometer) E_METHOD
ens rename $ehandle E_IDENT E_NAME

ens atoms

ens atoms ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the atoms the ensemble contains as minor objects. This is explained in more detail in the section about object cross-references.

Examples:

ens atoms $ehandle
ens atoms $ehandle hydrogen
ens atoms $ehandle !hydrogen count

The first example simply returns a list of the labels of the atoms the ensemble contains as minor objects. The second example returns the atom label(s) of all hydrogen atoms in the ensemble. If there are no such atoms, an empty list is returned. The final example counts the number of non-hydrogen atoms in the ensemble.

ens bonds

ens bonds ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the bonds the ensemble contains as minor objects. This is explained in more detail in the section about object cross-references.

Examples:

ens bonds $ehandle
ens bonds $ehandle doublebond
ens bonds $ehandle carbon count

The first example simply returns a list of the labels of the bonds the ensemble contains as minor objects. The second example returns the bond label(s) of all double bonds in the ensemble. If there are no such bonds, an empty list is returned. The final example counts the number of bonds which involve one or more carbon atoms in the ensemble.

ens cast

ens cast ehandle dataset/ens/reaction/table ?propertylist?

Transform the ensemble into a different object. Depending on the target object class, the result is as follows:

If the optional property list is specified, an attempt is made to compute the listed properties before the cast operation, so that they may become a part of the new object. No error is raised if a computation fails.

The command returns the handle of the new object, or the input object handle in case of mode ens.

ens clear

ens clear ehandle

This command resets an ensemble to a virgin state. All minor objects and all property data of the ensemble are deleted. However, the ensemble handle remains valid, representing an ensemble without any atoms, bonds or other minor objects.
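The effect can be checked with a short sketch like the following, which relies only on commands described in this section:

```tcl
set eh [ens create CCC]
ens clear $eh

# The handle is still valid after the reset...
ens exists $eh

# ...but the ensemble no longer contains atoms, so the label list is empty.
llength [ens atoms $eh]
```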

ens copy

ens copy src_ehandle dst_ehandle

Create a copy of the input ensemble in the framework of an existing ensemble. The old data of the destination ensemble is destroyed, but its handle is reused for the copy. The destination handle can be an empty string. In that case, the ensemble is duplicated and a new handle assigned.

This command is useful when references to an ensemble handle are potentially stored in unknown locations and the ensemble data needs to be updated.

The return value of the command is the handle of destination ensemble. It is allowed to copy an ensemble onto itself.

Example:

set eh1 [ens create CC]
set eh2 [ens create CCC]
ens copy $eh1 $eh2

After the example code sequence, both ensembles represent ethane, the first compound. However, these are independent ensembles. Any further modifications of the ensemble data on any of the ensembles will not be seen by the other.
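The variant with an empty destination handle, as described above, might look like this (the variable names are illustrative):

```tcl
set eh1 [ens create CC]

# An empty string as destination requests a plain duplicate.
# The command returns the newly assigned handle.
set eh3 [ens copy $eh1 {}]
```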

ens create

ens create ?codestring? ?mode? ?datasethandle?

This command creates a new molecular ensemble and returns its handle. If none of the optional arguments are specified, or the argument string is an empty string, an empty ensemble without any atoms or bonds is created. These may later be populated with commands like atom create.

The data string may either begin with an automatically recognized prefix, or an automatic format detection process is initiated. Recognized prefixes are:

The colon in the prefix may be omitted (except for the name: item), but this is not recommended, since it may lead to misinterpretation of the data if the prefix is also part of a valid structure encoding.

In addition, URLs as structure data argument are automatically detected and handled specially. If the URL is a data URI, it is unpacked and its payload processed in a second cycle. If it is an HTTP or FTP URL, the file is downloaded and its contents read as a structure file with automatic format detection. This is not identical to data URI processing: Data URIs are again interpreted as command arguments with all prefix and line notation interpretation, while file contents are only interpreted as a record in a structure data file.

If none of the above special cases are recognized, automatic interpretation is performed next. Currently, the encoding then may either be

In the absence of a prefix, the encoding is automatically detected. With the exception of PubChem CIDs, the long form of a database ID must be used, not its simple integer value (i.e. a simple 70 is interpreted as PubChem CID, while CHEMBL70 or chembl:70 are read as ChEMBL database IDs).

For the base64-encoded compressed records, the compression algorithm may be raw zlib, gzip or zip and its type is automatically detected.

In case one of the SMILES-class encoding schemes is used, the mode argument of the ens create command provides finer control of the decoding. By default, or when this argument is an empty string, the string is interpreted as standard SMILES, except when there are elements in the string which cannot occur in SMILES but in SMARTS. In SMILES mode, query expressions are only recognized to a very limited degree, and implicit hydrogens are automatically added. This decoding scheme may also be explicitly selected by specifying hadd as mode.

Mode nohadd is essentially the same, but implicit hydrogen addition does not happen. In any case, explicitly encoded hydrogen is decoded and preserved.

Mode smarts (or query) also skips hydrogen addition. In addition, the decoder now fully parses SMARTS, including Recursive SMARTS, but it becomes less lenient in the area of superatom encodings and similar gray areas, in order to avoid ambiguity. The recognized SMILES dialect may be switched via the control variable ::cactvs(smiles_version). The default is Daylight release 4.9 with Cactvs and EliLilly extensions.

Mode sln forces the interpretation of the input string as Sybyl Line Notation. If the SLN I/O module has already been loaded, interpretation as SLN is automatically attempted in any case, but only after SMILES decoding has failed. Since there are strings which are both valid SMILES and SLN, but mean something different, this automatism can lead to misinterpretation, so if you know you are dealing with SLN, it is a good idea to specify it. The sln mode attempts to auto-load the SLN I/O module if it is not yet loaded. In case it cannot be loaded, this mode raises an error.

The 3D decoder mode prefers resolution of identifiers as 3D model instead of 2D connectivity. This has an effect only with a few select combinations of identifiers and resolvers and should be considered experimental.

In nohadd decoder mode, the structure code is finally, if everything else fails, interpreted as a plain molecular formula. If the string is parsed successfully as a formula, a collection of atoms of the specified elements is created, without any bonds.

By default, or if the final optional parameter is an empty string, the new ensemble is not a member of any dataset. It may be directly made a dataset member if a dataset handle is specified.

Examples:

set eh [ens create]
set eh [ens create CCC]
set sshandle [ens create {[CH3][Cl,Br,I]} smarts]
set eh [ens create [decode -url C%23C] nohadd]

In case a structure is encoded as a string in a format which cannot be directly decoded by the ens create command (such as a plain string representation of an MDL molfile), the standard method is to load the appropriate file format decoder (if not built in, this is needed so that automatic format detection of the memory image record works), open the structure string as a memory-based structure file, and read from this file. This technique allows the input of multiple records from the in-memory file and thus is also useful in cases like a multi-record SMILES file encoded as a string.

Example:

filex load cdx
set fh [molfile open [decode -base64 $cdxstring] s]
set eh [molfile read $fh]
molfile close $fh

ens dataset

ens dataset ehandle ?filterlist?

Return the handle of the dataset the ensemble is part of. If the ensemble is not a member of a dataset, or does not pass all of the optional filters, an empty string is returned.

Example:

ens dataset $ehandle

ens defined

ens defined ehandle property

This command checks whether a property is defined for the ensemble. This is explained in more detail in the section about property validity checking. Note that this is not a check for the presence of property data! The ens valid command is used for this purpose.

ens dget

ens dget ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The difference between ens get and ens dget is that the latter does not attempt computation of property data, but rather initializes the property values to the default and returns that default if the data is not yet available. For data already present, ens get and ens dget are equivalent.
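A minimal sketch of this behavior, using the E_NAME property that appears elsewhere in this section:

```tcl
set eh [ens create CCC]

# ens dget does not trigger a computation. If no E_NAME data is present
# on the ensemble yet, the property default value is returned instead.
set name [ens dget $eh E_NAME]
```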

ens delete

ens delete all
ens delete ehandlelist ?ehandlelist?...

Delete ensembles and the minor objects which are part of the deleted ensembles. The special parameter all may be used to delete all ensembles currently registered in the application, including those which are part of reactions or other major objects. Alternatively, any number of lists of ensemble handles may be specified for specific deletions.

The command returns the number of deleted ensembles.

For historic reasons, the same command may also be invoked as ens destroy.

Example:

ens delete $ehandle
ens delete $ehandlelist1 $ehandlelist2

ens dup

ens dup ehandle ?datasethandle? ?position? ?filterset? ?ctonlyflag?

Duplicate an ensemble. The return value is the handle of the new ensemble.

The duplicate ensemble is placed into the same dataset as the source, if it is a member of a dataset. Specifying an explicitly empty dataset argument places the duplicate outside any dataset, regardless of the dataset membership of the source ensemble.

If the duplicate is moved to a dataset, it is appended to the dataset end by default. This happens also if the position parameter is explicitly specified as end or an empty string. Otherwise, the ensemble is inserted at the given position, starting with 0. If the requested position is larger than the current size of the dataset, the ensemble is appended.

The next optional parameter allows the selection of only a subset of atoms to be copied. All atoms which do not pass the filter set are discarded, as are all bonds which connect to discarded atoms. If no atoms pass the filters, the result is an empty ensemble. By default, no atom filtering takes place, and all atoms and bonds of the original ensemble are part of the duplicate.

The final optional parameter can be used to make the duplicate lightweight. If this boolean parameter is set, the duplicate is limited to the basic connectivity information with all atom and bond properties, but it has no copies of properties in other object classes, and no copies of rings, molecules, groups or other minor object classes.

The ens hdup command is a variant of this command. It automatically adds a hydrogen set to the duplicate.

Examples:

ens dup $ehandle
ens dup $ehandle [dataset create] end ringatom

The first sample line is a standard use. The second example moves the duplicate into a newly created dataset, and isolates the ring systems. All other atoms are stripped.

ens exists

ens exists ehandle ?filterlist?

Check whether an ensemble handle exists. The command returns 0 or 1. Optionally, the ensemble may be filtered by a standard filter list. If filters in the filter list operate on atoms, bonds, or other minor objects, it is sufficient if a single minor object of the ensemble passes the filter.

Example:

ens exists $ehandle chlorine

Check whether the ensemble with the handle in variable $ehandle exists and, if it exists, whether it contains one or more chlorine atoms.

ens expand

ens expand ehandle ?allowambigous? ?noimplicith?

This command expands all superatoms in the ensemble. The mechanisms for the expansion of superatoms are described in detail for the atom expand command. This command is functionally equivalent, working on all atoms in the ensemble instead of a single atom.

Example:

ens expand $ehandle

The command returns the total number of successfully expanded atoms.

ens expr

ens expr ehandle expression

Compute a standard SQL-style property expression for the ensemble. This is explained in detail in the chapter on property expressions.

ens fill

ens fill ehandle property value ?property value?...

Standard data manipulation command for setting data, ignoring possible mismatches between the lengths of the lists of objects associated with the property and the value list. It is explained in more detail in the section about setting property data.

Example:

ens fill $ehandle B_COLOR red

sets the color of the first bond in the ensemble to red.

ens filter

ens filter ehandle filterlist

Check whether the ensemble passes a filter list. The return value is 1 for success and 0 for failure.

Example:

ens filter [ens create CCCl] chlorine

checks whether the ensemble contains one or more chlorine atoms. If the filter operates on minor objects of the ensemble, it is sufficient to have a single ensemble minor object pass the filter condition.

ens forget

ens forget ehandle ?objclass?

Delete specific classes of minor objects and their data from the ensemble data structure. If no object class is specified, all minor object classes except atoms and bonds and the ensemble data are purged.

If the object class ens is specified, all property data attached to the ensemble object class (usually those properties starting with E_*) are deleted, but not the ensemble itself.
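Both call forms described above can be sketched as follows, assuming $ehandle holds a valid ensemble handle:

```tcl
# Default purge as described above, without naming an object class.
ens forget $ehandle

# Delete only the property data attached to the ensemble object class.
# The ensemble itself and its handle remain.
ens forget $ehandle ens
```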

ens fragment

ens fragment ehandle atomlist ?datasethandle? ?position?

Create a new ensemble from a set of atoms in another ensemble. All bonds existing between those atoms are also preserved. The atoms can be selected with any standard atom selection syntax, with one selector per list element. Duplicate atom specifications are ignored. Atom specifications which cannot be resolved generate an error.

By default, the new ensemble becomes a member of the same dataset (if any) as the source ensemble, but this can be changed with the optional fifth argument. If no explicit position is given, the ensemble is appended to the dataset. The new ensemble only inherits the selected atoms and bonds plus stable atom and bond properties, but not other minor objects or ensemble data.

The command returns the handle of the new ensemble object.

Example:

match ss $substructure $ehandle amap
set ehfrag [ens fragment $ehandle [unzip $amap 1]]

The above code sequence matches a substructure, and then extracts the matched structure part as a new ensemble.

ens get

ens get ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

Examples:

ens get $ehandle {M_WEIGHT A_ELEMENT}

yields a nested list with two elements. The first element is a list of the molecular weights of all molecules in the ensemble. The second element is a list of the element numbers of all atoms in the ensemble. If the information is not yet available, an attempt is made to compute it. If the computation fails, an error results.

ens get $ehandle B_ORDER ringbond

gives the bond orders of all bonds of the ensemble which are ring bonds.

The format of the optional parameter list argument is a series of keyword/value pairs, as produced by the Tcl command array get or the standard Tcl dictionary commands. If this parameter list is present as argument, and the requested property data is already valid for the ensemble, a check is made whether all the specified parameters are the same as the parameters the present property data was computed with. If this is the case, the values are directly returned as usual. Otherwise, the data is discarded and re-computed.

If computation of the property data is performed, either because the parameter set was not matched, or the requested data was not valid, the computation integrates the specified parameter set into the parameters of the computation function. Parameters from the list temporarily override the global settings of these parameters in the property definition. Parameters used by the property computation function but not listed in the local parameter list are neither used for data validity checking, nor their value changed during the computation request. After the computation finishes, the old global parameter settings of the property definition are restored.

The use of a parameter list argument is primarily useful only if a single property is requested with this command, but its use with a multiple-property request is not illegal - the parameter list is simply applied to all properties in sequence.

Example:

ens get $ehandle E_GIF {} [dict create width 200 height 200 bgcolor white]

Variants of the ens get command are ens new, ens dget, ens nget, ens show, ens sqldget, ens sqlget, ens sqlnew and ens sqlshow.

Further examples:

ens get $ehandle E_NAME
ens get $ehandle A_FLAGS(boxed)

In addition to property data, the ensemble object possesses a few attributes, which can be retrieved with the ens get command (but not its related sister subcommands like ens dget, ens sqlget, etc.). Some of them are also modifiable via ens set. These attributes are:

ens getparam

ens getparam ehandle property ?key? ?default?

Retrieve a named computation parameter from valid property data. If the key is not present in the parameter list, an empty string is returned. If the default argument is supplied, that value is returned in case the key is not found.

If the key parameter is omitted, a complete set of the parameters used for computation of the property value is returned in key/value format.

This command does not attempt to compute property data. If the specified property is not present, an error results.

Example:

ens getparam $ehandle E_GIF format

returns the actual format of the image, which could be gif, png, or various bitmap formats.
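The other call forms can be sketched in the same way. The example first requests E_GIF with explicit parameters, as in the ens get example above, so that property data is present. The key name nosuchkey is purely hypothetical and only serves to show the default argument.

```tcl
# Make sure E_GIF data exists, computed with explicit parameters.
ens get $ehandle E_GIF {} [dict create width 200 height 200]

# Without a key, the complete parameter set is returned in key/value format.
set params [ens getparam $ehandle E_GIF]

# Query a single parameter that was passed in above.
set width [ens getparam $ehandle E_GIF width]

# Hypothetical key that is not part of the parameter list:
# the supplied default value is returned instead of an empty string.
set value [ens getparam $ehandle E_GIF nosuchkey unknown]
```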

ens groups

ens groups ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the groups the ensemble contains. This is explained in more detail in the section about object cross-references.

Example:

ens groups $ehandle

ens hadd

ens hadd ehandle ?filterset? ?flags? ?changeset?

Add a standard set of hydrogens to the ensemble. If the filterset parameter is specified, only those atoms which pass the filter set are processed.

Additional operation flags may be activated by setting the flags parameter to a list of flag names, or a numerical value representing the bit-ored values of the selected flags. By default, the flag set is empty, corresponding to the use of an empty string or none as parameter value. These flags are currently supported:

Adding hydrogens with this command, except with the protonate flag set, is less destructive to the property data set of the ensemble than adding them with individual atom create/bond create commands, because many properties are designed to be indifferent to explicit hydrogen status changes, but are invalidated if the structure is changed in other ways.

If the effects of the hydrogen addition step to the validity of the property data set should not be handled according to this standard procedure, it is possible to explicitly generate additional property invalidation events by specifying an event list as the optional last parameter, for example a list of atom and bond to trigger both the atom change and bond change events.

The command returns the number of hydrogens which were added.

Example:

set ehandle [ens create {[C].[C]}]
ens hadd $ehandle

adds a total of eight hydrogens to the two carbon atoms, transforming them into methane.
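The filter set and the return value can be combined as in the following sketch. It assumes the nohadd decoder mode of ens create and the carbon filter, both of which are used elsewhere in this section.

```tcl
# Decode without implicit hydrogen addition.
set ehandle [ens create CCO nohadd]

# Process only atoms which pass the carbon filter.
# The command returns the number of hydrogens that were added.
set nadded [ens hadd $ehandle carbon]
```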

ens hdup

ens hdup ehandle ?datasethandle? ?position? ?filterset? ?ctonlyflag?

This command is a convenience variant of the ens dup command. It has the same parameters, but also adds a full standard hydrogen set (equivalent to executing an ens hadd $eh command) to the duplicate.

The command arguments are documented in the paragraph on ens dup.

ens hfragment

ens hfragment ehandle atomlist ?datasethandle? ?position?

This command has the same arguments as ens fragment. The only difference is that after the duplication all open valences in the fragment are plugged with hydrogen, as if an ens hadd command had been executed immediately after the fragment creation command.

The command returns the handle of the new ensemble object.
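A hedged sketch of a typical call. The atom labels are taken from an ens atoms call instead of being hard-coded, since the actual label values are an assumption.

```tcl
set eh [ens create CCCO]

# Select the first two carbon atoms by their labels.
set labels [lrange [ens atoms $eh carbon] 0 1]

# Extract them as a new ensemble. Open valences are filled with hydrogen.
set ehfrag [ens hfragment $eh $labels]
```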

ens hstrip

ens hstrip ehandle ?flags? ?changeset?

This command removes hydrogens from the ensemble. By default, all hydrogen atoms in the ensemble are removed.

The flags parameter can be used to make the operation more selective. It may be a list of the following flags:

If the flags parameter is an empty string, or none, it is ignored. The default flag value is wedgetransfer - but this default value is overridden if any flags are set!

If the changeset parameter is specified, the property change events listed in the parameter are triggered after the command.

Hydrogen stripping is not as disruptive to the ensemble data content as normal atom deletion, except when the deprotonate flag is set. The system assumes that this operation is done as part of some file output or visualization preparation. However, if any new data is computed after stripping, the computation functions see the stripped structure, and proceed to work on that reduced structure without knowledge that the structure may contain implicit hydrogens.

The command returns the number of stripped hydrogens.

Example:

ens hstrip $ehandle [list keeporiginal wedgetransfer]

ens image

ens image ehandle ?width? ?height? ?options?

This command generates a Tk image object displaying the ensemble as an icon. The command is only available in toolkit variants which are linked with the portable Tk GUI toolkit library and which are either statically linked with the GD image drawing library, or can load it dynamically.

The default image size is 64x64 pixels, but this may be overridden by the width and height parameters. If only width is set, it is also used for the height. The command returns a Tk image handle. These images may for example be placed on Tk canvases as canvas objects, or used on buttons and other GUI objects.

Because of the small size of the images, atoms are not displayed as symbols, but small color-coded squares. This is a command for the implementation of graphical structure-handling applications with icons. For serious structure visualization, use the E_GIF, E_EMF_IMAGE or E_EPS_IMAGE properties.

Additional options may be added by an arbitrary sequence of option/value pairs. Color names can be those registered in the X11 color database, or a numeric specification in the #rrggbb format. These options are currently supported:

Images are cached. If an image for the selected ensemble with the same display attributes exists, it is reused.

Example:

set img [ens image $ehandle 80 80 -border yellow -linecolor blue]
.canvaswin create image 50 50 -image $img

ens index

ens index ehandle

Get the position of the ensemble in the object list of its dataset. If the ensemble is not a member of a dataset, -1 is returned.
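Both cases can be sketched with commands from this section. The empty mode string passed to ens create selects the default decoding, and the third argument makes the new ensemble a dataset member.

```tcl
set dh [dataset create]
set eh [ens create CCC {} $dh]

# Position of the ensemble within the object list of dataset $dh.
ens index $eh

# An ensemble outside any dataset: -1 according to the description above.
ens index [ens create CC]
```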

ens list

ens list ?filterlist?

This command returns a list of the ensemble handles currently registered in the application. This list may optionally be filtered by a standard filter list. If the filter operates on ensemble minor objects such as atoms or bonds and not directly on the ensemble object, it is sufficient if a single minor object passes the filter.

Example:

ens list halogen

lists the handles of all ensembles in the application which contain one or more halogen atoms.

ens lock

ens lock ehandle propertylist/objclass/all ?compute?

Lock property data of the ensemble, meaning that it is no longer managed by the standard data consistency manager. The data consistency manager deletes specific property data if anything is done to the ensemble which would invalidate the information. Blocking the consistency manager can be useful when building ensembles from components in a script. Property data remains locked until it is explicitly unlocked.

The property data to lock can be selected by providing a list of the following identifiers:

The lock can be released by an ens unlock command.

Example:

set eh [ens create CCC]
ens lock $eh A_SYMBOL 1
ens purge $eh A_ELEMENT
atom set $eh 1 A_query(dsearch) 3
ens unlock $eh A_SYMBOL

In this example, an ensemble is created, and the atom symbol information is locked. Next, the element number property is deleted, and a query attribute is set. Finally, the lock is released. Had the element symbol information not been locked, the ensemble would have become unusable due to an overzealous data consistency manager. Setting query information in property A_query can have an influence on the atom symbol. So the default action of invalidating A_SYMBOL when manipulating A_query is correct. However, in case there is no element information A_ELEMENT, and no atom symbol information A_SYMBOL, the element information is completely lost, and the ensemble becomes unusable. So in this case, locking A_SYMBOL (or alternatively A_ELEMENT) is required to avoid unexpected side effects of structure editing.

ens loop

ens loop ehandle objvariable ?maxmol? ?offset? body

Loop over all molecules in the ensemble, by providing a temporary ensemble duplicate of each found molecule. The handle of the duplicate is stored in the object variable and visible to the loop code.

The loop code cannot delete the duplicate ensemble. It is automatically deleted at the end of each cycle. Changes made to the duplicate molecule are not seen in the base ensemble. It is however possible to explicitly assign data computed on the duplicate ensemble to the base ensemble.

The optional parameters allow more control over which molecules are processed. By default the maxmol parameter is -1, meaning an unlimited number of fragments are processed, and the offset is zero, meaning that processing begins with the first molecule in the molecule list of the base ensemble.

Within the loop code, the standard Tcl commands break and continue work as expected.

The command returns the number of molecule fragments processed.

Example:

set midx 0
ens loop $ehandle ehdup {
	mol set $ehandle [mol mol $ehandle #$midx] M_MYPROP [ens get $ehdup E_MYPROP]
	incr midx
}

The example loop assigns a custom property where the compute function is only defined for a single-fragment ensemble to the equivalent molecule property in a multi-fragment base ensemble.
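The optional maxmol and offset parameters can be sketched as follows. The multi-fragment SMILES string and the use of E_SMILES are illustrative choices.

```tcl
set eh [ens create {CC.CCC.CCCC}]

# Process at most two molecules, skipping the first one (offset 1).
# $ehdup holds the handle of the temporary duplicate in each cycle.
set count [ens loop $eh ehdup 2 1 {
    puts [ens get $ehdup E_SMILES]
}]

# $count now holds the number of molecule fragments processed.
```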

ens mask

ens mask ehandle labellist/all property onvalue ?resetvalue?

This command sets property values of a subset of minor objects of one class in the ensemble to a specific value, and optionally resets the values of the same property for all other minor objects of the ensemble which are not named.

The first argument after the ensemble handle is either a list of object identifiers, or the magic value all. Object identifiers are usually the standard numerical labels, but any construct which identifies an atom, a bond, etc. can be used. The next argument identifies the property. The object identifiers in the previous argument must correspond to the object class of the property, i.e. atom label pairs can only be used if the property is a bond property, but simple numerical labels work for all classes. If data for that property is not present on the ensemble, it is instantiated with the default value. The final one or two arguments must be decodable data values for that property.

If the all object subset identifier is used, all values of the property in the ensemble are set to the onvalue. Any reset value specification is ignored.

Otherwise, the explicit label list is processed. If a reset value is given, all values of the property in the ensemble are first reset to that value. If no reset value is specified, no reset is performed and the current values remain valid. Then, all minor objects in the list are looked up, and their property value set to the onvalue.

Example:

ens mask $eh [ens atoms $eh carbon] A_COLOR green black

This command sets the A_COLOR property value for all carbon atoms in the ensemble to green, and all other atoms to black. This is shorter and more efficient than explicitly coding a loop of atom set statements.

ens match

ens match ehandle ss_ehandle ?matchflags? ?ignoreflags? ?atommapvar? ?bondmapvar? ?molmapvar?

Check whether the ensemble matches a substructure. The substructure may be any structure ensemble, and even be in the same ensemble as the primary command ensemble.

The precise operation of the substructure match routine can be tuned by providing a standard set of match flags and feature ignore flags. The default match flag set has set bits for the bondorder , atomtree and bondtree comparison features, and an empty ignore set. If a flag set is specified as an empty string, the default set is used. In order to reset a flag set, an explicit none value must be used. The bit options of the match flag are explained in the documentation of the match ss command.

The command returns 1 for a successful match, 0 otherwise. If an optional atom, bond, or molecule map variable is specified, it is set to a nested list of matching substructure/structure atom, bond or molecule labels. If no match can be found, the variable is set to an empty list. In case only a bond or molecule map variable is needed, an empty string can be used to skip the unused map variable argument positions.

This is a very simple variant of substructure matching. The match ss command provides many more advanced match determination and match processing options.
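
The argument positions described above can be illustrated with a short sketch. The handles $eh and $ss are assumed to exist already, and the variable names atommap and bondmap are arbitrary. The exact nesting of the returned map is not spelled out here, so the sketch only prints it.

```tcl
# Empty strings select the default match flag and ignore flag sets.
if {[ens match $eh $ss {} {} atommap]} {
    puts "match found, atom map: $atommap"
} else {
    # On failure the map variable is set to an empty list.
    puts "no match"
}

# Only the bond map is wanted: an empty string skips the atom map position.
set matched [ens match $eh $ss {} {} {} bondmap]
```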

ens max

ens max ehandle propertylist ?filterset?

Get the maximum values of the properties named in the propertylist parameter. The return value of the command is a list of the maximum property values. The objects whose property values are used for the determination of the maximum values may optionally be filtered by a standard filter set. If no objects pass the filter, the result is an empty string.

Example:

ens max $ehandle A_ELEMENT

computes the maximum element number in the ensemble.

ens merge

ens merge ehandle ?ehandle_list?...

Merge a set of ensembles into one ensemble. All structure information is accumulated in the first (base) ensemble. Its handle remains unchanged. All other ensembles are destroyed. It is not possible to name an ensemble more than once in the argument lists, and ensembles cannot be merged with themselves.

The merged ensemble has a consistent property set for all minor objects. If the information content of the input ensembles varies, an attempt is made to compute the missing information for ensembles which do not have valid data for each individual property. If the computation fails, the property data is discarded for all merged objects. In addition, a merge property invalidation event is issued, which may lead to additional loss of property data. For surviving properties which have defined a merge update function, this function is then called and may perform additional data adjustments. For example, the A_XY 2D plot coordinate property merge function transforms the structure plot coordinates in the new ensemble to a uniform scale and arranges the coordinates for the atoms from the merged ensembles as a sequence of plots from left to right.

The return value of this command is the new first atom label for every merged ensemble, excluding the base ensemble. All minor object labels in the merged ensembles are re-assigned to avoid collisions. The new labels begin with the highest respective minor object label in use in the base ensemble plus one, and are thereafter assigned in sequence.

The ens add command performs the same operation as the ens merge command, but merges duplicates of the input ensembles, thus preserving them.

Example:

ens merge [ens create CC] [list [ens create CCC.CCCC] [ens create C]]

Merge three ensembles into one. The new ensemble contains the molecules ethane, propane, butane and methane in that order.
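
The difference between ens merge and ens add can be sketched as follows. The variable names are illustrative only.

```tcl
set base  [ens create CC]
set extra [ens create CCC]

# ens add merges a duplicate of $extra, so $extra remains a valid handle.
ens add $base $extra
puts [ens get $extra E_SMILES]

# ens merge consumes $extra. The return value holds the new first atom
# label for each merged ensemble, here a single one.
set firstlabels [ens merge $base $extra]
puts "first atom label(s) of the merged part: $firstlabels"

# $extra has been destroyed and must not be used after this point.
```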

ens metadata

ens metadata ehandle property field ?value?

Obtain property metadata information, or set it. The handling of property metadata is explained in more detail in its own introductory section. The related commands ens setparam and ens getparam can be used for convenient manipulation of specific keys in the computation parameter field. Metadata can only be read from or set on valid property data.

Examples:

array set gifparams [ens metadata $ehandle E_GIF parameters]
ens metadata $ehandle E_NAME comment "This is a CAS name in 1995 revision. The IUPAC name, or any previous or later CAS revision name, look completely different."

The first line retrieves the computation parameters of the property E_GIF as keyword/value pairs. These are read into the array variable gifparams , and may subsequently be accessed as $gifparams(format) , $gifparams(height) , etc. The second example shows how to attach a comment to a property value.

ens min

ens min ehandle propertylist ?filterset?

Get the minimum values of the properties named in the propertylist parameter. The return value of the command is a list of the minimum property values. The objects whose property values are used for the determination of the minimum values may optionally be filtered by a standard filter set. If no objects pass the filter, the result is an empty string.

Example:

ens min $ehandle A_FORMAL_CHARGE xatom

gets the lowest value of the formal charge of a hetero atom in the ensemble.
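
Since both ens min and ens max return a list with one element per requested property, the results can be unpacked with lassign. A minimal sketch, with arbitrary variable names:

```tcl
# One property: the result is a one-element list.
lassign [ens min $ehandle A_FORMAL_CHARGE] qmin
lassign [ens max $ehandle A_FORMAL_CHARGE] qmax
puts "formal charges range from $qmin to $qmax"

# Several properties can be queried in a single call.
lassign [ens max $ehandle [list A_ELEMENT A_FORMAL_CHARGE]] zmax qmax
```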

ens mols

ens mols ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the molecules the ensemble contains as minor objects. This is explained in more detail in the section about object cross-references.

Examples:

ens mols $ehandle
ens mols $ehandle heterocycle

The first example simply returns a list of the labels of the molecules the ensemble contains as minor objects. Note that it is possible that there is more than one molecule in the ensemble - this is the reason why the command name is mols , not mol . The second example returns the molecule label(s) of all the molecules in the ensemble which contain one or more heterocycles. If there are no such molecules, an empty list is returned.

ens move

ens move ehandle ?datasethandle|remotehandle? ?position?

Make the ensemble a member of a dataset, or remove it from a dataset. If the dataset handle parameter is omitted, or is an empty string, the ensemble is removed from its current dataset. If it was not a dataset member, this command does nothing. The dataset handle may be the name of a remote dataset for moving ensembles over a network connection.

If a dataset handle is specified, the ensemble is added to the dataset, and removed from any dataset it was a member of before the execution of the command. By default the ensemble is added to the end of the dataset object list, but the final optional parameter allows the specification of an object list index. The first position is index zero. If the parameter value end is used, or the index is bigger than the current number of dataset objects minus one, the ensemble is appended, as in the default case. It is legal to use this command for moving ensembles within the same dataset.

Another special position value is random. This value moves the ensemble to a random position in the dataset. Using this mode with remote datasets is currently not supported.

The dataset handle cannot be a transient dataset.

The return value of the command is the dataset membership of the ensemble prior to the move. It is either a dataset handle, or an empty string if it was not a member of a dataset.

Examples:

ens move $ehandle $dhandle 0
ens move $ehandle

In the first example, the ensemble is inserted as the first element in a dataset. The second line reverts this operation and removes the ensemble from the dataset.
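
Because the command returns the previous dataset membership, a temporary move can be undone without extra bookkeeping. The following is a sketch under the assumption that $dhandle is a persistent local dataset. Note that the ensemble is appended to its former dataset, so its original position is not restored.

```tcl
# Remember where the ensemble was before moving it.
set previous [ens move $ehandle $dhandle end]

# ... work with the ensemble as a member of $dhandle ...

# Put it back. If $previous is an empty string, the ensemble was not
# in a dataset before, and this call simply removes it from $dhandle.
ens move $ehandle $previous
```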

This command interacts with the insert control mechanism of size-constrained datasets. More information is provided in the description of the sizecontrol dataset parameter.

This command can be used with a remote dataset descriptor. In that case, the ensemble is packed into a serialized object representation, transmitted over the network and restored as member of the remote dataset at the specified position. The local ensemble is deleted if the transfer succeeds.

Example:

ens move $ehandle blockbuster@server2:9998 end

This command moves the ensemble to the dataset which was set up as listener on port 9998 and pass phrase blockbuster on host server2 . The local ensemble is deleted, and its copy is inserted at the end of the remote dataset.

ens mutex

ens mutex ehandle mode

Manipulate the object mutex. During the execution of a script command, the mutex of the major object(s) associated with the command are automatically locked and unlocked, so that the operation of the command is thread-safe. This applies to builds that support multi-threading, either by allowing multiple parallel script interpreters in separate threads or by supporting helper threads for the acceleration of command execution or background information processing. This command locks major objects for a period of time that exceeds a single command. A lock on the object can only be released from the same interpreter thread that set the lock. Any other threaded interpreters, or auxiliary threads, block until a mutex release command has been executed when accessing a locked command object. This command supports the following modes:

There is no trylock command variant because the command already needs to be able to acquire a transient object mutex lock for its execution.

ens need

ens need ehandle propertylist ?mode?

Standard command for the computation of property data, without immediate retrieval of results. This command is explained in more detail in the section about retrieving property data.

The return value is the ensemble handle.

Examples:

ens need $ehandle A_XY recalc
ens need $ehandle E_EINECS_ID threaded

ens new

ens new ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The difference between ens get and ens new is that the latter forces the re-computation of the property data, regardless whether it is present and valid, or not.

ens nget

ens nget ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The difference between ens get and ens nget is that the latter returns numeric data, even if symbolic names for the values are available.

ens nitrostyle

ens nitrostyle ehandle style

Change the internal encoding of nitro groups and similar functional groups in the ensemble. Possible values for the style parameter are:

The command returns the ensemble handle.

ens op2d

ens op2d ehandle mode ?atomfilter_bit/degrees?

Perform various operations on the standard 2D layout coordinates of the structure (property A_XY ). Properties tightly connected to A_XY are also updated (most notably, B_FLAGS to keep wedges in sync with stereochemistry defined in other properties).

In mode rotate, the optional argument is the rotation angle in degrees. If it is not specified, the default is 30 degrees.

For alignment and flipping operations, the atoms which are used to determine the orientation can be filtered by specifying one or more value bits of property A_FLAGS. Only atoms where one or more of these bits are set in A_FLAGS are used for computing the alignment (in modes xalign, yalign, xyalign - all atoms are moved) or are flipped (modes hflip, vflip - unselected atoms are not moved). If no bit filter values are specified, or none is used, all ensemble atoms and bonds are processed.

The following modes are supported:

Additionally, the mode argument may be an ensemble handle. In that case, it is interpreted as a substructure, matched onto the ensemble, and if a match is found, the 2D coordinates of the ensemble atoms are adjusted by scaling and rotation for maximum overlap between the 2D coordinates of the substructure and the matched part of the ensemble. This mode retains the relative positions of the matched atoms - this is not a full redraw operation around a match template.
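
A few invocations, using only mode names that appear in the text above, might look like this sketch. The handle $template is a hypothetical substructure ensemble.

```tcl
# Rotate the 2D layout by the default angle, then by an explicit angle.
ens op2d $eh rotate
ens op2d $eh rotate 90

# Mirror the whole layout horizontally (no A_FLAGS filter given).
ens op2d $eh hflip

# Orient the layout on the 2D coordinates of a matched substructure.
ens op2d $eh $template
```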

ens pack

ens pack ehandle ?maxsize? ?request_propertylist? ?suppress_propertylist?

Pack the ensemble object into a base64-encoded compressed serialized object string. This string does not contain any non-printable characters and is a full dump of the internal state of the object, omitting only property data that was declared to be so easily re-computed that a dump is not worthwhile. Outside object relationship information, such as the dataset the ensemble might be a member of, or associated tables, is not included.

The maximum size of the object string (default -1, meaning unlimited) can be configured by the optional maxsize parameter. The size is specified in bytes. If the pack string would be longer than the maximum size, an error results.

The other two optional parameters make it possible to request a specific property set to be part of the package, even if it normally would not be included, and to explicitly omit properties from the dump. No property computation is performed, and suppressed properties are not purged from the ensemble.

Ensembles can be restored from a packed object string by the ens unpack and ens create commands.

The ensemble object and its minor objects are unchanged after using this command.

Example:

set dbstring [ens pack [ens create CC=O]]
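
A round trip with a size limit could be sketched as follows. The limit of 65536 bytes is an arbitrary illustrative value, and passing the packed string directly to ens create is an assumption based on the statement that ens create can restore packed objects.

```tcl
set eh [ens create CC=O]

# An oversized pack string raises an error, so guard the call.
if {[catch {ens pack $eh 65536} packed]} {
    puts stderr "pack failed: $packed"
} else {
    # Restore a second, independent ensemble from the string.
    set copy [ens create $packed]
    puts [ens get $copy E_SMILES]
}
```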

ens pis

ens pis ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the π systems the ensemble contains. This is explained in more detail in the section about object cross-references.

Examples:

ens pis $ehandle

π systems are a rather exotic feature and not commonly used. These are essentially descriptions of bonding interactions which use p or d orbitals, such as in standard covalent multiple bonds. A simple double bond is described with one σ system and one π system in this representation.

ens prepare

ens prepare ehandle molfilehandle

Prepare the ensemble for output via the specified file handle, for example by pre-computing properties that are needed for output. This has only an effect if the I/O module for the format of the file handle provides an output object preparation function, which is currently only the case for the BDB database format. The output of prepared and unprepared ensembles sent to the same file handle is indistinguishable.

The purpose of this command is to allow the preparation of the ensembles for output in a separate thread. For unprepared ensembles, a significant part of the time to write the record may be spent in computing required data. During this time, the file handle is blocked. Prepared ensembles already contain all required data, and are thus faster to write to file. The total time required in single-thread scripts for a simple molfile write command vs. an ens prepare plus molfile write combo is not much different. However, these operations are largely independent, and in multi-threaded scripts the total time savings can be significant if the two commands are executed in different threads.

ens properties

ens properties ehandle ?pattern? ?noempty?

Get a list of valid properties of the ensemble and its minor objects. Property subsets may be selected by a non-empty filter pattern, which the property names must match in order to be listed. If the ensemble is a member of a reaction, reaction properties are included in the list. The same mechanism is used for dataset properties.

If the noempty flag is set, only properties where at least one data element controlled by the ensemble (i.e. a value for an atom of the ensemble, etc.) is not the property default value are output. By default, the filter pattern is an empty string, and the noempty flag is not set.

This command may also be invoked as ens props .

Example:

ens properties $ehandle X_*
ens props $ehandle

The first example returns a list of the currently valid reaction properties of the reaction the ensemble is a member of, or an empty list if it is not. The second example lists all properties, including those of the ensemble proper, its minor objects such as atoms and bonds, and possibly of the reaction the ensemble is a member of, if it is a reaction ensemble.
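
The pattern and the noempty flag can be combined. In the sketch below the flag is passed as a plain boolean value, which is an assumption about the accepted spelling.

```tcl
# List atom properties that carry at least one non-default value.
set used [ens properties $ehandle A_* 1]
foreach prop $used {
    puts $prop
}
```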

ens purge

ens purge ehandle propertylist/objectclass/specialname ?emptyonly?

Delete property data from the ensemble. The properties may either be properties of a reaction the ensemble is a member of (prefix X_ ), properties of a dataset the ensemble is a member of (prefix D_ ), or properties of the ensemble proper and its minor objects, such as ensemble or atom properties. If a property marked for deletion is not present, it is silently ignored.

If an object class name, such as ens or atom , is used instead of a property name, all properties of that class set on the ensemble are deleted, if they are not locked, or filtered out by the optional empty-only flag.

Setting the optional boolean flag emptyonly restricts the deletion to those properties where all the values for a property associated with a major object (such as on all atoms in an ensemble for atom properties, or just the single ensemble property value for ensemble properties) are set to the default property value.

Besides normal property names, a few convenient special names for common property deletion tasks are defined and can be used as a replacement for the property list. These include:

Examples:

ens purge $ehandle X_IDENT
ens purge $ehandle E_IDENT 1
ens purge $ehandle stereochemistry

The first example deletes the property data X_IDENT from the reaction the ensemble is a member of - provided it actually is a reaction ensemble. The second example deletes property E_IDENT from the ensemble if the property value is equal to the default value for E_IDENT . The last example removes all stereochemistry information from the ensemble.
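
Object class names and property lists follow the same calling pattern. A short sketch, again passing the emptyonly flag as a plain boolean:

```tcl
# Drop all unlocked atom properties that only hold default values.
ens purge $ehandle atom 1

# Drop several named properties in one call. Properties that are
# not present are silently ignored.
ens purge $ehandle [list E_IDENT E_NAME]
```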

ens reaction

ens reaction ehandle ?filterlist?

Return the handle of the reaction the ensemble is a member of. Optionally, the reaction may be filtered by a simple filter list. If the ensemble is not part of a reaction, or does not pass the filter, an empty string is returned.

Because an ensemble can only participate in a single reaction, the command is spelled ens reaction in singular.

Example:

ens reaction $ehandle
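
Since an empty string signals that there is no reaction, or that the filter rejected it, the result is typically tested before use. A minimal sketch:

```tcl
set xhandle [ens reaction $ehandle]
if {$xhandle eq ""} {
    puts "ensemble is not part of a reaction"
} else {
    puts "ensemble belongs to reaction $xhandle"
}
```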

ens rebuild

ens rebuild ehandle ?minor_obj_class?

This command discards all minor objects and attached property data of a specific class associated with the ensemble. Afterwards, the minor object set is re-populated by the standard set-up function of the object class, if such a set-up function is defined.

If no minor object class is specified, bonds are regenerated - for example from 3D atomic coordinates. Bonds , molecules ( mols ), sigma and pi systems ( sigmas , pis ), rings and ring systems ( rings , ringsystems ) can all be rebuilt. However, by default no reconstruction function is defined for groups and surface patches ( surfaces ), although it is possible to set one via the object class manipulation command.

Generally, object sets should only be regenerated under exceptional circumstances, for example in order to undo a manual manipulation. Object sets are automatically generated when they are required - for example, bonds are automatically derived from atomic 3D coordinates if any property data associated with bonds is used in any context, and the ensemble so far did not contain bond information. An explicit request to generate connectivity is rarely needed.

Under normal circumstances, the use of minor object information such as bonds encoded explicitly in an input file is preferable to indirectly derived sets, such as regenerated connectivity. The connectivity algorithm of the toolkit is rather capable, but has its limitations, especially when hydrogen-depleted charged structures are encountered.

Files encoded in a few notorious structure file formats, such as PDB , may contain an incomplete bond set - without any indication that the bond set is incomplete. The PDB input routine tries to detect this, and automatically augments the bond set if obvious deficiencies are found. However, in case of minor omissions in the input data, a PDB structure may be one of the rare cases when an explicit request for a rebuild of the bond set can be helpful.

Besides the set of ensemble minor objects, the pseudo object class aro is also recognized. This keyword triggers a re-evaluation of aromatic systems and re-assigns Kekulé bond orders, but does not completely redo the bond set.

Example:

ens rebuild $ehandle bonds

This command discards the old bond set, and generates a new one. This only works if there is information which can be used for regeneration, such as atomic 3D coordinates. If no such information is present, the loss of bonds is irreversible and the ensemble useless for almost all applications short of a simulated plasma torch atomization.

ens rename

ens rename ehandle srcproperty dstproperty

This is a variant of the ens assign command. Please refer to the command description in that paragraph.

ens replace

ens replace ehandle property ?preserved_properties/all?

Substitute the ensemble with data from an ensemble property of that ensemble. The original handle is preserved. The original structure data, with the exception of explicitly saved properties, is discarded.

The exact type of operation depends on the data type of the property. The following data types are currently supported:

Any other property data type, NULL values of the property, non-ensemble properties, or malformed data result in an error and the original structure remains unchanged.

The structure source property is not a property of the replaced ensemble. In that ensemble, by default all other ensemble properties of the original are also purged, and all ensemble properties of the replacement structure are retained. However, by specifying a list of properties to be transferred, or using the special argument all , all or a subset of the ensemble property data of the original ensemble can be transferred to the replacement structure and thus saved. Under these circumstances, property data from the original ensemble has precedence and overwrites existing values of the same property on the replacement ensemble. However, all ensemble properties on the replacement ensemble which are not overwritten remain present in the result ensemble. It is not possible to transfer atom, bond, or any other ensemble minor object property data to the replacement structure directly with this command.

The command returns the original, unchanged ensemble handle.

Example:

ens replace $eh E_CANONIC_TAUTOMER [list E_IDENT E_NAME]

This command replaces the current structure with the canonic tautomer. The values of properties E_IDENT and E_NAME are transferred from the original form to the new form.

ens replicate

ens replicate ehandle ?count?

This command duplicates all molecules in the ensemble and appends them to the atom, bond and other minor object lists of the ensemble.

The default replication count is one, but any other number of duplications may be chosen by an appropriate count parameter. If the count is less than one, the command is silently ignored.

The command returns the original ensemble handle. As part of the integration step, merge property invalidation events are generated.

The ens dup command generates a new ensemble, while this command expands the current ensemble.

Example:

echo [ens get [ens replicate [ens create C.CC]] E_SMILES]

This prints C.CC.C.CC as result SMILES string, because both molecules in the original ensemble were duplicated and appended to the existing ensemble data.
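
An explicit count is passed as the last argument. The sketch below only prints the result and makes no claim about the exact SMILES output.

```tcl
set eh [ens create C.CC]

# Request two duplications of every molecule instead of the default one.
ens replicate $eh 2
puts [ens get $eh E_SMILES]

# A count below one is silently ignored and leaves the ensemble unchanged.
ens replicate $eh 0
```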

ens rings

ens rings ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the rings the ensemble contains. This is explained in more detail in the section about object cross-references.

Examples:

ens rings $ehandle
ens rings $ehandle [list heterocycle aroring]

The first example returns the labels of all rings the ensemble contains. If the ensemble does not contain any rings, an empty list is returned. Only labels of rings in the SSSR or ESSSR set are returned, even if the currently configured ring set is larger. The second example filters the rings - only heteroaromatic rings are reported.

ens ringsystems

ens ringsystems ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the ring systems the ensemble contains. This is explained in more detail in the section about object cross-references.

Examples:

ens ringsystems $ehandle
ens ringsystems $ehandle [list heterocycle aroring]

The first example returns the labels of all ring systems the ensemble contains. If the ensemble does not contain any ring systems, an empty list is returned. The second example filters the ring systems - a ring system label is included in the output list only if that ring system contains one or more heteroaromatic rings.

ens rotate

ens rotate ehandle angle axis ?center?

Rotate the ensemble in 3D space by manipulating property A_XYZ .

The angle argument is a floating-point number in degrees. The axis argument is a 3D vector in standard notation, i.e. usually a list of three floating point numbers for the x, y and z components. If the last optional argument is omitted, the center of rotation is the 3D unweighted coordinate average of all ensemble atoms with valid 3D coordinates, which is computed as property E_CENTER . If the center argument is specified, it is expected to be a 3D point which is used as center of rotation instead.

This operation triggers a 3dglop property invalidation event.

Example:

ens rotate $eh 60 {0 0 1}

Rotate the ensemble 60 degrees counterclockwise around the z axis.
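
With an explicit center, the pivot no longer depends on E_CENTER. The sketch assumes that a 3D point is written as a list of three numbers, like the axis vector.

```tcl
# Rotate by 90 degrees around the x axis, pivoting on the coordinate
# origin instead of the unweighted atom coordinate average.
ens rotate $eh 90 {1 0 0} {0 0 0}
```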

ens scan

ens scan ehandle expression ?mode? ?parameters?

Perform a query on the ensemble object. The syntax of the query expression and the optional selection list is the same as that of the dataset scan command with a transient dataset consisting of the current ensemble only. For more details, please refer to the paragraphs on dataset scan and molfile scan .

The return value depends on the mode. The default query mode is exists, which differs from the default of dataset scan.

ens set

ens set ehandle property value ?property value?..

Standard data manipulation command for setting property data. It is explained in more detail in the section about setting property data.

Example:

ens set $ehandle E_NAME "Pharmacon X-25"

ens setparam

ens setparam ehandle property key value ?key value?...

Set or update a property computation parameter in the metadata parameter list of a valid property. This command is described in the section about retrieving property data. The current settings of the computation parameters in the property definition are not changed.

Example:

ens setparam $ehandle E_GIF comment "Top Secret Lead Structure"

ens setup

ens setup ehandle ?minorobjclass?

Query the status of the minor object lists in the ensemble, or initialize one of these to an empty list.

If no class is specified, a dictionary with all currently registered minor object classes of the ensemble is returned. The object class names are the key, the value is a boolean flag for the status.

If an object class argument is supplied, the object class is instantiated on the ensemble, if necessary by auto-loading an object class handler module. Unknown object class names result in an error. If the minor object class is already instantiated, it is not changed. Otherwise, an empty minor object set is added. This is even the case if the minor object class handler provides a default object setup function (see ens rebuild command). Instantiating an object class with this command always creates an empty collection of the minor objects associated with the ensemble.

Minor object lists are usually implicitly instantiated, as in

ens get $eh M_LABEL

which automatically sets up the molecule/fragment object set if it is not yet present, and populates it with objects identifying disconnected fragments in the ensemble, or

group create $eh [list $a1 $a2 $a3]

which adds a group to the ensemble, again automatically initializing the group object set if it was not initialized.

The ens setup command is intended for special circumstances and not commonly used.
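
The status dictionary returned by the query form can be walked with the standard Tcl dict commands. A minimal sketch, assuming the return value is a regular Tcl dictionary:

```tcl
# Print every registered minor object class and its status flag.
dict for {class status} [ens setup $eh] {
    puts [format "%-14s %s" $class $status]
}
```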

ens show

ens show ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The difference between ens get and ens show is that the latter does not attempt computation of property data, but raises an error if the data is not present and valid. For data already present, ens get and ens show are equivalent.
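
Because ens show raises an error for missing data, it is usually wrapped in catch when the presence of the property is uncertain. A sketch with an arbitrary fallback value:

```tcl
# Read the name only if it is already present, without computing it.
if {[catch {ens show $ehandle E_NAME} name]} {
    set name "(no name stored)"
}
puts $name
```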

ens sigmas

ens sigmas ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of the σ systems the ensemble contains. This is explained in more detail in the section about object cross-references.

Examples:

ens sigmas $ehandle

σ systems are a rather exotic feature and not commonly used. These are essentially descriptions of bonding interactions which use s orbitals, such as normal, covalent single bonds, or the central bond in multiple bonds. A simple double bond is described with one σ system and one π system in this representation.

ens sort

ens sort ehandle ?sort_property? ?relabel? ?duplicate? ?datasethandle? ?position?

Sort the atoms in an ensemble according to a property value. The default property is A_LABEL , the standard atom label. The first optional argument can be used to sort on a different property, or a property field. However, the property must be either an atom property, or a molecule property. If the relabel flag is set, the ensemble atoms and molecules are renumbered after the sort in ascending order, starting with one. By default, atoms and molecules retain their original labels even if they change positions. If the duplicate flag is set, the sort operation works on a duplicate of the original ensemble. If the flag is unset, or the argument omitted, the operation modifies the original ensemble object.

The final two optional arguments allow the direct transfer of the modified ensemble or duplicate into a dataset, similar to an ens move command. The ensemble may be inserted into a specific position of a target dataset. If the special value end is used, or the zero-based position index is beyond the current end of the target dataset, the ensemble is simply appended. By default the ensemble is not moved, and if it is moved without an explicit position, it is appended.

The sequence of the atoms in the ensemble is rearranged so that the atoms are in ascending order of the values of the sort property or property field. Indirectly, molecules are also rearranged to correspond to the sequence of the first atoms in every molecule. This operation triggers a shuffle property invalidation event. If the renumbering option is selected, the atom and molecule sets are re-labeled with their standard label properties (i.e. A_LABEL for atoms, M_LABEL for molecules) in ascending order, starting with one. Other minor object collections remain in their original sequence and retain their current labels. Certain important properties which, if present, are dependent on atom label values, notably A_LABEL_STEREO , B_LABEL_STEREO and B_FLAGS , are specifically adjusted to the new labeling scheme instead of being invalidated.

The command returns an ensemble handle. If the operation was operating on a duplicate, it is the handle of the new ensemble, otherwise that of the original ensemble.
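
A typical call that leaves the original ensemble untouched might look like the following sketch. The relabel and duplicate flags are passed as plain boolean values, which is an assumption about the accepted spelling.

```tcl
# Sort a duplicate by element number and renumber its atoms and molecules.
set sorted [ens sort $ehandle A_ELEMENT 1 1]

# $sorted is the handle of the new ensemble, $ehandle is unchanged.
puts [ens get $sorted E_SMILES]
```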

ens split

ens split ehandle ?dropsize? ?splitproperty?

Split the molecules of the ensemble into individual ensembles. The return value is a list of the handles of the new ensembles. The input ensemble is modified, and its old handle may be returned as one of the new single-molecule ensemble handles. If the input ensemble contains only a single molecule, and that molecule passes the optional size filter, the command is a no-op. If the input ensemble is a member of a reaction, the result ensembles become part of that reaction in the same role.

The optional dropsize parameter is a minimum value for the number of atoms in the molecules. If this is not an empty string, molecules which have fewer atoms than the minimum are deleted. If all molecules in the input ensemble are smaller than the required size, an empty list is returned and the input ensemble is destroyed.

The optional splitproperty argument can be used to split the ensemble on values of a molecule property, which needs to be either already set or computable, instead of simply separating fragments on connectivity. All molecules in the input ensemble which have a common value of this property are put into a joint result ensemble, and each distinct split property value starts a new result ensemble. Molecules with a common property value do not need to be present in the input ensemble in a consecutive sequence, nor are there any special requirements for the data type or value range of the split property, as long as the data type has a comparison function. If the values of the split property are distinct over all molecules in the input ensemble, the outcome of the command is indistinguishable from running it without any split property.

Example:

lassign [ens split [ens create "CC.CC"]] eh1 eh2

This example creates an ensemble with two ethane molecules, splits it, and assigns the two new ensemble handles to variables eh1 and eh2 .

set elist [ens split $eh {} M_REACTION_LABEL]

Split ensemble along the original reagent or product data blocks found in an RXN or RDF file.
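The size filter and the returned handle list can be combined as in the following minimal sketch. The SMILES string and the threshold of 2 are illustrative choices and not taken from the toolkit documentation. Whether hydrogen atoms count toward the molecule size depends on how the ensemble was created, so the threshold may need adjusting.

```tcl
# Illustrative input: ethanol plus two single-atom counterions.
set eh [ens create {CCO.[Na+].[Cl-]}]

# Delete molecules with fewer than 2 atoms and split the remainder.
# The handle in $eh may be reused for one of the results, or destroyed
# if nothing passes the filter, so only the returned handles are used below.
set parts [ens split $eh 2]

if {[llength $parts] == 0} {
    puts "all molecules were below the size limit"
} else {
    foreach p $parts {
        puts "$p: [ens get $p E_SMILES]"
    }
}
```

Working only with the returned handle list avoids relying on the fate of the original handle, which differs between the filtered and unfiltered cases.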

ens sqldget

ens sqldget ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The differences between ens get and ens sqldget are that the latter does not attempt computation of property data, but initializes the property value to the default and returns that default, if the data is not present and valid; and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

ens sqlget

ens sqlget ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The difference between ens get and ens sqlget is that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

ens sqlnew

ens sqlnew ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The differences between ens get and ens sqlnew are that the latter forces re-computation of the property data, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

ens sqlshow

ens sqlshow ehandle propertylist ?filterset? ?parameterlist?

Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.

For examples, see the ens get command. The differences between ens get and ens sqlshow are that the latter does not attempt computation of property data, but raises an error if the data is not present and valid, and that the SQL command variant formats the data as SQL values rather than for Tcl script processing.

ens subcommands

ens subcommands

Lists all subcommands of the ens command. Note that this command does not require an ensemble handle.

ens surfaces

ens surfaces ehandle ?filterset? ?filtermode?

Standard cross-referencing command to obtain the labels of surface patches the ensemble contains. This is explained in more detail in the section about object cross-references.

Example:

ens surfaces $ehandle carbon

This example lists all surface patches which are associated with carbon atoms. Surface patches associated with other atoms, or with no atoms, are not listed.

ens swapin

ens swapin ehandle

Swap an ensemble from the disk store fully back into memory, and disable further automatic loading and shelving. If the ensemble was not swapped out, the command does nothing.

The command returns the ensemble handle.

ens swapout

ens swapout ehandle

Remove most of the ensemble data from memory and store it in a temporary disk store. The ensemble handle remains valid. As soon as it is used in a command again after this command has been executed, the swapped ensemble data is automatically reloaded from file, and then stored again when the object lock is released. To disable the automatic swapping of an ensemble, use the ens swapin command.

This command is intended to be used in cases where a large number of ensembles must be kept in memory. Its routine use is not encouraged - it is only useful in case the programmer knows about access patterns. In other cases, the standard virtual memory mechanism of the operating system might yield better performance results.

The ensembles are stored as binary blobs in a key/value store in a process-specific swap directory cactvs%d (where %d is replaced by the process ID), which is created automatically in the standard temporary directory. When an ensemble is deleted, its swap record is also removed, if one was created during the lifetime of the ensemble. When a Cactvs application program exits, the swap store as well as the swap directory are automatically deleted, even without explicit deletion of the last set of ensembles in memory. If the program crashes, however, the swap directory and its contents may survive. If ensemble swapping is used with unstable applications, the temporary directory should be checked from time to time.

The command returns the ensemble handle.

Example:

ens swapout $ehandle
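A typical usage pattern might look like the sketch below. The variable ehlist is a hypothetical list of handles of ensembles that are accessed only rarely; it is not defined by the toolkit.

```tcl
# $ehlist is assumed to hold the handles of rarely accessed ensembles.
foreach eh $ehlist {
    ens swapout $eh
}

# The handles stay valid. Any later access reloads the data from the
# swap store and shelves it again when the command has finished.
set smiles [ens get [lindex $ehlist 0] E_SMILES]

# Before a phase of intensive access, bring one ensemble back
# permanently and stop the automatic swapping for it.
ens swapin [lindex $ehlist 0]
```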

ens tables

ens tables ehandle ?filterlist?

Return a list of the handles of all table objects the ensemble is associated with. Optionally, the table set may be filtered by a simple filter list. If the ensemble is not related to any table, or none of these tables passes the filter list, an empty string is returned.

This command is only available if the toolkit was compiled with table support.

Example:

ens tables $ehandle

ens taint

ens taint ehandle propertylist/changeset ?purge?

Issue a property data tainting event which acts on the ensemble data.

If the ensemble is a member of a dataset, the dataset and its objects are not tainted.

The event list may contain any number of the following items:

ens transfer

ens transfer ehandle target_ehandle propertylist

Move property data from one ensemble to another, without going through an intermediate scripting language object representation. If the property is not already valid on the source ensemble, an attempt is made to compute it.

If the property is not an ensemble property, the number of property-associated minor objects is usually expected to be the same in both ensembles, and the minor objects are expected to have the same label set, though it is not required that they are in the same sequence. Property data is assigned to the target ensemble minor objects with the minor object label as reference key. In case of a label set or object count mismatch between the two ensembles, no error is raised. Excess source data items are discarded, and excess target minor objects, or those with unmatched labels, retain their original value if the property was present on the target, or are set to the default value if the property was freshly instantiated.

Properties which are not ensemble or ensemble minor object properties cannot be transferred. The two ensembles cannot be the same.

The return value of the command is the target ensemble handle.

Example:

ens transfer $eh $eh2 E_EMF_IMAGE

This copies property E_EMF_IMAGE from the first ensemble to the second.

ens transform

ens transform ehandle SMIRKSlist ?direction? ?reactionmode? ?selectionmode? ?flags? ?overlapmode? ?{?exclusionmode? excludesslist}? ?maxstructures? ?timeout? ?maxtransforms? ?niterations?

This command applies one or more SMIRKS transforms to an ensemble and returns a list of ensemble handles of transformation products. The transformation products are filtered for duplicates. The original start structure is never returned - if a transform set does not match, an empty list is returned.

The required parameter after the ensemble handle is a list of SMIRKS lines, where each SMIRKS line is itself a list. A SMIRKS line is in the simplest case a simple SMIRKS transform without any extra data, but it may be padded by additional parameters which apply only to the application of that transform. If these optional parameters local to the current transform are not specified, their global counterpart on the command line is used instead. The syntax of an individual SMIRKS line is

SMIRKStransform ?step? ?direction? ?flags? ?overlapmode?

The SMIRKS transform part is the only required list element. It may be provided either as a string in standard Daylight notation, or as a handle of a reaction, which should have been decoded in SMIRKS mode (see reaction create command). Care should be taken to pass SMIRKS strings as proper list elements, because they may contain whitespace and naming information after the actual transform code. Example:

ens transform $ehandle [list [list {[C:1][C:2]>>[C:1]=[C:2] Dehydrogenation} 1]]

The string Dehydrogenation is part of the transform specification string and not the transform step. The name string is attached to the (intermediate, in this case) transform reaction object as property X_NAME and can be used to track the reaction history of transform result structures.

The optional step element in a transform line (a positive integer or 0) identifies the reaction step of the transform. Transform sets of different step numbers are isolated from each other and do not interact. Transforms are executed in ascending step number. Transforms with different step numbers need not be sorted, and the step numbers need neither begin with one nor form an uninterrupted sequence. A step number of 0 disables the transform. The default step number is one. All transforms of the same step number are essentially executed in parallel and may interact with each other.
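The following sketch shows how step numbers might be assigned in a SMIRKS list. The first transform is the dehydrogenation example from above. The other two transforms and their names are hypothetical and serve only to illustrate the list layout.

```tcl
# Each SMIRKS line is a list: {transform-with-optional-name} ?step? ...
set tlist [list \
    [list {[C:1][C:2]>>[C:1]=[C:2] Dehydrogenation} 1] \
    [list {[C:1]=[C:2]>>[C:1]#[C:2] SecondDehydrogenation} 2] \
    [list {[C:1]#[C:2]>>[C:1]=[C:2] Disabled} 0] \
]

# The step 1 transform is executed before the step 2 transform.
# The third transform has step number 0 and is therefore skipped.
set results [ens transform $ehandle $tlist]
```

Building every SMIRKS line with the list command keeps the transform string, including its trailing name, together as a single list element.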

The third and again optional element of transform lines is the direction identifier. It may be either forward, backward, or bidirectional. In forward mode, only the left part of a transform is used for matching, and the matched structure part is modified according to the description on the right side. backward works the other way around, and in bidirectional mode, both sides of the transform scheme are independently matched, and, if the match is successful, transformed to the other side. If this parameter is not specified, or specified as an empty string, the global direction parameter from the command line is substituted.

The fourth and once more optional element of a transform line is a list of flag words. Every word sets an additional flag. Currently, the following flag words are recognized:

The fifth, final, and again optional element of a SMIRKS line is the overlap mode. Again, if this parameter is omitted or supplied as an empty string, the global default from the command line is used. The overlap mode determines whether a transform substructure which consists of multiple disconnected fragments may match onto common target structure atoms or bonds. The following values are supported:

Every SMIRKS line follows the outlined scheme, and all settings within that line are applicable only to the current transform scheme.

There is no general limit for the maximum number of transforms in this command. However, if transforms are combined with exclusion substructures, and these exclusion substructures are to be applied on a per-transform basis (see below), the highest transform index for which an applicability flag can be set is 63. Every transform which is applied in bidirectional fashion, either by global configuration or transform-specific flags, is counted twice toward this limit.

All parameters after the SMIRKS lines list act globally. The third and optional direction parameter, command word number five, sets the default for the directionality of all transforms for which no local override was set in their respective SMIRKS lines. If this parameter is not specified, the default is forward .

The optional reaction mode, parameter four and command word six, does not have a counterpart in the SMIRKS lines. This parameter determines how the possibility of multiple matches of a transform substructure in the target molecules is handled. It can be one of these values:

The default value for the reaction mode is first .

The next optional command parameter, the selection mode (command argument five and command word seven), again has no counterpart in the SMIRKS line parameters. It determines the interaction of transforms of the same step number. All these transforms form a group. This parameter determines which of the transforms from the current group are executed, and in which order. The parameter can be set to one of the following values:

The default selection mode is first .

The next and again optional flags parameter (command argument six, command word eight) defines the default for those transforms which do not possess an override flag set in their SMIRKS line. Note that if a flag set is specified on a SMIRKS line, it completely replaces the default flag set. It does not simply add or bit-or more flags to the global setting. The default flag set is empty.

Similarly, the overlap mode parameter (command argument seven, command word nine) sets the default for handling potential overlap when matching disconnected transform fragments onto the structure to be transformed. The default setting is none , disallowing any fragment overlap. If a transform consists of only a single fragment in the applicable direction(s), this parameter has no effect.

The excludesslist parameter (command parameter eight, command word ten) again has a potentially complex internal structure. It defines exclusion fragments. An exclusion fragment blocks all sections of the target structure from matching any transform substructure, either by preventing the match of transform atoms (the default) or transform bonds. This is a useful feature for example to easily prevent amide groups from matching amino group transforms. The default exclusion substructure list is empty. The parameter is a list. Every list element can be a simple structure identifier, or a list of a structure identifier and a transform index list.

Structure identifiers recognized by this command are:

If the exclusion substructure identifier is not associated with a transform index list, the substructure applies to all transforms. The optional transform index list consists of an arbitrary number of transform indices in the range 0...63. If a transform index list is supplied, the exclusion substructure applies only to the listed transforms. Note that it is not possible to set individual exclusion indices for transforms beyond the 64th, even though it is allowable to use any number of transforms in the transform list. All ensembles, including intermediate result ensembles, are checked against all applicable exclusion structures immediately before the application of a transform is attempted.

The exclusion substructure specification list may be prepended by a magical list element with value ( marked ) atoms , ( marked ) bonds , unmarkedatoms or unmarkedbonds . These control how matched substructures are marked in the transform source structure. The default mode is atoms , where excluded atoms are prevented from matching transform pattern atoms. The bonds mode switches this to preventing a bond match. The difference is that in bonds mode, transform pattern atoms can still overlap excluded regions by a single atom, but cannot change bonds therein, while in atoms mode absolutely no atom or bond overlap between excluded regions and transform patterns is allowed. The unmarked variants operate with a reversed exclusion set, i.e. atoms or bonds which are not matched are excluded from the structure region eligible for transform application.

In case the exclusion mode is ( marked ) atoms or unmarkedatoms , an atom identifier, i.e. any notation which is supported to identify an atom in the atom command, may also be used in addition to the three substructure specification styles listed above to directly exclude a single atom from matching by all transforms. In ( marked ) bonds or unmarkedbonds marker mode, bond identifications in the same style as supported by the bond command, such as bond labels or bond atom label pairs, are similarly allowed as additional direct bond exclusion specifications, and these again apply to all transforms.

Exclusion markings, once set for the input structure, are inherited by newly generated result structures, so that the protection remains active even for structures undergoing sequences of transformations.

The related dataset transform command does not support direct atom or bond exclusion marking, even if the dataset only contains a single structure.

An example for an exclusion list:

ens transform $eh $tlist ... [list "atoms" {C(=O)[NH2]} {{C[NH]C} {0 1}} 1]

This exclusion set protects amide groups (the first substructure) from all transforms, secondary amines including their immediate carbon neighbor atoms from the first two transforms in the set (index 0 and 1, the transform set is specified in the tlist variable), and the single atom with label 1 in the input ensemble. The exclusion marker mode is explicitly spelled out as atoms in the first exclusion list element, although this is already the default.

Another example:

ens transform $eh $tlist ... [list "unmarkedatoms" {*}$statoms]

This transform only operates on the atoms of which the labels or other identifiers are included in the list in variable statoms . All other parts of the structure are excluded and cannot participate in the transform.

The next optional global command parameter (parameter nine, command word eleven) is the maximum number of result ensembles to generate. The input ensemble is not counted. As soon as the maximum is reached, the command finishes and returns the result ensembles which were generated so far. If the maximum number of results is set to a negative number (the default), no limit applies. If it is set to zero, the transform command is effectively disabled. The global control variable ::cactvs(setsize_exceeded) is set to 1 if the specified maximum number of result ensembles was going to be exceeded. At the beginning of the execution of the ens transform command, this control variable is reset to zero. The limit applies to the total of generated unique structures, which is not necessarily the same as the number of output structures in case the processing mode dictates that they are processed further and not included as intermediates in the result set. In the special case of exhaustive transform application, the parameter limits the size of the intermediate result set after each pass, not the overall total of unique structures.

The timeout parameter (command parameter ten, command word twelve) can be used to set a time limit in seconds for the command execution. If this parameter is set to 0 or a negative number, no timeout applies. This is the default. Otherwise, the generation of result ensembles is stopped after the specified time, and the command returns with the results generated so far. The global control variable ::cactvs(interrupted) is set to 1 if a timeout occurs. It is reset to 0 at the beginning of the execution of the command.
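The result limit and the timeout can be checked after the call as in the sketch below. All positional arguments before the two limits are spelled out with the defaults documented above, because the parameters are positional. The limit values 100 and 5 are arbitrary illustrative choices, and the variables eh and tlist are assumed to hold an ensemble handle and a SMIRKS list.

```tcl
# Argument order: direction reactionmode selectionmode flags
#                 overlapmode exclusionlist maxstructures timeout
set results [ens transform $eh $tlist \
    forward first first {} none {} 100 5]

# Both control variables are reset at the start of the command,
# so they reflect only the call that has just finished.
if {$::cactvs(setsize_exceeded)} {
    puts "result set was cut off at 100 structures"
}
if {$::cactvs(interrupted)} {
    puts "stopped by the 5 second timeout"
}
puts "[llength $results] result ensembles returned"
```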

The second-to-last parameter (command parameter eleven, command word thirteen) can be used to limit the number of transforms applied to the starting structure and intermediate structures. If this parameter is not specified, or specified as an empty string or a negative value, no limit is imposed. If this parameter or the timeout option is used, the result set may become dependent on the atom and bond order of the input structure, because a different part of the possible transform match space has been traversed when the timeout or application count restriction is triggered. This may yield different results, or a different number of results.

The final optional parameter (command parameter twelve, command word fourteen) is an iteration count. Its default value is one, meaning that the whole transformation process is only executed once. If set to a larger value, the transformation routine calls itself recursively. This is equivalent to first running ens transform with a start structure, and then repeatedly executing dataset transform commands for the second and later iterations with the last result set. All limits and other control parameters are passed in the original configuration, and apply only to the next iteration, not globally over the sum of all transform cycles. By default, the result set of this mode is what the last iteration produced, but this can be changed to the union of all iteration results by the keepiterationintermediates flag. Uniqueness checking of result structures is applied to the full return set. If the parameter is set to zero or a negative value, no transformations are executed. If the setpathname flag is set, it is automatically switched to appendpathname for the second and later cycles, so that the name mirrors the full transformation history and is not reset in each cycle.

Example:

set t1 {{[O,S;X1:1]=[C:2x1][C:3X4][#1:4]>>[#1:4][O,S;X2:1][C:2x1]=[C:3] enol/thioenol}}
set elist [ens transform $eh [list $t1] bidirectional multistep all preservecharges none]

This example is part of a tautomer generator. The full standard generator in the toolkit uses a lengthy list of transform schemes and not just the one sample keto/enol schema displayed here. Because the operation is bidirectional, the transform transforms ketones into enols, and vice versa. If more than one interchangeable group exists, all intermediate structures are generated ( multistep reaction mode). All results are retained ( all selection mode), and all intermediate structures are again subjected to all transforms (this does not have any effect with a single transform, but the real application uses a set of transforms). Finally, charges should not be changed ( preservecharges flags), and fragment overlap is not allowed ( none overlap mode) - this again is without effect in this sample transform, because it does not consist of disconnected fragments on either side.

Multiple structures may be jointly transformed in a single command by means of the very similar dataset transform command.

ens translate

ens translate ehandle pt1 ?pt2?

Move the atoms of the ensemble by modifying their 3D coordinates in property A_XYZ . The first argument is interpreted as a 3D vector if this is the only coordinate argument. All atoms with valid 3D coordinates are moved according to the vector coordinates. In case a second argument is supplied, both arguments are interpreted as points in 3D space. The ensemble atoms are moved according to the difference vector between the second and the first point.

This operation triggers a 3dglop property invalidation event.

Examples:

ens translate $eh {0 0 1}
ens translate $eh [atom get $eh $a1 A_XYZ] [atom get $eh $a2 A_XYZ]
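The relation between the two-point form and the single-vector form can be sketched as follows. The helper procedure vecdiff is hypothetical and not part of the toolkit; it only computes the difference vector described above. The coordinates are illustrative.

```tcl
# Hypothetical helper: component-wise difference p2 - p1 of two 3D points.
proc vecdiff {p1 p2} {
    set v {}
    foreach a $p1 b $p2 {
        lappend v [expr {$b - $a}]
    }
    return $v
}

set p1 {1.0 0.0 0.0}
set p2 {4.0 0.0 2.5}

# Two-point form: atoms are moved by the vector from p1 to p2.
ens translate $eh $p1 $p2

# According to the description above, the single-vector form below should
# produce the same movement. Use it instead of, not in addition to,
# the two-point call.
# ens translate $eh [vecdiff $p1 $p2]
```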

ens trim

ens trim ehandle ?propertylist?

Reduce the information content of a structure to a standard minimum set and discard any additional information. This process minimizes the storage requirements of the ensemble. The properties of the internally defined minimum set are computed if required. The retained property set is designed to support a faithful representation of connectivity including bond and atom labels and types as well as formal charges, stereochemistry, isotopes, 2D and 3D coordinates, but not of auxiliary additional attributes of atoms, bonds or other minor objects.

The optional fourth argument is a list of properties which should be retained in addition to the standard set. If any of these are not present on the ensemble to be trimmed, they are silently ignored and no attempt is made to compute them. Specifying properties of the standard retention set in this list is allowed but has no additional effect.

The return value of the command is a list of the remaining properties of the ensemble.

Example:

ens trim $ehandle {E_GIF E_SMILES}

ens uncharge

ens uncharge ehandle

Attempt to remove charges on atoms in a chemically sensible way. Charge removal by default happens via addition or removal of protons. In cases where this does not make chemical sense, a direct charge manipulation may be performed instead. Charged metal ions and other charged species without an obvious method for neutralization remain unchanged.

The command returns the number of atoms which were neutralized.

Example:

ens uncharge [ens create {[NH3+]CC(=O)[O-]}]

This sample line removes a proton from the charged amino group and adds a proton to the charged carboxyl group of the initial glycine zwitterion. The returned result value is 2. In this example the total hydrogen count has not changed. If the set of modified positive and negative charge centers is unbalanced, the hydrogen count usually does change.

ens unlock

ens unlock ehandle propertylist/objclass/all

Unlock property data for the ensemble, meaning that they are again under the control of the standard data consistency manager.

The property data to unlock can be selected by providing a list of the following identifiers:

Property data locks are obtained by the ens lock command.

Example:

set eh [ens create CCC]
ens lock $eh A_SYMBOL 1
ens purge $eh A_ELEMENT
atom set $eh 1 A_query(dsearch) 3
ens unlock $eh A_SYMBOL

In this example, an ensemble is created, and the atom symbol information is locked. Next, the element number property is deleted, and a query attribute is set. Finally, the lock is released. Had the element symbol information not been locked, the ensemble would have become unusable due to an overzealous data consistency manager. Setting query information in property A_query can have an influence on the atom symbol. So the default action of invalidating A_SYMBOL when manipulating A_query is correct. However, in case there is no element information A_ELEMENT , and no atom symbol information A_SYMBOL , the element information is completely lost, and the ensemble becomes unusable. So in this case, locking A_SYMBOL (or alternatively A_ELEMENT ) is required to avoid unexpected side effects of structure editing.

ens unpack

ens unpack packstring

Unpack a base64-encoded serialized object string which was created by an ens pack command. The return value of this function is the handle of the newly created ensemble object, which is an exact duplicate of the packed original ensemble.

Packed ensembles may also be unpacked by the ens create command.

Example:

set packdata [ens pack [ens create CCCl]]
set ehandle [ens unpack $packdata]
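A simple round-trip check might look like the sketch below. Comparing E_SMILES is only an illustrative sanity test chosen for this example and does not verify that every property was restored.

```tcl
set original [ens create CCCl]
set packdata [ens pack $original]

# The unpacked ensemble is a new object with its own handle.
set copy [ens unpack $packdata]

if {[ens get $original E_SMILES] eq [ens get $copy E_SMILES]} {
    puts "round trip preserved the structure"
} else {
    puts "structures differ after unpacking"
}
```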

ens valencecheck

ens valencecheck ehandle ?failedatomvariable?

Perform a valence check on the ensemble, comparing the current bonding situation at all atoms to the list of element-specific valence states in the system element table. This command is intentionally quite picky, discouraging for example the use of pentavalent nitrogen. For the calculation of valence, only bonds of type normal are taken into account. Complex bonds and pseudo bond types thus do not interfere in the calculation. Some more exotic metal atoms with many different valence states, or few well-defined covalent compounds, such as vanadium or rhodium, always pass.

The return value of this command is the number of atoms which failed the valence check. If the optional failedatomvariable argument is specified, it is the name of a variable which receives a list of the atom labels which failed the check, or is set to an empty list in case no problems were found.

Note that this command assumes that all hydrogen atoms are in place. Processing of structures with implicit hydrogen atoms is not supported.

Example:

ens valencecheck [ens create {CN(=O)=O.C[N+](=O)[O-]}] badatoms

This sample command checks the valence situation of nitromethane in two encoding formats. The first molecule, using a pentavalent nitrogen encoding, is responsible for the result value 1, indicating one failed atom, and the variable badatoms is set to 2, the label of the pentavalent nitrogen atom. The second molecule passes the check and reports no additional problems.
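The failed-atom variable can be used to report the offending atoms, as in this minimal sketch. It assumes that atom labels from the variable can be passed directly to the atom get command, in the same way atom references are used in the ens translate example.

```tcl
# Nitromethane with a pentavalent nitrogen encoding.
set eh [ens create {CN(=O)=O}]

# badatoms receives the labels of the failed atoms, or an empty list.
set nfailed [ens valencecheck $eh badatoms]

if {$nfailed > 0} {
    foreach label $badatoms {
        set symbol [atom get $eh $label A_SYMBOL]
        puts "atom $label ($symbol) failed the valence check"
    }
} else {
    puts "no valence problems found"
}
```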

ens valid

ens valid ehandle propertylist

Returns a list of boolean values indicating whether values for the named properties are currently set for the ensemble. No attempt at computation is made.

Example:

ens valid $ehandle E_IDENT

This example reports whether the ensemble has a standard ID (has a valid E_IDENT property) or not.

ens vector

ens vector ehandle property vectorname ?invert? ?integrate?

Map ensemble property data to a Blt library vector object. Please refer to the Blt manual pages for more information on these. Blt vector objects are very useful, for example, for the efficient set-up of GUI graphing widgets which are provided by the Blt Tk extension. This command automatically attempts to load the Blt Tcl module if necessary. If that fails, an error results.

The vectorized property data must be of a vector type, and the element type of the vector must either be a simple numeric type, or a bit for bitvectors, or a floating-point pair. It is possible to address a property subfield, for example the X/Y data points of a spectrum which are typically stored as a field in a complex compound property.

If the invert flag is set, the stored Blt vector object values are set to 1.0 minus the property data value. By default, this flag is not active. If the integrate flag is set, the Blt vector object element values are set to the sum of all preceding property data values. This flag is also disabled by default.

If the property data type is a float pair vector, two vector objects are created in the Blt namespace, with suffixes _X and _Y . For simple vector types, the vector name is used directly. It is possible to overwrite existing Blt vectors of the same name with this command.

The return value of the command is a list of the generated name of the vector, followed by the minimum and maximum data values in that vector object. These may be different from the ensemble property data values because of the application of the invert or integrate flags. For float pair vectors, the same information is repeated for the second vector object.

ens weed

ens weed ehandle keywords

This command performs a number of common clean-up and standardization operations on the ensemble, which are especially useful in the context of processing PDB files. The ensemble is potentially modified, but keeps its handle, which is returned as command result. In addition, properties A_XYZ and A_RESIDUE , which are normally susceptible to bond manipulations, are locked and retained.

The keywords argument selects the desired set of operations. Most of the keywords are single words, but the minsize and maxsize as well as the minaminoacids and maxaminoacids keywords take an additional integer number as argument. The following operations are currently supported:

The order of the keywords is not important. The sequence of operations is always

metalatoms > specialbonds > proteinspecialbonds,proteinhetatmbonds > metaloxygenbonds > disulphides > carbonless,hydrogenless,inorganic,maxsize,metalions,minsize,water > maxaminoacids,minaminoacids > duplicates

Applied operations which potentially change the set of molecules and rings trigger an automatic re-evaluation of this data after the operation block has been executed.

Example:

The code below is part of a reliable PDB ligand extractor.

ens weed $eh {metaloxygenbonds water proteinspecialbonds duplicates minsize 10 \
	maxsize 300 maxaminoacids 6 disulfides}
if {[ens get $eh E_NATOMS]==0} {
	# try again with additional bond cut step. Cannot do this by default, because
	# there are plenty of ligands with embedded amino acid parts
	# that are encoded as ATOM lines. PDB files suck.
	molfile backspace $fh
	set eh [molfile read $fh]
	ens weed $eh {metaloxygenbonds water proteinspecialbonds proteinhetatmbonds \
		duplicates minsize 10 maxsize 300 maxaminoacids 6 disulfides}
}

ens xhandle

ens xhandle ehandle

Return the remote handle of the ensemble if it was exported and is currently under the control of a live-linked application. In case the ensemble is not exported, an error results.


1. Do not use this mode with transforms which add a group which is again matchable by the transform - you will face a runaway polymerization-style reaction!