The dataset command is the generic command used to manipulate datasets. The syntax of this command follows the standard schema of command/subcommand/majorhandle. Datasets are major objects and thus do not need any minor object labels for identification.
dataset get $dhandle D_SIZE
As explained in the introductory section on datasets, a normal persistent dataset handle may be replaced, as the third argument of the dataset command, by an arbitrary list of dataset, ensemble, reaction, table and network handles. Substitution is only allowed in that argument position, not in cases where a dataset handle is part of the command arguments of another object command, and not in a different argument position in the context of a dataset command. Such an object list is transformed into a transient dataset for the duration of the command execution. After the command has completed, the elements of the transient dataset are in most cases restored to their original state with respect to dataset membership and position, except in a few documented exceptional circumstances.
As a means to access an embedded dataset object, its handle may be replaced by the handle of the parent object where this is unambiguous, e.g.
ens move $eh $thandle
moves the ensemble into the embedded dataset of the table, while
dataset count $thandle
treats the table argument as part of a transient dataset as described above.
This is the list of currently officially supported subcommands:
dataset add dhandle objhandle ?position?
d.add(object=,?position=?)
d += object
Add an object to the dataset, relocating it from a current dataset if it exists. If no position is specified, the object is appended to the rear of the dataset object list. The position can either be a numerical zero-based index, or any string beginning with ‘e’ to indicate the end position.
If the object handle identifies a (local) dataset, and the target dataset does not accept datasets as members, all objects in the source dataset are instead moved to the new dataset, and then the source dataset is destroyed. If ensembles, reactions, tables or networks are moved, they are unlinked from any current datasets, but these original datasets themselves persist.
This dataset command is equivalent to issuing a move command from the object.
The command returns the dataset handle for Tcl, or the dataset reference for Python. The += operator shortcut for Python adds the object to the end of the dataset.
dataset add $dh $eh end
ens move $eh $dh end
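The placement and relocation rules described for dataset add can be modelled in a few lines of plain Python. This is only an illustrative sketch and not toolkit code: ToyDataset and resolve_position are hypothetical names, and clamping out-of-range indices to the valid range is an assumption that the text above does not state.

```python
def resolve_position(position, size):
    """Map a position argument to an insertion index.

    None or any string starting with 'e' means "append at the end";
    otherwise the value is a zero-based index. Clamping out-of-range
    indices is an assumption of this sketch.
    """
    if position is None or str(position).startswith("e"):
        return size
    return max(0, min(int(position), size))


class ToyDataset:
    """Hypothetical stand-in for a dataset: an ordered object list."""

    owner = {}  # object -> ToyDataset currently holding it

    def __init__(self):
        self.members = []

    def add(self, obj, position=None):
        # An object lives in at most one dataset: relocate it first.
        current = ToyDataset.owner.get(obj)
        if current is not None:
            current.members.remove(obj)
        self.members.insert(resolve_position(position, len(self.members)), obj)
        ToyDataset.owner[obj] = self
        return self  # mirrors "returns the dataset handle or reference"


d1, d2 = ToyDataset(), ToyDataset()
d1.add("ens0")
d1.add("ens1", "end")
d2.add("ens2")
d2.add("ens0", 0)  # relocates ens0 from d1 to the front of d2
print(d1.members, d2.members)
```

In the model, the last call removes the object from its previous dataset while that dataset itself persists, which is the behavior described above for ensembles, reactions, tables and networks.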
dataset addthread dhandle ?body?
dataset addthread dhandle count body
dataset addthread dhandle count substitutiondict body
d.addthread(?count=?,?dictionary=?,?script=?)
Add one or more Tcl script threads to the dataset. By default, a single thread is added, but by setting the count parameter to a higher number multiple threads with the same script body can be added simultaneously, up to a maximum of 32 threads per dataset. It is possible to use this command to add additional threads to a dataset which already has attached threads. These older threads remain active.
The thread script code is always Tcl code, even if the command is issued from a Python interpreter. This is due to limitations in the Python thread model and is described in more detail in the general Python scripting introduction.
The optional substitution dictionary contains a set of percent-prefixed keys and replacement values, following the Tk event procedure model. All such replacements are made before the script is passed to the thread interpreters. A single default substitution replacing the character sequence %D with the handle of the current dataset is always predefined and cannot be redefined. Replacement token keys (but not necessarily their values) are single case-sensitive characters, ignoring an optional percent prefix character. Within the script, percent signs which should be preserved as such must be doubled, just like in Tk event substitution commands.
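The substitution rules can be illustrated with a small stand-alone Python model. It is a sketch under assumptions, not the toolkit's implementation: the function name substitute is hypothetical, unknown tokens are left unchanged, and an attempt to redefine the D token is silently overridden. The text above does not say how either case is actually handled.

```python
def substitute(script, dataset_handle, mapping=None):
    """Expand %X tokens in a thread script (illustrative model only)."""
    table = {}
    for key, value in (mapping or {}).items():
        key = key[1:] if key.startswith("%") else key  # optional % prefix
        if len(key) != 1:
            raise ValueError("token keys are single characters: %r" % key)
        table[key] = str(value)
    table["D"] = dataset_handle  # predefined token always wins

    out = []
    i = 0
    while i < len(script):
        ch = script[i]
        if ch == "%" and i + 1 < len(script):
            nxt = script[i + 1]
            if nxt == "%":       # doubled percent -> one literal percent
                out.append("%")
                i += 2
                continue
            if nxt in table:     # known token, matched case-sensitively
                out.append(table[nxt])
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


expanded = substitute(
    "table addens %T [dataset pop %D] ;# 100%% done, %t",
    "dataset0",
    {"%T": "table0"},
)
print(expanded)
```

Here %T and %D are replaced, the doubled percent sign collapses to a single one, and the lower-case %t is left alone because keys are case-sensitive.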
The dataset threads are compatible with those of the standard Tcl threads package. Dataset-associated threads are automatically created in preserved state, and a thread::wait command is automatically appended at the end of the script, so they can be sent additional tasks via the thread::send command. If no script body is specified, the initial script consists only of the wait command. Threads can be canceled or joined only if they are stopped at the thread::wait statement.
When a dataset is deleted, all threads associated with this dataset need first to be joined, and this can only happen if they have finished processing the main body script and are all in their idle state in the thread::wait command. Object deletion is postponed until this condition is met. A global join on all currently executing dataset threads is automatically performed when the program exits, before any object clean-up tasks are run. An application where dataset threads are stuck and do not reach their thread::wait cancellation points cannot be cleanly exited.
Duplicating datasets does not duplicate any associated threads.
The presence of threads on a dataset has consequences for the behavior of the dataset wait and dataset pop commands, as well as object insertion commands associated with other major object classes (e.g. ens move, or molfile read). Please refer to the respective paragraphs for details. The size control mechanism of datasets in the auto mode is also dependent on the presence or absence of linked dataset threads.
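dataset addthread $dh 1 [dict create %T $th] {
    # Process ensembles until the feeder signals end-of-data
    while {1} {
        set eh [dataset pop %D]
        if {$eh==""} break
        # Discard ensembles for which a computation fails
        if {[catch {ens get $eh E_CANONIC_TAUTOMER} eh_canonic]} {
            ens delete $eh
            continue
        }
        if {[catch {ens get $eh_canonic E_DESCRIPTORS}]} {
            ens delete $eh
            continue
        }
        # The table handle token is substituted before the script runs
        table addens %T $eh_canonic
        ens delete $eh
    }
}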
This code creates a processing thread on the dataset which computes properties on newly arriving ensembles, stores the data in a table (note the table handle substitution via the replacement dictionary) and then deletes the ensemble. The
dataset pop
command returns an empty string when it is known no more data will arrive, and otherwise blocks until an object for popping is available. This is managed by setting the
eod
dataset attribute from feeder threads.
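The blocking behavior of dataset pop together with the eod attribute follows a classic producer/consumer pattern. The following pure-Python model sketches that pattern with the standard threading module. ToyDataset, set_eod and worker are hypothetical names, and popping from the front of the object list is an assumption of the sketch. It does not use or reproduce the toolkit API.

```python
import threading
from collections import deque


class ToyDataset:
    """Hypothetical model of a dataset used as a work queue."""

    def __init__(self):
        self._items = deque()
        self._eod = False
        self._cond = threading.Condition()

    def add(self, obj):
        with self._cond:
            self._items.append(obj)
            self._cond.notify()

    def set_eod(self):
        # Feeder signals that no more data will arrive.
        with self._cond:
            self._eod = True
            self._cond.notify_all()

    def pop(self):
        # Block until an object is available; return "" once the
        # dataset is empty and end-of-data has been flagged.
        with self._cond:
            while not self._items and not self._eod:
                self._cond.wait()
            return self._items.popleft() if self._items else ""


def worker(dataset, results):
    while True:
        item = dataset.pop()
        if item == "":
            break
        results.append(item.upper())  # stand-in for property computation


ds, results = ToyDataset(), []
t = threading.Thread(target=worker, args=(ds, results))
t.start()
for name in ("ens0", "ens1", "ens2"):
    ds.add(name)
ds.set_eod()   # without this, the worker would block forever in pop()
t.join()
print(results)
```

The final join only returns because the feeder sets the end-of-data flag. This corresponds to the requirement that dataset threads must reach their idle point before they can be joined.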
The return value of the command is a list of the
Tcl
thread IDs of the newly created threads. These are suitable for use in the
dataset jointhreads
command or any standard
Tcl
thread package command.
dataset append dhandle ?property value?...
d.append({?property:value,?...})
d.append(?property,value,?...)
Standard data manipulation command for appending property data. It is explained in more detail in the section about setting property data.
The command returns the first data value.
dataset append $dhandle D_NAME "_new"
dataset append $dhandle eod 1
dataset assign dhandle srcproperty dstproperty
d.assign(srcproperty=,dstproperty=)
Assign property data to another property on the same object. Both properties must be associated with the same object class. This process is more efficient than going through a pair of dataset get/dataset set commands, because in most cases no string or Tcl/Python script object representations of the property data need to be created.
Both source and destination properties may be addressed with field specifications. A data conversion path must exist between the data types of the involved properties. If any data conversion fails, the command fails. For example, it is possible to assign a string property to a numeric property - but only if all property values can be successfully converted to that numeric type. The reverse example case always succeeds, out-of-memory errors and similar global events excluded.
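The all-or-nothing conversion rule can be modelled in plain Python. This is a sketch of the principle only: the function assign below is a hypothetical stand-in that works on ordinary lists, and Python's float and str conversions merely play the role of the toolkit's data type conversion paths.

```python
def assign(values, convert):
    """Convert every value or fail without producing a partial result."""
    converted = []
    for value in values:
        try:
            converted.append(convert(value))
        except (TypeError, ValueError) as exc:
            raise ValueError("no conversion for %r" % (value,)) from exc
    return converted


# String to numeric: succeeds only if every value converts.
numeric = assign(["1.5", "2", "3e2"], float)

# Numeric to string: always succeeds.
text = assign([1.5, 2.0, 300.0], str)

# A single unconvertible value makes the whole operation fail.
try:
    assign(["1.5", "n/a"], float)
    failed = False
except ValueError:
    failed = True

print(numeric, text, failed)
```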
The original property data remains valid. The command variant
dataset rename
directly exchanges the property name without any data duplication or conversion, if that is possible. In any case, the original property data is no longer present after the execution of this command variant.
If the properties are not associated with datasets (prefix D_), the operation is performed on all dataset member objects.
The command returns the object handle for Tcl, or object reference for Python.
dataset assign $dhandle A_XY A_XY%
This code snippet creates backup atomic 2D layout coordinates on all dataset ensembles or reactions.
dataset biologics dhandle ?filterset? ?filtermode? ?recursive?
d.biologics(?filters=?,?mode=?,?recursive=?)
Return a list of all the handles or references of the biologics in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.
If the recursive flag is set, and the dataset contains other datasets as objects, biologics in these nested datasets are also listed.
set n [dataset biologics $dhandle {} count]
dataset cancelthreads ?all?
dataset cancelthreads dhandle ?all?
dataset cancelthreads dhandle threadid...
Dataset.Cancelthreads()
d.cancelthreads("all")
d.cancelthreads()
d.cancelthreads(?threadid?,...)
Cancel (or more precisely, wait for and join) one or more threads associated with the dataset. Dataset threads can only be canceled when they are idle, executing the implicitly added
thread::wait
command at the end of their script. Therefore, this command is not just used for clean-up, but also useful for ascertaining that the threads have finished their tasks. The IDs of the threads associated with a dataset can be retrieved as the threads dataset attribute, or saved from the return value of the original
dataset addthread
command. The special all thread ID value can be used to cancel all threads of the dataset. This can also be achieved by setting an empty thread ID parameter, or omitting it altogether. If a dataset does not possess threads, this command does nothing. If a thread marked for cancellation has not yet finished, the cancellation command is suspended until it has.
This command can also be invoked without specifying an explicit or transient dataset argument, or passing it as all. In that case, the thread join cleanup is run on all threads of all currently defined datasets. This function is also implicitly run when a script exits, before performing other application cleanup operations.
Thread cancellation for all dataset threads is implicitly invoked when a dataset is deleted, so an explicit clean-up is not required. However, this also means that a dataset deletion blocks if there are still active threads. It is not possible to forcefully cancel a thread which has entered an infinite loop, so careful programming is required.
The command returns the number of canceled threads.
dataset jointhreads
is an alias to this command.
dataset jointhreads $dh
dataset cancelthreads $dh [lindex [dataset get $dh threads] 0]
dataset jointhreads
The first example waits for all threads on the specified dataset to finish. The second command waits for the completion of one specific thread, and the last command waits for all threads on all currently defined datasets.
dataset cast dhandle dataset/ens/reaction/table ?propertylist?
d.cast(objectclass=,?properties=?)
Transform the dataset into a different object. Depending on the target object class, the result is as follows:
If the optional property list is specified, an attempt is made to compute the listed properties before the cast operation, so that they may become a part of the new object. No error is raised if a computation fails.
The command returns the handle (reference for Python) of the new object, or the input object in case of mode dataset.
dataset clear dhandle
d.clear()
Delete all objects in the dataset, but keep the dataset object. The return value is the number of deleted objects.
dataset count dhandle|remotehandle ?filterlist?
d.count(?filters=?)
Dataset.Count(dataset=,?filters=?)
Get the number of objects in the dataset. If the filter parameter is specified, only those objects which pass the filter are counted.
dataset count $dhandle astereogenic
counts the number of ensembles or reactions in the dataset with one or more potential atom stereo centers.
dataset size
is an alias to this command.
This command can be used with remote datasets. In the case of Python, this requires the use of the class method.
In case a simple count on a local dataset is required, without any filters, the dataset size can also be queried as attribute, as in
set n [dataset get $dhandle size]
dataset create ?objecthandle/objectlist?...
Dataset(?objectref/objectsequence?,...)
Dataset.Create(?objectref/objectsequence?,...)
This command creates a new dataset and returns the handle of the new dataset. If the optional object handle lists are provided as arguments, the specified objects (in case of ensemble, reaction, network or table handles), or elements of the object (for a dataset handle, with default accept flags) are moved to the new dataset. In case the accept flags of the target dataset are configured to allow datasets as primary dataset objects, the source dataset argument is not implicitly replaced by its content objects but added as a single object, retaining its objects as content. Otherwise, the source dataset is emptied but remains a valid object.
Besides handles of ensembles, reactions, networks, tables, molfiles and of other datasets, which are identified with priority, any string which can be decoded in an
ens create
statement is also allowed as member initialization identifier.
If the
dataset create
statement references objects which are not usually accepted by the default settings of the accept dataset attribute, that attribute is automatically adjusted to allow for these objects. The accept flag modification is persistent.
Molfile objects in the object handle list are treated differently from other objects. The latter are directly moved into the dataset. In the case of molfile objects, the file is read from the current position to the end (or until a termination condition configured on the molfile handle is met), and the newly read objects are moved into the dataset.
The command always returns the handle of the new dataset (or a reference for Python), never the handles of any objects which may have been placed into the dataset.
dataset create [list $eh1 $eh2] $dh1
creates a new dataset and moves the two specified ensembles $eh1 and $eh2, as well as everything contained in the dataset $dh1, into the new dataset.
dataset create [molfile open myfile.sdf r hydrogens add]
creates a dataset from the file contents, with hydrogen addition configured on the
molfile
handle.
dataset create VXPBDCBTMSKCKZ
The above command matches a partial InChI key and puts all structures from the NCI resolver whose full InChI key matches in its non-stereo/isotope-specific part into the new dataset.
set ::cactvs(lookupmode) "name_pattern"
dataset create [list "+morphine +methyl"]
This command performs a name pattern lookup and puts all structures from the NCI resolver which contain both name fragments in one of their known names into the dataset. The name pattern string needs to be explicitly packed into a list, because otherwise it would be split into two independent list elements.
dataset dataset dhandle ?filterlist?
d.dataset(?filters=?)
Get the handle (or, for
Python
, a reference) of the container dataset the dataset is a member of. If the dataset is not itself a dataset member, or does not pass the optional filters, an empty string is returned, or
None
for
Python
.
dataset datasets dhandle ?filterset? ?filtermode? ?recursive?
d.datasets(?filters=?,?mode=?,?recursive=?)
Return a list of all the handles or references of the datasets that are members in the dataset identified by the command argument handle. Other objects (ensembles, reactions, tables, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.
If the recursive flag is set, and the dataset contains other datasets as objects, datasets in these nested datasets are also listed.
This command is not equivalent to the dataset dataset command!
set dlist [dataset datasets $dhandle]
dataset defined dhandle property
d.defined(property)
This command checks whether a property is defined for the dataset. This is explained in more detail in the section about property validity checking. Note that this is
not
a check for the presence of property data! The
dataset valid
command is used for this purpose.
dataset delete ?datasethandle/datasethandlelist/all?...
d.delete()
Dataset.Delete("all")
Dataset.Delete(?dref/drefsequence/dhandle?,...)
This command destroys datasets and everything contained therein. The special handle value all may be used to delete all datasets in the application at once.
The command returns the number of datasets which were successfully deleted.
Transient datasets cannot be used with this command. Neither can datasets which are a component of another object, e.g. the internal datasets of tables or factories. These are only and automatically deleted when their parent object is destroyed. Datasets which are a property value are also undeletable by this command.
It is a common programming error to delete a dataset, or its parent object if one exists, without protecting its current member ensembles or reactions. If they are still needed in later processing they need to be explicitly transferred into another dataset or outside of it.
dataset delete all
dataset move $dhandle {}; dataset delete $dhandle
The first example destroys all datasets defined in the current script and everything contained in them. The second example shows how to delete a dataset and preserve its contents by moving all dataset elements out prior to deletion.
dataset dget dhandle propertylist ?filterset? ?parameterdict?
d.dget(property=,?filters=?,?parameters=?)
Dataset.Dget(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the dataset get command. The difference between dataset get and dataset dget is that the latter does not attempt computation of property data, but rather initializes the property values to the default and returns that default if the data is not yet available. For data already present, dataset get and dataset dget are equivalent.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset dup dhandle ?targethandle? ?cleartarget?
d.dup(?target=?,?cleartarget=?)
If the optional arguments are not supplied, the dataset with all data attached to the dataset and all objects which are contained in it are duplicated. The command returns a new dataset handle for
Tcl
, or reference for
Python
. All duplicated objects in the new datasets also are assigned handles which can be obtained by commands such as
dataset list $dhandle
.
It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle or reference.
dataset dup $dhandle
dataset dup [list $eh1 $eh2] $dtarget 0
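The effect of the target and clearance arguments can be sketched with ordinary Python lists. This is a simplified model and not toolkit code: the function dup below is a hypothetical stand-in, datasets are represented as lists of dictionaries, and dataset-level property data is not modelled at all.

```python
import copy


def dup(source, target=None, cleartarget=True):
    """Duplicate all member objects, optionally into an existing target."""
    copies = copy.deepcopy(source)   # duplicates are independent objects
    if target is None:
        return copies                # variant 1: a brand-new dataset
    if cleartarget:                  # default when the flag is omitted
        target.clear()
    target.extend(copies)            # appended to the end of the target
    return target                    # variant 2: the target is returned


src = [{"name": "ens_a"}]
tgt = [{"name": "ens_x"}]

dup(src, tgt, cleartarget=False)
kept = [obj["name"] for obj in tgt]

dup(src, tgt)                        # clearance flag defaults to true
cleared = [obj["name"] for obj in tgt]
print(kept, cleared)
```

The default is easy to overlook: omitting the flag empties the target first, so the existing target content is only preserved when the flag is explicitly passed as false, as in the second Tcl example above.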
dataset ens dhandle ?filterset? ?filtermode? ?recursive?
d.ens(?filters=?,?mode=?,?recursive=?)
Return a list of all the handles or references of the ensembles in the dataset. Other objects (reactions, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.
If the optional boolean recursive argument is set, ensembles which are a component of a reaction in the dataset are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and ensembles in these, as well as ensembles in reactions in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended to the result list in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, ensembles which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.
set elist [dataset ens $dhandle astereogenic]
lists those ensembles in the dataset which have one or more atoms which are potential atom stereo centers.
set cnt [dataset ens $dhandle {} count 1]
returns a count of all ensembles which are either directly members of the dataset, or indirectly as component objects of reactions in the dataset, or which are contained in datasets which are themselves members of the primary dataset.
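The recursive traversal can be modelled with a short Python function. The sketch represents a dataset as a list of hypothetical (kind, payload) tuples and is not based on the toolkit API. The order in which recursively found items appear in the flat result is an assumption of the model.

```python
def list_ens(dataset, recursive=False):
    """Flat list of ensembles; nested containers need recursive=True."""
    found = []
    for kind, payload in dataset:
        if kind == "ens":
            found.append(payload)
        elif recursive and kind == "reaction":
            found.extend(payload)                  # component ensembles
        elif recursive and kind == "dataset":
            found.extend(list_ens(payload, True))  # nested dataset
        # Tables are never searched, regardless of the flag.
    return found


inner = [("ens", "ens3"), ("reaction", ["ens4", "ens5"])]
top = [
    ("ens", "ens0"),
    ("reaction", ["ens1", "ens2"]),
    ("dataset", inner),
    ("table", ["ens9"]),   # ensemble attached to a table row only
]

direct = list_ens(top)
everything = list_ens(top, recursive=True)
print(direct, everything)
```

The result stays a flat list in both cases, and the ensemble that is only referenced from a table row is never reported.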
dataset exists dhandle ?filterlist?
d.exists(?filters=?)
Dataset.Exists(dref,?filters=?)
Check whether a dataset handle or reference is valid. The command returns boolean 0 or 1. Optionally, the dataset may be filtered by a standard filter list, and if it does not pass the filter, it is reported as not valid. This command cannot be used with transient datasets.
dataset exists $dhandle
dataset expr dhandle expression
d.expr(expression)
Compute a standard SQL-style property expression for the dataset. This is explained in detail in the chapter on property expressions.
dataset extract dhandle propertylist ?filterset? ?filterprocs?
d.extract(property=,?filters=?,?filterfunctions=?)
This command is rather complex and closely related to the
dataset xlabel
command. It was designed for the efficient extraction of major or minor object data for filtered subsets of the dataset.
The property list parameter determines the property data which is extracted. Multiple properties may be specified, but they can only be associated with major objects and one arbitrary minor object class. So it is possible to simultaneously extract an ensemble and an atom property, but not an atom and a bond property.
The return value is a nested list of data items for every object which is encountered while traversing the dataset on the level of the minor object associated with the extraction property, or just ensembles or other major objects if no such property is selected. Every list element is itself a list which contains the extracted property values in the order they are named in the property list parameter.
The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures (for Tcl, specified as procedure names) or functions (for Python, specified as function names or function references). These procedures or functions are called with the respective object handles/references and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle or reference and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If it is false or 0, the object is not added to the returned list, and the other check procedures are no longer called.
The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.
Because this command is primarily intended for numerical data display, the returned values are formatted as with the nget command, i.e. instead of enumerated values the underlying numerical values are returned.
set dhandle [dataset create [ens create CO] [ens create CN]]
dataset extract $dhandle [list E_NAME A_SYMBOL] !hydrogen
This example first creates a dataset with methanol and methylamine. The second line performs the actual extraction and returns
{CH4O C} {CH4O O} {CH5N C} {CH5N N}
This kind of extracted data is useful for the display of filtered property values of atoms and other minor objects.
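The row structure of the result and the short-circuit evaluation of the filter callbacks can be reproduced with a toy model in plain Python. Everything in this sketch is hypothetical: ensembles are plain dictionaries, atom labels are assumed to start at 1, and the standard filter is assumed to be applied before the callbacks.

```python
def extract(ensembles, atom_filter=None, callbacks=()):
    """Return one [ensemble value, atom value] row per accepted atom."""
    rows = []
    for ens in ensembles:
        for label, symbol in enumerate(ens["atoms"], start=1):
            if atom_filter and not atom_filter(symbol):
                continue
            # all() stops at the first callback that returns false,
            # so later callbacks are not called for a rejected atom.
            if all(cb(ens, label) for cb in callbacks):
                rows.append([ens["name"], symbol])
    return rows


# Methanol and methylamine with explicit hydrogens (toy data).
dataset = [
    {"name": "CH4O", "atoms": ["C", "O", "H", "H", "H", "H"]},
    {"name": "CH5N", "atoms": ["C", "N", "H", "H", "H", "H", "H"]},
]

calls = []


def log_call(ens, label):
    """Callback that accepts everything and records its arguments."""
    calls.append((ens["name"], label))
    return True


rows = extract(dataset, atom_filter=lambda s: s != "H", callbacks=[log_call])
print(rows)
```

With the hydrogen-excluding filter, the model yields the same four rows as the Tcl example, one per heavy atom, each carrying the ensemble-level value next to the atom-level value.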
dataset filter dhandle filterset
d.filter(filters)
Check whether the dataset passes a filter list. The return value is boolean 1 for success and 0 for failure. Note that only filters operating on dataset objects are applicable, not any filter for objects contained in the dataset (such as ensembles or reactions).
dataset find dhandle objecthandle
d.find(objectref)
Get the index of the dataset object. If it cannot be found in the dataset, the result is minus one.
dataset forget dhandle ?objectclass?
d.forget(?objectclass=?)
This command is essentially the same as the ens forget (or reaction forget, etc.) command. It is applied to all objects in the dataset.
If the object class is dataset, all dataset-level property data is deleted.
The command returns the dataset handle or reference, or, for Tcl only, an empty string if the dataset was transient.
dataset get dhandle propertylist ?filterset? ?parameterdict?
dataset get dhandle attribute
d.get(property=,?filters=?,?parameters=?)
d.get(attribute)
d[property/attribute]
d.property/attribute
Dataset.Get(items,property=,?filters=?,?parameters=?)
Dataset.Get(items,attribute)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
In addition to retrieving property data, it can also be used to query dataset attributes. The set of supported attributes is detailed in the paragraph on the
dataset set
command.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset get $dhandle {D_NAME D_SIZE}
yields the name and size of the dataset as a list. If the information is not yet available, an attempt is made to compute it. If the computation fails, an error results.
dataset get $dhandle [list E_FORMULA E_WEIGHT]
gives the formula and molecular weight of all dataset ensembles. The result is delivered as a nested list. The first list contains the formulas, the second list contains the weights.
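Because the result is grouped by property rather than by ensemble, it often needs to be regrouped before further processing. The following Python fragment sketches this step on a hand-written list that only mimics the described shape. The values are illustrative placeholders, not output of the toolkit.

```python
# Shape of a result for two properties over three ensembles
# (hand-written placeholder values, not toolkit output).
result = [
    ["CH4O", "CH5N", "C2H6O"],   # first property, one value per ensemble
    [32.04, 31.06, 46.07],       # second property, same ensemble order
]

formulas, weights = result

# Regroup into one (formula, weight) record per ensemble.
records = list(zip(formulas, weights))
print(records)
```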
Currently, it is not possible to use filters with this command (and the other retrieval command variants) which are not operating directly on the dataset object, but on objects lower in the hierarchy such as ensembles or atoms.
For the use of the optional property parameter list argument, refer to the documentation of the
ens get
command.
Variants of the dataset get command are dataset new, dataset dget, dataset jget, dataset jnew, dataset jshow, dataset nget, dataset show, dataset sqldget, dataset sqlget, dataset sqlnew, and dataset sqlshow.
dataset getparam dhandle property ?key? ?default?
d.getparam(property=,?key=?,?default=?)
Retrieve a named computation parameter from valid property data. If the key is not present in the parameter list, an empty string is returned (
None
for
Python
). If the default argument is supplied, that value is returned in case the key is not found.
If the key parameter is omitted, a complete set of the parameters used for computation of the property value is returned in dictionary format.
This command does not attempt to compute property data. If the specified property is not present, an error results.
dataset getparam $dhandle E_GIF format
returns the actual format of the image, which could be GIF, PNG, or various bitmap formats.
dataset hadd dhandle ?filterset? ?flags? ?changeset?
d.hadd(?filters=?,?flags=?,?changeset=?)
Add a standard set of hydrogens to all ensembles and reactions in the dataset. If the filterset parameter is specified, only those atoms which pass the filter set are processed.
Additional operation flags may be activated by setting the flags parameter to a list of flag names, or a numerical value representing the bit-ored values of the selected flags. By default, the flag set is empty, corresponding to the use of an empty string or none as parameter value. These flags are currently supported:
Adding hydrogens with this command is less destructive to the property data set of the ensembles or reactions than adding them with individual
atom create/bond create
commands, because many properties are defined to be indifferent to explicit hydrogen status changes, but are invalidated if the structure is changed in other ways.
If the effects of the hydrogen addition step to the validity of the property data set should not be handled with this standard procedure, it is possible to explicitly generate additional property invalidation events by specifying a list as the optional last parameter, for example a list of atom and bond to trigger both the atom change and bond change events.
The command returns the total number of hydrogens added to all ensembles and reactions in the dataset.
dataset hadd $dhandle
dataset hdup dhandle ?targethandle? ?cleartarget?
d.hdup(?target=?,?cleartarget=?)
If the optional arguments are not supplied, the dataset with all data attached to the dataset and all objects which are contained in it are duplicated with hydrogen addition. The command returns a new dataset handle for
Tcl
, or reference for
Python
. All duplicated objects in the new datasets also are assigned handles which can be obtained by commands such as
dataset list $dhandle
.
It is possible to specify a target dataset as an optional argument. In that case, no new dataset is created, and dataset-level property data on the source dataset is not copied. All objects in the source dataset are duplicated with hydrogen addition and appended to the end of the target dataset. In case the boolean target clearance flag is set, which is also the default if the parameter is omitted, the target dataset is cleared before the new objects from the source dataset are added. In this command variant, the return value of the command is the target dataset handle or reference.
dataset hdup $dhandle
dataset hdup [list $eh1 $eh2] $dtarget 0
dataset hierarchies dhandle ?filterset? ?filtermode? ?recursive?
d.hierarchies(?filters=?,?mode=?,?recursive=?)
Return a list of all the handles or references of the hierarchies in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.
If the recursive flag is set, and the dataset contains other datasets as objects, hierarchies in these nested datasets are also listed.
This is not the same as
dataset hierarchy
- the latter reports the hierarchy the dataset is a member of. This command lists the hierarchies in the dataset.
set n [dataset hierarchies $dhandle {} count]
dataset hierarchy dhandle ?filterlist? ?root?
d.hierarchy(?filters=?,?root=?)
Return the hierarchy handle or reference of the hierarchy the dataset is part of. If the dataset is not member of a hierarchy, or does not pass all of the optional filters, an empty string or
None
for
Python
is returned. By default, the hierarchy object which directly contains the dataset is returned. If the root flag is set, the root hierarchy object is reported instead, which is the same only if the hierarchy has only a single level.
This command is not the same as
dataset hierarchies
, which reports hierarchies in the dataset.
dataset hierarchy $dhandle
dataset hread dhandle ?datasethandle|enshandle? ?#recs|batch|all?
d.hread(?target=?,?limit=?)
This command provides the same functionality as dataset read, but additionally adds a standard set of hydrogen atoms to the read duplicate objects.
The command arguments are explained in the section on
dataset read
.
dataset hstrip dhandle ?flags? ?changeset?
d.hstrip(?flags=?,?changeset=?)
This command removes hydrogens from the dataset ensembles and reactions. By default, all hydrogen atoms in the dataset ensembles or reactions are removed.
The flags parameter can be used to make the operation more selective. It may be a list of the following flags:
If the flags parameter is an empty string, or none, it is ignored. The default flag value is wedgetransfer, but this default is overridden as soon as any flags are set.
If the changeset parameter is given, all property change events listed in the parameter are triggered.
Hydrogen stripping is not as disruptive to the ensemble or reaction data content as normal atom deletion. The system assumes that this operation is done as part of some file output or visualization preparation. However, if any new data is computed after stripping, the computation functions see the stripped structure, and proceed to work on that reduced structure without knowledge that there are implicit hydrogens.
dataset hstrip $dhandle [list keeporiginal wedgetransfer]
dataset index dhandle
dataset index dhandle position
d.index(?position=?)
This command comes in two variants. The three-word version is the generic command to check dataset membership, which is the same for all objects which can be dataset members. The second version is specific to dataset objects and retrieves object references from this dataset.
The first version gets the position of the dataset in the object list of its parent dataset. If the dataset is not part of a parent dataset, -1 is returned. This is the generic dataset membership test command variant.
The second command variant obtains the object handle or reference of the object at the specified position in this dataset. Position counting begins with zero. If the index is outside the object position range, an empty string is returned. The special value end may be used to address the last object. The indexed object remains in the dataset.
Note that this
index
command is not equivalent to the standard
index
command on minor objects which is used to obtain the position of the minor object in the minor object list of the controlling major object. This kind of functionality is not needed for major objects, because they are not contained in any minor object list.
dataset index $dhandle end
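The position rules of the second variant can be modeled in a few lines of plain Python. This is only a sketch of the semantics described above, not toolkit code: the function name index_lookup, the handle strings, and the use of a Python list as a stand-in for the dataset object list are assumptions made for illustration.

```python
def index_lookup(members, position):
    """Model of the position variant: return the member at a zero-based
    position, the last member for "end", or "" when out of range."""
    if position == "end":
        return members[-1] if members else ""
    if 0 <= position < len(members):
        return members[position]
    return ""  # outside the object position range: empty result, no error


members = ["ens0", "ens1", "reaction0"]  # hypothetical handles
print(index_lookup(members, 0))
print(index_lookup(members, "end"))
print(repr(index_lookup(members, 7)))
```

The list is left unchanged by the lookup, mirroring the statement that the indexed object remains in the dataset.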
dataset intersect dhandle1 dhandle2 ?property?...
d.intersect(dref2,?pref?...)
Perform an intersection check between two datasets. The result is a list of zero-based dataset index pairs (as in
dataset index
) of all identical corresponding dataset entries in both datasets, as judged by the value of the comparison property. The default comparison property is
E_ISOTOPE_STEREO_HASH
for full structural identity check of ensembles.
If the first dataset contains duplicates, the index of the matching second dataset element is identical for all of them. If the second dataset also contains corresponding duplicates, the reported match is a (pseudo-)random element from among these duplicates, and the other duplicates in the second dataset are reported as not matched in the
dataset intersect3
command variant (see below).
The comparison property object class must match the class of the compared dataset objects (i.e. the default property is only suitable for comparison of ensembles in the datasets, but not for reactions, etc.). Objects of mismatching classes in the datasets are ignored.
set dh1 [dataset create CC CCC CCCC]
set dh2 [dataset create CCC CCCC CCCCC]
dataset intersect $dh1 $dh2
The result is
{1 0} {2 1}
, meaning the second (if we start counting with 1) element of the first dataset corresponds to the first element in the second, and the third element to the second.
dataset intersect3 dhandle1 dhandle2 ?property?...
d.intersect3(dref2,?pref?...)
This command is an extended variant of
dataset intersect
. The return value is a 3-element list consisting of a simple list of the element indices in the first dataset which are not matched, the list of match pairs of the equivalent elements in the same format as returned by the simpler command, and a simple list containing the element indices of the second dataset which are not matched.
set dh1 [dataset create CC CCC CCCC]
set dh2 [dataset create CCC CCCC CCCCC]
dataset intersect3 $dh1 $dh2
The result is
0 {{1 0} {2 1}} 2
. The middle element of the result list is the same as in the example for the
dataset intersect
command. The first element indicates that the first (starting the count with 1) element of the first dataset was not matched, and the third element indicates that the third element of the second dataset was not matched.
dataset jget dhandle propertylist ?filterset? ?parameterdict?
d.jget(property=,?filters=?,?parameters=?)
Dataset.Jget(items,property=,?filters=?,?parameters=?)
This is a variant of
dataset get
which returns the result data as a
JSON
formatted string instead of
Tcl
or
Python
interpreter objects. The command is usable only for property data, not attribute retrieval.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset jointhreads ?all?
dataset jointhreads dhandle ?all?
dataset jointhreads dhandle threadid...
Dataset.Jointhreads()
d.jointhreads("all")
d.jointhreads()
d.jointhreads(?threadid?,...)
This is an alias for the
dataset
cancelthreads
command. Please refer to its documentation.
dataset jnew dhandle propertylist ?filterset? ?parameterdict?
d.jnew(property=,?filters=?,?parameters=?)
Dataset.Jnew(items,property=,?filters=?,?parameters=?)
This is a variant of
dataset new
which returns the result data as a
JSON
formatted string instead of
Tcl
or
Python
interpreter objects.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset jshow dhandle propertylist ?filterset? ?parameterdict?
d.jshow(property=,?filters=?,?parameters=?)
Dataset.Jshow(items,property=,?filters=?,?parameters=?)
This is a variant of
dataset show
which returns the result data as a
JSON
formatted string instead of
Tcl
or
Python
interpreter objects.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset ldup ?dhandlelist?...
Dataset.Ldup(?dref/drefsequence?,...)
Duplicate all datasets in the handle list(s) in default mode.
The return value is a single list (even if multiple source lists are used) of the duplicated dataset handles or references. If an argument list element is an empty string (or
None
for
Python
), it indicates a missing object, and the output list also receives an empty string element (for
Tcl
) or
None
(for
Python
) at its position, without raising an error.
dataset lhdup ?dhandlelist?...
Dataset.Lhdup(?dref/drefsequence?,...)
Duplicate all datasets in the handle list(s) in default mode, and add hydrogens.
The return value is a single list (even if multiple source lists are used) of the duplicated dataset handles or references. If an argument list element is an empty string (or
None
for
Python
), it indicates a missing object, and the output list also receives an empty string element (for
Tcl
) or
None
(for
Python
) at its position, without raising an error.
dataset list ?dhandle?
Dataset.List(?filters=?)
d.list()
Without a handle argument (for Tcl ), or called as the class method (for Python ) the command returns a list of the handles of all existing datasets.
If (in Tcl ) a dataset handle or transient dataset is passed as third argument, or the object method is used (for Python ) the command returns a list of all major objects in the dataset. In the Tcl case, this function is different from the behavior of the list subcommand for other major object classes, where the optional argument is a filter list. In Python , the filter list variant is supported.
dataset list
dataset list $dhandle
dataset lock dhandle propertylist/dataset/all ?compute?
d.lock(property=,?compute=?)
Lock property data of the dataset handle, meaning that it is no longer subject to the standard data consistency manager control. The data consistency manager deletes specific property data if anything is done to the dataset handle which would invalidate the information. Property data remains locked until it is explicitly unlocked.
The property data to lock can be selected by providing a list of the following identifiers:
A lock can be released by a
dataset unlock
command.
This command does not recurse into the objects contained in the dataset.
The return value is the dataset handle (for Tcl ) or reference (for Python ) or, if the dataset was transient, an empty string (for Tcl only).
dataset loop dhandle objvar ?maxrec? ?offset? body
d.loop(function=,?maxloop=?,?offset=?,?variable=?)
for obj in d:
Loop over the elements in a dataset. This command is similar to
molfile loop
. On each iteration, the variable is set to the handle of the current member object, and then the body code is executed. The variable refers to the original dataset element, not a duplicate. This is different from
dataset read.
All operations on the current loop item are allowed, including deletion. However, the next object after the current item must not be deleted or moved, because it is needed for the iteration process.
If a maximum record count is set, the loop terminates after the specified number of iterations. If the maximum record argument is set to an empty string, a negative value, or all , the loop covers all dataset elements. This is also the default.
For
Tcl
scripts, within the loop, the standard
Tcl
break
and
continue
commands work as expected. If the body script generates an error, the loop is exited.
If no offset is specified, the loop starts at the first element. Within the loop body, the dataset attribute
record
is continuously updated to indicate the current loop position. Its value starts with one, like file records in the
molfile loop
command.
The Python version of the loop method intentionally has a different argument sequence for convenience. The function argument may either be a multi-line string (similar to the Tcl construct), or a function reference. Functions are called with the reference of the current loop object as single argument, and have their own context frame, so that the specification of a reference variable is not generally useful in that call style, though it is allowed. For string function blocks the code is executed in the local call frame, and the variable with the current object reference is visible locally. Script code blocks must be written with an initial indentation level of zero. Within the Python functions, the normal break and continue commands cannot be used due to scope limitations. Instead, the custom exceptions BreakLoop and ContinueLoop can be raised. These are automatically caught and processed in the loop body handler code.
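The exception-based control flow can be illustrated with a self-contained model. The following is a sketch only: the two exception classes are local stand-ins that merely share their names with the toolkit exceptions, and run_loop is a hypothetical driver that imitates how a loop handler could catch them. It does not call the toolkit.

```python
class BreakLoop(Exception):
    """Stand-in for the toolkit exception of the same name."""


class ContinueLoop(Exception):
    """Stand-in for the toolkit exception of the same name."""


def run_loop(items, function, maxloop=None, offset=0):
    """Call function(item) for each item; catch the two control exceptions."""
    visited = 0
    for item in items[offset:]:
        if maxloop is not None and visited >= maxloop:
            break
        visited += 1
        try:
            function(item)
        except ContinueLoop:
            continue  # skip the rest of the body for this item
        except BreakLoop:
            break     # leave the loop early
    return visited


seen = []


def body(item):
    # A function body cannot use break/continue for the outer loop,
    # so it raises the control exceptions instead.
    if item == "ens1":
        raise ContinueLoop
    if item == "ens3":
        raise BreakLoop
    seen.append(item)


count = run_loop(["ens0", "ens1", "ens2", "ens3", "ens4"], body)
print(seen, count)
```

In this model the body is entered four times, two items are recorded, and the last item is never visited because the loop was left early.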
In
Python
, there is also an object iterator so that simple loops over dataset elements can be written with a
for
statement. The dataset object iterator is of the
self
style (i.e. there is one per dataset, these are not independent objects), so nesting them is not possible on the same dataset.
Python object loop constructs and their peculiarities are discussed in more detail in the general chapter on Python scripting.
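Why a self-style iterator cannot be nested on the same object is easiest to see in a plain Python model. The class below is a hypothetical stand-in, not the toolkit dataset class, and it assumes that starting a new loop resets the shared cursor; the toolkit may behave differently in detail.

```python
class SelfIterating:
    """Container whose iterator is the container itself (one shared cursor)."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        self._pos = 0  # assumption: starting a loop resets the shared cursor
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item


d = SelfIterating(["ens0", "ens1", "ens2"])
# The inner loop consumes the shared cursor, so the outer loop ends early.
pairs = [(a, b) for a in d for b in d]
print(len(pairs))

# Taking a snapshot first gives independent iteration state.
snapshot = list(d)
all_pairs = [(a, b) for a in snapshot for b in snapshot]
print(len(all_pairs))
```

In the model, the nested loop yields three pairs instead of nine, while looping over a list snapshot taken beforehand yields all nine.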
dataset loop $dh eh {
puts "[ens get $eh E_NAME] at position [ens index $eh]"
}
dataset match dhandle ss_ehandle ?matchflags? ?ignoreflags?
d.match(substructure=,?matchflags=?,?ignoreflags=?)
Perform a substructure match on all eligible objects in the dataset. The return value is the match count.
The arguments are the same as with
ens match
. The specification of variables to capture match locations is not possible in this command variant.
dataset max dhandle propertylist ?filterset?
d.max(property=,?filters=?)
Get the maximum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset members, or minor objects thereof. If the filterset argument is specified, the maximum value is searched only for objects which pass the filter set.
dataset max $dhandle E_WEIGHT
dataset max [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon
The first example finds the highest molecular weight in the dataset. The second example finds the largest (most positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
dataset metadata dhandle property ?field ?value??
d.metadata(property=,?field=?,?value=?)
Obtain property metadata information, or set it. The handling of property metadata is explained in more detail in its own introductory section. The related commands
dataset setparam
and
dataset getparam
can be used for direct manipulation of specific keys in the computation parameter field. Metadata can only be read from or set on valid property data.
Valid field names are bounds , comment , info , flags , parameters and unit .
array set gifparams [dataset metadata $dhandle D_GIF parameters]
dataset metadata $dhandle D_QUALITY comment "This value looks suspicious to me"
The first line retrieves the computation parameters of the property
D_GIF
as keyword/value pairs. These are read into the array variable
gifparams
, and may subsequently be accessed as
$gifparams(format)
,
$gifparams(height)
, etc. The second example shows how to attach a comment to a property value.
dataset min dhandle propertylist ?filterset?
d.min(property=,?filters=?)
Get the minimum value of one or more properties from the elements in the dataset. The property argument may be any property attached to dataset sub-elements, or minor objects thereof. If the filterset argument is specified, the minimum value is searched only for objects which pass the filter set.
dataset min $dhandle E_WEIGHT
dataset min [list $ehandle1 $ehandle2] A_SIGMA_CHARGE carbon
The first example finds the smallest molecular weight in the dataset. The second example finds the smallest (most negative, or smallest positive) Gasteiger partial charge on any carbon atom in the two argument ensembles, which form a transient dataset.
dataset molfile dhandle ?filterset?
d.molfile(?filters=?)
Return the handle or reference of the molfile object associated with the dataset as backing page file. If no such file object exists, an empty string (for Tcl ) or None (for Python ) is returned.
set fh [dataset molfile $dh]
set fh [dataset get $dh pagefile]
dataset move dhandle datasethandle|remotehandle ?position?
d.move(target=,?position=?)
Move, depending on the acceptance flags of the destination dataset, either the objects in the dataset or transient dataset into another local or remote dataset, or move the dataset itself. If the destination dataset handle is an empty string (or
None
for
Python
), the dataset objects are removed from the original dataset, but not moved into any other dataset. If the destination dataset accepts datasets as members, which is not the default (see the
accept
attribute in the section on
dataset set
) the dataset is directly moved as object. Otherwise, its contained objects are moved, under preservation of the object order from the source dataset, and the source dataset is emptied, but not deleted.
Optionally, a position in the new dataset for the first moved object may be specified. This parameter is either an index (beginning with 0), or end , which is the default. If the contents of a dataset are spliced into another at a specific position, objects after the first element of the source dataset follow as a block.
Another special position value is random or rnd. This value moves the dataset, or the dataset contents, to a random position in the target dataset. Use of this mode with remote datasets is currently not supported.
In case of a transient command dataset the original dataset memberships of the dataset objects are not restored when the command completes.
The return value of the command is the dataset of the ensemble prior to the move operation. It is either a dataset handle/reference, or an empty string (
Tcl
) or
None
(
Python
) if it was not a member of a dataset.
A dataset cannot be moved into itself.
dataset move $dhandle $dhandle2 0
dataset move $dhandle {}
dataset move [ens list] [dataset create]
The first line moves all objects in the source dataset into the first (and following) positions in the destination dataset. The second example removes all elements from the dataset. This is often useful in order to avoid dataset member destruction with the
dataset delete
command. The final example shows how to move a set of ensembles (here: all ensembles currently defined in the application) into a newly created dataset via an intermediate, transient dataset.
dataset move $dhandle vioxx@server55:10001
This command moves all objects in the first dataset to the remote dataset on host server55 , which listens on port 10001 and requires the pass phrase vioxx for access.
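The splice behavior for the default case, where the destination does not accept datasets as members and the contents are moved instead, can be modeled with Python lists. This is a sketch of the described semantics only; move_contents is a hypothetical name, and remote targets, random positions, and the return value of the real command are not modeled.

```python
def move_contents(source, target, position="end"):
    """Move all members of source into target as one block, emptying source."""
    if source is target:
        raise ValueError("a dataset cannot be moved into itself")
    at = len(target) if position == "end" else position
    target[at:at] = source  # splice the block, preserving the source order
    source.clear()          # the source is emptied, but continues to exist
    return target


src = ["ens0", "ens1"]
dst = ["ens7", "ens8"]
move_contents(src, dst, 1)
print(dst, src)
```

After the call, the two moved members sit as a contiguous block starting at index 1 of the destination, and the source list is empty.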
dataset mutex dhandle mode
d.mutex(mode)
During the execution of a script command, the mutex of the major object(s) associated with the command are automatically locked and unlocked, so that the operation of the command is thread-safe. This applies to toolkit builds that support multi-threading, either by allowing multiple parallel script interpreters in separate threads or by supporting helper threads for the acceleration of command execution or background information processing.
Going beyond this automatic per-statement protection, this command locks major objects for a period of time that exceeds a single command. A lock on the object can only be released from the same interpreter thread that set the lock. Any other threaded interpreters, or auxiliary threads, block until a mutex release command has been executed when accessing a locked command object. This command supports the following modes:
There is no trylock command variant because the command already needs to be able to acquire a transient object mutex lock for its execution.
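The rule that only the locking thread may release a lock has a close analogue in the Python standard library, which can serve as an illustration. The sketch below uses threading.RLock as a stand-in for the mutex of a single major object; it does not call the toolkit, and the toolkit's own mode names are deliberately not used.

```python
import threading

lock = threading.RLock()  # stand-in for the mutex of one major object
errors = []


def foreign_release():
    # This thread does not own the lock, so the release attempt fails.
    try:
        lock.release()
    except RuntimeError as exc:
        errors.append(str(exc))


lock.acquire()  # held across several commands in the owning thread
worker = threading.Thread(target=foreign_release)
worker.start()
worker.join()
lock.release()  # only the owning thread can release the lock
print(len(errors))
```

One difference from the text above should be kept in mind: in this model the foreign thread fails with an exception when it tries to release, whereas threads that merely access a locked toolkit object are described as blocking until the lock is released.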
dataset need dhandle propertylist ?mode? ?parameterdict?
d.need(property=,?mode=?,?parameters=?)
Standard command for the computation of property data, without immediate retrieval of results. In the common case of threaded computation, this starts a compute thread whose results or error status can be collected later. This command is explained in more detail in the section about retrieving property data.
If the dataset is not transient, the return value is the original dataset handle or reference.
dataset need $dhandle D_GIF recalc
dataset networks dhandle ?filterset? ?filtermode? ?recursive?
d.networks(?filters=?,?mode=?,?recursive=?)
Return a list of the handles or references of all the networks in the dataset. Other objects (ensembles, reactions, datasets, tables) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.
If the recursive flag is set, and the dataset contains other datasets as objects, networks in these nested datasets are also listed.
set n [dataset networks $dhandle {} count]
dataset new dhandle propertylist ?filterset? ?parameterdict?
d.new(property=,?filters=?,?parameters=?)
Dataset.New(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset new
is that the latter forces the re-computation of the property data, regardless whether it is present and valid, or not.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset nget dhandle propertylist ?filterset? ?parameterdict?
d.nget(property=,?filters=?,?parameters=?)
Dataset.Nget(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset nget
is that the latter always returns numeric data, even if symbolic names for the values are available.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset nnew dhandle propertylist ?filterset? ?parameterdict?
d.nnew(property=,?filters=?,?parameters=?)
Dataset.Nnew(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data and attributes. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset nnew
is that the latter always returns numeric data, even if symbolic names for the values are available, and that property data re-computation is enforced.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset nitrostyle dhandle style
d.nitrostyle(style=)
Change the internal encoding of nitro groups and similar functional groups in the ensembles and reactions in the dataset. Possible values for the style parameter are:
dataset objects dhandle ?pattern?
d.objects(?pattern=?)
This is a non-standard cross-referencing command. The result is a list of all the objects in the dataset, where each result list element is a list or tuple consisting of the object type (ens, reaction, table, network, dataset), and the object handle or reference. Optionally, the dataset objects may be filtered by the pattern argument which applies to the object handle.
dataset objects $dhandle ens*
dataset ens $dhandle
The first command is similar to the second, except that the latter only lists the ensemble handles, not pairs of object class name and handle.
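The shape of the result and the effect of the pattern can be sketched in plain Python. This is a model, not toolkit code: it assumes glob-style matching on the handle string (as the ens* example suggests), and the handles and the function name objects are illustrative.

```python
from fnmatch import fnmatchcase


def objects(members, pattern="*"):
    """Return (type, handle) pairs whose handle matches the glob pattern."""
    return [(kind, handle) for kind, handle in members
            if fnmatchcase(handle, pattern)]


members = [("ens", "ens0"), ("reaction", "reaction0"),
           ("ens", "ens1"), ("table", "table0")]
typed = objects(members, "ens*")
handles_only = [handle for _, handle in typed]
print(typed)
print(handles_only)
```

The second list corresponds to the plain handle list, while the first keeps the class name alongside each handle.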
dataset pack dhandle ?maxsize? ?requestprops? ?suppressedprops? ?compressionlib?
d.pack(?maxsize=?,?requestprops=?,?suppressedprops=?,?compressionlib=?)
Pack the dataset and all objects it contains into a base-64 encoded, compressed string as a serialized object. The string does not contain any non-printing characters, quotation marks or other problematic characters and is thus well suited for storage in database tables and similar applications. These packed strings are portable and platform-independent.
The maximum size of the object string (default -1, meaning unlimited) can be configured by the optional maxsize parameter. The size is specified in bytes. If the pack string would be longer than the maximum size, an error results.
The two optional parameter lists make it possible to request a specific property set to be part of the package, even if it would normally not be included, and to explicitly omit properties from the dump. No property computation is performed, and suppressed properties are not purged from the source ensemble.
The default compression library is zlib . Other useful variants include lzo and gzip (and there are other internal types), but these may not be available on all builds due to license issues, and you need to specify the compression library when a dataset is unpacked. It is generally recommended to stay with zlib .
The return value of this command is the packed string.
In Python , datasets support the standard pickle / unpickle protocol.
dataset pack $dhandle
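The general idea of a compressed, base-64 encoded string with a size limit can be shown with the Python standard library. This is a generic illustration and an assumption throughout: the actual serialization format of the toolkit is not described here, and the functions pack and unpack below are hypothetical helpers that operate on ordinary Python objects, not on datasets.

```python
import base64
import pickle
import zlib


def pack(obj, maxsize=-1):
    """Serialize, compress with zlib, and base-64 encode an object."""
    raw = pickle.dumps(obj)
    packed = base64.b64encode(zlib.compress(raw)).decode("ascii")
    if maxsize >= 0 and len(packed) > maxsize:
        raise ValueError("packed string exceeds maxsize")
    return packed


def unpack(packed):
    """Reverse the three steps of pack()."""
    return pickle.loads(zlib.decompress(base64.b64decode(packed)))


record = {"name": "demo", "members": ["CC", "CCC", "CCCC"]}
text = pack(record)
print(unpack(text) == record)
```

Because the base-64 alphabet contains only letters, digits, and the characters +, / and =, the resulting string is free of quotation marks and non-printing characters, which is the property that makes such strings convenient for database storage.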
dataset pop dhandle|remotehandle ?position? ?timeout?
d.pop(?position=?,?timeout=?)
Dataset.Pop(dref/remotehandle,?position=?,?timeout=?)
Remove an object from a dataset. The handle or reference of the selected object is returned, and the object is no longer a member of the dataset when the command completes. If a timeout is specified, it is transferred to the dataset attribute of the same name before the command is executed, as with a
dataset set
command.
By default the first object in the dataset, at index zero, is returned. A different object can be selected by means of the optional position argument. It can be a numerical index, end for the last object, or rnd / random for a random selection. If the object index is larger than the maximum index of any object, it is silently rewritten to end . Random pops are not supported on remote datasets.
This command works with remote datasets. In that case, the object is transferred via an intermediate serialized object representation over the network. It is unpacked on the local interpreter, and deleted on the remote interpreter.
If the desired dataset object cannot be found, and a timeout is set, including a negative value for an unlimited wait time, the command suspends execution until the object appears in the dataset, for example from a different script thread or as result of a remote object insertion. If a wait would be executed, but the
eod/targeteod
parameter pair of the dataset indicates that no further data can be expected, the command returns an empty string (for
Tcl
) or
None
(for
Python
) instead of the object handle or reference, but does not trigger an error. Otherwise, if the object cannot be delivered immediately or after the timeout, an error results.
set eh [dataset pop $dhandle end]
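The position handling can be summarized in a short Python model. This is a sketch of the rules stated above on a plain list, under the assumption that an empty dataset without a timeout leads to an error; timeouts, the eod/targeteod handling, and remote datasets are not modeled, and the function name pop is used only for readability.

```python
import random


def pop(members, position=0):
    """Remove and return one member according to the position rules."""
    if not members:
        raise IndexError("no object available")  # no timeout in this model
    if position in ("rnd", "random"):
        position = random.randrange(len(members))
    elif position == "end" or position > len(members) - 1:
        position = len(members) - 1  # too-large index is rewritten to end
    return members.pop(position)


queue = ["ens0", "ens1", "ens2", "ens3"]
first = pop(queue)        # default: index zero
last = pop(queue, "end")  # last object
far = pop(queue, 99)      # silently treated like "end"
print(first, last, far, queue)
```

Each call removes the returned member from the list, in line with the statement that the object is no longer a member of the dataset when the command completes.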
dataset properties dhandle ?pattern? ?intersectionmode?
d.properties(?pattern=?,?intersectionmode=?)
Get a list of valid properties of the dataset proper and the dataset objects. By default, both dataset properties (prefix D_ ) as well as the properties of the objects in the dataset (prefix E_ for ensembles, X_ for reactions, T_ for tables, N_ for networks, D_ for datasets as members) and the properties of their minor objects (atoms, bonds, etc.) are listed. Property subsets may be selected by specifying a string filter pattern. For dataset element properties which are not present in all dataset members, the default intersection mode is union, meaning that every property is reported for which at least a single instance exists in any member. The alternative mode intersect only lists those dataset member properties which are present on all dataset members.
This command may also be invoked as
dataset props
or
d.props()
.
dataset properties $dhandle D_*
dataset props $dhandle E_* intersect
The first example returns a list of the currently valid dataset-level properties. The second example lists ensemble properties which are present in all dataset objects.
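The difference between the two modes amounts to a set union versus a set intersection over the per-member property lists. The following pure-Python model illustrates this; the function name member_properties and the sample property lists are assumptions, and the sorted output order is a choice of the sketch, not a statement about the toolkit.

```python
def member_properties(per_member, mode="union"):
    """Combine per-member property name lists by union or intersection."""
    sets = [set(props) for props in per_member]
    if not sets:
        return []
    if mode == "union":
        combined = set.union(*sets)
    else:
        combined = set.intersection(*sets)
    return sorted(combined)


per_member = [
    ["E_NAME", "E_WEIGHT"],
    ["E_NAME"],
    ["E_NAME", "E_IDENT"],
]
print(member_properties(per_member))
print(member_properties(per_member, "intersect"))
```

In this sample only E_NAME is present on every member, so it is the sole entry in intersect mode, while union mode reports all three names.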
dataset purge dhandle propertylist ?emptyonly?
d.purge(?properties=?,?emptyonly=?)
Delete property data from the dataset. The properties may be both dataset properties (prefix D_ ) or properties of the dataset members, such as ensemble or atom properties. If a property marked for deletion is not present on an object, it is silently ignored.
If an object class name, such as ens or atom , is used instead of a property name, all properties of that class set on the objects in the dataset are deleted, if they are not locked, or filtered out by the optional empty-only flag.
Besides normal property names, a few convenient alias names for common property deletion tasks of ensembles in a dataset, or the reaction ensembles of reactions in the dataset, are defined and can be used as a replacement for the property list. These include:
The optional boolean flag emptyonly restricts the deletion to those properties where all the values for a property associated with a major object (such as on all atoms in an ensemble for atom properties, or just the single ensemble property value for ensemble properties) are set to the default property value.
The return value is the original dataset handle or reference.
dataset purge $dhandle D_GIF
dataset purge [ens list] E_IDENT 1
dataset purge $dhandle stereochemistry
The first example deletes the property data D_GIF for the selected dataset if it is present. The second example deletes property E_IDENT from all ensembles in the current application if their property value is equal to the default value of E_IDENT . The third example removes stereochemistry from all dataset ensembles.
dataset reactions dhandle ?filterset? ?filtermode? ?recursive?
d.reactions(?filters=?,?mode=?,?recursive=?)
Return a list of all the handles or references of the reactions in the dataset. Other objects (ensembles, tables, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the output further modified by a standard filter mode.
If the optional boolean recursive argument is set, reactions of which ensembles in the dataset are a component are also listed. Furthermore, if the dataset contains datasets as elements, these are recursively traversed, and reactions in these, as well as reactions as components of ensembles in these datasets, are listed. If the output mode of the command is a handle list, items found by recursion are appended in a straight fashion, without the creation of nested lists. By default the recursion flag is off. Regardless of the flag value, reactions which are associated with rows of a table in the dataset, but are not themselves dataset members, are not output.
set xlist [dataset reactions $dhandle]
Return a list of the handles of the reactions in the dataset.
set cnt [dataset reactions $dhandle {} count 1]
returns a count of all reactions which are either directly members of the dataset, or indirectly because ensembles in the dataset are part of a reaction, or which are contained in datasets which are themselves members of the primary dataset.
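The flat result of a recursive listing can be modeled with nested Python tuples. This is a simplified sketch: it only covers recursion into nested datasets and leaves out reactions reached via ensembles that are reaction components; the tuple layout (type, handle, children), the handles, and the function name reactions are illustrative assumptions.

```python
def reactions(members, recursive=False):
    """List reaction handles; optionally descend into nested datasets."""
    found = []
    for kind, handle, children in members:
        if kind == "reaction":
            found.append(handle)
        elif kind == "dataset" and recursive:
            # Items found by recursion are appended flat, without nesting.
            found.extend(reactions(children, recursive=True))
    return found


inner = [("reaction", "reaction1", []), ("ens", "ens2", [])]
outer = [("ens", "ens0", []),
         ("reaction", "reaction0", []),
         ("dataset", "dataset1", inner)]
print(reactions(outer))
print(reactions(outer, recursive=True))
```

Without the flag only the direct member is listed; with it, the reaction from the nested dataset is appended to the same flat list.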
dataset read dhandle ?datasethandle/enshandle? ?#recs|batch|all?
d.read(?target=?,?limit=?)
This command returns handles or references of duplicates of one or more objects from the current dataset iterator position (
record
attribute). Its arguments mimic those of the
molfile read
command. The iterator record attribute is automatically incremented. When the end of the dataset is reached, an empty result is returned, but no error is raised.
The return value is usually the handle or reference of the object duplicated from the dataset member at the current read position. If an optional target dataset has been specified, the object is appended to that dataset, and the return value is the target dataset handle. It is also possible to use the magic dataset handles new or #auto , which create a new receptor dataset.
If instead of a target dataset an existing target ensemble is specified, the recipient ensemble is cleared, and the read dataset object placed into its hull without changing its handle. This requires that the read object is an ensemble, and not a reaction, table, dataset or network, and that only a single item is read. It is also possible to use an empty argument to skip these options.
By default, a single object is duplicated and the iterator record attribute of the dataset incremented by one. With the optional third argument, a different number of objects can be selected for reading as a block. The special value all reads all remaining objects, and batch copies a number of objects corresponding to the batchsize dataset attribute. If there are insufficient objects in the dataset to read all requested records, only the available set is returned, and no error results.
The dataset contents are not changed by this command. All extracted items are object duplicates. In order to fetch original objects from the dataset, use the
dataset pop
command, or the various object
move
commands.
The command variant
dataset hread
provides the same functionality as this command, but additionally adds a standard set of hydrogen atoms to the duplicates.
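The interplay of the record attribute, the batch size, and the end-of-dataset behavior can be traced with a small cursor model in Python. It is a sketch of the described semantics, not toolkit code: the class name Cursor is hypothetical, results are always returned as lists, list copies stand in for duplicated objects, and target datasets or ensembles are not modeled.

```python
class Cursor:
    """Model of a dataset read position with a one-based record counter."""

    def __init__(self, members, batchsize=2):
        self.members = list(members)
        self.record = 1  # one-based, like the record attribute
        self.batchsize = batchsize

    def read(self, limit=1):
        start = self.record - 1
        if limit == "all":
            count = len(self.members) - start
        elif limit == "batch":
            count = self.batchsize
        else:
            count = limit
        chunk = self.members[start:start + count]
        self.record += len(chunk)  # advance by what was actually read
        return list(chunk)         # copies; the members stay in place

    def rewind(self):
        self.record = 1


c = Cursor(["ens0", "ens1", "ens2", "ens3", "ens4"])
print(c.read())         # one object
print(c.read("batch"))  # batchsize objects
print(c.read(10))       # fewer available than requested: no error
print(c.read())         # end reached: empty result
```

Calling rewind() on the model sets the record counter back to one, after which a read with the limit all returns every member again.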
Dataset.Ref(identifier)
Python
only method to get a dataset reference from a handle or another identifier. For datasets, other recognized identifiers are dataset references, integers encoding the numeric part of the handle string, the dataset
UUID
or name, or a table handle (which returns the dataset embedded in the table).
dataset remove dhandle ?handle?...
d.remove(?handle/ref?,...)
Remove objects from a dataset. The objects to be removed must be members of the dataset.
If the dataset is not virtual, the command returns the dataset handle or reference.
dataset rename dhandle srcproperty dstproperty
d.rename(srcproperty=,dstproperty=)
This is a variant of the
dataset assign
command. Please refer to the command description in that paragraph.
dataset request dhandle propertylist ?reload? ?modelist?
Request property data for a dataset when the dataset is not maintained locally, but is a partial shadow copy of a remotely managed dataset. It is assumed to have been only partially transferred via RPC to a slave from a master controller application, for example for display purposes, but without the full data content, which resides on the master.
If the requested property data is already present on the slave, and the
reload
flag is not set, this command is equivalent to a
dataset need
command and does not invoke communication with the master. Otherwise, the master is asked to provide the information, which may be calculated on the master only after receiving the request, or even delegated by the master to another remote server for computation.
Once the requested data has been received by the slave, it is added to the property data set of the local dataset copy. The optional
modelist
parameter is the same as in the
dataset need
command. This command is used to guarantee that critical or non-computable property data is obtained from the master. Local, unsynchronized data may still be computed by the slave using standard property data access commands. It is currently not possible to send data back to the master.
This command is only available on toolkit versions which have been compiled with RPC support.
In the absence of errors, the command returns a boolean status code. If it is zero, the request failed in a non-critical way. This for example happens in case the dataset is not under control of a remote application.
if {![dataset request $dhandle A_XY]} {
dataset need $dhandle A_XY
}
is a bulletproof method of guaranteeing that correct atomic 2D display coordinates are present for the dataset structures even if the script is run in a master/slave context.
dataset rewind dhandle
d.rewind()
Reset the dataset iterator record. This is equivalent to setting the record attribute to one.
dataset scan dhandle expression/queryhandle ?mode? ?parameterdict?
d.scan(query=,?resultmode=?,?parameters=?)
Dataset.Scan(items,query=,?resultmode=?,?parameters=?)
Perform a query on the dataset or transient dataset. The syntax of the query expression is the same as that of the
molfile scan
command and explained in more detail in its section on query expressions. Essentially, this command behaves like an in-memory data file version of the
molfile scan
command. However, currently queries work on ensembles and reactions as dataset members only. Any table, network or other object which is a member of a scanned dataset is skipped. Skipped items still count as records for positioning and query result output. In the absence of a specified scan record list (order parameter), dataset scans begin at the current position of the iterator record attribute that is shared with the
dataset read/hread
commands.
The optional parameter dictionary is the same as for
molfile scan
, but not all parameters are actually used. At this time, only the
matchcallback, maxhits, maxscan, order, progresscallback, progresscallbackfrequency, sscheckcallback, startposition
and
target
parameters have an effect. If result ensembles or reactions are transferred to a remote dataset via the
target
parameter, they are not deleted from the local dataset but duplicates are created instead. This is because the original objects are members of the dataset which, just like a structure file would, should remain unchanged as result of a scan. In contrast, in file scans, the transferred ensembles and reactions were read from file and created as new objects during the scan, and sending these does not change the underlying file. In case a progress callback function is used, the dataset handle is passed as argument in place of the
molfile
handle in
molfile scan
.
The return value depends on the mode. The default mode is enslist . The following modes are supported for dataset queries:
In array mode, the command returns a list of the names of the created arrays. For each name, a global Tcl array variable or Python dictionary is created, and for each match, a Tcl array element with an element name equal to the value of the first item specification index and an element value equal to the value of the third item specification is created (or a dictionary entry with key and value for Python). For example, the scan mode specification
{array {E_NAME name2rec} {record rec2name E_NAME}}
results in the creation of two global Tcl arrays or Python dictionaries in the current interpreter, called name2rec and rec2name . The first has array elements (for Python , dictionary keys) where the element name is the name of the matching structure (property E_NAME ), and the value the pseudo-record number (because it is the default). The second array has elements where the record number is the array element name, and the corresponding value the structure name. The return value of the scan statement is the list (tuple for Python ) “name2rec rec2name” , containing the names of the two variables created.
If array or dictionary elements for a specific key already exist, the new value is appended as a list or tuple object. The result registration procedure does not overwrite the existing content. So, for example, in the above case, if there are multiple records with the same structure name, the array element indexed by name would contain a list of records, not just a single record. Since the global arrays or dictionaries are persistent, data is also appended over multiple scan statements. If this is not desired, a statement like
unset -nocomplain $arrayname
should be executed before the scan is started. It is legal to use the same array or dictionary name for the registration of multiple properties. In this case, each match appends a new list element for every reported property, though these lists will not be nested.
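The append-instead-of-overwrite behaviour of the array result mode can be sketched in plain Python. The helper name register and the sample matches are hypothetical stand-ins; the sketch only mirrors the registration rule described above and does not call the toolkit.

```python
def register(arrays, name, key, value):
    """Store one match result without overwriting earlier content."""
    table = arrays.setdefault(name, {})
    if key not in table:
        table[key] = value                 # first hit: plain value
    elif isinstance(table[key], list):
        table[key].append(value)           # later hits: extend the flat list
    else:
        table[key] = [table[key], value]   # second hit: promote to a list


arrays = {}
# (record number, structure name) pairs standing in for scan matches
matches = [(1, "benzene"), (2, "toluene"), (3, "benzene")]
for record, ename in matches:
    register(arrays, "name2rec", ename, record)
    register(arrays, "rec2name", record, ename)
```

Because two records share the name benzene, the name2rec entry for that key becomes a flat list of record numbers, while rec2name keeps one name per record.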
{table {E_NAME name} {E_CAS casno} record}
sets up a table with three columns called name , casno and record . The first two columns contain property data from the matching file records, the last one the record in the file which matched.
Instead of the keyword table , an existing table handle may also be used. In that case, any existing matching table columns are automatically re-used to store result data. Additionally specified properties are added as new columns to the right of the previously existing columns. New table rows generated by matches are appended to the bottom of the table.
If requested property data is not present on the matched dataset objects, an attempt is made to compute it. If this fails, the table object in retrieval mode table contains
NULL
cells, and property retrieval as list data produces empty list elements, but no errors. For minor object properties, the property list retrieval modes produce lists of all object property values instead of a single value. In
table
mode, only the data for the first object is retrieved, which makes this mode less suitable for direct minor object property retrieval.
The following pseudo properties can be retrieved in addition to normal properties:
molfile scan
. It is always an empty string in this command.
match ss
command). These pseudo properties are identical to those available for structure file queries. However, structure file queries support a couple of additional pseudo properties which are not available for dataset queries.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset scan $dhandle {E_WEIGHT < 200} recordlist
dataset scan $dhandle "structure >= c1ccccc1" {table E_NAME E_LOPG record}
dataset scan $dhandle "structure >~ $sshnd 90" {cmpvalue E_REACTION_ROLE X_IDENT}
The first example returns the record numbers (dataset member indices plus one) of all structures in the dataset which have a molecular weight of less than 200.
The second example generates a table with columns for name, logP and record number. The table is filled with data from all structures which contain a phenyl ring as substructure.
The final example returns a nested list of the properties of all dataset structures which have a Tanimoto similarity of 90% or more to the structure which is represented by its handle stored in the variable
$sshnd
. In this example, the ensembles are expected to be also part of a reaction, which is possible since reaction and dataset membership are completely unrelated. Each result list element contains the actual similarity value (which is the only comparison result value with a threshold evaluated in the query, so there is no ambiguity which comparison result
cmpvalue
refers to), the role of the ensemble in the reaction (
reagent
,
product
,
catalyst
, etc.) from property
E_REACTION_ROLE
, and the reaction ID in
X_IDENT
. The scan mode is here automatically set to
propertylist
, because the mode list consists exclusively of names of properties and pseudo properties.
set is_chno [dataset scan $ehandle {formula = C0-H0-N0-O0-} count]
This command checks whether the ensemble (which is, for the duration of the command, embedded into a transient dataset) contains only elements C, H, N and O.
dataset set dhandle ?property value?...
d.set(?property,value?,...)
d.set({property:value...})
d.property = value
d[property] = value
Standard data manipulation command. It is explained in more detail in the section about setting property data.
In addition to property data, the dataset object possesses a few attributes, which can be retrieved with the
get
command (but not by its related sister subcommands like
dget
,
sqlget
, etc.). Many of them are also modifiable via
dataset set.
These attributes are:
ens move
,
dataset add
, etc.) throws an error. If the object added to a dataset is a dataset, but the dataset does not accept datasets as members, the objects contained in the source dataset are added instead.
dataset read
command. The default batch size is 10.
dataset wait
command, or the use of the dataset object as argument to a scripted computation function expecting to be able to set function result data as property values, the dataset is marked as undeletable and any destruction command will silently fail.
dataset pop
and
dataset wait
commands to determine whether they should continue to wait or exit with an empty result. The initial value of this attribute is zero.
dataset wait
command uses this threshold as default command parameter.
Additional insertion control modes are disabled (all insertions into the dataset are blocked), discardfirst (if the maximum size has been reached, delete first object in dataset to make room), discardlast (if the maximum size has been reached, delete last object in dataset to make room), discardobject (if the maximum size has been reached, delete the object to be inserted), discardalways (never attempt an actual insertion, always delete the insertion object), ignore (if insertion cannot be performed, leave the insertion object where it currently is, with preservation of current dataset membership) and unlink (silently remove the insertion object from its old dataset, if it is a member of one, but do not insert it into the target dataset if that would exceed its maximum size).
If the object cannot be inserted and is deleted (but not if it is just unlinked or ignored, and thus continuing to exist) the dataset counter is still incremented.
The final mode is discardrandom. In this mode, if the maximum size of the dataset has not yet been reached, the object is simply added. Otherwise, a random number between one and the counter attribute of the dataset is computed. If the number is larger than the maximum dataset size, the object to be inserted is deleted, as in the discardnew mode. If the random number is between one and the dataset size, the object in the dataset at the random position is deleted. After that, the new object is inserted at its designated position, which is not necessarily the position of the removed object. This mode is intended to support convenient sampling of object subsets. The random procedure yields the same mathematical results as directly picking random objects from the total object pool passing through the dataset, but may be interrupted at any time, yielding a random subset of the objects processed so far.
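The discardrandom procedure corresponds to what is commonly called reservoir sampling. The following plain Python model illustrates it under stated assumptions: the function name insert_discardrandom is hypothetical, the counter is assumed to be incremented before the random number is drawn, and the designated insertion position is taken to be the end of the list.

```python
import random


def insert_discardrandom(members, counter, obj, maxsize, rng):
    """Model one insertion in discardrandom mode; returns the new counter."""
    counter += 1                           # assumption: counts every object seen
    if len(members) < maxsize:
        members.append(obj)                # room left: plain insertion
        return counter
    pick = rng.randint(1, counter)         # random number in 1..counter
    if pick > maxsize:
        return counter                     # the incoming object is discarded
    del members[pick - 1]                  # evict the member at the random slot
    members.append(obj)                    # designated position: here, the end
    return counter


rng = random.Random(42)                    # fixed seed for repeatable runs
sample, counter = [], 0
for obj in range(1000):
    counter = insert_discardrandom(sample, counter, obj, 10, rng)
```

Stopping the loop early still leaves a valid random subset of the objects processed up to that point, which is the property the mode is designed for.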
dataset dataset
command), which can be changed (see
dataset move
command and the
accept
dataset attribute). This attribute is read-only. An embedded dataset object cannot be dissociated from its owner.
::cactvs(object_scope)
is also set, the object is visible only in the
Tcl
interpreter which set the scope flag and thus claimed it. Object list commands executed in other interpreters omit this object, and attempts to decode its handle in other interpreters will fail. The most common use of this feature is the hiding of persistent chemistry objects in scripted property computation functions.
On setting,
dataset set
first clears all dataset object selections. The command
dataset append
retains it. The argument is then parsed as a list of integer object indices, and the selection flag is set for all those indices where objects can be found in the dataset. Indices outside the range between zero and the dataset size minus one or duplicate index specifications are silently ignored.
To check or set the selection status of the dataset object proper, use the selected attribute.
dataset count
command without any filters.
ens swapout
or
reaction swapout
. The size check is performed at the moment new objects are added, and these new objects are the first to be swapped. The default value for this attribute can be set in the control array element ::cactvs(dataset_swap_threshold). Its initial value is 10000. The default value for the embedded datasets in tables is controlled separately by ::cactvs(table_swap_threshold), which is also initially set to 10000. If this value is set to a negative value, all dataset elements which are currently swapped out are loaded back in. If it is set to a positive value, and the number of not currently swapped out objects of the dataset is more than the new limit, excess objects are swapped beginning from the end of the dataset queue until the in-memory object count of the dataset satisfies the new constraint. If the limit is increased, but not set to a negative unlimited value, the object swap status is not modified.
dataset addthread
command). Datasets without threads return an empty list. The handles are compatible with the standard
Tcl
thread package. Remote communication listener threads (see port attribute) are independent of
Tcl
support, do not have a
Tcl
handle, and are not listed by this command.
dataset wait
command. A negative value means an infinite wait period, and zero no wait period. The default setting is minus one.
dataset set $dhandle D_NAME "New lead structures"
dataset set $dhandle E_NAME "Lead (metal)"
The first line is a simple set operation for a dataset property. The second line shows how to set properties of multiple ensembles in one step. The same property value is assigned to all ensembles.
dataset set $dhandle port 10001 passphrase blockbuster
Set up a listener thread on port 10001 which accepts connections from remote interpreters which need to present the pass phrase as credential. Remote interpreters can add (
ens move
,
reaction move
,
table move
) or remove (
dataset pop
) objects to or from this dataset, as well as query the dataset object count (
dataset count
). Objects are transferred over the network connection as serialized objects to and from the remote interpreters.
dataset setparam dhandle property ?key value?...
dataset setparam dhandle property dictionary
d.setparam(property,?key,value?...)
d.setparam(property,dict)
Set or update a property computation parameter in the parameter list of a valid property. This command is described in the section about retrieving property data.
The return value is the updated property computation parameter dictionary.
dataset setparam $dhandle D_GIF comment "Top Secret"
dataset show dhandle propertylist ?filterset? ?parameterdict?
d.show(property=,?filters=?,?parameters=?)
Dataset.Show(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset show
is that the latter does not attempt computation of property data, but raises an error if the data is not present and valid. For data already present,
dataset get
and
dataset show
are equivalent.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset sort dhandle {property ?direction ?cmpflags ?cmpvalue???}...
d.sort((property,?direction,?cmpflags,?cmpvalue???),...)
Sort a dataset according to property values of the objects in the dataset. If no sort property set is specified, the default sort properties are E_NATOMS (number of atoms) and, for breaking ties, E_WEIGHT (molecular weight) and finally E_HASHISY (stereo isotope hash code).
Every sort item is interpreted as a nested list/tuple and can have from one to four elements. The first, mandatory element is the sort property, or one of the magic names
record
(or #record) or
random
(#random). The next optional element is the sort direction, specified as
up
(or
ascending
) or
down
(
descending
). The default sorting order is ascending. The final optional comparison flags parameter can be set to a combination of any of the values allowed with the
prop compare
command. The default is an empty flag set. Properties in the sort list have precedence in the order they are specified in. Object property values of comparison list entries to the right in this list are only considered if the comparison of all data values of list elements to the left results in a tie.
If a comparison value is supplied as fourth argument, the sort utilizes the comparison results of dataset object property values against this value for ranking, not the direct comparison result between the dataset object property values. This is for example useful when sorting according to a bitvector similarity value to an external structure.
The magic property name record sorts by the object index in the dataset. Sorting upwards on this property does not change the object sequence in the dataset, and sorting downwards reverses it. This pseudo property is always added as a final implicit criterion, so that the sequence order of objects tied in all explicit comparisons is preserved. The other magic property name random assigns a random value to all dataset objects and sorts on this value, yielding a random object sequence.
The command returns a list of the handles of the objects controlled by the dataset in the newly sorted order. Simultaneously, the objects are physically moved within the dataset, so the sort has a persistent effect. The same result list may later be obtained by a
dataset objects
command.
It is possible to sort transient datasets, but this makes sense only if the object list sequence returned as command result is captured and used later, because the sort effect is not persistent since there exists no permanent dataset object.
dataset sort $dhandle {E_NAME up {ignorecase lazy}}
The example sorts the dataset according to the compound name (property E_NAME , data type string) in alphabetic order, using a lazy (ignoring whitespace and punctuation) and case-insensitive comparison mode.
dataset sort $dhandle {E_NATOMS down} {E_NRINGS up}
Sort the dataset in such a way that the ensembles with the largest number of atoms, and among these those with the smallest number of rings, come first.
dataset sort $dhandle random
This command randomizes the object order in the dataset.
dataset sort $dhandle {*}$sortlist
This is the recommended construct when using a sort property list stored in a
Tcl
variable as command argument. Older versions of the
dataset sort
command used a single sort argument parameter instead of a variable-size argument set.
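The precedence and tie-breaking rules of the sort can be modelled in plain Python. This is a sketch, not toolkit code: sort_members is a hypothetical name, dataset members are represented as dictionaries of property values, and comparison flags and comparison values are left out.

```python
def sort_members(members, *criteria):
    """Return members ordered by (property, direction) criteria.

    Each member is a dict of property values. Criteria further left
    take precedence; ties fall through to the original position.
    """
    ordered = list(members)
    # Python's sort is stable, so applying the criteria from right to
    # left reproduces left-to-right precedence and keeps the original
    # sequence for full ties (the implicit final "record" criterion).
    for prop, direction in reversed(criteria):
        ordered.sort(key=lambda m: m[prop], reverse=(direction == "down"))
    return ordered


members = [
    {"name": "a", "E_NATOMS": 6, "E_NRINGS": 1},
    {"name": "b", "E_NATOMS": 12, "E_NRINGS": 2},
    {"name": "c", "E_NATOMS": 12, "E_NRINGS": 1},
    {"name": "d", "E_NATOMS": 6, "E_NRINGS": 1},
]
result = sort_members(members, ("E_NATOMS", "down"), ("E_NRINGS", "up"))
names = [m["name"] for m in result]
```

Members a and d are tied on both explicit criteria, so they keep their original relative order, matching the role of the implicit record criterion.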
dataset sqldget dhandle propertylist ?filterset? ?parameterdict?
d.sqldget(property=,?filters=?,?parameters=?)
Dataset.Sqldget(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The differences between
dataset get
and
dataset sqldget
are that the latter does not attempt computation of property data, but initializes the property value to the default and returns that default, if the data is not present and valid; and that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
or
Python
script processing.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset sqlget dhandle propertylist ?filterset? ?parameterdict?
d.sqlget(property=,?filters=?,?parameters=?)
Dataset.Sqlget(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The difference between
dataset get
and
dataset sqlget
is that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
or
Python
script processing.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset sqlnew dhandle propertylist ?filterset? ?parameterdict?
d.sqlnew(property=,?filters=?,?parameters=?)
Dataset.Sqlnew(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The differences between
dataset get
and
dataset sqlnew
are that the latter forces re-computation of the property data, and that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
or
Python
script processing.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset sqlshow dhandle propertylist ?filterset? ?parameterdict?
d.sqlshow(property=,?filters=?,?parameters=?)
Dataset.Sqlshow(items,property=,?filters=?,?parameters=?)
Standard data manipulation command for reading object data. It is explained in more detail in the section about retrieving property data.
For examples, see the
dataset get
command. The differences between
dataset get
and
dataset sqlshow
are that the latter does not attempt computation of property data, but raises an error if the data is not present and valid, and that the
SQL
command variant formats the data as
SQL
values rather than for
Tcl
or
Python
script processing.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset statistics dhandle property
d.statistics(property)
Get basic statistics on the property values of the objects in the dataset. The property can be a basic property or a property field, but its element data type needs to be cast-able to a simple numeric type. In addition, it must be directly attached to any of the objects which can be members of a dataset, e.g. an ensemble property, but not an atom property.
If the property data is not present on an object, an attempt is made to compute it. In case that fails, or a dataset member object is not of a type matching the property, these objects are silently skipped.
The return value is a dictionary containing the number of objects in the dataset which were used for the statistics (key n), the sum of property values (sum), the property value average (avg) and the property data standard deviation (stddev). The latter three values are floating-point numbers, regardless of the property data type. In case any of these values is not computable, for example because there was an insufficient number of objects, the reported value is zero.
The command verb can be abbreviated as stats .
set d [dataset statistics $dh E_WEIGHT]
puts "Avg: [dict get $d avg]"
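The shape of the result dictionary can be illustrated with a plain Python model. It is a sketch under assumptions: dataset_statistics is a hypothetical name, None stands for a skipped member, and the sample standard deviation (n - 1 denominator) is assumed because the text does not say which variant the toolkit reports.

```python
import math


def dataset_statistics(values):
    """Return the n/sum/avg/stddev dictionary for a list of property values.

    None stands for a member whose value is missing or not computable;
    such members are skipped, mirroring the description above.
    """
    data = [float(v) for v in values if v is not None]
    n = len(data)
    total = sum(data)
    avg = total / n if n else 0.0
    # Assumption: sample standard deviation (n - 1 denominator).
    if n > 1:
        stddev = math.sqrt(sum((v - avg) ** 2 for v in data) / (n - 1))
    else:
        stddev = 0.0                       # not computable: report zero
    return {"n": n, "sum": total, "avg": avg, "stddev": stddev}


stats = dataset_statistics([2, 4, None, 4, 4, 5, 5, 7, 9])
empty = dataset_statistics([])
```

The empty input shows the fallback case, where every reported value is zero rather than an error.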
dataset subcommands
dir(Dataset)
Lists all subcommands of the
dataset
command. Note that this command does not require a dataset handle.
dataset tables dhandle ?filterset? ?filtermode? ?recursive?
d.tables(?filters=?,?mode=?,?recursive=?)
Return a list of all the handles or references of the tables in the dataset. Other objects in the dataset (ensembles, reactions, datasets, networks) are ignored. The object list may optionally be filtered by the filter list, and the result further modified by a standard filter mode argument.
If the recursive flag is set, and the dataset contains other datasets as objects, tables in these nested datasets are also listed.
set n [dataset tables $dhandle {} count]
dataset taint dhandle propertylist/changeset ?purge?
d.taint(property=,?purge=?)
Trigger a property data tainting event which acts on the dataset data, and all objects and their data contained in the dataset.
The parameters of this command are the same as for
ens taint
and explained there.
dataset taint $dhandle A_XYZ
All property data on the dataset and the dataset members is invalidated if it directly or indirectly depends on the 3D atomic coordinates.
The command returns the original object handle or reference.
dataset threadexec ?maxthreads? ?substitutiondict? scriptbody
Execute a script on the objects in the dataset in parallel in multiple threads. The number of threads is by default the lesser of 16 or the number of objects in the dataset, but this can be configured. If there are more dataset objects than threads, threads are started in a groupwise fashion. In the function body, standard
Tcl/Tk
percent substitution is performed. The default substitutions are
%D
for the dataset handle, and
%O
for the thread-specific dataset object. Other custom substitutions can be configured in the optional substitution dictionary, in a letter/value (no percent prefix) format.
There are some limitations on what the object threads can do. They are allowed to delete their own current object, or move it outside the dataset, but not other objects in the dataset. Additional objects may be appended to the dataset (they are not subject to processing by the original command), but not inserted in random positions. Computation in the script body must reach the end of the script, or be ended by
return
or
break
statements. An error in any of the threads stops the command. All threads of a group must have finished before a new group is started.
The command returns the dataset handle if the dataset is not virtual.
Because of multi-threading issues, there is no Python version of the command.
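Although the command itself has no Python version, its scheduling scheme can still be sketched with the Python standard library. Everything here is a hypothetical model: threadexec and substitute are illustrative names, the callable run stands in for script execution, and strings stand in for handles.

```python
from concurrent.futures import ThreadPoolExecutor


def substitute(script, mapping):
    """Replace %X tokens; mapping keys are bare letters without the percent."""
    for letter, value in mapping.items():
        script = script.replace("%" + letter, str(value))
    return script


def threadexec(dataset, members, script, run, maxthreads=16, extra=None):
    """Run `run(expanded_script)` per member, one thread group at a time."""
    nthreads = min(maxthreads, len(members))
    results = []
    for start in range(0, len(members), max(nthreads, 1)):
        group = members[start:start + nthreads]
        scripts = [
            substitute(script, {**(extra or {}), "D": dataset, "O": obj})
            for obj in group
        ]
        # Leaving the with-block waits for the whole group to finish
        # before the next group is started.
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            results.extend(pool.map(run, scripts))
    return results


members = [f"ens{i}" for i in range(40)]
expanded = threadexec("dataset0", members, "ens get %O %P", str.upper,
                      extra={"P": "E_WEIGHT"})
```

With 40 members and the default limit of 16, the model runs three groups of 16, 16 and 8 threads, each group finishing before the next one starts.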
dataset transfer dhandle propertylist ?targethandle? ?targetpropertylist?
d.transfer(properties=,?target=?,?targetproperties=?)
Copy property data from one dataset to another dataset or other major object, without going through an intermediate scripting language object representation, or alternatively dissociate property data from the dataset. If a property in the argument property list is not already valid on the source dataset, an attempt is made to compute it.
If a target object is specified, the return value is the handle or reference of the target object. The source dataset and the target object cannot be the same object.
If a target property list is given, the data from the source is stored as content of a different property on the target. For this, the data types of the properties must be compatible, and the object class of the target property that of the target object. No attempt is made to convert data of mismatched types. In case of multiple properties, the source property list and the target property list are stepped through in parallel. If there is no target property list, or it is shorter than the source list, unmatched entries are stored as original property values, and this implies that the object class of the source and target objects are the same.
If no target object is specified, or it is spelled as an empty string or
Python
None
, the visible effect of the command is the same as a simple
dataset get
, i.e. the result is the property data value or value list. The property data is then deleted from the source object. In case the data type of the deleted property was that of a major object (i.e. an ensemble, reaction, table, dataset or network), it is only unlinked from the source object, but not destroyed. This means that the object handles returned by the command can henceforth be used as independent objects. They can be deleted by a normal object deletion command, and are no longer managed by the source object.
dataset transfer $dh D_SVG_IMAGE $lh L_1DPATTERN_SVG_IMAGE
This command performs a data transfer between different object classes, with change of the property under which the content is stored.
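The parallel stepping through the source and target property lists can be modelled in plain Python. The names pair_properties and transfer are hypothetical, dictionaries stand in for the source and target objects, and ignoring surplus target names is an assumption that the text does not confirm.

```python
from itertools import zip_longest


def pair_properties(source_props, target_props=()):
    """Pair each source property with the name it is stored under."""
    pairs = []
    for src, dst in zip_longest(source_props, target_props):
        if src is None:
            break                          # assumption: surplus targets unused
        pairs.append((src, dst if dst is not None else src))
    return pairs


def transfer(source, target, source_props, target_props=()):
    """Copy values between two dict-based stand-ins for major objects."""
    for src, dst in pair_properties(source_props, target_props):
        target[dst] = source[src]
    return target


source = {"D_SVG_IMAGE": "<svg/>", "D_NAME": "leads"}
target = transfer(source, {}, ["D_SVG_IMAGE", "D_NAME"],
                  ["L_1DPATTERN_SVG_IMAGE"])
```

The first property is stored under the renamed target property, while the second has no matching target entry and is stored under its original name.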
dataset transform dhandle SMIRKSlist ?direction? ?reactionmode? ?selectionmode? ?flags? ?overlapmode? ?{?exclusionmode? excludesslist}? ?maxstructures? ?timeout? ?maxtransforms? ?niterations? ?statusvariable?
d.transform(transforms=,?direction=?,?reactionmode=?,?selectionmode=?,?flags=?, ?overlapmode=?,?excludess=?,?maxstructures=?,?timeout=?,?maxtransforms=?, ?iterations=?)
Dataset.Transform(items,transforms=,?direction=?,?reactionmode=?, ?selectionmode=?,?flags=?,?overlapmode=?,?excludess=?,?maxstructures=?, ?timeout=?,?maxtransforms=?,?iterations=?)
This command is complex, but very similar to the
ens transform
command. Please refer to that command for a full description of the command arguments.
The major difference of
dataset transform
is that the start structure set is not a single ensemble, but rather the set of all ensembles in the dataset. Any dataset items which are not ensembles are ignored. The return value is, just as with the
ens transform
command, a list of result ensembles. These do not become part of the input dataset.
dataset transform [ens get $ehandle E_KEKULESET] $trafolist bidirectional \
multistep all {preservecharges checkaro setname}
This command first expands an ensemble object into a set of Kekulé structures. The property data type of the E_KEKULESET property is a dataset, so its handle is returned, and this dataset is then submitted for further transformation, which in this case involves manipulations of bonds in aromatic systems and thus is dependent on the Kekulé structures of the input ensembles.
The dataset variant of the transform command does not allow the use of marked or unmarked atom or bond specifications in the exclusion substructure list. Normal substructures are supported, and are applied to all start structures.
The Python class method is a one-shot command. The transient dataset created from the initialization items is automatically deleted when the command finishes.
dataset unique dhandle {property ?direction? ?cmpflags?}...
d.unique((property,?direction,?cmpflags,?cmpvalue???),...)
This command removes duplicate objects from the dataset and destroys them. Object equivalence is determined by pair-wise comparison of one or more properties. If all these properties are identical for any two objects, one of them is deleted. If no properties are specified, the default is the single property E_HASHISY , the standard isotope- and stereo-aware ensemble hash code.
The command returns labels or references of the ordered list of objects remaining in the dataset after deletion. The command is closely related to the
dataset sort
command, and the same restrictions on usable sort properties apply. Internally, the command performs a sort first, in order to avoid a quadratic growth of pair-wise comparisons. This has the side effect that the object order in the dataset is not preserved. Instead, the surviving objects are listed in ascending (by default) or descending (if the corresponding optional sort direction argument is set accordingly) values of the sort properties. The interpretation of the optional comparison flags and sort direction arguments, as well as the priority of the properties, and the special considerations when working on transient datasets, are the same as for the command
dataset sort
.
molfile read $fh $dh all
dataset unique $dh
This command sequence first reads a complete file into a dataset, and then discards duplicates, using the default isotope- and stereo-aware structure hash code.
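The sort-then-compare strategy described above can be illustrated outside the toolkit. The following is a minimal sketch in plain Python, not the toolkit implementation: the records are hypothetical dictionaries standing in for ensembles, and `unique_sorted` is an illustrative helper name. Which of two duplicates survives is not specified in this section; the sketch simply keeps the first one in sort order.

```python
from itertools import groupby

def unique_sorted(records, keys, descending=False):
    """Sort records by the key tuple, then keep one record per run of equal keys."""
    def key_of(rec):
        return tuple(rec[k] for k in keys)
    ordered = sorted(records, key=key_of, reverse=descending)
    # After sorting, duplicates are adjacent, so one linear pass replaces
    # the quadratic number of pair-wise comparisons.
    return [next(group) for _, group in groupby(ordered, key=key_of)]

# Hypothetical stand-ins for ensembles with a precomputed hash value.
records = [
    {"name": "ethanol", "hash": 0x3A},
    {"name": "water", "hash": 0x11},
    {"name": "EtOH", "hash": 0x3A},
]
survivors = unique_sorted(records, ["hash"])
print([r["name"] for r in survivors])
```

As in the real command, the survivors come back in sort order rather than in their original order.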
dataset unlock dhandle propertylist/dataset/all
d.unlock(property=)
Unlock property data for the dataset object, meaning that they are again under the control of the standard data consistency manager.
The property data to unlock is selected by a list of property names, or by one of the keywords dataset or all, as indicated in the command syntax.
Property data locks are obtained by the
dataset lock
command.
This command does not recurse into the objects contained in the dataset.
The return value is the original dataset handle or reference. If the argument was a transient dataset (only possible for Tcl), the result is an empty string.
dataset unpack string ?compressionlib?
Dataset.Unpack(data=,?compressionlib=?)
Generate a dataset complete with all elements it contains from a packed, base64-encoded serialized object string, as it is generated by the complementary
dataset pack
command.
The return value is the handle or reference of the new dataset. All objects in the new dataset also are assigned standard handles, which can be retrieved with the usual commands such as
dataset ens
and
dataset reactions
.
The default compression library is
zlib
. For more options, see
dataset pack
.
Note that this command does not take a dataset handle as argument, but a pack string.
dataset unpack [dataset pack $dhandle]
This example is effectively the same as a
dataset dup
operation, but of course less efficient, because the objects have to be serialized, compressed, and base64-encoded and the same sequence of operations run backward again.
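The sequence of operations named above (serialize, compress, base64-encode, and the reverse) can be sketched with the Python standard library. This is only an illustration of the round trip: the actual serialization format of the toolkit is not described here, so JSON is used as a stand-in, and `pack` and `unpack` are local helper names, not toolkit functions.

```python
import base64
import json
import zlib

def pack(obj):
    """Serialize, compress with zlib, then base64-encode to a text string."""
    raw = json.dumps(obj).encode("utf-8")   # JSON is a stand-in serializer
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

def unpack(packed):
    """Run the same steps backward: decode, decompress, deserialize."""
    raw = zlib.decompress(base64.b64decode(packed))
    return json.loads(raw.decode("utf-8"))

original = {"name": "demo", "members": ["O", "C=C"]}
copy = unpack(pack(original))
```

The result is an equal but independent object, which is why the round trip behaves like a duplication, at a higher cost.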
dataset valid dhandle propertylist
d.valid(property/propertysequence)
Returns a list of boolean values indicating whether values for the named properties are currently set for the dataset. No attempt at computation is made. For Python, where single-item lists are syntactically not the same as a single value, the return value is a single boolean if the argument was a string or a property reference, and only a single property was decoded.
dataset valid $dhandle D_NAME
reports whether the dataset is named (has a valid D_NAME property) or not.
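The difference in return shape for Python can be pictured with a small stand-alone sketch. It does not call the toolkit: the dictionary `data` is a hypothetical stand-in for the property data currently set on a dataset, and the assumption that a string argument is decoded as a whitespace-separated property list is made for illustration only.

```python
def valid(data, properties):
    """Report which properties have data, mirroring the described return shapes."""
    if isinstance(properties, str):
        names = properties.split()      # assumed decoding of a string argument
        if len(names) == 1:
            return names[0] in data     # single property: a plain boolean
    else:
        names = list(properties)
    return [name in data for name in names]

data = {"D_NAME": "demo"}               # hypothetical set of valid properties
single = valid(data, "D_NAME")
several = valid(data, ["D_NAME", "D_SIZE"])
```

Only presence is tested. No missing value is computed, in line with the description above.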
dataset verify dhandle property
d.verify(property)
Verify the values of the specified property on the dataset. The property data must be valid, and a dataset property. If the data can be found, it is checked against all constraints defined for the property, and, if such a function has been defined, is tested with the value verification function of the property.
The return value is boolean 1 if all tests are passed, and 0 if the data could be found but fails a test. In all other cases, an error is raised.
dataset wait dhandle ?size|query? ?script?
d.wait(?query=?,?size=?,?function=?)
Suspend the interpreter until the number of objects in the dataset has reached a threshold, or an object which satisfies a query expression can be found. The syntax of query expressions is the same as in the
dataset scan
command. Query parsing is attempted if the argument is not a simple integer. If no explicit size or query expression is specified, or an empty string (or
None
for Python) is passed as this parameter, the command uses the value of the
highwatermark
dataset attribute as default value for an implicit size threshold condition.
Another dataset attribute which has an influence on the execution of the command is the timeout attribute. If the dataset size has not grown to the required size, or no object which satisfies the query expression was added to the dataset after waiting for the timeout number of seconds, an error is raised. By default, the maximum wait period is indefinite, which corresponds to a negative timeout value. If the timeout value is set to zero, the wait condition must be met immediately, or an error results. However, no error is raised if the
eod/targeteod
dataset parameter pair indicates that no more data can be expected to be added in the dataset. In that case, the result is an empty string, or
None
for Python.
If no script function parameter is used, the return value of the command is the number of objects the dataset holds in case of an explicit or implicit size condition, or the handle/reference of the first matching object in case of a query expression.
If the object count already exceeds the threshold, or a matching object can be found at the moment the command is executed, the command returns immediately.
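The timeout rules for a size threshold can be sketched with a standard Python condition variable. This is an illustration under stated assumptions, not the toolkit implementation: a plain list stands in for the dataset, `wait_for_size` and `WaitTimeout` are hypothetical names, and the eod/targeteod handling is not modelled.

```python
import threading

class WaitTimeout(Exception):
    """Hypothetical error raised when the wait condition is not met in time."""

def wait_for_size(items, cond, threshold, timeout=-1.0):
    """Block until len(items) >= threshold, then return the current count."""
    with cond:
        limit = None if timeout < 0 else timeout   # negative: wait indefinitely
        met = cond.wait_for(lambda: len(items) >= threshold, timeout=limit)
        if not met:
            raise WaitTimeout(f"size {threshold} not reached")
        return len(items)

items = []
cond = threading.Condition()

def producer():
    # Another thread adds objects without help from the waiting thread.
    for smiles in ("O", "C=C", "CCO"):
        with cond:
            items.append(smiles)
            cond.notify_all()

worker = threading.Thread(target=producer)
worker.start()
count = wait_for_size(items, cond, threshold=3)
worker.join()

# The threshold is already met, so a zero timeout returns immediately.
immediate = wait_for_size(items, cond, threshold=2, timeout=0)
```

With a zero timeout and an unmet condition, the sketch raises its error at once, which corresponds to the rule that the condition must then be met immediately.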
In the Tcl case, and in the presence of a script body parameter, the script is executed whenever the wait condition is met. If the script is ended with a continue statement, or simply reaches the end of the code block, the wait loop is automatically restarted. If the script reports an error, or is left via a break or return statement, the loop is terminated.
For Python, instead of the script body, a function name or reference can be used. This function is called in local scope with a single argument, which is either the current dataset item count in case of a simple threshold condition, or the reference of the object matching the query expression. Within the Python function, the normal break and continue loop control statements cannot be used due to scope limitations. Instead, the custom exceptions BreakLoop and ContinueLoop can be raised. These are automatically caught and processed in the loop body handler code.
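The exception-based loop control can be pictured with the following stand-alone sketch. The two exception classes are defined locally as stand-ins for the toolkit exceptions of the same name, and `run_wait_loop` is a hypothetical driver that replaces the real wait loop by iterating over a prepared sequence of trigger values.

```python
class BreakLoop(Exception):
    """Local stand-in for the toolkit's BreakLoop exception."""

class ContinueLoop(Exception):
    """Local stand-in for the toolkit's ContinueLoop exception."""

def run_wait_loop(triggers, callback):
    """Call callback once per trigger until it raises BreakLoop or fails."""
    for value in triggers:
        try:
            callback(value)
        except ContinueLoop:
            continue    # restart the wait, same as reaching the function end
        except BreakLoop:
            break       # leave the loop without reporting an error
        # Any other exception propagates and also terminates the loop.

seen = []

def on_trigger(count):
    if count % 2:
        raise ContinueLoop()    # ignore odd item counts
    seen.append(count)
    if count >= 4:
        raise BreakLoop()       # enough data collected

run_wait_loop(range(1, 10), on_trigger)
```

Because the callback is an ordinary function, a `break` or `continue` statement inside it would be a syntax error; raising an exception is the only way to signal the enclosing loop.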
This command is mostly useful when running multi-threaded scripts, or when the dataset has an active remote command listener on a port. Under these circumstances, new objects may arrive in the dataset without participation of the local, waiting and stopped interpreter, which can then be processed.
While a
dataset wait
command is pending, the dataset cannot be deleted. Since other threads or port monitors may further update the dataset between the time the wait condition is met and the start of script processing, action scripts should be prepared to see more or fewer items in the dataset than there were immediately after the trigger event.
loop n 1 $nrecs {
set eh [dataset wait $dh "E_FILE(startrec) = $n"]
molfile write $fh $eh
ens delete $eh
}
This is part of a simple write thread which writes processed ensembles back in the same order as they were read from an input file. With multiple processing threads, the computation on an ensemble read from a later input record often finishes before that of an earlier record, so the ensembles arrive in the output queue out of sequence. Waiting for the ensembles in input record order restores the original order. More robust versions of such a script should handle the case of ensembles from a specific input record never appearing in the dataset, and similar sources of disruption.
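The reordering logic, including the more robust handling of a record which never appears, can be sketched in plain Python. This is an illustration only: `write_in_order` is a hypothetical helper, the arrivals are (record number, payload) pairs standing in for processed ensembles, and records are assumed to be numbered from 1.

```python
def write_in_order(arrivals, nrecs):
    """Yield (record, payload) pairs in record order from out-of-order arrivals."""
    pending = {}                    # results which finished ahead of their turn
    arrivals = iter(arrivals)
    for wanted in range(1, nrecs + 1):
        while wanted not in pending:
            try:
                record, payload = next(arrivals)
            except StopIteration:
                # The record never appeared: report it instead of waiting forever.
                raise LookupError(f"record {wanted} never arrived") from None
            pending[record] = payload
        yield wanted, pending.pop(wanted)

finished = [(2, "C=C"), (1, "O"), (4, "CCO"), (3, "CC")]
ordered = list(write_in_order(finished, 4))
```

The dictionary plays the role of the output dataset: results stay parked there until the record with the next expected number is available.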
dataset weed dhandle keywords
d.weed(keywordsequence)
d.weed(?keyword?,...)
This command performs standard clean-up operations on all ensembles and reactions in the dataset. The supported operations are described in more detail in the section on the equivalent
ens weed
command.
The return value of this command is the dataset handle or reference.
dataset xlabel dhandle propertylist ?filterset? ?filterprocs?
d.xlabel(property=,?filters=?,?filterfunctions=?)
This command is rather complex and closely related to the
dataset extract
command. Its purpose is to extract handle/reference and label information for selected subsets of the dataset. The return value is a nested list. The sublists consist of the object handle or reference, the object label (if the object does not have a label, 1 is substituted), and the dataset object index. The dataset object index starts with zero.
The selection of the class of objects which are extracted is performed indirectly via the property list. For practical purposes, this list should be a single property. Its object association type determines the class of objects selected. For example, A_LABEL or A_SYMBOL returns atom labels, while B_ORDER returns bond labels and E_NAME selects complete ensembles, with 1 as pseudo ensemble label.
The objects for which data is returned can further be filtered by a standard filter set, and additionally by a list of filter procedures (for Tcl, specified as procedure names) or functions (for Python, specified as function names or function references). These procedures or functions are called with the respective object handles/references and object labels as arguments. For example, a callback function used in an atom retrieval context would be called for each atom with its ensemble handle or reference and the atom label as arguments. If major objects without a label are checked, such as complete ensembles, 1 is passed as the label. The callback procedures are expected to return a boolean value. If the return value of a callback is false or 0, the object is not added to the returned list, and the remaining check procedures are no longer called for that object.
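The callback chain with its early exit can be sketched as follows. The sketch does not use the toolkit: the candidate triples (handle, label, dataset index) are hypothetical stand-ins for the objects which passed the filter set, and `select` as well as the two callbacks are illustrative names.

```python
def select(candidates, callbacks):
    """Keep (handle, label, index) triples accepted by every callback."""
    selected = []
    for handle, label, index in candidates:
        # all() stops at the first callback which returns a false value,
        # so later callbacks are not called for a rejected object.
        if all(cb(handle, label) for cb in callbacks):
            selected.append((handle, label, index))
    return selected

calls = []

def not_first_ensemble(handle, label):
    calls.append(("not_first_ensemble", handle, label))
    return handle != "ens0"

def odd_label(handle, label):
    calls.append(("odd_label", handle, label))
    return label % 2 == 1

candidates = [("ens0", 1, 0), ("ens1", 1, 1), ("ens1", 2, 1)]
result = select(candidates, [not_first_ensemble, odd_label])
```

The `calls` list shows that the second callback is never invoked for the object rejected by the first one.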
The command currently only works on ensembles in the dataset, ignoring any reactions, tables, datasets or networks which may be present.
This command is primarily useful for the display of filtered minor object data from datasets, such as atom property values for specific types of atoms.
set dhandle [dataset create [ens create O] [ens create C=C]]
dataset xlabel $dhandle A_LABEL !hydrogen
dataset xlabel $dhandle B_ORDER doublebond
First, a dataset with two ensembles (water and ethene) is created. This dataset is then queried. The first query is for all atoms in it which are not hydrogen. The returned list is
{ens0 1 0} {ens1 1 1} {ens1 2 1}
In object ens0, which is the first object in the dataset, atom 1 passes the filter. In object ens1, which is the second object in the dataset, atoms with label 1 and 2 pass. The second query asks for the labels of double bonds in the dataset. The use of property B_ORDER is arbitrary - any other bond property would do as well. The return value of this command is
{ens1 1 1}
which indicates that only the bond with label 1 in object ens1, which is the second object in the dataset, fulfills this condition.
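When such a result is processed further, it is often convenient to group the labels by object. The following sketch parses the Tcl list string from the first query with a regular expression and collects the atom labels per handle. It assumes the simple three-element sublists without nested braces shown above, and `group_labels` is a hypothetical helper name.

```python
import re
from collections import defaultdict

def group_labels(tcl_result):
    """Map each object handle to the list of labels reported for it."""
    grouped = defaultdict(list)
    # Each sublist is {handle label datasetindex}.
    for handle, label, _index in re.findall(r"\{(\S+) (\d+) (\d+)\}", tcl_result):
        grouped[handle].append(int(label))
    return dict(grouped)

atoms = group_labels("{ens0 1 0} {ens1 1 1} {ens1 2 1}")
print(atoms)
```

In Python scripts using the toolkit directly, the nested list is presumably available as a native sequence, so no string parsing would be needed there.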