pyiron_atomistics.atomistics.structure.structurestorage.StructureStorage#

class pyiron_atomistics.atomistics.structure.structurestorage.StructureStorage(num_atoms=1, num_structures=1)[source]#

Bases: FlattenedStorage, HasStructure

Class that can write and read lots of structures from and to hdf quickly.

This is done by storing positions, cells, etc. into large arrays instead of writing every structure into a new group. Structures are stored together with an identifier that should be unique. The class can be initialized with the number of structures and the total number of atoms in all structures, but re-allocates memory as necessary when more (or larger) structures are added than initially anticipated.
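
The flat layout described above can be pictured with a toy model in plain Python. This is only a sketch of the idea, not the pyiron implementation: the module-level lists and the helpers `add` and `get_positions` are hypothetical names chosen for illustration.

```python
# Toy model of flattened storage: one flat list holds the atoms of all
# structures, and per-structure bookkeeping records where each one starts.
positions = []    # one entry per atom, across all structures
start_index = []  # offset of the first atom of each structure
length = []       # number of atoms in each structure
identifier = []   # human-readable name of each structure


def add(name, atom_positions):
    """Append the atoms of one structure to the flat list."""
    start_index.append(len(positions))
    length.append(len(atom_positions))
    identifier.append(name)
    positions.extend(atom_positions)


def get_positions(frame):
    """Slice one structure back out, by name or by numeric index."""
    i = identifier.index(frame) if isinstance(frame, str) else frame
    return positions[start_index[i]:start_index[i] + length[i]]


add("dimer", [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
add("atom", [(0.5, 0.5, 0.5)])
```

Adding a structure only appends to existing lists, which is why this layout avoids creating a new group per structure.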

You can add structures and a human-readable name with add_structure().

>>> container = StructureStorage()
>>> container.add_structure(Atoms(...), "fcc")
>>> container.add_structure(Atoms(...), "hcp")
>>> container.add_structure(Atoms(...), "bcc")

Accessing stored structures works with get_structure(). You can pass either the identifier given when adding the structure or the numeric index.

>>> container.get_structure(frame=0) == container.get_structure(frame="fcc")
True

Custom arrays may also be defined on the container

>>> container.add_array("energy", shape=(), dtype=np.float64, fill=-1, per="chunk")

(chunk means structure in this case, see below and FlattenedStorage)

You can then pass arrays of the corresponding shape to add_structure()

>>> container.add_structure(Atoms(...), "grain_boundary", energy=3.14)

Saved arrays are accessed with get_array()

>>> container.get_array("energy", 3)
3.14
>>> container.get_array("energy", 0)
-1

It is also possible to use the same names in get_array() as in get_structure().

>>> container.get_array("energy", 0) == container.get_array("energy", "fcc")
True

The length of the container is the number of structures inside it.

>>> len(container)
4

Each structure corresponds to a chunk in FlattenedStorage and each atom to an element. By default the following arrays are defined for each structure:

  • identifier shape=(), dtype=str, per chunk; human readable name of the structure

  • cell shape=(3,3), dtype=np.float64, per chunk; cell shape

  • pbc shape=(3,), dtype=bool per chunk; periodic boundary conditions

  • symbols: shape=(), dtype=str, per element; chemical symbol

  • positions: shape=(3,), dtype=np.float64, per element: atomic positions

If a structure has spins/magnetic moments defined on its atoms, these will be saved in a per atom array as well. In that case, however, the structures in the container must either all have collinear spins or all have non-collinear spins.

__init__(num_atoms=1, num_structures=1)[source]#

Create new structure container.

Parameters:
  • num_atoms (int) – total number of atoms across all structures to pre-allocate

  • num_structures (int) – number of structures to pre-allocate

Methods

__init__([num_atoms, num_structures])

Create new structure container.

add_array(name[, shape, dtype, fill, per])

Add a custom array to the container.

add_chunk(chunk_length[, identifier])

Add a new chunk to the storage.

add_structure(structure[, identifier])

Add a new structure to the container.

animate_structures([spacefill, show_cell, ...])

Animate a series of atomic structures.

collect_structures([filter_function])

Collects a copy of all structures in a compact StructureStorage.

copy()

Return a deep copy of the storage.

del_array(name[, ignore_missing])

Remove an array.

extend(other)

Add chunks from other to this storage.

find_chunk(identifier)

Return integer index for given identifier.

from_dict(obj_dict[, version])

Populate the object from the serialized object.

from_hdf(hdf[, group_name])

Read object from HDF.

from_hdf_args(hdf)

Read arguments for instance creation from HDF5 file.

get_array(name[, frame])

Fetch array for given structure.

get_array_filled(name)

Return elements of array name in all chunks.

get_array_ragged(name)

Return elements of array name in all chunks.

get_elements()

Return a list of chemical elements present in the storage.

get_structure([frame, wrap_atoms, ...])

Retrieve structure from object.

has_array(name)

Checks whether an array of the given name exists and returns meta data given to add_array().

instantiate(obj_dict[, version])

Create a blank instance of this class.

iter_structures([wrap_atoms])

Iterate over all structures in this object.

join(store[, lsuffix, rsuffix])

Merge given storage into this one.

list_arrays([only_user])

Return a list of names of arrays inside the storage.

lock([method])

Set read_only.

rewrite_hdf(hdf[, group_name])

Update the HDF representation.

sample(selector)

Create a new storage with chunks selected by given function.

set_array(name, frame, value)

Add array for given structure.

split(array_names)

Return a new storage with only the selected arrays present.

to_dict()

Reduce the object to a dictionary.

to_hdf(hdf[, group_name])

Write object to HDF.

to_pandas([explode, include_index])

Convert arrays to pandas dataframe.

transform_structures(modify)

Return a modified object by applying a function to each object lazily.

unlocked()

Unlock the object temporarily.

Attributes

cell

identifier

length

number_of_structures

maximum iteration_step + 1 that can be passed to get_structure().

pbc

plot

Accessor for StructurePlots instance using these structures.

positions

read_only

False if the object can currently be written to

start_index

symbols

add_array(name, shape=(), dtype=<class 'numpy.float64'>, fill=None, per='element')#

Add a custom array to the container.

When an array is added after some chunks have already been added, fill is used as the default value of the array for those chunks.

Adding an array with the same name twice is ignored if dtype and shape match; otherwise an exception is raised.

>>> store = FlattenedStorage()
>>> store.add_chunk(1, "foo")
>>> store.add_array("energy", shape=(), dtype=np.float64, fill=42, per="chunk")
>>> store.get_array("energy", 0)
42.0
Parameters:
  • name (str) – name of the new array

  • shape (tuple of int) – shape of the new array per element or chunk; scalars can pass ()

  • dtype (type) – data type of the new array, string arrays can pass ‘U$n’ where $n is the length of the string

  • fill (object) – populate the new array with this value for existing chunk, if given; default None

  • per (str) – either “element” or “chunk”; denotes whether the new array should exist for every element in a chunk or only once for every chunk; case-insensitive

Raises:
  • ValueError – if wrong value for per is given

  • ValueError – if array with same name but different parameters exists already

add_chunk(chunk_length, identifier=None, **arrays)#

Add a new chunk to the storage.

Additional keyword arguments given specify arrays to store for the chunk. If an array with the given keyword name does not exist yet, it will be added to the container.

>>> container = FlattenedStorage()
>>> container.add_chunk(2, identifier="A", energy=3.14)
>>> container.get_array("energy", 0)
3.14

If the first axis of the extra array matches the length of the chunk, it will be added as a per element array, otherwise as a per chunk array.

>>> container.add_chunk(2, identifier="B", forces=2 * [[0,0,0]])
>>> len(container.get_array("forces", 1)) == 2
True

Reshaping the array to have the first axis be length 1 forces the array to be set as per chunk array. That axis will then be stripped.

>>> container.add_chunk(2, identifier="C", pressure=np.eye(3)[np.newaxis, :, :])
>>> container.get_array("pressure", 2).shape
(3, 3)

Attention

Edge-case!

This will not work when the chunk length is also 1 and the array does not exist yet! In this case the array will be assumed to be per element and there is no way around explicitly calling add_array().
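
The rule described above, including the edge case, can be sketched as a small function. `infer_per` is a hypothetical helper written for this illustration and works on nested lists; the library's own logic may differ in detail.

```python
def infer_per(chunk_length, value):
    """Guess whether value is a per element or a per chunk array.

    Mirrors the rule in the text: a first axis equal to the chunk length
    means per element; anything else is per chunk, and a leading axis of
    length 1 is stripped.
    """
    try:
        first_axis = len(value)
    except TypeError:  # scalars have no first axis
        return "chunk", value
    if first_axis == chunk_length:
        return "element", value
    if first_axis == 1:
        return "chunk", value[0]
    return "chunk", value


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# one force vector per element of a chunk of length 2
forces_kind, _ = infer_per(2, [[0, 0, 0], [0, 0, 0]])
# a 3x3 matrix wrapped in a length-1 axis is stored once per chunk
pressure_kind, pressure = infer_per(2, [identity])
# edge case: with chunk length 1 the wrapped matrix looks per element
edge_kind, _ = infer_per(1, [identity])
```

The same reasoning shows why the wrapping is needed at all: for a chunk of length 3, an unwrapped 3x3 matrix would have a first axis equal to the chunk length and would be taken as per element.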

Parameters:
  • chunk_length (int) – length of the new chunk

  • identifier (str, optional) – human-readable name for the chunk, if None use current chunk index as string

  • **arrays – additional arrays to store for the chunk

add_structure(structure, identifier=None, **arrays)[source]#

Add a new structure to the container.

Additional keyword arguments given specify additional arrays to store for the structure. If an array with the given keyword name does not exist yet, it will be added to the container.

>>> container = StructureStorage()
>>> container.add_structure(Atoms(...), identifier="A", energy=3.14)
>>> container.get_array("energy", 0)
3.14

If the first axis of the extra array matches the length of the given structure, it will be added as a per atom array, otherwise as a per structure array.

>>> structure = Atoms(...)
>>> container.add_structure(structure, identifier="B", forces=len(structure) * [[0,0,0]])
>>> len(container.get_array("forces", 1)) == len(structure)
True

Reshaping the array to have the first axis be length 1 forces the array to be set as per structure array. That axis will then be stripped.

>>> container.add_structure(Atoms(...), identifier="C", pressure=np.eye(3)[np.newaxis, :, :])
>>> container.get_array("pressure", 2).shape
(3, 3)
Parameters:
  • structure (Atoms) – structure to add

  • identifier (str, optional) – human-readable name for the structure, if None use current structure index as string

  • **arrays – additional arrays to store for the structure

animate_structures(spacefill: bool = True, show_cell: bool = True, center_of_mass: bool = False, particle_size: float = 0.5, camera: str = 'orthographic')#

Animate a series of atomic structures.

Parameters:
  • spacefill (bool) – If True, then atoms are visualized in spacefill style

  • show_cell (bool) – True if the cell boundaries of the structure are to be shown

  • particle_size (float) – Scaling factor for the spheres representing the atoms. (The radius is determined by the atomic number)

  • center_of_mass (bool) – False (default) if the specified positions are w.r.t. the origin

  • camera (str) – camera perspective, choose from “orthographic” or “perspective”

Returns:

nglview IPython widget

Return type:

animation

collect_structures(filter_function=None) StructureStorage#

Collects a copy of all structures in a compact StructureStorage.

This can be used to force lazily applied modifications with transform_structures() or simply to obtain a known object type from a generic HasStructure object.

Parameters:

filter_function (function) – include structure only if this function returns True for it

Returns:

a copy of all (filtered) structures

Return type:

StructureStorage

copy()#

Return a deep copy of the storage.

Returns:

copy of self

Return type:

FlattenedStorage

del_array(name: str, ignore_missing: bool = False)#

Remove an array.

Works with both per chunk and per element arrays.

Parameters:
  • name (str) – name of the array

  • ignore_missing (bool) – if True, do not raise an error if no array of the given name exists

Raises:

KeyError – if no array with the given name exists and ignore_missing is False

extend(other: FlattenedStorage)#

Add chunks from other to this storage.

Afterwards the number of chunks and elements are the sum of the respective previous values.

If other defines new arrays or doesn’t define some of the arrays they are padded by the fill values.

Parameters:

other (FlattenedStorage) – other storage to add

Raises:

ValueError – if fill values between both storages are not compatible

Returns:

return this storage

Return type:

FlattenedStorage
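
The padding behaviour of extend() can be illustrated with a toy model in which each storage is a dict mapping array names to lists with one value per chunk. The function below and its `fill` argument are hypothetical stand-ins for this sketch, and it covers per chunk arrays only.

```python
def extend(this, other, fill):
    """Append the chunks of other to this, padding missing arrays.

    this and other map array names to lists with one value per chunk;
    fill maps array names to the value used for padding.
    """
    n_this = len(next(iter(this.values())))
    n_other = len(next(iter(other.values())))
    for name in set(this) | set(other):
        left = this[name] if name in this else [fill[name]] * n_this
        right = other[name] if name in other else [fill[name]] * n_other
        this[name] = left + right
    return this


a = {"identifier": ["fcc", "hcp"], "energy": [-3.4, -3.3]}
b = {"identifier": ["bcc"], "volume": [11.8]}
extend(a, b, fill={"energy": 0.0, "volume": -1.0})
```

After the call every array in `a` has three entries, with the fill value standing in wherever one of the two storages did not define the array.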

find_chunk(identifier)#

Return integer index for given identifier.

Parameters:

identifier (str) – name of chunk previously passed to add_chunk()

Returns:

integer index for chunk

Return type:

int

Raises:

KeyError – if identifier is not found in storage

from_dict(obj_dict: dict, version: str = None)#

Populate the object from the serialized object.

Parameters:
  • obj_dict (dict) – data previously returned from to_dict()

  • version (str) – version tag written together with the data

from_hdf(hdf: ProjectHDFio, group_name: str = None)#

Read object from HDF.

If group_name is given descend into subgroup in hdf first.

Parameters:
  • hdf (ProjectHDFio) – HDF group to read from

  • group_name (str, optional) – name of subgroup

classmethod from_hdf_args(hdf: ProjectHDFio) dict#

Read arguments for instance creation from HDF5 file.

Parameters:

hdf (ProjectHDFio) – HDF5 group object

Returns:

arguments that can be **kwarg-passed to cls().

Return type:

dict

get_array(name, frame=None)#

Fetch array for given structure.

Works for per atom and per structure arrays.

Parameters:
  • name (str) – name of the array to fetch

  • frame (int, str, optional) – selects structure to fetch, as in get_structure(), if not given return a flat array of all values for either all chunks or elements

Returns:

requested array

Return type:

numpy.ndarray

Raises:

KeyError – if array with name does not exist

get_array_filled(name: str) ndarray#

Return elements of array name in all chunks. Arrays are padded to be all of the same length.

The padding value depends on the datatype of the array or can be configured via the fill parameter of add_array().

If name specifies a per chunk array, there’s nothing to pad and this method is equivalent to get_array().

Parameters:

name (str) – name of array to fetch

Returns:

padded array of all elements in all chunks

Return type:

numpy.ndarray

get_array_ragged(name: str) ndarray#

Return elements of array name in all chunks. Values are returned in a ragged array of dtype=object.

If name specifies a per chunk array, there’s nothing to pad and this method is equivalent to get_array().

Parameters:

name (str) – name of array to fetch

Returns:

ragged array of all elements in all chunks

Return type:

numpy.ndarray, dtype=object
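
The difference between the ragged and the filled form can be sketched with nested lists. This illustrates the padding idea only; `to_filled` is a hypothetical helper, and the real methods return numpy arrays.

```python
def to_filled(ragged, fill):
    """Pad every chunk to the length of the longest chunk."""
    width = max(len(chunk) for chunk in ragged)
    return [list(chunk) + [fill] * (width - len(chunk)) for chunk in ragged]


# per element values of three chunks with 1, 3 and 2 elements
ragged = [["Fe"], ["Al", "Al", "Cu"], ["Mg", "Mg"]]
filled = to_filled(ragged, fill="")
```

The ragged form keeps each chunk at its own length, while the filled form is rectangular and therefore fits into a regular array.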

get_elements() List[str][source]#

Return a list of chemical elements present in the storage.

Returns:

list of unique elements as strings of chemical symbols

Return type:

list

get_structure(frame=-1, wrap_atoms=True, iteration_step=None)#

Retrieve structure from object. The number of available structures depends on the job and what kind of calculation has been run on it, see number_of_structures.

Parameters:

  • frame (int, object) – index of the structure requested, if negative count from the back; if _translate_frame() is overridden, frame will pass through it

  • iteration_step (int) – deprecated alias for frame

  • wrap_atoms (bool) – True if the atoms are to be wrapped back into the unit cell

Returns:

the requested structure

Return type:

pyiron_atomistics.atomistics.structure.atoms.Atoms

Raises:

IndexError – if not -number_of_structures <= iteration_step < number_of_structures
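
The handling of negative frames and the bounds check stated above can be sketched as follows. `normalize_frame` is a hypothetical helper for illustration, not a method of the class.

```python
def normalize_frame(frame, number_of_structures):
    """Map a possibly negative frame to a non-negative index."""
    if not -number_of_structures <= frame < number_of_structures:
        raise IndexError(f"frame {frame} out of range")
    # negative frames count from the back, as with Python sequences
    return frame % number_of_structures


last = normalize_frame(-1, 4)
first = normalize_frame(-4, 4)
```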

has_array(name)#

Checks whether an array of the given name exists and returns meta data given to add_array().

>>> container.has_array("energy")
{'shape': (), 'dtype': np.float64, 'per': 'chunk'}
>>> container.has_array("fnorble")
None
Parameters:

name (str) – name of the array to check

Returns:

None if the array does not exist; otherwise a dict whose keys correspond to the shape, dtype and per arguments of add_array()

Return type:

dict or None

classmethod instantiate(obj_dict: dict, version: str = None) Self#

Create a blank instance of this class.

This can be used when some values are already necessary for the object's __init__.

Parameters:
  • obj_dict (dict) – data previously returned from to_dict()

  • version (str) – version tag written together with the data

Returns:

a blank instance of the object that is sufficiently initialized to call _from_dict() on it

Return type:

object

iter_structures(wrap_atoms=True)#

Iterate over all structures in this object.

Parameters:

wrap_atoms (bool) – True if the atoms are to be wrapped back into the unit cell; passed to get_structure()

Yields:

pyiron_atomistics.atomistics.structure.atoms.Atoms – every structure attached to the object

join(store: FlattenedStorage, lsuffix: str = '', rsuffix: str = '') FlattenedStorage#

Merge given storage into this one.

self and store may not share any arrays. Arrays defined on store are copied and then added to self.

Parameters:
  • store (FlattenedStorage) – storage to join

  • lsuffix (str, optional) – if either is given rename all arrays by appending the suffixes to the array name; lsuffix for arrays in this storage, rsuffix for arrays in the added storage; in this case arrays are no longer available under the old name

  • rsuffix (str, optional) – if either is given rename all arrays by appending the suffixes to the array name; lsuffix for arrays in this storage, rsuffix for arrays in the added storage; in this case arrays are no longer available under the old name

Returns:

self

Return type:

FlattenedStorage

Raises:
  • ValueError – if the two stores do not have the same number of chunks

  • ValueError – if the two stores do not have equal chunk lengths

  • ValueError – if lsuffix and rsuffix are equal and different from “”

  • ValueError – if the stores share array names but lsuffix and rsuffix are not given
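
The suffix rules and the two name-related errors listed above can be sketched on plain lists of array names. `joined_names` is a hypothetical helper for illustration; it ignores the checks on chunk counts and chunk lengths.

```python
def joined_names(left, right, lsuffix="", rsuffix=""):
    """Return the array names after a join, applying the suffix rules."""
    if lsuffix == rsuffix and lsuffix != "":
        raise ValueError("lsuffix and rsuffix must differ")
    shared = set(left) & set(right)
    if shared and not (lsuffix or rsuffix):
        raise ValueError(f"stores share array names: {sorted(shared)}")
    # if a suffix is given, every array on that side is renamed
    return [n + lsuffix for n in left] + [n + rsuffix for n in right]


names = joined_names(["energy"], ["energy", "forces"], "_dft", "_ml")
```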

list_arrays(only_user=False) List[str]#

Return a list of names of arrays inside the storage.

Parameters:

only_user (bool) – If True include only array names added by the user via add_array() and the identifier array.

Returns:

array names

Return type:

list of str

lock(method: Literal['error', 'warning'] | None = None)#

Set read_only.

Objects may be safely locked multiple times without further effect.

Parameters:

method (str, either "error" or "warning") – if “error” raise a Locked exception if modification is attempted; if “warning” raise a LockedWarning warning; default is “error” or the value passed to the constructor.

Raises:

ValueError – if method is not an allowed value

property number_of_structures#

maximum iteration_step + 1 that can be passed to get_structure().

Type:

int

property plot#

Accessor for StructurePlots instance using these structures.

property read_only: bool#

False if the object can currently be written to

Setting this value will trigger _on_lock() and _on_unlock() if it changes.

Type:

bool

rewrite_hdf(hdf: ProjectHDFio, group_name: str = None)#

Update the HDF representation.

If an object is read from an older layout, this will remove the old data and rewrite it in the newest layout.

Parameters:
  • hdf (ProjectHDFio) – HDF group to read/write

  • group_name (str, optional) – name of subgroup

sample(selector: Callable[[FlattenedStorage, int], bool]) FlattenedStorage#

Create a new storage with chunks selected by given function.

If called on a subclass this correctly returns an instance of that subclass instead.

Parameters:

selector (callable) – function that takes this storage as the first argument and the chunk index to sample as the second argument; if it returns True, the chunk will be part of the new storage.

Returns:

storage with the selected chunks

Return type:

FlattenedStorage or subclass
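
The calling convention of the selector, with the storage first and the chunk index second, can be sketched on a toy storage. Here the storage is a plain dict of per chunk lists, and `sample` is a hypothetical free function written for this illustration.

```python
def sample(store, selector):
    """Keep the chunks for which selector(store, index) returns True."""
    n_chunks = len(store["identifier"])
    keep = [i for i in range(n_chunks) if selector(store, i)]
    # build a new storage; the original is left untouched
    return {name: [values[i] for i in keep] for name, values in store.items()}


store = {"identifier": ["fcc", "hcp", "bcc"], "energy": [-3.4, -3.3, -3.1]}
low = sample(store, lambda s, i: s["energy"][i] < -3.2)
```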

set_array(name, frame, value)[source]#

Add array for given structure.

Works for per chunk and per element arrays.

Parameters:
  • name (str) – name of array to set

  • frame (int, str) – selects structure to set, as in get_structure()

  • value – value (for per chunk) or array of values (for per element); type and shape as per has_array().

Raises:

KeyError – if array with name does not exist

split(array_names: Iterable[str]) FlattenedStorage#

Return a new storage with only the selected arrays present.

Arrays are deep-copied from self.

Parameters:

array_names (list of str) – names of the arrays to be present in the new storage

Returns:

storage with split arrays

Return type:

FlattenedStorage

to_dict() dict#

Reduce the object to a dictionary.

Returns:

serialized state of this object

Return type:

dict

to_hdf(hdf: ProjectHDFio, group_name: str = None)#

Write object to HDF.

If group_name is given create a subgroup in hdf first.

Parameters:
  • hdf (ProjectHDFio) – HDF group to write to

  • group_name (str, optional) – name of subgroup

to_pandas(explode=False, include_index=False) DataFrame#

Convert arrays to pandas dataframe.

Parameters:

explode (bool) – If False values of per element arrays are stored in the dataframe as arrays, otherwise each row in the dataframe corresponds to an element in the original storage.

Returns:

table of array values

Return type:

pandas.DataFrame
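
The effect of explode can be sketched without pandas by building the table rows as dicts. `to_rows` is a hypothetical helper, and repeating the per chunk value on every exploded row is an assumption of this sketch.

```python
def to_rows(identifiers, symbols, explode=False):
    """Build table rows from one per chunk and one per element array."""
    pairs = list(zip(identifiers, symbols))
    if not explode:
        # one row per chunk, per element values kept together
        return [{"identifier": i, "symbols": s} for i, s in pairs]
    # one row per element, per chunk value repeated on each row
    return [{"identifier": i, "symbols": x} for i, s in pairs for x in s]


ids = ["fcc", "b2"]
syms = [["Al"], ["Fe", "Al"]]
compact = to_rows(ids, syms)
exploded = to_rows(ids, syms, explode=True)
```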

transform_structures(modify) TransformStructure#

Return a modified object by applying a function to each object lazily.

Parameters:

modify (function) – applied to each structure, has to return the modified structure

Returns:

a container with the modified structures

Return type:

TransformStructure
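
What "lazily" means here can be sketched with a generator: the modifying function runs only when a structure is actually requested, and collecting the results forces the evaluation. The names below are hypothetical, and structures are stood in for by lists of numbers.

```python
def transform_lazily(structures, modify):
    """Yield modify(s) for each structure only when it is requested."""
    for structure in structures:
        yield modify(structure)


calls = []


def shift(structure):
    """Record the call and return a shifted copy."""
    calls.append(structure)
    return [x + 1.0 for x in structure]


lazy = transform_lazily([[0.0], [1.0, 2.0]], shift)
calls_before = len(calls)  # nothing has been modified yet
collected = list(lazy)     # iterating forces the modifications
```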

unlocked() _UnlockContext#

Unlock the object temporarily.

The context manager returns this object again and relocks it after the with statement has finished.

Note

lock() vs. unlocked()

There is a small asymmetry between these two methods. lock() can only be done once (meaningfully), while unlocked() is a context manager and can be called multiple times.
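
The asymmetry can be sketched with a toy class built on contextlib. `Lockable` is a hypothetical stand-in, not the pyiron base class, and it raises RuntimeError where the library would raise its own Locked exception.

```python
from contextlib import contextmanager


class Lockable:
    """Toy object with a read_only flag."""

    def __init__(self):
        self.read_only = False
        self.data = {}

    def lock(self):
        self.read_only = True  # locking again has no further effect

    @contextmanager
    def unlocked(self):
        previous = self.read_only
        self.read_only = False
        try:
            yield self
        finally:
            self.read_only = previous  # relock when the with block ends

    def set(self, key, value):
        if self.read_only:
            raise RuntimeError("object is locked")
        self.data[key] = value


obj = Lockable()
obj.lock()
with obj.unlocked() as writable:
    writable.set("energy", 3.14)
```

After the with block the object is read-only again, so the temporary unlock can be repeated as often as needed.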