pinder.core.structure package#
Submodules#
pinder.core.structure.atoms module#
- pinder.core.structure.atoms.atom_array_from_pdb_file(structure: Path | AtomArray, backend: str = 'fastpdb', extra_fields: list[str] | None = None) AtomArray | AtomArrayStack [source][source]#
- pinder.core.structure.atoms.backbone_mask(atoms: AtomArray | AtomArrayStack, backbone_definition: BackboneDefinition) ndarray[Any, dtype[bool_]] [source][source]#
- pinder.core.structure.atoms.mask_from_res_list(atoms: AtomArray | AtomArrayStack, res_list: list[int]) ndarray[Any, dtype[bool_]] [source][source]#
- pinder.core.structure.atoms.apply_mask(atoms: AtomArray | AtomArrayStack, mask: ndarray[Any, dtype[bool_]]) AtomArray | AtomArrayStack [source][source]#
Apply a boolean mask to an AtomArray or AtomArrayStack to filter atoms.
- Parameters:
- atoms(AtomArray | AtomArrayStack)
The atoms to be filtered.
- maskNDArray[np.bool_]
The boolean mask that specifies which atoms to keep.
- Returns:
- AtomArray | AtomArrayStack:
The filtered atoms.
- pinder.core.structure.atoms.filter_atoms(atoms: AtomArray | AtomArrayStack, calpha_only: bool = False, backbone_only: bool = False, heavy_only: bool = True, backbone_definition: BackboneDefinition = 'dockq') AtomArray | AtomArrayStack [source][source]#
- pinder.core.structure.atoms.get_backbone_atom_masks(native: AtomArray, decoy_stack: AtomArrayStack, backbone_only: bool, calpha_only: bool, backbone_definition: BackboneDefinition = 'dockq') tuple[ndarray[Any, dtype[bool_]], ndarray[Any, dtype[bool_]]] [source][source]#
- pinder.core.structure.atoms.assign_receptor_ligand(arr: AtomArray, chains: list[str]) tuple[list[str], list[str]] [source][source]#
- pinder.core.structure.atoms.renumber_res_ids(arr: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack [source][source]#
- pinder.core.structure.atoms.standardize_atom_array(arr: AtomArray | AtomArrayStack, standardize: bool = True) AtomArray | AtomArrayStack [source][source]#
- pinder.core.structure.atoms.stack_filter_intersect(arr_list: list[AtomArray], remove_elements: bool = False, standardize_order: bool = False) tuple[AtomArrayStack, bool] [source][source]#
- pinder.core.structure.atoms.get_seq_alignments(ref_seq: str, subject_seq: str, gap_penalty: tuple[float, float] = (-10.0, -1.0)) list[Alignment] [source][source]#
Generate optimal global alignments between two sequences using BLOSUM62 matrix.
- Parameters:
- ref_seq(str)
The reference sequence.
- subject_seq(str)
The subject sequence.
- gap_penalty(tuple[float, float], optional)
A tuple consisting of the gap open penalty and the gap extension penalty.
- Returns:
- list[Alignment]:
A list of Alignment objects that represent the optimal global alignment(s).
- Note:
The function uses the BLOSUM62 matrix by default for alignment scoring.
- pinder.core.structure.atoms.get_seq_identity(ref_seq: str | None = None, subject_seq: str | None = None, alignments: list[Alignment] | None = None, gap_penalty: tuple[float, float] = (-10.0, -1.0)) float [source][source]#
Align an arbitrary sequence with the reference sequence Return sequence identity between the two.
- pinder.core.structure.atoms.calc_num_mismatches(alignments: list[Alignment]) tuple[int, int] [source][source]#
Calculate number of mismatches in sequence alignment
- Parameters:
- alignmentslist[Alignment]
sequence alignment(s)
- Returns:
- tuple[int, int]:
A tuple containing number of mismatches and matches in sequence alignment, respectively.
- pinder.core.structure.atoms.align_sequences(ref_seq: str, subject_seq: str, ref_numbering: list[int] | None = None, subject_numbering: list[int] | None = None) tuple[str, str, list[int], list[int]] [source][source]#
Aligns two sequences and returns a tuple containing the aligned sequences and their respective numbering.
- Parameters:
- ref_seq(str)
The reference sequence to align to.
- subject_seq(str)
The subject sequence to be aligned.
- ref_numbering(list[int], optional)
List of residue numbers for the reference sequence.
- subject_numbering(list[int], optional)
List of residue numbers for the subject sequence.
- Returns:
- tuple: A tuple containing:
Aligned reference sequence (str)
Aligned subject sequence (str)
Numbering of the aligned reference sequence (list[int])
Numbering of the aligned subject sequence (list[int])
- Raises:
- ValueError if the sequences cannot be aligned.
- pinder.core.structure.atoms.get_seq_aligned_structures(ref: Path | AtomArray | AtomArrayStack, subject: Path | AtomArray | AtomArrayStack, pdb_engine: str = 'fastpdb') tuple[AtomArray | AtomArrayStack, AtomArray | AtomArrayStack] [source][source]#
Computes per-chain sequence alignments between reference and subject structures, and creates a new set of structures where the residue numbers in the subject structure are renumbered to match those in the reference structure for the aligned sequence segments for each chain.
This function processes each chain separately, aligns their sequences, and then constructs the multi-chain structure using the chain-aligned sequences.
- Parameters:
- refPath | _AtomArrayOrStack
The file path or structure array of the reference structure.
- subjectPath | _AtomArrayOrStack
The file path or structure array of the subject structure to align with the reference.
- pdb_enginestr, optional
The backend engine to use for PDB file processing, defaults to “fastpdb”.
- Returns:
- tuple
A tuple containing: - Reference structure with aligned sequence and residue numbering (_AtomArrayOrStack). - Subject structure with aligned sequence and residue numbering (_AtomArrayOrStack).
- Raises:
- ValueError
If the reference and subject structures do not have the same number of chains or if the chains cannot be matched properly.
Examples
>>> from pinder.core import PinderSystem >>> ps = PinderSystem("8i2f__A1_O34841--8i2f__B1_P54421") >>> ref_structure = ps.holo_receptor.atom_array.copy() >>> subject_structure = ps.apo_receptor.atom_array.copy() >>> ref_aln, subj_aln = get_seq_aligned_structures(ref_structure, subject_structure) >>> print((ref_aln.res_id[0], subj_aln.res_id[0])) (8, 8)
- pinder.core.structure.atoms.get_per_chain_seq_alignments(ref: Path | AtomArray | AtomArrayStack, subject: Path | AtomArray | AtomArrayStack, pdb_engine: str = 'fastpdb') dict[str, dict[int, int]] [source][source]#
Computes per-chain sequence alignments between reference and subject structures, and provides a mapping of residue numbers from the subject to the reference for each chain.
This function processes each chain separately, aligns their sequences, and creates a mapping of residue numbering from the subject structure to the reference structure based on sequence alignment. It is designed to work with multi-chain structures, aligning each chain independently.
- Parameters:
- refPath | _AtomArrayOrStack
The file path or structure array of the reference structure.
- subjectPath | _AtomArrayOrStack
The file path or structure array of the subject structure to align with the reference.
- pdb_enginestr, optional
The backend engine to use for PDB file processing, defaults to “fastpdb”.
- Returns:
- dict[str, dict[int, int]]
A dictionary where keys are chain identifiers from the subject structure and values are dictionaries mapping residue numbers in the subject structure to those in the reference structure for the aligned sequence segments.
- Raises:
- ValueError
If the reference and subject structures do not have the same number of chains or if the chains cannot be matched properly.
Examples
>>> from pinder.core import PinderSystem >>> ps = PinderSystem("8i2f__A1_O34841--8i2f__B1_P54421") >>> ref_structure = ps.holo_receptor.atom_array.copy() >>> subject_structure = ps.apo_receptor.atom_array.copy() >>> mappings = get_per_chain_seq_alignments(ref_structure, subject_structure) >>> print(mappings['R']) {7: 8, 8: 9, 9: 10, 10: 11, ...}
- pinder.core.structure.atoms.invert_chain_seq_map(seq_map: dict[str, dict[int, int]] | list[dict[str, dict[int, int]]]) dict[str, dict[int, int]] | list[dict[str, dict[int, int]]] [source][source]#
- pinder.core.structure.atoms.get_buried_sasa(a: AtomArray, b: AtomArray, point_number: int = 200, ignore_ions: bool = True) int [source][source]#
Calculate the rounded dSASA value for a pair of interacting substructures (chains).
- Parameters:
- astruc.AtomArray
The first substructure (chain).
- bstruc.AtomArray
The second substructure (chain).
- point_numberint, optional
The number of points per atom, by default 200. This parameter affects the speed and the accuracy of the calculation.
- Returns:
- int
The rounded dSASA value.
Examples
>>> atom1a = struc.atoms.Atom([0,0,0], chain_id="A") >>> atom2a = struc.atoms.Atom([1,1,1], chain_id="A") >>> atom1b = struc.atoms.Atom([2,2,2], chain_id="B") >>> atom2b = struc.atoms.Atom([3,3,3], chain_id="B") >>> a = struc.atoms.array([atom1a, atom2a]) >>> b = struc.atoms.array([atom1b, atom2b]) >>> get_buried_sasa(a, b, ignore_ions=False) 0
- pinder.core.structure.atoms.get_resolved_resi_from_atom_array(model: AtomArray | AtomArrayStack) dict[str, list[int]] [source][source]#
Returns a list of resolved resi (only checking the CA atom) in the PDB file.
- Parameters:
- model_AtomArrayOrStack
AtomArray or Stack
- Returns:
- List[int]
A list of resolved resi in the PDB file.
- pinder.core.structure.atoms.get_resolved_resi(pdb_path: str) dict[str, list[int]] [source][source]#
Returns a list of resolved resi (only checking the CA atom) in the PDB file.
- Parameters:
- pdb_pathstr
The path to the PDB file.
- Returns:
- List[int]
A list of resolved resi in the PDB file.
- pinder.core.structure.atoms.write_pdb(arr: AtomArray, filepath: Path) None [source][source]#
Write AtomArray to a PDB file.
- Parameters:
- arrAtomArray
AtomArray to save to PDB file.
- filepathPath
Path to outut PDB file.
- Returns:
- None
- pinder.core.structure.atoms.rename_chains(pdb_path: str, new_chain: str | dict[str, str]) None [source][source]#
Rename chains in a PDB file.
- Parameters:
- pdb_pathstr
The path to the PDB file whose chains are to be renamed.
- new_chainUnion[str, dict]
The new name(s) for the chain(s). If a string is provided, all chains are renamed to this string. If a dictionary is provided, chains are renamed according to the dictionary. If the key is an integer, the alphabetically sorted chain at that index is renamed to the value. If the key is a string, the chain with that name is renamed to the value.
- Returns:
- None
- pinder.core.structure.atoms.cif_to_pdb(cif_path: str, pdb_path: str, chains: dict[str, str] | None = None) None [source][source]#
Convert CIF file to PDB file
- Parameters:
- cif_pathstr
The path to the CIF file to be converted.
- pdb_pathstr
The path where the converted PDB file will be saved.
- chainsOptional[dict], default is None
If provided, only the chains specified in the dictionary will be kept and they will be renamed according to the dictionary. Note: The new chain names must be single characters to be PDB compatible.
- pinder.core.structure.atoms.normalize_orientation(model: AtomArray) AtomArray [source][source]#
Align the structure along principal axes.
The structure is transformed such that it is projected on the xy plane, minimizing variance along the z-axis and translating the centroid to the origin.
- Parameters:
- modelAtomArray
The AtomArray object to apply the normalization transformation to.
- Returns:
- AtomArray
The same atoms with transformed coordinates.
pinder.core.structure.contacts module#
- class pinder.core.structure.contacts.ContactPairs(residue_contacts, atom_contacts)[source][source]#
Bases:
NamedTuple
- residue_contacts: Set[Tuple[str, str, int, int]]#
Alias for field number 0
- atom_contacts: Set[Tuple[str, str, int, int]]#
Alias for field number 1
- pinder.core.structure.contacts.get_atoms_within_coordinate(atom_array: AtomArray | ndarray[Any, dtype[float64]], coords: ndarray[Any, dtype[float64]], radius: float, cell_size: float, as_mask: bool = False) ndarray[Any, dtype[int64]] [source][source]#
Find atom indices (contact matrix) within distance of coords.
- Parameters:
- atom_arrayAtomArray | NDArray[np.double]
The array of atoms representing the protein structure.
- coordsNDArray[np.double]
The coordinates to find atoms within.
- radiusfloat
The radius within which to consider atoms as being in contact.
- cell_sizefloat
The size of the cell for the cell list.
- as_maskbool, optional
Whether to return the contacts as a mask, by default False.
- Returns:
- NDArray[np.int_]
The contact indices.
- pinder.core.structure.contacts.get_stack_contacts(receptor: AtomArray, ligand_poses: AtomArrayStack, threshold: float = 4.0, cell_size: float | None = None, as_mask: bool = False) ndarray[Any, dtype[int64]] [source][source]#
This function concatenates all poses in the stack into a long list of atoms. It then creates a cell list of the atom array for efficient measurement of adjacency.
- Parameters:
- receptorAtomArray
The receptor atoms.
- ligand_posesAtomArrayStack
The ligand poses.
- thresholdfloat, optional
The threshold for contact distance, by default 4.0.
- cell_sizefloat or None, optional
The cell size for the cell list, by default None. If None, the threshold is used as the cell size.
- as_maskbool, optional
Whether to return the contacts as a mask, by default False.
- Returns:
- NDArray[np.int_]
The contact indices.
- pinder.core.structure.contacts.get_atom_neighbors(atom_array: AtomArray, query_array: AtomArray, radius: float, cell_size: float | None = None, as_mask: bool = False) AtomArray [source][source]#
Find atoms within a specified distance of coordinates.
- Parameters:
- atom_arrayAtomArray
The array of atoms.
- query_arrayAtomArray
The array of query atoms.
- radiusfloat
The radius within which to find neighboring atoms.
- cell_sizefloat or None, optional
The cell size for the cell list, by default None. If None, the radius is used as the cell size.
- as_maskbool, optional
Whether to return the contacts as a mask, by default False.
- Returns:
- AtomArray
The array of neighboring atoms.
- pinder.core.structure.contacts.pairwise_contacts(pdb_file: Path | AtomArray | AtomArrayStack, chain1: str | list[str], chain2: str | list[str], radius: float = 5.0, cell_size: float | None = None, calpha_only: bool = False, backbone_only: bool = False, heavy_only: bool = True, atom_and_residue_level: bool = False) Tuple[Set[Tuple[str, str, int, int]], Set[Tuple[str, str, int, int]]] | Set[Tuple[str, str, int, int]] | Tuple[List[Set[Tuple[str, str, int, int]]], List[Set[Tuple[str, str, int, int]]]] | List[Set[Tuple[str, str, int, int]]] [source][source]#
Calculate pairwise contacts between two chains.
- Parameters:
- pdb_filePath | AtomArray | AtomArrayStack
The pdb file or atom array.
- chain1str | list[str]
The first chain or list of chains.
- chain2str | list[str]
The second chain or list of chains.
- radiusfloat, optional
The radius for contact distance, by default 5.0.
- cell_sizefloat or None, optional
The cell size for the cell list, by default None. If None, the radius is used as the cell size.
- calpha_onlybool, optional
Whether to consider only alpha carbons, by default False.
- backbone_onlybool, optional
Whether to consider only backbone atoms, by default False.
- heavy_onlybool, optional
Whether to consider only heavy atoms, by default True.
- atom_and_residue_levelbool, optional
Whether to return atomic and residue-level contact pairs. Default is False (only residue-level).
- Returns:
- _AtomResContacts | _StackAtomResContacts
The contact pairs. If atom_and_residue_level, returns a tuple of residue contacts and atom contacts, respectively. If the input structure contains multiple models, the returned value is a list of contact pairs in the same order that the input stack was given.
Examples
>>> pairwise_contacts(pdb_file='1abc.pdb', chain1='A', chain2='B')
- pinder.core.structure.contacts.pairwise_chain_contacts(atom_array: AtomArray, radius: float = 4.5, cell_size: float | None = None) DataFrame | None [source][source]#
Calculate pairwise contacts between chains in a protein structure.
- Parameters:
- atom_arrayAtomArray
The array of atoms representing the protein structure.
- radiusfloat, optional
The radius within which to consider atoms as being in contact, by default 4.5.
- cell_sizefloat | None, optional
The size of the cell used for the contact calculation. If None, it is set to the value of radius.
- Returns:
- pd.DataFrame | None
A DataFrame containing the chain pairs and their contacts, or None if no contacts are found.
Examples
>>> from pinder.core.structure.atoms import atom_array_from_pdb_file >>> atom_array = atom_array_from_pdb_file('1abc.pdb') >>> pairwise_chain_contacts(atom_array)
- pinder.core.structure.contacts.pairwise_clashes(pdb_file: Path | AtomArray, chain1: str | None = None, chain2: str | None = None, radius: float = 1.2, cell_size: float | None = None, calpha_only: bool = False, backbone_only: bool = False, heavy_only: bool = False) dict[str, int | float] [source][source]#
- pinder.core.structure.contacts.label_structure_gaps(atom_array: AtomArray) DataFrame | None [source][source]#
Find gaps in residue numbering between C-alpha atoms.
- Parameters:
- atom_arrayAtomArray
The array of atoms for which to find gaps.
- Returns:
- chainslist
A sorted list of unique chain IDs from the atom array.
pinder.core.structure.models module#
- class pinder.core.structure.models.Base[source][source]#
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True, 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {}#
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- class pinder.core.structure.models.ChainConfig(*, decoy_receptor: List[str] = ['R'], decoy_ligand: List[str] = ['L'], native_receptor: List[str] = ['R'], native_ligand: List[str] = ['L'])[source][source]#
Bases:
Base
Preparation configuration settings.
Default configuration for preparation of normalized monomers, ready for use in PPI generation tasks.
- decoy_receptor: List[str]#
- decoy_ligand: List[str]#
- native_receptor: List[str]#
- native_ligand: List[str]#
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}#
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True, 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'decoy_ligand': FieldInfo(annotation=List[str], required=False, default=['L']), 'decoy_receptor': FieldInfo(annotation=List[str], required=False, default=['R']), 'native_ligand': FieldInfo(annotation=List[str], required=False, default=['L']), 'native_receptor': FieldInfo(annotation=List[str], required=False, default=['R'])}#
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.
This replaces Model.__fields__ from Pydantic V1.
- class pinder.core.structure.models.BackboneDefinition(value)[source][source]#
Bases:
str
,Enum
An enumeration.
- biotite = 'biotite'#
- dockq = 'dockq'#
pinder.core.structure.superimpose module#
- pinder.core.structure.superimpose.superimpose_chain(fixed: AtomArray | AtomArrayStack, mobile: AtomArray | AtomArrayStack, substitution_matrix: str | SubstitutionMatrix = 'BLOSUM62', gap_penalty: tuple[int, int] = (-10, -1), min_anchors: int = 3, outlier_threshold: float = 1.5, max_iterations: int = 10) tuple[AtomArray | AtomArrayStack, AffineTransformation, ndarray[Any, dtype[int64]], ndarray[Any, dtype[int64]]] [source][source]#
Superimpose one protein chain onto another one, considering sequence differences and conformational outliers.
The method finds corresponding residues by sequence alignment and selects their \(C_{\alpha}\) atoms as superimposition anchors. Then iteratively the anchor atoms are superimposed and outliers (in terms of high squared distance between matching anchor atoms) are removed, until either no outliers are left or the maximum number of iterations is reached.
- Parameters:
- fixedAtomArray, shape(n,) or AtomArrayStack, shape(m,n) or ndarray, shape(n,), dtype=float or ndarray, shape(m,n), dtype=float
The fixed structure(s). Alternatively coordinates can be given.
- mobileAtomArray, shape(n,) or AtomArrayStack, shape(m,n) or ndarray, shape(n,), dtype=float or ndarray, shape(m,n), dtype=float
The structure(s) which is/are superimposed on the fixed structure. Alternatively coordinates can be given.
- substitution_matrixstr or SubstitutionMatrix,
The (name of the) substitution matrix used for sequence alignment.
- gap_penaltyint or tuple of int, optional
The gap penalty for sequence alignment. A single value indicates a linear penalty, while a tuple indicates an affine penalty.
- min_anchors: int, optional
If less than min_anchors anchors are found by sequence alignment, the method ditches the alignment and matches all \(C_{\alpha}\) atoms. If the number of \(C_{\alpha}\) atoms does not match in this fallback case, an exception is raised. Furthermore, the outlier removal is stopped, if less than min_anchors anchors would be left.
- outlier_thresholdfloat, optional
The threshold for considering a conformational outlier. The threshold is given in units of IQR. This means that a residue is considered an outlier, if its squared distance is outlier_threshold x IQR above the upper quartile.
- max_iterationsint, optional
The maximum number of iterations for removing conformational outliers.
- Returns:
- fittedAtomArray or AtomArrayStack or ndarray, shape(n,), dtype=float or ndarray, shape(m,n), dtype=float
A copy of the mobile structure(s), superimposed on the fixed structure. Only coordinates are returned, if coordinates were given in mobile.
- transformAffineTransformation
This object contains the affine transformation(s) that were applied on mobile.
AffineTransformation.apply()
can be used to transform another AtomArray in the same way.- fixed_anchor_indices, mobile_anchor_indicesndarray, shape(k,), dtype=int
The indices of the anchor \(C_{\alpha}\) atoms in the fixed and mobile structure, respectively. These atoms were used for the superimposition.
- Note:
As this method relies on amino acid sequence alignment, it works only for proteins with decent homology.
pinder.core.structure.surgery module#
- pinder.core.structure.surgery.remove_annotations(structure: AtomArray | AtomArrayStack, categories: list[str] = ['element', 'ins_code']) AtomArray | AtomArrayStack [source][source]#
- pinder.core.structure.surgery.fix_annotation_mismatch(ref: AtomArray, decoys: AtomArrayStack, categories: list[str] = ['element', 'ins_code', 'b_factor']) tuple[AtomArray, AtomArrayStack] [source][source]#
- pinder.core.structure.surgery.fix_mismatched_atoms(native: AtomArray, decoy_stack: AtomArrayStack, max_atom_delta: int) tuple[AtomArray, AtomArrayStack] [source][source]#
- pinder.core.structure.surgery.set_canonical_chain_order(structure: AtomArray | AtomArrayStack | list[AtomArray], chains: ChainConfig, subject: str) AtomArray | AtomArrayStack | list[AtomArray] [source][source]#