pinder.data.annotation package

Submodules

pinder.data.annotation.canonical_apo module

Select single representative apo monomers for a dimer entry.

Uses a suite of apo-holo difficulty assessment metrics to create a scaled score and select a single receptor monomer and a single ligand monomer for a given pinder_id when apo structures are available.

pinder.data.annotation.canonical_apo.get_apo_monomer_weighted_score(apo_data: DataFrame, scale_type: str = 'standard') → DataFrame
pinder.data.annotation.canonical_apo.get_system_monomer_difficulty(pinder_id: str) → DataFrame | None
pinder.data.annotation.canonical_apo.get_canonical_apo_codes(pinder_id: str) → dict[str, str] | None

pinder.data.annotation.constants module

pinder.data.annotation.contact_classification module

pinder.data.annotation.contact_classification.get_crystal_contact_classification(pdb_path: Path) → list[Path | str | None]

Get crystal contact classification using prodigy_cryst. In addition to the contacts, the link density and label are also returned. The label classifies the contact as either “biological” or “crystal”.

pinder.data.annotation.contact_classification.detect_disulfide_bonds(structure: AtomArray, distance: float = 2.05, distance_tol: float = 0.05, dihedral: float = 90.0, dihedral_tol: float = 10.0) → ndarray[Any, dtype[int64]]

Detect potential disulfide bonds.

This function detects disulfide bridges in protein structures.

The criteria for a disulfide bond are deliberately simple: the \(S_\gamma\) atoms of two cysteine residues must be \(2.05 \pm 0.05\) Å apart, and the dihedral angle \(C_\beta - S_\gamma - S^\prime_\gamma - C^\prime_\beta\) must be \(90 \pm 10 ^{\circ}\).
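The two geometric criteria can be sketched in plain Python as follows. This is a minimal illustration and not the pinder implementation: the helper names `dihedral_deg` and `is_disulfide_geometry` are hypothetical, coordinates are plain 3-tuples in Å, and comparing the absolute value of the dihedral against the target is an assumption, since the text does not say how the sign is handled.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for the four points p0-p1-p2-p3."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    m1 = _cross(n1, tuple(c / b1_len for c in b1))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def is_disulfide_geometry(cb1, sg1, sg2, cb2, distance=2.05, distance_tol=0.05,
                          dihedral=90.0, dihedral_tol=10.0):
    """Check the S-S distance and the CB-SG-SG'-CB' dihedral against the targets."""
    if abs(math.dist(sg1, sg2) - distance) > distance_tol:
        return False
    # Assumption: only the magnitude of the dihedral matters.
    angle = abs(dihedral_deg(cb1, sg1, sg2, cb2))
    return abs(angle - dihedral) <= dihedral_tol
```

The default arguments mirror the defaults in the `detect_disulfide_bonds` signature. A real implementation would also need to locate candidate cysteine pairs in the structure, which this sketch leaves out.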

pinder.data.annotation.detached_components module

pinder.data.annotation.detached_components.get_num_connected_components(args: tuple[Path, float]) → tuple[int, int, str]

Get number of connected components in CA-CA distance graph.

Finds detached structures for each chain separately by detecting connected components in the CA-CA distance graph, using a radius of 15 Å by default.
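As a rough illustration of the idea, the sketch below counts connected components in a distance graph whose nodes are the CA coordinates of one chain and whose edges link any two atoms within the radius. The helper name `count_components` is hypothetical, the inclusive `<=` cutoff is an assumption, and the quadratic breadth-first search is chosen for brevity rather than speed.

```python
import math
from collections import deque


def count_components(ca_coords, radius=15.0):
    """Count connected components of a graph linking CA atoms within `radius` Å."""
    n = len(ca_coords)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        # Every unvisited atom starts a new component.
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                # Assumption: atoms exactly at the cutoff count as connected.
                if not seen[j] and math.dist(ca_coords[i], ca_coords[j]) <= radius:
                    seen[j] = True
                    queue.append(j)
    return components
```

Under this reading, a chain whose count is greater than 1 contains at least one segment that no CA atom links to the rest within the cutoff.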

pinder.data.annotation.elongation module

pinder.data.annotation.elongation.get_max_var(coords: ndarray[Any, dtype[float64]]) → float

Get the maximum variance of the coordinates of the CA atoms of the two chains.

pinder.data.annotation.elongation.calculate_elongation(pdb_filename: str | Path) → tuple[str, float | None, float | None, int | None, int | None, int | None, str, str, str | None, str | None]

Get the maximum variance of the coordinates of the CA atoms of the two chains in the PDB file. Also get the length of the two chains and the number of atom types.

pinder.data.annotation.graphql module

pinder.data.annotation.graphql.run_graphql_annotation_query(pdb_id: str) → dict[str, Any]

Fetch annotations for an entry by PDB entry ID.

The data returned is identical to the data used to generate the annotation webpages for entries, for example https://www.rcsb.org/annotations/2A79. The query is taken from the Data API widget.

Parameters:
pdb_id: str

The PDB entry ID to fetch.

Returns:
data: dict[str, Any]

A dictionary of data for the entry. If the entry is not found, the response will be {'data': {'entry': None}}.
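Because a missing entry still produces a well-formed response, callers need to inspect the nested `entry` key rather than rely on an exception. A minimal sketch of such a check is shown below; the helper `entry_found` is hypothetical and the sample payloads are hand-written placeholders, not real API output.

```python
from typing import Any, Mapping


def entry_found(response: Mapping[str, Any]) -> bool:
    """Return True if the response dict carries a non-null entry."""
    data = response.get("data") or {}
    return data.get("entry") is not None


# Shape documented above for an entry that does not exist.
missing = {"data": {"entry": None}}
# Placeholder payload for illustration only; real responses hold many more keys.
found = {"data": {"entry": {"rcsb_id": "2A79"}}}
```

Treating a missing `data` key the same as a null entry is a defensive choice of this sketch, not documented behavior.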

pinder.data.annotation.graphql.fetch_entry_annotations(pdb_id: str, data_json: Path, use_cache: bool = True) → None

Fetch annotations for an entry by PDB entry ID and store in json.

If the entry is not found, the json is still saved with status: empty. If the query fails, e.g. because the rate limit was exceeded or the network is unavailable, the json is saved with status: failed. The saved json serves as a cache, so later calls can reuse the result without issuing a new query unless one is explicitly requested.

Parameters:
pdb_id: str

The PDB entry ID to fetch.

data_json: Path

The Path to the json file to write data to.

use_cache: bool

Whether to skip the query if valid results exist on disk.

Returns:
None
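The caching behavior described for `fetch_entry_annotations` can be sketched as follows. Several details are assumptions made for illustration: the helper name `fetch_with_cache`, the injected `query` callable, the `{"status": ..., "data": ...}` file layout, the `"ok"` status value (the text only names `empty` and `failed`), and the choice to retry after a cached `failed` result.

```python
import json
from pathlib import Path


def fetch_with_cache(pdb_id, data_json, query, use_cache=True):
    """Run `query(pdb_id)` and store the outcome, with a status, in `data_json`."""
    data_json = Path(data_json)
    if use_cache and data_json.is_file():
        try:
            cached = json.loads(data_json.read_text())
        except json.JSONDecodeError:
            cached = {}
        # Assumption: "ok" and "empty" are valid cached results; "failed" is retried.
        if isinstance(cached, dict) and cached.get("status") in ("ok", "empty"):
            return
    try:
        response = query(pdb_id)
        entry = (response.get("data") or {}).get("entry")
        record = {"status": "ok" if entry is not None else "empty", "data": response}
    except Exception:
        # Network errors, rate limiting and similar failures are recorded, not raised.
        record = {"status": "failed", "data": None}
    data_json.write_text(json.dumps(record))
```

Passing the query function in as an argument keeps the sketch free of network access; the real function presumably calls `run_graphql_annotation_query` itself.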

pinder.data.annotation.graphql.parse_pfam(polymer_entity: dict[str, Any]) → DataFrame

Extract PFAM (protein family) annotations from a polymer entity.

pinder.data.annotation.graphql.parse_ec(polymer_entity: dict[str, Any]) → DataFrame

Extract EC (Enzyme Commission) annotations from a polymer entity.

pinder.data.annotation.graphql.parse_polymer_entity_instance(polymer_entity_instance: dict[str, Any]) → tuple[DataFrame, DataFrame]

Extract polymer entity instance features and annotations.

Takes an instance of a polymer entity from a data entry and converts it to dataframes corresponding to the rcsb_polymer_instance_annotation and rcsb_polymer_instance_feature keys. Also adds entity information, such as the identifiers asym_id, auth_asym_id and entity_id, to the dataframes.

Parameters:
polymer_entity_instance: dict[str, Any]

The polymer entity instance coming from a polymer entity.

Returns:
tuple[pd.DataFrame, pd.DataFrame]

A tuple containing the annotation data and feature data for the entity instance.

pinder.data.annotation.graphql.cast_entity_info_lists(entity_info: dict[str, str | int | float | list[str | int | float]]) → dict[str, str | int | float]
pinder.data.annotation.graphql.add_entity_info(df: DataFrame, entity_info: dict[str, str | int | float]) → DataFrame
pinder.data.annotation.graphql.parse_annotation_data(data_json: Path) → tuple[DataFrame, DataFrame, DataFrame, DataFrame]
pinder.data.annotation.graphql.csv_format_pfam(pfam_df: DataFrame, pdb_id: str) → DataFrame
pinder.data.annotation.graphql.csv_format_features(feature_df: DataFrame, pdb_id: str) → DataFrame
pinder.data.annotation.graphql.csv_format_annotations(annotation_df: DataFrame, pdb_id: str) → DataFrame
pinder.data.annotation.graphql.safe_fetch_entry_annotations(pdb_id: str, pinder_dir: Path, use_cache: bool = True) → None
pinder.data.annotation.graphql.populate_rcsb_annotations(pinder_dir: Path, pdb_ids: list[str] | None = None, max_workers: int | None = None, use_cache: bool = True, parallel: bool = True) → None
pinder.data.annotation.graphql.split_annotation_types(dfs: list[DataFrame]) → dict[str, list[DataFrame]]

Split list of annotation dataframes into their respective annotation categories.

Reduces the final dataframe size and is used to write each category to its own file on disk.

pinder.data.annotation.graphql.extract_feature_positions(df: DataFrame) → DataFrame
pinder.data.annotation.graphql.cast_incompatible_annotation_columns(annot_df: DataFrame) → DataFrame
pinder.data.annotation.graphql.cast_incompatible_feature_columns(feat_df: DataFrame) → DataFrame
pinder.data.annotation.graphql.collate_csvs(csvs: list[Path], output_csv: Path, max_workers: int | None = None, use_cache: bool = True) → None
pinder.data.annotation.graphql.collect_csvs_by_group(output_dir: Path, csv_files: list[Path], entity_prefix: str, max_workers: int | None = None, use_cache: bool = True) → None
pinder.data.annotation.graphql.collect_rcsb_annotations(pinder_dir: Path, max_workers: int | None = None, use_cache: bool = True) → None

pinder.data.annotation.interface_gaps module

pinder.data.annotation.interface_gaps.annotate_interface_gaps(pdb_file: Path, smaller_radius: float = 4.0, larger_radius: float = 8.0) → DataFrame | None

Find atomic gaps near the PPI interface.

Look for atoms (and count residues) that are part of the interface and within a radius of one of the residue gaps (as defined by numbering).
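To make "residue gaps (as defined by numbering)" concrete, the sketch below reports every pair of residue numbers in a chain that are adjacent in sorted order but not consecutive. The helper name `find_numbering_gaps` is hypothetical, insertion codes are ignored, and the radius-based filtering against interface atoms is not shown.

```python
def find_numbering_gaps(residue_ids):
    """Return (before, after) residue-number pairs that flank a numbering gap."""
    ordered = sorted(set(residue_ids))
    # A jump of more than 1 between neighbours means residues are missing.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]
```

For a chain numbered 1, 2, 3, 7, 8, 12 this yields the pairs (3, 7) and (8, 12); the residues flanking each gap would then be the reference points for the two radius checks.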

pinder.data.annotation.interface_gaps.mp_annotate_interface_gaps(pdb_files: list[Path], parallel: bool = True, max_workers: int | None = None) → DataFrame | None

pinder.data.annotation.planarity module

pinder.data.annotation.planarity.get_planarity(pdb_file: Path) → float

Calculate the planarity of the interface between two proteins.

Parameters:
pdb_file: Path

The path to the pdb file.

Returns:
float

The root mean square error (RMSE) of the distances from the interface C-alpha atoms to the plane of the interface. If there are fewer than 3 interface C-alpha atoms, returns -1.

pinder.data.annotation.sabdab module

pinder.data.annotation.sabdab.download_sabdab(download_dir: Path | str, filename: str = 'sabdab_summary_all.tsv', url: str = 'https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/summary/all/', overwrite: bool = True) → Path
pinder.data.annotation.sabdab.explode_sabdab_per_chain(sabdab_df: DataFrame) → DataFrame
pinder.data.annotation.sabdab.map_to_pinder_chains(chains: DataFrame, sabdab: DataFrame) → DataFrame
pinder.data.annotation.sabdab.map_to_sabdab_db(chains_long: DataFrame, sabdab_long: DataFrame) → DataFrame
pinder.data.annotation.sabdab.add_sabdab_annotations(pinder_dir: Path, use_cache: bool = True) → None

Module contents