commot.tl.spatial_communication

commot.tl.spatial_communication(adata, database_name=None, df_ligrec=None, pathway_sum=False, heteromeric=False, heteromeric_rule='min', heteromeric_delimiter='_', dis_thr=None, cost_scale=None, cost_type='euc', cot_eps_p=0.1, cot_eps_mu=None, cot_eps_nu=None, cot_rho=10.0, cot_nitermax=10000, cot_weights=(0.25, 0.25, 0.25, 0.25), smooth=False, smth_eta=None, smth_nu=None, smth_kernel='exp', copy=False)

Infer spatial communication.

Parameters
  • adata (AnnData) – The data matrix of shape n_obs × n_var. Rows correspond to cells or spots and columns to genes. If the spatial distance is absent in .obsp['spatial_distance'], Euclidean distance determined from .obsm['spatial'] will be used.

  • database_name (Optional[str]) – Name of the ligand-receptor interaction database. Will be included in the keywords for anndata slots.

  • df_ligrec (Optional[DataFrame]) – A data frame where each row corresponds to a ligand-receptor pair with ligands, receptors, and the associated signaling pathways in the three columns, respectively.

  • pathway_sum (bool) – Whether to sum over all ligand-receptor pairs of each pathway.

  • heteromeric (bool) – Whether the ligands or receptors are made of heteromeric complexes.

  • heteromeric_rule (str) – Use either ‘min’ (minimum) or ‘ave’ (average) expression of the components as the level for the heteromeric complex.

  • heteromeric_delimiter (str) – The character in ligand and receptor names separating individual components.

  • dis_thr (Optional[float]) – The threshold of spatial distance of signaling.

  • cost_scale (Optional[dict]) – Weight coefficients of the cost matrix for each ligand-receptor pair, e.g. cost_scale[(‘ligA’,’recA’)] specifies weight for the pair ligA and recA. If None, all pairs have the same weight.

  • cost_type (str) – If ‘euc’, the original Euclidean distance will be used as cost matrix. If ‘euc_square’, the square of the Euclidean distance will be used.

  • cot_eps_p (float) – The coefficient of entropy regularization for transport plan.

  • cot_eps_mu (Optional[float]) – The coefficient of entropy regularization for untransported source (ligand). Set to equal to cot_eps_p for fast algorithm.

  • cot_eps_nu (Optional[float]) – The coefficient of entropy regularization for unfulfilled target (receptor). Set to equal to cot_eps_p for fast algorithm.

  • cot_rho (float) – The coefficient of penalty for unmatched mass.

  • cot_nitermax (int) – Maximum iteration for collective optimal transport algorithm.

  • cot_weights (tuple) – A tuple of four weights that add up to one. The weights corresponds to four setups of collective optimal transport: 1) all ligands-all receptors, 2) each ligand-all receptors, 3) all ligands-each receptor, 4) each ligand-each receptor.

  • smooth (bool) – Whether to (spatially) smooth the gene expression for identifying more global signaling trend.

  • smth_eta (Optional[float]) – Kernel bandwidth for smoothing

  • smth_nu (Optional[float]) – Kernel sharpness for smoothing

  • smth_kernel (str) – ‘exp’ exponential kernel. ‘lorentz’ Lorentz kernel.

  • copy (bool) – Whether to return a copy of the anndata.AnnData.

Returns

adata – Signaling matrices are added to .obsp, e.g., for a LR interaction database named “databaseX”, .obsp['commot-databaseX-ligA-recA'] is a n_obs × n_obs matrix with the ij th entry being the “score” of cell i sending signal to cell j through ligA and recA. The marginal sums (sender and receiver) of the signaling matrices are stored in .obsm['commot-databaseX-sum-sender'] and .obsm['commot-databaseX-sum-receiver']. Metadata of the analysis is added to .uns['commot-databaseX-info']. If copy=True, return the AnnData object and return None otherwise.

Return type

anndata.AnnData

Examples

>>> import anndata
>>> import commot as ct
>>> import pandas as pd
>>> import numpy as np
>>> # create a toy data with four cells and four genes
>>> X = np.array([[1,0,0,0],[2,0,0,0],[0,1,1,4],[0,3,4,1]], float)
>>> adata = anndata.AnnData(X=X, var=pd.DataFrame(index=['LigA','RecA','RecB','RecC']))
>>> adata.obsm['spatial'] = np.array([[0,0],[1,0],[0,1],[1,2]], float)
>>> # create a toy ligand-receptor database with two pairs sharing the same ligand
>>> df_ligrec = pd.DataFrame([['LigA','RecA_RecB'],['LigA','RecC']])
>>> df_ligrec = pd.DataFrame([['LigA','RecA_RecB','PathwayA'],['LigA','RecC','PathwayA']])
>>> # infer CCC with a distance threshold of 1.1. Due the threshold, the last cell won't be communicating with other cells.
>>> ct.tl.spatial_communication(adata, database_name='toyDB',df_ligrec=df_ligrec, pathway_sum=True, heteromeric=True, dis_thr=1.1)
>>> adata
AnnData object with n_obs × n_vars = 4 × 4
    uns: 'commot-toyDB-info'
    obsm: 'spatial', 'commot-toyDB-sum-sender', 'commot-toyDB-sum-receiver'
    obsp: 'commot-toyDB-LigA-RecA_RecB', 'commot-toyDB-LigA-RecC', 'commot-toyDB-PathwayA', 'commot-toyDB-total-total'
>>> print(adata.obsp['commot-toyDB-LigA-RecA_RecB'])
(0, 2)      0.600000000000005
>>> print(adata.obsp['commot-toyDB-LigA-RecC'])
(0, 2)      0.9000000000039632
>>> print(adata.obsm['commot-toyDB-sum-receiver'])
r-LigA-RecA_RecB  r-LigA-RecC  r-total-total  r-PathwayA
0               0.0          0.0            0.0         0.0
1               0.0          0.0            0.0         0.0
2               0.6          0.9            1.5         1.5
3               0.0          0.0            0.0         0.0
>>> print(adata.obsm['commot-toyDB-sum-sender'])
s-LigA-RecA_RecB  s-LigA-RecC  s-total-total  s-PathwayA
0               0.6          0.9            1.5         1.5
1               0.0          0.0            0.0         0.0
2               0.0          0.0            0.0         0.0
3               0.0          0.0            0.0         0.0
>>> print(adata.obsp['commot-toyDB-PathwayA'])
(0, 2)      1.5000000000039682
>>> print(adata.obsp['commot-toyDB-total-total'])
(0, 2)      1.5000000000039682
>>> adata.uns['commot-toyDB-info']
{'df_ligrec':   ligand   receptor   pathway
0   LigA  RecA_RecB  PathwayA
1   LigA       RecC  PathwayA, 'distance_threshold': 1.1}