commot.tl.communication_impact

commot.tl.communication_impact(adata, database_name=None, pathway_name=None, pathway_sum_only=False, heteromeric_delimiter='_', normalize=False, method=None, corr_method='spearman', tree_method='rf', tree_ntrees=100, tree_repeat=100, tree_max_depth=5, tree_max_features='sqrt', tree_learning_rate=0.1, tree_subsample=1.0, tree_combined=False, ds_genes=None, bg_genes=100)

Analyze the impact of cell-cell communication on downstream genes.

When ‘treebased_score’ is used as the method and ‘tree_combined’ is set to True, importance may be diluted among the LR pairs, because all of them enter a single model as features. Therefore, if the unique impact of individual LR pairs on the target genes is not the focus, ‘tree_combined’ can be set to False. If the unique impact of signaling beyond the intracellular regulatory effects on the target genes is not of interest, ‘bg_genes’ can be set to 0.
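The role of background genes can be illustrated with a library-free sketch of partial and semipartial correlation (the ‘partial_corr’ and ‘semipartial_corr’ methods, with Spearman's r computed as Pearson's r on ranks). It assumes a single per-cell signaling score, one target gene and one background gene as the control variable. The function names are hypothetical and this is not COMMOT's implementation. The semipartial variant removes the control from the signaling score only, which is an assumption about the direction COMMOT uses.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def corr(x, y, method="spearman"):
    """Spearman's r (Pearson's r on ranks) or Pearson's r."""
    if method == "spearman":
        x, y = ranks(x), ranks(y)
    return pearson(x, y)


def partial_corr(s, t, z, method="spearman"):
    """Correlation of signal s and target t with control z removed from both."""
    r_st, r_sz, r_tz = corr(s, t, method), corr(s, z, method), corr(t, z, method)
    return (r_st - r_sz * r_tz) / math.sqrt((1 - r_sz**2) * (1 - r_tz**2))


def semipartial_corr(s, t, z, method="spearman"):
    """Correlation of t with s after removing z from s only (assumed direction)."""
    r_st, r_sz, r_tz = corr(s, t, method), corr(s, z, method), corr(t, z, method)
    return (r_st - r_sz * r_tz) / math.sqrt(1 - r_sz**2)


# Toy data: both the signaling score s and the target gene t follow a
# hypothetical background gene z, plus noise that is unrelated to z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
s = [2, 1, 2, 5, 6, 5, 6, 9]
t = [2, 3, 2, 3, 4, 5, 8, 9]
print(round(corr(s, t, "pearson"), 3))  # 0.84
print(abs(round(partial_corr(s, t, z, "pearson"), 3)))  # 0.0
```

Because s and t share their dependence on z, their plain correlation (0.84) vanishes once z is controlled for. In this sketch, dropping the control variable (analogous to setting ‘bg_genes’ to 0) leaves only the plain correlation, which cannot separate signaling from intracellular co-regulation.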

Parameters
  • adata (AnnData) – The data matrix of shape n_obs × n_var. Rows correspond to cells or positions and columns to genes. The full normalized dataset should be available in adata.raw.

  • pathway_name (Optional[str]) – Name of the signaling pathway.

  • normalize (bool) – Whether to perform normalization before determining variable genes.

  • method (Optional[str]) – ‘partial_corr’: partial correlation. ‘semipartial_corr’: semipartial correlation. ‘treebased_score’: machine learning based score (ensemble of trees).

  • corr_method (str) – The correlation coefficient to use when method is ‘partial_corr’ or ‘semipartial_corr’. ‘spearman’: Spearman’s r. ‘pearson’: Pearson’s r.

  • tree_method (str) – The ensemble of trees method to use when method is ‘treebased_score’. ‘gbt’: gradient boosted trees. ‘rf’: random forest.

  • tree_ntrees (int) – Number of trees when using ‘treebased_score’.

  • tree_repeat (int) – Number of repetitions to account for randomness when using ‘treebased_score’.

  • tree_max_depth (int) – Maximum depth of the trees when using ‘treebased_score’.

  • tree_max_features (str) – Max features for trees when using ‘treebased_score’.

  • tree_learning_rate (float) – Learning rate when using ‘treebased_score’ (relevant to ‘gbt’).

  • tree_subsample (float) – Subsample (between 0 and 1) when using ‘treebased_score’.

  • tree_combined (bool) – If True, use a single model for each target gene with all features.

  • ds_genes (Optional[list]) – A list of genes for analyzing the correlation with cell-cell communication.

  • bg_genes (Union[list, int]) – Background genes. If an integer, that number of top variable genes is used. Alternatively, an explicit list of genes.
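The following hypothetical helper, which is not part of COMMOT, resolves either accepted form of ‘bg_genes’ to a list of gene names. It assumes that "top variable genes" can be approximated by ranking genes on their variance across cells. COMMOT's own variable-gene selection (which may depend on ‘normalize’) could use a different criterion.

```python
from statistics import pvariance


def resolve_bg_genes(expr, bg_genes):
    """Hypothetical helper: turn a bg_genes value into a list of gene names.

    expr maps gene name -> expression values across cells.
    Assumption: variability is measured by population variance here.
    """
    if isinstance(bg_genes, int):
        # Rank genes by variance (highest first) and keep the top bg_genes;
        # bg_genes=0 yields an empty list, i.e. no background control.
        ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
        return ranked[:bg_genes]
    # Otherwise bg_genes is already an explicit list of gene names.
    return list(bg_genes)


expr = {
    "GeneA": [0, 0, 1, 1],
    "GeneB": [0, 5, 0, 5],
    "GeneC": [2, 2, 2, 2],
}
print(resolve_bg_genes(expr, 2))  # ['GeneB', 'GeneA']
print(resolve_bg_genes(expr, 0))  # []
```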

Returns

df_impact – A data frame describing the correlation between the ds_genes and cell-cell communication.

Return type

pd.DataFrame