Welcome to Node Classification

  • Written by Miguel Romero

  • Last update: 18/08/21

Node classification

This package provides different approaches to the node classification problem (also known as attribute prediction) using machine learning techniques. Two approaches are available: flat node classification (fnc) and hierarchical classification (hc). Both are based on XGBoost, a gradient boosting decision tree algorithm, and both are equipped with an over-sampling technique called SMOTE.

Flat classification

Flat node classification aims to evaluate whether the structural (topological) properties of a network are useful for predicting node attributes (i.e., node classification). It does not consider the (possible) relationships between the classes of the node attribute to be predicted, i.e., the classes are predicted independently.

Hierarchical classification

Hierarchical node classification considers the hierarchical organization of the classes of the node attribute to be predicted. Using a top-down approach, a binary classifier is trained per class according to the hierarchy, which is represented as a directed acyclic graph (DAG).

Installation

The node classification package can be installed using pip; the requirements are installed automatically:

python3 -m pip install nodeclass

The source code and examples can be found in this GitHub repository.

Example

Flat classification

This example illustrates how the node classification package can be used to check whether the structural properties of the gene co-expression network improve the performance of the prediction of gene functions for rice (Oryza sativa Japonica). In this example, a gene co-expression network gathered from ATTED II is used.

How to run the example?

The complete source code of the example can be found in the GitHub repository. First, the xgbfnc module needs to be imported:

from nodeclass.models.xgbfnc import XGBfnc
from nodeclass.tools import data

After creating the adjacency matrix adj for the network, the structural properties are computed using the data module of the package:

df, strc_cols = data.compute_strc_prop(adj)

This method returns a DataFrame with the structural properties of the network and a list of the names of these properties (i.e., column names). After adding the additional features of the network to the DataFrame, the model is instantiated and the structural test is run:

test = XGBfnc()
test.load_data(df, strc_cols, y, term, output_path='output')
ans, pred, params = test.structural_test()

The data of the network is loaded using the load_data method, and the structural test is executed using the structural_test method. The test returns three values: a boolean indicating whether the structural properties help to improve the prediction performance, the prediction of the model that includes the structural properties, and the best parameters of that model.

To run the example execute the following commands:

cd test/flat_classification
python3 test_small.py

Hierarchical classification

This example illustrates how the hierarchical classification package can be used to predict gene functions based on the gene co-expression network, considering the hierarchical structure of gene functions (as determined by Gene Ontology). This example uses the data for rice (Oryza sativa Japonica); the gene co-expression network (GCN) was gathered from ATTED II.

How to run the example?

The complete source code of the example can be found in the GitHub repository. First, the xgbhc module needs to be imported:

from nodeclass.models.xgbhc import XGBhc
from nodeclass.tools import data

The adjacency matrices for the GCN and for the gene functions (from ancestral relations of biological processes), and the matrix of associations between genes and functions, are created using the data module as follows:

gcn, go_by_go, gene_by_go, G, T = data.create_matrices(data_ppi, data_isa, data_term_def, data_gene_term, OUTPUT_PATH, True)

The tree representation of the hierarchy is generated from the adjacency matrix of the classes by removing the isolated classes, filtering the classes according to the number of associated nodes (if required) and finding the remaining sub-hierarchies. Then a minimum spanning tree (MST) algorithm is applied to each sub-hierarchy to get its tree representation (the order and ancestors of the classes are calculated):

roots, subh_go_list = data.generate_hierarchy(gcn, go_by_go, gene_by_go, data_term_def, G, T, OUTPUT_PATH, filter=[5,300], trace=True)
root, subh_go = roots[13], subh_go_list[13]
subh_adj = data.hierarchy_to_tree(gcn, go_by_go, gene_by_go, T, subh_go, OUTPUT_PATH)

Additionally, the structural properties of the sub-graph of the GCN corresponding to the set of nodes associated with the classes in the sub-hierarchy are computed using the data module:

data.compute_strc_prop(subh_adj, path=OUTPUT_PATH)

Finally, the XGBhc class is instantiated, the data of the sub-hierarchy is loaded and the prediction is done as follows:

model = XGBhc()
model.load_data(data, root, hierarchy, ancestors, DATA_PATH, OUTPUT_PATH)
model.train_hierarchy()

The results of the prediction are saved in OUTPUT_PATH, including the ROC and precision-recall curves, the confusion matrix and a CSV file with performance metrics (such as the ROC AUC, average precision, recall, precision, F1, true positive and true negative rates, and the execution time).

To run the example execute the following commands:

cd test/hierarchical_classification
python3 test_data.py
python3 test.py

Documentation

Documentation of the package can be found here.

Models package

Flat node classification module

Module for flat node classification and testing the importance of the structural properties of the network.

class nodeclass.models.xgbfnc.XGBfnc

Bases: object

Class for flat node classification. This class builds two XGBoost binary classifiers for the attribute prediction using two datasets with different features: one including the structural properties of the network and the other one without them.

Variables
  • df (DataFrame) – Dataset with all features of the network.

  • orig_cols (List[string]) – List of feature names (columns of df) not related to structural properties of the network.

  • strc_cols (List[string]) – List of feature names (columns of df) related to structural properties of the network. The intersection between orig_cols and strc_cols must be empty.

  • y (Series) – Series representing the node attribute to be predicted.

  • ylabel (string) – Name of the node attribute to be predicted.

  • output_path (string) – Path where the output of the algorithm will be stored.

  • figs_path (string) – Path where the figures will be stored.

compare_plots(a, b, labels=['without', 'with'])

Plot the ROC curve, precision-recall curve and confusion matrices for the prediction of both models, i.e., without and with structural properties.

Parameters
  • a (np.array[float]) – Predicted probabilities for the model without the structural properties of the network.

  • b (np.array[float]) – Predicted probabilities for the model including the structural properties of the network.

  • labels (List[string]) – Labels of both models for the plots.

create_classifier(n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall', seed=None)

Builds the binary classifier within a hyper-parameters tuning model.

Parameters
  • n_iter (int) – Number of iterations in cross validation for hyper-parameters tuning, defaults to 5

  • n_jobs_cv (int) – Number of parallel jobs running for hyper-parameters tuning, defaults to None

  • n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2

  • eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”

  • scoring (string) – Scoring metric for hyper-parameters tuning, defaults to “recall”

  • seed (int) – Random number seed, defaults to None

Returns

Hyper-parameter tuning model with XGBoost binary classifier

Return type

RandomizedSearchCV

create_path(path)

Create a path.

Parameters

path (string) – Relative path to be created.

Raises

OSError – if the path already exists
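
The documented behavior matches that of os.makedirs from the standard library when it is called without exist_ok=True. Whether the package calls it this way is an assumption; the helper make_dir below is hypothetical and only illustrates the error condition:

```python
import os
import tempfile


def make_dir(path):
    """Create `path`; os.makedirs raises FileExistsError (an OSError) if it exists."""
    os.makedirs(path)


base = tempfile.mkdtemp()               # scratch directory for the demo
target = os.path.join(base, "output")

make_dir(target)                        # first call creates the directory
created = os.path.isdir(target)

try:
    make_dir(target)                    # second call fails: the path already exists
    raised = False
except OSError:
    raised = True
```

Callers that want to reuse an existing output directory would need to catch the exception, as done above.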

evaluate(y_orig, y_pred_prob)

Evaluate the performance of the prediction using metrics such as the auc roc, average precision score, precision, recall and F1 score.

Parameters
  • y_orig (np.array[int]) – Truth values of the prediction.

  • y_pred_prob (np.array[float]) – Predicted probabilities with the XGBoost classifier.

Returns

Evaluation metrics for the prediction.

Return type

dict[string->float]
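
As a rough illustration of the threshold-dependent metrics listed above, the following sketch computes precision, recall, F1 and the true positive and true negative rates from binary labels. The function name binary_metrics, the dictionary keys and the fixed 0.5 threshold are assumptions made for this example, not the package's implementation:

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, TPR and TNR for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    # The true positive rate is the same quantity as recall.
    return {"precision": precision, "recall": recall, "f1": f1, "tpr": recall, "tnr": tnr}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.7, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]   # assumed fixed threshold
scores = binary_metrics(y_true, y_pred)
```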

load_data(df, strc_cols, y, ylabel, output_path=None, figs_path=None)

Load the data of the network.

Parameters
  • df (DataFrame) – Dataset with all node features.

  • strc_cols (List[string]) – List of features related to structural properties.

  • y (Series) – Node attribute to be predicted. Should be the same size as the df.

  • ylabel (string) – Name of the node attribute to be predicted.

  • output_path (string) – Path to save output, defaults to “YYYY-MM-DD/”.

  • figs_path (string) – Path to save figs, defaults to “YYYY-MM-DD/”.

opt_threshold(y_orig, y_pred)

Compute the classification from probabilities based on the optimum threshold according to the precision-recall curve, that is, the threshold that maximizes the F1 score.

Parameters
  • y_orig (np.array[int]) – Truth values of the prediction.

  • y_pred (np.array[float]) – Predicted probabilities with the XGBoost classifier.

Returns

Classification for the input array of probabilities which maximizes the F1 score.

Return type

np.array[int]
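
The idea can be sketched in plain Python as follows. This is an illustrative version under simple assumptions (each distinct predicted probability is tried as a candidate threshold, and ties keep the lowest threshold); the function name best_f1_threshold is hypothetical and the package's own procedure may differ:

```python
def best_f1_threshold(y_true, y_prob):
    """Return (threshold, labels) maximizing F1 over the distinct predicted probabilities."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(y_prob)):
        y_pred = [1 if p >= t else 0 for p in y_prob]
        tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
        fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
        fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0   # F1 = 2TP / (2TP + FP + FN)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    labels = [1 if p >= best_t else 0 for p in y_prob]
    return best_t, labels


y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.5, 0.7]
threshold, labels = best_f1_threshold(y_true, y_prob)
```

The sketch assumes a non-empty input with at least one positive example.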

plot_performance(a, label)

Plot the ROC curve, precision-recall curve and confusion matrix for a prediction.

Parameters
  • a (np.array[float]) – Predicted probabilities for the model.

  • label (string) – Label of the models for the plots.

print_performance(scores, title)

Print the evaluation metrics for the prediction.

Parameters
  • scores (dict[string->float]) – Evaluation metrics for the prediction.

  • title (string) – Name of the model or experiment.

structural_test(n_splits=5, seed=None, log=False, csv=True)

Test whether the structural properties of the network help to improve the prediction performance by building two different models and comparing their results. One model includes the structural properties, whereas the other does not.

Parameters
  • n_splits (int) – Number of folds for cross-validation, defaults to 5

  • seed (int) – Random number seed, defaults to None

  • log (bool) – Flag for logging the results of the test, defaults to False

  • csv (bool) – Flag for saving the results of the test in a csv file, defaults to True

Returns

Result of the structural test (True if the structural properties improve the prediction performance, False otherwise), predicted labels and best parameter combination for the classifier including the structural properties.

Return type

Tuple(bool, np.array[int], dict[string->float])

train(X, n_splits=5, seed=None, n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall')

Train the XGBoost classifier on the input dataset, tuning its hyper-parameters with cross-validation.

Parameters
  • X (DataFrame) – Input dataset for the prediction.

  • n_splits (int) – Number of folds for cross-validation, defaults to 5

  • seed (int) – Random number seed, defaults to None

  • n_iter (int, optional) – Number of iterations in cross validation for hyper-parameters tuning, defaults to 5

  • n_jobs_cv (int) – Number of parallel jobs running for hyper-parameters tuning, defaults to None

  • n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2

  • eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”

  • scoring (string) – Scoring metric for hyper-parameters tuning, defaults to “recall”

Returns

Predicted probabilities with the XGBoost classifier, feature importance measured by total gain, and best parameter combination for the classifier.

Return type

Tuple(np.array[float], dict[string->float], dict[string->float])

write_csv(a, b, labels=['without', 'with'])

Save the evaluation metrics for prediction of both models, i.e., without and with structural properties.

Parameters
  • a (np.array[float]) – Predicted probabilities for the model without the structural properties of the network.

  • b (np.array[float]) – Predicted probabilities for the model including the structural properties of the network.

  • labels (List[string]) – Labels of both models for the plots.

Hierarchical node classification module

Module for hierarchical node classification using a top-down approach.

class nodeclass.models.xgbhc.XGBhc

Bases: object

Class for hierarchical node classification. This class builds an XGBoost binary classifier for each class following a top-down approach.

Variables
  • data (DataFrame) – Dataset with the graph information (topological properties of nodes) similar for all classes in the hierarchy.

  • hierarchy (list[int]) – List of classes in the hierarchy

  • ancestors (list[int]) – List of ancestors of each class in the hierarchy, i.e., tree representation. Labels in ‘ancestors’ must match the labels of the classes in ‘hierarchy’.

  • label (string) – Label of the hierarchy, used to name the figures and output files.

  • data_path (string) – Path where the data of the model is stored. In particular, the specific data for each class in the hierarchy. Files in ‘data_path’ must match the labels of the classes in ‘hierarchy’.

  • output_path (string) – Path where the output of the algorithm will be stored.

  • figs_path (string) – Path where the figures will be stored.

check_data()

Verify the input data.

Returns

Flag indicating whether the input data is valid.

Return type

boolean

create_classifier(n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall', seed=None)

Builds the binary classifier within a hyper-parameters tuning model.

Parameters
  • n_iter (int) – Number of iterations in cross validation for hyper-parameters tuning, defaults to 5

  • n_jobs_cv (int) – Number of parallel jobs running for hyper-parameters tuning, defaults to None

  • n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2

  • eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”

  • scoring (string) – Scoring metric for hyper-parameters tuning, defaults to “recall”

  • seed (int) – Random number seed, defaults to None

Returns

Hyper-parameter tuning model with XGBoost binary classifier

Return type

RandomizedSearchCV

create_path(path)

Create a path.

Parameters

path (string) – Relative path to be created.

Raises

OSError – if the path already exists

evaluate(y_orig, y_pred_prob)

Evaluate the performance of the prediction using metrics such as the auc roc, average precision score, precision, recall and F1 score.

Parameters
  • y_orig (np.array[int]) – Truth values of the prediction.

  • y_pred_prob (np.array[float]) – Predicted probabilities with the XGBoost classifier.

Returns

Evaluation metrics for the prediction.

Return type

dict[string->float]

load_data(data, label, hierarchy, ancestors, data_path, output_path=None, figs_path=None)

Load the data of the network and the hierarchy of classes.

Parameters
  • data (DataFrame) – Dataset with the graph information (topological properties of nodes) similar for all classes in the hierarchy.

  • hierarchy (list[int]) – List of classes in the hierarchy

  • ancestors (list[int]) – List of ancestors of each class in the hierarchy, i.e., tree representation.

  • label (string) – Label of the hierarchy, used to name the figures and output files.

  • data_path (string) – Path where the data of the model is stored. In particular, the specific data for each class in the hierarchy.

  • output_path (string) – Path where the output of the algorithm will be stored, defaults to “YYYY-MM-DD/”.

  • figs_path (string) – Path where the figures will be stored, defaults to “YYYY-MM-DD/”.

opt_threshold(y_orig, y_pred)

Compute the classification from probabilities based on the optimum threshold according to the precision-recall curve, that is, the threshold that maximizes the F1 score.

Parameters
  • y_orig (np.array[int]) – Truth values of the prediction.

  • y_pred (np.array[float]) – Predicted probabilities with the XGBoost classifier.

Returns

Classification for the input array of probabilities which maximizes the F1 score.

Return type

np.array[int]

plot_performance(yorig, ypred)

Plot the ROC curve, precision-recall curve and confusion matrix for a prediction.

Parameters
  • y_orig (np.array[int]) – Truth values of the prediction.

  • y_pred_prob (np.array[float]) – Predicted probabilities with the XGBoost classifier.

train_class(X, y, label, n_splits, seed, n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall')

Training using a combination of SMOTE (Over-Sampling) and XGBoost techniques.

Parameters
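  • X (DataFrame) – Input dataset for the prediction.

  • y (np.array) – Node attribute (class labels) to be predicted.

  • n_splits (int) – Number of folds for cross-validation

  • seed (int) – Random number seed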

  • n_iter (int, optional) – Number of iterations in cross validation for hyper-parameters tuning, defaults to 5

  • n_jobs_cv (int) – Number of parallel jobs running for hyper-parameters tuning, defaults to None

  • n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2

  • eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”

  • scoring (string) – Scoring metric for hyper-parameters tuning, defaults to “recall”

Returns

Predicted probabilities with the XGBoost classifier, feature importance measured by total gain, and best parameter combination for the classifier.

Return type

Tuple(np.array[float], dict[string->float], dict[string->float])
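
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below is a simplified, pure-Python illustration of that idea; the function name smote and its parameters are hypothetical, and the package presumably relies on an existing library implementation rather than code like this:

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])))
        neighbor = minority[rng.choice(others[:k])]   # one of the k nearest neighbors
        gap = rng.random()                            # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

Every synthetic point lies on a segment between two existing minority samples, so the over-sampled class stays within the region already covered by the minority data.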

train_hierarchy(n_splits=5, seed=None)

Hierarchical classification of nodes using a local classifier. This approach uses a breadth-first search (BFS) to traverse the hierarchy, represented as a tree (no node has more than one parent).

Parameters
  • n_splits (int) – Number of folds for cross-validation, defaults to 5

  • seed (int) – Random number seed, defaults to None

Returns

Predicted probabilities and classification predicted with the algorithm, feature importance measured by total gain, and best parameter combination for the classifier.

Return type

Tuple(np.array[float], dict[string->float], dict[string->float])
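
The breadth-first traversal mentioned above can be sketched as follows. The sketch assumes that ancestors[i] holds the parent of hierarchy[i] and that the entry of the root is ignored; this layout and the function name bfs_order are assumptions made for illustration only:

```python
from collections import deque


def bfs_order(hierarchy, ancestors, root):
    """Return the classes in breadth-first order, starting at `root`."""
    children = {c: [] for c in hierarchy}
    for cls, parent in zip(hierarchy, ancestors):
        if cls != root:                     # the root has no parent to attach to
            children[parent].append(cls)
    order, queue = [], deque([root])
    while queue:
        cls = queue.popleft()
        order.append(cls)
        queue.extend(children[cls])         # visit children after all current-level classes
    return order


hierarchy = [10, 13, 11, 14, 12]
ancestors = [10, 11, 10, 12, 10]            # assumed: parent of each class, same order
order = bfs_order(hierarchy, ancestors, root=10)
```

Visiting classes in this order guarantees that a class is processed only after its parent, which is what a top-down scheme requires.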

write_csv(pred)

Save the evaluation metrics for the prediction.

Parameters

pred (np.array[float]) – Predicted probabilities for the model.

Tools package

Data processing module

nodeclass.tools.data.compute_strc_prop(adj_mad, dimensions=16, p=1, q=0.5, path=None, log=False, seed=None)

Compute multiple structural properties of the input network. Two types of properties are computed: hand-crafted and node embeddings.

Parameters
  • adj_mad (np.matrix[int]) – Adjacency matrix representation of the network, square and symmetric matrix.

  • dimensions (int) – Dimension of the node embedding, defaults to 16

  • p (float) – Return parameter of node2vec, defaults to 1

  • q (float) – In-out parameter of node2vec, defaults to 0.5

  • path (string) – Relative path where the dataset will be saved, defaults to current path

  • log (bool) – Flag for logging the results, defaults to False

  • seed (float) – Random number seed, defaults to None

Returns

Dataset with scaled features representing the structural properties of the network and list of labels (names) of the features.

Return type

Tuple(DataFrame, List[string])
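
To clarify the role of p and q, the sketch below computes the unnormalized second-order transition weights that node2vec assigns to the candidate next steps of a random walk on an unweighted graph. It follows the definition of node2vec, not the package's code, and the function name node2vec_weight is hypothetical:

```python
def node2vec_weight(prev, nxt, neighbors, p=1.0, q=0.5):
    """Unnormalized probability of stepping to `nxt` when the walk arrived from `prev`."""
    if nxt == prev:
        return 1.0 / p          # step back to the previous node
    if nxt in neighbors[prev]:
        return 1.0              # stay at distance 1 from the previous node
    return 1.0 / q              # move outward, to distance 2 from the previous node


neighbors = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}

# The walk arrived at node 1 from node 0; weigh each candidate next step.
weights = {n: node2vec_weight(0, n, neighbors) for n in neighbors[1]}
total = sum(weights.values())
probs = {n: w / total for n, w in weights.items()}
```

With the defaults shown in the signature (p=1, q=0.5), moving outward is twice as likely as returning or staying close, so walks tend to explore further away from their starting point.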

nodeclass.tools.data.create_matrices(edgl, isa, cls_def, n2c, output_path, trace=False)

Create the adjacency and association matrices of the nodes and classes from the edge lists for the graph and the associations between classes.

Parameters
  • edgl (DataFrame) – Edge list of the graph with columns ‘Source’ and ‘Target’

  • isa (DataFrame) – Edgelist of the initial representations of the associations between classes with columns ‘Class’ and ‘Ancestor’

  • cls_def (DataFrame) – Description of the classes with columns ‘Class’ and ‘Desc’

  • n2c (DataFrame) – Associations between classes and nodes by pairs ‘(node, class)’ with columns ‘Node’ and ‘Class’

  • output_path (string) – Path where the output is stored

  • trace (boolean) – Flag to print the trace of the process

Returns

Adjacency matrix of the nodes, adjacency matrix of the classes, association matrix between nodes and classes, list of nodes and list of classes in the same order of the matrices

Return type

np.matrix[float] MxM, np.matrix[float] NxN, np.matrix[float] MxN, np.array[string] M, np.array[string] N
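
The conversion from an edge list to a symmetric adjacency matrix can be sketched with nested lists as follows. The package itself works with DataFrames and NumPy matrices; the function name adjacency_from_edges and the alphabetical node ordering are assumptions made for this example:

```python
def adjacency_from_edges(edges):
    """Return (sorted node list, symmetric 0/1 adjacency matrix as nested lists)."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        i, j = index[source], index[target]
        adj[i][j] = adj[j][i] = 1           # undirected edge: fill both entries
    return nodes, adj


edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g4")]   # (Source, Target) pairs
nodes, adj = adjacency_from_edges(edges)
```

Row i and column i of the matrix both refer to nodes[i], which is why the node list is returned alongside the matrix.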

nodeclass.tools.data.create_txt(data, output_path, filename)

Create a txt file from a list of objects

Parameters
  • data (list[object]) – List of objects (strings or numbers) to be stored as txt file

  • output_path (string) – Path where the txt file is stored

  • filename (string) – Name of the txt file

nodeclass.tools.data.generate_hierarchy(adj, cl_adj, node_by_cl, cls_def, V, C, output_path, filter=None, trace=False)

Find the possible sub-hierarchies of the hierarchy of classes by removing the isolated classes and applying a filter (if required) to the number of associated nodes, using the adjacency and association matrices of the graph and classes.

Parameters
  • adj (np.matrix[float] MxM) – Adjacency matrix of the nodes

  • cl_adj (np.matrix[float] NxN) – Adjacency matrix of the classes

  • node_by_cl (np.matrix[float] MxN) – Association matrix between nodes and classes

  • cls_def (DataFrame) – Description of the classes with columns ‘Class’ and ‘Desc’

  • V (np.array[string] M) – List of nodes in the same order of the ‘adj’ matrix

  • C (np.array[string] N) – List of classes in the same order of the ‘cl_adj’ matrix

  • output_path (string) – Path where the output is stored

  • filter (Tuple[int, int]) – Filter applied to the classes to be used for prediction from the hierarchy, given as the lower and upper bound of the number of nodes associated with the classes, defaults to None

  • trace (boolean) – Flag to print the trace of the process, defaults to False

Returns

List of roots (class index) of each sub-hierarchy from the hierarchy of classes, list of classes (indexes) within each sub-hierarchy (the first element of the list is the root class)

Return type

np.array[int], np.array[np.array[int]]
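
The filtering and splitting steps can be illustrated with a simplified sketch. It keeps the classes whose number of associated nodes lies within the bounds, drops the classes left without any relation, and groups the rest into connected components. The function name sub_hierarchies and the input layout are hypothetical, and unlike the function documented above the sketch does not identify the root of each group:

```python
def sub_hierarchies(cl_adj, counts, low, high):
    """Group the classes that pass the filter into connected sub-hierarchies."""
    n = len(cl_adj)
    keep = {c for c in range(n) if low <= counts[c] <= high}
    # Undirected neighbors among the kept classes (self-loops ignored).
    nbrs = {
        c: {d for d in keep if d != c and (cl_adj[c][d] or cl_adj[d][c])}
        for c in keep
    }
    groups, seen = [], set()
    for start in sorted(keep):
        if start in seen or not nbrs[start]:
            continue                        # already grouped, or isolated
        group, stack = [], [start]
        seen.add(start)
        while stack:
            c = stack.pop()
            group.append(c)
            for d in sorted(nbrs[c] - seen):
                seen.add(d)
                stack.append(d)
        groups.append(sorted(group))
    return groups


# cl_adj[i][j] = 1 marks a relation between class i and class j.
cl_adj = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
counts = [120, 40, 3, 10, 8, 50]            # nodes associated with each class
groups = sub_hierarchies(cl_adj, counts, low=5, high=300)
```

In this example class 2 is removed by the filter (only 3 associated nodes) and class 5 is isolated, so two sub-hierarchies remain.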

nodeclass.tools.data.hierarchy_to_tree(adj, cl_adj, node_by_cl, C, hier_cl_idx, output_path)

Generates the tree representation of the hierarchy from the adjacency and association matrices of graph and classes. Generates two files: the order of classes in the hierarchy (tree representation) and the ancestors of each class (in the same order)

Parameters
  • adj (np.matrix[float] MxM) – Adjacency matrix of the nodes

  • cl_adj (np.matrix[float] NxN) – Adjacency matrix of the classes

  • node_by_cl (np.matrix[float] MxN) – Association matrix between nodes and classes

  • C (np.array[string] N) – List of classes in the same order of the ‘cl_adj’ matrix

  • hier_cl_idx (np.array[int]) – List of classes (indexes) within the hierarchy (the first element of the list is the root class), where ‘hier_cl_idx’ is a subset of ‘C’

  • output_path (string) – Path where the output is stored

Returns

Adjacency matrix of the corresponding subgraph of the hierarchy, i.e., subgraph of all nodes associated with the root class of the hierarchy

Return type

np.matrix[float]

nodeclass.tools.data.neighborhood_information(adj, node_by_cl, nodes, cl_idx, anc_idx)

Extract the information of the association of a class and its ancestor in the neighborhood of the nodes in the graph (using the adjacency matrix)

Parameters
  • adj (np.matrix[float] MxM) – Adjacency matrix of the nodes

  • node_by_cl (np.matrix[float] MxN) – Association matrix between nodes and classes

  • nodes (np.array[int]) – List of indexes of nodes to be considered

  • cl_idx (int) – Index of the class to be analyzed

  • anc_idx (int) – Index of the ancestor of the class to be analyzed

Returns

List of probabilities of association between the ‘nodes’ and the class, the ‘nodes’ and its ancestor, and the ‘nodes’ and the class given the probability of association with its ancestor

Return type

np.array[np.array[float], np.array[float], np.array[float]]

nodeclass.tools.data.nodes_in_bfs(bfs, root)

Convert a bfs object to a list

Parameters
  • bfs (object) – bfs object from networkx

  • root (int) – id of the root in the bfs

Returns

list of nodes in the subgraph in bfs order

Return type

list[int]

nodeclass.tools.data.scale_data(data)

Scale the data of a dataset without modifying the distribution of data.

Parameters

data (DataFrame) – Dataset

Returns

Dataset with scaled features

Return type

DataFrame
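
The documentation does not state which scaler is used. As one possible reading, min-max scaling is a linear transformation, so it changes the range of each feature but not the shape of its distribution. The sketch below assumes that technique; the function name min_max_scale is hypothetical:

```python
def min_max_scale(rows):
    """Rescale each column of `rows` linearly to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        # Constant columns (span 0) are mapped to 0.0 to avoid dividing by zero.
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


rows = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
scaled = min_max_scale(rows)
```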

MST algorithm module

Module containing the minimum spanning tree (MST) algorithm used to turn a hierarchy of classes represented as a directed acyclic graph (DAG) into a tree. The algorithm uses the number of nodes associated with each class to select a parent class.

This algorithm is an adaptation of the algorithm used in (Jiang et al. 2008).

nodeclass.tools.mst.direct_pa(cl, cls, hie)

Find the direct parents for a given class in a given hierarchy. The direct parents of a class are the set of classes that do not have any other descendant.

Parameters
  • cl (string) – List of indexes of the class for which the direct parents are being searched

  • cls (list[string] of size N) – List of indexes of the classes in the hierarchy to be considered by the algorithm

  • hie (np.matrix[float] of size NxN) – Adjacency matrix representing the hierarchy of classes considered in ‘cls’

Returns

List of indexes of direct parents of the class ‘cl’ in the hierarchy

Return type

list[int]

nodeclass.tools.mst.mst(nodes, cls, node_by_cl, cl_by_cl)

Minimal spanning tree (MST) algorithm for a hierarchy of classes.

Parameters
  • nodes (np.array[int] of size M) – List of indexes of the nodes to be considered by the algorithm

  • cls (list[string] of size N) – List of indexes of the classes in the hierarchy to be considered by the algorithm

  • node_by_cl (np.matrix(float) of size PxM) – Matrix representing the associations between nodes and classes, where P ≥ N is the total number of classes in the hierarchy.

  • cl_by_cl (np.matrix(float) of size PxP) – Adjacency matrix representing the hierarchy of classes

Returns

Adjacency matrix of the tree representation of the hierarchy of classes considered in ‘cls’

Return type

np.matrix[float] of size NxN
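
The general idea of reducing a DAG to a tree, keeping a single parent for every class that has several, can be sketched as follows. The selection rule used here (keep the candidate parent with the most associated nodes) is chosen purely for illustration; it is not necessarily the rule applied by this module or by Jiang et al., and the name dag_to_tree and the dictionary inputs are hypothetical:

```python
def dag_to_tree(parents, counts):
    """Keep one parent per class: the candidate with the most associated nodes."""
    tree = {}
    for cls, candidates in parents.items():
        if candidates:
            # Sorting first makes ties resolve to the smaller class index.
            tree[cls] = max(sorted(candidates), key=lambda c: counts[c])
        else:
            tree[cls] = None                # root: no parent
    return tree


parents = {0: [], 1: [0], 2: [0], 3: [1, 2]}   # class 3 has two parents in the DAG
counts = {0: 100, 1: 40, 2: 60, 3: 15}         # nodes associated with each class
tree = dag_to_tree(parents, counts)
```

After the reduction every class has at most one parent, which is the property required by the hierarchical classifier.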

Plotting module

Module for plotting the results of the prediction.

nodeclass.tools.plots.plot_conf_matrix(cm, filename, path, labels=[0, 1])

Plot a confusion matrix and save it in a PDF file

Parameters
  • cm (np.matrix[float]) – Confusion matrix

  • filename (string) – Name of the PDF file

  • path (string) – Path where the plot will be stored.

  • labels (List[float]) – Labels of the classes used in both axes of the matrix, defaults to [0, 1].

nodeclass.tools.plots.plot_pr(rec, prc, ap, filename, path)

Plot a precision-recall curve and save it in a PDF file

Parameters
  • rec (np.array[float]) – Array of recall values

  • prc (np.array[float]) – Array of precision values

  • ap (float) – Average precision score

  • filename (string) – Name of the PDF file

  • path (string) – Path where the plot will be stored.

nodeclass.tools.plots.plot_prs(recl, prcl, apl, labels, filename, path)

Plot multiple precision-recall curves in the same figure and save it in a PDF file

Parameters
  • recl (np.array[np.array[float]]) – Array of arrays of recall values for multiple predictions

  • prcl (np.array[np.array[float]]) – Array of arrays of precision values for multiple predictions

  • apl (np.array[float]) – Array of average precision values for multiple predictions

  • labels (List[string]) – Labels of the multiple models plotted

  • filename (string) – Name of the PDF file

  • path (string) – Path where the plot will be stored.

nodeclass.tools.plots.plot_roc(fpr, tpr, auc, filename, path)

Plot a ROC curve and save it in a PDF file

Parameters
  • fpr (np.array[float]) – Array of false positive rate values

  • tpr (np.array[float]) – Array of true positive rate values

  • auc (float) – Area under the ROC curve

  • filename (string) – Name of the PDF file

  • path (string) – Path where the plot will be stored.
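
The inputs of this function can be derived from true labels and predicted probabilities. The sketch below shows one straightforward way to obtain the false and true positive rates by sweeping the decision threshold, and to approximate the area under the curve with the trapezoidal rule. The function names roc_points and trapezoid_auc are hypothetical and the sketch assumes that both classes are present in y_true:

```python
def roc_points(y_true, y_prob):
    """Return (fpr, tpr) lists obtained by sweeping the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]                 # the curve starts at the origin
    for t in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_prob)
auc = trapezoid_auc(fpr, tpr)
```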

nodeclass.tools.plots.plot_rocs(fprl, tprl, aucl, labels, filename, path)

Plot multiple ROC curves in the same figure and save it in a PDF file

Parameters
  • fprl (np.array[np.array[float]]) – Array of arrays of false positive rate values for multiple predictions

  • tprl (np.array[np.array[float]]) – Array of arrays of true positive rate values for multiple predictions

  • aucl (np.array[float]) – Array of area under roc curve values for multiple predictions

  • labels (List[string]) – Labels of the multiple models plotted

  • filename (string) – Name of the PDF file

  • path (string) – Path where the plot will be stored.
