Welcome to Node Classification
Written by Miguel Romero
Last update: 18/08/21
Node classification
This package provides different approaches to the node classification problem (also known as attribute prediction) using machine learning techniques. Two approaches are available: flat node classification (fnc) and hierarchical classification (hc). Both approaches are based on XGBoost, a gradient boosting decision tree algorithm, and both are equipped with an over-sampling technique called SMOTE.
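SMOTE balances a data set by creating synthetic minority-class samples. Each new sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours. The plain-Python sketch below only illustrates that interpolation step. The function smote and its parameters are hypothetical and are not part of the nodeclass API.

```python
import random


def smote(minority, n_new, k=5, seed=None):
    """Create n_new synthetic samples from a list of minority-class feature vectors.

    Each synthetic sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5, k=2, seed=0)
```

Because every synthetic point is a convex combination of two minority samples, it always stays inside the region spanned by the minority class.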
Flat classification
Flat node classification evaluates whether the structural (topological) properties of a network are useful for predicting node attributes (i.e., node classification). It does not consider the (possible) relationships between the classes of the attribute to be predicted; that is, the classes are predicted independently.
Hierarchical classification
Hierarchical node classification takes into account the hierarchical organization of the classes of the node attribute to be predicted. Using a top-down approach, a binary classifier is trained for each class following the hierarchy, which is represented as a directed acyclic graph (DAG).
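The sketch below illustrates the top-down idea by predicting the classes of a single node from the probabilities of the per-class binary classifiers. It makes two assumptions that are not stated above: a class is only considered when its parent has been predicted positive, and a 0.5 threshold is used. The names top_down_predict, children and local_prob are hypothetical.

```python
from collections import deque


def top_down_predict(children, root, local_prob, threshold=0.5):
    """Return the set of classes predicted for one node.

    children:   dict class -> list of child classes (tree representation)
    local_prob: dict class -> probability given by that class's binary classifier
    ASSUMPTION: a class is only evaluated if its parent was predicted positive.
    """
    predicted = set()
    queue = deque([root])
    while queue:
        cls = queue.popleft()
        if local_prob[cls] >= threshold:
            predicted.add(cls)
            queue.extend(children.get(cls, []))  # descend only below positive classes
    return predicted


children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
probs = {"root": 0.9, "a": 0.8, "b": 0.3, "a1": 0.7, "a2": 0.2, "b1": 0.95}
print(sorted(top_down_predict(children, "root", probs)))  # ['a', 'a1', 'root']
```

In this example, class b1 is never predicted even though its own probability is high, because its parent b is rejected.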
Installation
The node classification package can be installed using pip; its requirements are installed automatically:
python3 -m pip install nodeclass
The source code and examples can be found in this GitHub repository.
Example
Flat classification
This example illustrates how the node classification package can be used to check whether the structural properties of a gene co-expression network improve the prediction of gene functions for rice (Oryza sativa Japonica). The gene co-expression network used in this example was gathered from ATTED II.
How to run the example?
The complete source code of the example can be found in the GitHub repository. First, the xgbfnc module needs to be imported:
from nodeclass.models.xgbfnc import XGBfnc
from nodeclass.tools import data
After creating the adjacency matrix adj of the network, its structural
properties are computed using the data module of the package:
df, strc_cols = data.compute_strc_prop(adj)
This method returns a DataFrame with the structural properties of the network and a list with the names of these properties (i.e., the column names). After the remaining node features are added to the DataFrame, the model is created and the structural test is run:
test = XGBfnc()
test.load_data(df, strc_cols, y, term, output_path='output')
ans, pred, params = test.structural_test()
The data of the network is loaded with the load_data method, and the
structural test is executed with the structural_test method. The test
returns three values: a boolean indicating whether the structural properties
improve the prediction performance, the predictions of the model that
includes the structural properties, and the best parameters of that model.
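The criterion that structural_test uses to decide whether the structural properties help is not detailed here. The following is a minimal sketch that compares the two models by average precision on the same ground-truth labels. This criterion is hypothetical and was chosen for illustration, as are the function names.

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def structural_test_sketch(y_true, prob_without, prob_with):
    """Hypothetical criterion: True if adding structural properties raises average precision."""
    ap_without = average_precision(y_true, prob_without)
    ap_with = average_precision(y_true, prob_with)
    return ap_with > ap_without, ap_without, ap_with


y = [1, 0, 1, 0, 0, 1]
without_strc = [0.6, 0.7, 0.4, 0.2, 0.5, 0.3]  # model without structural properties
with_strc = [0.9, 0.3, 0.7, 0.2, 0.4, 0.6]     # model with structural properties
improved, ap0, ap1 = structural_test_sketch(y, without_strc, with_strc)
```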
To run the example, execute the following commands:
cd test/flat_classification
python3 test_small.py
Hierarchical classification
This example illustrates how the hierarchical classification package can be used to predict gene functions based on the gene co-expression network, taking into account the hierarchical structure of gene functions (as determined by the Gene Ontology). The example uses data for rice (Oryza sativa Japonica); the gene co-expression network (GCN) was gathered from ATTED II.
How to run the example?
The complete source code of the example can be found in the GitHub repository. First, the xgbhc module needs to be imported:
from nodeclass.models.xgbhc import XGBhc
from nodeclass.tools import data
The adjacency matrix of the GCN, the adjacency matrix of the gene functions
(built from the ancestral relations between biological processes), and the
matrix of associations between genes and functions are created using the data module as follows:
gcn, go_by_go, gene_by_go, G, T = data.create_matrices(data_ppi, data_isa, data_term_def, data_gene_term, OUTPUT_PATH, True)
The tree representation of the hierarchy is generated from the adjacency matrix of the classes in three steps: removing the isolated classes, filtering the classes according to the number of associated nodes (if required), and finding the remaining sub-hierarchies. Then a minimum spanning tree (MST) algorithm is applied to each sub-hierarchy to obtain its tree representation (the order and the ancestors of the classes are computed):
roots, subh_go_list = data.generate_hierarchy(gcn, go_by_go, gene_by_go, data_term_def, G, T, OUTPUT_PATH, filter=[5,300], trace=True)
root, subh_go = roots[13], subh_go_list[13]
subh_adj = data.hierarchy_to_tree(gcn, go_by_go, gene_by_go, T, subh_go, OUTPUT_PATH)
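Additionally, the structural properties of the sub-graph of the GCN formed by the nodes associated with the classes of the sub-hierarchy are computed using the data module:
data.compute_strc_prop(subh_adj, path=OUTPUT_PATH)
Finally, the XGBhc class is instantiated, the data of the sub-hierarchy is loaded and the prediction is performed as follows. Here, data stands for the DataFrame with the structural properties, root is used as the label of the hierarchy, and hierarchy and ancestors are the class order and the ancestors produced by hierarchy_to_tree: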
model = XGBhc()
model.load_data(data, root, hierarchy, ancestors, DATA_PATH, OUTPUT_PATH)
model.train_hierarchy()
The results of the prediction are saved in OUTPUT_PATH. They include the
ROC and precision-recall curves, the confusion matrix, and a CSV file with
performance metrics (such as the ROC AUC, average precision, recall, precision,
F1 score, true positive and true negative rates, and the execution time).
To run the example, execute the following commands:
cd test/hierarchical_classification
python3 test_data.py
python3 test.py
Documentation
Documentation of the package can be found here.
Models package
Flat node classification module
Module for flat node classification and testing the importance of the structural properties of the network.
- class nodeclass.models.xgbfnc.XGBfnc
Bases: object
Class for flat node classification. This class builds two XGBoost binary classifiers for attribute prediction using two datasets with different features: one including the structural properties of the network and the other one without them.
- Variables
df (DataFrame) – Dataset with all features of the network.
orig_cols (List[string]) – List of feature names (columns of df) not related to structural properties of the network.
strc_cols (List[string]) – List of feature names (columns of df) related to structural properties of the network. The intersection between orig_cols and strc_cols must be empty.
y (Series) – Series representing the node attribute to be predicted.
ylabel (string) – Name of the node attribute to be predicted.
output_path (string) – Path where the output of the algorithm will be stored.
figs_path (string) – Path where the figures will be stored.
- compare_plots(a, b, labels=['without', 'with'])
Plot the ROC curve, precision-recall curve and confusion matrices for the predictions of both models, i.e., without and with structural properties.
- Parameters
a (np.array[float]) – Predicted probabilities for the model without the structural properties of the network.
b (np.array[float]) – Predicted probabilities for the model including the structural properties of the network.
labels (List[string]) – Labels of both models for the plots.
- create_classifier(n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall', seed=None)
Build the binary classifier within a hyper-parameter tuning model.
- Parameters
n_iter (int) – Number of iterations in cross-validation for hyper-parameter tuning, defaults to 5
n_jobs_cv (int) – Number of parallel jobs running for hyper-parameter tuning, defaults to None
n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2
eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”
scoring (string) – Scoring metric for hyper-parameter tuning, defaults to “recall”
seed (int) – Random number seed, defaults to None
- Returns
Hyper-parameter tuning model with an XGBoost binary classifier
- Return type
RandomizedSearchCV
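RandomizedSearchCV (scikit-learn) samples n_iter parameter settings from a search space and keeps the setting with the best cross-validated score. The plain-Python loop below mirrors that idea. The search space and the toy scoring function are hypothetical stand-ins, not the ones used by create_classifier.

```python
import random


def random_search(score_fn, param_space, n_iter=5, seed=None):
    """Sample n_iter parameter combinations and keep the best-scoring one.

    param_space: dict name -> list of candidate values
    score_fn:    callable(params) -> float, e.g. a cross-validated recall
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a cross-validated scoring metric.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


space = {"max_depth": [3, 4, 5, 6], "learning_rate": [0.01, 0.1, 0.3]}
best, score = random_search(toy_score, space, n_iter=20, seed=1)
```

Random search evaluates a fixed budget of n_iter settings instead of the full grid, which keeps the tuning cost predictable.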
- create_path(path)
Create a path.
- Parameters
path (string) – Relative path to be created.
- Raises
OSError – if the path already exists
- evaluate(y_orig, y_pred_prob)
Evaluate the performance of the prediction using metrics such as the ROC AUC, average precision score, precision, recall and F1 score.
- Parameters
y_orig (np.array[int]) – Truth values of the prediction.
y_pred_prob (np.array[float]) – Predicted probabilities with the XGBoost classifier.
- Returns
Evaluation metrics for the prediction.
- Return type
dict[string->float]
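Several of the metrics listed for evaluate can be computed from truth values and predicted probabilities as sketched below. The fixed 0.5 threshold for precision, recall and F1 is an assumption made for illustration, and average precision is omitted for brevity. ROC AUC is computed as the fraction of positive/negative pairs that are ranked correctly, with ties counting one half. evaluate_sketch is a hypothetical name.

```python
def evaluate_sketch(y_true, y_prob, threshold=0.5):
    """Precision, recall, F1 (at a fixed, assumed threshold) and ROC AUC."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC = probability that a random positive scores above a random negative
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    pairs = [1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg]
    auc = sum(pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "roc_auc": auc}


scores = evaluate_sketch([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.1])
```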
- load_data(df, strc_cols, y, ylabel, output_path=None, figs_path=None)
Load the data of the network.
- Parameters
df (DataFrame) – Dataset with all node features.
strc_cols (List[string]) – List of features related to structural properties.
y (Series) – Node attribute to be predicted. Should be the same size as the df.
ylabel (string) – Name of the node attribute to be predicted.
output_path (string) – Path to save output, defaults to “YYYY-MM-DD/”.
figs_path (string) – Path to save figs, defaults to “YYYY-MM-DD/”.
- opt_threshold(y_orig, y_pred)
Compute the classification from probabilities using the optimal threshold on the precision-recall curve, that is, the threshold that maximizes the F1 score.
- Parameters
y_orig (np.array[int]) – Truth values of the prediction.
y_pred (np.array[float]) – Predicted probabilities with the XGBoost classifier.
- Returns
Classification for the input array of probabilities which maximizes the F1 score.
- Return type
np.array[int]
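The following is a brute-force sketch of an F1-maximizing threshold. It assumes that every distinct predicted probability is a candidate threshold, since these are the points at which a precision-recall curve changes. f1_optimal_threshold is a hypothetical helper, not the package's implementation.

```python
def f1_optimal_threshold(y_true, y_prob):
    """Return the (threshold, F1) pair that maximizes F1 over the candidate thresholds.

    A sample is classified as positive when its probability is >= the threshold.
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(y_prob)):
        pred = [1 if p >= t else 0 for p in y_prob]
        tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # equals 2PR / (P + R)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
t, f1 = f1_optimal_threshold(y_true, y_prob)
labels = [1 if p >= t else 0 for p in y_prob]  # [0, 1, 1, 1, 0, 1]
```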
- plot_performance(a, label)
Plot the ROC curve, precision-recall curve and confusion matrix for a prediction.
- Parameters
a (np.array[float]) – Predicted probabilities for the model.
label (string) – Label of the model for the plots.
- print_performance(scores, title)
Print the evaluation metrics for the prediction.
- Parameters
scores (dict[string->float]) – Evaluation metrics for the prediction.
title (string) – Name of the model or experiment.
- structural_test(n_splits=5, seed=None, log=False, csv=True)
Test whether the structural properties of the network help to improve the prediction performance by building two different models and comparing their results. One model includes the structural properties, whereas the other does not.
- Parameters
n_splits (int) – Number of folds for cross-validation, defaults to 5
seed (int) – Random number seed, defaults to None
log (bool) – Flag for logging the results of the test, defaults to False
csv (bool) – Flag for saving the results of the test in a CSV file, defaults to True
- Returns
Result of the structural test (True if the structural properties improve the prediction performance, False otherwise), predicted labels, and best parameter combination for the classifier including the structural properties.
- Return type
Tuple(bool, np.array[int], dict[string->float])
- train(X, n_splits=5, seed=None, n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall')
Train the XGBoost binary classifier on the input dataset using cross-validation and hyper-parameter tuning, and return its predicted probabilities.
- Parameters
X (DataFrame) – Input dataset for the prediction.
n_splits (int) – Number of folds for cross-validation, defaults to 5
seed (int) – Random number seed, defaults to None
n_iter (int, optional) – Number of iterations in cross-validation for hyper-parameter tuning, defaults to 5
n_jobs_cv (int) – Number of parallel jobs running for hyper-parameter tuning, defaults to None
n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2
eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”
scoring (string) – Scoring metric for hyper-parameter tuning, defaults to “recall”
- Returns
Predicted probabilities with the XGBoost classifier, feature importance measured by total gain, and best parameter combination for the classifier.
- Return type
Tuple(np.array[float], dict[string->float], dict[string->float])
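train evaluates the classifier with n_splits-fold cross-validation. Whether the folds are stratified is not stated. The sketch below assumes stratified folds, which keep the proportion of positive nodes roughly constant in every fold. This is a common choice for imbalanced node attributes. stratified_kfold is a hypothetical helper.

```python
import random


def stratified_kfold(y, n_splits=5, seed=None):
    """Yield (train_idx, test_idx) pairs; each fold keeps the class proportions of y."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        # deal the indices of this class round-robin over the folds
        for k, i in enumerate(idx):
            folds[k % n_splits].append(i)
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


y = [0] * 8 + [1] * 4
splits = list(stratified_kfold(y, n_splits=4, seed=0))  # every test fold has 1 positive
```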
- write_csv(a, b, labels=['without', 'with'])
Save the evaluation metrics for prediction of both models, i.e., without and with structural properties.
- Parameters
a (np.array[float]) – Predicted probabilities for the model without the structural properties of the network.
b (np.array[float]) – Predicted probabilities for the model including the structural properties of the network.
labels (List[string]) – Labels of both models for the plots.
Hierarchical node classification module
Module for hierarchical node classification using a top-down approach.
- class nodeclass.models.xgbhc.XGBhc
Bases: object
Class for hierarchical node classification. This class builds an XGBoost binary classifier for each class following a top-down approach.
- Variables
data (DataFrame) – Dataset with the graph information (topological properties of nodes), shared by all classes in the hierarchy.
hierarchy (list[int]) – List of classes in the hierarchy.
ancestors (list[int]) – List of ancestors of each class in the hierarchy, i.e., its tree representation. Labels in ‘ancestors’ must match the labels of the classes in ‘hierarchy’.
label (string) – Label of the hierarchy, used to name the figures and output files.
data_path (string) – Path where the data of the model is stored, in particular the specific data for each class in the hierarchy. Files in ‘data_path’ must match the labels of the classes in ‘hierarchy’.
output_path (string) – Path where the output of the algorithm will be stored.
figs_path (string) – Path where the figures will be stored.
- check_data()
Verify the input data.
- Returns
Flag indicating whether the input data is valid.
- Return type
boolean
- create_classifier(n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall', seed=None)
Build the binary classifier within a hyper-parameter tuning model.
- Parameters
n_iter (int) – Number of iterations in cross-validation for hyper-parameter tuning, defaults to 5
n_jobs_cv (int) – Number of parallel jobs running for hyper-parameter tuning, defaults to None
n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2
eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”
scoring (string) – Scoring metric for hyper-parameter tuning, defaults to “recall”
seed (int) – Random number seed, defaults to None
- Returns
Hyper-parameter tuning model with an XGBoost binary classifier
- Return type
RandomizedSearchCV
- create_path(path)
Create a path.
- Parameters
path (string) – Relative path to be created.
- Raises
OSError – if the path already exists
- evaluate(y_orig, y_pred_prob)
Evaluate the performance of the prediction using metrics such as the ROC AUC, average precision score, precision, recall and F1 score.
- Parameters
y_orig (np.array[int]) – Truth values of the prediction.
y_pred_prob (np.array[float]) – Predicted probabilities with the XGBoost classifier.
- Returns
Evaluation metrics for the prediction.
- Return type
dict[string->float]
- load_data(data, label, hierarchy, ancestors, data_path, output_path=None, figs_path=None)
Load the data of the network and the hierarchy of classes.
- Parameters
data (DataFrame) – Dataset with the graph information (topological properties of nodes), shared by all classes in the hierarchy.
hierarchy (list[int]) – List of classes in the hierarchy.
ancestors (list[int]) – List of ancestors of each class in the hierarchy, i.e., its tree representation.
label (string) – Label of the hierarchy, used to name the figures and output files.
data_path (string) – Path where the data of the model is stored, in particular the specific data for each class in the hierarchy.
output_path (string) – Path where the output of the algorithm will be stored, defaults to “YYYY-MM-DD/”.
figs_path (string) – Path where the figures will be stored, defaults to “YYYY-MM-DD/”.
- opt_threshold(y_orig, y_pred)
Compute the classification from probabilities using the optimal threshold on the precision-recall curve, that is, the threshold that maximizes the F1 score.
- Parameters
y_orig (np.array[int]) – Truth values of the prediction.
y_pred (np.array[float]) – Predicted probabilities with the XGBoost classifier.
- Returns
Classification for the input array of probabilities which maximizes the F1 score.
- Return type
np.array[int]
- plot_performance(yorig, ypred)
Plot the ROC curve, precision-recall curve and confusion matrix for a prediction.
- Parameters
yorig (np.array[int]) – Truth values of the prediction.
ypred (np.array[float]) – Predicted probabilities with the XGBoost classifier.
- train_class(X, y, label, n_splits, seed, n_iter=5, n_jobs_cv=None, n_jobs_xgb=2, eval_metric='aucpr', scoring='recall')
Train a classifier using a combination of SMOTE (over-sampling) and XGBoost.
- Parameters
X (DataFrame) – Input dataset for the prediction.
y (np.array) – Node attribute (class membership) to be predicted.
n_splits (int) – Number of folds for cross-validation.
seed (int) – Random number seed.
n_iter (int, optional) – Number of iterations in cross-validation for hyper-parameter tuning, defaults to 5
n_jobs_cv (int) – Number of parallel jobs running for hyper-parameter tuning, defaults to None
n_jobs_xgb (int) – Number of parallel jobs running for training the classifier, defaults to 2
eval_metric (string) – Evaluation metric for training the classifier, defaults to “aucpr”
scoring (string) – Scoring metric for hyper-parameter tuning, defaults to “recall”
- Returns
Predicted probabilities with the XGBoost classifier, feature importance measured by total gain, and best parameter combination for the classifier.
- Return type
Tuple(np.array[float], dict[string->float], dict[string->float])
- train_hierarchy(n_splits=5, seed=None)
Hierarchical classification of nodes using a local classifier per class. This approach uses a breadth-first search (BFS) to traverse the hierarchy, which is represented as a tree (no class has more than one parent).
- Parameters
n_splits (int) – Number of folds for cross-validation, defaults to 5
seed (int) – Random number seed, defaults to None
- Returns
Predicted probabilities and predicted classification, feature importance measured by total gain, and best parameter combination for the classifier.
- Return type
Tuple(np.array[float], dict[string->float], dict[string->float])
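The BFS traversal guarantees that every class is processed after its parent. The following is a minimal sketch of that ordering. It assumes that ancestors stores the single parent of each class in hierarchy, with None for the root. The on-disk format used by the package is not described here, and bfs_order is a hypothetical name.

```python
from collections import deque


def bfs_order(hierarchy, ancestors):
    """Order the classes of a tree breadth-first, starting from its root.

    hierarchy: list of class labels
    ancestors: parent of each class in `hierarchy` (None for the root), same order
    """
    children = {c: [] for c in hierarchy}
    root = None
    for cls, parent in zip(hierarchy, ancestors):
        if parent is None:
            root = cls
        else:
            children[parent].append(cls)
    order, queue = [], deque([root])
    while queue:
        cls = queue.popleft()
        order.append(cls)
        queue.extend(children[cls])  # in a tree every class is reached exactly once
    return order


hierarchy = [13, 10, 14, 11, 12]
ancestors = [11, None, 12, 10, 10]
print(bfs_order(hierarchy, ancestors))  # [10, 11, 12, 13, 14]
```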
- write_csv(pred)
Save the evaluation metrics of the prediction in a CSV file.
- Parameters
pred (np.array[float]) – Predicted probabilities for the model.
Tools package
Data processing module
- nodeclass.tools.data.compute_strc_prop(adj_mad, dimensions=16, p=1, q=0.5, path=None, log=False, seed=None)
Compute multiple structural properties of the input network. Two types of properties are computed: hand-crafted and node embeddings.
- Parameters
adj_mad (np.matrix[int]) – Adjacency matrix representation of the network, square and symmetric matrix.
dimensions (int) – Dimension of the node embedding, defaults to 16
p (float) – Return parameter of node2vec, defaults to 1
q (float) – In-out parameter of node2vec, defaults to 0.5
path (string) – Relative path where the dataset will be saved, defaults to the current path
log (bool) – Flag for logging the results, defaults to False
seed (float) – Random number seed, defaults to None
- Returns
Dataset with scaled features representing the structural properties of the network, and list of labels (names) of the features.
- Return type
Tuple(DataFrame, List[string])
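The hand-crafted properties computed by compute_strc_prop are not listed in this documentation. Purely as an illustration, the sketch below computes two common ones, node degree and local clustering coefficient, from an adjacency matrix given as nested lists. The node2vec embeddings (dimensions, p, q) are not reproduced, and degree_and_clustering is a hypothetical name.

```python
def degree_and_clustering(adj):
    """Degree and local clustering coefficient of every node.

    adj: square, symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(adj)
    neighbours = [[j for j in range(n) if adj[i][j] and j != i] for i in range(n)]
    degree = [len(nb) for nb in neighbours]
    clustering = []
    for nb in neighbours:
        k = len(nb)
        if k < 2:
            clustering.append(0.0)
            continue
        # count edges among the neighbours (each pair once)
        links = sum(1 for a in range(k) for b in range(a + 1, k) if adj[nb[a]][nb[b]])
        clustering.append(2.0 * links / (k * (k - 1)))
    return degree, clustering


# Triangle 0-1-2 plus a pendant node 3 attached to node 2
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
deg, cc = degree_and_clustering(adj)  # deg == [2, 2, 3, 1], cc[2] == 1/3
```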
- nodeclass.tools.data.create_matrices(edgl, isa, cls_def, n2c, output_path, trace=False)
Create the adjacency and association matrices of the nodes and classes from the edge lists of the graph and of the associations between classes.
- Parameters
edgl (DataFrame) – Edge list of the graph with columns ‘Source’ and ‘Target’
isa (DataFrame) – Edge list of the initial representation of the associations between classes with columns ‘Class’ and ‘Ancestor’
cls_def (DataFrame) – Description of the classes with columns ‘Class’ and ‘Desc’
n2c (DataFrame) – Associations between nodes and classes as pairs ‘(node, class)’ with columns ‘Node’ and ‘Class’
output_path (string) – Path where the output is stored
trace (boolean) – Flag to print the trace of the process, defaults to False
- Returns
Adjacency matrix of the nodes, adjacency matrix of the classes, association matrix between nodes and classes, list of nodes and list of classes (in the same order as the matrices)
- Return type
np.matrix[float] MxM, np.matrix[float] NxN, np.matrix[float] MxN, np.array[string] M, np.array[string] N
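The following is a simplified sketch of what create_matrices produces. It uses lists of tuples instead of the DataFrames described above. The direction of the class matrix (row = class, column = ancestor) and the sorted ordering of nodes and classes are assumptions, and build_matrices is a hypothetical helper.

```python
def build_matrices(edges, isa, node_class_pairs):
    """Return (adj, cl_adj, assoc, nodes, classes) built from edge lists.

    edges:            (source, target) pairs of the graph
    isa:              (class, ancestor) pairs of the class hierarchy
    node_class_pairs: (node, class) associations
    """
    nodes = sorted({n for edge in edges for n in edge} | {n for n, _ in node_class_pairs})
    classes = sorted({c for _, c in node_class_pairs} | {c for pair in isa for c in pair})
    n_idx = {n: i for i, n in enumerate(nodes)}
    c_idx = {c: j for j, c in enumerate(classes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for s, t in edges:
        adj[n_idx[s]][n_idx[t]] = adj[n_idx[t]][n_idx[s]] = 1.0  # undirected graph
    cl_adj = [[0.0] * len(classes) for _ in classes]
    for cls, anc in isa:
        cl_adj[c_idx[cls]][c_idx[anc]] = 1.0  # ASSUMPTION: row = class, column = ancestor
    assoc = [[0.0] * len(classes) for _ in nodes]  # M x N node-by-class matrix
    for n, c in node_class_pairs:
        assoc[n_idx[n]][c_idx[c]] = 1.0
    return adj, cl_adj, assoc, nodes, classes


edges = [("g1", "g2"), ("g2", "g3")]
isa = [("GO:B", "GO:A")]
pairs = [("g1", "GO:A"), ("g3", "GO:A"), ("g3", "GO:B")]
adj, cl_adj, assoc, V, C = build_matrices(edges, isa, pairs)
```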
- nodeclass.tools.data.create_txt(data, output_path, filename)
Create a txt file from a list of objects
- Parameters
data (list[object]) – List of objects (strings or numbers) to be stored as txt file
output_path (string) – Path where the txt file is stored
filename (string) – Name of the txt file
- nodeclass.tools.data.generate_hierarchy(adj, cl_adj, node_by_cl, cls_def, V, C, output_path, filter=None, trace=False)
Find the possible sub-hierarchies of the hierarchy of classes, using the adjacency and association matrices of the graph and classes. Isolated classes are removed and, if required, a filter is applied to the number of associated nodes.
- Parameters
adj (np.matrix[float] MxM) – Adjacency matrix of the nodes
cl_adj (np.matrix[float] NxN) – Adjacency matrix of the classes
node_by_cl (np.matrix[float] MxN) – Association matrix between nodes and classes
cls_def (DataFrame) – Description of the classes with columns ‘Class’ and ‘Desc’
V (np.array[string] M) – List of nodes in the same order as the ‘adj’ matrix
C (np.array[string] N) – List of classes in the same order as the ‘cl_adj’ matrix
output_path (string) – Path where the output is stored
filter (Tuple[int, int]) – Filter applied to the classes of the hierarchy to be used for prediction: lower and upper bound of the number of nodes associated with the classes, defaults to None
trace (boolean) – Flag to print the trace of the process, defaults to False
- Returns
List of roots (class indexes) of each sub-hierarchy of the hierarchy of classes, and list of classes (indexes) within each sub-hierarchy (the first element of each list is the root class)
- Return type
np.array[int], np.array[np.array[int]]
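The sub-hierarchy search can be sketched as a filter on the number of associated nodes, followed by a connected-component search on the remaining class edges. The order of the two steps, the undirected treatment of class edges and the name sub_hierarchies are assumptions. Root detection is left out.

```python
from collections import deque


def sub_hierarchies(cl_edges, n_nodes, low=None, high=None):
    """Split a class hierarchy into connected sub-hierarchies.

    cl_edges: list of (class, ancestor) pairs
    n_nodes:  dict class -> number of nodes associated with the class
    Classes outside [low, high] are dropped first; classes left without any
    edge (isolated classes) are then discarded.
    """
    keep = {c for c, k in n_nodes.items()
            if (low is None or k >= low) and (high is None or k <= high)}
    nbrs = {}
    for c, a in cl_edges:
        if c in keep and a in keep:
            nbrs.setdefault(c, set()).add(a)
            nbrs.setdefault(a, set()).add(c)
    seen, components = set(), []
    for start in sorted(nbrs):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            c = queue.popleft()
            comp.append(c)
            for d in nbrs[c] - seen:
                seen.add(d)
                queue.append(d)
        components.append(sorted(comp))
    return components


cl_edges = [("b", "a"), ("c", "a"), ("e", "d"), ("f", "x")]
n_nodes = {"a": 50, "b": 10, "c": 8, "d": 30, "e": 12, "f": 6, "x": 500, "y": 20}
parts = sub_hierarchies(cl_edges, n_nodes, low=5, high=300)  # x is filtered out; f, y isolated
```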
- nodeclass.tools.data.hierarchy_to_tree(adj, cl_adj, node_by_cl, C, hier_cl_idx, output_path)
Generate the tree representation of the hierarchy from the adjacency and association matrices of the graph and classes. Two files are generated: the order of the classes in the hierarchy (tree representation) and the ancestors of each class (in the same order).
- Parameters
adj (np.matrix[float] MxM) – Adjacency matrix of the nodes
cl_adj (np.matrix[float] NxN) – Adjacency matrix of the classes
node_by_cl (np.matrix[float] MxN) – Association matrix between nodes and classes
C (np.array[string] N) – List of classes in the same order as the ‘cl_adj’ matrix
hier_cl_idx (np.array[int]) – List of classes (indexes) within the hierarchy (the first element of the list is the root class), where ‘hier_cl_idx’ is a subset of ‘C’
output_path (string) – Path where the output is stored
- Returns
Adjacency matrix of the subgraph corresponding to the hierarchy, i.e., the subgraph of all nodes associated with the root class of the hierarchy
- Return type
np.matrix[float]
- nodeclass.tools.data.neighborhood_information(adj, node_by_cl, nodes, cl_idx, anc_idx)
Extract information about the association of a class and its ancestor in the neighborhood of the nodes in the graph (using the adjacency matrix).
- Parameters
adj (np.matrix[float] MxM) – Adjacency matrix of the nodes
node_by_cl (np.matrix[float] MxN) – Association matrix between nodes and classes
nodes (np.array[int]) – List of indexes of the nodes to be considered
cl_idx (int) – Index of the class to be analyzed
anc_idx (int) – Index of the ancestor of the class to be analyzed
- Returns
List with the probability of association between the ‘nodes’ and the class, between the ‘nodes’ and its ancestor, and between the ‘nodes’ and the class given the association with its ancestor
- Return type
np.array[np.array[float], np.array[float], np.array[float]]
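The exact estimator used by neighborhood_information is not given. The following sketch assumes that each probability is a fraction of the node's neighbours:

- neighbours associated with the class,
- neighbours associated with the ancestor,
- neighbours associated with the class among those associated with the ancestor.

neighbourhood_fractions is a hypothetical name.

```python
def neighbourhood_fractions(adj, assoc, nodes, cl_idx, anc_idx):
    """ASSUMED estimator: per node, fraction of neighbours associated with the class,
    with the ancestor, and with the class among those associated with the ancestor.

    adj:   symmetric 0/1 adjacency matrix (list of lists)
    assoc: node-by-class 0/1 association matrix (list of lists)
    """
    p_cl, p_anc, p_cl_given_anc = [], [], []
    for i in nodes:
        nb = [j for j, a in enumerate(adj[i]) if a and j != i]
        with_anc = [j for j in nb if assoc[j][anc_idx]]
        with_cl = [j for j in nb if assoc[j][cl_idx]]
        both = [j for j in with_anc if assoc[j][cl_idx]]
        p_cl.append(len(with_cl) / len(nb) if nb else 0.0)
        p_anc.append(len(with_anc) / len(nb) if nb else 0.0)
        p_cl_given_anc.append(len(both) / len(with_anc) if with_anc else 0.0)
    return p_cl, p_anc, p_cl_given_anc


# Star graph: node 0 linked to nodes 1-4; column 0 = ancestor, column 1 = class
adj = [[0, 1, 1, 1, 1], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
assoc = [[1, 0], [1, 1], [1, 0], [0, 0], [1, 1]]
pc, pa, pg = neighbourhood_fractions(adj, assoc, [0], cl_idx=1, anc_idx=0)
```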
- nodeclass.tools.data.nodes_in_bfs(bfs, root)
Convert a bfs object to a list
- Parameters
bfs (object) – bfs object from networkx
root (int) – id of the root in the bfs
- Returns
list of nodes in the subgraph in bfs order
- Return type
list[int]
- nodeclass.tools.data.scale_data(data)
Scale the data of a dataset without modifying the distribution of the data.
- Parameters
data (DataFrame) – Dataset
- Returns
Dataset with scaled features
- Return type
DataFrame
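The documentation does not state which scaling scale_data applies. Any affine rescaling preserves the shape of a feature's distribution. The sketch below assumes min-max scaling to [0, 1] and uses a dict of columns instead of a DataFrame. min_max_scale is a hypothetical name.

```python
def min_max_scale(columns):
    """Rescale every feature column linearly to [0, 1] (assumed scaling method).

    A linear (affine) rescaling keeps the shape of each feature's distribution;
    constant columns are mapped to 0.
    columns: dict feature name -> list of values
    """
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        scaled[name] = [(v - lo) / span if span else 0.0 for v in values]
    return scaled


data = {"degree": [1, 3, 5], "const": [2, 2, 2]}
print(min_max_scale(data))  # {'degree': [0.0, 0.5, 1.0], 'const': [0.0, 0.0, 0.0]}
```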
MST algorithm module
Module containing a minimum spanning tree (MST) algorithm to turn a hierarchy of classes represented as a directed acyclic graph (DAG) into a tree. The algorithm uses the number of nodes associated with each class to select a parent class.
This algorithm is an adaptation of the algorithm used in (Jiang et al., 2008).
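The parent-selection criterion is not spelled out above. For illustration only, the sketch below assumes that each class keeps the direct parent with the fewest associated nodes (its most specific parent). It is a simple parent-selection rule, not a full MST algorithm. dag_to_tree is a hypothetical name, and the rule may differ from the one in (Jiang et al., 2008).

```python
def dag_to_tree(parents, n_nodes):
    """Keep a single parent per class.

    parents: dict class -> list of direct parent classes in the DAG (empty for a root)
    n_nodes: dict class -> number of nodes associated with the class
    ASSUMPTION: the parent with the fewest associated nodes is kept;
    ties are broken by the class label.
    """
    tree = {}
    for cls, pas in parents.items():
        tree[cls] = min(pas, key=lambda p: (n_nodes[p], p)) if pas else None
    return tree


parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
n_nodes = {"root": 100, "a": 40, "b": 25, "c": 10}
print(dag_to_tree(parents, n_nodes))  # {'root': None, 'a': 'root', 'b': 'root', 'c': 'b'}
```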
- nodeclass.tools.mst.direct_pa(cl, cls, hie)
Find the direct parents of a given class in a given hierarchy. The direct parents of a class are the set of classes that do not have any other descendant.
- Parameters
cl (string) – Index of the class whose direct parents are being searched
cls (list[string] of size N) – List of indexes of the classes in the hierarchy to be considered by the algorithm
hie (np.matrix[float] of size NxN) – Adjacency matrix representing the hierarchy of classes considered in ‘cls’
- Returns
List of indexes of the direct parents of the class ‘cl’ in the hierarchy
- Return type
list[int]
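One reading of the definition of direct parents is sketched below. It assumes that hie is a transitively closed ancestor matrix, where hie[i][j] == 1 when class j is an ancestor of class i. Under this reading, an ancestor is a direct parent when no other ancestor of the class descends from it. direct_parents is a hypothetical name.

```python
def direct_parents(cl, hie):
    """Direct parents of class index `cl`.

    ASSUMPTION: hie[i][j] == 1 means class j is an ancestor of class i
    (transitively closed). An ancestor is a direct parent if none of the
    other ancestors of `cl` is one of its descendants.
    """
    anc = [j for j in range(len(hie)) if hie[cl][j]]
    return [a for a in anc if not any(hie[b][a] for b in anc if b != a)]


# 0 is the root; 1 and 3 are children of 0; 2 is a child of 1; 4 is a child of 2 and 3
hie = [[0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [1, 1, 1, 1, 0]]
print(direct_parents(4, hie))  # [2, 3]
```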
- nodeclass.tools.mst.mst(nodes, cls, node_by_cl, cl_by_cl)
Minimum spanning tree (MST) algorithm for a hierarchy of classes.
- Parameters
nodes (np.array[int] of size M) – List of indexes of the nodes to be considered by the algorithm
cls (list[string] of size N) – List of indexes of the classes in the hierarchy to be considered by the algorithm
node_by_cl (np.matrix[float] of size PxM) – Matrix representing the associations between nodes and classes, where P ≥ N is the total number of classes in the hierarchy.
cl_by_cl (np.matrix[float] of size PxP) – Adjacency matrix representing the hierarchy of classes
- Returns
Adjacency matrix of the tree representation of the hierarchy of classes considered in ‘cls’
- Return type
np.matrix[float] of size NxN
Plotting module
Module for plotting the results of the prediction.
- nodeclass.tools.plots.plot_conf_matrix(cm, filename, path, labels=[0, 1])
Plot a confusion matrix and save it in a PDF file
- Parameters
cm (np.matrix[float]) – Confusion matrix
filename (string) – Name of the PDF file
path (string) – Path where the plot will be stored.
labels (List[float]) – Labels of the classes used on both axes of the matrix, defaults to [0, 1].
- nodeclass.tools.plots.plot_pr(rec, prc, ap, filename, path)
Plot a precision-recall curve and save it in a PDF file
- Parameters
rec (np.array[float]) – Array of recall values
prc (np.array[float]) – Array of precision values
ap (float) – Average precision score
filename (string) – Name of the PDF file
path (string) – Path where the plot will be stored.
- nodeclass.tools.plots.plot_prs(recl, prcl, apl, labels, filename, path)
Plot multiple precision-recall curves in the same figure and save it in a PDF file
- Parameters
recl (np.array[np.array[float]]) – Array of arrays of recall values for multiple predictions
prcl (np.array[np.array[float]]) – Array of arrays of precision values for multiple predictions
apl (np.array[float]) – Array of average precision values for multiple predictions
labels (List[string]) – Labels of the multiple models plotted
filename (string) – Name of the PDF file
path (string) – Path where the plot will be stored.
- nodeclass.tools.plots.plot_roc(fpr, tpr, auc, filename, path)
Plot a ROC curve and save it in a PDF file
- Parameters
fpr (np.array[float]) – Array of false positive rate values
tpr (np.array[float]) – Array of true positive rate values
auc (float) – Area under the ROC curve
filename (string) – Name of the PDF file
path (string) – Path where the plot will be stored.
- nodeclass.tools.plots.plot_rocs(fprl, tprl, aucl, labels, filename, path)
Plot multiple ROC curves in the same figure and save it in a PDF file
- Parameters
fprl (np.array[np.array[float]]) – Array of arrays of false positive rate values for multiple predictions
tprl (np.array[np.array[float]]) – Array of arrays of true positive rate values for multiple predictions
aucl (np.array[float]) – Array of area under the ROC curve values for multiple predictions
labels (List[string]) – Labels of the multiple models plotted
filename (string) – Name of the PDF file
path (string) – Path where the plot will be stored.