The simplest way to install mir_eval is by using pip, which will also install the required dependencies if needed.
To install mir_eval using pip, simply run
pip install mir_eval
Alternatively, you can install mir_eval from source by first installing the dependencies and then running
python setup.py install
from the source directory.
If you don’t use Python and want to get started as quickly as possible, you might consider using Anaconda which makes it easy to install a Python environment which can run mir_eval.
Once you’ve installed mir_eval (see Installing mir_eval), you can import it in your Python code as follows:
import mir_eval
From here, you will typically either load in data and call the evaluate() function from the appropriate submodule like so:
reference_beats = mir_eval.io.load_events('reference_beats.txt')
estimated_beats = mir_eval.io.load_events('estimated_beats.txt')
# Scores will be a dict containing scores for all of the metrics
# implemented in mir_eval.beat. The keys are metric names
# and values are the scores achieved
scores = mir_eval.beat.evaluate(reference_beats, estimated_beats)
or you’ll load in the data, do some preprocessing, and call specific metric functions from the appropriate submodule like so:
reference_beats = mir_eval.io.load_events('reference_beats.txt')
estimated_beats = mir_eval.io.load_events('estimated_beats.txt')
# Crop out beats before 5s, a common preprocessing step
reference_beats = mir_eval.beat.trim_beats(reference_beats)
estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
# Compute the F-measure metric and store it in f_measure
f_measure = mir_eval.beat.f_measure(reference_beats, estimated_beats)
The documentation for each metric function, found in the mir_eval section below, contains further usage information.
Alternatively, you can use the evaluator scripts which allow you to run evaluation from the command line, without writing any code.
These scripts are available here:
The structure of the mir_eval Python module is as follows:
Each MIR task for which evaluation metrics are included in mir_eval is given its own submodule, and each metric is defined as a separate function in each submodule.
Every metric function includes detailed documentation, example usage, input validation, and references to the original paper which defined the metric (see the subsections below).
The task submodules also all contain a function evaluate(), which takes as input reference and estimated annotations and returns a dictionary of scores for all of the metrics implemented (for casual users, this is the place to start).
Finally, each task submodule also includes functions for common data pre-processing steps.
mir_eval also includes the following additional submodules:
mir_eval.io which contains convenience functions for loading in task-specific data from common file formats
mir_eval.util which includes miscellaneous functionality shared across the submodules
mir_eval.sonify which implements some simple methods for synthesizing annotations of various formats for “evaluation by ear”.
mir_eval.display which provides functions for plotting annotations for various tasks.
The following subsections document each submodule.
The aim of a beat detection algorithm is to report the times at which a typical
human listener might tap their foot to a piece of music. As a result, most
metrics for evaluating the performance of beat tracking systems involve
computing the error between the estimated beat times and some reference list of
beat locations. Many metrics additionally compare the beat sequences at
different metric levels in order to deal with the ambiguity of tempo.
Based on the methods described in:
Matthew E. P. Davies, Norberto Degara, and Mark D. Plumbley.
“Evaluation Methods for Musical Audio Beat Tracking Algorithms”,
Queen Mary University of London Technical Report C4DM-TR-09-06
London, United Kingdom, 8 October 2009.
Beat times should be provided in the form of a 1-dimensional array of beat
times in seconds in increasing order. Typically, any beats which occur before
5s are ignored; this can be accomplished using
mir_eval.beat.trim_beats().
mir_eval.beat.f_measure(): The F-measure of the beat sequence, where an
estimated beat is considered correct if it is sufficiently close to a
reference beat
mir_eval.beat.cemgil(): Cemgil’s score, which computes the sum of
Gaussian errors for each beat
mir_eval.beat.goto(): Goto’s score, a binary score which is 1 when at
least 25% of the estimated beat sequence closely matches the reference beat
sequence
mir_eval.beat.p_score(): McKinney’s P-score, which computes the
cross-correlation of the estimated and reference beat sequences represented
as impulse trains
mir_eval.beat.continuity(): Continuity-based scores which compute the
proportion of the beat sequence which is continuously correct
Chord estimation algorithms produce a list of intervals and labels which denote
the chord being played over each timespan. They are evaluated by comparing the
estimated chord labels to some reference, usually using a mapping to a chord
subalphabet (e.g. minor and major chords only, all triads, etc.). There is no
single ‘right’ way to compare two sequences of chord labels. Embracing this
reality, every conventional comparison rule is provided. Comparisons are made
over the different components of each chord (e.g. G:maj(6)/5): the root (G),
the root-invariant active semitones as determined by the quality
shorthand (maj) and scale degrees (6), and the bass interval (5).
This submodule provides functions both for comparing sequences of chord
labels according to some chord subalphabet mapping and for using these
comparisons to score a sequence of estimated chords against a reference.
A sequence of chord labels is represented as a list of strings, where each
label is the chord name based on the syntax of [1]. Reference
and estimated chord label sequences should be of the same length for comparison
functions. When converting the chord string into its constituent parts,
Pitch class counting starts at C, e.g. C:0, D:2, E:4, F:5, etc.
Scale degree is represented as a string of the diatonic interval, relative to
the root note, e.g. ‘b6’, ‘#5’, or ‘7’
Bass intervals are represented as strings
Chord bitmaps are positional binary vectors indicating active pitch classes
and may be absolute or relative depending on context in the code.
If no chord is present at a given point in time, it should have the label ‘N’,
which is defined in the variable mir_eval.chord.NO_CHORD.
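For illustration, a label can be decomposed programmatically. A minimal sketch using mir_eval.chord.split() and mir_eval.chord.encode(), where the expected values in the comments reflect our reading of the API and are worth checking against the function documentation:
import mir_eval
# Split a label into its parts: root, quality shorthand,
# scale degrees, and bass interval
root, quality, degrees, bass = mir_eval.chord.split('G:maj(6)/5')
# Expected: root == 'G', quality == 'maj', degrees == {'6'}, bass == '5'
# Encode a label as a root pitch class, a semitone bitmap,
# and a bass interval number
root_number, semitone_bitmap, bass_number = mir_eval.chord.encode('G:maj(6)/5')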
mir_eval.chord.majmin_inv(): Compares major/minor chords, with
inversions. The bass note must exist in the triad.
mir_eval.chord.mirex(): An estimated chord is considered correct if it
shares at least three pitch classes in common with the reference chord.
mir_eval.chord.thirds(): Chords are compared at the level of major or
minor thirds (root and third). For example, both (‘A:7’, ‘A:maj’) and
(‘A:min’, ‘A:dim’) are equivalent, as the third is major and minor in
quality, respectively.
mir_eval.chord.triads(): Chords are considered at the level of triads
(major, minor, augmented, diminished, suspended), meaning that, in addition
to the root, the quality is only considered through #5th scale degree (for
augmented chords). For example, (‘A:7’, ‘A:maj’) are equivalent, while
(‘A:min’, ‘A:dim’) and (‘A:aug’, ‘A:maj’) are not.
mir_eval.chord.tetrads(): Chords are considered at the level of the
entire quality in closed voicing, i.e. spanning only a single octave;
extended chords (9’s, 11’s and 13’s) are rolled into a single octave with any
upper voices included as extensions. For example, (‘A:7’, ‘A:9’) are
equivalent but (‘A:7’, ‘A:maj7’) are not.
mir_eval.chord.sevenths(): Compares according to MIREX “sevenths”
rules; that is, only major, major seventh, seventh, minor, minor seventh and
no chord labels are compared.
Compute the weighted accuracy of a list of chord comparisons.
Parameters:
comparisons : np.ndarray
List of chord comparison scores, in [0, 1] or -1
weights : np.ndarray
Weights (not necessarily normalized) for each comparison.
This can be a list of interval durations
Returns:
score : float
Weighted accuracy
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...     ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> # Here, we're using the "thirds" function to compare labels
>>> # but any of the comparison functions would work.
>>> comparisons = mir_eval.chord.thirds(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
Compare chords along major-minor rules, with inversions. Chords with
qualities outside Major/minor/no-chord are ignored, and the bass note must
exist in the triad (bass in [1, 3, 5]).
Compute the directional hamming distance between reference and
estimated intervals as defined by [1] and used for MIREX
‘OverSeg’, ‘UnderSeg’ and ‘MeanSeg’ measures.
Melody extraction algorithms aim to produce a sequence of frequency values
corresponding to the pitch of the dominant melody from a musical
recording. For evaluation, an estimated pitch series is evaluated against a
reference based on whether the voicing (melody present or not) and the pitch
are correct (within some tolerance).
For a detailed explanation of the measures please refer to:
J. Salamon, E. Gomez, D. P. W. Ellis and G. Richard, “Melody Extraction
from Polyphonic Music Signals: Approaches, Applications and Challenges”,
IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.
and:
G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S.
Streich, and B. Ong. “Melody transcription from music audio:
Approaches and evaluation”, IEEE Transactions on Audio, Speech, and
Language Processing, 15(4):1247-1256, 2007.
For an explanation of the generalized measures (using non-binary voicings),
please refer to:
R. Bittner and J. Bosch, “Generalized Metrics for Single-F0 Estimation
Evaluation”, International Society for Music Information Retrieval
Conference (ISMIR), 2019.
Melody annotations are assumed to be given in the format of a 1d array of
frequency values which are accompanied by a 1d array of times denoting when
each frequency value occurs. In a reference melody time series, a frequency
value of 0 denotes “unvoiced”. In an estimated melody time series, unvoiced
frames can be indicated either by 0 Hz or by a negative Hz value - negative
values represent the algorithm’s pitch estimate for frames it has determined as
unvoiced, in case they are in fact voiced.
Metrics are computed using a sequence of reference and estimated pitches in
cents and voicing arrays, both of which are sampled to the same
timebase. The function mir_eval.melody.to_cent_voicing() can be used to
convert a sequence of estimated and reference times and frequency values in Hz
to voicing arrays and frequency arrays in the format required by the
metric functions. By default, the convention is to resample the estimated
melody time series to the reference melody time series’ timebase.
mir_eval.melody.voicing_measures(): Voicing measures, including the
recall rate (proportion of frames labeled as melody frames in the reference
that are estimated as melody frames) and the false alarm
rate (proportion of frames labeled as non-melody in the reference that are
mistakenly estimated as melody frames)
mir_eval.melody.raw_pitch_accuracy(): Raw Pitch Accuracy, which
computes the proportion of melody frames in the reference for which the
frequency is considered correct (i.e. within half a semitone of the reference
frequency)
mir_eval.melody.raw_chroma_accuracy(): Raw Chroma Accuracy, where the
estimated and reference frequency sequences are mapped onto a single octave
before computing the raw pitch accuracy
mir_eval.melody.overall_accuracy(): Overall Accuracy, which computes
the proportion of all frames correctly estimated by the algorithm, including
whether non-melody frames were labeled by the algorithm as non-melody
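As with the other task submodules, all of these metrics can be computed at once via evaluate(). A minimal sketch, assuming 'ref.txt' and 'est.txt' are time/frequency files readable by mir_eval.io.load_time_series():
import mir_eval
ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
est_time, est_freq = mir_eval.io.load_time_series('est.txt')
# Resampling to a common timebase and conversion to cents are
# handled internally; scores is a dict of metric name -> value
scores = mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq)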
Compute the voicing recall given two voicing
indicator sequences, one as reference (truth) and the other as the estimate
(prediction). The sequences must be of the same length.
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> recall = mir_eval.melody.voicing_recall(ref_v, est_v)
Parameters:
ref_voicing : np.ndarray
Compute the voicing false alarm rates given two voicing
indicator sequences, one as reference (truth) and the other as the estimate
(prediction). The sequences must be of the same length.
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> false_alarm = mir_eval.melody.voicing_false_alarm(ref_v, est_v)
Parameters:
ref_voicing : np.ndarray
Compute the voicing recall and false alarm rates given two voicing
indicator sequences, one as reference (truth) and the other as the estimate
(prediction). The sequences must be of the same length.
Examples
>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> recall, false_alarm = mir_eval.melody.voicing_measures(ref_v,
...                                                        est_v)
Parameters:
ref_voicing : np.ndarray
Compute the raw pitch accuracy given two pitch (frequency) sequences in
cents and matching voicing indicator sequences. The first pitch and voicing
arrays are treated as the reference (truth), and the second two as the
estimate (prediction). All 4 sequences must be of the same length.
Parameters:
ref_voicing : np.ndarray
Reference voicing array. When this array is non-binary, it is treated
as a ‘reference reward’, as in (Bittner & Bosch, 2019)
ref_cent : np.ndarray
Reference pitch sequence in cents
est_voicing : np.ndarray
Estimated voicing array
est_cent : np.ndarray
Estimated pitch sequence in cents
cent_tolerance : float
Maximum absolute deviation in cents for a frequency value to be
considered correct
(Default value = 50)
Returns:
raw_pitch : float
Raw pitch accuracy, the fraction of voiced frames in ref_cent for
which est_cent provides a correct frequency value
(within cent_tolerance cents).
Compute the raw chroma accuracy given two pitch (frequency) sequences
in cents and matching voicing indicator sequences. The first pitch and
voicing arrays are treated as the reference (truth), and the second two as
the estimate (prediction). All 4 sequences must be of the same length.
Parameters:
ref_voicing : np.ndarray
Reference voicing array. When this array is non-binary, it is treated
as a ‘reference reward’, as in (Bittner & Bosch, 2019)
ref_cent : np.ndarray
Reference pitch sequence in cents
est_voicing : np.ndarray
Estimated voicing array
est_cent : np.ndarray
Estimated pitch sequence in cents
cent_tolerance : float
Maximum absolute deviation in cents for a frequency value to be
considered correct
(Default value = 50)
Returns:
raw_chroma : float
Raw chroma accuracy, the fraction of voiced frames in ref_cent for
which est_cent provides a correct frequency value (within
cent_tolerance cents), ignoring octave errors
Compute the overall accuracy given two pitch (frequency) sequences
in cents and matching voicing indicator sequences. The first pitch and
voicing arrays are treated as the reference (truth), and the second two
as the estimate (prediction). All 4 sequences must be of the same length.
Parameters:
ref_voicing : np.ndarray
Reference voicing array. When this array is non-binary, it is treated
as a ‘reference reward’, as in (Bittner & Bosch, 2019)
ref_cent : np.ndarray
Reference pitch sequence in cents
est_voicing : np.ndarray
Estimated voicing array
est_cent : np.ndarray
Estimated pitch sequence in cents
cent_tolerance : float
Maximum absolute deviation in cents for a frequency value to be
considered correct
(Default value = 50)
Returns:
overall_accuracy : float
Overall accuracy, the total fraction of frames correctly estimated by
the algorithm: a voiced frame is correct when est_cent provides a
correct frequency value (within cent_tolerance cents), and an
unvoiced frame is correct when it is also estimated as unvoiced.
Evaluate two melody (predominant f0) transcriptions, where the first is
treated as the reference (ground truth) and the second as the estimate to
be evaluated (prediction).
Parameters:
ref_time : np.ndarray
Time of each reference frequency value
ref_freq : np.ndarray
Array of reference frequency values
est_time : np.ndarray
Time of each estimated frequency value
est_freq : np.ndarray
Array of estimated frequency values
est_voicing : np.ndarray
Estimated voicing confidence.
Default None, which means the voicing is inferred from est_freq:
frames with frequency <= 0.0 are considered “unvoiced”,
frames with frequency > 0.0 are considered “voiced”
ref_reward : np.ndarray
Reference pitch estimation reward.
Default None, which means all frames are weighted equally.
kwargs
Additional keyword arguments which will be passed to the
appropriate metric or preprocessing functions.
Returns:
scores : dict
Dictionary of scores, where the key is the metric name (str) and
the value is the (float) score achieved.
The goal of multiple f0 (multipitch) estimation and tracking is to identify
all of the active fundamental frequencies in each time frame in a complex music
signal.
Multipitch estimates are represented by a timebase and a corresponding list
of arrays of frequency estimates. Frequency estimates may have any number of
frequency values, including 0 (represented by an empty array). Time values are
in units of seconds and frequency estimates are in units of Hz.
The timebase of the estimate time series should ideally match the timebase of
the reference time series, but if this is not the case, the estimate time
series is resampled using a nearest neighbor interpolation to match the
reference. Time values in the estimate time series that are outside of the
range of the reference time series are given null (empty array) frequencies.
By default, a frequency is “correct” if it is within 0.5 semitones of a
reference frequency. Frequency values are compared by first mapping them to
log-2 semitone space, where the distance between semitones is constant.
Chroma-wrapped frequency values are computed by taking the log-2 frequency
values modulo 12 to map them down to a single octave. A chroma-wrapped
frequency estimate is correct if its single-octave value is within 0.5
semitones of the single-octave reference frequency.
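To make the mapping concrete, here is a minimal sketch of the semitone and chroma conversion described above (written with the MIDI convention of a 440 Hz reference; mir_eval's internal reference frequency may differ):
import numpy as np
freqs = np.array([220.0, 440.0, 880.0])
# Log-2 semitone space: adjacent semitones are a constant distance of 1
semitones = 69.0 + 12.0 * np.log2(freqs / 440.0)
# Chroma wrapping: fold all octaves onto a single octave
chroma = semitones % 12
# All three frequencies share the same chroma value (9.0), so
# octave errors are forgiven when comparing chroma-wrapped values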
The metrics are based on those described in
[5] and [6].
mir_eval.multipitch.metrics(): Precision, Recall, Accuracy,
Substitution, Miss, False Alarm, and Total Error scores based both on raw
frequency values and values mapped to a single octave (chroma).
Compute multipitch metrics. All metrics are computed at the ‘macro’ level
such that the frame true positive/false positive/false negative rates are
summed across time and the metrics are computed on the combined values.
Parameters:
ref_time : np.ndarray
Time of each reference frequency value
ref_freqs : list of np.ndarray
List of np.ndarrays of reference frequency values
est_time : np.ndarray
Time of each estimated frequency value
est_freqs : list of np.ndarray
List of np.ndarrays of estimated frequency values
kwargs
Additional keyword arguments which will be passed to the
appropriate metric or preprocessing functions.
Evaluate two multipitch (multi-f0) transcriptions, where the first is
treated as the reference (ground truth) and the second as the estimate to
be evaluated (prediction).
Parameters:
ref_time : np.ndarray
Time of each reference frequency value
ref_freqs : list of np.ndarray
List of np.ndarrays of reference frequency values
est_time : np.ndarray
Time of each estimated frequency value
est_freqs : list of np.ndarray
List of np.ndarrays of estimated frequency values
kwargs
Additional keyword arguments which will be passed to the
appropriate metric or preprocessing functions.
Returns:
scores : dict
Dictionary of scores, where the key is the metric name (str) and
the value is the (float) score achieved.
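A minimal sketch of a complete multipitch evaluation, assuming 'ref.txt' and 'est.txt' store one time value followed by any number of frequency values per line (the format read by mir_eval.io.load_ragged_time_series()):
import mir_eval
ref_time, ref_freqs = mir_eval.io.load_ragged_time_series('ref.txt')
est_time, est_freqs = mir_eval.io.load_ragged_time_series('est.txt')
# scores is a dict containing precision, recall, accuracy, and the
# error scores, both raw and chroma-wrapped
scores = mir_eval.multipitch.evaluate(ref_time, ref_freqs,
                                      est_time, est_freqs)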
The goal of an onset detection algorithm is to automatically determine when
notes are played in a piece of music. The primary method used to evaluate
onset detectors is to first determine which estimated onsets are “correct”,
where correctness is defined as being within a small window of a reference
onset.
mir_eval.onset.f_measure(): Precision, Recall, and F-measure scores
based on the number of estimated onsets which are sufficiently close to
reference onsets.
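A minimal sketch, assuming 'ref_onsets.txt' and 'est_onsets.txt' contain one onset time (in seconds) per line:
import mir_eval
ref_onsets = mir_eval.io.load_events('ref_onsets.txt')
est_onsets = mir_eval.io.load_events('est_onsets.txt')
# window is the matching tolerance in seconds (50 ms by default)
f_measure, precision, recall = mir_eval.onset.f_measure(ref_onsets,
                                                        est_onsets,
                                                        window=0.05)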
Pattern discovery involves the identification of musical patterns (i.e. short
fragments or melodic ideas that repeat at least twice) both from audio and
symbolic representations. The metrics used to evaluate pattern discovery
systems attempt to quantify the ability of the algorithm to not only determine
the present patterns in a piece, but also to find all of their occurrences.
The input format can be automatically generated by calling
mir_eval.io.load_patterns(). This format is a list of lists of
tuples. The first list collects patterns, each of which is a list of
occurrences, and each occurrence is a list of MIDI onset tuples of
(onset_time, midi_note).
A pattern is a list of occurrences. The first occurrence must be the prototype
of that pattern (i.e. the most representative of all the occurrences). An
occurrence is a list of tuples containing the onset time and the midi note
number.
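To make this structure concrete, here is a small hypothetical example (all onset times and MIDI note numbers are made up):
# Two patterns, each with two occurrences. Each occurrence is a list
# of (onset_time, midi_note) tuples, and the first occurrence of each
# pattern is its prototype.
patterns = [
    [  # pattern 1
        [(77.0, 67), (77.5, 69), (78.0, 71)],  # prototype
        [(85.0, 74), (85.5, 76), (86.0, 78)],  # transposed repetition
    ],
    [  # pattern 2
        [(10.0, 60), (11.0, 62)],              # prototype
        [(30.0, 60), (31.0, 62)],
    ],
]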
mir_eval.pattern.standard_FPR(): A strict metric that looks for
possibly transposed patterns of exact length. This is the only metric that
considers transposed patterns.
mir_eval.pattern.establishment_FPR(): Evaluates how many patterns
were successfully identified by the estimated results, no matter how
many of their occurrences were found. In other words, this metric captures
whether the algorithm successfully established that a pattern repeats at
least twice, where that pattern is also found in the reference annotation.
mir_eval.pattern.occurrence_FPR(): Evaluation of how well an estimation
can effectively identify all the occurrences of the found patterns,
independently of how many patterns have been discovered. This metric has a
threshold parameter that indicates how similar two occurrences must be in
order to be considered equal. In MIREX, this evaluation is run twice, with
thresholds .75 and .5.
mir_eval.pattern.three_layer_FPR(): Aims to evaluate the general
similarity between the reference and the estimations, combining both the
establishment of patterns and the retrieval of their occurrences in a single
F1 score.
mir_eval.pattern.first_n_three_layer_P(): Computes the three-layer
precision for the first N patterns only in order to measure the ability of
the algorithm to sort the identified patterns based on their relevance.
mir_eval.pattern.first_n_target_proportion_R(): Computes the target
proportion recall for the first N patterns only in order to measure the
ability of the algorithm to sort the identified patterns based on their
relevance.
This metric checks whether the prototype patterns of the reference match
possible translated patterns in the prototype patterns of the estimations.
Since the sizes of these prototypes must be equal, this metric is quite
restrictive and tends to be 0 in most of the 2013 MIREX results.
Tolerance level when comparing reference against estimation.
Default parameter is the one found in the original matlab code by
Tom Collins used for MIREX 2013.
(Default value = 1e-5)
This metric is basically the same as the three-layer FPR, but it is only
applied to the first n estimated patterns and it only returns the
precision. In MIREX, n = 5 is typically used.
First n target proportion establishment recall metric.
This metric is similar to the establishment FPR score, but it only
takes into account the first n estimated patterns and only outputs
its Recall value.
Evaluation criteria for structural segmentation fall into two categories:
boundary annotation and structural annotation. Boundary annotation is the task
of predicting the times at which structural changes occur, such as when a verse
transitions to a refrain. Metrics for boundary annotation compare estimated
segment boundaries to reference boundaries. Structural annotation is the task
of assigning labels to detected segments. The estimated labels may be
arbitrary strings - such as A, B, C - and they need not describe functional
concepts. Metrics for structural annotation are similar to those used for
clustering data.
Both boundary and structural annotation metrics require two dimensional arrays
with two columns, one for boundary start times and one for boundary end times.
Structural annotation further requires lists of reference and estimated segment
labels which must have a length which is equal to the number of rows in the
corresponding list of boundary edges. In both tasks, we assume that
annotations express a partitioning of the track into intervals. The function
mir_eval.util.adjust_intervals() can be used to pad or crop the segment
boundaries to span the duration of the entire track.
mir_eval.segment.detection(): An estimated boundary is considered
correct if it falls within a window around a reference boundary
[7]
mir_eval.segment.deviation(): Computes the median absolute time
difference from a reference boundary to its nearest estimated boundary, and
vice versa [7]
mir_eval.segment.pairwise(): Precision, recall, and F-measure for
classifying pairs of sampled time instants as belonging to the same
structural component [8]
mir_eval.segment.nce(): Interprets sampled reference and estimated
labels as samples of random variables Y_R, Y_E from which the
conditional entropy of Y_R given Y_E (Under-Segmentation) and
Y_E given Y_R (Over-Segmentation) are estimated
[9]
mir_eval.segment.mutual_information(): Computes the standard,
normalized, and adjusted mutual information of sampled reference and
estimated segments
mir_eval.segment.vmeasure(): Computes the V-Measure, which is similar
to the conditional entropy metrics, but uses the marginal distributions
as normalization rather than the maximum entropy distribution
[10]
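As elsewhere, the full set of boundary and structural metrics can be computed with a single call to evaluate(). A minimal sketch, assuming 'ref.lab' and 'est.lab' are labeled-interval files:
import mir_eval
ref_intervals, ref_labels = mir_eval.io.load_labeled_intervals('ref.lab')
est_intervals, est_labels = mir_eval.io.load_labeled_intervals('est.lab')
# scores is a dict of boundary detection, deviation, and structural
# annotation metrics; interval adjustment is handled internally
scores = mir_eval.segment.evaluate(ref_intervals, ref_labels,
                                   est_intervals, est_labels)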
Checks that the input annotations to a segment boundary estimation
metric (i.e. one that only takes in segment intervals) look like valid
segment times, and throws helpful errors if not.
Checks that the input annotations to a structure estimation metric (i.e.
one that takes in both segment boundaries and their labels) look like valid
segment times and labels, and throws helpful errors if not.
A hit is counted whenever a reference boundary is within window of an
estimated boundary. Note that each boundary is matched at most once: this
is achieved by computing the size of a maximal matching between reference
and estimated boundary points, subject to the window constraint.
window : float > 0
size of the window of ‘correctness’ around reference boundaries
(in seconds)
(Default value = 0.5)
beta : float > 0
weighting constant for F-measure.
(Default value = 1.0)
trim : boolean
if True, the first and last boundary times are ignored.
Typically, these denote start (0) and end-markers.
(Default value = False)
Returns:
precision : float
precision of estimated predictions
recall : float
recall of reference boundaries
f_measure : float
F-measure (weighted harmonic mean of precision and recall)
Examples
>>> ref_intervals, _ = mir_eval.io.load_labeled_intervals('ref.lab')
>>> est_intervals, _ = mir_eval.io.load_labeled_intervals('est.lab')
>>> # With 0.5s windowing
>>> P05, R05, F05 = mir_eval.segment.detection(ref_intervals,
...                                            est_intervals,
...                                            window=0.5)
>>> # With 3s windowing
>>> P3, R3, F3 = mir_eval.segment.detection(ref_intervals,
...                                         est_intervals,
...                                         window=3)
>>> # Ignoring hits for the beginning and end of track
>>> P, R, F = mir_eval.segment.detection(ref_intervals,
...                                      est_intervals,
...                                      window=0.5,
...                                      trim=True)
frame_size : float > 0
length (in seconds) of frames for clustering
(Default value = 0.1)
beta : float > 0
beta value for F-measure
(Default value = 1.0)
Returns:
precision : float > 0
Precision of detecting whether frames belong in the same cluster
recall : float > 0
Recall of detecting whether frames belong in the same cluster
f : float > 0
F-measure of detecting whether frames belong in the same cluster
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> precision, recall, f = mir_eval.segment.pairwise(ref_intervals,
...                                                  ref_labels,
...                                                  est_intervals,
...                                                  est_labels)
frame_size : float > 0
length (in seconds) of frames for clustering
(Default value = 0.1)
beta : float > 0
beta value for F-measure
(Default value = 1.0)
Returns:
rand_index : float > 0
Rand index
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> rand_index = mir_eval.segment.rand_index(ref_intervals,
...                                          ref_labels,
...                                          est_intervals,
...                                          est_labels)
frame_size : float > 0
length (in seconds) of frames for clustering
(Default value = 0.1)
Returns:
ari_score : float > 0
Adjusted Rand index between segmentations.
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> ari_score = mir_eval.segment.ari(ref_intervals, ref_labels,
...                                  est_intervals, est_labels)
frame_size : float > 0
length (in seconds) of frames for clustering
(Default value = 0.1)
Returns:
MI : float > 0
Mutual information between segmentations
AMI : float
Adjusted mutual information between segmentations.
NMI : float > 0
Normalized mutual information between segmentations
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> mi, ami, nmi = mir_eval.segment.mutual_information(ref_intervals,
...                                                    ref_labels,
...                                                    est_intervals,
...                                                    est_labels)
frame_size : float > 0
length (in seconds) of frames for clustering
(Default value = 0.1)
beta : float > 0
beta for F-measure
(Default value = 1.0)
marginal : bool
If False, normalize conditional entropy by uniform entropy.
If True, normalize conditional entropy by the marginal entropy.
(Default value = False)
Returns:
S_over
Over-clustering score:
For marginal=False, 1 - H(y_est | y_ref) / log(|y_est|)
For marginal=True, 1 - H(y_est | y_ref) / H(y_est)
If |y_est| == 1, then S_over will be 0.
S_under
Under-clustering score:
For marginal=False, 1 - H(y_ref | y_est) / log(|y_ref|)
For marginal=True, 1 - H(y_ref | y_est) / H(y_ref)
If |y_ref| == 1, then S_under will be 0.
S_F
F-measure for (S_over, S_under)
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> S_over, S_under, S_F = mir_eval.segment.nce(ref_intervals,
...                                             ref_labels,
...                                             est_intervals,
...                                             est_labels)
frame_size : float > 0
length (in seconds) of frames for clustering
(Default value = 0.1)
beta : float > 0
beta for F-measure
(Default value = 1.0)
Returns:
V_precision
Over-clustering score:
1 - H(y_est | y_ref) / H(y_est)
If |y_est| == 1, then V_precision will be 0.
V_recall
Under-clustering score:
1 - H(y_ref | y_est) / H(y_ref)
If |y_ref| == 1, then V_recall will be 0.
V_F
F-measure for (V_precision, V_recall)
Examples
>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> V_precision, V_recall, V_F = mir_eval.segment.vmeasure(ref_intervals,
...                                                        ref_labels,
...                                                        est_intervals,
...                                                        est_labels)
Evaluation criteria for hierarchical structure analysis.
Hierarchical structure analysis seeks to annotate a track with a nested
decomposition of the temporal elements of the piece, effectively providing
a kind of “parse tree” of the composition. Unlike the flat segmentation
metrics defined in mir_eval.segment, which can only encode one level of
analysis, hierarchical annotations expose the relationships between short
segments and the larger compositional elements to which they belong.
Annotations are assumed to take the form of an ordered list of segmentations.
As in the mir_eval.segment metrics, each segmentation itself consists of
an n-by-2 array of interval times, so that the i-th segment spans time
intervals[i, 0] to intervals[i, 1].
Hierarchical annotations are ordered by increasing specificity, so that the
first segmentation should contain the fewest segments, and the last
segmentation contains the most.
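A minimal sketch of such an annotation and its evaluation (the interval values and labels are purely illustrative):
import numpy as np
import mir_eval
# Two-level hierarchy: a coarse segmentation refined into smaller segments
ref_intervals_hier = [np.array([[0.0, 30.0], [30.0, 60.0]]),
                      np.array([[0.0, 15.0], [15.0, 30.0],
                                [30.0, 45.0], [45.0, 60.0]])]
ref_labels_hier = [['A', 'B'], ['a', 'b', 'c', 'd']]
est_intervals_hier = [np.array([[0.0, 32.0], [32.0, 60.0]]),
                      np.array([[0.0, 16.0], [16.0, 32.0],
                                [32.0, 46.0], [46.0, 60.0]])]
est_labels_hier = [['A', 'B'], ['a', 'b', 'c', 'd']]
scores = mir_eval.hierarchy.evaluate(ref_intervals_hier, ref_labels_hier,
                                     est_intervals_hier, est_labels_hier)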
Computes the tree measures for hierarchical segment annotations.
Parameters:
reference_intervals_hier : list of ndarray
reference_intervals_hier[i] contains the segment intervals
(in seconds) for the i-th layer of the annotations. Layers are
ordered from top to bottom, so that the last list of intervals should
be the most specific.
estimated_intervals_hier : list of ndarray
Like reference_intervals_hier but for the estimated annotation
transitive : bool
whether to compute the t-measures using transitivity or not.
window : float > 0
size of the window (in seconds). For each query frame q,
result frames are only counted within q +- window.
frame_size : float > 0
length (in seconds) of frames. The frame size cannot be longer than
the window.
beta : float > 0
beta parameter for the F-measure.
Returns:
t_precision : number [0, 1]
T-measure Precision
t_recall : number [0, 1]
T-measure Recall
t_measure : number [0, 1]
F-beta measure for (t_precision, t_recall)
Raises:
ValueError
If either of the input hierarchies is inconsistent
If the input hierarchies have different time durations
Computes the L-measures for hierarchical segment annotations.
Parameters:
reference_intervals_hier : list of ndarray
reference_intervals_hier[i] contains the segment intervals
(in seconds) for the i-th layer of the annotations. Layers are
ordered from top to bottom, so that the last list of intervals should
be the most specific.
reference_labels_hier : list of list of str
reference_labels_hier[i] contains the segment labels for the
i-th layer of the annotations
estimated_intervals_hier : list of ndarray
estimated_labels_hier : list of ndarray
Like reference_intervals_hier and reference_labels_hier
but for the estimated annotation
frame_size : float > 0
length (in seconds) of frames. The frame size cannot be longer than
the window.
beta : float > 0
beta parameter for the F-measure.
Returns:
l_precision : number [0, 1]
L-measure Precision
l_recall : number [0, 1]
L-measure Recall
l_measure : number [0, 1]
F-beta measure for (l_precision, l_recall)
Raises:
ValueError
If either of the input hierarchies is inconsistent
If the input hierarchies have different time durations
Compute all hierarchical structure metrics for the given reference and
estimated annotations.
Parameters:
ref_intervals_hier : list of list-like
ref_labels_hier : list of list of str
est_intervals_hier : list of list-like
est_labels_hier : list of list of str
Hierarchical annotations are encoded as an ordered list
of segmentations. Each segmentation itself is a list (or list-like)
of intervals (*_intervals_hier) and a list of lists of labels
(*_labels_hier).
kwargs
additional keyword arguments to the evaluation metrics.
Returns:
scores : OrderedDict
Dictionary of scores, where the key is the metric name (str) and
the value is the (float) score achieved.
T-measures are computed in both the “full” (transitive=True) and
“reduced” (transitive=False) modes.
Raises:
ValueError
Thrown when the provided annotations are not valid.
Source separation algorithms attempt to extract recordings of individual
sources from a recording of a mixture of sources. Evaluation methods for
source separation compare the extracted sources to the reference sources and
attempt to measure the perceptual quality of the separation.
An audio signal is expected to be in the format of a 1-dimensional array where
the entries are the samples of the audio signal. When providing a group of
estimated or reference sources, they should be provided in a 2-dimensional
array, where the first dimension corresponds to the source number and the
second corresponds to the samples.
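For example, two one-second sources at 44.1 kHz would be passed as arrays of shape (2, 44100). A minimal sketch, with random signals standing in for real audio:
import numpy as np
import mir_eval
rng = np.random.RandomState(0)
reference_sources = rng.randn(2, 44100)  # (nsrc, nsampl)
estimated_sources = rng.randn(2, 44100)
sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(
    reference_sources, estimated_sources)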
mir_eval.separation.bss_eval_sources(): Computes the bss_eval_sources
metrics from bss_eval, which optionally find an optimal matching between
estimated and reference sources, and measure the distortion and artifacts
present in the estimated sources as well as the interference between them.
Ordering and measurement of the separation quality for estimated source
signals in terms of filtered true source, interference and artifacts.
The decomposition allows a time-invariant filter distortion of length
512, as described in Section III.B of [13].
Passing False for compute_permutation will improve the computation
performance of the evaluation; however, it is not always appropriate and
is not the way that the BSS_EVAL Matlab toolbox computes bss_eval_sources.
Parameters:
reference_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing true sources (must have same shape as
estimated_sources)
estimated_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing estimated sources (must have same shape as
reference_sources)
compute_permutation : bool, optional
compute permutation of estimate/source combinations (True by default)
Returns:
sdr : np.ndarray, shape=(nsrc,)
vector of Signal to Distortion Ratios (SDR)
sir : np.ndarray, shape=(nsrc,)
vector of Source to Interference Ratios (SIR)
sar : np.ndarray, shape=(nsrc,)
vector of Sources to Artifacts Ratios (SAR)
perm : np.ndarray, shape=(nsrc,)
vector containing the best ordering of estimated sources in
the mean SIR sense (estimated source number perm[j] corresponds to
true source number j). Note: perm will be [0, 1, ..., nsrc-1] if
compute_permutation is False.
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources(reference_sources,
...                                               estimated_sources)
Please be aware that this function does not compute permutations (by
default) on the possible relations between reference_sources and
estimated_sources due to the dangers of a changing permutation. Therefore
(by default), it assumes that reference_sources[i] corresponds to
estimated_sources[i]. To enable computing permutations please set
compute_permutation to be True and check that the returned perm
is identical for all windows.
NOTE: if reference_sources and estimated_sources would be evaluated
using only a single window or are shorter than the window length, the
result of mir_eval.separation.bss_eval_sources() called on
reference_sources and estimated_sources (with the
compute_permutation parameter passed to
mir_eval.separation.bss_eval_sources()) is returned.
Parameters:
reference_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing true sources (must have the same shape as
estimated_sources)
estimated_sources : np.ndarray, shape=(nsrc, nsampl)
matrix containing estimated sources (must have the same shape as
reference_sources)
window : int, optional
Window length for framewise evaluation (default value is 30s at a
sample rate of 44.1kHz)
hop : int, optional
Hop size for framewise evaluation (default value is 15s at a
sample rate of 44.1kHz)
compute_permutation : bool, optional
compute permutation of estimate/source combinations for all windows
(False by default)
Returns:
sdr : np.ndarray, shape=(nsrc, nframes)
vector of Signal to Distortion Ratios (SDR)
sir : np.ndarray, shape=(nsrc, nframes)
vector of Source to Interference Ratios (SIR)
sar : np.ndarray, shape=(nsrc, nframes)
vector of Sources to Artifacts Ratios (SAR)
perm : np.ndarray, shape=(nsrc, nframes)
vector containing the best ordering of estimated sources in
the mean SIR sense (estimated source number perm[j] corresponds to
true source number j). Note: perm will be range(nsrc) for
all windows if compute_permutation is False
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources_framewise(
...     reference_sources, estimated_sources)
Implementation of the bss_eval_images function from the
BSS_EVAL Matlab toolbox.
Ordering and measurement of the separation quality for estimated source
signals in terms of filtered true source, interference and artifacts.
This method also provides the ISR measure.
The decomposition allows a time-invariant filter distortion of length
512, as described in Section III.B of [13].
Passing False for compute_permutation will improve the computation
performance of the evaluation; however, it is not always appropriate and
is not the way that the BSS_EVAL Matlab toolbox computes bss_eval_images.
Parameters:
compute_permutation : bool, optional
compute permutation of estimate/source combinations (True by default)
Returns:
sdr : np.ndarray, shape=(nsrc,)
vector of Signal to Distortion Ratios (SDR)
isr : np.ndarray, shape=(nsrc,)
vector of source Image to Spatial distortion Ratios (ISR)
sir : np.ndarray, shape=(nsrc,)
vector of Source to Interference Ratios (SIR)
sar : np.ndarray, shape=(nsrc,)
vector of Sources to Artifacts Ratios (SAR)
perm : np.ndarray, shape=(nsrc,)
vector containing the best ordering of estimated sources in
the mean SIR sense (estimated source number perm[j] corresponds to
true source number j). Note: perm will be (1, 2, ..., nsrc)
if compute_permutation is False.
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images(reference_sources,
...                                              estimated_sources)
Please be aware that this function does not compute permutations (by
default) on the possible relations between reference_sources and
estimated_sources due to the dangers of a changing permutation.
Therefore (by default), it assumes that reference_sources[i]
corresponds to estimated_sources[i]. To enable computing permutations
please set compute_permutation to be True and check that the
returned perm is identical for all windows.
NOTE: if reference_sources and estimated_sources would be evaluated
using only a single window or are shorter than the window length, the
result of bss_eval_images called on reference_sources and
estimated_sources (with the compute_permutation parameter passed to
bss_eval_images) is returned.
Parameters:
estimated_sources : np.ndarray
matrix containing estimated sources (must have the same shape as
reference_sources)
window : int
Window length for framewise evaluation
hop : int
Hop size for framewise evaluation
compute_permutation : bool, optional
compute permutation of estimate/source combinations for all windows
(False by default)
Returns:
sdr : np.ndarray, shape=(nsrc, nframes)
vector of Signal to Distortion Ratios (SDR)
isr : np.ndarray, shape=(nsrc, nframes)
vector of source Image to Spatial distortion Ratios (ISR)
sir : np.ndarray, shape=(nsrc, nframes)
vector of Source to Interference Ratios (SIR)
sar : np.ndarray, shape=(nsrc, nframes)
vector of Sources to Artifacts Ratios (SAR)
perm : np.ndarray, shape=(nsrc, nframes)
vector containing the best ordering of estimated sources in
the mean SIR sense (estimated source number perm[j] corresponds to
true source number j)
Note: perm will be range(nsrc) for all windows if compute_permutation
is False
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images_framewise(
...     reference_sources, estimated_sources, window, hop)
Additional keyword arguments which will be passed to the
appropriate metric or preprocessing functions.
Returns:
scores : dict
Dictionary of scores, where the key is the metric name (str) and
the value is the (float) score achieved.
Examples
>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated source
>>> scores = mir_eval.separation.evaluate(reference_sources,
...                                       estimated_sources)
The aim of a transcription algorithm is to produce a symbolic representation of
a recorded piece of music in the form of a set of discrete notes. There are
different ways to represent notes symbolically. Here we use the piano-roll
convention, meaning each note has a start time, a duration (or end time), and
a single, constant, pitch value. Pitch values can be quantized (e.g. to a
semitone grid tuned to 440 Hz), but do not have to be. Also, the transcription
can contain the notes of a single instrument or voice (for example the melody),
or the notes of all instruments/voices in the recording. This module is
instrument agnostic: all notes in the estimate are compared against all notes
in the reference.
There are many metrics for evaluating transcription algorithms. Here we limit
ourselves to the most simple and commonly used: given two sets of notes, we
count how many estimated notes match the reference, and how many do not. Based
on these counts we compute the precision, recall, f-measure and overlap ratio
of the estimate given the reference. The default criteria for considering two
notes to be a match are adopted from the MIREX Multiple fundamental frequency
estimation and tracking, Note Tracking subtask (task 2):
“This subtask is evaluated in two different ways. In the first setup, a
returned note is assumed correct if its onset is within +-50ms of a reference
note and its F0 is within +- quarter tone of the corresponding reference note,
ignoring the returned offset values. In the second setup, on top of the above
requirements, a correct returned note is required to have an offset value
within 20% of the reference note’s duration around the reference note’s
offset, or within 50ms, whichever is larger.”
In short, we compute precision, recall, f-measure and overlap ratio, once
without taking offsets into account, and once with.
For further details see Salamon, 2013 (page 186), and references therein:
Salamon, J. (2013). Melody Extraction from Polyphonic Music Signals.
Ph.D. thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2013.
IMPORTANT NOTE: the evaluation code in mir_eval contains several important
differences with respect to the code used in MIREX 2015 for the Note Tracking
subtask on the Su dataset (henceforth “MIREX”):
mir_eval uses bipartite graph matching to find the optimal pairing of
reference notes to estimated notes. MIREX uses a greedy matching algorithm,
which can produce sub-optimal note matching. This will result in
mir_eval’s metrics being slightly higher compared to MIREX.
MIREX rounds down the onset and offset times of each note to 2 decimal
points using new_time = 0.01 * floor(time * 100). mir_eval rounds down
the note onset and offset times to 4 decimal points. This will bring our
metrics down a notch compared to the MIREX results.
In the MIREX wiki, the criterion for matching offsets is that they must be
within 0.2 * ref_duration or 0.05 seconds from each other, whichever
is greater (i.e. offset_diff <= max(0.2 * ref_duration, 0.05)). The
MIREX code however only uses a threshold of 0.2 * ref_duration, without
the 0.05 second minimum. Since mir_eval does include this minimum, it
might produce slightly higher results compared to MIREX.
This means that differences 1 and 3 bring mir_eval’s metrics up compared to
MIREX, whilst 2 brings them down. Based on internal testing, overall the effect
of these three differences is that the Precision, Recall and F-measure returned
by mir_eval will be higher compared to MIREX by about 1%-2%.
Finally, note that different evaluation scripts have been used for the Multi-F0
Note Tracking task in MIREX over the years. In particular, some scripts used
< for matching onsets, offsets, and pitch values, whilst the others used
<= for these checks. mir_eval provides both options: by default the
latter (<=) is used, but you can set strict=True when calling
mir_eval.transcription.precision_recall_f1_overlap() in which case
< will be used. The default value (strict=False) is the same as that
used in MIREX 2015 for the Note Tracking subtask on the Su dataset.
Notes should be provided in the form of an interval array and a pitch array.
The interval array contains two columns, one for note onsets and the second
for note offsets (each row represents a single note). The pitch array contains
one column with the corresponding note pitch values (one value per note),
represented by their fundamental frequency (f0) in Hertz.
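A minimal sketch of a note-level evaluation, assuming 'ref.txt' and 'est.txt' list one note per line as onset, offset, and pitch (the format read by mir_eval.io.load_valued_intervals()):
import mir_eval
ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals('ref.txt')
est_intervals, est_pitches = mir_eval.io.load_valued_intervals('est.txt')
# With the default offset_ratio=0.2 offsets are taken into account;
# pass offset_ratio=None to ignore them
precision, recall, f_measure, avg_overlap_ratio = (
    mir_eval.transcription.precision_recall_f1_overlap(
        ref_intervals, ref_pitches, est_intervals, est_pitches))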
mir_eval.transcription.precision_recall_f1_overlap(): The precision,
recall, F-measure, and Average Overlap Ratio of the note transcription,
where an estimated note is considered correct if its pitch, onset and
(optionally) offset are sufficiently close to a reference note.
mir_eval.transcription.onset_precision_recall_f1(): The precision,
recall and F-measure of the note transcription, where an estimated note is
considered correct if its onset is sufficiently close to a reference note’s
onset. That is, these metrics are computed taking only note onsets into
account, meaning two notes could be matched even if they have very different
pitch values.
mir_eval.transcription.offset_precision_recall_f1(): The precision,
recall and F-measure of the note transcription, where an estimated note is
considered correct if its offset is sufficiently close to a reference note’s
offset. That is, these metrics are computed taking only note offsets into
account, meaning two notes could be matched even if they have very different
pitch values.
Compute a maximum matching between reference and estimated notes,
only taking note offsets into account.
Given two note sequences represented by ref_intervals and
est_intervals (see mir_eval.io.load_valued_intervals()), we seek
the largest set of correspondences (i, j) such that the offset of
reference note i is within offset_tolerance of the offset of
estimated note j, where offset_tolerance is equal to
offset_ratio times the reference note’s duration, i.e.
offset_ratio * ref_duration[i] where
ref_duration[i] = ref_intervals[i, 1] - ref_intervals[i, 0]. If the
resulting offset_tolerance is less than
offset_min_tolerance (50 ms by default) then offset_min_tolerance
is used instead.
Every reference note is matched against at most one estimated note.
Note there are separate functions match_note_onsets() and
match_notes() for matching notes based on onsets only or based on
onset, offset, and pitch, respectively. This is because the rules for
matching note onsets and matching note offsets are different.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference notes time intervals (onset and offset times)
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated notes time intervals (onset and offset times)
offset_ratio : float > 0
The ratio of the reference note’s duration used to define the
offset_tolerance. Default is 0.2 (20%), meaning the
offset_tolerance will equal the ref_duration * 0.2, or 0.05 (50
ms), whichever is greater.
offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See offset_ratio
description for an explanation of how the offset tolerance is
determined.
strict : bool
If strict=False (the default), threshold checks for offset
matching are performed using <= (less than or equal). If
strict=True, the threshold checks are performed using < (less
than).
Returns:
matching : list of tuples
A list of matched reference and estimated notes.
matching[i] == (i, j) where reference note i matches estimated
note j.
Compute a maximum matching between reference and estimated notes,
only taking note onsets into account.
Given two note sequences represented by ref_intervals and
est_intervals (see mir_eval.io.load_valued_intervals()), we seek
the largest set of correspondences (i,j) such that the onset of
reference note i is within onset_tolerance of the onset of
estimated note j.
Every reference note is matched against at most one estimated note.
Note there are separate functions match_note_offsets() and
match_notes() for matching notes based on offsets only or based on
onset, offset, and pitch, respectively. This is because the rules for
matching note onsets and matching note offsets are different.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference notes time intervals (onset and offset times)
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated notes time intervals (onset and offset times)
onset_tolerance : float > 0
The tolerance for an estimated note’s onset deviating from the
reference note’s onset, in seconds. Default is 0.05 (50 ms).
strict : bool
If strict=False (the default), threshold checks for onset matching
are performed using <= (less than or equal). If strict=True,
the threshold checks are performed using < (less than).
Returns:
matching : list of tuples
A list of matched reference and estimated notes.
matching[i] == (i, j) where reference note i matches estimated
note j.
Compute a maximum matching between reference and estimated notes,
subject to onset, pitch and (optionally) offset constraints.
Given two note sequences represented by ref_intervals, ref_pitches,
est_intervals and est_pitches
(see mir_eval.io.load_valued_intervals()), we seek the largest set
of correspondences (i,j) such that:
The onset of reference note i is within onset_tolerance of the
onset of estimated note j.
The pitch of reference note i is within pitch_tolerance of the
pitch of estimated note j.
If offset_ratio is not None, the offset of reference note i
has to be within offset_tolerance of the offset of estimated note
j, where offset_tolerance is equal to offset_ratio times the
reference note’s duration, i.e. offset_ratio * ref_duration[i] where
ref_duration[i] = ref_intervals[i, 1] - ref_intervals[i, 0]. If the
resulting offset_tolerance is less than 0.05 (50 ms), 0.05 is used
instead (a worked example follows this list).
If offset_ratio is None, note offsets are ignored, and only
criteria 1 and 2 are taken into consideration.
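A quick worked example of the offset tolerance rule, using illustrative durations:
# A 0.4 s reference note with the default offset_ratio of 0.2:
offset_tolerance = max(0.2 * 0.4, 0.05)        # = 0.08 s
# A 0.1 s reference note falls back to the 50 ms minimum:
offset_tolerance_short = max(0.2 * 0.1, 0.05)  # = 0.05 s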
Every reference note is matched against at most one estimated note.
This is useful for computing precision/recall metrics for note
transcription.
Note there are separate functions match_note_onsets() and
match_note_offsets() for matching notes based on onsets only or based
on offsets only, respectively.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
onset_tolerance : float > 0
The tolerance for an estimated note's onset deviating from the reference note's onset, in seconds. Default is 0.05 (50 ms).
pitch_tolerance : float > 0
The tolerance for an estimated note's pitch deviating from the reference note's pitch, in cents. Default is 50.0 (50 cents).
offset_ratio : float > 0 or None
The ratio of the reference note's duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration*0.2, or 0.05 (50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.
offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
Returns:
matching : list of tuples
A list of matched reference and estimated notes.
matching[i]==(i,j) where reference note i matches estimated
note j.
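As a usage sketch (the note file names here are hypothetical; see mir_eval.io.load_valued_intervals() for the expected format):
>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals('ref_notes.txt')
>>> est_intervals, est_pitches = mir_eval.io.load_valued_intervals('est_notes.txt')
>>> # Match on onset and pitch only, ignoring offsets
>>> matching = mir_eval.transcription.match_notes(
...     ref_intervals, ref_pitches, est_intervals, est_pitches,
...     offset_ratio=None)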
Compute the Precision, Recall and F-measure of correctly vs. incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see average_overlap_ratio()). "Correctness" is determined based on note onset, pitch and (optionally) offset: an estimated note is considered correct if its onset is within ±50 ms of a reference note and its pitch (F0) is within ± a quarter tone (50 cents) of the corresponding reference note. If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correct returned note is required to have an offset value within 20% (by default, adjustable via the offset_ratio parameter) of the reference note's duration around the reference note's offset, or within offset_min_tolerance (50 ms by default), whichever is larger.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
onset_tolerance : float > 0
The tolerance for an estimated note's onset deviating from the reference note's onset, in seconds. Default is 0.05 (50 ms).
pitch_tolerance : float > 0
The tolerance for an estimated note's pitch deviating from the reference note's pitch, in cents. Default is 50.0 (50 cents).
offset_ratio : float > 0 or None
The ratio of the reference note's duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration*0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.
offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
beta : float > 0
Weighting factor for f-measure (default value = 1.0).
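For example (file names hypothetical):
>>> ref_int, ref_pitch = mir_eval.io.load_valued_intervals('ref.txt')
>>> est_int, est_pitch = mir_eval.io.load_valued_intervals('est.txt')
>>> precision, recall, f_measure, avg_overlap_ratio = (
...     mir_eval.transcription.precision_recall_f1_overlap(
...         ref_int, ref_pitch, est_int, est_pitch))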
Compute the Average Overlap Ratio between a reference and estimated
note transcription. Given a reference and corresponding estimated note,
their overlap ratio (OR) is defined as the ratio between the duration of
the time segment in which the two notes overlap and the time segment
spanned by the two notes combined (earliest onset to latest offset):
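OR = (min(t_off_ref, t_off_est) - max(t_on_ref, t_on_est)) / (max(t_off_ref, t_off_est) - min(t_on_ref, t_on_est))
where t_on and t_off denote the onset and offset times of the reference and estimated note in a matched pair.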
The Average Overlap Ratio (AOR) is given by the mean OR computed over all
matching reference and estimated notes. The metric goes from 0 (worst) to 1
(best).
Note: this function assumes the matching of reference and estimated notes
(see match_notes()) has already been performed and is provided by the
matching parameter. Furthermore, it is highly recommended to validate
the intervals (see validate_intervals()) before calling this
function, otherwise it is possible (though unlikely) for this function to
attempt a divide-by-zero operation.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
matching : list of tuples
A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
Compute the Precision, Recall and F-measure of note onsets: an estimated
onset is considered correct if it is within ±50 ms of a reference onset.
Note that this metric completely ignores note offset and note pitch. This
means an estimated onset will be considered correct if it matches a
reference onset, even if the onsets come from notes with completely
different pitches (i.e. notes that would not match with
match_notes()).
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
onset_tolerance : float > 0
The tolerance for an estimated note's onset deviating from the reference note's onset, in seconds. Default is 0.05 (50 ms).
strict : bool
If strict=False (the default), threshold checks for onset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
beta : float > 0
Weighting factor for f-measure (default value = 1.0).
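Typical usage (file names hypothetical):
>>> ref_intervals, _ = mir_eval.io.load_valued_intervals('ref.txt')
>>> est_intervals, _ = mir_eval.io.load_valued_intervals('est.txt')
>>> precision, recall, f_measure = (
...     mir_eval.transcription.onset_precision_recall_f1(
...         ref_intervals, est_intervals))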
Compute the Precision, Recall and F-measure of note offsets: an estimated offset is considered correct if it is within ±50 ms (or 20% of the reference note's duration, whichever is greater) of a reference offset. Note that this metric completely ignores note onsets and note pitch. This means an estimated offset will be considered correct if it matches a reference offset, even if the offsets come from notes with completely different pitches (i.e. notes that would not match with match_notes()).
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
offset_ratio : float > 0 or None
The ratio of the reference note's duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration*0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater.
offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined.
strict : bool
If strict=False (the default), threshold checks for offset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
beta : float > 0
Weighting factor for f-measure (default value = 1.0).
Transcription evaluation, as defined in mir_eval.transcription, does not
take into account the velocities of reference and estimated notes. This
submodule implements a variant of
mir_eval.transcription.precision_recall_f1_overlap() which
additionally considers note velocity when determining whether a note is
correctly transcribed. This is done by defining a new function
mir_eval.transcription_velocity.match_notes() which first calls
mir_eval.transcription.match_notes() to get a note matching based on
onset, offset, and pitch. Then, we follow the evaluation procedure described in
[16] to test whether an estimated note should be considered
correct:
1. Reference velocities are rescaled to the range [0, 1].
2. A linear regression is performed to estimate global scale and offset parameters which minimize the L2 distance between matched estimated and (rescaled) reference velocities.
3. The scale and offset parameters are used to rescale the estimated velocities.
4. An estimated/reference note pair which has been matched according to onset, offset, and pitch is further only considered correct if the rescaled velocities are within a predefined threshold, defaulting to 0.1.
This submodule follows the conventions of mir_eval.transcription and
additionally requires velocities to be provided as MIDI velocities in the range
[0, 127].
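The velocity-handling steps can be sketched in a few lines of numpy. This is an illustrative re-implementation of steps 1-4 above with made-up velocities, not the library's exact code:
>>> import numpy as np
>>> ref_vel = np.array([100., 64., 80.])  # matched reference velocities (MIDI)
>>> est_vel = np.array([90., 60., 70.])   # matched estimated velocities (MIDI)
>>> # Step 1: rescale reference velocities to [0, 1]
>>> ref_scaled = (ref_vel - ref_vel.min()) / max(1, ref_vel.max() - ref_vel.min())
>>> # Steps 2-3: fit a global scale and offset by least squares, then apply them
>>> slope, intercept = np.polyfit(est_vel, ref_scaled, 1)
>>> est_scaled = slope * est_vel + intercept
>>> # Step 4: a matched pair is correct if the velocity error is within 0.1
>>> correct = np.abs(est_scaled - ref_scaled) <= 0.1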
mir_eval.transcription_velocity.precision_recall_f1_overlap(): The
precision, recall, F-measure, and Average Overlap Ratio of the note
transcription, where an estimated note is considered correct if its pitch,
onset, velocity and (optionally) offset are sufficiently close to a reference
note.
Match notes, taking note velocity into consideration.
This function first calls mir_eval.transcription.match_notes() to
match notes according to the supplied intervals, pitches, onset, offset,
and pitch tolerances. The velocities of the matched notes are then used to
estimate a slope and intercept which can rescale the estimated velocities
so that they are as close as possible (in L2 sense) to their matched
reference velocities. Velocities are then normalized to the range [0, 1]. An estimated note is then further only considered correct if its velocity is
within velocity_tolerance of its matched (according to pitch and
timing) reference note.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
ref_velocities : np.ndarray, shape=(n,)
Array of MIDI velocities (i.e. between 0 and 127) of reference notes
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
est_velocities : np.ndarray, shape=(m,)
Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
onset_tolerance : float > 0
The tolerance for an estimated note's onset deviating from the reference note's onset, in seconds. Default is 0.05 (50 ms).
pitch_tolerance : float > 0
The tolerance for an estimated note's pitch deviating from the reference note's pitch, in cents. Default is 50.0 (50 cents).
offset_ratio : float > 0 or None
The ratio of the reference note's duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration*0.2, or 0.05 (50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.
offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
velocity_tolerance : float > 0
Estimated notes are considered correct if, after rescaling and normalization to [0, 1], they are within velocity_tolerance of a matched reference note.
Returns:
matching : list of tuples
A list of matched reference and estimated notes.
matching[i]==(i,j) where reference note i matches estimated
note j.
Compute the Precision, Recall and F-measure of correctly vs. incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see mir_eval.transcription.average_overlap_ratio()). "Correctness" is determined based on note onset, velocity, pitch and (optionally) offset. An estimated note is considered correct if:
1. Its onset is within onset_tolerance (default ±50 ms) of a reference note's onset.
2. Its pitch (F0) is within pitch_tolerance (default one quarter tone, 50 cents) of the corresponding reference note's pitch.
3. Its velocity, after normalizing reference velocities to the range [0, 1] and globally rescaling estimated velocities to minimize the L2 distance to matched reference velocities, is within velocity_tolerance (default 0.1) of the corresponding reference note's velocity.
If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correct returned note is required to have an offset value within offset_ratio (default 20%) of the reference note's duration around the reference note's offset, or within offset_min_tolerance (default 50 ms), whichever is larger.
Parameters:
ref_intervals : np.ndarray, shape=(n, 2)
Array of reference note time intervals (onset and offset times)
ref_pitches : np.ndarray, shape=(n,)
Array of reference pitch values in Hertz
ref_velocities : np.ndarray, shape=(n,)
Array of MIDI velocities (i.e. between 0 and 127) of reference notes
est_intervals : np.ndarray, shape=(m, 2)
Array of estimated note time intervals (onset and offset times)
est_pitches : np.ndarray, shape=(m,)
Array of estimated pitch values in Hertz
est_velocities : np.ndarray, shape=(m,)
Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
onset_tolerance : float > 0
The tolerance for an estimated note's onset deviating from the reference note's onset, in seconds. Default is 0.05 (50 ms).
pitch_tolerance : float > 0
The tolerance for an estimated note's pitch deviating from the reference note's pitch, in cents. Default is 50.0 (50 cents).
offset_ratio : float > 0 or None
The ratio of the reference note's duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal ref_duration*0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.
offset_min_tolerance : float > 0
The minimum tolerance for offset matching. See the offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.
strict : bool
If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).
velocity_tolerance : float > 0
Estimated notes are considered correct if, after rescaling and normalization to [0, 1], they are within velocity_tolerance of a matched reference note.
beta : float > 0
Weighting factor for f-measure (default value = 1.0).
Key Detection involves determining the underlying key (distribution of notes
and note transitions) in a piece of music. Key detection algorithms are
evaluated by comparing their estimated key to a ground-truth reference key and
reporting a score according to the relationship of the keys.
Keys are represented as strings of the form '(key) (mode)', e.g. 'C# major' or 'Fb minor'. The case of the key is ignored. Note that certain key strings are equivalent, e.g. 'C# major' and 'Db major'. The mode may only be specified as either 'major' or 'minor'; no other mode strings will be accepted.
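For example, assuming the standard MIREX key-relationship weighting, mir_eval.key.weighted_score() gives full credit for an exact match and partial credit for closely related keys:
>>> mir_eval.key.weighted_score('C major', 'C major')
1.0
>>> # Relative minor of the reference key
>>> mir_eval.key.weighted_score('C major', 'A minor')
0.3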
Checks that a key is well-formatted, e.g. in the form 'C# major'. The key can be 'X' if it is not possible to categorize the key, and the mode can be 'other' if it cannot be categorized as major or minor.
Adjust a list of time intervals to span the range [t_min,t_max].
Any intervals lying completely outside the specified range will be removed.
Any intervals lying partially outside the specified range will be cropped.
If the specified range exceeds the span of the provided data in either
direction, additional intervals will be appended. If an interval is
appended at the beginning, it will be given the label start_label; if
an interval is appended at the end, it will be given the label
end_label.
Parameters:
intervals : np.ndarray, shape=(n_events, 2)
Array of interval start and end times
labels : list, len=n_events, or None
List of labels (Default value = None)
t_min : float or None
Minimum interval start time. (Default value = 0.0)
t_max : float or None
Maximum interval end time. (Default value = None)
start_label : str, float, or int
Label to give any intervals appended at the beginning (Default value = '__T_MIN')
end_label : str, float, or int
Label to give any intervals appended at the end (Default value = '__T_MAX')
mir_eval.util.match_events(ref, est, window, distance=None)
Compute a maximum matching between reference and estimated event times,
subject to a window constraint.
Given two lists of event times ref and est, we seek the largest set
of correspondences (ref[i],est[j]) such that
distance(ref[i],est[j])<=window, and each
ref[i] and est[j] is matched at most once.
This is useful for computing precision/recall metrics in beat tracking,
onset detection, and segmentation.
Parameters:
ref : np.ndarray, shape=(n,)
Array of reference values
est : np.ndarray, shape=(m,)
Array of estimated values
window : float > 0
Size of the window.
distance : function
Function that computes the outer distance of ref and est. By default uses |ref[i] - est[j]|.
Returns:
matching : list of tuples
A list of matched reference and estimated event numbers.
matching[i]==(i,j) where ref[i] matches est[j].
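For example:
>>> import numpy as np
>>> ref = np.array([0.5, 1.0, 1.5])
>>> est = np.array([0.52, 1.62])
>>> # Only pairs within 0.1 s of each other can be matched here
>>> mir_eval.util.match_events(ref, est, window=0.1)
[(0, 0)]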
Given a function and args and keyword args to pass to it, call the function
but using only the keyword arguments which it accepts. This is equivalent
to redefining the function with an additional **kwargs to accept slop
keyword args.
If the target function already accepts **kwargs parameters, no filtering
is performed.
Parameters:
_function : callable
Function to call. Can take in any number of args or kwargs
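For example, with a hypothetical function add():
>>> def add(x, y=1):
...     return x + y
>>> # 'z' is not a parameter of add, so filter_kwargs drops it
>>> mir_eval.util.filter_kwargs(add, 2, y=3, z='ignored')
5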
Utility function for loading in data from an annotation file where columns
are delimited. The number of columns is inferred from the length of
the provided converters list.
Parameters:
filename : str
Path to the annotation file
converters : list of functions
Each entry in column n of the file will be cast by the function converters[n].
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored. Setting to None disables comments.
Returns:
columns : tuple of lists
Each list in this tuple corresponds to values in one of the columns in the file.
Examples
>>> # Load in a one-column list of event times (floats)
>>> load_delimited('events.txt', [float])
>>> # Load in a list of labeled events, separated by commas
>>> load_delimited('labeled_events.csv', [float, str], ',')
Import time-stamp events from an annotation file. The file should
consist of a single column of numeric values corresponding to the event
times. This is primarily useful for processing events which lack duration,
such as beats or onsets.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
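A minimal usage sketch (file name hypothetical):
>>> beat_times = mir_eval.io.load_events('beats.txt')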
Import labeled time-stamp events from an annotation file. The file should
consist of two columns; the first having numeric values corresponding to
the event times and the second having string labels for each event. This
is primarily useful for processing labeled events which lack duration, such
as beats with metric beat number or onsets with an instrument label.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Import intervals from an annotation file. The file should consist of two
columns of numeric values corresponding to start and end time of each
interval. This is primarily useful for processing events which span a
duration, such as segmentation, chords, or instrument activation.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Import labeled intervals from an annotation file. The file should consist
of three columns: Two consisting of numeric values corresponding to start
and end time of each interval and a third corresponding to the label of
each interval. This is primarily useful for processing events which span a
duration, such as segmentation, chords, or instrument activation.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Import a time series from an annotation file. The file should consist of
two columns of numeric values corresponding to the time and value of each
sample of the time series.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Loads the patterns contained in the file filename and puts them into a list of patterns, each pattern being a list of occurrences, and each occurrence being a list of (onset, midi) pairs. The resulting structure is nested as follows:
onset_midi = (onset_time, midi_number)
occurrence = [onset_midi_1, ..., onset_midi_O]
pattern = [occurrence_1, ..., occurrence_M]
pattern_list = [pattern_1, ..., pattern_N]
where N is the number of patterns, M[i] is the number of occurrences of the i-th pattern, and O[j] is the number of onsets in the j-th occurrence.
Import valued intervals from an annotation file. The file should
consist of three columns: Two consisting of numeric values corresponding to
start and end time of each interval and a third, also of numeric values,
corresponding to the value of each interval. This is primarily useful for
processing events which span a duration and have a numeric value, such as
piano-roll notes which have an onset, offset, and a pitch value.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Load key labels from an annotation file. The file should
consist of two string columns: One denoting the key scale degree
(semitone), and the other denoting the mode (major or minor). The file
should contain only one row.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored.
Load tempo estimates from an annotation file in MIREX format.
The file should consist of three numeric columns: the first two
correspond to tempo estimates (in beats-per-minute), and the third
denotes the relative confidence of the first value compared to the
second (in the range [0, 1]). The file should contain only one row.
Parameters:
filename : str
Path to the annotation file
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored. Setting to None disables comments.
Returns:
tempi : np.ndarray, non-negative
The two tempo estimates
weight : float [0, 1]
The relative importance of tempi[0] compared to tempi[1]
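Typical usage (file name hypothetical):
>>> tempi, weight = mir_eval.io.load_tempo('tempo.txt')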
Utility function for loading in data from a delimited time series
annotation file with a variable number of columns.
Assumes that column 0 contains time stamps and columns 1 through n contain
values. n may be variable from time stamp to time stamp.
Parameters:
filename : str
Path to the annotation file
dtype : function
Data type to apply to values columns.
delimiter : str
Separator regular expression. By default, lines will be split by any amount of whitespace.
header : bool
Indicates whether a header row is present or not. By default, assumes no header is present.
comment : str or None
Comment regular expression. Any lines beginning with this string or pattern will be ignored. Setting to None disables comments.
Returns:
times : np.ndarray
Array of timestamps (float)
values : list of np.ndarray
List of arrays of corresponding values
Examples
>>> # Load a ragged list of tab-delimited multi-f0 midi notes
>>> times, vals = load_ragged_time_series('multif0.txt', dtype=int, delimiter='\t')
>>> # Load a ragged list of space-delimited multi-f0 values with a header
>>> times, vals = load_ragged_time_series('labeled_events.csv', header=True)
label_set : list
An (ordered) list of labels to determine the plotting order. If not provided, the labels will be inferred from ax.get_yticklabels(). If no yticklabels exist, then the sorted set of unique values in labels is taken as the label set.
base : np.ndarray, shape=(n,), optional
Vertical positions of each label. By default, labels are positioned at integers np.arange(len(labels)).
height : scalar or np.ndarray, shape=(n,), optional
Height for each label. If scalar, the same value is applied to all labels. By default, each label has height=1.
extend_labels : bool
If False, only values of labels that also exist in label_set will be shown. If True, all labels are shown, with those in labels but not in label_set appended to the top of the plot. A horizontal line is drawn to indicate the separation between values in or out of label_set.
ax : matplotlib.pyplot.axes
An axis handle on which to draw the intervals. If none is provided, a new set of axes is created.
tick : bool
If True, sets tick positions and labels on the y-axis.
kwargs
Additional keyword arguments to pass to matplotlib.collections.BrokenBarHCollection.
__call__(x, pos=None)
Return the format for tick value x at position pos.
fix_minus(s)
Replace a hyphen-minus with the proper Unicode minus sign (U+2212) for typographical correctness, when enabled via the axes.unicode_minus rcParam.
format_data(value)
Return the full string representation of the value with the position unspecified.