Performance of joint ASV-PAD systems
Published: 2 years, 8 months ago
The goal of the joint ASV-PAD system is to separate genuine data (genuine users trying to be verified by the system) from zero-effort impostors (real data from the wrong users) and from presentation attacks. The system should then operate correctly in two scenarios: a typical verification scenario, called licit, in which there are no attacks (ideally, performance should match that of the original ASV system), and a spoof scenario, in which attacks are present (ideally, verification performance should not degrade).
During training, the system models each of these classes. It is then evaluated on the development set, for which the class of each audio sample is known. For each scenario, the resulting scores are split into two sets (accepted and rejected) by a threshold chosen so that the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) are equal. This common rate is usually called the Equal Error Rate (ERR_licit and ERR_spoof in the table below). The score value at which the split is made is the EER threshold, since it is the specific operating point of the system that yields the EER.
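The threshold selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used to produce the table: the function names `far_frr` and `eer_threshold` and the score values are hypothetical, higher scores are assumed to mean "more likely genuine", and a sample is assumed to be accepted when its score is at or above the threshold. With a finite set of scores, FAR and FRR may never be exactly equal, so the sketch picks the candidate threshold where they are closest.

```python
def far_frr(negatives, positives, threshold):
    """Return (FAR, FRR), assuming scores >= threshold are accepted."""
    # FAR: fraction of negatives (impostors or attacks) wrongly accepted.
    far = sum(s >= threshold for s in negatives) / len(negatives)
    # FRR: fraction of positives (genuine users) wrongly rejected.
    frr = sum(s < threshold for s in positives) / len(positives)
    return far, frr


def eer_threshold(negatives, positives):
    """Return the observed score at which |FAR - FRR| is smallest."""
    def gap(threshold):
        far, frr = far_frr(negatives, positives, threshold)
        return abs(far - frr)

    # Every observed score is a candidate threshold.
    candidates = sorted(set(negatives) | set(positives))
    return min(candidates, key=gap)


# Hypothetical development-set scores for one scenario.
dev_genuine = [0.9, 0.8, 0.7, 0.4]
dev_negatives = [0.1, 0.2, 0.3, 0.6]

threshold = eer_threshold(dev_negatives, dev_genuine)
far, frr = far_frr(dev_negatives, dev_genuine, threshold)
print(threshold, far, frr)  # 0.6 0.25 0.25
```

The same function would be run once per scenario. In the licit scenario the negatives would be the zero-effort impostor scores; for the spoof scenario this sketch assumes the negatives are the presentation-attack scores.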
Applying the EER thresholds obtained from the development set to the scores of the test set, in both the licit and spoof scenarios, yields another set of FAR values (FAR_test_licit and FAR_test_spoof in the table) and FRR values (FRR_test_licit and FRR_test_spoof in the table). These measure the system's performance in uncontrolled evaluation settings. In a perfectly consistent ASV-PAD system, the FAR and FRR values on the test set would be the same as those obtained on the development set, and therefore equal to each other. In practice they differ, so to summarize the performance of the system in one value, the Half Total Error Rate (HTER_licit and HTER_spoof in the table) is computed as the mean of the test-set FAR and FRR. These HTER values are then used as an overall measure of the joint ASV-PAD system performance.
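As a rough illustration of this last step, the sketch below applies a threshold fixed on the development set to test-set scores and averages the resulting FAR and FRR into an HTER. The function names, the threshold value, and all scores are hypothetical, and the sketch assumes that a sample is accepted when its score is at or above the threshold.

```python
def far_frr(negatives, positives, threshold):
    """Return (FAR, FRR), assuming scores >= threshold are accepted."""
    far = sum(s >= threshold for s in negatives) / len(negatives)
    frr = sum(s < threshold for s in positives) / len(positives)
    return far, frr


def hter(negatives, positives, threshold):
    """Half Total Error Rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(negatives, positives, threshold)
    return (far + frr) / 2


# Hypothetical threshold, chosen beforehand on the development set.
dev_threshold = 0.6

# Hypothetical test-set scores for one scenario.
test_genuine = [0.9, 0.7, 0.5, 0.8]
test_negatives = [0.2, 0.65, 0.7, 0.1]

far_test, frr_test = far_frr(test_negatives, test_genuine, dev_threshold)
hter_test = hter(test_negatives, test_genuine, dev_threshold)
print(far_test, frr_test, hter_test)  # 0.5 0.25 0.375
```

In this made-up example the test-set FAR and FRR are no longer equal, because the threshold was tuned on different data. That gap is what the HTER folds into a single number.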