Introduction
The Xdroid Speech To Text API enables digital transformation in contact centers through voice and text solutions based on artificial intelligence and machine learning.
API specification
Base URL
https://api-kpn.voiceanalytics.io/
Conceptual model
Definitions
Codec
A codec is a device or computer program which encodes or decodes a digital data stream or signal. Codec is short for coder-decoder.
G.711
G.711 is a narrowband audio codec that defines two main companding (compressing/expanding) algorithms: the μ-law algorithm and the A-law algorithm.
ISO
International Organization for Standardization (ISO) is an international standard-setting body.
MP3
MP3 is a coding format for digital audio.
MP4
MPEG-4 Part 14 or MP4 is a digital multimedia container format.
OGG
Ogg is a free, open multimedia container format.
Opus
Opus is an audio coding format using lossy compression.
PCI
Payment Card Industry (PCI) compliance is mandated by credit card companies to help ensure the security of credit card transactions in the payments industry.
PCM
Pulse Code Modulation (PCM) is an audio format. PCM is both uncompressed and lossless.
WAV
Waveform Audio File Format is an audio file format standard, developed by IBM and Microsoft.
API workflow
Requirements
- Accepted container formats:
  - .wav
  - .mp3 / .mp4
  - .opus / .ogg
- Preferred audio settings for the best quality:
  - Bitrate: 64 kbit/s per channel (stereo recording is supported).
  - Sample rate: 8 kHz / 16 kHz.
  - Uncompressed / lossless telephony codecs (PCM Linear, G.711 A-law/μ-law).
Features and constraints
Features
- The API provides speech-to-text transcriptions based on search word volume.
- The voice analytics system provides additional emotion analysis, keyword detection, and semantic capabilities, along with full quality evaluation for call centers.
Constraints
- Audio files must not be larger than 150 MB.
Getting started
Make sure you've read Getting Started for more info on how to register your application and start trying out our APIs.
Authentication
The API follows the KPN Store API authentication standard. It uses an OAuth 2.0 client_id and client_secret to obtain an access token.
Go to the Authentication tab on top of this page to find out how to:
- Authenticate to an API using cURL.
- Authenticate to an API on SwaggerHub.
- Import Open API Specifications (OAS), also called Swagger files into Postman.
How to...
Submit audio files for analysis
This endpoint lets you submit audio files to start a new voice analytics job.
Supported formats:
- WAV container (PCM Linear 16 bit, G.711 μ-law/A-law) is recommended.
- MP3/Opus recordings are also supported, but depending on the compression level they may reduce transcription accuracy.
The supported content type is multipart/form-data.
Request
POST /job
Upload the audio file and send the language config in the body.
^^cURL request example^^
curl -X POST "https://api-prd.kpn.com/data/kpn/voiceanalytics/job" -H "accept: application/json" -H "Authorization: Bearer *****************" -H "Content-Type: multipart/form-data" -F 'config={"language":"en","recording_start":""}' -F "audio_file=@speech_orig.wav;type=audio/wav"
Body parameter | Type | Description |
---|---|---|
audio_file=@ | multipart/form-data | Audio file. File size limit: 100 Mbyte for each file. Example: /Audios/0036550e-720f-1239-0b99-eecf4973.wav |
config= | object | JSON object containing the language and recording_start parameters. Example: {"language":"en","recording_start":""} |
language | string | Required. ISO language code. Supported language codes: Global English: en, Global Spanish: es, Dutch: nl, French: fr. Example: "language":"en" |
recording_start | integer | Optional. Day and time when the recording starts. Format: YmdHis. Example: "recording_start":"20201216081228" |
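As an illustration of the config parameter described above, the following Python sketch builds the JSON string for the config form field. The helper name build_config is hypothetical and not part of the API; the sketch assumes that the YmdHis format corresponds to Python's %Y%m%d%H%M%S and that an empty string stands for "no start time", as in the cURL example.

```python
import json
from datetime import datetime
from typing import Optional

# Language codes listed in the body parameter table.
SUPPORTED_LANGUAGES = {"en", "es", "nl", "fr"}


def build_config(language: str, recording_start: Optional[datetime] = None) -> str:
    """Return the JSON string for the `config` form field (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    # Assumption: YmdHis maps to %Y%m%d%H%M%S. An empty string mirrors the
    # cURL example, where recording_start is sent as "".
    start = recording_start.strftime("%Y%m%d%H%M%S") if recording_start else ""
    return json.dumps({"language": language, "recording_start": start})


if __name__ == "__main__":
    print(build_config("en", datetime(2020, 12, 16, 8, 12, 28)))
```

The returned string can be passed as the value of the config form field in any HTTP client that supports multipart/form-data.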
Response
The response returns the unique job_id. Save it to retrieve the transcription in later requests.
^^Response example^^
{
"job_id": 12
}
Retrieve transcription
This endpoint retrieves the JSON transcript of a finished transcription job. Send the unique job_id as a path parameter.
To avoid a throttle penalty, wait at least 10 seconds between status checks.
Request
GET /job/transcript/$job_id
^^cURL request^^
curl -X GET "https://api-prd.kpn.com/data/kpn/voiceanalytics/job/13" -H "accept: application/json" -H "Authorization: Bearer *****************"
Response
The expected flow of statuses is queued > processing > analyzed.
Plan for a real-time equivalent (RTE) of 1: processing takes approximately as long as the recording itself, so a one-minute conversation takes about one minute to process.
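The status flow and the 10-second minimum interval can be combined into a small polling loop. The sketch below is one possible approach, not an official client: fetch_job is a hypothetical function supplied by the caller that performs the authenticated GET request and returns the decoded JSON body, and treating any status other than queued, processing, or analyzed as an error is an assumption.

```python
import time
from typing import Any, Callable, Dict

MIN_POLL_INTERVAL_S = 10  # shortest interval between status checks


def wait_for_results(
    fetch_job: Callable[[int], Dict[str, Any]],
    job_id: int,
    interval_s: float = 15.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job status is "analyzed", then return the full response."""
    if interval_s < MIN_POLL_INTERVAL_S:
        raise ValueError("poll interval must be at least 10 seconds")
    waited = 0.0
    while True:
        response = fetch_job(job_id)
        status = response["job"]["status"]
        if status == "analyzed":
            return response
        # Assumption: statuses outside the documented flow indicate a problem.
        if status not in ("queued", "processing"):
            raise RuntimeError(f"unexpected job status: {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not analyzed after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s
```

Because processing runs at roughly real time, a sensible timeout_s is somewhat longer than the recording itself.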
^^Response - Processing status^^
{
"job": {
"job_id": 12,
"created_at": "2020-12-16 15:26:39",
"audio_file": "xdroid-voiceanalytics-sample_20201216152639.wav",
"status": "processing"
},
"results": []
}
Once the job status is analyzed, the request returns the analytics results in the results block. The results are in JSON array format.
^^Response - Analyzed status with results^^
{
"job": {
"job_id": 12,
"created_at": "2020-12-16 15:26:39",
"audio_file": "xdroid-voiceanalytics-sample_20201216152639.wav",
"status": "analyzed"
},
"results": [
{
"data_type": "TRANSCRIPT", // Type of data, see table below
"data_channel": 1, // Detected channel in stereo, where 1 = first, 2 = second channel
"data_value": "welcome", // A transcribed word
"data_detect_start": 570, // Start time in milliseconds
"data_detect_end": 1020, // End time in milliseconds
"data_length": 450, // Length of block in milliseconds
"data_probability": 1 // Probability of result
},
{
"data_type": "TRANSCRIPT",
"data_channel": 1,
"data_value": "to",
"data_detect_start": 1020,
"data_detect_end": 1140,
"data_length": 120,
"data_probability": 1
},
{
"data_type": "TRANSCRIPT",
"data_channel": 1,
"data_value": "voice",
"data_detect_start": 1140,
"data_detect_end": 1440,
"data_length": 300,
"data_probability": 1
},
{
"data_type": "TRANSCRIPT",
"data_channel": 1,
"data_value": "analytics",
"data_detect_start": 1440,
"data_detect_end": 2050,
"data_length": 610,
"data_probability": 1
}
]
}
Parameter | Description |
---|---|
data_type | TRANSCRIPT. Word-level transcription. data_value contains the word, data_probability is the internal confidence level. |
data_channel | Detected channel in stereo, where 1 = first, 2 = second channel. Example: 1 |
data_value | Transcribed word. Example: welcome |
data_detect_start | Start time in milliseconds. Example: 8820 |
data_detect_end | End time in milliseconds. Example: 9070 |
data_length | Length of block in milliseconds. Example: 250 |
data_probability | Probability of result. Example: 0.83 |
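Since the results array is word-level, most consumers will want to join the words back into readable text. The following Python sketch shows one way to do that per channel; the function name transcript_by_channel and the optional min_probability filter are illustrative assumptions and not part of the API.

```python
from collections import defaultdict
from typing import Any, Dict, List


def transcript_by_channel(
    response: Dict[str, Any], min_probability: float = 0.0
) -> Dict[int, str]:
    """Join TRANSCRIPT words into one string per channel, ordered by start time."""
    words: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for item in response.get("results", []):
        if item.get("data_type") != "TRANSCRIPT":
            continue  # skip any other data types
        if item["data_probability"] < min_probability:
            continue  # optional confidence filter (assumption, off by default)
        words[item["data_channel"]].append(item)
    return {
        channel: " ".join(
            w["data_value"]
            for w in sorted(items, key=lambda w: w["data_detect_start"])
        )
        for channel, items in words.items()
    }
```

Applied to the analyzed response example, this yields a single entry for channel 1 containing the four transcribed words in order.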
Return codes
Code | Description |
---|---|
200 | Success. |
201 | Created. |
202 | Accepted. |
302 | Found. Link in location header. |
400 | Bad request. |
401 | Unauthorized. |
403 | Forbidden. |
404 | Not found. |
405 | Method not allowed. |
412 | Precondition failed. |
429 | Too many requests. |
500 | Internal server error. |
502 | Bad gateway. |
503 | Service unavailable. |
HTTP response headers
The following tables display the standard response headers that are returned with each API response:
Standard response field name | Description |
---|---|
sunset | This field will be populated with the deprecation details. By default the value is n/a. |
api-version | Indicates the API version you have used. |
quota-interval | Used to specify an integer (for example, 1, 2, 5, 60, and so on) that will be paired with the quota-time-unit you specify (minute, hour, day, week, or month) to determine a time period during which the quota use is calculated. For example, an interval of 24 with a quota-time-unit of hour means that the quota will be calculated over the course of 24 hours. |
quota-limit | Number of API calls a user can make within a given time period. If this limit is exceeded, the user will be throttled and API requests will fail. |
quota-reset-UTC | All quota times are set to the Coordinated Universal Time (UTC) time zone. |
quota-time-unit | Used to specify the unit of time applicable to the quota. For example, an interval of 24 with a quota-time-unit of hour means that the quota will be calculated over the course of 24 hours. |
quota-used | Number of API calls made within the quota. |
strict-transport-security | The HTTP Strict-Transport-Security (HSTS) response header lets a website tell browsers that it should only be accessed using HTTPS, instead of using HTTP. All present and future subdomains will be HTTPS for a maximum of 1 year and access is blocked to pages or sub domains that can only be served over HTTP including HSTS preload lists of web browsers. Strict-Transport-Security: max-age=31536000; includeSubDomains; preload. |
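A client can use the quota headers to slow down before it gets throttled. The sketch below assumes that quota-limit and quota-used carry plain integer values, which the table above implies but does not state explicitly; the helper name remaining_quota is hypothetical.

```python
from typing import Mapping, Optional


def remaining_quota(headers: Mapping[str, str]) -> Optional[int]:
    """Return the calls left in the current quota window, or None if unknown."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["quota-limit"])
        used = int(lowered["quota-used"])
    except (KeyError, ValueError):
        return None  # headers missing or not integers (assumption violated)
    return max(limit - used, 0)
```

A caller might, for example, pause polling when the returned value reaches zero.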
Access control field name | Description |
---|---|
access-control-allow-credentials | Tells browsers whether to expose the response to frontend JavaScript when the request's credentials mode (Request.credentials) is include. When a request's credentials mode (Request.credentials) is include, browsers will only expose the response to frontend JavaScript if the Access-Control-Allow-Credentials value is true. Boolean. |
access-control-allow-origin | Indicates whether the response can be shared with requesting code from the given origin. |
access-control-allow-headers | Used in response to a pre-flight request which includes the Access-Control-Request-Headers to indicate which HTTP headers can be used during the actual request. |
access-control-max-age | Indicates how long the results of a pre-flight request (that is the information contained in the Access-Control-Allow-Methods and Access-Control-Allow-Headers headers) can be cached. |
access-control-allow-methods | Indicates which HTTP methods are allowed on a particular endpoint for cross-origin requests. For example: GET, PUT, POST, DELETE. |
content-length | The Content-Length entity header indicates the size of the entity-body, in bytes, sent to the recipient. |
content-type | The Content-Type entity header tells the client what the content type of the returned content actually is. |