Configuration parameters
These are the parameters currently recognized by soundswallower.Config and soundswallower.Decoder, along with their default values. The configuration mechanism, along with these parameters, may change in a subsequent release of SoundSwallower.
- Config(*args, **kwargs)
Create a SoundSwallower configuration. This constructor can be called with a list of arguments corresponding to a command line, in which case each parameter name should be prefixed with a ‘-’. Otherwise, pass the keyword arguments described below. For example, the following invocations are equivalent:
config = Config("-hmm", "path/to/things", "-dict", "my.dict")
config = Config(hmm="path/to/things", dict="my.dict")
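The relationship between the two calling styles can be sketched in plain Python. The helper below, `args_to_kwargs`, is hypothetical and is not part of SoundSwallower; it only illustrates the naming rule described above (strip the leading ‘-’ from each parameter name), and it assumes every parameter is followed by an explicit value.

```python
def args_to_kwargs(*args):
    """Map command-line style arguments to keyword arguments.

    Hypothetical helper for illustration only; not a SoundSwallower API.
    Assumes the arguments alternate between '-name' and its value.
    """
    if len(args) % 2 != 0:
        raise ValueError("expected alternating '-name' and value arguments")
    kwargs = {}
    for name, value in zip(args[0::2], args[1::2]):
        if not name.startswith("-"):
            raise ValueError(f"parameter name {name!r} must start with '-'")
        # Drop the leading '-' to obtain the keyword form of the name.
        kwargs[name[1:]] = value
    return kwargs


cli_style = ("-hmm", "path/to/things", "-dict", "my.dict")
keyword_style = {"hmm": "path/to/things", "dict": "my.dict"}
assert args_to_kwargs(*cli_style) == keyword_style
```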
The same keyword arguments can also be passed directly to the constructor for soundswallower.Decoder.
- Keyword Arguments
dict (str) – Main pronunciation dictionary (lexicon) input file
hmm (str) – Directory containing acoustic model files.
logfn (str) – File to write log messages in
fsg (str) – Sphinx format finite state grammar file
jsgf (str) – JSGF grammar file
toprule (str) – Start rule for JSGF (first public rule is default)
fdict (str) – Noise word pronunciation dictionary input file
dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to
False
beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to
1e-48
wbeam (float) – Beam width applied to word exits, defaults to
7e-29
pbeam (float) – Beam width applied to phone transitions, defaults to
1e-48
samprate (float) – Sampling rate, defaults to 16000.0 in C and Python and 44100.0 in JavaScript
nfft (int) – Size of FFT, defaults to 512 in C and Python and 2048 in JavaScript
featparams (str) – File containing feature extraction parameters.
mdef (str) – Model definition input file
senmgau (str) – Senone to codebook mapping input file (usually not needed)
tmat (str) – HMM state transition matrix input file
tmatfloor (float) – HMM state transition probability floor (applied to -tmat file), defaults to
0.0001
mean (str) – Mixture gaussian means input file
var (str) – Mixture gaussian variances input file
varfloor (float) – Mixture gaussian variance floor (applied to data from -var file), defaults to
0.0001
mixw (str) – Senone mixture weights input file (uncompressed)
mixwfloor (float) – Senone mixture weights floor (applied to data from -mixw file), defaults to
1e-07
aw (int) – Inverse weight applied to acoustic scores, defaults to
1
sendump (str) – Senone dump (compressed mixture weights) input file
mllr (str) – MLLR transformation to apply to means and variances
mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to
True
ds (int) – Frame GMM computation downsampling ratio, defaults to
1
topn (int) – Maximum number of top Gaussians to use in scoring, defaults to
4
topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to
0
logbase (float) – Base in which all log-likelihoods are calculated, defaults to
1.0001
compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to
False
bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to
True
backtrace (bool) – Print results and backtraces to log, defaults to
False
maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to
30000
lw (float) – Language model probability weight, defaults to
6.5
ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to
20.0
wip (float) – Word insertion penalty, defaults to
0.65
pip (float) – Phone insertion penalty, defaults to
1.0
silprob (float) – Silence word transition probability, defaults to
0.005
fillprob (float) – Filler word transition probability, defaults to
1e-08
fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to
True
fsgusefiller (bool) – Insert filler words at each state, defaults to
True
mfclogdir (str) – Directory to log feature files to
rawlogdir (str) – Directory to log raw audio files to
senlogdir (str) – Directory to log senone score files to
logspec (bool) – Write out logspectral files instead of cepstra, defaults to
False
smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to
False
transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to
legacy
alpha (float) – Preemphasis parameter, defaults to
0.97
frate (int) – Frame rate, defaults to
100
wlen (float) – Hamming window length, defaults to
0.025625
nfilt (int) – Number of filter banks, defaults to
40
lowerf (float) – Lower edge of filters, defaults to
133.33334
upperf (float) – Upper edge of filters, defaults to
6855.4976
unit_area (bool) – Normalize mel filters to unit area, defaults to
True
round_filters (bool) – Round mel filter frequencies to DFT points, defaults to
True
ncep (int) – Number of cep coefficients, defaults to
13
doublebw (bool) – Use double bandwidth filters (same center freq), defaults to
False
lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to
0
input_float32 (bool) – Input is 32-bit floating point in [-1.0, 1.0], defaults to False in C and Python, True in JavaScript.
input_endian (str) – Endianness of input data, big or little, ignored if NIST or MS Wav, defaults to
little
warp_type (str) – Warping function type (or shape), defaults to
inverse_linear
warp_params (str) – Parameters defining the warping function
dither (bool) – Add 1/2-bit noise, defaults to
False
seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to
-1
remove_dc (bool) – Remove DC offset from each frame, defaults to
False
remove_noise (bool) – UNSUPPORTED option, do not use, defaults to
False
remove_silence (bool) – UNSUPPORTED option, do not use, defaults to
False
verbose (bool) – Show input filenames, defaults to
False
feat (str) – Feature stream type, depends on the acoustic model, defaults to
1s_c_d_dd
ceplen (int) – Number of components in the input feature vector, defaults to
13
cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to
live
cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to
40,3,-1
varnorm (bool) – Variance normalize each utterance (only if CMN == current), defaults to
False
lda (str) – File containing transformation matrix to be applied to features (single-stream features only)
ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to
0
svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
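The samprate, frate, wlen and nfft parameters above are related by simple arithmetic, which may help explain why the defaults differ between the C/Python and JavaScript builds. The following is a minimal sketch, not SoundSwallower code: `frame_geometry` is a hypothetical helper, and it assumes the usual convention that the FFT size is a power of two at least as long as the analysis window, which this parameter list does not state explicitly.

```python
def frame_geometry(samprate, frate=100, wlen=0.025625):
    """Return (shift, window, min_fft) in samples.

    Hypothetical helper for illustration; not a SoundSwallower API.
    shift   -- samples between successive frames (samprate / frate)
    window  -- samples in one analysis window (wlen * samprate)
    min_fft -- smallest power of two >= window (assumed FFT constraint)
    """
    shift = int(samprate / frate + 0.5)
    window = int(wlen * samprate + 0.5)
    # Smallest power of two that is not less than the window length.
    min_fft = 1 << (window - 1).bit_length()
    return shift, window, min_fft


# Default sampling rate in C and Python
print(frame_geometry(16000.0))  # (160, 410, 512)
# Default sampling rate in JavaScript
print(frame_geometry(44100.0))  # (441, 1130, 2048)
```

Under these assumptions, the computed minimum FFT sizes coincide with the documented nfft defaults of 512 and 2048. If samprate or wlen is changed, nfft may need to be adjusted accordingly.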