Configuration parameters

These are the parameters currently recognized by soundswallower.Config and soundswallower.Decoder along with their default values. The configuration mechanism, along with these parameters, may change in a subsequent release of SoundSwallower.

Config(*args, **kwargs)

Create a SoundSwallower configuration. This constructor can be called with a list of arguments corresponding to a command line, in which case each parameter name should be prefixed with a ‘-’. Otherwise, pass the keyword arguments described below. For example, the following invocations are equivalent:

config = Config("-hmm", "path/to/things", "-dict", "my.dict")
config = Config(hmm="path/to/things", dict="my.dict")

The same keyword arguments can also be passed directly to the constructor for soundswallower.Decoder.
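
The correspondence between the two calling styles can be sketched with a small helper. This is only an illustration: kwargs_to_args is a hypothetical function, not part of SoundSwallower, and converting non-string values with str() is an assumption about what the argument form accepts.

```python
def kwargs_to_args(**kwargs):
    """Build the command-line style list that mirrors a set of keyword arguments.

    Hypothetical helper for illustration only; not part of SoundSwallower.
    """
    args = []
    for name, value in kwargs.items():
        # Each parameter name gets a '-' prefix and is followed by its value.
        args.append("-" + name)
        # Assumption: values are passed as their plain string form.
        args.append(str(value))
    return args


args = kwargs_to_args(hmm="path/to/things", dict="my.dict")
print(args)
# With SoundSwallower installed, Config(*args) would then be the
# argument-list form of Config(hmm="path/to/things", dict="my.dict").
```

Keyword arguments are usually the more convenient form in Python code. The argument-list form is mainly useful when the settings already exist as command-line style strings.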

Keyword Arguments

dict (str) – Main pronunciation dictionary (lexicon) input file
hmm (str) – Directory containing acoustic model files.
logfn (str) – File to write log messages in
fsg (str) – Sphinx format finite state grammar file
jsgf (str) – JSGF grammar file
toprule (str) – Start rule for JSGF (first public rule is default)
fdict (str) – Noise word pronunciation dictionary input file
dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to False
beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to 1e-48
wbeam (float) – Beam width applied to word exits, defaults to 7e-29
pbeam (float) – Beam width applied to phone transitions, defaults to 1e-48
samprate (float) – Sampling rate, defaults to 16000.0 in C and Python and 44100.0 in JavaScript
nfft (int) – Size of FFT, defaults to 512 in C and Python and 2048 in JavaScript
featparams (str) – File containing feature extraction parameters.
mdef (str) – Model definition input file
senmgau (str) – Senone to codebook mapping input file (usually not needed)
tmat (str) – HMM state transition matrix input file
tmatfloor (float) – HMM state transition probability floor (applied to -tmat file), defaults to 0.0001
mean (str) – Mixture gaussian means input file
var (str) – Mixture gaussian variances input file
varfloor (float) – Mixture gaussian variance floor (applied to data from -var file), defaults to 0.0001
mixw (str) – Senone mixture weights input file (uncompressed)
mixwfloor (float) – Senone mixture weights floor (applied to data from -mixw file), defaults to 1e-07
aw (int) – Inverse weight applied to acoustic scores, defaults to 1
sendump (str) – Senone dump (compressed mixture weights) input file
mllr (str) – MLLR transformation to apply to means and variances
mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to True
ds (int) – Frame GMM computation downsampling ratio, defaults to 1
topn (int) – Maximum number of top Gaussians to use in scoring, defaults to 4
topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to 0
logbase (float) – Base in which all log-likelihoods are calculated, defaults to 1.0001
compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to False
bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to True
backtrace (bool) – Print results and backtraces to log, defaults to False
maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to 30000
lw (float) – Language model probability weight, defaults to 6.5
ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to 20.0
wip (float) – Word insertion penalty, defaults to 0.65
pip (float) – Phone insertion penalty, defaults to 1.0
silprob (float) – Silence word transition probability, defaults to 0.005
fillprob (float) – Filler word transition probability, defaults to 1e-08
fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to True
fsgusefiller (bool) – Insert filler words at each state, defaults to True
mfclogdir (str) – Directory to log feature files to
rawlogdir (str) – Directory to log raw audio files to
senlogdir (str) – Directory to log senone score files to
logspec (bool) – Write out logspectral files instead of cepstra, defaults to False
smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to False
transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to legacy
alpha (float) – Preemphasis parameter, defaults to 0.97
frate (int) – Frame rate, defaults to 100
wlen (float) – Hamming window length, defaults to 0.025625
nfilt (int) – Number of filter banks, defaults to 40
lowerf (float) – Lower edge of filters, defaults to 133.33334
upperf (float) – Upper edge of filters, defaults to 6855.4976
unit_area (bool) – Normalize mel filters to unit area, defaults to True
round_filters (bool) – Round mel filter frequencies to DFT points, defaults to True
ncep (int) – Number of cep coefficients, defaults to 13
doublebw (bool) – Use double bandwidth filters (same center freq), defaults to False
lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to 0
input_float32 (bool) – Input is 32-bit floating point in [-1.0, 1.0], defaults to False in C and Python, True in JavaScript.
input_endian (str) – Endianness of input data, big or little, ignored if NIST or MS Wav, defaults to little
warp_type (str) – Warping function type (or shape), defaults to inverse_linear
warp_params (str) – Parameters defining the warping function
dither (bool) – Add 1/2-bit noise, defaults to False
seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to -1
remove_dc (bool) – Remove DC offset from each frame, defaults to False
remove_noise (bool) – UNSUPPORTED option, do not use, defaults to False
remove_silence (bool) – UNSUPPORTED option, do not use, defaults to False
verbose (bool) – Show input filenames, defaults to False
feat (str) – Feature stream type, depends on the acoustic model, defaults to 1s_c_d_dd
ceplen (int) – Number of components in the input feature vector, defaults to 13
cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to live
cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to 40,3,-1
varnorm (bool) – Variance normalize each utterance (only if cmn is ‘batch’), defaults to False
lda (str) – File containing transformation matrix to be applied to features (single-stream features only)
ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to 0
svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
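
The samprate, frate, wlen and nfft parameters are related: the analysis window, expressed in samples, has to fit inside the FFT. The following sketch works through that arithmetic for the defaults listed above. The function name frame_geometry is hypothetical, and rounding to the nearest sample is an assumption about how the front end converts seconds to samples.

```python
def frame_geometry(samprate=16000.0, frate=100, wlen=0.025625, nfft=512):
    """Derive per-frame sample counts from the front-end parameters.

    Hypothetical helper for illustration only; not part of SoundSwallower.
    """
    # Window length in samples (assumption: rounded to the nearest sample).
    window_samples = int(wlen * samprate + 0.5)
    # Number of samples between the starts of consecutive frames.
    shift_samples = int(samprate / frate + 0.5)
    return {
        "window_samples": window_samples,
        "shift_samples": shift_samples,
        # The window has to fit inside the FFT.
        "fits_fft": window_samples <= nfft,
    }


# Defaults for C and Python: 16 kHz input, 512-point FFT.
print(frame_geometry())
# Defaults for JavaScript: 44.1 kHz input, 2048-point FFT.
print(frame_geometry(samprate=44100.0, nfft=2048))
```

With the C and Python defaults this gives a 410-sample window and a 160-sample shift. With the JavaScript defaults it gives a 1130-sample window and a 441-sample shift, which a 512-point FFT could not hold. That is consistent with the larger nfft default there.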