Configuration parameters

These are the parameters currently recognized by soundswallower.Config and soundswallower.Decoder along with their default values. The configuration mechanism, along with these parameters, may change in a subsequent release of SoundSwallower.

Config(*args, **kwargs)

Create a SoundSwallower configuration. This constructor can be called with a list of arguments corresponding to a command line, in which case each parameter name should be prefixed with a ‘-’. Otherwise, pass the keyword arguments described below. For example, the following invocations are equivalent:

from soundswallower import Config

config = Config("-hmm", "path/to/things", "-dict", "my.dict")
config = Config(hmm="path/to/things", dict="my.dict")

The same keyword arguments can also be passed directly to the constructor for soundswallower.Decoder.
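To make the correspondence between the two calling conventions concrete, here is a minimal sketch that flattens keyword arguments into command-line style tokens. The helper name kwargs_to_argv is hypothetical and is not part of soundswallower; it only covers string and numeric values, since this section does not say how boolean parameters are spelled on a command line.

```python
def kwargs_to_argv(**kwargs):
    """Flatten keyword arguments into command-line style tokens.

    Hypothetical helper for illustration only; not part of soundswallower.
    """
    argv = []
    for name, value in kwargs.items():
        if isinstance(value, bool):
            # Boolean spelling on the command line is not covered by this sketch.
            raise TypeError(f"boolean parameter {name!r} is not handled here")
        # Each parameter name gets a '-' prefix and is followed by its value.
        argv.extend([f"-{name}", str(value)])
    return argv


argv = kwargs_to_argv(hmm="path/to/things", dict="my.dict")
print(argv)
```

Under these assumptions, the resulting list has the same shape as the positional form shown above and could be unpacked as Config(*argv).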

Keyword Arguments
  • dict (str) – Main pronunciation dictionary (lexicon) input file

  • hmm (str) – Directory containing acoustic model files.

  • logfn (str) – File to write log messages to

  • fsg (str) – Sphinx format finite state grammar file

  • jsgf (str) – JSGF grammar file

  • toprule (str) – Start rule for JSGF (first public rule is default)

  • fdict (str) – Noise word pronunciation dictionary input file

  • dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to False

  • beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to 1e-48

  • wbeam (float) – Beam width applied to word exits, defaults to 7e-29

  • pbeam (float) – Beam width applied to phone transitions, defaults to 1e-48

  • samprate (float) – Sampling rate, defaults to 16000.0 in C and Python and 44100.0 in JavaScript

  • nfft (int) – Size of FFT, defaults to 512 in C and Python and 2048 in JavaScript

  • featparams (str) – File containing feature extraction parameters.

  • mdef (str) – Model definition input file

  • senmgau (str) – Senone to codebook mapping input file (usually not needed)

  • tmat (str) – HMM state transition matrix input file

  • tmatfloor (float) – HMM state transition probability floor (applied to -tmat file), defaults to 0.0001

  • mean (str) – Mixture gaussian means input file

  • var (str) – Mixture gaussian variances input file

  • varfloor (float) – Mixture gaussian variance floor (applied to data from -var file), defaults to 0.0001

  • mixw (str) – Senone mixture weights input file (uncompressed)

  • mixwfloor (float) – Senone mixture weights floor (applied to data from -mixw file), defaults to 1e-07

  • aw (int) – Inverse weight applied to acoustic scores, defaults to 1

  • sendump (str) – Senone dump (compressed mixture weights) input file

  • mllr (str) – MLLR transformation to apply to means and variances

  • mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to True

  • ds (int) – Frame GMM computation downsampling ratio, defaults to 1

  • topn (int) – Maximum number of top Gaussians to use in scoring, defaults to 4

  • topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to 0

  • logbase (float) – Base in which all log-likelihoods are calculated, defaults to 1.0001

  • compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to False

  • bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to True

  • backtrace (bool) – Print results and backtraces to log, defaults to False

  • maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to 30000

  • lw (float) – Language model probability weight, defaults to 6.5

  • ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to 20.0

  • wip (float) – Word insertion penalty, defaults to 0.65

  • pip (float) – Phone insertion penalty, defaults to 1.0

  • silprob (float) – Silence word transition probability, defaults to 0.005

  • fillprob (float) – Filler word transition probability, defaults to 1e-08

  • fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to True

  • fsgusefiller (bool) – Insert filler words at each state, defaults to True

  • mfclogdir (str) – Directory to log feature files to

  • rawlogdir (str) – Directory to log raw audio files to

  • senlogdir (str) – Directory to log senone score files to

  • logspec (bool) – Write out logspectral files instead of cepstra, defaults to False

  • smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to False

  • transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to legacy

  • alpha (float) – Preemphasis parameter, defaults to 0.97

  • frate (int) – Frame rate in frames per second, defaults to 100

  • wlen (float) – Hamming window length in seconds, defaults to 0.025625

  • nfilt (int) – Number of filter banks, defaults to 40

  • lowerf (float) – Lower edge of filters in Hz, defaults to 133.33334

  • upperf (float) – Upper edge of filters in Hz, defaults to 6855.4976

  • unit_area (bool) – Normalize mel filters to unit area, defaults to True

  • round_filters (bool) – Round mel filter frequencies to DFT points, defaults to True

  • ncep (int) – Number of cepstral coefficients, defaults to 13

  • doublebw (bool) – Use double bandwidth filters (same center freq), defaults to False

  • lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to 0

  • input_float32 (bool) – Input is 32-bit floating point in [-1.0, 1.0], defaults to False in C and Python and True in JavaScript

  • input_endian (str) – Endianness of input data (big or little); ignored for NIST or MS WAV input, defaults to little

  • warp_type (str) – Warping function type (or shape), defaults to inverse_linear

  • warp_params (str) – Parameters defining the warping function

  • dither (bool) – Add 1/2-bit noise, defaults to False

  • seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to -1

  • remove_dc (bool) – Remove DC offset from each frame, defaults to False

  • remove_noise (bool) – UNSUPPORTED option, do not use, defaults to False

  • remove_silence (bool) – UNSUPPORTED option, do not use, defaults to False

  • verbose (bool) – Show input filenames, defaults to False

  • feat (str) – Feature stream type, depends on the acoustic model, defaults to 1s_c_d_dd

  • ceplen (int) – Number of components in the input feature vector, defaults to 13

  • cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to live

  • cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to 40,3,-1

  • varnorm (bool) – Variance normalize each utterance (only if cmn is ‘batch’, formerly named ‘current’), defaults to False

  • lda (str) – File containing transformation matrix to be applied to features (single-stream features only)

  • ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to 0

  • svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
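
The samprate, frate, wlen and nfft parameters above are related by simple arithmetic. The following sketch works through it, assuming that the analysis window covers wlen seconds of audio, that successive frames start samprate / frate samples apart, and that the FFT must be at least as long as one window. The helper frame_geometry is hypothetical and is not a soundswallower function; the decoder's own rounding may differ.

```python
def frame_geometry(samprate, frate=100, wlen=0.025625):
    """Return (window_samples, shift_samples, min_fft_size).

    Illustrative arithmetic only; frame_geometry is a hypothetical helper,
    not a soundswallower function.
    """
    window = round(wlen * samprate)  # samples covered by one analysis window
    shift = round(samprate / frate)  # samples between successive frame starts
    # Smallest power of two that can hold a whole window.
    min_fft = 1 << (window - 1).bit_length()
    return window, shift, min_fft


for rate in (16000.0, 44100.0):
    window, shift, min_fft = frame_geometry(rate)
    print(f"samprate={rate}: window={window}, shift={shift}, min nfft={min_fft}")
```

With the default wlen and frate, a 16000.0 Hz input gives a 410-sample window and a 160-sample shift, while a 44100.0 Hz input gives a 1130-sample window and a 441-sample shift. The smallest powers of two that hold those windows are 512 and 2048, which agree with the nfft defaults listed for C/Python and JavaScript respectively.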