Traceback (most recent call last):
  File "/Users/pan/voice2text/main.py", line 50, in <module>
    _, probs = model.detect_language(mel)
  File "/Users/pan/voice2text/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/pan/voice2text/.venv/lib/python3.10/site-packages/whisper/decoding.py", line 50, in detect_language
    mel = model.encoder(mel)
  File "/Users/pan/voice2text/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/pan/voice2text/.venv/lib/python3.10/site-packages/whisper/model.py", line 166, in forward
    assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"
AssertionError: incorrect audio shape
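This error means the mel spectrogram passed to the model has the wrong shape. The assertion in whisper/model.py compares the encoder input, after its convolution layers, against the model's positional embedding, which is sized for exactly one 30-second window of audio (3000 spectrogram frames). If mel was computed from audio that is shorter or longer than 30 seconds, the frame count does not match and the assertion fails. The usual fix is to pad or trim the waveform to 30 seconds with whisper.pad_or_trim before calling whisper.log_mel_spectrogram, and only then pass the result to model.detect_language.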
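
A minimal sketch of that preprocessing order, assuming the openai-whisper package and a 16 kHz waveform. The helper fit_to_window is hypothetical and only illustrates with NumPy what whisper.pad_or_trim does to a waveform; the function detect_language_of_file, its path argument, and the "base" model name are likewise placeholders, not taken from the original main.py.

```python
import numpy as np

# Constants matching Whisper's 30-second input window at 16 kHz.
SAMPLE_RATE = 16000
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480000 samples per window
HOP_LENGTH = 160                          # samples between spectrogram frames
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3000 frames expected by the encoder


def fit_to_window(audio, length=N_SAMPLES):
    """Hypothetical helper: zero-pad or truncate the last axis to `length`.

    Illustrates the effect of whisper.pad_or_trim on a NumPy waveform.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.shape[-1] > length:
        return audio[..., :length]
    pad = length - audio.shape[-1]
    # Pad only the last (time) axis, leaving any leading axes untouched.
    return np.pad(audio, [(0, 0)] * (audio.ndim - 1) + [(0, pad)])


def detect_language_of_file(path, model_name="base"):
    """Sketch of the call order; requires openai-whisper and ffmpeg."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    audio = whisper.load_audio(path)       # 16 kHz mono float32 waveform
    audio = whisper.pad_or_trim(audio)     # exactly 30 s, so 3000 frames
    # Assumption: a model with 80 mel bins such as "base". Models with a
    # different mel count may need an extra argument, depending on the
    # installed whisper version.
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)
    return max(probs, key=probs.get)
```

Calling detect_language_of_file("audio.mp3") with a real file path would return the most probable language code. If the pad-or-trim step is skipped, the spectrogram has a different number of frames and the encoder raises the assertion shown in the traceback.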