Whisper is a speech recognition system, released under an open-source license, that was trained on 680,000 hours of multilingual and multitask data collected from the Internet. It is designed to be robust to accents, background noise, and technical language, and it supports both transcription of speech in many languages and translation of that speech into English. The architecture is a simple end-to-end approach implemented as an encoder-decoder Transformer. The model can also identify the spoken language and produce phrase-level timestamps. It was built to be easy to use and highly accurate, so that developers can add voice interfaces to a wider range of applications.
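
As an illustration of the phrase-level timestamps and language identification described above, the following is a minimal sketch, not a definitive implementation. It assumes the `openai-whisper` Python package and ffmpeg are installed, and that `transcribe` returns a dictionary with a `language` key and a `segments` list whose entries carry `start`, `end`, and `text`. The helper names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt`, and the choice of the `base` model, are hypothetical and chosen only for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Turn phrase-level segments (start, end, text) into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base",
                      translate: bool = False) -> str:
    """Hypothetical wrapper: transcribe (or translate to English) and emit SRT.

    Assumes `pip install openai-whisper` and an ffmpeg binary on the PATH.
    """
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    print("Detected language:", result["language"])
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("interview.mp3")` (the file name is a placeholder) would print the detected language and return subtitle text. Passing `translate=True` asks the model for English output instead of a transcript in the source language. The two formatting helpers are kept separate from the model call so they can be used on any list of timed segments.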