MiniGPT-4 is a tool that improves vision-language understanding by aligning a frozen visual encoder with a frozen large language model (LLM), Vicuna, through a single projection layer. It can generate detailed image descriptions, create websites from handwritten sketches, write stories and poems inspired by given images, solve problems shown in images, and teach users how to cook based on photos of food. MiniGPT-4 is computationally efficient because only the linear projection layer is trained: it learns to align the visual features with Vicuna using about 5 million image-text pairs.
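
The idea of training only a linear projection between two frozen components can be illustrated with a small sketch. This is a toy example in plain Python, not MiniGPT-4's actual code: the dimensions `VISION_DIM` and `LLM_DIM`, the helper names, and the hard-coded visual tokens are all assumptions chosen for illustration, and the frozen encoder and LLM are represented only by their input and output shapes.

```python
import random

VISION_DIM = 4   # assumed toy size of each visual feature vector
LLM_DIM = 6      # assumed toy size of the LLM's input embeddings


def init_projection(in_dim, out_dim, seed=0):
    """Create the only trainable parameters: a weight matrix and a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(visual_tokens, weight, bias):
    """Map each visual feature vector into the LLM embedding space (y = Wx + b)."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b
         for row, b in zip(weight, bias)]
        for token in visual_tokens
    ]


def count_trainable(weight, bias):
    """Number of parameters that would be updated during training."""
    return sum(len(row) for row in weight) + len(bias)


# Hypothetical stand-in for the frozen visual encoder's output: 3 visual tokens.
visual_tokens = [
    [1.0, 0.0, 0.5, -1.0],
    [0.2, 0.3, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

weight, bias = init_projection(VISION_DIM, LLM_DIM)
llm_inputs = project(visual_tokens, weight, bias)

print(len(llm_inputs), len(llm_inputs[0]))  # prints "3 6"
print(count_trainable(weight, bias))        # prints 30, i.e. 4 * 6 + 6
```

In this sketch the projected vectors would be passed to the frozen LLM alongside the text embeddings, and only `weight` and `bias` would receive gradient updates. Because both large components stay frozen, the number of trained parameters is limited to the size of this one layer, which is what keeps training cheap.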