Generative AI has seen rapid growth thanks to recent developments in open-source machine learning. One major factor is the availability of models trained on large internet text corpora, such as the Pile, along with new instruction datasets such as Alpaca and Databricks Dolly 15k. In addition, researchers have developed new training techniques such as Low-Rank Adaptation of Large Language Models (LoRA) from Microsoft, which allows previously open-sourced models to be fine-tuned on domain-specific tasks, greatly reducing the computational resources needed to adapt a model to a new task.
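The core idea behind LoRA can be shown in a few lines. Rather than updating a large frozen weight matrix W directly, LoRA learns two small matrices whose product forms a low-rank update. The shapes and values below are illustrative, not the ones used for tinyChat:

```python
import numpy as np

# Sketch of LoRA's low-rank update. A frozen pretrained weight W (d x k)
# is augmented with trainable matrices B (d x r) and A (r x k), r << min(d, k):
# effective weight = W + (alpha / r) * B @ A.
d, k, r, alpha = 1024, 1024, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))         # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

W_effective = W + (alpha / r) * B @ A   # equals W exactly at initialization

full_params = d * k          # 1,048,576 params if this matrix were tuned directly
lora_params = d * r + r * k  # 16,384 trainable params with LoRA
print(lora_params / full_params)  # ~1.6% of the full parameter count
```

Because B starts at zero, training begins from the pretrained model's behavior, and only the small A and B matrices receive gradients.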
Thanks in part to these advancements, we introduce tinyChat, an instruction-tuned large language model with under 1B parameters, open-sourced under the Apache 2.0 license. To put this in context, tinyChat (770M parameters plus a 2.4M-parameter LoRA adapter) is less than 1% the size of GPT-3.5 (176B parameters). tinyChat is based on Google’s Flan-T5-Large, a 770M-parameter model.
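The size comparison is simple arithmetic, taking the parameter counts quoted above at face value:

```python
# Parameter counts as quoted in the text (the GPT-3.5 figure is the one
# used above, not an officially confirmed number).
tinychat_params = 770e6 + 2.4e6  # base Flan-T5-Large + LoRA adapter
gpt35_params = 176e9

ratio = tinychat_params / gpt35_params
print(f"{ratio:.2%}")  # ~0.44%, i.e. well under 1%
```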
This model was fine-tuned on the Databricks Dolly 15k dataset recently open-sourced by Databricks. The dataset contains instruction-following records generated by thousands of Databricks employees, spanning categories including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. By fine-tuning Flan-T5 on this dataset, we were able to demonstrate capabilities such as summarization and creative text generation that were previously not possible with Flan-T5 alone.
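Each Dolly 15k record carries instruction, context, and response fields, which must be rendered into text-to-text pairs before training a seq2seq model like Flan-T5. The template below is a hypothetical sketch; the exact prompt format used to train tinyChat may differ:

```python
# Hypothetical sketch of turning a Databricks Dolly 15k record into a
# (model input, training target) pair. The actual template used for
# tinyChat may differ.
def format_record(record: dict) -> tuple[str, str]:
    """Map a record with 'instruction', 'context', and 'response' fields
    to a text-to-text training pair."""
    instruction = record["instruction"]
    context = record.get("context", "")
    if context:
        source = f"Instruction: {instruction}\nInput: {context}\nResponse:"
    else:
        source = f"Instruction: {instruction}\nResponse:"
    return source, record["response"]

example = {
    "instruction": "What country did the event take place? Provide only the name of the COUNTRY.",
    "context": "The event took place in Casablanca.",
    "response": "Morocco",
}
source, target = format_record(example)
```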
While not as performant as GPT-3.5, as expected given its size, tinyChat exhibits ChatGPT-like qualities and performs reasonably well on a variety of NLP tasks such as summarization, question answering, and sentiment analysis.
Link to repo and model:
There are several reasons we believe that smaller LLMs, or small language models (SLMs) like tinyChat, can play a crucial role in the advancement of generative AI.
Small language models like tinyChat offer valuable advantages in a range of applications: complementing larger LLMs, running in mobile and IoT environments, and powering large-scale text extraction.
By focusing on the unique advantages that small language models like tinyChat provide, developers can unlock new possibilities for AI applications, driving progress in natural language processing while empowering users.
tinyChat improves on Flan-T5’s summarization and creative-writing capabilities while retaining its strong performance on tasks such as question answering.
In the example below, the models are asked which country is mentioned in the input, even though the input names only a city. Both models correctly infer that the country is Morocco.
Prompt: What country did the event take place? Provide only the name of the COUNTRY.
Input: The event took place in Casablanca.
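A prompt/input pair like the one above can be run through a model with the transformers library. This is a minimal sketch: the default model name and generation settings are assumptions, so substitute the tinyChat checkpoint from the repo when reproducing the examples:

```python
def answer(prompt: str, input_text: str,
           model_name: str = "google/flan-t5-large") -> str:
    """Run a prompt/input pair through a seq2seq model and decode the answer.

    The prompt layout mirrors the example in the text; the generation
    settings are illustrative, not the ones used for tinyChat's outputs.
    """
    # Lazy import: transformers is only needed when actually generating.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    text = f"{prompt}\nInput: {input_text}"
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=32)
    return tokenizer.decode(output[0], skip_special_tokens=True)

prompt = "What country did the event take place? Provide only the name of the COUNTRY."
# answer(prompt, "The event took place in Casablanca.")
# The text reports that both models answer: Morocco
```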
In the following example, the models are prompted to summarize the Wikipedia entry for the video game The Witcher 3. Flan-T5 is unable to produce an accurate summary, while tinyChat succeeds.
Creative Text Generation
Both models were asked to write a poem. Compared with Flan-T5, tinyChat produced longer and more creative output.
Prompt: Write a poem
While benchmarking LLMs remains an active area of research, metrics can be reported on basic NLP tasks. We used EleutherAI’s lm-evaluation-harness to benchmark tinyChat against other open-source models; see the Hugging Face model card for the full metrics.
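The harness can be driven from Python as well as from the command line. The wrapper below is a hedged sketch: the model-type string and task names are assumptions, so consult the lm-evaluation-harness documentation for the identifiers that match your model and installed version:

```python
def run_benchmark(model_name: str, tasks: list[str]) -> dict:
    """Hypothetical wrapper around EleutherAI's lm-evaluation-harness.

    The 'hf-seq2seq' model type (for T5-style models) and the task names
    passed in are assumptions; check the harness docs for your version.
    """
    # Lazy import: requires `pip install lm-eval`.
    from lm_eval import evaluator

    return evaluator.simple_evaluate(
        model="hf-seq2seq",
        model_args=f"pretrained={model_name}",
        tasks=tasks,
    )

# Example (downloads the model and task data, so it is not run here):
# results = run_benchmark("google/flan-t5-large", ["boolq"])
```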
The release of tinyChat is just the beginning of our efforts to democratize access to NLP capabilities and promote responsible use of AI. In the future, we aim to focus on several key areas of development:
Our overarching goal is to develop efficient and responsible AI models that can be readily utilized by developers and organizations. We are confident that by working in tandem with the open-source community, we can drive progress in the field of AI in a way that benefits everyone.
While tinyChat offers many promising applications, it is essential to recognize that the model is far from perfect and, like many LLMs available today, is prone to hallucinations. As a result, tinyChat is primarily intended for research and experimentation at this stage. We advocate for the responsible use of AI and encourage users to follow Google’s Responsible AI Practices when working with tinyChat or any other language model.
We would like to express our gratitude to the open-source community, Databricks, and Hugging Face. It is through their contributions that we were able to create tinyChat.