Small Language Model Category - MarkTechPost
https://www.marktechpost.com/category/technology/artificial-intelligence/small-language-model/

Cohere for AI Releases Aya Expanse (8B & 32B): A State-of-the-Art Multilingual Family of Models to Bridge the Language Gap in AI
https://www.marktechpost.com/2024/10/26/cohere-for-ai-releases-aya-expanse-8b-32b-a-state-of-the-art-multilingual-family-of-models-to-bridge-the-language-gap-in-ai/
Sat, 26 Oct 2024 19:12:16 +0000

Despite rapid advancements in language technology, significant gaps in representation persist for many languages. Most progress in natural language processing (NLP) has focused on well-resourced languages like English, leaving many others underrepresented. This imbalance means that only a small portion of the world’s population can fully benefit from AI tools. The absence of robust language models for low-resource languages, coupled with unequal AI access, exacerbates disparities in education, information accessibility, and technological empowerment. Addressing these challenges requires a concerted effort to develop and deploy language models that serve all communities equitably.

Cohere for AI introduces Aya Expanse: an open-weights, state-of-the-art family of models to help close the language gap in AI. Aya Expanse is designed to expand language coverage and inclusivity in the AI landscape by providing open-weight models that researchers and developers worldwide can access and build upon. Available in multiple sizes, including Aya Expanse-8B and Aya Expanse-32B, these models are adaptable across a wide range of natural language tasks, such as text generation, translation, and summarization. The different model sizes offer flexibility for various use cases, from large-scale applications to lighter deployments. Aya Expanse uses an advanced transformer architecture to capture linguistic nuances and semantic richness, and it is fine-tuned to handle multilingual scenarios effectively. The models leverage diverse datasets covering low-resource languages such as Swahili, Bengali, and Welsh to ensure equitable performance across linguistic contexts.
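Because the weights are openly released, trying the smaller model locally looks much like using any other open-weight causal language model. The sketch below is a minimal, hedged example: the Hugging Face repository name, chat-template usage, and generation settings are assumptions based on common conventions for open-weight releases, not taken from official documentation.

```python
# Minimal usage sketch for an open-weights Aya Expanse checkpoint.
# The model identifier below is an assumption, not verified against official docs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-expanse-8b"  # assumed Hugging Face Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A multilingual prompt (Swahili): "Explain what a language model is."
messages = [{"role": "user", "content": "Eleza maana ya modeli ya lugha."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```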

Aya Expanse plays a crucial role in bridging linguistic divides, ensuring underrepresented languages have the tools needed to benefit from AI advancements. The Aya Expanse-32B model, in particular, has demonstrated significant improvements on multilingual understanding benchmarks, outperforming models such as Gemma 2 27B, Mixtral 8x22B, and Llama 3.1 70B, a model more than twice its size. In evaluations, Aya Expanse-32B achieved a 25% higher average accuracy across low-resource language benchmarks compared to other leading models. Similarly, Aya Expanse-8B outperforms leading models in its parameter class, including Gemma 2 9B, Llama 3.1 8B, and the recently released Ministral 8B, with win rates ranging from 60.4% to 70.6%. These results highlight Aya Expanse's potential to support underserved communities and foster better language inclusivity.

The improvements in Aya Expanse stem from Cohere for AI’s sustained focus on expanding how AI serves languages around the world. By rethinking the core building blocks of machine learning breakthroughs, including data arbitrage, preference training for general performance and safety, and model merging, Cohere for AI has made a significant contribution to bridging the language gap. Making the model weights openly available encourages an inclusive ecosystem of researchers and developers, ensuring language modeling becomes a community-driven effort rather than one controlled by a few entities.

In conclusion, Aya Expanse represents a significant step towards democratizing AI and addressing the language gap in NLP. By providing powerful, multilingual language models with open weights, Cohere for AI advances language technology while promoting inclusivity and collaboration. Aya Expanse enables developers, educators, and innovators from diverse linguistic backgrounds to create applications that are accessible and beneficial to a broader population, ultimately contributing to a more connected and equitable world. This move aligns well with the core values of artificial intelligence—accessibility, inclusiveness, and innovation without borders.


BRAG Released: High-Performance SLMs (Small Language Models) Specifically Trained for RAG Tasks Under $25 Each
https://www.marktechpost.com/2024/08/05/brag-released-high-performance-slms-small-language-models-specifically-trained-for-rag-tasks-under-25-each/
Mon, 05 Aug 2024 22:58:38 +0000

BRAG is a series of high-performance Retrieval-Augmented Generation (RAG) models developed by Maximalists AI Researcher. The BRAG models are a family of small language models (SLMs) designed to offer cost-effective, high-performance alternatives for AI-driven language processing, with each model trained at an impressively low cost of under $25.

The BRAG models were created in response to the need for efficient, high-performing language models that do not require the extensive computational resources typically associated with large-scale models like those from Nvidia and OpenAI. The primary motivation behind BRAG was to develop a series of models that could match or exceed the performance of leading models such as Cohere's Command R+, Qwen2, Llama 3.1, and Llama 3 Instruct while keeping training costs minimal.

The BRAG series includes four models: 

  1. BRAG-Qwen2-7b-v0.1
  2. BRAG-Llama-3.1-8b-v0.1
  3. BRAG-Llama-3-8b-v0.1
  4. BRAG-Qwen2-1.5b-v0.1

These models were chosen based on their performance in open benchmarks and their ability to balance efficiency and capability. The models underwent a two-stage fine-tuning process inspired by Nvidia's ChatQA approach, which involves initial training on general instruction datasets followed by training on RAG-specific datasets.

The BRAG models are particularly noteworthy for their performance relative to their size. The 1.5B model offers an excellent balance of performance and efficiency, while the 7B and 8B models can handle more complex tasks, such as long-context understanding, tabular data interpretation, and mathematical reasoning. This strategic selection of models and training methodology allowed Maximalists to optimize performance while managing costs effectively.

The BRAG model training involved LoRA (Low-Rank Adaptation) and QLoRA (quantized LoRA) techniques. LoRA enables faster training with reduced computational demands by learning small low-rank adapter matrices instead of updating the full weight matrices, while QLoRA additionally compresses the frozen base weights to 4-bit precision, significantly reducing the memory footprint and making training feasible on consumer-grade GPUs.
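To make the technique concrete, here is a minimal sketch of a QLoRA-style setup using the Hugging Face transformers, peft, and bitsandbytes libraries. The base checkpoint, target modules, and hyperparameters are illustrative assumptions, not the BRAG training recipe.

```python
# Sketch of a LoRA/QLoRA fine-tuning setup (illustrative, not the BRAG recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "Qwen/Qwen2-1.5B"  # assumed base checkpoint for illustration

# QLoRA: load the frozen base weights in 4-bit precision to cut memory use
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA: train small low-rank adapter matrices instead of the full weights
lora_config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base parameters
```

Only the adapter weights are updated during training, which is a large part of what keeps the per-model cost so low.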

The models were evaluated using the ChatRAG-Bench, a benchmark designed to assess conversational QA and RAG capabilities across various document types and question formats. The evaluation metrics included F1-Score and Exact Match Accuracy, which provided insights into the models’ ability to generate precise and contextually relevant responses.
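For context on what these metrics capture, the snippet below computes exact match and a token-overlap F1 in the style of standard extractive-QA scoring; it is a simplified illustration, not the actual ChatRAG-Bench evaluation code.

```python
# Simplified exact-match and token-level F1 scoring (illustration only).
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                          # 1.0
print(round(token_f1("the Eiffel Tower", "Eiffel Tower"), 2))  # 0.8
```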

During the training process, several challenges were encountered, including handling long documents, interpreting tabular data, and addressing domain-specific queries. These issues were mitigated through careful dataset selection and experimentation with various data combinations. For instance, including datasets like DROP, Quoref, and SQuAD helped improve the models’ capabilities in handling complex and diverse data types. The F1 score metric, while widely accepted, was noted to have limitations in capturing semantic nuances and context. This highlighted the need for more holistic and context-aware evaluation metrics to better gauge model performance.

In conclusion, the Maximalists plan to enhance the BRAG models by improving RAG performance and tabular data handling, and by introducing citation generation for better interpretability. They also aim to refine query-rewriting techniques to improve search accuracy and relevance. The development of BRAG was supported by credits from Modal Labs, which facilitated cost-effective experimentation. By leveraging innovative training techniques and strategic model selection, BRAG has demonstrated that top-tier performance can be achieved with minimal resource expenditure, paving the way for more accessible and efficient AI solutions.


Small but Mighty: The Role of Small Language Models in Artificial Intelligence AI Advancement
https://www.marktechpost.com/2024/04/16/small-but-mighty-the-role-of-small-language-models-in-artificial-intelligence-ai-advancement/
Tue, 16 Apr 2024 09:00:00 +0000

In recent years, there has been a strong shift toward Large Language Models (LLMs) due to their impressive text generation, analysis, and classification capabilities. These models use billions of parameters to execute a variety of Natural Language Processing (NLP) tasks, and almost every industry and major tech company is investing heavily in the creation of ever-larger models.

However, these larger models come with their own limitations. They require enormous amounts of processing power and energy, which makes them prohibitively expensive for smaller businesses with tighter budgets. As the race toward ever-larger models accelerates, an unexpected pattern is beginning to take shape: tiny is the new large. Small Language Models, or SLMs, are becoming increasingly popular as effective, flexible substitutes for their larger counterparts.

The Rise of Small Language Models (SLMs)

Researchers are increasingly focusing on SLMs as a solution to the shortcomings of LLMs. These small, effective, and highly flexible AI models offer a more streamlined approach to developing AI, challenging the idea that bigger is always better. Compared to LLMs, SLMs have simpler structures, fewer parameters, and lower training-data requirements, which makes them more affordable and practical for a wider range of applications.

Comparisons of LLM and SLM performance indicate a rapidly closing gap, especially on certain tasks such as reasoning, math problems, and multiple-choice questions. Some smaller SLMs have even outperformed larger counterparts in certain areas, with encouraging results. This highlights the significance of architecture design, training data, and fine-tuning procedures, and it suggests that model size may not be the only factor affecting performance.

Advantages of Small Language Models

SLMs are an appealing answer to AI's language dilemma because they have a number of advantages over LLMs. First, their simpler design and lower processing demands make them easier for smaller businesses and individuals with tighter budgets to adopt. SLMs also enable quicker development cycles and experimentation because they are easier to train, optimize, and deploy. And because of their specialized nature, they can be customized precisely, which makes them very useful for particular tasks or sectors.

SLMs can also provide better privacy and security than LLMs because of their smaller footprint and simpler architecture, which makes them suitable for applications involving sensitive data, where breaches could have serious repercussions. Their streamlined architecture and reduced tendency to hallucinate within particular domains further add to their dependability and credibility.

Some Popular Examples of SLMs

  1. Llama 2: Created by Meta AI, Llama 2 has exhibited remarkable performance in the open-source community, with scales ranging from 7 billion to 70 billion parameters. 
  2. Alpaca 7B: Stanford researchers created Alpaca 7B by fine-tuning the LLaMA 7B model. Trained on 52K instruction-following demonstrations, Alpaca 7B displays behaviors qualitatively similar to OpenAI's GPT-3-based text-davinci-003, demonstrating how flexible and versatile SLMs can be in capturing a wide range of complex language patterns and behaviors.
  3. Mistral and Mixtral: Mistral AI provides several SLMs, such as Mistral-7B and the mixture-of-experts model Mixtral 8x7B. In terms of performance, these models have proven competitive with larger models such as GPT-3.5. 
  4. Microsoft's Phi: Microsoft's Phi-2 is well known for its strong reasoning capabilities and flexibility in handling domain-specific tasks. It can be fine-tuned to meet the needs of particular applications, resulting in high performance and accuracy. 
  5. DistilBERT: This model is a distilled, faster version of BERT (Bidirectional Encoder Representations from Transformers), Google's 2018 deep learning NLP model. DistilBERT reduces BERT's size and processing requirements while preserving its essential architecture, and it offers scaled-down variants tailored to different constraints, in contrast to the full-scale BERT, which can include hundreds of millions of parameters. (A quick usage sketch follows this list.)
  6. Orca 2: Instead of relying solely on real-world datasets, Microsoft's Orca 2 is created by fine-tuning Meta's LLaMA 2 on synthetic data generated by a more capable model. Orca 2 is smaller than many other models, yet it performs at a level that can equal or even exceed that of models ten times its size. 
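As a quick illustration of how lightweight these models are to run, the sketch below loads a distilled sentiment classifier with the Hugging Face transformers pipeline API. The checkpoint name is a commonly published DistilBERT fine-tune, used here purely for illustration.

```python
# Running a small distilled model locally via the transformers pipeline API.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # assumed public checkpoint
)
print(classifier("Small language models are surprisingly capable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```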

Conclusion

In conclusion, SLMs represent a major advance in AI research and development, offering a more effective, flexible, and affordable way to address AI's language challenge. As the AI ecosystem develops, the emergence of SLMs promises to spur innovation, democratize access to AI, and transform sectors around the world. 

This AI Research from China Introduces LLaVA-Phi: A Vision Language Assistant Developed Using the Compact Language Model Phi-2
https://www.marktechpost.com/2024/01/10/this-ai-research-from-china-introduces-llava-phi-a-vision-language-assistant-developed-using-the-compact-language-model-phi-2/
Wed, 10 Jan 2024 12:50:00 +0000

Large multimodal models such as Flamingo, GPT-4V, and Gemini have shown notable achievements in following instructions, holding multi-turn conversations, and answering image-based questions. The fast development of open-source large language models, such as LLaMA and Vicuna, has greatly accelerated the evolution of open-source vision-language models. These advancements mainly center on improving visual understanding by pairing a vision encoder with a language model of at least 7B parameters. Time-sensitive or real-time interactive applications such as autonomous driving and robotics, however, would benefit from faster inference and shorter response times.

Regarding mobile technology, Gemini has been a trailblazer for multimodal approaches. Gemini Nano, a pared-down version, comes in 1.8-billion- and 3.25-billion-parameter variants and can run on mobile devices. Yet details such as the model's design, training datasets, and training procedures remain proprietary and have not been disclosed.

A new study by Midea Group and East China Normal University introduces LLaVA-Phi, a vision-language assistant powered by a small language model. The study combines Phi-2, one of the most capable open-source small language models, with LLaVA-1.5, a robust open-source multimodal model, and uses LLaVA's high-quality visual instruction-tuning data in a two-stage training pipeline. The researchers evaluated LLaVA-Phi on eight different benchmarks.
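At a structural level, LLaVA-style assistants pair a vision encoder with a language model through a small projector. The sketch below illustrates that wiring under stated assumptions: a CLIP-style encoder with 1024-dimensional features, a two-layer MLP projector, and a Phi-2-sized hidden dimension of 2560. It is a rough sketch of the architecture family, not the exact LLaVA-Phi implementation.

```python
# Structural sketch of a LLaVA-style vision-language assistant (illustrative).
import torch
import torch.nn as nn

class VisionLanguageAssistant(nn.Module):
    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int = 1024, lm_dim: int = 2560):
        super().__init__()
        self.vision_encoder = vision_encoder      # e.g. a frozen CLIP ViT
        self.projector = nn.Sequential(           # maps image features into the
            nn.Linear(vision_dim, lm_dim),        # language model's embedding space
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )
        self.language_model = language_model      # e.g. a Phi-2-style causal LM

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor):
        image_feats = self.vision_encoder(images)        # (B, N, vision_dim)
        image_tokens = self.projector(image_feats)       # (B, N, lm_dim)
        inputs = torch.cat([image_tokens, text_embeds], dim=1)
        # Assumes an HF-style LM that accepts precomputed input embeddings.
        return self.language_model(inputs_embeds=inputs)
```

In two-stage recipes of this kind, the projector is typically aligned first on image-caption data, and then the projector and language model are tuned together on visual instruction data.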

Despite having only about three billion parameters, its performance is on par with, or even better than, multimodal models three times its size. 

The team used a wide variety of academic benchmarks developed for multimodal models to thoroughly evaluate LLaVA-Phi. These include VQA-v2, VizWizQA, ScienceQA, and TextQA for general question answering, as well as more specialized assessments such as POPE for object hallucination and MME, MMBench, and MMVet for a comprehensive evaluation of diverse multimodal abilities like visual understanding and visual commonsense reasoning. LLaVA-Phi outperformed several previously available large multimodal models, demonstrating that it can answer questions grounded in visual input. Remarkably, it achieved better results than models like IDEFICS that rely on LLMs of 7B parameters or more. 

The model's top score on ScienceQA stands out. Its success on math-based questions can be attributed to the Phi-2 language model, which was trained specifically on mathematical corpora and code generation. On the extensive MMBench multimodal benchmark, LLaVA-Phi outperformed numerous prior vision-language models built on 7B LLMs. 

LLaVA-Phi was also compared with MobileVLM, a parallel effort to build an efficient vision-language model, and it consistently comes out ahead on all five measures.

The team highlights that because Phi-2 uses the codegen-mono tokenizer and the model has not been fine-tuned to follow multilingual instructions, LLaVA-Phi cannot process instructions in other languages, including Chinese. In future work, they intend to improve training procedures for small language models, investigate the effect of the visual encoder's size, and explore methods such as RLHF and direct preference optimization, aiming to further improve performance while keeping model size down.

