Machine Learning Category - MarkTechPost https://www.marktechpost.com/category/technology/artificial-intelligence/machine-learning/ An Artificial Intelligence News Platform Sat, 28 Dec 2024 07:32:43 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.1 https://www.marktechpost.com/wp-content/uploads/2022/04/cropped-Favicon-512-x-512-1-1-32x32.png Machine Learning Category - MarkTechPost https://www.marktechpost.com/category/technology/artificial-intelligence/machine-learning/ 32 32 127842392 Collective Monte Carlo Tree Search (CoMCTS): A New Learning-to-Reason Method for Multimodal Large Language Models https://www.marktechpost.com/2024/12/27/collective-monte-carlo-tree-search-comcts-a-new-learning-to-reason-method-for-multimodal-large-language-models/ https://www.marktechpost.com/2024/12/27/collective-monte-carlo-tree-search-comcts-a-new-learning-to-reason-method-for-multimodal-large-language-models/#respond Sat, 28 Dec 2024 07:32:35 +0000 https://www.marktechpost.com/?p=66773 In today’s world, Multimodal large language models (MLLMs) are advanced systems that process and understand multiple input forms, such as text and images. By interpreting these diverse inputs, they aim to reason through tasks and generate accurate outputs. However, MLLMs often fail at complex tasks because they lack structured processes to break problems into smaller […]

The post Collective Monte Carlo Tree Search (CoMCTS): A New Learning-to-Reason Method for Multimodal Large Language Models appeared first on MarkTechPost.


Multimodal large language models (MLLMs) are systems that process and understand multiple input forms, such as text and images. By interpreting these diverse inputs, they aim to reason through tasks and generate accurate outputs. However, MLLMs often fail at complex tasks because they lack structured processes for breaking problems into smaller steps, and instead give direct answers without clear intermediate reasoning. These limitations reduce the success and efficiency of MLLMs on intricate problems.

Traditional methods for reasoning in multimodal large language models (MLLMs) have several shortcomings. Prompt-based methods, like Chain-of-Thought, use fixed steps to imitate human reasoning but struggle with difficult tasks. Plan-based methods, like Tree-of-Thought or Graph-of-Thought, try to find reasoning paths but are not flexible or reliable. Learning-based methods, like Monte Carlo Tree Search (MCTS), are slow and do not help with deep thinking. Most MLLMs rely on “direct prediction,” giving short answers without clear steps. Although MCTS works well in games and robotics, it is unsuited for MLLMs, and collective learning does not build strong step-by-step reasoning. These issues make it hard for MLLMs to solve complex problems.

To mitigate these issues, a team of researchers from Nanyang Technological University, Tsinghua University, Baidu, and Sun Yat-sen University proposed CoMCTS, a framework to improve reasoning-path search in tree search tasks. Instead of relying on one model, it combines multiple pre-trained models to expand and evaluate candidate paths. This differs from traditional methods in that several models work together, which the authors report improves search efficiency and reduces errors during the reasoning process.

The framework consisted of four key steps: Expansion, Simulation, Backpropagation, and Selection. In the Expansion step, several models looked for different solutions simultaneously, increasing the variety of candidate answers. In the Simulation step, incorrect or less effective paths were removed, narrowing the search. During the Backpropagation step, the results of the evaluated paths were propagated back up the tree so that later choices could draw on earlier mistakes. The Selection step used a statistical method to choose the next action to take. Reflective reasoning in this process helped the model learn from previous errors and make better decisions on similar tasks.
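The four steps above can be sketched as a small loop. This is a toy illustration under stated assumptions, not the authors' implementation: the names `Node`, `expand`, `simulate`, `backpropagate`, `ucb`, and `search` are hypothetical, the "models" and "scorers" are plain functions standing in for real MLLMs, and an upper-confidence-bound score is assumed as the "statistical method" used for selection.

```python
import math


class Node:
    """One reasoning step in the search tree."""
    def __init__(self, step, parent=None):
        self.step, self.parent = step, parent
        self.children, self.visits, self.value = [], 0, 0.0


def expand(node, models):
    """Expansion: every model proposes one candidate next step."""
    for propose in models:
        node.children.append(Node(propose(node.step), parent=node))


def simulate(node, scorers, threshold=0.0):
    """Simulation: average the scorers' ratings and prune weak candidates."""
    scored = []
    for child in node.children:
        reward = sum(score(child.step) for score in scorers) / len(scorers)
        if reward >= threshold:
            scored.append((child, reward))
    node.children = [child for child, _ in scored]
    return scored


def backpropagate(leaf, reward):
    """Backpropagation: update visit counts and running mean values up to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value += (reward - node.value) / node.visits
        node = node.parent


def ucb(node, c=1.4):
    """Selection score (assumed): mean value plus an exploration bonus."""
    return node.value + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def search(question, models, scorers, iterations=3):
    node = Node(question)
    for _ in range(iterations):
        expand(node, models)
        for child, reward in simulate(node, scorers):
            backpropagate(child, reward)
        if not node.children:
            break
        node = max(node.children, key=ucb)  # Selection
    return node


# Toy stand-ins: two "models" that append a step, two "scorers" that prefer "a".
models = [lambda s: s + " -> a", lambda s: s + " -> b"]
scorers = [lambda s: 1.0 if s.endswith("a") else -1.0,
           lambda s: 0.5 if s.endswith("a") else -0.5]
best = search("q", models, scorers)
print(best.step)  # q -> a -> a -> a
```

In this toy run the "b" candidates are pruned at every level because their averaged score falls below the threshold, so the search follows the "a" branch. With real models, several candidates would usually survive and the exploration bonus would matter.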

The researchers created the Mulberry-260K dataset, which comprised 260K multimodal input questions, combining text instructions and images from various domains, including general multimodal understanding, mathematics, science, and medical image understanding. The dataset was constructed using CoMCTS, with training limited to 15K samples to avoid overabundance. The reasoning tasks required an average of 7.5 steps, with most tasks falling within the 6-to-8-step range. CoMCTS was implemented using four models: GPT-4o, Qwen2-VL-7B, LLaMA-3.2-11B-Vision-Instruct, and Qwen2-VL-72B. The training process used a batch size of 128 and a learning rate of 1e-5 for two epochs.

The results demonstrated significant performance improvements over the baseline models, with gains of +4.2% and +7.5% for Qwen2-VL-7B and LLaMA-3.2-11B-Vision-Instruct, respectively. Additionally, models trained on the Mulberry dataset outperformed reasoning models like LLaVA-Reasoner-8B and Insight-V-8B on various benchmarks. Upon evaluation, CoMCTS improved its performance by 63.8%. Including reflective reasoning data led to slight further improvements in model performance. These results show the effect of Mulberry-260K and CoMCTS on the accuracy and flexibility of reasoning.

In conclusion, the proposed CoMCTS improves reasoning in multimodal large language models (MLLMs) by incorporating collective learning into tree search methods. The framework improved the efficiency of searching for a reasoning path, as demonstrated by the Mulberry-260K dataset and the Mulberry model, which surpasses traditional models in complex reasoning tasks. The proposed methods provide valuable insights for future research and can serve as a baseline for developing more efficient models capable of handling increasingly complex tasks.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.

🚨 Trending: LG AI Research Releases EXAONE 3.5: Three Open-Source Bilingual Frontier AI-level Models Delivering Unmatched Instruction Following and Long Context Understanding for Global Leadership in Generative AI Excellence….

YuLan-Mini: A 2.42B Parameter Open Data-efficient Language Model with Long-Context Capabilities and Advanced Training Techniques https://www.marktechpost.com/2024/12/27/yulan-mini-a-2-42b-parameter-open-data-efficient-language-model-with-long-context-capabilities-and-advanced-training-techniques/ https://www.marktechpost.com/2024/12/27/yulan-mini-a-2-42b-parameter-open-data-efficient-language-model-with-long-context-capabilities-and-advanced-training-techniques/#respond Sat, 28 Dec 2024 01:51:39 +0000 https://www.marktechpost.com/?p=66770 Large language models (LLMs) built using transformer architectures heavily depend on pre-training with large-scale data to predict sequential tokens. This complex and resource-intensive process requires enormous computational infrastructure and well-constructed data pipelines. The growing demand for efficient and accessible LLMs has led researchers to explore techniques that balance resource use and performance, emphasizing achieving competitive […]

The post YuLan-Mini: A 2.42B Parameter Open Data-efficient Language Model with Long-Context Capabilities and Advanced Training Techniques appeared first on MarkTechPost.


Large language models (LLMs) built using transformer architectures heavily depend on pre-training with large-scale data to predict sequential tokens. This complex and resource-intensive process requires enormous computational infrastructure and well-constructed data pipelines. The growing demand for efficient and accessible LLMs has led researchers to explore techniques that balance resource use and performance, emphasizing achieving competitive results without relying on industry-scale resources.

Developing LLMs is filled with challenges, especially regarding computation and data efficiency. Pre-training models with billions of parameters demands advanced techniques and substantial infrastructure. High-quality data and robust training methods are crucial, as models face gradient instability and performance degradation during training. Open-source LLMs often struggle to match proprietary counterparts because of limited access to computational power and high-caliber datasets. The challenge therefore lies in creating efficient, high-performing models that let smaller research groups participate actively in advancing AI technology. Solving this problem requires innovation in data handling, training stabilization, and architectural design.

Existing research in LLM training emphasizes structured data pipelines, using techniques like data cleaning, dynamic scheduling, and curriculum learning to improve learning outcomes. However, stability remains a persistent issue. Large-scale training is susceptible to gradient explosions, loss spikes, and other technical difficulties, requiring careful optimization. Training long-context models introduces additional complexity, as the computational demands of attention mechanisms grow quadratically with sequence length. Existing approaches like advanced optimizers, initialization strategies, and synthetic data generation help alleviate these issues but often fall short when scaled to full-sized models. The need for scalable, stable, and efficient methods in LLM training is more urgent than ever.
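To make the quadratic growth concrete, here is a small arithmetic sketch. It counts only the entries of a single attention score matrix, which is a simplification of real attention cost. The 4,096-token baseline is an assumption chosen for illustration; 28,672 is the context length the article reports for YuLan-Mini.

```python
def attention_score_entries(seq_len):
    """Entries in one seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


baseline_len = 4096    # assumed baseline, for illustration only
extended_len = 28672   # context length reported for YuLan-Mini

ratio = attention_score_entries(extended_len) / attention_score_entries(baseline_len)
print(ratio)  # 49.0
```

A 7x longer sequence therefore means roughly 49x more score entries per head and layer, which is why long-context training is usually treated as a separate, carefully budgeted stage.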

Researchers at the Gaoling School of Artificial Intelligence, Renmin University of China, developed YuLan-Mini. With 2.42 billion parameters, this language model improves computational efficiency and performance with data-efficient methods. By leveraging publicly available data and focusing on data-efficient training techniques, YuLan-Mini achieves remarkable performance comparable to larger industry models.

YuLan-Mini’s architecture incorporates several elements intended to enhance training efficiency. Its decoder-only transformer design employs embedding tying to reduce parameter count and improve training stability. The model uses Rotary Positional Embedding (RoPE) to handle long contexts, extending its context length to 28,672 tokens, an advancement over typical models. Other key features include SwiGLU activation functions for better data representation and a carefully designed annealing strategy that stabilizes training while maximizing learning efficiency. Synthetic data was critical, supplementing the 1.08 trillion tokens of training data sourced from open web pages, code repositories, and mathematical datasets. These features enable YuLan-Mini to deliver robust performance with a limited computing budget.
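As a rough illustration of how rotary positional embeddings encode position, the sketch below rotates consecutive pairs of vector components by position-dependent angles. It is a minimal, generic version, not YuLan-Mini's code: the function names `rope` and `dot`, the `base` of 10000, and the toy vectors are all assumptions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each consecutive pair of components by a position-dependent angle."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 0.0, 0.5, -0.5]   # toy query vector
k = [0.3, 0.7, -0.2, 0.1]   # toy key vector

# Same relative offset (2) at very different absolute positions.
near = dot(rope(q, 5), rope(k, 3))
far = dot(rope(q, 105), rope(k, 103))
print(abs(near - far) < 1e-9)  # True
```

The attention score between a rotated query and key depends only on their relative offset, which is the property that makes this kind of encoding a common starting point for context-length extension.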

YuLan-Mini scored 64.00 on HumanEval in the zero-shot setting, 37.80 on MATH-500 in the four-shot setting, and 49.10 on MMLU in the five-shot setting. These results underscore its competitive edge, as the model’s performance is comparable to that of much larger and more resource-intensive counterparts. Extending the context length to 28K tokens allowed YuLan-Mini to excel in long-text scenarios while maintaining high accuracy on short-text tasks. This dual capability sets it apart from many existing models, which often sacrifice one for the other.

Key takeaways from the research include:

  • Using a meticulously designed data pipeline, YuLan-Mini reduces reliance on massive datasets while ensuring high-quality learning.
  • Techniques like systematic optimization and annealing prevent common issues like loss spikes and gradient explosions.
  • Extending the context length to 28,672 tokens enhances the model’s applicability to complex, long-text tasks.
  • Despite its modest computational requirements, YuLan-Mini achieves results comparable to those of much larger models, demonstrating the effectiveness of its design.
  • The integration of synthetic data improves training outcomes and reduces the need for proprietary datasets.

In conclusion, YuLan-Mini is a notable addition to the growing family of efficient LLMs. Its ability to deliver high performance with limited resources addresses critical barriers to AI accessibility. The research team’s focus on data efficiency and training stability highlights the potential for smaller-scale research to contribute significantly to the field. With just 1.08T tokens, YuLan-Mini sets a benchmark for resource-efficient LLMs.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Unveiling Privacy Risks in Machine Unlearning: Reconstruction Attacks on Deleted Data https://www.marktechpost.com/2024/12/27/unveiling-privacy-risks-in-machine-unlearning-reconstruction-attacks-on-deleted-data/ https://www.marktechpost.com/2024/12/27/unveiling-privacy-risks-in-machine-unlearning-reconstruction-attacks-on-deleted-data/#respond Sat, 28 Dec 2024 01:42:21 +0000 https://www.marktechpost.com/?p=66764 Machine unlearning is driven by the need for data autonomy, allowing individuals to request the removal of their data’s influence on machine learning models. This field complements data privacy efforts, which focus on preventing models from revealing sensitive information about the training data through attacks like membership inference or reconstruction. While differential privacy methods limit […]

The post Unveiling Privacy Risks in Machine Unlearning: Reconstruction Attacks on Deleted Data appeared first on MarkTechPost.


Machine unlearning is driven by the need for data autonomy, allowing individuals to request the removal of their data’s influence on machine learning models. This field complements data privacy efforts, which focus on preventing models from revealing sensitive information about the training data through attacks like membership inference or reconstruction. While differential privacy methods limit these risks, unlearning enables the deletion of data from a trained model, ensuring it behaves as if the data were never included in the first place. Achieving this efficiently, without retraining the entire model, has been a key focus, particularly for complex models like deep neural networks.

However, unlearning introduces new privacy risks. When adversaries compare a model’s parameters before and after data deletion, they can exploit the differences to reconstruct the deleted data, even for simple models like linear regression. This process leverages the gradient of the deleted sample and the expected Hessian derived from public data to approximate the changes caused by unlearning. The approach highlights a unique vulnerability where unlearning unintentionally exposes sensitive data. By extending existing techniques for gradient-based reconstruction attacks, this research reveals how unlearning can facilitate exact data reconstruction, emphasizing the importance of safeguards like differential privacy to mitigate these risks.

Researchers from AWS AI, the University of Pennsylvania, the University of Washington, Carnegie Mellon University, and Jump Trading reveal that data deletion in machine learning models, even simple ones, exposes individuals to high-accuracy reconstruction attacks. These attacks recover deleted data by exploiting differences in model parameters before and after deletion. The study demonstrates effective attacks on linear regression models using closed-form training algorithms and extends these methods to models with pre-trained embeddings and generic architectures via Newton’s method. Experiments on tabular and image datasets highlight significant privacy risks in retraining for unlearning without safeguards like differential privacy.

The researchers present an attack to reconstruct deleted user data from regularized linear regression models by analyzing parameter changes before and after deletion. The method leverages the relationship between model parameters and the removed sample, approximating key statistics using public data. The approach generalizes to models with fixed embeddings and extends to non-linear architectures using Newton’s approximation method. Experiments demonstrate its applicability to multiclass classification and label inference by estimating gradients and reconstructing deleted data. This highlights the vulnerability of models to privacy breaches, especially without safeguards, as the attack remains effective across various architectures and loss functions.
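The relationship between the parameter change and the removed sample can be checked on a toy ridge regression. For the closed-form solution with C = XᵀX + λI, deleting one sample (x, y) and retraining gives C(θ_after − θ_before) = x(xᵀθ_after − y), so the parameter difference, multiplied by C, points along the deleted x. The sketch below uses the exact C for clarity. According to the article, a realistic attacker would approximate this statistic from public data, which is not modelled here. All function names, the synthetic data, and λ = 1.0 are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(xs, ys, lam):
    """Closed-form ridge regression; returns (theta, C) with C = X^T X + lam * I."""
    d = len(xs[0])
    c = [[sum(x[i] * x[j] for x in xs) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(c, b), c


def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Synthetic data: 50 samples, 4 features, noisy linear targets.
rng = random.Random(0)
true_w = [1.0, -2.0, 0.5, 3.0]
xs = [[rng.gauss(0, 1) for _ in true_w] for _ in range(50)]
ys = [sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0, 0.1) for x in xs]

theta_before, c_full = ridge_fit(xs, ys, lam=1.0)
theta_after, _ = ridge_fit(xs[1:], ys[1:], lam=1.0)  # sample 0 deleted, full retrain

# C @ (theta_after - theta_before) is a scalar multiple of the deleted features.
delta = [a - b for a, b in zip(theta_after, theta_before)]
recovered = [sum(c_full[i][j] * delta[j] for j in range(4)) for i in range(4)]
similarity = abs(cosine(recovered, xs[0]))
print(similarity > 0.999)  # True
```

The recovered vector matches the deleted sample only up to scale and sign, which is why cosine similarity is the natural measure here.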

The study evaluates the attack across diverse datasets for classification and regression tasks, including tabular and image data. Using full retraining, the researchers compare model parameters before and after a single sample’s deletion. The method leverages public data from the same distribution and needs no knowledge of the deleted sample. Against baselines like “Avg” (average of public samples) and “MaxDiff” (maximizing parameter change), the attack consistently performs better, achieving higher cosine similarity with deleted samples. Tested on MNIST, CIFAR10, and ACS income data, the approach reconstructs deleted samples effectively across various models, emphasizing vulnerabilities in machine learning systems and the need for privacy safeguards.

In conclusion, the work introduces a reconstruction attack capable of recovering deleted data from simple machine-learning models with high accuracy. The attack achieves near-perfect results for linear regression and performs effectively on models using embeddings or optimizing different loss functions. The findings highlight the privacy risks of data deletion and machine unlearning and emphasize the need for techniques like differential privacy. Counterintuitively, data deletion updates can increase vulnerability to reconstruction attacks, even in basic models, exposing sensitive data. Through extensive experiments on diverse datasets, the study underscores the significant privacy risks posed by data deletion requests, even in seemingly low-risk model settings.


Check out the Paper. All credit for this research goes to the researchers of this project.

Meet SemiKong: The World’s First Open-Source Semiconductor-Focused LLM https://www.marktechpost.com/2024/12/27/meet-semikong-the-worlds-first-open-source-semiconductor-focused-llm/ https://www.marktechpost.com/2024/12/27/meet-semikong-the-worlds-first-open-source-semiconductor-focused-llm/#respond Fri, 27 Dec 2024 20:16:13 +0000 https://www.marktechpost.com/?p=66760 The semiconductor industry enables advancements in consumer electronics, automotive systems, and cutting-edge computing technologies. The production of semiconductors involves sophisticated processes that demand unparalleled precision and expertise. These processes include chip design, manufacturing, testing, and optimization, each stage requiring deep domain knowledge. The field has traditionally depended on seasoned engineers whose experience has been built […]

The post Meet SemiKong: The World’s First Open-Source Semiconductor-Focused LLM appeared first on MarkTechPost.


The semiconductor industry enables advancements in consumer electronics, automotive systems, and cutting-edge computing technologies. The production of semiconductors involves sophisticated processes that demand unparalleled precision and expertise. These processes include chip design, manufacturing, testing, and optimization, each stage requiring deep domain knowledge. The field has traditionally depended on seasoned engineers whose experience has been built over decades. However, the industry faces a significant challenge: the rapid retirement of veteran experts, creating a knowledge gap that threatens innovation and efficiency. This growing concern has prompted companies to explore AI as a viable solution for capturing, scaling, and leveraging expert knowledge. Also, the cost and time associated with chip design and manufacturing must be minimized to meet market demands. These challenges highlight the limitations of traditional methods and emphasize the necessity of tailored AI solutions.

Existing approaches to these challenges include generalized AI models and basic automation tools. While these methods have been beneficial in analyzing data and improving decision-making, they often fall short in addressing the unique complexities of the semiconductor industry. General-purpose AI tools, for instance, lack the domain-specific understanding required to analyze intricate manufacturing processes effectively. As a result, companies cannot fully bridge the gap between theoretical AI capabilities and practical industry needs, leaving room for specialized solutions to transform the field.

Researchers from Meta, AITOMATIC, and other collaborators under the Foundation Models workgroup of the AI Alliance have introduced SemiKong. SemiKong represents the world’s first semiconductor-focused large language model (LLM), designed using the Llama 3.1 platform. This model was fine-tuned with extensive semiconductor-specific datasets, including industry documents, research papers, and anonymized operational data. Unlike generic AI systems, SemiKong is tailored to understand semiconductor processes’ unique terminology and requirements. By integrating this model with the AITOMATIC Domain-Expert Agents (DXAs), companies can effectively leverage AI tools to address specific industry challenges. These innovations aim to reduce costs, accelerate development timelines, and promote collaboration across the semiconductor sector.

The technology behind SemiKong is built on advanced AI and neurosymbolic architectures. AITOMATIC’s DXAs operate through a structured three-phase lifecycle: 

  1. Capturing domain expertise
  2. Training the model with synthetic and structured data
  3. Applying the resulting system in real-world scenarios 

SemiKong plays a central role in this ecosystem, acting as the “brain” for complex reasoning and decision-making tasks. Lightweight model versions, such as Llama 3.2, complement the main system by enabling faster data access and analysis in resource-constrained environments. These models integrate seamlessly with manufacturing systems and IoT platforms, allowing companies to optimize workflows, predict maintenance needs, and improve decision-making.

SemiKong has outperformed several closed-source language models in generating semiconductor-specific content and understanding complex processes. Reported benefits include a 20-30% reduction in time to market for new chip designs and a 15-25% improvement in first-time-right manufacturing outcomes. These tools have also improved the onboarding process for new engineers, accelerating their learning curve by 40-50%. In one example, SemiKong-enabled DXAs reduced the time required for etching recipe formulation from hours to minutes.

The key takeaways from the research underscore the significance of SemiKong and DXAs in the semiconductor field:

  1. DXAs effectively capture and structure the knowledge of veteran engineers, ensuring that critical expertise is preserved and scaled for future use.  
  2. SemiKong reduces chip design time-to-market by up to 30%, significantly cutting costs and improving operational efficiency.  
  3. By simplifying and expediting the onboarding process, DXAs help new engineers become productive faster, reducing the industry’s reliance on seasoned experts.  
  4. Integrating IoT platforms enables real-time parameter calibration and predictive maintenance, enhancing equipment performance and reliability.

In conclusion, the research highlights a pioneering solution to one of the semiconductor industry’s most pressing challenges: the loss of critical domain expertise. By introducing SemiKong and DXAs, the researchers have provided a comprehensive framework that preserves knowledge and enhances productivity and innovation. These advancements can potentially reshape semiconductor manufacturing, offering scalable, cost-effective solutions to address the field’s complexities. Integrating AI tools like SemiKong is crucial for a more efficient and resilient semiconductor industry.


Check out the Details and GitHub Page. All credit for this research goes to the researchers of this project.

Google DeepMind Introduces Differentiable Cache Augmentation: A Coprocessor-Enhanced Approach to Boost LLM Reasoning and Efficiency https://www.marktechpost.com/2024/12/27/google-deepmind-introduces-differentiable-cache-augmentation-a-coprocessor-enhanced-approach-to-boost-llm-reasoning-and-efficiency/ https://www.marktechpost.com/2024/12/27/google-deepmind-introduces-differentiable-cache-augmentation-a-coprocessor-enhanced-approach-to-boost-llm-reasoning-and-efficiency/#respond Fri, 27 Dec 2024 20:02:30 +0000 https://www.marktechpost.com/?p=66757 Large language models (LLMs) are integral to solving complex problems across language processing, mathematics, and reasoning domains. Enhancements in computational techniques focus on enabling LLMs to process data more effectively, generating more accurate and contextually relevant responses. As these models become complex, researchers strive to develop methods to operate within fixed computational budgets without sacrificing […]

The post Google DeepMind Introduces Differentiable Cache Augmentation: A Coprocessor-Enhanced Approach to Boost LLM Reasoning and Efficiency appeared first on MarkTechPost.


Large language models (LLMs) are integral to solving complex problems across language processing, mathematics, and reasoning domains. Enhancements in computational techniques focus on enabling LLMs to process data more effectively and generate more accurate, contextually relevant responses. As these models become more complex, researchers strive to develop methods that operate within fixed computational budgets without sacrificing performance.

One major challenge in optimizing LLMs is their inability to effectively reason across multiple tasks or perform computations beyond their pre-trained architecture. Current methods for improving model performance involve generating intermediate steps during task processing, often at the cost of increased latency and computational inefficiency. This limitation hampers their ability to perform complex reasoning tasks, particularly those requiring longer dependencies or higher accuracy in predictions.

Researchers have explored methods like Chain-of-Thought (CoT) prompting, which guides LLMs to reason step by step. While effective in some cases, CoT relies on sequential processing of intermediate reasoning steps, leading to slower computation times. KV-cache compression has also been proposed to reduce memory usage but does little to improve reasoning capabilities. These approaches, though valuable, underscore the need for a method that combines efficiency with enhanced reasoning ability.

Researchers from Google DeepMind have introduced a method called Differentiable Cache Augmentation. This technique uses a trained coprocessor to augment the LLM’s key-value (kv) cache with latent embeddings, enriching the model’s internal memory. The key innovation lies in keeping the base LLM frozen while training the coprocessor, which operates asynchronously. The researchers designed this method to enhance reasoning capabilities without increasing the computational burden during task execution.

The methodology revolves around a three-stage process. First, the frozen LLM generates a kv-cache from an input sequence, encapsulating its internal representation. This kv-cache is passed to the coprocessor, which processes it with additional trainable soft tokens. Not tied to specific words, these tokens act as abstract prompts for generating latent embeddings. Once processed, the augmented kv-cache is fed back into the LLM, enabling it to generate contextually enriched outputs. This asynchronous operation ensures the coprocessor’s enhancements are applied efficiently without delaying the LLM’s primary functions. Training the coprocessor is conducted using a language modeling loss, focusing solely on its parameters while preserving the integrity of the frozen LLM. This targeted approach allows for scalable and effective optimization.
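The data flow of the three stages can be sketched with toy stand-ins. This is not DeepMind's implementation: `frozen_llm_kv_cache`, `Coprocessor`, and `augment` are hypothetical names, the cache is a plain list of small vectors, and the mixing rule inside the coprocessor is invented purely to show which parts are frozen and which would be trainable.

```python
import random
from dataclasses import dataclass, field

DIM = 4  # toy hidden size, chosen only for illustration


def frozen_llm_kv_cache(token_ids):
    """Stage 1 stand-in: the frozen LLM yields one cache vector per input token."""
    cache = []
    for t in token_ids:
        rng = random.Random(t)  # deterministic per token id, never trained
        cache.append([rng.uniform(-1, 1) for _ in range(DIM)])
    return cache


@dataclass
class Coprocessor:
    """Hypothetical coprocessor; only soft_tokens would receive gradients."""
    num_latents: int
    soft_tokens: list = field(default_factory=list)

    def __post_init__(self):
        rng = random.Random(0)
        self.soft_tokens = [[rng.uniform(-1, 1) for _ in range(DIM)]
                            for _ in range(self.num_latents)]

    def latent_embeddings(self, kv_cache):
        # Stage 2 (toy rule): each latent is a soft token shifted by the cache mean.
        mean = [sum(col) / len(kv_cache) for col in zip(*kv_cache)]
        return [[s + m for s, m in zip(tok, mean)] for tok in self.soft_tokens]


def augment(kv_cache, coprocessor):
    """Stage 3: append the latents; original cache entries are left untouched."""
    return kv_cache + coprocessor.latent_embeddings(kv_cache)


cache = frozen_llm_kv_cache([11, 42, 7])
augmented = augment(cache, Coprocessor(num_latents=2))
print(len(cache), len(augmented))  # 3 5
```

In a real system the coprocessor would be a trained network and the augmented cache would be consumed by the frozen LLM during decoding. The sketch only shows that the base cache is preserved and the latents are added alongside it.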

Performance evaluations demonstrated significant improvements. The method was tested on the Gemma-2 2B model across various benchmarks. For instance, on the reasoning-intensive GSM8K dataset, accuracy improved by 10.05% when 64 latent embeddings were used. Similarly, MMLU performance increased by 4.70% under the same configuration. These gains indicate that the augmented model performs better on complex reasoning tasks. Perplexity reductions were also observed at multiple token positions. For example, perplexity decreased by 3.94% at position one and 1.20% at position 32 when 64 latent embeddings were applied, indicating improved prediction over longer sequences.

Further analysis showed that the augmentation’s effectiveness scales with the number of latent embeddings. For GSM8K, accuracy rose incrementally with additional embeddings, from 1.29% with four embeddings to the peak improvement of 10.05% with 64 embeddings. Similar trends were observed in other benchmarks like ARC and MATH, indicating the broader applicability of this method. The researchers confirmed that their approach consistently outperformed baseline models without task-specific fine-tuning, demonstrating its robustness and adaptability.

This work represents a significant step forward in enhancing LLMs’ reasoning capabilities. By introducing an external coprocessor to augment the kv-cache, the researchers from Google DeepMind have created a method that improves performance while maintaining computational efficiency. The results highlight the potential for LLMs to tackle more complex tasks, paving the way for further exploration into modular enhancements and scalable reasoning systems. This breakthrough underscores the importance of continual innovation in AI to meet the growing demands of reasoning-intensive applications.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 60k+ ML SubReddit.


The post Google DeepMind Introduces Differentiable Cache Augmentation: A Coprocessor-Enhanced Approach to Boost LLM Reasoning and Efficiency appeared first on MarkTechPost.

]]>
https://www.marktechpost.com/2024/12/27/google-deepmind-introduces-differentiable-cache-augmentation-a-coprocessor-enhanced-approach-to-boost-llm-reasoning-and-efficiency/feed/ 0 66757
AWS Researchers Propose LEDEX: A Machine Learning Training Framework that Significantly Improves the Self-Debugging Capability of LLMs https://www.marktechpost.com/2024/12/26/aws-researchers-propose-ledex-a-machine-learning-training-framework-that-significantly-improves-the-self-debugging-capability-of-llms/ https://www.marktechpost.com/2024/12/26/aws-researchers-propose-ledex-a-machine-learning-training-framework-that-significantly-improves-the-self-debugging-capability-of-llms/#respond Fri, 27 Dec 2024 07:26:34 +0000 https://www.marktechpost.com/?p=66754 Code generation using Large Language Models (LLMs) has emerged as a critical research area, but generating accurate code for complex problems in a single attempt remains a significant challenge. Even skilled human developers often require multiple iterations of trial-and-error debugging to solve difficult programming problems. While LLMs have demonstrated impressive code generation capabilities, their self-debugging […]

The post AWS Researchers Propose LEDEX: A Machine Learning Training Framework that Significantly Improves the Self-Debugging Capability of LLMs appeared first on MarkTechPost.

]]>

Code generation using Large Language Models (LLMs) has emerged as a critical research area, but generating accurate code for complex problems in a single attempt remains a significant challenge. Even skilled human developers often require multiple iterations of trial-and-error debugging to solve difficult programming problems. While LLMs have demonstrated impressive code generation capabilities, their self-debugging ability to analyze incorrect code and make necessary corrections is still limited. This limitation is evident in open-source models like StarCoder and CodeLlama, which show significantly lower self-refinement performance compared to models like GPT-3.5-Turbo.

Existing approaches to improve code generation and debugging capabilities in LLMs have followed several distinct paths. LLMs have shown significant success across various code-related tasks, including code generation, bug fixing, program testing, and fuzzing. These models use extensive pre-training on vast datasets to understand patterns and generate contextually relevant code. However, most existing work has primarily focused on single-round generation rather than iterative improvement. Other methods like ILF, CYCLE, and Self-Edit have explored supervised fine-tuning approaches while solutions like OpenCodeInterpreter and EURUS have attempted to create high-quality multi-turn interaction datasets using advanced models for fine-tuning purposes.

Researchers from Purdue University, AWS AI Labs, and the University of Virginia have proposed LEDEX (learning to self-debug and explain code), a novel training framework designed to enhance LLMs’ self-debugging capabilities. The framework builds on the observation that a sequential process of explaining incorrect code and then refining it helps LLMs analyze and improve faulty code more effectively. LEDEX implements an automated pipeline to collect high-quality datasets for code explanation and refinement. Moreover, it combines supervised fine-tuning (SFT) and reinforcement learning (RL) approaches, utilizing successful and failed trajectories with a specialized reward system that evaluates code explanation and refinement quality.

LEDEX employs a comprehensive architecture containing data collection, verification, and multi-stage training processes. The framework begins by collecting code explanation and refinement datasets through queries to pre-trained or instruction-tuned models. These responses undergo rigorous execution-based verification to filter and maintain only high-quality explanation and refinement data. The collected dataset then serves as input for supervised fine-tuning, which significantly enhances the model’s capabilities in bug explanation and code refinement. LEDEX uses programming problems from MBPP, APPS, and CodeContests as training data. To expand the dataset of incorrect solutions, the framework prompts pre-trained LLMs like StarCoder and CodeLlama with 3-shot examples to generate 20 solutions per problem.
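The execution-based verification step can be pictured with a small sketch. This is an illustration under assumptions, not LEDEX's pipeline: the record layout (`explanation`, `refined_code`), the helper names, and the toy `add` problem are all hypothetical. The idea shown is simply that a refinement is kept only if the refined code passes the problem's test cases.

```python
def passes_tests(source, func_name, tests):
    """Execute candidate code and report whether it passes every test case."""
    namespace = {}
    try:
        exec(source, namespace)  # a real pipeline should sandbox untrusted code
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def keep_verified(candidates, func_name, tests):
    """Keep only explanation/refinement pairs whose refined code passes the tests."""
    return [c for c in candidates
            if passes_tests(c["refined_code"], func_name, tests)]


# Hypothetical problem: implement add(a, b).
tests = [((2, 3), 5), ((-1, 1), 0)]
candidates = [
    {"explanation": "The code subtracts instead of adding.",
     "refined_code": "def add(a, b):\n    return a + b\n"},
    {"explanation": "The function name is wrong.",
     "refined_code": "def add(a, b):\n    return a - b\n"},
]
verified = keep_verified(candidates, "add", tests)
print(len(verified))
```

Only the first candidate survives the filter, so only its explanation and refinement would enter the fine-tuning data in this toy setup.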

LEDEX is evaluated using three model backbones: StarCoder-15B, CodeLlama-7B, and CodeLlama-13B, with initial training data collected from GPT-3.5-Turbo. The SFT phase shows significant improvements, achieving up to a 15.92% increase in pass@1 and 9.30% in pass@10 metrics across four benchmark datasets. The subsequent RL phase further enhances performance with additional improvements of up to 3.54% in pass@1 and 2.55% in pass@10. Notably, LEDEX’s model-agnostic nature is shown through experiments with CodeLlama-7B, which achieve substantial improvements (8.25% in pass@1 and 2.14% in pass@10) even when trained on data collected from CodeLlama-34B or itself, proving its effectiveness independent of GPT-3.5-Turbo.
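For readers unfamiliar with pass@k, the sketch below uses the widely cited unbiased estimator, which computes the probability that at least one of k samples drawn from n generated solutions is correct when c of the n are correct. The article does not say which estimator the LEDEX authors used, so treat this as an assumption; the sample counts are made up for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct solution
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 20 generated solutions, 5 of them correct.
print(pass_at_k(n=20, c=5, k=1))
print(round(pass_at_k(n=20, c=5, k=10), 4))
```

With k = 1 the estimate reduces to the plain fraction of correct samples, which is why pass@1 gains are usually the harder ones to obtain.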

In conclusion, researchers introduced LEDEX, a comprehensive and scalable framework that combines automated data collection, verification processes, SFT, and RL with innovative reward designs to significantly improve LLMs’ ability to identify and correct code errors. The framework’s model-agnostic nature is evidenced by its successful implementation with GPT-3.5-Turbo and CodeLlama, while its rigorous data verification process ensures the quality of code explanations and refinements. Human evaluations further validate the framework’s effectiveness, confirming that LEDEX-trained models produce superior code explanations that effectively assist developers in understanding and resolving code issues.



]]>
https://www.marktechpost.com/2024/12/26/aws-researchers-propose-ledex-a-machine-learning-training-framework-that-significantly-improves-the-self-debugging-capability-of-llms/feed/ 0 66754
DeepSeek-AI Just Released DeepSeek-V3: A Strong Mixture-of-Experts (MoE) Language Model with 671B Total Parameters with 37B Activated for Each Token https://www.marktechpost.com/2024/12/26/deepseek-ai-just-released-deepseek-v3-a-strong-mixture-of-experts-moe-language-model-with-671b-total-parameters-with-37b-activated-for-each-token/ https://www.marktechpost.com/2024/12/26/deepseek-ai-just-released-deepseek-v3-a-strong-mixture-of-experts-moe-language-model-with-671b-total-parameters-with-37b-activated-for-each-token/#respond Fri, 27 Dec 2024 04:32:12 +0000 https://www.marktechpost.com/?p=66743 The field of Natural Language Processing (NLP) has made significant strides with the development of large-scale language models (LLMs). However, this progress has brought its own set of challenges. Training and inference require substantial computational resources, the availability of diverse, high-quality datasets is critical, and achieving balanced utilization in Mixture-of-Experts (MoE) architectures remains complex. These […]

The post DeepSeek-AI Just Released DeepSeek-V3: A Strong Mixture-of-Experts (MoE) Language Model with 671B Total Parameters with 37B Activated for Each Token appeared first on MarkTechPost.

]]>

The field of Natural Language Processing (NLP) has made significant strides with the development of large-scale language models (LLMs). However, this progress has brought its own set of challenges. Training and inference require substantial computational resources, the availability of diverse, high-quality datasets is critical, and achieving balanced utilization in Mixture-of-Experts (MoE) architectures remains complex. These factors contribute to inefficiencies and increased costs, posing obstacles to scaling open-source models to match proprietary counterparts. Moreover, ensuring robustness and stability during training is an ongoing issue, as even minor instabilities can disrupt performance and necessitate costly interventions.

DeepSeek-AI just gave a Christmas present to the AI world by releasing DeepSeek-V3, a Mixture-of-Experts (MoE) language model featuring 671 billion parameters, with 37 billion activated per token. The model builds on proven architectures such as Multi-Head Latent Attention (MLA) and DeepSeekMoE, which were refined in earlier versions. DeepSeek-V3 has been trained on an extensive dataset of 14.8 trillion high-quality tokens, ensuring a broad and diverse knowledge base. Importantly, the model is fully open-source, with accessible models, papers, and training frameworks for the research community to explore.

Technical Details and Benefits

DeepSeek-V3 incorporates several innovations aimed at addressing long-standing challenges in the field. Its auxiliary-loss-free load balancing strategy ensures efficient distribution of computational loads across experts while maintaining model performance. The adoption of a multi-token prediction training objective enhances data efficiency and facilitates faster inference through speculative decoding. Additionally, FP8 mixed precision training improves computational efficiency by reducing GPU memory usage without sacrificing accuracy. The DualPipe algorithm further minimizes pipeline bubbles by overlapping computation and communication phases, reducing all-to-all communication overhead. These advancements enable DeepSeek-V3 to process 60 tokens per second during inference—a significant improvement over its predecessor.
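One way to picture load balancing without an auxiliary loss is a per-expert bias that influences only which experts are selected, and that is nudged down for overloaded experts and up for underloaded ones. The article does not spell out the mechanism, so the update rule, step size, score distribution, and all function names below are assumptions made for illustration.

```python
import random


def route(scores, bias, top_k):
    """Pick top_k experts by biased score; weight them by the unbiased scores."""
    ranked = sorted(range(len(scores)),
                    key=lambda e: scores[e] + bias[e], reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}


def update_bias(bias, load, step):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, expert_tokens in zip(bias, load):
        if expert_tokens > mean_load:
            new_bias.append(b - step)
        elif expert_tokens < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


def expert_load(batch, bias, top_k):
    """Count how many tokens each expert receives for one batch."""
    load = [0] * len(bias)
    for scores in batch:
        for expert in route(scores, bias, top_k):
            load[expert] += 1
    return load


rng = random.Random(0)
num_experts, top_k = 4, 2
# Hypothetical positive affinity scores, skewed so that expert 0 is favoured.
batch = [[rng.uniform(0.1, 1.0) + (0.5 if e == 0 else 0.0)
          for e in range(num_experts)] for _ in range(64)]
bias = [0.0] * num_experts
print("load before:", expert_load(batch, bias, top_k))
for _ in range(50):
    bias = update_bias(bias, expert_load(batch, bias, top_k), step=0.02)
print("load after: ", expert_load(batch, bias, top_k))
```

Comparing the two printed load vectors shows whether the skew toward expert 0 shrinks. Note that the mixing weights still come from the unbiased scores, so the bias steers routing without adding a term to the training loss.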

Performance Insights and Results

DeepSeek-V3 has been rigorously evaluated across multiple benchmarks, demonstrating strong performance. On educational datasets like MMLU and MMLU-Pro, it achieved scores of 88.5 and 75.9, respectively, outperforming other open-source models. In mathematical reasoning tasks, it set new standards with a score of 90.2 on MATH-500. The model also performed exceptionally in coding benchmarks such as LiveCodeBench. Despite these achievements, the training cost was kept relatively low at $5.576 million, requiring only 2.788 million H800 GPU hours. These results highlight DeepSeek-V3’s efficiency and its potential to make high-performance LLMs more accessible.
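The headline numbers can be cross-checked with simple arithmetic. The snippet below uses only figures quoted in this article; the per-hour rate is not stated in the text and is derived here from the two reported totals.

```python
gpu_hours = 2.788e6          # H800 GPU hours reported for training
total_cost_usd = 5.576e6     # reported training cost in US dollars
total_params_b = 671         # total parameters, in billions
active_params_b = 37         # parameters activated per token, in billions

implied_rate = total_cost_usd / gpu_hours            # USD per GPU hour
active_fraction = active_params_b / total_params_b   # share of weights used per token

print(f"implied rate: ${implied_rate:.2f} per GPU hour")
print(f"active share: {active_fraction:.1%} of parameters per token")
```

The two totals imply a rate of $2 per GPU hour, and roughly 5.5% of the model's parameters are active for any given token, which is the main reason a 671B-parameter MoE model can be served at a fraction of the compute of a dense model of the same size.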

Conclusion

DeepSeek-V3 represents a meaningful advancement in open-source NLP research. By tackling the computational and architectural challenges associated with large-scale language models, it establishes a new benchmark for efficiency and performance. Its innovative training methods, scalable architecture, and strong evaluation results make it a competitive alternative to proprietary models. DeepSeek-AI’s commitment to open-source development ensures that the broader research community can benefit from its advancements.


Check out the Paper, GitHub Page, and Model on Hugging Face. All credit for this research goes to the researchers of this project.

]]>
https://www.marktechpost.com/2024/12/26/deepseek-ai-just-released-deepseek-v3-a-strong-mixture-of-experts-moe-language-model-with-671b-total-parameters-with-37b-activated-for-each-token/feed/ 0 66743
A Comprehensive Analytical Framework for Mathematical Reasoning in Multimodal Large Language Models https://www.marktechpost.com/2024/12/26/a-comprehensive-analytical-framework-for-mathematical-reasoning-in-multimodal-large-language-models/ https://www.marktechpost.com/2024/12/26/a-comprehensive-analytical-framework-for-mathematical-reasoning-in-multimodal-large-language-models/#respond Fri, 27 Dec 2024 00:42:39 +0000 https://www.marktechpost.com/?p=66737 Mathematical reasoning has emerged as a critical frontier in artificial intelligence, particularly in developing Large Language Models (LLMs) capable of performing complex problem-solving tasks. While traditional mathematical reasoning focuses on text-based inputs, modern applications increasingly involve multimodal elements including diagrams, graphs, and equations. This presents significant challenges for existing systems in processing and integrating information […]

The post A Comprehensive Analytical Framework for Mathematical Reasoning in Multimodal Large Language Models appeared first on MarkTechPost.

]]>

Mathematical reasoning has emerged as a critical frontier in artificial intelligence, particularly in developing Large Language Models (LLMs) capable of performing complex problem-solving tasks. While traditional mathematical reasoning focuses on text-based inputs, modern applications increasingly involve multimodal elements including diagrams, graphs, and equations. This presents significant challenges for existing systems in processing and integrating information across different modalities. The complexities extend beyond simple text comprehension to include deep semantic understanding, context preservation across modalities, and the ability to perform complex reasoning tasks that combine visual and textual elements.

Since 2021, there has been a steady increase in math-specific Large Language Models (MathLLMs), each addressing different aspects of mathematical problem-solving. Early models like GPT-f and Minerva established foundational capabilities in mathematical reasoning, while Hypertree Proof Search and Jiuzhang 1.0 advanced theorem proving and question understanding. The field further diversified in 2023 by introducing multimodal support through models like SkyworkMath, followed by specialized developments in 2024 focusing on mathematical instruction (Qwen2.5-Math) and proof capabilities (DeepSeek-Proof). Despite these advancements, existing approaches focus too narrowly on specific mathematical domains or fail to address the challenges of multimodal mathematical reasoning.

Researchers from HKUST (GZ), HKUST, NTU, and Squirrel AI have proposed a comprehensive analytical framework to understand the landscape of mathematical reasoning in the context of multimodal large language models (MLLMs). The researchers reviewed over 200 research papers published since 2021, focusing on the emergence and evolution of Math-LLMs in multimodal environments. This systematic approach examines the multimodal mathematical reasoning pipeline while investigating the role of both traditional LLMs and MLLMs. The research particularly emphasizes the identification and analysis of five major challenges that affect the achievement of artificial general intelligence in mathematical reasoning.

The basic architecture focuses on problem-solving scenarios where the input consists of problem statements presented either in pure textual format or accompanied by visual elements such as figures and diagrams. The system processes these inputs to generate solutions in numerical or symbolic formats. While English dominates the available benchmarks, some datasets exist in other languages like Chinese and Romanian. Dataset sizes vary significantly, ranging from compact collections like QRData with 411 questions to extensive repositories like OpenMathInstruct-1 containing 1.8 million problem-solution pairs.

The evaluation of mathematical reasoning capabilities in MLLMs uses two primary approaches: discriminative and generative evaluation methods. In discriminative evaluation, models are assessed on their ability to correctly classify or select answers, using advanced metrics such as performance drop rate (PDR) and specialized metrics such as error step accuracy. The generative evaluation approach focuses on the model’s capacity to produce detailed explanations and step-by-step solutions. Notable frameworks like MathVerse utilize GPT-4 to evaluate the reasoning process, while CHAMP implements a solution evaluation pipeline where GPT-4 serves as a grader comparing generated answers against ground truth solutions.

Here are the five key challenges in mathematical reasoning with MLLMs:

  • Visual Reasoning Limitations: Current models struggle with complex visual elements like 3D geometry and irregular tables.
  • Limited Multimodal Integration: While models handle text and vision, they cannot process other modalities like audio explanations or interactive simulations.
  • Domain Generalization Issues: Models that excel in one mathematical domain often fail to perform well in others, limiting their practical utility.
  • Error Detection and Feedback: MLLMs currently lack robust mechanisms to detect, categorize, and correct mathematical errors effectively.
  • Educational Integration Challenges: Current systems don’t adequately account for real-world educational elements like handwritten notes and draft work.

In conclusion, the researchers presented a comprehensive analysis of mathematical reasoning in MLLMs that reveals significant progress and persistent challenges in the field. The emergence of specialized Math-LLMs has shown substantial advancement in handling complex mathematical tasks, particularly in multimodal environments. Moreover, addressing the above five challenges is crucial for developing more sophisticated AI systems capable of human-like mathematical reasoning. The insights from this analysis provide a roadmap for future research directions, highlighting the importance of more robust and versatile models that can effectively handle the complexities of mathematical reasoning.



]]>
https://www.marktechpost.com/2024/12/26/a-comprehensive-analytical-framework-for-mathematical-reasoning-in-multimodal-large-language-models/feed/ 0 66737
This Research from Amazon Explores Step-Skipping Frameworks: Advancing Efficiency and Human-Like Reasoning in Language Models https://www.marktechpost.com/2024/12/26/this-research-from-amazon-explores-step-skipping-frameworks-advancing-efficiency-and-human-like-reasoning-in-language-models/ https://www.marktechpost.com/2024/12/26/this-research-from-amazon-explores-step-skipping-frameworks-advancing-efficiency-and-human-like-reasoning-in-language-models/#respond Fri, 27 Dec 2024 00:15:09 +0000 https://www.marktechpost.com/?p=66734 The pursuit of enhancing artificial intelligence (AI) capabilities is significantly influenced by human intelligence, particularly in reasoning and problem-solving. Researchers aim to create language models that emulate human-like behaviors, such as optimizing reasoning processes. This involves exploring how models can transition from detailed, step-by-step solutions to more efficient methods by selectively skipping steps, a hallmark […]

The post This Research from Amazon Explores Step-Skipping Frameworks: Advancing Efficiency and Human-Like Reasoning in Language Models appeared first on MarkTechPost.

]]>

The pursuit of enhancing artificial intelligence (AI) capabilities is significantly influenced by human intelligence, particularly in reasoning and problem-solving. Researchers aim to create language models that emulate human-like behaviors, such as optimizing reasoning processes. This involves exploring how models can transition from detailed, step-by-step solutions to more efficient methods by selectively skipping steps, a hallmark of human expertise. These advancements contribute to achieving artificial general intelligence (AGI) with improved efficiency and task-solving capabilities.

A key challenge in AI is the models’ inability to replicate humans’ selective approach to skipping redundant steps during problem-solving. Humans develop this skill through practice, which allows them to reduce cognitive effort and focus on more complex aspects of a problem. Current language models lack this ability, adhering strictly to detailed processes even when simpler, equally effective solutions exist. Developing models incorporating such step-skipping behavior can enhance their efficiency and generalization abilities across various tasks.

Traditional training methods for language models involve step-by-step reasoning, relying on detailed datasets. Techniques such as chain-of-thought prompting encourage sequential solutions but do not address step skipping. As a result, while these models excel in solving problems comprehensively, they fail to demonstrate the efficiency observed in human experts. This limitation presents an opportunity to refine model training approaches to integrate more flexible reasoning capabilities.

Researchers from institutions like Fudan University, UC Santa Barbara, Shanghai AI Laboratory, Westlake University, and Amazon AWS AI developed a novel framework to address this. This approach introduces controlled training environments where models are guided to generate solutions with fewer steps without compromising accuracy. The method emphasizes training models on datasets combining complete and skipped reasoning paths, enabling them to learn efficient and accurate shortcuts.

The training framework comprises two main phases: initialization and iteration. The model is trained on a dataset containing comprehensive, step-by-step reasoning solutions during initialization. This establishes a foundational understanding of problem-solving. In the iteration phase, models are guided to generate shorter reasoning paths by reducing the number of steps in their responses. These shorter paths, verified for accuracy, are mixed with full-step solutions to create expanded datasets. Each iteration refines the model’s ability to identify and skip redundant steps, gradually improving efficiency. For instance, in tasks involving algebraic analogies, multi-digit arithmetic, and directional reasoning, the researchers generated datasets with detailed steps and selectively omitted certain steps to simulate human-like efficiency. These iterations allow the models to self-generate skipping data, refining their reasoning processes.
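The initialization and iteration phases can be mimicked with a toy loop. This is a sketch under heavy assumptions: the "model" is replaced by a random stub (`propose_shorter`) that drops one intermediate step and sometimes makes a mistake, the task is summing a list of numbers, and no actual training happens. It only illustrates the data flow of proposing shorter paths, verifying the final answer, and mixing verified shortcuts with the full-step data.

```python
import random


def full_steps(numbers):
    """Full reasoning path: one running total per number added."""
    steps, total = [], 0
    for n in numbers:
        total += n
        steps.append(total)
    return steps


def propose_shorter(steps, rng):
    """Stand-in for the model: drop one intermediate step, sometimes with an error."""
    if len(steps) < 2:
        return list(steps)
    drop = rng.randrange(len(steps) - 1)      # never drop the final answer
    shorter = steps[:drop] + steps[drop + 1:]
    if rng.random() < 0.3:                    # simulate an incorrect shortcut
        shorter[-1] += 1
    return shorter


def iterate(problems, rounds, seed=0):
    """Build a mixed dataset of full-step and verified skipped-step solutions."""
    rng = random.Random(seed)
    dataset = [(p, full_steps(p)) for p in problems]       # initialization phase
    current = {tuple(p): full_steps(p) for p in problems}
    for _ in range(rounds):                                # iteration phase
        for p in problems:
            best = current[tuple(p)]
            candidate = propose_shorter(best, rng)
            if len(candidate) < len(best) and candidate[-1] == sum(p):
                current[tuple(p)] = candidate              # keep verified shortcuts only
                dataset.append((p, candidate))
    return dataset


problems = [[3, 4, 5, 6], [10, 20, 30], [1, 2, 3, 4, 5]]
mixed = iterate(problems, rounds=3)
for problem, steps in mixed:
    print(problem, "->", steps)
```

Every entry in the resulting dataset ends in the correct answer, and entries added in later rounds are strictly shorter than the path they were derived from.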

Empirical evaluations demonstrated the effectiveness of this approach across three tasks: algebraic analogies, multi-digit addition, and directional reasoning. Results highlighted that step-skipping enhanced both efficiency and generalization. For algebraic analogies, models achieved an accuracy increase of 4.76% in out-of-domain tasks, with a marked reduction in the number of reasoning steps. In multi-digit addition, performance improved by 13.91% in easier out-of-domain scenarios and by 4.75% in harder scenarios, underscoring the benefits of skipped reasoning steps. Similarly, directional reasoning tasks improved, with accuracy gains of up to 9.2% on challenging datasets. These results demonstrate that integrating skipped-step reasoning does not compromise task performance but enables models to solve problems more effectively and efficiently.

Further, the iterative training method showed that models could learn to balance accuracy and efficiency. Each iteration decreased the number of steps taken while maintaining or improving accuracy. By the fifth iteration, models consistently outperformed those trained solely on full-step datasets. This iterative refinement process also provided insights into the models’ ability to generalize to out-of-domain scenarios, suggesting that training on mixed datasets is instrumental in enhancing task-solving capabilities.

The study presents a significant advancement in equipping language models with human-like reasoning abilities. By incorporating step-skipping behavior, researchers demonstrated that models could achieve greater efficiency and maintain accuracy across diverse tasks. This approach addresses a critical limitation in existing models and opens avenues for future research on bridging the gap between human and machine reasoning. The contributions from leading institutions and companies underscore the collaborative efforts driving innovation in AI. The findings provide a promising direction for developing more efficient and versatile language models, paving the way for future advancements in artificial intelligence.



]]>
https://www.marktechpost.com/2024/12/26/this-research-from-amazon-explores-step-skipping-frameworks-advancing-efficiency-and-human-like-reasoning-in-language-models/feed/ 0 66734
Neural Networks for Scalable Temporal Logic Model Checking in Hardware Verification https://www.marktechpost.com/2024/12/26/neural-networks-for-scalable-temporal-logic-model-checking-in-hardware-verification/ https://www.marktechpost.com/2024/12/26/neural-networks-for-scalable-temporal-logic-model-checking-in-hardware-verification/#respond Thu, 26 Dec 2024 16:23:27 +0000 https://www.marktechpost.com/?p=66731 Ensuring the correctness of electronic designs is critical, as hardware flaws are permanent post-production and can compromise software reliability or the safety of cyber-physical systems. Verification is central to digital circuit engineering, with FPGA and IC/ASIC projects dedicating 40% and 60% of their time, respectively, to this process. While testing approaches, such as directed or […]

The post Neural Networks for Scalable Temporal Logic Model Checking in Hardware Verification appeared first on MarkTechPost.

]]>

Ensuring the correctness of electronic designs is critical, as hardware flaws are permanent post-production and can compromise software reliability or the safety of cyber-physical systems. Verification is central to digital circuit engineering, with FPGA and IC/ASIC projects dedicating 40% and 60% of their time, respectively, to this process. While testing approaches, such as directed or constrained random testing, are easy to implement, they are inherently non-exhaustive and cannot ensure the absence of critical errors. Formal verification, particularly model checking, addresses these limitations by mathematically confirming whether a design satisfies its specifications across all possible executions. However, methods like BDDs and SAT solvers remain computationally intensive and struggle to scale for complex circuits. Engineers often rely on bounded model checking to reduce computational demands, which sacrifices global correctness over extended time horizons.

Formal verification has evolved over decades, with temporal logic playing a key role in describing system behaviors. Based on Linear Temporal Logic (LTL), SystemVerilog Assertions are widely used to define safety and liveness properties. Safety properties are efficiently verified using BDDs, while SAT-based methods scale better for bounded model checking but remain incomplete without achieving impractically high thresholds. Advanced techniques like IC3 and Craig Interpolation improve unbounded safety checking, while Emerson-Lei fixed-point computations and k-liveness extend verification to liveness properties. Verifying systems with complex arithmetic remains challenging, often requiring explicit-state abstractions, inductive invariants, or ranking functions. Originally developed for software termination analysis, ranking functions have been generalized for hardware liveness verification, incorporating non-linear, piecewise-defined, and lexicographic methods to address modern system complexities.

Researchers from the University of Birmingham, Amazon Web Services, and Queen Mary University of London have developed a machine learning-based approach for hardware model checking that integrates neural networks and symbolic reasoning. Their method uses neural networks to represent proof certificates for LTL specifications, trained from randomly generated system executions. The approach guarantees formal correctness over unbounded time horizons by employing satisfiability solving to validate these certificates. Experiments demonstrate its effectiveness, outperforming both academic and commercial model checkers in speed and task completion across standard hardware verification problems, contributing to improved safety and reliability in system designs.

LTL model checking verifies whether all possible sequences of actions in a system $M$ comply with a given LTL formula $\Phi$, which describes the desired temporal properties. The system $M$ includes input and state variables, with its behavior determined by transition rules. To check this, the negation of $\Phi$ is converted into a type of automaton called a Büchi automaton, $A_{\neg\Phi}$. The verification ensures that the combination of the system $M$ and the automaton $A_{\neg\Phi}$ has no valid infinite sequences, that is, no execution of $M$ violates $\Phi$. Neural ranking functions aid in proving termination and are validated using SMT solvers.
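The role of a ranking function can be shown on a tiny explicit-state example. This sketch assumes a common formulation of ranking conditions for fair termination: the rank is bounded below, never increases along a transition, and strictly decreases whenever the source state is fair (accepting). The paper trains a neural network and checks such conditions symbolically with an SMT solver; here the states, transitions, and integer-valued rank are hypothetical and simply enumerated.

```python
def is_valid_ranking(states, transitions, fair_states, rank):
    """Check ranking-function conditions on an explicit transition list."""
    if any(rank(s) < 0 for s in states):
        return False                       # integer ranks must be bounded below
    for s, t in transitions:
        if rank(t) > rank(s):
            return False                   # rank may never increase
        if s in fair_states and rank(t) >= rank(s):
            return False                   # rank must strictly drop from a fair state
    return True


# Hypothetical product system: a counter that counts down and then idles at 0.
states = [0, 1, 2, 3]
transitions = [(3, 2), (2, 1), (1, 0), (0, 0)]


def rank(s):
    return s


print(is_valid_ranking(states, transitions, {1, 2, 3}, rank))  # no fair cycle exists
print(is_valid_ranking(states, transitions, {0}, rank))        # 0 -> 0 is a fair loop
```

In the first call the certificate holds, so no infinite run can visit a fair state infinitely often. In the second, the self-loop on a fair state makes any ranking function fail, which corresponds to a genuine violation rather than a weak certificate.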

The experimental evaluation tested 194 verification tasks derived from 10 parameterized hardware designs with varying complexity. A prototype neural model-checking tool was developed, using Spot to generate automata, Verilator for data generation, PyTorch for training, and Bitwuzla for SMT-solving. The tool was benchmarked against industry leaders ABC, nuXmv, and anonymized tools X and Y. It completed 93% of tasks, outperforming competitors in scalability and runtime, although challenges like local minima and extended SMT-check times remain. While generally faster, it struggled with trivial tasks like UARTt due to overhead. The method’s limitations include reliance on word-level inputs and risks of dataset bias.

In conclusion, the study introduces an approach to model-checking temporal logic using neural networks as proof certificates for hardware verification. Neural networks are trained on synthetic system executions, leveraging their ability to represent ranking functions for fair termination. The method combines machine learning and symbolic reasoning by validating neural certificates with satisfiability solvers, ensuring formal guarantees. Applied to SystemVerilog designs, it outperforms state-of-the-art tools in scalability. Despite the computational demand of SMT solving, the approach is effective with simple feed-forward networks. This marks the first successful use of neural certificates for temporal logic, establishing a foundation for further advancements in model checking.



]]>
https://www.marktechpost.com/2024/12/26/neural-networks-for-scalable-temporal-logic-model-checking-in-hardware-verification/feed/ 0 66731