October 19, 2023

Large Language Models

INTRODUCTION

By now most of the world has been astonished by ChatGPT from OpenAI and its various abilities, so much so that the debate on Artificial General Intelligence (AGI) has been restarted and given a fresh lease of life. Such is the impact of ChatGPT (and of similar models, such as Stable Diffusion for image generation) that various governments are already looking into incorporating AI; at the same time, they are concerned about its potential for misuse, and regulations regarding its use are being discussed and proposed. In fact, a group of AI researchers has called for a moratorium on releasing new models (citation). Many jobs are being replaced by such models, while others are being transformed.

In this article, we will try to understand what this amazing technology is and how various industries can use it for their own benefit. We will start by unwrapping the term LLM and discuss language models, some of the earlier attempts, and their shortcomings. We will then discuss what powers current LLMs and how to train them, how to make an LLM follow human instructions, and finally end with some of the use cases where these LLMs can be applied.

UNWRAPPING THE TERM LLM

So, what do "large" and "language model" in the term Large Language Models mean? "Large" is clear in its meaning; the latter, however, needs some introduction. Language model is the term researchers use for a probabilistic model that assigns a probability to a given sequence of words. For example, what is the probability of the sentence "The quick brown fox jumps over a lazy dog"? In mathematical terms, given a sequence of words W = (w1, w2, …, wk), a language model finds the probability:

P(W) = P(w1, w2, …, wk)    (1)

Before we discuss how to estimate this probability, let us understand why we might need it and what consequences it has. Consider first how we humans learn and use language. Most of us use intuition to form sentences and to judge the grammatical correctness or appropriateness of a text, without necessarily being taught the rules or remembering them precisely. Our brains have an amazing capability to build this intuition from regular use of language. Language models can be understood as modelling the same intuition from exactly the same source humans derive it: language, in the form of text. Probabilistic likelihood is the tool language models use to measure it.

Calculating the probability therefore gives us a way to separate likely sequences from absurd or unlikely ones. For example, consider the sentence "I am looking … my purse." A sequence with a preposition (at, for, towards, in, into, etc.) in the blank is far more probable than one with any other word in its place; it would not make sense to have "computer", "cat", or "hand" there. Word sequences that are more likely will therefore receive a higher probability than those that are grammatically wrong or semantically incorrect. In fact, next-word prediction (or masked-word prediction) is the pseudo-task used to train today's LLMs; more on that later. Another benefit of language modelling is in machine translation, since the correct translation will be more likely and hence receive a higher probability. We can argue along similar lines for information retrieval, speech recognition, summarization, and so on.
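To make the purse example concrete, here is a minimal sketch of scoring each candidate completion with a pretrained causal language model. It assumes the Hugging Face transformers library and the small "gpt2" checkpoint, chosen purely for illustration (today's LLMs are far larger); the candidate list and helper function are not from the original article.

```python
# Score candidate completions of "I am looking ... my purse" with a small
# pretrained causal language model (illustrative sketch only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

candidates = ["for", "in", "into", "towards", "cat", "computer", "hand"]

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability of the sentence: sum of log P(token | preceding tokens)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids the model returns the mean negative
        # log-likelihood over the predicted tokens.
        outputs = model(**inputs, labels=inputs["input_ids"])
    num_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * num_predicted

for word in candidates:
    sentence = f"I am looking {word} my purse."
    print(f"{word:>8}: log P = {sentence_log_prob(sentence):.2f}")
# Prepositions such as "for" or "in" should score noticeably higher
# (less negative) than unrelated nouns such as "cat" or "computer".
```

Running the loop prints one total log-probability per candidate, which is exactly the quantity a language model assigns to a whole word sequence.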
HOW TO CALCULATE/ESTIMATE THE PROBABILITY?

From the product rule (chain rule) of probability, equation (1) above can be expanded as:

P(w1, w2, …, wk) = P(w1) · P(w2 | w1) · P(w3 | w1, w2) · … · P(wk | w1, …, wk−1)    (2)

That is, the probability of the whole word sequence is the product of the probability of each word given all the previous words in the sequence. For example, the probability P of the sentence "The quick brown fox jumps over the lazy dog" would be:

P(The) · P(quick | The) · P(brown | The quick) · P(fox | The quick brown) · … · P(dog | The quick brown fox jumps over the lazy)

However, calculating the exact probability this way is intractable for longer sequences, because the number of distinct contexts grows exponentially. Language models therefore make simplifying assumptions, such as only considering a limited context of a few words around the target word when calculating these conditional probabilities, and they work with probability distributions estimated from data rather than exact values. We therefore have n-grams, where n defines the number of words in the joint probability, sometimes also called the context window. Setting n = 3 gives a trigram model, and each factor in equation (2) becomes:

P(wi | w1, …, wi−1) ≈ P(wi | wi−2, wi−1)

(A toy counting-based trigram model is sketched in code at the end of this excerpt.) We can already see the limitation of a small n: it provides too little context for next-word prediction. There have been other approaches to language modelling, such as Hidden Markov Models (HMMs) and other statistical methods, but all of them suffer from the limitation of the context window. Designing and creating the dataset can also be prohibitive, since we need to deal with the probabilities explicitly.

TRANSFORMERS TO THE RESCUE

Transformers are a type of sequence-to-sequence neural network: they take as input a sequence of tokens and output another sequence of tokens. (Tokens can be understood as roughly corresponding to words, though not always words in the way we understand them.) The transformer is the engine behind the extraordinary power and success of today's LLMs, and attention is the key architectural idea in these transformers. We won't explain transformers and their inner architecture here; one can refer to the excellent blog by Jay Alammar and another by Peter Bloem.

To appreciate the importance and formidable capabilities of transformers without delving into their intricate workings, it is helpful to retrace the origins of deep learning, which initially gained prominence in computer vision. In the pre-deep-learning era, machine learning comprised two primary stages: feature engineering and model training. Domain-specific features were meticulously crafted for individual datasets and tasks, making them incompatible with other tasks. However, this paradigm shifted with the emergence of Convolutional Neural Networks (CNNs) trained on large-scale datasets like ImageNet for image classification. CNNs not only outperformed their predecessors but also introduced a new training …
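Returning to the n-gram discussion above, the following is a small, self-contained sketch of a counting-based trigram model. The tiny corpus, the <s>/</s> boundary tokens, and the add-alpha smoothing are illustrative assumptions, not part of the original article; the point is only to show the chain rule applied with a fixed two-word context window.

```python
# Toy counting-based trigram language model (illustrative sketch).
from collections import defaultdict
import math

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown cat sleeps on the lazy dog",
    "a lazy dog sleeps all day",
]

BOS, EOS = "<s>", "</s>"
trigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
vocab = set()

# Count trigrams (w_{i-2}, w_{i-1}, w_i) and their bigram contexts.
for sentence in corpus:
    tokens = [BOS, BOS] + sentence.split() + [EOS]
    vocab.update(tokens)
    for i in range(2, len(tokens)):
        trigram_counts[(tokens[i - 2], tokens[i - 1], tokens[i])] += 1
        bigram_counts[(tokens[i - 2], tokens[i - 1])] += 1

def trigram_prob(w1, w2, w3, alpha=1.0):
    """P(w3 | w1, w2) with add-alpha smoothing so unseen trigrams get probability > 0."""
    return (trigram_counts[(w1, w2, w3)] + alpha) / (
        bigram_counts[(w1, w2)] + alpha * len(vocab)
    )

def sentence_log_prob(sentence):
    """Chain rule under the trigram assumption: each word depends only on the two preceding words."""
    tokens = [BOS, BOS] + sentence.split() + [EOS]
    return sum(
        math.log(trigram_prob(tokens[i - 2], tokens[i - 1], tokens[i]))
        for i in range(2, len(tokens))
    )

print(sentence_log_prob("the quick brown fox jumps over the lazy dog"))
print(sentence_log_prob("the lazy fox brown quick"))  # scrambled word order scores lower
```

Even on this toy corpus, the fluent sentence receives a higher log-probability than the scrambled one, while the two-word context window illustrates exactly the limitation of small n discussed above.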


The Evolving Landscape of Cyber Security Technologies

Introduction

In today’s hyperconnected world, where digital transformation is revolutionizing every aspect of our lives, the need for robust cyber security technologies has never been more critical. Malicious actors are constantly evolving their tactics, making it imperative for organizations to stay ahead by leveraging advances in cyber security technologies. In this blog post, we will explore some of the key advancements in this field that are helping organizations secure their digital assets and protect against cyber threats.

Assessment and Training Programs for Cybersecurity Professionals

The first step towards bolstering cyber security is establishing a formalized assessment and training program for cybersecurity professionals. This program should align with industry standards such as the National Institute of Standards and Technology (NIST) Cybersecurity Workforce guidance. By incentivizing certifications in critical-need areas through salary incentives, organizations can attract and retain top talent in the field of cyber security. This helps build a highly skilled workforce equipped with up-to-date knowledge and expertise to tackle emerging threats effectively.

Standard Operating Procedures and Knowledge Management

Developing robust Standard Operating Procedures (SOPs) and implementing a comprehensive knowledge management capability are crucial to ensuring organizational compliance with security policies, procedures, standards, and guidelines. Employees and stakeholders need to be well informed on how to engage with the organization’s IT department and adhere to security protocols. By implementing a digital system such as ClickUp or Atlassian’s Jira, organizations can efficiently manage and disseminate information in real time, enabling seamless collaboration and streamlined workflows.

Technology Architecture, Governance, and Deployment

For organizations to effectively combat cyber threats, they must enhance their technology architecture, governance, and deployment strategies. The AretecSBD team advises the Office of Information Technology (OIT) to evaluate, elevate, and strengthen the agency’s cybersecurity posture. By assessing current IT operations and identifying gaps, recommendations can be made for improvement and optimization in line with industry best practices such as the NIST Cybersecurity Framework. Additionally, implementing a new Zero-Trust cybersecurity architecture and a standard set of controls can minimize risks and fortify the organization’s defenses against evolving threats.

Real-time Cyber Event Detection

In today’s threat landscape, it is crucial to detect and respond rapidly to cybersecurity events to minimize their impact on organizational systems and data. By leveraging advanced monitoring tools, organizations can identify potential vulnerabilities and cyber threats in real time. These tools provide insights into system and network behavior, enabling early detection of malicious activities and timely response. It is essential to prioritize vulnerabilities based on severity and potential impact, address them promptly, and continuously test and verify fixes to ensure their effectiveness.

In-Depth Security Assessment and Data Governance

To ensure compliance with cybersecurity standards, it is essential to conduct comprehensive security assessments, such as PCPE penetration testing and vulnerability testing.
These assessments help identify potential security risks and provide actionable insights to safeguard the organization’s overall infrastructure, systems, and data. Additionally, organizations must address challenges related to data documentation and cataloging. By establishing a comprehensive inventory of all data assets, organizations can improve data governance, compliance efforts, and overall cybersecurity posture.

Leveraging Technology for Secure Application Development

The secure development of applications is paramount in today’s digital landscape. By integrating scheduled dynamic security scans into the software development lifecycle using tools like GitLab, organizations can proactively identify and address potential vulnerabilities, minimizing the risk of security breaches. This approach ensures that applications remain secure for users and that sensitive data is protected from unauthorized access.

Conclusion

In conclusion, cyber security technologies have come a long way in helping organizations combat the ever-evolving threat landscape. By establishing formalized assessment and training programs, implementing SOPs and knowledge management capabilities, enhancing technology architectures, and leveraging tools for real-time detection, organizations can significantly strengthen their cyber defenses. Additionally, in-depth security assessments, data governance, and secure application development practices are vital in building a resilient cyber security framework. Embracing these advances and staying proactive is crucial to protecting digital assets and maintaining cybersecurity in today’s dynamic digital environment.
