Thursday, July 6, 2023

Feature vs label in Machine Learning?

In machine learning and data analysis, "features" and "labels" are the two parts of every supervised training example: the inputs the model sees and the output it is expected to produce.


Features refer to the input variables or attributes that are used to represent the data. These are the characteristics or properties of the data that are considered as inputs to a machine learning model. For example, if you're building a spam detection system, the features could include the subject line, sender, and body of an email.


Labels, on the other hand, refer to the output variable or the target variable that you want the machine learning model to predict or classify. The labels represent the desired outcome or the ground truth associated with each data point. In the spam detection example, the labels would indicate whether an email is spam or not.


To train a machine learning model, you need a labeled dataset where each data point has both the features and the corresponding labels. The model learns patterns and relationships between the features and labels during the training process and uses that knowledge to make predictions or classifications on new, unseen data.
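
As a rough illustration of the split described above, here is a minimal sketch in plain Python. The four emails, the field names (`subject`, `sender`, `body`, `is_spam`) and the word-counting "model" are all made up for this example; a real spam filter would use a proper library and far more data.

```python
from collections import Counter

# Toy labeled dataset (all values are invented for illustration).
emails = [
    {"subject": "Win a free prize now", "sender": "promo@example.com",
     "body": "Click here to claim your free prize", "is_spam": 1},
    {"subject": "Meeting agenda for Monday", "sender": "boss@example.com",
     "body": "Please review the attached agenda", "is_spam": 0},
    {"subject": "Free offer just for you", "sender": "deals@example.com",
     "body": "Claim your free gift today", "is_spam": 1},
    {"subject": "Lunch on Friday", "sender": "friend@example.com",
     "body": "Are you free for lunch on Friday", "is_spam": 0},
]

FEATURE_KEYS = ("subject", "sender", "body")  # inputs to the model
LABEL_KEY = "is_spam"                         # target the model should predict


def split_features_labels(rows):
    """Separate each data point into its features (X) and its label (y)."""
    X = [{key: row[key] for key in FEATURE_KEYS} for row in rows]
    y = [row[LABEL_KEY] for row in rows]
    return X, y


def tokenize(features):
    """Lower-case words from the subject and body (sender is ignored here)."""
    return f"{features['subject']} {features['body']}".lower().split()


def train(X, y):
    """Count how often each word appears under each label."""
    counts = {0: Counter(), 1: Counter()}
    for features, label in zip(X, y):
        counts[label].update(tokenize(features))
    return counts


def predict(counts, features):
    """Predict spam (1) if the words were seen more often in spam emails."""
    tokens = tokenize(features)
    spam_score = sum(counts[1][t] for t in tokens)
    ham_score = sum(counts[0][t] for t in tokens)
    return 1 if spam_score > ham_score else 0


X, y = split_features_labels(emails)
model = train(X, y)

# New, unseen data has features but no label; the model supplies the label.
new_email = {"subject": "Claim your free prize", "sender": "x@example.com",
             "body": "Click now"}
prediction = predict(model, new_email)
```

Note that `X` never contains the `is_spam` field: the label is only used during training, and at prediction time the model has to infer it from the features alone.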


In summary, features are the input variables that describe the data, while labels are the output variables that represent the desired outcome or prediction associated with the data.

Deploy Falcon 7B & 40B on Amazon SageMaker example

 https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/generativeai/llm-workshop/lab10-falcon-40b-and-7b/falcon-40b-deepspeed.ipynb 


https://youtu.be/-IV1NTGy6Mg 

https://www.philschmid.de/sagemaker-falcon-llm 

Wednesday, July 5, 2023

Difference between using transformer for multi-class classification and clustering using last hidden layer

Fine-tuning a transformer with a classification head for multi-class classification and fine-tuning it and then clustering its last hidden layer embeddings differ in their objectives and methods.


Fine-tuning with a classification head: In this approach, you train the transformer model with a classification head on your labeled data, where the model learns to directly predict the classes you have labeled. The final layer(s) of the model are adjusted during fine-tuning to adapt to your specific classification task. Once the model is trained, you can use it to classify new data into the known classes based on the learned representations.


Fine-tuning and extracting embeddings for clustering: Here, you also fine-tune the transformer model on your labeled data as in the previous approach. However, instead of using the model for direct classification, you extract the last hidden layer embeddings of the fine-tuned model for each input. These embeddings capture the learned representations of the data. Then, you apply a clustering algorithm (such as k-means or hierarchical clustering) on these embeddings to group similar instances together into clusters. This approach allows for discovering potential new categories or patterns in the data.
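
The contrast between the two approaches can be sketched in plain Python. Everything below is an assumption made for illustration: the 2-D vectors stand in for real last hidden layer embeddings (which would typically have hundreds of dimensions), the head weights are hand-picked rather than learned, and the small k-means routine is a simplified stand-in for a library implementation.

```python
import math

# Toy 2-D vectors standing in for last-hidden-layer embeddings (made up).
embeddings = [
    [0.9, 0.1], [1.0, 0.2], [0.8, 0.0],  # resembles known class 0
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],  # resembles known class 1
    [0.9, 0.9], [1.0, 1.0],              # a region neither class covers
]

# Approach 1: a classification head is a linear layer plus argmax over the
# KNOWN classes. These weights are hypothetical, not learned.
HEAD_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]
HEAD_BIAS = [0.0, 0.0]


def classify(embedding):
    """Return the index of the known class with the highest logit."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + bias
        for row, bias in zip(HEAD_WEIGHTS, HEAD_BIAS)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


# Approach 2: cluster the embeddings without reference to the known classes.
def kmeans(points, k, iterations=10):
    """Simplified k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members (if it has any).
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments


head_predictions = [classify(e) for e in embeddings]
cluster_ids = kmeans(embeddings, k=3)
```

The head can only ever answer with one of its two known classes, so the last two points are forced into one of them. Clustering with `k=3` instead places those two points in a group of their own, which is the sense in which it can surface a potential new category. With the Hugging Face `transformers` library, the real embeddings would typically come from calling the model with `output_hidden_states=True` and taking the last entry of `hidden_states`, usually pooled over tokens; that step is omitted here.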

How can caching be enabled for embedded text as well as for search query results in Azure AI?

Caching in the context of Azure AI (especially when using RAG pipelines with Azure OpenAI + Azure AI Search) can...