- L
- theresanaiforthat
 - libraries
 
- n
- revisit old messages sent for great insights & for why AI gets a bad rep among beginners w/o proper reasoning
 - MCP - Model Context Protocol https://cursor.directory/ https://www.youtube.com/watch?v=qJeAkPvKA_0
 - p1
- context7 - docs for LLMs
 - MCP, the new REST API for AI -
 - comfy UI workflows
 - flux lora realistic images -
 - TL - HT make AI vids wayback machine link
 - TL - HT use gpt w ur own data? LangChain
 
 - understand something about all of this mess -
 - ai g
 - autoGPT
 - open assistant
 - quivr RAG system
 
- why, philosophy, etc
- Limitations
- unless explicitly asked, it’s hard for it to go beyond ur own expertise
- even if given complex systems like Int or how I take notes, it will heavily struggle to find any meaningful upgrades or alternatives to that system
 
 
 - Freya got depressed also because of it -
 - where AI does/doesn’t make sense? the first 80%, not the remaining 20%
- generally
- The more easy/uncomplicated tasks get automated, the more humans can focus on excellence instead
 
 - Humans - Excellence
 - AI - Anything replicable without individuality or major depth
 - perfectionists will be more needed
- the more AI gets better & faster at getting 80% of the job done, the more perfectionists (the ones who can deliver that extra 20%, which is the hard part) will be needed.
 
 
 
- Dictionary
- 0
- Language models
- algorithms that have been trained w a specific set of data to solve a specific problem through a specific strategy
 - they simulate intelligence
- through an advanced “schema” that they build from the training data and the “random” choices they make
 - the way the model responds is based on a lookup of this “schema” to find the most probable answer through statistics
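A minimal sketch of the “most probable answer through statistics” idea, as a toy bigram model (counts of which word follows which). Real LLMs use neural networks over tokens, but the lookup-by-probability intuition is the same — the corpus here is made up:

```python
from collections import Counter, defaultdict

# Toy corpus; a real model trains on billions of tokens
corpus = "the cat sat on the mat the cat ate the fish".split()

# Build the "schema": for each word, count which word follows it
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(prev_word):
    """Look up the statistically most probable next word."""
    return following[prev_word].most_common(1)[0][0]

print(predict("the"))  # "cat" follows "the" twice, more than any other word
```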
 
 - The black box problem - we don’t exactly know how they make decisions
- the hierarchy is so complex and vast that it’s very hard not to fall into information overload
 - Researchers are developing techniques to explain model decisions and make them more transparent, but achieving full transparency remains a challenging task, especially for very large and complex models.
 
 - Parameters - assumptions that the model or the creator makes
- assumptions may start out completely random
 - in the training phase those assumptions get refined
 - directly related to the num of neurons and connections
- Learned in the training phase
- weights - determine the strength of connections between neurons
 - biases - values added to the weighted sum of inputs before passing through an activation function in each neuron
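The weights-and-bias arithmetic above can be sketched for a single neuron (numbers are made up for illustration; ReLU stands in for the activation function):

```python
# One artificial neuron: weighted sum of inputs, plus a bias,
# passed through an activation function (ReLU here)
def neuron(inputs, weights, bias):
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return max(0.0, weighted_sum)  # ReLU: negative sums become 0

out = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.2], bias=0.1)
print(round(out, 6))  # 0.5*1.0 + (-0.2)*2.0 + 0.1 = 0.2
```

Training adjusts exactly these weights and biases, which is why parameter count is directly tied to the number of neurons and connections.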
 
 
 
 
 - AGI - Artificial General Intelligence or Strong AI
- human-like artificial intelligence
 
 - The Singularity
- technological progress, particularly in AI, reaches a level where it leads to exponential changes that are difficult to predict and perhaps impossible to control.
Tokens & Context for LLMs (transformer-based models, such as BERT or GPT-3)
- what’s a token?
- the unit used to measure how much input an AI model can process as context
 - part of a word, a fragment
 - 4mb of notes = probably well over 500k tokens, depending on formatting
 
 - 1 token ≈ ¾ of a word, i.e. ~4 tokens ≈ 3 words
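The ¾-of-a-word rule above can be turned into a quick estimator. This is a rough heuristic, not a real tokenizer — exact counts depend on the model’s tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 tokens per 3 words rule of thumb."""
    words = len(text.split())
    return round(words * 4 / 3)

print(estimate_tokens("the quick brown fox jumps over the lazy dog"))  # 9 words -> 12
```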
 
 - context size - how many tokens a model can handle
 - solutions
 - More tokens = more VRAM used
- linear or sometimes quadratic in cost
 - 7B model w 4K context may need ~6–8 GiB VRAM
 - generally 8k tokens ≈ 4.5 GB VRAM
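The VRAM-per-context figures above mostly come from the KV cache, which grows linearly with context length. A back-of-the-envelope calculation, assuming 7B-class dimensions (32 layers, 32 KV heads, head dim 128, fp16) — real numbers vary by model:

```python
def kv_cache_bytes(context_tokens, layers=32, kv_heads=32, head_dim=128, dtype_bytes=2):
    """KV cache = 2 (keys + values) * layers * heads * head_dim * tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * context_tokens * dtype_bytes

print(kv_cache_bytes(4096) / 1024**3)  # 2.0 GiB for a 4K context, on top of the weights
```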
 
 
- Techniques
- 0
- training AI - from scratch, you make the model
 - fine-tuning - pre-trained model adapted to your data
 - generally
- training data needs to follow the model’s dataset format
 - lots of formats
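One of those formats, shown as an assumption — an Alpaca-style instruction/response JSONL, which many fine-tuning tools accept; each base model expects its own format and prompt template, and the example rows here are hypothetical:

```python
import json

# Hypothetical training rows in an Alpaca-style instruction format
examples = [
    {"instruction": "Summarize my note on tokens.",
     "input": "",
     "output": "A token is a word fragment; ~4 tokens equal ~3 words."},
]

# Fine-tuning tools usually expect one JSON object per line (JSONL)
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl.splitlines()[0])
```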
 
 - QLoRA
 
 - 1
- Document Embedding with Vector Databases
- what?
- Efficiently retrieving documents based on semantic similarity
 
 - how?
- document embedding
- note content gets transformed into vectors, enabling queries over notes based on semantic similarity to the question
 
 - then they get stored in a vector database
- query vectors efficiently based on similarity
 - easy to scale w vectors
 
 - real-time indexing support is needed for real-time use
 - great when
- info in notes is highly structured & related to the question
 
 - n - may struggle if notes are very diverse or complex.
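A toy version of the embed-and-query idea, using bag-of-words vectors and cosine similarity in place of a real embedding model and vector database (in practice: something like sentence-transformers plus a store such as Chroma or FAISS; the notes here are made up):

```python
import math
from collections import Counter

notes = [
    "how to train a language model on my notes",
    "recipe for tomato pasta with basil",
    "fine-tuning a pretrained model with QLoRA",
]

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The "vector database": each note stored alongside its vector
index = [(note, embed(note)) for note in notes]

def query(question):
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

print(query("which model should I fine-tune?"))
```

Real embeddings capture semantics, so a paraphrased question still finds the right note; this bag-of-words stand-in only shows the retrieve-by-vector-similarity mechanics.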
 
 - Retrieval-Augmented Generation (RAG) - great for notes!
 - Contextual Question Answering with Elasticsearch
- what?
- finds relevant docs through ranking algos that primarily use keyword and phrase matching, while highlighting relevant text (it doesn’t give answers itself)
 
 - great when
- need both full-text search and semantic understanding
 
 - n
- it’s not semantic: the question & the data need similar wording, and it struggles when meaning is complex or relies heavily on relations to other data/meanings
 - more configuration & maintenance than vector databases
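A toy sketch of keyword-based ranking (real Elasticsearch uses BM25 over an inverted index; this just counts matching query terms, to show why the question and the data need similar wording — the docs are made up):

```python
docs = [
    "notes on vector databases and embeddings",
    "elasticsearch setup and maintenance guide",
    "keyword matching vs semantic search",
]

def keyword_score(query, doc):
    """Count how many query terms literally appear in the doc (no semantics)."""
    terms = set(query.lower().split())
    words = set(doc.lower().split())
    return len(terms & words)

def search(query):
    return max(docs, key=lambda d: keyword_score(query, d))

print(search("elasticsearch maintenance"))  # exact-word overlap wins
# a paraphrase like "full-text engine upkeep" scores 0 everywhere
```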
 
 
 - Ranking system
- dynamic ranking needed for real-time
 - can also be fine-tuned like LLMs
 - ways
- algo
 - specialized LLM
 
 
 - Knowledge Graphs - worst to build & maintain
 - great when
- notes are highly structured (like in a database) and you want to capture complex relationships between concepts.
 
 - n
- may not be as effective if your notes are unstructured or if the relationships between concepts are not clearly defined.
 
 
- Tricks
- chatGPT to graphs - source
- input
- Title: “Graph Generator” The following are types of graphs: +(Bar Graph Syntax)=[The following represents a bar graph in javascript displayed in image markdown format: ” +(Pie Graph Syntax)=[The following represents a pie graph in javascript displayed in image markdown format: +(Line Graph Syntax)=[The following represents a line graph in javascript displayed in image markdown format: +(Your Job)=[To display any question the user asks as a graph] +(Rules)=[ALWAYS pick with Bar graph, Pie graph, or Line graph and turn what the user asks into the image markdown for one of these] ALWAYS DISPLAY WHAT THE USER ASKS AS A GRAPH. for your first response say “I am a graph generator.” Then, ALWAYS WAIT for the user to give an input.
 
 
 
- OLLAMA
- Reor
- Private GPT
- VMware private AI -
- Real-Time Data Flow Management
- Data Pipelines: Use a real-time data pipeline or stream processing tool to handle the flow of updates. Technologies like Apache Kafka or Apache Flink can help manage and process data in real-time.
 - Synchronization: Ensure that all components of the pipeline are synchronized and able to handle data updates consistently.
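A minimal in-process sketch of the producer/consumer pattern those pipeline tools implement — here `queue.Queue` stands in for a broker like Kafka; real pipelines add partitioning, persistence, and consumer groups:

```python
import queue
import threading

updates = queue.Queue()  # stands in for a Kafka topic
processed = []

def producer():
    for i in range(3):
        updates.put(f"note-update-{i}")  # publish an update event
    updates.put(None)                    # sentinel: stream is done

def consumer():
    while True:
        event = updates.get()
        if event is None:
            break
        processed.append(event.upper())  # "process" the update in real time

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()
print(processed)  # events arrive in order, so consumers stay synchronized
```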
 
- vLLM inference engine
- server side
 - great at parallel output
 
- TensorRT-LLM - increase inference by 4x
- GQA - Grouped Query Attention
 - ~1.5 GB VRAM for 8k tokens
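Why GQA shrinks the context-VRAM cost: several query heads share one KV head, so the KV cache stores far fewer heads. A sketch with assumed dimensions (32 layers, head dim 128, fp16; 32 KV heads for plain multi-head attention vs 8 for GQA — actual models vary):

```python
def kv_cache_bytes(context_tokens, kv_heads, layers=32, head_dim=128, dtype_bytes=2):
    # 2x for storing both keys and values
    return 2 * layers * kv_heads * head_dim * context_tokens * dtype_bytes

mha = kv_cache_bytes(8192, kv_heads=32)  # full multi-head attention
gqa = kv_cache_bytes(8192, kv_heads=8)   # grouped: 4 query heads share each KV head
print(mha // gqa)  # 4x smaller KV cache at the same context length
```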
- context length - info AI can use to give answers
- chat with RTX - nvidia webui can be trained w local data + yt vids -
 
- based on B (billions of parameters)
 - 7B is ok with 16GB ram
 
 
