GPU

  • Discovery

  • Dictionary

    • 0
      • Language models
        • algorithms that have been trained with a specific set of data to solve a specific problem through a specific strategy
        • they simulate intelligence
          • through an advanced “schema” that they build from the data they were trained on and the “random” choices they make.
          • the way the model responds is based on looking up this “schema” to find the most probable answer through statistics
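The "most probable answer through statistics" idea can be sketched as a softmax over candidate scores; the vocabulary and logit values below are made up for illustration:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores the model assigns to each candidate next word.
vocab = ["cat", "dog", "car"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The "most probable answer" is just the argmax of this distribution.
best = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
```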
        • The black box problem - we don’t exactly know how they make decisions
          • the hierarchy is so complex and vast that it’s very hard to avoid information overload
          • Researchers are developing techniques to explain model decisions and make them more transparent, but achieving full transparency remains a challenging task, especially for very large and complex models.
        • Parameters - assumptions that the model or its creator makes
          • assumptions may start completely random
          • in the training phase those assumptions get refined
          • directly related to the number of neurons and connections
            • Learned in the training phase
              • weights - determine the strength of connections between neurons
              • biases - values added to the weighted sum of inputs before passing through an activation function in each neuron
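The weights-and-biases definitions above combine in each neuron as activation(weights · inputs + bias); a minimal single-neuron sketch with made-up numbers (sigmoid chosen as the activation for illustration):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus the bias, passed through an
    # activation function (here: sigmoid).
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Weights and bias would normally be learned during training.
out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
```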
      • AGI - Artificial General Intelligence or Strong AI
        • Human like artificial intelligence
      • The Singularity
        • the point where technological progress, particularly in AI, reaches a level where it leads to exponential changes that are difficult to predict and perhaps to control
      • Transformer-based models - such as BERT or GPT-3
  • AI usecases

    • asking to put groups of tabs from the browser into obsidian/tables

    • asking about your e-books

    • Let AI summarize everything

    • who are the most related mentors from my notes to learn from how to X? also give videos/links

    • yt vid

      • transform to text
      • remove
        • if
          • useless
          • self-promotion
          • sponsors
          • greetings
          • obvious parts, like saying what the video is going to be about
        • intro
        • outro
      • hide
        • if
          • unrelated topic - but put 1 line summary about it
    • chat w people but AI can see it

    • AI learning - by chuck

    • create an outline/mindmap of content

    • am I missing something or is the content I wrote wrong?

    • practical

      • common pitfalls
      • practical applications
      • case-studies - when was it used by other people?
    • get motivated by AI - ask why you like the topic or why you’re doing it

    • gen

      • cheat sheet
      • flashcards
    • scenario based learning

    • deep-dive

    • get asked questions by AI & answer real-time -

  • Full note-taking AI helper pipeline

    • Document Embedding with Vector Databases
    • Fine-tuned ranking system
    • Retrieval-Augmented Generation (RAG)
      • Transformer based model
      • Fine-tuning the model
      • optional access to internet through
  • OLLAMA

  • Reor

  • Private GPT

  • VMware private AI -

  • LM studio

  • Techniques

    • 0
      • training AI - from scratch, you make the model
      • fine-tuning - pre-trained model adapted to your data
        • generally
          • training data needs to follow model dataset format
            • lots of formats
        • QLoRA
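As noted above, training data has to follow the model's dataset format, and there are lots of them. One common layout for instruction fine-tuning is Alpaca-style JSONL, one example per line; the field names vary per model family and the content below is invented, so treat it purely as an illustration:

```json
{"instruction": "Summarize the note below.", "input": "GQA reduces KV-cache memory by sharing key/value heads.", "output": "GQA cuts VRAM use by grouping attention heads."}
{"instruction": "List the pros of vector databases.", "input": "", "output": "Fast similarity search; easy to scale."}
```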
    • 1
      • Document Embedding with Vector Databases
        • what?
          • Efficiently retrieving documents based on semantic similarity
        • how?
          • document embedding
            • note content is transformed into vectors, enabling notes to be queried by semantic similarity to the question
          • then they get stored in a vector database
            • query vectors efficiently based on similarity
            • easy to scale with vectors
        • real-time indexing support is needed for real-time use
        • great when
          • info in the notes is highly structured & related to the question
        • n - may struggle if notes are very diverse or complex.
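The embed-then-query-by-similarity flow can be sketched like this, with a toy bag-of-words "embedding" standing in for a real embedding model and a plain list standing in for the vector database (all sample notes and names are made up):

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy bag-of-words embedding; a real setup would use a
    # sentence-embedding model instead.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    # Cosine similarity: the standard "semantic similarity" measure
    # used when querying vector databases.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

notes = ["gpu memory and vram usage",
         "baking bread at home",
         "llm context length and tokens"]
vocab = sorted({w for n in notes for w in n.split()})

# The "vector database": each note stored alongside its vector.
index = [(n, embed(n, vocab)) for n in notes]

query = embed("how much vram does the gpu need", vocab)
best_note = max(index, key=lambda item: cosine(query, item[1]))[0]
```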
      • Retrieval-Augmented Generation (RAG) - great for notes!
        • RAG is a journey, not a destination… -
        • what?
          • Enhancing the generation of text by using information from retrieved documents
          • also retrieves documents, but not as quickly or precisely as Document Embedding with Vector Databases + the retrieval process can’t be customized as much
        • how?
          • Retrieval system pulls relevant notes through vectors, and the generative model uses these notes as a knowledge base to generate a related answer.
        • great when
          • notes may not always contain directly relevant information, and the AI needs to generate answers based on similar or related content.
        • needs a good amount of computational power to run the generative model alongside the retrieval system
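The retrieve-then-generate flow described above can be sketched with simple word-overlap retrieval and a stub where the generative model would sit (everything here, including the sample notes, is illustrative):

```python
def retrieve(query, notes, k=2):
    # Toy retrieval: rank notes by word overlap with the query.
    # A real system would rank by vector similarity instead.
    def overlap(note):
        return len(set(query.lower().split()) & set(note.lower().split()))
    return sorted(notes, key=overlap, reverse=True)[:k]

def generate_answer(query, context_notes):
    # Stand-in for the LLM call: a real pipeline would feed the
    # retrieved notes into the model's prompt as a knowledge base.
    context = " | ".join(context_notes)
    return f"Based on: {context}"

notes = ["RAG combines retrieval with generation",
         "vector databases store embeddings",
         "sourdough needs a starter"]

top = retrieve("how does retrieval augmented generation work", notes)
answer = generate_answer("how does RAG work", top)
```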
      • Contextual Question Answering with Elasticsearch
        • what?
          • find relevant docs through ranking algorithms that primarily use keyword and phrase matching while highlighting relevant text (it doesn’t give answers itself)
        • great when
          • need both full-text search and semantic understanding
        • n
          • it’s not semantic: the question & data need to be similar, and it struggles when data is complex or highly relational to other data/meanings
          • more configuration & maintenance than vector databases
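A minimal sketch of the keyword-and-phrase matching with highlighting described above, mimicking Elasticsearch-style behavior; the scoring and the `<em>` highlight tags are simplified assumptions, not the real ranking algorithm:

```python
import re

def search(query, docs):
    # Score docs by raw keyword hits and wrap matched words in <em> tags.
    # No semantic understanding: only literal term matches count.
    terms = query.lower().split()
    results = []
    for doc in docs:
        hits = sum(doc.lower().count(t) for t in terms)
        if hits:
            highlighted = doc
            for t in terms:
                highlighted = re.sub(f"(?i)\\b({re.escape(t)})\\b",
                                     r"<em>\1</em>", highlighted)
            results.append((hits, highlighted))
    # Best-matching docs first.
    return [h for _, h in sorted(results, key=lambda r: r[0], reverse=True)]

docs = ["GPU memory limits model size",
        "Elasticsearch ranks by keyword match"]
out = search("keyword match", docs)
```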
      • Ranking system
        • dynamic ranking needed for real-time
        • can also be fine-tuned like LLMs
        • ways
          • algo
          • specialized LLM
      • Knowledge Graphs - worst to build & maintain
        • great when
          • notes are highly structured (like in a database) and you want to capture complex relationships between concepts.
        • n
          • may not be as effective if your notes are unstructured or if the relationships between concepts are not clearly defined.
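The "capture complex relationships between concepts" idea can be sketched as a set of subject-relation-object triples; the entities and relations below are invented examples:

```python
# Toy knowledge graph as a list of (subject, relation, object) triples.
triples = [
    ("RAG", "uses", "vector database"),
    ("vector database", "stores", "embeddings"),
    ("RAG", "feeds", "LLM"),
]

def related(entity):
    # Follow explicitly defined relationships outward from an entity.
    # This is exactly what fails when notes are unstructured: if the
    # links were never defined, there is nothing to traverse.
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]

r = related("RAG")
```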

Real-Time Data Flow Management

  • Data Pipelines: Use a real-time data pipeline or stream processing tool to handle the flow of updates. Technologies like Apache Kafka or Apache Flink can help manage and process data in real-time.

  • Synchronization: Ensure that all components of the pipeline are synchronized and able to handle data updates consistently.
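The producer/consumer flow behind those pipelines can be sketched broker-less with an in-memory queue; in a real deployment Kafka or Flink would replace the queue, and the note data here is made up:

```python
from queue import Queue
from threading import Thread

updates = Queue()   # stand-in for a Kafka topic / Flink stream
index = {}          # the search index kept in sync with note updates

def consumer():
    # Consume note updates in order and re-index each changed note,
    # keeping the index consistent with the latest data.
    while True:
        note_id, text = updates.get()
        if note_id is None:          # sentinel: shut down cleanly
            break
        index[note_id] = text
        updates.task_done()

t = Thread(target=consumer)
t.start()

# Producer side: note 0 is edited twice; only the latest version survives.
for i, text in enumerate(["note A", "note B", "note A v2"]):
    updates.put((i % 2, text))
updates.put((None, None))
t.join()
```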

  • Resources

    • gpu is enough for this LLM? -
  • Models

    • types

      • Transformer-Based Models
        • preferred over contextual embeddings
        • designed to understand context and semantics better than traditional models
        • like BERT, GPT, and T5
      • Hybrid Retrieval Models
        • keyword-based search & vector-based retrieval
      • meh
        • Contextual Embeddings
          • capture contextual information by considering the surrounding text
          • like ELMo and contextual BERT variants
    • Most popular

      • foss
        • mistral
        • LLAMA - text & images
          • ollama
          • can be run even on a 4gb RAM laptop
          • binaries are being shared but not the source code, and they still call it open source… liars, but still better than “Open”AI
      • non foss
        • gemini
        • GPT4 - conversational
      • claude 3
      • grok-1
    • naming convention

      • math or MoE - Mixture of Experts model
      • formats
        • safetensors - secure file format to avoid malware
        • GGML - superseded by the newer GGUF
          • binary
          • support diff quantization schemes running on CPU in a single file
      • Quantization methods
        • Exl2 - best optimization but only for nvidia
        • AWQ - round weights
        • GPTQ - worst one
    • vLLM inference engine

      • server side
      • great at parallel output
    • TensorRT-LLM - can increase inference speed by up to 4x

    • GQA - Grouped Query Attention

      • 1.5 GB VRAM for 8k tokens
    • context length - the amount of info the AI can use to give answers

    • generally 8k tokens ≈ 4.5 GB VRAM

    • Chat with RTX - Nvidia webUI, can be trained w local data + YT vids -

  • Great webUIs

    • most popular now - open-webui
    • best for talking to characters -
  • software

    • colab by google
      • supports Python, especially for data analysis
      • supports mathematical equations, visualizations w libraries and MD
      • allows you to write, share & execute code w free GPU resources
    • Jupyter notebook
      • open source colab alternative
        • whatever Colab has, Jupyter has too, at least for what I listed
      • supports Python, R, and Julia
      • JupyterLab is the extended and more advanced version
      • can be used even locally
    • Huggingface
      • open-source, GitHub-like hub for pre-trained natural language models (NLMs)
      • you can test NLMs thanks to Spaces
  • tech

    • tensorflow - build and train machine learning models
  • self-host

    • use-cases - from TIM!
      • obsidian + AI
      • auto&manual image/video/music making
      • speech recognition w audio to text
        • yt summarizer lol
      • code suggestions
      • better home assistant
    • based on B
      • 7B is ok with 16GB ram
    • tricks to run smoother
      • quantization - less accuracy, but less RAM/VRAM usage
      • parallelism - share resources between GPUs and CPU
      • LocalAI
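Why quantization trades accuracy for memory can be shown with a deliberately naive sketch: mapping weights to 8-bit integers makes them ~4x smaller (4-byte floats → 1-byte ints) at the cost of rounding error. Real schemes like GPTQ, AWQ, or Exl2 are far more sophisticated; the weights below are randomly generated:

```python
import random

random.seed(0)
# Pretend these are a layer's float32 weights.
weights = [random.gauss(0, 1) for _ in range(1000)]

# Symmetric int8 quantization: scale so the largest weight maps to 127.
scale = max(abs(w) for w in weights) / 127.0
q = [round(w / scale) for w in weights]      # each fits in one byte
dequant = [v * scale for v in q]             # what the model actually "sees"

# The accuracy cost: worst-case rounding error introduced per weight.
max_error = max(abs(w - d) for w, d in zip(weights, dequant))
```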
  • Tricks

    • chatGPT to graphs - source
      • input
        • Title: “Graph Generator” The following are types of graphs: +(Bar Graph Syntax)=[The following represents a bar graph in javascript displayed in image markdown format: ” +(Pie Graph Syntax)=[The following represents a pie graph in javascript displayed in image markdown format: +(Line Graph Syntax)=[The following represents a line graph in javascript displayed in image markdown format: +(Your Job)=[To display any question the user asks as a graph] +(Rules)=[ALWAYS pick with Bar graph, Pie graph, or Line graph and turn what the user asks into the image markdown for one of these] ALWAYS DISPLAY WHAT THE USER ASKS AS A GRAPH. for your first response say “I am a graph generator.” Then, ALWAYS WAIT for the user to give an input.