CS 370 - Spring 2025. 4/16/2025


[Home]

Welcome to CS 370!

Video of the Day

How to Use Freepik's Real-time Sketch to Image AI Generator, Pikaso submitted by Caroline Baillie.
This video details how Pikaso (by Freepik) works. It is a relatively new AI image generator driven by human input. Similar tools already exist, such as DALL-E, but what makes Pikaso different is that you can start from a simple sketch of what you want and move objects around to further customize the generated image. I think this is super cool and useful, specifically as someone who doesn't have the best artistic skills but has the creative vision.

I hereby solicit suggestions for the video of the day. Please email me your ideas with explanations. Selected entries will win 5 homework points. If your video is played at the beginning of class, you must also briefly explain something about the video and something about yourself - in person.

Canvas Quiz of the Day (need daily password)

Most days, there will be a simple Canvas quiz related to the lecture. You need a password to activate the quiz, which I will provide in class. These quizzes count toward your class participation grade. The quiz is available only during class. You get full credit for class participation by completing half of the quizzes.

Click for today's quiz.

Lecture 22: Natural Language Processing.

Announcements

  • Information Society Project Yale Law School. Weekly Events

  • DSAC Election is Live
    Voting is now open for the 2025-26 DSAC Election! It'll be open for quite some time (until Saturday, April 19th at 11:59 pm), and we'll send a few reminders before voting closes, so we encourage you to take the time to read all of our candidate statements before submitting your votes. Anyone, regardless of major, is eligible to vote as long as they've taken/are taking CPSC 223.

    Voting is important because the new DSAC will help influence curriculum and policy changes within the department! Here's some info about what DSAC does! The margin of victory last year was just two votes, so your vote could really change the outcome of the election!

    Two important notes -- we really want to make sure the election is fair!

  • No campaigning is allowed. Again, non-candidates are allowed to share the general voting form (please do!) and encourage others to vote, but they cannot ask people to vote for specific candidates.

  • Email campaigning is specifically disallowed.

    All votes will be kept confidential! The only people who have access to the results are Tyler Schroder, Prof Kim, Dr. Slade, and Harry Jain.

    Please direct any questions or issues with this form to Tyler Schroder.

  • Kaggle competition! Win big bucks!
    We are excited to announce the Yale/UNC-CH Geophysical Waveform Inversion Competition on Kaggle! This competition invites participants to develop machine learning algorithms for estimating subsurface properties, such as velocity maps, from seismic waveform data.

    By entering this challenge, you'll have the opportunity to bridge the gaps in seismic analysis by combining physics and machine learning. Your participation could lead to significant advancements in full waveform inversion, which has applications in subsurface energy exploration, medical diagnostics, non-destructive material testing, and more.

    Entry Deadline: June 23, 2025

    Prizes:
    1st Place - $12,000,
    2nd Place - $10,000,
    3rd Place - $10,000,
    4th Place - $10,000,
    5th Place - $8,000.

    This is a unique opportunity to showcase your skills, contribute to a critical area of research, and win substantial prizes. We look forward to seeing your innovative solutions and wish you the best of luck in the competition.

    Best regards,
    Lu Lu, Assistant Professor, Yale University
    Youzuo Lin, Associate Professor, University of North Carolina at Chapel Hill

    Lu Lu
    Assistant Professor of Statistics and Data Science
    Faculty of Institute for Foundations of Data Science
    Faculty of Wu Tsai Institute
    Faculty of Center for Algorithms, Data, and Market Design at Yale
    Faculty of Center for Materials Innovation
    Yale University
    
    https://lugroup.yale.edu

  • That Whole "Yale Thing": video compilation of references to Yale in movies. Friday, April 25, 7 pm, HQ L02. (list of movies)

    Administrivia

  • I have office hours Wednesdays from 4-6 p.m. via Zoom, meeting ID 459 434 2854.

  • The TF's office hours are posted on Ed Discussion.

  • I am available for lunch on Mondays at 1 pm in Morse.

  • Homework assignments: [Assignments]. hw7 is now available. Note: there will be no hw8. You can concentrate on the paper and possibly the project.

    I have reviewed all the project proposals. Let me know if yours was not marked OK (that is, Complete).

    I have added a section to the paper assignment explaining how to write a paper. Please review this.

    Asides from previous lectures

    AI in the news

  • Introducing Firebase Studio. April 9, 2025. Submitted by Eva Dale.
  • New open source AI company Deep Cogito releases first models and they’re already topping the charts. VentureBeat, April 8, 2025.
  • DensePose From WiFi. AI uses WiFi signals to track human posture and movement through walls, with no cameras needed. Article submitted by Biniyam Lombe.
  • Google AI Search Shift Leaves Website Makers Feeling ‘Betrayed’. Bloomberg, April 7, 2025.

  • Access to the Atlantic
  • Access to Economist (Economist.com)
  • Access to Financial Times
  • Access to Wall Street Journal from Yale.
  • Q and AI Bloomberg.
  • Access to Bloomberg.com from Yale.

    NLP and LLMs

  • AIMA Slides:

    Vanilla Natural Language Processing

  • NLP Progress: a repository that tracks progress in Natural Language Processing (NLP), including the datasets and the current state of the art for the most common NLP tasks (for example, BLEU scores for machine translation).

    AIMA Jupyter notebooks:

  • text.html
  • nlp.html
  • nlp_apps.html
  • WordNet: open-source, hand-curated dictionary in machine-readable form. Provides the categories used by ImageNet.
  • Penn Treebank.
  • Natural Language Corpus Data: Norvig's chapter in Beautiful Data (the actual data).

    Deep Learning for Natural Language Processing

  • Word embeddings: fastText, word2vec (Google TensorFlow), and GloVe (Stanford).
  • Moore's Law vs the More Law.
  • Hands-On Large Language Models: Language Understanding and Generation. Jay Alammar and Maarten Grootendorst, O'Reilly, 2024. (Yale library online book).
  • C4 (Colossal Clean Crawled Corpus)
  • Efficient Estimation of Word Representations in Vector Space: the word2vec paper, by Mikolov, Chen, Corrado, and Dean at Google. 2013.
  • Attention Is All You Need: Vaswani et al. 2017. Introduced the transformer architecture.
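
    As a rough illustration of what the word-embedding links above are about, here is a minimal sketch of the vector-arithmetic analogy popularized by the word2vec paper (king - man + woman ≈ queen). The vectors are made-up 3-dimensional toys, not output from fastText, word2vec, or GloVe, and the names toy_vectors, cosine, and analogy are hypothetical helpers invented for this example.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real embeddings
# (word2vec, GloVe, fastText) are learned from text and have hundreds of
# dimensions.
toy_vectors = {
    "king":  [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    print(analogy("king", "man", "woman", toy_vectors))
```

    With these toy numbers the nearest remaining word is "queen" by construction. With real pretrained embeddings the same arithmetic often works, but not always.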