AI learns how vision and sound...

Humans naturally learn by making connections between sight and sound. For instance, we can watch someone playing the cello and recognize that the cellist’s movements are generating the music we hear. A new approach developed by researchers from MIT and elsewhere improves an AI model’s ability to learn in this same fashion. This could be useful in applications such as journalism and film production, where the model could help with curating multimodal content through automatic video and audio retrieval. In the longer term, this work could be used to improve a robot’s ability to understand real-world environments, where auditory and visual information are often closely connected. Improving upon prior work from their group, the researchers created a method that helps machine-learning models align corresponding audio and visual data from video clips without the need for human labels. They adjusted how their original model is trained so it learns a finer-grained correspondence between a particular video frame and the audio that occurs in that moment. The researchers also made some architectural tweaks that help the system balance two distinct learning objectives, which improves performance. Taken together, these relatively simple improvements boost the accuracy of their approach in video retrieval tasks and...
Read more

Understand the Model Behind GPT-3, BERT,...

You know that expression, “When you have a hammer, everything looks like a nail”? Well, in machine learning, it seems like we really have discovered a magical hammer for which everything is, in fact, a nail, and it’s called the Transformer. Transformers are models that can be designed to translate text, write poems and op-eds, and even generate computer code. In fact, lots of the amazing research I write about on daleonai.com is built on Transformers, like AlphaFold 2, the model that predicts the structures of proteins from their genetic sequences, as well as powerful natural language processing (NLP) models like GPT-3, BERT, T5, Switch, Meena, and others. You might say they’re more than meets the… ugh, forget it. If you want to stay hip in machine learning and especially NLP, you have to know at least a bit about Transformers. So in this post, we’ll talk about what they are, how they work, and why they’ve been so impactful. A Transformer is a type of neural network architecture. To recap, neural nets are a very effective type of model for analyzing complex data types like images, videos, audio, and text. But there are different types of neural networks optimized...
Read more

How to build a working AI...

Synthetic data is on the rise in artificial intelligence. It’s going to make AI cheaper, better and less biased. It’s also very obtainable and usable. In a short while, it has gone from being an experimental technology to something I would not hesitate to use for production AI solutions. To illustrate that, I will build an AI that can tell the difference between apples and bananas. I will only use images of the two classes generated by another AI, in this case DALL-E Mini.

An Apple or Banana recognizer

I will build an image classifier using only easy-to-access, free AutoAI tools.

Generating data

We need around 30 images of each label, bananas and apples. We will be using DALL-E Mini, an open-source model inspired by OpenAI’s text-to-image DALL-E models. To generate the images, you can go to https://huggingface.co/spaces/dalle-mini/dalle-mini. Here you can prompt the text-to-image model with queries such as “Banana on table”, “Banana on random background”, “Apple on table” and “Apple on random background”. Try to match the background you will be testing on.
Read more

ICCV 2023 top papers, general trends,...

I was lucky and privileged enough to attend the ICCV 2023 conference in Paris. After collecting papers and notes, I decided to share my notes along with my favourite papers. Here are the best papers I picked out, along with their key ideas. If you like my notes below, share them on social media!

Towards understanding the connection between generative and discriminative learning

Key idea: A very new trend that I am extremely excited about is the connection between generative and discriminative modeling. Is there any shared representation between them? The authors demonstrate the existence of matching neurons (rosetta neurons) across different models that express a shared concept (such as object contours, object parts, and colors). These concepts emerge without any supervision or manual annotations. Yes! The paper “Rosetta Neurons: Mining the Common Units in a Model Zoo” showed that completely different models pretrained with completely different objectives learn shared concepts (such as object contours, object parts, and colors). These concepts emerge without any supervision or manual annotations. So far, I had only seen object-related concepts emerge on the self-attention maps of self-supervised vision transformers such as DINO. They further show that the activations look similar, even for StyleGAN2. The process...
Read more

Open Source Video Generators Create Feature-Length...

Open source video generators that create feature-length films: a concept that once seemed futuristic is now reshaping the storytelling landscape. Whether you’re a filmmaker, developer, or content creator, this shift holds compelling opportunities. With artificial intelligence (AI) tools evolving rapidly, we are witnessing an age where anyone with a vision can produce a full-length film using entirely open source software. This revolution is sparking curiosity, demanding attention, and inviting both experts and hobbyists to rethink how stories are told. Filmmaking has long entailed large crews, high budgets, and complex logistics. Open source video generators dramatically lower these barriers. By leveraging AI, artists can now create scenes, characters, dialogue, and even background scores from a single dashboard. These tools are powered by image generation, deep learning algorithms, and natural language processing. Generators like Stable Diffusion, RunwayML, and OpenAI’s GPT models are proving that full-length feature films can be composed using lines of code and creative direction. These engines don’t just design frames; they interpret text prompts into entire sequences, enriched with stylistic elements and coherent narratives. As these platforms improve, major studios and indie filmmakers alike...
Read more

Everyone Has Given Up on AI...

The End of the AI Safety Debate

For years, a passionate contingent of researchers, ethicists, and policymakers warned about the potential dangers of unchecked artificial intelligence development. They argued about p(doom) probabilities, AI alignment strategies, and regulations that could prevent catastrophe. But as of now, that conversation has all but collapsed. The frontier AI companies (OpenAI, Anthropic, Google DeepMind, and others) have fully shifted gears. They’re no longer talking about pausing AI progress or carefully evaluating existential risks. Instead, they are racing to roll out increasingly advanced models with the primary focus being one thing: dominance. AI safety was once a core part of the conversation; now, it’s little more than a PR footnote. So, what happens now?

The Cost of AI is Dropping to Near Zero

One of the most overlooked aspects of AI acceleration is the rapidly declining cost of both training and inference. Just a few years ago, training a state-of-the-art AI model required billions of dollars in compute resources. Today, open-source models can be fine-tuned on consumer GPUs for a fraction of the cost. Not only that, but API-based access to the most powerful AI systems is becoming cheaper by the month. What used to cost hundreds or...
Read more

to make or to buy? —...

With the educational content shift taking place, publishers need to innovate. In a previous blog post, we discussed why embracing content agility is key in this regard. Of course, such transitions require publishers to make investment-related decisions. With every innovation, the main question is, ‘Will we make or buy this?’

To make: independence + uncertainty

Those who decide to develop a new initiative in-house will need to reach into their pockets. R&D, internal development, and maintenance are costly. Such investments do come with an advantage: once you’ve built your solution, you will have the technology, knowledge, and capability to independently manage and maintain it. You will be in control. On the other hand, this requires constant investments and a complete overhaul of the organisation. As your innovation continues to develop, you will need to expand your R&D teams, which should specialise in AI. Meanwhile, the outcomes are often uncertain: if you set out to develop AI solutions, you’re entering a trial-and-error field. You can’t be sure your investments will pay off the moment you get started.

To buy: ROI and focus + data-related concerns

The alternative is joining forces with a provider that has developed a proven solution...
Read more

How To Get Your Team Ready...

AI integration programs such as chatbots can sometimes make our employees a little nervous. While any type of change can be scary, especially when it comes to one’s livelihood, the most prevalent concern about AI initiatives is job security. Is a robot going to take my job? The simple answer to this is no. AI solutions like chatbots have incredible potential to lift simple or routine tasks from employees’ shoulders, but they will not replace skilled human efforts. In fact, we’ve found that bots open new doors to even greater possibilities. The first step to getting your employees ready for AI integration is to educate them about what this step means for their jobs and the organization. There’s certainly a lot to learn in this department, but we should do our best to break it down into manageable, understandable pieces. Show your employees how chatbots will help them to do their jobs. Give them examples of the tasks that bots will start to cover, and then introduce them to the new, more interesting work ahead. They will soon be able to focus their attention solely on complex, meaningful work. Simply put, educate them about how...
Read more

AI for Ad Revenue Stalls in...

In a digital ecosystem where user attention is fragmented and ad-blockers are on the rise, media companies are finding it harder than ever to maintain, let alone grow, advertising revenue. Traditional approaches to targeting and monetization are hitting a ceiling, often resulting in flat or declining ROI from digital ad operations. To thrive in today’s fast-evolving media landscape, organizations must move beyond conventional segmentation, rule-based personalization, and basic A/B testing. The answer? AI-powered ad optimization. By leveraging AI to better understand audiences, predict content engagement, and dynamically adjust campaigns, media companies can unlock untapped ad revenue potential.

The Limits of Traditional Ad Monetization

For years, ad revenue strategies in media have revolved around maximizing impressions, increasing click-through rates, and optimizing for basic user demographics. But as audiences diversify and attention spans shrink, these methods offer diminishing returns. Challenges faced by media companies today include ad fatigue among users due to overexposure and irrelevant targeting; inaccurate personalization based on broad, outdated demographic assumptions; inefficient monetization due to generic segmentation and static campaign strategies; and difficulty scaling successful strategies across content types and platforms. These inefficiencies result in missed revenue opportunities and underperforming campaigns, a costly combination in a highly competitive digital environment.

How AI Ad...
Read more

Hey There, Good Lookin’ – Robot...

ChatGPT Now Creates Beautiful, Downloadable PDF Reports: In a great leap forward for writers, ChatGPT is now able to auto-format the research it does for you into beautifully presentable, downloadable PDFs. Now available to ChatGPT Plus, Team and Pro subscribers, the extremely helpful feature works with ChatGPT’s Deep Research. The tool is an AI agent that can be prompted to do extensive research on your behalf and come back with a well-researched report, complete with link citations. Observes Michael Nunez: The export feature enables users to “download comprehensive research reports with fully preserved formatting, tables, images, and clickable citations.” Writers and researchers, for example, will be able to prompt ChatGPT Deep Research to create an extremely informative and artfully produced PDF that will be presented by ChatGPT as a finished report or ebook. Bonus: The new export-to-PDF feature works on both new reports and prior reports you’ve created with Deep Research. Subscribers to ChatGPT Enterprise and Education accounts should see the new feature soon, according to Nunez.

In other news and analysis on AI writing:

ChatGPT Now Connects to Your Data Library on OneDrive or SharePoint: Writers and researchers with a wealth of data stored on MS...
Read more