How to submit your own ideas for Data Science of the Day

Data Science of the Day is a fun way to share new, or perhaps even novel, perspectives on data science topics.

The criteria for an idea to be shared:

  • You must be logged in to the forum. You can log in from the upper right, and you can even use single sign-on to create an account (just to make it all that much easier)
  • The topic must be related to data science in some fashion (this is quite broad and includes, but is not limited to, loading, storing, and processing data, ML, DL, inferencing, and MLOps; lots of topics are in play here)
  • A resource to share with the user (blog, research paper, video, etc.)
  • The novel thought about the topic (max ~280 characters)
  • (Optional) An interesting title (max ~140 characters)

If you have a suggestion, just reply to this thread with the information outlined above, and it will be considered for posting.

After you are logged in, you can subscribe to Data Science of the Day: click the little notification bell near the upper right, and updates can be sent directly to your inbox.

Note: A new post is published every business day at 9 a.m. Eastern Time.

An Introduction to GPU DataFrames for Pandas Users
All Machine Learning Algorithms You Should Know in 2022
Have You Discovered the New Features in JupyterLab 3.0?
Trend Data and Charts to Satisfy Our Ever Increasing Dependence on Understanding What Other People are Doing. #FOMO
12 Software Design Tips for Data Scientists
Exclusive Interview with Kaggle Notebooks Grandmaster Gabriel Preda
The Triton Inference Server Lets Teams Deploy Trained AI Models From Any Framework
How to Deploy ArangoDB Graphs on GPUs for Accelerated Graph Algorithms using Nvidia’s RAPIDS cuGraph Library
Startups Take Advantage of Open Source NLP
EvoJAX: A Great Framework For Most Deep Tasks
Gradients without Backpropagation
Climate Intelligence As a Competitive Differentiator
TinyML: Neural Nets on Microcontrollers
How to Track Your GPU Usage During Machine Learning
What is MLOps and How Is It Different from DevOps?
Explaining Autoencoders
Multivariate Time Series Forecasting with Transformers
PyTorch, MLflow & Optuna: Experiment Tracking and Hyperparameter Optimization
Build and Deploy Your XGBoost Model Using Jupyter and Algorithmia
Gamifying Machine Learning for Stronger Security and AI Models
NLP is Evolving at a Rapid Pace. What is a Transformer? What is BERT? What is Next?
7 PyTorch Tips You Should Know
Swish versus GELU. Which Activation Function Should You Choose for Image Classification and Why?
SciPy 1.6 was Released on Dec 31st, 2020. Find Out What New Features are Offered
A Decent GPU is Crucial for Machine Learning
Difference Between Autoencoder and Variational Autoencoder
There is No-One-Size-Fits-All
Data Science for Unrest Prediction
No-Code AI Development and Automation Bias
5 Fundamental Functions in PyTorch, Every Deep Learning Beginner Should Know
A Comprehensive Guide to Convolutional Neural Networks
How to Find the Correlation Between Continuous Variables?
Yann LeCun’s Deep Learning Course
TrajAir: A General Aviation Trajectory Dataset
Elastic Distributed Training with XGBoost on Ray
Fast Fractional Differencing on GPUs Using Numba and RAPIDS
How Attention Works in Deep Learning
CUDA Python Allows You to Manipulate CUDA Kernels Using the Python API
What are Vector Embeddings?
Fast Fourier Transform: Scaling Multi-Point Evaluation
CUDA Python: Work with CUDA Directly Using Python
Embed SQL Into Your R Code
Can Artificial Intelligence Be Used in Order to Create Artwork?
RAPIDAligner: Aligning Time Series at the Speed of Light
Creating a Real-Time License Plate Detection and Recognition App
An Intro to AI Image Recognition and Image Generation
How Random Forest Tackles Variance
Transformers Explained Visually
Probability vs Likelihood
6 Essential Tips to Solve Data Science Projects
Target Encoding Instead of One-Hot or Labelencoder
An Introduction to GPU Accelerated Machine Learning in Python
Why Transformers are Slowly Replacing CNNs in Computer Vision?
Run RAPIDS on Microsoft Windows 10 Using WSL 2
Installing Any Version of CUDA on Ubuntu and Using TensorFlow and Torch on GPU
An Introduction to GPU Accelerated Graph Processing in Python
Can AI Make a Better Fusion Reactor?
Big Wins With AI Can Come From Starting Small
Deep Learning for Cyber Security - Part 1
Deep Learning for Cyber Security - Part 2
Scaling Language Model Training to a Trillion Parameters Using Megatron
An Introduction to GPU Accelerated Data Streaming in Python
Predicting Bitcoin Price Behavior Using RAPIDS
Does 100% Train Accuracy Indicate Overfitting?
Image Feature Extraction Using PyTorch
Understanding EfficientNet
Ever Heard of Travis Oliphant? What About NumPy, SciPy, Anaconda?
An Introduction to Entity Resolution — Needs and Challenges
Remove Bias in the Dataset to Remove Bias in the Model
Cooperative Driving Dataset (CODD) for Multi-Agent Perception Research
Machine Learning Has a Backdoor Problem
The Best Keep Getting Better. HW and SW Improve SQL Performance
Go 200,000x Faster in the Field of Weather Analysis with CUDA Python (Numba)
Unsupervised Learning is About Understanding the Data and Grasp Its Structure. NMF Can Help With This
Training a Multilingual Model Works Better When You Use Enough Data and a Larger Model
Advancing Sports Analytics Through AI Research
Directed Acyclic Graphs (DAGs) are Incredibly Important in Large Scale Data Processing. Want to Know How It Applies to Machine Learning?
Ever Think About Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets? Us Too. Check Out NFNets
Interested in Discovering New Ways to Take JupyterLab to The Next Level?
An Introduction to GPU Accelerated Machine Learning in Python
Embed Your SQL Query Into Your Python Code and Let It Rip on a GPU
The Billion Dollar AI Problem That Just Keeps Scaling
Implement ResNet with PyTorch
Peter Norvig Simulates an Economic Marketplace with Agent Interaction Using Python
An Introduction to GPU Accelerated Signal Processing in Python
An Example of Using a Variational Auto-Encoder (VAE) on Economic Data
How is Logistic Regression Related to Neural Networks?
An Introduction to Agent-based Models: Simulating Segregation with Python
A Fundamentally Novel and Faster Way of Performing One of The Most Basic Computations in Data Science
Why Python is Best Programming Language for Data Science & Machine Learning?
How AI Helps Prevent Cyberbullying
Can a Transformer Solve a Math Problem?
Multi-Node Multi-GPU (MNMG) Example on Azure Using Dask-CloudProvider
If You Could Speed Up a Spark Job Without Changing Your Code, Would You? RAPIDS Makes This a Reality
The Advantage of Using Filesystem Spec (fsspec)?
An Introduction to Distributed Computing with GPUs in Python
The Scikit-Learn Allows for Custom Estimators to Run on CPUs, GPUs and Multiple GPUs
There is Fast and Then There is Blazing Fast. Which Would You Rather Have on Google Colab?
Use Python to Build a Model to Classify Emotions in Acoustic Data
Accelerating K-Nearest Neighbors 600x Using RAPIDS cuML
RAPIDS Is in a Constant State of Improvement. Benchmarking Provides the Metrics to Measure This
Scale Recommender Systems on GPUs Using NVTabular
Clustering Can Be Compute Intensive As Centroids Are Calculated. cuML Speeds This Up
AgentPy Is an Open-Source Library For the Development and Analysis of Agent-Based Models in Python
JupyterLite is a JupyterLab Distribution That Runs Entirely in the Web Browser
Effective ML Model Deployment Requires More Than a Great Team
Training a Recommender System with Over 100 Billion Parameters?
One-Dimensional CNN for Human Behavior Classification
TRTorch is a PyTorch Deep Learning Optimizer to Run on GPUs
How Machine Learning is Changing Software: A Biased Overview
A Comparison of ML Experiment Tracking Tools
MADGRAD: A Best-of-Both-Worlds Optimizer with The Generalization Performance of SGD and at Least as Fast Convergence as That of Adam, Often Faster
The Bug Affecting Thousands of PyTorch Projects. Is One of Them Yours?
Stop Struggling with Data Science Workflows
Tutorial: NLP Classification of COVID Preventive Measures
The Torch.Linalg Module: Accelerated Linear Algebra with Autograd in PyTorch
GitHub Copilot is a Next Gen Code Assistant. That Will Speed Up Your Coding But Also Make You Better
The Rise of Cognitive AI
100 Helpful Python Tips You Can Learn Before Finishing Your Morning Coffee
Game Theory Applied to Large-Scale Data Analysis is Thinking Outside the Box
A Comprehensive Study on Challenges in Deploying Deep Learning Based Software
SQLModel, SQL Databases in Python, Designed for Simplicity, Compatibility, and Robustness
Like Books? What About a Book on Creating Python Packages
Do Vision Transformers See Like Convolutional Neural Networks?
What is GridMask Data Augmentation?
What are Diffusion Models?
Executing RAPIDS from Your Computer (Without a Local GPU)
What Are the Advantages of Different Classification Algorithms?
What Is the Relation Between Logistic Regression and Neural Networks?
How Big Data Carried Graph Theory Into New Dimensions
Running PySpark Inside Docker
Self-Supervised Voice Emotion Recognition Using Transfer Learning
Interpreting A/B Test Results: False Positives and Statistical Significance
7 Embarrassingly Easy Ways to Speed Up Your Core Python Program
An Introduction to Processing Cyber Security Logs With GPUs in Python
Understanding Singular Value Decomposition and Its Application in Data Science
Trying to Get Started with Data Science
Need a Recommendation on What to Watch Next?
Embeddings in Machine Learning: Everything You Need to Know
Is Understanding How Best to Deploy AI and Related Technologies Your Top Priorities?
A Basic Introduction To Research and Study Design For Machine Learning
Overview of ONNX and Operators
Automatically Create Machine Learning Models? What Kind of Fairytale are You Living in
Tutorial on Bayesian Inference For Single-Cell Gene Expression Data Using PyStan
The Story of Autoencoders
Bayesian Inference in Action with a Simple Example
Understanding the Effect of Bagging on Variance and Bias Visually
Pydash is a Python Package for Cleaning Data in a Functional Way
SymPy Is a Python Library That Allows You to Compute Mathematical Objects Symbolically
Visually Describe Your Data Ideas Using Excalidraw
AI Detects Gravitational Waves Faster than Real Time
Have You Checked Out Node.js RAPIDS?
RAPIDS Fire is a Data Science Podcast
Multi-Instance GPUs, with Kevin Klues and Pradeep Venkatachalam
6 Tips for MLOps Acceleration & Simplification
TensorRT 8 Is Out. Here is What You Need to Know
OpenAI Makes GPT-3 Generally Available Through its API
Foundations of Geometric Deep Learning
Etsy Runs Hundreds of Experiments a Day. How Do They Improve Accuracy and Speed?
What are Graph Neural Networks?
A Tutorial of Graph Neural Networks in Google Colab
What Data Do We Need for Training an AV Motion Planner?
Google Replaces BERT Self-Attention with Fourier Transform
Semi-Automated Exploratory Data Analysis (EDA) in Python
Level Up 7 Data Science Skills Through YouTube
How to Build ML Model using BigQuery
PyTorch Implementation of Different VAE Models to Model Heterogeneous Data
Credit Card Fraud Detection using XGBoost, SMOTE, and Threshold Moving
Hello Machine Learning Compiler, What Do You Do?
The State of MLOps 2021
Python & Data Engineering: Under the Hood of Join Operators
How to Train Vision Transformers?
Decision Transformer: Reinforcement Learning via Sequence Modeling
Once Is a Dataset for 3D Object Detection in the Autonomous Driving Scenario
CASNet: A Cross-Attention Siamese Network for Video Salient Object Detection
Orbit is a Python Package for Bayesian Time Series Forecasting and Inference
Python3 Cheat-sheets
Leafmap is a Python Package for Interactive Mapping and Geospatial Analysis in Jupyter
Delta Sharing: An Open Protocol for Secure Data Sharing
5 Computer Vision Trends for 2021
67 Machine Learning Algorithms Explained Using Python
Multi-Task Learning Trains a Single Model to Make Multiple Kinds of Predictions on a Single Sample
Geometric Foundations of Deep Learning
Distributed Deep Learning in Open Collaborations
Identifying Stable mRNA Vaccines for Covid Using GNN and RNN
Do We Really Need Deep Learning Models for Time Series Forecasting?
Ambient Intelligence Is Becoming Reality
What Do Data Scientists Do?
Meet Gato, a "Generalist Agent" from DeepMind
A Simple Introduction to the K-Nearest Neighbors Algorithm
Plot Overview for Matplotlib Users
LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything
Top 15 Must-Read Computer Vision Books
Going for Gold While Predicting It
Deploy MNIST Trained Model as a Web Service
MMDetection Tutorial — An End2End State-Of-The-Art Object Detection Library
How to Deploy ML Models to the Cloud Quickly and Easily
Calling for an Open Standard for Metadata
Data Quality Assurance with Great Expectations and Kubeflow Pipelines
Read Cassandra SStables Directly Into cuDF
Sprinkle of cuDF Scalars into The Workflow
Meta is Teaching AI to Multi-Task
GPUs May Be Better, Not Just Faster, at Training Deep Neural Networks
What is The Power of AI in Financial Services
Are Bad Labels the Problem?
Trends to Watch in Sept 2021
Gaussian Processes From Scratch
Kaggle's 2021 State of Data Science and Machine Learning Survey
PyTorch or TensorFlow 2021
Feature Store vs Data Warehouse
PyDy is a Tool Kit That Utilizes a Variety of Scientific Programs to Enable the Study of Multibody Dynamics
Parallelize Functions and ML Models in Python
Understanding Entropy: The Golden Measurement of Machine Learning
AlphaFold is Groundbreaking and You Can Now Run It in Google Colab
Deep Learning Helps Predict Traffic Crashes Before They Happen
Multi-Task Learning with Transformers
Python: Production-Level Coding Practices
How to Use Streamlit and Python to Build a Data Science App
Reinforcement Learning Algorithms Comparison
Taking Edge Computing and Continuous Updates to Space
NIH Awards Nearly $75M to Catalyze Data Science Research in Africa
A Path Forward for Trusted AI in Breast Cancer Risk Prediction
Bias — Variance TradeOff & Regularization
Deploying Deep Learning Models with Model Server
How Algorithms Understand Text
Replicating Minecraft World Generation in Python
An Important Skill for Data Scientists and Machine Learning Practitioners
Fast and Scalable AI Model Deployment with NVIDIA Triton Inference Server
Transformers from Scratch
What is K-Medoids Clustering and When Should You Use It Instead of K-Means
Just In Case You Want to Know the Best Data Visualization of 2021
Tech Predictions for 2022 and Beyond
Re:Invent Recap
What is Neural Compression?
Adversarial Machine Learning: A Beginner’s Guide to Adversarial Attacks and Defenses
Data Science for Tech Leaders
AI Toolkit to Help Achieve Sustainability Goals
Creating Sparse, Multitask Neural Networks
3 Keras Design Patterns Every ML Engineer Should Know
Apache Arrow: High Performance Columnar Data Framework
What Is a Kalman Filter
How Are Liquid Neural Networks Used in Computer Vision
Automation Will Drive Tech and Media Spending to $2.5T
Why Decision Trees?
Another Way to Optimize Your Recursive Functions
A Fast and Slow AI System
How GPUs are Beginning to Displace Clusters for Big Data & Data Science
Learn about Combinatorial Optimization with Google OR-Tools
Deploy Your First Jupyter Notebook to Docker
The Fastest Way to Loop Using Python
Why Overfitting Leads To High Variance
Andrew Ng: Unbiggen AI
What Are the Most Optimized Approaches to Converting Your Audio Data into Text?
JAX - NumPy on GPUs and TPUs
How to Perform Unsupervised Feature Selection Using Supervised Algorithms
What Is a Tensor Core Anyway?
Neural Architecture Search Can Help Find the Right NN for You
Evolving with BERT: Introduction to RoBERTa
Kubeflow vs. MLflow: An MLOps Comparison
Data Lake Benefits, Architectures and Best Practices
What is The Power of AI in Financial Services?
A Step Toward One Neural Network to Rule Them All
How to Version Machine Learning Experiments Instead of Tracking Them
Three Approaches to Encoding Time Information as Features for ML Models
GPU-Accelerated SHAP for RAPIDS and XGBoost is Now Here
Distributed Data Science Using NVTabular on Spark & Dask
Don't Create Your Own Function If There is Already a Built-in Python Function for That Task
AI Models for Answering Questions