The Most Important Basic Generative AI Terms to Know  

Algorithms – Direct, specific instructions for computers, written by a human in code, that tell the computer how to perform a task.

The code follows the algorithmic logic of “if”, “then”, and “else.”  An example of an algorithm would be:         

  • IF the customer orders size 13 shoes,         

  • THEN display the message ‘Sold out, Sasquatch!’;         

  • ELSE ask for a color preference.     
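
In code, that rule might look something like the short Python sketch below. The function name and the wording of the color question are made up for illustration; only the shoe size and the sold-out message come from the example above.

```python
def respond_to_order(shoe_size):
    """A rule-based algorithm: a person wrote every branch by hand."""
    if shoe_size == 13:
        # IF the customer orders size 13 shoes, THEN show the sold-out message
        return "Sold out, Sasquatch!"
    else:
        # ELSE ask for a color preference (hypothetical wording)
        return "What color would you like?"


print(respond_to_order(13))
print(respond_to_order(9))
```

The computer never decides anything here. It only follows the branch a human already spelled out.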

Besides rule-based algorithms, there are machine-learning algorithms used to create AI. In this case, the data and the goal are given to the algorithm, which works out for itself how to reach the goal.
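
A toy illustration of the difference, under some simplifying assumptions: the sketch below is handed example pairs (the data) and told to shrink its own error (the goal), but is never told the rule connecting them. The data, the function name, and the settings are invented for this example, and real machine-learning systems adjust far more than a single number.

```python
# Data: pairs of (input, correct answer). The hidden rule is
# "answer = 2 * input", but the program is never told that.
examples = [(1, 2), (2, 4), (3, 6), (4, 8)]


def learn_multiplier(examples, steps=200, learning_rate=0.01):
    """Start with a guess of 0 and nudge it to shrink the error."""
    w = 0.0
    for _ in range(steps):
        for x, y in examples:
            error = w * x - y                 # how far off the current guess is
            w -= learning_rate * error * x    # nudge the guess toward less error
    return w


w = learn_multiplier(examples)
print(round(w, 2))  # very close to 2
```

Nobody typed "multiply by 2" anywhere in the code. The program arrived at that number by repeatedly correcting its own mistakes.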

There is a popular perception that algorithms provide a more objective, more complete view of reality, but they often will simply reinforce existing inequities, reflecting the bias of creators and the materials used to train them.

Artificial Intelligence (AI) – Basically, AI means “making machines intelligent,” so they can make some decisions on their own without the need for human intervention.

The phrase was coined in a 1955 research proposal for a workshop held at Dartmouth College in the summer of 1956. The current excitement about the field was kick-started in 2012 by an online contest called the ImageNet Challenge, in which the goal was getting computers to recognize and label images automatically.

Big Data – This is data that’s too big to fit on a single server.

Typically, it is unstructured and fast-moving. In contrast, small data fits on a single server, is already in structured form (rows and columns), and changes relatively infrequently. If you are working in Excel, you are doing small data. Two NASA researchers (Michael Cox and David Ellsworth) first wrote in a 1997 paper that when there’s too much information to fit into memory or local hard disks, “We call this the problem of big data.”

Generative AI – Artificial intelligence, such as ChatGPT, that can produce content (text, images, audio, video, etc.).

It operates similarly to the “type ahead” feature on smartphones that makes next-word suggestions. Gen AI is based on the particular content it was trained on (exposed to).
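
To make the “type ahead” comparison concrete, here is a deliberately tiny Python sketch that suggests a next word by counting which word most often followed it in the training text. The training sentence and function names are invented for illustration. Actual generative AI relies on neural networks trained on vastly more content, not simple word counts, so treat this as an analogy rather than how ChatGPT works.

```python
from collections import Counter, defaultdict


def train(text):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def suggest(follows, word):
    """Suggest the word seen most often after `word`, or None if unseen."""
    options = follows.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]


training_text = "the cat sat on the mat and the cat ate the fish"
model = train(training_text)
print(suggest(model, "the"))  # cat
print(suggest(model, "dog"))  # None
```

The second result shows the dependence on training content: the sketch has never seen “dog,” so it has nothing to suggest.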

GPT – The “GPT” in ChatGPT stands for Generative Pre-Trained Transformer. 

Hallucinations – When an LLM provides responses that sound plausible and authoritative but are inaccurate or not based on facts.

Large Language Models (LLMs) – AI trained on billions of examples of language use, images and other data. They can predict the next word or pixel in a pattern based on the user’s request. ChatGPT and Google Bard are LLMs.

The kinds of things LLMs can parse out of text:

  • Grammar and language structure.

  • How a word is used in language (noun, verb, etc.).

  • Word meaning and context (ex: The word green may mean a color when it is closely related to a word like “paint,” “art,” or “grass”).

  • Proper names (Microsoft, Bill Clinton, Shakira, Cincinnati).

  • Emotions (indications of frustration, infatuation, positive or negative feelings, or types of humor).

Machine learning (ML) – AI that spots patterns and improves on its own. 

An example would be algorithms that recommend ads to users; the recommendations become more tailored the longer the algorithm observes a user’s habits (clicks, likes, time spent, etc.).

Data scientists use ML to make predictions by combining ML with other disciplines (like big data analytics and cloud computing) to solve real-world problems. However, while this process can uncover correlations between data, it doesn’t reveal causation. It is also important to note that the results provide probabilities, not absolutes.

Neural Network – In this type of machine learning, computers learn a task by analyzing training examples. It is modeled loosely on the human brain: the interwoven tangle of neurons that process data in humans and find complex associations.
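
As a rough sketch of “learning a task by analyzing training examples,” the Python below trains a single artificial neuron (a perceptron, one of the simplest building blocks) to output 1 only when both of its inputs are 1. The task, names, and settings are chosen purely for illustration. Real neural networks link together many layers of such units.

```python
def predict(weights, bias, inputs):
    """The neuron 'fires' (returns 1) if the weighted sum is above zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_neuron(examples, epochs=20, learning_rate=1):
    """Adjust the weights a little every time the neuron gets an example wrong."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training examples: (inputs, correct answer)
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_neuron(and_examples)
for inputs, target in and_examples:
    print(inputs, "->", predict(weights, bias, inputs))
```

The weights start at zero and are only ever changed in response to mistakes on the examples, which is the sense in which the network “learns.”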

Neural networks were first proposed in 1944 by two University of Chicago researchers (Warren McCulloch and Walter Pitts) who moved to MIT in 1952 as founding members of what’s sometimes referred to as the first cognitive science department. Neural nets were a major area of research in both neuroscience and computer science until 1969. The technique then enjoyed a resurgence in the 1980s, fell into disfavor in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

Open Source AI – When the source code of an AI is available to the public, it can be used, modified, and improved by anyone. Closed AI means access to the code is tightly controlled by the company that produced it.

The closed model gives users greater certainty as to what they are getting, but open source allows for more innovation. Open-source AI would include Stable Diffusion, Hugging Face, and Llama (created by Meta). Closed-source AI would include ChatGPT and Google’s Bard.

Prompts – Instructions for an AI. They are the main way to steer the AI in a particular direction, indicate intent, and offer context. Writing them can be time-consuming if the task is complex.

Prompt Engineer – An advanced user of AI models. A prompt engineer doesn’t need special technical skills but is able to give clear instructions so the AI returns results that most closely match expectations.

This skill can be compared to that of a psychologist working with a client who needs help expressing what they know.

Red Teaming – Testing an AI by trying to force it to act in unintended or undesirable ways, thus uncovering potential harms.

The term comes from a military practice of taking on the role of an attacker to devise strategies.  

While some of these definitions are a bit of an oversimplification, they will point the beginner in the right direction. -Stephen Goforth

25 Data Science Articles from Dec 2022

A Pandas DataFrame cheatsheet for exploratory analysis & data manipulation 

Five ways that data roles will change in 2023 related to Chief Data Officers

AI & machine learning are “top of mind for the Army, especially as it pertains to protecting its assets in space”

10 weird things about SpaceX's more than 3,000 Starlink satellites (and that number keeps growing)

Initial specific steps toward launching a machine learning project 

Adobe has just released a remarkable and free AI-powered enhanced speech tool

The four biggest trends they expect to shape the AI landscape in 2023

Synthetic data applications, limitations & vulnerabilities

A guide to the roles and responsibilities on a data migration team

A tech journalist goes back to high school to find out whether OpenAI’s chatbot can pass AP Lit

The current limitations of AI’s military impact & where tech could one day spark “revolutionary changes” 

How Bayesian network structure learning can incorporate missing data 

The NGA has plans to develop an overarching cloud-based enterprise management system capable of automating its data collection and dissemination and ultimately replacing the overall Foundation GEOINT storage and management process 

A new paper on “Localization and classification of space objects using EfficientDet detector for space situational awareness”

Potential uses of ChatGPT for data scientists

McKinsey on the state of AI since the research firm began tracking it five years ago

A new collaborative effort is designed to “support interoperable open map data as a shared asset that can strengthen mapping services worldwide”

Different kinds of geospatial specialists are needed in different situations

China outpaces efforts by U.S. intelligence agencies to harness power of publicly available data 

The Space Dev Agency’s first major satellite launch has been delayed again

A look under the hood: How does ChatGPT work internally? 

An AI method from MIT and IBM research “improves the training and inference performance of deep learning models on large graphs”

Some basics about the new AI called ChatGPT 

Why Neural Network explainability is important, how to do it, & the tools for it

“The FCC approved part of SpaceX’s application for the second generation of the Starlink constellation, which will allow SpaceX to deploy up to 7,500 satellites”

27 Data Science Articles from June 2022

The priorities of the first-ever assistant secretary of the Air Force for space acquisition and integration (& top acquisition executive for the Space Force)

Google Cloud expands Earth Engine to help businesses and governments

Comparing C++ to Python (with examples)

Can synthetic data help AI get quicker results —and be less discriminatory? Here comes the fake data

OpenAI says its latest AI has learned to play Minecraft

US intelligence artificial intelligence use is booming but it's not the secret weapon you might imagine

“A major challenge facing the DoD at the moment is disparate data, spread across many different databases and stakeholders. Future winners will be those that can take all the data into a single location and make sense of it.”

“AI solutions for defense are much more mundane and focused on improving decision-making for humans” than many would imagine

Space 2.0: “The shape of space is expanding beyond traditional defense & aerospace to an expansive range of practical & profitable applications.” A look at the 2022 trends

China launches first crewless drone carrier—experts suggest that it could also be used as a military vessel  

Space-based assets aren’t immune to cyberattacks: Russia's attack on Viasat satellites exposed how vulnerable space-based assets are and the potential for spillover damage

Which is better for data science visualization—R or Python? (hint: it all depends on the nature of the problem to be solved) 

Overcoming overfitting a model in machine learning

How space debris threatens modern life  

Ranking Pandas for Python, Dask & Datatable based on their performance

Snowflake ups support for Python Build and offers Native Application Framework to run applications inside the Snowflake Data Cloud platform

Pentagon’s new AI and data chief waited days just for an ID card: ‘Let me say honestly that the bureaucracy is real’

The basic process of handling satellite image data for geospatial deep learning

6 Types of “feature importance” — a useful (and yet slippery) machine learning concept

Google Cloud’s new machine learning tools for its Vertex AI are now making their debut after being featured at the recent Applied ML Summit

The remarkable story of deploying the satellite communication system Starlink in Ukraine

Creating a simple, interactive dashboard with Panel & Python

Wanted: artificial intelligence & machine autonomy algorithms for military command and control

A visual breakdown of threats to space-based services such as Starlink & GPS

Google won’t allow people to create deepfakes using its collaborative machine learning platform any longer

“Python may be the second choice to R, but its popularity and ease of use positions it to dominate data science” 

Top YouTube channels for learning data science

Some basic data cleaning issues and possible solutions
