The Most Important Basic Generative AI Terms to Know  

Algorithms – Direct, specific instructions for computers, created by a human through coding, that tell the computer how to perform a task.

The code follows the algorithmic logic of “if”, “then”, and “else.”  An example of an algorithm would be:         

  • IF the customer orders size 13 shoes,         

  • THEN display the message ‘Sold out, Sasquatch!’;         

  • ELSE ask for a color preference.     
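The shoe-store rule above can be written out in a few lines of code. This is only a minimal Python sketch: the function name `respond_to_order` and the wording of the color question are made up for illustration.

```python
def respond_to_order(shoe_size):
    """Rule-based algorithm: a human wrote every branch by hand."""
    # IF the customer orders size 13 shoes...
    if shoe_size == 13:
        # ...THEN display the sold-out message
        return "Sold out, Sasquatch!"
    # ELSE ask for a color preference (the exact wording is an assumption)
    else:
        return "What color would you like?"


print(respond_to_order(13))  # Sold out, Sasquatch!
print(respond_to_order(9))   # What color would you like?
```

The computer never decides anything here. It only follows the branches the programmer typed in.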

Besides rule-based algorithms, there are machine-learning algorithms used to create AI. In this case, the data and the goal are given to the algorithm, which works out for itself how to reach the goal.
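To see the difference from a hand-written rule, here is a toy Python sketch of the machine-learning idea. The past orders are invented data, and the function names are hypothetical. Nobody tells the program which sizes are sold out. It is given examples (the data) and a goal (make as few mistakes as possible) and works out the cutoff itself. Real machine-learning algorithms are far more sophisticated than this.

```python
# Hypothetical past orders: (shoe size, was it sold out?)
examples = [(7, False), (8, False), (9, False), (10, False),
            (11, False), (12, True), (13, True), (14, True)]


def count_mistakes(threshold, data):
    """Goal: count how often 'size >= threshold means sold out' is wrong."""
    return sum((size >= threshold) != sold_out for size, sold_out in data)


def learn_threshold(data):
    """Try every size seen in the data; keep the one with the fewest mistakes."""
    candidates = sorted({size for size, _ in data})
    return min(candidates, key=lambda t: count_mistakes(t, data))


learned = learn_threshold(examples)
print(learned)  # 12
```

Change the examples and the learned rule changes with them, which is also why biased training material leads to biased results.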

There is a popular perception that algorithms provide a more objective, more complete view of reality, but they often will simply reinforce existing inequities, reflecting the bias of creators and the materials used to train them.

Artificial Intelligence (AI) – Basically, AI means “making machines intelligent,” so they can make some decisions on their own without the need for human intervention.

The phrase was coined in a research proposal written in 1955 for a workshop held the following year. The current excitement about the field was kick-started in 2012 by an online contest called the ImageNet Challenge, in which the goal was getting computers to recognize and label images automatically.

Big Data – This is data that’s too big to fit on a single server.

Typically, it is unstructured and fast-moving. In contrast, small data fits on a single server, is already in structured form (rows and columns), and changes relatively infrequently. If you are working in Excel, you are doing small data. Two NASA researchers (Michael Cox and David Ellsworth) first wrote in a 1997 paper that when there’s too much information to fit into memory or local hard disks, “We call this the problem of big data.”

Generative AI – Artificial intelligence, such as ChatGPT, that can produce content (text, images, audio, video, etc.).

It operates similarly to the “type ahead” feature on smartphones that makes next-word suggestions. What Gen AI produces depends on the particular content it was trained on (exposed to).
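The “type ahead” comparison can be made concrete with a very small Python sketch. The training text and function names are invented for illustration, and real generative AI uses far larger models and far more data. The basic move is the same, though: suggest the next word based on what followed it in the training content.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real model is trained on vastly more.
training_text = "the cat sat on the mat . the cat ate the fish . the cat slept"


def train(text):
    """Count which word follows each word in the training text."""
    following = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def suggest_next(following, word):
    """Suggest the word seen most often after `word`, or None if never seen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]


model = train(training_text)
print(suggest_next(model, "the"))    # cat
print(suggest_next(model, "robot"))  # None
```

The second call returns nothing because “robot” never appeared in the training text: the program can only draw on what it was exposed to.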

GPT – The “GPT” in ChatGPT stands for Generative Pre-Trained Transformer. 

Hallucinations – When an LLM provides responses that are inaccurate or not based on facts: the AI says things that sound plausible and authoritative but simply aren’t so.

Large Language Models (LLMs) – AI trained on billions of language uses, images and other data. They can predict the next word or pixel in a pattern based on the user’s request. ChatGPT and Google Bard are built on LLMs.

The kinds of information LLMs can parse out of text:

  • Grammar and language structure.

  • How a word is used in language (noun, verb, etc.).

  • Word meaning and context (e.g., the word “green” may mean a color when it is closely related to a word like “paint,” “art,” or “grass”).

  • Proper names (Microsoft, Bill Clinton, Shakira, Cincinnati).

  • Emotions (indications of frustration, infatuation, positive or negative feelings, or types of humor).

Machine learning (ML) – AI that spots patterns and improves on its own. 

An example would be algorithms that recommend ads, which become more tailored the longer the algorithms observe a user’s habits (clicks, likes, time spent, etc.).
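A rough Python sketch of that ad example follows. The class name, the ad categories, and the “most-clicked wins” rule are all assumptions made for illustration, and a production system would be much more elaborate. The point is only that the recommendation changes as more behavior is observed.

```python
from collections import Counter


class AdRecommender:
    """Toy recommender: suggests the ad category the user has clicked most."""

    def __init__(self, default_category="general"):
        self.clicks = Counter()
        self.default_category = default_category

    def observe_click(self, category):
        # Each observed click is one more piece of data to learn from.
        self.clicks[category] += 1

    def recommend(self):
        # With no history there is nothing to tailor, so use the default.
        if not self.clicks:
            return self.default_category
        return self.clicks.most_common(1)[0][0]


recommender = AdRecommender()
print(recommender.recommend())  # general
for category in ["shoes", "books", "shoes", "shoes"]:
    recommender.observe_click(category)
print(recommender.recommend())  # shoes
```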

Data scientists use ML to make predictions by combining ML with other disciplines (like big data analytics and cloud computing) to solve real-world problems. However, while this process can uncover correlations between data, it doesn’t reveal causation. It is also important to note that the results provide probabilities, not absolutes.

Neural Network – In this type of machine learning, computers learn a task by analyzing training examples. It is modeled loosely on the human brain: the interwoven tangle of neurons that process data in humans and find complex associations.
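Here is a deliberately tiny Python sketch of “learning a task by analyzing training examples.” It uses a single artificial neuron (often called a perceptron), which is only the building block of a neural network, not a full one. The task (the logical OR of two inputs), the function names, and the number of training rounds are choices made for this illustration.

```python
# Training examples for the logical OR task: (inputs, correct answer)
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """The neuron 'fires' (outputs 1) if the weighted sum of inputs is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(data, epochs=10):
    """Nudge the weights after every wrong answer (the perceptron rule)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in data:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias


weights, bias = train(examples)
print([predict(weights, bias, inputs) for inputs, _ in examples])  # [0, 1, 1, 1]
```

No one writes the OR rule into the program. The weights start at zero and are adjusted each time the neuron gets a training example wrong, until its answers match the examples.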

Neural networks were first proposed in 1943 by two University of Chicago researchers (Warren McCulloch and Walter Pitts) who moved to MIT in 1952 as founding members of what’s sometimes referred to as the first cognitive science department. Neural nets were a major area of research in both neuroscience and computer science until 1969. The technique then enjoyed a resurgence in the 1980s, fell into disfavor in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

Open Source AI – When the source code of an AI is available to the public, it can be used, modified, and improved by anyone. Closed AI means access to the code is tightly controlled by the company that produced it.

The closed model gives users greater certainty as to what they are getting, but open source allows for more innovation. Open-source AI would include Stable Diffusion, Hugging Face, and Llama (created by Meta). Closed-source AI would include ChatGPT and Google’s Bard.

Prompts – Instructions for an AI. A prompt is the main way to steer the AI in a particular direction, indicate intent, and offer context. Writing one can be time-consuming if the task is complex.

Prompt Engineer – An advanced user of AI models, a prompt engineer doesn’t possess special technical skills but is able to give clear instructions so the AI returns results that most closely match expectations.

This skill can be compared to that of a psychologist working with a client who needs help expressing what they know.

Red Teaming – Testing an AI by trying to force it to act in unintended or undesirable ways, thus uncovering potential harms.

The term comes from a military practice of taking on the role of an attacker to devise strategies.  

While some of these definitions are a bit of an oversimplification, they will point the beginner in the right direction. -Stephen Goforth

20 Data Science articles from February 2023

Five statistical paradoxes that data scientists should be aware of in order to do accurate analysis

What Pentagon leaders say they have learned from a year of battle in Ukraine: “The power of information is winning”

Software to sow doubts as you meta-analyze  

Machine learning is vulnerable to a wide variety of attacks. How the adversary can disrupt model training and even introduce backdoors

How Pandas alternatives—Polars, DuckDB, Vaex, and Modin—stack up to one of the most popular libraries in Python

Six of the most important types of machine learning algorithm 

“Big Data is real, but most people may not need to worry about it”

The ChatGPT prompts any data scientist must use

No, chatbots aren’t sentient. Here’s how their underlying technology works

5 Common Data Analytics Types Explained in Laymen’s Terms

Using the metaverse to virtually assemble and test AI war machines for the US military

Researchers discover a more flexible approach to machine learning—liquid neural nets

The evolving role of the data engineer

Top Predictive Analytics Trends in 2023

Even the Pentagon is using ChatGPT: the DoD used it to write a press release about a new counter-drone task force

How NGA is integrating commercial analytic services into agency workflows

Python string matching without complex RegEx Syntax

Six Python libraries especially useful to data engineers and natural language processing

Can ChatGPT write better code than a data scientist?

Researchers say ChatGPT can “weed out errors with sample code and fix it better than existing programs designed to do the same.”

25 Data Science Articles from Dec 2022

A Pandas DataFrame cheatsheet for exploratory analysis & data manipulation 

Five ways that data roles will change in 2023 related to Chief Data Officers

AI & machine learning are “top of mind for the Army, especially as it pertains to protecting its assets in space”

10 weird things about SpaceX's more than 3,000 Starlink satellites (and that number keeps growing)

Initial specific steps toward launching a machine learning project 

Adobe has just released a remarkable and free AI-powered enhanced speech tool

The four biggest trends they expect to shape the AI landscape in 2023

Synthetic data applications, limitations & vulnerabilities

A guide to the roles and responsibilities on a data migration team

A tech journalist goes back to high school to find out whether OpenAI’s chatbot can pass AP Lit

The current limitations of AI’s military impact & where tech could one day spark “revolutionary changes” 

How Bayesian network structure learning can incorporate missing data 

The NGA has plans to develop an overarching cloud-based enterprise management system capable of automating its data collection and dissemination and ultimately replacing the overall Foundation GEOINT storage and management process 

A new paper on “Localization and classification of space objects using EfficientDet detector for space situational awareness”

Potential uses of ChatGPT for data scientists

McKinsey on the state of AI since the research firm began tracking it five years ago

A new collaborative effort is designed to “support interoperable open map data as a shared asset that can strengthen mapping services worldwide”

Different kinds of geospatial specialists are needed in different situations

China outpaces efforts by U.S. intelligence agencies to harness power of publicly available data 

The Space Development Agency’s first major satellite launch has been delayed again

A look under the hood: How does ChatGPT work internally? 

An AI method from MIT and IBM research “improves the training and inference performance of deep learning models on large graphs”

Some basics about the new AI called ChatGPT 

Why Neural Network explainability is important, how to do it, & the tools for it

“The FCC approved part of SpaceX’s application for the second generation of the Starlink constellation, which will allow SpaceX to deploy up to 7,500 satellites”

29 Data Science Articles from Oct 2022

“The Pentagon needs an intelligent decision support system to assist with analyzing all the data available without causing information overload for the analyst while detecting nuances and subtleties an analyst may not observe.” Read more.

Russia's anti-satellite threat tests laws of war in space

The Space Force & US Space Command could see action if Russia follows through on threats to target commercial satellites assisting Ukraine’s defense of its homeland. Read more.

SpaceX, Amazon & FCC discuss satellite spectrum rulemaking

A senior Russian foreign ministry official warns that the commercial satellites used by the US & its allies could become "legitimate" targets for retaliatory action by Russia. Read more.

Understanding graph neural networks & how they “apply the predictive power of deep learning to rich data structures that depict objects and their relationships as points connected by lines in a graph.” Read more.

How linear regression is used in machine learning

Linguists believed that learning language is impossible without a built-in grammar template. New AI models prove otherwise. Read more.

The value of imaginary numbers in quantum ideas to describe the hidden shape of the universe. Read more.

NSA cybersecurity director's 6 takeaways from the war in Ukraine 

Artificial intelligence explainability according to MIT: “the ability to manage AI initiatives in ways that ensure models are value-generating, compliant, representative, and reliable” Read more.

Military research groups are buying advanced US software products & selling them on, boosting China’s hypersonic missile program—despite export controls designed to prevent resales to foreign entities. Read more.

FCC tightens rules on space junk: the five-year limit for getting rid of dead satellites could slow the growing orbital litter problem—if companies will abide by it. Read more.

Surprise discovery of radio signals could help track space junk and limit global security risks

The future of military satellite communications starts now

Ukraine Lessons for Naval Intelligence's Next War

Russia launches three satellite deployment missions in one week

An update on the space race matching smartphones with low-orbit satellites

Radiation from outer space could affect the computers on satellites

The charge required to corrupt data is getting smaller all the time, meaning it is actually getting easier for cosmic rays to have this effect. Read more

3 Simple Ways to Speed Up Your Python Code

“Sweeping change is coming to the U.S. Army’s fleet of fixed-wing intelligence-gathering aircraft over the next several years.” Read more.

10 Data Science Cheat Sheets

The war in Ukraine has underlined the growing importance of space to armies on the ground

“For serious software development, the no-code/low-code approach doesn’t work when you need to develop mission critical software. It is even more far-fetched, then, to have only citizen data scientists running your AI/ML.” Read more.  

What will happen to the space debris in orbit?

10 things journalists should know About AI

The NRO is “redefining how it works with the US Space Force and the US Space Command” to “expand the NRO’s space-based intelligence surveillance and reconnaissance” as it faces a “more complex near-peer adversary environment.” Read more.

How to create satellite imagery datasets and how to apply a classification model to them based on convolutional neural networks. Read more.  

21 Data Science articles from August 2022

R vs. Pandas: Understanding, slicing, filtering, and manipulating dataframes in R and Python Pandas

A Python Cheat Sheet for Data Structures and Algorithms

A new method for the spatial point patterns generation by classifying remote sensing images using convolutional neural network

Intelsat has lost the ability to command its Galaxy 15 satellite

School yourself on space junk—with some cool graphics

This fall the US Defense Innovation Unit will test ways to mitigate GNSS disruptions, accelerating the use of commercial GEOINT and NAVWAR tools

The US Air Force is asking researchers to develop quantum computing software algorithms to boost AI and machine automation technologies for new generations of command and control systems

The evolution from artificial intelligence to machine learning to data science

The limitations of blockchains and criteria for judging when a blockchain is applicable

Some prominent members of the AI community are expressing doubt about machine learning’s role in AI’s future

Data manipulation using the dplyr package in R, including filtering, selecting, arranging, summarizing and more

If war comes to space, who will control US spy satellites?

As US intelligence & military speed new sensors to space they are still working on details of who’s ultimately in charge during a conflict

Machine learning innovation among military industry companies has dropped off in the last year

How datasets are used in neural networks

A primer on how neural networks work

Some background on neural networks

A new area of artificial intelligence called analog deep learning promises faster computation with a fraction of the energy usage—by propelling protons through solids at unprecedented speeds

A scorecard to evaluate open-source software risks based on potential vulnerabilities and dependencies

Satellite imaging, not tourism, is the modern space race: “The full potential of readily available, nearly instantaneous space imagery has yet to be harnessed”

Action in Ukraine reveals the vulnerabilities of drones

25 Data Science Articles from July 2022

Is machine learning about to face a reproducibility crisis?

The US Space Force has picked Wallaroo’s AI & machine learning platform to solve edge model deployment challenges in space  

A new Python package for computing effect sizes

Deep learning researchers should try other techniques when solving a problem

Chinese-made Huawei equipment atop cell towers is capable of capturing and disrupting highly restricted Defense Department communications

A small satellite mission for DARPA called Red-Eye comes to light that tested crosslinks and on-orbit processing

NGA is supporting the new Dept of State-funded Conflict Observatory program which helps document potential war crimes

US military cybersecurity experts at DARPA are concerned about the ubiquitous & open source Linux kernel

“Satellite modularity may be the key to answering new imperatives for the military space”

Large language models are not very good at being factual. “It looks very coherent. It’s almost true. But it’s often wrong.”

AI-enabled image fraud in scientific publications

It’s turning out to be harder than expected to integrate multiple types of satellites with a new ground architecture using custom-built ground stations

What does the smallsats trend & the rise of low earth orbit constellations mean for Geostationary Orbit?

China sent a new data relay satellite into orbit that can work in different orbital positions with low- and mid-orbiting satellites

From physicist to machine learning engineer for Google—talking about skills he had to learn, “use cases, best practices & the field of ML”

Making atomic force microscopy more accurate with artificial intelligence by using machine learning techniques to reduce AFM uncertainty

Apache Hive hacks for a data scientist

Countries that have launched the most objects into space ranked

A practical guide for time series data forecasting using machine learning models in Python

He dropped out to become a poet. Now he’s won the top award for mathematics: “Mathematicians are a lot like artists in that really we’re looking for beauty”

How predicting the occurrence of a word (n-gram language modeling) can be used at the convergence of artificial intelligence & linguistics (natural language processing)

Team seeks to improve coordination between data from geospatial intelligence information systems operated by the National Geospatial-Intelligence Agency and US Air Force aircraft

Some of the best spreadsheets with the power of Python and the ease of use of Excel

“The question of whether space is really crowded is hotly debated in the industry” Myths vs. Reality

Adversarial machine learning poses a new threat to national security

27 Data Science Articles from June 2022

The priorities of the first-ever assistant secretary of the Air Force for space acquisition and integration (& top acquisition executive for the Space Force)

Google Cloud expands Earth Engine to help businesses and governments

Comparing C++ to Python (with examples)

Can synthetic data help AI get quicker results, and be less discriminatory? Here comes the fake data

OpenAI says its latest AI has learned to play Minecraft

US intelligence agencies’ use of artificial intelligence is booming, but it's not the secret weapon you might imagine

“A major challenge facing the DoD at the moment is disparate data, spread across many different databases and stakeholders. Future winners will be those that can take all the data into a single location and make sense of it.”

“AI solutions for defense are much more mundane and focused on improving decision-making for humans” than many would imagine

Space 2.0: “The shape of space is expanding beyond traditional defense & aerospace to an expansive range of practical & profitable applications.” A look at the 2022 trends

China launches first crewless drone carrier—experts suggest that it could also be used as a military vessel  

Space-based assets aren’t immune to cyberattacks: Russia's attack on Viasat satellites exposed how vulnerable space-based assets are and the potential for spillover damage

Which is better for data science visualization—R or Python? (hint: it all depends on the nature of the problem to be solved) 

Overcoming overfitting a model in machine learning

How space debris threatens modern life  

Ranking Pandas for Python, Dask & Datatable based on their performance

Snowflake ups support for Python and offers Native Application Framework to build and run applications inside the Snowflake Data Cloud platform

Pentagon’s new AI and data chief waited days just for an ID card: ‘Let me say honestly that the bureaucracy is real’

The basic process of handling satellite image data for geospatial deep learning

6 Types of “feature importance” — a useful (and yet slippery) machine learning concept

Google Cloud’s new machine learning tools for its Vertex AI are now making their debut after being featured at the recent Applied ML Summit

The remarkable story of deploying the satellite communication system Starlink in Ukraine

Creating a simple, interactive dashboard with Panel & Python

Wanted: artificial intelligence & machine autonomy algorithms for military command and control

A visual breakdown of threats to space-based services such as Starlink & GPS

Google won’t allow people to create deepfakes using its collaborative machine learning platform any longer

“Python may be the second choice to R, but its popularity and ease of use positions it to dominate data science” 

Top YouTube channels for learning data science

Some basic data cleaning issues and possible solutions
