August 30, 2023

The Most Important Basic Generative AI Terms to Know

August 30, 2023/ Stephen Goforth

Algorithms – Direct, specific instructions, written by a human in code, that tell a computer how to perform a task.

The code follows the algorithmic logic of “if”, “then”, and “else.” An example of an algorithm would be:

IF the customer orders size 13 shoes,
THEN display the message ‘Sold out, Sasquatch!’;
ELSE ask for a color preference.
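The same rule can be written as a short Python sketch. The function name `respond_to_order` and the wording of the color question are hypothetical, chosen only for illustration.

```python
def respond_to_order(shoe_size: int) -> str:
    """Apply the IF/THEN/ELSE rule from the example above."""
    if shoe_size == 13:
        # THEN branch: this size is unavailable.
        return "Sold out, Sasquatch!"
    # ELSE branch: any other size moves on to the color question.
    return "What color would you like?"


print(respond_to_order(13))
print(respond_to_order(9))
```

Every outcome here was spelled out in advance by the programmer, which is what makes it a rule-based algorithm.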

Besides rule-based algorithms, there are machine-learning algorithms used to create AI. In this case, the data and the goal are given to the algorithm, which works out for itself how to reach the goal.

There is a popular perception that algorithms provide a more objective, more complete view of reality, but they often will simply reinforce existing inequities, reflecting the bias of creators and the materials used to train them.

Artificial Intelligence (AI) – Basically, AI means “making machines intelligent”, so they can make some decisions on their own without the need for any human interference.

The phrase was coined in a research proposal written in 1955 for a workshop held at Dartmouth College in 1956. The current excitement about the field was kick-started in 2012 by an online contest called the ImageNet Challenge, in which the goal was getting computers to recognize and label images automatically.

Big Data – This is data that’s too big to fit on a single server.

Typically, it is unstructured and fast-moving. In contrast, small data fits on a single server, is already in structured form (rows and columns), and changes relatively infrequently. If you are working in Excel, you are doing small data. Two NASA researchers (Michael Cox and David Ellsworth) first wrote in a 1997 paper that when there’s too much information to fit into memory or local hard disks, “We call this the problem of big data.”

Generative AI – Artificial intelligence, such as ChatGPT, that can produce content (text, images, audio, video, etc.).

It operates similarly to the “type ahead” feature on smartphones that makes next-word suggestions. Gen AI is based on the particular content it was trained on (exposed to).
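The "type ahead" comparison can be made concrete with a toy sketch that counts which word follows which in a training text and suggests the most frequent followers. This is only an illustration of the analogy: real generative AI uses neural networks trained on vastly more data, and the function names and the sample text below are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count which word follows each word in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def suggest_next(following: dict, word: str, k: int = 3) -> list:
    """Return up to k of the most frequent followers of `word`."""
    return [w for w, _ in following.get(word.lower(), Counter()).most_common(k)]


# Toy training text (hypothetical).
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)

print(suggest_next(model, "the"))  # most frequent follower comes first
print(suggest_next(model, "dog"))  # a word never seen in training
```

The second call returns an empty list, which mirrors the point that the output depends entirely on the content the model was trained on.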

GPT – The “GPT” in ChatGPT stands for Generative Pre-Trained Transformer.

Hallucinations – When an LLM provides responses that are inaccurate or not based on facts: the AI says things that sound plausible and authoritative but simply aren’t so.

Large Language Models (LLMs) – AI trained on billions of language uses, images and other data. They can predict the next word or pixel in a pattern based on the user’s request. ChatGPT and Google Bard are LLMs.

The kinds of text LLMs can parse out:

Grammar and language structure.
How a word is used in language (noun, verb, etc.).
Word meaning and context (ex: The word green may mean a color when it is closely related to a word like “paint,” “art,” or “grass.”)
Proper names (Microsoft, Bill Clinton, Shakira, Cincinnati).
Emotions (indications of frustration, infatuation, positive or negative feelings, or types of humor).

Machine learning (ML) – AI that spots patterns and improves on its own.

An example would be an algorithm that recommends ads to users, with its recommendations becoming more tailored the longer it observes a user’s habits (clicks, likes, time spent, etc.).

Data scientists use ML to make predictions by combining ML with other disciplines (like big data analytics and cloud computing) to solve real-world problems. However, while this process can uncover correlations between data, it doesn’t reveal causation. It is also important to note that the results provide probabilities, not absolutes.
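A minimal sketch of the ad example, assuming a very simple approach: count how often each ad category is shown and clicked, then estimate a click probability for each. The class name `AdRecommender`, the categories, and the click history are hypothetical, and production systems are far more elaborate.

```python
from collections import defaultdict


class AdRecommender:
    """Estimate a click probability per ad category from observed behavior."""

    def __init__(self):
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)

    def observe(self, category: str, was_clicked: bool) -> None:
        """Record one ad impression and whether the user clicked it."""
        self.shown[category] += 1
        if was_clicked:
            self.clicked[category] += 1

    def click_probability(self, category: str) -> float:
        # Smoothed estimate: a category never shown starts at 0.5.
        return (self.clicked[category] + 1) / (self.shown[category] + 2)

    def recommend(self, categories: list) -> str:
        """Pick the category with the highest estimated click probability."""
        return max(categories, key=self.click_probability)


rec = AdRecommender()
history = [("shoes", True), ("shoes", True), ("cars", False),
           ("shoes", False), ("cars", False)]
for category, was_clicked in history:
    rec.observe(category, was_clicked)

print(rec.recommend(["shoes", "cars", "books"]))
print(rec.click_probability("cars"))
```

The estimates shift with every new observation, and the output is a probability rather than a certainty. The counts also say nothing about why the user clicked, only that clicks and categories are correlated.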

Neural Network – In this type of machine learning, computers learn a task by analyzing training examples. It is modeled loosely on the human brain: the interwoven tangle of neurons that process data and find complex associations.
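Learning from training examples can be sketched with a single artificial neuron (a perceptron), the simplest building block of a neural network. This is an illustrative toy rather than how modern networks are built: the function names are hypothetical, and the task (learning the logical AND of two inputs) was chosen only because it is small enough to follow by hand.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights for one artificial neuron from (inputs, label) pairs."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            total = weights[0] * x1 + weights[1] * x2 + bias
            prediction = 1 if total > 0 else 0
            # Nudge the weights only when the prediction was wrong.
            error = label - prediction
            weights[0] += lr * error * x1
            weights[1] += lr * error * x2
            bias += lr * error
    return weights, bias


def predict(weights, bias, x1, x2):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


# Training examples: the neuron should output 1 only when both inputs are 1.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)

print([predict(weights, bias, a, b) for (a, b), _ in and_examples])
```

Nobody writes the AND rule into the code; the neuron arrives at suitable weights by repeatedly correcting its own mistakes on the examples.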

Neural networks were first proposed in 1944 by two University of Chicago researchers (Warren McCulloch and Walter Pitts) who moved to MIT in 1952 as founding members of what’s sometimes referred to as the first cognitive science department. Neural nets were a major area of research in both neuroscience and computer science until 1969. The technique then enjoyed a resurgence in the 1980s, fell into disfavor in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

Open Source AI – When the source code of an AI is available to the public, it can be used, modified, and improved by anyone. Closed AI means access to the code is tightly controlled by the company that produced it.

The closed model gives users greater certainty as to what they are getting, but open source allows for more innovation. Open-source AI would include Stable Diffusion, Hugging Face, and Llama (created by Meta). Closed Source AI would include ChatGPT and Google’s Bard.

Prompts – Instructions for an AI. They are the main way to steer the AI in a particular direction, indicate intent, and offer context. Writing them can be time-consuming if the task is complex.
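As a rough sketch, the three ingredients named above (instruction, intent, and context) can be assembled into one prompt string. The function `build_prompt` and the "Context:", "Goal:", and "Task:" labels are hypothetical conventions for this example, not a format any particular AI requires.

```python
def build_prompt(instruction: str, context: str = "", intent: str = "") -> str:
    """Combine context, intent, and the instruction into a single prompt."""
    parts = []
    if context:
        parts.append(f"Context: {context}")
    if intent:
        parts.append(f"Goal: {intent}")
    parts.append(f"Task: {instruction}")
    return "\n".join(parts)


print(build_prompt(
    instruction="Summarize the article in three bullet points.",
    context="The reader is new to data science.",
    intent="Help a beginner decide whether to read the full piece.",
))
```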

Prompt Engineer – An advanced user of AI models, a prompt engineer doesn’t possess special technical skills but is able to give clear instructions so the AI returns results that most closely match expectations.

This skill can be compared to a psychologist who is working with a client who needs help expressing what they know.

Red Teaming – Testing an AI by trying to force it to act in unintended or undesirable ways, thus uncovering potential harms.

The term comes from a military practice of taking on the role of an attacker to devise strategies.

While some of these definitions are a bit of an oversimplification, they will point the beginner in the right direction. -Stephen Goforth

February 28, 2023

20 Data Science articles from February 2023

February 28, 2023/ Stephen Goforth

Five statistical paradoxes that data scientists should be aware of in order to do accurate analysis

What Pentagon leaders say they have learned from a year of battle in Ukraine: “The power of information is winning”

Software to sow doubts as you meta-analyze

Machine learning is vulnerable to a wide variety of attacks. How the adversary can disrupt model training and even introduce backdoors

How Pandas alternatives—Polars, DuckDB, Vaex, and Modin—stack up to one of the most popular libraries in Python

Six of the most important types of machine learning algorithms

“Big Data is real, but most people may not need to worry about it”

The ChatGPT prompts any data scientist must use

No, chatbots aren’t sentient. Here’s how their underlying technology works

5 Common Data Analytics Types Explained in Laymen’s Terms

Using the metaverse to virtually assemble and test AI war machines for the US military

Researchers discover a more flexible approach to machine learning—liquid neural nets

The evolving role of the data engineer

Top Predictive Analytics Trends in 2023

Even the Pentagon is using ChatGPT: the DoD used it to write a press release about a new counter-drone task force

How NGA is integrating commercial analytic services into agency workflows

Python string matching without complex RegEx Syntax

Six Python libraries especially useful to data engineers and natural language processing

Can ChatGPT write better code than a data scientist?

Researchers say ChatGPT can “weed out errors with sample code and fix it better than existing programs designed to do the same.”

February 02, 2023

15 Data Science Articles from Jan 2023

February 02, 2023/ Stephen Goforth

Exploring data science use cases with the GPT-3 API using Python

ChatGPT and the goal of eliminating some of the rote work involved in writing code

Bayesian statistics and machine learning: How do they differ?

An integrated approach of remote sensing and geospatial analysis for the impacts of climate change

Can ChatGPT provide answers to data science questions to the same standard as humans?

A fresh look at data science from the author of a new textbook on the subject

The “bounty” of geospatial data is outstripping the ability to process it. The answer, says the NGA, is artificial intelligence

23 Generative AI tools to create text/image/video

A few key trends that are likely to shape the development of AI in the coming year

Starlink’s performance in Ukraine has ignited a new space race

An argument for deep learning limitations based on generalization

NGA is working with small drones to help rescue operations resulting from wildfire & other disasters

Why Most Introductory Examples of Bayesian Statistics Misrepresent It

How China is building a parallel generative AI universe

The NGA usually monitors Iranian protests & North Korean missiles. It’s now helping find hurricane victims, too

December 30, 2022

25 Data Science Articles from Dec 2022

December 30, 2022/ Stephen Goforth

A Pandas DataFrame cheatsheet for exploratory analysis & data manipulation

Five ways that data roles will change in 2023 related to Chief Data Officers

AI & machine learning are “top of mind for the Army, especially as it pertains to protecting its assets in space”

10 weird things about SpaceX's more than 3,000 Starlink satellites (and that number keeps growing)

Initial specific steps toward launching a machine learning project

Adobe has just released a remarkable and free AI-powered enhanced speech tool

The four biggest trends they expect to shape the AI landscape in 2023

Synthetic data applications, limitations & vulnerabilities

A guide to the roles and responsibilities on a data migration team

A tech journalist goes back to high school to find out whether OpenAI’s chatbot can pass AP Lit

The current limitations of AI’s military impact & where tech could one day spark “revolutionary changes”

How Bayesian network structure learning can incorporate missing data

The NGA has plans to develop an overarching cloud-based enterprise management system capable of automating its data collection and dissemination and ultimately replacing the overall Foundation GEOINT storage and management process

A new paper on “Localization and classification of space objects using EfficientDet detector for space situational awareness”

Potential uses of ChatGPT for data scientists

McKinsey on the state of AI since the research firm began tracking it five years ago

A new collaborative effort is designed to “support interoperable open map data as a shared asset that can strengthen mapping services worldwide”

Different kinds of geospatial specialists are needed in different situations

China outpaces efforts by U.S. intelligence agencies to harness power of publicly available data

The Space Dev Agency’s first major satellite launch has been delayed again

A look under the hood: How does ChatGPT work internally?

An AI method from MIT and IBM research “improves the training and inference performance of deep learning models on large graphs”

Some basics about the new AI called ChatGPT

Why Neural Network explainability is important, how to do it, & the tools for it

“The FCC approved part of SpaceX’s application for the second generation of the Starlink constellation, which will allow SpaceX to deploy up to 7,500 satellites”

October 31, 2022

29 Data Science Articles from Oct 2022

October 31, 2022/ Stephen Goforth

“The Pentagon needs an intelligent decision support system to assist with analyzing all the data available without causing information overload for the analyst while detecting nuances and subtleties an analyst may not observe.” Read more.

Russia's anti-satellite threat tests laws of war in space

The Space Force & US Space Command could see action if Russia follows through on threats to target commercial satellites assisting Ukraine’s defense of its homeland. Read more.

SpaceX, Amazon & FCC discuss satellite spectrum rulemaking

A senior Russian foreign ministry official warns that the commercial satellites used by the US & its allies could become "legitimate" targets for retaliatory action by Russia. Read more.

Understanding graph neural networks & how they “apply the predictive power of deep learning to rich data structures that depict objects and their relationships as points connected by lines in a graph.” Read more.

How linear regression is used in machine learning

Linguists believed that learning language is impossible without a built-in grammar template. New AI models prove otherwise. Read more.

The value of imaginary numbers in quantum ideas to describe the hidden shape of the universe. Read more.

NSA cybersecurity director's 6 takeaways from the war in Ukraine

Artificial intelligence explainability according to MIT: “the ability to manage AI initiatives in ways that ensure models are value-generating, compliant, representative, and reliable” Read more.

Military research groups are buying advanced US software products & selling them on, boosting China’s hypersonic missile program—despite export controls designed to prevent resales to foreign entities. Read more.

FCC tightens rules on space junk: the five-year limit for getting rid of dead satellites could slow the growing orbital litter problem—if companies will abide by it. Read more.

Surprise discovery of radio signals could help track space junk and limit global security risks

The future of military satellite communications starts now

Ukraine Lessons for Naval Intelligence's Next War

Russia launches three satellite deployment missions in one week

An update on the space race matching smartphones with low-orbit satellites

Radiation from outer space could affect the computers on satellites

The charge required to corrupt data is getting smaller all the time, meaning it is actually getting easier for cosmic rays to have this effect. Read more.

3 Simple Ways to Speed Up Your Python Code

“Sweeping change is coming to the U.S. Army’s fleet of fixed-wing intelligence-gathering aircraft over the next several years.” Read more.

10 Data Science Cheat Sheets

The war in Ukraine has underlined the growing importance of space to armies on the ground

“For serious software development, the no-code/low-code approach doesn’t work when you need to develop mission critical software. It is even more far-fetched, then, to have only citizen data scientists running your AI/ML.” Read more.

What will happen to the space debris in orbit?

10 things journalists should know about AI

The NRO is “redefining how it works with the US Space Force and the US Space Command” to “expand the NRO’s space-based intelligence surveillance and reconnaissance” as it faces a “more complex near-peer adversary environment.” Read more.

How to create satellite imagery datasets and how to apply a classification model to them based on convolutional neural networks. Read more.
