What makes data real?

The beautiful images of galaxies, nebulas, and other astronomical objects produced by radio telescopes have been processed several times and colorized before we see them, but we still consider these images to be real and not synthetic.

So, what makes data real? Real data are data that have been generated by a process that is appropriately connected to real phenomena, where the terms “appropriately connected” and “real” are defined by the relevant research community. For example, we can say that an MRI image of the brain is real because it has been produced by a process that is appropriately connected to a real brain. However, sometimes MRI machines produce images that radiologists classify as (unreal) artifacts because they have been produced, for example, by the scanner itself or by the patient’s movements.

Referring to data as “real” does not necessarily entail a commitment to a physicalist notion of reality. Data could be about physical, chemical, biological, social, or psychological phenomena. For example, we would consider data concerning biodiversity, stock prices, suicidal ideation, or cultural taboos to be real data, even though the phenomena they refer to cannot be equated with specific physical objects. The data could be about things we cannot directly observe, such as electrons, quarks, entropy, or dark matter. What matters most is that the relevant scientific community considers the data to be about real phenomena.

Read more at the PNAS (Proceedings of the National Academy of Sciences of the United States of America)

A healthy balance between model building and data gathering

Too much theory without data, and speculations run amok. We get lost in a fog of models and idealizations that seldom have much to say about the world we live in. The maps invent all sorts of worlds and tell us very little about the world we live in, leaving us to get lost in fantasy. With too much data and no theory, though, we drown in confusion. We don’t know how to tell the story we are supposed to tell. We hear all sorts of tales about what is out there in the wilderness, but we don’t know how to chart the best path to reach our destination. The better the balance between speculative thinking and data gathering, the healthier the science that comes out.  

Marcelo Gleiser writing in BigThink

Tuesday Tech Tools: 58 Data Visualization and Infographic Options

Looking for some ways to tell your story through data? Here are 58 data visualization (or infographic) tools.

D3.js
A JavaScript library for creating data visualizations. It requires coding, so some developer skills are needed. Very versatile. Examples. Free.

Adaptive Insights
Designed for business. Powerful but has a high learning curve. Cost is set on a case-by-case basis. Free trial available.

Animaker
Tool for making infographic videos with animated characters. Limited free version or accounts starting at $144 a year.

Bubbl
Create flow-charts for brainstorming and visualization. Limited options. Video explanation here.

Carto*
Perhaps the best interactive mapmaker. It has a high learning curve, though that is more a matter of time investment than of technical background. No coding needed to look impressive. Used for location intelligence and journalism alike. Free with paid plans. Video examples here.

Canva*
Create social media graphics, headers, slides, flyers, photo collages, posters, and infographics using drag-and-drop. 60k templates to pick from. Clip-art library available or upload your own images. Share to social media from the app or download a jpg, PDF, etc. for posting. Free or $12 a month for more options.

Chart Maker
Quickly make charts, graphs, etc.

Chartist.js
Simple responsive charts. Changes the way the data is displayed based on the size of the screen it's being viewed on.

Common Knowledge
A Google tool that quickly makes embeddable interactive charts from data. Free.

Comparea
See a visual comparison of two states, cities, countries or continents. Move them around. It will also tell you how many times bigger one geographic area is than another.

Daily Infographic
A new data visualization sample each day. Great way to get ideas.

Data Journalism
Examples, steps and video of how to create data visualizations.

Data Remixed
Blog about data visualizations by Ben Jones, an engineer in LA.

Data to Viz
A site that helps you find the right chart for your data.

Data Visual
Charts and graphs.  Templates or start from scratch.  A short video introduction here.

Data Wrangler
A tool created by Stanford University's Visualization Group for cleaning and rearranging data for other tools to use (such as a spreadsheet). Does not actually visualize your data, but preps it for use. This includes extracting, filling, dropping, merging and wrapping data points, among other things. There's a learning curve, but it's free.

Data Wrapper*
Tool for journalists looking to create fast, easy-to-understand visualizations, but useful for anyone. Easy to embed. Free version allows creation of 10k charts.

Domo
Business tool for creating visualizations. Fully mobile. Good collaboration capabilities. High learning curve. Not for beginners. Starting at $83 a month. Free trial available.

Dundas
Interactive visualizations. Lots of options. No 3-D charts or predictive analysis. Free trial, but cost is on a case-by-case basis.

Easel.ly*
Create infographics. Video sample here.

Everviz (formerly Highcharts Cloud)
One of the easiest-to-use data visualization tools. Not much customization. Free.

Flourish*
A data visualization tool that makes it easy to create both standard charts and mobile-friendly animated charts. Some customization available. Examples.

Flowing Data
Blog about how statisticians, designers, data scientists, and others use analysis, visualization, and exploration to understand data and ourselves.

FusionCharts
Chart-making JavaScript library that requires developer skills. Lots of customization. Starts at $199 a year.

Gephi
Social network analysis tool for creating interactive visualizations. Impressive looking but has a steep learning curve. For anything complex you might need some specialist help. Free.

Google Public Data Explorer
Makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand.

Graphiq
Data visualization tools. Formerly FindTheBest. Not only are there design tools, there are also many data sets available to work with.

High Charts
Interactive JavaScript charts. Free.

iCharts
Data visualization aimed at businesses, especially those looking to brand themselves. There's a free option (which allows for data interactivity, public sharing, and unlimited standard data sets) and a business plan that starts at $25 (offering features like private charts, custom templates, image and logo uploads, high-res image downloads, large data sets, chartbooks, etc.).

Infogram*
Infographic tool especially useful when working with complex data. No coding skills needed. Works with Google Sheets or Dropbox. Create interactive illustrations. 35 types of charts and 200 types of maps. Includes a built-in spreadsheet tool for data editing. Basic version is free but requires the Infogram logo. Upgrades run from $19 to $67 a month.

Meograph*
3D animation of people from 2D video of people. Video explanation.

NumberPicture
Create simple charts from your data using templates. Introductory video here.

PiktoChart*
Flat but beautiful interactive graphics. Easy-to-use. Video explanation.

Plotly
A good general-use data visualization tool offering many customizations and interactivity. Charts can be exported as images or embedded. A bit of a learning curve. Free with some paid plans.

Powtoon
Animated infographics web tool for creating videos. User-friendly basic cartoon software with plenty of templates and social integrations. The free version has company branding on it while the expensive pro plans start at $19 a month.

Projector
A Canva alternative design resource for non-designers. Video explanation here.

QGIS
Powerful mapmaking software with a high learning curve. The ‘GIS’ stands for geographical information system. Free.

R
This statistical computing language is geared toward data work and thus is the choice of many data scientists for data visualizations. High learning curve so you’ll need to work with a developer. Lots of tutorials and plugins. Very versatile. Free.

Raw Graphs
Built on D3.js but doesn’t require knowing code. Most of the charts are for obscure purposes. You’ll need a developer. An option when you want a unique visualization for a big project and can spend a decent amount of time on it.

SavvyRoo
Place to create and share visual data. Watch a video explanation here.

Sisense
Graphically represent your large data sets clearly and efficiently. Nice interface, limited types of visualizations. Free trial. Cost is on a case-by-case basis.

Strip Creator
Create your own comic strip.

Story Maps 
A custom mapping visualization tool based on ArcGIS, with more mapping options than StoryMap JS in the paid version. With simple-to-use templates, you can “walk” your viewers through a map-based story. Example.

StoryMap JS 
A simple mapping visualization tool produced by the Knight Lab at Northwestern for creating interactive maps and timelines. Based on Google's map software from OpenStreetMap. Does not require technical experience. Create slides and connect them on a map that can be embedded or upload your own basemap. Example. 

Tableau Public
Data visualization tools that are interactive. Maps, graphics, etc. Free. Samples.

Tableau Software
Easy to use, great capabilities. Popular but expensive. $70 each month.

Thinglink*
Create hot-spot graphics. Make images interactive by adding music, a voice-over, and text. Free. Sample.

TimeMapper
Timelines and maps. Sample video here.

Vidi
Drupal-based embeddable modules.

Visage
Create infographics and interactive charts for websites and social media graphics. The free account allows for three images a month.

Visme
Create graphics just for a particular platform for social-specific content. Free.

Visual Editors
A visual editor promotes visual journalism literacy in graphics, photo, video and design.

Visual.ly
Create infographics and data visualizations.

Visualize Free
Upload a data set from a spreadsheet (or cut and paste) for charts, maps, diagrams, etc. with a drag-and-drop designer. Registration is needed for an account, but it is free. Lots of publicly available data to work with (like data.gov). Uploads are private, so other users cannot gain access to your data.

Vizualize.me
Create your infographic resume for free. Video introduction here.

Wolfram Alpha
Computational knowledge engine. Enter a search string and get an immediate display of various pieces of information regarding that string. The Pro subscription allows users to input their own data and quickly convert it into dynamic and interactive charts. The price tag is $4.99 a month, limited to twenty uploads each month.

WTFViz
Visualizations that make no sense.

Zoho Reports
Analytics tool to design intuitive dashboards and data visualizations. Easy to learn. Beautiful graphics but limited customization. From $22 to $444 a month. Free trial.


The Algorithms of Nostalgia

Nostalgia has become a template for the serial production of more content, a new income stream for copyright holders, a new data stream for platforms, and a new way to express identity for users. And there’s so much pop culture in the past to draw from, platform capitalism will seemingly never run out. We’re told our data is collected in an attempt to predict what we want, but this isn’t quite true. In attempting to predict our tastes, streaming services work to produce them in their image. Since algorithms are trained on the past, they aren’t merely transmitting nostalgia through neutral channels; they’re cultivating nostalgic biases, seeking to predispose users to crave retro.

Even as Silicon Valley positions itself as progressive, its algorithms are stuck in the past.

Grafton Tanner, writing in Real Life Magazine

The algorithmic feedback loop

Users keep encountering similar content because the algorithms keep recommending it to them. As this feedback loop continues, no new information is added; the algorithm is designed to recommend content that affirms what it construes as your taste.

Reduced to component parts, culture can now be recombined and optimized to drive user engagement. This threatens to starve culture of the resources to generate new ideas, new possibilities. 

If you want to freeze culture, the first step is to reduce it to data. And if you want to maintain the frozen status quo, algorithms trained on people’s past behaviors and tastes would be the best tools.

The goal of a recommendation algorithm isn’t to surprise or shock but to affirm. The process looks a lot like prediction, but it’s merely repetition. The result is more of the same: a present that looks like the past and a future that isn’t one. 

Grafton Tanner, writing in Real Life Magazine
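
The feedback loop described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not how any real streaming service works: the recommender (a hypothetical `run_feedback_loop` function) always suggests the genre the user has consumed most, and the user always accepts the suggestion. The genres and the starting history are invented for illustration.

```python
from collections import Counter


def run_feedback_loop(history, rounds):
    """Toy recommender: suggest the most-consumed genre so far.

    Assumption: the user accepts every recommendation, so each
    suggestion is added straight back into the history.
    """
    counts = Counter(history)
    recommendations = []
    for _ in range(rounds):
        genre, _count = counts.most_common(1)[0]
        recommendations.append(genre)
        counts[genre] += 1  # accepting the suggestion reinforces past taste
    return counts, recommendations


# Invented starting history: a mild preference for one genre.
history = ["80s pop", "80s pop", "jazz", "folk"]
counts, recs = run_feedback_loop(history, rounds=20)

top_share_before = Counter(history).most_common(1)[0][1] / len(history)
top_share_after = counts.most_common(1)[0][1] / sum(counts.values())

print("Genres ever recommended:", set(recs))
print("Top genre share before:", round(top_share_before, 2))
print("Top genre share after: ", round(top_share_after, 2))
```

In this toy setup a mild initial preference (half of the history) grows to more than nine-tenths of everything consumed, and the other genres are never recommended at all. The loop adds no new information; it only repeats what was already there.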

Goodhart’s law

Once a useful number becomes a measure of success, it ceases to be a useful number. This is known as Goodhart’s law, and it reminds us that the human world can move once you start to measure it. Deborah Stone writes about Soviet factories and farms that were given production quotas, on which jobs and livelihoods depended.  

Numbers can be at their most dangerous when they are used to control things rather than to understand them. Yet Goodhart’s law is really just hinting at a much more basic limitation of a data-driven view of the world … there’s a critical gap between even the best proxies and the real thing, between what we’re able to measure and what we actually care about.

Hannah Fry writing in The New Yorker

Bullet-riddled Fighter Planes

During World War II, researchers from the non-profit research group the Center for Naval Analyses were tasked with a problem. They needed to reinforce the military’s fighter planes at their weakest spots. To accomplish this, they turned to data. They examined every plane that came back from a combat mission and made note of where bullets had hit the aircraft. Based on that information, they recommended that the planes be reinforced at those precise spots.

Do you see any problems with this approach?

The problem, of course, was that they only looked at the planes that returned and not at the planes that didn’t. Of course, data from the planes that had been shot down would almost certainly have been much more useful in determining where fatal damage to a plane was likely to have occurred, as those were the ones that suffered catastrophic damage.

The research team suffered from survivorship bias: they just looked at the data that was available to them without analyzing the larger situation. This is a form of selection bias in which we implicitly filter data based on some arbitrary criteria and then try to make sense out of it without realizing or acknowledging that we’re working with incomplete data.

Rahul Agarwal writing in Built in
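
A small simulation can make the survivorship-bias argument concrete. This is a sketch with made-up numbers, not historical data: every plane takes exactly one hit in a random section, and the per-section loss probabilities in `loss_prob` are assumptions chosen only to illustrate the effect. The `simulate` function and the section names are hypothetical.

```python
import random


def simulate(num_planes=10_000, seed=0):
    """Count hits per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    # Assumed chance that a single hit in each section downs the plane.
    loss_prob = {"engine": 0.8, "cockpit": 0.6, "fuselage": 0.1, "wings": 0.1}
    sections = list(loss_prob)
    all_hits = {s: 0 for s in sections}
    returned_hits = {s: 0 for s in sections}
    for _ in range(num_planes):
        section = rng.choice(sections)  # hits are spread evenly across sections
        all_hits[section] += 1
        if rng.random() >= loss_prob[section]:  # the plane survives and returns
            returned_hits[section] += 1
    return all_hits, returned_hits


all_hits, returned_hits = simulate()
for s in all_hits:
    lost = all_hits[s] - returned_hits[s]
    print(f"{s:9s} hits on returning planes: {returned_hits[s]:5d}   planes lost: {lost:5d}")
```

Although hits are spread evenly across sections, the returning planes show far more damage to the wings and fuselage than to the engine. An analyst who sees only the survivors would reinforce exactly the sections where hits are least dangerous.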