DIMENSIONALITY REDUCTION, FEATURE SELECTION, FEATURE EXTRACTION

Dimensionality - Curse or Quirk?

“Take some more tea,” the March Hare said to Alice, very earnestly.

“I’ve had nothing yet,” Alice replied in an offended tone, “so I can’t take more.”

“You mean you can’t take less,” said the Hatter: “it’s very easy to take more than nothing.”

Does childhood reading lay the foundation for independent reading later on? It’s a definite YES in my case!

What I do not know is whether this is a causal relationship or merely a correlation.

Finer point – While both methods are used to determine relationships between variables, they are quite different from each other: causal analysis is an experimental method, while correlation analysis is an observational study.

Let’s delve a little deeper into the phenomenon. Critical thinking requires epistemic cognition.

“It is valuable to remember that whether a variable acts as a mediator, moderator, confounder, or covariate is not an inherent property of the variable. Understanding the potential influence of parameters that are not the focus of a study is important for identifying what interventions work, for whom they work, when they work best, and in what settings they are most useful.”

Source: https://journals.lww.com/jnpt/fulltext/2019/04000/mediators_and_moderators,_confounders_and.1.aspx

We are at it again: Wonderland is an antecedent to our data-driven world!

“Take some more tea,” the March Hare said to Alice, very earnestly.

As AI/ML professionals, our craving is data; tea can wait for some time 😀 I take the liberty of using an allegory here.

Q: Can I get some more data?  

A: Yes, the sheer volume of available data has grown exponentially in the recent past and is expected to continue to do so.  

The volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2025 

Source: https://www.statista.com/statistics/871513/worldwide-data-created/

Source: https://cloudtweaks.com/2015/03/how-much-data-is-produced-every-day/

Data acquisition is not an issue anymore: social media usage, IoT devices and sensors, and many other sources keep the data flowing. Now we start exploring, on our journey from observation to insights.

“I’ve had nothing yet,” Alice replied in an offended tone, “so I can’t take more.”

This multi-dimensional data cannot be visualized in its entirety, so many meaningful patterns go undetected. Worse, as the number of dimensions grows, the data becomes sparse and distance measures lose their discriminating power. In AI/ML parlance, we call this the ‘curse of dimensionality’.

Finer point – Let’s take an example. Catching an insect in a tube, versus your pet running across the lawn, versus going after birds: the effort to catch or hunt grows sharply with each added dimension. Similarly, the exponential growth of spatial volume with the number of dimensions makes data analysis much tougher.
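To make the curse of dimensionality concrete, here is a small illustrative sketch (my own, not from any particular reference, using NumPy): for random points in a unit hypercube, the relative spread of pairwise distances shrinks as the dimension grows, so “near” and “far” become almost indistinguishable.

```python
import numpy as np

def distance_spread(n_points, dim, rng):
    """Relative spread (max - min) / mean of pairwise distances
    between random points in a unit hypercube of the given dimension."""
    x = rng.random((n_points, dim))
    # Pairwise squared distances via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (x ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    d = np.sqrt(d2[np.triu_indices(n_points, k=1)])  # unique pairs only
    return (d.max() - d.min()) / d.mean()

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative spread of distances = {distance_spread(500, dim, rng):.3f}")
```

Running this shows the spread dropping steadily with dimension: in high-dimensional space, all points end up at roughly the same distance from one another, which is exactly why nearest-neighbour-style reasoning degrades.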

Like Alice, we can’t take more – from the very beginning.  

The cure can come in with dimensionality reduction techniques.  

Finer point – To control over-fitting, use regularization techniques. To prevent computational overload, use dimensionality reduction techniques.
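As a minimal sketch of one such cure, here is PCA via the singular value decomposition in plain NumPy (an illustrative example of my own; the synthetic data and the helper name `pca_reduce` are assumptions, not from the original): 50-dimensional data that secretly lives near a 2-D plane gets projected down to 2 dimensions.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal
    components: a classic dimensionality reduction technique."""
    Xc = X - X.mean(axis=0)                       # center each feature
    # Rows of Vt are the principal directions, ordered by variance explained
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # k-dimensional projection

rng = np.random.default_rng(0)
# 200 samples lying near a 2-D plane embedded in 50 dimensions
latent = rng.normal(size=(200, 2))
embedding = rng.normal(size=(2, 50))
X = latent @ embedding + 0.01 * rng.normal(size=(200, 50))

Z = pca_reduce(X, k=2)
print(Z.shape)  # (200, 2)
```

Because the data is essentially two-dimensional, the 2-component projection retains almost all of the variance while shrinking the feature space by a factor of 25.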

So, why is there a data deluge in the first place? 

“You mean you can’t take less,” said the Hatter: “it’s very easy to take more than nothing.”

Got my answer from Wonderland – dimensionality reduction is not the best strategy to start with. We can’t take less – from the very beginning. 

We need to appreciate the mediator, moderator, covariate, and confounder variables as well.

Finer point – Suppressor variables might be the red-headed stepchild, but they can make or break your analysis.

“It would have made a dreadfully ugly child, but it makes rather a handsome pig.”

Circling back to the question we started with: I’ll have to restart with the basic heuristic of variable reduction in high-dimensional space.

Until next time … signing off with Alice’s quote

“Now I’m opening out like the largest telescope that ever was!”