200+ Free Statistics and Data Mining Tools

Now that we have expanded our professional network (from 50 contacts to twice that in less than a month!), we have received, via LinkedIn, a list of free (though not necessarily open-source) Statistics and Data Mining tools for you to use in your projects. So here we share the list with you. Enjoy:

ADaMsoft - Data mining, data management and reporting.  Has web-based features.
Ade4 - It contains Data Analysis functions to analyse Ecological and Environmental data in the framework of Euclidean Exploratory methods, hence the name ade4.
ADePT – Developed to automate and standardize the production of analytical reports.
ADMB - Non-linear statistical modelling.
AFGen – Fragment-based descriptors for chemical compounds (chemometrics).
AM – General stats package.
Antaeus – Data visualisation software.
AnSWR – A software system for coordinating and conducting large-scale, team-based analysis projects that integrate qualitative and quantitative techniques.
Apophenia – General stats package but with more flexibility to be creative in model building.
Arc - Applied regression including computing and graphics.
ARMiner – Data mining application specialising in finding association rules.
Assistat - General stats package.
AutoClass – Data mining (clustering) program from NASA’s Ames Research Center.

Bayesian Filtering Library – Bayesian software for use in advanced machine and robot control.
BiNGO – Bioinformatics software.  Biological Network Gene Ontology.
BioEstat - Ecological statistics program (only available in Portuguese).
Biogeme - Stands for: Bierlaire’s Optimization Toolbox for GEV (Generalized Extreme Value) Model Estimation.
Biomapper – A GIS-toolkit to model ecological niche and habitat suitability.
BKD – Bayesian Knowledge Discoverer.
BlockTreat – A general frequentist Monte Carlo program for block and treatment tests, tests with matching, k-sample tests, and tests for independence.
BrightStat - General stats package.
BV4.1 – A procedure for the decomposition and seasonal adjustment of monthly and quarterly economic series.  Used by the German Federal Statistical Office.

Calcugator – Calculator, plotting engine, and programming environment.
Caleydo – Bioinformatics data visualisation software.
CCOUNT – Designed for market research purposes, including: data cleaning, manipulation, cross tabulation and data analysis.  Similar to SPSS.
CDC EZ-Text – Developed to assist researchers create, manage, and analyze semi-structured qualitative databases.
CFA – Psychometrics program.  Type identification by Configural Frequency Analysis.
Chronux – Developed for the analysis of neural data.
CLUTO – Data mining software for clustering high-dimensional datasets.
Conc – Text concordance program.
Concorder – Text concordance program.  Allows you to take given texts and find out the frequency of words.
Correlate -
CORRES – Psychometrics program.  Correspondence analysis of contingency tables.
CSPro – Census and Survey Processing System (CSPro) is a questionnaire-oriented statistical package for Windows.
Cytoscape – Bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data.

DAP – Alternative to SAS.
Dataplot - General stats package.  Designed for science and engineering.
DAVID – Bioinformatics software.  Database for Annotation, Visualization and Integrated Discovery.
dChip – Bioinformatics software.  Analysis and visualization of gene expression and SNP microarrays.
Demetra - Time series package.
Dia – Diagram drawing program.  Allows the drawing of entity relationship diagrams, UML diagrams, flowcharts, network diagrams, and many others.
Draco - Econometrics package with spreadsheets.

EASE – Bioinformatics software.  Expression Analysis Systematic Explorer.
EasyReg Int. – Econometrics package.
EasySample – Simple sampling program.
EDGAR – Experimental design programs.
ELKI – Environment for developing KDD-applications supported by index-structures.  Compares data mining algorithms.
EMMIX – Fortran software to fit mixture models.
Epi Info – Designed for epidemiologists and other public health and medical professionals.
Epidata - Epidemiology package.
EqPlot – Equation graph plotter.
ErmineJ – Bioinformatics software.  Performs analyses of gene sets in expression microarray data or other genome-wide data that results in rankings of genes.
ESA – Event Structure Analysis (ESA) is an on-line Java program that helps you analyze sequential events.
ESS – Emacs Speaks Statistics.  Add-on package for emacs text editors.
Esta+ – General (and simple) stats package.
Euler - Similar to Matlab.
EVE – Embedded Vector Editor.  Vector graphics program.
Excellent Analytics – Excel plug-in that lets you import web analytics data from Google Analytics into a spreadsheet.
EzANOVA - Simple ANOVA program.

FACET – Psychometrics program.  Facet Analysis, Smallest Space Analysis, Multidimensional Scalogram Analysis, Partial Order Scalogram Analysis.
Factor - Simple factor analysis program.
Firebird – Relational database management system (RDBMS).
Fityk – Nonlinear least squares curve fitting.
FlexArray – Bioinformatics program.  Statistical analysis and visualization of microarray expression data.  Free to academic and government researchers.
Freemat - Similar to Matlab and IDL.

G Power – Power and sample size calculator.
G7 – Econometrics package.  Allows the building and usage of data banks.
GATE – Text mining program.  General Architecture for Text Engineering.
gCLUTO – Data mining program.  Graphical clustering toolkit.
Gemma – Bioinformatics software.  Database and software system for the meta-analysis of gene expression data.
GEMS – Bioinformatics software (online forms).  Gene Expression Module Sampler.
GenePattern - Bioinformatics software.  Genomic analysis platform that provides access to more than 125 tools for gene expression analysis, proteomics, SNP analysis and more.
GenMAPP – Bioinformatics software.  Designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes.
GeoDa - Geospatial analysis and computation.
GGobi - Data visualisation.
Gist – Bioinformatics software.  Contains software tools for support vector machine classification and for kernel principal components analysis.
Gnumeric - Spreadsheet program.
Gnuplot – Command driven graphing program.
GoMiner – Bioinformatics software.  A tool for biological interpretation of ‘omic’ data, including data from gene expression microarrays.
Gostat – Bioinformatics software (online forms).  Find statistically overrepresented Gene-Ontology (GO) terms within a group of genes.
Gretl – Stands for GNU Regression, Econometrics and Time-series Library.  Alternative to EViews.
Grocer – An econometric toolbox for Scilab.
GUIDE – Multi-purpose machine learning algorithm for constructing classification and regression trees.

HLM – Hierarchical linear and nonlinear modelling.
hMETIS – A set of programs for partitioning hypergraphs such as those corresponding to VLSI circuits.

IHMC CmapTools – Concept mapping program.
Instats - General stats package.
Interactive Statistical Unit -
Inverse Symbolic Calculator – An online form where the output is a list of possible sources for the number you enter (such as simple equations or well known constants).
IRRIStat - General stats package.  Designed for analysis of agricultural field trials data.
IVEware – Imputation and Variance Estimation.

JAGS – Stands for Just Another Gibbs Sampler.  Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC).
jHepWork – Mathematics and statistics package.
JMulTi - Time series and econometrics package.

Kyplot - Graphs program.

Lisp-stat – General stats package with emphasis on graphics.
LISREL - Originally limited to structural equation modelling (SEM) but now has other functions.

MacAnova – General stats package.
MacSHAPA – Macintosh-based software environment that supports observational data analysis, including the analysis of video.
Matrixer – Econometrics package.
Matrix2png – Bioinformatics software.  A simple program for making visualizations of microarray data and many other data types.
Matvec – Current capabilities range from matrix and vector manipulation to the analysis of linear and generalized linear mixed models.
Maxima – A computer algebra system.  Mathematics program.
METIS – A set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices.
MGridGen – Multilevel serial & parallel coarse grid construction library.
MicroConcord – Text concordance program.
MicrOsiris – Statistical and data management package.
Ministep – Constructs Rasch measures from simple rectangular data sets.
MINUIT – Physics analysis tool for function minimization.
Mondrian – Data visualisation.
MorePower – Statistical power calculator for hypothesis tests.
MONSTER – Minnesota prOteiN Sequence annotaTion sErveR (bioinformatics).
Mx – Matrix algebra interpreter and numerical optimizer for structural equation modelling and other types of statistical modelling of data.
MySQL – Relational database management system (RDBMS).
MYSTAT – Free SYSTAT version for students.

NLTK – Text mining program.  Natural Language Toolkit.
NORM – Multiple imputation of multivariate continuous data under a normal model.

Octave – Similar to Matlab.
Onto-Tools – Bioinformatics programs. Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner, Pathway-Express, Promoter-Express, nsSNPCounter, OE2GO, KUTE-BASE.
OpenBUGS – Bayesian inference using Gibbs sampling.
Open Code – A tool for coding qualitative data generated from text information such as interviews, observations or field notes.
OpenEpi – Epidemiology package.
OpenOffice – Open source alternative to Microsoft Office.  Includes Calc, an alternative to Excel.
Openstat – General stats package.
Orange – Data mining through visual programming or Python scripting. Extensions for bioinformatics and text mining.

PAFI – A set of programs (LPMiner, SLPMiner and FSG) that can be used to find frequent patterns in large and diverse databases.
PAMCOMP – Person-years And Mortality COMputation Program.
ParMETIS – Extends the functionality provided by METIS and includes routines that are especially suited for parallel AMR computations and large scale numerical simulations.
PAST – Statistics program for palaeontology.
PCP – Pattern Classification Program.
PINT – Power analysis IN Two-level designs (for determination of standard errors and optimal sample sizes in multilevel designs with 2 levels).
Ploticus – Graphs program.
PopTools – Population modelling program.
POSDEM – A program to choose between sampling plans based on the population frame.
PostgreSQL – Relational database management system (RDBMS).
PQRS – Stands for probabilities, quantiles and random samples.
Program Mark – Ecological statistics program.  Advanced mark-recapture modelling.
PS – Power and sample size calculator.
PSPASES – Parallel SPArse Symmetric dirEct Solver (intended for solving linear systems of equations).
PSPP – Alternative to SPSS.

QCA – Qualitative Comparative Analysis is a special-purpose program designed to analyze quantified data from multiple cases.
QPL – Questionnaire Programming Language.
Quail – Quantitative analysis in Lisp.

R – Widely used alternative to S-PLUS.
R Commander – Graphical user interface (GUI) for R.
RapidMiner – Data mining.
Red R – Visual programming interface for R.
Regress+ – Mathematical modelling for the Macintosh.
RMAexpress – Bioinformatics software.  A standalone GUI program to compute gene expression summary values for Affymetrix Genechip data.
Rosetta – A rough set toolkit for data analysis / data mining.
Rule Discovery System – Data mining program.
Rundom Pro – General stats package with emphasis on resampling procedures.
Rweb – An online form that processes R commands.

|STAT – General stats package.
Sage - Mathematics software.
SalStat – General stats package designed for the analysis of science and psychology data.
SAM – Bioinformatics software.  Sequence Alignment and Modeling System.
Scilab – Numerical computation.
Shogun – A large scale machine learning toolbox.
Simfit – Simulation, curve fitting, statistics, and plotting.
Simplex Method Tool – An online form that solves linear programming problems.
SimulME – A Java ME (J2ME) application with queuing calculator, Monte Carlo simulation and more. Compatible with Sony-Ericsson, Samsung, Motorola and other cell phones.
SIPINA – Statistical classification program.
SL Gallery – Statistical distribution graphs and calculations.
Smereka – Extensible personal freeform database and personal information manager.
SNNS – Stuttgart Neural Network Simulator.
SOCR – Stands for Statistics Online Computational Resource.  General stats package.
SOFA Statistics – Statistics, analysis and reporting program.
Software for Permutation Methods: A Distance Function Approach
Sonar – Survey administration and data collection system.
SpreadCE – Spreadsheet program.
SSP (Smith’s Statistical Package) – General stats package.
StatCalc – A Calculator that computes table values of 34 statistical distributions. It also computes moments, and many other statistics.
StatEasy – General stats package.
Statext – Statistics expressed in text (even the graphs).
Statibot – An interactive www-based expert system for basic statistical analysis.
Statist – General stats package.
Statistical Lab - R based general stats package.
Statistics 101 – Simulations / resampling methods.
STATPerl – General stats package based on Perl.
Stattucino – Web spreadsheets with data analysis functions.
SUGGEST – Recommendation engine.

TAMS – Text mining program.  Text Analysis Mark-up System.  Identifies themes in texts.
TANAGRA – Data mining program.
TELPACK – Teletraffic Analysis Package.
Tetrad – Search algorithms and statistical modelling.
TextStat – Text concordance program.  A program for the analysis of texts.
TimesLab – Time series analysis program.
Tinn-R – R code text editor.
TMeV – Bioinformatics software.  Normalized and filtered expression files can be analyzed using TIGR Multiexperiment Viewer (MeV).
TSW – Time series and econometrics package.
TWOMOK – Non-parametric scale analysis for two-level data (ecometrics).

UIMA – Text mining program.  Unstructured Information Management Architecture.
Ultrafind – Text concordance program.  Extremely fast text search.

VARBRUL – Designed to facilitate analysis of linguistic variables and social variables.
VisiCube – Data visualisation tool.
ViSta – Visual statistics system with work maps, guide maps and interactive graphics.
VStar – Data visualisation and analysis tool for astronomy.

wCLUTO – Web-enabled data clustering application that is designed for the clustering and data-analysis requirements of gene-expression analysis (bioinformatics).
Weft QDA – A tool to assist in the analysis of textual data such as interview transcripts, written texts and field notes.
Weka – Machine learning software written at the University of Waikato.
WinBUGS – Bayesian analysis using Markov chain Monte Carlo methods.
WinIDAMS – General stats package.
Winpepi – Epidemiology package.
WordNet – A large lexical database of English.

X-12-ARIMA – Seasonal adjustment software produced, distributed, and maintained by the US Census Bureau.
XDAT – Visualization and analysis of multidimensional data.
XL Statistics – A set of Microsoft Excel (ver 97+) workbooks for statistical analysis of data.
XL Toolbox – Data analysis add-on for Excel.
Xnumbers – Multi precision floating point computing and numerical methods for Excel.
XploRe – Interactive statistical methods and data exploration.
Xpro – Exact parametric inference.
Xtremes – Graphics and analysis program for extreme values.

YASSPP – A web-based server for predicting the secondary structure elements of a protein sequence (bioinformatics).
Yxilon – Statistical programming language.

Zaitun Time Series – Time series software.  Has the capability to deal with stock market data.
Zelig – R package that provides a common interface for estimating and interpreting a wide range of statistical models.
