Uploading a custom dataset to PandaOmics is a two-step process:
Uploading the gene expression matrix file;
Associating sample metadata attributes from a separate file.
Both the gene expression file and the metadata file must be plain-text files, either comma-delimited (CSV) or tab-delimited (TSV).
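Before uploading, you can check which delimiter a file actually uses. The sketch below relies on the standard library's `csv.Sniffer`, restricted to the two accepted delimiters; the helper name `detect_delimiter` is hypothetical and not part of PandaOmics.

```python
import csv

def detect_delimiter(text: str) -> str:
    """Return ',' or '\\t' for a CSV/TSV text sample; raises csv.Error if unclear."""
    # Only consider the two delimiters accepted for upload.
    dialect = csv.Sniffer().sniff(text, delimiters=",\t")
    return dialect.delimiter

sample = "gene\tsample_1\tsample_2\nTP53\t12\t7\n"
print(repr(detect_delimiter(sample)))
```

For a real file, pass the first few kilobytes of its contents rather than the whole file.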
The Data Upload button is located in the main menu at the top of every page in PandaOmics.
The upload page lets you drag and drop a file containing gene expression values for multiple samples. In the dataset matrix, genes must be in rows and sample identifiers in columns (see the example below).
The following gene identifiers are accepted:
Gene Symbol ID
RefSeq ID
Ensembl Gene ID
UCSC Gene ID
UniProt ID
Entrez Gene ID
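PandaOmics recognizes the identifier type on its own. To inspect your own file beforehand, the heuristic below guesses the type from common ID patterns (for example, human Ensembl gene IDs look like `ENSG00000141510`). The regular expressions and the function `guess_id_type` are assumptions for illustration, not PandaOmics' detection logic, and an unusual gene symbol may occasionally be misclassified.

```python
import re

# Illustrative patterns for common identifier formats (assumed, not exhaustive).
ID_PATTERNS = [
    ("Ensembl Gene ID", r"ENS[A-Z]*G\d{11}(\.\d+)?"),
    ("RefSeq ID", r"(NM|NR|XM|XR|NP|XP)_\d+(\.\d+)?"),
    ("UCSC Gene ID", r"uc\d{3}[a-z]{3}\.\d+"),
    ("Entrez Gene ID", r"\d+"),
    ("UniProt ID", r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"),
]

def guess_id_type(identifier: str) -> str:
    """Return the first matching identifier type; fall back to Gene Symbol ID."""
    for name, pattern in ID_PATTERNS:
        # fullmatch: the whole identifier must match the pattern.
        if re.fullmatch(pattern, identifier.strip()):
            return name
    return "Gene Symbol ID"

for gene_id in ["TP53", "ENSG00000141510", "NM_000546.6", "7157", "P04637"]:
    print(gene_id, "->", guess_id_type(gene_id))
```

The order of the patterns matters: purely numeric IDs are treated as Entrez before any other check, and anything left over is assumed to be a gene symbol.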
Gene expression values must be numeric, in one of three formats (recognized automatically by the system):
Positive integers: gene counts;
Positive decimals: normalized gene expression signal;
Positive and negative decimals: log-transformed normalized gene expression values.
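The exact recognition rules are not documented here; the following is a hypothetical heuristic that mirrors the three formats above: any negative value implies log-transformed data, all whole numbers imply counts, and anything else is treated as normalized. Note that log-transformed data that happens to contain no negative values would be classified as normalized by this sketch.

```python
from typing import Sequence

def guess_value_format(values: Sequence[float]) -> str:
    """Classify expression values as counts, normalized, or log-transformed."""
    if any(v < 0 for v in values):
        # Negative values are only expected after log transformation.
        return "log-transformed"
    if all(float(v).is_integer() for v in values):
        # Whole numbers (zero assumed valid) look like raw gene counts.
        return "counts"
    return "normalized"

print(guess_value_format([0, 15, 230]))
print(guess_value_format([0.5, 12.3]))
print(guess_value_format([-1.2, 3.4]))
```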
Example data matrix
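A matrix in this layout (genes in rows, samples in columns) can be produced with the standard `csv` module. This is a minimal sketch with toy values; the helper `to_expression_tsv` and the header label `gene` for the identifier column are assumptions, not names required by PandaOmics.

```python
import csv
import io

def to_expression_tsv(samples, expression):
    """Serialize {gene_id: [one value per sample]} as a tab-delimited matrix."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header row: identifier column (label "gene" is assumed), then sample IDs.
    writer.writerow(["gene"] + list(samples))
    for gene_id, values in expression.items():
        if len(values) != len(samples):
            raise ValueError(f"{gene_id}: expected {len(samples)} values")
        writer.writerow([gene_id] + list(values))
    return buffer.getvalue()

samples = ["sample_1", "sample_2", "sample_3"]
expression = {"TP53": [120, 98, 143], "EGFR": [45, 0, 67]}  # toy counts
print(to_expression_tsv(samples, expression))
```

To save the result, write the returned string to a file opened with `open("expression.tsv", "w", newline="")`.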
Additional file parameters are also recognized automatically by the system. They can be set manually in the advanced settings:
1. File encoding.
2. Value separator: a comma, a tab, a space, etc.
3. NA format: missing gene expression values are recognized from the following indicators: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
4. Decimal separator: a dot or a comma.
5. NA handling: detected missing gene expression values can be replaced with "Zero", "Geometric mean", or "Arithmetic mean", depending on the data distribution format.
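To preview how these settings affect your data, the sketch below parses values using the NA indicators listed above and an optional decimal comma, then fills missing values with one of the three options. It assumes imputation is done per gene (across that gene's samples) and that the geometric mean skips non-positive values; PandaOmics' actual behavior may differ, and the helpers `parse_value` and `fill_missing` are hypothetical.

```python
import math

# Missing-value indicators, as listed in the NA format setting.
NA_TOKENS = {
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null",
}

def parse_value(token, decimal="."):
    """Return a float, or None when the token is a missing-value indicator."""
    token = token.strip()
    if token in NA_TOKENS:
        return None
    if decimal == ",":
        token = token.replace(",", ".")  # "2,5" -> "2.5"
    return float(token)

def fill_missing(row, method="zero"):
    """Replace None entries in one gene's row (per-gene imputation is assumed)."""
    present = [v for v in row if v is not None]
    if method == "zero":
        fill = 0.0
    elif method == "arithmetic":
        if not present:
            raise ValueError("no observed values to average")
        fill = sum(present) / len(present)
    elif method == "geometric":
        # The geometric mean is only defined for positive values (assumption: skip others).
        positives = [v for v in present if v > 0]
        if not positives:
            raise ValueError("no positive values for a geometric mean")
        fill = math.exp(sum(math.log(v) for v in positives) / len(positives))
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in row]

row = [parse_value(t) for t in ["1", "NA", "4"]]
print(fill_missing(row, "arithmetic"))  # [1.0, 2.5, 4.0]
```

The geometric mean only makes sense for strictly positive data such as counts or normalized signal, which is one reason the available NA handling options depend on the data distribution format.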
Click the Upload button. The upload may take some time depending on the file size; meanwhile, you can continue using PandaOmics and return to Data Upload later. Only one gene expression file can be uploaded at a time.
Once the dataset is uploaded, you can review a summary of the upload that includes:
The total number of samples detected;
The total number of genes detected and recognized;
A preview of the uploaded and recognized dataset;
A preview of the original dataset;
A list of genes from the original dataset that were not detected or mapped correctly.
Clicking the Confirm button takes you to the Sample Metadata upload page, where you can drag and drop a separate file with sample metadata attributes for the dataset.
A metadata file can be associated with the dataset, and its annotations can then be used to set up sample groups. Below is an example of a metadata table. The first column must always contain sample identifiers that match the sample IDs in the original dataset matrix.
Once the metadata file is uploaded, you can review a summary that includes:
The total number of sample attributes detected;
The number of samples from the original gene expression file that match sample names from the metadata file;
The list of metadata attributes, each with the number of samples for which that attribute is not missing;
The list of samples from the original gene expression file that have no metadata attributes;
The list of samples from the metadata file which are missing from the original gene expression dataset.
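A similar summary can be computed locally before upload to catch mismatched sample IDs early. This is a minimal sketch assuming a tab-delimited metadata file whose first column holds sample IDs; treating only empty cells and `NA` as missing is a simplification, and `metadata_report` is a hypothetical helper, not a PandaOmics function.

```python
import csv
import io

def metadata_report(expression_samples, metadata_text, delimiter="\t"):
    """Summarize how metadata rows line up with expression-matrix sample IDs."""
    rows = list(csv.reader(io.StringIO(metadata_text), delimiter=delimiter))
    header, records = rows[0], rows[1:]
    attributes = header[1:]  # first column holds the sample IDs
    meta = {r[0]: r[1:] for r in records if r}
    meta_ids, expr_ids = set(meta), set(expression_samples)

    def is_present(values, i):
        # Assumption: only empty cells and "NA" count as missing.
        return i < len(values) and values[i].strip() not in ("", "NA")

    return {
        "attributes": len(attributes),
        "matched": len(expr_ids & meta_ids),
        "non_missing": {
            attr: sum(is_present(v, i) for v in meta.values())
            for i, attr in enumerate(attributes)
        },
        "without_metadata": sorted(expr_ids - meta_ids),
        "missing_from_expression": sorted(meta_ids - expr_ids),
    }

text = "sample\tdisease\tage\ns1\tIPF\t61\ns2\tcontrol\tNA\ns4\tIPF\t55\n"
print(metadata_report(["s1", "s2", "s3"], text))
```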
Clicking the Confirm button takes you to the Dataset entity page, where you can create experimental sample groups and run case/control comparisons.
Once the dataset is uploaded, it can be found in the Data Manager.