A Naive Bayes Classifier for identifying Class II YSOs

A naive Bayes classifier for identifying Class II YSOs has been constructed and applied to a region of the Northern Galactic Plane containing 8 million sources with good quality Gaia EDR3 parallaxes. The classifier uses the five features: Gaia $G$-band variability, WISE mid-infrared excess, UKIDSS and 2MASS near-infrared excess, IGAPS H$\alpha$ excess and overluminosity with respect to the main sequence. A list of candidate Class II YSOs is obtained by choosing a posterior threshold appropriate to the task at hand, balancing the competing demands of completeness and purity. At a threshold posterior greater than 0.5 our classifier identifies 6504 candidate Class II YSOs. At this threshold we find a false positive rate around 0.02 per cent and a true positive rate of approximately 87 per cent for identifying Class II YSOs. The ROC curve rises rapidly to almost one with an area under the curve around 0.998 or better, indicating the classifier is efficient at identifying candidate Class II YSOs. Our map of these candidates shows what are potentially three previously undiscovered clusters or associations. When comparing our results to published catalogues from other young star classifiers, we find between one quarter and three quarters of high probability candidates are unique to each classifier, telling us no single classifier is finding all young stars.


INTRODUCTION
A variety of machine learning techniques have already been used to identify candidate Young Stellar Objects (YSOs). Naive Bayes classifiers have advantages that may bring benefit to this task. The aim of this paper is to assess how well a carefully constructed naive Bayes classifier can identify YSOs. Our approach focuses on Class II YSOs, using their known properties as inputs.

YSOs
Stars form in clouds of cold dust and gas, where gravitational collapse creates knots of over density that grow with time. Protostars form in these knots, starting cool and emitting in the far infrared. The peak of their emission moves to shorter wavelengths as their cores grow and heat. The spectral index (Lada 1987;Adams et al. 1987;Kenyon et al. 1990;Andre et al. 1993Andre et al. , 2000 encapsulates the location of the peak emission in a single number and is used to divide YSOs into evolutionary classes. Class 0/I YSOs are nearly impossible to identify in the optical, requiring infrared observations to detect them within their prenatal clouds of dust and gas. Class III sources are easily observed but have lost some of the identifiable characteristics of YSOs. Class IIs on the other hand can be observed in the optical as well as the infrared, with their discs and accretion giving rise to the well known features of Class II YSOs. The disc creates an infrared excess (Calvet et al. 1991) and can cause variable obscuration of the protostar giving rise to photometric variability (Bouvier et al. 1999; ★ E-mail: aw648@exeter.ac.uk, andyjwilson_uk@hotmail.com Froebrich et al. 2018). Emission lines such as H are generated in accretion columns and by chromospheric activity (Herbig 1958(Herbig , 1962Edwards et al. 1994;Hartmann et al. 1994;Ingleby et al. 2013). Variable accretion leads to aperiodic variability while starspots and accretion hotspots lead to periodic variability at the rotation period of the protostar (Herbst et al. 1994;Sergison et al. 2020). Protostars have a larger physical size than Main Sequence (MS) stars of the same temperature, resulting in a greater luminosity than MS stars of the same colour. These identifiable observational properties of Class II YSOs are ideally suited to be the input features of machine learning classifiers.

Machine learning for YSOs
There are two main branches of machine learning (ML). Supervised ML classifies data into predefined labels, while unsupervised ML is free to choose the data classifications. Hence, supervised machine learning is appropriate for identifying YSOs and it has already been successfully applied to this task. Miettinen (2018) tested eight machine learning algorithms for classifying 319 known YSOs. Interestingly their standard R implementation of a naive Bayes classifier gave the worst performance of the eight classifiers. The assumption of Gaussian feature distributions in the R implementation is a potential cause of this weak result, as the probability distributions used to define the likelihoods in Bayes classifiers have a big impact on their performance. They also make no mention of their choice of prior or posterior threshold, both of which have significant impacts on the measured performance. Marton et al. (2019) compared a range of approaches for identifying candidate YSOs, including support vector machines, neural networks, naive Bayes, -NN and random forests. They found a random forest of 500 trees gave the best performance. In common with the work we present in this paper, they base their input data on Gaia, though DR2 rather than EDR3, combined with WISE, 2MASS and Planck. Vioque et al. (2020) created a neural network to search for new Herbig Ae/Be and PMS stars. They use similar features to our classifier, based on Gaia DR2, 2MASS, WISE, IPHAS and VPHAS+. The approach to creating their PMS classification means it is similar to our Class II YSO (CII) classification.
The SCAO (Spectrum Classifier of Astronomical Objects) developed by Chiu et al. (2021) uses a fully connected neural network to identify YSOs. They also tried extreme gradient boost, random forest, support vector machine and k-nearest neighbour machine learning techniques. The classifier uses spectral energy distributions (SEDs) constructed from Spitzer, UKIDSS and 2MASS photometry. The results of SCAO run against the SEIP (Spitzer Enhanced Imaging Products) catalogue are available online. Kuhn et al. (2021) employ a random forest to identify YSOs for their SPICY catalogue using Spitzer with 2MASS, UKIDSS and VVV photometry. Again this has similarities to our work.
The neural network named Sagitta created by McBride et al. (2021) identifies pre-main sequence (PMS) stars and assigns them an age. This is trained on the results of a hierarchical clustering analysis and a neural network named Auriga (Kounkel et al. 2020). They use Gaia DR2 and 2MASS, so this has some similarities to our classifier, though we additionally incorporate data from UKIDSS, WISE and IGAPS. Also, we focus on Class II YSOs while they assessed a wide range of PMS ages. Cornu & Montillaud (2021) constructed a neural network for identifying Class I and Class II YSOs. Their labelled data was created using a modified version of Gutermuth et al. (2009) based purely on Sptizer photometry. The classifier was trained and tested on data from Orion, NGC 2264 and a sample of clouds within 1 kpc.
With the exception of Miettinen (2018), published catalogues are available for these classifiers. We compare them with the results from our classifier in Section 4.3.

Possible advantages of a naive Bayes classifier
Our naive Bayes classifier is based on Broos et al. (2011), a classifier of X-ray sources. They expanded the technique in Broos et al. (2013), making use of infrared as well as X-ray observations to identify YSOs for the MYStIX project. Naive Bayes classifiers have also been used in other areas of astronomy, for example Li et al. (2020) built one to separate Type I and Type II Gamma-ray bursts. While Tranin et al. (2022) created a naive Bayes classifier for separating X-ray sources into AGN (Active Galactic Nuclei), stars, X-ray binaries and cataclysmic variables.
Our classifier is applied across a broad region of the Northern Galactic Plane defined 20 < < 220 and | | < 4, we shall refer to this region as the NGPn. This region was chosen to overlap with the footprint of WEAVE (Dalton et al. 2012), a spectrograph commissioned for the WHT (William Herschel Telescope). In future we intend to test our classifier on the WEAVE SCIP (Stellar, Circumstellar and Interstellar Physics) survey (Jin et al. 2022). We impose a magnitude limit of < 18 to ensure the candidates are sufficiently bright for spectroscopic confirmation by WEAVE. Our NGPn data set is restricted to sources with Gaia EDR3 parallaxes consistent with a distance of 2 kpc or less, as beyond this limit we found the parallax measurements were dominated by their uncertainties. The results are posterior probabilities, distinguishing between the binary classifications of Class II YSO (CII) and all other types of object (O ). The inputs to our classifier are the known features of Class II YSOs (described in Section 2). Our decision to use known features avoids the risk of overfitting the classifier using non-physical properties. The simple algorithm of naive Bayes classifiers allows these features to be traced through the workings to the results. This provides astrophysical intuition into the individual source classifications, not possible with black box classifiers. The feature data also make troubleshooting and refinement of the classifier straightforward.
A critical step in building a classifier is training it to distinguish between the classes based on the feature values. Training can use purely theoretical models, or real data labelled by class using established techniques. We use labelled data to train our classifier (see Section 2.1). With some types of classifier, the training sets need to be large and match the ratio of the classes in the population. With Bayes classifiers, the ratio of the classes is encoded by the priors (see Section 3), meaning we can use unbalanced training sets. Bayes classifiers also cope well with small training sets, a benefit for our relatively small training set of Class II YSOs (see Section 2.1).
When combining data from multiple astronomical surveys, a source may be present in one survey and absent or difficult to reliably cross-match to another. This can present a significant problem for machine learning techniques that require all feature data to be present for all sources. In such cases, one possible solution is to limit the input data to the subset of sources common to all surveys, the approach employed by Vioque et al. (2020). Alternatively, missing data can be populated with estimates, the approach taken by Kuhn et al. (2021) and McBride et al. (2021). In contrast, naive Bayes classifiers do not require all feature data to be present for every source. They are able to simply skip over features without data, giving a result based on the available features for each individual source (see Section 3.2).

Outline of this paper
The remainder of this paper is organised as follows. Section 2 reviews the features of Class II YSOs. Section 3 introduces our naive Bayes classifier and explains how the features are incorporated. The results of the classifier are reviewed and compared to other published works in Section 4. We end by considering bias and how the classifier might be improved in Section 5. Our catalogue is published to allow others to independently test our results, see the data availability statement at the end of this paper. We provide a table of the symbols used in this paper and a data dictionary for the online catalogue in the supplementary material.

FEATURES OF CLASS II YSOS
For the classifier to identify candidate Class II YSOs required a set of observable features that were able to separate them from other classes of object. We derived these features from publicly available surveys. To be useful for our work these surveys needed to provide good coverage of our NGPn footprint. Within these limitations we identified a set of five features with surveys to derive them: midinfrared excess using WISE (Wright et al. 2010), near-infrared using UKIDSS (Lucas et al. 2008) and 2MASS (Skrutskie et al. 2006), H excess and overluminosity with respect to the MS both using IGAPS (Drew et al. 2005;Monguió et al. 2020), and photometric variability from Gaia EDR3 (Gaia Collaboration et al. 2016Collaboration et al. , 2021. As this is an experimental classifier, we chose to prioritise quality at the expense of completeness when selecting data from the surveys. Otherwise it would have been difficult to separate poor classifier performance from incorrect classifications caused by low quality data. Our NGPn catalogue contains 8 088 045 sources, chosen for their good quality parallax measurements in Gaia EDR3 then crossmatched against the other surveys. Table 1 gives a summary of the features, showing the percentage of our NGPn catalogue and the number of training sources for each feature. As naive Bayes classifiers can cope with missing data (see Section 3.2), it was not necessary for the feature surveys to have data for all sources. Full details of the survey selection criteria and how they were combined are given in Appendix A.

Training and test data sets
Training data sets were created to teach the classifier how to discriminate between Class II YSOs (CII) and all other types of source (O ). These were crucial in selecting and tuning the features, since the strength of a feature is determined by its ability to separate the CII from O . This also means the quality of the classifier results depends to a large extent on the purity of the training sets, as contamination weakens the training. To allow an independent check of the classifier performance, we needed a separate test set of sources not used in the training. Hence, we created a master list of known sources of each class, and split them into training and test subsets.
To create our known CII master list, we performed a literature search for papers providing catalogues of robustly identified Class II YSOs. We only included catalogues where we would expect all or a significant proportion of sources to lie within 2 kpc, to match the distance limit of our base data set. The full list of papers used to compile our master list of known CIIs is given in Table 2. Many of these are based on variations of the methods of Gutermuth et al. (2005Gutermuth et al. ( , 2008Gutermuth et al. ( , 2009, accepting this may give the classifier a bias towards identifying CII with the properties used by this approach. There are 2 853 distinct CII from these catalogues within our NGPn footprint with matches to Gaia EDR3 using a 1 search radius. This reduces to 439 CII in our NGPn data set, with the losses due to our parallax and photometric quality cuts (see Appendix A1). The count of CII from the separate papers is 539 as some of the same sources appear in more than one paper. Although this small quantity of labelled data limits the granularity that can be achieved in training, it is adequate for creating the probability distributions used by the Bayes classifier.
The O data set was more difficult to define as ideally it should contain a representative population of all types of object apart from Class II YSOs. In reality this is impossible since the populations of different types of star vary with location in the Galaxy, and there could be unidentified CIIs in any region. Hence, we decided to use a region that was devoid of known star formation for our O data set. We settled on the region bound by the Galactic coordinates 160 < < 166 and −4 < < 4, which we shall refer to as the G R . It is devoid of major known star formation, has low extinction and is bland in all feature input catalogues. These very properties risk introducing bias (see Section 5), but we accept this for the benefits the approach offers. We can be confident there are at most a few Class II YSOs in this region. Also, the low extinction is beneficial since our reddening values are only approximate and unidentified bad reddening can lead to spurious feature results. The master data set of O contains 189 556 sources. Our approach to selecting the O data set means it may be contaminated by YSOs of any type including CII. We should expect this to be at a very low level and if there is significant contamination this would show up as a high false positive rate for the O sources when validating the classifier. Ironically, this lack of YSOs in the G R leads to a training problem. It means the classifier will not be trained to place other types of YSOs into the O classification. This is exacerbated as other types of YSO share similar feature properties to CIIs. So we should reasonably expect the classifier to identify other types of YSO as CII, though in general they should be at a lower posterior CII than genuine Class II YSOs. This situation could be improved by expanding the classifier to more than two classes, adding the other types of YSO with training sets. For now, we simply acknowledge and accept this weakness.
We split the master data sets 80 : 20 for training and testing, using the Gaia EDR3 random index to split the data into random selections. This gave 365 sources in the CII training set and 91 sources in the CII test set. The O master set was split 151 645 for training and 37 911 for testing. This imbalance in the training sets would be a significant problem for some types of ML. It is not a problem for Bayes classifiers as the expected ratio of the classes is encoded by the priors, not the relative sizes of the training sets.

WISE W1-W2
Excess emission in the mid-infrared is a strong indicator of stellar youth (Lada & Wilking 1984). In Class II YSOs this is due to dust in the circumstellar disc reprocessing radiation from the YSO into the infrared (Calvet et al. 1991). Other works have successfully used photometry from IRAS (Kenyon et al. 1990), Spitzer (Robitaille et al. 2008) and WISE (Koenig et al. 2012;Koenig & Leisawitz 2014;Fischer et al. 2016;Marton et al. 2019) to identify YSOs. For this work we considered Spitzer and WISE. Spitzer provides better spatially resolved photometry though only covers patches of our NGPn footprint, while WISE provides full coverage at lower resolution. Since coverage is important to us we settled on WISE for our mid-infrared feature.
A proven technique for identifying YSOs is defining areas of Colour-Colour Diagrams (CCDs) associated with stellar youth. This has been successfully demonstrated by Koenig et al. (2012) using WISE ( 1 − 2, 2 − 3), then refined by Koenig & Leisawitz (2014) and Fischer et al. (2016). However, these approaches have an inherent weakness due to large uncertainties in the WISE photometry. These uncertainties get progressively worse to longer wavelength bands. In the ALLWISE catalogue 69 per cent of sources have an uncertainty better than 0.1 magnitudes in 1, 33 per cent in 2, and just 2 per cent in 3. Hence requiring good quality photometry in 3 drastically reduces the available data.
A key benefit of employing the 3 photometry is it allows CCDs to distinguish YSOs from Asymptotic Giant Branch (AGB) stars, star-forming galaxies, shock emission and PAH (Polycyclic Aromatic Hydrocarbon) knots. It can also separate Class I from Class II YSOs. The sources in our NGPn catalogue have been selected to ensure they have good quality Gaia parallaxes consistent with a distance 2 kpc or less. This excludes galaxies and the majority of AGB stars. Further, we do not expect our selection of the Gaia optical data with RP < 18.0 to contain significant numbers of sources that are shock emission, PAH knots or Class I YSOs. These factors enable an alternative approach with our NGPn catalogue, using a simple ( 1 − 2) excess to pick out Class II YSOs, retaining a greater proportion of the WISE photometry as there is no need for a 3 quality cut.

(H-K) Excess
Class II YSOs exhibit an excess in the near-infrared (Meyer et al. 1997). We derive an ( − ) excess from ( − , − ) CCDs using UKIDSS and 2MASS photometry, defining our excess as the distance in ( − ) of dereddened photometry from a straight line fit to the linear region of a 1 Myr isochrone.
The observations are dereddened in voxels of half a degree to each side in Galactic coordinates and 5 pc in depth using transformed reddening values from STILISM (Lallement et al. 2014;Capitanio et al. 2017;Lallement et al. 2018, Appendix B). We examined young stellar isochrones (described in Appendix C) for ages from 1 to 100 Myr in UKIDSS and 2MASS photometric systems (see Fig. 1). The stars tend to descend in ( − ) with age, while there is a more complex behaviour in ( − ). For example a YSO of mass 0.4 M shows a slight increase in ( − ) up to around 40 Myr before decreasing. When compared to actual observations this effect is negligible, as can be seen in dereddened training data of Fig. 1, black reddening vectors for the CII and grey dots for the O . The majority of the CII lie to higher ( − ) and ( − ) than the young stellar isochrones, while the O tend to be closer to the older isochrones. The CII occupy the top right region of the CCD primarily due to infrared emission from their discs (Calvet et al. 2000).
To have confidence in our approach, we briefly explore the reasons why many of the CII lie far from the young stellar isochrones in Fig. 1. Variability in , and was observed by Carpenter et al. (2001) for PMS stars in the Orion Nebula Cluster. Although much of the variability was colourless, some stars exhibited colour changes of 0.5 or more in ( − ) and ( − ). Their modelling could not fully explain these colour changes, though they considered a combination of cool and hot spots, variable extinction and changes in the circumstellar disc to be plausible explanations. The scatter in our CII training set is consistent with a snapshot from a population of varying young stars. There may also be a component caused by underestimation of the reddening, as our approach to dereddening will not capture the small scale variations in dust obscuration near to young stars. While we cannot give a detailed explanation for individual displacements of CII from the isochrones, these probable causes would not invalidate our approach.
A 1 Myr Baraffe et al. (2015) interior with BT-Settl atmosphere ( − , − ) isochrone has a linear region between 0.13 to 0.35 M . We fitted a straight line to this region using an unweighted non-linear least-squares fit, with all data points lying within ±0.1 magnitudes of the line (see Fig. 1). Separate fits were performed for UKIDSS and 2MASS The UKIDSS and 2MASS photometry were dereddened as described in Appendix B. The dereddened ( − ) was used to calculate an expected ( − ) value from the straight line fit. The excess was then calculated as the difference between the dereddened ( − ) and the expected value. As the UKIDSS photometry has better spatial resolution, we took the UKIDSS results in preference to 2MASS where the photometry was available.
Away from the linear region our approach may not provide reliable results. In particular, the isochrones indicate that stars with > ≈ 0.8 M pile up on a line nearly perpendicular to the lower mass linear region. This means their observed location will be degenerate to age and mass. We do not consider this to be a serious problem, as these stars have both a ( − ) and ( − ) that is smaller than the majority of our CII training data. So we may miss some higher mass young stars rather than incorrectly classify older stars as CII. The effects of this degenerate region will be captured during training, when it can be expected to slightly weaken the feature. training set are grey dots. The black arrows are CII training set with reddening vectors, the start of each arrow is the observed location.

H excess
It has long been known that young stars frequently display H emission (Herbig 1958(Herbig , 1962, with an H equivalent width > 10 Å a defining characteristic of Classical T-Tauri stars. This emission is understood to originate in matter accreting from the disc on to the forming star via magnetically confined columns (Edwards et al. 1994;Hartmann et al. 1994). While this property is not unique to YSOs, an H excess determined with a narrow band filter is nonetheless a good indicator of stellar youth. We use the IGAPS survey, an updated version of the INT Photometric H Survey (IPHAS) used by Witham et al. (2008) and Barentsen et al. (2011) to successfully identify candidate YSOs. The IGAPS -band AB photometry was transformed to the INT photometric system by the method described in Appendix A4.
We define our H excess feature as the excess in ( − ) from dereddened ( − , − ) CCDs (see Fig. 2). The MS in these CCDs can be approximated by a straight line. Thus the ( − ) excess is simply calculated as the distance in ( − ) between the dereddened observed value and an empirical MS.
The empirical MS was derived using an unweighted non-linear least-squares fit to the IGAPS sources (see the top panel of Fig. 2). To minimize the effect of reddening, only sources with an ( − ) ≤ 0.01 were selected from our NGPn data set. To prevent sources outside the linear region from skewing the fit, the data were restricted to 0.2 < ( − ) < 1.3. The little hook of sources blueward of ( − ) = 0 correspond to the higher mass stars seen in the isochrones of Drew et al. (2005, figure 4). Outliers were removed by iteratively excluding sources greater than 0.1 magnitudes in ( − ) from the fit. After two iterations this gave the stable straight line fit Although there is no simple uniform reddening vector (Drew et al. 2005), the effect of reddening in ( − , − ) is to shift sources along the approximate direction of the MS. This can been seen in the second panel of Fig. 2, where reddened stars selected by 1.0 ≤ ( − ) < 2.0 follow the cyan reddening line (illustrated for ( − ) = 4). It can be seen that the low reddening sources with ( − ) ≤ 0.1 lie along the MS while the reddened sources tend to lie slightly beneath it. The reddened sources can be traced back along the reddening line to an ( − ) and ( − ) of around zero, corresponding to the higher mass stars. The third panel of Fig. 2 shows the same data, now dereddened by our method described in Appendix B. The scattering of sources along the reddening line even after dereddening illustrates that our dereddening is approximate. We discuss this in Appendix B2 where we explain how uncertainties in the Gaia parallaxes can lead to incorrect reddening values. The bottom panel of Fig. 2 is a histogram of the same low reddening and dereddened data from the panel above.
The peak of the dereddened data lies at an ( − ) of about -0.1 with nearly all sources within ±1.0. Hence the long tail to ( − ) > 1.0 of poorly dereddened data in the third panel contains very few sources. This incorrect dereddening is not a serious problem where ( − ) > 0 as they fall below the empirical MS line, making their ( − ) excess more negative. This could result in some excess sources being missed. It is more of a problem for ( − ) < 0 as these sources are incorrectly given a positive ( − ) excess. As described in Appendix B2, we flag sources with ( − ) ≤ −0.3 as suspect reddening and we do not include their H excess feature in the classifier calculation.

Isochronal Age
A YSO of a given temperature will have a larger radius than a MS star, and hence a higher luminosity. Therefore YSOs occupy a region above the MS in Colour-Magnitude Diagrams (CMDs). To convert this into a feature usable by the classifier, we interpolated the ages of sources from their location on CMDs relative to young stellar isochrones.
As YSOs emit strongly in the red and IR, we chose IGAPS ( , − ) CMDs for this feature. These CMDs have the benefit that the reddening vectors run approximately parallel to the line of the MS, translating primarily into an uncertainty on the mass with only a marginal uncertainty in age. As with the H excess feature, the IGAPS -band AB photometry was transformed to the INT photometric system by the method of Appendix A4. The IGAPS magnitude was corrected for distance using the Gaia EDR3 parallaxes. A set of isochrones (described in Appendix C) were created covering log 10 ( ) from 5.0 to 7.0, masses from 0.05 M to 3.00 M , and at reddenings from ( − ) = 0.00 to ( − ) = 5.95 in steps of 0.05. The unreddened isochrones are shown in Fig. 3. The photometry was matched to a set of reddened isochrones at or just below the reddening value assigned to the source using the STILISM reddening map (see Appendix B).
To keep the feature derivation straightforward, all sources were assumed to be single stars. While this assumption will be wrong for a large proportion of stars, to a limited extent the training will deal with the problem, since the training data should contain representative proportions of multiple stellar systems. The problem was also mitigated by the selection criteria of our NGPn data set. We required that all sources have good quality Gaia EDR3 parallax measurements. This selected sources that were consistent with a single star model in the Gaia pipeline. This will have removed a subset of multiple star systems where they were not resolvable into distinct sources by Gaia and yet their orbital motion caused quality issues in the Gaia pipeline.
While the isochrones gave good coverage of the YSO region in CMD space, the data points were at discrete values in ( , − ). To return an age and mass for a general point in ( , − ) we performed an interpolation for each set of reddened isochrones. The interpolation was calculated using a collection of Radial Basis Functions (RBFs) generated by the interpolate library with a thin plate function ( 2 ×log( ) where is the Euclidian radius) and zero smoothing. The isochrones started to overlap above 2.00 M , so the interpolation was limited to ≤ 2.00 M .
As interpolate can extrapolate outside of the isochrones, a series of quality flags were included in the results (see Table 3). These indicate whether the data fell outside the brightest, faintest, bluest and reddest of the isochrone bounds, and whether the interpolation result was above or below the age and mass range of the isochrones. Flags were also created for a backwards interpolation quality check, using another set of RBFs to interpolate the colour and magnitude from the interpolated age and mass. The backwards interpolation was only performed on data within the colour and magnitude bounds of the isochrones. When we checked the backwards interpolation using a regular grid of data points, most fell very close to their original ( , − ), though 0.6 per cent of sources with IGAPS photometry landed over 0.1 magnitudes from their input ( , − ). To allow the selection of good quality interpolations, a pair of flags indicate whether the back interpolated magnitude and colour fell within a tolerance of 0.1.

Gaia G-band variability
A significant proportion of YSOs show some form of variability, and it is likely that all go through periods of variability as they evolve. This behaviour can be grouped into the two broad categories of periodic and aperiodic (Carpenter et al. 2001). Periodic variations are typically due to stellar rotation, caused by surface features such as starspots and accretion hot spots (Herbst et al. 1994). Aperiodic variability can be caused by variable accretion (Herbst et al. 1994;Sergison et al. 2020) and variations in obscuration due to the circumstellar disc (Bouvier et al. 1999;Froebrich et al. 2018).
Over the course of its mission Gaia will make multiple observations (CCD transits) of each source, with new observations added in each release. This work uses data from EDR3, having an average of 376 -band observations for each source in the NGPn data set. The number of observations per source varies across the sky due to the Gaia scanning law. While the Gaia archive does not provide a measure of variability, it provides the standard error and the number of observations contributing to the -band photometry. These allow the reconstruction of the standard deviation of the -band observations. We define our variability feature as the observed fractional standard deviation of the flux, identical to the variability amplitude of Deason Table 3. The flags returned by the classifier when performing the isochronal age interpolation. The final column states whether the result was used in the likelihood calculation described in Section 3.3.5.

Flag
Description Generate likelihood brighter_min_iso_mag The source is >0.1 magnitudes brighter than the minimum of the isochrones.

No fainter_max_iso_mag
The source is >0.1 magnitudes fainter than the maximum of the isochrones.

No above_iso_col
The source is >0.1 magnitudes redder than the red end of the isochrones.

No below_iso_col
The source is >0.1 magnitude bluer than the blue end of the isochones. No good_except_back_interp_colour No issues except the colour from back interpolating is >0.1 magnitudes different.
No good_except_back_interp_mag No issues except the magnitude from back interpolating is >0.1 magnitudes different.

No above_max_iso_red
The reddening of the source is above the maximum reddening of the isochrones.

No older_max_iso_age
The derived age is above maximum age of the isochrones, 10 Myr.

Yes younger_min_iso_age
The derived age is below the minimum age of the isochrones, 100 kyr.

No above_max_iso_mass
The derived mass is above the maximum mass of 2 M of the isochrones.

No below_min_iso_mass
The derived mass is below the minimum mass of 0.05 M of the isochrones.

No good_classiiyso_fit
The source is a good fit to the young stellar isochrones with no issues. Yes where obs is the number observations (CCD transits) contributing to the -band photometry,¯is the observed -band mean flux, O is the standard error of the -band mean flux, and O is the standard deviation of the -band photometry.
TheˆO is a function of magnitude (see Fig. 4). There is a clear ridgeline in the plot where most sources lie. These are the stars with little or no intrinsic variability whoseˆO is dominated by the instrumental noise. Those with largerˆO are the variable stars. The locations of the two training sets in Fig. 4 match our expectations. In general the O training set will be composed of sources with low intrinsic variability. These tend to be on or near the ridgeline, with some scatter to largerˆO as the training set is expected to contain a variety of variable stars. While the CII sources are expected to be highly variable and are scattered across a range ofˆO values.
The variations inˆO with magnitude are driven by changes in the standard error for a source given in the Gaia catalogue. The general trend of the ridgeline to largerˆO with fainter magnitudes is due to the increase in photon noise. The other variations are rooted in the Gaia gate and window class configurations that define how the telescope observes sources across different magnitude ranges (Evans et al. 2017(Evans et al. , 2018. The changes inˆO for < 12 are due to gates reducing the exposure time for brighter sources. At around = 13 the window class changes from 2D for brighter sources to 1D for fainter sources. At = 16 there is a change in the window size, causing a subtle change in the slope of the ridgeline. These changes do not occur at precise magnitude boundaries as the star mapper CCDs estimate the magnitude of each source to decide the observation configuration (Evans et al. 2017). These star mapper estimates will naturally be less precise than the photometric measurements. Also, some of the intrinsically variable sources will move between gate and window classes as they vary. Since each source will be given a single mean -band magnitude in the Gaia archive, this will lead to blurring of theˆO behaviour at the boundaries.

NAIVE BAYES CLASSIFIER
Our goal is to create a classifier for producing a catalogue of candidate Class II YSOs. The catalogue needs to cover our NGPn footprint without gaps, unavoidable bias, and include sufficient data to allow us to understand the reasons behind individual probability assignments. To achieve these goals we chose to construct a naive Bayes classifier  similar to Broos et al. (2011Broos et al. ( , 2013, Li et al. (2020) and Tranin et al. (2022).
Naive Bayes classifiers are a kind of supervised machine learning. The results are a set of posterior probabilities for two or more classes. The inputs are a set of priors for these classes, and one or more features with likelihoods split by the classes. The features are measurements or derived properties, that are chosen for their ability to discriminate between the classes.
The learning process for Naive Bayes classifiers creates probability distributions by feature and class from the training data. These are referred to as the feature likelihoods. Our training approach for four of the five features defined the likelihoods from direct probability distributions of the training data. This gave robust likelihoods, even with relatively small data sets. The exception was the variability feature, where we created a model to generate probability density functions, with the model parameters set by the training data.
The power of the classifier comes from combining different features. Where multiple features reinforce a particular class, this increases the posterior for that class. Where different features favour different classes or do not give a strong result, then the posterior tends to be dominated by the prior. The classifier is called naive as it assumes no correlation between features. While this assumption is usually false, the inclusion of correlations is unlikely to fundamentally change the classification results, and leaving out correlations makes for a significantly simpler classifier.
With naive Bayes classifiers, it is the ratio of the feature likelihoods between classes that drives the posterior probabilities. This means the training work grows with the square of the number of classes. We settled on a simple binary classification of CII and O . As the likelihoods for a class are calculated from the training data for that class, the ratio of sources in the training sets does not need to match the ratio in the population. Instead the values of the priors encode the expected ratio of the classes in the population. Since the true number of Class II YSOs in the Galaxy is unknown, we chose the crude estimate for the priors of one Class II YSO in a thousand sources. We did not use a spatially varying prior with higher values in known star forming regions, as this would have biased our results to known regions.
As the true priors for our classifier are unknown, our choice of prior biases the posterior probabilities, meaning they are not literal probabilities. For example, a CII posterior of 0.8 does not mean a literal 80 per cent probability that the source is a Class II YSO. However, as we use the same priors for all sources, the rank of sources by posterior is identical to the rank by probability.

Mathematics of the classifier
We use the nomenclature s for posterior probability, r for the prior probability, and for likelihood (the probability of a feature measurement for a given class from the training data). The priors satisfy where is the class . To keep nomenclature concise we define the shorthand posterior probability as where is the data for feature . We define the likelihood of feature for class The naive Bayes classifier formula (Duda et al. 2001;Broos et al. 2011Broos et al. , 2013 for the posterior probability of class for a binary classifier where the other class is and there are features is The second version of the formula provides insight into the calculation, since it demonstrates that the power of the classifier comes from the ratios of the likelihoods between the classes. A useful aspect of naive Bayes classifiers is the likelihoods may take the form of pure probabilities or probability densities. The fundamental restriction is the likelihoods at any specific value of a feature must not blend a pure probability with a probability density. They must be either one or the other to maintain the integrity of the likelihood ratio. For our feature likelihoods we use both pure probabilities and probability densities, though we never mix them in a ratio calculation (see Section 3.3.1).

Missing data
Naive Bayes classifiers have a distinct advantage when it comes to missing data. They are able to simply bypass a feature if no data is available. Equation 8 indicates the approach. As the ratio of the likelihoods drives the calculation, any result that gives a likelihood ratio of one will have no effect on the posterior. If all features were missing then the priors would be recovered as the posteriors. This is precisely the result we require in the absence of data. Hence, setting the likelihoods of a feature to the same value where the data is missing removes that feature from the calculation. For simplicity we chose likelihood values of one where a feature has no data, the approach taken by Broos et al. (2013).
This has an important implication in how the features from multiple catalogues are combined. We do not need to limit our result set to the intersection of sources present in all the contributing catalogues. This gives us a much larger result set and allows us to add features without reducing the size of our result set. While it is beneficial to use as many features as possible, a posterior based on fewer than five features is valid. Our cap on the likelihood ratios prevents any single feature from overwhelming the prior. If only a single feature is present, this cap limits the posterior favouring CII to less than 0.5, regardless of the strength of result for that feature (see Section 3.3).

Feature likelihoods
The likelihoods are defined by probability distributions for each feature and class. We use either discrete distributions calculated by binning the training data, or continuous distributions from models whose parameters are set by the training data.
It is the ratio of the likelihoods between classes that defines the power of a feature to distinguish CII from O sources. To avoid any single feature dominating the results from all other features, we set bounds to limit the ratio of individual feature likelihoods. We define these bounds as the ratio of the priors and their reciprocal Were the Bayes classifier to be based on a single feature, this would lead the CII posterior being capped to As there is a large difference in the value of the priors, this can be simplified to

Choice of likelihood bins
The likelihoods for all but the variability feature were derived by calculating the proportion of sources in the training sets across a set of carefully chosen bins. To minimize the impact of small number statistics, a minimum of 10 sources per bin was imposed. The probability of a source having a value within any given bin was defined as the proportion of the training set within the bin. These were converted to probability densities using the bin widths, except at the end bins. The ends of the distributions are ill defined and sensitive to outliers, making it difficult to specify bin widths. Hence, the likelihoods in the end bins were left as pure probabilities. The same inner edges of the end bins were used for both classes, to ensure calculated likelihood ratios were always composed solely from probability densities or pure probabilities. This approach ensured likelihoods for any value of the features, even beyond the range of the training data. The inner edge of the end bins were defined by the tails of the distributions and our requirement for bins to contain at least 10 sources. Hence, the likelihood ratios in these end bins usually cover a wide range of the feature space with a single value for the ratio. This is not a serious problem, since they will also have the most extreme value for the ratio favouring CII or O , the exact behaviour that is required away from the overlap region. These are also the regions where the ratio may be capped, and hence it is not a great loss that the distributions do not capture a gradual change in the ratio away from the transition region.
The overlap region between the training sets is where the feature changes from favouring one classification to the other. If a feature is good at distinguishing between the classes, then the bulk of the training data will inevitably occupy different ranges of the feature space for the different classes. This can lead to a small region of overlap with a small proportion of the training data. While the O training set contains 151 645 sources, the CII training set is relatively small with 411 sources. Hence, the CII training is prone to suffering from a lack of sources in the overlap region. Since we impose a minimum of 10 sources per bin, this can lead to an overlap with very few bins or even a single CII bin. As the likelihoods in the transition region are probability densities, it is not necessary for the O and CII to be split into the same bins. To an extent this mitigates the problem, as the O distribution can be split into multiple bins even if there is only a single CII bin.
The likelihood ratios were manually reviewed. Where feasible, bin edges were tweaked to minimize large jumps in the ratio between neighbouring bins. Outliers could lead to regions where the likelihood ratio became unphysical. These were analysed and where necessary adjustments were made to the bins.

WISE likelihoods
A histogram of the WISE training data is displayed in the top panel of Fig. 5. This illustrates that there is a good level of overlap between the two training sets. The likelihoods and their ratios are provided in Table 4 and a plot of the ratios with their uncertainties is given in the bottom panel of Fig. 5. Since we enforced a minimum of ten sources per bin, we approximated the counting uncertainties on the likelihoods to be the square root of the number of sources. To calculate the uncertainty on the likelihood ratio we added the fractional uncertainties from each class in a given bin in quadrature.
When selecting the bins for the likelihood calculation, the requirement for a minimum of ten sources per bin resulted in the end bins occupying a significant proportion of the ( 1 − 2) range. This left 0.1 < ( 1 − 2) ≤ 0.5 for the transition region, containing 1 366 O sources and 47 CII. The large number of O sources enabled a gradual change in the likelihood ratio, with most CII bins covered by multiple O bins. Due to the power of the feature at large ( 1 − 2), the likelihood ratios were capped from the second to last bin, ( 1 − 2) > 0.38.  Table 4. The solid black line is the likelihood ratio, the grey dotted lines are the uncertainties, and the dashed blue line marks the cap on the ratio of the priors. Table 4. The WISE feature likelihoods and their ratios. The end bins are pure probabilities while the inner bins are probability densities. Note that probability densities can attain a value greater than 1. Where the ratio is greater than the ratio of the priors, the likelihoods are capped to the ratio of the priors.
( This capped the likelihoods for 1 568 sources and limited the true size of the transition region to 0.100 < ( 1 − 2) ≤ 0.380. The transition region contained 1.3 per cent of sources with the WISE feature.

(H-K) Excess likelihoods
The ( − ) excess feature has two sets of likelihoods, one for UKIDSS and the other for 2MASS. At first glance the number of CII with UKIDSS photometry quoted in Table 1 appears anomalously low. This is explained as UKIDSS does not provide full coverage of our NGPn footprint, missing the region containing the young cluster  Table 5.
Cep OB3b that contributes a significant proportion of the CII training set.
The training data from both surveys provide good overlap of the CII and the O probability distributions as can be seen in the top panels of Fig. 6 and 7. A small bump is present in the CII distributions of both surveys at a value corresponding to the peak of the O training data. This could indicate a slight contamination of the CII training set, or it may be an artefact due to the inability of the feature to distinguish CII from O for > ≈ 0.8 M , where the isochrones deviate from the linear appoximation used for this feature.
The values of the UKIDSS likelihoods are given in Table 5 and 2MASS in Table 6. Their ratios with uncertainties are displayed in the bottom panels of Fig. 6 and 7. Once again, our requirement for at least ten sources in each bin leads to the end bins occupying a significant proportion of the feature range. This results in a rapid change in the likelihood ratio across the remaining region −0.030 < ( − ) ≤ 0.070 for UKIDSS and a more gradual change across −0.090 < ( − ) ≤ 0.212 for 2MASS. The steps in likelihood ratios between neighbouring bins are more coarse for UKIDSS than 2MASS. The O data allow the single UKIDSS CII bin in the transition region to be split into four, though there is still a jump by a factor of 103 at the lower end and by 23 at the upper end of the feature values, compared to 19 and 5 for 2MASS.
Only the upper ( − ) end bin had its likelihood ratio capped. This leads to a cap on 7 491 sources, of which 1 866 are from UKIDSS and 5 625 are from 2MASS. There were 102 649 sources in the transition region, 1.6 per cent of sources with an ( − ) likelihood.

excess likelihoods
The CII training set has a broad range in the ( − ) offset feature space, completely covering the narrower range of the O training set (see the top panel of Fig. 8). We would expect all CII to have a positive ( − ) excess, though there are sources in the CII training set with negative ( − ) excess. This is most likely caused  Table 6.   Table 7. by incorrect dereddening. Our ( − ) ≤ −0.3 indicator for bad reddening is relatively crude and will not pick up all suspect values. We could reasonably expect incorrect reddening to affect CII more than O sources as they form in dusty environments. Hence, we treat the negative ( − ) excess with scepticism. To avoid unphysical likelihood ratios in bins splitting the negative ( − ) excess values, we set the lower end bin to cover all sources with ( − ) < 0, giving a likelihood ratio consistent with the trend to positive ( − ) excess. The values of the likelihoods and their ratios are given in Table 7. The ratios with uncertainties are displayed in the bottom panel of Fig. 8. The end bins occupy a significant proportion of the excess feature, resulting in a rapid change in the likelihood ratio across the remaining region 0.000 < ( − ) ≤ 0.121. This transition region contains 1 064 294 sources, 17 per cent of the sources with an excess likelihood. The training data allowed a good distribution of bins, avoiding large jumps in the likelihood ratio between neighbouring bins. The largest jump is a factor of 12 between the last two bins, though the capping of the likelihood ratio limited this to less than a factor of 2.
The likelihood ratio was only capped for the final bin, ( − ) > 0.121. This lead to 1 932 sources receiving capped likelihoods, 0.03 per cent of the sources with an excess likelihood.

Isochronal age likelihoods
Although the classifier returns an age based on the location of a source relative to young stellar isochrones, this age should not be taken at face value. The feature has inherent problems such as ignoring multiple star systems and the interpolation of the isochrones can give erroneous results. This is where the quality flags come in (see Table 3). The age was only used where the flags indicated a reliable result. This was defined as sources within the bounds of the isochrone colours and magnitudes, the backwards interpolation of colour and magnitude from the derived age and mass yielded values within 0.1 of the input colour and magnitude, and the resulting mass and age were within the limits of the isochrones. The sources with these reliable results were flagged as 'good Class II YSO fit'. A subset with ( − ) < 0.1 are shown in Fig. 9), demonstrating that as expected, their location lies above the MS.
It should be noted that some of the O training set fall within the young stellar isochones and are thus assigned ages consistent with YSOs. This is to be expected as some multiple star systems composed of MS stars will lie above the MS, occupying the region where single YSOs are found. Also, stars transition across the YSO region when they evolve off the MS. This is an inherent weakness of our single star isochrone approach, discussed in Sections 2.5 and 5.
Only a small fraction of the total sources in our NGPn data set occupy the young star region of the CMD. We therefore examined the data outside of the young stellar isochrones to see if robust likelihood ratios could be assigned to some of these sources. We found age extrapolation to younger or older values than the isochrone bounds gave robust and useful results even though the exact ages were meaningless. The older and younger flags were only assigned where the original photometry lay within the upper and lower bounds of the isochrone colours and magnitudes. These results were combined into bulk younger and older likelihood bins. However, there were only 2 sources flagged as younger in the CII training data. As we impose a minimum of 10 sources for a likelihood calculation, the younger bin was not used. The grey crosses of Fig. 9 are the subset of sources flagged as older, with only sources with ( − ) < 0.1 shown to avoid confusion due to reddening. Their location is consistent with an older age as they lie below the red dots of the young stars, and the white dwarfs jump out as a separate group in the bottom left corner.
Once sources with suspect reddening were removed, this left 5 939 814 sources with an Isochronal Age likelihood, 95 per cent of those with IGAPS photometry and 74 per cent of our NGPn catalogue. This was split 849 978 classified as a 'good Class II YSO fit' and 5 089 836 flagged as 'older than the maximum isochronal age'. The high proportion of sources classified as a 'good Class II YSO fit' is not telling us there is a high proportion of young stars. This simply tells us they occupy the region of the CMD where single young stars are found. This same region will include multiple MS stars and stars evolving off the MS. There were 64 347 sources flagged as 'younger than the minimum isochronal age' that we were unable to use.
The top panel of Fig. 10 shows the distribution of the training data flagged as a 'good Class II YSO fit' or 'older than the maximum isochronal age'. The bulk of the CII lie at 5.6 < 10 ( ) < 7.0, with a peak at 6.4 < 10 ( ) < 6.6 though 18 per cent are Older than max isochrone Good Class II YSO¯t Figure 9. An ( , − ) CMD of the sources in the NGPn data set used in the Isochronal Age feature. The magnitude is distance corrected using the Gaia EDR3 parallax. The red dots are sources flagged as a 'good Class II YSO fit' to the young stellar isochrones with an ( − ) < 0.1. The grey crosses are sources flagged as 'older than the maximum isochronal age' with ( − ) < 0.1. flagged as 'older than the maximum isochronal age'. The O exhibit increasing numbers with increasing age. Defining probability densities for 10 ( ) would be nontrivial. Instead we imposed identical bins on the two classifications, allowing the likelihoods to be defined as pure probabilities. The likelihoods and their ratios are given Table 8, and the bottom panel of Fig. 10 shows the ratios with uncertainties. The likelihoods were calculated as the proportion of the training data flagged as a 'good Class II YSO fit' or 'older than the maximum isochronal age'.
It can be seen in the bottom panel of Fig. 10 and Table 8, that the likelihood ratios favouring CII peak at a 10 ( ) of 5.6 to 5.8 with a ratio of 31.8. The sources given a 10 ( ) > 7.0 slightly favour the O classification. None of the likelihood ratios give a strong result. This feature therefore has a weak affect on the posterior when compared with the more extreme likelihood ratios of the other features.

Gaia G-band variability likelihoods
As the variability feature (ˆO) exhibits a complex behaviour, we constructed a model forˆO to generate the likelihoods. This model captures both the measurement scatter and the variations due to intrinsically variable stars. The two classifications of O and CII represent separate populations of stars with different proportions and types of variable stars. Hence, the model parameters related to intrinsic variability took separate values for the classes, determined from the training data.
To model the measurement scatter, we consider a theoretical star with no intrinsic variability but an instrumental uncertainty I (noise). That instrumental uncertainly includes both photon noise and instrumental systematics. For a series of observations of this star  Table 8. we define the root mean square deviation of the observations about the mean flux as O . Since the series of observations is a finite sample, O will only approximate I (they would become equal for an infinite set of observations). In Appendix D we demonstrate that for multiple sets of observations of this star (or sets of observations of other identical stars) the observed fractional variance (ˆ2 O ) follows a 2 distribution. The 2 model ofˆ2 O has two parameters, the number of observations obs (equal to the number of degrees of freedom) and the fractional uncertainty for a single measurementˆI, we expect I to be a function of alone. For our model we transform the fractional variance into the fractional standard deviation space of our variability feature, via the method in Appendix D. In Fig. 11 we take a slice through Fig. 4 at 16.5 ≤ < 16.6 to examine the ridgeline. We also plot a 2 distribution with parameters derived from the O training data, transformed into the fractional standard deviation space. This gives an excellent fit to the data around the ridgeline at low variability but it is too narrow, not capturing the long tail to higher variability. This is because the 2 distribution is purely modelling the instrumental noise, and this long tail is due to variable stars in the data. We conclude the 2 distribution is a reasonable model for the measurement scatter in fractional variance space, but a component for variable stars needs to be included in the model. As the majority of YSOs show some form of variability, compared with only a fraction of the general stellar population, we chose an astrophysical variability model based on the behaviour of the CII training data. Peña et al. (2019) demonstrated the variability amplitude of Class II YSOs follows a decreasing exponential trend with fewer stars exhibiting large amplitude variations. We found the distribution ofˆO for the CII training set was well modelled by such an exponential function (Equation D9) as can be seen by the blue dashed line of Fig. 12. However, there is an additional component that shifts intrinsically non-variable sources to small non-zeroˆO. This is the instrumental noise modelled by the 2 distribution. Hence, the probability density function (PDF) for the variable star distribution is calculated by convolving the 2 function with the exponential function. The green line of Fig. 12 shows the combined CII variability model with components for both the instrumental noise and intrinsic stellar variability. The model parameters had to be chosen for specific values of and obs , so we used the representative values 17.0 ≤ < 17.1 and obs = 324. Due to the lack of CII training data, no restrictions were placed on the magnitude or number of observations for the red dotted line of the data. Hence the green model line is not expected to be a perfect representation of the dotted red data line, though it can be seen it mimics the key features of the distribution.
The full model is a combination of the variable star PDF convolved with the 2 function, summed with the pure 2 function for the non-variable stars. The relative fraction of non-variable stars forms another model parameter, to normalise the summed variable and nonvariable star PDFs. A full mathematical derivation of this model is given in Appendix D.
Taking the model behaviour and data scatter into consideration we decided to limit the applicable range of the model to 13.2 ≤ < 18.0, retaining 97 per cent of the NGPn data. We set the lower limit to 13.2 rather than 13.0 since data at precisely = 13.0 are likely to be a blend of 1D and 2D window classes, and there is a bump inˆO around this transitional magnitude.

Variability model parameter selection
TheˆO model was split into sub-models for the CII and O classifications. The parameters for these models were the number of -band observations ( obs ), the instrumental noise (ˆI), the exponential scale factor ( scale ), and the fraction of non-variable stars (ℎ). These parameters were determined by fitting the probability density functions ofˆO using a 1D 2 maximum likelihood statistic assuming zero uncertainties (Naylor & Jeffries 2006).

Variability model O parameter selection
To calibrate our O model parameters we took a similar approach to Vioque et al. (2018), who calibrated their Gaia variability indicator to all Gaia sources within ±0.1 magnitudes, whilst we fit our O parameters in bins of 0.1 magnitude in . We started by examining the fit of our model parameters to the O training set in both and obs . We split the data into three subsets 300 < obs ≤ 400, 400 < obs ≤ 500 and 500 < obs ≤ 600, and magnitudes bins of 0.1. To begin with the fit was performed withˆI, eff (effective number of observations, it will be explained eff ≠ obs ), scale and ℎ as free parameters.
It was immediately apparent the Gaia obs did not provide a good fit to the model. The ratio of obs from the Gaia archive to eff fitted to the model is plotted in Fig. 13 for the middle subset 400 < obs ≤ 500, with the other two subsets exhibiting similar behaviour. The ratio has a large scatter but is always significantly less than one, indicating the fitted eff is less than the quoted obs . There is also a clear dependence on to brighter magnitudes. This means the average of obs is not following the normal √ behaviour. The average is being affected by a systematic in the Gaia processing pipeline that we are failing to capture. Regardless of the exact cause, it was clear that using the Gaia archive obs as an input to our model could lead to poor quality results. Hence, we decided to create a function for eff as the input to the model. At magnitudes brighter than = 16.0 the function would be dependent on both obs and , OTHER training data 400 < N obs · 500 N e® divided by average N obs Figure 13. Illustration of the relationship and scatter between the effective number of observations and the number of observations from the Gaia archive, and how this changes with with -band magnitude. The data are the subset of the O training data with 400 < obs ≤ 500 in bins of 0.1 magnitude. The green dots are the value of eff divided by the average obs in the bin. The black pluses are the value of eff from Equation (12) divided by the average obs for the bin.
while at fainter magnitudes there is no clear dependence on so we decided this region would be dependent on obs alone. The change in behaviour at = 16.0 may be connected to the change in Gaia Window size at this magnitude (Evans et al. 2017(Evans et al. , 2018.
We decided to adopt a simple function for eff , as the scatter in the fits makes it difficult to justify a complex function. We started with the function for 16.0 < < 18.0 as this would be dependent solely on obs . We chose a power law as this gave a consistent fit across the three obs subsets. The power was determined by dividing the logarithm of fitted eff by the logarithm of the average Gaia obs by magnitude bin, and finally averaged across the magnitude range. The values of the power across the three ranges were 0.94 for 300 < obs ≤ 400, 0.94 for 400 < obs ≤ 500, and 0.92 for 500 < obs ≤ 600. While there may be a slight hint the power decreases with increasing obs , this is small compared to the scatter in the data and risks over fitting a weak relationship. For the model we used a single valued parameter calculated from the subset 350 < obs ≤ 450 containing 65 per cent of sources in the O training set, giving a value of 0.93 for the power.
As the brighter observations showed an increasing relationship with containing a lot of scatter, we took the product of our fit to the fainter region with a linearly increasing function of for this brighter region. The trend appears to be heading for zero eff at about = 13.0, noting the applicable range of our model is limited to 13.2 ≤ < 18.0. Thus we adopted the straight line from eff = 0.
The fit of this function to the subset 400 < obs ≤ 500 can be seen in Fig. 13.
Having set eff to the output from Equation 12, we now fitted the PDFs with the 3 remaining free parameters. Our next step was to estimate the instrumental noiseˆI by -band magnitude. The O training set was suited to this task as it was deliberately located in a bland region, where we would expect non-variable stars to dominate.
To fit the instrumental noise parameter we again used the O  Figure 14. The values for the instrumental noise (ˆI) by Gaia -band magnitude found by fittingˆO from the O training data restricted to 350 < obs ≤ 450 with a 1D 2 maximum likelihood statistic assuming zero uncertainties. subset 350 < obs ≤ 450, containing two thirds of the training data. This minimized any small effects from varying obs in the model fitting. The resulting instrumental noise can be seen in Fig. 14. There is a little scatter at brighter magnitudes where there is more spread in the data, and a slight change at around = 16.0 corresponding to the change in the Gaia Window Class (Evans et al. 2017(Evans et al. , 2018. We used these values to assignˆI at a resolution of 0.1 magnitudes.
To investigate the exponential scale factor and fraction of nonvariable star parameters, we performed a fresh 1D 2 fit using the O subset 350 < obs ≤ 450 in bins of 0.1 magnitudes. We set eff to the derived function of obs ,ˆI as the set of values in Fig. 14, and left the exponential scale factor and the fraction of non-variable stars as free parameters.
The exponential scale factor showed an increasing non-linear relationship with , or alternatively an increasing linear relationship with the instrumental noise. As the relationship to the instrumental noise was simplest, we decided to model this linear relationship for the exponential scale factor parameter (see Fig. 15). We carried out an unweighted non-linear least-squares fit to derive the straight line As it only takes one spurious photometric measurement to increase the variability measure, the tail of the exponential distribution may be contaminated by sources with little or no intrinsic variability but a single bad data point in their light curve. While this is an inherent weakness in our approach, the quality of the Gaia data means we should only expect a tiny number of spurious measurements. The final parameter for the O model was the fraction of nonvariable stars, ℎ. We again performed a 1D 2 fit to the O subset 350 < obs ≤ 450 in bins of 0.1 magnitudes. This time the only free parameter was the fraction of non-variable stars, with the other parameters set as per the values and functions we found. The fit exhibited a complex behaviour with (see Fig. 16). At the brightest magnitudes it appears all stars are variable to some extent. This is a reasonable result, since all stars exhibit low level variability and the instrumental noise will be very small at these bright magnitudes. That the relationship is not a monotonic function is a little concerning. It may be due to the model imperfectly matching the systematics in the instrumental noise, though we do not have a good physical explanation. We decided to fit a polynomial limited by upper and lower bounds to avoid it growing beyond the data values at the extremes. An unweighted non-linear least-squares fit to a third  (13) is in black. order polynomial was performed, giving the function with bounds (14)

Variability model CII parameter selection
As the number of observations and instrumental noise parameters should be independent of the type of source, we used the same relations found for the O model for CII. We would expect the fraction of non-variable stars and the exponential scale factor to take on different values, so we performed a dedicated fit using the CII training data.
As with the O parameter fitting, we restricted the data to the range 350 < obs ≤ 450, to limit the changes due to different numbers of observations. We performed a series of 1D 2 fits by bins of whole magnitudes except for the brightest magnitude bin 13.2 ≤ < 14.0. The instrumental noise and effective number of observations were set to the values found for the O training data at the mean values for the magnitude bins. This consistently gave a tiny number for the fraction of non-variable stars, and we did not find any meaningful relationship for the exponential scale factor within the uncertainties of the fits, with values of scale ranging from 0.10 to 0.20. So we performed a 1D 2 fit across the entire magnitude range. This gave the single value of 0.137 for the exponential scale factor that we took for the CII parameter. We adopted a value of zero for the fraction of non-variable stars, consistent with the fit as well our expectation that the majority of young stars will show some form of variability.

Variability model summary
Thus we have a semi-analytical model for our -band variability feature. For a given and obs it provides the probability of observing a givenˆO (and hence uncertainty in ). A comparison of the O and CII models is shown in Fig. 17. To retain a sufficient quantity of data for a meaningful histogram there was no restriction placed on obs . It can be seen that the model follows the O training data reasonably well. AboveˆO ≈ 0.5 there is very little O data, resulting in gaps in the histogram and jumps to high values caused by single data points. To have sufficient sources for the histogram, it was necessary to accept CII in the range 16.0 ≤ < 18.0 with no restriction on obs , retaining 249 sources. The peak of the CII training data is to slightly higherˆO than the model. The accurate location of this peak is important for capturing the behaviour of the O training data but is less critical for the broad low peak of the CII. The important behaviour of the CII is the slow decline of sources with increasingˆO compared to the O sources, and this is accurately captured by the model. The ratio of the likelihoods is shown in the bottom panel of Fig. 17. We place a limit on the likelihood ratio, to cap it before the model diverges significantly from the data.

RESULTS
The full classifier catalogue contains 8 080 045 sources. This includes 29 166 sources without any valid feature likelihoods, that simply return the prior for the posterior. We provide a data dictionary for the published catalogue in the supplementary material.

Classifier performance
We split our samples of known CII and O into training and test sets in the ratio 80 : 20. The test set was used for an independent check on the classifier performance.
The CII test set contains just 91 sources, compared to 37 911 in the O test set. Hence, the size of the CII set limits the precision that can be achieved by the performance statistics. Also, even a small contamination of the CII set would have a significant impact on the statistics, noting undetected contaminants would make the classifier look worse not better.

Recovery of the training data
A useful initial step is to review the recovery of the training data by examining the posteriors with a cumulative histogram (see Fig. 18). While this is not an independent test, it is a useful check on the There are a small number of sources for each training set with a low posterior for their class. This is not surprising and will be due to a mixture of misidentification by the classifier and contamination of the training data. The O training set has an extremely low level of misidentification at just over one in six thousand by s (O ) ≤ 0.5. In contrast, the CII data has a higher level of misidentification at around one in seven by s (CII) ≤ 0.5. This difference in the sets is partially explained as we are expecting a population of around one CII in a thousand sources. Hence, even by random chance we would expect the O training set to have an extremely low level of contamination.

Confusion matrices
A confusion matrix is a tool for analysing the performance of a classifier via a 2-dimensional matrix. The 'true' classifications take up one axis and the predicted classifications the other. This makes it simple to assess the success of the classifier at placing objects into their correct classifications. For a perfect classifier, all sources would be placed in elements along the leading diagonal, with no sources off the diagonal. The drawback of a confusion matrix for our naive Bayes classifier is they require binary true/false classifications rather than posteriors with a continuum of values between zero and one. To get around this we specify a threshold CII posterior as the binary cut-off for classifying candidate CII as true/positive and O as false/negative. As this will only capture the classifier performance at a single threshold posterior, we calculate matrices at two thresholds. This gives a broader understanding of the classifier performance  sources as CII and we shall refer to this as our 50k set. To illustrate a set with an expected low contamination rate we chose a threshold of s (CII) > 0.5, predicting 6 504 sources as CII. This set has a further quality advantage as it is only possible to have a s (CII) > 0.5 where two or more features contribute to the posterior and at least two favour a CII classification. These confusion matrices are displayed in Table 9.
The binary positive/negative results used to create the confusion matrices can be used to calculate a variety of standard statistics. In the confusion matrices of Table 9 we have the true positives (TP), the false positives (FP), the true negatives (TN), and the false negatives (FN). Combining these allows us to generate proportional statistics, some of which are included in the proportional confusion matrices of Table 10.
The precision of these statistics is limited by the finite size of the test sets. The small size of the CII test set means small number statistics limit the precision. By taking ten random samples of half the sources from the test sets, we were able to analyse how Poisson The true positive rate (TPR) or recall rate, tells us the correctly classified fraction of the test set. Our 50k set has a very high TPR of 0.98 (0.96-1.00) while our more conservative s (CII) > 0.5 set has a rate of 0.87 (0.82-0.93). So both find the majority of the Class II YSOs from our test set, with the 50k set finding nearly all of them.
The false negative rate (FNR) tells us the fraction of CII the classifier will miss as they are being misclassified as O . It has a very low value of 0.02 (0.00-0.004) for the 50k set. While the s (CII) > 0.5 set has a higher but none the less small value of 0.13 (0.07-0.18).
The false positive rate (FPR) tells us the fraction of O sources that are being misidentified as CII by the classifier. The 50k set has an FPR ten times larger at 0.0021 (0.0017-0.0026) than the s (CII) > 0.5 set at 0.0002 (0.0001-0.0003). This indicates the 50k set has a much higher contamination rate.
As our test and training sets are non-overlapping selections from master labelled data sets, these confusion matrices are telling us how well the classifier represents the training data. The low values for the FNR and FPR, with high values for the TPR and TNR imply our classifier results for the test set are accurately following the training data. The TPR and FPR are good at illustrating the key differences between our two example selections.
These figures should be taken as indicative, importantly they include systematics from the way we compiled our labelled data. For example, any impurities in our labelled data will affect these statistics. While we can have confidence the majority of our CII are young stars, it is plausible that young stars at other stages of their evolution such as Class III YSOs may be contaminating the labelled set. It is also possible our O set contains a low level contamination of young stars including Class II YSOs. There may be systematics that vary with Galactic location, such as effects on the features from interstellar dust, or our training data may not accurately reflect the populations across the entire survey region. While these are difficult to quantify, our comparison to other classifiers in Section 4.3 provides an independent assessment of our classifier.

ROC curve
To analyse the behaviour of the classifier across the full range of posteriors we use a ROC (Receive Operating Characteristic) curve (see Fig. 19). This shows how the TPR varies with the FPR. The solid coloured line is the result for our classifier and the dotted black line is random chance. Our classifier performs well as it is always far above the random chance line and has a steep rise to a TPR of nearly one for small FPR. A perfect classifier would rise vertically to a TPR of one at an FPR of zero.
The log axis for the FPR brings out the detail at low FPR. This is important for our work, since even a low level FPR can overwhelm a population of correctly classified CII with incorrectly classified O . If our prior is accurate, then we should expect around 1 CII for 999 O , so the critical region in assessing our classifier is an FPR of around 10 −3 . At an FPR below 5 × 10 −5 the classifier misses about two thirds of the CII in the test set. The success of the classifier at identifying CII increases in a couple of steps until it identifies most CII in the test set by 10 −3 . The classifier results are coloured in a log scale of 1 − s (CII). This brings out the change in posterior CII at high values of the posterior. At s (CII) = 0.9 the TPR is 0.84, and by s (CII) = 0.999999 the TPR drops to 0.35.
The area under the ROC curve (AUC) is another measure of the classifier performance. An area above 0.5 indicates better than random chance while 1.0 would be a perfect classifier. The AUC for our classifier is at least 0.99, with an indicative range of 0.998-1.000 calculated by taking ten random half size samples from the test data. The precise value of the AUC is uncertain due to the unknown level of contamination in the labelled sets. The size of the CII set is more important in this regard, as small number statistics are a factor.

Classifier catalogue
The classifier results take the form of posterior probabilities of CII and O . To create a list of candidate CII a threshold posterior must be chosen to suit the needs of the task at hand. When doing this, it is important to understand the posteriors cannot be taken at face value as genuine probabilities, since they are heavily influenced by the choice of priors (see Section 3). It is also useful to review how many and which features have been used to classify sources. A source where multiple features have been used in the classification might be considered more robust than a source classified by a single feature. Due to the bounds we placed on the likelihood ratios, sources classified by a single feature are limited to s (CII) ≤ 0.5. There are 503 320 posteriors based on a single feature.  A histogram of posterior CII with the normalised count on a logarithmic scale is presented in Fig. 20, showing a bulk shape similar to a U. This U shape demonstrates the classifier is working well. Objects are being swept to a classification of either O or CII, with fewer left with an ambiguous classification in the middle. Due to the larger number of O sources compared to CII, corresponding to s (CII) ≈ 0, the relative count is around a thousand times larger than at s (CII) ≈ 1. The spikes are due to the discrete nature of the four feature likelihoods based on the distributions of the training data, and the bounds placed on the maximum likelihood ratios. The largest spike at s (CII) = 0.5 is due to sources classified by a single feature, where the likelihood ratio is capped by the upper bound favouring CII.
The number of features used to classify sources is summarised in Table 11. It can be seen that 80 per cent of our NGPn catalogue have robust posteriors calculated from a combination of at least three features. Less than 1 per cent of sources are not classified by any feature. Their results should be ignored since the posterior will simply return the prior.

Identified star forming regions
Plots of high posterior CII in Galactic coordinates, and Galactic longitude with distance, are useful for reviewing the spatial distribution of candidate Class II YSOs. We chose a threshold of s (CII) > 0.5 giving 6 504 candidate Class II YSOs. This threshold shows many concentrations of candidates while keeping a background of scattered sources to a low level. If the classifier is working well then we would expect known star forming regions to be well represented, and this is precisely what we see in Fig. 21. In Galactic coordinates there are several very obvious knots of candidate CII, many of which coincide with well known star forming regions, and a background of scattered candidates that increases towards direction of the Galactic centre. The bottom plot of Fig. 21 shows Galactic longitude combined with distance by inverting the Gaia EDR3 parallax. It is a testament to the quality of the Gaia parallaxes that many of the same knots in Galactic coordinates also show as concentrations with distance. At around 1 kpc there is a transition with closer sources clearly concentrated into knots, while more distance groupings appear smeared in distance. This indicates the uncertainty in the Gaia parallaxes start to dominate the measurements by around 1 mas.
A similar useful diagnostic is the TOPCAT (Taylor 2005) weighted mean of all posterior CII in our NGPn catalogue (the top plot of Fig. 22). These have many features in common, though not all. This plot and the individual feature plots are discussed in more detail in Section 4.2.2.
The middle plot of Fig. 21 identifies many of the star forming regions visible in our s (CII) > 0.5 data set, noting the identifications are not exhaustive. We briefly describe these regions.
The Serpens molecular cloud including the HII region W40 is located at ≈ 30 and ≈ 3.5 at a distance of 380-480 pc (Herczeg et al. 2019). This does not show up in the s (CII) > 0.5 Galactic coordinate plot but does appear as a slight over density at 0.5 kpc in the Galactic longitude with distance plot. Interestingly it is visible in the weighted mean posterior CII of Fig. 22. This region has a paucity of sources in the master catalogue that is likely caused by dust obscuration. Hence, the weighted mean posterior plot is capturing a larger than average number of high probability CII while the number of sources are insufficient to show up as an over density in the s (CII) > 0.5 set in Galactic coordinates. There is a linear grouping of candidate CII at ≈ 53.5 and ≈ 0.0. This location corresponds to the infrared dark cloud (IRDC) G53.2, where Kim et al. (2015) found 302 young stars including 129 CII.
There is weak diagonal line of candidate CII in Vulpecula around 55 < < 57 and 1.5 < < 4.0, showing up well in the distance plot at around 0.5 kpc. Our s (CII) > 0.5 set has 83 candidate CII in this region at this distance, with 14 found in the Zari et al. indicating the cluster is too old to contain YSOs. Hence, this may be a new association which we label as New SFR 1, first picked out by Zari et al. (2018).
Cygnus X (Piddington & Minnett 1952) is composed of a large molecular cloud complex with active star formation including several OB associations, covering 77 < ≈ < ≈ 82 around the Galactic midplane at a distance of approximately 1.7 kpc (Schneider et al. 2006). There are clearly many high posterior CII in this region at around this distance.
The North American Nebula (NGC 7000) and Pelican Nebula (IC 5070) at ≈ 85 and ≈ −1 are a star forming complex referred to as NANeb by Guieu et al. (2009). In the distance plot we can see this region is around 0.7-0.8 kpc from the Sun, consistent with the older distance estimate of 0.6 kpc from Laugalys & Straižys (2002). It appears this may be part of a larger structure starting from 0.6-0.7 kpc at ≈ 90 and reaching 0.8-1.0 kpc by ≈ 80.
There is an obvious grouping of candidate CII at the location of the HII region IC 1396 and associated young cluster Trumpler 37 at ≈ 99 and ≈ 3.7 and a distance ≈ 0.9 kpc (Contreras et al.  20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210 Kharchenko et al. (2013). Cluster 30 from Bica et al. (2003) is the only cluster young enough to contain Class II YSOs, though with a radius containing half its members of 1.4 (Cantat-Gaudin et al. 2020) and a location about half a degree away from the grouping's centre it seems unlikely to be a match. Alessi-Teutsch 5 is a little old if the age of 19 Myr is correct, though it has a larger half radius of 0.43 • (Cantat-Gaudin et al. 2020) and there are three matches to this cluster from our 39 candidate CII. Hence, this may be a newly discovered young cluster or star formation associated with Alessi-Teutsch 5 and cluster 30 from Bica et al. (2003), which we label as New SFR 2. Following a roughly linear feature of star formation leads us to the young cluster Cep OB3b at ≈ 110.0 and ≈ 2.5 at a distance ≈ 0.7 kpc (Moscadelli et al. 2009;Allen et al. 2012). This cluster stands out clearly with halo of candidate CII.
There is a group of 23 candidate CII at ≈ 127, ≈ −1 and a distance ≈ 0.9 kpc. This location is very close to Froebrich et al. (2007)  There is also the dark cloud LDN 1317 at = 126.7 and = −0.8. We were unable to find further reference to this location in the literature, so this may be a newly discovered young cluster, which we label as New SFR 3. The HII regions W3 / W4 / W5 form a chain in the Perseus arm (Lada et al. 1978) covering approximately 132 < < 138 centred around ≈ 1 (Heyer & Terebey 1998) with active star formation in and around these clouds (Carpenter et al. 2000). The distance to these clouds is at the limit of our survey ≈ 2 kpc (Carpenter et al. 2000), though we nonetheless see over densities of young stars in this area.
At ≈ 148 and ≈ −0.4 is another little grouping of candidates. In this area lies the emission nebula Sh 2-205 (Straizys & Laugalys 2008) at a distance of around 0.9 kpc (Fich & Blitz 1984). The distance from the inverted Gaia EDR3 parallaxes indicates all these candidates lie at around 1 kpc, consistent with these older estimates.
There are some groupings of CII candidates in Auriga at around ≈ 173-174 and ≈ -1-3. They are smeared out in distance from 1-2 kpc, concentrated to the upper end of this range. At ≈ 173.4 and ≈ −0.2 is the young cluster Stock 8 with median age ≈ 3 Myr (Jose et al. 2017) and a distance 2.05 ± 0.10 kpc (Jose et al. 2008). At ≈ 173.9 and ≈ +0.3 is the young cluster NGC 1931. Pandey et al. (2013) found a mean YSO age of 2 ± 1 Myr with a distance of 2.3 ± 0.3 kpc for this cluster. G173.58+2.45 is a young cluster located at ≈ 173.6 and ≈ +2.4. It appears to be associated with the surrounding HII regions Sh 2-231, Sh 2-232, Sh 2-233 and Sh 2-235 whose distances vary from 1.0 to 2.3 kpc (Shepherd & Watson 2002). All of these clusters are around the distance limit of our catalogue. This is consistent with the spread of distances we see for our CII candidates, since the individual Gaia EDR3 parallaxes are becoming dominated by their uncertainties.
At ≈ 190.0 and ≈ 0.5 is the open cluster NGC 2175 and the HII region NGC 2174. Bonatto & Bica (2011) found this cluster and HII region to be part of a star forming complex with multiple young clusters at a distance of 1.4 ± 0.4 kpc, consistent with our distance plot. Montillaud et al. (2019b); Bhadari et al. (2020) consider the young cluster NGC 2264 (l=203, b=2), Mon OB1 and the reflection nebula filament Mon R1 (201 < < 202, 0 < < 1) containing IC 446, IC 447, NGC 2245, NGC 2247 to be a large related star forming complex. Sung et al. (1997) found the distance to NGC 2264 to be 760 pc using the distance moduli of 13 B-type stars in the cluster. This figure has been adopted by Montillaud et al. (2015Montillaud et al. ( , 2019a; Bhadari et al. (2020) for Mon OB1 and Mon R1. This distance is at the upper end of what we find for our candidate CII in this region, but nonetheless supports the notion this is part of a single star forming region.
A group of candidate CII are clearly visible in the vicinity of the Rosette nebula and the young cluster NGC 2244 ( ≈ 206.3, ≈ −2.1). Lim et al. (2021) estimate the age of NGC 2244 at 2 Myr and found other groups of young stars around the edges of the Rosette. They obtained a distance to the cluster of 1.4 ± 0.1 kpc from the Gaia EDR3 parallaxes. Unsurprisingly this is a good match to our distance with Galactic longitude plot.

Feature analysis
To analyse how different features influence the posterior results, we created a series of Galactic coordinate plots in TOPCAT (see Fig. 22). The top plot is the posterior CII followed by plots of the likelihood ratio (CII)/ (O ) for the five features. The weighted mean is used to colour the plots as it removes the effect of changing source density with coordinates, allowing regions of higher posterior and likelihood ratios to be picked out. The posterior CII plot uses a square root colour range to bring out star forming regions. The feature plots use a fixed linear range to allow easy comparison of the contributions by different features.
Many of the known star forming regions identified using the s (CII) > 0.5 data set in Fig. 21 are also visible in the weighted mean s (CII) top plot of Fig. 22. Such as the Cygnus star forming complex at ≈ 80, Cep OB3b is obvious at ≈ 110, and the region 200 < < 210 of star forming complexes around NGC 2264 and the Rosette nebula.
The variability featureˆO (second from top in Fig. 22), shows a strong set of results matching known star forming regions. There is also a general background of higher likelihood ratio sources. The variability feature would be expected to pick out variable stars of many types, not just Class II YSOs and other classes of young stars. Hence it is not surprising that a background population of variable stars are visible.
The WISE ( 1− 2) plot (third from top in Fig. 22), also indicates good results by picking out some known star forming regions. There are fewer sources with the WISE feature towards the Galactic centre as the increased source confusion reduces the number of sources with a reliable match. This allows a few individual sources with a strong WISE result to show up in the weighted mean close to the Galactic plane where < 65. There is an interesting over density of higher likelihood ratios favouring CII at ≈ 189 and ≈ 3 that does not correspond to strong results in the other features. It is caused by 78 sources with a WISE likelihood ratio above 50. Only 4 of these sources have (CII) > 0.5, though 31 have (CII) > 0.0109 (threshold of our 50k set), confirming the other features do not strongly reinforce the WISE results. This location corresponds to the supernova remnant IC 443, where Su et al. (2014)  Only 4 of these have (CII) > 0.5 and 10 have (CII) > 0.0109 from our catalogue. This is an intriguing result. As WISE is our longest wavelength feature, with weaker results from our other features, this might be an indication that many of the YSOs in this region are younger than our Class II targets.
The ( − ) excess feature plot (third from the bottom in Fig. 22), shows a wealth of structure where the ratio of the likelihoods favours CII. Many of these correlate with structures found in the s (CII) plot. The section for < 60 displays more structure and a stronger CII signal than in any of the other plots. There is a loose correlation between this structure and regions with higher reddening, indicating these structures may have some relation to interstellar dust. By itself this does not mean all the structures relate to star formation, though the pronounced structure at ≈ 30 and ≈ 3.5 corresponds to the Serpens molecular cloud. The ( − ) feature is a mix of UKIDSS and 2MASS, with UKIDSS of higher quality but missing data for 107 < < 141 as UKIRT is unable to observe in that direction. It is encouraging that this lack of UKIDSS data does not show up in this plot, though we were able to see it by stretching the colour axis.
The ( − ) excess feature is second from bottom of Fig. 22. Due to the IGAPS survey limits, the results only extend across 30 < < 215. The feature shows a weaker signal than some other features, though there are clear structures favouring CII that coincide with known star forming regions. These include the star forming complexes in Cygnus, Cep OB3b, NGC 2264 and the Rosette nebula. There are others that are unique to the feature, such as groupings favouring CII at ≈ 67 and ≈ 3, and ≈ 99 and ≈ 1, as well as an arc centred on ≈ 53 and ≈ −1. The grouping at ≈ 67 and ≈ 3 has a rectangular shape with edges aligned with the IGAPS images, indicating it must be an artefact, probably where the IGAPS photometric calibrations are lower quality. The grouping at ≈ 99 and ≈ 1 has a less regular shape but with straight lines and right angles that match to IGAPS image tilings, so this must also be an IGAPS calibration artefact. The arc centred on ≈ 53 and ≈ −1 is located in Sagitta. It is more natural looking with an extent of 2 • in Galactic latitude. The top of this arc along the Galactic equator corresponds to the infrared dark cloud G53.2, where Kim et al. (2015) found 302 young stars including 129 CII. This top portion of the arc shows up in the s (CII) > 0.5 plot of Fig. 21.
The bottom plot of Fig. 22 shows the weighted mean likelihood ratios for the isochronal age feature. As with the feature this uses IGAPS, so results only extend across 30 < < 215. Since the colour scaling between the features is the same, it is clear that the isochronal age is the weakest. There are still a good number of the known star forming regions visible, such as Cygnus, Cep OB3b and the star formation in the area 200 < < 210. There is also structure at a variety of other locations, most notably 30 < < 60. This region has a higher level of reddening, implying there is a large quantity of dust. This could indicate the presence of molecular clouds that are associated with star formation. We should treat the results at this location with caution, as it may simply be a problem with the approximate reddening we apply to individual sources, causing the isochronal age to give a false result.
Our overview of the features gives us confidence in the operation of the features and the classifier as a whole. As expected, the strong posterior results favouring CII come about where different features reinforce the same result. Where features differ or do not give a strong result, then we tend not to see it in the posterior. This prevents any single feature from dominating the results, giving strength to the outcomes, and preventing survey artefacts from creating high posterior results .  20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210 Figure 22. A series of plots in Galactic coordinates of the results set and the five features. The top plot is coloured by the posterior CII with a square root colour range to bring out the star forming regions. The subsequent plots are coloured by the feature likelihoods ratios (CII)/ (O ) on a fixed linear colour axis range from 1 to 100 to allow comparison between features. Only sources with a likelihood calculation for a feature are included in the mean plot of that feature. The plots were produced in TOPCAT using a weighted mean to remove the effect of varying source density with Galactic coordinates.

Comparison to other catalogues
We compared our candidate Class II YSOs to the young stars found by other classifiers. A summary of these comparisons is given in Table 12.
To analyse each literature catalogue, the first step was to crossmatch it to our NGPn data set, creating a combined data set of sources present in both catalogues, labelled in Table 12. We chose to compare good quality YSO samples as these are relatively straightforward to interpret. From our full catalogue we chose the high purity set s (CII) > 0.5, returning 6 504 candidates from our catalogue. 1 The number of s (CII) > 0.5 in the combined data set is in Table 12. To make a meaningful comparison we need to choose an equivalent data set of candidate YSOs from the other classifier. Where catalogues simply provide lists of candidates, we use the members of that list which appear in . Where the catalogue provides a young star probability, we chose a threshold to provide a comparable number of young stars to set . These other classifier YSO samples we refer to as data set . We did not use a probability threshold of 0.5 to create the data sets from the literature catalogues, as our posterior does not translate into a simple probability (see Section 3). This approach inevitably gives a narrow view of the comparison, since different choices of threshold will give different results. Hence, this approach merely investigates the performance of good quality sets between the catalogues. It must also be emphasized that we are not comparing entirely equivalent classifications between catalogues. We are specifically trying to identify Class II YSOs, while many of the other catalogues are designed to search for a broader age range of PMS stars.
The middle part of the table shows the comparison statistics. The first row is the number of sources in common between and , i.e. both catalogues consider the source a probable CII or young star by the quoted measures. The next statistic ∉ tells us how many of our candidate CII are not labelled as probable young stars by the other catalogue measure. We also provide this as a percentage of our candidate CII. The final statistic ∉ gives how many of the candidate young stars from the other catalogue are not considered probable CII by our classifier, also given as a percentage of the matched candidate young stars from the other catalogue. Since we aim to find roughly equal matched candidates from both catalogues, these two statistics are naturally very similar.
The bottom part of the table gives background information. Starting with the method used to match the catalogues, all performed in TOPCAT. Then a list of the source surveys utilised by the other classifiers. Table 12. A summary of the comparisons to other catalogues, baselined to our s (CII) > 0.5 classifier results. The label is used for the combined data set from cross-matching the Bayes NGPn catalogue to the literature catalogue. The label is used for the candidate CII from our catalogue. The label is used for the candidate young star measure from the other catalogue. * indicates surveys used by our classifier.
There are many high probability young stars from the other catalogues that do not match to any counterpart in our catalogue. This is an inevitable outcome of our decision to only select high quality data from the source surveys for our classifier. This means we miss many young stars. We accept this as it allows us to judge our classifier performance based on the approach, without needing to untangle the effects of poor quality input data. Robitaille et al. (2008) Robitaille et al. (2008) compiled a catalogue of red mid-infrared sources in the Galactic midplane. The fundamental definition of their red sources is Spitzer [4.5] − [8.0] ≥ 1. We only achieve a match rate to their catalogue of just under 2 per cent ( in Table 12). There are two reasons for this low match rate. First, we expect these red sources to emit predominantly in the infrared of the Spitzer survey, and many may not be sufficiently luminous in the optical to be detected by Gaia or to pass our RP < 18.0 cut used for our base set. A crude 1 match of Robitaille et al. (2008) red sources to Gaia EDR3 returns 5 535 matches, only 29 per cent of their catalogue, thus supporting this notion. Second, our quality cuts eliminate 96 per cent of sources from the Gaia catalogue. The combination of these two factors would give around a 1 per cent match rate, close to what we find.

Comparison with
As the Robitaille et al. (2008) catalogue does not provide a young star probability the ∉ has no meaning, since all matches count as candidates in Robitaille et al. (2008). The ∉ has relevance, showing we consider 40 per cent of the matches to be unlikely or low probability candidate CII. Robitaille et al. (2008) estimate 50-70 per cent of their catalogue are YSOs, with the rest AGB stars along with a few galaxies and planetary nebulae. As our CII should be a subset of their YSOs, our results are roughly inline with their expectations. That our candidate CII constitute over half the Robitaille et al. (2008) candidate YSOs could indicate we are finding YSOs at other stages in their evolution. Marton et al. (2019) A classifier for identifying candidate YSOs was created by Marton et al. (2019). We compare to their published results based on a random forest of 500 trees. Their input data is Gaia DR2 combined with AllWISE and 2MASS. When comparing to our work, it is worth noting they use the Gaia -band photometry but not the parallax measurements. They supplement their data with the Planck dust opacity map as a proxy for the interstellar extinction to individual sources. Their input data is limited to regions where they expect YSOs by selecting areas with a dust opacity above 1.3 × 10 −5 .

Comparison with
They classify their results into YSOs, extragalactic objects, mainsequence stars and evolved stars. Their YSO training sets were compiled from VizieR YSO photometric and spectroscopic catalogues, Spitzer YSO related publications and the Molecular Cores to Planetforming discs (c2d) catalogue (Evans et al. 2003(Evans et al. , 2009b. When selecting data for our comparison we follow the approach of Marton et al. (2019) by selecting between their runs with and without 3 and 4 based on the probability this longer wavelength photometry are real detections. After matching their catalogue to our NGPn data set ( in Table 12), we found a 97.3 per cent probability threshold for their YSO probability gave a comparable number of candidates to our s (CII) > 0.5 set.
As the Marton et al. (2019) candidates cover the full range of evolutionary stages, we could reasonably expect them to find more candidates than we do. Their candidates we consider low probability CII could thus be YSOs at other stages of evolution. The additional candidates we find are harder to explain unless Marton et al. (2019) are missing YSOs or our catalogue is contaminated. To give insight into these data sets we examine their distribution in the Galactic plane (see Fig. 23). The grey circles of our high confidence CII not in the equivalent Marton et al. (2019) set generally align with known star forming regions in Cygnus, Cepheus and around the Rosette Nebula and NGC 2264, along with a scattering of sources that increase towards the Galactic centre. The clustering of these sources indicates a significant proportion are genuine young stars. The cyan crosses of the high confidence Marton et al. (2019) candidates not in our candidate list also follow star forming regions in Cygnus with concentrations at the Rosette Nebula and NGC 2264, though in general their distribution is less concentrated than ours. They also pick out some very nearby candidates within 200 pc, most notable at 120 < < 155 and 180 < < 195, that we do not consider probable CII. This implies our lists are finding different types of source, and we may be better at finding concentrations of Class II YSOs.

Comparison with Vioque et al. (2020)
A catalogue of candidate Herbig Ae/Be was created by Vioque et al. (2020). They perform a 3-way classification, giving probabilities that each source is a PMS star, Classical Be star and Other. Their PMS classification is trained on a combination of Herbig Ae/Be stars and T-Tauri stars. Although our classifier will be less likely to pick out the higher mass Herbig Ae/Be stars, the T-Tauri stars are a good match to our CII. Hence, their PMS probability should provide a useful comparison to our posterior CII. Their results are of particular interest as they use nearly identical input surveys to our work (Gaia DR2, 2MASS, WISE, IPHAS and VPHAS+), using an artificial neural network for their classifier.
The artificial neural network cannot cope with missing data, so they require all of their features to be present for all sources, limiting the size of their catalogue to the common sources across the input surveys. They do not impose quality cuts on the photometry or catalogue cross-matching, instead relying on the neural network to deal with poor data and false matches. This contrasts with our naive Bayes classifier which only requires a single feature to give a meaningful result, but is limited in size by the data quality restrictions we impose. These differing approaches result in the combined data set ( in Table 12) composed from 18 per cent of Vioque et al. (2020) and 5 per cent of our NGPn catalogue.
A further difference is their neural network learns to distinguish between the classes using 48 features without preconceptions, while we carefully chose a set of 5 features known for their utility in identifying Class II YSOs. We have the common feature of Gaia variability, though the implementation of this feature is different between the two classifiers.
We found a (PMS) > 0.3 provided a close number of matched candidates to our s (CII) > 0.5 set. A little over half are mutual candidates between the two sets, with 43 per cent of our CII candidates not considered PMS by Vioque et al. (2020), and the same percentage of Vioque et al. (2020) candidates we do not consider CII.
In Fig. 24 we review the spatial distributions of the two subsets of distinct sources. There are some interesting similarities and surprising differences. A good proportion of both subsets line up with known star forming regions such as Cygnus and the Rosette Nebula. However, Vioque et al. (2020) find additional nearby sources within 500 pc, predominantly in Cygnus and a diffuse grouping around longitude of 110 • close to Cepheus. On the other hand, the additional  20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210  sources from our classifier which are not in Vioque et al. (2020) form a concentration in Galactic coordinates at or very close to Cep OB3b, and our additional Cygnus sources are more distant. As our extra Cygnus candidates are at a greater distance they could be due to incorrect reddening in our data caused by a wall of dust at this location, or we could be finding sources in the more distant Cygnus X star forming complex missed by Vioque et al. (2020). The additional Vioque et al. (2020) candidates in Cygnus are about half the distance of the star formation we identify in Fig. 21. It is possible both sets of candidates in Cygnus are spurious, though we have insufficient information to draw firm conclusions. That both data sets identify sources concentrated around known star forming regions lends to the conclusion that at least some are genuine young stars that are missed by one or other of the classifiers. The scattered sources in both data sets are more likely spurious candidates.

Comparison with Chiu et al. (2021)
The fully connected neural network SCAO (Spectrum Classifier of Astronomical Objects) built by Chiu et al. (2021) identifies candidate YSOs amongst galaxies and other stars. Their classifier uses SEDs constructed from Spitzer, 2MASS and UKIDSS photometry. They trained on the Molecular Cores to Planet-forming discs (c2d) catalogue (Evans et al. 2003(Evans et al. , 2009b to provide good quality labelled data. This means it is a classifier for YSOs at all evolutionary stages rather than the Class II YSOs of our classifier. We compare to the Chiu et al. (2021) SCAO run on the SEIP (Spitzer Enhanced Imaging Products) catalogue. In constructing their classifier they created four models using different input data. We compare to Model IV as this was designed to give the largest number of predictions. It performs the classification using partial SEDs from the IRAC 3, IRAC 4 and MIPS 1 photometry bands.
The SEIP catalogue on which SCAO is based is not a full sky catalogue. It covers patches of our NGPn footprint with many of these patches coinciding with star forming regions such as Cygnus and Cepheus, as well as many smaller targets such as NGC 2264 and the Rosette Nebula. This means a spatial examination of the matches would be biased and uninformative.
We found a (YSO) > 0.5 gave a similar number of candidates to our s (CII) > 0.5 in the matched data set ( in Table 12). About two thirds of the candidates are common between these sets, with each having just over a third uniquely classified as candidate young stars. This is a good level of agreement between our catalogues and encouraging as they used high quality training data. However, as they expect to find YSOs at all evolutionary stages, this could be another  Kuhn et al. (2021) catalogue. The figures are the number from their catalogue within our footprint, the number matched to our catalogue, and the number with a posterior CII above 0.5 in our catalogue. The number of matched sources is also given as a percentage of sources in that class. The number of our candidate CII is also give as the percentage of matched sources in that class. indication that our classifier is finding a broader age range of young stars than the Class IIs we are targetting.

Comparison with Kuhn et al. (2021)
A catalogue of around 120 000 candidate YSOs was presented by Kuhn et al. (2021). They start by selecting sources with significant MYStIX (Feigelson et al. 2013) IR-excess along with a quality constraint based on Spitzer/IRAC photometry. Next they fit SEDs to JHK (from 2MASS, UKIDSS and VVV) and IRAC photometry. They only retain sources where the SED does not fit a reddened stellar photosphere. Finally, they run a random forest classifier to identify YSOs. As their classifier requires photometry in all bands, they impute missing values from the available photometry. The YSO Class is determined from a spectral index calculated on Spitzer and WISE photometry.
We have a relatively small number of matches to Kuhn et al. (2021) Table 12), primarily due to our quality cuts on the Gaia EDR3 catalogue. A detailed breakdown of the Kuhn et al. (2021) classifications is given in Table 13. It can be seen that we match more and a greater percentage of Class II YSOs than the other evolutionary stages. Looking at the percentage of the matched sources with s (CII) > 0.5, about two thirds of the Flat SED, Class II and Class III are labelled as candidate CII by our classifier. The fraction of Class I is noticeably smaller at around a third. These results make sense in evolutionary terms as the Flat SED, Class II and Class III YSOs are closest to our Class II classification, sharing some of their features. So overall there is reasonable agreement between our high posterior CII and the YSO classification by Kuhn et al. (2021). 20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210

Comparison with McBride et al. (2021)
A classifier named Sagitta was created by McBride et al. (2021) for identifying PMS stars and assigning them an age. It is composed of three convolution neural networks, one to create an extinction map, another to assign PMS probabilities, and the third to estimate age. Their input data consists of parallax measurements from Gaia with photometry from Gaia and 2MASS. The majority of their training data set is from Kounkel et al. (2020). They use a hierarchical clustering analysis of Gaia DR2 to identify clusters and comoving groups, along with a neural network named Auriga to estimate ages, extinction and distances. Probable PMS stars are identified by Kounkel et al. (2020) from extinction corrected Gaia photometry and age estimates.
As neural networks cannot handle missing data, McBride et al. (2021) replace missing data with values slightly below the detection limit of the feature. This is based on the assumption that most nondetections are due to the source being too faint.
For our comparison we use the data from McBride et al. (2021, table 5). This is the output from a run of Sagitta on Gaia EDR3, hence more comparable to our catalogue. It only contains the subset of results where (PMS) ≥ 0.7, but this is sufficient for a meaningful comparison. There is a high match rate to our NGPn catalogue at 31 per cent of sources within the NGPn footprint ( in Table 12), especially notable as this is only a subset of their full catalogue where (PMS) ≥ 0.7. This high match rate is due to McBride et al. (2021) imposing similar criteria on the Gaia archive to our base data set. We found a (PMS) > 0.861 combined with 6.3 < 10 ( ) < 6.7 gave a similar number of candidate young stars to our s (CII) > 0.5 set. We introduced the age criteria on the McBride et al. (2021) data, as this allows us to select their candidates with an age roughly consistent with Class II YSOs, about 2 to 5 Myr. There is a relatively small number of young stellar candidates in common between the two sets, they are about 70 per cent distinct.
In Fig. 25 we present the spatial distribution of the sources from the distinct subsets. Both of these sets show strong correlation with star forming regions but not always the same regions. McBride et al. (2021) clearly pick out additional candidates in Cygnus at around ≈ 80. It is plausible we are missing these candidates due to a wall of dust at this location causing bad extinction estimates in our classifier, though this is far from certain. McBride et al. (2021) find additional sources towards the Galactic centre around < 25, where we lack IGAPS data so our classifier works with at most three features. Neighbouring this region they also find additional candidates in the Serpens region around ≈ 30. Reviewing our features for the additional McBride et al. (2021) candidates there is no clear reason why we do not consider them to be candidate CII. In our classifier they tend to have weaker results across all features. Our classifier picks out additional candidate CII in Vulpecula at ≈ 56 and the Rosette nebula at ≈ 207. We both identify unique groupings of candidate young stars around Cep OB3b at ≈ 110 and NGC2264 at ≈ 203. That both sets of unique candidates tend to be grouped around known star forming regions lends to the notion that both classifiers are finding genuine young stars missed by the other.
The lack of agreement between the classifiers is nonetheless unexpected. To investigate whether this might be due to the age criteria we imposed on the McBride et al. (2021) data set, we repeated the comparison without the age criteria and a revised threshold of (PMS) > 0.926 to give a comparable number of candidates to our s (CII) > 0.5 set. This again gave two data sets with 70 per cent distinct sources. There was a similar spatial pattern for the common sources, and for our CII candidates not considered candidates by McBride et al. (2021). However, the set of McBride et al. (2021) candidates not in our CII candidate list showed a greater scatter in Galactic coordinates and contained a set of additional candidates within 500 pc. We conclude imposing the age criteria on the McBride et al. (2021) data set makes it more comparable to our CII candidate list, and has nothing to do with the lack of agreement.

Comparison with Cornu & Montillaud (2021)
A multilayer perceptron neural network for identifying YSOs was constructed by Cornu & Montillaud (2021). They created labelled data using a simplification of the Gutermuth et al. (2009) method, using only Spitzer photometry without 2MASS. This gave a homogeneous data set while introducing the limitation that only Class I and Class II YSOs could be identified.
Their labelled data were compiled from three data sets in Orion (Megeath et al. 2012), NGC 2264 (Rapson et al. 2014) and a sample of clouds within 1 kpc (Gutermuth et al. 2009). These data sets were split 80 : 20 for training and testing. They have not applied the classifier to other regions with Spitzer data as they expect a large number of false positives at distances beyond nearby star forming regions. Their published catalogue is comprised of the Orion and NGC 2264 regions, including both their classifier result and the labels from their modified Gutermuth et al. (2009) method. As with our catalogue, their classification probabilities do not translate to genuine probabilities, i.e. a (CII) = 0.9 does not mean the source has a 90 per cent probability of being a Class II YSO.
In our matched data set, we found a (CII) > 0.0079 from Cornu & Montillaud (2021) gave 182 Class II YSO candidates, equal to the number of candidates found with our s (CII) > 0.5 threshold. These sets of candidates have 75 per cent of sources in common. The low threshold needed for an equal number of candidates from Cornu & Montillaud (2021) causes some issues that we will address later.
An alternative approach to the comparison presents itself for this survey. We can set the probability thresholds to return the number of labelled Class II YSOs in the matched set, that is about 138 candidates. A s (CII) > 0.94 on our catalogue returns 137 candidate CII. Coincidentally the same threshold of (CII) > 0.94 on the Cornu & Montillaud (2021) catalogue returns 138 candidates. These sets share 88 per cent of their candidate CII in common. The number of Cornu & Montillaud (2021) labelled Class II YSOs missed by our classifier doubles from 9 to 18, and goes from 0 to 5 for the Cornu & Montillaud (2021) candidates. The number of misidentified CII in the Cornu & Montillaud (2021) catalogue reduces from 44 to 5. Redoing the comparisons for the other literature catalogues using a threshold of s (CII) > 0.94 does not materially alter those comparisons. Eyer et al. (2022); Rimoldini et al. (2022a); Rimoldini et al. (2022b) The variable source catalogue of Gaia DR3 includes 79 375 candidate YSOs (Rimoldini et al. 2022a). These variable sources were initially identified through the General Variability Detection path using an Extreme Gradient Boost classifier (Eyer et al. 2022;Rimoldini et al. 2022b) applied to sources with over five field of view transits. A period search and Fourier modelling were then performed on these sources, followed by a set of Random Forest (Breiman 2001) and XG-Boost (Chen & Guestrin 2016) classifiers to determine the variability type (Rimoldini et al. 2022b). The results from the period search and modelling were input to the classifiers, along with other data includ-ing , BP and RP time series, spectral shape information from the BP and RP spectro-photometers, and the parallax. A contamination rate of <0.798 and a completeness of 0.091 were found by Rimoldini et al. (2022a, table 2). The validation of the Gaia DR3 YSO variable sources by Marton et al. (2022) found a contamination rate <0.267 and a completeness rate of around a percent that varies significantly with evolutionary class.

Comparison with
The Gaia best class score is the combined median normalised ranks from up to 12 classifiers. To find the same number of Gaia YSOs as CII from our s (CII) > 0.5 set required a best class score greater than 0.8045, giving 1 345 candidates in both sets. These sets are reasonably distinct, with 74 per cent of sources unique to each.
In order to establish whether the distinct subsets are likely to be genuine YSOs, we examine their spatial distribution in Fig. 26. There are notable differences where we identify more candidates in the Cygnus-X region and the Gaia YSOs have more candidates to low Galactic longitude. However, both of these subsets show structure that closely matches known star forming regions including Cygnus, Cepheus, NGC 2264 and Mon R1. Hence, these subsets appear to be composed of genuine YSOs.
If both classifiers are working well, then we would expect a large proportion of our candidate CII to be Gaia YSO candidates. Hence, that 74 per cent of our CII are not in the high ranking Gaia YSOs might appear problematic. However, all of these sources are Gaia YSO candidates, they are simply lower ranking with a best class score below 0.8045. Also, our CII should be a subset of the Gaia YSOs as the Gaia sample includes YSOs at other stages of evolution. Thus it is reasonable that our classifier does not consider 74 per cent of the high ranking Gaia YSOs as CII.
As our classifier provides feature information for a variety of stellar youth indicators, we can use them to investigate the properties of the distinct subsets. While this will simply confirm our CII classifications, the features can provide clues about the age of the Gaia high ranking candidates not in our CII candidate list. Starting with our variability featureˆO in Fig. 27, the sources we consider high confidence CII that are lower ranking Gaia YSOs show consistently higherˆO than the Gaia high ranking YSO candidates not in our CII candidate set. This could be demonstrating that the Gaia pipeline is able to discover lower variability YSOs than our basic approach. The WISE ( 1 − 2) mid-infrared, ( − ) near-infrared and H excess features all exhibit similar behaviour. We illustrate this with the ( − ) histogram of Fig. 28. The high confidence CII from our classifier which are low ranking Gaia YSOs show a strong infrared excess. This contrasts with the high ranking Gaia YSOs not in our CII set, where very few sources exhibit an infrared excess and where it is present it is weak. The isochronal age feature shows a slight tendency for our high confidence CII set to be more consistent with the young star age range than the high ranking Gaia YSO candidates .  20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210  220  l   ¡4  ¡2   0   2   4   b   20  30  40  50  60  70  80  90  100  110  120  130  140  150  160  170  180  190  200  210  220  l   0   1   2  Distance=kpc Ps(CII) > 0:5 and best class score · 0:8045 best class score > 0:8045 and Ps(CII) · 0:5 Figure 26. As Fig. 23 but for the matches between our classifier and the Gaia YSO candidates (Eyer et al. 2022;Rimoldini et al. 2022b). Grey circles are our high confidence CII candidates not in the equivalent Gaia high ranking YSO candidate set. Orange crosses are the Gaia high ranking YSO candidates not in our candidate set. However, it is weak, so we do not explore it in detail. Taking these results together, they confirm the high ranking Gaia candidates not in our candidate list are unlikely to be younger than Class II, as we would expect an infrared excess and they would be difficult for Gaia to detect in the optical. They could be older YSOs or contaminants that are not young stars. As they cluster around known star forming regions this gives more credence to them being YSOs at a later stage of evolution, possibly Class III YSOs. A finding that is consistent with the conclusions of Marton et al. (2022) that Gaia is more sensitive to Class II/III YSOs. This is an intriguing result as Class III YSOs are typically more difficult to identify.

POTENTIAL IMPROVEMENTS AND KNOWN BIAS
During the construction of the classifier, a variety of potential improvements presented themselves that we were unable to implement due to time and resource constraints. The catalogue we present is sufficient for testing the techniques, and carrying out initial science. In this Section we discuss these as potential future improvements that could enable additional science, and call out biases inherent to this work.
Our approach of constructing a 2-way classifier introduces a fundamental weakness. Several types of object share the observational properties of Class II YSOs. These include young stars at different stages of evolution, as well very different objects such AGB stars (Lee et al. 2021) and cataclysmic variables. Adding classifications for these types of object would improve the classifier by weakening the results for sources with feature properties common to a variety of object types. The effect would be to remove contaminants, making the classifier better at distinguishing genuine Class II YSOs. Additional classifications are introduced by training on the object types and a simple extension of the Bayes formula. The effort comes from compiling good quality training sets followed by training and optimisation. There is a cost in computer processing time, as likelihood ratios are needed for each pair of classes. The work to train the classifier grows as ( 2 − )/2 where is the number of classes. An alternative formulation is to construct a hierarchy of binary classifiers rather than a single n-way classifier. This would only require training on 2 − 2 likelihood pairs. So a hierarchical 3-way classifier would be constructed from two binary classifiers. For example, if we take the classes CII, CIII and O . The first level binary classifier would classify into combined CII/CIII and O . The second level would then distinguish CII from CIII.
Another area for potential improvement is our training/test sets. When creating the likelihoods we were frequently limited by the small number of CIIs in our training set. Expanding our labelled CII set would allow finer structure and better accuracy in the likelihood ratios. The key limitations on the quantity of labelled CII were the quality cuts we imposed on the input data, along with the faint magnitude limit and the footprint we set for our catalogue.
A critical factor in the classifier performance is the purity of the training sets. For the O set we simply chose a bland region of the sky devoid of known star formation and significant reddening, our G R . This may have introduced a couple of problems. First, it is possible that this data set is contaminated by a small quantity of Class II YSOs. Even a small level of contamination is a problem as we are trying to distinguish at a level of around one Class II YSO in a thousand sources. The second problem is this set may not be representative of the general population of other sources in our full survey region. As this region is bland with little extinction this means it is different to the bulk of the survey region, and this difference may introduce bias. Further, since it deliberately avoids other star forming regions, we should expect it to be devoid of young stars at all stages of evolution. For example, we have not trained our classifier to assign Class III YSO to the O category. An alternative method for creating a training set of O sources would be to take a random sample from the full data set, with known Class II YSOs removed. This introduces a different problem as we expect there to be unidentified Class II YSOs in the survey region.
A potential bias comes from the way we compiled our training set of known CII. We used data from a variety of papers providing catalogues of candidate Class II YSOs. Many of the source papers were based on the methods of Gutermuth et al. (2005Gutermuth et al. ( , 2008Gutermuth et al. ( , 2009). While we do not doubt their utility, this may have biased our classifier to find Class II YSOs matching the specific characteristics used by these methods.
Complete automation of the training would bring the benefits of fully reproducible likelihood ratios and a faster training process. For the four features where the likelihoods are based on histograms we manually tuned some of the bin edges to avoid unphysical results and large jumps in the likelihood ratio. These came about due to outliers and small quantities of training data in the critical regions where the likelihood ratio transitions from favouring O to CII. A solution could be to use kernel density estimation to generate smoothed histograms of the training data. The kernel might need to vary with location to ensure we capture fine detail in the fast changing transition region without introducing unphysical fluctuations in the sparsely populated wings. The training of the variability feature is already semi-automated. We perform statistical fits of the training data to the model parameters, and then manually assess the best method to capture the behaviour of individual model parameters.
The Isochronal Age feature, based on the overluminosity of young stars above the MS is the weakest of the features. Multiple star systems are known to be very common. For example Duchêne & Kraus (2013) found multiplicity fraction from 22 per cent for low mass stars to 80 per cent for high mass stars. However, for simplicity we only used isochrones for single stars and did not attempt to determine the multiplicity of sources. Unresolved binary and higher order multiple star systems will cause a source to have a higher luminosity than a single star of the same colour. Hence, we can expect a significant number of MS sources to lie above the MS line due to multiplicity rather than youth, and for some young stars to lie above the region occupied by single young stars. The selection criteria used to create our NGPn data set partially mitigated this problem, as only sources consistent with a single star model in the Gaia EDR3 pipeline were included. This will have removed some but not all multiple star systems, those that are not resolved by Gaia but whose orbital motions degrade the parallax measurements. The problem is further mitigated as we can expect the training sets to contain multiple star systems. However, none of these deal with the underlying problem that only single star isochrones were used. An alternative approach might involve construction of multiple star isochrones weighted by the theoretical multiplicity by mass. Training would involve integrat-ing theoretical probabilities to produce the likelihoods without use of the training data.
The WISE photometry is affected by crowding in dense fields as the 1-band (3.4 m) and 2-band (4.6 m) have FWHM of 6.1 and 6.2 respectively (Wright et al. 2010). This is a particular problem at lower Galactic longitudes, resulting in fewer matches and a higher risk of contamination. The Spitzer Space Telescope (Werner et al. 2004) IRAC instrument has a FWHM of 1.6 in similar band passes at 3.6 m and 4.5 m (Fazio et al. 2004). Hence, Spitzer photometry is less affected by source confusion. However, unlike WISE, Spitzer did not observe the full sky and only covers patches of our NGPn footprint. We could potentially use Spitzer with WISE by taking a similar approach to our ( − ) feature. We would use Spitzer for a source where it is available, and otherwise WISE where a reliable measurement exists.
The Gaia archive provides uncertainties on the parallaxes and STILISM provides uncertainties on the reddening. There is the potential to weight feature results based on these uncertainties. This could be done by integrating across distance and reddening for each feature, weighted by the uncertainties. The computing time would be increased by the number of steps in the integration.
The construction of our NGPn catalogue requires that all sources have a detection in Gaia EDR3 with a high quality parallax measurement and a RP < 18. This will bias our data selection. First, young stars tend to emit strongly in the infrared with less in the optical where Gaia is sensitive. There may be sources sufficiently bright in the infrared for our WISE and ( − ) features to give good results, but are excluded as they are faint in RP . Second, the visibility in the optical will be affected by our viewing angle of the circumstellar disc. When a source is viewed edge on or the protostar is behind a flared disc, there will be a high level of optical extinction that could make it invisible to Gaia or fainter than our RP < 18 limit. Hence, we should expect our input data and classifier to be biased towards Class II YSOs where the protostar is not obscured or only partially obscured by its disc. Third, the parallax quality cut excludes many sources with good photometry. The flip side of this argument is the reliable parallax cut excludes galaxies and the majority of AGB stars, cleaning our master catalogue of potential young star interlopers.
By only selecting sources with good quality parallaxes from Gaia EDR3, we are restricting the data to sources that are consistent with the single star model of the Gaia pipeline. This will exclude multiple star systems where their orbital motion is detectable by Gaia but they are not directly resolved into distinct sources. This selection is fundamental to our approach, so it is a bias we must accept. However, future Gaia data releases will drop the single star assumption from the pipeline, and this may allow us to remove this bias.
We could extend our catalogue in Galactic longitude and latitude. The source surveys for our classifier provide data that would allow us to extend to ±5 • in Galactic latitude. The survey could be extended in Galactic longitude by introducing the VPHAS+ survey (Drew et al. 2014) the southern hemisphere equivalent of IGAPS, and VVV (Lucas et al. 2008) where we are lacking UKDISS in the south. Extending to greater distance is more problematic. Beyond 1 kpc the Gaia parallax measurements start to be dominated by their uncertainties, and by our 2 kpc limit this becomes a significant issue. Future Gaia releases may allow us to push the limit further, and will otherwise improve our results to 2 kpc. Any increases to the size of our catalogue would allow us to consider expansion of our training sets. This would be of particular benefit to our CII training set since its small size limits the quality of the likelihood ratios.
It is mathematically straightforward to incorporate additional features to the naive Bayes classifier. The effort is taken with analysing features to determine what produces a good discriminator of CII, then training and optimisation. Examples of other known features of young stars are UV excess and X-ray emission.

SUMMARY AND CONCLUSIONS
We created a naive Bayes classifier for identifying candidate Class II YSOs. In Section 2 we derived five features from known observational properties of Class II YSOs: Gaia EDR3 -band variability, WISE mid-infrared excess from ( 1 − 2), near-infrared excess from ( − ) using UKIDSS and 2MASS photometry, excess and Isochronal Age both using IGAPS photometry. In Section 3 we constructed the classifier based on these features.
The classifier was applied to the NGPn (Northern Galactic Plane) region we define 20 < < 220 and | | < 4 to a distance within 2 kpc from inversion of Gaia EDR3 parallaxes. In compiling the catalogue we prioritised data quality at the expense of completeness. This gives us confidence that the results reflect the performance of the classifier without being significantly influenced by poor quality data. As naive Bayes classifiers can cope with missing data, only a single feature is required to generate a result. Though the more features, the more robust the posterior. The posteriors are not true probabilities as we must rely on an estimated prior. We chose a representative prior of one Class II YSO in a thousand sources. Other values of the prior would lead to different posteriors and alter the scaling, though the order of results by posterior would always be preserved as long as a constant prior was used.
When using the catalogue, a posterior threshold for candidate Class II YSOs must be carefully chosen to match the task at hand. We chose s (CII) > 0.5 to provide a data set of high purity Class II YSOs for analysis (see Section 4.1.2). This classified 6 504 sources as candidate CII with a false positive rate around 0.02 per cent and a true positive rate of approximately 87 per cent. In Section 4.1.3 we used a ROC curve to analyse the performance of the classifier across the full range of posteriors. The curve rises rapidly to a value close to one, with area under the ROC curve of around 0.998 or better.
In Section 4.2.1 we reviewed the s (CII) > 0.5 set qualitatively by examining its distribution in Galactic coordinates and Galactic longitude with distance from the Gaia parallax. This showed the high posterior CII sources are concentrated in regions of known star formation. For example significant groupings were found to coincide with Cep OB3b, the Rosette Nebula, NGC 2264, the Cygnus star forming complex, as well as many other locations. We also identified what may be three previously undiscovered young clusters or associations.
In Section 4.3 we used the s (CII) > 0.5 data set to assess the performance of our classifier against eight other classifiers of young stars. The common theme across these comparisons was that the matched sources shared a high probability young star classification at around one quarter to three quarters of the time. The corollary to this finding is between one quarter and three quarters of candidates are unique in these binary comparisons between classifiers. Examining the spatial distribution of these sources with opposing classifications showed they tend to coincide with known star forming regions. Thus a significant proportion of these candidates are likely to be genuine YSOs. This is telling us that none of the reviewed classifiers, including our classifier, is finding all of the young stars even when they assess the same sources.
The degree of agreement between our CII candidates and catalogues of candidate YSOs at a broader range of ages indicates our classifier is finding YSOs at other stages of evolution. The compar-ison to Kuhn et al. (2021) in Section 4.3.5 illustrates this well as their candidates are labelled by YSO evolutionary stage. While we match more Class II candidates from Kuhn et al. (2021) than other stages of evolution, we match the same proportion in the neighbouring evolutionary stages of Flat SED and Class III YSOs, with a lower proportion of the younger Class I sources. Such a result is not entirely unexpected, as Flat SED and Class III YSOs share some features with Class II YSOs. Further, as we do not explicitly train on other stages of YSO evolution, this may have given our classifier a tendency to classify these neighbouring evolutionary stages as weaker CII candidates rather than classifying them as O sources. In the coming years, spectra will be obtained for many of our candidate Class II YSOs using the WEAVE spectrograph on the WHT (Dalton et al. 2012), acquired for the SCIP (Stellar, Circumstellar and Interstellar Physics) Science Team (Jin et al. 2022). This will enable a thorough assessment of the classifier results. was supported by Science and Technology Facilities Council (STFC) funding for UK participation in the Vera C. Rubin observatory (previously referred to as the Large Synoptic Survey Telescope, LSST), through grant ST/S 006117/1 supporting Tom J. Wilson and Tim Naylor. Ben Lakeland is supported by STFC studentship ST/V506679/1 and Tim Naylor for part of this work by a Leverhulme Research Project Grant.
This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/ gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/ consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.
This publication makes use of data products from the Wide-field Infrared Survey Explorer, which is a joint project of the University of California, Los Angeles, and the Jet Propulsion Laboratory/California Institute of Technology, and NEOWISE, which is a project of the Jet Propulsion Laboratory/California Institute of Technology. WISE and NEOWISE are funded by the National Aeronautics and Space Administration.
This paper makes use of data obtained as part of the IGAPS merger of the IPHAS and UVEX surveys (www.igapsimages.org) carried out at the Isaac Newton Telescope (INT). The INT is operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofisica de Canarias. All IGAPS data were processed by the Cambridge Astronomical Survey Unit, at the Institute of Astronomy in Cambridge. The uniformly-calibrated bandmerged IGAPS catalogue was assembled using the high performance computing cluster via the Centre for Astrophysics Research, University of Hertfordshire.
This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation.
The UKIDSS project is defined in Lawrence et al. (2007). UKIDSS uses the UKIRT Wide Field Camera (WFCAM; Casali et al. (2007)) and a photometric system described in Hewett et al. (2006). The pipeline processing and science archive are described in Hambly et al. (2008). We have used data from UKIDSS DR11 plus, which is described in detail in Lucas et al. (2008).
The Pan-STARRS1 Surveys (PS1) and the PS1 public science archive have been made possible through contributions by the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, the Queen's University Belfast, the Harvard-Smithsonian Center for Astrophysics, the Las Cumbres Observatory Global Telescope Network Incorporated, the National Central University of Taiwan, the Space Telescope Science Institute, the National Aeronautics and Space Administration under Grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate, the National Science Foundation Grant No. AST-1238877, the University of Maryland, Eotvos Lorand University (ELTE), the Los Alamos National Laboratory, and the Gordon and Betty Moore Foundation.

DATA AVAILABILITY
The catalogue of results is available at CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc. unistra.fr/viz-bin/cat/VII/293. A fits copy of the catalogue is available at https://www.nyxanalytics.co.uk/ catalogues.php. We intend to update the catalogue if better versions become available. The code is available on GitHub at https://github.com/AndyJWil/bayescii.

A1 Base data set
Rather than treating each survey in isolation, it was decided to define a base data set using Gaia Early Data Release 3 (Gaia Collaboration et al. 2016Collaboration et al. , 2021, chosen for its high quality astrometry. Data from the other surveys are only retained where there exists a robust match to this base data set. Within the NGPn survey region Gaia EDR3 contains 190 016 855 sources, though only 8 080 045 survive the quality cuts to make it into our base data set. To be able to test the catalogue we need to be able to obtain good quality spectra, hence we imposed a magnitude limit of < 18. To a first approximation we implemented this via RP < 18.0 (Gaia red prism photometer) as we can expect the majority of these sources to be brighter than = 18. This was useful in keeping data volumes to a size that was straight forward to process. A precise cut using PanSTARRS < 18.0 was imposed later in our pipeline.
Good quality parallax measurements were important for data selection and several features. In Gaia EDR3 only sources with a 5 or 6-parameter astrometric solution have a reported parallax measurement (Gaia Collaboration et al. 2021). The 5-parameter solutions were derived using the effective wave number to incorporate chromaticism in the point spread function (PSF) and line spread function (LSF), while the 6-parameters solutions were derived using a default colour as the measured colour was of low quality. Thus only sources with 5 or 6-parameter solutions were selected for the base data set.
The Gaia EDR3 processing pipeline assumes all sources are single stars (Lindegren et al. 2021a). Thus multiple star systems with small angular separations can give rise to spurious astrometry. The RUWE (renormalised unit weight error) (Lindegren 2018) indicates whether a source is a good fit to the single-star model. We impose RUWE ≤ 1.4 (Lindegren 2018;Fabricius et al. 2021) to ensure a good fit to the single star model.
The combined RUWE ≤ 1.4, RP < 18.0, and 5 or 6 astrometric parameters criteria reduced the number of sources to 53 872 361 within the NGPn survey region.

A1.1 Gaia EDR3 parallaxes
The Gaia parallaxes are a significant improvement upon anything previously available. To maximize their potential we remove the best estimate of the bias from the parallaxes, and apply a correction factor to the uncertainties. Lindegren et al. (2021b) found a parallax bias of a few tens of as dependent on , ecliptic latitude and effective wavelength for the 5-parameter astrometric solutions, and pseudocolour in place of effective wavelength for the 6-parameter solutions. This parallax bias was calculated using the script provided on the Gaia EDR3 website, and subsequently removed from the parallax measurements in this work.
Analysis where the factor (unit-weight uncertainty) adjusts the formal uncertainties for their underestimation, is the formal Gaia error, and s is the systematic uncertainty. We adopt their findings of an approximate factor of 1.05 for 5-parameter astrometric solutions, 1.22 for 6-parameter solutions, and a zero systematic uncertainty.
Giant stars and distant galaxies contaminate catalogues of YSOs as they can exhibit similar photometric properties. The accuracy of the Gaia parallax measurements makes possible a new approach to cleaning out these contaminants. Extragalactic sources should have a parallax close to 0 mas. To select sources likely to lie within our Galaxy to a confidence level of 3 sigma we require ext > 3. (A2) To remove the majority of giant stars we select nearby sources with a parallax consistent with a distance within 2 kpc > 0.5 mas.
While these parallax selections introduce bias, it is more important for our catalogue to be based on reliable nearby sources than a more complete and unbiased data set. These parallax selections reduce the number of sources within the NGPn survey region to 13 358 778.

A1.2 Faint magnitude limit
The base data set was matched to PanSTARRS DR1 (Chambers et al. 2016;Magnier et al. 2020) using the official Gaia EDR3 match along with our other base data set criteria and matching to PanSTARRS with the < 18. This gave 1 989 sources, just 0.02 per cent the size of the positive parallax sample, giving high confidence in the lack of spurious parallaxes in the base data set.

A2 Gaia EDR3 photometry
Photometry from the Gaia EDR3 -band was used for the intrinsic variability feature. The -band photometry is of a very high quality with uncertainties less than 0.01 magnitudes, so no quality cut was imposed. There are small known issues with some -band magnitudes and fluxes, identified by Riello et al. (2021);Fabricius et al. (2021). Of relevance to this work is a systematic error with the photometry associated with the 6-parameter astrometric solution, due to the default colour term used in place of the effective wavelength. The problem is window class and colour dependent, with larger corrections for redder sources and attaining values above 1 per cent for the reddest sources. We applied the official corrections of Riello et al. (2021) to the -band magnitudes and fluxes using the code provided on the Gaia consortium website. This adjusted the photometry of 6 per cent of the NGPn base data set, with 1 per cent of the corrections above 0.01 magnitudes and a maximum correction of 0.026.
To ensure robust results from the variability feature, it was necessary to consider the total number of observations and the distribution of the observations in time. To avoid sources with a small number of observations, we followed the approach of Deason et al. (2017); Vioque et al. (2018) by requiring a minimum of 70 -band observations for inclusion of the variability feature. The number of visibility periods is useful in gauging the temporal distribution of observations, where a visibility period is defined as a set of one or more measurements within 4 days. As all sources in the NGPn data set include parallax, and all parallaxes in EDR3 are based on at least 9 visibility periods. This ensured good temporal distribution of the measurements.

A3 WISE
Mid-infrared data was obtained from the Wide-field Infrared Survey Explorer (WISE) space observatory (Wright et al. 2010) ALL-WISE source catalogue in the 1 (3.4 m) and 2 (4.6 m) pass bands. This combines single exposures from the 4-Band and 3-Band cryogenic phases with the NEOWISE (Mainzer et al. 2011) postcryogenic phase.
Quality limits were imposed by requiring magnitude uncertainties were below 0.1 magnitudes, and the contamination and confusion flags be zero to ensure sources were unaffected by artefacts such as diffraction spikes.
The data were matched to Gaia EDR3 using an updated version of the cross-match table from Wilson & Naylor (2018). Good quality matches were selected by imposing a match probability greater than 50 per cent without a photometric match, as this could bias against young stars since they can be extremely red. This gave 2 820 933 matches to the NGPn base data set, about a 35 per cent match rate.

A4 IGAPS
IGAPS (Monguió et al. 2020) is the merged IPHAS and UVEX surveys. The IPHAS (INT H Survey of the Northern Galactic Plane) survey (Drew et al. 2005) covers Galactic longitude from 30 • to 215 • and latitude -5 • to +5 • . This encompasses the majority of the NGPn footprint, missing 10 • at small longitude and 5 • at large longitude.
The classifier makes use of the , and H AB photometry (we refer to IGAPS I as as we do not use U ). Quality criteria were applied to each filter, selecting only sources where the image was consistent with a stellar source ( = −1) and good quality photometry by the saturation, vignetting, trailing, truncation and bad pixel flags all being false. We accepted deblended sources after our analysis showed the photometry was consistent between IGAPS and PanSTARRS DR1.
The IGAPS AB magnitudes were translated from INT photometry into the Pan-STARRS system by the introduction of colour terms (Monguió et al. 2020). We require photometry in the natural INT system calibrated by Bell et al. (2014). We therefore undid the main effect of the translation by backing out the colour term from Monguió et al. (2020, equation The H and AB have no colour terms in the Monguió et al. (2020) calibrations. IGAPS was matched to the base data set using a 0.5 radius in TOPCAT. This gave 6 246 318 matches, about 77 per cent of the NGPn base data set, with an estimated false match rate around 0.3 per cent.

A5 JHK photometry
Near-infrared JHK photometry was obtained from UKIDSS and 2MASS, with UKIDSS used in preference to 2MASS as it is of higher quality. 2MASS was used where no matching UKIDSS source was found, or UKIDSS was saturated as 2MASS remains reliable for brighter sources. The method of Lucas et al. (2008) was used to determine whether UKIDSS was saturated using the conditions < 13.25, < 12.75 and < 12.0. Where it was available the 2MASS photometry was used to make this saturation determination since the UKIDSS would by definition be suspect for these sources. If no 2MASS photometry was available for saturated sources then an ( − ) excess was not calculated.

A5.1 UKIDSS
The UKIDSS GPS (Lucas et al. 2008, UKIRT Infrared Deep Sky Survey Galactic Plane Survey) provides data covering a large part of our NGPn footprint. There is a gap at approximately 107 < < 141 as this region of the Galactic plane lies above declination +60 • . The UKIRT design will not allow observation above this declination at the latitude of the observatory. Hence, only 2MASS observations were used for this region.
We used the UKIDSS DR11 Plus data set as our source. As we were prioritising reliability over completeness, we used the reliableG-psPointSource view designed for this purpose. This view ensures a seamless selection via the criteria priOrSec <= 0 or priOrSec = frame-SetID. It contains high quality detections while allowing deblending, via ppErrBits between 0 and 31 in the , and 1 filters. Only point source detections in all filters are included, by Class = -1. The positional tolerance between the detections in the 3 filters is assured to 0.5 or better as Xi and Eta are limited to between -0.5 and +0.5.
As the data were used in ( − , − ) diagrams, the uncertainties in the − and − colours were restricted to less than 0.1 magnitudes.
These criteria gave an initial data set containing 121 494 306 sources across the footprint. Matching these to the NGPn base data set using a 0.29 match radius in TOPCAT gave 5 510 250 matches, about 68 per cent of the base data set with an estimated false match rate of 0.3 per cent.

A5.2 2MASS
The Two Micron All Sky Survey (2MASS) (Skrutskie et al. 2006) provides , and photometry to the classifier with full coverage of our NGPn footprint. To be consistent with the UKIDSS quality cuts of photometry to better than 0.1 magnitudes in − and − , uncertainties in the 2MASS individual filters were limited to less than 0.05 magnitudes. Additionally a photometric quality flag of 'A' (scan signal to noise ratio ≥ 10 and the measurement errors ≤ 0.10857) was required in all filters. Only sources where the contamination and confusion flag was zero were included, ensuring sources were unaffected by known artefacts.
These criteria yielded 11 453 173 sources across the NGPn footprint. Matching to the base data set with a 1.0 radius in TOPCAT gave 2 830 365 sources, about 35 per cent of the data with an estimated 0.6 per cent false match rate.

B1 STILISM
An estimate of the reddening due to interstellar extinction is required for several of the classifier features. This work uses the STILISM 3dimensional extinction map (Lallement et al. 2014;Capitanio et al. 2017;Lallement et al. 2018). This provides reddening estimates in ( − ) for the local neighbourhood out to 2 kpc using parallaxes from Gaia DR1. Where features needed reddening values, the ( − ) value was transformed into the relevant photometric system, as described in subsequent subsections.
We determined the approximate reddening to voxels in Galactic coordinates and distance, with the same reddening value applied to all the sources within a voxel. The voxels were a half a degree to a side in Galactic coordinates with a depth of 5 pc. STILISM provided the ( − ) to the centre of the near and far sides of the voxels. The ( − ) of the voxel was set to the average of these two values.

B2 Suspect reddening values
A major contributor to the uncertainty in the reddening comes from the uncertainty on the Gaia EDR3 parallaxes. As per equation (A2), we require the parallax is greater than three times the parallax uncertainty. While this removes the lowest quality parallax measurements, it leaves many sources with a distance uncertainty that is a significant proportion of the distance estimate from the inverted parallax. Hence, even though STILISM provides good quality extinction estimates, some of our reddening values may have a large error. A method of identifying suspect reddening values presented itself when we reviewed early iterations of the classifier results. We identified regions with concentrations of high posterior CII for comparing against the locations of known star forming regions. A rectangular region bounded by 75.0 < < 76.0 and −1.0 < < −0.5 appeared anomalous as it was defined by straight edges and right angles. The majority of the higher (CII) sources in this region had a very high ( − ) excess likelihood ratio favouring CII. At this location there is a steep rise in the reddening by over 1.5 magnitudes in ( − ) at about 1.35 kpc indicating a wall of dust. Small changes in the measured parallax of sources at this distance would place them in front or behind the dust, giving rise to large changes in reddening. The proportion of sources in our NGPn base data set with significant uncertainty in their parallax increases with distance. Closer than 1 kpc only 4 per cent of sources have a relative parallax uncertainty greater than 10 per cent, compared with 45 per cent of sources at 1 kpc and beyond. Hence, we should expect a significant number of sources to have a parallax measurement that places them on the wrong side of this wall. If a source is physically located in front of the dust but the parallax measurement places them behind the dust then this would cause the dereddening process to make them too blue. For the ( − ) excess feature, this over dereddening would place sources on the blue side of empirical MS, giving the appearance of an − excess.
The Isochronal Age feature gives us a robust way to identify sources which have been significantly over dereddened. The unreddened young stellar isochrone have a blueward ( − ) limit of about -0.3. Hence, we consider sources with an ( − ) ≤ −0.3 to have suspect reddening. The main drawback with this approach is it only works for sources where we have IGAPS photometry. This resulted in 1 355 639 sources with an ( − ) excess where we were unable to check for suspect reddening.
The features affected by bad reddening are ( − ) excess, Isochronal Age and ( − ) excess. For each of these features, we calculate the feature value for all sources, but assign likelihood values of unity to all classes for sources with suspect reddening. This has the effect of removing the feature from the Bayes calculation.

B3 Dereddening UKIDSS and 2MASS J, H and K
Before calculating an ( − ) excess it was necessary to deredden the observations in ( − ) and ( − ). Transformations from ( − ) to ( − ) and ( − ) were derived by plotting the differences between an ( − ) of zero and one using 1 Myr isochrones (see Appendix C). The differences between different temperatures and gravities covered just over a thousandth in ( − ) and (

B4 Dereddening the IGAPS H-feature
To explore the extinction in the range 0 ≤ ( − ) ≤ 6, we folded model atmospheres through filter responses using the process described in Bell et al. (2014) and Appendix C with the Fitzpatrick & Massa (2007) extinction curve. This showed that below 4 000 K the extinction in − varies steeply with temperature, and there is no good, single number which can be used. However, the extinction in − is small, at about 0.1 mag per magnitude in ( − ), and so the answer is not too critical. We therefore fitted the extinction for eff = 10 000 K and log( )=4.5 with a quadratic function, which gave We chose 10 000 K as this is where nominal ( − ) is defined (see Bell et al. 2014). The extinction in − has a much weaker dependency on temperature, and so we fitted the average of the extinctions for 10 000 K and 3 500 K and log( )=4.5 to give ( − ) = 0.0048 + 0.7333 × ( − ) − 0.0096 × ( − ) 2 . (B6)

APPENDIX C: MODEL STAR ISOCHRONES
We used model isochrones to create the main-sequence in the -vs -CCD (Section 2.3) and to obtain stellar ages from the vs -CMD (Section 2.5). For -vs -we used BT-Settl AGSS model atmospheres (Asplund et al. 2009;Allard et al. 2011) with Baraffe et al. (2015) interiors, folded through the UKIDSS and 2MASS filter bands. The situation is more complicated for vs -because standard isochrones over-estimate the temperature of M-dwarfs for a given luminosity (e.g. Jackson et al. (2018); ). We therefore used the isochrones of Bell et al. (2014) which have an empirical correction for this calculated in the same system The condition for the center of the normal to be far removed from this limit is Rather than trying to solve this inequality, we used the slightly different condition We can show that if the condition for D16 is satisfied so is that for D15, by substituting forˆO/ˆI in D15 using D16. If equation D15 is satisfied then equation D14 becomes the simplified version of D11 we were searching for, d dˆO

APPENDIX E: SYMBOLS
The symbols used throughout the paper are listed in Table E1.

APPENDIX F: ONLINE CATALOGUE DATA DICTIONARY
A data dictionary for the published catalogue of naive Bayes classifier results is given in Table F1. This paper has been typeset from a T E X/L A T E X file prepared by the author.