Data Methodology
Last Updated: March 2025
At CloudJobs, we believe in transparency. This page outlines exactly how we calculate salaries, track demand trends, and aggregate cloud job market data so you can trust the insights we provide.
1. Data Collection
CloudJobs runs a proprietary data pipeline that aggregates job postings from across the internet, including major job boards, company career pages, and applicant tracking systems (ATS).
- Volume: We process roughly 2,000 job postings per week, discarding expired listings and listings without sufficient data. Active listings are re-checked regularly to confirm they are still open.
- Freshness: Data is synchronized daily, so our insights reflect the current job market rather than historical averages.
- Relevance: We filter for jobs that request experience with specific technologies and certifications.
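The three collection filters above (expired, insufficient data, not relevant) can be sketched as a simple pass over postings. This is a minimal illustration, not the CloudJobs pipeline: the `JobPosting` record, the `TRACKED_TERMS` list, and the "sufficient data" rule (a non-empty title and description) are all hypothetical assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical tracked terms; the production list is not published here.
TRACKED_TERMS = ["kubernetes", "eks", "terraform", "lambda", "aws solutions architect professional"]


@dataclass
class JobPosting:
    """Hypothetical shape of one aggregated posting."""
    job_id: str
    title: str
    description: str
    expires_on: Optional[date] = None  # None when the source gives no expiry date


def is_expired(job: JobPosting, today: date) -> bool:
    return job.expires_on is not None and job.expires_on < today


def has_sufficient_data(job: JobPosting) -> bool:
    # Assumed minimum: a non-empty title and a non-empty description.
    return bool(job.title.strip()) and bool(job.description.strip())


def is_relevant(job: JobPosting) -> bool:
    # Whole-word match so that "eks" does not fire inside words like "seeks".
    text = job.description.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in TRACKED_TERMS)


def filter_postings(jobs: List[JobPosting], today: date) -> List[JobPosting]:
    """Keep postings that are unexpired, sufficiently complete, and relevant."""
    return [
        job for job in jobs
        if not is_expired(job, today) and has_sufficient_data(job) and is_relevant(job)
    ]
```

Word-boundary matching is used rather than plain substring checks because short acronyms such as "EKS" otherwise produce false positives inside ordinary words.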
2. Salary Extraction
Salaries are extracted directly from job listings. We rely only on postings that explicitly state a salary band, which have become more common in part because of pay transparency laws. No statistical modeling is used to estimate salaries.
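Extracting a stated band from listing text can be sketched with a regular expression. This is an assumption-laden illustration: the `BAND_RE` pattern, the helper names, and the restriction to US-dollar formats like "$120,000 - $150,000" or "$120k to $150k" are hypothetical. A posting with no recognizable band returns `None` and is left out rather than estimated.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for US-dollar bands such as "$120,000 - $150,000" or "$120k to $150k".
_NUMBER = r"(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)"
BAND_RE = re.compile(
    r"\$\s?" + _NUMBER + r"\s?([kK])?\s*(?:-|to)\s*\$?\s?" + _NUMBER + r"\s?([kK])?"
)


def _to_amount(number: str, k_suffix: Optional[str]) -> float:
    # Strip thousands separators and expand a "k" suffix.
    value = float(number.replace(",", ""))
    return value * 1000 if k_suffix else value


def extract_band(text: str) -> Optional[Tuple[float, float]]:
    """Return (low, high) for the first stated band, or None if no band is found.

    Postings without a band return None and are excluded; nothing is estimated.
    """
    match = BAND_RE.search(text)
    if match is None:
        return None
    low = _to_amount(match.group(1), match.group(2))
    high = _to_amount(match.group(3), match.group(4))
    if low > high:  # malformed band; skip rather than guess
        return None
    return (low, high)
```

For example, `extract_band("Salary: $120,000 - $150,000 per year")` yields `(120000.0, 150000.0)`. This sketch does not convert hourly rates to annual figures.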
Note on Averages: To calculate a "Median Salary", we take the midpoint of each stated salary band, discard the top and bottom 5% of those midpoints as outliers to prevent distortion from extremely high- or low-paying roles, and calculate the median of the remaining pool.
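The median procedure described above (band midpoints, a 5% trim from each end, then the median) can be sketched as follows. The function names and the integer rounding of the trim count are assumptions made for illustration.

```python
import statistics
from typing import List, Tuple


def band_midpoint(band: Tuple[float, float]) -> float:
    low, high = band
    return (low + high) / 2


def median_salary(bands: List[Tuple[float, float]], trim_pct: int = 5) -> float:
    """Median of band midpoints after dropping trim_pct percent from each end."""
    if not bands:
        raise ValueError("no salary bands to summarize")
    midpoints = sorted(band_midpoint(b) for b in bands)
    # Assumed rounding: the number dropped from each end is rounded down.
    k = len(midpoints) * trim_pct // 100
    pool = midpoints[k:len(midpoints) - k]
    return statistics.median(pool)
```

With bands of $90k-$110k, $100k-$120k, and $110k-$130k, the midpoints are 100,000, 110,000, and 120,000, so the median is 110,000. Because this sketch drops the same number of midpoints from each end, the median of the trimmed pool equals the median of the full set. The trim changes the result only for statistics that use every value in the pool, such as a mean.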
3. Skill and Certification Tracking
To determine which skills are "trending", we parse the raw text of job descriptions.
- Keyword Matching: We look for specific mentions of technologies (e.g., "Kubernetes", "EKS", "Terraform").
- Contextual Parsing: Where the wording allows, we distinguish a "Required Skill" ("Must have 3+ years of experience with Serverless") from a "Nice to Have" ("Familiarity with Lambda is a plus") and give strict requirements more weight.
- Certifications: We track certifications by both acronym and full name (e.g., "AWS SAP" and "AWS Solutions Architect Professional") to estimate the return on investment (ROI) of individual certifications.
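The three tracking steps above (keyword matching, required versus nice-to-have weighting, and folding certification acronyms into full names) can be combined into one scoring pass. This is a minimal sketch, not the production parser. The alias table, the cue phrases, the sentence splitter, and the weights (2.0 required, 1.0 unclassified, 0.5 nice-to-have) are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical alias table: canonical name -> spellings counted for it.
ALIASES: Dict[str, List[str]] = {
    "Kubernetes": ["kubernetes"],
    "EKS": ["eks"],
    "Terraform": ["terraform"],
    "Serverless": ["serverless"],
    "Lambda": ["lambda"],
    "AWS Solutions Architect Professional": ["aws solutions architect professional", "aws sap"],
}

# Hypothetical cue phrases and weights; the real rules are not published.
REQUIRED_CUES = ("must have", "required")
OPTIONAL_CUES = ("is a plus", "nice to have", "familiarity with", "preferred")
REQUIRED_WEIGHT, NEUTRAL_WEIGHT, OPTIONAL_WEIGHT = 2.0, 1.0, 0.5


def split_sentences(text: str) -> List[str]:
    # Naive split on sentence-ending punctuation or line breaks.
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_weight(sentence: str) -> float:
    if any(cue in sentence for cue in REQUIRED_CUES):
        return REQUIRED_WEIGHT
    if any(cue in sentence for cue in OPTIONAL_CUES):
        return OPTIONAL_WEIGHT
    return NEUTRAL_WEIGHT  # the wording does not say either way


def score_description(text: str) -> Dict[str, float]:
    scores: Counter = Counter()
    for sentence in split_sentences(text):
        lower = sentence.lower()
        weight = sentence_weight(lower)
        for canonical, variants in ALIASES.items():
            # Whole-word match; each canonical term counts at most once per sentence.
            if any(re.search(r"\b" + re.escape(v) + r"\b", lower) for v in variants):
                scores[canonical] += weight
    return dict(scores)
```

Scoring "Must have 3+ years of experience with Serverless. Familiarity with Lambda is a plus. We use Terraform and AWS SAP." gives Serverless 2.0, Lambda 0.5, Terraform 1.0, and 1.0 credited to "AWS Solutions Architect Professional" through its acronym.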
4. Quality Control
Our engineering team regularly audits the dataset to fix miscategorizations (e.g., ensuring that "Java" the programming language isn't confused with the island of Java in location-based filters). We also actively remove "ghost jobs": postings that have been live for more than 60 days without being refreshed by the employer.
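The 60-day ghost-job rule can be sketched as a date comparison. The `Listing` record and its fields are hypothetical. The sketch also assumes that a listing's age is measured from its most recent refresh, or from its original posting date if it was never refreshed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

GHOST_JOB_AGE = timedelta(days=60)


@dataclass
class Listing:
    """Hypothetical record: when a posting appeared and when the employer last refreshed it."""
    job_id: str
    first_posted: date
    last_refreshed: Optional[date] = None  # None if never refreshed


def is_ghost_job(listing: Listing, today: date) -> bool:
    # Age is measured from the latest refresh, falling back to the posting date.
    reference = listing.last_refreshed or listing.first_posted
    return today - reference > GHOST_JOB_AGE


def cull_ghost_jobs(listings: List[Listing], today: date) -> List[Listing]:
    return [listing for listing in listings if not is_ghost_job(listing, today)]
```

A listing exactly 60 days old is kept. Only listings older than 60 days since their last refresh are removed, matching "more than 60 days".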
Contact
Questions about our data? Feel free to reach out to john@cloudjobs.io.